Java Regex Practice: Using Backreferences and Named Groups

Recently, I needed to write a regular expression to validate Social Security numbers (SSNs) for a UIMA-related project.
I found the article "Validating Social Security Numbers through Regular Expressions", which gives these two patterns, one with dashes and one without:
^(?!219-09-9999|078-05-1120)(?!666|000|9\d{2})\d{3}-(?!00)\d{2}-(?!0{4})\d{4}$
^(?!219099999|078051120)(?!666|000|9\d{2})\d{3}(?!00)\d{2}(?!0{4})\d{4}$

The regex I need is a little different: the three parts of the SSN must be separated by either a dash or a single space.

Using Backreferences
The first separator can be a dash or a space, but the second separator has to be the same as the first. This is exactly where a backreference helps: it matches the same text that an earlier capturing group matched.
(?!219-09-9999|078-05-1120)(?!666|000|9\\d{2})\\d{3}([- ])(?!00)\\d{2}\\1(?!0{4})(\\d{4})
Here \\1 repeats the text matched by the first capturing group, ([- ]).
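To see the backreference at work, here is a minimal sketch that compiles the pattern above and tries it on same-separator and mixed-separator inputs. The class name SsnBackreferenceDemo and the helper isValid are my own names for illustration.

```java
import java.util.regex.Pattern;

class SsnBackreferenceDemo {
    // Same pattern as above. \\1 must repeat exactly what ([- ]) captured.
    static final Pattern SSN = Pattern.compile(
        "(?!219-09-9999|078-05-1120)(?!666|000|9\\d{2})\\d{3}([- ])(?!00)\\d{2}\\1(?!0{4})(\\d{4})");

    // Hypothetical helper: true only if the whole input matches the pattern.
    static boolean isValid(String s) {
        return SSN.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("123-45-6789")); // true: dash, then dash
        System.out.println(isValid("123 45 6789")); // true: space, then space
        System.out.println(isValid("123-45 6789")); // false: dash, then space
        System.out.println(isValid("123 45-6789")); // false: space, then dash
    }
}
```

One thing this makes visible: the first lookahead lists the two excluded numbers only in their dash form, so "078 05 1120" with spaces still passes this pattern.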

Using Named Groups
When the regex is complex, it's hard to figure out what \\1 refers to. Also, if we change the regex later, \\1 may end up pointing to a different group by mistake.
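The renumbering problem is easy to reproduce. The sketch below uses a simplified SSN shape without the lookaheads, and the "later edit" is a hypothetical change I made up for illustration: someone wraps the first three digits in a capturing group and forgets to update the backreference.

```java
import java.util.regex.Pattern;

class GroupRenumberingDemo {
    // Simplified shape: the separator is group 1, so \\1 repeats the separator.
    static final Pattern BEFORE = Pattern.compile("\\d{3}([- ])\\d{2}\\1\\d{4}");

    // Hypothetical later edit: the first three digits are captured as well.
    // The separator is now group 2, but the backreference still says \\1,
    // so it repeats the three digits instead of the separator.
    static final Pattern AFTER = Pattern.compile("(\\d{3})([- ])\\d{2}\\1\\d{4}");

    public static void main(String[] args) {
        System.out.println(BEFORE.matcher("123-45-6789").matches());   // true
        System.out.println(AFTER.matcher("123-45-6789").matches());    // false
        System.out.println(AFTER.matcher("123-451236789").matches());  // true
    }
}
```

The edited pattern still compiles without any warning. It just silently matches something else.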

We can use the named group feature, added in Java 7, to improve the regex's readability:
(1) (?<NAME>X) to define a named group "NAME" for the pattern X
(2) \\k<NAME> to backreference the named group "NAME"
(3) ${NAME} to reference the captured group in the matcher's replacement string
(4) group(String NAME) to return the captured input subsequence for the given named group
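The four pieces can be sketched together as follows. This uses a simplified SSN shape without the lookaheads. The group names AREA, GROUP and SERIAL, the helper methods, and the masking format are all my own choices for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroupDemo {
    // (1) (?<NAME>X) defines named groups; (2) \\k<SEP> repeats the separator.
    static final Pattern SSN = Pattern.compile(
        "(?<AREA>\\d{3})(?<SEP>[- ])(?<GROUP>\\d{2})\\k<SEP>(?<SERIAL>\\d{4})");

    // (4) group(String) reads a capture by name; null if the input does not match.
    static String separatorOf(String s) {
        Matcher m = SSN.matcher(s);
        return m.matches() ? m.group("SEP") : null;
    }

    // (3) ${NAME} in the replacement string inserts the named capture.
    static String mask(String s) {
        return SSN.matcher(s).replaceAll("***${SEP}**${SEP}${SERIAL}");
    }

    public static void main(String[] args) {
        System.out.println("[" + separatorOf("123 45 6789") + "]"); // [ ]
        System.out.println(separatorOf("123-45 6789"));             // null
        System.out.println(mask("123-45-6789"));                    // ***-**-6789
        System.out.println(mask("id: 123 45 6789"));                // id: *** ** 6789
    }
}
```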

The final regex looks like this:
(?!219-09-9999|078-05-1120)(?!666|000|9\\d{2})\\d{3}(?<SEP>[- ])(?!00)\\d{2}\\k<SEP>(?!0{4})(\\d{4})
(?<SEP>[- ]) gives the separator group a name, SEP, and \\k<SEP> refers back to it.

Java Code
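import static org.junit.Assert.fail;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.junit.Test;

// JUnit 4 test; the class name is arbitrary.
public class SsnRegexTest {

  // Numbered backreference: \\1 repeats the separator captured by ([- ]).
  @Test
  public void regexBackReference() {
    String pStr = "(?!219-09-9999|078-05-1120)(?!666|000|9\\d{2})\\d{3}([- ])(?!00)\\d{2}\\1(?!0{4})(\\d{4})";
    testSSNRegex(pStr);
  }

  // Named backreference: \\k<SEP> repeats the separator captured by (?<SEP>[- ]).
  @Test
  public void regexBackNamedGroupReference() {
    String pStr = "(?!219-09-9999|078-05-1120)(?!666|000|9\\d{2})\\d{3}(?<SEP>[- ])(?!00)\\d{2}\\k<SEP>(?!0{4})(\\d{4})";
    testSSNRegex(pStr);
  }

  // Shared checks: both separator styles must match, a missing separator must not.
  private void testSSNRegex(String pStr) {
    Pattern p = Pattern.compile(pStr);

    Matcher m = p.matcher("123 45 6789");
    if (m.matches()) {
      System.out.println(m.group());
    } else {
      fail("Didn't find match");
    }

    m = p.matcher("123-45-6789");
    if (m.matches()) {
      System.out.println(m.group());
    } else {
      fail("Didn't find match");
    }

    // The second separator is missing, so this must be rejected.
    m = p.matcher("123-456789");
    if (m.matches()) {
      fail("Should not find match");
    }
  }
}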
References
Validating Social Security Numbers through Regular Expressions
Named Capturing Group in JDK7 RegEx