Java Regex Practice: Using Backreferences and Named Groups


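Recently, I needed to write a regular expression to validate SSNs for our UIMA-related project.
I found the article "Validating Social Security Numbers through Regular Expressions", which gives these two patterns (the first for dash-separated numbers, the second for digits only):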
^(?!219-09-9999|078-05-1120)(?!666|000|9\d{2})\d{3}-(?!00)\d{2}-(?!0{4})\d{4}$
^(?!219099999|078051120)(?!666|000|9\d{2})\d{3}(?!00)\d{2}(?!0{4})\d{4}$

The regex I need is a little different: the three parts of the SSN must be separated by either a dash or a single space.
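To see why the article's patterns don't cover this case, here is a small sketch that runs both of them (copied from above, with backslashes doubled for Java string literals) against a few inputs. The class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

class ArticlePatternsDemo {
  // Dash-separated form from the article (backslashes doubled for Java).
  static final Pattern DASHED = Pattern.compile(
      "^(?!219-09-9999|078-05-1120)(?!666|000|9\\d{2})\\d{3}-(?!00)\\d{2}-(?!0{4})\\d{4}$");
  // Digits-only form from the article.
  static final Pattern DIGITS_ONLY = Pattern.compile(
      "^(?!219099999|078051120)(?!666|000|9\\d{2})\\d{3}(?!00)\\d{2}(?!0{4})\\d{4}$");

  // Hypothetical helper: true if either article pattern accepts the input.
  static boolean acceptedByArticle(String ssn) {
    return DASHED.matcher(ssn).matches() || DIGITS_ONLY.matcher(ssn).matches();
  }

  public static void main(String[] args) {
    System.out.println(acceptedByArticle("123-45-6789")); // true
    System.out.println(acceptedByArticle("123456789"));   // true
    System.out.println(acceptedByArticle("123 45 6789")); // false: spaces are not allowed
  }
}
```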

Using Backreferences
The first separator can be a dash or a space, but the second separator has to be the same one. This is exactly where backreferences help: a backreference matches the same text that an earlier capturing group actually matched, not just the same pattern.
(?!219-09-9999|078-05-1120)(?!666|000|9\\d{2})\\d{3}([- ])(?!00)\\d{2}\\1(?!0{4})(\\d{4})
Here \\1 (that is, \1 once the Java string escaping is removed) repeats the text matched by the first capturing group, ([- ]).
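A stripped-down sketch (the validity lookaheads are omitted to keep the focus on the separators) shows the difference between writing the character class [- ] twice and using a backreference. The class name is illustrative.

```java
import java.util.regex.Pattern;

class BackrefVsCharClass {
  // Each [- ] is matched independently, so the two separators may differ.
  static final Pattern CHAR_CLASS = Pattern.compile("\\d{3}[- ]\\d{2}[- ]\\d{4}");
  // \\1 must repeat exactly the text captured by ([- ]).
  static final Pattern BACKREF = Pattern.compile("\\d{3}([- ])\\d{2}\\1\\d{4}");

  public static void main(String[] args) {
    String mixed = "123-45 6789";
    System.out.println(CHAR_CLASS.matcher(mixed).matches());      // true
    System.out.println(BACKREF.matcher(mixed).matches());         // false
    System.out.println(BACKREF.matcher("123 45 6789").matches()); // true
  }
}
```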

Using Named Group
When a regex is complex, it's hard to figure out what \\1 refers to. Also, if we change the regex later, for example by adding another capturing group in front of ([- ]), \\1 may end up pointing at a different group by mistake.
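A minimal sketch of that pitfall, assuming a later edit wraps the area number in its own group: the separator becomes group 2, but the unchanged \\1 now demands the area number again, so a valid SSN stops matching. The class name is illustrative.

```java
import java.util.regex.Pattern;

class GroupRenumberingDemo {
  // Original: ([- ]) is group 1, so \\1 repeats the separator.
  static final Pattern ORIGINAL = Pattern.compile("\\d{3}([- ])\\d{2}\\1\\d{4}");
  // Hypothetical later edit: (\\d{3}) is now group 1 and the separator is group 2,
  // but \\1 was not updated, so it requires the three area digits to repeat.
  static final Pattern EDITED = Pattern.compile("(\\d{3})([- ])\\d{2}\\1\\d{4}");

  public static void main(String[] args) {
    String ssn = "123-45-6789";
    System.out.println(ORIGINAL.matcher(ssn).matches()); // true
    System.out.println(EDITED.matcher(ssn).matches());   // false
  }
}
```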

We can use the named capturing group feature, added in Java 7, to improve the regex's readability:
(1) (?<NAME>X) defines a capturing group named NAME that matches X
(2) \\k<NAME> is a backreference to the named group NAME
(3) ${NAME} refers to the text captured by the named group in a Matcher replacement string
(4) group(String NAME) returns the input subsequence captured by the named group NAME
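The four features above can be exercised together in a short sketch. The group names AREA, GROUP and SERIAL, the class name and the helper methods are hypothetical, chosen only for this example; SEP matches the name used below.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroupDemo {
  // (1) (?<NAME>X) defines named groups; (2) \\k<SEP> backreferences one.
  static final Pattern SSN = Pattern.compile(
      "(?<AREA>\\d{3})(?<SEP>[- ])(?<GROUP>\\d{2})\\k<SEP>(?<SERIAL>\\d{4})");

  // (4) group(String name) returns what a named group captured.
  static String separatorOf(String ssn) {
    Matcher m = SSN.matcher(ssn);
    return m.matches() ? m.group("SEP") : null;
  }

  // (3) ${NAME} in a replacement string inserts a named group's text.
  static String toDashed(String text) {
    return SSN.matcher(text).replaceAll("${AREA}-${GROUP}-${SERIAL}");
  }

  public static void main(String[] args) {
    System.out.println("[" + separatorOf("123 45 6789") + "]"); // [ ]
    System.out.println(toDashed("SSN: 123 45 6789"));           // SSN: 123-45-6789
  }
}
```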

The final regex looks like this:
(?!219-09-9999|078-05-1120)(?!666|000|9\\d{2})\\d{3}(?<SEP>[- ])(?!00)\\d{2}\\k<SEP>(?!0{4})(\\d{4})
(?<SEP>[- ]) gives the group the name SEP, and \\k<SEP> refers back to it.

Java Code
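import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;

import java.util.regex.Pattern;

import org.junit.Test;

public class SSNRegexTest {

  @Test
  public void regexBackReference() {
    // ([- ]) is group 1; \\1 must repeat the same separator.
    String pStr = "(?!219-09-9999|078-05-1120)(?!666|000|9\\d{2})\\d{3}([- ])(?!00)\\d{2}\\1(?!0{4})(\\d{4})";
    testSSNRegex(pStr);
  }

  @Test
  public void regexBackNamedGroupReference() {
    // (?<SEP>[- ]) names the separator group; \\k<SEP> must repeat it.
    String pStr = "(?!219-09-9999|078-05-1120)(?!666|000|9\\d{2})\\d{3}(?<SEP>[- ])(?!00)\\d{2}\\k<SEP>(?!0{4})(\\d{4})";
    testSSNRegex(pStr);
  }

  // Shared checks, called by both tests above (not a test itself).
  private void testSSNRegex(String pStr) {
    Pattern p = Pattern.compile(pStr); // compile once, reuse for every input

    // Both separators are spaces, or both are dashes: should match.
    assertTrue("Didn't find match", p.matcher("123 45 6789").matches());
    assertTrue("Didn't find match", p.matcher("123-45-6789").matches());

    // Second separator missing: should not match.
    assertFalse("Should not find match", p.matcher("123-456789").matches());
    // Mixed separators: the backreference rejects this.
    assertFalse("Should not find match", p.matcher("123-45 6789").matches());
  }
}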
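One more case worth testing: the first negative lookahead lists the two blacklisted numbers only in their dash-separated form, so with a space separator they still match. The sketch below shows the gap and one possible tightening that uses [- ] inside the lookahead; this variant is a suggestion, not taken from the referenced article, and the class name is illustrative.

```java
import java.util.regex.Pattern;

class BlacklistGapDemo {
  // Final regex from above: the first lookahead lists only dash-separated numbers.
  static final Pattern ORIGINAL = Pattern.compile(
      "(?!219-09-9999|078-05-1120)(?!666|000|9\\d{2})\\d{3}(?<SEP>[- ])(?!00)\\d{2}\\k<SEP>(?!0{4})(\\d{4})");
  // Suggested variant (not from the article): [- ] also rejects space-separated forms.
  static final Pattern TIGHTENED = Pattern.compile(
      "(?!219[- ]09[- ]9999|078[- ]05[- ]1120)(?!666|000|9\\d{2})\\d{3}(?<SEP>[- ])(?!00)\\d{2}\\k<SEP>(?!0{4})(\\d{4})");

  public static void main(String[] args) {
    String ssn = "078 05 1120";
    System.out.println(ORIGINAL.matcher(ssn).matches());  // true
    System.out.println(TIGHTENED.matcher(ssn).matches()); // false
  }
}
```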
References
Validating Social Security Numbers through Regular Expressions
Named Capturing Group in JDK7 RegEx
