Get Start End Offset of Named Group in JDK7

The Problem
We want to know the start and end offsets of a named group, but in JDK 7 Matcher's start() and end() methods don't accept a group name as a parameter.

JDK7 adds support for named groups:
(1) (?<NAME>X) to define a named group "NAME".
(2) \k<NAME> to back-reference the named group "NAME" (written as "\\k<NAME>" in a Java string literal).
(3) ${NAME} to reference the captured group in the matcher's replacement string.
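
The three constructs above can be seen together in a small sketch. The class name, method names and the sample pattern are made up for illustration; only the java.util.regex calls come from the JDK.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NamedGroupSyntaxDemo {
    // (?<word>\w+) defines the group "word"; \k<word> must match the same text again.
    static final Pattern REPEATED = Pattern.compile("(?<word>\\w+) \\k<word>");

    // Returns the text captured by the named group in the first match, or null if nothing matches.
    static String firstRepeatedWord(String input) {
        Matcher m = REPEATED.matcher(input);
        return m.find() ? m.group("word") : null;
    }

    // ${word} in the replacement string refers to the captured named group.
    static String collapseRepeats(String input) {
        return REPEATED.matcher(input).replaceAll("${word}");
    }

    public static void main(String[] args) {
        System.out.println(firstRepeatedWord("hello hello world")); // hello
        System.out.println(collapseRepeats("hello hello world"));   // hello world
    }
}
```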

matcher.group(String name) returns the input subsequence captured by the given named group, but there is no equivalent for offsets: start() and end() only take a group index.

The Solution
Check the JDK code and look at how matcher.group(String name) is implemented:
public String group(String name) {
    if (name == null)
        throw new NullPointerException("Null group name");
    if (first < 0)
        throw new IllegalStateException("No match found");
    if (!parentPattern.namedGroups().containsKey(name))
        throw new IllegalArgumentException("No group with name <" + name + ">");
    int group = parentPattern.namedGroups().get(name);
    if ((groups[group*2] == -1) || (groups[group*2+1] == -1))
        return null;
    return getSubSequence(groups[group * 2], groups[group * 2 + 1]).toString();
}
It calls parentPattern.namedGroups().get(name) to look up the index of the named group. In the Pattern code, namedGroups() is not public; it is package-private:
Map<String, Integer> namedGroups() {
    if (namedGroups == null)
        namedGroups = new HashMap<>(2);
    return namedGroups;
}
We can't call it directly, but we can use Java reflection to invoke this package-private method.

import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public void testGetNamedGroupPositionInJDK7() throws Exception {
  Pattern pattern = Pattern.compile("((?<capture>abc).*d)(ef)");
  Integer groupPos = getNamedGroupPositionInJDK7(pattern, "capture");
  if (groupPos == null) {
    System.out
        .println("Doesn't contain named group: capture, the pattern: "
            + pattern.toString());
    // stop here: unboxing a null groupPos below would throw NPE
    return;
  }
  Matcher matcher = pattern.matcher("abcxxdef");
  while (matcher.find()) {
    // same text as matcher.group("capture")
    String matchedText = matcher.group(groupPos);
    System.out.println(matchedText + " " + matcher.start(groupPos)
        + ":" + matcher.end(groupPos));
  }
}

// Returns Integer rather than int: the lookup yields null when the regex
// doesn't contain the named group, and unboxing null would throw NPE.
@SuppressWarnings("unchecked")
private Integer getNamedGroupPositionInJDK7(Pattern pattern,
    String namedGroup) throws NoSuchMethodException,
    IllegalAccessException, InvocationTargetException {
  // namedGroups() is package-private in Pattern, so look it up reflectively
  Method namedGroupsMethod = Pattern.class.getDeclaredMethod("namedGroups");
  namedGroupsMethod.setAccessible(true);

  Map<String, Integer> namedGroups = (Map<String, Integer>) namedGroupsMethod
      .invoke(pattern);
  return namedGroups.get(namedGroup);
}
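
The index looked up through namedGroups() is the group's ordinary number: named groups are still counted by their opening parenthesis, left to right. As a quick sanity check that needs no reflection, the sketch below hard-codes index 2 for the "capture" group of the sample pattern; the class and method names are hypothetical, and counting by hand is only practical when the pattern is known in advance.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupIndexDemo {
    // Returns {start, end} of the numbered group in the first match,
    // or null if the pattern does not match at all.
    static int[] offsets(Pattern pattern, String input, int groupIndex) {
        Matcher m = pattern.matcher(input);
        if (!m.find()) {
            return null;
        }
        return new int[] { m.start(groupIndex), m.end(groupIndex) };
    }

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("((?<capture>abc).*d)(ef)");
        Matcher m = pattern.matcher("abcxxdef");
        if (m.find()) {
            // (?<capture>...) is the second opening parenthesis, so it is group 2
            System.out.println(m.group("capture").equals(m.group(2))); // true
            System.out.println(m.start(2) + ":" + m.end(2));           // 0:3
        }
    }
}
```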
Get Start End Offset of Named Group in JDK8
JDK8 addresses this problem by adding two methods to Matcher, start(String groupName) and end(String groupName), which return the start and end offsets of a named group.
public void testGetNamedGroupPositionInJDK8() throws Exception {
  Pattern pattern = Pattern.compile("((?<capture>abc).*d)(ef)");
  Matcher matcher = pattern.matcher("abcxxdef");
  while (matcher.find()) {
    // if the regex doesn't contain the named group, it would throw
    // IllegalArgumentException: No group with name <capture>
    System.out.println(matcher.group("capture") + " "
        + matcher.start("capture") + ":" + matcher.end("capture"));
  }
}
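
One more case is worth handling with the JDK8 API: a group that exists in the pattern but did not take part in the match. In that case start(String) and end(String) return -1 rather than throwing. A minimal sketch of a wrapper for this, where the class name, the helper offsets() and the alternation pattern are illustrative assumptions:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NamedGroupOffsets {
    // Returns {start, end} for the named group in the current match, or null
    // if the group exists but did not participate in the match.
    // Still throws IllegalArgumentException if the pattern has no such group.
    static int[] offsets(Matcher matcher, String groupName) {
        int start = matcher.start(groupName);
        if (start == -1) {
            return null;
        }
        return new int[] { start, matcher.end(groupName) };
    }

    public static void main(String[] args) {
        Matcher m = Pattern.compile("(?<num>\\d+)|(?<word>[a-z]+)").matcher("abc");
        if (m.find()) {
            int[] word = offsets(m, "word");
            System.out.println(word[0] + ":" + word[1]);    // 0:3
            System.out.println(offsets(m, "num") == null);  // true
        }
    }
}
```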