Using Fiddler and Eclipse to Troubleshoot: The entity name must immediately follow the '&'

The Problem
A client application sent data to our custom Solr handler, and the request failed with the following exception:

The entity name must immediately follow the '&' in the entity reference.
org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 70; The entity name must immediately follow the '&' in the entity reference.
at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:256)
at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:345)
at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:121)
at XXX.ExtendUpdateRequestHandler.handleRequestBody(ExtendUpdateRequestHandler.java:200)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1821)
Use Fiddler to Capture the Request and Replay It on a Development Machine
The error suggests an invalid character in the XML. But the client application wraps everything in CDATA, so why does the problem still occur?

To find the real cause and a fix, we sat together and reproduced the problem. We used Fiddler to capture the request the client application sent. It looks like this:
Content-Type: application/x-www-form-urlencoded
stream.body=<![CDATA[2033\Test%25%26!@Test_]]><![CDATA[2033\Test%25%26!@Test_]]>
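When reading the captured body, keep in mind that a form-urlencoded value is percent-decoded before the handler parses it as XML, so %25 and %26 should reach the XML parser as literal % and & characters. A minimal sketch of that decoding step with the JDK's URLDecoder (the class and method names are made up for illustration, and the Charset overload assumes Java 10 or later):

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

class FormDecodeDemo {
    // Decode one form value roughly the way a servlet container does
    // before the request handler sees it.
    static String decodeFormValue(String raw) {
        return URLDecoder.decode(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The value as it appears in the captured request.
        String captured = "2033\\Test%25%26!@Test_";
        // Prints: 2033\Test%&!@Test_
        System.out.println(decodeFormValue(captured));
    }
}
```

The decoded value contains a bare &, which is harmless inside CDATA but not everywhere else in an XML document.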

Now I can reproduce the problem on my development machine:
click the captured request, select Replay -> Reissue from Composer, then change the URL to point to my local development machine.
Using Eclipse Display View to Print the Complete Stack Trace
Looking at the code of com.sun.org.apache.xerces.internal.parsers.DOMParser.parse, it seems to swallow the underlying XMLParseException and throw a new SAXParseException without its stack trace:
public void parse(InputSource inputSource)  throws SAXException, IOException
{
  try
  {
    // excerpt: xmlInputSource is built from inputSource before this call
    parse(xmlInputSource);
  }
  catch (XMLParseException e)
  {
    Exception ex = e.getException();
    if (ex == null)
    {
      LocatorImpl locatorImpl = new LocatorImpl();
      locatorImpl.setPublicId(e.getPublicId());
      locatorImpl.setSystemId(e.getExpandedSystemId());
      locatorImpl.setLineNumber(e.getLineNumber());
      locatorImpl.setColumnNumber(e.getColumnNumber());
      throw new SAXParseException(e.getMessage(), locatorImpl); // throws exception from here
    }
  }
}
I wanted to see the complete stack trace.

So I attached a remote debugger and added a breakpoint just before the exception is thrown. After replaying the request in Composer, execution stops at the breakpoint.

Open the Eclipse Display view, type the following Java code, select all the lines, and execute them:
java.io.Writer result = new java.io.StringWriter();
java.io.PrintWriter printWriter = new java.io.PrintWriter(result);
e.printStackTrace(printWriter);
result.toString();
It prints the exception stack trace in the Display view:
(java.lang.String) ::::1:616:615:The entity name must immediately follow the '&' in the entity reference.
at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:417)
at com.sun.org.apache.xerces.internal.impl.XMLScanner.scanAttributeValue(XMLScanner.java:837)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanAttribute(XMLDocumentFragmentScannerImpl.java:1551)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanStartElement(XMLDocumentFragmentScannerImpl.java:1324)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2768)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:606)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:511)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:846)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:775)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:123)
at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:242)
at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:345)
at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:121)
at XXX.ExtendUpdateRequestHandler.handleRequestBody(ExtendUpdateRequestHandler.java:200)

at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:762)

The root cause is much clearer now: the trace goes through scanAttributeValue, so the exception is caused by an & in an attribute value. Checking the post data, we had put the element values in CDATA, but not the attribute values.
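The difference between element content and attribute values can be checked in isolation. The sketch below is only an illustration: the element name item and attribute name id are hypothetical stand-ins (they are not the real request's markup), and the class and method names are made up. It parses three small documents with the JDK's DocumentBuilder and reports whether each one is accepted.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class AmpersandInAttributeDemo {
    // Returns null if the XML parses, otherwise the parser's error message.
    static String parseError(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.parse(new InputSource(new StringReader(xml)));
            return null;
        } catch (SAXParseException e) {
            // Fatal well-formedness errors end up here.
            return e.getMessage();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // A bare & inside CDATA is fine.
        String inCdata = "<item><![CDATA[2033\\Test%&!@Test_]]></item>";
        // A bare & inside an attribute value is not well-formed.
        String inAttribute = "<item id=\"2033\\Test%&!@Test_\"/>";
        // Escaping the & as &amp; makes the attribute value legal.
        String escaped = "<item id=\"2033\\Test%&amp;!@Test_\"/>";

        System.out.println("CDATA:     " + parseError(inCdata));
        System.out.println("attribute: " + parseError(inAttribute));
        System.out.println("escaped:   " + parseError(escaped));
    }
}
```

Only the second document should be rejected. The JDK's default error handler may also print the fatal error to stderr.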

The Solution
Actually, CDATA can't be used in an attribute value; we have to convert the XML special characters to their entity references:
http://xml.silmaril.ie/specials.html

As we post the data as application/x-www-form-urlencoded, we also have to URL-encode the converted string.

Now, change the post data:
The original data is 2033\Test&!@Test. Escape the XML special characters, then URL-encode the result; the final value is:
2033%5CTest%26amp%3B!%40Test
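The two steps can be sketched in Java as follows. This is a minimal illustration rather than the client application's actual code: the class and method names are hypothetical, the escaping covers only the five predefined XML entities, and the Charset overload of URLEncoder assumes Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class AttributeValueEncoder {
    // Step 1: replace XML special characters with their entity references.
    static String escapeXmlAttribute(String value) {
        StringBuilder sb = new StringBuilder(value.length() + 16);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    // Step 2: URL-encode the escaped string for a form-urlencoded body.
    static String toFormValue(String value) {
        return URLEncoder.encode(escapeXmlAttribute(value), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Prints: 2033%5CTest%26amp%3B%21%40Test
        System.out.println(toFormValue("2033\\Test&!@Test"));
    }
}
```

Java's URLEncoder also encodes ! as %21, so its output differs slightly from the hand-encoded value above; both decode to the same string, 2033\Test&amp;!@Test.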

stream.body=<![CDATA[2033\Test%25%26!@Test_]]><![CDATA[2033\Test%25%26!@Test_]]>

Send the new URL-encoded post data from Fiddler Composer.

Great, it works. 
Happy Debugging.