Using Fiddler and Eclipse to Troubleshoot: The entity name must immediately follow the '&'


The Problem
A client application sent data to our custom Solr handler, and the request failed with the following exception:

The entity name must immediately follow the '&' in the entity reference.
org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 70; The entity name must immediately follow the '&' in the entity reference.
at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:256)
at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:345)
at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:121)
at XXX.ExtendUpdateRequestHandler.handleRequestBody(ExtendUpdateRequestHandler.java:200)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1821)
Use Fiddler to Capture the Request and Replay It on a Development Machine
The exception seems to be caused by an invalid character in the XML. But since the client application puts everything in CDATA, why did this problem still happen?

To figure out the real cause and the fix, we sat together and reproduced the problem. We used Fiddler to capture the request that the client application sent. The request looks like this:
Content-Type: application/x-www-form-urlencoded
stream.body=<![CDATA[2033\Test%25%26!@Test_]]><![CDATA[2033\Test%25%26!@Test_]]>

Now I can reproduce the problem on my development machine:
click on the captured request, select Replay -> Reissue from Composer, then change the URL to point to my local development machine.
Using Eclipse Display View to Print the Complete Stack Trace
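Looking at the code of com.sun.org.apache.xerces.internal.parsers.DOMParser.parse, it seems to suppress the underlying exception: it throws a new SAXParseException that carries only the message and the location, so the original stack trace is lost: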
public void parse(InputSource inputSource)  throws SAXException, IOException
{
  try
  {
    parse(xmlInputSource); // xmlInputSource is built from inputSource (setup omitted in this excerpt)
  }
  catch (XMLParseException e)
  {
    Exception ex = e.getException();
    if (ex == null)
    {
      LocatorImpl locatorImpl = new LocatorImpl();
      locatorImpl.setPublicId(e.getPublicId());
      locatorImpl.setSystemId(e.getExpandedSystemId());
      locatorImpl.setLineNumber(e.getLineNumber());
      locatorImpl.setColumnNumber(e.getColumnNumber());
      throw new SAXParseException(e.getMessage(), locatorImpl); // throws exception from here
    }
  }
}
I would like to view the complete stack trace.


So I attached a remote debugger and set a breakpoint on the line that throws the exception. When I replayed the request from the Composer, execution stopped at the breakpoint.

In the Eclipse Display view, type the following Java code, select all the lines, and execute them:
java.io.Writer result = new java.io.StringWriter();
java.io.PrintWriter printWriter = new java.io.PrintWriter(result);
e.printStackTrace(printWriter);
result.toString();
It prints the exception stack trace in the Display view:
(java.lang.String) ::::1:616:615:The entity name must immediately follow the '&' in the entity reference.
at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:417)
at com.sun.org.apache.xerces.internal.impl.XMLScanner.scanAttributeValue(XMLScanner.java:837)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanAttribute(XMLDocumentFragmentScannerImpl.java:1551)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanStartElement(XMLDocumentFragmentScannerImpl.java:1324)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2768)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:606)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:511)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:846)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:775)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:123)
at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:242)
at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:345)
at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:121)
at XXX.ExtendUpdateRequestHandler.handleRequestBody(ExtendUpdateRequestHandler.java:200)

at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:762)

The root cause is much clearer now: the exception is caused by an & in an attribute value (the failing frame is XMLScanner.scanAttributeValue). Checking the post data again, we had put the element values in CDATA, but not the attribute values.
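
The difference between element content and attribute values can be reproduced outside Solr. The following is a minimal sketch, not the handler's actual code: the class name, the parses helper, and the doc/field/name markup are all made up for illustration. It feeds three small documents to the same JAXP DocumentBuilder that appears in the stack trace.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class AmpersandInAttributeDemo {

    // Hypothetical helper: returns true if the XML is well-formed,
    // false if the parser rejects it with a SAXParseException.
    static boolean parses(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXParseException e) {
            // The exception carries the position of the offending character.
            System.out.println("line " + e.getLineNumber()
                    + ", column " + e.getColumnNumber() + ": " + e.getMessage());
            return false;
        } catch (ParserConfigurationException | IOException | SAXException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // A raw & inside CDATA in element content is accepted.
        System.out.println(parses("<doc><field><![CDATA[Test&!@Test]]></field></doc>"));
        // A raw & inside an attribute value is rejected.
        System.out.println(parses("<doc name=\"Test&!@Test\"/>"));
        // The same attribute with & written as &amp; is accepted.
        System.out.println(parses("<doc name=\"Test&amp;!@Test\"/>"));
    }
}
```

Only the second document should fail, which matches what the full stack trace shows: the CDATA sections protect element content, but nothing protects the attribute.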

The Solution
Actually, CDATA can't be used in an attribute value, so we have to convert the XML special characters to their entity references:
http://xml.silmaril.ie/specials.html

As we are posting the data as application/x-www-form-urlencoded, we also have to URL-encode the converted string.

Now, change the post data:
The original data is 2033\Test&!@Test. Escape the XML special characters, then URL-encode the result. The final value is:
2033%5CTest%26amp%3B!%40Test
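
The two steps can be sketched in Java as follows. This is an illustration under assumptions rather than the client's real code: the class and method names are made up, and the Charset overload of URLEncoder.encode requires Java 10 or later. Note that URLEncoder also encodes ! as %21, so its output differs slightly from the hand-encoded value above; both decode to the same string.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class AttributeValueEncoder {

    // Step 1: replace the five XML special characters with entity references.
    // The & must be replaced first, otherwise the & in the other
    // entity references would be escaped a second time.
    static String escapeXmlAttribute(String raw) {
        return raw.replace("&", "&amp;")
                  .replace("<", "&lt;")
                  .replace(">", "&gt;")
                  .replace("\"", "&quot;")
                  .replace("'", "&apos;");
    }

    // Step 2: URL-encode the escaped value for a form-urlencoded body.
    static String encodeForPost(String raw) {
        return URLEncoder.encode(escapeXmlAttribute(raw), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // prints 2033%5CTest%26amp%3B%21%40Test
        System.out.println(encodeForPost("2033\\Test&!@Test"));
    }
}
```

The order matters: XML escaping comes first because the server URL-decodes the body before handing it to the XML parser.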

stream.body=<![CDATA[2033\Test%25%26!@Test_]]><![CDATA[2033\Test%25%26!@Test_]]>

Send the new URL-encoded post data from the Fiddler Composer.

Great, it works. 
Happy Debugging.
