Solr: Escape Special Characters When Importing Data

We import XML (or CSV) data into Solr via a curl GET request. To make this work, we need to escape two kinds of special characters: XML special characters and URL special characters.

First, we need to escape the XML special characters & < > " ' to &amp; &lt; &gt; &quot; &apos;. In code, we can use org.apache.commons.lang.StringEscapeUtils.escapeXml(String).
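As a rough illustration of what this escaping step does, the sketch below hand-rolls the five replacements in plain Java so that it runs without Commons Lang on the classpath. The class and method names (XmlEscapeSketch, escapeXml) are made up for this example; it is only a stand-in, and in real code StringEscapeUtils.escapeXml is the better choice.

```java
class XmlEscapeSketch {

    // Hypothetical helper: replaces the five XML special characters with
    // their predefined entities. A stand-in for StringEscapeUtils.escapeXml.
    static String escapeXml(String input) {
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints: a &amp; b &lt; c &gt; d &quot; e &apos; f
        System.out.println(escapeXml("a & b < c > d \" e ' f"));
    }
}
```

The ampersand is handled in the same pass as the other characters, so an already-produced entity such as &lt; is never escaped a second time.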

Then we use java.net.URLEncoder.encode(String, String) to escape URL special characters, especially $ & + , / : ; = ? @.
URLEncoder.encode also converts a line break (\r\n) to %0D%0A.
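To see the URL-encoding step in isolation, here is a small sketch that uses only the JDK. The wrapper class and method (UrlEncodeSketch, encode) are hypothetical names for this example; the only library call is URLEncoder.encode(String, String) with "UTF-8".

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class UrlEncodeSketch {

    // Thin wrapper so the charset is fixed in one place. UTF-8 is always
    // available on the JVM, so the checked exception is rethrown unchecked.
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Reserved URL characters become %XX escapes; a space becomes '+'.
        // Prints: %24+%26+%2B+%2C+%2F+%3A+%3B+%3D+%3F+%40
        System.out.println(encode("$ & + , / : ; = ? @"));
        // A Windows-style line break becomes %0D%0A.
        // Prints: line1%0D%0Aline2
        System.out.println(encode("line1\r\nline2"));
    }
}
```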

For example, if the field content includes the following two lines of data:
xml special: & < > " '
url special: $ & + , / : ; = ? @

The curl GET request to import the data would look like this:
http://localhost:8080/solr/update?stream.body=<add><doc><field name="id">id1</field><field name="content">xml+special%3A+%26amp%3B+%26lt%3B+%26gt%3B+%26quot%3B+%26apos%3B%0D%0Aurl+special%3A+%24+%26amp%3B+%2B+%2C+%2F+%3A+%3B+%3D+%3F+%40</field></doc></add>&commit=true
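
Code to convert the XML field data:
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import org.apache.commons.lang.StringEscapeUtils;
// Escape XML special characters first, then URL-encode the result.
private String escapeXmlEncodeUrl(String str)
  throws UnsupportedEncodingException {
 return URLEncoder.encode(StringEscapeUtils.escapeXml(str), "UTF-8");
}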

From org.apache.solr.client.solrj.util.ClientUtils.escapeQueryChars, we can see that the following special characters in a query string need to be escaped by prefixing them with \: \, +, -, !, (, ), :, ^, [, ], ", {, }, ~, *, ?, |, &, ;, / and any whitespace character.
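
The rule can be sketched in a few lines of plain Java. This is not the SolrJ source, just a minimal approximation of the same idea so that it runs without SolrJ on the classpath. The class name QueryEscapeSketch is hypothetical, and the character set used is the backslash, the double quote, and the rest of the list above. In real code, calling ClientUtils.escapeQueryChars directly is preferable.

```java
class QueryEscapeSketch {

    // Characters that get a backslash prefix, per the list above
    // (includes the backslash itself and the double quote).
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&;/";

    // Hypothetical helper approximating ClientUtils.escapeQueryChars.
    static String escapeQueryChars(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (SPECIAL.indexOf(c) >= 0 || Character.isWhitespace(c)) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints: title\:\(solr\ \+lucene\)
        System.out.println(escapeQueryChars("title:(solr +lucene)"));
    }
}
```

Note that this escaping applies to user-supplied terms inside a query. Escaping a whole query string this way would also neutralize operators such as the field separator, as the printed output shows.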

