We are importing XML (or CSV) data into Solr with a curl GET request. To make this work, we need to escape two kinds of special characters: XML special characters and URL special characters.
First, escape the XML special characters & < > " ' as &amp; &lt; &gt; &quot; &apos;. In code, we can use org.apache.commons.lang.StringEscapeUtils.escapeXml(String).
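If commons-lang is not on the classpath, the same five replacements can be sketched with the standard library alone. The class and method names below (XmlEscapeSketch, escapeXml) are hypothetical, and this sketch only handles the five characters listed above; every other character is passed through unchanged.

```java
public class XmlEscapeSketch {

    // Replace the five XML special characters with their predefined entities.
    // Everything is done in one pass, so the '&' inside a freshly written
    // entity is never escaped a second time.
    static String escapeXml(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints: xml special: &amp; &lt; &gt; &quot; &apos;
        System.out.println(escapeXml("xml special: & < > \" '"));
    }
}
```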
Then use java.net.URLEncoder.encode(String, String) to escape URL special characters, especially $ & + , / : ; = ? @.
URLEncoder.encode follows the application/x-www-form-urlencoded rules, so it also turns spaces into + and a line break (\r\n) into %0D%0A.
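A small demonstration of what URLEncoder.encode does to the characters listed above and to a CRLF line break. The class name UrlEncodeDemo is hypothetical; the helper catches UnsupportedEncodingException only because encode(String, String) declares it, and every JVM is required to support UTF-8.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class UrlEncodeDemo {

    // Form-encode a value as UTF-8 (application/x-www-form-urlencoded rules).
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available, so this should never happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Reserved URL characters are percent-encoded; spaces become '+'.
        // Prints: %24+%26+%2B+%2C+%2F+%3A+%3B+%3D+%3F+%40
        System.out.println(encode("$ & + , / : ; = ? @"));

        // A CRLF line break becomes %0D%0A.
        // Prints: line1%0D%0Aline2
        System.out.println(encode("line1\r\nline2"));
    }
}
```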
For example, suppose a field contains the following two lines:
xml special: & < > " '
url special: $ & + , / : ; = ? @
The curl GET request to import the data would then look like this:
http://localhost:8080/solr/update?stream.body=<add><doc><field name="id">id1</field><field name="content">xml+special%3A+%26amp%3B+%26lt%3B+%26gt%3B+%26quot%3B+%26apos%3B%0D%0Aurl+special%3A+%24+%26amp%3B+%2B+%2C+%2F+%3A+%3B+%3D+%3F+%40</field></doc></add>&commit=true
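A URL of the same shape as the example above can be assembled with a short, stdlib-only sketch. The class name SolrGetUrlSketch, its helper methods, and the hard-coded base URL are assumptions for illustration. Like the example, it escapes and encodes only the field values and leaves the surrounding <add><doc> markup as written.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class SolrGetUrlSketch {

    // Assumed Solr update endpoint, taken from the example URL.
    static final String UPDATE_URL = "http://localhost:8080/solr/update";

    // Step 1: replace the five XML special characters with entities.
    // '&' must be replaced first so the other entities are not re-escaped.
    static String escapeXml(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&apos;");
    }

    // Step 2: URL-encode the XML-escaped value as UTF-8.
    static String escapeXmlEncodeUrl(String s) {
        try {
            return URLEncoder.encode(escapeXml(s), "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Build the GET URL: field values are escaped and encoded,
    // the <add><doc> markup is left as written in the example.
    static String buildUpdateUrl(String id, String content) {
        return UPDATE_URL + "?stream.body=<add><doc>"
                + "<field name=\"id\">" + escapeXmlEncodeUrl(id) + "</field>"
                + "<field name=\"content\">" + escapeXmlEncodeUrl(content) + "</field>"
                + "</doc></add>&commit=true";
    }

    public static void main(String[] args) {
        String content = "xml special: & < > \" '\r\n"
                + "url special: $ & + , / : ; = ? @";
        System.out.println(buildUpdateUrl("id1", content));
    }
}
```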
Code to convert the XML field data
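```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

import org.apache.commons.lang.StringEscapeUtils;

// Escape XML special characters first, then URL-encode the result as UTF-8.
private String escapeXmlEncodeUrl(String str) throws UnsupportedEncodingException {
    String result = URLEncoder.encode(StringEscapeUtils.escapeXml(str), "UTF-8");
    return result;
}
```
From org.apache.solr.client.solrj.util.ClientUtils.escapeQueryChars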
From this method we can see that the following special characters must be escaped (prefixed with \) in a query string: \, +, -, !, (, ), :, ^, [, ], ", {, }, ~, *, ?, |, &, ;, /, and whitespace.
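For code that does not depend on SolrJ, the rule can be sketched as a stand-alone helper. QueryEscapeSketch is a hypothetical class written from the character list above, not the SolrJ source; in a SolrJ project, calling ClientUtils.escapeQueryChars directly is simpler.

```java
public class QueryEscapeSketch {

    // The query-syntax characters listed above: \ + - ! ( ) : ^ [ ] " { } ~ * ? | & ; /
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&;/";

    // Prefix every special character, and every whitespace character, with a backslash.
    static String escapeQueryChars(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (SPECIAL.indexOf(c) >= 0 || Character.isWhitespace(c)) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints: title\:C\+\+\ \(2nd\ ed\)
        System.out.println(escapeQueryChars("title:C++ (2nd ed)"));
    }
}
```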