Learning Solr: How to import CSV String into Solr


Examples on the web usually show how to import CSV files into Solr; here I want to show how to import a CSV string into Solr.
Why would we want to import a CSV string?
1. Compared to importing CSV files: importing a CSV string should be much faster, because it avoids unnecessary I/O. There is no need to write the CSV file on the client side and read it back on the server side.
2. Compared to importing an XML string: importing a CSV string should also be faster, because writing a CSV string on the client and reading/parsing it on the server takes less work than doing the same with XML.
3. A CSV string is usually much smaller than the equivalent XML, so less time is spent on network transfer and a little bandwidth is saved.
How to import a CSV string into Solr?
1. Use stream.body to specify the CSV data. One thing to pay attention to: lines have to be separated with %0D%0A (the URL-encoded CR LF sequence); using \r\r does not work.
2. In the stream body, you have to escape special characters: change " to \" and \ to \\.

The following request imports two rows into Solr.
curl -d "stream.body=2,0,1,0,1,\"c:\\\",1,0,\"c:\",0,1,16 %0D%0A 2,0,1,0,1,\"x:\\\",2,0,\"x:\",0,1,16 &separator=,&fieldnames=omitted&literal.id=9000&stream.contentType=text/csv;charset=utf-8&commit=true" http://localhost:8080/solr/update/csv
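
When the request is built from a Java client instead of the shell, the same form body can be produced by joining the rows with a real CR LF and URL-encoding the result, which yields the %0D%0A separator automatically. The following is only a minimal sketch under some assumptions: the class and method names (CsvStreamBody, encodeRows, buildFormBody) and the sample field names are hypothetical, the parameter names are the ones used in the curl request above, and the Charset overload of URLEncoder.encode requires Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Solr or SolrJ.
final class CsvStreamBody {

  // Joins the CSV rows with CR LF and URL-encodes the result,
  // so each row separator is sent as %0D%0A.
  static String encodeRows(List<String> rows) {
    String csv = String.join("\r\n", rows);
    return URLEncoder.encode(csv, StandardCharsets.UTF_8);
  }

  // Builds the form-encoded POST body with the parameters from the curl request.
  static String buildFormBody(List<String> rows, String fieldNames) {
    return "stream.body=" + encodeRows(rows)
        + "&separator=" + URLEncoder.encode(",", StandardCharsets.UTF_8)
        + "&fieldnames=" + URLEncoder.encode(fieldNames, StandardCharsets.UTF_8)
        + "&stream.contentType="
        + URLEncoder.encode("text/csv;charset=utf-8", StandardCharsets.UTF_8)
        + "&commit=true";
  }

  public static void main(String[] args) {
    // Sample rows and field names are made up for illustration.
    List<String> rows = Arrays.asList("1,\"c:\\\"", "2,\"x:\\\"");
    System.out.println(buildFormBody(rows, "id,path"));
  }
}
```

The resulting string would be posted to the /update/csv handler with the content type application/x-www-form-urlencoded, which is what curl -d sends. Because the encoder takes care of quotes and backslashes, the manual \" and \\ escaping from the shell example is not needed here.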

To simplify the client code, we can create a new requestHandler that sets fieldnames, so the client does not need to specify fieldnames in each request.
Solr Code

org.apache.solr.handler.loader.CSVLoaderBase.load(SolrQueryRequest, SolrQueryResponse, ContentStream, UpdateRequestProcessor)
org.apache.solr.internal.csv.CSVParser
getLine()
nextToken(Token)
  private boolean isEndOfLine(int c) throws IOException {
    // check if we have \r\n...
    if (c == '\r') {
      if (in.lookAhead() == '\n') {
        // note: does not change c outside of this method !!
        c = in.read();
      }
    }
    return (c == '\n');
  }
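
To see what this check means for the line separator, the rule can be tried outside Solr. The sketch below is a standalone re-implementation of the same rule, not Solr's class: it assumes a java.io.PushbackReader in place of the parser's own look-ahead reader, and LineEndDemo and splitLines are hypothetical names. It shows that "\n" and "\r\n" end a line, while a lone "\r" does not.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; mirrors the rule in isEndOfLine above.
final class LineEndDemo {

  // Returns true for "\n" and for the pair "\r\n"; a lone "\r" is not a line end.
  static boolean isEndOfLine(int c, PushbackReader in) throws IOException {
    if (c == '\r') {
      int next = in.read();
      if (next == '\n') {
        c = next;          // consume the '\n' of a "\r\n" pair
      } else if (next != -1) {
        in.unread(next);   // not a pair: put the character back
      }
    }
    return c == '\n';
  }

  // Splits text into lines using the rule above.
  static List<String> splitLines(String text) {
    List<String> lines = new ArrayList<>();
    StringBuilder current = new StringBuilder();
    try (PushbackReader in = new PushbackReader(new StringReader(text))) {
      int c;
      while ((c = in.read()) != -1) {
        if (isEndOfLine(c, in)) {
          lines.add(current.toString());
          current.setLength(0);
        } else {
          current.append((char) c);
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    lines.add(current.toString());
    return lines;
  }

  public static void main(String[] args) {
    System.out.println(splitLines("a\r\nb").size()); // two lines
    System.out.println(splitLines("a\r\rb").size()); // one line: "\r" alone is kept as data
  }
}
```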
