Most examples on the web show how to import CSV files into Solr. Here I want to show how to import a CSV string into Solr.
Why import a CSV string?
1. Compared to importing CSV files, importing a CSV string is much faster because it avoids unnecessary IO: there is no need to write the CSV file on the client side and read it back on the server side.
2. Compared to importing an XML string, importing a CSV string should also be faster: CSV is cheaper to write on the client and cheaper to read and parse on the server.
3. A CSV string is usually much smaller than the equivalent XML, so less time is spent on network transfer and a little bandwidth is saved.
How to import a CSV string into Solr?
1. Use stream.body to pass the CSV data. One thing to watch: lines must be separated with the URL-encoded CR LF sequence %0D%0A; a literal \r\n in the request would not work.
2. In the stream body, you have to escape special characters: change " to \" and \ to \\.
The following request imports two lines into Solr:
curl -d "stream.body=2,0,1,0,1,\"c:\\\",1,0,\"c:\",0,1,16 %0D%0A 2,0,1,0,1,\"x:\\\",2,0,\"x:\",0,1,16 &separator=,&fieldnames=omiited&literal.id=9000&stream.contentType=text/csv;charset=utf-8&commit=true" http://localhost:8080/solr/update/csv
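The same form body can also be assembled in client code instead of by hand. The following is only a sketch, assuming Java 10 or later; the class CsvStreamBody and its methods are hypothetical helpers, not part of Solr or SolrJ, and the field names passed in main are placeholders. It joins the CSV lines with CR LF and URL-encodes each value, so the line separator ends up as %0D%0A in the request.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical helper (not part of Solr): builds the form body for a POST to /update/csv.
class CsvStreamBody {

    // Joins CSV lines with CR LF, the separator that must reach Solr between lines.
    static String joinLines(List<String> csvLines) {
        return String.join("\r\n", csvLines);
    }

    // URL-encodes one form value; CR LF is emitted as %0D%0A.
    static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    // Assembles the same parameters that the curl request uses.
    static String buildForm(List<String> csvLines, String fieldNames, String id) {
        return "stream.body=" + encode(joinLines(csvLines))
                + "&separator=" + encode(",")
                + "&fieldnames=" + encode(fieldNames)
                + "&literal.id=" + encode(id)
                + "&stream.contentType=" + encode("text/csv;charset=utf-8")
                + "&commit=true";
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "2,0,1,0,1,\"c:\\\",1,0,\"c:\",0,1,16",
                "2,0,1,0,1,\"x:\\\",2,0,\"x:\",0,1,16");
        // "field_a,field_b" is a placeholder; use the real field list of your schema.
        System.out.println(buildForm(lines, "field_a,field_b", "9000"));
    }
}
```

The resulting string can be sent as an application/x-www-form-urlencoded POST body with any HTTP client. Encoding every value, rather than only the line breaks, also takes care of commas, quotes and backslashes inside the data.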
To simplify the client code, we can create a new requestHandler that sets fieldnames, so the client does not need to specify fieldnames in each request.
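A minimal sketch of such a handler in solrconfig.xml, assuming a Solr version in which solr.UpdateRequestHandler is available. The handler name /update/csvstring and the field names are hypothetical placeholders, not taken from the original setup:

```xml
<!-- Hypothetical handler: the name and the field list are placeholders -->
<requestHandler name="/update/csvstring" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="stream.contentType">text/csv;charset=utf-8</str>
    <str name="separator">,</str>
    <str name="fieldnames">field_a,field_b,field_c</str>
  </lst>
</requestHandler>
```

With defaults like these, a client would only need to send stream.body, plus per-request values such as literal.id and commit.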
Solr Code
org.apache.solr.handler.loader.CSVLoaderBase.load(SolrQueryRequest, SolrQueryResponse, ContentStream, UpdateRequestProcessor)
org.apache.solr.internal.csv.CSVParser
    getLine()
    nextToken(Token)

private boolean isEndOfLine(int c) throws IOException {
    // check if we have \r\n...
    if (c == '\r') {
        if (in.lookAhead() == '\n') {
            // note: does not change c outside of this method !!
            c = in.read();
        }
    }
    return (c == '\n');
}

References:
UpdateCSV - Solr Wiki