Solr: Use SAX Parser to Read XML Response to Reduce Memory Usage


My Solr client talks to a proxy application, which in turn talks to the remote Solr server to get data.
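In the previous post, Solr: Use JSON(GSon) Streaming to Reduce Memory Usage, I described the problem we faced, how to use JSON (Gson) streaming, and some other approaches to reduce memory usage. In this post I will use an XML SAX parser to iterate over the XML response stream. Because SAX pushes parsing events to a handler instead of building an in-memory tree of the whole response, each document can be handed off as soon as its closing tag is read. In the next post I will show how to use a StAX parser to parse the XML response.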
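To see the event-driven model on its own, here is a minimal, self-contained sketch (not the code used in this post). It assumes a response in the shape of Solr's XML format (`<result numFound=".." start="..">` containing `<doc>` elements with `<str name="contentid">` fields); the class name `SaxStreamDemo`, the sample response, and the `Consumer` callback are illustrative assumptions.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SaxStreamDemo {

  // Hypothetical sample shaped like a Solr XML response.
  static final String SAMPLE =
      "<response><result name=\"response\" numFound=\"5\" start=\"0\">"
      + "<doc><str name=\"contentid\">doc-1</str><str name=\"title\">A</str></doc>"
      + "<doc><str name=\"contentid\">doc-2</str><str name=\"title\">B</str></doc>"
      + "</result></response>";

  // Streams the response and passes each doc's contentid to the callback as soon
  // as </doc> is seen. Returns {numFound, start} read from the <result> element.
  static int[] stream(InputStream in, Consumer<String> onDoc) throws Exception {
    int[] header = {-1, -1};
    DefaultHandler handler = new DefaultHandler() {
      private final StringBuilder text = new StringBuilder();
      private boolean inContentId;
      private String contentId;

      @Override
      public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("result".equals(qName)) {
          header[0] = Integer.parseInt(atts.getValue("numFound"));
          header[1] = Integer.parseInt(atts.getValue("start"));
        } else if ("str".equals(qName)) {
          inContentId = "contentid".equals(atts.getValue("name"));
          text.setLength(0);
        }
      }

      @Override
      public void characters(char[] ch, int start, int length) {
        if (inContentId) {
          text.append(ch, start, length); // SAX may deliver one value in several chunks
        }
      }

      @Override
      public void endElement(String uri, String localName, String qName) {
        if ("str".equals(qName) && inContentId) {
          contentId = text.toString().trim();
          inContentId = false;
        } else if ("doc".equals(qName)) {
          onDoc.accept(contentId); // the doc is handed off; nothing else is retained
          contentId = null;
        }
      }
    };
    SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
    return header;
  }

  public static void main(String[] args) throws Exception {
    List<String> ids = new ArrayList<>();
    InputStream in = new ByteArrayInputStream(SAMPLE.getBytes(StandardCharsets.UTF_8));
    int[] header = stream(in, ids::add);
    // prints: numFound=5 start=0 ids=[doc-1, doc-2]
    System.out.println("numFound=" + header[0] + " start=" + header[1] + " ids=" + ids);
  }
}
```

With these numbers, `start + fetched = 0 + 2 < numFound = 5`, so a caller would know another page remains.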
Implementation
The following code uses SAX to read documents one by one from the HTTP stream:
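// Parse the Solr XML response with SAX, submit one import task per <doc>, and
// wait on the returned Futures so the method returns only after all docs are imported.
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.apache.solr.request.SolrQueryRequest;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Members of the importer class; ImportedResult and CVSyncDataImporter are
// classes from the same application.
  private static ImportedResult handleXMLResponseViaSax(
      SolrQueryRequest request, InputStream in, int fetchSize)
      throws IOException, ParserConfigurationException, SAXException {

    ImportedResult importedResult = new ImportedResult();
    SAXParserFactory parserFactory = SAXParserFactory.newInstance();
    SAXParser parser = parserFactory.newSAXParser();
    SolrResponseHandler handler = new SolrResponseHandler(request);
    // Events are pushed to the handler while the stream is read; no DOM is built.
    parser.parse(in, handler);

    // Block until every import task submitted by the handler has finished.
    for (Future<Void> future : handler.futures) {
      try {
        future.get();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IOException("Interrupted while waiting for imports", e);
      } catch (ExecutionException e) {
        throw new IOException("Import failed", e.getCause());
      }
    }

    importedResult.setFetched(handler.fetchedSize);
    importedResult
        .setHasMore((handler.fetchedSize + handler.start) < handler.numFound);
    importedResult.setImportedData((handler.fetchedSize != 0));
    return importedResult;
  }

  private static class SolrResponseHandler extends DefaultHandler {
    protected int fetchedSize = 0;
    protected int numFound = -1, start = -1;
    protected String contentid, bindoc = null;
    protected List<Future<Void>> futures = new ArrayList<Future<Void>>();

    // Name of the <str> field being read (null if it is not one we need), and a
    // buffer for its text, because SAX may call characters() several times per value.
    private String curName;
    private final StringBuilder curValue = new StringBuilder();
    private final SolrQueryRequest request;

    public SolrResponseHandler(SolrQueryRequest request) {
      this.request = request;
    }

    @Override
    public void startElement(String uri, String localName, String qName,
        Attributes attributes) throws SAXException {
      switch (qName) {
        case "result": {
          numFound = Integer.parseInt(attributes.getValue("numFound"));
          start = Integer.parseInt(attributes.getValue("start"));
          break;
        }
        case "doc": {
          // Clear values left over from the previous doc.
          contentid = null;
          bindoc = null;
          break;
        }
        case "str": {
          String name = attributes.getValue("name");
          curName = ("contentid".equals(name) || "bindoc".equals(name)) ? name : null;
          curValue.setLength(0);
          break;
        }
        default:
          break;
      }
    }

    @Override
    public void endElement(String uri, String localName, String qName)
        throws SAXException {
      switch (qName) {
        case "str": {
          if ("contentid".equals(curName)) {
            contentid = curValue.toString().trim();
          } else if ("bindoc".equals(curName)) {
            bindoc = curValue.toString().trim();
          }
          curName = null;
          break;
        }
        case "doc": {
          ++fetchedSize;
          futures.add(CVSyncDataImporter.getInstance().importData(request,
              contentid, bindoc));
          break;
        }
        default:
          break;
      }
    }

    @Override
    public void characters(char[] ch, int start, int length)
        throws SAXException {
      // Only buffer text inside a <str> field we care about; whitespace between
      // elements would otherwise overwrite the current value.
      if (curName != null) {
        curValue.append(ch, start, length);
      }
    }
  }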
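The "submit now, wait later" pattern used with `Future` above can be tried in isolation. The sketch below is a hedged stand-in, not `CVSyncDataImporter`: the class `WaitForImports`, its `importData` helper, the pool size of 4, and the counter are all hypothetical, and the task body only counts instead of indexing anything.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class WaitForImports {

  // Hypothetical stand-in for the per-doc import work; it only counts calls.
  static final AtomicInteger imported = new AtomicInteger();

  static Future<Void> importData(ExecutorService pool, String contentId) {
    return pool.submit(() -> {
      imported.incrementAndGet(); // real code would index contentId here
      return null;
    });
  }

  public static int importAll(List<String> contentIds)
      throws InterruptedException, ExecutionException {
    ExecutorService pool = Executors.newFixedThreadPool(4);
    try {
      List<Future<Void>> futures = new ArrayList<>();
      for (String id : contentIds) {
        futures.add(importData(pool, id)); // submitted while "parsing" continues
      }
      for (Future<Void> future : futures) {
        future.get(); // blocks until that task ends and rethrows its failure
      }
      return futures.size();
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    int n = importAll(Arrays.asList("doc-1", "doc-2", "doc-3"));
    // prints: 3 submitted, 3 imported
    System.out.println(n + " submitted, " + imported.get() + " imported");
  }
}
```

One design caveat: `Executors.newFixedThreadPool` uses an unbounded queue, so if imports are slower than parsing, every pending task (and the document data it captures) stays in memory. If the real importer works that way, a bounded queue or a semaphore may be needed to keep the memory savings that SAX provides.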
    
Resources
Parsing XML using DOM, SAX and StAX Parser in Java
Java SAX vs. StAX
