Solr: Use StAX Parser to Read XML Response to Reduce Memory Usage


My Solr client talks to a proxy application, which communicates with a remote Solr server to get data.
In a previous post, Solr: Use JSON(GSon) Streaming to Reduce Memory Usage, I described the problem we faced, how to use JSON (Gson) streaming, and some other approaches to reduce memory usage.

In the post Solr: Use SAX Parser to Read XML Response to Reduce Memory Usage, I also described how to use SAX to parse the response for better performance.

In this post, I will show how to use the StAX parser to parse the XML response.
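For readers new to StAX, here is a minimal sketch of the pull model, in which the application asks the reader for the next event instead of receiving callbacks as in SAX. The class name `StaxPullDemo` and the sample XML are made up for illustration; the sample only mimics the shape of a Solr XML response and is not real server output.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxPullDemo {

  // Hypothetical, hand-written sample shaped like a Solr XML response.
  static final String SAMPLE =
      "<response>"
      + "<result name=\"response\" numFound=\"3\" start=\"0\">"
      + "<doc><str name=\"contentid\">a1</str></doc>"
      + "<doc><str name=\"contentid\">b2</str></doc>"
      + "</result>"
      + "</response>";

  /** Pulls events from the stream and collects the contentid of each doc. */
  static List<String> readContentIds(InputStream in) throws XMLStreamException {
    List<String> ids = new ArrayList<>();
    XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
    try {
      while (reader.hasNext()) {
        // The application drives the parser: next() advances one event.
        if (reader.next() == XMLStreamConstants.START_ELEMENT
            && "str".equals(reader.getLocalName())
            && "contentid".equals(reader.getAttributeValue(null, "name"))) {
          // getElementText() reads the text and leaves the cursor on </str>.
          ids.add(reader.getElementText());
        }
      }
    } finally {
      reader.close();
    }
    return ids;
  }

  public static void main(String[] args) throws Exception {
    InputStream in = new ByteArrayInputStream(SAMPLE.getBytes(StandardCharsets.UTF_8));
    System.out.println(readContentIds(in)); // prints [a1, b2]
  }
}
```

Only the current event is held in memory, which is why this style suits large responses.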
Implementation
The code below uses StAX to read documents one by one from the HTTP stream.
It combines the StAX parser with Java Executors and Future, so that we can wait until all threads have finished, that is, until all docs are imported:
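  import java.io.InputStream;
  import java.util.ArrayList;
  import java.util.List;
  import java.util.concurrent.Future;
  import javax.xml.stream.XMLInputFactory;
  import javax.xml.stream.XMLStreamConstants;
  import javax.xml.stream.XMLStreamException;
  import javax.xml.stream.XMLStreamReader;
  import org.apache.solr.request.SolrQueryRequest;

  private static ImportedResult handleXMLResponseViaStax(
      SolrQueryRequest request, InputStream in, int fetchSize)
      throws XMLStreamException {
    XMLInputFactory factory = XMLInputFactory.newInstance();
    XMLStreamReader reader = null;
    
    ImportedResult importedResult = new ImportedResult();
    // One Future per document handed to the importer.
    List<Future<Void>> futures = new ArrayList<Future<Void>>();
    
    try {
      reader = factory.createXMLStreamReader(in);
      int fetchedSize = 0;
      int numFound = -1, start = -1;
      
      while (reader.hasNext()) {
        int event = reader.next();
        switch (event) {
          case XMLStreamConstants.START_ELEMENT: {
            if ("result".equals(reader.getLocalName())) {
              // In Solr's XML response both numFound and start are
              // attributes of <result>; there is no <start> element.
              numFound = Integer.parseInt(
                  reader.getAttributeValue(null, "numFound"));
              start = Integer.parseInt(
                  reader.getAttributeValue(null, "start"));
            } else if ("doc".equals(reader.getLocalName())) {
              ++fetchedSize;
              // readOneDoc consumes events up to the matching </doc>.
              futures.add(readOneDoc(request, reader));
            }
            break;
          }
          default:
            break;
        }
      }
      // Waiting on the collected futures (Future.get()) is not shown
      // in this listing.
      importedResult.setFetched(fetchedSize);
      importedResult.setHasMore((fetchedSize + start) < numFound);
      importedResult.setImportedData((fetchedSize != 0));
      return importedResult;
    } finally {
      if (reader != null) {
        reader.close();
      }
    }
  }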
  
  private static Future<Void> readOneDoc(SolrQueryRequest request,
      XMLStreamReader reader) throws XMLStreamException {
    String contentid = null, bindoc = null;
    // Consume events until the closing </doc> of the current document.
    OUTER: while (reader.hasNext()) {
      int event = reader.next();
      switch (event) {
        case XMLStreamConstants.START_ELEMENT: {
          if ("str".equals(reader.getLocalName())) {
            // Look the field name up by attribute name rather than by
            // position, so attribute order does not matter.
            String fieldName = reader.getAttributeValue(null, "name");
            if ("contentid".equals(fieldName)) {
              contentid = reader.getElementText();
            } else if ("bindoc".equals(fieldName)) {
              bindoc = reader.getElementText();
            }
          }
          break;
        }
        case XMLStreamConstants.END_ELEMENT: {
          if ("doc".equals(reader.getLocalName())) {
            break OUTER;
          }
          break;
        }
        default:
          break;
      }
    }
    // Hand the document to the importer, which returns a Future.
    return CVSyncDataImporter.getInstance().importData(request, contentid,
        bindoc);
  }
    
