Solr: Use STAX Parser to Read XML Response to Reduce Memory Usage

My Solr client talks to a proxy application, which in turn communicates with a remote Solr server to fetch data.
In a previous post, Solr: Use JSON(GSon) Streaming to Reduce Memory Usage, I described the problem we faced, how to use JSON (Gson) streaming, and some other approaches to reduce memory usage.

In the post Solr: Use SAX Parser to Read XML Response to Reduce Memory Usage,
I also described how to use SAX to parse the response for better performance.

In this post, I will show how to use the StAX parser to parse the XML response.
Implementation
The code below uses StAX to read documents one by one from the HTTP stream.
It combines the StAX parser with Java Executors Futures to wait until all threads have finished, that is, until all docs are imported.
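import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

private static ImportedResult handleXMLResponseViaStax(
      SolrQueryRequest request, InputStream in, int fetchSize)
      throws XMLStreamException, InterruptedException, ExecutionException {
    XMLInputFactory factory = XMLInputFactory.newInstance();
    XMLStreamReader reader = null;
    
    ImportedResult importedResult = new ImportedResult();
    // One Future per submitted document import.
    List<Future<Void>> futures = new ArrayList<Future<Void>>();
    
    try {
      reader = factory.createXMLStreamReader(in);
      int fetchedSize = 0;
      int numFound = -1, start = -1;
      
      while (reader.hasNext()) {
        int event = reader.next();
        switch (event) {
          case XMLStreamConstants.START_ELEMENT: {
            if ("result".equals(reader.getLocalName())) {
              // numFound and start are both attributes of <result>;
              // a null namespace means the namespace is not checked.
              numFound = Integer.parseInt(
                  reader.getAttributeValue(null, "numFound"));
              start = Integer.parseInt(
                  reader.getAttributeValue(null, "start"));
            } else if ("doc".equals(reader.getLocalName())) {
              ++fetchedSize;
              // readOneDoc consumes events up to the matching </doc>.
              futures.add(readOneDoc(request, reader));
            }
            break;
          }
          default:
            break;
        }
      }
      // Wait until every import task has finished.
      for (Future<Void> future : futures) {
        future.get();
      }
      importedResult.setFetched(fetchedSize);
      importedResult.setHasMore((fetchedSize + start) < numFound);
      importedResult.setImportedData((fetchedSize != 0));
      return importedResult;
    } finally {
      if (reader != null) {
        reader.close();
      }
    }
  }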
  
  // Reads the fields of one <doc> element, then hands them to the importer.
  private static Future<Void> readOneDoc(SolrQueryRequest request,
      XMLStreamReader reader) throws XMLStreamException {
    String contentid = null, bindoc = null;
    OUTER: while (reader.hasNext()) {
      int event = reader.next();
      switch (event) {
        case XMLStreamConstants.START_ELEMENT: {
          if ("str".equals(reader.getLocalName())) {
            // Look the field name up by attribute name rather than by index,
            // so the code does not depend on attribute order.
            String fieldName = reader.getAttributeValue(null, "name");
            if ("contentid".equals(fieldName)) {
              contentid = reader.getElementText();
            } else if ("bindoc".equals(fieldName)) {
              bindoc = reader.getElementText();
            }
          }
          break;
        }
        case XMLStreamConstants.END_ELEMENT: {
          // The closing </doc> marks the end of this document.
          if ("doc".equals(reader.getLocalName())) {
            break OUTER;
          }
          break;
        }
        default:
          break;
      }
    }
    return CVSyncDataImporter.getInstance().importData(request, contentid,
        bindoc);
  }
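
The methods above depend on application classes (ImportedResult, CVSyncDataImporter) that are not shown in this post. To try the same idea in isolation, here is a minimal, self-contained sketch. It is not the real importer: the StaxSolrDemo class, its Result holder, the two-thread pool, and the sample XML are all made up for illustration, and the "import" step only records each contentid. It streams the doc elements with StAX, submits one task per document, and waits on the returned Futures before reporting the result.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxSolrDemo {

  /** Hypothetical result holder; stands in for the post's ImportedResult. */
  static final class Result {
    int fetched;
    int start = -1;
    int numFound = -1;
    boolean hasMore;
    // Synchronized because worker threads add to it concurrently.
    final List<String> importedIds =
        Collections.synchronizedList(new ArrayList<String>());
  }

  static Result importAll(InputStream in) throws Exception {
    Result result = new Result();
    ExecutorService pool = Executors.newFixedThreadPool(2); // assumed pool size
    List<Future<?>> futures = new ArrayList<>();
    XMLStreamReader reader =
        XMLInputFactory.newInstance().createXMLStreamReader(in);
    try {
      while (reader.hasNext()) {
        if (reader.next() != XMLStreamConstants.START_ELEMENT) {
          continue;
        }
        String name = reader.getLocalName();
        if ("result".equals(name)) {
          // Both values are attributes of the <result> element.
          result.numFound =
              Integer.parseInt(reader.getAttributeValue(null, "numFound"));
          result.start =
              Integer.parseInt(reader.getAttributeValue(null, "start"));
        } else if ("doc".equals(name)) {
          result.fetched++;
          final String id = readContentId(reader);
          // Placeholder for the real import work: just record the id.
          futures.add(pool.submit(() -> { result.importedIds.add(id); }));
        }
      }
      // Block until every submitted task has completed.
      for (Future<?> f : futures) {
        f.get();
      }
    } finally {
      reader.close();
      pool.shutdown();
    }
    result.hasMore = result.fetched + result.start < result.numFound;
    return result;
  }

  /** Consumes events up to the closing </doc> and returns the contentid field. */
  static String readContentId(XMLStreamReader reader) throws XMLStreamException {
    String contentId = null;
    while (reader.hasNext()) {
      int event = reader.next();
      if (event == XMLStreamConstants.START_ELEMENT
          && "str".equals(reader.getLocalName())
          && "contentid".equals(reader.getAttributeValue(null, "name"))) {
        contentId = reader.getElementText();
      } else if (event == XMLStreamConstants.END_ELEMENT
          && "doc".equals(reader.getLocalName())) {
        break;
      }
    }
    return contentId;
  }

  public static void main(String[] args) throws Exception {
    String xml = "<response><result name=\"response\" numFound=\"5\" start=\"0\">"
        + "<doc><str name=\"contentid\">a1</str><str name=\"bindoc\">AAAA</str></doc>"
        + "<doc><str name=\"contentid\">b2</str><str name=\"bindoc\">BBBB</str></doc>"
        + "</result></response>";
    Result r = importAll(
        new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    System.out.println(r.fetched + " fetched, hasMore=" + r.hasMore);
    // prints "2 fetched, hasMore=true"
  }
}
```

Because the parser only holds the current event, memory use stays roughly constant no matter how many docs the response contains. The tasks run on pool threads, so the order of importedIds is not guaranteed.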
    
Resources
Parsing XML using DOM, SAX and StAX Parser in Java
Java SAX vs. StAX