Solr: Use SAX Parser to Read XML Response to Reduce Memory Usage

My Solr client talks with a proxy application which talks with remote Solr Server to get data. 
In previous post, Solr: Use JSON(GSon) Streaming to Reduce Memory Usage

I described the problem we faced, how to use JSON(GSon) Streaming, and also some other approaches to reduce memory usage. In this post I will use XML SAX Parser to iterative xml response stream. In next post I will introduce how to use Stax Parser to parse XML response.
Implementation
The code to use SAX to read document one by one from http stream:
-- Use SAX parser and Java Executors Future to wait all thread finished: all docs imported.
private static ImportedResult handleXMLResponseViaSax(
      SolrQueryRequest request, InputStream in, int fetchSize)
      throws IOException, ParserConfigurationException, SAXException {
 
    ImportedResult importedResult = new ImportedResult();
    SAXParserFactory parserFactor = SAXParserFactory.newInstance();
    SAXParser parser = parserFactor.newSAXParser();
    SolrResponseHandler handler = new SolrResponseHandler(request);
    parser.parse(in, handler);
    
    importedResult.setFetched(handler.fetchedSize);
    importedResult
        .setHasMore((handler.fetchedSize + handler.start) < handler.numFound);
    importedResult.setImportedData((handler.fetchedSize != 0));    
    return importedResult;
  }
  
  private static class SolrResponseHandler extends DefaultHandler {
    protected int fetchedSize = 0;
    protected int numFound = -1, start = -1;
    protected String contentid, bindoc = null;
    protected List<Future<Void>> futures = new ArrayList<Future<Void>>();
    
    String curName, curValue;
    private SolrQueryRequest request;
    
    public SolrResponseHandler(SolrQueryRequest request) {
      this.request = request;
    }
    
    @Override
    public void startElement(String uri, String localName, String qName,
        Attributes attributes) throws SAXException {
      
      switch (qName) {
        case "result": {
          numFound = Integer.valueOf(attributes.getValue("numFound"));
          start = Integer.valueOf(attributes.getValue("start"));
          break;
        }
        case "str": {
          String name = attributes.getValue("name");
          if ("contentid".equals(name)) {
            curName = "contentid";
          } else if ("bindoc".equals(name)) {
            curName = "bindoc";
          }
          break;
        }
        default:
          break;
      }
    }
    
    @Override
    public void endElement(String uri, String localName, String qName)
        throws SAXException {
      switch (qName) {
        case "str": {
          if ("contentid".equals(curName)) {
            contentid = curValue;
          } else if ("bindoc".equals(curName)) {
            bindoc = curValue;
          }
          break;
        }
        case "doc": {
          ++fetchedSize;
          futures.add(CVSyncDataImporter.getInstance().importData(request,
              contentid, bindoc));
          break;
        }
        default:
          break;
      }
    }
    @Override
    public void characters(char[] ch, int start, int length)
        throws SAXException {
      curValue = String.copyValueOf(ch, start, length).trim();
    }
  }
    
Resources
Parsing XML using DOM, SAX and StAX Parser in Java
Java SAX vs. StAX
Post a Comment

Labels

Java (159) Lucene-Solr (111) Interview (61) All (58) J2SE (53) Algorithm (45) Soft Skills (37) Eclipse (33) Code Example (31) Linux (24) JavaScript (23) Spring (22) Windows (22) Web Development (20) Nutch2 (18) Tools (18) Bugs (17) Debug (16) Defects (14) Text Mining (14) J2EE (13) Network (13) Troubleshooting (13) PowerShell (11) Chrome (9) Design (9) How to (9) Learning code (9) Performance (9) Problem Solving (9) UIMA (9) html (9) Http Client (8) Maven (8) Security (8) bat (8) blogger (8) Big Data (7) Continuous Integration (7) Google (7) Guava (7) JSON (7) ANT (6) Coding Skills (6) Database (6) Scala (6) Shell (6) css (6) Algorithm Series (5) Cache (5) Dynamic Languages (5) IDE (5) Lesson Learned (5) Programmer Skills (5) System Design (5) Tips (5) adsense (5) xml (5) AIX (4) Code Quality (4) GAE (4) Git (4) Good Programming Practices (4) Jackson (4) Memory Usage (4) Miscs (4) OpenNLP (4) Project Managment (4) Spark (4) Testing (4) ads (4) regular-expression (4) Android (3) Apache Spark (3) Become a Better You (3) Concurrency (3) Eclipse RCP (3) English (3) Happy Hacking (3) IBM (3) J2SE Knowledge Series (3) JAX-RS (3) Jetty (3) Restful Web Service (3) Script (3) regex (3) seo (3) .Net (2) Android Studio (2) Apache (2) Apache Procrun (2) Architecture (2) Batch (2) Bit Operation (2) Build (2) Building Scalable Web Sites (2) C# (2) C/C++ (2) CSV (2) Career (2) Cassandra (2) Distributed (2) Fiddler (2) Firefox (2) Google Drive (2) Gson (2) How to Interview (2) Html Parser (2) Http (2) Image Tools (2) JQuery (2) Jersey (2) LDAP (2) Life (2) Logging (2) Python (2) Software Issues (2) Storage (2) Text Search (2) xml parser (2) AOP (1) Application Design (1) AspectJ (1) Chrome DevTools (1) Cloud (1) Codility (1) Data Mining (1) Data Structure (1) ExceptionUtils (1) Exif (1) Feature Request (1) FindBugs (1) Greasemonkey (1) HTML5 (1) Httpd (1) I18N (1) IBM Java Thread Dump Analyzer (1) JDK Source Code (1) JDK8 (1) JMX (1) Lazy Developer (1) Mac (1) Machine Learning (1) Mobile (1) My Plan for 2010 (1) Netbeans (1) Notes (1) Operating System (1) Perl (1) Problems (1) Product Architecture (1) Programming Life (1) Quality (1) Redhat (1) Redis (1) Review (1) RxJava (1) Solutions logs (1) Team Management (1) Thread Dump Analyzer (1) Visualization (1) boilerpipe (1) htm (1) ongoing (1) procrun (1) rss (1)

Popular Posts