Paginating Lucene Search Results

Use IndexSearcher.searchAfter
/**
 * useSearcherAfter, need client record the returned last ScoreDoc
 * lastBottom, and pass it in next round.
 */
private void useSearcherAfter(DirectoryReader indexReader,
    IndexSearcher searcher, int pageSize) throws IOException {
  Query query = new TermQuery(new Term("title", "java"));
  // query = new MatchAllDocsQuery();
  ScoreDoc lastBottom = null;
  while (true) {
    TopDocs paged = null;
    paged = searcher.searchAfter(lastBottom, query, null, pageSize);
    if (paged.scoreDocs.length == 0) {
      // no more data, break;
      break;
    }
    ScoreDoc[] scoreDocs = paged.scoreDocs;
    for (ScoreDoc scoreDoc : scoreDocs) {
      Utils.printDoc(searcher.doc(scoreDoc.doc), "id", "title");
    }

    lastBottom = paged.scoreDocs[paged.scoreDocs.length - 1];
  }
}

Skip Previous Docs
Not good at performance and memory usage.
private void skipPreviousRows(DirectoryReader indexReader,
    IndexSearcher searcher, int pageStart, int pageSize)
    throws IOException {
  Query query = new TermQuery(new Term("title", "java"));
  int pageEnd = pageStart - 1 + pageSize;
  TopDocs hits = searcher.search(query, pageEnd);

  for (int i = pageStart - 1; i < pageEnd; i++) {
    int docId = hits.scoreDocs[i].doc;

    // load the document
    Document doc = searcher.doc(docId);
    Utils.printDocAndExplain(doc, searcher, query, docId, "id", "title");
  }
}

In Solr4.7, we can do deep paging with cursorMark
Solr Deep Pagination Problem Fixed in Solr-5463
Sorting, Paging, and Deep Paging in Solr
http://solr1:8080/solr/select?q=accesstime:[* TO NOW-5YEAR/DAY]&sort=accesstime desc, contentid asc&sort=accesstime desc,id asc&rows=1000&start=0&cursorMark=*

http://solr1:8080/solr/select?q=accesstime:[* TO NOW-5YEAR/DAY]&sort=accesstime desc, contentid asc&sort=accesstime desc,id asc&rows=1000&start=0&cursorMark=AoJ42tmu%2FZ4CKTQxMDMyMzEwMw%3D%3D
Post a Comment

Labels

Java (159) Lucene-Solr (112) Interview (61) All (58) J2SE (53) Algorithm (45) Soft Skills (38) Eclipse (33) Code Example (31) Linux (25) JavaScript (23) Spring (22) Windows (22) Web Development (20) Tools (19) Nutch2 (18) Bugs (17) Debug (16) Defects (14) Text Mining (14) J2EE (13) Network (13) Troubleshooting (13) PowerShell (11) Chrome (9) Design (9) How to (9) Learning code (9) Performance (9) Problem Solving (9) UIMA (9) html (9) Http Client (8) Maven (8) Security (8) bat (8) blogger (8) Big Data (7) Continuous Integration (7) Google (7) Guava (7) JSON (7) Shell (7) ANT (6) Coding Skills (6) Database (6) Lesson Learned (6) Programmer Skills (6) Scala (6) Tips (6) css (6) Algorithm Series (5) Cache (5) Dynamic Languages (5) IDE (5) System Design (5) adsense (5) xml (5) AIX (4) Code Quality (4) GAE (4) Git (4) Good Programming Practices (4) Jackson (4) Memory Usage (4) Miscs (4) OpenNLP (4) Project Managment (4) Spark (4) Testing (4) ads (4) regular-expression (4) Android (3) Apache Spark (3) Become a Better You (3) Concurrency (3) Eclipse RCP (3) English (3) Happy Hacking (3) IBM (3) J2SE Knowledge Series (3) JAX-RS (3) Jetty (3) Restful Web Service (3) Script (3) regex (3) seo (3) .Net (2) Android Studio (2) Apache (2) Apache Procrun (2) Architecture (2) Batch (2) Bit Operation (2) Build (2) Building Scalable Web Sites (2) C# (2) C/C++ (2) CSV (2) Career (2) Cassandra (2) Distributed (2) Fiddler (2) Firefox (2) Google Drive (2) Gson (2) How to Interview (2) Html Parser (2) Http (2) Image Tools (2) JQuery (2) Jersey (2) LDAP (2) Life (2) Logging (2) Python (2) Software Issues (2) Storage (2) Text Search (2) xml parser (2) AOP (1) Application Design (1) AspectJ (1) Chrome DevTools (1) Cloud (1) Codility (1) Data Mining (1) Data Structure (1) ExceptionUtils (1) Exif (1) Feature Request (1) FindBugs (1) Greasemonkey (1) HTML5 (1) Httpd (1) I18N (1) IBM Java Thread Dump Analyzer (1) JDK Source Code (1) JDK8 (1) JMX (1) Lazy Developer (1) Mac (1) Machine Learning (1) Mobile (1) My Plan for 2010 (1) Netbeans (1) Notes (1) Operating System (1) Perl (1) Problems (1) Product Architecture (1) Programming Life (1) Quality (1) Redhat (1) Redis (1) Review (1) RxJava (1) Solutions logs (1) Team Management (1) Thread Dump Analyzer (1) Visualization (1) boilerpipe (1) htm (1) ongoing (1) procrun (1) rss (1)

Popular Posts