Lucene Built-in Collectors
Check Lucene Javadoc for all Lucene built-in collectors.
Lucene's core collectors all derive from Collector. Your application can likely use one of these classes, or subclass TopDocsCollector, instead of implementing Collector directly:
TopDocsCollector is an abstract base class that assumes you will retrieve the top N docs, according to some criteria, after collection is done.
TopScoreDocCollector is a concrete subclass of TopDocsCollector that sorts by score, then docID. It is used internally by the IndexSearcher search methods that do not take an explicit Sort, and is likely the most frequently used collector.
TopFieldCollector subclasses TopDocsCollector and sorts according to a specified Sort object (sort by field). It is used internally by the IndexSearcher search methods that take an explicit Sort.
TimeLimitingCollector wraps any other Collector and aborts the search if it has taken too much time.
PositiveScoresOnlyCollector wraps any other Collector and prevents collection of hits whose score is <= 0.0.
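To make the "top N by score, then docID" idea concrete, here is a minimal sketch in plain Java. It does not use Lucene at all: Hit and TopNSketch are hypothetical names invented for this illustration, and the bounded min-heap is only one reasonable way to keep the best N hits, not a copy of Lucene's implementation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical stand-in for a (docID, score) pair; not a Lucene class.
final class Hit {
    final int doc;
    final float score;

    Hit(int doc, float score) {
        this.doc = doc;
        this.score = score;
    }
}

// Hypothetical top-N collector sketch: keeps the N best hits seen so far.
final class TopNSketch {
    // Orders the weakest hit first: lowest score, and on equal scores
    // the higher docID counts as weaker.
    private static final Comparator<Hit> WEAKEST_FIRST = (a, b) -> {
        int byScore = Float.compare(a.score, b.score);
        return byScore != 0 ? byScore : Integer.compare(b.doc, a.doc);
    };

    private final int n;
    private final PriorityQueue<Hit> heap = new PriorityQueue<>(WEAKEST_FIRST);
    private int totalHits;

    TopNSketch(int n) {
        this.n = n;
    }

    // Called once per matching document.
    void collect(int doc, float score) {
        totalHits++;
        heap.add(new Hit(doc, score));
        if (heap.size() > n) {
            heap.poll(); // evict the weakest hit so at most n remain
        }
    }

    int totalHits() {
        return totalHits;
    }

    // Best hit first: highest score, ties broken by lower docID.
    List<Hit> topDocs() {
        List<Hit> result = new ArrayList<>(heap);
        result.sort(WEAKEST_FIRST.reversed());
        return result;
    }
}
```

The heap never holds more than N + 1 entries, so memory stays bounded no matter how many documents match, which is the main reason a top-N collector is cheaper than collecting every hit and sorting afterwards.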
Reading the source of Lucene's built-in collectors is a good way to learn how to build our own:
TotalHitCountCollector: Just count the number of hits.
public void collect(int doc) { totalHits++; }
PositiveScoresOnlyCollector:
if (scorer.score() > 0) { c.collect(doc); } // only include the doc if its score >0
TimeLimitingCollector: uses an external counter as its clock, compares it against the timeout in collect, and throws TimeExceededException once the allowed time has passed:
long time = clock.get(); if (timeout < time) { throw new TimeExceededException(timeout - t0, time - t0, docBase + doc); }
TestTimeLimitingCollector.MyHitCollector is also an example of a custom collector.
FilterCollector: A collector that filters incoming doc ids that are not in the filter. Used by Grouping.
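Several of these collectors are wrappers: they do a small check in collect and then delegate to another collector. The following plain-Java sketch mimics that pattern without depending on Lucene. HitSink, CountingSink, PositiveScoresOnlySketch, TimeLimitedSketch and TimeExceededSketch are all hypothetical names made up for this illustration, and an AtomicLong stands in for Lucene's Counter clock.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical minimal collector contract; not Lucene's Collector API.
interface HitSink {
    void collect(int doc, float score);
}

// Counts every hit it receives, in the spirit of TotalHitCountCollector.
final class CountingSink implements HitSink {
    int totalHits;

    @Override
    public void collect(int doc, float score) {
        totalHits++;
    }
}

// Forwards only hits with a strictly positive score.
final class PositiveScoresOnlySketch implements HitSink {
    private final HitSink delegate;

    PositiveScoresOnlySketch(HitSink delegate) {
        this.delegate = delegate;
    }

    @Override
    public void collect(int doc, float score) {
        if (score > 0) {
            delegate.collect(doc, score);
        }
    }
}

// Unchecked exception used to abort collection once time has run out.
final class TimeExceededSketch extends RuntimeException {
    TimeExceededSketch(int lastDoc) {
        super("time limit exceeded at doc " + lastDoc);
    }
}

// Aborts collection once the external clock passes the allowed ticks.
final class TimeLimitedSketch implements HitSink {
    private final HitSink delegate;
    private final AtomicLong clock;
    private final long timeout;

    TimeLimitedSketch(HitSink delegate, AtomicLong clock, long ticksAllowed) {
        this.delegate = delegate;
        this.clock = clock;
        // The baseline is the clock value at construction time.
        this.timeout = clock.get() + ticksAllowed;
    }

    @Override
    public void collect(int doc, float score) {
        long time = clock.get();
        if (timeout < time) {
            throw new TimeExceededSketch(doc);
        }
        delegate.collect(doc, score);
    }
}
```

Because every wrapper implements the same interface, they compose freely, for example new TimeLimitedSketch(new PositiveScoresOnlySketch(new CountingSink()), clock, 10). The clock is advanced by some other thread, so the check in collect is just a cheap read.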
Using TimeLimitingCollector to Stop a Slow Query
public void testTimeLimitingCollector() throws IOException {
    // SimulateSlowCollector is a copy of
    // org.apache.lucene.search.TestTimeLimitingCollector.MyHitCollector
    SimulateSlowCollector slowCollector = new SimulateSlowCollector();
    slowCollector.setSlowDown(1000 * 10);
    Counter clock = Counter.newCounter(true);
    int tick = 10;
    TimeLimitingCollector collector = new TimeLimitingCollector(
            slowCollector, clock, tick);
    collector.setBaseline(0);
    try (Directory directory = FSDirectory.open(new File(FILE_PATH));
            DirectoryReader indexReader = DirectoryReader.open(directory);) {
        IndexSearcher searcher = new IndexSearcher(indexReader);
        try {
            new Thread() {
                public void run() {
                    // will kill the indexSearcher.search(...) after 10
                    // ticks (10 seconds)
                    while (clock.get() <= tick) {
                        try {
                            Thread.sleep(1000);
                            clock.addAndGet(1);
                        } catch (InterruptedException e) {
                            e.printStackTrace();
                        }
                    }
                }
            }.start();
            searcher.search(new MatchAllDocsQuery(), collector);
            System.out.println(slowCollector.hitCount());
        } catch (TimeExceededException e) {
            // it throws exception here.
            System.out.println("Too much time taken.");
            e.printStackTrace();
        }
    }
}
Write a Custom Collector
import java.io.IOException;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

import org.apache.lucene.document.Document;
import org.apache.lucene.index.AtomicReaderContext;
import org.apache.lucene.index.IndexableField;
import org.apache.lucene.search.Collector;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Scorer;

public class FacetCountCollector extends Collector {
    private Map<String, Long> countMap = new HashMap<>();
    // scorer is stored only to satisfy the Collector contract; it is not used.
    private Scorer scorer;
    // docBase of the current segment, used to rebase segment-relative doc IDs.
    private int docBase;
    private IndexSearcher searcher = null;

    public FacetCountCollector(IndexSearcher searcher) {
        this.searcher = searcher;
    }

    @Override
    public void collect(int doc) {
        try {
            // collect() receives a doc ID relative to the current segment,
            // while IndexSearcher.doc() expects an index-wide doc ID.
            Document document = searcher.doc(docBase + doc);
            if (document != null) {
                IndexableField[] categoriesDoc = document
                        .getFields("categories");
                if (categoriesDoc != null && categoriesDoc.length > 0) {
                    for (int i = 0; i < categoriesDoc.length; i++) {
                        String category = categoriesDoc[i].stringValue();
                        Long count = countMap.get(category);
                        countMap.put(category, count == null ? 1L : count + 1);
                    }
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public Map<String, Long> getCountMap() {
        return Collections.unmodifiableMap(countMap);
    }

    @Override
    public void setScorer(Scorer scorer) throws IOException {
        this.scorer = scorer;
    }

    @Override
    public void setNextReader(AtomicReaderContext context) throws IOException {
        this.docBase = context.docBase; // Record the reader's absolute doc base
    }

    @Override
    public boolean acceptsDocsOutOfOrder() {
        // Return true if this collector does not require the matching docIDs to
        // be delivered in int sort order (smallest to largest) to collect.
        return true;
    }
}
Using Custom Collector
public void testFacetCountCollector() throws IOException {
    try (Directory directory = FSDirectory.open(new File(FILE_PATH));
            DirectoryReader indexReader = DirectoryReader.open(directory)) {
        IndexSearcher searcher = new IndexSearcher(indexReader);
        FacetCountCollector collector = new FacetCountCollector(searcher);
        searcher.search(new MatchAllDocsQuery(), collector);
        System.out.println(collector.getCountMap());
    }
}
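One detail worth spelling out for a per-segment collector like this: collect is called with a doc ID relative to the current segment, so any index-wide lookup needs docBase + doc. The plain-Java sketch below illustrates the counting and the rebasing without Lucene. FacetCountSketch and setNextSegment are hypothetical names, and a list indexed by index-wide doc ID stands in for the stored "categories" field.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, Lucene-free sketch of facet counting across segments.
final class FacetCountSketch {
    // Stand-in for stored fields: categories indexed by index-wide doc ID.
    private final List<List<String>> categoriesByGlobalDoc;
    private final Map<String, Long> counts = new HashMap<>();
    private int docBase;

    FacetCountSketch(List<List<String>> categoriesByGlobalDoc) {
        this.categoriesByGlobalDoc = categoriesByGlobalDoc;
    }

    // Plays the role of setNextReader: remember where the segment starts.
    void setNextSegment(int docBase) {
        this.docBase = docBase;
    }

    // segmentDoc is relative to the current segment, so rebase it first.
    void collect(int segmentDoc) {
        for (String category : categoriesByGlobalDoc.get(docBase + segmentDoc)) {
            // merge() inserts 1 for a new key and adds 1 to an existing count.
            counts.merge(category, 1L, Long::sum);
        }
    }

    Map<String, Long> counts() {
        return Collections.unmodifiableMap(counts);
    }
}
```

With three documents split into a two-document segment and a one-document segment, the second segment calls setNextSegment(2) and then collect(0), which resolves to index-wide document 2. Map.merge requires Java 8 or later.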