Lucene Built-in Collectors
See the Lucene Javadoc for the full list of built-in collectors.
Lucene's core collectors are derived from Collector. Your application can likely use one of these classes, or subclass TopDocsCollector, instead of implementing Collector directly:
TopDocsCollector is an abstract base class that assumes you will retrieve the top N docs, according to some criteria, after collection is done.
TopScoreDocCollector is a concrete subclass of TopDocsCollector that sorts by score, then docID. It is used internally by the IndexSearcher search methods that do not take an explicit Sort, and is likely the most frequently used collector.
TopFieldCollector subclasses TopDocsCollector and sorts according to a specified Sort object (sort by field). It is used internally by the IndexSearcher search methods that take an explicit Sort.
TimeLimitingCollector wraps any other Collector and aborts the search if it has taken too long.
PositiveScoresOnlyCollector wraps any other Collector and prevents collection of hits whose score is <= 0.0.
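To make the "top N by score, then docID" idea concrete, here is a minimal sketch that uses only the JDK. It is not Lucene's implementation: the `TopNSketch` and `Hit` names are hypothetical, and the score is passed straight to `collect` for brevity, whereas a real Collector obtains it from the Scorer.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class TopNSketch {
    /** Hypothetical stand-in for a scored hit; not Lucene's ScoreDoc. */
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    // Orders hits from worst to best: a lower score is worse, and on
    // equal scores the higher docID is worse.
    static final Comparator<Hit> WORST_FIRST = (a, b) -> {
        int byScore = Float.compare(a.score, b.score);
        return byScore != 0 ? byScore : Integer.compare(b.doc, a.doc);
    };

    private final int n;
    // Min-heap whose head is always the worst hit currently kept.
    private final PriorityQueue<Hit> queue = new PriorityQueue<>(WORST_FIRST);

    TopNSketch(int n) {
        if (n < 1) throw new IllegalArgumentException("n must be >= 1");
        this.n = n;
    }

    void collect(int doc, float score) {
        Hit hit = new Hit(doc, score);
        if (queue.size() < n) {
            queue.add(hit);
        } else if (WORST_FIRST.compare(hit, queue.peek()) > 0) {
            queue.poll(); // evict the current worst hit
            queue.add(hit);
        }
    }

    /** Returns the kept docIDs, best hit first. */
    List<Integer> topDocs() {
        List<Hit> hits = new ArrayList<>(queue);
        hits.sort(WORST_FIRST.reversed());
        List<Integer> docs = new ArrayList<>();
        for (Hit h : hits) docs.add(h.doc);
        return docs;
    }

    public static void main(String[] args) {
        TopNSketch collector = new TopNSketch(2);
        collector.collect(0, 1.0f);
        collector.collect(1, 3.0f);
        collector.collect(2, 3.0f);
        System.out.println(collector.topDocs());
    }
}
```

Keeping only N hits in a heap means memory stays bounded no matter how many documents match, which is the main reason to collect this way rather than sort all hits afterwards.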
Reading the source of Lucene's built-in collectors is a good way to learn how to build your own:
TotalHitCountCollector: just counts the number of hits.
public void collect(int doc) { totalHits++; }
PositiveScoresOnlyCollector:
if (scorer.score() > 0) { c.collect(doc); } // only include the doc if its score >0
TimeLimitingCollector: uses an external counter as its clock, compares it with the timeout on every collect call, and throws TimeExceededException once the allowed time has passed:
long time = clock.get();
if (timeout < time) {
    throw new TimeExceededException(timeout - t0, time - t0, docBase + doc);
}
TestTimeLimitingCollector.MyHitCollector is also a good example of a custom collector.
FilterCollector: a collector that filters out incoming doc IDs that are not in the filter. Used by Grouping.
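The counting and wrapping patterns above can be sketched without Lucene on the classpath. This is a simplified illustration, not the library's API: `HitCollector`, `CountingCollector` and `PositiveScoresOnly` are hypothetical names, and the score is passed directly to `collect` instead of being read from a Scorer.

```java
public class WrapperCollectorSketch {
    /** Hypothetical, simplified stand-in for Lucene's Collector. */
    interface HitCollector {
        void collect(int doc, float score);
    }

    /** Counts hits, in the spirit of TotalHitCountCollector. */
    static final class CountingCollector implements HitCollector {
        private int totalHits;

        @Override
        public void collect(int doc, float score) {
            totalHits++;
        }

        int getTotalHits() {
            return totalHits;
        }
    }

    /** Forwards only positive-scoring hits, in the spirit of PositiveScoresOnlyCollector. */
    static final class PositiveScoresOnly implements HitCollector {
        private final HitCollector delegate;

        PositiveScoresOnly(HitCollector delegate) {
            this.delegate = delegate;
        }

        @Override
        public void collect(int doc, float score) {
            if (score > 0) {
                delegate.collect(doc, score);
            }
        }
    }

    /** Feeds one hit per score through the wrapped counter and returns the count. */
    static int countPositive(float[] scores) {
        CountingCollector counter = new CountingCollector();
        HitCollector collector = new PositiveScoresOnly(counter);
        for (int doc = 0; doc < scores.length; doc++) {
            collector.collect(doc, scores[doc]);
        }
        return counter.getTotalHits();
    }

    public static void main(String[] args) {
        System.out.println(countPositive(new float[] {1.5f, 0f, -2f, 0.1f}));
    }
}
```

Because the wrapper only depends on the interface, it can be stacked on top of any other collector, which is how the built-in wrapping collectors compose.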
Using TimeLimitingCollector to Stop a Slow Query
public void testTimeLimitingCollector() throws IOException {
    // SimulateSlowCollector is a copy of
    // org.apache.lucene.search.TestTimeLimitingCollector.MyHitCollector
    SimulateSlowCollector slowCollector = new SimulateSlowCollector();
    slowCollector.setSlowDown(1000 * 10);
    // Thread-safe counter that serves as the collector's clock.
    final Counter clock = Counter.newCounter(true);
    final int tick = 10;
    TimeLimitingCollector collector = new TimeLimitingCollector(
            slowCollector, clock, tick);
    collector.setBaseline(0);
    try (Directory directory = FSDirectory.open(new File(FILE_PATH));
            DirectoryReader indexReader = DirectoryReader.open(directory)) {
        IndexSearcher searcher = new IndexSearcher(indexReader);
        // Advance the clock by one tick per second. Once more than 10
        // ticks (about 10 seconds) have passed, the next collect() call
        // aborts the search.
        Thread ticker = new Thread() {
            @Override
            public void run() {
                while (clock.get() <= tick) {
                    try {
                        Thread.sleep(1000);
                        clock.addAndGet(1);
                    } catch (InterruptedException e) {
                        // Restore the interrupt flag and stop ticking.
                        Thread.currentThread().interrupt();
                        return;
                    }
                }
            }
        };
        ticker.setDaemon(true); // do not keep the JVM alive for the ticker
        ticker.start();
        try {
            searcher.search(new MatchAllDocsQuery(), collector);
            System.out.println(slowCollector.hitCount());
        } catch (TimeExceededException e) {
            // Thrown by TimeLimitingCollector once the time budget is exceeded.
            System.out.println("Too much time taken.");
            e.printStackTrace();
        }
    }
}
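The tick-based check is easier to follow in isolation. The sketch below reproduces the idea with the JDK only, under some assumptions: an `AtomicLong` stands in for Lucene's Counter, `TickLimitSketch` and `TimeExceeded` are hypothetical names, and the clock is advanced by the caller rather than by a timer thread so that the behaviour is deterministic.

```java
import java.util.concurrent.atomic.AtomicLong;

public class TickLimitSketch {
    /** Hypothetical stand-in for TimeLimitingCollector.TimeExceededException. */
    static final class TimeExceeded extends RuntimeException {
        TimeExceeded(long allowed, long elapsed, int lastDoc) {
            super("Elapsed " + elapsed + " ticks, allowed " + allowed
                    + ", stopped at doc " + lastDoc);
        }
    }

    private final AtomicLong clock;
    private final long t0;      // baseline: clock value when collection starts
    private final long timeout; // absolute tick after which collection aborts
    private int collected;

    TickLimitSketch(AtomicLong clock, long ticksAllowed) {
        this.clock = clock;
        this.t0 = clock.get();
        this.timeout = t0 + ticksAllowed;
    }

    void collect(int doc) {
        long time = clock.get();
        if (timeout < time) {
            throw new TimeExceeded(timeout - t0, time - t0, doc);
        }
        collected++;
    }

    int collected() {
        return collected;
    }

    /**
     * Collects docs 0..numDocs-1 while advancing the clock one tick per doc
     * to simulate slow collection. Returns how many docs were collected
     * before the limit aborted the loop.
     */
    static int run(int numDocs, long ticksAllowed) {
        AtomicLong clock = new AtomicLong();
        TickLimitSketch collector = new TickLimitSketch(clock, ticksAllowed);
        try {
            for (int doc = 0; doc < numDocs; doc++) {
                collector.collect(doc);
                clock.incrementAndGet();
            }
        } catch (TimeExceeded e) {
            // Aborted: hits collected so far are still available.
        }
        return collector.collected();
    }

    public static void main(String[] args) {
        System.out.println(run(10, 3));
    }
}
```

Note that the check runs only inside `collect`, so a search that matches no documents for a long stretch is not interrupted until the next hit arrives.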
Write a Custom Collector
public class FacetCountCollector extends Collector {
    private final Map<String, Long> countMap = new HashMap<>();
    private final IndexSearcher searcher;
    // Doc base of the current segment; collect() receives segment-relative IDs.
    private int docBase;

    public FacetCountCollector(IndexSearcher searcher) {
        this.searcher = searcher;
    }

    @Override
    public void collect(int doc) throws IOException {
        // Convert the segment-relative ID to a top-level ID before loading
        // the document. Loading stored fields for every hit is slow, but it
        // keeps this example simple.
        Document document = searcher.doc(docBase + doc);
        for (IndexableField category : document.getFields("categories")) {
            String value = category.stringValue();
            Long count = countMap.get(value);
            countMap.put(value, count == null ? 1L : count + 1);
        }
    }

    public Map<String, Long> getCountMap() {
        return Collections.unmodifiableMap(countMap);
    }

    @Override
    public void setScorer(Scorer scorer) throws IOException {
        // Scores are not needed for counting.
    }

    @Override
    public void setNextReader(AtomicReaderContext context) throws IOException {
        // Record the doc base of the reader whose hits come next.
        this.docBase = context.docBase;
    }

    @Override
    public boolean acceptsDocsOutOfOrder() {
        // Counting does not depend on the order in which docIDs arrive.
        return true;
    }
}
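The role of the doc base is worth illustrating: a collector receives docIDs relative to the current segment, and adding the segment's doc base yields the index-wide ID. Here is a minimal sketch, assuming a hypothetical in-memory "index" (an array of category lists addressed by index-wide docID) in place of a real IndexSearcher; `DocBaseSketch` and `setNextSegment` are illustrative names, not Lucene API.

```java
import java.util.Map;
import java.util.TreeMap;

public class DocBaseSketch {
    // Hypothetical index: categories of each document, by index-wide docID.
    private final String[][] categoriesByGlobalDoc;
    private final Map<String, Long> counts = new TreeMap<>();
    private int docBase;

    DocBaseSketch(String[][] categoriesByGlobalDoc) {
        this.categoriesByGlobalDoc = categoriesByGlobalDoc;
    }

    /** Plays the role of setNextReader: remembers where the segment starts. */
    void setNextSegment(int docBase) {
        this.docBase = docBase;
    }

    /** Receives a segment-relative docID, like Collector.collect. */
    void collect(int doc) {
        for (String category : categoriesByGlobalDoc[docBase + doc]) {
            counts.merge(category, 1L, Long::sum);
        }
    }

    Map<String, Long> counts() {
        return counts;
    }

    /** Collects four documents split across two segments of two docs each. */
    static Map<String, Long> demo() {
        String[][] index = {
            {"java"}, {"java", "search"}, // segment 1: index-wide docs 0 and 1
            {"search"}, {"lucene"}        // segment 2: index-wide docs 2 and 3
        };
        DocBaseSketch collector = new DocBaseSketch(index);
        collector.setNextSegment(0);
        collector.collect(0);
        collector.collect(1);
        collector.setNextSegment(2);
        collector.collect(0); // index-wide doc 2
        collector.collect(1); // index-wide doc 3
        return collector.counts();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Without the `docBase + doc` addition, the second segment's hits would be counted against documents 0 and 1 again, so the totals would be wrong for any index with more than one segment.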
Using the Custom Collector
public void testFacetCountCollector() throws IOException {
    try (Directory directory = FSDirectory.open(new File(FILE_PATH));
            DirectoryReader indexReader = DirectoryReader.open(directory)) {
        IndexSearcher searcher = new IndexSearcher(indexReader);
        FacetCountCollector collector = new FacetCountCollector(searcher);
        // Match every document so that all categories are counted.
        searcher.search(new MatchAllDocsQuery(), collector);
        System.out.println(collector.getCountMap());
    }
}