Programmer: Lifelong Learning: Lucene Internal APIs

BytesRef
Represents a byte[] as a slice (offset + length) into an existing byte[]. The slice references the array directly; no bytes are copied.
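// Four ASCII bytes: 'a', 'b', 'c', 'd'
byte[] bytes = new byte[] { (byte) 'a', (byte) 'b', (byte) 'c', (byte) 'd' };

// References the whole array (offset 0, length 4)
BytesRef b = new BytesRef(bytes);

// References 3 bytes starting at offset 1: 'b', 'c', 'd'
BytesRef b2 = new BytesRef(bytes, 1, 3);

assertEquals("bcd", b2.utf8ToString());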

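// Decodes the slice as UTF-8 and returns it as a String
public String utf8ToString() {
  // UTF-16 output never needs more chars than there are UTF-8 input bytes
  final char[] ref = new char[length];
  final int len = UnicodeUtil.UTF8toUTF16(bytes, offset, length, ref);
  return new String(ref, 0, len);
}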
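
To make the slice idea concrete, here is a minimal sketch that does not depend on Lucene. ByteSlice and ByteSliceDemo are hypothetical names, and decoding through java.lang.String is a simplification of what UnicodeUtil does. The point is only that a slice is a view over a shared array, not a copy.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the slice idea behind BytesRef; not Lucene's class.
final class ByteSlice {
    final byte[] bytes; // shared backing array, never copied
    final int offset;   // index of the first byte in the slice
    final int length;   // number of bytes in the slice

    ByteSlice(byte[] bytes, int offset, int length) {
        if (offset < 0 || length < 0 || offset + length > bytes.length) {
            throw new IllegalArgumentException("slice out of bounds");
        }
        this.bytes = bytes;
        this.offset = offset;
        this.length = length;
    }

    // Decodes only the bytes inside the slice as UTF-8.
    String utf8ToString() {
        return new String(bytes, offset, length, StandardCharsets.UTF_8);
    }
}

final class ByteSliceDemo {
    public static void main(String[] args) {
        byte[] data = { (byte) 'a', (byte) 'b', (byte) 'c', (byte) 'd' };
        ByteSlice slice = new ByteSlice(data, 1, 3);
        System.out.println(slice.utf8ToString()); // prints "bcd"

        // The slice is a view: changing the backing array changes the slice.
        data[1] = (byte) 'x';
        System.out.println(slice.utf8ToString()); // prints "xcd"
    }
}
```

Because the array is shared, code that hands out a slice and then reuses the buffer will silently change what the slice decodes to. Copy the bytes first if the slice has to outlive the buffer.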

Term

public final class Term implements Comparable<Term> {
  String field;   // name of the field the text occurred in
  BytesRef bytes; // the term text, stored as bytes
}

A Term represents a word from text. This is the unit of search. It is composed of two elements: the text of the word, as a string, and the name of the field that the text occurred in.
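
The two-element structure can be sketched without Lucene as follows. SimpleTerm is a hypothetical class, not Lucene's Term. The comparison mirrors the ordering Lucene documents for Term (by field first, then by text), and the unsigned byte comparison used for the text is an assumption of this sketch.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical simplified term: a field name plus the UTF-8 bytes of the text.
final class SimpleTerm implements Comparable<SimpleTerm> {
    final String field;
    final byte[] bytes;

    SimpleTerm(String field, String text) {
        this.field = field;
        this.bytes = text.getBytes(StandardCharsets.UTF_8);
    }

    // Decodes the stored bytes back into the term text.
    String text() {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Orders by field name first, then by unsigned byte order of the text.
    @Override
    public int compareTo(SimpleTerm other) {
        int byField = field.compareTo(other.field);
        if (byField != 0) {
            return byField;
        }
        int n = Math.min(bytes.length, other.bytes.length);
        for (int i = 0; i < n; i++) {
            int diff = (bytes[i] & 0xff) - (other.bytes[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        // A shorter text that is a prefix of the longer one sorts first.
        return bytes.length - other.bytes.length;
    }
}
```

With this ordering, every term of the field "body" sorts before every term of the field "title", whatever the text is.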

TermsEnum

Iterator to seek (seekCeil(BytesRef), seekExact(BytesRef)) or step through (next) terms to obtain frequency information (docFreq), or a DocsEnum or DocsAndPositionsEnum for the current term (docs).

Term enumerations are always ordered by getComparator. Each term in the enumeration is greater than the one before it.

The TermsEnum is unpositioned when you first obtain it, so you must first successfully call next or one of the seek methods before using it.
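
The seek, step and unpositioned rules can be illustrated with a small in-memory model. SimpleTermsEnum is a hypothetical class backed by a TreeMap, not Lucene's TermsEnum. Natural String order stands in for getComparator, and the three-valued SeekStatus is modeled on the description of seekCeil.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory enumeration over sorted terms; not Lucene's TermsEnum.
final class SimpleTermsEnum {
    enum SeekStatus { FOUND, NOT_FOUND, END }

    private final TreeMap<String, Integer> docFreqs; // term -> document frequency
    private String current;    // null while the enumeration is unpositioned
    private boolean exhausted; // true once next() or seekCeil() ran off the end

    SimpleTermsEnum(Map<String, Integer> docFreqs) {
        this.docFreqs = new TreeMap<>(docFreqs);
    }

    // Steps to the next term in sorted order; returns null when exhausted.
    String next() {
        if (exhausted) {
            return null;
        }
        if (current == null) {
            current = docFreqs.isEmpty() ? null : docFreqs.firstKey();
        } else {
            current = docFreqs.higherKey(current);
        }
        exhausted = (current == null);
        return current;
    }

    // Positions on the smallest term that is >= target.
    SeekStatus seekCeil(String target) {
        current = docFreqs.ceilingKey(target);
        exhausted = (current == null);
        if (current == null) {
            return SeekStatus.END;
        }
        return current.equals(target) ? SeekStatus.FOUND : SeekStatus.NOT_FOUND;
    }

    // Positions on target only if it exists.
    // Simplification: on failure this sketch keeps the previous position.
    boolean seekExact(String target) {
        if (!docFreqs.containsKey(target)) {
            return false;
        }
        current = target;
        exhausted = false;
        return true;
    }

    // Per-term data is only available once the enumeration is positioned.
    int docFreq() {
        if (current == null) {
            throw new IllegalStateException("unpositioned: call next or a seek method first");
        }
        return docFreqs.get(current);
    }
}
```

For terms apple, banana and cherry, seekCeil("b") returns NOT_FOUND and leaves the enumeration on banana, the next call to next yields cherry, and the call after that returns null.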

For usage examples, see the test class org.apache.lucene.index.TestTermsEnum.

DocsEnum
Iterates through the documents and term freqs. NOTE: you must first call nextDoc before using any of the per-doc methods.
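
The call-nextDoc-first rule leads to a characteristic loop shape, sketched here without Lucene. SimpleDocsEnum is a hypothetical class over two parallel arrays. The NO_MORE_DOCS sentinel follows Lucene's DocIdSetIterator convention, while returning -1 from docID before the first nextDoc call and throwing from freq when unpositioned are assumptions of this sketch.

```java
// Hypothetical postings iterator over parallel arrays; not Lucene's DocsEnum.
final class SimpleDocsEnum {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE; // sentinel for exhaustion

    private final int[] docIds; // ascending ids of documents containing the term
    private final int[] freqs;  // term frequency within each matching document
    private int index = -1;     // -1 means nextDoc has not been called yet

    SimpleDocsEnum(int[] docIds, int[] freqs) {
        if (docIds.length != freqs.length) {
            throw new IllegalArgumentException("docIds and freqs must have equal length");
        }
        this.docIds = docIds;
        this.freqs = freqs;
    }

    // Advances to the next document and returns its id, or NO_MORE_DOCS.
    int nextDoc() {
        if (index < docIds.length) {
            index++;
        }
        return docID();
    }

    // Current document id: -1 before the first nextDoc, NO_MORE_DOCS at the end.
    int docID() {
        if (index < 0) {
            return -1;
        }
        return index < docIds.length ? docIds[index] : NO_MORE_DOCS;
    }

    // Per-doc method: only valid while positioned on a document.
    int freq() {
        if (index < 0 || index >= docIds.length) {
            throw new IllegalStateException("not positioned on a document");
        }
        return freqs[index];
    }

    // Typical consumption loop: sums the term frequency over all documents.
    static int totalFreq(SimpleDocsEnum docs) {
        int total = 0;
        for (int doc = docs.nextDoc(); doc != NO_MORE_DOCS; doc = docs.nextDoc()) {
            total += docs.freq();
        }
        return total;
    }
}
```

The loop in totalFreq calls nextDoc before its first call to freq, which is exactly the order the NOTE above requires.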
