Lucene Internal APIs

BytesRef
Represents byte[], as a slice (offset + length) into an existing byte[].
byte bytes[] = new byte[] { (byte)'a', (byte)'b', (byte)'c', (byte)'d' };
BytesRef b = new BytesRef(bytes);
BytesRef b2 = new BytesRef(bytes, 1, 3);
assertEquals("bcd", b2.utf8ToString());

public String utf8ToString() {
  final char[] ref = new char[length];
  final int len = UnicodeUtil.UTF8toUTF16(bytes, offset, length, ref);
  return new String(ref, 0, len);
}

Term
public final class Term implements Comparable {
  String field;
  BytesRef bytes;
}
A Term represents a word from text. This is the unit of search. It is composed of two elements, the text of the word, as a string, and the name of the field that the text occurred in.

Iterator to seek (seekCeil(BytesRef), seekExact(BytesRef)) or step through (next terms to obtain frequency information (docFreq), DocsEnum or DocsAndPositionsEnum for the current term (docs. 

Term enumerations are always ordered by getComparator. Each term in the enumeration is greater than the one before it.
TermsEnum
The TermsEnum is unpositioned when you first obtain it and you must first successfully call next or one of the seek methods.

org.apache.lucene.index.TestTermsEnum

DocsEnum
Iterates through the documents and term freqs. NOTE: you must first call nextDoc before using any of the per-doc methods. 

Post a Comment

Labels

Java (159) Lucene-Solr (110) All (60) Interview (59) J2SE (53) Algorithm (37) Eclipse (35) Soft Skills (35) Code Example (31) Linux (26) JavaScript (23) Spring (22) Windows (22) Web Development (20) Tools (19) Nutch2 (18) Bugs (17) Debug (15) Defects (14) Text Mining (14) J2EE (13) Network (13) PowerShell (11) Chrome (9) Continuous Integration (9) How to (9) Learning code (9) Performance (9) UIMA (9) html (9) Design (8) Dynamic Languages (8) Http Client (8) Maven (8) Security (8) Trouble Shooting (8) bat (8) blogger (8) Big Data (7) Google (7) Guava (7) JSON (7) Problem Solving (7) ANT (6) Coding Skills (6) Database (6) Scala (6) Shell (6) css (6) Algorithm Series (5) Cache (5) IDE (5) Lesson Learned (5) Miscs (5) Programmer Skills (5) System Design (5) Tips (5) adsense (5) xml (5) AIX (4) Code Quality (4) GAE (4) Git (4) Good Programming Practices (4) Jackson (4) Memory Usage (4) OpenNLP (4) Project Managment (4) Python (4) Spark (4) Testing (4) ads (4) regular-expression (4) Android (3) Apache Spark (3) Become a Better You (3) Concurrency (3) Eclipse RCP (3) English (3) Firefox (3) Happy Hacking (3) IBM (3) J2SE Knowledge Series (3) JAX-RS (3) Jetty (3) Restful Web Service (3) Script (3) regex (3) seo (3) .Net (2) Android Studio (2) Apache (2) Apache Procrun (2) Architecture (2) Batch (2) Build (2) Building Scalable Web Sites (2) C# (2) C/C++ (2) CSV (2) Career (2) Cassandra (2) Distributed (2) Fiddler (2) Google Drive (2) Gson (2) Html Parser (2) Http (2) Image Tools (2) JQuery (2) Jersey (2) LDAP (2) Life (2) Logging (2) Software Issues (2) Storage (2) Text Search (2) xml parser (2) AOP (1) Application Design (1) AspectJ (1) Bit Operation (1) Chrome DevTools (1) Cloud (1) Codility (1) Data Mining (1) Data Structure (1) ExceptionUtils (1) Exif (1) Feature Request (1) FindBugs (1) Greasemonkey (1) HTML5 (1) Httpd (1) I18N (1) IBM Java Thread Dump Analyzer (1) JDK Source Code (1) JDK8 (1) JMX (1) Lazy Developer (1) Mac (1) Machine Learning (1) Mobile (1) My Plan for 2010 (1) Netbeans (1) Notes (1) Operating System (1) Perl (1) Problems (1) Product Architecture (1) Programming Life (1) Quality (1) Redhat (1) Redis (1) Review (1) RxJava (1) Solutions logs (1) Team Management (1) Thread Dump Analyzer (1) Troubleshooting (1) Visualization (1) boilerpipe (1) htm (1) ongoing (1) procrun (1) rss (1)

Popular Posts