Java APIs to Build Solr Suggester and Get Suggestions

Use Case
We usually provide REST APIs to manage Solr, and the suggester is no exception.
This article shows how to build a Solr suggester and fetch suggestions programmatically from Java code.

The Implementation
See the end of the article for the Solr configuration files.

Build Suggester
After adding documents to Solr, we call suggest?suggest.build=true to build the suggester so that the new documents become available for autocompletion.

The one catch is that a suggest.build request does not build the suggester on every core in the collection; it only builds it on the core that receives the request.

We need to get the URLs of all replicas in the collection, pass them in the shards parameter, and also add shards.qt=/suggest:
shards=127.0.0.1:4567/solr/myCollection_shard1_replica3,127.0.0.1:4565/solr/myCollection_shard1_replica2,127.0.0.1:4566/solr/myCollection_shard1_replica1,127.0.0.1:4567/solr/myCollection_shard2_replica3,127.0.0.1:4566/solr/myCollection_shard2_replica1,127.0.0.1:4565/solr/myCollection_shard2_replica2&shards.qt=/suggest
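
As a rough sketch of how that parameter string could be assembled, the snippet below joins a list of core URLs into a shards value and appends shards.qt. The class name ShardsParamSketch, its method names, and the sample URLs are hypothetical and only for illustration; in the real code the URLs come from the cluster state. It uses plain Java, trims trailing slashes so the entries stay consistent, and does not URL-encode the values, matching the example above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of SolrJ.
final class ShardsParamSketch {

    // Joins core URLs into one comma-separated value for the shards parameter.
    // Surrounding whitespace and trailing slashes are removed; blank entries are skipped.
    static String shardsValue(List<String> coreUrls) {
        return coreUrls.stream()
                .map(url -> url.trim().replaceAll("/+$", ""))
                .filter(url -> !url.isEmpty())
                .collect(Collectors.joining(","));
    }

    // Builds the query string for a distributed suggest.build request (values are not URL-encoded).
    static String buildQueryString(List<String> coreUrls) {
        return "suggest.build=true&shards=" + shardsValue(coreUrls) + "&shards.qt=/suggest";
    }

    public static void main(String[] args) {
        List<String> urls = Arrays.asList(
                "127.0.0.1:4566/solr/myCollection_shard1_replica1",
                "127.0.0.1:4566/solr/myCollection_shard2_replica1/");
        System.out.println(buildQueryString(urls));
    }
}
```

With SolrJ, the same values are passed through ShardParams.SHARDS and ShardParams.SHARDS_QT rather than concatenated by hand.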

// Builds the suggester on every replica of the collection,
// not only on the core that happens to receive the request.
public void buildSuggester() {
    final SolrQuery solrQuery = new SolrQuery();
    final List<String> urls = getAllSolrCoreUrls(getSolrClient());

    // COMMA_JOINER is assumed to be a comma joiner, e.g. Guava's Joiner.on(",").
    solrQuery.setRequestHandler("/suggest").setParam("suggest.build", "true")
            .setParam(ShardParams.SHARDS, COMMA_JOINER.join(urls))
            .setParam(ShardParams.SHARDS_QT, "/suggest");
    try {
        final QueryResponse queryResponse = getSolrClient().query(solrQuery);
        final int status = queryResponse.getStatus();
        if (status >= 300) {
            throw new BusinessException(ErrorCode.data_access_error,
                    MessageFormat.format("Failed to build suggestions: status: {0}", status));
        }
    } catch (SolrServerException | IOException e) {
        throw new BusinessException(ErrorCode.data_access_error, e, "Failed to build suggestions");
    }
}

public static List<String> getAllSolrCoreUrls(final CloudSolrClient solrClient) {
    final ZkStateReader zkReader = getZKReader(solrClient);
    final ClusterState clusterState = zkReader.getClusterState();

    final Collection<Slice> slices = clusterState.getSlices(solrClient.getDefaultCollection());
    if (slices.isEmpty()) {
        throw new BusinessException(ErrorCode.data_access_error, "No slices");
    }
    return slices.stream().map(slice -> slice.getReplicas()).flatMap(replicas -> replicas.stream())
            .map(replica -> replica.getCoreUrl()).collect(Collectors.toList());
}

private static ZkStateReader getZKReader(final CloudSolrClient solrClient) {
    if (solrClient.getZkStateReader() == null) {
        // The reader is null only before the client has been used for the first time.
        // The application usually calls solrClient during startup (for example for a
        // health check), so in most cases it is already connected.
        solrClient.connect();
    }
    return solrClient.getZkStateReader();
}

Get Suggestions


public Set<SearchSuggestion> getSuggestions(final String prefix, final int limit) {
   final Set<SearchSuggestion> result = new LinkedHashSet<>(limit);
   try {
       final SolrQuery solrQuery = new SolrQuery().setRequestHandler("/suggest").setParam("suggest.q", prefix)
               .setParam("suggest.count", String.valueOf(limit)).setParam(CommonParams.TIME_ALLOWED,
                       mergedConfig.getConfigByNameAsString("search.suggestions.time_allowed.millSeconds"));
       // context filters
       solrQuery.setParam("suggest.cfq", getContextFilters());
       final QueryResponse queryResponse = getSolrClient().query(solrQuery);
       final SuggesterResponse suggesterResponse =
               queryResponse == null ? null : queryResponse.getSuggesterResponse();
       if (suggesterResponse != null) {
           final Map<String, List<Suggestion>> map = suggesterResponse.getSuggestions();
           // The key is the suggester name configured in solrconfig.xml.
           final List<Suggestion> infixSuggestions = map.get("infixSuggester");
           if (infixSuggestions != null) {
               for (final Suggestion suggestion : infixSuggestions) {
                   if (result.size() >= limit) {
                       break;
                   }
                   // With highlight=true the term contains <b>...</b> markup; replaceTagB strips it.
                   result.add(new SearchSuggestion().setText(suggestion.getTerm())
                           .setHighlightedText(replaceTagB(suggestion.getTerm())));
               }
           }
       }
       logger.info(
               MessageFormat.format("User: {0}, query: {1}, limit: {2}, result: {3}", user, prefix, limit, result));
       return result;
   } catch (final Exception e) {
       throw new BusinessException(ErrorCode.data_access_error, e, "Failed to get suggestions for " + prefix);
   }
}

private static final Pattern TAGB_PATTERN = Pattern.compile("<b>|</b>");

public static String replaceTagB(String input) {
    return TAGB_PATTERN.matcher(input).replaceAll("");
}
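
To see the tag stripping and the limit handling in isolation, here is a minimal sketch in plain Java with no Solr dependency. The class SuggestionTermsSketch, its method names, and the sample terms are hypothetical. It assumes the terms arrive with <b> markup (as they do when highlight is true in the configuration), keeps them in the order received, drops entries whose plain text repeats, and stops at the limit.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
final class SuggestionTermsSketch {

    private static final Pattern TAGB_PATTERN = Pattern.compile("<b>|</b>");

    // Removes the <b> and </b> markers from a highlighted term.
    static String stripBoldTags(String term) {
        return TAGB_PATTERN.matcher(term).replaceAll("");
    }

    // Maps plain text to highlighted text, preserving input order,
    // de-duplicating on the plain text, and keeping at most 'limit' entries.
    static Map<String, String> toPlainAndHighlighted(List<String> highlightedTerms, int limit) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String term : highlightedTerms) {
            if (result.size() >= limit) {
                break;
            }
            result.putIfAbsent(stripBoldTags(term), term);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("<b>star</b> wars", "<b>Star</b>gate", "<b>star</b> wars");
        toPlainAndHighlighted(terms, 10)
                .forEach((plain, highlighted) -> System.out.println(plain + " | " + highlighted));
    }
}
```

A LinkedHashMap is used here for the same reason the method above uses a LinkedHashSet: the order returned by the suggester reflects its ranking, so it is worth preserving.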

Schema.xml
We define the suggester field (of type textSuggest) and suggesterContextField. Fields that are shown in autocompletion are copied to the suggester field, and filter fields such as zipCodes and genres are copied to suggesterContextField.

The Solr suggester can filter on multiple fields; we just need to copy all of those filter fields to suggesterContextField.
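
The getContextFilters() method used earlier is not shown in this article. As one possible sketch, the helper below builds a boolean filter string for suggest.cfq from groups of values: any value within a group may match (OR), and every group must match (AND). The class ContextFilterSketch, its method names, and the sample values are hypothetical. It also assumes the filter is parsed as a standard boolean query, so check how quoting and multi-word values behave on your Solr version.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper; one way getContextFilters() might assemble its value.
final class ContextFilterSketch {

    // Wraps a value in double quotes, escaping backslashes and embedded quotes.
    static String quote(String value) {
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Any one of the values may match: ("a" OR "b").
    static String anyOf(List<String> values) {
        return values.stream()
                .map(ContextFilterSketch::quote)
                .collect(Collectors.joining(" OR ", "(", ")"));
    }

    // Every non-empty group must match: (...) AND (...).
    static String allOf(List<List<String>> groups) {
        return groups.stream()
                .filter(group -> !group.isEmpty())
                .map(ContextFilterSketch::anyOf)
                .collect(Collectors.joining(" AND "));
    }

    public static void main(String[] args) {
        List<String> zipCodes = Arrays.asList("94087", "94086");
        List<String> genres = Arrays.asList("Comedy");
        System.out.println(allOf(Arrays.asList(zipCodes, genres)));
    }
}
```

Because zipCodes and genres are copied into the same context field, the filter does not name a field; it only lists the values to match.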


<field name="suggester" type="textSuggest" indexed="true"
  stored="true" multiValued="true" />
<field name="suggesterContextField" type="string" indexed="true" stored="true"
  multiValued="true" />

<copyField source="seriesTitle" dest="suggester" />
<copyField source="programTitle" dest="suggester" />

<copyField source="zipCodes" dest="suggesterContextField" />
<copyField source="genres" dest="suggesterContextField" />

SolrConfig.xml
We can add multiple suggester implementations to the searchComponent. Another very useful option is FileDictionaryFactory, which lets us use an external file that contains the suggestion entries. We may use it in the future.


<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">infixSuggester</str>
    <str name="lookupImpl">BlendedInfixLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="blenderType">position_linear</str>
    <str name="field">suggester</str>
    <str name="contextField">suggesterContextField</str>
    <str name="minPrefixChars">4</str>
    <str name="suggestAnalyzerFieldType">textSuggest</str>
    <str name="indexPath">infix_suggestions</str>
    <str name="highlight">true</str>
    <str name="buildOnStartup">false</str>
    <str name="buildOnCommit">false</str>
  </lst>
</searchComponent>

<requestHandler name="/suggest" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.dictionary">infixSuggester</str>
    <str name="suggest.onlyMorePopular">true</str>
    <str name="suggest.count">10</str>
    <str name="suggest.collate">true</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>

Resources
Solr Suggester