Solr: Use DocTransformer to Change Response

We have two documentation websites: one for internal testing and one for production. The content of the internal and production sites is exactly the same.
I use Nutch2 and Solr to crawl the internal documentation site and index it into a Solr 4.x server: make some changes, re-crawl, test, and check the results. All is good.

Next I need to deploy the all-in-one package and the Solr index to the production machine. We package embedded Jetty, the Solr server, Solr home, and the index in one package to make it easy to deploy and test.

Then I found out that I need to change the values of two fields: url and urlfolder. The values in the index start with http://internalsite/doc, and I need to change them all to start with http://externalsite/doc.

I could recrawl the production documentation site, but this would take some time and is not flexible.
It is better if I can reuse the internalsite index without changing it, and just rewrite these two fields when returning the response.

Luckily, Solr 4.0 provides Document Transformers, which give you a way to modify the fields and documents that are returned to the user.
Built-in examples include [value], which adds a constant value, as well as [docid], [shard], and [explain].
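Transformers are requested through the fl parameter, alongside ordinary field names. The following is a minimal sketch of how such a request URL could be assembled; the host, port, and the class and method names are assumptions for illustration, and the code only builds a string without contacting Solr.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class TransformerQueryExample {

  // URL-encodes one parameter value as UTF-8.
  private static String enc(String s) {
    try {
      return URLEncoder.encode(s, "UTF-8");
    } catch (UnsupportedEncodingException e) {
      // UTF-8 is always available on a conforming JVM.
      throw new IllegalStateException(e);
    }
  }

  // Builds a /select URL whose fl mixes stored fields with transformers
  // such as [docid] or [explain].
  static String buildSelectUrl(String baseUrl, String query, String... fl) {
    String fieldList = String.join(",", fl);
    return baseUrl + "/select?q=" + enc(query) + "&fl=" + enc(fieldList);
  }

  public static void main(String[] args) {
    // Hypothetical local Solr address; adjust to your own deployment.
    System.out.println(buildSelectUrl("http://localhost:8983/solr",
        "title:solr", "title", "url", "[docid]", "[explain]"));
  }
}
```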

So I can write a document transformer that changes the url and urlfolder fields after Solr has searched the index, but before Solr returns the response to the client.
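The rewrite itself is a plain prefix swap on a string value. As a rough sketch, independent of Solr (the class and method names here are made up for illustration), it could look like this:

```java
final class PrefixSwap {

  // Returns value with prefix replaced by replacement when value starts
  // with prefix; otherwise returns value unchanged.
  static String swapPrefix(String value, String prefix, String replacement) {
    if (value == null || !value.startsWith(prefix)) {
      return value;
    }
    return replacement + value.substring(prefix.length());
  }

  public static void main(String[] args) {
    // The page path below is a made-up example.
    System.out.println(swapPrefix("http://internalsite/doc/intro.html",
        "http://internalsite/", "http://externalsite/"));
  }
}
```

Values that do not start with the prefix pass through untouched, so the transformer is safe to apply to every document in the result.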
Implementation Code
The complete source code can be found on GitHub.
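package org.codeexample.jeffery.solr.transform;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.document.Field;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.common.util.StrUtils;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.transform.DocTransformer;
import org.apache.solr.response.transform.TransformerFactory;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
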
public class PrefixReplaceTransformerFactory extends TransformerFactory {
  private List<String> fieldNames = new ArrayList<String>();
  private List<String> fieldPrefixs = new ArrayList<String>();
  private List<String> fieldReplaces = new ArrayList<String>();
  private boolean enabled = false;
  protected static Logger logger = LoggerFactory
      .getLogger(PrefixReplaceTransformerFactory.class);

  @SuppressWarnings("rawtypes")
  @Override
  public void init(NamedList args) {
    super.init(args);
    if (args != null) {
      SolrParams params = SolrParams.toSolrParams(args);
      enabled = params.getBool("enabled", false);
      String str = params.get("fields");
      if (str != null) {
        fieldNames = StrUtils.splitSmart(str, ',');
      }
      str = params.get("prefixes");
      if (str != null) {
        fieldPrefixs = StrUtils.splitSmart(str, ',');
      }
      str = params.get("replaces");
      if (str != null) {
        fieldReplaces = StrUtils.splitSmart(str, ',');
      }
      if (fieldPrefixs.size() != fieldReplaces.size())
        throw new RuntimeException(
            "Size of prefixes and replaces must be same, fieldPrefixs.size: "
                + fieldPrefixs.size() + ",fieldReplace.size: "
                + fieldReplaces.size());
    }
  }

  @Override
  public DocTransformer create(String field, SolrParams params,
      SolrQueryRequest req) {
    return new PrefixReplaceTransformer();
  }

  class PrefixReplaceTransformer extends DocTransformer {
    @Override
    public String getName() {
      return PrefixReplaceTransformer.class.getName();
    }

    @Override
    public void transform(SolrDocument doc, int docid) throws IOException {
      if (enabled) {
        for (int i = 0; i < fieldNames.size(); i++) {
          String fieldName = fieldNames.get(i);

          Object obj = doc.getFieldValue(fieldName);
          if (obj == null)
            continue;
          if (obj instanceof Field) {
            Field field = (Field) obj;
            String fieldValue = field.stringValue();

            boolean match = false;
            int j = 0;
            while (!match && j < fieldPrefixs.size()) {
              String prefix = fieldPrefixs.get(j);
              if (fieldValue.startsWith(prefix)) {
                match = true;
                fieldValue = fieldReplaces.get(j)
                    + fieldValue.substring(prefix.length());
                field.setStringValue(fieldValue);
              }
              ++j;
            }
          } else {
            logger.error("Should not happen: obj.type:" + obj.getClass());
          }
        }
      }
    }
  }
}
Register the Document Transformer in solrconfig.xml
Next we need to register the new doc transformer in solrconfig.xml, and add it to the fl parameter in the request handler's invariants.
<transformer name="valuereplace"
 class="org.codeexample.jeffery.solr.transform.PrefixReplaceTransformerFactory">
 <bool name="enabled">true</bool>
 <str name="fields">url,urlfolder,contentid</str>
 <str name="prefixes">http://internalsite/</str>
 <str name="replaces">http://externalsite/</str>
</transformer>

<requestHandler name="/searchdoc" class="solr.SearchHandler">
 <lst name="invariants">
  <str name="fl">title,url,score,[valuereplace]</str>
 </lst>
</requestHandler>
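The prefixes and replaces settings are comma-separated lists that are paired by position, and the first matching prefix wins. A minimal sketch of that pairing, using plain String.split as a stand-in for Solr's StrUtils.splitSmart (the class name and the second internalwiki/externalwiki pair are hypothetical):

```java
import java.util.Arrays;
import java.util.List;

final class PrefixPairs {
  private final List<String> prefixes;
  private final List<String> replaces;

  // Splits both comma-separated settings and checks that they pair up.
  PrefixPairs(String prefixesCsv, String replacesCsv) {
    prefixes = Arrays.asList(prefixesCsv.split(","));
    replaces = Arrays.asList(replacesCsv.split(","));
    if (prefixes.size() != replaces.size()) {
      throw new IllegalArgumentException(
          "prefixes and replaces must have the same size: "
              + prefixes.size() + " vs " + replaces.size());
    }
  }

  // Applies the first matching prefix; returns value unchanged if none match.
  String apply(String value) {
    for (int i = 0; i < prefixes.size(); i++) {
      String prefix = prefixes.get(i);
      if (value.startsWith(prefix)) {
        return replaces.get(i) + value.substring(prefix.length());
      }
    }
    return value;
  }

  public static void main(String[] args) {
    PrefixPairs pairs = new PrefixPairs(
        "http://internalsite/,http://internalwiki/",
        "http://externalsite/,http://externalwiki/");
    System.out.println(pairs.apply("http://internalwiki/page"));
  }
}
```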
References
http://wiki.apache.org/solr/DocTransformers
http://solr.pl/en/2011/12/05/solr-4-0-doctransformers-first-look/