We have two documentation websites: one for internal testing and one for production. The content of the two sites is exactly the same.
I use Nutch2 and Solr to crawl the internal documentation site and index it into a Solr 4.x server: make some changes, re-crawl, test, and check the results. All is good.
Next I need to deploy the all-in-one package and the Solr index to the production machine. (We package embedded Jetty, the Solr server, the Solr home, and the index together to make it easy to deploy and test.)
Then I found out that I need to change the values of two fields, url and urlfolder. The values in the index start with http://internalsite/doc, and I need to change them all to http://externalsite/doc.
I could re-crawl the production documentation site, but this would take some time and is not flexible.
It would be better to reuse the internal-site index unchanged and just rewrite these two fields when returning the response.
Luckily, Solr 4.0 provides Document Transformers, which give you a way to modify the fields and documents that are returned to the user.
Built-in examples include [value], which adds a constant value, as well as [docid], [shard], and [explain].
So I can write a document transformer that changes the url and urlfolder fields after Solr has run the search but before it returns the response to the client.
Implementation Code
The complete source code can be found on GitHub.
The main part of the code is shown below:

package org.codeexample.jeffery.solr.transform;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.document.Field;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.common.util.StrUtils;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.transform.DocTransformer;
import org.apache.solr.response.transform.TransformerFactory;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class PrefixReplaceTransformerFactory extends TransformerFactory {
  // Fields to rewrite, plus parallel lists: fieldPrefixs[i] is replaced by fieldReplaces[i].
  private List<String> fieldNames = new ArrayList<String>();
  private List<String> fieldPrefixs = new ArrayList<String>();
  private List<String> fieldReplaces = new ArrayList<String>();
  private boolean enabled = false;
  protected static Logger logger = LoggerFactory
      .getLogger(PrefixReplaceTransformerFactory.class);

  // Reads enabled, fields, prefixes and replaces from the <transformer> element in solrconfig.xml.
  @SuppressWarnings("rawtypes")
  @Override
  public void init(NamedList args) {
    super.init(args);
    if (args != null) {
      SolrParams params = SolrParams.toSolrParams(args);
      enabled = params.getBool("enabled", false);
      String str = params.get("fields");
      if (str != null) {
        fieldNames = StrUtils.splitSmart(str, ',');
      }
      str = params.get("prefixes");
      if (str != null) {
        fieldPrefixs = StrUtils.splitSmart(str, ',');
      }
      str = params.get("replaces");
      if (str != null) {
        fieldReplaces = StrUtils.splitSmart(str, ',');
      }
      if (fieldPrefixs.size() != fieldReplaces.size())
        throw new RuntimeException(
            "Size of prefixes and replaces must be same, fieldPrefixs.size: "
                + fieldPrefixs.size() + ",fieldReplace.size: "
                + fieldReplaces.size());
    }
  }

  @Override
  public DocTransformer create(String field, SolrParams params,
      SolrQueryRequest req) {
    return new PrefixReplaceTransformer();
  }

  class PrefixReplaceTransformer extends DocTransformer {
    @Override
    public String getName() {
      return PrefixReplaceTransformer.class.getName();
    }

    // Called for each document in the result, before the response is written.
    @Override
    public void transform(SolrDocument doc, int docid) throws IOException {
      if (enabled) {
        for (int i = 0; i < fieldNames.size(); i++) {
          String fieldName = fieldNames.get(i);
          Object obj = doc.getFieldValue(fieldName);
          if (obj == null)
            continue;
          if (obj instanceof Field) {
            Field field = (Field) obj;
            String fieldValue = field.stringValue();
            boolean match = false;
            int j = 0;
            // Apply the first prefix that matches, then stop.
            while (!match && j < fieldPrefixs.size()) {
              String prefix = fieldPrefixs.get(j);
              if (fieldValue.startsWith(prefix)) {
                match = true;
                fieldValue = fieldReplaces.get(j)
                    + fieldValue.substring(prefix.length());
                field.setStringValue(fieldValue);
              }
              ++j;
            }
          } else {
            logger.error("Should not happen: obj.type:" + obj.getClass());
          }
        }
      }
    }
  }
}
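To make the matching rule in transform easier to see and to test without a running Solr, here is a minimal standalone sketch of just the prefix-replacement step. The class name PrefixReplaceSketch and the method replacePrefix are hypothetical names introduced for illustration; they are not part of the transformer above or of Solr.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, for illustration only: isolates the prefix-replacement rule.
class PrefixReplaceSketch {

    /**
     * Returns the value with the first matching prefix swapped for the
     * replacement at the same index, or the value unchanged if no prefix matches.
     */
    static String replacePrefix(String value, List<String> prefixes, List<String> replaces) {
        if (prefixes.size() != replaces.size()) {
            throw new IllegalArgumentException("prefixes and replaces must have the same size");
        }
        for (int i = 0; i < prefixes.size(); i++) {
            String prefix = prefixes.get(i);
            if (value.startsWith(prefix)) {
                // First match wins; later prefixes are not tried.
                return replaces.get(i) + value.substring(prefix.length());
            }
        }
        return value;
    }

    public static void main(String[] args) {
        List<String> prefixes = Arrays.asList("http://internalsite/");
        List<String> replaces = Arrays.asList("http://externalsite/");
        System.out.println(replacePrefix("http://internalsite/doc/index.html", prefixes, replaces));
        System.out.println(replacePrefix("http://elsewhere/doc", prefixes, replaces));
    }
}
```

Only the start of the value is rewritten, so a URL that merely contains the internal host somewhere in the middle is left alone.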
Register the Document Transformer in solrconfig.xml
Next we need to register the new doc transformer in solrconfig.xml and add it to the fl parameter in the request handler's invariants.
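Putting the transformer in invariants matters because invariant parameters take precedence over whatever the client sends, so a client cannot drop the transformer by passing its own fl. The sketch below is a simplified model of that precedence, not Solr's actual implementation: the class InvariantParamsSketch and the method resolve are hypothetical, parameters are treated as single-valued, and appends are ignored.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of request-handler parameter precedence.
class InvariantParamsSketch {

    /** Invariants win, then the client's request parameters, then the handler defaults. */
    static String resolve(String name,
                          Map<String, String> invariants,
                          Map<String, String> request,
                          Map<String, String> defaults) {
        if (invariants.containsKey(name)) {
            return invariants.get(name);
        }
        if (request.containsKey(name)) {
            return request.get(name);
        }
        return defaults.get(name);
    }

    public static void main(String[] args) {
        Map<String, String> invariants = new HashMap<>();
        invariants.put("fl", "title,url,score,[valuereplace]");

        Map<String, String> request = new HashMap<>();
        request.put("q", "solr");
        request.put("fl", "url"); // the client tries to override fl

        Map<String, String> defaults = new HashMap<>();
        defaults.put("rows", "10");

        System.out.println("fl   = " + resolve("fl", invariants, request, defaults));
        System.out.println("q    = " + resolve("q", invariants, request, defaults));
        System.out.println("rows = " + resolve("rows", invariants, request, defaults));
    }
}
```

In this model the client's fl=url is ignored and the invariant field list, including [valuereplace], is always used.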
<transformer name="valuereplace"
    class="org.codeexample.jeffery.solr.transform.PrefixReplaceTransformerFactory">
  <bool name="enabled">true</bool>
  <str name="fields">url,urlfolder,contentid</str>
  <str name="prefixes">http://internalsite/</str>
  <str name="replaces">http://externalsite/</str>
</transformer>

<requestHandler name="/searchdoc" class="solr.SearchHandler">
  <lst name="invariants">
    <str name="fl">title,url,score,[valuereplace]</str>
  </lst>
</requestHandler>

References
http://wiki.apache.org/solr/DocTransformers
http://solr.pl/en/2011/12/05/solr-4-0-doctransformers-first-look/