Solr: Add New Fields with Default Values for Existing Documents


In some cases, we have to upgrade an existing Solr application to add new fields, but we can't (or don't want to) reindex the old data.
For example, the old Solr application only stores information about regular files. Now we need to upgrade it to store other types of files as well, so we add a field, fileType: fileType=0 means a regular file, and fileType=1 means a folder. In the future, we may add other file types.

We add the following definition to schema.xml:
<field name="fileType" type="TINT" indexed="true" stored="true" default="-1"/>

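Adding this definition to schema.xml doesn't affect existing data: the schema default is only applied when a document is indexed without the field, so the old documents still have no fileType value. The response contains no fileType for them, and the field has no indexed terms for them either, so the query fileType:[* TO *] does not match any old document.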

To fix this, we have to handle two parts: the search query and the search response.
Fix Search Query by Querying NULL Field
All old data is about regular files, so a document with no value for fileType is a regular file.
When we search for regular files, the query should be adjusted as below. It fetches documents where fileType is 0 or where fileType has no value:
-(-fileType:0 AND fileType:[* TO *])
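The query relies on De Morgan's law: NOT (NOT zero AND has a value) is the same as zero OR no value. The plain-Java simulation below (the class RegularFileQueryLogic is hypothetical, and a null Integer stands in for an old document with no indexed fileType) evaluates that expression over a few sample values, without Solr.

```java
// Simulates which documents -(-fileType:0 AND fileType:[* TO *]) matches.
// A null Integer represents an old document with no fileType in the index.
public class RegularFileQueryLogic {

  // fileType:[* TO *] -> the document has some value for fileType
  public static boolean hasValue(Integer fileType) {
    return fileType != null;
  }

  // fileType:0 -> the indexed value equals 0
  public static boolean isZero(Integer fileType) {
    return fileType != null && fileType == 0;
  }

  // -(-fileType:0 AND fileType:[* TO *])
  // = NOT (NOT zero AND has value) = zero OR no value (De Morgan)
  public static boolean matchesRegularFile(Integer fileType) {
    return !(!isZero(fileType) && hasValue(fileType));
  }

  public static void main(String[] args) {
    Integer[] samples = {null, 0, 1, -1};
    for (Integer ft : samples) {
      System.out.println("fileType=" + ft + " -> " + matchesRegularFile(ft));
    }
  }
}
```

Old documents (null) and fileType=0 match; folders (1) do not. Note that a new document indexed without fileType gets the schema default -1, so it is not treated as a regular file either.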

No change is needed when searching for other file types. We can wrap this change in our own search handler (if the query includes fileType:0, change it to -(-fileType:0 AND fileType:[* TO *])), or we can write a new query parser.
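A minimal sketch of the search-handler approach, assuming the raw q string is rewritten before it is parsed. The class name FileTypeQueryRewriter is hypothetical, and a plain regex stands in for a real query parser, so quoted or escaped forms such as fileType:"0" are not handled. One deliberate difference from the text above: the sketch emits (*:* -(...)) instead of the bare -(...). A purely negative clause combined with OR (e.g. name:x OR -(...)) is parsed as a required exclusion, so anchoring it with *:* keeps the clause meaning "regular file" inside a larger query.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: rewrites fileType:0 so it also matches documents
// that have no fileType value (the old, never-reindexed documents).
public class FileTypeQueryRewriter {

  // Standalone clause fileType:0 only (not fileType:01, myfileType:0, ...).
  private static final Pattern REGULAR_FILE_CLAUSE =
      Pattern.compile("(?<![\\w.])fileType:0(?![\\w.])");

  // *:* turns the negated group into a positive clause that is safe to use
  // inside OR expressions and nested groups.
  private static final String REPLACEMENT =
      "(*:* -(-fileType:0 AND fileType:[* TO *]))";

  public static String rewrite(String q) {
    return REGULAR_FILE_CLAUSE.matcher(q)
        .replaceAll(Matcher.quoteReplacement(REPLACEMENT));
  }

  public static void main(String[] args) {
    System.out.println(rewrite("name:report AND fileType:0"));
    System.out.println(rewrite("fileType:1")); // unchanged
  }
}
```

Because the rewrite yields a self-contained clause, a negated form such as -fileType:0 becomes -(*:* -(...)), which still means "not a regular file".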
Fix Search Response by Using DocTransformer to Add Default Value
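For the old data, there is no value for fileType, so we need to add fileType=0 to the search response. To do this, we can define a Solr DocTransformer.
A DocTransformer allows us to modify the fields that are returned to the user.
In our DocTransformer, we check the value of fileType; if there is no value, we set it to the default value. The response data (XML or JSON) then shows fileType=0 for old data. The factory is registered in solrconfig.xml with a <transformer> element (carrying the enabled, fields and nullDefaultValue init args read by the code) and applied per request through the fl parameter, e.g. fl=*,[nullDefault], where nullDefault is whatever name it was registered under.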
NullDefaultValueTransformerFactory Implementation
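import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;

import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.common.util.StrUtils;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.transform.DocTransformer;
import org.apache.solr.response.transform.TransformerFactory;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Adds a configured default value to the response for documents that have no
// value for a field. Targets the Solr 4.x TransformerFactory/DocTransformer API;
// method signatures may differ in other Solr versions.
public class NullDefaultValueTransformerFactory extends TransformerFactory {
  // field name -> value to return when the document has no value for it
  private Map<String,String> nullDefaultMap = new HashMap<String,String>();
  private boolean enabled = false;
  protected static Logger logger = LoggerFactory
      .getLogger(NullDefaultValueTransformerFactory.class);

  // Reads "enabled", "fields" and "nullDefaultValue" from the init args.
  @Override
  public void init(NamedList args) {
    super.init(args);
    if (args != null) {
      SolrParams params = SolrParams.toSolrParams(args);
      enabled = params.getBool("enabled", false);
      if (!enabled) return;

      // Comma-separated field names, e.g. "fileType"
      List<String> fieldNames = new ArrayList<String>();
      String str = params.get("fields");
      if (str != null) {
        fieldNames = StrUtils.splitSmart(str, ',');
      }
      // Comma-separated defaults, matched to fieldNames by position, e.g. "0"
      List<String> nullDefaultvalue = new ArrayList<String>();
      str = params.get("nullDefaultValue");
      if (str != null) {
        nullDefaultvalue = StrUtils.splitSmart(str, ',');
      }
      if (fieldNames.size() != nullDefaultvalue.size()) {
        logger.error("Size doesn't match, fieldNames.size: "
            + fieldNames.size() + ", nullDefaultvalue.size: "
            + nullDefaultvalue.size());
        enabled = false;
        return; // stop here: the loop below would go out of bounds
      }
      if (fieldNames.isEmpty()) {
        logger.error("No fields are set.");
        enabled = false;
        return;
      }

      for (int i = 0; i < fieldNames.size(); i++) {
        nullDefaultMap.put(fieldNames.get(i).trim(), nullDefaultvalue.get(i)
            .trim());
      }
    }
  }

  @Override
  public DocTransformer create(String field, SolrParams params,
      SolrQueryRequest req) {
    return new NullDefaultValueTransformer();
  }

  class NullDefaultValueTransformer extends DocTransformer {
    @Override
    public String getName() {
      return NullDefaultValueTransformer.class.getName();
    }

    // Called for each returned document; fills in fields that have no value.
    @Override
    public void transform(SolrDocument doc, int docid) throws IOException {
      if (!enabled) return;
      for (Entry<String,String> entry : nullDefaultMap.entrySet()) {
        String fieldName = entry.getKey();
        if (doc.getFieldValue(fieldName) == null) {
          // The configured default is added as a String, e.g. "0"
          doc.setField(fieldName, entry.getValue());
        }
      }
    }
  }
}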
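To try the fill-in logic without a running Solr, here is a stdlib-only sketch that mirrors init() and transform() above. The class NullDefaultFillSketch is hypothetical, a Map<String, Object> stands in for SolrDocument, and String.split replaces StrUtils.splitSmart (so escaped commas are not supported).

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stdlib-only mirror of NullDefaultValueTransformerFactory.
// A Map<String, Object> stands in for SolrDocument.
public class NullDefaultFillSketch {

  // Mirrors init(): pairs the comma-separated "fields" and "nullDefaultValue"
  // lists by position. Returns an empty map (nothing to fill) on a size
  // mismatch, like the factory disabling itself.
  public static Map<String, String> parseDefaults(String fields, String defaults) {
    String[] names = fields.split(",");
    String[] values = defaults.split(",");
    Map<String, String> map = new LinkedHashMap<>();
    if (names.length != values.length) {
      return map;
    }
    for (int i = 0; i < names.length; i++) {
      map.put(names[i].trim(), values[i].trim());
    }
    return map;
  }

  // Mirrors transform(): adds the default only when the field has no value.
  public static void fillDefaults(Map<String, Object> doc, Map<String, String> defaults) {
    for (Map.Entry<String, String> e : defaults.entrySet()) {
      if (doc.get(e.getKey()) == null) {
        doc.put(e.getKey(), e.getValue());
      }
    }
  }

  public static void main(String[] args) {
    Map<String, String> defaults = parseDefaults("fileType", "0");

    Map<String, Object> oldFile = new HashMap<>(); // indexed before fileType existed
    oldFile.put("id", "1");
    Map<String, Object> folder = new HashMap<>();  // indexed with fileType=1
    folder.put("id", "2");
    folder.put("fileType", 1);

    fillDefaults(oldFile, defaults);
    fillDefaults(folder, defaults);
    System.out.println(oldFile); // fileType added as the String "0"
    System.out.println(folder);  // fileType stays 1
  }
}
```

As in the Solr version, the default is inserted as a String, so a JSON response will likely show "0" (quoted) for old documents and 1 (unquoted) for new ones; converting the value to the field's type would avoid that.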
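With the two changes above, the client application can treat the old data as if it had the default value 0 for fileType. Be aware that some features, such as sorting and stats on fileType, will not work correctly, because the default value exists only in the response, not in the index.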
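A small simulation of the stats case, assuming stats are computed only over values that are actually in the index (Solr's stats component reports documents without the field as missing rather than counting them as 0). The class IndexVsResponseStats is hypothetical; it compares the mean the index sees with the mean a client would compute from the transformed response.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Hypothetical illustration: null = old document with no fileType in the index.
public class IndexVsResponseStats {

  // Index-side statistics only see documents that actually have a value.
  public static double indexMean(List<Integer> indexed) {
    return indexed.stream().filter(Objects::nonNull)
        .mapToInt(Integer::intValue).average().orElse(Double.NaN);
  }

  // The client sees the DocTransformer's default 0 for every missing value.
  public static double responseMean(List<Integer> indexed) {
    return indexed.stream().mapToInt(v -> v == null ? 0 : v)
        .average().orElse(Double.NaN);
  }

  public static void main(String[] args) {
    List<Integer> docs = Arrays.asList(null, null, 1); // two old files, one folder
    System.out.println("index mean    = " + indexMean(docs));
    System.out.println("response mean = " + responseMean(docs));
  }
}
```

The index-side mean is 1.0, while the documents the client receives average to 1/3, so any statistic computed in Solr disagrees with what the transformed response suggests.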
