Solr: Add new fields with Default Value for Existing Documents

In some cases, we have to upgrade an existing Solr application to add new fields, and we don't want to, or can't, reindex the old data.
For example, the old Solr application only stored information about regular files. Now we need to upgrade it to store other types of files, so we want to add a field: fileType. fileType=0 means a regular file, and fileType=1 means a folder. In the future we may add other file types.

We add the following definition in schema.xml:
<field name="fileType" type="TINT" indexed="true" stored="true" default="-1"/>

Adding this definition to schema.xml doesn't affect existing data: old documents still don't have a fileType field. There is no fileType value in the response, and no term in the fileType field to query against: the query fileType:[* TO *] returns an empty result.

To fix this, we have to handle two parts: the search query and the search response.
Fix Search Query by Querying NULL Field
All old data is about regular files, so a document with no value for fileType is a regular file.
When we search for regular files, the query should be adjusted as shown below. It fetches documents where fileType is 0 or where fileType has no value.
-(-fileType:0 AND fileType:[* TO *])
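
The double negation can be hard to read, so here is a minimal sketch of the boolean logic, evaluated per document. It does not use Solr at all: the class name NullFieldQueryLogic and the use of a null Integer to stand for "no fileType value" are assumptions made for illustration only.

```java
import java.util.Arrays;
import java.util.List;

public class NullFieldQueryLogic {
  // Models -(-fileType:0 AND fileType:[* TO *]) for a single document.
  // A null fileType stands for an old document with no indexed value.
  public static boolean matches(Integer fileType) {
    boolean hasValue = fileType != null;         // fileType:[* TO *]
    boolean isZero = hasValue && fileType == 0;  // fileType:0
    // Inner clause: has a value AND that value is not 0. The outer "-" negates it.
    return !(!isZero && hasValue);
  }

  public static void main(String[] args) {
    List<Integer> docs = Arrays.asList(null, 0, 1);
    for (Integer fileType : docs) {
      System.out.println(fileType + " -> " + matches(fileType));
    }
    // prints:
    // null -> true
    // 0 -> true
    // 1 -> false
  }
}
```

In other words, the query excludes exactly the documents that have a fileType other than 0, which leaves the regular files, both old and new.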

No change is needed when searching for other file types. We can wrap this change in our own search handler: if the query includes fileType:0, change it to -(-fileType:0 AND fileType:[* TO *]). Alternatively, we can write a new query parser.
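
As a rough illustration of the rewrite step, the sketch below does the substitution on the raw query string. FileTypeQueryRewriter is a hypothetical helper, not a Solr class, and it makes simplifying assumptions: it works on plain text, ignores quoting and escaping, and assumes the fileType:0 clause sits at the top level of the query or is combined with AND. A real search handler or query parser would work on the parsed query instead.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FileTypeQueryRewriter {
  // Replacement clause taken from the text above.
  private static final String NULL_SAFE =
      "-(-fileType:0 AND fileType:[* TO *])";

  // Match a standalone fileType:0 clause. The lookbehind skips clauses that
  // are already negated (-fileType:0) or belong to a longer field name; the
  // trailing \b skips values such as fileType:01.
  private static final Pattern REGULAR_FILE =
      Pattern.compile("(?<![\\w-])fileType:0\\b");

  public static String rewrite(String query) {
    if (query == null) {
      return null;
    }
    return REGULAR_FILE.matcher(query)
        .replaceAll(Matcher.quoteReplacement(NULL_SAFE));
  }

  public static void main(String[] args) {
    System.out.println(rewrite("fileType:0"));
    System.out.println(rewrite("name:report AND fileType:0"));
    System.out.println(rewrite("fileType:1"));
  }
}
```

Because already-negated clauses are skipped, running the rewrite twice on the same query leaves it unchanged.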
Fix Search Response by Using DocTransformer to Add Default Value
For the old data, there is no value for fileType, so we need to add fileType=0 to the search response. To do this, we can define a Solr DocTransformer.
A DocTransformer allows us to modify the fields that are returned to the user.
In our DocTransformer, we check the value of fileType; if there is no value, we set it to the default value. The response data (XML or JSON) then shows fileType=0 for old data.
NullDefaultValueTransformerFactory Implementation
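import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;

import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.common.util.StrUtils;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.transform.DocTransformer;
import org.apache.solr.response.transform.TransformerFactory;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class NullDefaultValueTransformerFactory extends TransformerFactory {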
  private Map<String,String> nullDefaultMap = new HashMap<String,String>();
  private boolean enabled = false;
  protected static Logger logger = LoggerFactory
      .getLogger(NullDefaultValueTransformerFactory.class);
  public void init(NamedList args) {
    super.init(args);
    if (args != null) {
      SolrParams params = SolrParams.toSolrParams(args);
      enabled = params.getBool("enabled", false);
      if (!enabled) return;
      
      List<String> fieldNames = new ArrayList<String>();
      String str = params.get("fields");
      if (str != null) {
        fieldNames = StrUtils.splitSmart(str, ',');
      }
      List<String> nullDefaultvalue = new ArrayList<String>();
      str = params.get("nullDefaultValue");
      if (str != null) {
        nullDefaultvalue = StrUtils.splitSmart(str, ',');
      }
      if (fieldNames.size() != nullDefaultvalue.size()) {
        logger.error("Size doesn't match, fieldNames.size: "
            + fieldNames.size() + ",nullDefaultvalue.size: "
            + nullDefaultvalue.size());
        enabled = false;
        // Stop here: pairing the two lists below could throw
        // IndexOutOfBoundsException when their sizes differ.
        return;
      }
      if (fieldNames.isEmpty()) {
        logger.error("No fields are set.");
        enabled = false;
        return;
      }
      
      // Map each field name to the value used when a document has none.
      for (int i = 0; i < fieldNames.size(); i++) {
        nullDefaultMap.put(fieldNames.get(i).trim(), nullDefaultvalue.get(i)
            .trim());
      }
    }
  }
  public DocTransformer create(String field, SolrParams params,
      SolrQueryRequest req) {
    return new NullDefaultValueTransformer();
  }
  
  class NullDefaultValueTransformer extends DocTransformer {
    public String getName() {
      return NullDefaultValueTransformer.class.getName();
    }
    public void transform(SolrDocument doc, int docid) throws IOException {
      if (enabled) {
        Iterator<Entry<String,String>> it = nullDefaultMap.entrySet()
            .iterator();
        while (it.hasNext()) {
          Entry<String,String> entry = it.next();
          String fieldName = entry.getKey();
          Object obj = doc.getFieldValue(fieldName);
          if (obj == null) {
            doc.setField(fieldName, entry.getValue());
          }
        }
      }
    }
  }
}
With these two changes, the client application can mostly treat old data as having the default value 0 for fileType. Be aware that some features, such as sorting and stats, will not work as expected, because the default value exists only in the response and not in the index.