Solr Data Schema Migration Practice

The Problem
When we design a data schema, we should choose field names and types carefully, because changing the schema and migrating old data takes much more effort than changing code.
If we store data in Solr, we can store the searchable fields explicitly as separate fields and store all other fields as a single JSON string.

But sometimes we still have to change the schema: add fields or change a field's type.

API to Get the Current Solr Schema Version
Solr stores the current schema version as a number. The code below reads it as a double, not a string, which means you can't store something like 1.6.0.

The following code uses Spring Data Solr's SolrSchemaRequest to get the version. You can also use SolrJ to do the same thing.
public double getVersion() {
    double version = -1; // -1 means the version could not be determined
    try {
        // Ask Solr for the schema version of this collection.
        final NamedList<Object> nl = solrServer.request(SolrSchemaRequest.version(), getCollection());
        final Object object = nl.get("json");
        if (object != null) {
            // Parse the JSON response; fall back to -1 when it carries no version.
            version = MoreObjects.firstNonNull(createFailSafeObjectmapper()
                    .readValue(object.toString(), SchemaDefinition.class).getVersion(), -1d);
        }
    } catch (final Exception e) {
        LOGGER.error(MessageFormat.format("unable to get version for collection: {0}", getCollection()), e);
    }
    return version;
}

Store the version in the data
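When we save data to Solr, we stamp each entity with the current schema version, so that later we can tell which version a document was written with: entity.setObjectVersion(getVersion());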

API/Script to upgrade data, check all versions
For example, we recently moved some Solr fields that are not searched into a JSON body which maps to a Java class - XXDetail.
- So when we need to add fields that are not searched, we don't have to update the Solr schema - we just add them as fields of XXDetail.

So we can write either Java code or scripts to migrate old-version data to the new schema version.
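
A minimal sketch of such a migration step, under stated assumptions: documents are modelled as plain maps, the target version 1.6 and the field names comment and thumbnailUrl are hypothetical, and the detail body is kept as a nested map (serializing it to a JSON string and reading/writing Solr are left out).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch: move non-searched top-level fields into a nested "detail" body. */
final class DetailMigration {
    // Hypothetical values; in the real code the version comes from getVersion().
    static final double TARGET_VERSION = 1.6;
    static final List<String> NOT_SEARCHED = Arrays.asList("comment", "thumbnailUrl");

    /** Returns true if the document was changed and needs to be saved again. */
    @SuppressWarnings("unchecked")
    static boolean migrate(Map<String, Object> doc) {
        Object stored = doc.get("objectVersion");
        double version = stored instanceof Number ? ((Number) stored).doubleValue() : -1d;
        if (version >= TARGET_VERSION) {
            return false; // already written with the new schema version
        }
        Map<String, Object> detail = (Map<String, Object>)
                doc.computeIfAbsent("detail", k -> new HashMap<String, Object>());
        for (String field : NOT_SEARCHED) {
            if (doc.containsKey(field)) {
                detail.put(field, doc.remove(field)); // move the field, do not copy it
            }
        }
        doc.put("objectVersion", TARGET_VERSION);
        return true;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("id", "1");
        doc.put("comment", "not searched");
        doc.put("objectVersion", 1.5);
        System.out.println(migrate(doc) + " " + doc);
    }
}
```

Because the stored version is checked first, running the migration twice over the same data is harmless: the second pass skips documents that are already up to date.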

Change Field Type
Practice - Change Long to Date - Not Searchable
Previously we stored the date (field: updateDate) as tlong in Solr, and we want to change it to date, because that makes the REST API more readable and makes it easier to read the data and query Solr.

For Solr itself: after changing the type from tlong to date, Solr can still read and query the old data.

If this field is not searched, and you already have Spring converters such as LongToDateConverter and DateToLongConverter, then you can just change its type from tlong to date.
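    // Uses java.util.Date, org.springframework.core.convert.converter.Converter,
    // and org.springframework.data.convert.ReadingConverter / WritingConverter.

    // Converts epoch milliseconds to a Date.
    @ReadingConverter
    @WritingConverter
    public enum LongToDateConverter implements Converter<Long, Date> {
        INSTANCE;

        @Override
        public Date convert(final Long source) {
            if (source == null) {
                return null;
            }
            return new Date(source);
        }
    }

    // Converts a Date back to epoch milliseconds.
    @ReadingConverter
    @WritingConverter
    public enum DateToLongConverter implements Converter<Date, Long> {
        INSTANCE;

        @Override
        public Long convert(final Date source) {
            if (source == null) {
                return null;
            }
            return source.getTime();
        }
    }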

Old code(using long) - new schema
When saving data, LongToDateConverter converts the long to a Date object automatically; when reading data, DateToLongConverter converts the Date and returns a long.
New code(using date) - old Schema
When saving data, DateToLongConverter converts the Date object to a long; when reading data, LongToDateConverter converts the long and returns a Date.

Practice - Change Long to Date - Searchable
But if we search on the field in code, we can't simply change the type.
Old code(using long) - new schema
When searching, the query will look like:
field:[aLong TO bLong], but the field has already been changed to the date type, so the query fails:
Invalid Date String:'2'
at org.apache.solr.schema.TrieDateField.parseMath(TrieDateField.java:150)
at org.apache.solr.schema.TrieField.getRangeQuery(TrieField.java:369)
at org.apache.solr.parser.SolrQueryParserBase.getRangeQuery(SolrQueryParserBase.java:761)
at org.apache.solr.parser.QueryParser.Term(QueryParser.java:382)
at org.apache.solr.parser.QueryParser.Clause(QueryParser.java:185)
at org.apache.solr.parser.QueryParser.Query(QueryParser.java:107)
New code(using date) - old Schema
When searching in the new code, the query will look like:
field:["yyyy-MM-ddTHH:mm:ssZ" TO "yyyy-MM-ddTHH:mm:ssZ"], but the field is still a long in the schema, so the query fails:
java.lang.NumberFormatException: For input string: "2016-02-23T20:08:15.208Z"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Long.parseLong(Long.java:441)
at java.lang.Long.parseLong(Long.java:483)
at org.apache.solr.schema.TrieField.getRangeQuery(TrieField.java:343)
at org.apache.solr.parser.SolrQueryParserBase.getRangeQuery(SolrQueryParserBase.java:761)
at org.apache.solr.parser.QueryParser.Term(QueryParser.java:382)
at org.apache.solr.parser.QueryParser.Clause(QueryParser.java:185)
at org.apache.solr.parser.QueryParser.Query(QueryParser.java:107)
at org.apache.solr.parser.QueryParser.TopLevelQuery(QueryParser.java:96)
at org.apache.solr.parser.SolrQueryParserBase.parse(SolrQueryParserBase.java:151)
at org.apache.solr.search.LuceneQParser.parse(LuceneQParser.java:50)
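
To make the mismatch concrete, here is a small sketch of how the two range queries could be built. The class and method names are hypothetical and not part of the original code; the only point is that the bounds must be formatted to match the field type currently in the schema.

```java
import java.time.Instant;

/** Sketch: the range bounds must match the field type declared in the schema. */
final class RangeQueries {
    /** Range query for a tlong field: bounds are epoch milliseconds. */
    static String longRange(String field, long fromMillis, long toMillis) {
        return field + ":[" + fromMillis + " TO " + toMillis + "]";
    }

    /** Range query for a date field: bounds are quoted ISO-8601 UTC instants. */
    static String dateRange(String field, Instant from, Instant to) {
        // Instant.toString() produces ISO-8601 in UTC, e.g. 2016-02-23T20:08:15.208Z
        return field + ":[\"" + from + "\" TO \"" + to + "\"]";
    }

    public static void main(String[] args) {
        Instant from = Instant.parse("2016-02-23T20:08:15.208Z");
        Instant to = Instant.parse("2016-02-24T20:08:15.208Z");
        // What old code sends (valid only while the field is tlong).
        System.out.println(longRange("updateDate", from.toEpochMilli(), to.toEpochMilli()));
        // What new code sends (valid only once the field is date).
        System.out.println(dateRange("updateDate", from, to));
    }
}
```

Sending the first form to a date field, or the second form to a tlong field, produces the parse errors shown above.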

The Solution
So we have to add a new field instead: updateTime, of type date.
1. Update the schema - add fields, but do not change or delete existing fields
-- Make sure old code works with the new schema
The schema then contains both fields, updateDate(tlong) and updateTime(date); the new code stores and queries the field updateTime(date).

2. Update to new code
-- Make sure new code works with old data

3. Migrate old version data to the new version
After all old-version code has been upgraded to the new version, we run the migration script, which copies the old updateDate(tlong) to the new field updateTime(date). Later we can get rid of the old field.

4. Remove the code that handles the old version and the old fields in the next release
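
The copy in step 3 can be sketched as follows. This is only an illustration under assumptions: a document is modelled as a plain map, and fetching the documents from Solr and saving them back (for example with SolrJ) is left out.

```java
import java.util.Date;
import java.util.HashMap;
import java.util.Map;

/** Sketch: backfill updateTime(date) from updateDate(tlong). */
final class UpdateTimeBackfill {
    /** Returns true if updateTime was filled in and the document must be saved. */
    static boolean backfill(Map<String, Object> doc) {
        Object oldValue = doc.get("updateDate");
        if (doc.get("updateTime") != null || !(oldValue instanceof Number)) {
            return false; // already migrated, or there is nothing to copy
        }
        // updateDate holds epoch milliseconds, which is what Date expects.
        doc.put("updateTime", new Date(((Number) oldValue).longValue()));
        return true;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("id", "1");
        doc.put("updateDate", 1456258095208L);
        System.out.println(backfill(doc) + " " + doc.get("updateTime"));
    }
}
```

The old updateDate value is deliberately left in place, so old code keeps working until step 4 removes the field.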
