Solr - Create custom data transformer to remove fields

Overview
Create a custom document transformer that removes whole fields from Solr results and strips unneeded entries from a JSON string field.

The Problem
We store campaign messages in Solr. One type of campaign is a voucher campaign: given a user's accountId, we return that user's voucher together with the other campaign data.

To support this, we add one searchable field, accountIds, which contains every accountId for the campaign. We add another field, details, which is a non-searchable JSON string (mapped to a Java class). It includes the voucher properties, in particular a mapping from accountId to voucherCode.
-- We chose this approach to stay consistent with existing data and to keep the server code simpler.

The accountIds field and the voucher mapping inside details are big, but when we return a document to the client we only need this user's voucherCode.

The Solution
Excluding one field
We could build an fl that lists every field except accountIds. This is cumbersome: every time we add a new field, we have to change the fl in SolrQuery.
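To make the maintenance cost concrete, here is a minimal sketch of the explicit-fl approach. The field names and the FieldListBuilder class are hypothetical and not part of the original setup; the point is only that the hard-coded field list has to be edited on every schema change.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class FieldListBuilder {
    // Returns a comma-separated fl value with every known field except the excluded ones.
    static String buildFl(List<String> allFields, Set<String> excluded) {
        Set<String> kept = new LinkedHashSet<>(allFields); // keeps the declared order
        kept.removeAll(excluded);
        return String.join(",", kept);
    }

    public static void main(String[] args) {
        // Hypothetical schema fields; this list must be updated whenever a field is added.
        List<String> allFields = Arrays.asList("id", "title", "details", "accountIds");
        System.out.println(buildFl(allFields, Collections.singleton("accountIds")));
    }
}
```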

[SOLR-3191] (field exclusion from fl) is promising, but it has not been merged into a Solr release.

So we created a document transformer that supports the following request parameters:
removeFields - a comma-separated list of fields to remove from each returned document.
Example: removeFields=accountIds,field1,field2

removeOthersVoucher - when true, other users' voucher codes are stripped from the details field:
If accountId is empty, all voucherCodes are removed from the details field.
If accountId is not empty, all voucherCodes except that accountId's voucher are removed.
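The two parameter rules can be sketched on plain Java maps, without Solr or Jackson. This is only an illustration of the intended behavior, not the transformer itself; the ParamRules class and the sample values are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class ParamRules {
    // removeFields: drop every field named in the comma-separated list.
    static void removeFields(Map<String, Object> doc, String removeFields) {
        for (String name : removeFields.split(",")) {
            String field = name.trim();
            if (!field.isEmpty()) {
                doc.remove(field);
            }
        }
    }

    // removeOthersVoucher: keep only accountId's voucher.
    // Returns an empty map when accountId is null/empty or has no voucher.
    static Map<String, String> keepOwnVoucher(Map<String, String> voucherCodes, String accountId) {
        Map<String, String> mine = new HashMap<>();
        if (accountId == null || accountId.isEmpty()) {
            return mine;
        }
        String code = voucherCodes.get(accountId);
        if (code != null) {
            mine.put(accountId, code);
        }
        return mine;
    }

    public static void main(String[] args) {
        Map<String, String> codes = new HashMap<>();
        codes.put("account1", "CODE-A"); // hypothetical voucher codes
        codes.put("account2", "CODE-B");
        System.out.println(keepOwnVoucher(codes, "account1"));
        System.out.println(keepOwnVoucher(codes, null));
    }
}
```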

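How to use it
The name in square brackets must match the transformer name registered in solrconfig.xml (removeFeilds in this post):
fl=*,[removeFeilds]&removeFields=accountIds&removeOthersVoucher=true&accountId=account1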
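When the request is built in code, the brackets and comma in fl need URL encoding. The following is a minimal sketch using only the JDK; the TransformerQuery class is hypothetical, and the parameter values are the ones from the example above. The transformer name is spelled removeFeilds because that is the name registered in solrconfig.xml.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class TransformerQuery {
    // URL-encodes each key and value and joins the pairs with '&'.
    static String toQueryString(Map<String, String> params) {
        StringJoiner joiner = new StringJoiner("&");
        try {
            for (Map.Entry<String, String> e : params.entrySet()) {
                joiner.add(URLEncoder.encode(e.getKey(), "UTF-8")
                        + "=" + URLEncoder.encode(e.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException ex) {
            // UTF-8 is always available on the JVM, so this should not happen.
            throw new IllegalStateException(ex);
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>(); // preserves insertion order
        params.put("q", "*:*");
        params.put("fl", "*,[removeFeilds]");
        params.put("removeFields", "accountIds");
        params.put("removeOthersVoucher", "true");
        params.put("accountId", "account1");
        System.out.println(toQueryString(params));
    }
}
```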

Writing Custom Data Transformer
We use Jackson's ObjectMapper to deserialize the details field from a String into a Map<String, Object>, trim the voucher codes, and serialize the map back into the field.

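package com.lifelongprogrammer.solr;

import java.io.IOException;
import java.text.MessageFormat;
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexableField;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.transform.DocTransformer;
import org.apache.solr.response.transform.TransformerFactory;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import com.fasterxml.jackson.core.JsonParseException;
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.JsonMappingException;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.type.TypeFactory;
import com.google.common.base.Splitter;

public class MyTransformerFactory extends TransformerFactory {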
    protected static Logger logger = LoggerFactory.getLogger(MyTransformerFactory.class);
    private boolean enabled = false;

    @Override
    public void init(@SuppressWarnings("rawtypes") final NamedList args) {
        try {
            super.init(args);
            if (args != null) {
                // Read the optional "enabled" flag from solrconfig.xml; it defaults to true.
                final SolrParams params = SolrParams.toSolrParams(args);
                enabled = params.getBool("enabled", true);
            }
        } catch (final Exception e) {
            logger.error("MyTransformerFactory init failed", e);
        }
    }

    @Override
    public DocTransformer create(final String field, final SolrParams params, final SolrQueryRequest req) {
        final SolrParams reqParams = req.getParams();
        final String removeFields = reqParams.get("removeFields");
        final boolean removeOthersVoucher = reqParams.getBool("removeOthersVoucher", false);
        final String accountId = reqParams.get("accountId");
        if (!enabled || (removeFields == null && !removeOthersVoucher)) {
            return null;
        }
        return new MyTransformer(removeFields, removeOthersVoucher, accountId);
    }

    private static class MyTransformer extends DocTransformer {
        private static final String FIELD_DETAILS = "details";
        private static final String DETAIL_VOUCHER_CODES = "voucherCodes";

        // ObjectMapper (once configured) and Splitter are thread-safe, so shared instances are enough.
        private static final ObjectMapper objectMapper = new ObjectMapper();
        private static final Splitter splitter = Splitter.on(",").trimResults();

        private final String removeFields;
        private final boolean removeOthersVoucher;
        private final String accountId;

        public MyTransformer(final String removeFields, final boolean removeOthersVoucher, final String accountId) {
            this.removeFields = removeFields;
            this.removeOthersVoucher = removeOthersVoucher;
            this.accountId = accountId;
        }

        @Override
        public String getName() {
            return MyTransformer.class.getSimpleName();
        }

        @Override
        public void transform(final SolrDocument doc, final int docid) throws IOException {
            if (removeFields != null) {
                final Iterable<String> it = splitter.split(removeFields);
                for (final String removeField : it) {
                    doc.removeFields(removeField);
                }
            }
            try {
                if (removeOthersVoucher) {
                    removeOthersVoucher(doc);
                }
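            } catch (final Exception e) {
                // Log and continue so that one malformed details value does not fail the whole response.
                // Note that the document is then returned with its details field unmodified.
                logger.error("MyTransformer transform failed", e);
            }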
        }

        protected void removeOthersVoucher(final SolrDocument doc)
                throws IOException, JsonParseException, JsonMappingException, JsonProcessingException {
            final String detailsJson = getFieldValue(doc, FIELD_DETAILS);
            if (detailsJson == null) {
                return;
            }

            // Parse the details JSON string into a generic map.
            final Map<String, Object> detailsMap = objectMapper.readValue(detailsJson,
                    TypeFactory.defaultInstance().constructMapType(Map.class, String.class, Object.class));
            if (detailsMap == null) {
                return;
            }
            final Object voucherCodesObj = detailsMap.get(DETAIL_VOUCHER_CODES);
            if (!(voucherCodesObj instanceof Map)) {
                return;
            }
            @SuppressWarnings("unchecked")
            final Map<String, String> voucherCodesMap = (Map<String, String>) voucherCodesObj;
            final String voucherCode = voucherCodesMap.get(accountId);

            // Replace the full mapping with one that holds only this account's voucher, if any.
            final Map<String, String> myVoucherMap = new HashMap<String, String>();
            if (voucherCode != null) {
                myVoucherMap.put(accountId, voucherCode);
            }
            detailsMap.put(DETAIL_VOUCHER_CODES, myVoucherMap);

            doc.setField(FIELD_DETAILS, objectMapper.writeValueAsString(detailsMap));
        }
    }

    public static String getFieldValue(final SolrDocument doc, final String field) {
        final List<String> rst = new ArrayList<String>();
        final Object obj = doc.get(field);
        getFieldvalues(doc, rst, obj);

        if (rst.isEmpty()) {
            return null;
        }
        return rst.get(0);
    }

    public static void getFieldvalues(final SolrDocument doc, final List<String> rst, final Object obj) {
        if (obj == null) {
            return;
        }
        if (obj instanceof org.apache.lucene.document.Field) {
            final org.apache.lucene.document.Field field = (Field) obj;
            final String oldValue = field.stringValue();
            if (oldValue != null) {
                rst.add(oldValue);
            }
        } else if (obj instanceof IndexableField) {
            final IndexableField field = (IndexableField) obj;
            final String oldValue = field.stringValue();
            if (oldValue != null) {
                rst.add(oldValue);
            }
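        } else if (obj instanceof Collection) {
            // Multi-valued field: collect the values recursively.
            final Collection<?> colls = (Collection<?>) obj;
            for (final Object newObj : colls) {
                getFieldvalues(doc, rst, newObj);
            }
        } else {
            // Plain values such as String land here; this is not an error, so log at debug level.
            logger.debug(MessageFormat.format("type: {0}", obj.getClass()));
            rst.add(obj.toString());
        }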
    }
}
Add the Transformer to solrconfig.xml

<lib dir="../../../lib" regex="lifelongprogrammer-solr-extension-jar-with-dependencies.jar" />

<transformer name="removeFeilds" class="com.lifelongprogrammer.solr.MyTransformerFactory">
  <bool name="enabled">true</bool>
</transformer>

pom.xml - Build solr-extension jar
We declare solr-core with provided scope, because Solr already supplies it at runtime, and use maven-assembly-plugin to build a jar-with-dependencies that bundles Jackson.

<build>
  <finalName>lifelongprogrammer-solr-extension</finalName>
  <plugins>
    <plugin>
      <artifactId>maven-assembly-plugin</artifactId>
      <version>2.6</version>
      <configuration>
        <descriptorRefs>
          <descriptorRef>jar-with-dependencies</descriptorRef>
        </descriptorRefs>
      </configuration>
      <executions>
        <execution>
          <id>make-assembly</id>
          <phase>package</phase>
          <goals>
            <goal>single</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>

<dependencies>
  <dependency>
    <groupId>org.apache.solr</groupId>
    <artifactId>solr-core</artifactId>
    <version>5.2.0</version>
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-databind</artifactId>
    <version>2.7.4</version>
  </dependency>
</dependencies>
