Solr Join: Return Parent and Child Documents

The Requirement
We had a requirement to return both parent and child documents from a Solr join query.

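However, the standard Solr join query doesn't return the parent documents (the documents matched by the inner query); it only returns the documents joined to them. As the Solr Join Query documentation explains: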
Compared To SQL
For people who are used to SQL, it's important to note that Joins in Solr are not really equivalent to SQL Joins because no information about the table being joined "from" is carried forward into the final result. A more appropriate SQL analogy would be an "inner query".
This Solr request...
/solr/collection1/select ? fl=xxx,yyy & q={!join from=inner_id to=outer_id}zzz:vvv
Is comparable to this SQL statement...

SELECT xxx, yyy
FROM collection1
WHERE outer_id IN (SELECT inner_id FROM collection1 where zzz = "vvv")
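To make the "inner query" analogy concrete, here is a minimal in-memory sketch of what the join above computes. The documents and the join(...) helper are hypothetical illustrations, not Solr code: the inner query selects some documents, only their inner_id values are carried over, and the result contains the documents whose outer_id matches one of those values.

```java
import java.util.*;
import java.util.stream.*;

public class JoinSemantics {
    // Hypothetical toy "collection": each document maps a field name to a single value.
    static List<Map<String, String>> docs = List.of(
        Map.of("id", "1", "inner_id", "A", "zzz", "vvv"),
        Map.of("id", "2", "outer_id", "A", "xxx", "x2"),
        Map.of("id", "3", "outer_id", "B", "xxx", "x3"),
        Map.of("id", "4", "inner_id", "B", "zzz", "other"));

    // Mimics {!join from=<from> to=<to>}<field>:<value>:
    // 1) run the inner query, 2) collect the "from" values of its matches,
    // 3) return the ids of docs whose "to" value is one of them.
    static List<String> join(String from, String to, String field, String value) {
        Set<String> fromValues = docs.stream()
            .filter(d -> value.equals(d.get(field)))
            .map(d -> d.get(from))
            .filter(Objects::nonNull)
            .collect(Collectors.toSet());
        return docs.stream()
            .filter(d -> d.get(to) != null && fromValues.contains(d.get(to)))
            .map(d -> d.get("id"))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Doc 1 matched zzz:vvv, but only the joined doc 2 is returned.
        System.out.println(join("inner_id", "outer_id", "zzz", "vvv")); // prints [2]
    }
}
```

Note that doc 1 itself never appears in the output, which is exactly the limitation this post works around.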

To support this, we changed Solr's code to accept syntax like the following:
q={!join from=fromField to=toField includeParent=true childfq=childfq}parentQuery

includeParent=true adds the parent docs (the docs matched by parentQuery) to the result. The default value is false.
childfq is a query used to filter the child docs (the docs produced by the join).
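Put together, the intended result is the joined child docs filtered by childfq, plus the parent docs when includeParent is true. The following sketch simulates that on a few hypothetical documents (the field names type, parent_id and color are made up for the example); it mirrors the semantics only, not the Solr implementation.

```java
import java.util.*;
import java.util.function.Predicate;
import java.util.stream.*;

public class IncludeParentSemantics {
    // Hypothetical sample data: two parents and their children.
    static final List<Map<String, String>> DOCS = List.of(
        Map.of("id", "p1", "type", "parent"),
        Map.of("id", "p2", "type", "parent"),
        Map.of("id", "c1", "type", "child", "parent_id", "p1", "color", "red"),
        Map.of("id", "c2", "type", "child", "parent_id", "p1", "color", "blue"),
        Map.of("id", "c3", "type", "child", "parent_id", "p2", "color", "red"));

    // Ids of all docs matching a predicate, sorted for stable output.
    static Set<String> ids(Predicate<Map<String, String>> p) {
        return DOCS.stream().filter(p).map(d -> d.get("id"))
            .collect(Collectors.toCollection(TreeSet::new));
    }

    // result = (join(parentQuery) AND childfq) OR (includeParent ? parents : nothing)
    static Set<String> query(Predicate<Map<String, String>> parentQuery,
                             Predicate<Map<String, String>> childfq, // null = no filter
                             boolean includeParent) {
        Set<String> parents = ids(parentQuery);      // the "from" set
        Set<String> result = ids(d -> d.get("parent_id") != null
                                      && parents.contains(d.get("parent_id"))); // plain join
        if (childfq != null) {
            result.retainAll(ids(childfq));          // filter the child docs
        }
        if (includeParent) {
            result.addAll(parents);                  // add the parent docs back
        }
        return result;
    }

    public static void main(String[] args) {
        // Roughly {!join from=id to=parent_id includeParent=true childfq=color:red}id:p1
        System.out.println(query(d -> "p1".equals(d.get("id")),
                                 d -> "red".equals(d.get("color")), true)); // prints [c1, p1]
    }
}
```

With includeParent=false and no childfq, the same parent query returns [c1, c2], i.e. the plain join result.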
Changing JoinQParserPlugin
First, we change org.apache.solr.search.JoinQParserPlugin.createParser to parse the two new local parameters.
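public class JoinQParserPlugin extends QParserPlugin {
  // other members omitted

  @Override
  public QParser createParser(String qstr, SolrParams localParams,
      SolrParams params, SolrQueryRequest req) {
    return new QParser(qstr, localParams, params, req) {
      @Override
      public Query parse() throws SyntaxError {
        // omitted: existing code that computes fromField, toField, fromIndex,
        // fromQuery and fromCoreOpenTime

        // Boolean.parseBoolean(null) is false, so includeParent defaults to false
        boolean includeParent = Boolean.parseBoolean(getParam("includeParent"));
        String childfq = getParam("childfq");
        Query childfqQuery = null;
        if (childfq != null && !childfq.trim().isEmpty()) {
          // parse childfq with the standard Lucene query parser
          QParser parser = QParser.getParser(childfq, "lucene", req);
          childfqQuery = parser.getQuery();
        }

        JoinQuery jq = new JoinQuery(fromField, toField, fromIndex, fromQuery,
            childfqQuery, includeParent);
        jq.fromCoreOpenTime = fromCoreOpenTime;
        return jq;
      }
    };
  }
}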
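Both new options arrive as ordinary local parameters, which is why getParam() can read them. The sketch below is a deliberately simplified, hypothetical parser for the {!join ...} prefix (Solr's own local-params parsing supports more syntax than this), showing how the string splits into key/value pairs. It also illustrates two practical points: a childfq value containing spaces must be quoted, and since Boolean.parseBoolean(null) returns false, an omitted includeParent defaults to false.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LocalParamsSketch {
    record Parsed(Map<String, String> params, String query) {}

    // Simplified, hypothetical parser for "{!type key=value key='quoted value' ...}query".
    static Parsed parse(String s) {
        if (!s.startsWith("{!")) throw new IllegalArgumentException("expected {!");
        Map<String, String> params = new LinkedHashMap<>();
        int i = scanToken(s, 2);
        params.put("type", s.substring(2, i)); // leading bare token = parser name
        while (true) {
            while (i < s.length() && Character.isWhitespace(s.charAt(i))) i++;
            if (i >= s.length()) throw new IllegalArgumentException("missing }");
            if (s.charAt(i) == '}') return new Parsed(params, s.substring(i + 1));
            int keyStart = i;
            while (i < s.length() && s.charAt(i) != '='
                   && !Character.isWhitespace(s.charAt(i)) && s.charAt(i) != '}') i++;
            if (i >= s.length() || s.charAt(i) != '=') {
                throw new IllegalArgumentException("expected key=value");
            }
            String key = s.substring(keyStart, i++); // i now points past '='
            String value;
            if (i < s.length() && (s.charAt(i) == '\'' || s.charAt(i) == '"')) {
                int end = s.indexOf(s.charAt(i), i + 1); // quoted value may contain spaces
                if (end < 0) throw new IllegalArgumentException("unterminated quote");
                value = s.substring(i + 1, end);
                i = end + 1;
            } else {
                int valueStart = i;
                i = scanToken(s, i); // unquoted value ends at whitespace or '}'
                value = s.substring(valueStart, i);
            }
            params.put(key, value);
        }
    }

    // Advances past a bare token, stopping at whitespace or '}'.
    static int scanToken(String s, int i) {
        while (i < s.length() && !Character.isWhitespace(s.charAt(i)) && s.charAt(i) != '}') i++;
        return i;
    }

    public static void main(String[] args) {
        Parsed p = parse("{!join from=id to=parent_id includeParent=true"
                         + " childfq='color:red AND size:L'}type:parent");
        System.out.println(p.params());
        System.out.println(p.query());
        System.out.println(Boolean.parseBoolean(p.params().get("missingParam"))); // prints false
    }
}
```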
Changing JoinQueryWeight and JoinQuery
Next, we change org.apache.solr.search.JoinQuery.JoinQueryWeight.getDocSet() so that, before returning its result, it filters the child docs with childfq (if childfq is set) and adds the parent docs (if includeParent is true).

We also have to change JoinQuery's equals(), hashCode() and toString() methods to take the includeParent and childfq parameters into account.
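class JoinQuery extends Query {
  // existing fields (fromField, toField, fromIndex, q, fromCoreOpenTime) omitted
  private Query childfq;         // optional filter applied to the joined (child) docs
  private boolean includeParent; // if true, also return the docs matched by the inner query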
  
  public JoinQuery(String fromField, String toField, String fromIndex,
      Query subQuery) {
    this(fromField, toField, fromIndex, subQuery, null, false);
  }
  
  public JoinQuery(String fromField, String toField, String fromIndex,
      Query subQuery, Query childfq, boolean includeParent) {
    this.fromField = fromField;
    this.toField = toField;
    this.fromIndex = fromIndex;
    this.q = subQuery;
    this.childfq = childfq;
    this.includeParent = includeParent;
  }
  
  private class JoinQueryWeight extends Weight {
    // other members omitted

    public DocSet getDocSet() throws IOException {
      // unchanged: fromSet = fromSearcher.getDocSet(q) and the term walk below
      while (term != null) {
        // unchanged
      }
      smallSetsDeferred = resultList.size();

      if (resultBits != null) {
        for (DocSet set : resultList) {
          set.setBitsOn(resultBits);
        }
        // was: return new BitDocSet(resultBits);
        return postProcess(fromSet, new BitDocSet(resultBits));
      }

      if (resultList.size() == 0) {
        // was: return DocSet.EMPTY;
        // post-process here too, so parents are still returned when no child matched
        return postProcess(fromSet, DocSet.EMPTY);
      }

      if (resultList.size() == 1) {
        // was: return resultList.get(0);
        return postProcess(fromSet, resultList.get(0));
      }
      // omitted: unchanged code that merges resultList into dedup

      // was: return new SortedIntDocSet(dedup, dedup.length);
      return postProcess(fromSet, new SortedIntDocSet(dedup, dedup.length));
    }

    // Applies the new parameters to the joined (child) doc set.
    public DocSet postProcess(DocSet fromSet, DocSet rstDocset)
        throws IOException {
      if (childfq != null) {
        // keep only the child docs that also match childfq
        DocSet filterSet = toSearcher.getDocSet(childfq);
        rstDocset = rstDocset.intersection(filterSet);
      }
      if (includeParent) {
        // add the parent docs matched by the inner query; fromSet's doc ids come
        // from fromSearcher, so this is only correct within a single core (no fromIndex)
        rstDocset = rstDocset.union(fromSet);
      }
      return rstDocset;
    }
  }
  public boolean equals(Object o) {
    if (!super.equals(o)) return false;
    JoinQuery other = (JoinQuery) o;
    return this.fromField.equals(other.fromField)
        && this.toField.equals(other.toField)
        && this.getBoost() == other.getBoost()
        && this.q.equals(other.q)
        && (this.fromIndex == other.fromIndex || this.fromIndex != null
            && this.fromIndex.equals(other.fromIndex))
        && this.fromCoreOpenTime == other.fromCoreOpenTime
        && this.includeParent == other.includeParent
        && (this.childfq == other.childfq || this.childfq != null
            && this.childfq.equals(other.childfq));
  }

  public int hashCode() {
    int h = super.hashCode();
    h = h * 31 + q.hashCode();
    h = h * 31 + (int) fromCoreOpenTime;
    h = h * 31 + fromField.hashCode();
    h = h * 31 + toField.hashCode();
    // as boolean.hashCode
    h = h * 31 + (includeParent ? 1231 : 1237);
    if (childfq != null) h = h * 31 + childfq.hashCode();
    return h;
  }
  public String toString(String field) {
    return "{!join from=" + fromField + " to=" + toField
        + (fromIndex != null ? " fromIndex=" + fromIndex : "")
        + " includeParent=" + includeParent
        + (childfq != null ? " childfq=" + childfq : "") + "}" + q.toString();
  }
}
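Solr caches (for example the filterCache and queryResultCache) use the query object as part of the cache key, so if equals() ignored includeParent or childfq, two different joins could share one cached result. The sketch below uses a plain HashMap as a stand-in cache and a hypothetical JoinKey class holding the same fields as JoinQuery (with strings instead of Query objects). It also shows that java.util.Objects.equals(a, b) is equivalent to the hand-written "a == b || a != null && a.equals(b)" checks, and that Objects.hash boxes the boolean, whose Boolean.hashCode() yields the same 1231/1237 values used in the hashCode() above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class JoinKeySketch {
    // Hypothetical stand-in for JoinQuery: only the fields relevant to equality.
    static final class JoinKey {
        final String fromField, toField, fromIndex, q, childfq; // fromIndex, childfq may be null
        final boolean includeParent;

        JoinKey(String fromField, String toField, String fromIndex, String q,
                String childfq, boolean includeParent) {
            this.fromField = fromField;
            this.toField = toField;
            this.fromIndex = fromIndex;
            this.q = q;
            this.childfq = childfq;
            this.includeParent = includeParent;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof JoinKey)) return false;
            JoinKey other = (JoinKey) o;
            return fromField.equals(other.fromField)
                && toField.equals(other.toField)
                && q.equals(other.q)
                && Objects.equals(fromIndex, other.fromIndex) // null-safe comparison
                && Objects.equals(childfq, other.childfq)     // null-safe comparison
                && includeParent == other.includeParent;
        }

        @Override
        public int hashCode() {
            return Objects.hash(fromField, toField, fromIndex, q, childfq, includeParent);
        }
    }

    public static void main(String[] args) {
        Map<JoinKey, String> cache = new HashMap<>();
        cache.put(new JoinKey("id", "parent_id", null, "type:parent", null, false), "children only");
        // Differs only in includeParent, so it must not hit the cached entry.
        System.out.println(cache.get(new JoinKey("id", "parent_id", null, "type:parent", null, true)));  // prints null
        System.out.println(cache.get(new JoinKey("id", "parent_id", null, "type:parent", null, false))); // prints children only
    }
}
```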
Caveat
We only use this feature in single-core mode, not in sharded (multiple cores) mode, so we have not tested whether it works with shards.

References
Solr Join Query
Solr Other Parsers