Use Solr map function query (group.sort=map(type,1,1,-1)) in group flat mode

Summary
How to use the Solr map function to put the fake group doc in front of all its child docs in group flat mode: group.sort=map(type,1,1,-1) asc,time desc
Updated
Actually the solution is much simpler than we thought:

sort=type asc, time desc&group.sort=type desc, time desc
sort=type asc, time asc&group.sort=type desc, time asc
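
To see why this pair of sorts works, here is a minimal Python sketch that imitates the two sorting levels in memory. It is not Solr code. It assumes that groups are ranked by their best doc under sort, that docs inside a group follow group.sort, and that each group doc shares a grouping key with its children (the groupId on the group docs and the integer time values below are made up for illustration).

```python
from collections import defaultdict

# Hypothetical sample data: type 1 = group doc, type 0 = child doc.
docs = [
    {"id": "group1", "type": 1, "groupId": "group1"},
    {"id": "child1", "type": 0, "groupId": "group1", "time": 20},
    {"id": "child2", "type": 0, "groupId": "group1", "time": 10},
    {"id": "group2", "type": 1, "groupId": "group2"},
    {"id": "child3", "type": 0, "groupId": "group2", "time": 30},
]

def sort_key(doc):
    # Imitates sort=type asc, time desc (used to rank groups).
    return (doc["type"], -doc.get("time", 0))

def group_sort_key(doc):
    # Imitates group.sort=type desc, time desc (used inside a group).
    return (-doc["type"], -doc.get("time", 0))

def flat_grouped(all_docs):
    groups = defaultdict(list)
    for doc in all_docs:
        groups[doc["groupId"]].append(doc)
    # Each group is ranked by its best doc under `sort`: a child doc
    # (type 0) with the latest time, never the group doc (type 1).
    ranked = sorted(groups.values(),
                    key=lambda g: min(sort_key(d) for d in g))
    # Inside a group, `group.sort` puts the group doc (type 1) first.
    # Flattening the groups imitates group.main=true.
    return [d["id"] for g in ranked for d in sorted(g, key=group_sort_key)]

print(flat_grouped(docs))
```

With this sample data the result is group2, child3, group1, child1, child2: the group with the newest child comes first, and each group doc leads its own children.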

But it's still good to know about function queries in Solr and how Solr grouping and function queries work together.

The Use Case
There are two types of docs in Solr. One is the child doc, with fields such as type (value 0), groupId and time.
The other is the group doc, with type (value 1); group docs are actually just fake docs.

We extend the join query to make Solr return both parent and child docs; see Solr Join: Return Parent and Child Documents for how to implement it.

Then we use Solr grouping with group.main=true&group.limit=100, and we want Solr to return a response like the one below:
<doc>
  <str name="id">group1</str> <!-- group doc -->
  <int name="type">1</int>
  <str name="subject">subject1</str>
  <!-- this field should be dynamically generated, as the child docs that match q and fq may vary, 
  the value should be same as the first child -->
  <str name="time">2015-01-06T14:45:00.000Z</str> 
  <!-- how many child docs that match q and fq, dynamically generated -->
  <int name="[groupCount]">3</int>
</doc>
<doc>
  <date name="time">2015-01-06T14:45:00Z</date>
  <str name="subject">subject1</str>
  <str name="id">child1</str>
  <int name="type">0</int>
  <str name="groupId">group1</str>
</doc>
<!-- .... other child docs in this group -->
<doc>
  <str name="id">group2</str> <!-- another group -->
  <int name="type">1</int>
  <str name="subject">subject2</str>
  <str name="time">2015-01-05T14:45:00.000Z</str>
  <int name="[groupCount]">7</int>
</doc>
Then we can use start and rows to do pagination.

We will talk about how to dynamically generate the groupCount and time values for the group doc in a later post. This post focuses on one issue:
How do we make sure that groups are sorted by time (the max value in the group) and that the group doc always comes before all its child docs?

The Solution
We tried several solutions, but in the end the solution turned out to be quite easy:
Because the group doc has no time value, it does not affect how the group is ranked by time: Solr ranks the group by the max time among its child docs.
All we need to do is put the group doc in front of all its child docs.

We can use the Solr map function in group.sort:
group.sort=map(type,1,1,-1) asc,time desc
If a doc's type is 1 (group doc), map turns that value into -1; child docs keep their type value of 0. Sorting by the mapped value asc then always puts the group doc (type=1) first in its group.
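
The effect of map(type,1,1,-1) asc,time desc can be checked with a small Python sketch. This is only an in-memory imitation of the documented map(x,min,max,target) behaviour (values between min and max inclusive become target, other values pass through unchanged), not Solr code. The helper names and the sample docs are made up for illustration.

```python
def solr_map(x, lo, hi, target):
    # Imitates Solr's map(x,min,max,target): values in [lo, hi]
    # become target, anything else is returned unchanged.
    return target if lo <= x <= hi else x

# Hypothetical docs of one group, in arbitrary order.
# The group doc (type 1) has no indexed time value.
docs = [
    {"id": "child1", "type": 0, "time": "2015-01-06T14:45:00Z"},
    {"id": "group1", "type": 1},
    {"id": "child2", "type": 0, "time": "2015-01-06T10:00:00Z"},
]

def group_sort(group_docs):
    # Secondary clause first: time desc (ISO timestamps in one
    # format compare correctly as strings).
    by_time = sorted(group_docs, key=lambda d: d.get("time", ""),
                     reverse=True)
    # Primary clause: mapped type asc. Python's sort is stable, so
    # the time order is kept among docs with equal mapped values.
    return sorted(by_time, key=lambda d: solr_map(d["type"], 1, 1, -1))

ordered = [d["id"] for d in group_sort(docs)]
print(ordered)
```

The group doc maps to -1 and the child docs stay at 0, so the printed order is group1, child1, child2.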

http://localhost:8983/solr/select?defType=edismax&q={!join from=groupId to=id includeParent=true}some query here&group.main=true&group.limit=100&group.sort=map(type,1,1,-1) asc,time desc&sort=time desc,score desc
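
The request above contains spaces, braces and commas, so a client has to URL-encode it. A minimal sketch using Python's standard library is shown here. The parameter values are copied from the request above; includeParent is the custom join extension from this post, not a stock Solr parameter, and the explicit desc after score is an assumption.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Parameters copied from the request in this post.
params = {
    "defType": "edismax",
    # includeParent=true is the custom join extension, not stock Solr.
    "q": "{!join from=groupId to=id includeParent=true}some query here",
    "group.main": "true",
    "group.limit": "100",
    "group.sort": "map(type,1,1,-1) asc,time desc",
    # Assumption: an explicit direction (desc) is given for score.
    "sort": "time desc,score desc",
}

# urlencode percent-encodes braces, commas and other reserved
# characters, and turns spaces into '+'.
url = "http://localhost:8983/solr/select?" + urlencode(params)
print(url)

# Decoding the query string gives back the original values.
decoded = parse_qs(urlsplit(url).query)
print(decoded["group.sort"][0])
```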

To hide the implementation from the client side, or to avoid client-side changes, we can encapsulate this logic in our request handler.

Resources
Solr function Queries
Solr Join: Return Parent and Child Documents
Use Solr map function query(group.sort=map(type,1,1,-1) ) in group flat mode
Solr: Update other Document in DocTransformer by Writing custom SolrWriter
Solr: Use DocTransformer to dynamically Generate groupCount and time value for group doc