Solr Boost to Improve Search Relevancy

An important step when using Solr is tuning search relevancy.
Boosting Fields: qf
Not all fields are equally important. We can boost some of them, for example the keyword or title fields.
qf=keyword^10 title^5
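As a minimal sketch of how such a qf value could be assembled on the client side, the Python below builds the request parameters from a field-to-boost mapping. The helper name build_qf_params, the sample query text, and the choice of defType=edismax are assumptions made for illustration; they are not part of the original post.

```python
from urllib.parse import urlencode


def build_qf_params(user_query, field_boosts):
    """Return Solr request parameters with qf built from a field -> boost mapping.

    Hypothetical helper: it only formats parameters and does not contact Solr.
    """
    qf = " ".join(f"{field}^{boost}" for field, boost in field_boosts.items())
    # defType=edismax is assumed here; dismax accepts qf as well.
    return {"q": user_query, "defType": "edismax", "qf": qf}


params = build_qf_params("solr relevancy", {"keyword": 10, "title": 5})
query_string = urlencode(params)  # URL-encoded form, ready to append to /select?
```

The mapping keeps the boosts in one place, so they can be tuned without editing query strings by hand.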
Boosting Phrases: field~slop^boost
pf boosts documents in which the query terms appear close together as a phrase. An optional slop factor can be specified directly in pf with the syntax field~slop.
pf=title^20 content^10
This will translate a user query like foo bar into:
title:"foo bar"^20 OR content:"foo bar"^10
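The translation above can be imitated with a few lines of Python. This is only a rough illustration of the string form shown in this post: the function phrase_boost_clause is hypothetical, and the query parser builds its own internal query objects rather than this text. The optional slop is rendered after the quoted phrase, following the field~slop^boost syntax.

```python
def phrase_boost_clause(user_query, pf):
    """Render the phrase clause that a pf setting roughly corresponds to.

    pf is a list of (field, slop, boost) tuples; slop may be None.
    Hypothetical helper for illustration only.
    """
    parts = []
    for field, slop, boost in pf:
        slop_part = f"~{slop}" if slop is not None else ""
        parts.append(f'{field}:"{user_query}"{slop_part}^{boost}')
    return " OR ".join(parts)


# pf=title^20 content^10
clause = phrase_boost_clause("foo bar", [("title", None, 20), ("content", None, 10)])

# pf=title~2^20 (phrase match allowing a slop of 2)
sloppy = phrase_boost_clause("foo bar", [("title", 2, 20)])
```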
Boost Queries
A boost query (bq) is an extra query clause that Solr adds to the user's query. It does not restrict the result set, but documents that match it get a higher score.
bq=sponsored:true or bq=instock:true
Boost Functions
bf=recip(ms(NOW,publicationDate),3.16e-11,1,1)
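To see what this function does to scores, it helps to evaluate it by hand. Solr's recip(x,m,a,b) computes a/(m*x+b), and 3.16e-11 is roughly one divided by the number of milliseconds in a year, so a document published now gets a boost of about 1 and a one-year-old document about 0.5. The Python sketch below reproduces that arithmetic; the names recency_boost and MS_PER_YEAR are illustrative, and a 365-day year is assumed.

```python
MS_PER_YEAR = 365 * 24 * 60 * 60 * 1000  # assumes a 365-day year


def recip(x, m, a, b):
    """Same formula as Solr's recip function: a / (m * x + b)."""
    return a / (m * x + b)


def recency_boost(age_ms):
    """Value of recip(ms(NOW,publicationDate),3.16e-11,1,1) for a given age in ms."""
    return recip(age_ms, 3.16e-11, 1, 1)


boost_now = recency_boost(0)
boost_one_year = recency_boost(MS_PER_YEAR)
boost_two_years = recency_boost(2 * MS_PER_YEAR)
```

The boost falls off smoothly with age and never reaches zero, so older documents are demoted rather than excluded.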
The boost Parameter in edismax
Boost exact match
Copy the content into two fields: one analyzed with LowerCaseFilterFactory for case-insensitive search, and one without it for exact matching. Then give the exact-match field a higher boost.
title titleExact^10
title^10 titleExact^100
Minimum 'Should' Match: mm



Boost records that contain all terms: boost when mm=100%
Use pf to boost phrase matches in the same fields that are searched with qf.
Set up a boost query (bq) that raises the score when all the search terms are present.
'q'='_query_:"{!dismax qf=$f1 mm=$mm1 pf=$f1 bq=$bq1 v=$q1}"',
'mm1'='0%',
'f1'='author^3 title^1',
'q1'='Dueber Constructivism',
'bq1'='_query_:"{!dismax qf=$f1 mm=\'100%\' v=$q1 }"^5',
'fl' ='score,*'
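The same parameter set can be written as a Python dictionary and URL-encoded for a request. This is a sketch that mirrors the values listed above one for one; only the variable names params and query_string are introduced here, and nothing is sent to a server.

```python
from urllib.parse import urlencode

# Parameter names and values copied from the listing above.
params = {
    # Main query: dismax over $f1 with a relaxed mm, plus the boost query $bq1.
    "q": '_query_:"{!dismax qf=$f1 mm=$mm1 pf=$f1 bq=$bq1 v=$q1}"',
    "mm1": "0%",
    "f1": "author^3 title^1",
    "q1": "Dueber Constructivism",
    # Boost query: the same search with mm=100%, so it matches only documents
    # that contain every term; those documents get the ^5 boost.
    "bq1": '_query_:"{!dismax qf=$f1 mm=\'100%\' v=$q1 }"^5',
    "fl": "score,*",
}

query_string = urlencode(params)  # percent-encodes $, {, }, quotes and spaces
```

Because the user's text appears only in q1, the nested query strings can stay fixed while q1 changes from request to request.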
Boosting Documents in Solr by Recency, Popularity and User Preferences
Comparing boost methods in Solr

Resources
http://www.lucidworks.com/about-us/news/options-tune-document%E2%80%99s-relevance-solr
http://everydaydeveloper.blogspot.com/2012/02/solr-improve-relevancy-by-boosting.html
http://robotlibrarian.billdueber.com/using-localparams-in-solr-sst-2/#An_example_boost_records_that_contain_all_terms
http://wiki.apache.org/solr/ExtendedDisMax
http://nolanlawson.com/2012/06/02/comparing-boost-methods-in-solr/
http://www.slideshare.net/LucidImagination/boosting-documents-in-solr-by-recency-popularity-and-user-preferences