Learning from Solr Jira issues - Part 1

One good way to improve our coding/design skills and knowledge is to learn from open source code.
I use Lucene/Solr and extend them with new features, so it's my task and duty to learn in depth how they work and how to write efficient code.

It's also important to learn how Lucene/Solr evolves: why these features are needed, how they are implemented, and how some bugs are introduced, found and fixed.

This post focuses on bugs and minor features in Solr; I will update it once in a while.

Add Long/FixedBitSet and replace usage of OpenBitSet
==> When dealing with a lot of data and computation, squeeze out every unneeded operation.
FixedBitSet (FBS) is clearly faster than OpenBitSet (OBS), since FBS doesn't need to do bounds checking (OBS may only come close if you use its fastSet/fastGet methods).
Also, FBS lets you grow it by offering a convenient copy constructor that allows you to expand or shrink the set.
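To make the bounds-checking point concrete, here is a minimal sketch of a fixed-size bit set over a long[]. The FixedBits class is hypothetical and only loosely modeled on the idea behind Lucene's FixedBitSet, not its actual code: bounds are checked with assert only, and growing or shrinking goes through a copy constructor.

```java
import java.util.Arrays;

// Hypothetical fixed-size bit set (not Lucene's FixedBitSet source).
final class FixedBits {
    private final long[] words;
    private final int numBits;

    FixedBits(int numBits) {
        this.numBits = numBits;
        this.words = new long[(numBits + 63) >>> 6]; // one long holds 64 bits
    }

    // "Copy constructor" that expands or shrinks the set to newNumBits.
    FixedBits(FixedBits other, int newNumBits) {
        this.numBits = newNumBits;
        this.words = Arrays.copyOf(other.words, (newNumBits + 63) >>> 6);
        // When shrinking, clear bits beyond newNumBits in the last word.
        int extra = newNumBits & 63;
        if (extra != 0 && words.length > 0) {
            words[words.length - 1] &= (1L << extra) - 1;
        }
    }

    void set(int index) {
        // Checked only when assertions are enabled (e.g. in tests), not on every call.
        assert index >= 0 && index < numBits : "index out of bounds: " + index;
        words[index >> 6] |= 1L << index; // Java masks a long shift count to 0..63
    }

    boolean get(int index) {
        assert index >= 0 && index < numBits : "index out of bounds: " + index;
        return (words[index >> 6] & (1L << index)) != 0;
    }

    int length() {
        return numBits;
    }
}
```

A set that grows on demand would instead have to check its capacity on every set() call, which is the per-call overhead the note above refers to.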

SOLR-7050: realtime get should internally load only fields specified in fl [Performance]
==> Only load the needed fields when calling searcher.doc:
StoredDocument luceneDocument = searcher.doc(docid);
changed to:
StoredDocument luceneDocument = searcher.doc(docid, rsp.getReturnFields().getLuceneFieldNames());
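The change boils down to passing the set of requested field names down to the stored-fields loader. The sketch below uses a hypothetical in-memory FakeStoredFields class (not Lucene's API) whose two doc(...) overloads mirror the before/after calls above; fieldsRead just counts how many field values were materialized.

```java
import java.util.*;

// Hypothetical in-memory stand-in for a stored-fields source; not Lucene's API.
final class FakeStoredFields {
    private final Map<Integer, Map<String, String>> docs = new HashMap<>();
    int fieldsRead = 0; // number of field values actually materialized

    void add(int docId, Map<String, String> fields) {
        docs.put(docId, fields);
    }

    // "Before": loads every stored field.
    Map<String, String> doc(int docId) {
        return doc(docId, null);
    }

    // "After": loads only the requested fields (null means all fields).
    Map<String, String> doc(int docId, Set<String> fieldNames) {
        Map<String, String> stored = docs.get(docId);
        if (stored == null) throw new IllegalArgumentException("no such doc: " + docId);
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : stored.entrySet()) {
            if (fieldNames == null || fieldNames.contains(e.getKey())) {
                result.put(e.getKey(), e.getValue());
                fieldsRead++;
            }
        }
        return result;
    }
}
```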

SOLR-6845: Suggester tests start new cores instead of reloading
LOG.info("reload(" + name + ")"); // better logging
Added a buildOnStartup option ==> don't let unnecessary, time-consuming work (building the suggester) block startup.
// In Solr we can use lazily loaded cores. In a web application, don't put time-consuming operations in a listener; make them asynchronous, and let the servlet return an "INIT not finished" error while initialization is still running.
else if (getStoreFile().exists()) {
  // isDebugEnabled() avoids building the log message when debug logging is off
  if (LOG.isDebugEnabled()) {
    LOG.debug("attempt reload of the stored lookup from file " + getStoreFile());
  }
  // ...
}
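The "don't block startup" advice can be sketched with CompletableFuture: kick off the expensive build in the background and have the request handler answer immediately, either with a result or with an "INIT not finished" status. AsyncInit and the status strings are hypothetical, not Solr's suggester code.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.function.Supplier;

// Hypothetical lazily built component: startup does not wait for the build.
final class AsyncInit<T> {
    private final CompletableFuture<T> future;

    AsyncInit(Supplier<T> expensiveBuild, Executor executor) {
        // Start the build in the background; the constructor returns immediately.
        this.future = CompletableFuture.supplyAsync(expensiveBuild, executor);
    }

    // What a servlet/request handler could do: always answer right away.
    String handle(String query) {
        if (!future.isDone()) {
            return "503 INIT not finished";
        }
        if (future.isCompletedExceptionally()) {
            return "500 INIT failed";
        }
        return "200 " + future.join() + " answered " + query;
    }

    // Blocks until the build completes (handy in tests).
    void awaitReady() {
        future.join();
    }
}
```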

Resource Tracking
SOLR-6950: Ensure TransactionLogs are closed with test ObjectReleaseTracker.
assert ObjectReleaseTracker.track(this);
assert ObjectReleaseTracker.release(this);
// Tracking only runs when assertions are enabled (as in tests), so it is skipped in production
// ObjectReleaseTracker makes sure every tracked resource is closed and released
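The trick behind assert ObjectReleaseTracker.track(this) is that track and release return true, so they can sit inside an assert statement and only run when assertions are enabled. A minimal sketch of that pattern, assuming a hypothetical ReleaseTracker and TrackedLog (not Solr's classes):

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Hypothetical tracker in the spirit of ObjectReleaseTracker.
final class ReleaseTracker {
    // Identity semantics: track object instances, not equals()-equal values.
    private static final Set<Object> OPEN =
        Collections.synchronizedSet(Collections.newSetFromMap(new IdentityHashMap<Object, Boolean>()));

    // Returns true so it can be used as `assert ReleaseTracker.track(this);`
    static boolean track(Object resource) {
        OPEN.add(resource);
        return true;
    }

    static boolean release(Object resource) {
        OPEN.remove(resource);
        return true;
    }

    // A test harness would check this after each test and fail on leaks.
    static int openCount() {
        return OPEN.size();
    }
}

// Hypothetical resource that registers itself only when assertions are on.
final class TrackedLog implements AutoCloseable {
    TrackedLog() {
        assert ReleaseTracker.track(this);
    }

    @Override
    public void close() {
        assert ReleaseTracker.release(this);
    }
}
```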

Http Client
SOLR-6931: We should do a limited retry when using HttpClient.
// Always call setUseRetry, defaulting to true when the property is not set in the config
HttpClientUtil.setUseRetry(httpClient, config.getBool(HttpClientUtil.PROP_USE_RETRY, true));
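A limited retry means retrying a failed request a small, fixed number of times instead of never or forever. The generic helper below is a hypothetical sketch, not HttpClient's retry handler: it retries only on IOException and should only wrap idempotent calls.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical bounded-retry helper.
final class Retry {
    // Makes at most maxRetries + 1 attempts; non-I/O exceptions propagate immediately.
    static <T> T withRetry(Callable<T> call, int maxRetries) throws Exception {
        if (maxRetries < 0) throw new IllegalArgumentException("maxRetries < 0");
        Exception last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return call.call();
            } catch (IOException e) { // e.g. connection reset: worth another try
                last = e;
            }
        }
        throw last; // all attempts failed with I/O errors
    }
}
```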

SOLR-6932: All HttpClient ConnectionManagers and SolrJ clients should always be shutdown in tests and regular code.
change HttpClient to CloseableHttpClient
All of these types of things should be made closeable, including SolrJ clients for 5.0 (rather than having a shutdown method).

SOLR-6324: Set finite default timeouts for select and update.
Currently HttpShardHandlerFactory and UpdateShardHandler default to infinite timeouts for socket connection and read. This can lead to undesirable behaviour, for example, if a machine crashes, then searches in progress will wait forever for a result to come back and end up using threads which will only get terminated at shutdown.
clientParams.set(HttpClientUtil.PROP_CONNECTION_TIMEOUT, connectionTimeout);
this.defaultClient = HttpClientUtil.createClient(clientParams);
set socketTimeout and connTimeout for shardHandlerFactory in solr.xml
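The point of finite timeouts is that a caller must never wait forever on a dead shard. Solr does this with socket and connection timeouts on the HTTP client; the sketch below swaps in a simpler technique, a bounded Future.get, to show the same idea. ShardCall is a hypothetical name.

```java
import java.util.concurrent.*;

// Hypothetical sketch: wait for a (simulated) shard response with a finite timeout.
final class ShardCall {
    static String query(ExecutorService pool, Callable<String> shardRequest, long timeoutMillis)
            throws InterruptedException, ExecutionException {
        Future<String> f = pool.submit(shardRequest);
        try {
            return f.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            f.cancel(true); // interrupt the worker so its thread is freed
            return "shard timed out after " + timeoutMillis + " ms";
        }
    }
}
```

With an infinite wait, the first call in a crashed-machine scenario would tie up a thread until shutdown; with a bound, the thread is handed back.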

SOLR-6909: Extract atomic update handling logic into AtomicUpdateDocumentMerger
Allow pluggable atomic update merging logic
util method: int docid = searcher.getFirstMatch(new Term(idField.getName(), idBytes));
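To give a feel for what "atomic update merging logic" does, here is a minimal sketch that applies Solr-style set, add and inc operations to a stored document held in a Map. SimpleAtomicMerger is hypothetical and ignores many details of Solr's AtomicUpdateDocumentMerger, such as other operations and version checks.

```java
import java.util.*;

// Hypothetical merger of partial updates into a stored document.
final class SimpleAtomicMerger {
    // update: field name -> (operation -> value), e.g. {"views": {"inc": 2}}
    static Map<String, Object> merge(Map<String, Object> stored,
                                     Map<String, Map<String, Object>> update) {
        Map<String, Object> merged = new LinkedHashMap<>(stored); // leave input untouched
        for (Map.Entry<String, Map<String, Object>> field : update.entrySet()) {
            String name = field.getKey();
            for (Map.Entry<String, Object> op : field.getValue().entrySet()) {
                Object value = op.getValue();
                switch (op.getKey()) {
                    case "set": // replace the field value
                        merged.put(name, value);
                        break;
                    case "add": { // append to a (possibly single-valued) field
                        List<Object> values = new ArrayList<>();
                        Object old = merged.get(name);
                        if (old instanceof Collection) values.addAll((Collection<?>) old);
                        else if (old != null) values.add(old);
                        values.add(value);
                        merged.put(name, values);
                        break;
                    }
                    case "inc": { // numeric increment, missing field counts as 0
                        long old = merged.containsKey(name)
                            ? ((Number) merged.get(name)).longValue() : 0L;
                        merged.put(name, old + ((Number) value).longValue());
                        break;
                    }
                    default:
                        throw new IllegalArgumentException("unknown op: " + op.getKey());
                }
            }
        }
        return merged;
    }
}
```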

SOLR-6643: Fix error reporting & logging of low level JVM Errors that occur when loading/reloading a SolrCore
A great example of how to reproduce a problem and add test cases for it.

SOLR-4839: Upgrade to Jetty 9
set persistTempDirectory to true
Jetty 9 has built-in support for disabling protocols (POODLE)
excludeProtocols: SSLv3
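Jetty exposes this through its excludeProtocols setting. Using only the JDK, the same idea can be sketched by filtering SSLv3 out of an SSLEngine's enabled protocols; ExcludeProtocols is a hypothetical helper, and recent JDKs already disable SSLv3 by default.

```java
import java.util.Arrays;
import java.util.List;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;

// Hypothetical JDK-only analogue of an "exclude protocols" setting.
final class ExcludeProtocols {
    static String[] without(String[] protocols, String... excluded) {
        List<String> ex = Arrays.asList(excluded);
        return Arrays.stream(protocols)
                .filter(p -> !ex.contains(p))
                .toArray(String[]::new);
    }

    static SSLEngine engineWithoutSslv3() throws Exception {
        SSLEngine engine = SSLContext.getDefault().createSSLEngine();
        // Keep whatever the JVM enables, minus SSLv3 (vulnerable to POODLE).
        engine.setEnabledProtocols(without(engine.getEnabledProtocols(), "SSLv3"));
        return engine;
    }
}
```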

Using paramset with multi-valued keys leads to a 500
Not complete change: Actually map in MapSolrParams is changed from Map to Map

NullPointerException when group.function uses query() function
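Map context = ValueSource.newContext(searcher); // before: declares a local variable that shadows the field
context = ValueSource.newContext(searcher); // after: assigns the field itself
The context field is always null because the value was stored in a variable whose scope is local to this function, but the field gets passed on to another function later.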
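The bug pattern is local-variable shadowing: declaring Map context inside a method creates a new local variable, so the field with the same name stays null. A minimal, hypothetical reduction (ContextHolder is not Solr's class):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical reduction of the shadowing bug.
final class ContextHolder {
    private Map<Object, Object> context; // read later by another method

    void createBuggy() {
        Map<Object, Object> context = new HashMap<>(); // BUG: local shadows the field
        context.put("searcher", "s");                  // the field is still null
    }

    void createFixed() {
        context = new HashMap<>(); // FIX: assign the field
        context.put("searcher", "s");
    }

    Object lookup() {
        return context.get("searcher"); // NullPointerException if the field was never set
    }
}
```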

API Design:
Considering changing SolrClient#shutdown to SolrClient#close.
SolrClient implements Serializable, Closeable
==> so callers can use try-with-resources to avoid resource leaks
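A minimal sketch, assuming a hypothetical DemoClient rather than the real SolrClient, of why implementing Closeable matters: try-with-resources calls close() automatically, even if the body throws.

```java
import java.io.Closeable;

// Hypothetical client that exposes close() instead of a custom shutdown().
final class DemoClient implements Closeable {
    private boolean closed;

    String query(String q) {
        if (closed) throw new IllegalStateException("client is closed");
        return "results for " + q;
    }

    @Override
    public void close() {
        closed = true; // a real client would release connections and threads here
    }

    boolean isClosed() {
        return closed;
    }
}

final class Usage {
    static DemoClient runQuery() {
        DemoClient ref;
        try (DemoClient client = new DemoClient()) {
            ref = client;
            System.out.println(client.query("*:*"));
        } // client.close() runs here, even if query() throws
        return ref;
    }
}
```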

Add first-class support for Real Time Get in SolrJ

Bash and Bat
Learn from solr.cmd or solr(.sh)
SOLR-6928: solr.cmd stop works only in english
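Consider different locales when writing bash/bat scripts: netstat prints "LISTENING" in the system language, so the filter was changed from "listening" to "TCP ":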
  For /f "tokens=5" %%j in ('netstat -aon ^| find /i "listening" ^| find ":%SOLR_PORT%"') do (
  For /f "tokens=5" %%j in ('netstat -aon ^| find "TCP " ^| find ":%SOLR_PORT%"') do (
A related edit: the find command should look for ":8983 " (with a space after the port number) to avoid matching other ports. For example, after the commands below, the stop command would select two lines in the netstat output, since ":1234" also matches ":12345":
solr start -p 1234
solr start -p 12345
solr stop -p 1234
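The false match is easy to reproduce outside of cmd. The sketch below mimics the find ":PORT" filter with String.contains over two made-up netstat-style lines; PortMatch is hypothetical, and the trailing-space trick assumes netstat pads the address column with spaces.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical simulation of the netstat | find ":PORT" filter.
final class PortMatch {
    // Like find ":1234": also matches ":12345".
    static List<String> naive(List<String> lines, int port) {
        return lines.stream()
                .filter(l -> l.contains(":" + port))
                .collect(Collectors.toList());
    }

    // Like find ":1234 ": the trailing space rules out longer port numbers.
    static List<String> withTrailingSpace(List<String> lines, int port) {
        return lines.stream()
                .filter(l -> l.contains(":" + port + " "))
                .collect(Collectors.toList());
    }
}
```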

SOLR-7016: Fix bin\solr.cmd to work in a directory with spaces in the name.
Wrap paths in double quotes: "%SOLR_TIP%\bin"

SOLR-7013: use unzip if jar is not available (merged from r1653943)
Solr only requires a JRE, not a JDK, and a JRE doesn't include the jar tool.
Try your best to make the app work, e.g. by falling back to another tool.
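The fallback idea (use jar if present, otherwise unzip) can be sketched as picking the first tool from a preference list that exists as an executable in a list of directories, such as those on PATH. ToolFinder is hypothetical; on Windows you would also need to try extensions such as .exe.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

// Hypothetical helper: first available tool in preference order.
final class ToolFinder {
    static Optional<String> firstAvailable(List<String> tools, List<Path> dirs) {
        for (String tool : tools) {          // preference order: e.g. jar, then unzip
            for (Path dir : dirs) {
                Path candidate = dir.resolve(tool);
                if (Files.isRegularFile(candidate) && Files.isExecutable(candidate)) {
                    return Optional.of(candidate.toString());
                }
            }
        }
        return Optional.empty();             // caller prints a clear error message
    }

    // Directories listed in the PATH environment variable.
    static List<Path> pathDirs() {
        String path = System.getenv().getOrDefault("PATH", "");
        return Arrays.stream(path.split(File.pathSeparator))
                .filter(s -> !s.isEmpty())
                .map(Paths::get)
                .collect(Collectors.toList());
    }
}
```

Typical use would be ToolFinder.firstAvailable(List.of("jar", "unzip"), ToolFinder.pathDirs()), failing with an explicit message only when neither tool is found.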

SOLR-7024: Improve Java detection and error message
Unclear error message with solr script when lacking jar executable