One good way to improve our coding/design skills and knowledge is to learn from open source code.
I use Lucene/Solr and extend them with new features, so it's my task and duty to learn deeply how they work and how to write efficient code.
It's also important to learn how Lucene/Solr evolves: why these features are needed, how they are implemented, and how some bugs are introduced, found and fixed.
This post focuses on bugs and minor features in Solr; I will update it once in a while.
Coding
Performance
Add Long/FixedBitSet and replace usage of OpenBitSet
https://issues.apache.org/jira/browse/LUCENE-5440
http://lucene.markmail.org/thread/35gw3amo53dsqsqj
==> When there is a lot of data and computation, squeeze out every unneeded operation.
So clearly FBS is faster than OBS (perhaps unless you use fastSet/Get) since it doesn't need to do bounds checking.
Also, FBS lets you grow it yourself by offering a convenient copy constructor which allows you to expand or shrink the set.
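To make the idea concrete, here is a minimal sketch of a fixed-size bit set in plain Java. It is not Lucene's FixedBitSet; the class name TinyFixedBitSet and its methods are made up for illustration. It assumes the caller knows the size up front, so set/get only validate the index via assert (free when assertions are disabled), and resizing happens explicitly through a copy constructor.

```java
import java.util.Arrays;

/** Hypothetical, simplified fixed-size bit set (illustration only, not Lucene's class). */
final class TinyFixedBitSet {
    private final long[] bits;
    private final int numBits;

    TinyFixedBitSet(int numBits) {
        this.numBits = numBits;
        this.bits = new long[(numBits + 63) >>> 6]; // one long per 64 bits
    }

    /** Copy constructor that expands or shrinks the set to numBits. */
    TinyFixedBitSet(TinyFixedBitSet other, int numBits) {
        this.numBits = numBits;
        this.bits = Arrays.copyOf(other.bits, (numBits + 63) >>> 6);
        int rem = numBits & 63;
        if (rem != 0) {
            // when shrinking, clear the bits beyond numBits in the last word
            bits[bits.length - 1] &= (1L << rem) - 1;
        }
    }

    void set(int index) {
        assert index >= 0 && index < numBits; // checked only when run with -ea
        bits[index >> 6] |= 1L << index;      // long shifts use the low 6 bits of index
    }

    boolean get(int index) {
        assert index >= 0 && index < numBits;
        return (bits[index >> 6] & (1L << index)) != 0;
    }

    int length() {
        return numBits;
    }

    int cardinality() {
        int count = 0;
        for (long word : bits) {
            count += Long.bitCount(word);
        }
        return count;
    }
}
```

Growing is then an explicit step, for example new TinyFixedBitSet(small, 200), instead of a hidden check on every set call.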
SOLR-7050: realtime get should internally load only fields specified in fl [Performance]
https://issues.apache.org/jira/browse/SOLR-7050
==> Only load the needed fields when calling searcher.doc
StoredDocument luceneDocument = searcher.doc(docid);
changed to:
StoredDocument luceneDocument = searcher.doc(docid, rsp.getReturnFields().getLuceneFieldNames());
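The same idea in a self-contained form: the sketch below uses a made-up in-memory store (StoredFieldsSketch is hypothetical, not the Lucene API) to show how passing the wanted field names avoids materializing the other stored fields. The fieldsRead counter exists only to make the saving visible.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for a stored-fields reader (illustration only). */
final class StoredFieldsSketch {
    private final Map<Integer, Map<String, String>> docs = new LinkedHashMap<>();
    private int fieldsRead = 0; // how many field values were materialized so far

    void add(int docid, Map<String, String> fields) {
        docs.put(docid, new LinkedHashMap<>(fields));
    }

    /** Loads every stored field of the document. */
    Map<String, String> doc(int docid) {
        return doc(docid, null);
    }

    /** Loads only the wanted fields; a null set means "all fields". */
    Map<String, String> doc(int docid, Set<String> wanted) {
        Map<String, String> out = new LinkedHashMap<>();
        Map<String, String> stored = docs.get(docid);
        if (stored == null) {
            return out; // unknown document: nothing to load
        }
        for (Map.Entry<String, String> e : stored.entrySet()) {
            if (wanted == null || wanted.contains(e.getKey())) {
                out.put(e.getKey(), e.getValue());
                fieldsRead++;
            }
        }
        return out;
    }

    int fieldsRead() {
        return fieldsRead;
    }
}
```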
SOLR-6845: Suggester tests start new cores instead of reloading
https://issues.apache.org/jira/browse/SOLR-6845
LOG.info("reload(" + name + ")"); // better logging
Add a buildOnStartup option ==> don't let unnecessary, time-consuming work (building the suggester) block startup.
// In Solr, we can use lazily loaded cores. In a web application, don't put time-consuming operations in a listener; make them asynchronous, and let the servlet return an "INIT not finished" error if initialization is still in progress.
init
else if (getStoreFile().exists()) {
  if (LOG.isDebugEnabled()) {
    LOG.debug("attempt reload of the stored lookup from file " + getStoreFile());
  }
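The "don't block startup" advice above can be sketched in plain Java. This is only an illustration under my own assumptions: LazyInit is a hypothetical helper, not Solr code. The expensive build runs in the background, and callers get an "INIT not finished" failure instead of waiting.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

/** Hypothetical helper: builds an expensive resource without blocking startup. */
final class LazyInit<T> {
    private final CompletableFuture<T> future;

    LazyInit(Supplier<T> builder) {
        // supplyAsync runs the builder on another thread, so the constructor returns at once
        this.future = CompletableFuture.supplyAsync(builder);
    }

    /** True once the build has finished successfully. */
    boolean isReady() {
        return future.isDone() && !future.isCompletedExceptionally();
    }

    /** Returns the resource, or fails fast while the build is still running. */
    T getOrFail() {
        if (!future.isDone()) {
            throw new IllegalStateException("INIT not finished");
        }
        return future.join(); // rethrows the build failure, if any, as CompletionException
    }

    /** Blocks until the build is finished; meant for tests and shutdown paths. */
    T await() {
        return future.join();
    }
}
```

A request handler would call getOrFail() and translate the IllegalStateException into an error response, rather than making every request wait for the build.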
Resource Tracking
SOLR-6950: Ensure TransactionLogs are closed with test ObjectReleaseTracker.
assert ObjectReleaseTracker.track(this);
assert ObjectReleaseTracker.release(this);
// integration tests run with assertions enabled
// use ObjectReleaseTracker to make sure each resource is closed and released
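Here is a minimal sketch of the pattern, assuming a much simpler tracker than the real one. ReleaseTrackerSketch and TrackedLog are hypothetical names. The point is that track and release return true, so they can sit inside assert statements and cost nothing when assertions are disabled in production.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

/** Hypothetical, simplified tracker in the spirit of ObjectReleaseTracker. */
final class ReleaseTrackerSketch {
    // identity-based set: two distinct resources never collapse into one entry
    private static final Set<Object> OPEN =
        Collections.synchronizedSet(Collections.newSetFromMap(new IdentityHashMap<>()));

    private ReleaseTrackerSketch() {}

    /** Always returns true so the call can be wrapped in an assert. */
    static boolean track(Object resource) {
        OPEN.add(resource);
        return true;
    }

    /** Always returns true so the call can be wrapped in an assert. */
    static boolean release(Object resource) {
        OPEN.remove(resource);
        return true;
    }

    /** Number of tracked resources that were never released. */
    static int openCount() {
        return OPEN.size();
    }
}

/** Hypothetical resource that registers itself only when assertions are enabled. */
final class TrackedLog implements AutoCloseable {
    TrackedLog() {
        assert ReleaseTrackerSketch.track(this);
    }

    @Override
    public void close() {
        assert ReleaseTrackerSketch.release(this);
    }
}
```

A test can then check openCount() at the end of a run and fail if anything is still open.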
Http Client
SOLR-6931: We should do a limited retry when using HttpClient.
// always call setUseRetry, whether it is in config or not
HttpClientUtil.setUseRetry(httpClient, config.getBool(HttpClientUtil.PROP_USE_RETRY, true));
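What "limited retry" means can be shown with a small, generic helper. This is a sketch under my own assumptions, not how HttpClient implements its retry handler: LimitedRetry is a hypothetical class, it retries only on IOException, and it gives up after a fixed number of attempts. Retrying is only safe for operations that can be repeated without side effects.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

/** Hypothetical helper: retry an operation a bounded number of times. */
final class LimitedRetry {
    private LimitedRetry() {}

    static <T> T call(Callable<T> task, int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (IOException e) {
                last = e; // transient I/O failure: try again if attempts remain
            }
        }
        throw last; // every attempt failed with an IOException
    }
}
```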
SOLR-6932: All HttpClient ConnectionManagers and SolrJ clients should always be shutdown in tests and regular code.
Change HttpClient to CloseableHttpClient.
All of these types of things should be made closeable, including SolrJ clients for 5.0 (close rather than shutdown).
SOLR-6324: Set finite default timeouts for select and update.
Currently HttpShardHandlerFactory and UpdateShardHandler default to infinite timeouts for socket connection and read. This can lead to undesirable behaviour, for example, if a machine crashes, then searches in progress will wait forever for a result to come back and end up using threads which will only get terminated at shutdown.
clientParams.set(HttpClientUtil.PROP_CONNECTION_TIMEOUT, connectionTimeout);
this.defaultClient = HttpClientUtil.createClient(clientParams);
Set socketTimeout and connTimeout for shardHandlerFactory in solr.xml.
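The same lesson applies to any HTTP client: never rely on infinite default timeouts. The sketch below uses the JDK's own java.net.http client (Java 11+) rather than Solr's HttpClientUtil, so the class FiniteTimeouts and its method names are my own. It sets a finite connect timeout on the client and a finite timeout on each request.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.time.Duration;

/** Hypothetical factory that always applies finite timeouts (JDK client, not Solr's). */
final class FiniteTimeouts {
    private FiniteTimeouts() {}

    /** Client that gives up if a connection cannot be established in time. */
    static HttpClient newClient(Duration connectTimeout) {
        return HttpClient.newBuilder()
            .connectTimeout(connectTimeout)
            .build();
    }

    /** GET request that fails if no response arrives within the given time. */
    static HttpRequest newRequest(String url, Duration requestTimeout) {
        return HttpRequest.newBuilder(URI.create(url))
            .timeout(requestTimeout)
            .GET()
            .build();
    }
}
```

With this in place a crashed peer costs a caller at most the configured timeout, instead of holding a thread until shutdown.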
Misc
SOLR-6909: Extract atomic update handling logic into AtomicUpdateDocumentMerger
Allow pluggable atomic update merging logic
util method: int docid = searcher.getFirstMatch(new Term(idField.getName(), idBytes));
SOLR-6643: Fix error reporting & logging of low level JVM Errors that occur when loading/reloading a SolrCore
A great example of how to reproduce a problem and add test cases.
CoreContainerCoreInitFailuresTest.testJavaLangErrorFromHandlerOnStartup
SOLR-4839: Upgrade to Jetty 9
set persistTempDirectory to true
Jetty 9 has built-in support for disabling protocols (POODLE)
excludeProtocols: SSLv3
API Design
https://issues.apache.org/jira/browse/SOLR-7059
Using paramset with multi-valued keys leads to a 500
Not a complete change: actually, the map in MapSolrParams is changed from Map<…> to Map<…> (type parameters elided).
https://issues.apache.org/jira/browse/SOLR-7046
NullPointerException when group.function uses query() function
(Map) context = ValueSource.newContext(searcher);
The variable context is always null because its scope is local to this function, but it gets passed on to another function later.
https://issues.apache.org/jira/browse/SOLR-6954
Considering changing SolrClient#shutdown to SolrClient#close.
SolrClient implements Serializable, Closeable
==> so clients can use try-with-resources to avoid resource leaks
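A small sketch of why Closeable matters. DemoClient is a hypothetical stand-in for any closeable client, not the SolrJ class. With try-with-resources, close() runs automatically when the block exits, even if an exception is thrown inside it.

```java
import java.io.Closeable;

/** Hypothetical client standing in for any Closeable resource. */
final class DemoClient implements Closeable {
    private boolean closed = false;

    String ping() {
        if (closed) {
            throw new IllegalStateException("client is closed");
        }
        return "OK";
    }

    @Override
    public void close() {
        closed = true; // a real client would release connections here
    }

    boolean isClosed() {
        return closed;
    }
}

final class TryWithResourcesDemo {
    private TryWithResourcesDemo() {}

    /** Uses a client inside try-with-resources and returns it for inspection. */
    static DemoClient useAndReturn() {
        DemoClient ref;
        try (DemoClient client = new DemoClient()) {
            ref = client;
            client.ping();
        } // close() is called here automatically
        return ref;
    }
}
```

With a shutdown() method instead, every caller has to remember a finally block, and it is easy to forget one.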
https://issues.apache.org/jira/browse/SOLR-6449
Add first class support for Real Time Get in Solrj
Bash and Bat
Learn from solr.cmd or solr(.sh)
SOLR-6928: solr.cmd stop works only in english
Consider different locales when writing bash/bat scripts.
change
For /f "tokens=5" %%j in ('netstat -aon ^| find /i "listening" ^| find ":%SOLR_PORT%"') do (
to
For /f "tokens=5" %%j in ('netstat -aon ^| find "TCP " ^| find ":%SOLR_PORT%"') do (
One related edit is that the find command should look for ":8983 " (with a space after the port number) to avoid matching other ports, e.g. the following stop command would select two lines in netstat output since :1234 will also match :12345
solr start -p 1234
solr start -p 12345
solr stop -p 1234
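The port-matching pitfall is plain substring matching, which is easy to reproduce. The sketch below is my own illustration in Java (PortMatch is a hypothetical helper), and the two netstat-style lines in the usage note are made up for the example, not real output.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper mimicking what "find" does: keep lines containing the needle. */
final class PortMatch {
    private PortMatch() {}

    static List<String> linesMatching(List<String> lines, String needle) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            if (line.contains(needle)) {
                out.add(line);
            }
        }
        return out;
    }
}
```

Given the lines "  TCP    0.0.0.0:1234     0.0.0.0:0    LISTENING    4242" and "  TCP    0.0.0.0:12345    0.0.0.0:0    LISTENING    4343", the needle ":1234" selects both lines, while ":1234 " (with the trailing space) selects only the first.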
SOLR-7016: Fix bin\solr.cmd to work in a directory with spaces in the name.
Add quotes: "%SOLR_TIP%\bin"
SOLR-7013: use unzip if jar is not available (merged from r1653943)
Solr only requires a JRE, not a JDK, and a JRE doesn't include the jar tool.
Try your best to make the app work.
SOLR-7024: Improve Java detection and error message
Unclear error message with solr script when lacking jar executable
https://issues.apache.org/jira/browse/SOLR-7013