Part 1: Shrink Solr Application Size




We want to run solr application in client side, client need download and install it, so we need try best to reduce the application's size.

From high level architecture view, we run solr.war in embedded jetty.


1. Reduce Jetty Size Jar:

Refer to: http://wiki.eclipse.org/Jetty/Tutorial/Jetty_HelloWorld
We only need download jetty-all-8.1.8.v20121106.jar from 
http://repo1.maven.org/maven2/org/eclipse/jetty/aggregate/jetty-all/8.1.8.v20121106/,or other jetty version.
Then download http://repo1.maven.org/maven2/javax/servlet/servlet-api/3.0-alpha-1/

Size of jetty-all-8.1.8.v20121106.jar is 1,785 kb + servlet-api-3.0.jar 196 kb = 1,981 kb.


As we will just run servlet in our embedded jetty, some functions are not needed, we can continue to reduce jetty seize.

http://stackoverflow.com/questions/4223597/libraries-for-embedding-jetty

So we download jetty-distribution-8.1.8.v20121106 from eclipse jetty site, just keep the following 9 jars:

jetty-http-8.1.8.v20121106.jar
jetty-io-8.1.8.v20121106.jar
jetty-security-8.1.8.v20121106.jar
jetty-server-8.1.8.v20121106.jar
jetty-servlet-8.1.8.v20121106.jar
jetty-util-8.1.8.v20121106.jar
jetty-webapp-8.1.8.v20121106.jar
jetty-xml-8.1.8.v20121106.jar
servlet-api-3.0.jar

Copy them to a temporary directory, unzip them all to current directory then just zip javax and or directory to a new jar jetty.min-8.1.8.jar: size 1,297 kb, decrease 0.7 mb.


2. Reduce Solr.war size

Download apache-solr-4.1-2012-11-17_23-18-40.zip from https://builds.apache.org/job/Solr-Artifacts-4.x/lastSuccessfulBuild/artifact/solr/package/.

Size of apache-solr-4.1-2012-11-17_23-18-40.war is 14,732 KB.


Our solr application use DataImportHandler to fetch index perodically from remote solr server, and provide http services(/solr/select) to local client.

So we remove all unneeded files from solr.war:
remove folder: csss, img, js, META-INF, tpl, admin.html, favicon.ico, WEB-INF\weblogic.xml.

Next big step is to remove unneeded jars from WEB-INF\lib.

Solr didn't do a good job at modularization: for example if I don't use Spatial Search function, we can't just remove spatial4j-0.3.jar.

So each time, we try to remove one jar, start server and run our tests, see whether the tests run well. If so, remove it, if not, keep it.


For our application, we can remove lucene-analyzers-kuromoji, lucene-grouping, lucene-memory, lucene-spatial, commons-cli, commons-lang, commons-codec, wstx-asl, httpmime, guava. 


As I don't use  solrcloud function, so I think I can remove zookeeper-3.4.5.jar, but after I remove it, it reports exception:

SEVERE: null:java.lang.NoClassDefFoundError: org/apache/zookeeper/KeeperException
        at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:315)
So I remove all other classes from zookeeper.jat except KeeperException related classes.

This step reduces 8.63 mb.


3. Reduce size of Solr.Home

In Solr.Home, we only keep the modules(jars) we need: apache-solr-dataimporthandler.jar, remove all unnecessay files from \\conf.


You can view all source code from github: 
https://github.com/jefferyyuan/tools/tree/master/ant-scripts/shrink-solr

Labels

adsense (5) Algorithm (69) Algorithm Series (35) Android (7) ANT (6) bat (8) Big Data (7) Blogger (14) Bugs (6) Cache (5) Chrome (19) Code Example (29) Code Quality (7) Coding Skills (5) Database (7) Debug (16) Design (5) Dev Tips (63) Eclipse (32) Git (5) Google (33) Guava (7) How to (9) Http Client (8) IDE (7) Interview (88) J2EE (13) J2SE (49) Java (186) JavaScript (27) JSON (7) Learning code (9) Lesson Learned (6) Linux (26) Lucene-Solr (112) Mac (10) Maven (8) Network (9) Nutch2 (18) Performance (9) PowerShell (11) Problem Solving (11) Programmer Skills (6) regex (5) Scala (6) Security (9) Soft Skills (38) Spring (22) System Design (11) Testing (7) Text Mining (14) Tips (17) Tools (24) Troubleshooting (29) UIMA (9) Web Development (19) Windows (21) xml (5)