Part 1: Shrink Solr Application Size



We want to run a Solr application on the client side. The client has to download and install it, so we need to do our best to reduce the application's size.

At a high level, we run solr.war in an embedded Jetty server.


1. Reduce Jetty Jar Size

Refer to: http://wiki.eclipse.org/Jetty/Tutorial/Jetty_HelloWorld
The simplest setup only needs jetty-all-8.1.8.v20121106.jar (or another Jetty version), downloaded from
http://repo1.maven.org/maven2/org/eclipse/jetty/aggregate/jetty-all/8.1.8.v20121106/
plus the servlet API jar from http://repo1.maven.org/maven2/javax/servlet/servlet-api/3.0-alpha-1/

jetty-all-8.1.8.v20121106.jar is 1,785 KB and servlet-api-3.0.jar is 196 KB, for a total of 1,981 KB.


As we will only run servlets in our embedded Jetty, some of its features are not needed, so we can reduce the Jetty size further. See:

http://stackoverflow.com/questions/4223597/libraries-for-embedding-jetty

So we download jetty-distribution-8.1.8.v20121106 from the Eclipse Jetty site and keep only the following 9 jars:

jetty-http-8.1.8.v20121106.jar
jetty-io-8.1.8.v20121106.jar
jetty-security-8.1.8.v20121106.jar
jetty-server-8.1.8.v20121106.jar
jetty-servlet-8.1.8.v20121106.jar
jetty-util-8.1.8.v20121106.jar
jetty-webapp-8.1.8.v20121106.jar
jetty-xml-8.1.8.v20121106.jar
servlet-api-3.0.jar

Copy them to a temporary directory, unzip them all into that directory, then zip just the javax and org directories into a new jar, jetty.min-8.1.8.jar. Its size is 1,297 KB, a decrease of about 0.7 MB.
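
The unzip-and-rezip step can also be done in one pass without a temporary directory. The following is only a minimal sketch using the JDK's java.util.zip classes, not the Ant script from the repository linked at the end of this post. The class name JarMerger and its parameters are made up for illustration, and it assumes that when two jars contain the same entry, keeping the first copy is acceptable.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper: merges several jars into one, keeping only selected top-level packages.
class JarMerger {

    /**
     * Copies every file entry whose name starts with one of keepPrefixes
     * (for example "javax/" and "org/") from the input jars into outputJar.
     * Returns the number of entries written.
     */
    static int merge(List<Path> inputJars, Path outputJar, List<String> keepPrefixes)
            throws IOException {
        Set<String> written = new HashSet<>();
        byte[] buffer = new byte[8192];
        try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(outputJar))) {
            for (Path jar : inputJars) {
                try (ZipInputStream in = new ZipInputStream(Files.newInputStream(jar))) {
                    ZipEntry entry;
                    while ((entry = in.getNextEntry()) != null) {
                        String name = entry.getName();
                        // Skip directories, unwanted paths (META-INF, about.html, ...) and duplicates.
                        if (entry.isDirectory() || !hasPrefix(name, keepPrefixes) || !written.add(name)) {
                            continue;
                        }
                        out.putNextEntry(new ZipEntry(name));
                        int read;
                        while ((read = in.read(buffer)) != -1) {
                            out.write(buffer, 0, read);
                        }
                        out.closeEntry();
                    }
                }
            }
        }
        return written.size();
    }

    private static boolean hasPrefix(String name, List<String> prefixes) {
        for (String prefix : prefixes) {
            if (name.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }
}
```

Calling JarMerger.merge with the 9 jars above, an output path such as jetty.min-8.1.8.jar, and the prefixes "javax/" and "org/" would produce a single jar with only those two package trees.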


2. Reduce Solr.war size

Download apache-solr-4.1-2012-11-17_23-18-40.zip from https://builds.apache.org/job/Solr-Artifacts-4.x/lastSuccessfulBuild/artifact/solr/package/.

Size of apache-solr-4.1-2012-11-17_23-18-40.war is 14,732 KB.


Our Solr application uses DataImportHandler to fetch the index periodically from a remote Solr server, and provides HTTP services (/solr/select) to the local client.

So we remove all unneeded files and folders from solr.war:
css, img, js, META-INF, tpl, admin.html, favicon.ico, WEB-INF\weblogic.xml.

The next big step is to remove unneeded jars from WEB-INF\lib.

Solr didn't do a good job at modularization: for example, even if we don't use the spatial search feature, we can't just remove spatial4j-0.3.jar.

So we try removing one jar at a time: remove it, start the server, and run our tests. If the tests pass, the jar stays removed; if not, we put it back.
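
This trial-and-error loop is easy to automate. The sketch below is a hypothetical helper, not the code from the repository linked at the end of this post: the class name JarTrialRemover, the backup directory, and the testsPass callback are all assumptions. The callback stands in for "start the server and run our tests" and must return true when everything still works.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

// Hypothetical helper: removes candidate jars one at a time and keeps a removal only if the tests still pass.
class JarTrialRemover {

    /**
     * For each candidate jar name, moves the jar from libDir to backupDir and
     * runs testsPass. If the tests fail, the jar is moved back.
     * Returns the names of the jars that stayed removed.
     */
    static List<String> removeUnneeded(Path libDir, Path backupDir, List<String> candidates,
                                       BooleanSupplier testsPass) throws IOException {
        Files.createDirectories(backupDir);
        List<String> removed = new ArrayList<>();
        for (String name : candidates) {
            Path jar = libDir.resolve(name);
            if (!Files.exists(jar)) {
                continue; // nothing to try for this candidate
            }
            Path parked = backupDir.resolve(name);
            Files.move(jar, parked, StandardCopyOption.REPLACE_EXISTING);
            if (testsPass.getAsBoolean()) {
                removed.add(name); // tests still pass without it
            } else {
                Files.move(parked, jar); // required after all, so restore it
            }
        }
        return removed;
    }
}
```

Removals are cumulative: each later candidate is tested with the earlier unneeded jars already gone, which matches the manual process described above.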


For our application, we can remove lucene-analyzers-kuromoji, lucene-grouping, lucene-memory, lucene-spatial, commons-cli, commons-lang, commons-codec, wstx-asl, httpmime, guava. 


As I don't use the SolrCloud feature, I expected to be able to remove zookeeper-3.4.5.jar, but after removing it, Solr reports an exception:

SEVERE: null:java.lang.NoClassDefFoundError: org/apache/zookeeper/KeeperException
        at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:315)
So I remove all classes from zookeeper.jar except the KeeperException-related ones.

This step saves 8.63 MB.
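
Pruning the ZooKeeper jar down to the KeeperException classes can be done in place with the JDK's zip file system. This is a sketch under assumptions rather than the script actually used: the class name JarClassPruner is made up, and it treats every .class file whose simple name contains the given marker (here "KeeperException", which also covers nested classes) as one to keep.

```java
import java.io.IOException;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: deletes all classes from a jar except those whose file name contains a marker.
class JarClassPruner {

    /**
     * Opens the jar as a zip file system and deletes every .class entry whose
     * file name does not contain keepMarker. Non-class entries such as the
     * manifest are left alone. Returns the number of classes removed.
     */
    static int pruneClasses(Path jar, String keepMarker) throws IOException {
        // The cast selects the (Path, ClassLoader) overload on newer JDKs.
        try (FileSystem zipFs = FileSystems.newFileSystem(jar, (ClassLoader) null)) {
            List<Path> doomed;
            try (Stream<Path> entries = Files.walk(zipFs.getPath("/"))) {
                doomed = entries
                        .filter(Files::isRegularFile)
                        .filter(p -> p.toString().endsWith(".class"))
                        .filter(p -> !p.getFileName().toString().contains(keepMarker))
                        .collect(Collectors.toList());
            }
            for (Path p : doomed) {
                Files.delete(p);
            }
            return doomed.size();
        } // closing the file system writes the modified jar back to disk
    }
}
```

For example, JarClassPruner.pruneClasses(pathToZookeeperJar, "KeeperException") would leave only the KeeperException classes and the non-class resources in the jar. Whether that is enough at runtime still has to be confirmed by running the tests again.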


3. Reduce size of Solr.Home

In Solr.Home, we keep only the modules (jars) we need, here apache-solr-dataimporthandler.jar, and remove all unnecessary files from \conf.


You can view all the source code on GitHub:
https://github.com/jefferyyuan/tools/tree/master/ant-scripts/shrink-solr