Part 2: Use Proguard to Shrink Solr Application Size

See: Part 1: Shrink Solr Application Size
4. Use proguard to shrink jar size.
http://proguard.sourceforge.net/
proguard detects and removes unused classes, fields, methods, and attributes. It optimizes bytecode and removes unused instructions.

we need use "Trial and error": run it, if report NoClassDefFoundError, or NoSuchMethodException, modify the configuration file like below:
-keep public class org.apache.solr.servlet.SolrDispatchFilter
-keepclassmembers public class org.apache.solr.servlet.SolrDispatchFilter {
    *;
}
Then redo previous steps.

We can use ANT to do these tasks, the script will copy original jars to some places, use proguard to shrink them, copy shrinked jars to solr application, start the server, and run some tests.
<taskdef resource="proguard/ant/task.properties"
  classpath="<proguard4.8>\lib\proguard.jar" />
<target name="shrinkJetty" depends="postProcess">
 <proguard configuration="conf-jetty.txt"/>
</target>

<target name="shrinkSolr" depends="postProcess">
 <proguard configuration="conf-solr.txt"/>
</target>

5. Use ANT to shrink XML files
First we need remove all unneeded configuration from solrconfig.xml, Solr.xml, web.xml, and xml under /conf are very verbose: remove comments, and white spaces.
<target name="shrinkRuntimeXMLs">
 <path id="orignal.xmls.path">
  <fileset dir="${runtime.solr.core.home}/conf/">
   <include name="*.xml" />
  </fileset>
 </path>
 <property name="orignal.xmls" refid="orignal.xmls.path" />
 <echo message="orignal.xmls: ${orignal.xmls}" />
 <ac:foreach list="${orignal.xmls};${runtime.solr.home}/solr.xml;${runtime.solr.war}/WEB-INF/web.xml" delimiter=";" param="original.xml" target="shrinkXml"/>
</target>
<target name="shrinkXml">
 <basename property="original.xml.basename" file="${original.xml}"/>
 <dirname property="original.xml.dirname" file="${original.xml}"/>
 <echo message="original.xml: ${original.xml}"/>
 <xslt basedir="${original.xml.dirname}" includes="${original.xml.basename}" destdir="${shrinked.xml.output}"
   extension=".xml" style="shrink.xslt"/>
 <move file="${shrinked.xml.output}/${original.xml.basename}" tofile="${original.xml}"/>
</target>

shrink.xslt:
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:strip-space elements="*"/>
 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="@*"/>
   <xsl:apply-templates/>
  </xsl:copy>
 </xsl:template>
 <xsl:template match="comment()"/>
</xsl:stylesheet>
6. Use ANT to shrink property files
We can also remove comment, empty lines from property files.
<target name="shrinkPropertyFiles">
 <path id="orignal.properties.path">
  <fileset dir="${runtime.solr.core.home}/conf/">
   <include name="*.txt" />
  </fileset>
 </path>
 <property name="orignal.properties" refid="orignal.properties.path" />
 <ac:foreach list="${orignal.properties}" delimiter=";" param="original.propertyFile" target="shrinkPropertyFile"/>
</target>
<target name="shrinkPropertyFile">
 <replaceregexp file="${original.propertyFile}"
     match="^#.*"
     replace=""
     byline="true"/>
 <copy file="${original.propertyFile}" toFile="${original.propertyFile}-tmp">
  <filterchain>
   <ignoreblank/>
  </filterchain>
 </copy>
 <move file="${original.propertyFile}-tmp" tofile="${original.propertyFile}"/>
</target>
After all these steps, we remove the whole application from 16.6mb to 7.3 mb.
You can view all source code from github:
https://github.com/jefferyyuan/tools/tree/master/ant-scripts/shrink-solr

Part 1: Shrink Solr Application Size



We want to run solr application in client side, client need download and install it, so we need try best to reduce the application's size.

From high level architecture view, we run solr.war in embedded jetty.


1. Reduce Jetty Size Jar:

Refer to: http://wiki.eclipse.org/Jetty/Tutorial/Jetty_HelloWorld
We only need download jetty-all-8.1.8.v20121106.jar from 
http://repo1.maven.org/maven2/org/eclipse/jetty/aggregate/jetty-all/8.1.8.v20121106/,or other jetty version.
Then download http://repo1.maven.org/maven2/javax/servlet/servlet-api/3.0-alpha-1/

Size of jetty-all-8.1.8.v20121106.jar is 1,785 kb + servlet-api-3.0.jar 196 kb = 1,981 kb.


As we will just run servlet in our embedded jetty, some functions are not needed, we can continue to reduce jetty seize.

http://stackoverflow.com/questions/4223597/libraries-for-embedding-jetty

So we download jetty-distribution-8.1.8.v20121106 from eclipse jetty site, just keep the following 9 jars:

jetty-http-8.1.8.v20121106.jar
jetty-io-8.1.8.v20121106.jar
jetty-security-8.1.8.v20121106.jar
jetty-server-8.1.8.v20121106.jar
jetty-servlet-8.1.8.v20121106.jar
jetty-util-8.1.8.v20121106.jar
jetty-webapp-8.1.8.v20121106.jar
jetty-xml-8.1.8.v20121106.jar
servlet-api-3.0.jar

Copy them to a temporary directory, unzip them all to current directory then just zip javax and or directory to a new jar jetty.min-8.1.8.jar: size 1,297 kb, decrease 0.7 mb.


2. Reduce Solr.war size

Download apache-solr-4.1-2012-11-17_23-18-40.zip from https://builds.apache.org/job/Solr-Artifacts-4.x/lastSuccessfulBuild/artifact/solr/package/.

Size of apache-solr-4.1-2012-11-17_23-18-40.war is 14,732 KB.


Our solr application use DataImportHandler to fetch index perodically from remote solr server, and provide http services(/solr/select) to local client.

So we remove all unneeded files from solr.war:
remove folder: csss, img, js, META-INF, tpl, admin.html, favicon.ico, WEB-INF\weblogic.xml.

Next big step is to remove unneeded jars from WEB-INF\lib.

Solr didn't do a good job at modularization: for example if I don't use Spatial Search function, we can't just remove spatial4j-0.3.jar.

So each time, we try to remove one jar, start server and run our tests, see whether the tests run well. If so, remove it, if not, keep it.


For our application, we can remove lucene-analyzers-kuromoji, lucene-grouping, lucene-memory, lucene-spatial, commons-cli, commons-lang, commons-codec, wstx-asl, httpmime, guava. 


As I don't use  solrcloud function, so I think I can remove zookeeper-3.4.5.jar, but after I remove it, it reports exception:

SEVERE: null:java.lang.NoClassDefFoundError: org/apache/zookeeper/KeeperException
        at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:315)
So I remove all other classes from zookeeper.jat except KeeperException related classes.

This step reduces 8.63 mb.


3. Reduce size of Solr.Home

In Solr.Home, we only keep the modules(jars) we need: apache-solr-dataimporthandler.jar, remove all unnecessay files from \\conf.


You can view all source code from github: 
https://github.com/jefferyyuan/tools/tree/master/ant-scripts/shrink-solr

How Solr 4.0 Resolves Library Path


A core in Solr 4.0 will load libs in <Solr_home>/<core_home>/lib,so the simplest way is to put all your jars in that directory.

You can also specify jars in other directory in solrconfig.xml in each core. Just remember that if you use relative path(we always use relative path), it is relative to current core home:<solr_home>/<core_home>, not relative to the path where solrconfig.xml exists:

In solrconfig.xml,all directories and paths are resolved relative to the instanceDir.
I should have read the document more carefully :)

You can see this from solr code:
org.apache.solr.core.SolrResourceLoader:
Path is resolved relative to current instance directory:
  void addToClassLoader(final String baseDir, final FileFilter filter) {
    File base = FileUtils.resolvePath(new File(getInstanceDir()), baseDir);
    this.classLoader = replaceClassLoader(classLoader, base, filter);
  }

org.apache.solr.core.SolrConfig.initLibs() read all lib directive, and parse the path relative to current instanceDir.

In org.apache.solr.core.SolrResourceLoader.SolrResourceLoader(String, ClassLoader, Properties):
addToClassLoader("./lib/", null);
It adds all jars in <core_home>/lib to classloader.

Ways to specify datadir - index directory
1. In solr.xml:
http://wiki.apache.org/solr/CoreAdmin
 <core name="collection1" instanceDir="collection1" dataDir="C:/jeffery/environment/solr4.1/solr4.1-index">  
Or:
 <core name="collection1" instanceDir="collection1">  
      <property name="dataDir" value="C:/jeffery/environment/solr4.1/solr4.1-index" />  
 </core>  
org.apache.solr.core.CoreContainer
org.apache.solr.core.CoreContainer.load(String, InputSource)
Here, we can see how it parses solr.xml.

2. In solrconfig.xml - previous ways are better.
http://wiki.apache.org/solr/SolrConfigXml
 <dataDir>C:/jeffery/environment/solr4.1/solr4.1-index</dataDir>  

Linking an external folder as source or output folder in Eclipse


Today, I read this port:
Linking an external folder to Eclipse/Flex Builder Project, which explains how to add an external folder as source code in Eclipse.
1. Right click on the project -> new -> Folder.
2. Click on the Advanced button.
3. Check the checkbox with label : "Link to folder in the file system".
4. Then set it as source folder.

This makes me find way to solve my problem that has been bother me lately:
I want to save some projects to Google drive, so I can open and synchronize it in computers at company or home.
Obviously I don't want to import the built classes to GDrive, as the class files will change frequently. GDrive doesn't provide us a simple way to mark some folders as no-synchronize.


Now I can solve my problem with previous solution:
1. Create a folder bin which links to folder in the system.
2. Change your output folder to the previously created folder.

Source:
http://deviltechie.wordpress.com/2010/12/06/linking-an-external-folder-to-eclipse-flex-builder-project

Labels

Java (159) Lucene-Solr (112) Interview (61) All (58) J2SE (53) Algorithm (45) Soft Skills (38) Eclipse (33) Code Example (31) Linux (25) JavaScript (23) Spring (22) Windows (22) Web Development (20) Tools (19) Nutch2 (18) Bugs (17) Debug (16) Defects (14) Text Mining (14) J2EE (13) Network (13) Troubleshooting (13) PowerShell (11) Chrome (9) Design (9) How to (9) Learning code (9) Performance (9) Problem Solving (9) UIMA (9) html (9) Http Client (8) Maven (8) Security (8) bat (8) blogger (8) Big Data (7) Continuous Integration (7) Google (7) Guava (7) JSON (7) Shell (7) ANT (6) Coding Skills (6) Database (6) Lesson Learned (6) Programmer Skills (6) Scala (6) Tips (6) css (6) Algorithm Series (5) Cache (5) Dynamic Languages (5) IDE (5) System Design (5) adsense (5) xml (5) AIX (4) Code Quality (4) GAE (4) Git (4) Good Programming Practices (4) Jackson (4) Memory Usage (4) Miscs (4) OpenNLP (4) Project Managment (4) Spark (4) Testing (4) ads (4) regular-expression (4) Android (3) Apache Spark (3) Become a Better You (3) Concurrency (3) Eclipse RCP (3) English (3) Happy Hacking (3) IBM (3) J2SE Knowledge Series (3) JAX-RS (3) Jetty (3) Restful Web Service (3) Script (3) regex (3) seo (3) .Net (2) Android Studio (2) Apache (2) Apache Procrun (2) Architecture (2) Batch (2) Bit Operation (2) Build (2) Building Scalable Web Sites (2) C# (2) C/C++ (2) CSV (2) Career (2) Cassandra (2) Distributed (2) Fiddler (2) Firefox (2) Google Drive (2) Gson (2) How to Interview (2) Html Parser (2) Http (2) Image Tools (2) JQuery (2) Jersey (2) LDAP (2) Life (2) Logging (2) Python (2) Software Issues (2) Storage (2) Text Search (2) xml parser (2) AOP (1) Application Design (1) AspectJ (1) Chrome DevTools (1) Cloud (1) Codility (1) Data Mining (1) Data Structure (1) ExceptionUtils (1) Exif (1) Feature Request (1) FindBugs (1) Greasemonkey (1) HTML5 (1) Httpd (1) I18N (1) IBM Java Thread Dump Analyzer (1) JDK Source Code (1) JDK8 (1) JMX (1) Lazy Developer (1) Mac (1) Machine Learning (1) Mobile (1) My Plan for 2010 (1) Netbeans (1) Notes (1) Operating System (1) Perl (1) Problems (1) Product Architecture (1) Programming Life (1) Quality (1) Redhat (1) Redis (1) Review (1) RxJava (1) Solutions logs (1) Team Management (1) Thread Dump Analyzer (1) Visualization (1) boilerpipe (1) htm (1) ongoing (1) procrun (1) rss (1)

Popular Posts