Programmer: Lifelong Learning: Java: Using classmexer MemoryUtil to Get Object Deep Memory

The Problem
In some case, we may want to get the deep memory usage of one object.
For example, in recent project, I developed one Solr request handler which will copy docs from remote solr to local solr.

The request looks like this: /solr/core/pulldocs?remoteSolr=solrurl&q=query&fl=fields&rows=ROWS&start=START
Internally, it will get 100 docs each time: first get START to START+100 then get START+100 to START+200 - there are actually 5 threads to pull docs and insert to local solr at same time.

But in one test environment, the tester reports that the get 100-docs request gets slower and slower. I am guessing it's not the case, but because some 100 docs are abnormal and huge.

So I need to find it out and prove it: I want to print each request execution time and the size of solr response from remote solr server.

Solution: Use classmexer MemoryUtil to Get Deep Memory Usage
So, how to get deep memory usage of java object
Via google search, I found we can use Java Instrumentation to get object size(Instrumentation.getObjectSize), but which just gives the shallow size of object.

Then I found MemoryUtil from classmexer which can get deep memory usage of object.
MemoryUtil.deepMemoryUsageOf(object)

Integrate classmexer MemoryUtil to Web Application
In order to use MemoryUtil in our Solr application, I add the -javaagent:C:\mysolrapp\extra\classmexer.jar to the Java startup parameter.

Then change the code like below:

QueryResponse rsp = solrServer.query(fetchQuery);
logger.info("start: " + fetchQuery.getStart() + ", deep size: "
  + MemoryUtil.deepMemoryUsageOf(rsp));

Copy the new built class to WEB-INF/classes, restart server and rerun the test. From the log, I can easily find the huge solr response from remote solr like below:
INFO: start: 4000, deep size: 714, 778, 104 ==> 700mb approximately, in normal case, it should between 1 and 10 mb.
INFO: Added 100, start: 4000, took 1195796

Then clean data, rerun test with start=4000&rows=100
Check the solr index, the size of solr index is more than 5 g, use Luke to analyze the Solr index, and found 99.99% is content field, which has more than 41 million terms.

The real root cause is in the server side, when server extracts text from file, if the file is corrupted, it will get the binary data and add it into content field which is huge. We fixed the server side code issue, and everything works fine.

Java: Using classmexer MemoryUtil to Get Object Deep Memory

Labels