Programmer: Lifelong Learning: 2012

Solr: How to Update Multiple Cores in One Request

Solr supports distributed search through the shards parameter, for example: http://localhost:8080/solr/select?shards=localhost:8080/solr,localhost:9090/solr&indent=true&q=nexus7.

Solr has no built-in way to update or upload files to multiple cores in one request, but this is easy to add:
We add a shards parameter that lists the URLs of the target cores, and parameters of the form shardN.parameter_name=parameter_value that are sent only to shard N (counting from 0). Parameters that do not start with shardN are sent to all cores.
Example: upload all CSV files in folder1 to core1, and all CSV files in folder2 to core2:
http://localhost:8080/solr/cores?shards=http://localhost:8080/solr/collection1/,http://localhost:8080/solr/collection2/&url=/import/csv&shard0.stream.folder=folder1_path&shard1.stream.folder=folder2_path&stream.contentType=text/csv;charset=utf-8
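The routing rule described above can be sketched without any Solr classes. This is only an illustration under stated assumptions: ShardParamRouter and route are hypothetical names, and plain maps stand in for Solr's parameter objects.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Solr: splits request parameters per shard.
class ShardParamRouter {
    private static final String PREFIX = "shard";

    /**
     * "shardN.name=value" is delivered only to shard N as "name=value";
     * every other parameter (except "shards" itself) goes to all shards.
     */
    static List<Map<String, String>> route(Map<String, String> params, int shardCount) {
        List<Map<String, String>> perShard = new ArrayList<>();
        for (int i = 0; i < shardCount; i++) {
            perShard.add(new LinkedHashMap<>());
        }
        for (Map.Entry<String, String> entry : params.entrySet()) {
            String name = entry.getKey();
            if (name.equals("shards")) {
                continue; // the shard list is consumed here, not forwarded
            }
            if (name.startsWith(PREFIX)) {
                int dot = name.indexOf('.');
                if (dot < 0) {
                    continue;
                }
                try {
                    int shard = Integer.parseInt(name.substring(PREFIX.length(), dot));
                    perShard.get(shard).put(name.substring(dot + 1), entry.getValue());
                } catch (NumberFormatException | IndexOutOfBoundsException ex) {
                    // malformed or out-of-range shard index: the parameter is dropped
                }
            } else {
                for (Map<String, String> shardParams : perShard) {
                    shardParams.put(name, entry.getValue());
                }
            }
        }
        return perShard;
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("shards", "http://localhost:8080/solr/collection1/,http://localhost:8080/solr/collection2/");
        params.put("url", "/import/csv");
        params.put("shard0.stream.folder", "folder1_path");
        params.put("shard1.stream.folder", "folder2_path");
        System.out.println(route(params, 2));
    }
}
```

Dropping a parameter whose shard index is malformed or out of range keeps one bad parameter from failing the whole request, at the cost of hiding typos in the request URL.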

See the post Solr: Use Multiple Threads to Import Local stream Files for how to upload multiple local stream files with multiple threads, and for the stream.folder and stream.file.pattern parameters.

Commit to core1 and core2 in one request:
http://localhost:8080/solr/cores?shards=http://localhost:8080/solr/collection1/,http://localhost:8080/solr/collection2/&url=/update&commit=true

Now we can update multiple cores in one request, which makes it easy to script.

The code looks like the following. You can also view the complete source code here: https://github.com/jefferyyuan/solr.misc

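// Imports assume Solr 4.x with SolrJ on the classpath.
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.UpdateRequest;
import org.apache.solr.client.solrj.response.UpdateResponse;
import org.apache.solr.common.params.ModifiableSolrParams;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.common.util.StrUtils;
import org.apache.solr.handler.UpdateRequestHandler;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;

public class MultiCoreUpdateRequestHandler extends UpdateRequestHandler {
  private static final String PARAM_SHARDS = "shards";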
  
  @Override
  public void handleRequestBody(final SolrQueryRequest req,
      final SolrQueryResponse rsp) throws Exception {
    try {
      
      SolrParams params = req.getParams();
      String shardsStr = params.get(PARAM_SHARDS);
      if (shardsStr == null) {
        throw new RuntimeException("No shards parameter found.");
      }
      List<String> shards = StrUtils.splitSmart(shardsStr, ',');
      
      List<ModifiableSolrParams> shardParams = new ArrayList<ModifiableSolrParams>();
      for (int i = 0; i < shards.size(); i++) {
        shardParams.add(new ModifiableSolrParams());
      }
      Iterator<String> iterator = params.getParameterNamesIterator();
      String shardParamPrefix = "shard";
      while (iterator.hasNext()) {
        String paramName = iterator.next();
        if (paramName.equals(PARAM_SHARDS)) continue;
        if (paramName.startsWith(shardParamPrefix)) {
          int index = paramName.indexOf(".");
          if (index < 0) continue;
          String numStr = paramName.substring(shardParamPrefix.length(), index);
          try {
            int shardNumber = Integer.parseInt(numStr);
            String shardParam = paramName.substring(index + 1);
            shardParams.get(shardNumber).add(shardParam, params.get(paramName));
          } catch (Exception e) {
            // do nothing
          }
        } else {
          // add common parameters
          for (ModifiableSolrParams tmp : shardParams) {
            tmp.add(paramName, params.get(paramName));
          }
        }
      }
      handleShards(shards, shardParams, rsp);
    } finally {}
  }
  
  private void handleShards(final List<String> shards,
      final List<ModifiableSolrParams> shardParams, final SolrQueryResponse rsp)
      throws InterruptedException {
    
    ExecutorService executor = null;
    
    executor = Executors.newFixedThreadPool(shards.size());
    
    for (int i = 0; i < shards.size(); i++) {
      final int index = i;
      executor.submit(new Runnable() {
        @SuppressWarnings("unchecked")
        @Override
        public void run() {
          Map<String,Object> resultMap = new LinkedHashMap<String,Object>();
          try {
            SolrServer solr = new HttpSolrServer(shards.get(index));
            
            ModifiableSolrParams params = shardParams.get(index);
            UpdateRequest request = new UpdateRequest(params.get("url"));
            resultMap.put("params", params.toNamedList());
            request.setParams(params);
            UpdateResponse response = request.process(solr);
            NamedList<Object> header = response.getResponseHeader();
            resultMap.put("responseHeader", header);
            System.err.println(response);
          } catch (Exception e) {
            NamedList<Object> error = new NamedList<Object>();
            error.add("msg", e.getMessage());
            StringWriter sw = new StringWriter();
            e.printStackTrace(new PrintWriter(sw));
            error.add("trace", sw.toString());
            resultMap.put("error", error);
            throw new RuntimeException(e);
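          } finally {
            // The response object is shared by all worker threads and is not
            // thread-safe, so serialize the concurrent adds.
            synchronized (rsp) {
              rsp.add("shard" + index, resultMap);
            }
          }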
        }
      });
    }
    executor.shutdown();
    
    boolean terminated = executor.awaitTermination(Long.MAX_VALUE,
        TimeUnit.SECONDS);
    if (!terminated) {
      throw new RuntimeException("Request takes too much time");
    }
  }
}

Solr: How to Speed Up Indexing

Store Less And Index Less
Please refer to Solr: How to Shrink Index Size
Outline:
Set indexed=false or stored=false
Use the best-fit, smallest field type: tlong or tint.
Clean Data
Round Data
Increase precisionStep
Set omitNorms=true

Increase Java Heap Size
java -server -Xms8192M -Xmx8192M

Set overwrite to false
If the unique key is generated automatically (either a UUID or generated in our code), or we can guarantee there is no duplicate data, we can set overwrite to false. See the code in org.apache.solr.update.DirectUpdateHandler2.addDoc(AddUpdateCommand).
After the push is finished, we can run a query with facet.field=idfield&facet.mincount=2 to find out whether there are duplicate ids. If there are, either delete the old documents or check the data for errors.

Increase ramBufferSizeMB, maxBufferedDocs, and mergeFactor in solrconfig.xml
This reduces the number of disk IO operations.
After committing the data, you may run optimize to increase query speed.

Increase the size of the buffered reader to reduce the number of IO operations.
To do this, you have to change the Solr code:
BUFFER_READER_SIZE = params.getInt(PARAM_BUFFER_READER_SIZE, 0);
if (BUFFER_READER_SIZE != 0) {
  reader = new BufferedReader(reader, BUFFER_READER_SIZE);
}
Then configure the size of the BufferedReader in solrconfig.xml.
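The buffer-size idea can be tried outside Solr with plain java.io. The sketch below is an assumption-laden illustration: BufferedReaderSizing and withBuffer are hypothetical names, and the 1 MB value is only an example, not a measured optimum.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical helper showing how a configurable buffer size can be applied.
class BufferedReaderSizing {
    /**
     * Wraps the reader in a BufferedReader with the given buffer size (in chars).
     * A size of 0 or less leaves the reader unchanged, because BufferedReader
     * rejects non-positive sizes with IllegalArgumentException.
     */
    static Reader withBuffer(Reader reader, int bufferSize) {
        return bufferSize > 0 ? new BufferedReader(reader, bufferSize) : reader;
    }

    public static void main(String[] args) throws IOException {
        Reader source = new StringReader("id,name\n1,nexus7\n");
        // 1 MB buffer instead of BufferedReader's much smaller default.
        try (BufferedReader reader = (BufferedReader) withBuffer(source, 1 << 20)) {
            System.out.println(reader.readLine());
        }
    }
}
```

Unlike the snippet above, this sketch checks for a positive size rather than a non-zero one, so a negative configured value falls back to the unwrapped reader instead of throwing.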

Use multiple threads to upload multiple files at the same time.
Please refer to Solr: Use Multiple Threads to Import Local stream Files

Use multiple update processor threads
https://issues.apache.org/jira/browse/SOLR-3585
Apply this improvement to your Solr build.

Use Solr Multiple Cores
In my test, uploading 56 million records took 70 minutes with one core and 40 minutes with 2 cores in one Solr server. Increasing to 3 cores brought no further improvement (in fact it was worse).
I think this is because while one core is busy with IO, another core can do CPU-bound work.

If the cores are deployed in different web server instances, in different JVMs, the performance will be better.

Solr Cloud
I tested Solr Cloud and found it is not suitable for my task, because it requires enabling Solr transaction logs, which is quite slow, and because of the ZooKeeper overhead. Using Solr Cloud with 2 nodes, the import took 4 hours, which is much slower.
Solr Cloud should be more suitable when the index is too large to be stored on one machine.

Solr: How to Shrink Index Size

To reduce index size, we should do our best to understand the application's requirements: what each field means, what type it should be (for example, tlong or tint), what tokenizers or filters should be used, and what queries users may run.

Indexed and Stored
If users will never search on a field, we can set indexed=false for that field.
If a field is for search only and customers will never retrieve the original content, we can set stored=false.

Use the best-fit, smallest field type: tlong or tint.

Clean data before indexing it.
For instance, remove garbage values such as NA.

Round Data
For example, for a date field, users may care only about the date part, not the hh:mm:ss part, so we can round the date: round 2012-12-21T12:12:12.234Z to 2012-12-21T00:00:00Z. This reduces the number of distinct terms in the index.
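As a rough illustration of why rounding helps, the sketch below truncates Solr-style UTC timestamps to the start of the day and counts how many distinct values remain. DateRounding and its method names are hypothetical, and java.time is used here for brevity; it is not what Solr does internally.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: rounds ISO-8601 UTC timestamps down to the day.
class DateRounding {
    /** Truncates a timestamp such as 2012-12-21T12:12:12.234Z to midnight UTC. */
    static String roundToDay(String solrDate) {
        return Instant.parse(solrDate).truncatedTo(ChronoUnit.DAYS).toString();
    }

    /** Counts the distinct values left after rounding every timestamp to the day. */
    static int distinctAfterRounding(List<String> solrDates) {
        Set<String> rounded = new HashSet<>();
        for (String date : solrDates) {
            rounded.add(roundToDay(date));
        }
        return rounded.size();
    }

    public static void main(String[] args) {
        List<String> dates = Arrays.asList(
                "2012-12-21T12:12:12.234Z",
                "2012-12-21T18:45:00.000Z",
                "2012-12-22T01:00:00.000Z");
        System.out.println(roundToDay(dates.get(0)));
        System.out.println(distinctAfterRounding(dates) + " distinct values instead of " + dates.size());
    }
}
```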

Use StopFilterFactory to remove stop words.
Consider which analyzers and filters are really needed to index the input.
Range Query and precisionStep
For fields that don't need range queries, or where range query performance is not important, we can set precisionStep to a larger number. This reduces the number of indexed terms at the cost of range query speed.
termVectors, termPositions and termOffsets
For fields that don't need highlighting, set these three properties to false. This tells Solr not to store term vector information for those fields in the index.
omitNorms
Norms are used for index-time boosts and field length normalization, so that shorter documents get a higher score.
Set omitNorms=true for text fields that are usually short and don't need a boost for short values.

(For primitive types such as string, integer, and so on, omitNorms is turned on by default in Solr 4.0.) This shrinks the index a bit more and also saves some memory during queries.

Solr: Use Multiple Threads to Import Local stream Files

When importing data into Solr, we can pass several stream.file="path" parameters to import multiple local files. But Solr's UpdateRequestHandler imports them one by one:

org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(SolrQueryRequest, SolrQueryResponse)
for (ContentStream stream : streams) {
  documentLoader.load(req, rsp, stream, processor);
}

So to speed up indexing, we can use multiple threads to import files simultaneously.
In addition, I want to extend UpdateRequestHandler with a stream.folder parameter, which imports all files under a folder, and a stream.file.pattern parameter, which imports all files that match a pattern.
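Before the Solr-specific handler, here is a minimal sketch of the threading pattern on its own: a fixed pool sized to the smaller of the thread limit and the number of inputs, one task per input. ParallelLoader and loadAll are hypothetical names, and the loader function stands in for Solr's document loader.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper: runs one loader call per input on a bounded thread pool.
class ParallelLoader {
    static <T, R> List<R> loadAll(List<T> inputs, int maxThreads, Function<T, R> loader) {
        List<R> results = new ArrayList<>();
        if (inputs.isEmpty()) {
            return results;
        }
        // Never start more threads than there are inputs, and at least one.
        int threads = Math.max(1, Math.min(maxThreads, inputs.size()));
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            List<Future<R>> futures = new ArrayList<>();
            for (T input : inputs) {
                Callable<R> task = () -> loader.apply(input);
                futures.add(executor.submit(task));
            }
            // Collect in submission order; get() blocks until each task is done.
            for (Future<R> future : futures) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while loading", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A load task failed", e.getCause());
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("a.csv", "bb.csv", "ccc.csv");
        // The "loader" here only measures the name length; a real one would parse the file.
        System.out.println(loadAll(files, 10, String::length));
    }
}
```

Collecting results through Future objects avoids sharing a mutable list between worker threads, which is one way to sidestep the synchronization a shared result list would need.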

package org.codeexample.jeffery.solr;
public class ThreadedUpdateRequestHandler extends UpdateRequestHandler {

 private static String PARAM_THREAD_NUMBER = "threads";

 private static String PARAM_STREAM_FOLDER = "stream.folder";
 private static String PARAM_STREAM_FILE_PATTERN = "stream.file.pattern";

 private static final int DEFAULT_THREAD_NUMBER = 10;
 private static int DEFAULT_THREADS = DEFAULT_THREAD_NUMBER;

 @SuppressWarnings("rawtypes")
 @Override
 public void init(NamedList args) {
  super.init(args);
  if (args != null) {
   NamedList namedList = ((NamedList) args.get("defaults"));
   if (namedList != null) {
    Object obj = namedList.get(PARAM_THREAD_NUMBER);
    if (obj != null) {
     DEFAULT_THREADS = Integer.parseInt(obj.toString());
    }
   }
  }
 }

 @Override
 public void handleRequestBody(final SolrQueryRequest req,
   final SolrQueryResponse rsp) throws Exception {

  List<ContentStream> streams = new ArrayList<ContentStream>();

  handleReqStream(req, streams);
  // here, we handle the two new parameters: stream.folder and
  // stream.file.pattern
  handleStreamFolders(req, streams);
  handleFilePatterns(req, streams);
  if (streams.size() < 2) {
   // No need to use threadpool.
   SolrQueryRequestBase reqBase = (SolrQueryRequestBase) req;
   if (!streams.isEmpty()) {
    String contentType = req.getParams().get(
      CommonParams.STREAM_CONTENTTYPE);
    ContentStream stream = streams.get(0);
    if (stream instanceof ContentStreamBase) {
     ((ContentStreamBase) stream).setContentType(contentType);

    }
   }
   reqBase.setContentStreams(streams);
   super.handleRequestBody(req, rsp);
  } else {
   importStreamsMultiThreaded(req, rsp, streams);
  }
 }

 private void handleReqStream(final SolrQueryRequest req,
   List<ContentStream> streams) {
  Iterable<ContentStream> iterabler = req.getContentStreams();
  if (iterabler != null) {
   Iterator<ContentStream> iterator = iterabler.iterator();
   while (iterator.hasNext()) {
    streams.add(iterator.next());
    iterator.remove();
   }
  }
 }

 private ExecutorService importStreamsMultiThreaded(
   final SolrQueryRequest req, final SolrQueryResponse rsp,
   List<ContentStream> streams) throws InterruptedException,
   IOException {
  ExecutorService executor = null;
  SolrParams params = req.getParams();

  final UpdateRequestProcessorChain processorChain = req
    .getCore()
    .getUpdateProcessingChain(params.get(UpdateParams.UPDATE_CHAIN));

  UpdateRequestProcessor processor = processorChain.createProcessor(req,
    rsp);
  try {
   Map<String, Object> resultMap = new LinkedHashMap<String, Object>();

   resultMap.put("start_time", new Date());
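   // Worker threads append to this list concurrently, so it must be synchronized.
   List<Map<String, Object>> details = java.util.Collections
     .synchronizedList(new ArrayList<Map<String, Object>>());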

   try {

    int threads = determineThreadsNumber(params, streams.size());
    ThreadFactory threadFactory = new ThreadFactory() {
     public Thread newThread(Runnable r) {
      return new Thread(r, "threadedRequestHandler-"
        + new Date());
     }
    };
    executor = Executors.newFixedThreadPool(threads, threadFactory);
    String contentType = params
      .get(CommonParams.STREAM_CONTENTTYPE);

    Iterator<ContentStream> iterator = streams.iterator();
    while (iterator.hasNext()) {
     ContentStream stream = iterator.next();
     iterator.remove();
     if (stream instanceof ContentStreamBase) {
      ((ContentStreamBase) stream)
        .setContentType(contentType);

     }
     submitTask(req, rsp, processorChain, executor, stream,
       details);
    }

    executor.shutdown();

    boolean terminated = executor.awaitTermination(Long.MAX_VALUE,
      TimeUnit.SECONDS);
    if (!terminated) {
     throw new RuntimeException("Request takes too much time");
    }
    // Perhaps commit from the parameters
    RequestHandlerUtils.handleCommit(req, processor, params, false);
    RequestHandlerUtils.handleRollback(req, processor, params,
      false);
   } finally {
    resultMap.put("end_time", new Date());

    // check whether there is error in details
    for (Map<String, Object> map : details) {
     Exception ex = (Exception) map.get("exception");
     if (ex != null) {
      rsp.setException(ex);
      if (ex instanceof SolrException) {
       rsp.add("status", ((SolrException) ex).code());
      } else {
       rsp.add("status",
         SolrException.ErrorCode.BAD_REQUEST.code);
      }
      break;
     }
    }
   }
   resultMap.put("details", details);
   rsp.add("result", resultMap);
   return executor;
  } finally {
   if (executor != null && !executor.isShutdown()) {
    executor.shutdownNow();
   }
   // finish the request
   processor.finish();
  }
 }

 private int determineThreadsNumber(SolrParams params, int streamSize) {
  int threads = DEFAULT_THREADS;
  String str = params.get(PARAM_THREAD_NUMBER);
  if (str != null) {
   threads = Integer.parseInt(str);
  }

  if (streamSize < threads) {
   threads = streamSize;
  }
  return threads;
 }

 private void handleFilePatterns(final SolrQueryRequest req,
   List<ContentStream> streams) {
  String[] strs = req.getParams().getParams(PARAM_STREAM_FILE_PATTERN);
  if (strs != null) {
   for (String filePattern : strs) {
    // it may point to a file
    File file = new File(filePattern);
    if (file.isFile()) {
     streams.add(new ContentStreamBase.FileStream(file));
    } else {
     // only supports tail regular expression, such as
     // c:\foldera\c*.csv
     int lastIndex = filePattern.lastIndexOf(File.separator);
     if (lastIndex > -1) {
      File folder = new File(filePattern.substring(0,
        lastIndex));

      if (!folder.exists()) {
       throw new RuntimeException("Folder " + folder
         + " doesn't exist.");
      }

      String pattern = filePattern.substring(lastIndex + 1);
      pattern = convertPattern(pattern);
      final Pattern p = Pattern.compile(pattern);

      File[] files = folder.listFiles(new FilenameFilter() {
       @Override
       public boolean accept(File dir, String name) {
        Matcher matcher = p.matcher(name);
        return matcher.matches();
       }
      });

      if (files != null) {
       for (File tmp : files) {
        streams.add(new ContentStreamBase.FileStream(
          tmp));
       }
      }
     }
    }
   }
  }
 }

 private void handleStreamFolders(final SolrQueryRequest req,
   List<ContentStream> streams) {
  String[] strs = req.getParams().getParams(PARAM_STREAM_FOLDER);
  if (strs != null) {
   for (String folderStr : strs) {

    File folder = new File(folderStr);

    File[] files = folder.listFiles();

    if (files != null) {
     for (File file : files) {
      streams.add(new ContentStreamBase.FileStream(file));
     }
    }
   }
  }
 }

 /**
  * replace * to .*, replace . to \.
  */
 private String convertPattern(String pattern) {
  pattern = pattern.replaceAll("\\.", "\\\\.");
  pattern = pattern.replaceAll("\\*", ".*");
  return pattern;
 }

 private void submitTask(final SolrQueryRequest req,
   final SolrQueryResponse rsp,
   final UpdateRequestProcessorChain processorChain,
   ExecutorService executor, final ContentStream stream,
   final List<Map<String, Object>> rspResult) {
  Thread thread = new Thread() {
   public void run() {
    Map<String, Object> map = new LinkedHashMap<String, Object>();
    map.put("start_time", new Date().toString());

    if (stream instanceof ContentStreamBase.FileStream) {
     map.put("Import File: ",
       ((ContentStreamBase.FileStream) stream).getName());
    }
    try {
     UpdateRequestProcessor processor = null;
     try {
      processor = processorChain.createProcessor(req, rsp);

      ContentStreamLoader documentLoader = newLoader(req,
        processor);

      documentLoader.load(req, rsp, stream, processor);
      System.err.println(rsp);

     } finally {
      if (processor != null) {
       // finish the request
       processor.finish();
      }
     }
    } catch (Exception e) {
     rsp.setException(e);
    } finally {
     map.put("end_time", new Date().toString());
     if (rsp.getException() != null) {
      map.put("exception", rsp.getException());
     }
     rspResult.add(map);
    }

   };
  };

  executor.execute(thread);
 }
}

You can view the complete source code here:
https://github.com/jefferyyuan/solr.misc
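The stream.file.pattern handling relies on turning a simple wildcard such as c*.csv into a regular expression. The sketch below shows one standalone way to do that. It is a variation, not the handler's code: GlobMatcher is a hypothetical name, and it quotes every literal character, whereas convertPattern above escapes only the dot.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: matches file names against a wildcard that supports only '*'.
class GlobMatcher {
    /** Converts a wildcard such as c*.csv into an anchored regular expression. */
    static Pattern globToPattern(String glob) {
        StringBuilder regex = new StringBuilder();
        for (char c : glob.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else {
                // Quote every other character so '.', '+', '(' etc. match literally.
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString());
    }

    /** Returns the names that match the wildcard in full. */
    static List<String> filter(List<String> names, String glob) {
        Pattern pattern = globToPattern(glob);
        List<String> matched = new ArrayList<>();
        for (String name : names) {
            if (pattern.matcher(name).matches()) {
                matched.add(name);
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("cars.csv", "cats.csv", "dogs.csv", "carsXcsv", "c.csv.bak");
        System.out.println(filter(names, "c*.csv"));
    }
}
```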

Solr: Define an Custom Field Type to Round Data

In the previous post, Solr: Use UpdateRequestProcessor to Round Data, I used an UpdateRequestProcessor to round a date. In this post, I describe how to define a custom field type to round a date.

In Solr, we can define analyzers for a text field. Solr indexes the data as specified by the analyzers, and if we set stored=true, it also keeps the original data.
But for a number field (the various subtypes of TrieField) or a date field (TrieDateField), we can't define analyzers.

Instead, we can create a custom number field that extends TrieLongField or another type, or a custom date field that extends TrieDateField. In the implementation, we can change the stored and indexed data.

The code looks like the following.

public class RoundDateField extends TrieDateField {
   
   private SimpleDateFormat sdf;
   
   private String fromFormat = null;
   private static String PARAM_FROM_FORMAT = "fromFormat";
   private static String DATE_FORMAT_UNIX_SECOND = "UNIX_SECOND";
   private static String DATE_FORMAT_UNIX_MILLSECOND = "UNIX_MILLSECOND";
   
   private static long MS_IN_DAY = 3600 * 24 * 1000;
   private static final long SECONDS_FROM_EPCO = new Date().getTime() / 1000;
   
   @Override
   protected void init(IndexSchema schema, Map<String,String> args) {
     if (args != null) {
       fromFormat = args.remove(PARAM_FROM_FORMAT);
     }
     sdf = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'", Locale.US);
     sdf.setTimeZone(UTC);
     super.init(schema, args);
   }
   
   /**
    * if value > SECONDS_FROM_EPCO, then treat value as milliseconds, otherwise
    * treat value as seconds
    * 
    * @param value
    * @return
    */
   private long convertToMillseconds(long value) {
     long result = value;
     if (value < SECONDS_FROM_EPCO) {
       result = result * 1000L;
     }
     return result;
   }
   
   @Override
   public IndexableField createField(SchemaField field, Object value, float boost) {
     
     try {
       long millseconds = -1;
       
       try {
         millseconds = Long.parseLong(value.toString());
         
         if (fromFormat != null) {
           if (DATE_FORMAT_UNIX_MILLSECOND.equalsIgnoreCase(fromFormat)) {
             // do nothing
           } else if (DATE_FORMAT_UNIX_SECOND.equalsIgnoreCase(fromFormat)) {
             millseconds = millseconds * 1000L;
           } else {
             throw new RuntimeException("Invalid fromFormat: " + fromFormat);
           }
         } else {
           millseconds = convertToMillseconds(millseconds);
         }
         
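       } catch (NumberFormatException ex) {
         // not a number, so it should be a date string; other errors,
         // such as an invalid fromFormat, must not be swallowed here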
         millseconds = sdf.parse(value.toString()).getTime();
       }
       
       millseconds = (millseconds / MS_IN_DAY) * MS_IN_DAY + (MS_IN_DAY / 2);
       // returned value must be a date time string
       value = new Date(millseconds);
     } catch (Exception ex) {
       throw new RuntimeException(ex);
     }
     
     return super.createField(field, value, boost);
   }
}
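The two numeric steps in createField, guessing the unit of a Unix timestamp and rounding it to the day, can be checked in isolation. The following is a sketch with hypothetical names (EpochHeuristic, toMillis, roundToMidday); it takes the current time as a parameter so that the result is reproducible.

```java
import java.time.Instant;

// Hypothetical helper mirroring the arithmetic used by RoundDateField.
class EpochHeuristic {
    static final long MS_IN_DAY = 24L * 3600 * 1000;

    /**
     * Values smaller than the current time in seconds are assumed to be seconds
     * and are converted; anything larger is assumed to be milliseconds already.
     */
    static long toMillis(long value, long nowSeconds) {
        return value < nowSeconds ? value * 1000L : value;
    }

    /** Rounds down to the start of the UTC day, then moves to midday (12:00 UTC). */
    static long roundToMidday(long millis) {
        return (millis / MS_IN_DAY) * MS_IN_DAY + MS_IN_DAY / 2;
    }

    public static void main(String[] args) {
        long nowSeconds = System.currentTimeMillis() / 1000;
        long fromSeconds = toMillis(1356091932L, nowSeconds);    // 2012-12-21T12:12:12Z in seconds
        long fromMillis = toMillis(1356091932234L, nowSeconds);  // the same instant in milliseconds
        System.out.println(Instant.ofEpochMilli(roundToMidday(fromSeconds)));
        System.out.println(Instant.ofEpochMilli(roundToMidday(fromMillis)));
    }
}
```

The heuristic has a blind spot: a millisecond timestamp from the first few weeks of 1970 is smaller than the current time in seconds and would be misread as seconds. Setting fromFormat explicitly avoids the guess.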

We may want to search on the rounded date but still be able to retrieve the original date. To do this, we create one field to store the original content and set indexed=false on it, as we won't search on it. Then we create another field to index the rounded date: we set indexed=true but stored=false, as we will not retrieve or display the rounded value.

In schema.xml, the field access_time is a normal date type and stores the original date value. We copy its value to another field, access_time_rounded, whose type is the custom type we defined.

<fieldType name="roundedDate" class="org.codeexample.jeffery.solr.RoundDateField" omitNorms="true" precisionStep="6" 
  positionIncrementGap="0" fromFormat="UNIX_SECOND" /> 

<fieldType name="roundedDateSmart" class="org.codeexample.jeffery.solr.RoundDateField" omitNorms="true" precisionStep="6" 
  positionIncrementGap="0" /> 

<field name="access_time" type="tdate" indexed="false" stored="true" omitNorms="true"/>
<field name="access_time_rounded" type="roundedDate" indexed="true" stored="false" omitNorms="true"/>
<copyField source="access_time" dest="access_time_rounded"/>

Compared with the previous version, this approach has some advantages:
1. It detects the format of the incoming value automatically: a valid Solr date string, or a Unix timestamp in seconds or milliseconds.
2. It is easier to use: there is no need to configure a processor factory in solrconfig.xml; just declare the field type for your data.
You can view the complete source code here:
https://github.com/jefferyyuan/solr.misc

Solr: Use UpdateRequestProcessor to Round Data

We can extend UpdateRequestProcessor to make Solr do many things: clean data, transform data, and so on.

Sometimes we need to round the incoming data. For example, given the date value 2012-12-21T12:12:12.234Z, the customer may care only about the date part, not the hour or minute parts.

So to reduce index size and improve query performance, we can use an UpdateRequestProcessor to round the date to 2012-12-21T00:00:00Z.
In solrconfig.xml, we configure a processor that specifies which fields to round and to what precision. In the following configuration, we round them to keep only the date part.

<updateRequestProcessorChain name="dateRoundChain">
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="org.codeexample.jeffery.solr.DateRoundProcessorFactory" >
   <bool name="ignoreError">true</bool>
   <str name="date.fields">access_time,modify_time,mtm</str>
   <str name="date.round.fields">day,day,day</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory" />
 </updateRequestProcessorChain>

  <requestHandler name="/import/csv" class="solr.CSVRequestHandler">
  <lst name="defaults">
   <str name="stream.contentType">application/csv</str>
   <str name="update.chain">dateRoundChain</str>
  </lst>
 </requestHandler>

The code looks like the following.
It currently supports rounding a date to the day or to the second, but you can easily add code to round to the year, month, hour, or minute.

package org.codeexample.jeffery.solr;
public class DateRoundProcessorFactory extends UpdateRequestProcessorFactory {

	private List<String> dateFields;
	private List<String> dateRoundFields;
	// ignoreError
	private boolean ignoreError;

	private static String ROUND_DAY = "DAY";
	private static String FORMAT_DAY = "yyyy-MM-dd'T'00:00:00.0'Z'";

	// yyyy-MM-dd'T'HH:mm:ss.SSS'Z'
	private static String ROUND_SECOND = "SECOND";
	private static String FORMAT_SECOND = "yyyy-MM-dd'T'HH:mm:ss'Z'";

	@SuppressWarnings("rawtypes")
	@Override
	public void init(final NamedList args) {
		if (args != null) {
			SolrParams params = SolrParams.toSolrParams(args);
			Object fields = args.get("date.fields");
			dateFields = fields == null ? null : StrUtils.splitSmart(
					(String) fields, ",", true);

			fields = args.get("date.round.fields");
			dateRoundFields = fields == null ? null : StrUtils.splitSmart(
					(String) fields, ",", true);

			if ((dateFields == null && dateRoundFields != null)
					|| (dateFields != null && dateRoundFields == null)
					|| (dateFields != null && dateRoundFields != null
							&& dateFields.size() != dateRoundFields.size()))
				throw new IllegalArgumentException(
						"Size of date.fields and date.round.fields must be same.");
			ignoreError = params.getBool("ignoreError", false);
		}
	}

	@Override
	public UpdateRequestProcessor getInstance(SolrQueryRequest req,
			SolrQueryResponse rsp, UpdateRequestProcessor next) {
		return new DateRoundProcessor(req, next);
	}

	class DateRoundProcessor extends UpdateRequestProcessor {
		public DateRoundProcessor(SolrQueryRequest req,
				UpdateRequestProcessor next) {
			super(next);
		}

		@Override
		public void processAdd(AddUpdateCommand cmd) throws IOException {
			SolrInputDocument solrInputDocument = cmd.getSolrInputDocument();
			for (int i = 0; i < dateFields.size(); i++) {
				try {
					String dateField = dateFields.get(i);
					SolrInputField inputField = solrInputDocument
							.getField(dateField);

					if (inputField != null) {
						Object obj = inputField.getValue();
						Object result = null;
						if (obj instanceof String) {
							String value = (String) obj;
							Date solrDate = parseSolrDate(value);
							String roundTo = dateRoundFields.get(i);
							DateFormat df = null;
							if (ROUND_DAY.equalsIgnoreCase(roundTo)) {
								df = new SimpleDateFormat(FORMAT_DAY);
							} else if (ROUND_SECOND.equalsIgnoreCase(roundTo)) {
								df = new SimpleDateFormat(FORMAT_SECOND);
							}
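							if (df != null) {
								// Format in UTC; with the JVM default time zone the
								// rounded day could shift by one.
								df.setTimeZone(TimeZone.getTimeZone("UTC"));
								result = df.format(solrDate);
								// only remove it, if there is no error
								solrInputDocument.removeField(dateField);
								solrInputDocument.addField(dateField, result);
							}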
						}
					}
				} catch (Exception ex) {
					if (!ignoreError) {
						throw new IOException(ex);
					}
				}
			}
			super.processAdd(cmd);
		}
	}

	public Date parseSolrDate(String dateString) throws ParseException {
		SimpleDateFormat sdf = new SimpleDateFormat(
				"yyyy-MM-dd'T'HH:mm:ss.SSS'Z'", Locale.US);
		sdf.setTimeZone(TimeZone.getTimeZone("UTC"));
		return sdf.parse(dateString);
	}
}

You can view the complete source code here:
https://github.com/jefferyyuan/solr.misc
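The processor is configured with two parallel comma-separated lists, date.fields and date.round.fields, which must have the same length. A minimal sketch of pairing and validating them outside Solr might look like this; RoundConfig and pair are hypothetical names, and plain strings stand in for the values Solr passes to init.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: pairs each date field with the unit it is rounded to.
class RoundConfig {
    static Map<String, String> pair(String fields, String roundTo) {
        List<String> names = split(fields);
        List<String> units = split(roundTo);
        if (names.size() != units.size()) {
            throw new IllegalArgumentException(
                    "date.fields and date.round.fields must have the same number of entries.");
        }
        Map<String, String> pairs = new LinkedHashMap<>();
        for (int i = 0; i < names.size(); i++) {
            // Units are case-insensitive in the configuration, so normalize them.
            pairs.put(names.get(i), units.get(i).toUpperCase(Locale.ROOT));
        }
        return pairs;
    }

    /** Splits on commas, trimming whitespace and skipping empty entries. */
    private static List<String> split(String csv) {
        List<String> parts = new ArrayList<>();
        for (String part : csv.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                parts.add(trimmed);
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(pair("access_time,modify_time,mtm", "day,day,day"));
    }
}
```

Failing at configuration time on mismatched lists surfaces a misconfigured chain at startup rather than as an index error on the first document.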

Part 3: Use Pack200 to Shrink Solr Application Size

Part 1: Shrink Solr Application Size
Part 2: Use Proguard to Shrink Solr Application Size
Part 3: Use Pack200 to Shrink Solr Application Size
To reduce the size of the installation file further, I decided to use pack200 to shrink the jars.

Please refer to http://docs.oracle.com/javase/1.5.0/docs/guide/deployment/deployment-guide/pack200.html
https://blogs.oracle.com/manveen/entry/pack200_and_compression_through_ant

This reduces the total size of all jars from 6.02 MB to 4.44 MB, about 26% smaller.

The following is the Ant script to pack all jars:

<property name="jarpack-task.jar" value="C:\pathto\Pack200Task.jar" />
<taskdef name="pack200" classname="com.sun.tools.apache.ant.pack200.Pack200Task" classpath="${jarpack-task.jar}" />
<taskdef name="unpack200" classname="com.sun.tools.apache.ant.pack200.Unpack200Task" classpath="${jarpack-task.jar}" />

<target name="pack.all.jars">
 <ac:foreach target="pack.jar" param="file.name">
  <path>
   <fileset dir="${final.jars.output}" includes="*.jar" />
  </path>
 </ac:foreach>
</target>

<target name="pack.jar" description="Applying the pack utility on jars">
 <basename property="file.basename" file="${file.name}" />
 <echo message="pack ${file.name} to ${final.jars.output}/${file.basename}.pack" />
 <pack200 src="${file.name}" destfile="${final.jars.output}/${file.basename}.pack" stripdebug="true" deflatehint="keep" unknownattribute="pass" keepfileorder="true" />
 <delete file="${file.name}" />
</target>

We can use ANT to unpack these jars:

<target name="unpack.all.jars" >
  <ac:foreach target="unpack.jar" param="file.name">
   <path>
    <fileset dir="${runtime.home}" includes="*.pack" />
    <fileset dir="${runtime.home.lib}" includes="*.pack" />
    <fileset dir="${runtime.solr.war.lib}" includes="*.pack" />
    <fileset dir="${runtime.solr.core.lib}" includes="*.pack" />
   </path>
  </ac:foreach>
 </target>

 <target name="unpack.jar">
  <propertyregex property="file.unpack.name" input="${file.name}" regexp="(.*)\.pack$" select="\1" />
  <echo message="unpack file ${file.name} to ${file.unpack.name}" />
  <unpack200 src="${file.name}" dest="${file.unpack.name}" />
  <delete file="${file.name}" />
 </target>

Or we can use a Windows batch script (a Linux shell script would be analogous):

@ECHO OFF 

echo "Unpack startjetty.jar.pack"
CALL :unpackjar startjetty.jar.pack 

echo "Unpack jars in folder ./lib"
For %%X in (lib\*.pack) do CALL :unpackjar %%X

echo "Unpack jars in folder ./solr.war\WEB-INF\lib"
For %%X in (solr.war\WEB-INF\lib\*.pack) do CALL :unpackjar %%X

echo "Unpack jars in folder ./solr-home\collection1\lib"
For %%X in (solr-home\collection1\lib\*.pack) do CALL :unpackjar %%X

GOTO :EOF

:unpackjar
set packedfile=%1
set unpackedfile=%packedfile:~0,-5%
echo unpack file: %unpackedfile% %packedfile%
unpack200 %packedfile% %unpackedfile%
DEL /Q %packedfile%
GOTO :EOF

@ECHO ON

After all these steps, we compress the application with 7-Zip; the resulting archive is 1,779 KB.
You can view all source code on GitHub:
https://github.com/jefferyyuan/tools/tree/master/ant-scripts/shrink-solr

Part 2: Use Proguard to Shrink Solr Application Size

See: Part 1: Shrink Solr Application Size
4. Use proguard to shrink jar size.
http://proguard.sourceforge.net/
ProGuard detects and removes unused classes, fields, methods, and attributes. It optimizes bytecode and removes unused instructions.

This requires trial and error: run the application, and if it reports NoClassDefFoundError or NoSuchMethodException, add keep rules to the configuration file like the ones below:
-keep public class org.apache.solr.servlet.SolrDispatchFilter
-keepclassmembers public class org.apache.solr.servlet.SolrDispatchFilter {
*;
}
Then repeat the previous steps.

We can use Ant to automate these tasks: the script copies the original jars to a working directory, shrinks them with ProGuard, copies the shrunk jars back into the Solr application, starts the server, and runs some tests.

<taskdef resource="proguard/ant/task.properties"
  classpath="<proguard4.8>\lib\proguard.jar" />
<target name="shrinkJetty" depends="postProcess">
 <proguard configuration="conf-jetty.txt"/>
</target>

<target name="shrinkSolr" depends="postProcess">
 <proguard configuration="conf-solr.txt"/>
</target>

5. Use ANT to shrink XML files
First, remove all unneeded configuration from solrconfig.xml. Then, because solr.xml, web.xml, and the XML files under /conf are very verbose, strip their comments and whitespace.

<target name="shrinkRuntimeXMLs">
 <path id="orignal.xmls.path">
  <fileset dir="${runtime.solr.core.home}/conf/">
   <include name="*.xml" />
  </fileset>
 </path>
 <property name="orignal.xmls" refid="orignal.xmls.path" />
 <echo message="orignal.xmls: ${orignal.xmls}" />
 <ac:foreach list="${orignal.xmls};${runtime.solr.home}/solr.xml;${runtime.solr.war}/WEB-INF/web.xml" delimiter=";" param="original.xml" target="shrinkXml"/>
</target>
<target name="shrinkXml">
 <basename property="original.xml.basename" file="${original.xml}"/>
 <dirname property="original.xml.dirname" file="${original.xml}"/>
 <echo message="original.xml: ${original.xml}"/>
 <xslt basedir="${original.xml.dirname}" includes="${original.xml.basename}" destdir="${shrinked.xml.output}"
   extension=".xml" style="shrink.xslt"/>
 <move file="${shrinked.xml.output}/${original.xml.basename}" tofile="${original.xml}"/>
</target>

shrink.xslt:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:strip-space elements="*"/>
 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="@*"/>
   <xsl:apply-templates/>
  </xsl:copy>
 </xsl:template>
 <xsl:template match="comment()"/>
</xsl:stylesheet>
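
To try the stylesheet outside of Ant, a small Java program can apply the same rules with the JDK's built-in XSLT processor. This is a sketch under assumptions: the class name XmlShrinker is hypothetical, the stylesheet is embedded as a string and declared as version 1.0 (the version the built-in processor implements), and the comment template gets an explicit priority so it does not depend on template order.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XmlShrinker {
  // Same idea as shrink.xslt: copy everything, strip whitespace-only text, drop comments.
  private static final String STYLESHEET =
      "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:strip-space elements=\"*\"/>"
      + "<xsl:template match=\"node()|@*\">"
      + "<xsl:copy><xsl:apply-templates select=\"node()|@*\"/></xsl:copy>"
      + "</xsl:template>"
      // Explicit priority so the comment rule always wins over the copy rule.
      + "<xsl:template match=\"comment()\" priority=\"1\"/>"
      + "</xsl:stylesheet>";

  /** Returns the XML with comments removed; whitespace-only text should be stripped too. */
  static String shrink(String xml) {
    try {
      Transformer transformer = TransformerFactory.newInstance()
          .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
      transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
      StringWriter out = new StringWriter();
      transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
      return out.toString();
    } catch (TransformerException e) {
      throw new IllegalStateException("XSLT transformation failed", e);
    }
  }

  public static void main(String[] args) {
    System.out.println(shrink("<a>\n  <!-- note -->\n  <b attr=\"1\">x</b>\n</a>"));
  }
}
```

Note that this sketch omits the XML declaration from the output, which the Ant xslt task does not do by default.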

6. Use ANT to shrink property files
We can also remove comments and empty lines from property files.

<target name="shrinkPropertyFiles">
 <path id="orignal.properties.path">
  <fileset dir="${runtime.solr.core.home}/conf/">
   <include name="*.txt" />
  </fileset>
 </path>
 <property name="orignal.properties" refid="orignal.properties.path" />
 <ac:foreach list="${orignal.properties}" delimiter=";" param="original.propertyFile" target="shrinkPropertyFile"/>
</target>
<target name="shrinkPropertyFile">
 <replaceregexp file="${original.propertyFile}"
     match="^#.*"
     replace=""
     byline="true"/>
 <copy file="${original.propertyFile}" toFile="${original.propertyFile}-tmp">
  <filterchain>
   <ignoreblank/>
  </filterchain>
 </copy>
 <move file="${original.propertyFile}-tmp" tofile="${original.propertyFile}"/>
</target>
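
The same two rules can be sketched in plain Java, which is handy for checking what the target will remove before running it on real files. The class and method names are hypothetical, and treating whitespace-only lines as blank is an assumption about how the blank-line filter behaves.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PropertyFileShrinker {
  /** Drops comment lines (starting with '#') and blank lines, keeps everything else. */
  static List<String> shrink(List<String> lines) {
    List<String> kept = new ArrayList<>();
    for (String line : lines) {
      if (line.startsWith("#")) {
        continue; // same rule as the regexp ^#.* in the Ant target
      }
      if (line.trim().isEmpty()) {
        continue; // assumption: whitespace-only lines count as blank
      }
      kept.add(line);
    }
    return kept;
  }

  public static void main(String[] args) {
    List<String> lines = Arrays.asList("# stop words", "", "a", "  ", "the");
    System.out.println(shrink(lines)); // prints [a, the]
  }
}
```

For a real file, the input could come from java.nio.file.Files.readAllLines and the result could be written back with Files.write.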

After all these steps, we reduce the whole application from 16.6 MB to 7.3 MB.
You can view all source code from github:
https://github.com/jefferyyuan/tools/tree/master/ant-scripts/shrink-solr

Part 1: Shrink Solr Application Size

We want to run the Solr application on the client side. Clients need to download and install it, so we should do our best to reduce the application's size.

From a high-level architecture view, we run solr.war in embedded Jetty.

1. Reduce Jetty Jar Size
Refer to: http://wiki.eclipse.org/Jetty/Tutorial/Jetty_HelloWorld
We only need to download jetty-all-8.1.8.v20121106.jar (or another Jetty version) from
http://repo1.maven.org/maven2/org/eclipse/jetty/aggregate/jetty-all/8.1.8.v20121106/
Then download the servlet API jar from http://repo1.maven.org/maven2/javax/servlet/servlet-api/3.0-alpha-1/

jetty-all-8.1.8.v20121106.jar is 1,785 KB and servlet-api-3.0.jar is 196 KB, 1,981 KB in total.

Because we only run a servlet in our embedded Jetty, some features are not needed, so we can reduce the Jetty size further.
http://stackoverflow.com/questions/4223597/libraries-for-embedding-jetty

So we download jetty-distribution-8.1.8.v20121106 from the Eclipse Jetty site and keep only the following 9 jars:
jetty-http-8.1.8.v20121106.jar
jetty-io-8.1.8.v20121106.jar
jetty-security-8.1.8.v20121106.jar
jetty-server-8.1.8.v20121106.jar
jetty-servlet-8.1.8.v20121106.jar
jetty-util-8.1.8.v20121106.jar
jetty-webapp-8.1.8.v20121106.jar
jetty-xml-8.1.8.v20121106.jar
servlet-api-3.0.jar

Copy them to a temporary directory, unzip them all into the current directory, then zip just the javax and org directories into a new jar, jetty.min-8.1.8.jar. Its size is 1,297 KB, a decrease of about 0.7 MB.

2. Reduce Solr.war size
Download apache-solr-4.1-2012-11-17_23-18-40.zip from https://builds.apache.org/job/Solr-Artifacts-4.x/lastSuccessfulBuild/artifact/solr/package/.

Size of apache-solr-4.1-2012-11-17_23-18-40.war is 14,732 KB.

Our Solr application uses DataImportHandler to fetch the index periodically from a remote Solr server, and provides HTTP services (/solr/select) to local clients.
So we remove all unneeded files from solr.war:
the css, img, js, META-INF, and tpl folders, plus admin.html, favicon.ico, and WEB-INF\weblogic.xml.

The next big step is to remove unneeded jars from WEB-INF\lib.
Solr is not well modularized: for example, even if we don't use the spatial search function, we can't simply remove spatial4j-0.3.jar.

So each time we try removing one jar, start the server, and run our tests. If the tests pass, we remove the jar for good; if not, we keep it.

For our application, we can remove lucene-analyzers-kuromoji, lucene-grouping, lucene-memory, lucene-spatial, commons-cli, commons-lang, commons-codec, wstx-asl, httpmime, guava.

As I don't use the SolrCloud function, I thought I could remove zookeeper-3.4.5.jar, but after removing it, Solr reports an exception:
SEVERE: null:java.lang.NoClassDefFoundError: org/apache/zookeeper/KeeperException
at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:315)
So I removed all classes from zookeeper.jar except the KeeperException-related classes.

This step saves 8.63 MB.

3. Reduce size of Solr.Home
In Solr.Home, we keep only the modules (jars) we need, such as apache-solr-dataimporthandler.jar, and remove all unnecessary files from \\conf.

You can view all source code from github:
https://github.com/jefferyyuan/tools/tree/master/ant-scripts/shrink-solr

How Solr 4.0 Resolves Library Path

A core in Solr 4.0 loads libraries from <solr_home>/<core_home>/lib, so the simplest way is to put all your jars in that directory.

You can also specify jars in other directories in each core's solrconfig.xml. Just remember that a relative path (we always use relative paths) is resolved relative to the current core home, <solr_home>/<core_home>, not relative to the directory that contains solrconfig.xml:

In solrconfig.xml, all directories and paths are resolved relative to the instanceDir.
I should have read the document more carefully :)

You can see this in the Solr code:
org.apache.solr.core.SolrResourceLoader:
The path is resolved relative to the current instance directory:
void addToClassLoader(final String baseDir, final FileFilter filter) {
File base = FileUtils.resolvePath(new File(getInstanceDir()), baseDir);
this.classLoader = replaceClassLoader(classLoader, base, filter);
}

org.apache.solr.core.SolrConfig.initLibs() reads all lib directives and resolves each path relative to the current instanceDir.

In org.apache.solr.core.SolrResourceLoader.SolrResourceLoader(String, ClassLoader, Properties):
addToClassLoader("./lib/", null);
It adds all jars in <core_home>/lib to the classloader.
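
The effect of resolving against the instance directory can be illustrated with plain java.nio paths. This is not Solr code: the class LibPathDemo, the method resolveLib, and the directory names are hypothetical and only mimic the resolution rule described above.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class LibPathDemo {
  /** Resolves a lib directory the way a path relative to the instanceDir would be resolved. */
  static Path resolveLib(Path instanceDir, String libDir) {
    // An absolute libDir is returned unchanged by resolve(); a relative one
    // is appended to instanceDir, then "." and ".." segments are collapsed.
    return instanceDir.resolve(libDir).normalize();
  }

  public static void main(String[] args) {
    Path instanceDir = Paths.get("solr-home", "collection1");
    // Relative to the core home, not to collection1/conf where solrconfig.xml lives.
    System.out.println(resolveLib(instanceDir, "./lib"));
    System.out.println(resolveLib(instanceDir, "../lib"));
  }
}
```

With these inputs, "./lib" ends up under solr-home/collection1 and "../lib" ends up directly under solr-home, which is the behavior to keep in mind when writing lib directives.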

Ways to specify datadir - index directory
1. In solr.xml:
http://wiki.apache.org/solr/CoreAdmin

 <core name="collection1" instanceDir="collection1" dataDir="C:/jeffery/environment/solr4.1/solr4.1-index">

Or:

 <core name="collection1" instanceDir="collection1">  
      <property name="dataDir" value="C:/jeffery/environment/solr4.1/solr4.1-index" />  
 </core>

org.apache.solr.core.CoreContainer
org.apache.solr.core.CoreContainer.load(String, InputSource)
Here, we can see how it parses solr.xml.

2. In solrconfig.xml (the previous ways are better).
http://wiki.apache.org/solr/SolrConfigXml

 <dataDir>C:/jeffery/environment/solr4.1/solr4.1-index</dataDir>

Linking an external folder as source or output folder in Eclipse

Today, I read this post:
Linking an external folder to Eclipse/Flex Builder Project, which explains how to add an external folder as a source folder in Eclipse.
1. Right click on the project -> new -> Folder.
2. Click on the Advanced button.
3. Check the checkbox with label : "Link to folder in the file system".
4. Then set it as source folder.

This helped me solve a problem that has been bothering me lately:
I want to save some projects to Google Drive, so I can open and synchronize them on computers at work or at home.
Obviously I don't want to upload the built classes to Google Drive, as the class files change frequently, and Google Drive doesn't provide a simple way to exclude some folders from synchronization.

Now I can solve my problem with the approach above:
1. Create a folder bin that links to a folder in the file system, outside the synchronized directory.
2. Change the project's output folder to the newly created folder.

Source:
http://deviltechie.wordpress.com/2010/12/06/linking-an-external-folder-to-eclipse-flex-builder-project

Do Two Integer Arrays Come from One Sequence?

Question:

There is a list of int arrays in which each array describes the previous one.

For example, if the first array is [1], the next array is [1, 1], meaning the previous array has one 1; the array after that is [2, 1], meaning the previous array has two 1s; and so on.

1 1

2 1

1 2 1 1

1 1 1 2 2 1

3 1 2 2 1 1

So the question is: given two int arrays A and B, determine whether they come from one sequence. In other words, can you derive B from A, or A from B?
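
The "describe the previous array" step can be sketched as a small function that emits a (count, value) pair for each run of equal values. The class name NextArray and the method next are hypothetical and are not part of the solution code further below, which only needs the reverse direction.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class NextArray {
  /** Describes the given array as (count, value) pairs, one pair per run of equal values. */
  static List<Integer> next(List<Integer> current) {
    List<Integer> result = new ArrayList<>();
    int i = 0;
    while (i < current.size()) {
      int value = current.get(i);
      int count = 0;
      // Count how many times the same value repeats consecutively.
      while (i < current.size() && current.get(i) == value) {
        count++;
        i++;
      }
      result.add(count);
      result.add(value);
    }
    return result;
  }

  public static void main(String[] args) {
    List<Integer> row = Arrays.asList(1);
    for (int step = 0; step < 5; step++) {
      row = next(row);
      System.out.println(row);
    }
  }
}
```

Starting from [1], the loop in main prints the five rows listed above, from [1, 1] to [3, 1, 2, 2, 1, 1].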

Answer:

It seems that we could start from A and compute its next array, then that array's next array, and so on; if one of them equals B, we return true.

But the problem is: if A and B don't come from one sequence, when do we stop?

We don't know.

Instead, we can think in reverse: start from A and compute A's previous array. If it equals B, we return true; if not, we continue. But again, when do we stop?

The first stopping condition: we can't derive the previous array from the current array.

Two cases:

1. When the current int array has an odd number of elements, we stop, as it's impossible to get its previous array. The reason is simple: each pair (a[i], b[i]) describes one run of the previous array, so if the current array is (a[0], b[0]) .. (a[n], b[n]), a[n+1], the trailing a[n+1] can't describe anything.

2. When the current int array has an even number of elements but contains an invalid pair, such as (0, 1).

The second stopping condition: if we go back from A to its parent, then its parent's parent, and so on, and reach A again, continuing would loop forever. In this case we should return false. Why?

A's parent array A[p'] is unique, and A[p']'s parent A[p''] is also unique.

A[p'']

A[p']

A <--

...

A[p'']

A[p']

A <--

So the whole array sequence is a loop. If we search backward from A and meet A again without meeting B on the way, B is not in the sequence.

Also remember that the process above only determines whether B comes before A in the sequence; we still need to check whether A comes before B.

Code:

The complete algorithm/test code and also many other algorithm problems and solutions are available from https://github.com/jefferyyuan/myAlgorithms.

package org.codeexample.jefferyyuan.sameSequence;
import java.util.ArrayList;
import java.util.List;
import org.codeexample.common.Utils;

public class AlgorithmSameSequnce {
 
    /**
     * See
     * http://programer-tips.blogspot.com/2011/08/two-integer-arrays-from-same-sequence.html
     *
     * @param arrayA the first array
     * @param arrayB the second array
     * @return true if one array can be derived from the other in one sequence
     */
    public static boolean isInSameSequnce(int[] arrayA, int[] arrayB) {
        return isInSameSequnce(Utils.toList(arrayA), Utils.toList(arrayB));
    }
 
    /**
     * See
     * http://programer-tips.blogspot.com/2011/08/two-integer-arrays-from-same-sequence.html
     */
    public static boolean isInSameSequnce(List<Integer> listA,
            List<Integer> listB) {
        List<Integer> listACopy = new ArrayList<Integer>(listA);
        if (isInSameSequnceImpl(listA, listACopy, listB))
            return true;
        List<Integer> listBCopy = new ArrayList<Integer>(listB);
        return isInSameSequnceImpl(listB, listBCopy, listA);
    }
 
    private static boolean isInSameSequnceImpl(List<Integer> listA,
            List<Integer> interim, List<Integer> listB) {
        List<Integer> previous = getPrevious(interim);
        if (previous.equals(listB))
            return true;
        // meet listA again
        if (previous.equals(listA))
            return false;
        if (previous.isEmpty())
            return false;
        return isInSameSequnceImpl(listA, previous, listB);
    }
 
    /**
     * Return the previous array, for example, the previous array of [2, 1]
     * would be [1, 1], the previous of [1, 2, 1, 1] would be [2, 1].
     * <p>
     * If the list is invalid or its previous array can't be derived, return an
     * empty list.
     *
     * @param list the current array
     * @return the previous array, or an empty list if there is none
     */
    private static List<Integer> getPrevious(List<Integer> list) {
        ArrayList<Integer> result = new ArrayList<Integer>();
        // if the list has odd number, return empty list;
        if (list.size() % 2 == 1)
            return result;
 
        for (int i = 0; i <= list.size() - 2;) {
            int times = list.get(i++);
 
            // no previous row for input [0, 1],
            if (times == 0)
                return new ArrayList<Integer>();
            int digit = list.get(i++);
            for (int j = 0; j < times; j++) {
                result.add(digit);
            }
        }
        return result;
    }
}

Is Successive Array?

Question:

Given an unordered int list in which each number except 0 can appear only once, and 0 can be regarded as any number, determine whether the numbers in the list are logically successive.

For example,

3, 2, 1 would be considered as successive.

0, 3, 1 is also successive, as 0 can be regarded as 2.

0, 0, 3, 1 is also successive, as 0, 0 can be regarded as (0, 2), or (2, 4).

Answer:

First, simplify the question: suppose 0 can appear at most once and can't be regarded as any other number. How do we determine whether the array is successive?

In this case, we get the maximum and minimum of the array; if (max - min) = (length of the array - 1), the array is successive. This is very straightforward.

Back to the original problem: suppose the length of the array is n and it contains x zeros, and thus n-x non-zeros. Taking max and min over the non-zero values, the zeros can fill at most x gaps between min and max, so we get the inequality: if (max - min) <= (n - 1), the array is successive.

0, 3, 1 ==> (3-1) = len -1 = 2

0, 0, 3, 1 ==> (3-1) < len - 1 = 3

Code:

The complete algorithm/test code and also many other algorithm problems and solutions are available from https://github.com/jefferyyuan/myAlgorithms.

package org.codeexample.jefferyyuan.successiveArray;

public class SuccessiveArray {
  
  /**
   * See http://programer-tips.blogspot.com/2011/08/is-array-successive.html
   * 
   * Determine whether the unordered array is logically successive, 0 can be
   * regarded as any number. For example, [0, 3, 1] is successive, as 0 can be
   * regarded as 2.
   * 
   * @param array the array to check; non-zero values are assumed to be distinct
   * @return true if the array is logically successive
   */
  public static boolean isArraySuccessive(int[] array) {
    int min = Integer.MAX_VALUE, max = Integer.MIN_VALUE;
    
    for (int i = 0; i < array.length; i++) {
      int temp = array[i];
      if (temp == 0) continue;
      if (temp < min) {
        min = temp;
      }
      if (temp > max) {
        max = temp;
      }
    }
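    // No non-zero values: every 0 is a wildcard, and (max - min) would overflow.
    if (min > max) return true;
    return (max - min) <= (array.length - 1);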
  }
}
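
The inequality only holds if the non-zero values really are distinct, as the question promises. If the input can't be trusted, a variant could verify that as well. The sketch below is an assumption-laden alternative, not the original solution: the class name SuccessiveArrayChecked is hypothetical, and it treats an array without non-zero values as successive.

```java
import java.util.HashSet;
import java.util.Set;

class SuccessiveArrayChecked {
  /** Like the max/min check, but also rejects duplicate non-zero values. */
  static boolean isSuccessive(int[] array) {
    Set<Integer> seen = new HashSet<>();
    int min = Integer.MAX_VALUE;
    int max = Integer.MIN_VALUE;
    for (int value : array) {
      if (value == 0) {
        continue; // 0 is a wildcard
      }
      if (!seen.add(value)) {
        return false; // duplicate non-zero value
      }
      min = Math.min(min, value);
      max = Math.max(max, value);
    }
    if (seen.isEmpty()) {
      return true; // assumption: only wildcards counts as successive
    }
    // Use long arithmetic so that extreme values cannot overflow.
    return (long) max - min <= array.length - 1;
  }

  public static void main(String[] args) {
    System.out.println(isSuccessive(new int[] {0, 0, 3, 1})); // prints true
    System.out.println(isSuccessive(new int[] {1, 1, 0}));    // prints false
  }
}
```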

Implementation Variants of Singleton Pattern

Singleton is one of the most common design patterns, and it has many implementation variants: lazy instantiation, eager instantiation, the static holder idiom, etc. The static holder idiom is my favorite.

package org.codeexample.jefferyyuan.javacode.singletons;

import java.io.Serializable;

public class SinletonVariants {}

/**
 * The instance is not created when the singleton class itself is referenced;
 * it is created only when the nested holder class is initialized, on the first
 * call to getInstance(). Java also guarantees that class initialization is
 * thread-safe.
 * 
 * So using the static holder idiom, we combine the benefit of lazy
 * instantiation with no further synchronization after the instance is created.
 * 
 * My favorite, always use this one.
 */
class SingletonHolderIdiom {
  private SingletonHolderIdiom() {}
  
  private static class SingletonHolder {
    private static final SingletonHolderIdiom instance = new SingletonHolderIdiom();
  }
  
  public static SingletonHolderIdiom getInstance() {
    return SingletonHolder.instance;
  }
}

/**
 * To maintain the singleton guarantee, you have to declare all instance fields
 * transient and provide a readResolve method that directly returns the static
 * instance; you must also use eager instantiation.
 * 
 * See Effective Java 2nd Edition, Item 3: Enforce the singleton property with a
 * private constructor or an enum type
 */
class SerializableSingleton implements Serializable {
  private static final long serialVersionUID = 1L;
  private static SerializableSingleton instance = new SerializableSingleton();
  
  private SerializableSingleton() {}
  
  public static SerializableSingleton getInstance() {
    return instance;
  }
  
  // readResolve method to preserve singleton property
  private Object readResolve() {
    return instance;
  }
}

/**
 * This variant avoids the drawback of eager instantiation, as no resources are
 * allocated before the instance is actually accessed, but further
 * synchronization might seem unnecessary and expensive after the instance is
 * already constructed.
 * 
 */
class SingletonLazyInstantiation {
  private static SingletonLazyInstantiation instance;
  
  private SingletonLazyInstantiation() {}
  
  public static synchronized SingletonLazyInstantiation getInstance() {
    if (instance == null) {
      instance = new SingletonLazyInstantiation();
    }
    return instance;
  }
}

/**
 * This initializes the singleton eagerly, when the class is initialized for
 * the first time. Thus, it may happen that the singleton instance is constructed
 * even if it is never accessed. This is a drawback, especially when the
 * construction is complex and time/resource consuming. The good part of this
 * variant is its simplicity.
 * 
 */
class SingletonEagerInstantiation {
  private static SingletonEagerInstantiation instance = new SingletonEagerInstantiation();
  
  private SingletonEagerInstantiation() {}
  
  public static SingletonEagerInstantiation getInstance() {
    return instance;
  }
}
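
The Effective Java item cited for SerializableSingleton also names a second option, the enum type. A minimal sketch of that variant follows; the names EnumSingleton, nextId, and roundTrip are hypothetical, and roundTrip exists only to show that serialization gives back the same instance without any readResolve method.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

enum EnumSingleton {
  INSTANCE;

  private int lastId;

  /** Example of instance state held by the singleton. */
  synchronized int nextId() {
    return ++lastId;
  }

  /** Serializes and deserializes an object in memory. */
  static Object roundTrip(Object value) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(value);
      }
      try (ObjectInputStream in =
          new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return in.readObject();
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException("round trip failed", e);
    }
  }
}
```

Callers use EnumSingleton.INSTANCE directly. The trade-off is that an enum cannot extend another class, so this variant does not fit every design.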

Learning Java Integer Code and Puzzles

1. Question - What would the program output?

int i = 127;
boolean b = (Integer.valueOf(i) == Integer.valueOf(i));
System.err.println(b);
i = 128;
b = (Integer.valueOf(i) == Integer.valueOf(i));
System.err.println(b);

The answer is clear once we look at the source code of Integer.valueOf:

public static Integer valueOf(int i) {
    if(i >= -128 && i <= IntegerCache.high)
        return IntegerCache.cache[i + 128];
    else
        return new Integer(i);
}

We can see that it doesn't cache all Integer values, as this would consume too much memory; by default it only caches the numbers in [-128, 127] in the static IntegerCache class. For other int values, it returns a new Integer each time. Class Long also caches only the numbers in [-128, 127].

So the output of the program above is true and false.
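
The practical consequence is to compare wrapper objects with equals (or unbox them first) rather than with ==. A small sketch, with hypothetical class and method names:

```java
class IntegerCacheDemo {
  /** Reference comparison: only reliable for values inside the cache range. */
  static boolean sameReference(int i) {
    return Integer.valueOf(i) == Integer.valueOf(i);
  }

  /** Value comparison: works for every int. */
  static boolean sameValue(int i) {
    return Integer.valueOf(i).equals(Integer.valueOf(i));
  }

  public static void main(String[] args) {
    System.out.println(sameReference(127)); // prints true, 127 is always cached
    System.out.println(sameValue(128));     // prints true, equals compares the int values
    // Typically false, but depends on how large the cache is configured to be.
    System.out.println(sameReference(128));
  }
}
```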

2. Autoboxing and Auto-unboxing

2.1 How is it implemented in JDK?

Simply put, when we write "Integer wrapper = 2;", the Java compiler translates it to "Integer wrapper = Integer.valueOf(2);".

When we write "int i = wrapper;", the Java compiler translates it to "int i = wrapper.intValue();".

You can verify this by using javap to look at the compiled Java class: javap -c IntegerTester.

Long.valueOf(0L).equals(0)?

2.2 What would be the output of the following program?

Long l = 0L;
Integer i = 0;
System.out.println(l.equals(i));
System.out.println(i.equals(l));

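After autoboxing, the first two lines of the program above are the same as Long l = Long.valueOf(0L); and Integer i = Integer.valueOf(0);, so each equals call receives an object of a different wrapper type.

So let's look at the JDK source code again: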

public final class Integer extends Number 
  implements Comparable<Integer> {
    public boolean equals(Object obj) {
    if (obj instanceof Integer) {
        return value == ((Integer)obj).intValue();
    }
    return false;
    }
}
public final class Long extends Number 
  implements Comparable<Long> {
    public boolean equals(Object obj) {
      if (obj instanceof Long) {
          return value == ((Long)obj).longValue();
      }
      return false;
    }
}

From the code, we can see that if the argument is not of the same type, these methods return false.

So the output of the program above is false, false.
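
To compare a Long and an Integer by numeric value, unbox both to the same primitive type first. The helper below is a hypothetical illustration; returning false for null arguments is an arbitrary choice made for this sketch.

```java
class WrapperEquality {
  /** Compares a Long and an Integer by numeric value instead of by wrapper type. */
  static boolean sameNumber(Long l, Integer i) {
    if (l == null || i == null) {
      return false; // assumption for this sketch: null never matches
    }
    // Both sides are widened to long before the comparison.
    return l.longValue() == i.longValue();
  }

  public static void main(String[] args) {
    Long l = 0L;
    Integer i = 0;
    System.out.println(l.equals(i));      // prints false, different wrapper types
    System.out.println(sameNumber(l, i)); // prints true, same numeric value
  }
}
```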

Autoboxing and auto-unboxing are good features: they convert between primitive and wrapper types as needed, so we can write less code. But we should use them carefully, because they may create objects under the hood; if we are unaware of this, it may cause a big performance penalty.

public void test(Integer i) {
    while (i < 0) {
        --i;
    }
}

In the program above, each iteration actually does this: it calls i.intValue(), subtracts one, and then boxes the result into an Integer again.

For the JVM, it is the same as:

while (i.intValue() < 0) {
    i = Integer.valueOf(i.intValue() - 1);
}

So in each iteration, we unnecessarily box one Integer value and call intValue() twice.

Try using javap to look at the compiled class file.

Knowing this, we can change our code as below, and it will be faster.

public void test(Integer i) {
      int j = i;
      while (j < 0) {
           --j;
      }
      i = j;
}

In our code, it's better to use primitive types as often as possible.

The Integer class has many other interesting and useful methods; some are listed below. Cool, and a little confusing? Try to figure them out.

public final class Integer extends Number 
  implements Comparable<Integer> {
    public static int reverseBytes(int i) {
        return ((i >>> 24)           ) |
               ((i >>   8) &   0xFF00) |
               ((i <<   8) & 0xFF0000) |
               ((i << 24));
    }
    public static int reverse(int i) {
        // HD, Figure 7-1
        i = (i & 0x55555555) << 1 | (i >>> 1) & 0x55555555;
        i = (i & 0x33333333) << 2 | (i >>> 2) & 0x33333333;
        i = (i & 0x0f0f0f0f) << 4 | (i >>> 4) & 0x0f0f0f0f;
        i = (i << 24) | ((i & 0xff00) << 8) |
            ((i >>> 8) & 0xff00) | (i >>> 24);
        return i;
    }
    public static int bitCount(int i) {
        // HD, Figure 5-2
        i = i - ((i >>> 1) & 0x55555555);
        i = (i & 0x33333333) + ((i >>> 2) & 0x33333333);
        i = (i + (i >>> 4)) & 0x0f0f0f0f;
        i = i + (i >>> 8);
        i = i + (i >>> 16);
        return i & 0x3f;
    }
}
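
One way to figure these methods out is to write the obvious bit-by-bit versions and compare the results with the JDK methods. The naive implementations below are hypothetical helpers written for this comparison; they are slower but easier to read.

```java
class BitTricksCheck {
  /** Counts set bits one at a time. */
  static int naiveBitCount(int i) {
    int count = 0;
    while (i != 0) {
      count += i & 1;
      i >>>= 1; // unsigned shift, so negative numbers terminate too
    }
    return count;
  }

  /** Reverses all 32 bits by moving the lowest bit of i into the result, one bit per step. */
  static int naiveReverse(int i) {
    int result = 0;
    for (int bit = 0; bit < 32; bit++) {
      result = (result << 1) | (i & 1);
      i >>>= 1;
    }
    return result;
  }

  /** Swaps the four bytes: byte 0 goes to byte 3, byte 1 to byte 2, and so on. */
  static int naiveReverseBytes(int i) {
    return ((i & 0xFF) << 24)
        | ((i & 0xFF00) << 8)
        | ((i >>> 8) & 0xFF00)
        | (i >>> 24);
  }

  public static void main(String[] args) {
    int sample = 0x12345678;
    System.out.println(naiveBitCount(sample) == Integer.bitCount(sample));
    System.out.println(naiveReverse(sample) == Integer.reverse(sample));
    System.out.println(Integer.toHexString(naiveReverseBytes(sample))); // prints 78563412
  }
}
```

The JDK versions get the same results with a fixed number of operations by working on all bit pairs, nibbles, and bytes in parallel, which is what the masks 0x55555555, 0x33333333, and 0x0f0f0f0f are for.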

Reference:

The Advantages and Traps of Autoboxing
