Writing a REST API to Export Data as a CSV File - Jersey

The Problem
Sometimes we need to support exporting data as a CSV file, so people can download it, open it in Excel, and do some analysis.

The Data
The data we want to export is as below:
public class SurveyAnswers {
    public String surveyId;
    // some other metadata about the survey and the user
    public List<QuestionAnswer> answers;
}

// the user's choice for one question
public class QuestionAnswer {
    public int questionId;
    public int optionId;
    // other metadata about the question and the option
}

The CSV file would be like this:
SurveyId, dateTime, /* fields about user metadata */, QuestionId1, AnswerId1, QuestionId2, AnswerId2...

The difficult part is that the field headers are not fixed, because each survey has a different list of QuestionAnswers.
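Since the headers depend on the survey's questions, one way to handle this is to generate the header array per survey. A minimal sketch (the fixed prefix headers and the `QuestionIdN`/`AnswerIdN` naming scheme are illustrative, taken from the CSV layout above):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: build the per-survey header array from the number of questions.
// The prefix headers ("SurveyId", "dateTime") mirror the CSV layout above.
class SurveyCsvHeaders {
    public static String[] buildHeaders(final int questionCount) {
        final List<String> headers = new ArrayList<>(Arrays.asList("SurveyId", "dateTime"));
        for (int i = 1; i <= questionCount; i++) {
            headers.add("QuestionId" + i);
            headers.add("AnswerId" + i);
        }
        return headers.toArray(new String[0]);
    }
}
```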

Choose a CSV Library
I chose Super CSV because we want to use its CsvMapWriter. We will convert the data to a HashMap containing fields like QuestionId1, AnswerId1, QuestionId2, AnswerId2, etc.

Read more about Super CSV at Writing CSV files.

Super CSV's CsvDozerBeanWriter (which supports deep and indexed field mapping) could be another option for our use case.

CsvBeanWritable Interface
In most cases, we are going to implement this interface.
/**
 * Check: http://stackoverflow.com/questions/21942042/using-supercsv-to-change-header-values
 */
public interface CsvBeanWritable {
    /**
     * Header in the csv file - it can be anything, such as:
     * new String[] { "First Name", "Last Name", "Birthday" };
     */
    public String[] getCsvHeader();

    /**
     * The mapping of bean fields to csv fields - order matters:
     * new String[] { "firstName", "lastName", "birthDate" };
     */
    public String[] getCsvMapping();
}

CsvMapWritable Interface
When the logic is complex, as in our case, we can't use CsvBeanWriter; instead we implement the CsvMapWritable interface.
CsvMessageBodyWriter will use Super CSV's CsvMapWriter to write the object as csv data.
public interface CsvMapWritable {
    public Map<String, Object> getCsvBody();
    public String[] getCsvHeader();
}
CsvMessageBodyWriter - the Provider
CsvMessageBodyWriter will marshal an object as csv data if the object implements CsvBeanWritable or CsvMapWritable, or if the object is a collection of CsvBeanWritable or CsvMapWritable.

We need to register it in the ResourceConfig or tell Jersey to scan the package to find it.
@Provider
@Produces({CsvMessageBodyWriter.TEXT_CSV, CsvMessageBodyWriter.APPLICATION_EXCEL})
public class CsvMessageBodyWriter<T> implements MessageBodyWriter<T> {
    public static final String TEXT_CSV = "text/csv";
    public static final String APPLICATION_EXCEL = "application/vnd.ms-excel";

    @Override
    public boolean isWriteable(final Class<?> type, final Type genericType, final Annotation[] annotations,
            final MediaType mediaType) {
        return true;
    }

    @Override
    public long getSize(final T data, final Class<?> type, final Type genericType, final Annotation[] annotations,
            final MediaType mediaType) {
        return -1;
    }

    @Override
    public void writeTo(final T data, final Class<?> type, final Type genericType, final Annotation[] annotations,
            final MediaType mediaType, final MultivaluedMap<String, Object> httpHeaders,
            final OutputStream entityStream) throws java.io.IOException, javax.ws.rs.WebApplicationException {
        try (AbstractCsvWriter csvWriter = getCsvWriter(data, entityStream)) {
            writeData(data, csvWriter);
        }
    }

    private AbstractCsvWriter getCsvWriter(final T data, final OutputStream entityStream) {
        AbstractCsvWriter csvWriter = null;
        if (data instanceof CsvBeanWritable) {
            csvWriter = new CsvBeanWriter(new OutputStreamWriter(entityStream), CsvPreference.STANDARD_PREFERENCE);
        } else if (data instanceof CsvMapWritable) {
            csvWriter = new CsvMapWriter(new OutputStreamWriter(entityStream), CsvPreference.STANDARD_PREFERENCE);
        } else if (data instanceof Collection) {
            final Collection<?> collection = (Collection<?>) data;
            csvWriter = getCsvWriterFromCollection(collection, entityStream);
        }
        return csvWriter;
    }

    protected void writeData(final T data, final AbstractCsvWriter csvWriter) throws IOException {
        if (data instanceof CsvBeanWritable) {
            final CsvBeanWritable writable = (CsvBeanWritable) data;
            csvWriter.writeHeader(writable.getCsvHeader()); // write the header row first
            ((CsvBeanWriter) csvWriter).write(writable, writable.getCsvMapping());
        } else if (data instanceof CsvMapWritable) {
            final CsvMapWritable writable = (CsvMapWritable) data;
            csvWriter.writeHeader(writable.getCsvHeader());
            ((CsvMapWriter) csvWriter).write(writable.getCsvBody(), writable.getCsvHeader());
        } else if (data instanceof Collection) {
            writeCollection(data, csvWriter);
        } else {
            throw new XXException("doesn't support download as csv");
        }
    }

    protected void writeCollection(final T data, final AbstractCsvWriter csvWriter) throws IOException {
        final Collection<?> collection = (Collection<?>) data;
        boolean first = true;
        final Iterator<?> it = collection.iterator();
        while (it.hasNext()) {
            final Object obj = it.next();
            if (CsvBeanWritable.class.isAssignableFrom(obj.getClass())) {
                final CsvBeanWritable writable = (CsvBeanWritable) obj;
                if (first) {
                    first = false;
                    csvWriter.writeHeader(writable.getCsvHeader()); // header only once
                }
                ((CsvBeanWriter) csvWriter).write(writable, writable.getCsvMapping());
            } else if (CsvMapWritable.class.isAssignableFrom(obj.getClass())) {
                final CsvMapWritable writable = (CsvMapWritable) obj;
                if (first) {
                    first = false;
                    csvWriter.writeHeader(writable.getCsvHeader());
                }
                ((CsvMapWriter) csvWriter).write(writable.getCsvBody(), writable.getCsvHeader());
            } else {
                throw new XXException("doesn't support download as csv");
            }
        }
    }

    protected static AbstractCsvWriter getCsvWriterFromCollection(final Collection<?> collection,
            final OutputStream entityStream) {
        final Iterator<?> it = collection.iterator();
        while (it.hasNext()) {
            final Object obj = it.next();
            if (CsvMapWritable.class.isAssignableFrom(obj.getClass())) {
                return new CsvMapWriter(new OutputStreamWriter(entityStream), CsvPreference.STANDARD_PREFERENCE);
            } else if (CsvBeanWritable.class.isAssignableFrom(obj.getClass())) {
                return new CsvBeanWriter(new OutputStreamWriter(entityStream), CsvPreference.STANDARD_PREFERENCE);
            }
        }
        return null;
    }
}
Add CSV ability to the API
/**
 * @param downloadAsFile: when downloadAsFile=false, it seems not to work in Chrome;
 *        other browsers not fully verified.
 */
@GET
@Produces({MediaType.APPLICATION_JSON, CsvMessageBodyWriter.TEXT_CSV})
public Set<SurveyAnswers> getFlatSurveyChoiceResponse(@QueryParam("surveyId") final String surveyId,
        @QueryParam("downloadAsFile") @DefaultValue("true") final boolean downloadAsFile) {
    if (downloadAsFile) {
        servletResponse.addHeader("Content-Disposition", MessageFormat.format("attachment; filename={0}.csv",
                DateUtil.getThreadLocalDateFormat().format(new Date())));
    }
    return service.getSurveyAnswers(surveyId);
}
Using CsvMapWritable in our SurveyAnswers example

Create the csv headers for this survey and add them into SurveyHeadersHolder:
for (final Integer questionId : questions) {
    // build extraHeaders: the QuestionIdN/AnswerIdN headers for each question
}
SurveyHeadersHolder.INSTANCE.addSurveyHeaders(surveyId, extraHeaders);
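The source doesn't show SurveyHeadersHolder itself; a minimal sketch of how it could be implemented (an assumed enum singleton caching the per-survey header array in a thread-safe map):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Assumed implementation: enum singleton holding surveyId -> header array.
enum SurveyHeadersHolder {
    INSTANCE;

    private final Map<String, String[]> headersBySurvey = new ConcurrentHashMap<>();

    public void addSurveyHeaders(final String surveyId, final String[] headers) {
        headersBySurvey.put(surveyId, headers);
    }

    public String[] getSurveyHeaders(final String surveyId) {
        return headersBySurvey.get(surveyId);
    }
}
```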

Implementing CsvMapWritable in SurveyAnswers
@Override
public Map<String, Object> getCsvBody() {
    final Map<String, Object> map = new LinkedHashMap<>();
    map.put(HEADER_SURVEY_ID, surveyId);
    map.put(HEADER_DATE, DateUtil.getThreadLocalDateFormat().format(date));
    for (final QuestionAnswer answer : answers) {
        map.put(getQuestionIDHeader(answer.getQuestionId()), answer.getQuestionId());
        map.put(getQuestionTitleHeader(answer.getQuestionId()), answer.getQuestionText());
        map.put(getOptionIDHeader(answer.getQuestionId()), answer.getOptionId());
        map.put(getOptionTextHeader(answer.getQuestionId()), answer.getOptionText());
    }
    return map;
}

@Override
public String[] getCsvHeader() {
    return SurveyHeadersHolder.INSTANCE.getSurveyHeaders(surveyId);
}

Implement Circuit Breaker Pattern with Netflix Hystrix

When we design services, it's important to make them resilient and prevent cascading failures.

Circuit Breaker Pattern
From Martin Fowler:
The basic idea behind the circuit breaker is very simple. You wrap a protected function call in a circuit breaker object, which monitors for failures. Once the failures reach a certain threshold, the circuit breaker trips, and all further calls to the circuit breaker return with an error, without the protected call being made at all. Usually you'll also want some kind of monitor alert if the circuit breaker trips.

There are several ways to apply the circuit breaker pattern in Java.
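Before reaching for a library, the idea in Fowler's description can be illustrated with a minimal hand-rolled breaker (a sketch only, not Hystrix and not production code; threshold handling and state names are simplified):

```java
import java.util.function.Supplier;

// Minimal illustration of the pattern: after `threshold` consecutive failures the
// breaker opens and fails fast, returning the fallback without invoking the
// protected call at all. A success closes the circuit again.
class SimpleCircuitBreaker<T> {
    private final int threshold;
    private int consecutiveFailures;

    public SimpleCircuitBreaker(final int threshold) {
        this.threshold = threshold;
    }

    public T call(final Supplier<T> protectedCall, final Supplier<T> fallback) {
        if (consecutiveFailures >= threshold) {
            return fallback.get(); // circuit open: fail fast
        }
        try {
            final T result = protectedCall.get();
            consecutiveFailures = 0; // success resets the failure count
            return result;
        } catch (final RuntimeException e) {
            consecutiveFailures++;
            return fallback.get();
        }
    }
}
```

A real implementation (as Hystrix does) also needs a half-open state that periodically lets one call through to probe whether the dependency has recovered.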

Using Netflix Hystrix
public class GetProductsCommand extends HystrixCommand<Set<Product>> {
  private final GetProductsConfiguration config;
  private final String token;

  public GetProductsCommand(final GetProductsConfiguration config, final String token) {
      super(...); // configure group key, timeouts etc. via HystrixCommand.Setter
      this.config = config;
      this.token = token;
  }

  @Override
  protected Set<Product> run() throws Exception {
      // if it's a client error, throw HystrixBadRequestException:
      // it will not trigger the fallback, is not counted against the failure
      // metrics, and thus does not trip the circuit breaker.
  }

  @Override
  protected Set<Product> getFallback() {
      // return a default/cached value, or rethrow
  }

  public static class GetProductsConfiguration {
      // auto-wire the services that are going to be used by GetProductsCommand
      private int timeoutInMilliseconds;
  }
}

Call HystrixCommand asynchronously, get the result later
private GetProductsConfiguration getProductsConfiguration;

// call it asynchronously
final Future<Set<Product>> productsFuture = new GetProductsCommand(getProductsConfiguration, token).queue();

// later
final Set<Product> products = productsFuture.get();

Propagating ThreadLocal to HystrixCommand
Sometimes the service we are calling expects to be invoked on the same http thread - it expects the thread locals of the current http thread:
requestAttributes = (ServletRequestAttributes) RequestContextHolder.currentRequestAttributes();

We can capture the current requestAttributes and pass them to the HystrixCommand:
public class MyHystrixCommand extends HystrixCommand<Result> {
  private final ServletRequestAttributes requestAttributes;
  private Thread thread;

  public MyHystrixCommand() {
      super(...);
      this.requestAttributes = (ServletRequestAttributes) RequestContextHolder.getRequestAttributes();
      this.thread = Thread.currentThread();
  }

  @Override
  protected Result run() throws Exception {
      try {
          RequestContextHolder.setRequestAttributes(requestAttributes);
          // do something, return the result
      } finally {
          clearThreadLocal();
      }
  }

  private void clearThreadLocal() {
      // only clear when running on a different (hystrix) thread, not the caller's
      if (Thread.currentThread() != thread) {
          RequestContextHolder.resetRequestAttributes();
      }
      thread = null;
  }
}
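The same capture-install-clean-up idea can be shown in plain Java without Spring or Hystrix (a sketch; `CONTEXT` is a stand-in for whatever ThreadLocal the downstream code expects):

```java
import java.util.concurrent.Callable;

// Capture the caller's ThreadLocal value, install it on the worker thread,
// and always remove it in finally so it doesn't leak into pooled threads.
class ThreadLocalPropagation {
    static final ThreadLocal<String> CONTEXT = new ThreadLocal<>();

    static <T> Callable<T> propagating(final Callable<T> task) {
        final String captured = CONTEXT.get(); // read on the calling thread
        return () -> {
            CONTEXT.set(captured);             // install on the executing thread
            try {
                return task.call();
            } finally {
                CONTEXT.remove();              // clean up the pooled thread
            }
        };
    }
}
```

The wrapped Callable can then be submitted to any ExecutorService (which is essentially what Hystrix does under the hood for THREAD isolation).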

Using Spring Cloud Hystrix
Spring cloud wraps Netflix Hystrix to make it easier to use.

First add @EnableCircuitBreaker in spring configuration class.
Then add @HystrixCommand annotation to service methods.
@HystrixCommand(fallbackMethod = "fallBack",
commandProperties = {
        @HystrixProperty(name = "fallback.isolation.semaphore.maxConcurrentRequests", value = "1000"),
        @HystrixProperty(name = "execution.isolation.semaphore.maxConcurrentRequests", value = "1000"),
        @HystrixProperty(name = "execution.isolation.thread.timeoutInMilliseconds", value = "2000")},
ignoreExceptions = {InvalidTokenException.class})
public Set<Product> getProducts() {}

public Set<Product> fallBack() {}

@HystrixProperty(name = "execution.isolation.strategy", value = "SEMAPHORE")
  • THREAD — it executes on a separate thread and concurrent requests are limited by the number of threads in the thread-pool
  • SEMAPHORE — it executes on the calling thread and concurrent requests are limited by the semaphore count
Netflix Hystrix How to Use

Read the Error Message - Problem Solving Skills

The first step of troubleshooting is to read and understand the error message; then we can infer and list the likely causes, and verify or exclude them one by one.

Scenario 1
After upgrading some libraries to newer versions and moving to JDK 8 and Tomcat 8, deployment of the web application fails.

Step 1: Read/Understand the error message
SEVERE [localhost-startStop-1] org.apache.catalina.core.ContainerBase.addChildInternal ContainerBase.addChild: start:
 org.apache.catalina.LifecycleException: Failed to start component [StandardEngine[Catalina].StandardHost[localhost].StandardContext[/webappA]]
        at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:154)
        at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:725)
        at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:701)
        at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:717)
        at org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:945)
        at org.apache.catalina.startup.HostConfig$DeployWar.run(HostConfig.java:1798)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalStateException: Unable to complete the scan for annotations for web application [/webappA] due to a StackOverflowError. Possible root causes include a too low setting for -Xss and illegal cyclic inheritance dependencies. The class hierarchy being processed was [org.bouncycastle.asn1.ASN1Boolean->org.bouncycastle.asn1.DERBoolean->org.bouncycastle.asn1.ASN1Boolean]
        at org.apache.catalina.startup.ContextConfig.checkHandlesTypes(ContextConfig.java:2066)
        at org.apache.catalina.startup.ContextConfig.processAnnotationsStream(ContextConfig.java:2012)
        at org.apache.catalina.startup.ContextConfig.processAnnotationsJar(ContextConfig.java:1961)
        at org.apache.catalina.startup.ContextConfig.processAnnotationsUrl(ContextConfig.java:1936)
        at org.apache.catalina.startup.ContextConfig.processAnnotations(ContextConfig.java:1897)
        at org.apache.catalina.startup.ContextConfig.webConfig(ContextConfig.java:1149)
        at org.apache.catalina.startup.ContextConfig.configureStart(ContextConfig.java:771)
        at org.apache.catalina.startup.ContextConfig.lifecycleEvent(ContextConfig.java:305)
        at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:95)
        at org.apache.catalina.util.LifecycleBase.fireLifecycleEvent(LifecycleBase.java:90)
        at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5080)
        at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
        ... 10 more
In another app, the error message is:
Caused by: java.lang.IllegalStateException: Unable to complete the scan for annotations for web application [/megaphone-admin-1.0.3_jbuild363] due to a StackOverflowError. Possible root causes include a too low setting for -Xss and illegal cyclic inheritance dependencies. The class hierarchy being processed was [org.bouncycastle.asn1.ASN1EncodableVector->org.bouncycastle.asn1.DEREncodableVector->org.bouncycastle.asn1.ASN1EncodableVector]

Step 2: Find which jar contains the class

ls -altr | grep bcprov
-rw-r--r-- 1 tomcat tomcat  2902942 Sep 25 22:01 bcprov-jdk15on-1.52.jar
-rw-r--r-- 1 tomcat tomcat  1876535 Oct 12 07:01 bcprov-jdk16-1.46.jar
-rw-r--r-- 1 tomcat tomcat  1593423 Oct 12 07:01 bcprov-jdk15-140.jar

Run mvn dependency:tree to check why these jars are imported, and use dependency exclusions to exclude the old ones.

Scenario 2
When deploying the app on JDK 8, it fails:
org.aspectj.apache.bcel.classfile.ClassFormatException: Invalid byte tag in constant pool: 18.

To fix it, I needed to upgrade the AspectJ-related libraries to the latest version, 1.8.9 - exclude them from the framework that imported them, and explicitly declare them at 1.8.9.

Then when I ran mvn clean install, it failed:
The following artifacts could not be resolved: aspectjrt:org.aspectj:jar:1.8.9, aspectjweaver:org.aspectj:jar:1.8.9

It took me an hour to finally realize that I had made a silly mistake: I typed the artifactId where the groupId should be.

If I had read the error message more carefully and thought about the possible causes, it would have saved me an hour.
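The clue was in the message itself: `aspectjrt:org.aspectj:jar:1.8.9` prints as groupId:artifactId, so the coordinates are clearly reversed. The correct declaration uses `org.aspectj` as the groupId:

```xml
<dependency>
  <groupId>org.aspectj</groupId>
  <artifactId>aspectjrt</artifactId>
  <version>1.8.9</version>
</dependency>
<dependency>
  <groupId>org.aspectj</groupId>
  <artifactId>aspectjweaver</artifactId>
  <version>1.8.9</version>
</dependency>
```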

Scenario 3
Error creating bean with name 'org.springframework.security.config.authentication.AuthenticationManagerFactoryBean#0'
I converted the XML declaration of authentication-manager to a Java config bean, and it failed with the above error.

There are a lot of error messages (200+ lines), but once I scanned through all of them, the root cause was clear:
Caused by: org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'org.springframework.security.authenticationManager' defined in class path resource WebSecurityConfiguration.class]: Bean instantiation via factory method failed; nested exception is org.springframework.beans.BeanInstantiationException: Failed to instantiate [org.springframework.security.authentication.AuthenticationManager]: Factory method 'authenticationManager' threw exception; nested exception is java.lang.IllegalArgumentException: A parent AuthenticationManager or a list of AuthenticationProviders is required
Caused by: java.lang.IllegalArgumentException: A parent AuthenticationManager or a list of AuthenticationProviders is required
at org.springframework.security.authentication.ProviderManager.checkState(ProviderManager.java:117) ~[spring-security-core-3.2.5.RELEASE.jar:3.2.5.RELEASE]
at org.springframework.security.authentication.ProviderManager.<init>(ProviderManager.java:106) ~[spring-security-core-3.2.5.RELEASE.jar:3.2.5.RELEASE]
at org.springframework.security.authentication.ProviderManager.<init>(ProviderManager.java:99) ~[spring-security-core-3.2.5.RELEASE.jar:3.2.5.RELEASE]

at xx.services.app.config.WebSecurityConfiguration.authenticationManager(WebSecurityConfiguration.java:33) ~[WebSecurityConfiguration.class:na]

This is because of a bug in the code: I first create an empty list, then create the authenticationManager with the empty list.
public AuthenticationManager authenticationManager() {
    final List<AuthenticationProvider> providers = new ArrayList<>();
    // this will fail: we have to first add the providers to the list, then
    // create the authenticationManager with the non-empty provider list.
    final AuthenticationManager authenticationManager = new ProviderManager(providers);
    ... // build daoAuthenticationProvider, ldapAuthenticationProvider
}

Designing Data Structure

During programming, it's important to design and use the right data structures.

Here is a list of problems that we can use to improve our data structure design skills.

Design an in-memory search engine
- How to index and store in memory
- How to support free text queries, phrase queries
-- Map<String, List<Document>>, where the List<Document> is sorted
-- Document: docId, List<Long> positionIds; the List<Long> is sorted
- How to save to files
- How to merge small files into big files
-- When saving to a file, make it sorted by word
-- Use merge sort to merge multiple files
- How to make it scalable - how Solr Cloud works
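The index structure above can be sketched in a few lines (a toy version: whitespace tokenization, free-text AND queries only; phrase queries would additionally compare the stored positions):

```java
import java.util.*;

// Tiny inverted index: word -> (docId -> sorted positions of the word in that doc).
class InvertedIndex {
    private final Map<String, TreeMap<Integer, List<Long>>> index = new HashMap<>();

    public void add(final int docId, final String text) {
        long position = 0;
        for (final String word : text.toLowerCase().split("\\s+")) {
            index.computeIfAbsent(word, w -> new TreeMap<>())
                 .computeIfAbsent(docId, d -> new ArrayList<>())
                 .add(position++); // positions are appended in order, so they stay sorted
        }
    }

    // free-text query: docs containing all of the query words
    public Set<Integer> search(final String query) {
        Set<Integer> result = null;
        for (final String word : query.toLowerCase().split("\\s+")) {
            final Set<Integer> docs = index.getOrDefault(word, new TreeMap<>()).keySet();
            if (result == null) result = new TreeSet<>(docs);
            else result.retainAll(docs); // intersect the posting lists
        }
        return result == null ? Collections.emptySet() : result;
    }
}
```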

Google – Manager Peer Problem
1. setManager(A, B) sets A as a direct manager of B
2. setPeer(A, B) sets A as a colleague of B. After that, A and B will have the same direct Manager.

3. query(A, B) returns if A is in the management chain of B.
Tree + HashMap
Map<Integer, TNode> nodeMap

TNode: value, parent, neighbors

Design an Excel sheet’s Data structure
You need to perform operations like addition. The excel sheet is very sparse and is used to store numbers in the range 1-65K. Index for a cell is known.
Sparse table: Map<Integer, Map<Integer, String>> data
Follow-up question: In excel, one cell can refer to other cells, if I update one cell, how do you update all the dependent cells?
--Topological sort

- Use multiple data structures
Design a data structure that supports insert, delete, search and getRandom in constant time
private List<String> list = new ArrayList<String>();
Map<String,Integer> indexes = new HashMap<String,Integer>();
-- When removing a key, swap the removed element in the list with the last element, and update the last element's map entry to its new index
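The list + index-map combination above can be sketched like this (a standard design, shown here with String values for brevity):

```java
import java.util.*;

// The list gives O(1) getRandom, the map gives O(1) lookup of an element's index;
// remove swaps the removed element with the last one so the list stays compact.
class RandomizedSet {
    private final List<String> list = new ArrayList<>();
    private final Map<String, Integer> indexes = new HashMap<>();
    private final Random random = new Random();

    public boolean insert(final String val) {
        if (indexes.containsKey(val)) return false;
        indexes.put(val, list.size());
        list.add(val);
        return true;
    }

    public boolean remove(final String val) {
        final Integer index = indexes.remove(val);
        if (index == null) return false;
        final String last = list.remove(list.size() - 1);
        if (!last.equals(val)) {       // move the last element into the hole
            list.set(index, last);
            indexes.put(last, index);
        }
        return true;
    }

    public boolean contains(final String val) {
        return indexes.containsKey(val);
    }

    public String getRandom() {
        return list.get(random.nextInt(list.size()));
    }
}
```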

Follow up
- What if the value may be duplicated?
- How to test getRandom()?
-- Implement addItem - O(1), getTop10Items
-- Implement HashTable with get,set,delete,getRandom functions in O(1).

Implement Get and Insert for TimeTravelingHashTable
- insert(key, value, timestamp)
- get(key, timestamp)
- get(key) // returns value associated with key at latest time.
Map<K, TreeMap<Float, V>> keyToBSTMap = new HashMap<>();

public V get(K k, Float time) {
    if (!keyToBSTMap.containsKey(k)) return null;
    final TreeMap<Float, V> timeline = keyToBSTMap.get(k);
    // exact match first; otherwise the value set at the latest time before `time`
    if (timeline.containsKey(time)) return timeline.get(time);
    final Map.Entry<Float, V> entry = timeline.lowerEntry(time);
    return entry == null ? null : entry.getValue();
}

Stack supports getMin or getMax
Stack<Integer> main = new Stack<>();
Stack<Integer> minStack = new Stack<>();
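The two-stack idea can be sketched as follows (using ArrayDeque, the usual replacement for the legacy Stack class):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// minStack.peek() is always the minimum of everything currently in main:
// on push we store min(new element, previous minimum).
class MinStack {
    private final Deque<Integer> main = new ArrayDeque<>();
    private final Deque<Integer> minStack = new ArrayDeque<>();

    public void push(final int x) {
        main.push(x);
        minStack.push(minStack.isEmpty() ? x : Math.min(x, minStack.peek()));
    }

    public int pop() {
        minStack.pop();
        return main.pop();
    }

    public int getMin() {
        return minStack.peek();
    }
}
```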

Design a stack with operations(findMiddle|deleteMiddle) on middle element
-- Use Double Linked List

LeetCode 311 - Sparse Matrix Multiplication
Map<Integer, HashMap<Integer, Integer>> tableA, tableB

Design SparseVector:
-- supports dot(SparseVector that)
Design SparseMatrix
-- supports plus(SparseMatrix that)
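A sketch of the SparseVector dot product (store only non-zero entries; iterate over the smaller map so the cost is proportional to the sparser vector):

```java
import java.util.HashMap;
import java.util.Map;

// Only non-zero entries are stored; dot() walks the smaller entry set.
class SparseVector {
    private final Map<Integer, Integer> entries = new HashMap<>();

    public void set(final int index, final int value) {
        if (value == 0) entries.remove(index);
        else entries.put(index, value);
    }

    public int dot(final SparseVector that) {
        final SparseVector small = this.entries.size() <= that.entries.size() ? this : that;
        final SparseVector big = (small == this) ? that : this;
        int sum = 0;
        for (final Map.Entry<Integer, Integer> e : small.entries.entrySet()) {
            sum += e.getValue() * big.entries.getOrDefault(e.getKey(), 0);
        }
        return sum;
    }
}
```

The same map-of-maps trick extends to SparseMatrix plus/multiply, as in the LeetCode 311 note above.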

Google - remove alarm
hash map - map priority to set of alarm id

max priority heap - PriorityQueue<Integer>

Recover a Quack Data Structure
-- its pop()/peek() randomly removes or returns either the first or the last element
Copy all its elements into an array.

Two Sum III - Data structure design
add - Add the number to an internal data structure.
find - Find if there exists any pair of numbers which sum is equal to the value.
1. O(1) add, O(n) find
2. O(n) add, O(1) find
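Option 1 above can be sketched like this (count occurrences in a map so duplicates such as 3 + 3 are handled):

```java
import java.util.HashMap;
import java.util.Map;

// O(1) add (just bump a counter), O(n) find (scan the distinct numbers).
class TwoSum {
    private final Map<Integer, Integer> counts = new HashMap<>();

    public void add(final int number) {
        counts.merge(number, 1, Integer::sum);
    }

    public boolean find(final int value) {
        for (final Map.Entry<Integer, Integer> e : counts.entrySet()) {
            final int complement = value - e.getKey();
            if (complement == e.getKey()) {
                if (e.getValue() >= 2) return true; // need the same number twice
            } else if (counts.containsKey(complement)) {
                return true;
            }
        }
        return false;
    }
}
```

Option 2 inverts the trade-off: on each add, precompute all pairwise sums into a set, making find a single lookup.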

Add and Search Word
search(word) can search a literal word or a regular-expression string containing only letters a-z or '.'. A '.' can represent any one letter.

Shortest Word Distance II
Design a class which receives a list of words in the constructor, and implements a method that takes two words word1 and word2 and return the shortest distance between these two words in the list.

Your method will be called repeatedly many times with different parameters.
Map<String, List<Integer>> map: word -> sorted list of indexes
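The word-to-sorted-indexes map makes each query a two-pointer merge of two sorted lists:

```java
import java.util.*;

// Precompute word -> sorted positions once in the constructor, then each
// query merges the two position lists with two pointers in O(m + n).
class WordDistance {
    private final Map<String, List<Integer>> positions = new HashMap<>();

    public WordDistance(final List<String> words) {
        for (int i = 0; i < words.size(); i++) {
            positions.computeIfAbsent(words.get(i), w -> new ArrayList<>()).add(i);
        }
    }

    public int shortest(final String word1, final String word2) {
        final List<Integer> p1 = positions.get(word1);
        final List<Integer> p2 = positions.get(word2);
        int i = 0, j = 0, best = Integer.MAX_VALUE;
        while (i < p1.size() && j < p2.size()) {
            best = Math.min(best, Math.abs(p1.get(i) - p2.get(j)));
            if (p1.get(i) < p2.get(j)) i++; else j++; // advance the smaller position
        }
        return best;
    }
}
```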

Solr - Tips and Tricks

Admin UI
bin/solr help
bin/solr status
bin/solr healthcheck
bin/solr stop -all
(bin/solr start -cloud -s example/cloud/node1/solr -p 8983 -h) && (bin/solr start -cloud -s example/cloud/node2/solr -p 7574 -z -h) && (bin/solr start -cloud -s example/cloud/node3/solr -p 6463 -z -h)

Delete docs:
Post a delete-by-query such as <delete><query>*:*</query></delete> to the update handler, changing *:* to your query.

Use invariants to lock options and overwrite values client passes.
Use appends to append options, use defaults to provide default options.

Request Parameters
distrib=false - only query current core



Range Query inclusive: [a TO b]
exclusive: {a TO b} - note it's not ().
mixed: [a TO b} or {a TO b]

Negative Query
Query empty fields: -field:*
field is empty or is abc: (*:* -field:*) OR field:abc
(*:* -id:1) OR id:1 - return all docs
-foo is transformed by solr into (*:* -foo)
The big caveat is that Solr only checks to see if the top level query is a pure negative query!

zkcli.sh -zkhost zooServer:port  -cmd putfile /configs/solrconfig.xml solrconfig.xml
zkcli.sh -zkhost zooServer:port  -cmd get /configs/schema.xml

To get solrcloud nodes info(such as ip address)
java -classpath "*" org.apache.solr.cloud.ZkCLI -zkhost myzkhost -cmd get /clusterstate.json
zkcli.sh -zkhost myzkhost:port -cmd get /clusterstate.json

Rest API

SolrJ Field Annotation
Map dynamic fields to bean fields:
@Field("supplier_*")
Map<String, List<String>> supplier;

@Field("supplier_simple_*")
Map<String, String> supplier_simple;

@Field("*_supplier")
private String[] allSuppliers;

@Field(child = true)
Child[] child;

Starting Solr
-m 2g: Start Solr with the defined value as the min (-Xms) and max (-Xmx) heap size for the JVM.

bin/solr stop -all

(bin/solr start -cloud -s example/cloud/node1/solr -p 8983 -h -m 2g)  && (bin/solr start -cloud -s example/cloud/node2/solr -p 7574 -z -h -m 2g) && (bin/solr start -cloud -s example/cloud/node3/solr -p 6463 -z -h -m 2g)
-- Use -h so Solr can continue to work even if the IP changes.

Extending Solr
Implement the SolrCoreAware interface in custom RequestHandler to get SolrCore in inform method.

Customize and extend DocumentObjectBinder

Get solr static fields
SolrServer solrClient = new HttpSolrServer("http://{host:port}/solr/core-name");
SolrQuery query = new SolrQuery();
query.add(CommonParams.QT, "/schema/fields");
QueryResponse response = solrClient.query(query);
NamedList responseHeader = response.getResponseHeader();
ArrayList fields = (ArrayList) response.getResponse().get("fields");
for (SimpleOrderedMap field : fields) {
    Object fieldName = field.get("name");
}


Solr Internals

The Solr replicationFactor has nothing to do with quorum. Solr uses Zookeeper's quorum sensing to ensure that all Solr nodes have a consistent picture of the cluster.

openSearcher and hardCommit
- A soft commit always opens a new searcher.
- openSearcher only makes sense for hard commits.

Use config api to change solr settings dynamically

Use JSON API, but be aware SolrJ may not work with JSON API in some cases.

Solr Facet APIs

Don't forget facet.mincount=1

Solr Nested Objects
Define _root_ field

Use [child] - ChildDocTransformerFactory to return child documents

Admin and Collection APIs

curl http://localhost:8983/solr/mycollection/update -X POST -H 'Content-Type: application/json' --data-binary @atomic.json

Clean ZK data - link
run java -cp zookeeper-3.4.6.jar:conf org.apache.zookeeper.server.PurgeTxnLog  ../zoo_data/ ../zoo_data/ -n 3

Access solr cloud via ssh tunnel
Create a tunnel to the zookeeper and solr nodes.
- But when SolrJ queries zookeeper, it still returns the external solr node addresses that we can't access directly.
Workaround for debugging: add a conditional breakpoint in CloudSolrClient.sendRequest(SolrRequest, String)
- before LBHttpSolrClient.Req req = new LBHttpSolrClient.Req(request, theUrlList);
- in the breakpoint condition, rewrite theUrlList to the tunneled local addresses, then end the condition with
return false;
(so the debugger never actually pauses there)

Solr suggester
It supports filtering on multiple fields - just copy these fields into the contextField used for context filtering.

400 Unknown Version - when running curl against solr
- Maybe you need to encode the query parameters.

Solving Problem from Different Angles - Programmer Skills

We hit an issue in production: when we query Solr, the parameter rows is too small.
We needed to fix this problem ASAP.

One possible solution is to fix the code and make a new deployment.
Another possible solution is to change the data so the docs we want to return to the client are listed first.

The solution we used is to change solrconfig.xml (only we use this Solr server) to always return 100 rows, ignoring the rows value the client passes.

-- invariants - provides param values that will be used in spite of any values provided at request time. They are a way of letting the Solr maintainer lock down the options available to Solr clients. Any params values specified here are used regardless of what values may be specified in either the query, the "defaults", or the "appends" params.

<requestHandler name="/select" class="solr.SearchHandler">
   <lst name="defaults">
     <str name="echoParams">explicit</str>
     <int name="rows">10</int>
   </lst>
   <lst name="invariants">
     <int name="rows">100</int>
   </lst>
</requestHandler>
Later we fixed the code and design issue, and removed the invariants in solrconfig.xml.

Lessons Learned about Programming and Soft Skills - 2016

Design schema carefully
The cost of changing/migrating old data is expensive, so it's important to try to get it right the first time.
Give meaningful, easy-to-remember names.
Use IDs (instead of names) to reference other data.

Be schema-less
For example, in Solr, only add an explicit field if you need to search against it; for all other fields that we are sure we will never search, consider storing them all as one JSON or binary data field.
That makes it easier to add/remove these fields later.

Use public APIs as much as possible
-- For instance, when developing a RESTful service, use APIs from javax (HttpServletRequest) or javax.ws.rs.core if possible - not APIs from com.sun.jersey or org.glassfish.jersey.
-- Private APIs are more likely to change.

Ensure You Can Switch Functions On and Off

Always Propose Different Solutions, then Compare Them
Use lists and graphs to compare pros and cons

Use Cases Analysis
How to implement it at a high level
What's the effort
Don't over-engineer

Pull from the upstream (master) branch frequently
- Always work on latest code 

Build/Try simple workable code first

Solve Problem from Different Angles

What Could Possibly Go Wrong?
-- During coding, deployment

Soft Skills
How to find best solution 
Always try/propose different approaches
Talk with others/team members

Don't get too defensive for your solution
- Be open minded
What's important is to:
- find the best/easiest/most robust solution - good for maintaining the code later
- Learn new things, new ways to tackle problems

Don't (just) try to convince others to take your approach
It's easy to propose different approaches for a feature another developer is working on.
But what's more important is to compare whether the new approach is actually better, or whether there is an even better solution.

Different doesn't mean better
This is especially important for senior engineers: if you give a not-well-thought-out solution, others may be (or feel forced to be) convinced to take your approach.

More important is to carefully compare these different approaches in detail
-- How to implement all (important/basic) functions in each approach
-- How to handle new requirements later

Communicate early and often
Let others know what you are doing and your progress
-- Avoid duplicate work

Be open-minded
Learn aggressively

Java Frequently Used Utility Methods

Objects.requireNonNull(key, message)
Objects.equals(a, b) - null safe

StringBuilder.setCharAt(int, char)
StringBuilder.setLength(int) - O(1) if decreasing the length

Arrays.fill(long[] a, long val)
Arrays.fill(long[] a, int fromIndex, int toIndex, long val)

To fill a two-dimensional array:
int[][] target = new int[2][4];
int[] temp = new int[4];
Arrays.fill(temp, -1);
Arrays.fill(target, temp);
// Caveat: all rows now reference the SAME temp array; if rows will be mutated
// independently, fill each row separately (or use temp.clone() per row).

List has two remove overloads:
E remove(int index);
boolean remove(Object o);

ListIterator: next/hasNext/nextIndex, previous/hasPrevious/previousIndex

list.subList(startIndex, endIndex).clear() removes the elements from startIndex (inclusive) to endIndex (exclusive).
Don't update the original list while you still need the subList.

Arrays.asList(T… a) returns a subclass of AbstractList that doesn't implement the add or remove methods - they throw UnsupportedOperationException.

map.putIfAbsent(sum, i);
Instead of:
if (!map.containsKey(sum)) {
    map.put(sum, i);
}

map.getOrDefault(key, defaultValue)

iterable.forEach(x -> x.setXXX(null));
list.removeIf(s -> s.length() == 0);

list.replaceAll(s -> s.toUpperCase());
list.sort((x, y) -> x.length() - y.length());
logger.finest(() -> complexMsg());
lineCount = Files.lines(path).count();

mapStream.collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
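For example, the groupingBy + counting collectors turn a stream of values into a frequency map:

```java
import java.util.Arrays;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Count occurrences of each word: the classic groupingBy + counting combination.
class WordCount {
    public static Map<String, Long> count(final String... words) {
        return Arrays.stream(words)
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }
}
```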

Math.xxxExact (addExact, multiplyExact, ...) - throws ArithmeticException on overflow


SomeEnum someEnum = Enum.valueOf(SomeEnum.class, value);

// In MessageFormat patterns, a single quote ' must be escaped as ''


Get an InputStream from the classpath:
getClass().getClassLoader().getResourceAsStream("path/to/file")

firstNonNull(@Nullable T first, @Nullable T second)
joiner = Joiner.on("; ").skipNulls();
splitter = Splitter.on(',').trimResults().omitEmptyStrings();

ComparisonChain.start().compare(this.aString, that.aString)....compare(this.anEnum, that.anEnum, Ordering.natural().nullsLast()).result();

Ordering<String> ordering = Ordering.natural().nullsFirst().reverse();

Collections.sort(names, ordering);
Map<String, String> paramMap = Splitter.on("&").withKeyValueSeparator("=").split(params);

CaseFormat.LOWER_UNDERSCORE.to(CaseFormat.UPPER_CAMEL, tableName);


Files.hash(file, Hashing.md5());

Apache Commons

Top 16 Java Utility Classes

ObjectUtils.firstNonNull(T... values)
ObjectUtils.defaultIfNull(final T object, final T defaultValue)


Testing Tips for Java Developers

- More readable
- Provides better failure messages
- Type safety
import static org.hamcrest.MatcherAssert.assertThat;
import static org.hamcrest.Matchers.is;
containsString, startsWith, endsWith, equalTo, equalToIgnoringCase, equalToIgnoringWhitespace, isEmptyString, isEmptyOrNullString and stringContainsInOrder

assertThat(actual, containsString(expected));
assertThat(actual, nullValue());
assertThat(actual, notNullValue());

assertThat(subClass, instanceOf(BaseClass.class));

-- Don't mock too much; only mock direct dependencies
import static org.mockito.Mockito.*;
Spy - used to wrap a real object. Every call, unless stubbed otherwise, is delegated to the real object.
ArgumentCaptor<XClass> captor=ArgumentCaptor.forClass(XClass.class);
when(mockObj.someMethod(1)).thenThrow(new RuntimeException());

Use @Spy or the spy() method to wrap a real object
Important gotcha on spying real objects!
- use doReturn|Answer|Throw() to stub spy object

verify(mockObj, never()).someMethod(param);
verify(mockObj, atLeast(2)).someMethod();
verify(mockObj, times(3)).someMethod();

Invalid use of argument matchers!
When using matchers, all arguments have to be provided by matchers.
//incorrect: someMethod(anyObject(), "raw String"); 
// Should be

someMethod(anyObject(), eq("raw String"));

Testing Servlet
Mock chained calls
@Mock(answer = Answers.RETURNS_DEEP_STUBS) SomeService service;


Main Steps
create mock:@Mock -> stubbing:when/thenReturn -> Verify: verify

@Rule
public ExpectedException thrown = ExpectedException.none();
thrown.expect(XyzException.class); // before calling the tested method

Testing with TestNG

Spring + JUnit
JUnit 4.11 doesn't work with Spring Test framework.

Using REST Assured to test http APIs
Response rsp = given().filter(sessionFilter).when().get(someUrl).then().statusCode(is(200)).extract().response();
Map importResult = rsp.as(Map[].class)[0];

assertThat(Boolean.valueOf(importResult.get("success").toString()), is(true));

Run specific method
mvn test -Dtest=className#methodName

Eclipse Tips
Add Static Import Automatically
To help write test cases easier, when we type "assertT" and hit Ctrl+Space, we want Eclipse to add static import automatically for us: import static org.hamcrest.MatcherAssert.assertThat;
Window > Preferences > Java > Editor > Content Assist > Favorites, then add (for example):
org.hamcrest.MatcherAssert
org.hamcrest.Matchers
org.mockito.Mockito
Run Tests Across Multiple Projects
--Eclipse Tips + Tricks - 2016
Create a maven project depending on all the projects you want to test.
Create Test code:
import org.junit.extensions.cpsuite.ClasspathSuite;
import org.junit.runner.RunWith;

@RunWith(ClasspathSuite.class)
public class TestRunner {
}

Essential Linux Commands for Developers

Ctrl + a - Move to the start of line
Ctrl + e - Move to the end of line
Clear the screen:  Ctrl + l
Search as you type. Ctrl + r and type the search term; Repeat Ctrl + r to loop through results.

tac: print a file line by line in reverse order.

List all open ports
netstat -ltn -- all processes listening on TCP ports

Which process opens 9160
lsof -i :9160
lsof -p pid1
lsof /var/log/system.log
List opened files under a directory
lsof +D /var/log/
List files opened by a specific user
lsof -u user
lsof -u ^user  -- every user except user
lsof -i        -- all network connections

To get a complete presentation of the netfilter rules
iptables -vL -t filter
iptables -vL -t nat
iptables -vL -t mangle
iptables -vL -t raw

iptables -vL -t security

nohup someCommand > someFile.log 2>&1 &

sort | uniq -d
     -u      Only output lines that are not repeated in the input.
     -d      Only output lines that are repeated in the input.
     -c      count
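Note that uniq only collapses adjacent duplicates, so sort first. A common counting pipeline (sample data made up here):

```shell
# Count how often each line occurs, most frequent first:
# sort groups identical lines so uniq -c can count them.
printf 'apple\nbanana\napple\napple\n' | sort | uniq -c | sort -rn
```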

find /usr -size +10M
find . -exec xxx {} \;

Find configuration data
find /etc -type f -exec grep -FHi "ENCRYPTION_PASSWORD" {} +
grep -r ENCRYPTION_PASSWORD /etc 2>/dev/null

find /etc -type f -print0  | xargs -0 grep ENCRYPTION_PASSWORD

How long a command takes
time curl ""

List all functions
declare -f
declare -F - only list function names
declare -f function_name

Truncate a file
cat /dev/null > file
> file

Generate random number
/dev/random block when entropy pool is exhausted
/dev/urandom will not block

echo $RANDOM
od -An -N1 -i /dev/urandom
od -An -N2 -i /dev/urandom
for i in {1..5}; do echo $RANDOM; done
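A portable sketch for a bounded random integer, reading two bytes from /dev/urandom (the 0..99 range is arbitrary):

```shell
# Read one unsigned 16-bit integer from /dev/urandom, then map it into 0..99.
n=$(od -An -N2 -t u2 /dev/urandom | tr -d ' ')
echo $(( n % 100 ))
```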

Display disk usage
du -sm
List all directories and their total size:
du -sh *
-s: Display an entry for each specified file. (Equivalent to -d 0)
Show only the total for each directory:

du -h -d 1

List hidden files:  ls -a | grep "^\."
List files sorted by size: ls -l | grep '^-' | sort -k5 -nr
List link files: ls -l | grep '^l'

-c --count: Show count of matched lines
-E, --extended-regexp - same as egrep
-n, --line-number

grep -E '^abc(worda|wordb)' /etc/group
-n or --line-number
-A NUM, --after-context=NUM
       Print NUM lines of trailing context after matching lines.
-B NUM, --before-context=NUM
       Print NUM lines of leading context before matching lines.
-C NUM, --context=NUM
       Print NUM lines of output context. 
-w -- match only whole words
-o, --only-matching -- print only the matching part of the lines
-l -- only show names of files with matches
-r -- recursively search
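These flags combine; a throwaway example (file contents made up):

```shell
# Whole-word search with line numbers and one line of context each side.
f=$(mktemp)
printf 'ok\nan error here\nerrors galore\n' > "$f"
grep -nw -C 1 'error' "$f"   # matches line 2 only; "errors" is not a word match
rm -f "$f"
```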

Grep file that contains binary data
cat -v tmp/test.log | grep regex
-v      Display non-printing characters so they are visible.
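GNU and BSD grep also accept -a (--text) to treat a binary file as text, so the cat step can usually be dropped:

```shell
# A NUL byte makes grep treat the file as binary; -a forces text mode.
f=$(mktemp)
printf 'before\0 ERROR in module\n' > "$f"
grep -a -c ERROR "$f"   # prints 1
rm -f "$f"
```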

Search specific line ranges:
sed -n 'startLine,endLinep' a.txt | grep XX 

Use extended regular expression with grep -E

Scroll/paginate results:
grep pattern file | less

sed s/word1/word2/g fileName
Only display nth line:    sed -n 'n p' file
Delete nth line:          sed 'n d' file > newFile
Delete nth line in place: sed -i 'n d' file.txt

Remove last line:         sed -i '$ d' file.txt
-i change in place
Delete first line:        sed -i '1 d' file.txt
Delete lines m to n:      sed -i 'm,n d' file.txt
sed -n 'n p' file.txt | wc -c
-i extension
Edit files in-place, saving backups with the specified extension. If a zero-length extension is given, no backup will be saved.

echo a b c | xargs echo
find /tmp -name "*.bk" -type f -print | xargs /bin/rm -f
find /tmp -name "*.bk" -print0 | xargs -0 -I {} mv {} ~/bk.files
-- better: find /tmp -depth -name "*.bk" -type f -delete
find /tmp -name "*.bk" -print0 | xargs -0 -I file mv file ~/bk.files
cut -d: -f1 < /etc/passwd | sort | xargs echo

-I replstr
--null, -0 - handle spaces in file name
Change xargs to expect NUL (``\0'') characters as separators, instead of spaces and newlines.  This is expected to be used in concert with the -print0 function in find

python -m SimpleHTTPServer   (Python 3: python3 -m http.server)
| python -mjson.tool

if [[ -f ~/.bashrc ]]; then
   source ~/.bashrc
fi
[ ! -f "$FILE" ] && { echo "$FILE not found"; exit 1; }

Use $( ... ), not `` to capture command output
var="$(command "$(command1)")"
$(( $a+$b )) to execute arithmetic expressions
Put ; do and ; then on the same line as the while, for or if.
Prefer brace-quoting all other variables.
Prefer "${var}" over "$var"
Use "$@" unless you have a specific reason to use $*.
- "$@" will retain arguments as-is, so no args provided will result in no args being passed on;
- "$*" expands to one argument, with all args joined by (usually) spaces, so no args provided will result in one empty string being passed on.
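A quick demo of the difference (function names made up):

```shell
# "$@" preserves argument boundaries; "$*" joins all args into one word.
count_args() { echo $#; }
demo() {
  count_args "$@"   # prints the original argument count
  count_args "$*"   # always prints 1
}
demo "a b" c        # prints 2, then 1
```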

while IFS=, read var1 var2 var3; do

done < file.txt
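A complete sketch of the loop above with made-up data; -r stops read from mangling backslashes:

```shell
# Split each CSV line on commas into separate variables.
f=$(mktemp)
printf 'alice,30,nyc\nbob,25,sf\n' > "$f"
while IFS=, read -r name age city; do
  echo "$name is $age, in $city"
done < "$f"
rm -f "$f"
```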

Make variable readonly: readonly var=value
Make function readonly: readonly -f function
readonly -p/-f

Use Local Variables
local var="something"
local var
var="$(func)" || return
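Why declare and assign separately: local var="$(func)" returns the status of local itself (success), masking the function's failure. A small demo (function names made up):

```shell
failing() { return 1; }

demo() {
  local out
  out="$(failing)" || { echo "failing() failed"; return 1; }
  echo "not reached"
}
demo
```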

if [[ "${my_var}" = "some_string" ]]; then
-z (string length is zero) and -n (string length is not zero)

if ! mv "${file_list}" "${dest_dir}/" ; then

Scp copy from local to remote:
scp /file/to/send username@remote:/where/to/put
Remote to local:
scp username@remote:/file/to/send /where/to/put
Send files between two remote hosts:

scp username@remote_1:/file/to/send username@remote_2:/where/to/put

Copy file from remote host to local via gateway 
scp -o "ProxyCommand ssh $USER@$bastion-host nc $destinationHost 22" $USER@$destinationHost:/home/$USER/heapdump.hprof heapdump.hprof

Copy file from local to remote host via gateway 
scp -o "ProxyCommand ssh $USER@$bastion-host nc $destinationHost 22" heapdump.hprof $USER@$destinationHost:/home/$USER/heapdump.hprof 

Netcat - nc
Listening on server
nc -l 2389 > test

nc -k -l 2389 - server would stay up
Connect to server on specific port

cat testfile | nc remoteHost 2389
Port Scanning
nc -zv remoteHost 20-30

Bulk rename files
brew install rename
rename -n -v 's/\.csv$/\.json/' *.csv
-n: --just-print/--dry-run

Brace Expansion
echo a{d,c,b}e
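More brace expansion forms (bash/zsh feature, not POSIX sh):

```shell
echo a{d,c,b}e                       # ade ace abe
echo {1..5}                          # 1 2 3 4 5
mkdir -p /tmp/demo/{src,test,docs}   # several directories in one go
```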

sleep 10 - sleep 10 seconds
wait pid - wait for a background process to finish
command &
wait $!
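Putting & and wait together: start jobs in the background, then block until they finish:

```shell
# Start two background jobs and wait for both before continuing.
sleep 1 & pid1=$!
sleep 1 & pid2=$!
wait "$pid1" "$pid2"   # elapsed ~1s, not 2s, since they run in parallel
echo "both done"
```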

Check Linux System Info
free -t -m
cat /proc/pid/smaps
pmap pid | grep total

jstack -m

top -p PID 
on Mac: top -pid PID
top -c or Press 'c' in top view: to show full command
sort on other fields (default by cpu)

Press "SHIFT + F" and Select your choice below and press ENTER.

Get the hostname of remote server
host ip-address

Check system time zone
date +%Z
cat  /etc/localtime

Create zip file
gzip -k the-file
- without the tar, -k: keep the original file

tar -czf my.tar.gz the-folder_or_file
gunzip file.gz
gzip -d file.gz
unzip -t file.zip
test whether the zipfile is corrupted

awk, gawk
gawk 'match($0, pattern, ary) {print ary[1]}'

Count the number of occurrences of a word: :%s/word//gn
:set all
- (no)nu, (no)ic,
Compound search on multiple lines
mA sets mark A; `A jumps to it; `` jumps back to the previous position

dd, 5dd
d$  - delete to end of line
d0  - delete to beginning of line

1,$d - delete all
1,.d  - delete to beginning of file
.,$d  - delete to end of file
Y - copy
- p pastes after the cursor position
- P pastes before.

y$ - yank to the end of the line
G - go to the last line

ZZ in normal mode - same as :wq, save and exit vi
append a file to current file

:r file2

ls -l --block-size=M
apropos - search the whatis database for strings
apropos "kill process"

Use jmap to generate a heap dump (use jstack for a thread dump)
nohup jmap -F -dump:format=b,file=/root/heapdump.hprof pid &

Commands for troubleshooting
find class in jars
find . -name "*.jar" | xargs grep -l Hello.class

Google Shell Style Guide

