Writing Rest API to Export Data as CSV File - Jersey



The Problem

Sometimes we need to support exporting data as a CSV file, so users can download it, open it in Excel, and do some analysis.

The Data

The data we want to export is shown below:

public class SurveyAnswers {
    public String surveyId;
    // some other metadata about the survey and the user
    public List<QuestionAnswer> answers;
}

// the user's choice for this question
public class QuestionAnswer {
    public int questionId;
    public int optionId;
    // other metadata about the question and the option
}

The data in the CSV file would look like this:

SurveyId, dataTime, /*fields about user meta data */, QuestionId1, AnswerId1, QuestionId2, AnswerId2...

The difficult part is that the field headers are not fixed, because of the variable list of QuestionAnswer entries.

Choose CSV Library

I chose Super-CSV, as we want to use its CsvMapWriter. We will convert the data to a HashMap that contains fields like QuestionId1, AnswerId1, QuestionId2, AnswerId2, etc.

Read more about Super-CSV at Writing CSV files.
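For orientation, here is a JDK-only sketch of the map-to-CSV idea (a hypothetical helper, not Super-CSV's actual API): write the header row once, then one row per map in header order, quoting fields that need it.

```java
import java.util.*;
import java.util.stream.Collectors;

// Hypothetical helper showing the CsvMapWriter idea with plain JDK code.
public class SimpleCsvWriter {
    // Quote a field if it contains a comma, quote, or newline.
    static String escape(Object value) {
        String s = String.valueOf(value);
        if (s.contains(",") || s.contains("\"") || s.contains("\n")) {
            return "\"" + s.replace("\"", "\"\"") + "\"";
        }
        return s;
    }

    // Write a header row plus one row per map, in header order;
    // missing keys become empty cells.
    public static String write(String[] header, List<Map<String, Object>> rows) {
        StringBuilder sb = new StringBuilder(String.join(",", header)).append("\n");
        for (Map<String, Object> row : rows) {
            sb.append(Arrays.stream(header)
                    .map(h -> row.containsKey(h) ? escape(row.get(h)) : "")
                    .collect(Collectors.joining(","))).append("\n");
        }
        return sb.toString();
    }
}
```

Because rows are rendered in header order, maps missing some dynamic question columns still line up with the header.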

Update: it's better to use CsvDozerBeanWriter for our use case.

CsvBeanWritable Interface

In most cases, we are going to implement this interface.

/**
 * Check: http://stackoverflow.com/questions/21942042/using-supercsv-to-change-header-values
 */
public interface CsvBeanWritable {
    /**
     * Header in the csv file - it can be anything, such as
     * new String[] { "First Name", "Last Name", "Birthday" };
     */
    @JsonIgnore
    public String[] getCsvHeader();

    /**
     * The mapping of bean fields to csv fields - order matters:
     * new String[] { "firstName", "lastName", "birthDate" };
     */
    @JsonIgnore
    public String[] getCsvMapping();
}

CsvMapWritable Interface

  • When the logic is complex, as in our case, we can't use CsvBeanWriter; instead we implement the CsvMapWritable interface.
  • CsvMessageBodyWriter will use Super-CSV's CsvMapWriter to write the object as csv data.
public interface CsvMapWritable {
    @JsonIgnore
    public Map<String, Object> getCsvBody();
    @JsonIgnore
    public String[] getCsvHeader();
}
CsvMessageBodyWriter - the Provider
CsvMessageBodyWriter will marshal an object as csv data if the object implements CsvBeanWritable or CsvMapWritable, or if the object is a collection of CsvBeanWritable or CsvMapWritable.

We need to register it in ResourceConfig or tell Jersey to scan the package to find it.

@Component
@Provider
@Produces({CsvMessageBodyWriter.TEXT_CSV, CsvMessageBodyWriter.APPLICATION_EXCEL})
public class CsvMessageBodyWriter<T> implements MessageBodyWriter<T> {
    public static final String TEXT_CSV = "text/csv";
    public static final String APPLICATION_EXCEL = "application/vnd.ms-excel";

    @Override
    public boolean isWriteable(final Class<?> type, final Type genericType, final Annotation[] annotations,
            final MediaType mediaType) {
        return true;
    }
    @Override
    public long getSize(final T data, final Class<?> type, final Type genericType, final Annotation annotations[],
            final MediaType mediaType) {
        return -1;
    }
    @Override
    public void writeTo(final T data, final Class<?> type, final Type genericType, final Annotation[] annotations,
            final MediaType mediaType, final MultivaluedMap<String, Object> httpHeaders,
            final OutputStream entityStream) throws java.io.IOException, javax.ws.rs.WebApplicationException {
        try (AbstractCsvWriter csvWriter = getCsvWriter(data, entityStream)) {
            writeData(data, csvWriter);
        }
    }
    private AbstractCsvWriter getCsvWriter(final T data, final OutputStream entityStream) {
        AbstractCsvWriter csvWriter = null;
        if (data instanceof CsvBeanWritable) {
            csvWriter = new CsvBeanWriter(new OutputStreamWriter(entityStream), CsvPreference.STANDARD_PREFERENCE);
        } else if (data instanceof CsvMapWritable) {
            csvWriter = new CsvMapWriter(new OutputStreamWriter(entityStream), CsvPreference.STANDARD_PREFERENCE);
        } else if (data instanceof Collection) {
            final Collection<?> collection = (Collection<?>) data;
            csvWriter = getCsvWriterFromCollection(collection, entityStream);
        }
        return csvWriter;
    }


    protected void writeData(final T data, final AbstractCsvWriter csvWriter) throws IOException {
        if (data instanceof CsvBeanWritable) {
            final CsvBeanWritable writable = (CsvBeanWritable) data;
            csvWriter.writeHeader(writable.getCsvHeader());
            ((CsvBeanWriter) csvWriter).write(writable, writable.getCsvMapping());
        } else if (data instanceof CsvMapWritable) {
            final CsvMapWritable writable = (CsvMapWritable) data;
            csvWriter.writeHeader(writable.getCsvHeader());
            ((CsvMapWriter) csvWriter).write(writable.getCsvBody(), writable.getCsvHeader());
        } else if (data instanceof Collection) {
            writeCollection(data, csvWriter);
        } else {
            throw new XXException("doesn't support download as csv");
        }
    }

    protected void writeCollection(final T data, final AbstractCsvWriter csvWriter) throws IOException {
        final Collection<?> collection = (Collection<?>) data;
        boolean first = true;
        final Iterator<?> it = collection.iterator();
        while (it.hasNext()) {
            final Object obj = it.next();
            if (CsvBeanWritable.class.isAssignableFrom(obj.getClass())) {
                final CsvBeanWritable writable = (CsvBeanWritable) obj;
                if (first) {
                    csvWriter.writeHeader(writable.getCsvHeader());
                    first = false;
                }
                ((CsvBeanWriter) csvWriter).write(writable, writable.getCsvMapping());
            } else if (CsvMapWritable.class.isAssignableFrom(obj.getClass())) {
                final CsvMapWritable writable = (CsvMapWritable) obj;
                if (first) {
                    csvWriter.writeHeader(writable.getCsvHeader());
                    first = false;
                }
                ((CsvMapWriter) csvWriter).write(writable.getCsvBody(), writable.getCsvHeader());
            } else {
                throw new XXException("doesn't support download as csv");
            }
        }
    }
    protected static AbstractCsvWriter getCsvWriterFromCollection(final Collection<?> collection,
            final OutputStream entityStream) {
        AbstractCsvWriter csvWriter = null;
        final Iterator<?> it = collection.iterator();
        // the first element is enough to decide which writer to use
        if (it.hasNext()) {
            final Object obj = it.next();
            if (CsvMapWritable.class.isAssignableFrom(obj.getClass())) {
                csvWriter = new CsvMapWriter(new OutputStreamWriter(entityStream), CsvPreference.STANDARD_PREFERENCE);
            } else if (CsvBeanWritable.class.isAssignableFrom(obj.getClass())) {
                csvWriter = new CsvBeanWriter(new OutputStreamWriter(entityStream), CsvPreference.STANDARD_PREFERENCE);
            }
        }
        return csvWriter;
    }
}

Add CSV ability to API

 * @param downloadAsFile with downloadAsFile=false, inline display seems not to work in Chrome; it only works in Safari.
@GET
@Produces({MediaType.APPLICATION_JSON, CsvMessageBodyWriter.TEXT_CSV})
public Set<SurveyAnswers> getFlatSurveyChoiceResponse(@QueryParam("surveyId") final String surveyId,
        @QueryParam("downloadAsFile") @DefaultValue("true") final boolean downloadAsFile) {
    if (downloadAsFile) {
        servletResponse.addHeader("Content-Disposition", MessageFormat.format("attachment; filename={0}.csv",
                DateUtil.getThreadLocalDateFormat().format(new Date())));
    }
    return service.getSurveyAnswers(surveyId);
}

Using CsvMapWritable in our SurveyAnswers example

Create csv headers for this survey and add them into SurveyHeadersHolder.

for (final Integer questionId : questions) {
    extraHeaders.add(getQuestionIDHeader(questionId));
    extraHeaders.add(getQuestionTitleHeader(questionId));
    extraHeaders.add(getOptionIDHeader(questionId));
    extraHeaders.add(getOptionTextHeader(questionId));
}

SurveyHeadersHolder.INSTANCE.addSurveyHeaders(surveyId, extraHeaders);

Implementing CsvMapWritable in SurveyAnswers

@Override
public Map<String, Object> getCsvBody() {
    final Map<String, Object> map = new LinkedHashMap<>();
    map.put(HEADER_SURVEY_ID, surveyId);
    map.put(HEADER_DATE, DateUtil.getThreadLocalDateFormat().format(date));
    for (final QuestionAnswer answer : answers) {
        map.put(getQuestionIDHeader(answer.getQuestionId()), answer.getQuestionId());
        map.put(getQuestionTitleHeader(answer.getQuestionId()), answer.getQuestionText());
        map.put(getOptionIDHeader(answer.getQuestionId()), answer.getOptionId());
        map.put(getOptionTextHeader(answer.getQuestionId()), answer.getOptionText());
    }
    return map;
}

@Override
public String[] getCsvHeader() {
    return SurveyHeadersHolder.INSTANCE.getSurveyHeaders(surveyId);
}

Implement Circuit Breaker Pattern with Netflix Hystrix


When we design services, it's important to make them resilient and prevent cascading failures.

Circuit Breaker Pattern
From Martin Fowler:
The basic idea behind the circuit breaker is very simple. You wrap a protected function call in a circuit breaker object, which monitors for failures. Once the failures reach a certain threshold, the circuit breaker trips, and all further calls to the circuit breaker return with an error, without the protected call being made at all. Usually you'll also want some kind of monitor alert if the circuit breaker trips.
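Fowler's description boils down to a small state machine. A minimal sketch (the names are my own, not any library's API): closed until N consecutive failures, then open and failing fast until a reset timeout elapses.

```java
import java.util.concurrent.Callable;

// Minimal circuit breaker sketch: closed -> open after N consecutive
// failures; while open, calls fail fast until resetTimeoutMillis elapses.
public class CircuitBreaker {
    private final int failureThreshold;
    private final long resetTimeoutMillis;
    private int consecutiveFailures = 0;
    private long openedAt = -1;

    public CircuitBreaker(int failureThreshold, long resetTimeoutMillis) {
        this.failureThreshold = failureThreshold;
        this.resetTimeoutMillis = resetTimeoutMillis;
    }

    public boolean isOpen() {
        return openedAt >= 0
                && System.currentTimeMillis() - openedAt < resetTimeoutMillis;
    }

    public <T> T call(Callable<T> protectedCall) throws Exception {
        if (isOpen()) {
            throw new IllegalStateException("circuit open - failing fast");
        }
        try {
            T result = protectedCall.call();
            consecutiveFailures = 0;          // success resets the counter
            openedAt = -1;
            return result;
        } catch (Exception e) {
            if (++consecutiveFailures >= failureThreshold) {
                openedAt = System.currentTimeMillis();  // trip the breaker
            }
            throw e;
        }
    }
}
```

A real breaker (like Hystrix) adds a half-open state, rolling failure-rate windows, and monitoring; this only shows the trip/fail-fast core.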

There are several ways to apply the circuit breaker pattern in Java.

Using Netflix Hystrix
public class GetProductsCommand extends HystrixCommand<Set<Product>> {
  private final GetProductsConfiguration config;
  private final String token;

  public GetProductsCommand(final GetProductsConfiguration config, final String token) {
      super(HystrixCommandGroupKey.Factory.asKey("products"), config.getTimeoutInMilliseconds());
      this.config = config;
      this.token = token;
  }

  @Override
  protected Set<Product> run() throws Exception {
      // if it's a client error, throw HystrixBadRequestException:
      // it will not trigger the fallback, and it does not count against the
      // failure metrics, so it will not trip the circuit breaker.
  }

  @Override
  protected Set<Product> getFallback() {}

  @Component
  public static class GetProductsConfiguration {
      // auto-wire services that are going to be used by GetProductsCommand
      @Value("${cobra.oauth.timeout.milliseconds:1000}")
      private int timeoutInMilliseconds;

      public int getTimeoutInMilliseconds() {
          return timeoutInMilliseconds;
      }
  }
}

Call HystrixCommand asynchronously, Get result later
@Autowired
private GetProductsConfiguration getProductsConfiguration;

// call it asynchronously
final Future<Set<Products>> productsFuture = new GetProductsCommand(getProductsConfiguration, token).queue();

// later
final Set<Products> products = productsFuture.get();
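The queue-now/get-later shape is the same one plain java.util.concurrent offers. As an analogy (not Hystrix itself), the sketch below makes the async call with an ExecutorService and collects the Future later:

```java
import java.util.Set;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch of the queue-now / get-later pattern with a plain ExecutorService;
// Hystrix's queue() hands back a Future in the same way.
public class AsyncCallSketch {
    public static Set<String> fetchAndGetLater() throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        // kick off the call asynchronously
        Future<Set<String>> future = pool.submit(() -> Set.of("p1", "p2"));
        // ... the caller can do other work here ...
        // later: block until the result is ready
        Set<String> result = future.get();
        pool.shutdown();
        return result;
    }
}
```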

Propagating ThreadLocal to HystrixCommand
Sometimes the service we are calling expects to be called in the same http thread - it expects the thread locals of the current http thread:
requestAttributes = (ServletRequestAttributes) RequestContextHolder.currentRequestAttributes();

We can get the current requestAttributes and pass them to the HystrixCommand:
public class MyHystrixCommand extends HystrixCommand<Result> {
  private final ServletRequestAttributes requestAttributes;
  private Thread thread;

  public MyHystrixCommand() {
      super(HystrixCommandGroupKey.Factory.asKey("default"));
      // capture the caller's request attributes and thread at construction time
      this.requestAttributes = (ServletRequestAttributes) RequestContextHolder.getRequestAttributes();
      this.thread = Thread.currentThread();
  }

  @Override
  protected Result run() throws Exception {
    try {
      RequestContextHolder.setRequestAttributes(requestAttributes);
      // do something and return the result
    } finally {
      clearThreadLocal();
    }
  }

  private void clearThreadLocal() {
    // only reset when run() executed on a different (Hystrix) thread
    if (Thread.currentThread() != thread) {
      RequestContextHolder.resetRequestAttributes();
    }
    thread = null;
  }
}

Using Spring Cloud Hystrix
Spring Cloud wraps Netflix Hystrix to make it easier to use.

First add @EnableCircuitBreaker in spring configuration class.
Then add @HystrixCommand annotation to service methods.
@HystrixCommand(fallbackMethod = "fallBack",
commandProperties = {
        @HystrixProperty(name = "fallback.isolation.semaphore.maxConcurrentRequests", value = "1000"),
        @HystrixProperty(name = "execution.isolation.semaphore.maxConcurrentRequests", value = "1000"),
        @HystrixProperty(name = "execution.isolation.thread.timeoutInMilliseconds", value = "2000")},
ignoreExceptions = {InvalidTokenException.class})
public Set<Product> getProducts() {}

public Set<Product> fallBack() {}

@HystrixProperty(name = "execution.isolation.strategy", value = "SEMAPHORE")
  • THREAD — it executes on a separate thread and concurrent requests are limited by the number of threads in the thread-pool
  • SEMAPHORE — it executes on the calling thread and concurrent requests are limited by the semaphore count
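SEMAPHORE isolation can be pictured with a plain java.util.concurrent.Semaphore guarding the call on the calling thread. A rough sketch (my own class, not Hystrix's implementation):

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Sketch of SEMAPHORE isolation: the call runs on the calling thread,
// and a counting semaphore caps how many callers may be inside at once.
public class SemaphoreIsolation<T> {
    private final Semaphore permits;
    private final Supplier<T> fallback;

    public SemaphoreIsolation(int maxConcurrentRequests, Supplier<T> fallback) {
        this.permits = new Semaphore(maxConcurrentRequests);
        this.fallback = fallback;
    }

    public T execute(Supplier<T> call) {
        if (!permits.tryAcquire()) {
            return fallback.get();   // over the limit: short-circuit to the fallback
        }
        try {
            return call.get();
        } finally {
            permits.release();
        }
    }
}
```

Note there is no timeout here: with semaphore isolation, the caller's thread is the one doing the work, which is exactly the trade-off against THREAD isolation.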
Resources:
Netflix Hystrix How to Use

Read the Error Message - Problem Solving Skills


The first step of troubleshooting is to read and understand the error message; then we can infer and list likely causes from it, and verify or exclude them one by one.

Scenario 1
After upgrading some libraries to newer versions and moving to JDK 8 and Tomcat 8, deployment of the web application fails.

Step 1: Read/Understand the error message
SEVERE [localhost-startStop-1] org.apache.catalina.core.ContainerBase.addChildInternal ContainerBase.addChild: start:
 org.apache.catalina.LifecycleException: Failed to start component [StandardEngine[Catalina].StandardHost[localhost].StandardContext[/webappA]]
        at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:154)
        at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:725)
        at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:701)
        at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:717)
        at org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:945)
        at org.apache.catalina.startup.HostConfig$DeployWar.run(HostConfig.java:1798)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalStateException: Unable to complete the scan for annotations for web application [/webappA] due to a StackOverflowError. Possible root causes include a too low setting for -Xss and illegal cyclic inheritance dependencies. The class hierarchy being processed was [org.bouncycastle.asn1.ASN1Boolean->org.bouncycastle.asn1.DERBoolean->org.bouncycastle.asn1.ASN1Boolean]
        at org.apache.catalina.startup.ContextConfig.checkHandlesTypes(ContextConfig.java:2066)
        at org.apache.catalina.startup.ContextConfig.processAnnotationsStream(ContextConfig.java:2012)
        at org.apache.catalina.startup.ContextConfig.processAnnotationsJar(ContextConfig.java:1961)
        at org.apache.catalina.startup.ContextConfig.processAnnotationsUrl(ContextConfig.java:1936)
        at org.apache.catalina.startup.ContextConfig.processAnnotations(ContextConfig.java:1897)
        at org.apache.catalina.startup.ContextConfig.webConfig(ContextConfig.java:1149)
        at org.apache.catalina.startup.ContextConfig.configureStart(ContextConfig.java:771)
        at org.apache.catalina.startup.ContextConfig.lifecycleEvent(ContextConfig.java:305)
        at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:95)
        at org.apache.catalina.util.LifecycleBase.fireLifecycleEvent(LifecycleBase.java:90)
        at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5080)
        at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
        ... 10 more
In another app, the error message is:
Caused by: java.lang.IllegalStateException: Unable to complete the scan for annotations for web application [/megaphone-admin-1.0.3_jbuild363] due to a StackOverflowError. Possible root causes include a too low setting for -Xss and illegal cyclic inheritance dependencies. The class hierarchy being processed was [org.bouncycastle.asn1.ASN1EncodableVector->org.bouncycastle.asn1.DEREncodableVector->org.bouncycastle.asn1.ASN1EncodableVector]

Step 2: Find which jar contains the class
org.bouncycastle.asn1.DERBoolean
http://www.findjar.com/class/org/bouncycastle/asn1/DERBoolean.html

ls -altr | grep bcprov
-rw-r--r-- 1 tomcat tomcat  2902942 Sep 25 22:01 bcprov-jdk15on-1.52.jar
-rw-r--r-- 1 tomcat tomcat  1876535 Oct 12 07:01 bcprov-jdk16-1.46.jar
-rw-r--r-- 1 tomcat tomcat  1593423 Oct 12 07:01 bcprov-jdk15-140.jar

Run mvn dependency:tree to check why these jars are imported, and use dependency exclusions to exclude the old ones.
<exclusion>
  <groupId>org.bouncycastle</groupId>
  <artifactId>bcmail-jdk15on</artifactId>
</exclusion>
<exclusion>
  <groupId>org.bouncycastle</groupId>
  <artifactId>bcprov-jdk15on</artifactId>
</exclusion>

Scenario 2
When deploying the app on JDK 8, it fails:
org.aspectj.apache.bcel.classfile.ClassFormatException: Invalid byte tag in constant pool: 18.

To fix it, I need to upgrade the AspectJ-related libs to the latest version, 1.8.9 - exclude them from the framework that imported them, and explicitly declare them at 1.8.9.

Then when I run mvn clean install, it fails:  
The following artifacts could not be resolved: aspectjrt:org.aspectj:jar:1.8.9, aspectjweaver:org.aspectj:jar:1.8.9

It took me an hour to finally realize that I had made a stupid mistake: I typed the artifactId where it should be the groupId.
<exclusion>
  <artifactId>org.aspectj</artifactId>
  <groupId>aspectjrt</groupId>
</exclusion>

If I had checked the error message more carefully and thought about the possible causes, it would have saved me an hour.

Scenario 3
Error creating bean with name 'org.springframework.security.config.authentication.AuthenticationManagerFactoryBean#0'
I converted the xml declaration of authentication-manager to a java config bean, and it failed with the above error.

There are a lot of error messages (200+ lines), but as long as I scan all of them, the root cause is clear:
Caused by: org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'org.springframework.security.authenticationManager' defined in class path resource WebSecurityConfiguration.class]: Bean instantiation via factory method failed; nested exception is org.springframework.beans.BeanInstantiationException: Failed to instantiate [org.springframework.security.authentication.AuthenticationManager]: Factory method 'authenticationManager' threw exception; nested exception is java.lang.IllegalArgumentException: A parent AuthenticationManager or a list of AuthenticationProviders is required
....
Caused by: java.lang.IllegalArgumentException: A parent AuthenticationManager or a list of AuthenticationProviders is required
at org.springframework.security.authentication.ProviderManager.checkState(ProviderManager.java:117) ~[spring-security-core-3.2.5.RELEASE.jar:3.2.5.RELEASE]
at org.springframework.security.authentication.ProviderManager.<init>(ProviderManager.java:106) ~[spring-security-core-3.2.5.RELEASE.jar:3.2.5.RELEASE]
at org.springframework.security.authentication.ProviderManager.<init>(ProviderManager.java:99) ~[spring-security-core-3.2.5.RELEASE.jar:3.2.5.RELEASE]

at xx.services.app.config.WebSecurityConfiguration.authenticationManager(WebSecurityConfiguration.java:33) ~[WebSecurityConfiguration.class:na]

This is because of a bug in the code: I first create an empty list, then create the authenticationManager with the empty list.
@Bean
public AuthenticationManager authenticationManager() {
    final List<AuthenticationProvider> providers = new ArrayList<>();
    // this will fail: we have to add providers to the list first, then create
    // the authenticationManager with the non-empty provider list.
    final AuthenticationManager authenticationManager = new ProviderManager(providers);
    ... // build daoAuthenticationProvider, ldapAuthenticationProvider
    providers.add(daoAuthenticationProvider);
    providers.add(ldapAuthenticationProvider);
    return authenticationManager;
}

Designing Data Structure


During programming, it's important to design and use the right data structures.

Here is a list of problems we can use to improve our data structure design skills.

Design an in-memory search engine
How to index and store in memory; how to support free-text queries and phrase queries
-- Map<String, List<Document>>, where List<Document> is sorted
-- Document: docId, List<Long> positionIds, where List<Long> is sorted
How to save to files; how to merge small files into big files
-- When saving to a file, make it sorted by word
-- Use merge sort to merge multiple files
How to make it scalable - how Solr Cloud works
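A toy version of the word -> sorted postings idea, with AND-intersection for free-text queries (positions and phrase queries omitted for brevity):

```java
import java.util.*;

// Tiny in-memory inverted index: word -> sorted set of doc ids.
// Free-text queries intersect the posting lists of all query words.
public class MiniSearchEngine {
    private final Map<String, TreeSet<Integer>> index = new HashMap<>();

    public void addDocument(int docId, String text) {
        for (String word : text.toLowerCase().split("\\s+")) {
            index.computeIfAbsent(word, w -> new TreeSet<>()).add(docId);
        }
    }

    // AND semantics: a doc must contain every query word
    public Set<Integer> search(String query) {
        Set<Integer> result = null;
        for (String word : query.toLowerCase().split("\\s+")) {
            Set<Integer> postings = index.getOrDefault(word, new TreeSet<>());
            if (result == null) {
                result = new TreeSet<>(postings);
            } else {
                result.retainAll(postings);   // intersect sorted posting lists
            }
        }
        return result == null ? Set.of() : result;
    }
}
```

Phrase queries would additionally store the positions per document (the List<Long> positionIds above) and check adjacency after intersecting.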

Google – Manager Peer Problem
1. setManager(A, B) sets A as a direct manager of B
2. setPeer(A, B) sets A as a colleague of B. After that, A and B will have the same direct manager.
3. query(A, B) returns whether A is in the management chain of B.
Tree + HashMap
Map<Integer, TNode> nodeMap

TNode: value, parent, neighbors
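One way to flesh out the tree + HashMap idea (a sketch: peer-group merging is simplified and management cycles are not checked):

```java
import java.util.*;

// Sketch of the manager/peer problem: a parent-pointer tree held in a map,
// plus peer groups so peers always share the same direct manager.
public class OrgChart {
    private final Map<Integer, Integer> managerOf = new HashMap<>();
    private final Map<Integer, Set<Integer>> peersOf = new HashMap<>();

    private Set<Integer> peerGroup(int person) {
        return peersOf.computeIfAbsent(person, p -> new HashSet<>(Set.of(p)));
    }

    // A becomes the direct manager of B (and of all of B's peers)
    public void setManager(int a, int b) {
        for (int member : peerGroup(b)) {
            managerOf.put(member, a);
        }
    }

    // merge the two peer groups; everyone now shares B's current manager
    public void setPeer(int a, int b) {
        Set<Integer> merged = new HashSet<>(peerGroup(a));
        merged.addAll(peerGroup(b));
        Integer manager = managerOf.get(b);
        for (int member : merged) {
            peersOf.put(member, merged);
            if (manager != null) managerOf.put(member, manager);
        }
    }

    // is A anywhere up the management chain of B?
    public boolean query(int a, int b) {
        Integer cur = managerOf.get(b);
        while (cur != null) {
            if (cur == a) return true;
            cur = managerOf.get(cur);   // walk parent pointers upward
        }
        return false;
    }
}
```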

Design an Excel sheet’s Data structure
http://codeinterviews.com/Uber-Design-Excel/
You need to perform operations like addition. The excel sheet is very sparse and is used to store numbers in the range 1-65K. Index for a cell is known.
Sparse table: Map<Integer, Map<Integer, String>> data
Follow-up question: In excel, one cell can refer to other cells, if I update one cell, how do you update all the dependent cells?
--Topological sort

- Use multiple data structures
Design a data structure that supports insert, delete, search and getRandom in constant time
private List<String> list = new ArrayList<String>();
Map<String,Integer> indexes = new HashMap<String,Integer>();
-- When removing a key, swap the old element in the list with the last element, and update the last element's index to its new location
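Filled out as a sketch, the swap-with-last trick keeps every operation O(1):

```java
import java.util.*;

// insert/delete/search/getRandom all in O(1):
// an ArrayList holds the values, a HashMap maps value -> index in the list.
public class RandomizedSet {
    private final List<String> list = new ArrayList<>();
    private final Map<String, Integer> indexes = new HashMap<>();
    private final Random random = new Random();

    public boolean insert(String value) {
        if (indexes.containsKey(value)) return false;
        indexes.put(value, list.size());
        list.add(value);
        return true;
    }

    // swap the removed element with the last one so removal is O(1)
    public boolean remove(String value) {
        Integer index = indexes.remove(value);
        if (index == null) return false;
        String last = list.remove(list.size() - 1);
        if (!last.equals(value)) {
            list.set(index, last);
            indexes.put(last, index);
        }
        return true;
    }

    public boolean contains(String value) {
        return indexes.containsKey(value);
    }

    public String getRandom() {
        return list.get(random.nextInt(list.size()));
    }
}
```

For the duplicate-value follow-up, the map would hold a set of indexes per value instead of a single index.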

Follow up
- What if the value may be duplicated?
- How to test getRandom()?
-- Implement addItem - O(1), getTop10Items
-- Implement HashTable with get,set,delete,getRandom functions in O(1).

Implement Get and Insert for TimeTravelingHashTable
- insert(key, value, timestamp)
- get(key, timestamp)
- get(key) // returns value associated with key at latest time.
Map<K, TreeMap<Float, V>> keyToBSTMap = new HashMap<>();

public V get(K k, Float time) {
    if (!keyToBSTMap.containsKey(k)) return null;
    // floorEntry covers both the exact-timestamp case and the
    // most-recent-earlier case; it's null if time is before the first insert
    final Map.Entry<Float, V> entry = keyToBSTMap.get(k).floorEntry(time);
    return entry == null ? null : entry.getValue();
}

Stack supports getMin or getMax
Stack<Integer> main = new Stack<>();
Stack<Integer> minStack = new Stack<>();
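Filled out, the two-stack idea looks like this (push onto minStack only when a new minimum, or a tie, arrives):

```java
import java.util.Stack;

// Min stack: a second stack tracks the minimum seen so far,
// so push/pop/getMin are all O(1).
public class MinStack {
    private final Stack<Integer> main = new Stack<>();
    private final Stack<Integer> minStack = new Stack<>();

    public void push(int value) {
        main.push(value);
        if (minStack.isEmpty() || value <= minStack.peek()) {
            minStack.push(value);   // new minimum (or a tie)
        }
    }

    public int pop() {
        int value = main.pop();
        if (!minStack.isEmpty() && minStack.peek() == value) {
            minStack.pop();         // the popped value was the current minimum
        }
        return value;
    }

    public int getMin() {
        return minStack.peek();
    }
}
```

Pushing ties (<= rather than <) matters: otherwise popping one of two equal minimums would lose the min.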

Design a stack with operations(findMiddle|deleteMiddle) on middle element
-- Use Double Linked List

LeetCode 311 - Sparse Matrix Multiplication
https://discuss.leetcode.com/topic/30631/my-java-solution
Map<Integer, HashMap<Integer, Integer>> tableA, tableB

Design SparseVector:
-- supports dot(SparseVector that)
Design SparseMatrix
-- supports plus(SparseMatrix that)

Google - remove alarm
https://reeestart.wordpress.com/2016/06/30/google-remove-alarm/
hash map - map priority to set of alarm id

max priority heap - PriorityQueue<Integer>

Recover a Quack Data Structure
Copy all its elements into an array
-- its pop()/peek() randomly removes or returns the first or last element

Two Sum III - Data structure design
add - Add the number to an internal data structure.
find - Find if there exists any pair of numbers which sum is equal to the value.
1. O(1) add, O(n) find
2. O(n) add, O(1) find
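A sketch of variant 1 (O(1) add, O(n) find), keeping a count per number so duplicate pairs like 5 + 5 work:

```java
import java.util.*;

// Two Sum III, variant 1: O(1) add, O(n) find.
// Keep a count of each number; find scans the distinct keys once.
public class TwoSum {
    private final Map<Integer, Integer> counts = new HashMap<>();

    public void add(int number) {
        counts.merge(number, 1, Integer::sum);
    }

    public boolean find(int value) {
        for (int a : counts.keySet()) {
            int b = value - a;
            // a pair needs two copies when both halves are the same number
            if (a == b ? counts.get(a) > 1 : counts.containsKey(b)) {
                return true;
            }
        }
        return false;
    }
}
```

Variant 2 would instead precompute all pairwise sums into a set on add, making find O(1) at the cost of O(n) adds.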

Add and Search Word
search(word) can search a literal word or a regular-expression string containing only letters a-z or '.'. A '.' means it can represent any one letter.

Shortest Word Distance II
Design a class which receives a list of words in the constructor, and implements a method that takes two words word1 and word2 and return the shortest distance between these two words in the list.


Your method will be called repeatedly many times with different parameters.
Map<String, List<Integer>> map: word -> sorted indexes
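A sketch of the word -> sorted indexes design, with a two-pointer merge per query:

```java
import java.util.*;

// Shortest Word Distance II: precompute word -> sorted list of indexes
// in the constructor, then merge the two index lists with two pointers.
public class WordDistance {
    private final Map<String, List<Integer>> map = new HashMap<>();

    public WordDistance(List<String> words) {
        for (int i = 0; i < words.size(); i++) {
            map.computeIfAbsent(words.get(i), w -> new ArrayList<>()).add(i);
        }
    }

    public int shortest(String word1, String word2) {
        List<Integer> a = map.get(word1), b = map.get(word2);
        int i = 0, j = 0, best = Integer.MAX_VALUE;
        while (i < a.size() && j < b.size()) {
            best = Math.min(best, Math.abs(a.get(i) - b.get(j)));
            // advance the pointer sitting at the smaller index
            if (a.get(i) < b.get(j)) i++; else j++;
        }
        return best;
    }
}
```

Construction is O(n); each query is O(k1 + k2) over the two occurrence lists, which is why the repeated-call requirement favors this design.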

Solr - Tips and Tricks


Admin UI
http://127.0.0.1:8983/solr/#/~cloud?view=tree
bin/solr help
bin/solr status
bin/solr healthcheck
bin/solr stop -all
(bin/solr start -cloud -s example/cloud/node1/solr -p 8983 -h 127.0.0.1)  && (bin/solr start -cloud -s example/cloud/node2/solr -p 7574 -z 127.0.0.1:9983 -h 127.0.0.1) && (bin/solr start -cloud -s example/cloud/node3/solr -p 6463 -z 127.0.0.1:9983 -h 127.0.0.1)

Delete docs:
change *:* to your query
update?commit=true&stream.body=<delete><query>*:*</query></delete>

https://wiki.apache.org/solr/SearchHandler
Use invariants to lock options and overwrite values the client passes.
Use appends to append options; use defaults to provide default options.

Request Parameters
distrib=false - only query current core

debugQuery
debug=query/results/timing

explainOther
debug=results&explainOther=id:MA*

Range Query inclusive: [a TO b]
exclusive: {a TO b} - note it's not ().
mixed: [a TO b} or {a TO b]

Negative Query
Query empty fields: -field:*
field is empty or is abc: (*:* -field:*) OR field:abc
(*:* -id:1) OR id:1 - return all docs
http://stackoverflow.com/questions/634765/using-or-and-not-in-solr-query
-foo is transformed by solr into (*:* -foo)
The big caveat is that Solr only checks to see if the top level query is a pure negative query!

Solr zkcli
zkcli.sh -zkhost zooServer:port  -cmd putfile /configs/solrconfig.xml solrconfig.xml
zkcli.sh -zkhost zooServer:port  -cmd get /configs/schema.xml

To get SolrCloud node info (such as IP addresses):
java -classpath "*" org.apache.solr.cloud.ZkCLI -zkhost myzkhost -cmd get /clusterstate.json | grep base_url
zkcli.sh -zkhost myzkhost:port -cmd get /clusterstate.json

Rest API
solr/collection/config
solr/collection/config/requestHandler
solr/collection/schema
solr/collection/schema/version
solr/admin/collections?action=RELOAD&name=$NAME

SolrJ Field Annotation
Map dynamic fields to fields:
@Field("supplier_*")
Map<String, List<String>> supplier;

@Field("sup_simple_*")
Map<String, String> supplier_simple;

@Field("allsupplier_*")
private String[] allSuppliers;

@Field(child = true)
Child[] child;

Starting Solr
-m 2g: Start Solr with the defined value as the min (-Xms) and max (-Xmx) heap size for the JVM.

bin/solr stop -all

(bin/solr start -cloud -s example/cloud/node1/solr -p 8983 -h 127.0.0.1 -m 2g)  && (bin/solr start -cloud -s example/cloud/node2/solr -p 7574 -z 127.0.0.1:9983 -h 127.0.0.1 -m 2g) && (bin/solr start -cloud -s example/cloud/node3/solr -p 6463 -z 127.0.0.1:9983 -h 127.0.0.1 -m 2g)
-- Use -h 127.0.0.1 so Solr can continue to work even if the IP changes.

Extending Solr
Implement the SolrCoreAware interface in custom RequestHandler to get SolrCore in inform method.

Customize and extend DocumentObjectBinder

Get solr static fields
SolrServer solrClient = new HttpSolrServer("http://{host:port}/solr/core-name");
SolrQuery query = new SolrQuery();

query.setRequestHandler("/schema/fields");
// query.add(CommonParams.QT, "/schema/fields");
QueryResponse response = solrClient.query(query);
NamedList<Object> responseHeader = response.getResponseHeader();
ArrayList<SimpleOrderedMap<Object>> fields = (ArrayList<SimpleOrderedMap<Object>>) response.getResponse().get("fields");
for (SimpleOrderedMap<Object> field : fields) {
    Object fieldName = field.get("name");
}

Solr Internals
replicationFactor

The Solr replicationFactor has nothing to do with quorum. Solr uses Zookeeper's quorum sensing to ensure that all Solr nodes have a consistent picture of the cluster.

openSearcher and hardCommit
- A soft commit always opens a new searcher.
- openSearcher only makes sense for hard commits.

Use config api to change solr settings dynamically

Use JSON API, but be aware SolrJ may not work with JSON API in some cases.

Solr Facet APIs
http://yonik.com/json-facet-api/
http://yonik.com/solr-facet-functions/

Don't forget facet.mincount=1

update.distrib
=toLeader when one replica sends this doc to its leader
=fromLeader when the doc's leader sends it to its followers

Solr Nested Objects
Define _root_ field

Use [child] - ChildDocTransformerFactory to return child documents

admin collection apis
/admin/collections?action=CLUSTERSTATUS

curl http://localhost:8983/solr/mycollection/update -X POST -H 'Content-Type: application/json' --data-binary @atomic.json

Zookeeper
Clean ZK data - link
run java -cp zookeeper-3.4.6.jar:conf org.apache.zookeeper.server.PurgeTxnLog  ../zoo_data/ ../zoo_data/ -n 3

Access solr cloud via ssh tunnel
Create a tunnel to zookeeper and solr nodes
- But when solrJ queries zookeeper, it still returns the external solr nodes that we can't access directly
Add a conditional breakpoint at CloudSolrClient.sendRequest(SolrRequest, String)
- before  LBHttpSolrClient.Req req = new LBHttpSolrClient.Req(request, theUrlList);
theUrlList.clear();
theUrlList.add("http://localhost:18983/solr/searchItems/");
theUrlList.add("http://localhost:28983/solr/searchItems/");

return false;

Solr suggester
It supports filtering on multiple fields - just copy these fields into the contextFilterField.

Troubleshooting
400 Unknown Version - when run curl solr
- Maybe you need to encode the query parameters

Debug Solr Query
http://splainer.io/

APIs
http://localhost:8983/solr/admin/collections
?action=LIST&wt=json

solr/admin/collections?action=OVERSEERSTATUS
overseer_queue_size
overseer_work_queue_size

admin/mbeans?key=fieldCache&stats=true

Coding
Watches are one time triggers

SolrJ
*SolrClient
request(SolrRequest)
.getZkStateReader()

GenericSolrRequest 
solrClient.request(new GenericSolrRequest(SolrRequest.METHOD.GET, "/admin/mbeans", params))
(NamedList<Object>) nl.findRecursive("solr-mbeans", "CACHE", "fieldCache", "stats");

Use Zookeeper client
./zkCli.sh -server localhost:9983
ls(delete) path
create path data // data can be ''
get path
stat /overseer/collection-queue-work
get /overseer/collection-queue-work/qn-0001379031
Create chroot path
create /the-chroot-path []

Solving Problem from Different Angles - Programmer Skills


Scenario
Hit an issue in production: when we query Solr, the parameter rows is too small.
We need to fix this problem ASAP.

One possible solution is to fix the code and make a new deployment.
Another possible solution is to change the data, so the one we want to return to the client is listed first.

The solution we used is to change solrconfig.xml (only we use this Solr server) to always return 100 rows, ignoring the rows the client passes.

https://wiki.apache.org/solr/SearchHandler
-- invariants - provides param values that will be used in spite of any values provided at request time. They are a way of letting the Solr maintainer lock down the options available to Solr clients. Any params values specified here are used regardless of what values may be specified in either the query, the "defaults", or the "appends" params.

<requestHandler name="/select" class="solr.SearchHandler">
   <lst name="defaults">
     <str name="echoParams">explicit</str>
     <int name="rows">10</int>
   </lst>

   <lst name="invariants">
     <int name="rows">100</int>
   </lst>
</requestHandler>
Later we fixed the code and design issue, and removed the invariants in solrconfig.xml.

Lessons Learned about Programming and Soft Skills - 2016


Design schema carefully
The cost of changing/migrating old data is expensive, so it's important to try to get it right at first.
Give meaningful, easy-to-remember names.
Use IDs (instead of names) to reference other data.

Be schema-less
For example, in Solr, only add an explicit field if you need to search against it; for all other fields that we are sure we will never search, consider storing them all as one JSON or binary data field.
This makes it easier to add/remove these fields.

Use public APIs as much as possible
-- For instance, when developing a RESTful service, use APIs from javax (HttpServletRequest) or javax.ws.rs.core if possible - not APIs from com.sun.jersey or org.glassfish.jersey.
-- Private APIs are more likely to change.

Ensure You Can Switch Functions On and Off

Always Propose Different Solutions, then Compare Them
Use lists and graphs to compare pros and cons.

Use-Case Analysis
How to implement it at a high level
What the effort is
Don't over-engineer

Pull from the upstream (master) branch frequently
- Always work on the latest code

Build/Try simple workable code first

Solve Problem from Different Angles

What Could Possibly Go Wrong?
-- During coding, deployment

Soft Skills
How to find the best solution
Always try/propose different approaches
Talk with others/team members

Don't get too defensive for your solution
- Be open minded
What's important is to:
- find the best/easiest/most robust solution - good for maintaining the code later
- learn new things and new ways to tackle problems

Don't (just) try to convince others to take your approach
It's easy to propose different approaches for a feature another developer is working on.
But what's more important is to compare whether the new approach is actually better, or whether there is another, better solution.

Different doesn't mean better
This is especially important for senior engineers: if you propose a not-well-thought-out solution, others may feel forced to take your approach.

More important is to carefully compare these different approaches in detail
-- How to implement all (important/basic) functions in each approach
-- How to handle new requirements later

Communication
Communicate early and often
Let others know what you are doing and your progress
-- Avoid duplicate work

Be open-minded
Learn aggressively

Java Frequently Used Utility Methods


Objects.requireNonNull(key, message)
Objects.equals(a,b) - null safe
StringBuilder
.charAt(i) 
.setCharAt(int, char)
.setLength(0) / .setLength(len - 1)
- O(1) when decreasing the length
.reverse()
.insert(offset,value)
.delete(start,end)
.deleteCharAt(i)
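A few of these StringBuilder methods in action (a minimal sketch; the class name is illustrative only):

```java
public class SbDemo {
    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("abc");
        sb.setCharAt(0, 'z');          // "zbc"
        sb.reverse();                  // "cbz"
        sb.setLength(sb.length() - 1); // drop the last char in O(1): "cb"
        System.out.println(sb);        // cb
        sb.setLength(0);               // clear and reuse the builder
        System.out.println(sb.length()); // 0
    }
}
```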


Arrays.equals(a1,a2)
.fill(long[] a, long val)
fill(long[] a, int fromIndex, int toIndex, long val)

To fill a multi-dimensional array (note that this stores the same row reference in every slot):
int[][] target = new int[2][4];
int[] temp = new int[4];
Arrays.fill(temp, -1);
Arrays.fill(target, temp);
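One caveat worth knowing: Arrays.fill(target, temp) puts the same temp reference into every row, so mutating one row changes all of them. A minimal sketch (class name is illustrative only):

```java
import java.util.Arrays;

public class FillDemo {
    public static void main(String[] args) {
        int[][] target = new int[2][4];
        int[] temp = new int[4];
        Arrays.fill(temp, -1);
        Arrays.fill(target, temp);        // every row now points to the SAME array
        target[0][0] = 99;
        System.out.println(target[1][0]); // 99 - row 1 changed too
        // For independent rows, fill each row separately:
        for (int[] row : target) Arrays.fill(row, -1);
        System.out.println(target[1][0]); // -1
    }
}
```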

Collection
List
E remove(int index);
boolean remove(Object o);

ListIterator
next/hasNext/nextIndex, previous/hasPrevious/previousIndex, 
add/remove/set

ArrayList.subList
Remove elements from startIndex (inclusive) to endIndex (exclusive):
list1.subList(startIndex, endIndex).clear();
Don't update the original list while you still need the subList.
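For example, removing a range from a list via the subList view (a minimal, runnable sketch; class name is illustrative):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SubListDemo {
    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>(Arrays.asList(0, 1, 2, 3, 4));
        list.subList(1, 3).clear();   // removes the elements at index 1 and 2
        System.out.println(list);     // [0, 3, 4]
    }
}
```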

Arrays.asList(T... a) returns a subclass of AbstractList that doesn't implement the add or remove methods - they throw UnsupportedOperationException.
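A small sketch of that behavior (class name is illustrative): set() works because the list is a fixed-size view backed by the array, while structural changes like add() throw.

```java
import java.util.Arrays;
import java.util.List;

public class AsListDemo {
    public static void main(String[] args) {
        List<String> view = Arrays.asList("a", "b");
        view.set(0, "z");                  // allowed: writes through to the backing array
        try {
            view.add("c");                 // structural change is not allowed
        } catch (UnsupportedOperationException e) {
            System.out.println("add not supported");
        }
    }
}
```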

JDK 8
map.putIfAbsent(sum, i);
Instead of:
if (!map.containsKey(sum)) {
    map.put(sum, i);
}

map.getOrDefault(key, defaultValue)
computeIfAbsent
computeIfPresent
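computeIfAbsent is especially handy for building multi-maps, since it creates the bucket on first use. A minimal sketch (names are illustrative only):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MapDemo {
    public static void main(String[] args) {
        Map<String, List<Integer>> index = new HashMap<>();
        // creates the list on the first call, reuses it afterwards
        index.computeIfAbsent("a", k -> new ArrayList<>()).add(1);
        index.computeIfAbsent("a", k -> new ArrayList<>()).add(2);
        System.out.println(index.get("a")); // [1, 2]
        // getOrDefault avoids a null check for missing keys
        System.out.println(index.getOrDefault("b", Collections.emptyList())); // []
    }
}
```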

Lambda
iterable.forEach(x -> x.setXXX(null));
list.removeIf(s -> s.length() == 0);

Stream
list.replaceAll(s -> s.toUpperCase());
list.sort((x, y) -> x.length() - y.length());
logger.finest(() -> complexMsg());
lineCount = Files.lines(path).count();

Collectors
stream().collect(Collectors.groupingBy(xxx));
mapStream.collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
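For instance, counting occurrences with groupingBy plus counting (a minimal sketch; names are illustrative only):

```java
import java.util.Arrays;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class CountDemo {
    public static void main(String[] args) {
        Map<String, Long> counts = Arrays.asList("a", "b", "a").stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
        System.out.println(counts.get("a")); // 2
        System.out.println(counts.get("b")); // 1
    }
}
```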

Math:
Math.xxxExact - throws exception if overflow
addExact/subtractExact...
Base64

SecureRandom.getInstanceStrong()

Enum
SomeEnum someEnum = Enum.valueOf(SomeEnum.class, value);

// Note: a literal single quote must be escaped as ''
MessageFormat.format
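A quick sketch of the quote-escaping rule (class name is illustrative only):

```java
import java.text.MessageFormat;

public class MessageFormatDemo {
    public static void main(String[] args) {
        // A literal single quote must be written as two single quotes ('')
        System.out.println(MessageFormat.format("It''s {0}", "done")); // It's done
        // With an unescaped quote, everything after it is treated as
        // literal text, so the {0} placeholder would not be substituted.
    }
}
```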


Get the inputstream from classpath:
ClassLoader.getResourceAsStream()

Guava
Preconditions
MoreObjects
firstNonNull(@Nullable T first, @Nullable T second)
joiner = Joiner.on("; ").skipNulls();
splitter = Splitter.on(',').trimResults().omitEmptyStrings();

ComparisonChain.start().compare(this.aString, that.aString)....compare(this.anEnum, that.anEnum, Ordering.natural().nullsLast()).result();

Ordering<String> ordering = Ordering.natural().nullsFirst().reverse();

Collections.sort(names, ordering);
Map<String, String> paramMap = Splitter.on("&").withKeyValueSeparator("=").split(params);
Joiner.on("&").withKeyValueSeparator("=").join(paramMap);

CaseFormat.LOWER_UNDERSCORE.to(CaseFormat.UPPER_CAMEL, tableName);

HtmlEscapers.htmlEscaper().escape(str)
UrlEscapers.urlFormParameterEscaper() / urlFragmentEscaper() / urlPathSegmentEscaper()

Files.hash(file, Hashing.md5());
RateLimiter

Solr Commons
DateUtil.parseDate(dateString)
DateUtil.getThreadLocalDateFormat().format(date)

Top 16 Java Utility Classes
Apache Commons
StringUtils
CollectionUtils
ReflectionToStringBuilder.toString(this)

org.apache.commons.lang.SerializationUtils
clone
serialize
deserialize

ObjectUtils
firstNonNull(T... values)
defaultIfNull(final T object, final T defaultValue)

ArrayUtils.contains(arr,targetValue)


Testing Tips for Java Developers


Mockito

doReturn|Answer|Throw()

reset

  • after reset, the mock forgets all stubbing and recorded interactions
Mock chained calls
Do different things when call multiple times
Invalid use of argument matchers!
  • When using matchers, all arguments have to be provided by matchers.
UnfinishedVerificationException: Missing method call for verify(mock)
  • in verify, use the primitive matchers (anyInt(), anyLong(), etc.) for primitive arguments
Testing Servlet

Hamcrest

[TestNG]

Spring + JUnit

Using REST Assured to test http APIs

randomizedtesting


Eclipse

Plugins

  • MoreUnit
    • Ctrl+U: create test method
    • Ctrl+J: jump to test method

Add Static Import Automatically

  • To make writing test cases easier, when we type “assertT” and hit Ctrl+Space, we want Eclipse to add the static import automatically for us: import static org.hamcrest.MatcherAssert.assertThat;
  • Go to Window > Preferences > Java > Editor > Content Assist > Favorites, then add:

Run Tests across Multiple Projects

Create a maven project depending on all the projects you want to test.
Create Test code:

Maven

  • Run specific method: mvn test -Dtest=className#methodName
