Tips and Tricks for Atom Editor

-- Autosave is disabled by default; find the autosave package, go to its settings and select "enabled".
Find the whitespace package and uncheck the Ensure Single Trailing Newline option
Atom -> Preferences -> Editor -> Enable soft wrap
Atom -> Config
    enabled: true
    softWrap: true
    ensureSingleTrailingNewline: false

Line Ending Converter
-- It can format JSON, XML and other programming languages such as Java.
dictionary: ctrl-cmd-k
-- Copy the JSON data and save it as a.json; jsonlint will automatically run and check it.

Command Palette: Cmd+Shift+P
Go to Line: Ctrl+G
Go to Matching Bracket: Ctrl+M
Toggle Tree View: Cmd+\
Fuzzy Find Files
Increase Font Size: Cmd++
Decrease Font Size: Cmd+-

Convert to Upper Case: ⌘-K-U
Convert to Lower Case: ⌘-K-L
Cut to End of Line: Ctrl+K
Delete Line: Ctrl+Shift+K

cmd+shift+: to bring up the list of corrections

How Solr Create Collection - Learn Solr Code

Test Code to create collections
MiniSolrCloudCluster cluster = new MiniSolrCloudCluster(4 /*numServers*/, testBaseDir, solrXml, JettyConfig.builder().setContext("/solr").build());
cluster.createCollection(collectionName, 2/*numShards*/, 2/*replicationFactor*/, "cie-default", null);

CollectionsHandler.handleRequestBody(SolrQueryRequest, SolrQueryResponse)

CollectionAction action = CollectionAction.get(a); // CollectionAction.CREATE(true)
CollectionOperation operation = CollectionOperation.get(action); // CollectionOperation.CREATE_OP(CREATE)

Map result =, rsp, this);
It returns a map like this:
{name=collectionName, fromApi=true, replicationFactor=2, collection.configName=configName, numShards=2, stateFormat=2}

ZkNodeProps props = new ZkNodeProps(result);
if (operation.sendToOCPQueue) handleResponse(operation.action.toLower(), props, rsp, operation.timeOut);

CollectionsHandler.handleResponse
QueueEvent event = coreContainer.getZkController().getOverseerCollectionQueue().offer(Utils.toJSON(m), timeout);

This uses DistributedQueue.offer(byte[] data, long timeout) to add a task to /overseer/collection-queue-work/qnr-numbers.

It uses LatchWatcher to wait until this task is processed.
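
The offer-then-wait pattern above can be sketched with plain JDK classes. This is only an in-memory analogy under stated assumptions, not Solr's code: a BlockingQueue stands in for the ZooKeeper queue, a CountDownLatch stands in for LatchWatcher, and the names OfferAndWaitSketch, Task, processOne and demo are hypothetical.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class OfferAndWaitSketch {
    /** A queued task plus the latch its submitter waits on (hypothetical type). */
    static final class Task {
        final String payload;
        final CountDownLatch done = new CountDownLatch(1);
        volatile String response;

        Task(String payload) {
            this.payload = payload;
        }
    }

    private final BlockingQueue<Task> queue = new LinkedBlockingQueue<>();

    /** Enqueues a task, then blocks until it is processed; returns null on timeout. */
    String offer(String payload, long timeoutMs) {
        Task task = new Task(payload);
        queue.add(task);
        try {
            return task.done.await(timeoutMs, TimeUnit.MILLISECONDS) ? task.response : null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    /** Consumer side: takes one task, processes it, and releases the waiting submitter. */
    void processOne() {
        try {
            Task task = queue.take();
            task.response = "processed:" + task.payload;
            task.done.countDown();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Runs one consumer thread and one submitter, and returns the submitter's result. */
    static String demo() {
        OfferAndWaitSketch sketch = new OfferAndWaitSketch();
        Thread consumer = new Thread(sketch::processOne);
        consumer.setDaemon(true);
        consumer.start();
        return sketch.offer("create", 5000);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

If nothing consumes the task, offer returns null once the timeout expires, which is the situation the timeout parameter guards against.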

Overseer and OverseerCollectionProcessor

OverseerCollectionProcessor.processMessage(ZkNodeProps, String operation /*create*/)
OverseerCollectionProcessor.processMessage(ZkNodeProps, String)
OverseerCollectionProcessor.createCollection(ClusterState, ZkNodeProps, NamedList)

  ClusterStateMutator.getShardNames(numSlices, shardNames);
  positionVsNodes = identifyNodes(clusterState, nodeList, message, shardNames, repFactor); // round-robin if rule not set

  createConfNode(configName, collectionName, isLegacyCloud);
  // This message will be processed by ClusterStateUpdater
  // wait for a while until we do see the collection in the clusterState

  for (Map.Entry e : positionVsNodes.entrySet()) {
    // ...
    if (isLegacyCloud) {
      shardHandler.submit(sreq, sreq.shards[0], sreq.params);
    } else {
      coresToCreate.put(coreName, sreq);
    }
  }

This sends an HTTP request, which is handled by CoreAdminHandler.handleRequestBody.
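
The "round-robin if rule not set" comment on identifyNodes can be illustrated with a small sketch. It assumes the simplest possible policy (cycle through the node list once per replica position) and ignores rules, maxShardsPerNode and node shuffling; the class name, method name and the "shard/replicaN" position keys are hypothetical, not Solr's.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RoundRobinPlacementSketch {
    /** Maps each shard/replica position to a node, cycling through the nodes in order. */
    static Map<String, String> assign(List<String> shardNames, int replicationFactor, List<String> nodes) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("no nodes available");
        }
        Map<String, String> positionToNode = new LinkedHashMap<>();
        int next = 0;
        for (String shard : shardNames) {
            for (int replica = 0; replica < replicationFactor; replica++) {
                positionToNode.put(shard + "/replica" + replica, nodes.get(next % nodes.size()));
                next++;
            }
        }
        return positionToNode;
    }

    public static void main(String[] args) {
        List<String> shards = Arrays.asList("shard1", "shard2");
        List<String> nodes = Arrays.asList("node1", "node2", "node3", "node4");
        // 2 shards x 2 replicas on 4 nodes: every position lands on its own node
        assign(shards, 2, nodes).forEach((position, node) -> System.out.println(position + " -> " + node));
    }
}
```

With fewer nodes than positions the index wraps around, so some nodes host more than one core.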

  // if there were any errors while processing
  // the state queue, items would have been left in the
  // work queue so let's process those first
  byte[] data = workQueue.peek();
  boolean hadWorkItems = data != null;
  while (data != null)  {
    final ZkNodeProps message = ZkNodeProps.load(data);
    clusterState = processQueueItem(message, clusterState, zkStateWriter, false, null);
    workQueue.poll(); // poll-ing removes the element we got by peek-ing
    data = workQueue.peek();
  }


  zkWriteCommand = processMessage(clusterState, message, operation);

  clusterState = zkStateWriter.enqueueUpdate(clusterState, zkWriteCommand, callback);

  case CREATE:
    return new ClusterStateMutator(getZkStateReader()).createCollection(clusterState, message);

overseer.ClusterStateMutator.createCollection(ClusterState, ZkNodeProps)

Gradle Tips and Tricks - 2017

Run tasks on sub-projects only
./gradlew sub-project:build

Skip Tasks
-x test -x findbugsMain -x findbugsTest

Run specific tests
gradle test --tests org.gradle.SomeTest.someMethod
gradle test --tests org.gradle.SomeTest
gradle test --tests org.gradle.internal*
//select all ui test methods from integration tests by naming convention
gradle test --tests *IntegTest*ui*
//selecting tests from different test tasks
gradle test --tests *UiTest integTest --tests *WebTest*ui

gw - run gradle in sub folders
brew install gdub

Maven Tips and Tricks - 2016

Lessons Learned about Programming and Soft Skills - 2017

How to compare different approaches
- Always think about different approaches (even if you already finished/committed code)

- Don't just choose one that looks good
- List them and compare them
- Always ask yourself why you chose this approach
- Try hard to find problems in your current approach, and how to fix them
For small coding tasks
- Implement them if possible
- Then compare which one makes the code cleaner, requires fewer changes, etc.
Example: Exclude source and javadoc from -jar
APP_BIN=$APP_BIN_DIR/$(ls $APP_BIN_DIR | grep -E 'jarName-version.*jar' | grep -v sources | grep -v javadoc | grep -v pom)

How to quickly scan/learn new classes
Sometimes we need to quickly scan a bunch of related classes to check how to implement a function, use a method, etc.
- Check the class's Javadoc
- Check the class signature
- Check main methods:
  - especially static methods
- Check call hierarchy in source code
- Check test code/examples
- Search Google for code examples

Lessons Learned about Programming and Soft Skills - 2016

Bash Scripting Essentials

Brace Expansion
chown root /usr/{ucb/{ex,edit},lib/{ex?.?*,how_ex}}

Special Variables
$? Exit value of last executed command.
wait $pid

$! Process number of last background command.
$0 First word; that is, the command name. This will have the full pathname if it was found via a PATH search.
$n Individual arguments on command line (positional parameters).
$# Number of command-line arguments.

"$*" All arguments on command line as one string ("$1 $2…"). The values are separated by the first character in $IFS.
"$@" All arguments on command line, individually quoted ("$1" "$2" …).

Use the variables $1, $2 .. $n to access the arguments passed to a function.

Hello () {
   echo "Hello $1 $2"
   return 10
}

Hello a b

for i in $( command ); do command $i; done

for i in $( command ); do
  command $i
done

Google Shell Style Guide
Quote your variables; prefer "${var}" over "$var".
Use "$@" unless you have a specific reason to use $*.
Use $(command) instead of backticks.
[[ ... ]] is preferred over [ and test.

[[ ... ]] reduces errors because no pathname expansion or word splitting takes place between [[ and ]], and [[ ... ]] allows regular expression matching where [ ... ] does not.

Use readonly or declare -r to ensure that variables meant to be constant are read-only.

Use Local Variables

Use set -o errexit (a.k.a. set -e) to make your script exit when a command fails.
Then add || true to commands that you allow to fail.
set -e - enable exit immediately
set +e - disable exit immediately

Use set -o nounset (a.k.a. set -u) to exit when your script tries to use undeclared variables.
Use set -o xtrace (a.k.a. set -x) to trace what gets executed. Useful for debugging.

Use $(( ... )), not expr, for arithmetic expressions; it is more forgiving about spaces.
Use (( ... )) or let, not $(( ... )), when you don't need the result.

Identify common problems with shellcheck.

Essential Linux Commands for Developers

How DistributedUpdateProcessor Works - Learning Solr Code

Case Study
Case 1: Update request is first sent to a follower (can be any node)
I: The coordinator node receives the add request:

DistribPhase phase =

phase is NONE

DistributedUpdateProcessor.setupRequest(String, SolrInputDocument, String)

ClusterState cstate = zkController.getClusterState();
DocCollection coll = cstate.getCollection(collection);
Slice slice = coll.getRouter().getTargetSlice(id, doc, route, req.getParams(), coll);
String shardId = slice.getName();

This decides which shard the doc should be stored in.

Replica leaderReplica = zkController.getZkStateReader().getLeaderRetry(
    collection, shardId);
isLeader = leaderReplica.getName().equals(

Whether I am the leader that should store the doc: false

2. Forward to the leader that should store the doc
// I need to forward onto the leader...
nodes = new ArrayList<>(1);

3. DistributedUpdateProcessor.processAdd(AddUpdateCommand)
           (isLeader || isSubShardLeader ?
            DistribPhase.FROMLEADER.toString() :
            DistribPhase.TOLEADER.toString())); ==> TOLEADER
params.set(DISTRIB_FROM, ZkCoreNodeProps.getCoreUrl(
    zkController.getBaseUrl(), req.getCore().getName()));
cmdDistrib.distribAdd(cmd, nodes, params, false, replicationTracker);

II: The leader receives the request:
org.apache.solr.update.processor.UpdateRequestProcessorChain.createProcessor(SolrQueryRequest, SolrQueryResponse)
final String distribPhase = req.getParams().get(DistributingUpdateProcessorFactory.DISTRIB_UPDATE_PARAM);
skipToDistrib true
// skip anything that doesn't have the marker interface - UpdateRequestProcessorFactory.RunAlways

DistribPhase phase = TOLEADER
String fromCollection = updateCommand.getReq().getParams().get(DISTRIB_FROM_COLLECTION);
if (isLeader || isSubShardLeader) {
          // that means I want to forward onto my replicas...
          // so get the replicas...
          forwardToLeader = false;
nodes = follower nodes

2. The leader adds the doc locally first
boolean dropCmd = false;
if (!forwardToLeader) {    // forwardToLeader false
  dropCmd = versionAdd(cmd); // usually return false

private void doLocalAdd(AddUpdateCommand cmd) throws IOException {

if (willDistrib) { // true
  cmd.solrDoc = clonedDoc;

3. The leader forwards the add request to its followers
params = new ModifiableSolrParams(filterParams(req.getParams()));
           (isLeader || isSubShardLeader ?
            DistribPhase.FROMLEADER.toString() :
params.set(DISTRIB_FROM, ZkCoreNodeProps.getCoreUrl(
    zkController.getBaseUrl(), req.getCore().getName()));

if (replicationTracker != null && minRf > 1)
  params.set(UpdateRequest.MIN_REPFACT, String.valueOf(minRf));

cmdDistrib.distribAdd(cmd, nodes, params, false, replicationTracker);

III: The followers receive the request:
1. org.apache.solr.update.processor.UpdateRequestProcessorChain.createProcessor(SolrQueryRequest, SolrQueryResponse)
final String distribPhase = req.getParams().get(DistributingUpdateProcessorFactory.DISTRIB_UPDATE_PARAM); //FROMLEADER
final boolean skipToDistrib = distribPhase != null; // true

if (DistribPhase.FROMLEADER == phase && !couldIbeSubShardLeader(coll)) {
  if (req.getCore().getCoreDescriptor().getCloudDescriptor().isLeader()) {
    // locally we think we are leader but the request says it came FROMLEADER
    // that could indicate a problem, let the full logic below figure it out
  } else {
    isLeader = false;     // we actually might be the leader, but we don't want leader-logic for these types of updates anyway.
    forwardToLeader = false;
    return nodes;

return empty nodes
if (!forwardToLeader) {
  dropCmd = versionAdd(cmd);

Case 2: The update request is sent to the leader that should store this doc
DistributedUpdateProcessor.setupRequest(String, SolrInputDocument, String)
DistribPhase phase: none

Replica leaderReplica = zkController.getZkStateReader().getLeaderRetry(
    collection, shardId);
isLeader = leaderReplica.getName().equals(

if (isLeader || isSubShardLeader) {
          // that means I want to forward onto my replicas...
          // so get the replicas...
          forwardToLeader = false;
nodes = followers

It will forward the request to its followers with params:

if (!forwardToLeader) { // false
  dropCmd = versionAdd(cmd);
It adds the doc to its local index at this stage.

// It doesn't forward this request to itself again, so no stage update.distrib=TOLEADER

Case 3: The add request is sent to a leader which should not own this doc
Case 4: The add request is sent to a leader which should not own this doc
The coordinator node forwards the add request to the leader of the shard that should store the doc.

DistributedUpdateProcessor.setupRequest(String, SolrInputDocument, String)
ClusterState cstate = zkController.getClusterState();
DocCollection coll = cstate.getCollection(collection);
Slice slice = coll.getRouter().getTargetSlice(id, doc, route, req.getParams(), coll);
String shardId = slice.getName();

decide which shard this doc belongs to
return nodes - the leader that should store the doc


cmdDistrib.distribAdd(cmd, nodes, params, false, replicationTracker);

Case 5: Send multiple docs in one command to a follower
XMLLoader.processUpdate(SolrQueryRequest, UpdateRequestProcessor, XMLStreamReader)

while (true) {
  // ...
  if ("doc".equals(currTag)) {
    if (addCmd != null) {
      log.trace("adding doc...");
      addCmd.solrDoc = readDoc(parser);
      // ...
    } else {
      throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, "Unexpected <doc> tag without an <add> tag surrounding it.");
    }
  }
}

It calls processAdd for each doc.

Related Code

UpdateRequestProcessorChain.createProcessor(SolrQueryRequest, SolrQueryResponse)

If the chain includes the RunUpdateProcessorFactory but does not include an implementation of the DistributingUpdateProcessorFactory interface, an instance of DistributedUpdateProcessorFactory is injected immediately before the RunUpdateProcessorFactory.
if (0 <= runIndex && 0 == numDistrib) {
  // by default, add distrib processor immediately before run
  DistributedUpdateProcessorFactory distrib
    = new DistributedUpdateProcessorFactory();
  distrib.init(new NamedList());
  list.add(runIndex, distrib);
}


DistribPhase phase = DistribPhase.parseParam(req.getParams().get(DISTRIB_UPDATE_PARAM));

boolean isOnCoordinateNode = (phase == null || phase == DistribPhase.NONE);
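
The cases above boil down to a small decision on the phase parameter and on whether the receiving node leads the target shard. The following is a deliberately simplified sketch of that decision, not the real DistributedUpdateProcessor logic: it ignores sub-shard leaders and the "locally we think we are leader" branch, and the Phase/Action enums and the decide method are hypothetical names.

```java
class DistribPhaseSketch {
    enum Phase { NONE, TOLEADER, FROMLEADER }

    enum Action { FORWARD_TO_LEADER, ADD_LOCALLY_THEN_FORWARD_TO_REPLICAS, ADD_LOCALLY_ONLY }

    /** Decides what a node does with an add request, given the phase and its role for the target shard. */
    static Action decide(Phase phase, boolean isLeaderOfTargetShard) {
        if (phase == Phase.FROMLEADER) {
            // the leader already versioned the doc; a follower only adds it locally
            return Action.ADD_LOCALLY_ONLY;
        }
        if (isLeaderOfTargetShard) {
            // phase NONE (client hit the leader directly) or TOLEADER (forwarded by a coordinator)
            return Action.ADD_LOCALLY_THEN_FORWARD_TO_REPLICAS;
        }
        // not the leader of the target shard: act as coordinator and forward
        return Action.FORWARD_TO_LEADER;
    }

    public static void main(String[] args) {
        System.out.println("Case 1, step I:   " + decide(Phase.NONE, false));
        System.out.println("Case 1, step II:  " + decide(Phase.TOLEADER, true));
        System.out.println("Case 1, step III: " + decide(Phase.FROMLEADER, false));
        System.out.println("Case 2:           " + decide(Phase.NONE, true));
    }
}
```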

How To Conduct a Technical Interview Effectively

Technical Skills
- Problem solving: not-easy algorithm questions
- Coding
- Design
Soft skills
- Communication
- Retrospect
  - Mistakes related with design/decision
  - What you learned from your mistake
  - Bugs/troubleshooting
- Eager to learn
- Be flexible, willing to listen, not stubborn

What questions to ask
- ask interesting/challenging questions
- Or questions that aren't difficult but focus on coding (bug-free)
- ask questions that can be solved in different ways
- Avoid questions that can only be solved with one specific approach, unless it's obvious (binary search etc.) and you are testing coding skills, not problem-solving skills

Don't ask 
- brain teasers, puzzles, riddles
- problems you ask only because you find them interesting, happen to know them, or just learned them recently

Know the questions very well
- Different approaches
- Expect different approaches that you don't even know
  - Verify it(use example, proof), if it works, the candidate does a good job and you also learn something new

Know common cause of bugs
- Able to detect bugs in candidate's code quickly

Give candidates the opportunity to prove themselves and shine
We are trying to evaluate the candidate's skills thoroughly: what he/she is good at and what not.
If you plan to ask 2 coding questions, one simple and one more difficult, tell the candidate.
Let the candidate know your expectations

Make the candidates learn something
- If the candidate doesn't give the right solution/answer and, at the end of the interview, wants to know how to approach it, tell him/her.
- Candidates put a lot of effort into the interview (a day off and the commute); if they want to learn something, learning it makes them feel good
- Prove that you know the solution and have a reasonable answer; don't ask questions you don't know much about yourself

No surprise
If you find issues/bugs in candidate's code or design, point them out
The candidate should have a rough idea about how he/she performs in this interview

Be fair

Phone interview
Prefer coding questions over design questions
- as design is partly about communication, and it's hard to test communication skills over the phone

About me - Jeffery Yuan (2017)

This is a short list of what I am good at and what I should improve.
- I will keep updating it, and hope that when I look back after a year, I will realize that I have improved and learned a lot.

Retrospect and Learning Logs
- I like to summarize what I have learned, and write them down

Sharing Knowledge

Problem Solving and troubleshooting
- I like to solve difficult problems, as I can always learn something from them.
- I also summarize the steps I took to solve the problems, and what I learned that can help me solve problems quicker later.
- Search for and find the resources needed to solve the problem
- See more at my blog: Troubleshooting

Proactively find problems and fix them
- such as find problems in existing design and code, and think about how to improve them

Be honest
- to myself and colleagues about what I know and what I don't
Be moderate
- I know there are still a lot of things that I should learn and improve.
- I like to learn from others

Proactively learning
- Have a safaribooksonline account
- Like to learn from books and people
- When I used Cassandra and Kafka in our project, I took time to learn not only how to use them but, more importantly, their high-level design.
- Read more at my blog: System Design
Programmer: Lifelong Learning

Weaknesses - things that need improving
System design
Knowledge about distributed system
Public Speaking

How to Review and Discuss Software Design

Talk/think about all related aspects
- how we store data
- client API
- UI changes
- backward compatibility: how to handle old data/clients

But focus on most important stuff (first)

Talk/think about design principles/practices
- such as idempotency, parallelization, monitoring, etc.
- Check more at System Design - Summary

What's the impact on other (internal and cross-team) components?

How do other components use it?

What are the known and potential constraints/issues/flaws in the current design?
Don't talk only about its advantages.
Also talk about the issues; don't hide them

What are alternatives?
Think about alternatives and different approaches; this can help find a better solution
We can't really review and compare if there are no alternatives

Welcome different approaches
- although it doesn't mean it's better, or we will use it

Development Cost
- How difficult is it to implement?

What may change and How to evolve

What may change in (very) near future?

How can we know whether the new feature works or doesn't work?
How can we know when problems happen?

Feature Flag
Can we enable/disable the feature at runtime?

Be Prepared
It's OK to have informal/impromptu discussions with one or two colleagues

But make sure everyone is prepared for the formal team design discussion
All attendees should know the topic and think about how they would design it

Don't make design decisions immediately - for things that really matter
Take time to reflect and develop disagreements, then talk about it again later

Listen first

When you don't agree with others' approaches
Don't get too defensive
Talk about ideas not people


System Design - Summary

Problem Solving Practice - Redis cache.put Hangs

The Issue
After deploying the change (Multi Tiered Caching - Using in-process EhCache in front of Distributed Redis) to the test environment, together with some other changes and after someone changed things on the server (such as a restart), we found that cache.put hangs when saving data to Redis.

Troubleshooting Process
First we tried to reproduce the issue in my local setup, but it always worked there. We could easily reproduce it in the test environment, though.

This made me think it may be something related to the test environment.

Then I used kill -3 processId to generate several thread dumps while reproducing the issue on the test machine. I found a suspicious thread:
"ajp-nio-8009-exec-10" #91 daemon prio=5 os_prio=0 tid=0x00007f49c400a800 nid=0x75db waiting on condition [0x00007f495333e000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at  RedisCache$RedisCachePutCallback(RedisCache$AbstractRedisCacheCallback).waitForLock(RedisConnection) line: 600
RedisCache$RedisCachePutCallback(RedisCache$AbstractRedisCacheCallback).doInRedis(RedisConnection) line: 564
at com.lifelong.example.MultiTieredCache.lambda$put$40(
at com.lifelong.example.MultiTieredCache$$Lambda$18/1283186866.accept(Unknown Source)
at java.util.ArrayList.forEach(
at com.lifelong.example.MultiTieredCache.put(
at org.springframework.cache.interceptor.AbstractCacheInvoker.doPut(
at org.springframework.cache.interceptor.CacheAspectSupport$CachePutRequest.apply(
at org.springframework.cache.interceptor.CacheAspectSupport.execute(
at org.springframework.cache.interceptor.CacheAspectSupport.execute(
at org.springframework.cache.interceptor.CacheInterceptor.invoke(

Check the code at RedisCache$AbstractRedisCacheCallback to understand how it works:
for operations like put/putIfAbsent/evict/clear, and for @Cacheable with sync=true (RedisWriteThroughCallback), it checks whether there is a key like cacheName~lock in Redis; if it exists, it waits until the key is gone.

This lock is created and deleted for @Cacheable with sync=true in RedisWriteThroughCallback, which calls the lock and unlock methods.
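
To see why a leftover lock key blocks every writer, here is a minimal in-memory sketch of the wait-for-lock behavior described above. It is not the Spring Data Redis code: a Set stands in for Redis, the class and method names are hypothetical, and the maxWaitMs deadline is an addition (an assumption) so that the sketch terminates, whereas the wait described above has no deadline.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class StaleLockSketch {
    // stands in for the Redis keyspace
    private final Set<String> keys = ConcurrentHashMap.newKeySet();

    void lock(String cacheName) {
        keys.add(cacheName + "~lock");
    }

    void unlock(String cacheName) {
        keys.remove(cacheName + "~lock");
    }

    /** Polls until the lock key is gone; returns false if maxWaitMs elapses first. */
    boolean waitForLock(String cacheName, long pollMs, long maxWaitMs) {
        long deadline = System.currentTimeMillis() + maxWaitMs;
        while (keys.contains(cacheName + "~lock")) {
            if (System.currentTimeMillis() >= deadline) {
                return false;
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        StaleLockSketch redis = new StaleLockSketch();
        redis.lock("users"); // the process "restarts" here and never calls unlock
        System.out.println("put can proceed: " + redis.waitForLock("users", 10, 100));
        redis.unlock("users"); // equivalent to deleting the stale key by hand
        System.out.println("put can proceed: " + redis.waitForLock("users", 10, 100));
    }
}
```

Without the deadline, the first call would sleep forever, which matches the TIMED_WAITING (sleeping) state seen in the thread dump.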

This made me check the data in Redis: after creating a tunnel to Redis and running the command keys cacheName~lock, I found that the key was indeed there.

Now everything makes sense:
- we had set sync=true and run a performance test, then restarted the server and removed the setting. The cacheName~lock key was probably left behind because of the server restart. Because of that key, none of the APIs that update Redis would work.

After removing cacheName~lock from Redis, everything works fine.

Take away
- When using a feature (@Cacheable(sync=true) in this case), know how it's implemented.

Multi Tiered Caching - Using in-process Cache in front of Distributed Cache

Why Multi Tiered Caching?
  To improve an application's performance, we usually cache data in a distributed cache like Redis/Memcached or an in-process cache like EhCache.

  Each has its own strengths and weaknesses:
  An in-process cache is faster, but it's hard to keep consistent and can't store a lot of data. A distributed cache solves both problems, but it's slower due to network latency and serialization/deserialization.

  In some cases we may want to use both: mainly use a distributed cache, but also keep data that is small and doesn't change often (or at all), such as configuration, in an in-process cache.
The Implementation
  Spring uses CacheManager to determine which cache implementation to use.
  We define our own MultiTieredCacheManager and MultiTieredCache like below.
public class MultiTieredCacheManager extends AbstractCacheManager {
    private final List<CacheManager> cacheManagers;

    /**
     * @param cacheManagers - the order matters: when fetching data, it checks the first one; if not
     *        there, it checks the second one, then back-fills the first one
     */
    public MultiTieredCacheManager(final List<CacheManager> cacheManagers) {
        this.cacheManagers = cacheManagers;
    }

    @Override
    protected Collection<? extends Cache> loadCaches() {
        return new ArrayList<>();
    }

    @Override
    protected Cache getMissingCache(final String name) {
        return new MultiTieredCache(name, cacheManagers);
    }
}

public class MultiTieredCache implements Cache {
    private static final Logger logger = LoggerFactory.getLogger(MultiTieredCache.class);

    private final List<Cache> caches = new ArrayList<>();
    private final String name;

    public MultiTieredCache(final String name, @Nonnull final List<CacheManager> cacheManagers) {
        this.name = name;
        for (final CacheManager cacheManager : cacheManagers) {
            caches.add(cacheManager.getCache(name));
        }
    }

    @Override
    public ValueWrapper get(final Object key) {
        ValueWrapper result = null;
        final List<Cache> cachesWithoutKey = new ArrayList<>();
        for (final Cache cache : caches) {
            result = cache.get(key);
            if (result != null) {
                break;
            } else {
                cachesWithoutKey.add(cache);
            }
        }
        // back-fill the earlier tiers that missed
        if (result != null) {
            for (final Cache cache : cachesWithoutKey) {
                cache.put(key, result.get());
            }
        }
        return result;
    }

    @Override
    public <T> T get(final Object key, final Class<T> type) {
        T result = null;
        final List<Cache> noThisKeyCaches = new ArrayList<>();
        for (final Cache cache : caches) {
            result = cache.get(key, type);
            if (result != null) {
                break;
            } else {
                noThisKeyCaches.add(cache);
            }
        }
        if (result != null) {
            for (final Cache cache : noThisKeyCaches) {
                cache.put(key, result);
            }
        }
        return result;
    }

    // called when sync = true is set in @Cacheable
    @Override
    public <T> T get(final Object key, final Callable<T> valueLoader) {
        T result = null;
        for (final Cache cache : caches) {
            result = cache.get(key, valueLoader);
            if (result != null) {
                break;
            }
        }
        return result;
    }

    @Override
    public void put(final Object key, final Object value) {
        caches.forEach(cache -> cache.put(key, value));
    }

    @Override
    public void evict(final Object key) {
        caches.forEach(cache -> cache.evict(key));
    }

    @Override
    public void clear() {
        caches.forEach(cache -> cache.clear());
    }

    @Override
    public String getName() {
        return name;
    }

    @Override
    public Object getNativeCache() {
        return this;
    }
}

public class CacheConfig extends CachingConfigurerSupport {
  public CacheManager cacheManager(EhCacheCacheManager ehCacheCacheManager, RedisCacheManager redisCacheManager) {
      if (!cacheEnabled) {
          return new NoOpCacheManager();
      }
      // Be careful when making changes - the order matters
      ArrayList<CacheManager> cacheManagers = new ArrayList<>();
      if (ehCacheEnabled) {
          cacheManagers.add(ehCacheCacheManager);
      }
      if (redisCacheEnabled) {
          cacheManagers.add(redisCacheManager);
      }
      return new MultiTieredCacheManager(cacheManagers);
  }

  public EhCacheCacheManager ehCacheCacheManager() {
      final EhCacheManagerFactoryBean ehCacheManagerFactoryBean = new EhCacheManagerFactoryBean();
      ehCacheManagerFactoryBean.setConfigLocation(new ClassPathResource("ehcache.xml"));

      final EhCacheManagerWrapper ehCacheManagerWrapper = new EhCacheManagerWrapper();
      return ehCacheManagerWrapper;
  }

  @Bean(name = "redisCacheManager")
  public RedisCacheManager redisCacheManager(final RedisTemplate<String, Object> redisTemplate) {
      final RedisCacheManager redisCacheManager =
              new RedisCacheManager(redisTemplate, Collections.<String>emptyList(), true);
      return redisCacheManager;
  }
}
  Other things we can do when using multiple (tiered) caches in a CacheManager:
- We can use a cache name prefix to determine which cache to use.
- We can add logic to cache only some kinds of data in a specific cache.

- We can use only the distributed cache or only the in-process cache.
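
The read path with back-fill can be tried out without Spring. The sketch below is an assumption-laden stand-in, not the MultiTieredCache class itself: plain Maps play the role of the in-process and distributed caches, and TieredLookupSketch and its get method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TieredLookupSketch {
    /** Looks the key up tier by tier; on a hit, copies the value into every earlier tier that missed. */
    static String get(List<Map<String, String>> tiers, String key) {
        List<Map<String, String>> missed = new ArrayList<>();
        for (Map<String, String> tier : tiers) {
            String value = tier.get(key);
            if (value != null) {
                for (Map<String, String> earlier : missed) {
                    earlier.put(key, value); // back-fill
                }
                return value;
            }
            missed.add(tier);
        }
        return null; // miss in every tier
    }

    public static void main(String[] args) {
        Map<String, String> inProcess = new HashMap<>();
        Map<String, String> distributed = new HashMap<>();
        distributed.put("config", "v1");

        // order matters: the fast in-process tier is checked first
        List<Map<String, String>> tiers = Arrays.asList(inProcess, distributed);

        System.out.println(get(tiers, "config"));            // found in the second tier
        System.out.println(inProcess.containsKey("config")); // the first tier was back-filled
    }
}
```

After the first lookup, later reads of the same key are served by the in-process tier and never reach the distributed one, which is the point of putting it first.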

Making Child Documents Working with Spring-data-solr

The Problem
We use spring-data-solr in our project because we like its conversion feature, which can convert strings to enums, entities to JSON data, and vice versa. Recently we needed to use Solr's nested documents feature, which spring-data-solr doesn't support.

Issues in Spring-data-solr
The SolrInputDocument class contains a Map _fields and a List _childDocuments.

Spring-data-solr converts a Java entity class to a SolrDocument. It provides two converters: MappingSolrConverter and SolrJConverter.

MappingSolrConverter converts the entity to a Map: MappingSolrConverter.write(Object, Map, SolrPersistentEntity)

SolrJConverter uses Solr's DocumentObjectBinder to convert an entity to a SolrInputDocument;
it converts fields annotated with @Field(child = true) to child documents.
- This also means that spring-data-solr's conversion features will not work with SolrJConverter

But SolrJConverter still treats the SolrInputDocument as just a map and adds all its entries into the destination: Map sink
- SolrJConverter.write(Object, Map)

After this, the child documents are discarded.

The Fix
We still want to use spring-data-solr's conversion functions - partly because we don't want to rewrite everything to use SolrJ directly.

So when saving to Solr, we use spring-data-solr's MappingSolrConverter to convert the parent entity to a SolrInputDocument, then convert the child entities to SolrInputDocuments and add them to the parent's SolrInputDocument.

When reading from Solr, we read the SolrDocument as the parent entity, then read its child documents as child entities and add them to the parent entity.
public class ParentEntity {
  @Field(child = true)
  private List<ChildEntity> children;
}
protected SolrClient solrClient;

// we add our own converters into MappingSolrConverter
// for more, please check 
protected MyMappingSolrConverter solrConverter;

public void save(@Nonnull final ParentEntity parentEntity) {
    final SolrInputDocument solrInputDocument = solrConverter.createAndWrite(parentEntity);
    addChildDocuments(parentEntity, solrInputDocument);
    try {
        solrClient.add(getCollection(), solrInputDocument);
    } catch (SolrServerException | IOException e) {
        throw new BusinessException(e, "failed to save " + parentEntity);
    }
}

protected void addChildDocuments(@Nonnull final ParentEntity parentEntity,
        @Nonnull final SolrInputDocument solrInputDocument) {
            .map(child -> solrConverter.createAndWrite(child)).collect(Collectors.toList()));
}

public List<T> querySolr(final SolrParams query) {
    try {
        final QueryResponse response = solrClient.query(getCollection(), query);
        return convertFromSolrDocs(response.getResults());
    } catch (final Exception e) {
        throw new BusinessException("data retrieve failed." + query);
 * Also return child documents in solr response as ChildEntity if it exists
protected List<ParentEntity> convertFromSolrDocs(final SolrDocumentList docList) {
    List<ParentEntity> result = new ArrayList<>();
    if (docList != null) {
        result = -> {
            final ParentEntity parentEntity =, solrDoc);
            final List<SolrDocument> childDocs = solrDoc.getChildDocuments();
            if (childDocs != null) {
               ->, solrDoc))

            return parentEntity;

    return result;
Mix Spring Data Solr and SolrJ in Solr Cloud 5
SolrJ: Support Converter and make it easier to extend DocumentObjectBinder

Eclipse: Add another Project as Dependency may Cause Unexpected Exception

The Problem
In local development, we run the spring-boot application in Eclipse Tomcat, as we also deploy the project as a war.

But for some reason, one developer still runs it as a Java application, and it fails with the error below, while it works well when run in (Eclipse) Tomcat.
Bean instantiation via factory method failed; nested exception is org.springframework.beans.BeanInstantiationException:
Failed to instantiate []: Factory method 's3Client' threw exception;
nested exception is java.lang.NoSuchMethodError: com.amazonaws.handlers.HandlerChainFactory.getGlobalHandlers()Ljava/util/List;

The Root Cause
Maven uses the nearest-wins strategy to determine which version to use, so we explicitly specify which version of aws-java-sdk-s3 to use in admin's pom.xml; but one library in the common-module project also implicitly depends on aws-java-sdk-s3.

Running mvn dependency:tree -Dverbose -Dincludes=com.amazonaws:aws-java-sdk-core shows that Maven chooses the right version.
[INFO] |  \- com.amazonaws:aws-java-sdk-cloudfront:jar:1.11.32:compile
[INFO] |     \- (com.amazonaws:aws-java-sdk-core:jar:1.11.32:compile - omitted for conflict with 1.11.98)
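
For readers unfamiliar with nearest-wins, the idea can be sketched as a breadth-first walk of the dependency tree in which the first version seen for an artifact wins. This is a simplified model, not Maven's resolver; the class names are hypothetical, and the tree shape and the versions of admin, common-module and aws-java-sdk-s3 in main are made up for illustration (only the two aws-java-sdk-core versions and the cloudfront version come from the output above).

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

class NearestWinsSketch {
    /** One node of a dependency tree (hypothetical type). */
    static final class Dep {
        final String artifact;
        final String version;
        final List<Dep> children;

        Dep(String artifact, String version, Dep... children) {
            this.artifact = artifact;
            this.version = version;
            this.children = Arrays.asList(children);
        }
    }

    /** Breadth-first walk: the first version seen for an artifact is the nearest to the root, so it wins. */
    static Map<String, String> resolve(Dep root) {
        Map<String, String> chosen = new LinkedHashMap<>();
        Queue<Dep> queue = new ArrayDeque<>(root.children);
        while (!queue.isEmpty()) {
            Dep dep = queue.remove();
            // only the winning node's own dependencies are followed; a losing node is omitted entirely
            if (chosen.putIfAbsent(dep.artifact, dep.version) == null) {
                queue.addAll(dep.children);
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        Dep admin = new Dep("admin", "1.0",
                new Dep("aws-java-sdk-s3", "1.11.98",
                        new Dep("aws-java-sdk-core", "1.11.98")),
                new Dep("common-module", "1.0",
                        new Dep("aws-java-sdk-cloudfront", "1.11.32",
                                new Dep("aws-java-sdk-core", "1.11.32"))));
        // core 1.11.98 sits at depth 2, core 1.11.32 at depth 3, so 1.11.98 is chosen
        System.out.println(resolve(admin).get("aws-java-sdk-core"));
    }
}
```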

- We can also see which version Maven uses, and why, in Eclipse:
open pom.xml, go to the Dependency Hierarchy tab, and select the library in the right panel.

I tried to run it as a Java application, and it worked fine. So why did it fail in his environment?

I compared the difference between his eclipse setup and mine, and found out that he manually added common-module in admin's Java Build Path -> Projects tab.

Now it's kind of clear why it failed: when we add a project as dependency, Eclipse also includes all libraries it depends on to the project. So now the project includes both versions, and Eclipse chooses the wrong version to use.

I filed Bug 514094 - Adding another Project as Dependency Causes Unexpected Exception to track it.

Troubleshooting - JsonMappingException: Already had POJO for id

The Problem
We have two entities in a one-to-many relationship that reference each other, and reading them back from JSON failed with the exception:
com.fasterxml.jackson.databind.JsonMappingException: Already had POJO for id

The Fix
To troubleshoot the issue more easily, I created a sample class like the one below:
// Lombok's @Data is assumed on both classes: main() relies on generated getters and chained setters.
@Data
@Accessors(chain = true)
@EqualsAndHashCode(of = {"employeeId"}, callSuper = false)
@JsonIdentityInfo(generator = ObjectIdGenerators.PropertyGenerator.class, property = "employeeId")
public static class Employee {
    private UUID employeeId;
    private String name;
    private Department department;
}

@Data
@Accessors(chain = true)
@EqualsAndHashCode(of = {"departmentId"}, callSuper = false)
@JsonIdentityInfo(generator = ObjectIdGenerators.PropertyGenerator.class, property = "departmentId")
@ToString(exclude = "employees")
private static class Department {
    private UUID departmentId;
    private String name;
    private Set<Employee> employees;

    // Lazily create the set so callers never see null.
    public Set<Employee> getEmployees() {
        if (employees == null) {
            employees = new HashSet<>();
        }
        return employees;
    }
}

public static void main(String[] args) throws IOException {
    Employee e1 = new Employee().setEmployeeId(UUID.randomUUID()).setName("e1");
    Department d1 =
            new Department().setDepartmentId(UUID.randomUUID()).setName("oldD1").setEmployees(Sets.newHashSet(e1));
    // Assumed from the output shown after this listing: the employee points back at its department.
    e1.setDepartment(d1);
    ObjectMapper objectMapper = new ObjectMapper();

    String departmentStr = objectMapper.writeValueAsString(d1);

    Department oldD1 = objectMapper.readValue(departmentStr, Department.class);

    // Same id as the old department, but a different object.
    // Assumed from the same output: the new department takes over the old one's employees.
    Department newD1 = new Department().setDepartmentId(d1.getDepartmentId()).setName("newD1")
            .setEmployees(oldD1.getEmployees());
    // without the following statements, it will throw
    // com.fasterxml.jackson.databind.JsonMappingException: Already had POJO for id
    // for (Employee e : oldD1.getEmployees()) {
    // e.setDepartment(newD1);
    // }

    departmentStr = objectMapper.writeValueAsString(newD1);
    System.out.println("new department: " + departmentStr);
    // reading it back now throws JsonMappingException: Already had POJO for id
    Department newNewD1 = objectMapper.readValue(departmentStr, Department.class);
    System.out.println("---" + newNewD1);
}
This reproduces the issue. From the output:
new department: {"departmentId":"e3e0e676-0c52-493d-8f49-bedde05cbb11","name":"newD1","employees":[{"employeeId":"6b7bbbec-8be6-4423-a4ef-af7924df177b","name":"e1","department":{"departmentId":"e3e0e676-0c52-493d-8f49-bedde05cbb11","name":"oldD1","employees":["6b7bbbec-8be6-4423-a4ef-af7924df177b"]}}]}
I found that after I changed the department to newD1, the employee still referred to the old department object, with department name oldD1.

This led to the fix below: after changing the department object, make sure its employees refer to the new department object.

// without the following statements, it will throw
// com.fasterxml.jackson.databind.JsonMappingException: Already had POJO for id
for (Employee e : oldD1.getEmployees()) {
    e.setDepartment(newD1);
}

We need to exclude employees from Department's toString: @ToString(exclude = "employees")
- Otherwise it would throw java.lang.StackOverflowError
Likewise, we need to exclude employees from @EqualsAndHashCode.
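
The failure mode can be modeled with a toy id registry: while reading a document, each id may be bound to one full object, and later occurrences are expected to be bare id references. This is a simplified sketch of the idea, not Jackson's code; the class IdRegistry and its methods are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/** Toy model of an object-id registry used while reading one JSON document. */
final class IdRegistry {
    private final Map<UUID, Object> items = new HashMap<>();

    /** Binds a fully deserialized object to its id; a second full object for the same id is rejected. */
    void bind(UUID id, Object item) {
        if (items.containsKey(id)) {
            throw new IllegalStateException("Already had POJO for id [" + id + "]");
        }
        items.put(id, item);
    }

    /** A bare id reference resolves to the object bound earlier, or null if none was bound. */
    Object resolve(UUID id) {
        return items.get(id);
    }

    public static void main(String[] args) {
        UUID departmentId = UUID.randomUUID();
        IdRegistry registry = new IdRegistry();
        registry.bind(departmentId, "newD1");
        try {
            // The stale department was written out in full under the same id.
            registry.bind(departmentId, "oldD1");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In the broken output the same departmentId appears twice as a full object (newD1 and the stale oldD1), which corresponds to the second bind call. After the fix the employee points at newD1 itself, so the nested occurrence is written as a plain id and only needs resolve.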

Support Spring Expression Language in Spring AOP

Use Case
We want to create a @Loggable annotation so developers can specify the log level and what to log before or after the method is called. Developers can use #p0, #p1 to log parameter values, #result to log the return value, or any valid Spring expression.

Learning How to Do it from Spring Code
We know that Spring cache annotations support the Spring Expression Language: we can use #p0, #p1 in @Cacheable and #result in @CachePut. So we can debug the Spring cache code to figure out how it works.

Related code in Spring: CacheAspectSupport.execute
CacheAspectSupport.generateKey(CacheOperationContext, Object)

if (result != NO_RESULT) evaluationContext.setVariable(RESULT_VARIABLE, result);
- To support #result, we just put the method's return value into the variable: result.
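
The variable-binding idea can be sketched without Spring: method arguments are exposed as p0, p1 (and a0, a1), and the return value is added later under the name result. This is a deliberately tiny stand-in and not SpEL; the class SimpleContext and its lookup method are hypothetical, and only plain "#name" references are handled.

```java
import java.util.HashMap;
import java.util.Map;

/** Minimal stand-in for an evaluation context: named variables looked up via "#name". */
final class SimpleContext {
    private final Map<String, Object> variables = new HashMap<>();

    /** Exposes each argument under both pN and aN. */
    SimpleContext(Object[] args) {
        for (int i = 0; i < args.length; i++) {
            variables.put("p" + i, args[i]);
            variables.put("a" + i, args[i]);
        }
    }

    void setVariable(String name, Object value) {
        variables.put(name, value);
    }

    /** Resolves a reference such as "#p0" or "#result"; unknown names yield null. */
    Object lookup(String reference) {
        if (!reference.startsWith("#")) {
            throw new IllegalArgumentException("Expected a #variable reference: " + reference);
        }
        return variables.get(reference.substring(1));
    }

    public static void main(String[] args) {
        SimpleContext context = new SimpleContext(new Object[] {"alice", 42});
        System.out.println(context.lookup("#p0"));
        // After the method returns, its result becomes available as #result.
        context.setVariable("result", "ok");
        System.out.println(context.lookup("#result"));
    }
}
```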

The Implementation
// RUNTIME retention is required so method.getAnnotation(Loggable.class) can see the annotation.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
public @interface Loggable {
    Level level() default Level.INFO;

    /**
     * This value has to be a valid spring expression<br>
     * Example: "'-begin'" or use #p0, #p1 to refer method params.
     * @return
     */
    String beginMessage() default "";

    /**
     * This value has to be a valid spring expression<br>
     * Example: "#result", "'-end'"
     * @return
     */
    String endMessage() default "";
}

// The aspect wiring (@Aspect and an @Around pointcut matching @Loggable methods) and the
// logValues(logger, message, level) helper are not shown in this listing.
public class LoggableAspect {
    private boolean throwExceptionIfInvalidExpression;
    public static final String RESULT_VARIABLE = "result";
    /**
     * It is recommended to reuse ParameterNameDiscoverer instances as far as possible.
     */
    private static final ParameterNameDiscoverer parameterNameDiscoverer =
            new LocalVariableTableParameterNameDiscoverer();
    /**
     * SpEL parser. Instances are reusable and thread-safe.
     */
    private static final ExpressionParser parser = new SpelExpressionParser();

    public Object logExecutionTime(ProceedingJoinPoint joinPoint) throws Throwable {
        Method method = ((MethodSignature) joinPoint.getSignature()).getMethod();
        final Class<?> declaringClass = method.getDeclaringClass();
        final Logger logger = LoggerFactory.getLogger(declaringClass);
        final Loggable loggable = method.getAnnotation(Loggable.class);
        Object result = null;
        if (loggable != null) {
            final Level logLevel = loggable.level();
            // Log the begin message before the target method runs.
            if (whetherParseExpression(loggable.beginMessage(), logger, logLevel)) {
                final EvaluationContext beginContext = new MethodBasedEvaluationContext(joinPoint.getThis(), method,
                        joinPoint.getArgs(), parameterNameDiscoverer);
                parseExpressionValue(logger, logLevel, loggable.beginMessage(), beginContext);
            }
            try {
                result = joinPoint.proceed();
                if (whetherParseExpression(loggable.endMessage(), logger, logLevel)) {
                    final EvaluationContext context = new MethodBasedEvaluationContext(joinPoint.getThis(), method,
                            joinPoint.getArgs(), parameterNameDiscoverer);
                    // Expose the return value as #result.
                    context.setVariable(RESULT_VARIABLE, result);
                    parseExpressionValue(logger, logLevel, loggable.endMessage(), context);
                }
                return result;
            } catch (final RuntimeException e) {
                logValues(logger, "Failed with exception: " + e.getMessage(), logLevel);
                throw e;
            }
        }
        return joinPoint.proceed();
    }

    private void parseExpressionValue(final Logger logger, final Level logLevel, final String expression,
            final EvaluationContext context) {
        if (StringUtils.isBlank(expression)) {
            return;
        }
        Object value = null;
        try {
            value = parser.parseExpression(expression).getValue(context);
        } catch (final Exception e) {
            if (throwExceptionIfInvalidExpression) {
                throw new VMSBusinessException(ErrorCode.internal_error, e,
                        "Failed to parse expression: " + expression);
            }
        }
        // Fall back to the raw expression text when it could not be evaluated.
        logValues(logger, value != null ? value.toString() : expression, logLevel);
    }

    /**
     * If the log level is not enabled, no need to do anything at all.
     */
    private boolean whetherParseExpression(String expression, final Logger logger, final Level logLevel) {
        if (StringUtils.isBlank(expression)) {
            return false;
        }
        switch (logLevel) {
            case INFO:
                return logger.isInfoEnabled();
            case ERROR:
                return logger.isErrorEnabled();
            case WARN:
                return logger.isWarnEnabled();
            case DEBUG:
                return logger.isDebugEnabled();
            case TRACE:
                return logger.isTraceEnabled();
            default:
                return false;
        }
    }
}
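
The level check above exists so that an expression is only evaluated when its message would actually be logged. The same guard can be sketched with java.util.logging, used here only as a standard-library stand-in for the logging API in the listing; the class GuardedLogging and its method are hypothetical names.

```java
import java.util.function.Supplier;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Sketch of guarding an expensive message computation behind a level check. */
final class GuardedLogging {
    /** Evaluates the message only when the level is enabled; returns whether it was evaluated. */
    static boolean logIfEnabled(Logger logger, Level level, Supplier<String> message) {
        if (!logger.isLoggable(level)) {
            return false;
        }
        logger.log(level, message.get());
        return true;
    }

    public static void main(String[] args) {
        Logger logger = Logger.getLogger("guarded-demo");
        logger.setLevel(Level.INFO);
        // FINE is below INFO, so the supplier is never invoked.
        logIfEnabled(logger, Level.FINE, () -> "expensive debug message");
        logIfEnabled(logger, Level.INFO, () -> "info message");
    }
}
```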

Cassandra in Theory and Practice

Avoid the IN query across multiple partitions
- Query the partitions one by one instead
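
A minimal sketch of the one-by-one approach: issue one asynchronous query per partition key and gather the results. The class PerPartitionQuery and the queryOnePartition function are hypothetical; in a real application that function would wrap a driver call, which is not shown here.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Sketch: one query per partition key instead of a single multi-partition IN query. */
final class PerPartitionQuery {
    /** Starts all queries first, then waits for them; results keep the order of the keys. */
    static <K, R> List<R> queryEach(List<K> partitionKeys,
                                    Function<K, CompletableFuture<R>> queryOnePartition) {
        List<CompletableFuture<R>> futures = partitionKeys.stream()
                .map(queryOnePartition)
                .collect(Collectors.toList());
        return futures.stream()
                .map(CompletableFuture::join)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical fetch function standing in for a single-partition query.
        Function<String, CompletableFuture<String>> fetch =
                key -> CompletableFuture.supplyAsync(() -> "row-for-" + key);
        System.out.println(queryEach(List.of("user1", "user2", "user3"), fetch));
    }
}
```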

Primary key vs partition key
The first part of the primary key is the partition key, which determines which node stores the data.
Composite/compound keys
skinny rows
- the primary key only contains the partition key
wide rows
- the primary key contains columns other than the partition key
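
How a partition key picks a node can be sketched with a small token ring: the key is hashed to a token, and the owner is the first node whose token is greater than or equal to it, wrapping around at the end. The class TokenRing, the node names and the token values are assumptions, and String.hashCode is only a stand-in for a real partitioner hash.

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy token ring: maps a partition key to the node that owns its token. */
final class TokenRing {
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    TokenRing(Map<Integer, String> tokenToNode) {
        ring.putAll(tokenToNode);
    }

    /** Stand-in token function in the range 0..99 (a real partitioner uses a different hash). */
    static int token(String partitionKey) {
        return Math.floorMod(partitionKey.hashCode(), 100);
    }

    /** The owner is the first node whose token is >= the key's token, wrapping around the ring. */
    String nodeFor(String partitionKey) {
        Map.Entry<Integer, String> owner = ring.ceilingEntry(token(partitionKey));
        return (owner != null ? owner : ring.firstEntry()).getValue();
    }

    public static void main(String[] args) {
        TokenRing ring = new TokenRing(Map.of(25, "node1", 50, "node2", 75, "node3"));
        for (String key : new String[] {"user1", "user2", "user3"}) {
            System.out.println(key + " -> " + ring.nodeFor(key));
        }
    }
}
```

Because the node depends only on the partition key, all rows that share a partition key land on the same node.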

Materialized view primary key restrictions
- It must contain all the primary key columns of the base table. This ensures that every row of the view corresponds to exactly one row of the base table.
- It can only contain a single column that is not a primary key column in the base table.
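
The two restrictions can be expressed as a small check. This is only an illustration of the rules as stated above, not Cassandra's validation code; the class ViewKeyCheck and the column names in the example are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Checks a proposed view primary key against the two restrictions listed above. */
final class ViewKeyCheck {
    static List<String> violations(Set<String> basePrimaryKey, List<String> viewPrimaryKey) {
        List<String> problems = new ArrayList<>();
        // Rule 1: every base primary key column must be part of the view's primary key.
        for (String column : basePrimaryKey) {
            if (!viewPrimaryKey.contains(column)) {
                problems.add("missing base primary key column: " + column);
            }
        }
        // Rule 2: at most one view key column may come from outside the base primary key.
        long extra = viewPrimaryKey.stream().filter(c -> !basePrimaryKey.contains(c)).count();
        if (extra > 1) {
            problems.add("more than one non-primary-key column: " + extra);
        }
        return problems;
    }

    public static void main(String[] args) {
        Set<String> base = Set.of("user_id", "event_time");
        System.out.println(violations(base, List.of("country", "user_id", "event_time")));
        System.out.println(violations(base, List.of("country", "city", "user_id", "event_time")));
    }
}
```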

Materialized view
- implemented as a normal Cassandra table that takes the same amount of disk space as the base table

Table design
- Determine what queries to support; use different tables (or materialized views) for different queries if needed
- Avoid hot spots and unbounded row growth
- Spread data evenly
- Read as few partitions as possible
- Use DESC order on the time column to search for recent, time-based data

We can only use EQ or IN restrictions on the partition key.

How deletes are implemented and why
Delete and tombstones
- grace period
Understanding Deletes
A row tombstone is a row with no liveness_info and no cells.
A cell tombstone: no liveness_info at the column level
Range delete
Partition delete
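
The delete mechanics can be sketched with an in-memory model: a delete writes a tombstone marker instead of removing data, reads hide tombstoned keys, and compaction purges a tombstone only after the grace period has passed. This is a simplified illustration, not Cassandra's storage engine; the class TombstoneStore and its methods are hypothetical, and timestamps are plain seconds passed in by the caller.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Toy key-value store where deletes are tombstones purged after a grace period. */
final class TombstoneStore {
    private static final class Cell {
        final String value;   // null marks a tombstone
        final long writtenAt; // write or deletion time, in seconds

        Cell(String value, long writtenAt) {
            this.value = value;
            this.writtenAt = writtenAt;
        }

        boolean isTombstone() {
            return value == null;
        }
    }

    private final Map<String, Cell> cells = new HashMap<>();
    private final long gcGraceSeconds;

    TombstoneStore(long gcGraceSeconds) {
        this.gcGraceSeconds = gcGraceSeconds;
    }

    void put(String key, String value, long now) {
        cells.put(key, new Cell(value, now));
    }

    /** A delete is a write of a tombstone marker, not an in-place removal. */
    void delete(String key, long now) {
        cells.put(key, new Cell(null, now));
    }

    /** Reads hide tombstoned keys. */
    Optional<String> get(String key) {
        Cell cell = cells.get(key);
        return cell == null || cell.isTombstone() ? Optional.empty() : Optional.of(cell.value);
    }

    /** Drops tombstones older than the grace period; returns how many were purged. */
    int compact(long now) {
        int before = cells.size();
        cells.values().removeIf(c -> c.isTombstone() && now - c.writtenAt > gcGraceSeconds);
        return before - cells.size();
    }

    int storedEntries() {
        return cells.size();
    }
}
```

Keeping the marker around for the grace period gives replicas that missed the delete a chance to learn about it before the tombstone disappears.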

Local Index
A secondary index is slow because a query has to access all nodes
- only suited for low cardinality data

SASI - SSTable-Attached Secondary Index
- a new on-disk format based on B+ trees
- it attaches to each sstable/memtable its own immutable index file

- SSTable in memory
- write-back cache

off-heap memory
- Same concept for Cassandra, Kafka

- serialize cache data (row-cache, key cache) to avoid cold restart

DESCRIBE keyspaces;
describe tables;

COPY keyspace.table to 'output.txt';
COPY keyspace.table(column1,c2) to 'output.txt';

Write query result to file
cqlsh -e'cqlQuery' > output.txt

Use CAPTURE command to export the query result to a file:
cqlsh> CAPTURE
cqlsh> CAPTURE '~/output.txt';

File Store Format
Data (Data.db)
Primary Index (Index.db)
SSTable Index Summary (Summary.db)
Bloom filter (Filter.db)
Compression Information (CompressionInfo.db)
Statistics (Statistics.db)
SSTable Table of Contents (TOC.txt)

Secondary Index (SI_.*.db)

