How to Read Internal docs to Solve Problems

Often we need to find docs on an internal website and read them to figure out how to do a task.

Internal vs Internet
This is different from searching the internet, where we can usually find plenty of resources and, after reading several articles, get a sense of how to do something. It's fine (though not ideal) if we don't read some articles carefully, because we can usually find related or similar articles and pick up what we missed.

Internal docs are usually limited to a few docs/pages: they are well documented and give all the information we need.

But if you don't read them carefully, or miss some key information, you will not be able to solve the problem just by reading the docs (you can still ask others for help).

Be organized when reading the docs
- Maybe open all related docs in a separate browser, in one window
- Maybe start with the entry page (maybe given by someone), and follow the links while reading it

Note/write down what you don't understand or what looks strange to you
- Usually these are the keys to solving the problem

Use tools like Evernote to help read the docs
- add notes, highlight content, etc.

First find all the internal websites that are useful
Search them, read them carefully
Examples: docker sidecar, yubikey-ssh

Tips and Tricks for Docker

docker run -d -p 80:80 --name webserver nginx
-v host-dir:container-dir
-p host-port:container-port

docker ps
Attach to existing container's shell
docker exec -it container_id /bin/sh

docker kill/stop $(docker ps -q)
docker build --no-cache .
docker-compose build --no-cache mysql

Delete all containers
docker rm $(docker ps -a -q)
Delete all images
docker rmi -f $(docker images -q)

Clean up disk space used by Docker
docker system df
docker system prune

Increase docker memory
In the Advanced tab of Docker's Preferences, change the memory or CPU.
Or pass --memory 6g to the docker run command to set a different value for a single container.

Disable container auto-run
docker update --restart=no my-container

List Docker Container Names and IPs
docker ps -q | xargs -n 1 docker inspect --format '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}} {{ .Name }}' | sed 's/ \// /'

docker stats
check memory and cpu usage

docker-compose up

RUN creates an intermediate container, runs the script, and freezes the new state of that container in a new intermediate image.

Prefer exec form over shell form
docker run --entrypoint [my_entrypoint] image_name [command 1] [arg1] [arg2]

Use COPY if you don't need ADD's special features

Use a Dockerfile to build each image, and a docker-compose file to assemble them.

EXPOSE 8983-8986
USER builder

docker-compose commands: up, down, logs,
scale service=number
docker-compose build

1. an absolute path, or a relative path starting with ./, means a local (host) path
2. otherwise it is a named volume, referenced from the volumes list
docker volume prune
docker volume ls
docker inspect $volume_name

Change hostname
docker run -it -h myhost ...
docker run --rm -it --cap-add SYS_ADMIN ...

docker logs $container_id
docker events &

Tips and Tricks for Atom Editor


Show files/folders ignored by .gitignore
  • uncheck “Hide VCS Ignored Files” in Tree View package
Make files/folders ignored by .gitignore searchable
  • uncheck “Exclude VCS Ignored Paths” in Settings
Enable autosave
  • disabled by default; find autosave in Packages, go to its settings, and select “Enabled”.
uncheck “Ensure Single Trailing Newline”
  • Find whitespace package, uncheck “Ensure Single Trailing Newline” option
Auto Reveal Tree-view
  • Find Tree-view package, select “Auto Reveal”
Atom -> Preferences -> Editor -> Enable soft wrap
Show or edit config at: Atom -> Config


Check the Atom shortcuts
  • Use Cmd+/ to comment: it knows the right comment syntax for each language

How Solr Create Collection - Learn Solr Code

Test Code to create collections
MiniSolrCloudCluster cluster = new MiniSolrCloudCluster(4 /*numServers*/, testBaseDir, solrXml, JettyConfig.builder().setContext("/solr").build());
cluster.createCollection(collectionName, 2/*numShards*/, 2/*replicationFactor*/, "cie-default", null);

CollectionsHandler.handleRequestBody(SolrQueryRequest, SolrQueryResponse)

CollectionAction action = CollectionAction.get(a); // CollectionAction.CREATE(true)
CollectionOperation operation = CollectionOperation.get(action); // CollectionOperation.CREATE_OP(CREATE)

Map result =, rsp, this);
Returns a map like this:
{name=collectionName, fromApi=true, replicationFactor=2, collection.configName=configName, numShards=2, stateFormat=2}

ZkNodeProps props = new ZkNodeProps(result);
if (operation.sendToOCPQueue) handleResponse(operation.action.toLower(), props, rsp, operation.timeOut);

CollectionsHandler.handleResponse
QueueEvent event = coreContainer.getZkController().getOverseerCollectionQueue().offer(Utils.toJSON(m), timeout);

This uses DistributedQueue.offer(byte[] data, long timeout) to add a task to /overseer/collection-queue-work/qnr-numbers.

It uses LatchWatcher to wait until this task is processed.

Overseer and OverseerCollectionProcessor

OverseerCollectionProcessor.processMessage(ZkNodeProps, String operation /*create*/)
OverseerCollectionProcessor.processMessage(ZkNodeProps, String)
OverseerCollectionProcessor.createCollection(ClusterState, ZkNodeProps, NamedList)

  ClusterStateMutator.getShardNames(numSlices, shardNames);
   positionVsNodes = identifyNodes(clusterState, nodeList, message, shardNames, repFactor); // round-robin if rule not set

  createConfNode(configName, collectionName, isLegacyCloud);
// This message will be processed by ClusterStateUpdater
// wait for a while until we do see the collection in the clusterState

  for (Map.Entry e : positionVsNodes.entrySet()) {
  if (isLegacyCloud) {
    shardHandler.submit(sreq, sreq.shards[0], sreq.params);
  } else {
    coresToCreate.put(coreName, sreq);

This sends an HTTP call that is handled by CoreAdminHandler.handleRequestBody.

  // if there were any errors while processing
  // the state queue, items would have been left in the
  // work queue so let's process those first
  byte[] data = workQueue.peek();
  boolean hadWorkItems = data != null;
  while (data != null)  {
    final ZkNodeProps message = ZkNodeProps.load(data);
    clusterState = processQueueItem(message, clusterState, zkStateWriter, false, null);
    workQueue.poll(); // poll-ing removes the element we got by peek-ing
    data = workQueue.peek();


  zkWriteCommand = processMessage(clusterState, message, operation);

  clusterState = zkStateWriter.enqueueUpdate(clusterState, zkWriteCommand, callback);

  case CREATE:
    return new ClusterStateMutator(getZkStateReader()).createCollection(clusterState, message);

overseer.ClusterStateMutator.createCollection(ClusterState, ZkNodeProps)

Gradle Tips and Tricks - 2017

Run tasks on sub-projects only
./gradlew sub-project:build

./gradlew classes

Gradle task options Skip Tasks
-x test -x testClasses -x javadoc -x javadocJar -x integrationTestClasses 
-x findbugsMain -x findbugsTest -x findbugsIntegrationTest -x pmdMain -x pmdTest -x pmdIntegrationTest 


-s, --stacktrace

Run specific tests
gradle test --tests org.gradle.SomeTest.someMethod
gradle test --tests org.gradle.SomeTest
gradle test --tests org.gradle.internal*
//select all ui test methods from integration tests by naming convention
gradle test --tests *IntegTest*ui*
//selecting tests from different test tasks
gradle test --tests *UiTest integTest --tests *WebTest*ui

Install an artifact locally
apply plugin: 'maven-publish'
gradle publishToMavenLocal

Using artifacts from local maven
apply plugin: "maven"
allprojects {
  repositories {

Run tasks in remote debug mode
export GRADLE_OPTS="-Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=5005"
- this is same as above, but more flexible.

gw - run gradle in sub folders
brew install gdub

Config files
Add extra steps into init script


allprojects {
  apply plugin: 'maven-publish'
  buildscript {
    repositories {

  repositories {


- Figure out which projects are to take part in a build.
include ':repository', ':services', ':web-app' vs settings.gradle

Change subproject name
include "foo" = 'projectName'
project(":foo").name = "foofoo"

allprojects {}
subprojects {}

settings.gradle in root folder
build.gradle for each module

Gradle caches dependencies in ~/.gradle/caches; it can also be configured to use a local or remote Maven repository.

Use doLast instead of <<
task copyJarToBin(type: Copy) {
    from createJar // shortcut for createJar.outputs.files
    into "d:/tmp"

task stopTomcat(type:Exec)

Maven Tips and Tricks - 2016

Lessons Learned about Programming and Soft Skills - 2017

How to compare different approaches
- Always think about different approaches (even if you have already finished/committed the code)

- Don't just choose one that looks good
- List them and compare them
- Always ask yourself why you chose this approach
- Try hard to find problems in your current approach, and how to fix them
For small coding tasks
- Implement them if possible
- Then compare which one makes the code cleaner, needs fewer changes, etc.
Example: Exclude source and javadoc from -jar
APP_BIN=$APP_BIN_DIR/$(ls $APP_BIN_DIR | grep -E 'jarName-version.*jar' | grep -v sources | grep -v javadoc | grep -v pom)
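
As a small illustration of implementing two approaches and then comparing them, the sketch below picks the main jar in two ways: the chained grep -v filters from the example above, and a single inverted extended-regex filter. The directory and file names are made up for the demo, and neither variant is claimed to be the one finally chosen.

```shell
#!/usr/bin/env bash
# Hypothetical layout: a bin directory holding the main jar plus
# sources/javadoc/pom artifacts (all names are assumptions for this demo).
APP_BIN_DIR="$(mktemp -d)"
touch "$APP_BIN_DIR/jarName-version-1.0.jar" \
      "$APP_BIN_DIR/jarName-version-1.0-sources.jar" \
      "$APP_BIN_DIR/jarName-version-1.0-javadoc.jar" \
      "$APP_BIN_DIR/jarName-version-1.0.pom"

# Approach 1: chain several grep -v filters.
approach1="$(ls "$APP_BIN_DIR" | grep -E 'jarName-version.*jar' \
  | grep -v sources | grep -v javadoc | grep -v pom)"

# Approach 2: one inverted extended-regex filter.
approach2="$(ls "$APP_BIN_DIR" | grep -E 'jarName-version.*jar' \
  | grep -Ev 'sources|javadoc|pom')"

echo "approach 1: $approach1"
echo "approach 2: $approach2"
rm -rf "$APP_BIN_DIR"
```

Both variants select the same file here; the second is shorter and easier to extend with another excluded suffix.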

How to quickly scan/learn new classes
Sometimes we need to quickly scan a bunch of related classes to check how to implement a function, use a method, etc.
- Check the class's Javadoc
- Check the class signature
- Check main methods:
  - static methods
  - using Ctrl+O or the outline view
- Check call hierarchy in source code
- Check test code/examples
- Google search code example

When refactoring/changing code, also check/change/improve its related code.

Find the related docs, and read them carefully.
- Mark/note the important parts of the doc.

For some tasks, we can use a trial-and-error approach: just do it, then fix it.
But for other tasks (production or physical-hardware related), it's better to figure out the right way to do it first.

Evaluate the outcome of the action. 
- best/worst outcome

How to implement/work on a feature
- what's the goal, what to achieve
- how to test/verify/deploy/enable it easily in a test or production environment
  - useful tricks: dry-run
  - able to enable/disable the default configuration automatically, but override it manually
- how to measure whether the change is an improvement

Think of different (3 or more) approaches
Compare them
Don't stop until you find a solution that looks good to you
Use tools (notebook, whiteboard)
- Usually we are not happy with the first approach that comes to mind, and find a different one: maybe it's a little better, or maybe it's just different. In some cases we stop there: maybe we start discussing it with others or presenting it (to get others' ideas)
- Example: add Explanation to transform actions
Before asking a question
- Try to solve it by yourself
- Make sure you read all related code/docs from top to bottom (quickly; scan, but don't skip any code that may be important)

- Example: NightlyTestRunner

Verify the assumption
- Be aware of the assumptions we or others made in the design or the code.
- Verify whether it's true or not
- Example: one line one record in csv, one-to-one between tms id and bam_id

- Check and realize alternatives
Sometimes, we want A, but maybe B also works and is even better.
- Example: asset letter or account statement

Use tools
Write down in whiteboard or notebook or app
Take a picture now 
- always bring the phone

Check carefully and verify your claim before blaming others or thinking others are wrong
We are inclined to think others are wrong or made a mistake, even when someone told us he/she did it: we do a very brief search, don't check carefully, and then start to think they are wrong.

Prefer using code to enforce a rule over documentation
Example: all tests must extend XbaseUnitTest.

Make API/feature easier
- to use
- to test/rollback in production (feature flag)

Realize your assumption/decision and verify it first
- Otherwise you may go farther but on the wrong path
Step by step and verify each step

Identify useful info and act on it quickly (or you may forget about it)
Don't let your past experience affect you
- Try it, it may be different this time
Example: big item delivery

When something doesn't make sense at all:
- maybe they are totally different things: you are comparing apples with bananas
Example: 12/15, 1/15

Lessons Learned about Programming and Soft Skills - 2016

Bash Scripting Essentials

Brace Expansion
chown root /usr/{ucb/{ex,edit},lib/{ex?.?*,how_ex}}
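
To see what a nested brace expression turns into before running a command such as chown on it, echo is a safe way to preview it. This is a minimal sketch using a simplified variant of the paths above (the glob part is left out so the output does not depend on the local filesystem).

```shell
#!/usr/bin/env bash
# Brace expansion is performed by bash before the command runs,
# and brace groups can be nested.
expanded="$(echo /usr/{ucb/{ex,edit},lib/how_ex})"
echo "$expanded"   # /usr/ucb/ex /usr/ucb/edit /usr/lib/how_ex
```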

Special Variables
$? Exit value of last executed command.
wait $pid

$! Process number of last background command.
$0 First word; that is, the command name. This will have the full pathname if it was found via a PATH search.
$n Individual arguments on command line (positional parameters).
$# Number of command-line arguments.

“$*” All arguments on command line as one string (“$1 $2…”). The values are separated by the first character in $IFS.
“$@” All arguments on command line, individually quoted (“$1” “$2” …).

-n string is not null.
-z string is null, that is, has zero length

Testing for File Characteristics
-d File is a directory
-e File exists
-f File is a regular file
-s File has a size greater than zero
-r, -w, -x File is readable, writable, executable
-S File is a socket

[ -d "$dir" ] && echo "$dir exists." || echo "$dir doesn't exist."
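
The file test operators combine naturally in a small classifier. The sketch below is only an illustration: describe_path and the temporary files are hypothetical names introduced for the demo.

```shell
#!/usr/bin/env bash
# describe_path is a hypothetical helper that classifies a path
# using the -e, -d, -f and -s tests.
describe_path() {
  local path="$1"
  if [[ ! -e "${path}" ]]; then
    echo "missing"
  elif [[ -d "${path}" ]]; then
    echo "directory"
  elif [[ -f "${path}" && -s "${path}" ]]; then
    echo "non-empty file"
  elif [[ -f "${path}" ]]; then
    echo "empty file"
  else
    echo "other"
  fi
}

tmp_dir="$(mktemp -d)"
: > "${tmp_dir}/empty.txt"            # create an empty file
echo "data" > "${tmp_dir}/data.txt"   # create a non-empty file

describe_path "${tmp_dir}"
describe_path "${tmp_dir}/empty.txt"
describe_path "${tmp_dir}/data.txt"
describe_path "${tmp_dir}/nope"
```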

Testing with Pattern Matches
== pattern
=~ ere
if [[ "${MYFILENAME}" == *.jpg ]]
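
A short sketch contrasting the two operators: == matches a glob pattern, while =~ matches an extended regular expression and fills bash's BASH_REMATCH array with the capture groups. classify_file and the sample file names are hypothetical.

```shell
#!/usr/bin/env bash
# classify_file is a hypothetical helper: glob match first, regex match second.
classify_file() {
  local name="$1"
  if [[ "${name}" == *.jpg ]]; then
    echo "jpg"
  elif [[ "${name}" =~ ^([a-z]+)-([0-9]+)\.log$ ]]; then
    # BASH_REMATCH[1..n] hold the capture groups of the last =~ match.
    echo "log ${BASH_REMATCH[1]} ${BASH_REMATCH[2]}"
  else
    echo "unknown"
  fi
}

classify_file "photo.jpg"     # jpg
classify_file "app-42.log"    # log app 42
classify_file "notes.txt"     # unknown
```

Note that the pattern on the right-hand side is left unquoted; quoting it would make bash treat it as a literal string.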

-a, &&
-o, ||

if [ ! -d $param ]
if [ $? -ne 0 ]

HEAP_DUMP_DIR=$(sed 's/-XX:HeapDumpPath=\([^ ]*\)/\1/' <<< $param)

Use the variables $1, $2, ..., $n to access the arguments passed to a function.
Hello () {
   echo "Hello $1 $2"
   return 10
}
Return value in bash
echo in the function, and capture the output in the caller with $()

Hello a b
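
A function has two separate channels back to its caller: whatever it writes to stdout, and its numeric exit status. The sketch below shows both being captured; greet is a hypothetical stand-in for the Hello function above.

```shell
#!/usr/bin/env bash
# greet is a hypothetical function: it "returns" text on stdout
# and a numeric status via return.
greet() {
  echo "Hello $1 $2"
  return 10
}

rc=0
# $() captures stdout; the status of a plain assignment is the status
# of the command substitution, which is picked up here through || rc=$?.
message="$(greet a b)" || rc=$?
echo "message=${message} rc=${rc}"   # message=Hello a b rc=10
```

Declaring the variable with local on the same line as the command substitution would hide the function's status, which is why the style guide separates the declaration from the assignment.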

for i in $( command ); do command $i; done

for i in $( command ); do
  command $i
done

if [ "$var" == "value" ];  then


Google Shell Style Guide
Quote your variables; prefer "${var}" over "$var".
Use $(command) instead of backticks.
[[ ... ]] is preferred over [ and test
- [[ ... ]] reduces errors, as no pathname expansion or word splitting takes place between [[ and ]]; [[ ... ]] also allows regular expression matching, where [ ... ] does not

Use readonly or declare -r to ensure they're read only.
Make variable readonly: readonly var=value
Make function readonly: readonly -f function
readonly -p/-f

if [[ -f ~/.bashrc ]]; then
   source ~/.bashrc
fi
[ ! -f "$FILE" ] && { echo "$FILE not found"; exit 1; }

$(( $a+$b )) to execute arithmetic expressions
Put ; do and ; then on the same line as the while, for or if.
Prefer brace-quoting all other variables.
Use "$@" unless you have a specific reason to use $*.
- "$@" will retain arguments as-is, so no args provided will result in no args being passed on;
- "$*" expands to one argument, with all args joined by (usually) spaces, so no args provided will result in one empty string being passed on.
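
The difference is easiest to see by counting what the receiving command actually gets. In this sketch, count_args, forward_at and forward_star are hypothetical helpers.

```shell
#!/usr/bin/env bash
# count_args is a hypothetical helper that reports how many arguments it received.
count_args() { echo "$#"; }

# Forward with "$@": every argument stays a separate word.
forward_at() { count_args "$@"; }
# Forward with "$*": all arguments are joined into a single word.
forward_star() { count_args "$*"; }

at_count="$(forward_at "a b" c)"       # 2
star_count="$(forward_star "a b" c)"   # 1
none_at="$(forward_at)"                # 0
none_star="$(forward_star)"            # 1 (a single empty string)
echo "${at_count} ${star_count} ${none_at} ${none_star}"
```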

while IFS=, read var1 var2 var3; do

done < file.txt
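
A filled-in version of the loop above might look like the following sketch. The sample file and column names are assumptions, and read -r is added so that backslashes in the data are not treated as escapes. This simple split does not handle quoted CSV fields that contain commas.

```shell
#!/usr/bin/env bash
# Hypothetical input: three comma-separated columns per line.
csv_file="$(mktemp)"
printf '%s\n' 'alice,30,nyc' 'bob,25,sf' > "${csv_file}"

names=()
# IFS=, applies only to the read command, so the global IFS is untouched.
while IFS=, read -r name age city; do
  names+=("${name}")
  echo "${name} is ${age} and lives in ${city}"
done < "${csv_file}"

rm -f "${csv_file}"
```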

Use Local Variables
local var="something"
local var
var="$(func)" || return

if [[ "${my_var}" = "some_string" ]]; then
-z (string length is zero) and -n (string length is not zero)

if ! mv "${file_list}" "${dest_dir}/" ; then


Use set -o errexit (a.k.a. set -e) to make your script exit when a command fails.
Then add || true to commands that you allow to fail.
set -e - enable exit immediately
set +e - disable exit immediately
Use set -o nounset (a.k.a. set -u) to exit when your script tries to use undeclared variables.
Use set -o xtrace (a.k.a. set -x) to trace what gets executed. Useful for debugging.
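
The effect of these options can be observed without killing the current shell by running each snippet in a child bash process. This is a minimal sketch; the variable names are arbitrary.

```shell
#!/usr/bin/env bash
# Each snippet runs in a child bash, so errexit/nounset only end the child.

# errexit: the failing command stops the child before "after" is printed.
rc_strict=0
out_strict="$(bash -c 'set -o errexit; echo before; false; echo after')" || rc_strict=$?

# errexit with || true: the failure is tolerated and the child continues.
rc_allowed=0
out_allowed="$(bash -c 'set -o errexit; echo before; false || true; echo after')" || rc_allowed=$?

# nounset: expanding an unset variable is an error, so the child exits non-zero.
rc_unset=0
out_unset="$(bash -c 'set -o nounset; echo "${undefined_var}"' 2>/dev/null)" || rc_unset=$?

echo "strict:  rc=${rc_strict} out=${out_strict}"
echo "allowed: rc=${rc_allowed} out=${out_allowed//$'\n'/ }"
echo "nounset: rc=${rc_unset}"
```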

Use $(( ... )), not expr, for arithmetic expressions; it is more forgiving about spaces.
Use (( ... )) or let, not $(( ... )), when you don't need the result.

Identify common problems with shellcheck.

while true; do some_commands_here; done
while true

Run command until success
until $the_command; do echo "Try again"; done
while [ -n "$(the_command)" ]; do echo "Try again"; done
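
An unbounded until loop can spin forever if the command never succeeds. One possible refinement, sketched below, caps the number of attempts. retry, flaky, and the RETRY_DELAY variable are hypothetical names introduced for this example.

```shell
#!/usr/bin/env bash
# retry is a hypothetical helper: run a command until it succeeds
# or max_attempts is reached.
retry() {
  local max_attempts="$1"; shift
  local attempt=1
  until "$@"; do
    if (( attempt >= max_attempts )); then
      echo "Giving up after ${attempt} attempts" >&2
      return 1
    fi
    echo "Try again (attempt ${attempt} failed)" >&2
    attempt=$(( attempt + 1 ))
    sleep "${RETRY_DELAY:-0}"   # optional pause between attempts, in seconds
  done
}

# Demo: flaky fails twice and succeeds on the third call.
counter_file="$(mktemp)"
echo 0 > "${counter_file}"
flaky() {
  local n
  n=$(( $(cat "${counter_file}") + 1 ))
  echo "${n}" > "${counter_file}"
  (( n >= 3 ))
}

retry 5 flaky && echo "succeeded after $(cat "${counter_file}") calls"
```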

Essential Linux Commands for Developers

How DistributedUpdateProcessor Works - Learning Solr Code

Case Study
Case 1: Update request is first sent to a follower (can be any node)
I: The coordinator node receives the add request:

DistribPhase phase =

phase is NONE

DistributedUpdateProcessor.setupRequest(String, SolrInputDocument, String)

ClusterState cstate = zkController.getClusterState();
DocCollection coll = cstate.getCollection(collection);
Slice slice = coll.getRouter().getTargetSlice(id, doc, route, req.getParams(), coll);
String shardId = slice.getName();

Decide which shard should store this doc

Replica leaderReplica = zkController.getZkStateReader().getLeaderRetry(
    collection, shardId);
isLeader = leaderReplica.getName().equals(

Whether I am the leader that should store the doc: false

2. Forward to the leader that should store the doc
// I need to forward onto the leader...
nodes = new ArrayList<>(1);

3. DistributedUpdateProcessor.processAdd(AddUpdateCommand)
           (isLeader || isSubShardLeader ?
            DistribPhase.FROMLEADER.toString() :
            DistribPhase.TOLEADER.toString())); ==> TOLEADER
params.set(DISTRIB_FROM, ZkCoreNodeProps.getCoreUrl(
    zkController.getBaseUrl(), req.getCore().getName()));
cmdDistrib.distribAdd(cmd, nodes, params, false, replicationTracker);

II: The leader receives the request:
org.apache.solr.update.processor.UpdateRequestProcessorChain.createProcessor(SolrQueryRequest, SolrQueryResponse)
final String distribPhase = req.getParams().get(DistributingUpdateProcessorFactory.DISTRIB_UPDATE_PARAM);
skipToDistrib true
// skip anything that doesn't have the marker interface - UpdateRequestProcessorFactory.RunAlways

DistribPhase phase = TOLEADER
String fromCollection = updateCommand.getReq().getParams().get(DISTRIB_FROM_COLLECTION);
if (isLeader || isSubShardLeader) {
          // that means I want to forward onto my replicas...
          // so get the replicas...
          forwardToLeader = false;
nodes = follower nodes

2. The leader adds the doc locally first
boolean dropCmd = false;
if (!forwardToLeader) {    // forwardToLeader false
  dropCmd = versionAdd(cmd); // usually return false

private void doLocalAdd(AddUpdateCommand cmd) throws IOException {

if (willDistrib) { // true
  cmd.solrDoc = clonedDoc;

3. The leader forwards the add request to its followers
params = new ModifiableSolrParams(filterParams(req.getParams()));
           (isLeader || isSubShardLeader ?
            DistribPhase.FROMLEADER.toString() :
params.set(DISTRIB_FROM, ZkCoreNodeProps.getCoreUrl(
    zkController.getBaseUrl(), req.getCore().getName()));

if (replicationTracker != null && minRf > 1)
  params.set(UpdateRequest.MIN_REPFACT, String.valueOf(minRf));

cmdDistrib.distribAdd(cmd, nodes, params, false, replicationTracker);

III: Followers receive the request:
1. org.apache.solr.update.processor.UpdateRequestProcessorChain.createProcessor(SolrQueryRequest, SolrQueryResponse)
final String distribPhase = req.getParams().get(DistributingUpdateProcessorFactory.DISTRIB_UPDATE_PARAM); //FROMLEADER
final boolean skipToDistrib = distribPhase != null; // true

if (DistribPhase.FROMLEADER == phase && !couldIbeSubShardLeader(coll)) {
  if (req.getCore().getCoreDescriptor().getCloudDescriptor().isLeader()) {
    // locally we think we are leader but the request says it came FROMLEADER
    // that could indicate a problem, let the full logic below figure it out
  } else {
    isLeader = false;     // we actually might be the leader, but we don't want leader-logic for these types of updates anyway.
    forwardToLeader = false;
    return nodes;

return empty nodes
if (!forwardToLeader) {
  dropCmd = versionAdd(cmd);

Case 2: The update request is sent to leader which should store this doc
DistributedUpdateProcessor.setupRequest(String, SolrInputDocument, String)
DistribPhase phase: none

Replica leaderReplica = zkController.getZkStateReader().getLeaderRetry(
    collection, shardId);
isLeader = leaderReplica.getName().equals(

if (isLeader || isSubShardLeader) {
          // that means I want to forward onto my replicas...
          // so get the replicas...
          forwardToLeader = false;
nodes = followers

It will forward the request to its followers with params:

if (!forwardToLeader) { // false
  dropCmd = versionAdd(cmd);
It adds the doc to its local index at this stage

// It doesn't forward this request to itself again, so no stage update.distrib=TOLEADER

Case 3: The add request is sent to a leader which should not own this doc
Case 4: The add request is sent to a leader which should not own this doc
The coordinator node will forward the add request to the leader of the shard that should store the request

DistributedUpdateProcessor.setupRequest(String, SolrInputDocument, String)
ClusterState cstate = zkController.getClusterState();
DocCollection coll = cstate.getCollection(collection);
Slice slice = coll.getRouter().getTargetSlice(id, doc, route, req.getParams(), coll);
String shardId = slice.getName();

decide which shard this doc belongs to
return nodes - the leader that should store the doc


cmdDistrib.distribAdd(cmd, nodes, params, false, replicationTracker);

Case 5: Send multiple docs in one command to a follower
XMLLoader.processUpdate(SolrQueryRequest, UpdateRequestProcessor, XMLStreamReader)

while (true) {
  if ("doc".equals(currTag)) {
    if(addCmd != null) {
      log.trace("adding doc...");
      addCmd.solrDoc = readDoc(parser);
    } else {
      throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, "Unexpected <doc> tag without an <add> tag surrounding it.");

It calls processAdd for each doc.

Related Code

UpdateRequestProcessorChain.createProcessor(SolrQueryRequest, SolrQueryResponse)

if the chain includes the RunUpdateProcessorFactory, but does not include an implementation of the DistributingUpdateProcessorFactory interface, then an instance of DistributedUpdateProcessorFactory will be injected immediately prior to the RunUpdateProcessorFactory.
if (0 <= runIndex && 0 == numDistrib) {
  // by default, add distrib processor immediately before run
  DistributedUpdateProcessorFactory distrib
    = new DistributedUpdateProcessorFactory();
  distrib.init(new NamedList());
  list.add(runIndex, distrib);


DistribPhase phase = DistribPhase.parseParam(req.getParams().get(DISTRIB_UPDATE_PARAM))

boolean isOnCoordinateNode = (phase == null || phase == DistribPhase.NONE);

How to Improve Your Skills as a Programmer

How to improve your skills as a programmer from Yun Yuan

What to read
How to find stuffs to read

Learn from your bugs/mistakes
Symptom/Root Cause/How Found

How to Improve Design Skills

How to improve design skills from Yun Yuan

Non-function features
Unique features
Rate limit - scalable, availability,  DDos

What's the bottleneck
How to scale
How to handle change - node added/removed/crashed

Better user experience
Thinking from client/user perspective
How they use it, what they would like to know

How to Improve Problem Solving Skills

How to improve problem solving skills from Yun Yuan
How to test it (locally) and easily
How to verify whether it works easily
What are the shortcomings of your current approach?
Is there a better approach?

How To Ask Questions The Smart Way

How To Conduct a Technical Interview Effectively

Technical Skills
- Problem solving: not-easy algorithm questions
- Coding
- Design
Soft skills
- Communication
- Retrospect
  - Mistakes related with design/decision
  - What you learned from your mistake
  - Bugs/troubleshooting
- Eager to learn
- Be flexible, willing to listen, not stubborn

What questions to ask
- ask interesting/challenging questions
- Or questions that are not difficult but focus on coding (bug-free)
- ask questions that can be solved in different ways
- Avoid questions that can only be solved with one specific approach, unless it's obvious (binary search etc.) and you are testing coding skills, not problem-solving skills

Don't ask 
- brain teasers, puzzles, riddles
- problems only because you are interested, you just happen to know, or you just learned recently

Know the questions very well
- Different approaches
- Expect different approaches that you don't even know
  - Verify it (use examples, proof); if it works, the candidate did a good job and you also learned something new

Know common cause of bugs
- Able to detect bugs in candidate's code quickly

Give candidates the opportunity to prove themselves and shine
We are trying to evaluate the candidate's skills thoroughly: what he/she is good at, and what not.
If you plan to ask 2 coding questions, one simple and one more difficult, tell the candidate
Let the candidate know your expectations

Make the candidates learn something
- If the candidate doesn't give the right solution/answer and, at the end of the interview, wants to know how to approach it, tell him/her.
- Candidates put a lot of effort into the interview (a day off and the commute); if they want to learn something, learning it makes them feel good
- Prove that you know the solution and have a reasonable answer; don't ask questions you don't know much about yourself

No surprise
If you find issues/bugs in candidate's code or design, point them out
The candidate should have a rough idea about how he/she performs in this interview

Be fair

Phone interview
Prefer coding question over design question
- as design is partly about communication and it's hard to test communication skills over phone

About me - Jeffery Yuan (2017)

This is a short list of what I am good at and what I should improve.
- I will keep updating it, and I hope that when I look back after a year, I will realize that I have improved and learned a lot.

Retrospect and Learning Logs
- I like to summarize what I have learned, and write them down

Sharing Knowledge

Problem Solving and troubleshooting
- I like to solve difficult problems as I can always learn something from it.
- I also summarize how(what steps) I take to solve the problems, what I learned that can make me solve problems quicker later.
- Search and find resource needed to solve the problem
- See more at my blog: Troubleshooting

Proactively find problems and fix them
- such as find problems in existing design and code, and think about how to improve them

Be honest
- to myself and colleague about what I know and what I don't
Be modest
- I know there are still a lot of things that I should learn and improve.
- I like to learn from others

Proactively learning
- Have a safaribooksonline account
- Like to learn from book, and people
- When I use Cassandra, Kafka in our project, I took time to learn not only how to use it but more importantly its high level design.
- Read more at my blog: System Design
Programmer: Lifelong Learning

Weakness - things need improving
System design
Knowledge about distributed system
Public Speaking

How to Review and Discuss - Software Design

Talk/Think about all related
- how do we store data, 
- client api 
- ui change
- back compatibility: how to handle old data/client

But focus on most important stuff (first)

Talk/think about design principles/practices
- such as idempotent, parallelization,monitoring, etc
- Check more at System Design - Summary

What's the impact of other (internal and cross-team) components?

How others components use it?

What're the known and potential constraints/issues/flaws in current design?
Don't only talk about its advantages, 
Also talk about issues, don't hide them

What are alternatives?
Think of alternative and different approaches; this can help find a better solution
We can't really review and compare if there are no alternatives

Welcome different approaches
- although it doesn't mean it's better, or we will use it

Development Cost
- How difficult it takes to implement?

What may change and How to evolve

What may change in (very) near future?

How can we know whether the new feature works or not
How can we know when problems happen

Feature Flag
Can we enable/disable the feature at runtime

Be Prepared
Ok to have informal/impromptu discussion with one or two colleagues

But make sure everyone is prepared for the formal team design discussion
All attendees should know the topic: how they would design it

Don't make design decisions immediately for things that really matter
Take time to reflect and develop disagreements; talk about it again later

Listen first

When you don't agree with other's approaches
Don't get too defensive
Talk about ideas not people

Be prepared

Make API/Feature easier
- to use
- to test/rollback in production (feature flag)

System Design - Summary

Problem Solving Practice - Redis cache.put Hangs

The Issue
After deploying the change (Multi Tiered Caching - Using in-process EhCache in front of Distributed Redis) to the test environment (along with some other changes, and after someone made changes on the server such as a restart), we found that cache.put hangs when saving data to Redis.

Troubleshooting Process
First we tried to reproduce the issue in my local setup, but it always worked there. We could easily reproduce it in the test environment, though.

This made me think it might be something related to the test environment.

Then I used kill -3 processId to generate several thread dumps while reproducing the issue on the test machine. I found a suspect:
"ajp-nio-8009-exec-10" #91 daemon prio=5 os_prio=0 tid=0x00007f49c400a800 nid=0x75db waiting on condition [0x00007f495333e000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at  RedisCache$RedisCachePutCallback(RedisCache$AbstractRedisCacheCallback).waitForLock(RedisConnection) line: 600
RedisCache$RedisCachePutCallback(RedisCache$AbstractRedisCacheCallback).doInRedis(RedisConnection) line: 564
at com.lifelong.example.MultiTieredCache.lambda$put$40(
at com.lifelong.example.MultiTieredCache$$Lambda$18/1283186866.accept(Unknown Source)
at java.util.ArrayList.forEach(
at com.lifelong.example.MultiTieredCache.put(
at org.springframework.cache.interceptor.AbstractCacheInvoker.doPut(
at org.springframework.cache.interceptor.CacheAspectSupport$CachePutRequest.apply(
at org.springframework.cache.interceptor.CacheAspectSupport.execute(
at org.springframework.cache.interceptor.CacheAspectSupport.execute(
at org.springframework.cache.interceptor.CacheInterceptor.invoke(

Check the code at RedisCache$AbstractRedisCacheCallback to understand how it works:
For operations like put/putIfAbsent/evict/clear, and @Cacheable with sync=true (RedisWriteThroughCallback), it checks whether there is a key like cacheName~lock in Redis; if the key exists, it waits until the key is gone.

This lock is created and deleted for @Cacheable with sync=true in RedisWriteThroughCallback, which calls the lock and unlock methods.

This made me check the data in Redis: after creating the tunnel to Redis and running the command keys cacheName~lock, I found that the key was indeed there.

Now everything makes sense:
- We had set sync=true and run a performance test, then restarted the server and removed the setting. The cacheName~lock key was probably left behind by the server restart. Because of that key, all Redis update APIs stopped working.

After removing cacheName~lock from Redis, everything works fine.

Takeaway
- When using a feature (@Cacheable(sync=true) in this case), know how it's implemented.

