Eclipse vs IntelliJ

Better ways to organize info - Winner: Eclipse
-  Eclipse provides workspace, working set, projects
-  Organizes views into perspectives, provides view like console, problems, errors. 
- We always know where to find info we need.

Editor -Winner: Eclipse
Eclipse shows error/warning in current window in left side bar. 
IntelliJ only shows error/warning for all content in right side bar. (Eclipse also shows things in right side bar)
When you write code, its much easier to check error/warning in Eclipse.

Shortcut keys -Winner: Eclipse
Eclipse's default shortcut is more natural to me: easier to remember and type. Usually Eclipse uses less keys. IntelliJ uses all keys F1-10.

To format code: Cmd+shift+f in eclipse Cmd+alt+l in IntelliJ
- what L means?

Quick fix - Winner: Eclipse
- Eclipse quick fix(ctrl+1) is too powerful, it can fix bugs, generate code based on current context.
- IntelliJ provides too many ways to do similar things - wtf?
Show intention actions - alt+enter
Basic code completion - ctrl+space

Smart type code completion - ctrl+shift+space

- Drop to frame always works in Eclipse, but doesn't in IntelliJ
- Display view in Eclipse stores history, it's much easier to use

IntelliJ does have its own advantages such as: smart step into for anonymous classes and lambdas.

Search everywhere in IntelliJ
In Eclipse, you have to use different shortcuts/ways to search type, resource, views. 
In IntelliJ, just double press shift keys.

Better Performance/More Stable - Winner: IntelliJ
Sometimes Eclipse is slow or gets stuck, you need take time to customize it, check Improve Eclipse Performance

Same code may work in Eclipse but not in IntelliJ
Eclipse and IntelliJ doesn't works well together, if some team members are using eclipse, you better also use eclipse, as sometimes, the same code works and can be deployed in eclipse's tomcat but doesn't work or can't be deployed in IntelliJ tomcat.

More about Eclipse and IntelliJ:
Mastering Eclipse IDE - Tips, Tricks
IntelliJ IDEA Tips and Tricks

How to Be a Great Technical Lead

How to choose tech stack
- List your current/near-future features
- Consider multiple/different options and how to use them to implement features 
- Compare their pros and cons: manageability, scalability 
- What techs are used in your team and departments
    - But this doesn't mean always use same tech stacks used before 
- What tech are supported by operation team

Build reputation
- be useful and help others

Know your team members
- their strength or weakness
  - programming skills
  - design skills
  - trouble shooting skills
- what they are interested, what are not
- what they want to learn and improve
Help them to grow and improve their skills

Know yourself
Know how/what you can help the team
- your programming/design/communication skills
Know when to stand alone and when to step in
Know when to let others shine

- don't try to do everything
- let other people do the job they are good at and observe
- let others contribute and take ownership/leadership

Spring Tips & Tricks

Spring-boot specifies and manages its dependencies' version:
Don't specify and overwrite the version again in our project's pom.xml.
Don't use same property value to overwrite the version by accident.

@Bean public static PropertySourcesPlaceholderConfigurer propertyConfig(){}

Bind properties to a list or a map
First create ConversionService bean
@Bean public ConversionService conversionService() {
    return new DefaultConversionService();

private List<Integer> myList

redis.expires={typeA:900, typeB: 86400}

private Map<String, Long> expires;

Spring Cache
Support Spring Expression Language in Spring AOP
In contrast to the @Cacheable annotation, @CachePut does not cause the advised method to be skipped. Rather, it always causes the method to be invoked and its result to be stored in the associated cache. 
We can use #result as key in @CachePut.

if (result != NO_RESULT) evaluationContext.setVariable(RESULT_VARIABLE, result);

- To support #result, we just put method return result value into the variable: result.

Synchronized caching
@CacheConfig(cacheNames = "cacheA")
@Cacheable(key= ("#p0['name']")) // p0 is a map
@CachePut(key = "") // p0 is an instance
@Cacheable(key = "'somePrefix-'+#p0")

public static final String PREFIX = "'" + CACHE_NAME + "-'+";
@Cacheable(key = PREFIX + "#p0.toString()") // p0 is UUID

@CachePut(key = PREFIX + "#result.uuid.toString()", condition = "#p2 == 'typeA'")

Compared with condition, unless expressions are evaluated after the method has been called.
unless = "#result != null"

Not cache null value explicitly
@Cacheable(unless = "#result == null")

Change cache log level to trace
logger name="org.springframework.cache" level="trace"
No cache entry for key
Cache entry for key found in cache

Redis Cache
- Cache null values
- Use properties to set different expires for different cache regions
public CacheManager cacheManager(final StringRedisTemplate redisTemplate) {
    final RedisCacheManager cacheManager =
            new RedisCacheManager(redisTemplate, Collections.<String>emptyList(), true);
    return cacheManager;
configKeys = stringRedisTemplate.opsForZSet().range(cacheKey, 0, -1)

Bind Collection of Beans
@Autowired(required = false)
private List<XInterface> instances;


Using Spring managed Bean in non-managed object

Spring EL
private List myList;

Parse annotation's value using Spring EL
// instance are thread safe
private final ExpressionParser parser = new SpelExpressionParser(); 
final XAnnotation xAnnotation = method.getAnnotation(XAnnotation.class);
final Expression exp = parser.parseExpression(xAnnotation.propertyA());
final EvaluationContext context = new MethodBasedEvaluationContext(invocation.getThis(), invocation.getMethod(),
        invocation.getArguments(), new LocalVariableTableParameterNameDiscoverer());

final Object value = exp.getValue(context);

Spring EL
#root, #root.methodName,, #result


BeanUtils.copyProperties - copy properties while ignoring null values
@Retryable(value = {FooException.class, BarException.class}, maxAttempts = 5)

Use AbstractSecurityWebApplicationInitializer
Configure it by extends WebSecurityConfigurerAdapter

Authentication auth = SecurityContextHolder.getContext().getAuthentication();

Integrate In-Memory Authentication for Test Automation

Use inMemoryAuthentication in dev lines for automation test

Extend UsernamePasswordAuthenticationFilter
Build Multi-Tenant Application

Spring Test
@ContextConfiguration(loader = AnnotationConfigContextLoader.class)

Spring test doesn't work with JUnit 4.11
InitializationError When Use Spring Test + JUnit

Spring data
How pring-data-cassandra runs query


Add @NoRepositoryBean to intermediate repository interface.

@Score private Float score;

com.jcraft.jsch.JSchException: UnknownHostKey
ssh-keyscan server-ip-address > my_known_hosts
new DefaultSftpSessionFactory().setKnownHosts(new ClassPathResource("my_known_hosts").getFile().getAbsolutePath());

Return raw json string
- Add StringHttpMessageConverter before MappingJackson2HttpMessageConverter in method configureMessageConverters of WebMvcConfigurationSupport

Injecting Bean Dependencies as Method Parameters
public Foo foo(@Qualifier("bar1") Bar bar1) {
 return new Foo(bar1);
@Bean public Bar bar1() {
 return new Bar("bar1");
This approach is better than the following one as there is only one bean(bar) created.
public Foo foo() {
 return new Foo(bar());

If we create a bean manually in code and it implements InitializingBean interface which will be called automatically after the bean is created by Spring, we may have to call afterPropertiesSet manually.

@Autowired(required = false)

private Configuration aConf;

Journey to be A Great Engineer

Be aware of yourself
- strength, weaknesses
- have a plan to overcome the weaknesses that matters
Brand yourself/build reputation
- join discussion, other projects proactively
- make presentation, give a talk
- be useful and help others grow
Be better everyday
- writer better code, design system better
- try new/different/best approach
Never give up
- exhausts All Resources to solve the problem
- ask more questions and dig deeper
Follow up
- be inquisitive
- how others eventually fix/solve the problem
- what's the real root cause

Keep Learning everyday
- in different ways
  - learn from books/blog/video/podcast
  - learn from others: no matter whether developers are senior or junior than you
  - be open-minded
- learn related/important skills that matters
  - how to search in codebase, log, git(hub) efficiently
  - troubleshoot, IDE
  - how to review code
- learn tech stack used by other team (members)
- understand trends and new techs
  - what's the advantage, when should use it when not
  - how it works, how it's designed
Maintain your learning log 
- blog, notes, doc etc
Maintain your toolbox
- languages/frameworks/ide/bash, new features

Keep calm and be happy/excited
- when meet some difficult problems
- when hear suggestions

- prepare for next meeting/discussion/feature/project
- learning the techs that may be used in current or next feature/project
- learn the techs that may be different(better) and can improve your current solution
- don't blindly repeat same approach/design/code

Use time efficiently
- Know the topics, what you can get and what you can contribute
- Whether it's important to you, if not and you are not required, skip it

- Avoid fragmented time; if can't, use it efficiently(reading, podcast)
- don’t lose state when context switch
- use tools to automate

Think/Discuss/design more before take action
- full algorithm/design before coding
- don't take action blindly: make change then test it or debug it
- check and infer first
Admit what you don’t know
- don't pretend
- ask questions and learn it

Share knowledge
- share more and learn more

How to Ask questions
Ask more and deeper questions to yourself
Ask questions to others
- don't ask simple knowledgeable questions too often
  -- learn/read/google first
- But don't hesitate to ask questions to any one if needed

How to answer questions
- first understand the question and what's the real question and issue
- don't just assume others are asking simple or stupid questions
- if not 100% sure, check it before answer

Quickly vs Slowly
Know when to solve problem or do things quickly when to do things slowly
- Solve problems quickly so others can move on
- But take time to find root cause or reflect if it matters

Career Development
- be initiative
- talk with your managers
- express your interest/passion and goal
- work on interesting/challenging projects
- prefer yes over no

Troubleshooting Practices

Troubleshooting is important and inevitable for engineers when we try to figure out why the code doesn't work; it also improves our problem solving skills: what steps to take to quickly find the root cause.

This articles is to reflect the process that I took to find the root cause.

Can't deploy Spring-boot application to tomcat
- SpringBootServletInitializer.configure is not called at all

During dev, the team is running spring-boot application as java application. But later when we deploy to the cloud, we need run it as web application. The developer tried it but doesn't work.

We found out it's because the dev specifies 2.5 in web.xml unintentionally and Spring-boot requires Servlet 3.0 by default.

jackson-module-scala: NoSuchMethodError due to intellij
Recently when I ran tests in intellij, it throws the following exception:
java.lang.NoSuchMethodError: scala.util.matching.Regex.unapplySeq(Ljava/lang/CharSequence;)Lscala/Option;
at com.fasterxml.jackson.module.scala.JacksonModule$.version$lzycompute(JacksonModule.scala:30)
at com.fasterxml.jackson.module.scala.JacksonModule$.version(JacksonModule.scala:26)
at com.fasterxml.jackson.module.scala.JacksonModule$class.version(JacksonModule.scala:49)
at com.fasterxml.jackson.module.scala.DefaultScalaModule.version(DefaultScalaModule.scala:19)
at com.fasterxml.jackson.databind.ObjectMapper.registerModule(

Seems it's caused by jackson-module-scala, but I use mvn dependency:tree and dependency:copy-dependencies to check all thing looks good: we use scala-2.11 and jackson-module-scala_2.11. Also if I ran the test in command line, it works.

This made me think maybe this is because of intellij. 
I added a breakpoint and run: scala.util.Properties.versionString in debug mode, it's 2.10.6. This is strange. Then I check external libraries in intellij, there are indeed multiple scala-library versions: 2.10.6 and 2.11.8.

In our current projects, we don't use scala-library-2.10.6 anywhere. Maybe previously we use scala 2.10.6, later we upgraded to 2.11. But intellij still keeps reference to old scala in external libraries.
- no effects when reimport all maven projects

The workaround is to delete .idea/libraries internals and restart IntelliJ and reimport projects.

ClassNotFoundException: StandardSessionIdGenerator because we overwrite tomcat.version
- Asked by team mate to help solve this issue
Usually this is because of version conflict. We use spring-boot 1.5, but the tomcat-embed-core is 7.0.65. This looks suspicious. Then I checked the pom.xml. This makes me check how spring-boot specifies its version. search tomcat-embed-core in, found 
spring-boot-starters/spring-boot-starter-tomcat/pom.xml and spring-boot-dependencies/pom.xml:

But we also defined same property: tomcat.version to 7.0.50 in project's pom.xml for some tomcat dependencies we used. This overwrites spring-boot property. So now spring-boot use tomcat-embed-core 7.0.65 which doesn't have StandardSessionIdGenerator class.

The fix is to change our own property to xx. tomcat.version and also bumped its version to same as spring-boot.
- spring-boot should add prefix such as spring.boot to these properties.

Find code by running sample code and debugging it
Build Web Service APIs to Update Solr's Managed Resources (stop words, synonyms)
I need figure out how to get address of one of live solr nodes by using CloudSolrClient.  I know solrJ uses apache http client to send requests, somewhere it needs get addresses of solr nodes, so I write some sample code which sends invalid request, solrJ will throw exception, then I check the stack trace form the log and find out the code that I need at org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(SolrRequest, String).

System Design Practices

Design home page stream
Design feed
Mixed approaches - Push + Pull from 
-- selectively disable push for celebrities, pull them instead.
Design Decisions For Scaling Your High Traffic Feeds

Cassandra + Denormalize
Redis + Normalized

Selective Fanout Based On Producer/Consumer
URL shortener
- read heavy
- separate read/write to different apps
- able to disable write functions
- id (auto increment), actual url, and shorten url

- base 62 encode the id to string(shorten url)
  - generate auto-generate id
  - can use sql or nosql as long as find a way to generate unique long
- rate limit, prevent abuse
Insert Long Url to db, get the primary key: long
Convert base10 to Base62[0-9a-zA-Z], other characters such as -,_+, convert long to short url
Decode short url to long number and query db to get long url.

How to make it scalable
How to analyze short url click event

- politeness: rate limit
- put same host pages into same queue
- different priority for different sites - multiple queues
- normalization/canonicalization
- directed graph
- each URL: last visited time, score, fingerprint
- Use Bloom filters to remove duplicates
- predict which websites are most likely to have frequent update, put those into the fetchers’ priority queue.
How to handle realtime news?
- each url: next visited time based on its score and how frequently it changes
- if page(links, content) is (mostly) not changed, increase its next visit time(cut+2t) otherwise decrease its next visit time
- begin with a seed set
- use a simple fingerprint such as a checksum
- the frontier
- normalized URL
DNS resolver
- Put data centers in different regions - machines in asian to crawl asian web sites

To crawl price histories of products
- previous price history changes
- how many reviews they have

- To detect new products: provide api so client can add or manually crawl new added pages.

Design an in-memory search engine
How to index and store in memory
How to support free text queries, phrase queries
Map<String, List<Document>>
List<Document> is sorted
Document: docId, List<Long> positionIds
List<Long> is sorted
How to save to files
How to merge small files into big files
-- When save to file, make it sorted by word
-- Use merge sort to merge multiple fields
How to make it scalable - how solr cloud works 

Auto completion in search
- single node
- Trie, Topk

Design Hit Counter - how to count number of requests in last second, minute and hour - link1,

Generate Game Map
- brute force, randomly generate one, check whether it can be solved in limit time by machine

- different difficulty level

Design score/rank system for social game.
Need update score realtime?
- yes, for this user's operation, when score is small
- no, for others' operation, when score is big
Rank - realtime?
- yes, when user is ranked in X(1000),
- no, if user ranks is below Y(1M)
Average score
- rank: use sorted tree - sorted by score, node splitted by score range
- table: userid, score
- sharding

- append, not update existing data
- separate score change to different types: real time vs batchable
- batch score change

Search nearby places
- GeoHash

Design Chat app like messenger, what's app
How to manage/detect friends status
How to store messages
How to keep connections
- persistent connection?


- url dedup: use bloom filter
Detect duplicate documents in billions of urls
- back-of-the-envelope calculations
- (30 bytes for url + int/long(4 or 8 bytes)) * 10^9(~1g) ~ 34g
- A1: Hash and split into multiple files  - slow (even with multiple threads) as a lot of IOs
- A2: Parallelize in multiple machines - like map-reduce

Design Slack

Design logging collection and analysis system
Design shopping cart
- monitor
- abandoned cart
guest cart
- cookie
- merged if user login later
- check items availability at some point

- call other services - stock service(price, availability), metadata service, images(cdn)

ecommerce - amazon
- analytics
- abandoned cart, user behavior(click,view,add/remove from cart)
checkout + payment

customer service

Design rate limiter
Design Miao Sha
- Principle: fair(fifo) - but no need to 100% guarantee
Two types of queues
- each server has its own queue, only keep x requests, reject directly if more than it
- one queue in one single server which gets all requests from all server, and process it synchronously
- avoid distributed problems
- asynchronous processing
Protect against from hackers

- filter based on ipAddress, acountId
- Prevent abuse: cheating

Design email

- 1. Use count table
- 2. Use Redis

Hierarchical Data
- id, parentid

Generate unique ID numbers
Twitter Snowflake
- 41bit timestamp
- 10bit workerid

- 12bit sequence
Ticket Servers
auto-increment-increment = 2
auto-increment-offset = 1

auto-increment-increment = 2
auto-increment-offset = 2

Design the data structure for large social network
Design system to store/read images

Expire mp3 in cdn and generate new one after x min

Playlist Shuffle
- Fisher-Yates shuffle
- a good, uniformly distributed mix of all the tracks, no burst

Design key-value Database(lookup service)
- membership and failure detection(gossip)
- Partition (Virtual nodes, consistent hashing)
- Replication(Versioning)
- handling temporary failures
- read repair and anti-entropy using Merkle tree
- Global(Local) secondary index
- Conflict resolution

- Request coordinator

Java Concurrency Tips

executor.execute vs submit
execute can only accept Runnable, execute returns nothing
- void execute(Runnable task)
- used for fire and forget

submit can accept Runnable and Callable, submit returns Future
Future submit(Callable task)
Future submit(Runnable task)
- used when need get the result

Exception handling - if task throws unhanded exception
- execute: the UncaughtExceptionHandler for the Thread running the task to be invoked. By default it prints exception to System.err. We can register global UncaughtExceptionHandler: Thread.setDefaultUncaughtExceptionHandler();
- submit: Any exception thrown will be bound in Future. Call future.get will throw an ExecutionException with the original Throwable as its cause.

RejectedExecutionHandler: ThreadPoolExecutor.CallerRunsPolicy(), NewThreadRunsPolicy

-Isolation: Allocate a separate thread pool for slow web requests within the same application

- corePoolSize 30, maximumPoolSize 50, ArrayBlockingQueue(100): what it means

Deadlock prevention
- Same Lock Ordering
- Lock Timeout

Mutual Exclusion
Hold and Wait
No Preemption

Circular Wait

Intrinsic Lock - synchronized
- every object and class is logically associated with a monitor
- wait, notify, notifyAll
- acquire lock interruptibly and with timeout - tryLock
- lockInterruptibly: used to interrupt thread when it is waiting for lock
- fair lock
- better performance: uses atomic variable and faster CAS operation
- get list of waiting thread for lock
- need put it into try-finally block, code unreadable
- easier to make mistake: forget to unlock in finally block
associate a counter with each thread

- wait for/notify a particular condition to become true
- can have multiple conditions on same lock
- enables the effect of having multiple wait-sets per object
- factors out the Object monitor methods (wait, notify and notifyAll) into distinct objects to give the effect of having multiple wait-sets per object
- Where a Lock replaces the use of synchronized methods and statements, a Condition replaces the use of the Object monitor methods.

- await releases the lock, suspend the thread, signal, signalAll
- BoundedBuffer
Condition notFull = lock.newCondition();
Condition notEmpty = lock.newCondition();

- can be released by other threads, useful for deadlock recovery
- P(acquire), V(release)

- a counter (integer) that allows a thread to get into a critical region if the value of the counter is greater than 0
- implementation: no lock, CAS
- acquire/(permits), tryAcquire, release/(permits)
CountDownLatch countDownLatch = new CountDownLatch(TOTAL_EVENTS);
- We can reuse CyclicBarrier even if Barrier is broken but can't reuse CountDownLatch

Programming Concurrency on the JVM
Read-write lock - ReentrantReadWriteLock
Thread local storage
Prefer local thread variables over static and shared when possible
Avoiding deadlocks by ordering the locks
Using atomic variables instead of synchronization
Holding locks for as short a time as possible
Avoid executing inside the critical section the code you don't control.

Use lock instead of synchronize
Use CyclicBarrier and CountdownLatch instead of wait/notify

- visibility: any thread that reads a field will see the most recently written value
- operations on it not be reordered

- not cached in registers or in caches; always read from and write to main memory - thus a little slower

Ensure Visibility
- cross the memory barrier on getters, use volatile, atomicXXX, readlock etc
Ensure Atomicity

Don't start threads in constructors
- use static factory methods, thread pool
Use tryLock - instead of lock
Cancel a task - future.cancel, executor.cancel

Initialization-on-demand holder idiom
Double-checked locking

The long elements in the AtomicLongArray can be updated atomically
- compareAndSet: compare the value of a given element with a specified value, and if the two values are equal, set a new value for that element.
- getAndAdd(index, incrment), getAndIncrement(index), incrementAndGet


Concurrent data structures
- guarantees O(log(n)) operations
- sorted, ceilingEntry/Key, floorEntry/Key

- thread-safe implementation of the SortedSet
- concurrent crud
- slower than TreeSet, don't use it unless in highly ...
- size is slow, O(n)
- head, nil are in every skip list
- when add, a random number generator is used to determine the levels of the new node
extend thread class vs implement runnable interface
- use threadpool to share threads if implement runnable

Missed Signals - link
boolean wasSignalled = false;
     } catch(InterruptedException e){...}
  wasSignalled = false;

Don't call wait() on constant String's or global objects
- use more memory
- prefer AtomicLong when contention is low


unsafe.getAndAddLong(this, valueOffset, 1L)
Implement AtomicInteger
int currentValue;
int previousValue;

Implement ReadWriteLock and ReentrantReadWriteLock - link
private Map readingThreads =
     new HashMap();

 private int writeAccesses    = 0;
 private int writeRequests    = 0;
 private Thread writingThread = null;
Implement Lock and ReEntrantLock
int lockHoldCount;
long IdOfThreadCurrentlyHoldingLock;

Implement Semaphore
int permits; synchronized
while(this.permits == 0) wait();

Implement CyclicBarrier
2 or more threads wait for each other to reach a common barrier point. When all threads have reached common barrier point (i.e. when all threads have called await() method), all waiting threads are released.
CyclicBarrier(int parties)

int initialParties; //total parties
int partiesAwait; //parties yet to arrive
synchronized await

Implementing (Thread Safe) Stacklink
- if(head.compareAndSet(oldHead, newHead))
Design threadsafe real time counter
private AtomicLongArray counter = new AtomicLongArray(GRANULARITY);
private volatile int pos = 0;
- Another thread that update pos every second
void incrementPosition(){
    counter.set((pos + 1)%GRANULARITY, 0);
    pos = (pos + 1)%GRANULARITY;


Person A holds A, PB holds B, PC holds C, how can we guarantee always print ABC
- Use Semaphore

Print even and odd numbers using threads
Dining Philosophers Problem

System Design: Learning from Existing Products

Pinterest Home Feed
- best-first rather than newest-first
- separation of available, but unseen, content and content that’s already been presented

- fan out write pins to followers
- batch write
standby HBase cluster
- speculative execution
  - make another call to the standby cluster if the first call to primary cluster fails or doesn’t return within certain time
- randomly forward x percent of calls to warm up it

smart feed worker
- score/store pins: quality and meaning to users
- repins by followers, related pins, interest Pins
- 3 priority queues(hbase tables) for each user, use key-based sorting of HBase: userid, score, pin
Smart feed content generator
- pins selected are removed from pool(hbase table)
Smart feed service
materialized feed

discover interests - Pinterest
-  n-grams
- verify with search queries, public domain ontologies like Wikipedia titles
- blacklist, stop words
- implicit and explicit signals from users

related interests
- co-occurrence relationship for interests

search query
- user's demographic stats
- store and use previous queries
- adjacent queries in the same search session

- clicks afert submit query

- users,tweets,be-followed,following
- persistent connection between client and server
Add a twitter
- Write it to the queue, return to client
- in background, save to db, push to (active) follower's queue
Cache aggressively
Vector Cache
- ids(user, tweet)
- user's followers(id)

Row Cache
- tweet

Fragment Cache
Page Cache

- separated and put into a different machine

Home Timeline - link
user's timeline vs home timeline

- precomputed and cached in redis for active users
Mixed Pull+Push
- local decisions about each producer/consumer pair
- how often producer adds a event, how ofter consumer views the event
- celebrity twitter - Pull

- normal user twitter - Push
Flock maintains the follower and followings lists.


- tweets are stored in memory Lucene

- rerank based on social proof, the number of retweets, favorites, and replies.

- so every time, a tweet is retweeted or etc, no need update 


Separate data
Recent Activity - hot
- changes frequently, cache is frequently invalidated
- small, fast to regenerate them

Non-Recent Activity - cold
- a lot of data, slow to generate
- but don't change, cache will not be invalidated

Photo storage
- use cdn popular images
- use Haystack for other images - long tail
NFS photo infrastructure - old version
- huge metadata, most are not used(permission)
- use cdn to store images
- one(or more) to translate the filename into inode number, another to read the inode from disk, and a final one to read the file itself.
- need keep all inodes in memeory
- CDN good for hot data, but not so for cold data
long tail data
-  requests for less popular (often older) content

Haystack - link
Haystack Object Store
- a log structured (append-only) object store containing needles representing the stored objects
- eliminates the metadata overhead by aggregating hundreds of thousands of images in a single haystack store file
- at most one disk operation per read, keep all metadata in main memory
- haystack store file
- index - needle's location, not critical, can be rebuilt, written asynchronously

needle is identified by Offset, Key, Alternate Key, Cookie
- Don't allow overwrite existing needle, instead: add a new version and its index
- needles with tuple: duplicate Key, Alternate Key, Cookie, the largest offset is the most recent one
Read - Pass needle offset, key, alternate key, cookie and the data size
Delete - set a deleted bit in the flags

- Fault-tolerant: replicates

Photo Store Server - http layer
- keeps a minimal in-memory index
- Google sparse hash data structure

- reclaim the space
- copy needles while skipping any duplicate or deleted entries

Haystack Cache
- insulation if upstream CDN nodes fail and

Timeline - your story
News Feed - Multifeed
- multi-fetch and aggregate stories at read time
- aggregator(staeless) and leafs
- aggregator servers sends request to all leaf servers and ranks the results.
- all data in leafs is in memory (memcached)
- fetch recent activity from all your friends
- gather and Group into stories
- rank stories by relevance
- more flexible, manageable
- fan-out reads can be bounded. writes, often cannot

- others's story that related to you
- rank edges(stories) by importance(relevance) - link
- content score: ∑e = ue we de, wherein ∑e is the sum of the edge's rank, ue is the affinity score with the user who created the edge, we is the weight for the content type, and de is a time decay factor

- a friend's affinity score: ∑i = li ni wi, wherein ∑i is the sum of the interactions with that friend, li is the time since your last interaction (this would need to be weighted so that 1 day > 30 days), ni is the number of interacts, and wi is the weight of those interactions

- Activity(id, user_id, source_id, activity_type, edge_rank, parent_id, parent_type, data, time)
- timeline - user's story
- newsfeed
connections and activities

"Push" Model, or Fan-out-on-write
- a lot of write, only push to active users for non-celebrity users
- use write-optimized data store such as Cassandra, HBase, or BigTable
"Pull" Model, or Fan-out-on-load
- less data, less computation
- materialized what user already viewed into a table, only pull new data
- have a fallback mechanism if fails to pull - show old materialized data

Combination of push and pull
- make decision locally on per consumer/producer basis
- a frequent producer of events may overshadow events from less frequent producers
Selective Fanout Based On Consumer
- only fan-out to active/valuable users
Selective Fanout Based On Producer

- don't fan-out for high profile users
Pull for new users
- because nothing pushed to the user 
- when there is no enough data, pull from following

Make schema changes is expensive in traditional db
- such as add/remove indexes
- Users don't know what they don't know.

- All content is not created equal.

- Treat nonlogged user as second class citizens
- Return cached content to them
- Put a limit on everything
- Provide timeout

- Expire data. Lock out old threads and create a fully rendered page and cache it.

Red Envelop
- store count, balance into redis
- use redis 
  - decr count, CAS to decrease balance
- ttl

Driver - Long connection
- matches drivers and riders, the supply and demand
Maps/ETA (estimated time of arrival)

- plugins, extensible
- fetching pages, filtering  and normalizing URLs and parsing responses

Etsy - link
$choose_connection = mt_rand() < $affinity;
Pruning Same/similiar message
combine similar stories into rollups
- group similar storeis

- seperate global and unique to the person
Cache Warming - gearman
- proactively
- TTL Caching
Lazy Below the Fold


EFK - Log collection/visualization
logstash/fleuent -> kafka -> Elasticsearch -> Kibana

System Design - Summary

Identify the user cases, requirements, constraints, scope 
- ask a lot of questions
- current requirements and requirements in near future
talk about near-future requirements and design for it. 
- who is the target audience? how it will be used?
- which features would be Pri-0 versus Pri-1?
- what are key and high traffic operations?
- how accurate it requires?
- max requests/day, QPS, storage size, latency, availability
- high level design and architecture
- main user cases/specific APIs for each component
Try multiple approaches
- high level: API, application layer, data layer then drill down
- sketch up the major components and their relationships
- draw a simple diagram

Identify bottlenecks/limitations and how to tackle them

Performance, Consistency, Availability, Safety

Compare with other approaches

- performance characteristics of proposed architecture

Relax Temporal Constraints
- Alleviate temporal constraints in your system whenever possible

Keys to improve performance
- cache, partition, replicate, index

Differentiated service classes
Different services/data need different guarantees/SLA

Separation of concerns/things

separate user-facing components from non user-facing components
- critical vs non-critical services
- different db/apps into different servers
- differentiation of different components
Separate frontend and backend
- always writable in frontend
- backend can write to queue, and asynchronously do business logic
Functional partition services
- understand who are priority customers

- coarse granularity of services
- establish clear relationships between the service
- isolate problems, scale independently
- solve problems independently

Restful Service
- cache friendly

- good for public api
- put: idempotent

Use standard timestamps - UTC only

Use downcased and dash-separated path names, downcased, underscore separators for attribute names

- actions/:action
Avoid auto-I=increment id; use UUID
Use Plural Noun
put response entity into a namespace(data), able to extend/add counts, links
flexible - client can choose which fields/sub resources to return
pagination - Define a Maximum
Don’t use integers as timestamps

Etag + If-None-Match conditional GET

- Protobuf, Thrift, Apache Avro
- Efficency
- But hard to debug and not flexible

Services features
- read heavy (improve latency) write heavy(save space, async,queue) or both
- user facing or not?
 - delay acceptable, (write feeds)
- SLA, performance requirement

- each node may be the leader for some partitions, and a follower for other partitions.
partitioning by hash of key
- lose the ability to do efficient range queries: rang query has to be sent to all partitions
- (user_id, update_timestamp); (sensor, update_timestamp)
- can perform an efficient range query if specify fixed value for the first key

CAP theorem
Consistency - all of the nodes see the same data at the same time. 
Availability - any available node can serve client requests even when other nodes fail. 
Partition tolerance 
- system can operate even in the face of network failures where communication between nodes is impossible.
- happens when some nodes in the system cannot reach other nodes, but all can be reached by clients.
CP - HBase, Bigtable, MongoDB, Redis
AP - Dynamdb, Cassandra, Voldemort, Riak, SimpleDB

CA - RMDBMs, Zookeeper

Consistency - link

Eventual consistency
Causal consistency
RYW (Read-Your-Writes) consistency - desired
- read it from the leader
- read user own data from leader, read others data from replicate
- or always read/write to same replica
Session consistency
- guarantee read-your-writes consistency in same session
Monotonic read consistency - desired
- if a process has seen a particular value for the object, any subsequent accesses will never return any previous values.
- always read to same replica
Monotonic write consistency
- guarantee to serialize the writes by the same process

- vector clock + RWN

Use write version - to implement optimistic lock, or consistency
Stickiness of clients

- N: number of replicas
- W: number of replicas that need to acknowledge update
- R: number of replicas that are read
- Normally: N=3, W=2, R=2
- Optimize for read:  R=1, W=N
- Optimize for write: W=1, R=N

BASE(Basically Available, Soft State, and Eventually Consistent)

- Scale reads not writes

- Statement-based
- Write-ahead log (WAL)
- Logical log - MySQL binlog
Master-Master - lose single-source-of-truth
Single Master/Multiple Slaves

Fine-Grained Dynamic Partitioning
Speeds Failure Recovery
- Many machines each recover one or a few partition
Selective Replication
- find heavily used items and make more replicas
- can be static or dynamic


- Kick off the same computation more than once and return the one that finishes fastest.
- Use multiple vendors for same service

- rate-limiting, remove slow servers
- Always put a limit on everything

How to reduce latency - link
Differentiated service classes
- prioritized request queues in servers
Manage expensive background activities
- defer until load is lower
remove capacity under load to improve latency
Use right image/video types
Decrease processing time
- Collocation(locality)
- Caching, Pooling
- Parallelization

- Partitioning

Cross request adaptation - balance load across set of servers
Within request adaptation
- Canary Requests
- Backup Requests (w/ Cross-Server Cancellation)
  – send request to first replica
  – wait 2 ms, and send to second replica
  – servers cancel request on other replica when starting read
Tainted Partial Results

- mark such results as tainted in caches

Use Timeouts pattern

- default Jersey uses HttpURLConnection, its socket timeout of 0 millis, meaning that the timeout is infinite.
- tryLock, lock with timeout

Use Connection pooling

How to handle failures
Design for Fault Tolerance and Graceful Failure
Graceful Degradation - reduced functionality
- degradable service

- better to give users limited functionality than an error page
- still do something reasonable even if not all is right
  - read only from local cache, only from distributed cache, only return fixed data
- rate limit, throttles
Fail as Early/fast as Possible - circuit breaker

Avoid single point of failure
Strive for active/active
Avoid hotspot

How to handle overload in each node
- know your limit, and do throttle
- reject new request
- clear request queue

Lazy purge - actively purge

Approximate correctness
- do a transaction every once in awhile instead of every time
- Don't do work unless it's really needed

No computation at all
Pre-computation and cache


Route 53 - latency-based routing

geoDNS - based on location

- Use multiple domains

Load Balancer

- SSL termination/offloading (AWS ELB)

- SSL farm
- scale lb using dns otr layered lbs

Cache aggressively
CDN - CloudFront
- static files, not-change-often API response
Layered Cache
Application Cache
- In-Process Caching(ehcahce)
- Distributed Caching(redis)
- Vector/row/fragment/page cache
- spring cache sync=true
Reverse proxy: Nginx
Browser cache: cache header(filter, or httpd config)
- Prevent 雪崩

- separate global and unique to the person
Cache Warming - gearman
- proactively
- TTL(timeout-based) Caching
Write-through cache - link
- directs write I/O onto cache and through to underlying permanent storage
- write is slower, good for apps that writes and then re-reads data frequently
Write-around cache
- write I/O is written directly to permanent storage, bypassing the cache
Write-back cache
-  write I/O is directed to cache, backed up to permanent storage only at specified intervals or under certain conditions
- good for write-intensive, non-critical app, may lose data if cache dies
- time shifting, rate limiting, conflation
Cache Aside
- read from cache, if missed, read from db then put it into cache

- update to db, and invalidate cache

Queue - kafka, redis, MQ
- Queue-based Load Leveling Pattern
- Concurrent request converted to sequential request
- Evening out traffic spikes, overdraft protection
- Isolating failures and self-healing
- Enabling asynchronous processing
- Decoupling
Partial message ordering

- one approach: at-least-once + idempotent
Multiple Queue
- different priorities, put same host into same queue - to be polite

Priority Queue pattern

- Use redis

-- or multiple queue with different priorities

Publish/Subscribe - Kafka, MQ
Asynchrony - asynchronous processing

- order placed successfully TO order received successfully
- queue, fire and forget calls

- avoid blocking calls, reduce contention

Batch operations

Share Nothing

Make API Retryable, Idempotent and Killable

- selective retry - only for transient errors that may succeed on a subsequent attempt
- don't retry permanent(unrecoverable) errors
- limit the number of retries and time spent
- use exponential backoff with random
- handle overload-related errors differently from other errors
RetryTemplate, FixedBackOffPolicy, setMaxAttempts, HttpClient

Only retry in outermost service

Accurate vs approximately
Availability and Reliability
- how things can fail
Learn and work on an actual system
- focus on the architecture and the tradeoffs behind each decision

- do back-of-the-envelope calculations for something you’re building

Feature flags

- control flag to upstream(in nginx)

Design to Clone Things (X Axis)
- clone services + load balancers
Design to Split Different Things (Y Axis)
- Separate sets of data, can scale them differently.
- Separation of distinct and different functions and data

Design to Split Similar Things (Z Axis): sharding and podding

Design for Rollback
Design to Be Disabled
Design to Be Monitored
Nagios, Ganglia, Graphite, statsd

Memory-based session state with Load balancer affinity(AWS ELB)
- Cookie learning vs Cookie injection

Persist session state to a DB

Data compression - gzip header

Polyglot MicroServices
- Single-purpose
- Modular and independent
- Don't share a persistence layer, each service has its own database

Categorize users

- less frequency ops for inactive users

- return cached content(ads) for non-logged in user

Explore locality - partition
- temporal, location

Data Store

- Choose the right database for the right service
- Know what queries are needed currently or in near-future
Read to write ratio
- read/write heavy or both
- how frequently we write/read
- use different db for read/write
- How much data
  - it may be too large/cost to keep all in memory
- is it time series?
- row size
- how long to keep the data, policy to delete/archive old data
Hot/Cold Data
- Hot data to memory, cold data to ssd or disk(Glacier)
- Store them in different system
Separate index and content
- index for search, no need to store everything (solr).
- store content in KV db
Separate read/write
- Write to master, read from replica
- First write to queue, then sync to master
- Use different data stores

Store raw data to S3 or hdfs
Store Data that matters to datastore

- index everything want to query on, only query on indexes

Denormalize the data
Split up large tables

- more read than writes
- writes goes to master
- Related or not
- Need ACID or not


Use multiple data centers

Break incompatible schema change into multiple database and application deployments.

Optimistic vs Pessimistic lock

- separate big fields into different table: post/tweet body
  - read by blocks, smaller rows => read more rows

- avoid joins, full table scan

PoPs servers - link
- maintain a set of ready-to-use, opened TCP connections to the main cluster service
- serve nearby user requests and forward them to the main cluster service using the ready-to-use, established connections

GeoHash - link
- divide the world into a grid of 32 cells
- each cell can be further divided into another 32 cells, which can be divided into another 32 cells, and so on.
- the longer, the more accurate

- the longer the shared prefix, the closer they are

Prefer sequential write vs random write
- Example: Cassandra, Solr

Gated Launch
Jitter - Add some randomness
- avoid a thundering herd
- Add Entropy Back into Your System

Security - link
fail securely
least privilege

never trust any input!

Deployments/Zero downtime
- rolling deployments


- not instantaneous

- can scale up or down

Valet Key Pattern - supported by S3

Command and Query Responsibility Segregation (CQRS) Pattern

Materialized View Pattern


SPU - Standard Product Unit
SKU - Stock Keeping Unit
MVP - minimum viable product

Robustness Principle
Be conservative in what you send, be liberal in what you accept
Be conservative in what you do, be liberal in what you accept from others

Reactive programming
- data flows and the propagation of change

- when x changes in one location,  things depend on the value of x are recalculated and updated in various other locations

CPU Bound Application
Caching precomputing values
Performing the computation in separate background job.

Most apps are IO Bound Application - db/web service calls

Lambda Architecture - link
- all data is dispatched to both the batch layer and the speed layer
- batch layer: managing the master dataset (an immutable, append-only set of raw data), to pre-compute the batch views.
- serving layer indexes the batch views
- merging results from batch views and real-time views.

Option 1. Hadoop + Spark Streaming + Memory Storage + Columnar Storage

Metrics, Numbers, Estimation
Apache Httpd Server
- one connection, one process
- max cocurrent connections: ~4000
- yaws
HAProxy, nginx
- 20K concurrent connections
Web application server
- few thousand concurrent requests
6gb memory - 50K or 140K RPS (different bandwidth)
50gb memory - 150K or 250K with Redis clustering, throughput increases linearly as you increase the number of shards
- mean read latency: 13 ms
- write latency: 25 ms
- 200k messages/sec for writes and 3M messages/sec for reads
- Each broker can handle terabytes of messages

Send 1M bytes over 1Gbps network 10ms
Read 1M sequentially from memory 0.25ms
Round trip within data center 0.5ms
Disk seek 8~10ms
Read 1MB sequentially from disk 20~25ms

Back-of-the-envelope Calculation
2^30 bytes ~~ 1GB
2^40 ~~ 1 TB
2^10 ~~ 1 KB
2^20 ~~ 1 MB

million -> 10^6
billion -> 10^9 ~ 2^30 bytes ~ 1GB
trillion -> 10^12 bytes~ 1 TB

QPS - Queries per second SLA - Service-level agreement 
- Inter-Reference Recency
- Low IRR, Hight IRR



Java (159) Lucene-Solr (110) All (60) Interview (59) J2SE (53) Algorithm (37) Eclipse (35) Soft Skills (35) Code Example (31) Linux (26) JavaScript (23) Spring (22) Windows (22) Web Development (20) Tools (19) Nutch2 (18) Bugs (17) Debug (15) Defects (14) Text Mining (14) J2EE (13) Network (13) PowerShell (11) Chrome (9) Continuous Integration (9) How to (9) Learning code (9) Performance (9) UIMA (9) html (9) Design (8) Dynamic Languages (8) Http Client (8) Maven (8) Security (8) Trouble Shooting (8) bat (8) blogger (8) Big Data (7) Google (7) Guava (7) JSON (7) Problem Solving (7) ANT (6) Coding Skills (6) Database (6) Scala (6) Shell (6) css (6) Algorithm Series (5) Cache (5) IDE (5) Lesson Learned (5) Miscs (5) Programmer Skills (5) System Design (5) Tips (5) adsense (5) xml (5) AIX (4) Code Quality (4) GAE (4) Git (4) Good Programming Practices (4) Jackson (4) Memory Usage (4) OpenNLP (4) Project Managment (4) Python (4) Spark (4) Testing (4) ads (4) regular-expression (4) Android (3) Apache Spark (3) Become a Better You (3) Concurrency (3) Eclipse RCP (3) English (3) Firefox (3) Happy Hacking (3) IBM (3) J2SE Knowledge Series (3) JAX-RS (3) Jetty (3) Restful Web Service (3) Script (3) regex (3) seo (3) .Net (2) Android Studio (2) Apache (2) Apache Procrun (2) Architecture (2) Batch (2) Build (2) Building Scalable Web Sites (2) C# (2) C/C++ (2) CSV (2) Career (2) Cassandra (2) Distributed (2) Fiddler (2) Google Drive (2) Gson (2) Html Parser (2) Http (2) Image Tools (2) JQuery (2) Jersey (2) LDAP (2) Life (2) Logging (2) Software Issues (2) Storage (2) Text Search (2) xml parser (2) AOP (1) Application Design (1) AspectJ (1) Bit Operation (1) Chrome DevTools (1) Cloud (1) Codility (1) Data Mining (1) Data Structure (1) ExceptionUtils (1) Exif (1) Feature Request (1) FindBugs (1) Greasemonkey (1) HTML5 (1) Httpd (1) I18N (1) IBM Java Thread Dump Analyzer (1) JDK Source Code (1) JDK8 (1) JMX (1) Lazy Developer (1) Mac (1) Machine Learning (1) Mobile (1) My Plan for 2010 (1) Netbeans (1) Notes (1) Operating System (1) Perl (1) Problems (1) Product Architecture (1) Programming Life (1) Quality (1) Redhat (1) Redis (1) Review (1) RxJava (1) Solutions logs (1) Team Management (1) Thread Dump Analyzer (1) Troubleshooting (1) Visualization (1) boilerpipe (1) htm (1) ongoing (1) procrun (1) rss (1)

Popular Posts