Eclipse vs IntelliJ

Better ways to organize info - Winner: Eclipse
-  Eclipse provides workspace, working set, projects
-  Organizes views into perspectives, provides view like console, problems, errors. 
- We always know where to find info we need.

Editor -Winner: Eclipse
Eclipse shows error/warning in current window in left side bar. 
IntelliJ only shows error/warning for all content in right side bar. (Eclipse also shows things in right side bar)
When you write code, its much easier to check error/warning in Eclipse.

Shortcut keys -Winner: Eclipse
Eclipse's default shortcut is more natural to me: easier to remember and type. Usually Eclipse uses less keys. IntelliJ uses all keys F1-10.

To format code: Cmd+shift+f in eclipse Cmd+alt+l in IntelliJ
- what L means?

Quick fix - Winner: Eclipse
- Eclipse quick fix(ctrl+1) is too powerful, it can fix bugs, generate code based on current context.
- IntelliJ provides too many ways to do similar things - wtf?
Show intention actions - alt+enter
Basic code completion - ctrl+space

Smart type code completion - ctrl+shift+space

Debug
- Drop to frame always works in Eclipse, but doesn't in IntelliJ
- Display view in Eclipse stores history, it's much easier to use

IntelliJ does have its own advantages such as: smart step into for anonymous classes and lambdas.

Search everywhere in IntelliJ
In Eclipse, you have to use different shortcuts/ways to search type, resource, views. 
In IntelliJ, just double press shift keys.

Better Performance/More Stable - Winner: IntelliJ
Sometimes Eclipse is slow or gets stuck, you need take time to customize it, check Improve Eclipse Performance


Same code may work in Eclipse but not in IntelliJ
Eclipse and IntelliJ doesn't works well together, if some team members are using eclipse, you better also use eclipse, as sometimes, the same code works and can be deployed in eclipse's tomcat but doesn't work or can't be deployed in IntelliJ tomcat.

More about Eclipse and IntelliJ:
Mastering Eclipse IDE - Tips, Tricks
IntelliJ IDEA Tips and Tricks

How to Be a Great Technical Lead

How to choose tech stack
- List your current/near-future features
- Consider multiple/different options and how to use them to implement features 
- Compare their pros and cons: manageability, scalability 
- What techs are used in your team and departments
    - But this doesn't mean always use same tech stacks used before 
- What tech are supported by operation team

Build reputation
- be useful and help others

Know your team members
- their strength or weakness
  - programming skills
  - design skills
  - trouble shooting skills
- what they are interested, what are not
- what they want to learn and improve
Help them to grow and improve their skills

Know yourself
Know how/what you can help the team
- your programming/design/communication skills
Know when to stand alone and when to step in
Know when to let others shine

Delegation
- don't try to do everything
- let other people do the job they are good at and observe
- let others contribute and take ownership/leadership

Spring Tips & Tricks

Spring-boot
Spring-boot specifies and manages its dependencies' version:
check https://github.com/spring-projects/spring-boot/blob/master/spring-boot-dependencies/pom.xml. 
Don't specify and overwrite the version again in our project's pom.xml.
Don't use same property value to overwrite the version by accident.

Properties
@Bean public static PropertySourcesPlaceholderConfigurer propertyConfig(){}

Bind properties to a list or a map
First create ConversionService bean
@Bean public ConversionService conversionService() {
    return new DefaultConversionService();
}

my.list.of.ints=1,2,3
@Value("${my.list.of.ints}")
private List<Integer> myList

redis.expires={typeA:900, typeB: 86400}
@Value("#{${redis.expires}}")

private Map<String, Long> expires;

Spring Cache
Support Spring Expression Language in Spring AOP
In contrast to the @Cacheable annotation, @CachePut does not cause the advised method to be skipped. Rather, it always causes the method to be invoked and its result to be stored in the associated cache. 
We can use #result as key in @CachePut.

CacheAspectSupport.execute
CacheAspectSupport.generateKey
SpringCacheAnnotationParser.parseCacheAnnotations
CacheOperationExpressionEvaluator.createEvaluationContext
if (result != NO_RESULT) evaluationContext.setVariable(RESULT_VARIABLE, result);

- To support #result, we just put method return result value into the variable: result.

Synchronized caching
@Cacheable(sync="true")
@CacheConfig(cacheNames = "cacheA")
@Cacheable(key= ("#p0['name']")) // p0 is a map
@CachePut(key = "#p0.name") // p0 is an instance
@Cacheable(key = "'somePrefix-'+#p0")

public static final String PREFIX = "'" + CACHE_NAME + "-'+";
@Cacheable(key = PREFIX + "#p0.toString()") // p0 is UUID

@CachePut(key = PREFIX + "#result.uuid.toString()", condition = "#p2 == 'typeA'")

Compared with condition, unless expressions are evaluated after the method has been called.
unless = "#result != null"

Not cache null value explicitly
@Cacheable(unless = "#result == null")

Change cache log level to trace
logger name="org.springframework.cache" level="trace"
No cache entry for key
Cache entry for key found in cache

Redis Cache
- Cache null values
- Use properties to set different expires for different cache regions
public CacheManager cacheManager(final StringRedisTemplate redisTemplate) {
    final RedisCacheManager cacheManager =
            new RedisCacheManager(redisTemplate, Collections.<String>emptyList(), true);
    cacheManager.setDefaultExpiration(86400);
    cacheManager.setExpires(expires);
    return cacheManager;
}
configKeys = stringRedisTemplate.opsForZSet().range(cacheKey, 0, -1)
redisTemplate.opsForValue().multiGet(configKeys);
stringRedisTemplate.opsForZSet().size(cacheKey)

Bind Collection of Beans
@Autowired(required = false)
private List<XInterface> instances;

WebApplicationInitializer
AbstractAnnotationConfigDispatcherServletInitializer

Using Spring managed Bean in non-managed object
SpringContextBridge

Spring EL
@Value("${property:default_vaule}")
@Value("#{'${my.list.of.strings}'.split(',')}")
private List myList;
@Value("#{someBean.someMethod()}")

Parse annotation's value using Spring EL
// instance are thread safe
private final ExpressionParser parser = new SpelExpressionParser(); 
final XAnnotation xAnnotation = method.getAnnotation(XAnnotation.class);
final Expression exp = parser.parseExpression(xAnnotation.propertyA());
final EvaluationContext context = new MethodBasedEvaluationContext(invocation.getThis(), invocation.getMethod(),
        invocation.getArguments(), new LocalVariableTableParameterNameDiscoverer());


final Object value = exp.getValue(context);

Spring EL
#root, #root.methodName, #root.method.name, #result


@PostConstruct
@PreDestroy

BeanUtils.copyProperties - copy properties while ignoring null values
@EnableRetry
@Retryable(value = {FooException.class, BarException.class}, maxAttempts = 5)


Spring-Security
Use AbstractSecurityWebApplicationInitializer
Configure it by extends WebSecurityConfigurerAdapter

Authentication auth = SecurityContextHolder.getContext().getAuthentication();

Integrate In-Memory Authentication for Test Automation

Use inMemoryAuthentication in dev lines for automation test

Extend UsernamePasswordAuthenticationFilter
Build Multi-Tenant Application

Spring Test
@RunWith(SpringJUnit4ClassRunner.class)
@ContextConfiguration(loader = AnnotationConfigContextLoader.class)

Spring test doesn't work with JUnit 4.11
InitializationError When Use Spring Test + JUnit

Spring data
How pring-data-cassandra runs query
org.springframework.aop.framework.JdkDynamicAopProxy.invoke
org.springframework.data.repository.core.support.RepositoryFactorySupport.QueryExecutorMethodInterceptor.doInvoke(MethodInvocation)

org.springframework.data.cassandra.repository.query.AbstractCassandraQuery.execute(Object[])

SimpleCassandraRepository

Add @NoRepositoryBean to intermediate repository interface.
Spring-data-solr

@Score private Float score;

Spring-sftp
com.jcraft.jsch.JSchException: UnknownHostKey
ssh-keyscan server-ip-address > my_known_hosts
new DefaultSftpSessionFactory().setKnownHosts(new ClassPathResource("my_known_hosts").getFile().getAbsolutePath());

Spring-mvc
PathVariable contains special characters 
@GetMapping("/path/{variable:.+}/")

@PathVariable(value = "variable")

Return raw json string
- Add StringHttpMessageConverter before MappingJackson2HttpMessageConverter in method configureMessageConverters of WebMvcConfigurationSupport

Misc
Injecting Bean Dependencies as Method Parameters
@Bean
public Foo foo(@Qualifier("bar1") Bar bar1) {
 return new Foo(bar1);
}
@Bean public Bar bar1() {
 return new Bar("bar1");
}
This approach is better than the following one as there is only one bean(bar) created.
@Bean
public Foo foo() {
 return new Foo(bar());
}
-Dspring.profiles.active=p1,p2

afterPropertiesSet
If we create a bean manually in code and it implements InitializingBean interface which will be called automatically after the bean is created by Spring, we may have to call afterPropertiesSet manually.

@Autowired(required = false)
@Qualifier("aConf")

private Configuration aConf;

Journey to be A Great Engineer

Self-Awareness
- strength, weaknesses
- have a plan to overcome the weaknesses that matters
Brand yourself/build reputation
- join discussion, other projects proactively
- make presentation, give a talk
- be useful and help others grow
Be better everyday
- writer better code, design system better
- try new/different/best approach
Never give up
- exhausts All Resources to solve the problem
- ask more questions and dig deeper
Follow up
- be inquisitive
- how others eventually fix/solve the problem
- what's the real root cause

Reflection
Keep Learning everyday
- in different ways
  - learn from books/blog/video/podcast
  - learn from others: no matter whether developers are senior or junior than you
  - be open-minded
- learn related/important skills that matters
  - how to search in codebase, log, git(hub) efficiently
  - troubleshoot, IDE
  - how to review code
- learn tech stack used by other team (members)
- understand trends and new techs
  - what's the advantage, when should use it when not
  - how it works, how it's designed
Maintain your learning log 
- blog, notes, doc etc
Maintain your toolbox
- languages/frameworks/ide/bash, new features

Keep calm and be happy/excited
- when meet some difficult problems
- when hear suggestions

Preparation
- prepare for next meeting/discussion/feature/project
- learning the techs that may be used in current or next feature/project
- learn the techs that may be different(better) and can improve your current solution
- don't blindly repeat same approach/design/code

Use time efficiently
Meetings
- Know the topics, what you can get and what you can contribute
- Whether it's important to you, if not and you are not required, skip it

- Avoid fragmented time; if can't, use it efficiently(reading, podcast)
- don’t lose state when context switch
- use tools to automate

Think/Discuss/design more before take action
- full algorithm/design before coding
- don't take action blindly: make change then test it or debug it
- check and infer first
Admit what you don’t know
- don't pretend
- ask questions and learn it

Share knowledge
- share more and learn more

How to Ask questions
Ask more and deeper questions to yourself
Ask questions to others
- don't ask simple knowledgeable questions too often
  -- learn/read/google first
- But don't hesitate to ask questions to any one if needed

How to answer questions
- first understand the question and what's the real question and issue
- don't just assume others are asking simple or stupid questions
- if not 100% sure, check it before answer

Quickly vs Slowly
Know when to solve problem or do things quickly when to do things slowly
- Solve problems quickly so others can move on
- But take time to find root cause or reflect if it matters

Career Development
- be initiative
- talk with your managers
- express your interest/passion and goal
- work on interesting/challenging projects
- prefer yes over no

Troubleshooting Practices

Troubleshooting is important and inevitable for engineers when we try to figure out why the code doesn't work; it also improves our problem solving skills: what steps to take to quickly find the root cause.

This articles is to reflect the process that I took to find the root cause.

Can't deploy Spring-boot application to tomcat
- SpringBootServletInitializer.configure is not called at all

During dev, the team is running spring-boot application as java application. But later when we deploy to the cloud, we need run it as web application. The developer tried it but doesn't work.

We found out it's because the dev specifies 2.5 in web.xml unintentionally and Spring-boot requires Servlet 3.0 by default.
http://docs.spring.io/spring-boot/docs/current/reference/html/howto-traditional-deployment.html

jackson-module-scala: NoSuchMethodError due to intellij
Recently when I ran tests in intellij, it throws the following exception:
java.lang.NoSuchMethodError: scala.util.matching.Regex.unapplySeq(Ljava/lang/CharSequence;)Lscala/Option;
at com.fasterxml.jackson.module.scala.JacksonModule$.version$lzycompute(JacksonModule.scala:30)
at com.fasterxml.jackson.module.scala.JacksonModule$.version(JacksonModule.scala:26)
at com.fasterxml.jackson.module.scala.JacksonModule$class.version(JacksonModule.scala:49)
at com.fasterxml.jackson.module.scala.DefaultScalaModule.version(DefaultScalaModule.scala:19)
at com.fasterxml.jackson.databind.ObjectMapper.registerModule(ObjectMapper.java:701)

Seems it's caused by jackson-module-scala, but I use mvn dependency:tree and dependency:copy-dependencies to check all thing looks good: we use scala-2.11 and jackson-module-scala_2.11. Also if I ran the test in command line, it works.

This made me think maybe this is because of intellij. 
I added a breakpoint and run: scala.util.Properties.versionString in debug mode, it's 2.10.6. This is strange. Then I check external libraries in intellij, there are indeed multiple scala-library versions: 2.10.6 and 2.11.8.

In our current projects, we don't use scala-library-2.10.6 anywhere. Maybe previously we use scala 2.10.6, later we upgraded to 2.11. But intellij still keeps reference to old scala in external libraries.
- no effects when reimport all maven projects

The workaround is to delete .idea/libraries internals and restart IntelliJ and reimport projects.

ClassNotFoundException: StandardSessionIdGenerator because we overwrite tomcat.version
- Asked by team mate to help solve this issue
Usually this is because of version conflict. We use spring-boot 1.5, but the tomcat-embed-core is 7.0.65. This looks suspicious. Then I checked the pom.xml. This makes me check how spring-boot specifies its version. search tomcat-embed-core in https://github.com/spring-projects, found 
spring-boot-starters/spring-boot-starter-tomcat/pom.xml and spring-boot-dependencies/pom.xml:
<tomcat.version>8.5.6</tomcat.version>

But we also defined same property: tomcat.version to 7.0.50 in project's pom.xml for some tomcat dependencies we used. This overwrites spring-boot property. So now spring-boot use tomcat-embed-core 7.0.65 which doesn't have StandardSessionIdGenerator class.

The fix is to change our own property to xx. tomcat.version and also bumped its version to same as spring-boot.
- spring-boot should add prefix such as spring.boot to these properties.

Find code by running sample code and debugging it
Build Web Service APIs to Update Solr's Managed Resources (stop words, synonyms)
I need figure out how to get address of one of live solr nodes by using CloudSolrClient.  I know solrJ uses apache http client to send requests, somewhere it needs get addresses of solr nodes, so I write some sample code which sends invalid request, solrJ will throw exception, then I check the stack trace form the log and find out the code that I need at org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(SolrRequest, String).

System Design Practices

Design home page stream
Design feed
Mixed approaches - Push + Pull from 
-- selectively disable push for celebrities, pull them instead.
https://engineering.pinterest.com/blog/building-scalable-and-available-home-feed
Design Decisions For Scaling Your High Traffic Feeds

Cassandra + Denormalize
Redis + Normalized

Selective Fanout Based On Producer/Consumer

URL shortener
https://www.youtube.com/watch?v=fMZMm_0ZhK4 - Range based 
- read heavy
- separate read/write to different apps
- able to disable write functions
- id (auto increment), actual url, and shorten url

- base 62 encode the id to string(shorten url)
  - generate auto-generate id
  - can use sql or nosql as long as find a way to generate unique long
- rate limit, prevent abuse
Insert Long Url to db, get the primary key: long
Convert base10 to Base62[0-9a-zA-Z], other characters such as -,_+, convert long to short url
Decode short url to long number and query db to get long url.

How to make it scalable
How to analyze short url click event
301 or 302?
- custom url

Extension - popular urls in last hour/day

Generate unique ID numbers
Twitter Snowflake
- 41bit timestamp
- 10bit workerid
- 12bit sequence
Flickr Ticket Servers
TicketServer1:
auto-increment-increment = 2
auto-increment-offset = 1

TicketServer2:
auto-increment-increment = 2
auto-increment-offset = 2
Map many logical shards to fewer physical shards
How to assign shards when we can add/remove nodes?

User Overseer(leader) to change concurrent into serialized problem


Crawler
- politeness: rate limit
- put same host pages into same queue
- different priority for different sites - multiple queues
- normalization/canonicalization
- directed graph
- each URL: last visited time, score, fingerprint
- Use Bloom filters to remove duplicates
Re-crawling
- predict which websites are most likely to have frequent update, put those into the fetchers’ priority queue.
How to handle realtime news?
- each url: next visited time based on its score and how frequently it changes
- if page(links, content) is (mostly) not changed, increase its next visit time(cut+2t) otherwise decrease its next visit time
- begin with a seed set
- use a simple fingerprint such as a checksum
- the frontier
- normalized URL
DNS resolver
Locality
- Put data centers in different regions - machines in asian to crawl asian web sites

To crawl price histories of products
signals:
- previous price history changes
- how many reviews they have

- To detect new products: provide api so client can add or manually crawl new added pages.

Design an in-memory search engine
How to index and store in memory
How to support free text queries, phrase queries
Map<String, List<Document>>
List<Document> is sorted
Document: docId, List<Long> positionIds
List<Long> is sorted
How to save to files
How to merge small files into big files
-- When save to file, make it sorted by word
-- Use merge sort to merge multiple fields
How to make it scalable - how solr cloud works 

Auto completion in search
- single node
- Trie, Topk


Design Hit Counter - how to count number of requests in last second, minute and hour - link1,

Generate Game Map
- brute force, randomly generate one, check whether it can be solved in limit time by machine

- different difficulty level

Design score/rank system for social game.
Need update score realtime?
- yes, for this user's operation, when score is small
- no, for others' operation, when score is big
Rank - realtime?
- yes, when user is ranked in X(1000),
- no, if user ranks is below Y(1M)
Average score
- rank: use sorted tree - sorted by score, node splitted by score range
- table: userid, score
- sharding

- append, not update existing data
- separate score change to different types: real time vs batchable
- batch score change

Search nearby places
- GeoHash

Design Slack
Design Chat app like messenger, what's app
https://www.youtube.com/watch?v=zKPNUMkwOJE&t=2s
Websockets
ServerA forward message to ServerB which forwards to clientB
How to manage/detect friends status - presence
- reduce message, avoid O(n^2)
- Each user subscribes to its subscription list in server
- When one user state's changed, it uses pub/sub to update all related users's subscription list in server side
- Server pushes friend status to each user  in some fixed interval - if there is change (batch)

How to store messages
Typing indicators
- Only send it to users who is currently interacting with this user/group

Message status: delivered, seen

De-duplication

- url dedup: use bloom filter
Detect duplicate documents in billions of urls
- back-of-the-envelope calculations
- (30 bytes for url + int/long(4 or 8 bytes)) * 10^9(~1g) ~ 34g
- A1: Hash and split into multiple files  - slow (even with multiple threads) as a lot of IOs
- A2: Parallelize in multiple machines - like map-reduce

Design log collection and analysis system

- Auxiliary service, should not affect the existing service (Monitor system memory and cpu, adjust the push rate accordingly)

Design shopping cart
addToCart,removeFromCart,changeQuantity
- monitor
- abandoned cart
guest cart
- cookie
- merged if user login later
- check items availability at some point

- call other services - stock service(price, availability), metadata service, images(cdn)

ecommerce - amazon
- nosql: wide column for product table

- concurrency for order: optimistic, pessimistic
- analytics
- abandoned cart, user behavior(click,view,add/remove from cart)
products
checkout
Payment
- two phrases: Authorization, Capture
shipping

customer service

Design rate limiter
- Separate from application
- warn/notify user before user reaches 90% limit
Algorithms:
Token bucket, Leaky bucket, rolling window, counter
Guava RateLimiter

Design Miao Sha
- Use cases: different phrases of the activity: before, available, sold out
- cache boolean available instead of the count

- Principle: fair(fifo) - but no need to 100% guarantee
Two types of queues
- one queue for each on-sale hot product, only keep x (count *1.x) requests, reject directly if more than it (ask them to retry after x mins)
- handled by one single server which reads from queue
- avoid distributed problems
- asynchronous processing
Cache activity page 
- protect order page by adding a server side generated random number
- server side will generate and update a non-cached a js including activity info and the random number for the order page, the js is not cached in client side

How to update stocks - concurrency 
How to send email notification, handle delivery
asynchronous, put them into a queue

Design train ticket booking - Chunyun

Protect against from hackers
- CAPTCHA
- filter based on ipAddress, acountId
- Prevent abuse: cheating

Design email

Counter
- 1. Use count table
- 2. Use Redis

Hierarchical Data
- id, parentid

Design the data structure for large social network
Design system to store/read images

Expire mp3 in cdn and generate new one after x min
pre-load

Playlist Shuffle
- Fisher-Yates shuffle
- a good, uniformly distributed mix of all the tracks, no burst

Design key-value Database(lookup service)
Design APIs for add/update/query
CA or CP?
- List main components first (meltable, sstable, index, bloom filter, WAL log)
- membership and failure detection(gossip)
- Partition (Virtual nodes, consistent hashing)
- Replication(Versioning)
- handling temporary failures
- read repair and anti-entropy using Merkle tree
- Sparse index file and loaded into memory
- compress block
Write ahead (redo) log
- Global(Local) secondary index
- Conflict resolution

- Request coordinator
- Smart client can send requests to right replica
- Cache: row cache, key cache
Merge and compaction

Advantages over B-tress
- turn random writes into sequential writes

Downside of LSM tree
- compaction will interfere with the performance (not consistency)
- more storage
- key may appear in multiple sstable, 
- more storage during compaction 


Design column database
How to update some fields

Design Delayed Job Scheduler
- single server or distributed
- generic or for specific tasks
- Java Delay Queue
- wheelTimer (Netty)
- qurtz

Top K Frequent Elements in Recent X mins
- data structure: hashmap + linkedlist
system design:
- approximate
- batch, update every x seconds
- 分阶段统计, bucket
- sample

- different intervals for different cache(hour, daily)

Design Ticket Master - TODO
- Analyze the requirements: QPS, SLA
- Autoscale, predicative scale based on estimated popularity of concert

- queuing and prioritizing

How to Design a Trending Algorithm - TODO
- offline pipelines
- cache, no need realtime
- candidates: hashtag

- signal: influencer, location, personalization

Make single-server datastore to peer-to-peer, linearly scalable, clustered systems
Dynomite

Design distributed priority queue service + Delayed queue
Distributed delay queues based on Dynomite + Redis
A Sorted Set containing queued elements by score.
A Hash set that contains message payload, with key as message ID.

A Sorted Set containing messages consumed by client but yet to be acknowledged. Un-ack set.

Design distributed file system
- master for meta data: where the chunk stores
- get file chunk location from master, read directly form slave

- replication, checksum

Image conversion service

- queue, asynchronously

Java Concurrency Tips

Executors
executor.execute vs submit
execute can only accept Runnable, execute returns nothing
- void execute(Runnable task)
- used for fire and forget

submit can accept Runnable and Callable, submit returns Future
Future submit(Callable task)
Future submit(Runnable task)
- used when need get the result

Exception handling - if task throws unhanded exception
- execute: the UncaughtExceptionHandler for the Thread running the task to be invoked. By default it prints exception to System.err. We can register global UncaughtExceptionHandler: Thread.setDefaultUncaughtExceptionHandler();
- submit: Any exception thrown will be bound in Future. Call future.get will throw an ExecutionException with the original Throwable as its cause.

RejectedExecutionHandler: ThreadPoolExecutor.CallerRunsPolicy(), NewThreadRunsPolicy

Threadpool
-Isolation: Allocate a separate thread pool for slow web requests within the same application

- corePoolSize 30, maximumPoolSize 50, ArrayBlockingQueue(100): what it means

Deadlock prevention
- Same Lock Ordering
- Lock Timeout

Mutual Exclusion
Hold and Wait
No Preemption

Circular Wait

Lock
Intrinsic Lock - synchronized
- every object and class is logically associated with a monitor
- wait, notify, notifyAll
ReentrantLock
- acquire lock interruptibly and with timeout - tryLock
- lockInterruptibly: used to interrupt thread when it is waiting for lock
- fair lock
- better performance: uses atomic variable and faster CAS operation
- get list of waiting thread for lock
disadvantages
- need put it into try-finally block, code unreadable
- easier to make mistake: forget to unlock in finally block
associate a counter with each thread

Condition
- wait for/notify a particular condition to become true
- can have multiple conditions on same lock
- enables the effect of having multiple wait-sets per object
- factors out the Object monitor methods (wait, notify and notifyAll) into distinct objects to give the effect of having multiple wait-sets per object
- Where a Lock replaces the use of synchronized methods and statements, a Condition replaces the use of the Object monitor methods.

- await releases the lock, suspend the thread, signal, signalAll
- BoundedBuffer
Condition notFull = lock.newCondition();
Condition notEmpty = lock.newCondition();

Semaphore
- can be released by other threads, useful for deadlock recovery
- P(acquire), V(release)

- a counter (integer) that allows a thread to get into a critical region if the value of the counter is greater than 0
- implementation: no lock, CAS
- acquire/(permits), tryAcquire, release/(permits)
CountDownLatch
CountDownLatch countDownLatch = new CountDownLatch(TOTAL_EVENTS);
countDownLatch.countDown();
countDownLatch.await();
Barrier
CyclicBarrier
- We can reuse CyclicBarrier even if Barrier is broken but can't reuse CountDownLatch

Programming Concurrency on the JVM
Read-write lock - ReentrantReadWriteLock
Thread local storage
Prefer local thread variables over static and shared when possible
Avoiding deadlocks by ordering the locks
Using atomic variables instead of synchronization
Holding locks for as short a time as possible
Avoid executing inside the critical section the code you don't control.

Use lock instead of synchronize
Use CyclicBarrier and CountdownLatch instead of wait/notify

volatile
- visibility: any thread that reads a field will see the most recently written value
- operations on it not be reordered

- not cached in registers or in caches; always read from and write to main memory - thus a little slower

Ensure Visibility
- cross the memory barrier on getters, use volatile, atomicXXX, readlock etc
Ensure Atomicity

Don't start threads in constructors
- use static factory methods, thread pool
Use tryLock - instead of lock
Cancel a task - future.cancel, executor.cancel

Initialization-on-demand holder idiom
Double-checked locking

AtomicLongArray
The long elements in the AtomicLongArray can be updated atomically
- compareAndSet: compare the value of a given element with a specified value, and if the two values are equal, set a new value for that element.
- getAndAdd(index, incrment), getAndIncrement(index), incrementAndGet

AtomicReferenceArray
AtomicReference

Concurrent data structures
ConcurrentHashMap
- guarantees O(log(n)) operations
- sorted, ceilingEntry/Key, floorEntry/Key

ConcurrentSkipListMap
ConcurrentSkipListSet
- thread-safe implementation of the SortedSet
- concurrent crud
- slower than TreeSet, don't use it unless in highly ...
- size is slow, O(n)
- head, nil are in every skip list
- when add, a random number generator is used to determine the levels of the new node
ConcurrentLinkedDeque
CopyOnWriteArrayList
CompletionService
extend thread class vs implement runnable interface
- use threadpool to share threads if implement runnable

Missed Signals - link
boolean wasSignalled = false;
synchronized(myMonitorObject){
  while(!wasSignalled){
    try{
      myMonitorObject.wait();
     } catch(InterruptedException e){...}
  }
  wasSignalled = false;
}

Don't call wait() on constant String's or global objects
LongAdder
- use more memory
- prefer AtomicLong when contention is low

unsafe

unsafe.getAndAddLong(this, valueOffset, 1L)
Practices
Implement AtomicInteger
synchronized
int currentValue;
int previousValue;

Implement ReadWriteLock and ReentrantReadWriteLock - link
private Map readingThreads =
     new HashMap();

 private int writeAccesses    = 0;
 private int writeRequests    = 0;
 private Thread writingThread = null;
Implement Lock and ReEntrantLock
int lockHoldCount;
long IdOfThreadCurrentlyHoldingLock;
IdOfThreadCurrentlyHoldingLock=Thread.currentThread().getId();
synchronized

Implement Semaphore
int permits; synchronized
while(this.permits == 0) wait();

Implement CyclicBarrier
2 or more threads wait for each other to reach a common barrier point. When all threads have reached common barrier point (i.e. when all threads have called await() method), all waiting threads are released.
CyclicBarrier(int parties)

int initialParties; //total parties
int partiesAwait; //parties yet to arrive
synchronized await
-
partiesAwait--;
if(partiesAwait>0){
  this.wait();
}

Implementing (Thread Safe) Stacklink
AtomicReference
- CAS
- if(head.compareAndSet(oldHead, newHead))
Design threadsafe real time counter
private AtomicLongArray counter = new AtomicLongArray(GRANULARITY);
private volatile int pos = 0;
- Another thread that update pos every second
void incrementPosition(){
    counter.set((pos + 1)%GRANULARITY, 0);
    pos = (pos + 1)%GRANULARITY;

}

Person A holds A, PB holds B, PC holds C, how can we guarantee always print ABC
- Use Semaphore

Print even and odd numbers using threads
Dining Philosophers Problem

System Design: Learning from Existing Products

Pinterest Home Feed
- best-first rather than newest-first
- separation of available, but unseen, content and content that’s already been presented

- fan out write pins to followers
- batch write
standby HBase cluster
- speculative execution
  - make another call to the standby cluster if the first call to primary cluster fails or doesn’t return within certain time
- randomly forward x percent of calls to warm up it

smart feed worker
- score/store pins: quality and meaning to users
- repins by followers, related pins, interest Pins
- 3 priority queues(hbase tables) for each user, use key-based sorting of HBase: userid, score, pin
Smart feed content generator
- pins selected are removed from pool(hbase table)
Smart feed service
materialized feed

discover interests - Pinterest
-  n-grams
- verify with search queries, public domain ontologies like Wikipedia titles
- blacklist, stop words
- implicit and explicit signals from users

related interests
- co-occurrence relationship for interests

search query
- user's demographic stats
- store and use previous queries
- adjacent queries in the same search session

- clicks afert submit query

Twitter
- users,tweets,be-followed,following
- persistent connection between client and server
Add a twitter
- Write it to the queue, return to client
- in background, save to db, push to (active) follower's queue
Cache aggressively
Vector Cache
- ids(user, tweet)
- user's followers(id)

Row Cache
- tweet

Fragment Cache
Page Cache

- separated and put into a different machine

Home Timeline - link
user's timeline vs home timeline

- precomputed and cached in redis for active users
Mixed Pull+Push
- local decisions about each producer/consumer pair
- how often producer adds a event, how ofter consumer views the event
- celebrity twitter - Pull

- normal user twitter - Push
Flock maintains the follower and followings lists.

Search

- tweets are stored in memory Lucene

- rerank based on social proof, the number of retweets, favorites, and replies.

- so every time, a tweet is retweeted or etc, no need update 

Lucene

Facebook
Separate data
Recent Activity - hot
- changes frequently, cache is frequently invalidated
- small, fast to regenerate them

Non-Recent Activity - cold
- a lot of data, slow to generate
- but don't change, cache will not be invalidated

Photo storage
- use cdn popular images
- use Haystack for other images - long tail
NFS photo infrastructure - old version
- huge metadata, most are not used(permission)
- use cdn to store images
- one(or more) to translate the filename into inode number, another to read the inode from disk, and a final one to read the file itself.
- need keep all inodes in memeory
- CDN good for hot data, but not so for cold data
long tail data
-  requests for less popular (often older) content

Haystack - link
Haystack Object Store
- a log structured (append-only) object store containing needles representing the stored objects
- eliminates the metadata overhead by aggregating hundreds of thousands of images in a single haystack store file
- at most one disk operation per read, keep all metadata in main memory
- haystack store file
- index - needle's location, not critical, can be rebuilt, written asynchronously

needle is identified by Offset, Key, Alternate Key, Cookie
- Don't allow overwrite existing needle, instead: add a new version and its index
- needles with tuple: duplicate Key, Alternate Key, Cookie, the largest offset is the most recent one
Read - Pass needle offset, key, alternate key, cookie and the data size
Delete - set a deleted bit in the flags

- Fault-tolerant: replicates

Photo Store Server - http layer
- keeps a minimal in-memory index
- Google sparse hash data structure

Compaction
- reclaim the space
- copy needles while skipping any duplicate or deleted entries

Haystack Cache
- insulation if upstream CDN nodes fail and

Timeline - your story
News Feed - Multifeed
- multi-fetch and aggregate stories at read time
- aggregator(staeless) and leafs
- aggregator servers sends request to all leaf servers and ranks the results.
- all data in leafs is in memory (memcached)
- fetch recent activity from all your friends
- gather and Group into stories
- rank stories by relevance
- more flexible, manageable
- fan-out reads can be bounded. writes, often cannot


- SSD
- others's story that related to you
- rank edges(stories) by importance(relevance) - link
- content score: ∑e = ue we de, wherein ∑e is the sum of the edge's rank, ue is the affinity score with the user who created the edge, we is the weight for the content type, and de is a time decay factor

- a friend's affinity score: ∑i = li ni wi, wherein ∑i is the sum of the interactions with that friend, li is the time since your last interaction (this would need to be weighted so that 1 day > 30 days), ni is the number of interacts, and wi is the weight of those interactions

- Activity(id, user_id, source_id, activity_type, edge_rank, parent_id, parent_type, data, time)
- timeline - user's story
- newsfeed
connections and activities

"Push" Model, or Fan-out-on-write
- a lot of write, only push to active users for non-celebrity users
- use write-optimized data store such as Cassandra, HBase, or BigTable
"Pull" Model, or Fan-out-on-load
- less data, less computation
- materialized what user already viewed into a table, only pull new data
- have a fallback mechanism if fails to pull - show old materialized data

Combination of push and pull
- make decision locally on per consumer/producer basis
- a frequent producer of events may overshadow events from less frequent producers
Selective Fanout Based On Consumer
- only fan-out to active/valuable users
Selective Fanout Based On Producer

- don't fan-out for high profile users
Pull for new users
- because nothing pushed to the user 
- when there is no enough data, pull from following

Make schema changes is expensive in traditional db
- such as add/remove indexes
- Users don't know what they don't know.

- All content is not created equal.

Reddit
- Treat nonlogged user as second class citizens
- Return cached content to them
- Put a limit on everything
- Provide timeout

- Expire data. Lock out old threads and create a fully rendered page and cache it.

Red Envelop
- store count, balance into redis
- use redis 
  - decr count, CAS to decrease balance
- ttl

Uber
Driver - Long connection
- matches drivers and riders, the supply and demand
Maps/ETA (estimated time of arrival)

Nutch2
- plugins, extensible
- fetching pages, filtering  and normalizing URLs and parsing responses

Etsy - link
$choose_connection = mt_rand() < $affinity;
Pruning Same/similiar message
combine similar stories into rollups
- group similar storeis

Cache
- seperate global and unique to the person
Cache Warming - gearman
- proactively
- TTL Caching
Lazy Below the Fold

JD
数据闭环,异构,聚合

EFK - Log collection/visualization
logstash/fleuent -> kafka -> Elasticsearch -> Kibana

Labels

Java (161) Lucene-Solr (112) Interview (64) All (58) J2SE (53) Algorithm (45) Soft Skills (39) Eclipse (33) Code Example (31) JavaScript (23) Linux (22) Spring (22) Tools (22) Windows (22) Web Development (20) Nutch2 (18) Bugs (17) Debug (16) Defects (14) Text Mining (14) Troubleshooting (14) J2EE (13) Network (13) Tips (12) PowerShell (11) Chrome (10) Problem Solving (10) Design (9) How to (9) Learning code (9) Performance (9) Security (9) UIMA (9) html (9) Http Client (8) Maven (8) bat (8) blogger (8) Big Data (7) Database (7) Google (7) Guava (7) JSON (7) Shell (7) System Design (7) ANT (6) Coding Skills (6) Lesson Learned (6) Programmer Skills (6) Scala (6) css (6) Algorithm Series (5) Cache (5) Continuous Integration (5) IDE (5) adsense (5) xml (5) AIX (4) Become a Better You (4) Code Quality (4) Concurrency (4) GAE (4) Git (4) Good Programming Practices (4) Jackson (4) Life (4) Memory Usage (4) Miscs (4) OpenNLP (4) Project Managment (4) Review (4) Spark (4) Testing (4) ads (4) regular-expression (4) Android (3) Apache Spark (3) Distributed (3) Dynamic Languages (3) Eclipse RCP (3) English (3) Happy Hacking (3) IBM (3) J2SE Knowledge Series (3) JAX-RS (3) Jetty (3) Mac (3) Python (3) Restful Web Service (3) Script (3) regex (3) seo (3) .Net (2) Android Studio (2) Apache (2) Apache Procrun (2) Architecture (2) Batch (2) Bit Operation (2) Build (2) Building Scalable Web Sites (2) C# (2) C/C++ (2) CSV (2) Career (2) Cassandra (2) Fiddler (2) Google Drive (2) Gson (2) How to Interview (2) Html Parser (2) Http (2) Image Tools (2) JQuery (2) Jersey (2) LDAP (2) Logging (2) Software Issues (2) Storage (2) Text Search (2) xml parser (2) AOP (1) Application Design (1) AspectJ (1) Chrome DevTools (1) Cloud (1) Codility (1) Data Mining (1) Data Structure (1) ExceptionUtils (1) Exif (1) Feature Request (1) FindBugs (1) Firefox (1) Greasemonkey (1) HTML5 (1) Httpd (1) I18N (1) IBM Java Thread Dump Analyzer (1) Invest (1) JDK Source Code (1) JDK8 (1) JMX (1) Lazy Developer (1) Machine Learning (1) Mobile (1) My Plan for 2010 (1) Netbeans (1) Notes (1) Operating System (1) Perl (1) Problems (1) Product Architecture (1) Programming Life (1) Quality (1) Redhat (1) Redis (1) RxJava (1) Solutions logs (1) Team Management (1) Thread Dump Analyzer (1) Visualization (1) boilerpipe (1) htm (1) ongoing (1) procrun (1) rss (1)

Popular Posts