System Design: Learning from Existing Products


Pinterest Home Feed
- best-first rather than newest-first
- separation of available, but unseen, content and content that’s already been presented

- fan out writes: push new pins to followers' feeds
- batch write
standby HBase cluster
- speculative execution
  - make another call to the standby cluster if the call to the primary fails or doesn't return within a certain time (see the sketch below)
- randomly forward x percent of calls to the standby to keep it warm
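A minimal sketch of the speculative-execution pattern, assuming the two cluster clients are wrapped as Suppliers (all names here are hypothetical): try the primary first, and on failure or timeout fire the same read at the standby, taking whichever finishes first.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

class SpeculativeRead {
    @SuppressWarnings("unchecked")
    static <T> T read(Supplier<T> primary, Supplier<T> standby,
                      long timeoutMs, ExecutorService pool) throws Exception {
        CompletableFuture<T> p = CompletableFuture.supplyAsync(primary, pool);
        try {
            return p.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException | ExecutionException slowOrFailed) {
            // Primary is slow or failed: issue the speculative call and
            // return whichever cluster answers first.
            CompletableFuture<T> s = CompletableFuture.supplyAsync(standby, pool);
            try {
                return (T) CompletableFuture.anyOf(p, s).get();
            } catch (ExecutionException lost) {
                return s.get(); // the faster call failed; wait for the standby
            }
        }
    }
}
```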

smart feed worker
- score/store pins: quality and meaning to users
- repins by followers, related pins, interest pins
- 3 priority queues (HBase tables) per user; rely on HBase's key-based sorting with a (userid, score, pin) row key (sketched below)
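A sketch of what such a row key could look like; the exact encoding is an assumption. HBase sorts rows lexicographically, so a (userid, invertedScore, pin) key keeps each user's pins contiguous and returns the highest-scored pins first on a plain scan.

```java
import org.apache.hadoop.hbase.util.Bytes;

class PinPoolKey {
    static byte[] rowKey(long userId, double score, long pinId) {
        // Invert the score so an ascending scan yields best-first order
        // (assumes non-negative scores scaled to fit a long).
        long inverted = Long.MAX_VALUE - (long) (score * 1_000_000);
        return Bytes.add(Bytes.toBytes(userId),
                         Bytes.toBytes(inverted),
                         Bytes.toBytes(pinId));
    }
}
```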
Smart feed content generator
- pins selected for a feed are removed from the pool (an HBase table)
Smart feed service
materialized feed

discover interests - Pinterest
- n-grams (see the toy sketch after this list)
- verify with search queries, public domain ontologies like Wikipedia titles
- blacklist, stop words
- implicit and explicit signals from users
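A toy sketch of candidate-interest extraction: emit word n-grams from pin text and drop any gram containing a stop word or blacklisted term; candidates would then be validated against search queries or ontology titles (not shown). All names are illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;

class InterestCandidates {
    static List<String> ngrams(String text, int n, Set<String> blocked) {
        String[] w = text.toLowerCase().split("\\W+");
        List<String> out = new ArrayList<>();
        outer:
        for (int i = 0; i + n <= w.length; i++) {
            // Skip grams containing stop words or blacklisted terms.
            for (int j = i; j < i + n; j++)
                if (blocked.contains(w[j])) continue outer;
            out.add(String.join(" ", Arrays.copyOfRange(w, i, i + n)));
        }
        return out;
    }
}
```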

related interests
- co-occurrence relationships between interests (sketched below)
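A sketch of building related interests from co-occurrence counts, assuming interests grouped per board or pin (the data model here is an assumption): two interests are related if they frequently appear together.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class RelatedInterests {
    static Map<String, Map<String, Integer>> cooccurrence(List<Set<String>> boards) {
        Map<String, Map<String, Integer>> counts = new HashMap<>();
        for (Set<String> interests : boards)
            for (String a : interests)
                for (String b : interests)
                    if (!a.equals(b))
                        // Count every co-occurring pair once per board.
                        counts.computeIfAbsent(a, k -> new HashMap<>())
                              .merge(b, 1, Integer::sum);
        return counts;
    }
}
```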

search query
- user's demographic stats
- store and use previous queries
- adjacent queries in the same search session

- clicks after submitting a query

Twitter
- users, tweets, followed-by and following relationships
- persistent connection between client and server
Posting a tweet
- write it to a queue and return to the client immediately
- in the background, save to the DB and push to (active) followers' queues (sketched below)
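A posting-path sketch with assumed types and names: the web tier only enqueues the tweet and acks the client, while a background worker persists it and fans it out to each active follower's timeline queue.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class TweetIntake {
    record Tweet(long id, long authorId, String text) {}
    interface Store { void save(Tweet t); Iterable<Long> activeFollowers(long userId); }
    interface Timelines { void push(long followerId, Tweet t); }

    private final BlockingQueue<Tweet> queue = new LinkedBlockingQueue<>();

    // Fast path: enqueue and return to the client immediately.
    void post(Tweet t) { queue.offer(t); }

    // Slow path: run on a background thread.
    void drain(Store db, Timelines timelines) throws InterruptedException {
        while (true) {
            Tweet t = queue.take();
            db.save(t);                                  // durable write
            for (long f : db.activeFollowers(t.authorId()))
                timelines.push(f, t);                    // fan-out-on-write
        }
    }
}
```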
Cache aggressively
Vector Cache
- ids (user, tweet)
- a user's followers (ids)

Row Cache
- tweet

Fragment Cache
Page Cache

- each cache type is separated out onto its own machines

Home Timeline - link
user's timeline vs home timeline

- precomputed and cached in Redis for active users (see the sketch after this list)
Mixed Pull+Push
- local decisions about each producer/consumer pair
- how often the producer adds an event vs. how often the consumer views events
- celebrity accounts - pull

- normal users - push
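A sketch of the precomputed home timeline as a Redis list, using Jedis; the key layout and the ~800-entry cap are assumptions based on public talks, not verified figures.

```java
import redis.clients.jedis.Jedis;

class HomeTimeline {
    static void fanout(Jedis jedis, long tweetId, Iterable<Long> activeFollowers) {
        for (long userId : activeFollowers) {
            String key = "timeline:" + userId;        // hypothetical key layout
            jedis.lpush(key, Long.toString(tweetId)); // newest first
            jedis.ltrim(key, 0, 799);                 // cap the list length
        }
    }
}
```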
Flock maintains the follower and following lists.

Search

- tweets are stored in an in-memory Lucene index

- rerank at query time based on social proof: the number of retweets, favorites, and replies (sketched below)

- so when a tweet is retweeted, favorited, or replied to, the Lucene index doesn't need to be updated
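A query-time rerank sketch with illustrative weights (not Twitter's actual formula): Lucene returns candidates by text relevance, and social proof is folded in at read time, so changing counters never forces an index update.

```java
import java.util.Comparator;
import java.util.List;

class SocialRerank {
    record Hit(long tweetId, double textScore, int retweets, int favorites, int replies) {}

    static void rerank(List<Hit> hits) {
        // log1p dampens very large counts; weights are illustrative.
        hits.sort(Comparator.comparingDouble((Hit h) ->
                h.textScore()
                + 0.5 * Math.log1p(h.retweets())
                + 0.3 * Math.log1p(h.favorites())
                + 0.2 * Math.log1p(h.replies()))
            .reversed());
    }
}
```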

Facebook
Separate data
Recent Activity - hot
- changes frequently, so its cache entries are frequently invalidated
- small, fast to regenerate

Non-Recent Activity - cold
- a lot of data, slow to generate
- but it doesn't change, so its cache entries are rarely invalidated

Photo storage
- use a CDN for popular images
- use Haystack for other images - the long tail
NFS photo infrastructure - old version
- huge amount of metadata, most of it never used (e.g., permissions)
- used a CDN to serve images
- each read took several disk operations: one (or more) to translate the filename into an inode number, another to read the inode from disk, and a final one to read the file data itself
- avoiding the extra operations would require keeping all inodes in memory
- a CDN works well for hot data, but not for cold data
long tail data
- requests for less popular (often older) content

Haystack - link
Haystack Object Store
- a log-structured (append-only) object store containing needles that represent the stored objects
- eliminates the metadata overhead by aggregating hundreds of thousands of images in a single haystack store file
- at most one disk operation per read, by keeping all metadata in main memory
- haystack store file
- index - each needle's location; not critical, can be rebuilt, and is written asynchronously

a needle is identified by its Offset, Key, Alternate Key, and Cookie
- overwriting an existing needle is not allowed; instead, add a new version and a new index entry
- among needles with the same (Key, Alternate Key, Cookie) tuple, the one with the largest offset is the most recent
Read - pass the needle's offset, key, alternate key, cookie, and the data size
Delete - set a deleted bit in the flags field

- Fault-tolerant: replicates

Photo Store Server - http layer
- keeps a minimal in-memory index (sketched below)
- uses Google's sparse hash data structure to keep it compact
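A sketch of the store machine's in-memory index with simplified types: mapping (key, alternate key) to (offset, size) means a photo read costs at most one seek into the append-only store file.

```java
import java.util.HashMap;
import java.util.Map;

class NeedleIndex {
    record Location(long offset, int size) {}
    private final Map<Long, Map<Long, Location>> byKey = new HashMap<>();

    void put(long key, long altKey, long offset, int size) {
        // A newer version of the same needle simply replaces the entry;
        // on disk, the needle at the largest offset is the most recent.
        byKey.computeIfAbsent(key, k -> new HashMap<>())
             .put(altKey, new Location(offset, size));
    }

    Location locate(long key, long altKey) {
        Map<Long, Location> m = byKey.get(key);
        return m == null ? null : m.get(altKey);
    }
}
```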

Compaction
- reclaims space
- copies needles over while skipping any duplicate or deleted entries (sketched below)
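A compaction sketch with simplified types: walk the file in log order; because later needles supersede earlier ones, the last version seen for a key wins, and deleted needles are dropped.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class Compaction {
    record Needle(long key, long altKey, boolean deleted, byte[] data) {}

    static List<Needle> compact(List<Needle> logOrder) {
        Map<String, Needle> latest = new LinkedHashMap<>();
        for (Needle n : logOrder) {
            String id = n.key() + "/" + n.altKey();
            if (n.deleted()) latest.remove(id); // delete marker drops the entry
            else latest.put(id, n);             // later version overwrites earlier
        }
        return new ArrayList<>(latest.values()); // survivors go to the new file
    }
}
```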

Haystack Cache
- provides insulation if upstream CDN nodes fail, shielding the backing Store machines

Timeline - your story
News Feed - Multifeed
- multi-fetch and aggregate stories at read time
- aggregators (stateless) and leaves
- an aggregator sends requests to all leaf servers and ranks the results (sketched below)
- all data in the leaves is in memory (memcached)
- fetch recent activity from all your friends
- gather and group into stories
- rank stories by relevance
- more flexible, manageable
- fan-out reads can be bounded; writes often cannot
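A read-time aggregation sketch with hypothetical types: the stateless aggregator multi-fetches recent actions from every leaf shard in parallel, then merges and ranks them into a feed.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.stream.Collectors;

class Multifeed {
    record Story(long id, double relevance) {}
    interface Leaf { List<Story> recentActions(long userId); }

    static List<Story> homeFeed(long userId, List<Leaf> leaves, ExecutorService pool) {
        // Fan out the read to all leaves in parallel.
        List<CompletableFuture<List<Story>>> futures = leaves.stream()
            .map(l -> CompletableFuture.supplyAsync(() -> l.recentActions(userId), pool))
            .collect(Collectors.toList());
        // Merge and rank at read time.
        List<Story> all = new ArrayList<>();
        for (CompletableFuture<List<Story>> f : futures) all.addAll(f.join());
        all.sort(Comparator.comparingDouble(Story::relevance).reversed());
        return all;
    }
}
```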


- SSD
- others' stories that are related to you
- rank edges(stories) by importance(relevance) - link
- content score = Σ over edges e of (u_e × w_e × d_e), where u_e is the affinity score with the user who created the edge, w_e is the weight for that content type, and d_e is a time-decay factor (a toy computation follows below)

- a friend's affinity score = Σ over interaction types i of (l_i × n_i × w_i), where l_i reflects the time since your last interaction (weighted so that 1 day ago counts more than 30 days ago), n_i is the number of interactions, and w_i is the weight of those interactions
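A toy computation of the two scores above; the weight values and decay shape are illustrative, not Facebook's actual ones.

```java
class EdgeRankToy {
    // Assumed decay shape: newer edges count more.
    static double timeDecay(double ageHours) { return 1.0 / (1.0 + ageHours); }

    // content score: sum over a story's edges of affinity * type weight * decay
    static double contentScore(double[] u, double[] w, double[] ageHours) {
        double sum = 0;
        for (int e = 0; e < u.length; e++)
            sum += u[e] * w[e] * timeDecay(ageHours[e]);
        return sum;
    }

    // friend affinity: sum over interaction types of recency * count * weight
    static double affinity(double[] recency, double[] count, double[] weight) {
        double sum = 0;
        for (int i = 0; i < recency.length; i++)
            sum += recency[i] * count[i] * weight[i];
        return sum;
    }
}
```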

- Activity(id, user_id, source_id, activity_type, edge_rank, parent_id, parent_type, data, time)
- timeline - user's story
- newsfeed
connections and activities

"Push" Model, or Fan-out-on-write
- write-heavy; for non-celebrity users, only push to active users
- use a write-optimized data store such as Cassandra, HBase, or BigTable
"Pull" Model, or Fan-out-on-load
- less data, less computation
- materialize what the user has already viewed into a table, then only pull new data
- have a fallback mechanism if the pull fails - show the old materialized data

Combination of push and pull
- make the decision locally on a per consumer/producer basis (see the sketch after this list)
- a frequent producer of events may overshadow events from less frequent producers
Selective Fanout Based On Consumer
- only fan-out to active/valuable users
Selective Fanout Based On Producer

- don't fan-out for high profile users
Pull for new users
- because nothing has been pushed to the user yet
- when there is not enough data, pull from the user's followings
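A per-pair push/pull decision sketch; the thresholds are illustrative assumptions: push for ordinary producers whose followers are active, pull for high-follower producers and for consumers who rarely log in.

```java
class FanoutPolicy {
    static boolean usePush(long followerCount,
                           double consumerViewsPerDay,
                           double producerPostsPerDay) {
        if (followerCount > 1_000_000) return false; // celebrity: pull at read time
        if (consumerViewsPerDay < 0.1) return false; // inactive consumer: don't precompute
        // Push is cheap when the producer posts rarely relative to consumer views.
        return producerPostsPerDay <= consumerViewsPerDay * 10;
    }
}
```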

Making schema changes is expensive in a traditional database
- such as add/remove indexes
- Users don't know what they don't know.

- Not all content is created equal.

Reddit
- Treat non-logged-in users as second-class citizens
- Return cached content to them
- Put a limit on everything
- Provide timeouts

- Expire data: lock old threads, create a fully rendered page for each, and cache it.

Red Envelope
- store the share count and remaining balance in Redis
- use Redis
  - DECR the count; use CAS (or a Lua script) to decrease the balance - see the sketch below
- set a TTL on the keys
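A sketch with Jedis; the key names and amount handling are assumptions. A Lua script makes "check count, decrement count, deduct the grabbed amount from the balance" a single atomic step, so no CAS retry loop is needed (it assumes both keys were initialized, with a TTL, when the envelope was created).

```java
import java.util.Arrays;
import redis.clients.jedis.Jedis;

class RedEnvelope {
    private static final String GRAB =
        "local count = tonumber(redis.call('GET', KEYS[1])) " +
        "local balance = tonumber(redis.call('GET', KEYS[2])) " +
        "if not count or not balance or count <= 0 or balance <= 0 then return -1 end " +
        "local amount = tonumber(ARGV[1]) " +
        "if amount > balance then amount = balance end " +
        "redis.call('DECR', KEYS[1]) " +
        "redis.call('DECRBY', KEYS[2], amount) " +
        "return amount";

    /** Returns the grabbed amount in cents, or -1 if the envelope is empty. */
    static long grab(Jedis jedis, String envelopeId, long amountCents) {
        Object result = jedis.eval(GRAB,
            Arrays.asList("re:" + envelopeId + ":count", "re:" + envelopeId + ":balance"),
            Arrays.asList(Long.toString(amountCents)));
        return (Long) result;
    }
}
```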

Uber
Driver - long-lived (persistent) connection
- matches drivers and riders, the supply and demand
Maps/ETA (estimated time of arrival)

Nutch2
- plugins, extensible
- fetching pages, filtering and normalizing URLs, and parsing responses

Etsy - link
$choose_connection = mt_rand() < $affinity;
- probabilistically pick a database connection based on the user's affinity
Pruning same/similar messages
- combine similar stories into rollups
- group similar stories

Cache
- separate globally shared content from content unique to the person
Cache Warming - gearman
- proactively
- TTL Caching
Lazy-load content below the fold

JD
closed data loop, heterogeneous data, aggregation

EFK - Log collection/visualization
Logstash/Fluentd -> Kafka -> Elasticsearch -> Kibana
