System Design Practices


Design home page stream
Design feed
Mixed approaches - Push + Pull from 
-- selectively disable push for celebrities, pull them instead.
https://engineering.pinterest.com/blog/building-scalable-and-available-home-feed
Design Decisions For Scaling Your High Traffic Feeds

Cassandra + Denormalize
Redis + Normalized

Selective Fanout Based On Producer/Consumer

URL shortener
https://www.youtube.com/watch?v=fMZMm_0ZhK4 - Range based 
- read heavy
- separate read/write to different apps
- able to disable write functions
- id (auto increment), actual url, and shorten url

- base 62 encode the id to string(shorten url)
  - generate auto-generate id
  - can use sql or nosql as long as find a way to generate unique long
- rate limit, prevent abuse
Insert Long Url to db, get the primary key: long
Convert base10 to Base62[0-9a-zA-Z], other characters such as -,_+, convert long to short url
Decode short url to long number and query db to get long url.

How to make it scalable
How to analyze short url click event
301 or 302?
- custom url

Extension - popular urls in last hour/day

Generate unique ID numbers
Twitter Snowflake
- 41bit timestamp
- 10bit workerid
- 12bit sequence
Flickr Ticket Servers
TicketServer1:
auto-increment-increment = 2
auto-increment-offset = 1

TicketServer2:
auto-increment-increment = 2
auto-increment-offset = 2
Map many logical shards to fewer physical shards
How to assign shards when we can add/remove nodes?

User Overseer(leader) to change concurrent into serialized problem


Crawler
- politeness: rate limit
- put same host pages into same queue
- different priority for different sites - multiple queues
- normalization/canonicalization
- directed graph
- each URL: last visited time, score, fingerprint
- Use Bloom filters to remove duplicates
Re-crawling
- predict which websites are most likely to have frequent update, put those into the fetchers’ priority queue.
How to handle realtime news?
- each url: next visited time based on its score and how frequently it changes
- if page(links, content) is (mostly) not changed, increase its next visit time(cut+2t) otherwise decrease its next visit time
- begin with a seed set
- use a simple fingerprint such as a checksum
- the frontier
- normalized URL
DNS resolver
Locality
- Put data centers in different regions - machines in asian to crawl asian web sites

To crawl price histories of products
signals:
- previous price history changes
- how many reviews they have

- To detect new products: provide api so client can add or manually crawl new added pages.

Design an in-memory search engine
How to index and store in memory
How to support free text queries, phrase queries
Map<String, List<Document>>
List<Document> is sorted
Document: docId, List<Long> positionIds
List<Long> is sorted
How to save to files
How to merge small files into big files
-- When save to file, make it sorted by word
-- Use merge sort to merge multiple fields
How to make it scalable - how solr cloud works 

Auto completion in search
- single node
- Trie, Topk


Design Hit Counter - how to count number of requests in last second, minute and hour - link1,

Generate Game Map
- brute force, randomly generate one, check whether it can be solved in limit time by machine

- different difficulty level

Design score/rank system for social game.
Need update score realtime?
- yes, for this user's operation, when score is small
- no, for others' operation, when score is big
Rank - realtime?
- yes, when user is ranked in X(1000),
- no, if user ranks is below Y(1M)
Average score
- rank: use sorted tree - sorted by score, node splitted by score range
- table: userid, score
- sharding

- append, not update existing data
- separate score change to different types: real time vs batchable
- batch score change

Search nearby places
- GeoHash

Design Slack
Design Chat app like messenger, what's app
https://www.youtube.com/watch?v=zKPNUMkwOJE&t=2s
Websockets
ServerA forward message to ServerB which forwards to clientB
How to manage/detect friends status - presence
- reduce message, avoid O(n^2)
- Each user subscribes to its subscription list in server
- When one user state's changed, it uses pub/sub to update all related users's subscription list in server side
- Server pushes friend status to each user  in some fixed interval - if there is change (batch)

How to store messages
Typing indicators
- Only send it to users who is currently interacting with this user/group

Message status: delivered, seen

De-duplication

- url dedup: use bloom filter
Detect duplicate documents in billions of urls
- back-of-the-envelope calculations
- (30 bytes for url + int/long(4 or 8 bytes)) * 10^9(~1g) ~ 34g
- A1: Hash and split into multiple files  - slow (even with multiple threads) as a lot of IOs
- A2: Parallelize in multiple machines - like map-reduce

Design log collection and analysis system

- Auxiliary service, should not affect the existing service (Monitor system memory and cpu, adjust the push rate accordingly)

Design shopping cart
addToCart,removeFromCart,changeQuantity
- monitor
- abandoned cart
guest cart
- cookie
- merged if user login later
- check items availability at some point

- call other services - stock service(price, availability), metadata service, images(cdn)

ecommerce - amazon
- nosql: wide column for product table

- concurrency for order: optimistic, pessimistic
- analytics
- abandoned cart, user behavior(click,view,add/remove from cart)
products
checkout
Payment
- two phrases: Authorization, Capture
shipping

customer service

Design rate limiter
- Separate from application
- warn/notify user before user reaches 90% limit
Algorithms:
Token bucket, Leaky bucket, rolling window, counter
Guava RateLimiter

Design Miao Sha
- Use cases: different phrases of the activity: before, available, sold out
- cache boolean available instead of the count

- Principle: fair(fifo) - but no need to 100% guarantee
Two types of queues
- one queue for each on-sale hot product, only keep x (count *1.x) requests, reject directly if more than it (ask them to retry after x mins)
- handled by one single server which reads from queue
- avoid distributed problems
- asynchronous processing
Cache activity page 
- protect order page by adding a server side generated random number
- server side will generate and update a non-cached a js including activity info and the random number for the order page, the js is not cached in client side

How to update stocks - concurrency 
How to send email notification, handle delivery
asynchronous, put them into a queue

Design train ticket booking - Chunyun

Protect against from hackers
- CAPTCHA
- filter based on ipAddress, acountId
- Prevent abuse: cheating

Design email

Counter
- 1. Use count table
- 2. Use Redis

Hierarchical Data
- id, parentid

Design the data structure for large social network
Design system to store/read images

Expire mp3 in cdn and generate new one after x min
pre-load

Playlist Shuffle
- Fisher-Yates shuffle
- a good, uniformly distributed mix of all the tracks, no burst

Design key-value Database(lookup service)
Design APIs for add/update/query
CA or CP?
- List main components first (meltable, sstable, index, bloom filter, WAL log)
- membership and failure detection(gossip)
- Partition (Virtual nodes, consistent hashing)
- Replication(Versioning)
- handling temporary failures
- read repair and anti-entropy using Merkle tree
- Sparse index file and loaded into memory
- compress block
Write ahead (redo) log
- Global(Local) secondary index
- Conflict resolution

- Request coordinator
- Smart client can send requests to right replica
- Cache: row cache, key cache
Merge and compaction

Advantages over B-tress
- turn random writes into sequential writes

Downside of LSM tree
- compaction will interfere with the performance (not consistency)
- more storage
- key may appear in multiple sstable, 
- more storage during compaction 


Design column database
How to update some fields

Design Delayed Job Scheduler
- single server or distributed
- generic or for specific tasks
- Java Delay Queue
- wheelTimer (Netty)
- qurtz

Top K Frequent Elements in Recent X mins
- data structure: hashmap + linkedlist
system design:
- approximate
- batch, update every x seconds
- 分阶段统计, bucket
- sample

- different intervals for different cache(hour, daily)

Design Ticket Master - TODO
- Analyze the requirements: QPS, SLA
- Autoscale, predicative scale based on estimated popularity of concert

- queuing and prioritizing

How to Design a Trending Algorithm - TODO
- offline pipelines
- cache, no need realtime
- candidates: hashtag

- signal: influencer, location, personalization

Make single-server datastore to peer-to-peer, linearly scalable, clustered systems
Dynomite

Design distributed priority queue service + Delayed queue
Distributed delay queues based on Dynomite + Redis
A Sorted Set containing queued elements by score.
A Hash set that contains message payload, with key as message ID.

A Sorted Set containing messages consumed by client but yet to be acknowledged. Un-ack set.

Design distributed file system
- master for meta data: where the chunk stores
- get file chunk location from master, read directly form slave

- replication, checksum

Image conversion service

- queue, asynchronously

Labels

adsense (5) Algorithm (69) Algorithm Series (35) Android (7) ANT (6) bat (8) Big Data (7) Blogger (14) Bugs (6) Cache (5) Chrome (19) Code Example (29) Code Quality (7) Coding Skills (5) Database (7) Debug (16) Design (5) Dev Tips (63) Eclipse (32) Git (5) Google (33) Guava (7) How to (9) Http Client (8) IDE (7) Interview (88) J2EE (13) J2SE (49) Java (186) JavaScript (27) JSON (7) Learning code (9) Lesson Learned (6) Linux (26) Lucene-Solr (112) Mac (10) Maven (8) Network (9) Nutch2 (18) Performance (9) PowerShell (11) Problem Solving (11) Programmer Skills (6) regex (5) Scala (6) Security (9) Soft Skills (38) Spring (22) System Design (11) Testing (7) Text Mining (14) Tips (17) Tools (24) Troubleshooting (29) UIMA (9) Web Development (19) Windows (21) xml (5)