Programmer: Lifelong Learning: Cassandra in Theory and Practice

Column-family store
- not a columnar database (like Redshift), which is used for analytics jobs
- data locality is at the partition level, not the column level

Avoid IN queries that span multiple partitions
- Query the partitions one by one instead
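
A minimal sketch of the "one by one" approach, in plain Python so it runs without a cluster. `TABLE`, `fetch_partition` and `fetch_many` are hypothetical names; in a real client `fetch_partition` would issue one single-partition SELECT through the driver.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-in for a table, keyed by partition key.
TABLE = {"user1": ["a", "b"], "user2": ["c"], "user3": []}

def fetch_partition(key):
    # Stand-in for one single-partition query (SELECT ... WHERE pk = ?).
    return key, TABLE.get(key, [])

def fetch_many(keys, max_workers=4):
    # One request per partition, issued concurrently,
    # instead of a single IN (...) query over all the keys.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(fetch_partition, keys))

result = fetch_many(["user1", "user2", "user3"])
```

Each request touches exactly one partition, so no single coordinator has to gather all the partitions for one query.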

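Primary key vs partition key
- The first part of the primary key is the partition key, which determines which node stores the data.
- Composite/compound keys: a composite partition key has more than one column; a compound primary key has a partition key plus clustering columns
- skinny rows: the primary key only contains the partition key
- wide rows: the primary key contains columns other than the partition key (clustering columns)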
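
A rough sketch of how a partition key maps to a node, assuming a simple token ring with one token per node. The `Ring` class and node names are made up for illustration, and MD5 is only a stand-in hash here (Cassandra's default partitioner uses Murmur3).

```python
import bisect
import hashlib

def token(partition_key: str) -> int:
    # Stand-in hash; Cassandra's default partitioner uses Murmur3, not MD5.
    return int.from_bytes(hashlib.md5(partition_key.encode()).digest()[:8], "big")

class Ring:
    def __init__(self, nodes):
        # Each hypothetical node owns the token range that ends at its own token.
        self.entries = sorted((token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.entries]

    def node_for(self, partition_key: str) -> str:
        # First node whose token is >= the key's token, wrapping around the ring.
        i = bisect.bisect_left(self.tokens, token(partition_key))
        return self.entries[i % len(self.entries)][1]

ring = Ring(["node-a", "node-b", "node-c"])
# Rows are (partition key, clustering column); only the partition key picks the node.
rows = [("user42", "2020-01-01"), ("user42", "2020-01-02"), ("user7", "2020-01-01")]
placement = {row: ring.node_for(row[0]) for row in rows}
```

Rows that share a partition key always land on the same node, whatever their clustering columns are; this is what makes a wide row a single unit of locality.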

Materialized view
- implemented as a normal Cassandra table, so it takes up about the same amount of disk space as the base table

Primary key restrictions for a materialized view
- it must contain all the primary key columns of the base table. This ensures that every row of the view corresponds to exactly one row of the base table.
- it can contain at most one column that is not a primary key column in the base table.

Table design
- Determine which queries to support; use different tables (or materialized views) for different queries if needed
- Avoid hot spots and unbounded partition (wide row) growth
- Spread data evenly across the cluster
- Read as few partitions as possible per query
- Use DESC clustering order on the time column when queries search for recent, time-based data
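
A small in-memory sketch of two of the points above: bucketing by day keeps a partition from growing without bound, and newest-first ordering makes "most recent" reads cheap. The `Table` class, the (sensor, day) bucketing and the sensor data are assumptions made up for illustration.

```python
from collections import defaultdict
from datetime import datetime, timezone

def partition_key(sensor_id: str, ts: datetime):
    # Hypothetical bucketing: one partition per sensor per day bounds its size.
    return (sensor_id, ts.strftime("%Y-%m-%d"))

class Table:
    def __init__(self):
        self.partitions = defaultdict(list)

    def insert(self, sensor_id, ts, value):
        rows = self.partitions[partition_key(sensor_id, ts)]
        rows.append((ts, value))
        # Mimics CLUSTERING ORDER BY (ts DESC): newest row first in the partition.
        rows.sort(key=lambda r: r[0], reverse=True)

    def latest(self, sensor_id, day, limit=1):
        # Reads a single partition and takes rows from its head.
        return self.partitions.get((sensor_id, day), [])[:limit]

t = Table()
t.insert("s1", datetime(2020, 1, 1, 9, 0, tzinfo=timezone.utc), 20.5)
t.insert("s1", datetime(2020, 1, 1, 10, 0, tzinfo=timezone.utc), 21.0)
t.insert("s1", datetime(2020, 1, 2, 8, 0, tzinfo=timezone.utc), 19.0)
newest = t.latest("s1", "2020-01-01")
```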

Partition key columns can only be restricted with = (EQ) or IN.

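How deletes are implemented and why

Deletes and tombstones
- A delete writes a tombstone marker instead of removing data in place, so that the delete can still reach replicas that missed it.
- grace period (gc_grace_seconds): how long a tombstone is kept before compaction may purge it
Understanding deletes
- A row tombstone is a row with no liveness_info and no cells.
- A cell tombstone: no liveness_info at the column level
- Range delete
- Partition delete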
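
To see why a delete is written as a tombstone rather than removed in place, here is a toy last-write-wins model with two replicas. The helper functions and the dict-based replicas are illustrative assumptions, not Cassandra's storage engine; only the 864000-second default for gc_grace_seconds is taken from Cassandra.

```python
GC_GRACE_SECONDS = 864000  # Cassandra's default gc_grace_seconds (10 days)

def write(replica, key, value, ts):
    # Last write wins: keep the entry with the highest timestamp.
    if key not in replica or replica[key][1] < ts:
        replica[key] = (value, ts)

def delete(replica, key, ts):
    # A delete is a write of a tombstone marker (None here), not a removal.
    write(replica, key, None, ts)

def read(replica, key):
    value, _ = replica.get(key, (None, 0))
    return value

def sync(src, dst):
    # Simplified repair: copy every entry over, tombstones included.
    for key, (value, ts) in src.items():
        write(dst, key, value, ts)

def compact(replica, now):
    # Only tombstones older than the grace period are purged.
    expired = [k for k, (v, ts) in replica.items()
               if v is None and now - ts > GC_GRACE_SECONDS]
    for key in expired:
        del replica[key]

a, b = {}, {}
write(a, "k", "v1", ts=100)
write(b, "k", "v1", ts=100)
delete(a, "k", ts=200)   # replica b missed the delete
sync(a, b)               # the tombstone propagates and shadows the old value
```

If replica `a` had simply dropped the key, a later sync from `b` would bring the old value back. The grace period gives the tombstone time to reach every replica before it is purged.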

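Local index
- A secondary index is local: each node indexes only the data it stores.
- A secondary index query is slow because, without a partition key restriction, it has to access all nodes
- only suited for low-cardinality data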

SASI - SSTable-Attached Secondary Index
- a new on-disk index format based on B+ trees
- each SSTable/memtable gets its own immutable index file attached

memtable
- in-memory write buffer that is later flushed to disk as an SSTable
- acts as a write-back cache
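
A toy sketch of the memtable/SSTable relationship, assuming a flush is triggered by entry count. The `Store` class and its threshold are made up for illustration, and the commit log and compaction are left out entirely.

```python
class Store:
    def __init__(self, flush_threshold=2):
        self.memtable = {}     # mutable, in memory
        self.sstables = []     # immutable sorted runs, newest last
        self.flush_threshold = flush_threshold

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write the memtable out as a sorted, immutable table and start a fresh one.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}

    def get(self, key):
        # The memtable holds the newest data, then SSTables from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            for k, v in table:
                if k == key:
                    return v
        return None

s = Store()
s.put("b", 1)
s.put("a", 2)   # second put reaches the threshold and triggers a flush
s.put("a", 3)   # the newer value lives in the memtable and shadows the SSTable
```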

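off-heap memory
- memory allocated outside the JVM heap, which reduces garbage collection pressure
- The same concept applies to both Cassandra and Kafka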

Cache
- cache data (row cache, key cache) is saved to disk so that a restarted node does not start with a cold cache

cqlsh
DESCRIBE KEYSPACES;
DESCRIBE TABLES;

COPY keyspace.table TO 'output.txt';
COPY keyspace.table (column1, column2) TO 'output.txt';

Write the query result to a file
cqlsh -e 'cqlQuery' > output.txt

Use CAPTURE command to export the query result to a file:
cqlsh> CAPTURE
cqlsh> CAPTURE '~/output.txt';

File Store Format
Data (Data.db)
Primary Index (Index.db)
SSTable Index Summary (Summary.db)
Bloom filter (Filter.db)
Compression Information (CompressionInfo.db)
Statistics (Statistics.db)
SSTable Table of Contents (TOC.txt)

Secondary Index (SI_.*.db)
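
Of the files above, Filter.db holds the Bloom filter, which lets a read skip an SSTable that definitely does not contain a partition key. The sketch below is a generic Bloom filter, not Cassandra's implementation; the sizes and the SHA-256 based hashing are arbitrary choices for illustration.

```python
import hashlib

class BloomFilter:
    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)  # one byte per bit, for simplicity

    def _positions(self, key: str):
        # Derive num_hashes positions by salting the key with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key: str):
        for p in self._positions(key):
            self.bits[p] = 1

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[p] for p in self._positions(key))

bf = BloomFilter()
for key in ("user1", "user2"):
    bf.add(key)
```

A key that was added always answers True; a key that was never added usually answers False, with a small chance of a false positive that costs one unnecessary SSTable lookup.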

Lightweight transactions
- Conditional update
- IF NOT EXISTS, IF <condition>
- Compare-and-set (CAS), implemented with the Paxos algorithm
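
The compare-and-set semantics can be sketched on a single process. The `Register` class is hypothetical, and the lock only stands in for the agreement that Paxos provides across replicas; this is not an implementation of Paxos.

```python
import threading

class Register:
    def __init__(self):
        self._rows = {}
        # Stand-in for the Paxos round; it only serializes access locally.
        self._lock = threading.Lock()

    def insert_if_not_exists(self, key, value):
        with self._lock:
            if key in self._rows:
                return False, self._rows[key]   # not applied, existing value returned
            self._rows[key] = value
            return True, value

    def update_if(self, key, expected, new):
        with self._lock:
            current = self._rows.get(key)
            if current != expected:
                return False, current           # condition failed, nothing written
            self._rows[key] = new
            return True, new

r = Register()
first = r.insert_if_not_exists("alice", "a@example.com")
second = r.insert_if_not_exists("alice", "other@example.com")
third = r.update_if("alice", "a@example.com", "new@example.com")
```

The (applied, value) pair loosely mirrors how a conditional statement reports whether it was applied, along with the current value when it was not.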

Read/Write Consistency Levels

Misc
Cassandra breaks a row up into columns that can be updated independently
