Cassandra supports two basic compaction strategies:
- Size tiered compaction (SizeTieredCompactionStrategy)
- Level tiered compaction (LeveledCompactionStrategy, usually just called leveled compaction)
Before discussing these two strategies, let’s look at what compaction is.
In Cassandra, each write goes to a memtable and to the commit log in real time. When the memtable fills up (or is flushed manually), its data is written to persistent files called SSTables.
A memtable, as the name suggests, is an in-memory table. The commit log is a set of files that allows data to be recovered if a node crashes before the data has been written from the memtable to SSTables.
Even update operations are written to SSTables sequentially. This is unlike an RDBMS, where the existing rows are first located in the data files and then updated in place. Writing sequentially keeps write performance high, because the expensive step of seeking an existing row in the data files (here, SSTables) is eliminated.
The flip side of this approach is that the data for a single row may be spread across multiple SSTables. This slows down reads, because several SSTables may have to be accessed to read a single row.
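The write path described above can be sketched with a toy model. This is only an illustration, not Cassandra's actual code: the `Node` class, its `memtable_limit` parameter (rows held before a flush) and the sample keys are all made up for the example.

```python
class Node:
    """Toy model of one node's write path (hypothetical, not Cassandra's code)."""

    def __init__(self, memtable_limit=2):
        self.commit_log = []   # append-only log, kept for crash recovery
        self.memtable = {}     # in memory: row key -> {column: value}
        self.sstables = []     # flushed tables, oldest first, never modified
        self.memtable_limit = memtable_limit  # assumed flush trigger (row count)

    def write(self, row_key, column, value):
        # Every write, including an update, is appended; nothing is looked up.
        self.commit_log.append((row_key, column, value))
        self.memtable.setdefault(row_key, {})[column] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # The memtable becomes a new SSTable; older SSTables are left as they are.
        if self.memtable:
            self.sstables.append(self.memtable)
            self.memtable = {}
            self.commit_log.clear()

    def read(self, row_key):
        # A read merges every fragment of the row, oldest first, so newer
        # values win. It also reports how many SSTables held a fragment.
        merged, sstables_touched = {}, 0
        for table in self.sstables:
            if row_key in table:
                merged.update(table[row_key])
                sstables_touched += 1
        merged.update(self.memtable.get(row_key, {}))
        return merged, sstables_touched


node = Node(memtable_limit=2)
node.write("user1", "name", "Ann")
node.write("user2", "name", "Bob")                # second row: first flush
node.write("user1", "email", "ann@example.com")
node.write("user3", "name", "Cy")                 # second row: second flush
node.write("user1", "name", "Anna")               # update, stays in the memtable
row, touched = node.read("user1")
print(row, touched)
```

In this run the row `user1` ends up split across two SSTables plus the memtable, which is exactly the read cost that compaction is meant to reduce.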
To limit this read cost, Cassandra performs ‘compaction’ on each column family. During compaction, the data for each row held in the SSTables being compacted is merged into a single SSTable. This improves read performance by placing the data for each row in a single file.
However, compaction is an asynchronous process. In the worst case, a read still has to consult multiple SSTables until the SSTables containing that row have been compacted.
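As a rough sketch of what the merge step does, the function below combines several SSTables into one and keeps only the newest value of each column. It assumes each cell carries a write timestamp and represents an SSTable as a plain dictionary; the `compact` name and the data layout are illustrative, and deletions (tombstones) are left out entirely.

```python
def compact(sstables):
    """Merge SSTables into one, keeping the newest cell per (row, column).

    Each SSTable is modelled as {row_key: {column: (timestamp, value)}}.
    This layout is an assumption made for the example.
    """
    merged = {}
    for table in sstables:
        for row_key, columns in table.items():
            row = merged.setdefault(row_key, {})
            for column, (timestamp, value) in columns.items():
                # Keep a cell only if it is newer than what we already have.
                if column not in row or timestamp > row[column][0]:
                    row[column] = (timestamp, value)
    return merged


older = {"user1": {"name": (1, "Ann"), "email": (1, "ann@example.com")}}
newer = {"user1": {"name": (5, "Anna")}, "user2": {"name": (3, "Bob")}}
print(compact([older, newer]))
```

After the merge, every row lives in a single table, so a read for `user1` needs one lookup instead of two.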
This makes the choice of compaction strategy a critical decision for your column family.
As mentioned above, there are two compaction strategies.
Size Tiered Compaction
In this compaction process, SSTables of roughly the same size are created initially and, once the number of such SSTables reaches a threshold count, they are compacted into a bigger SSTable.
For example, let’s take the initial SSTable size to be 5MB and the threshold count to be 5. Compaction is then triggered once 5 SSTables of 5MB each have been written, and the compaction process creates a single, bigger SSTable by merging all 5.
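The 5MB and threshold-of-5 example can be simulated in a few lines. This is a simplified sketch: it assumes no data is overwritten (so a merged SSTable is exactly the sum of its inputs) and groups SSTables by identical size, whereas a real implementation groups SSTables of merely similar size. The function name `size_tiered` is made up for the example.

```python
from collections import Counter


def size_tiered(flushes, flush_size_mb=5, threshold=5):
    """Return the SSTable sizes (in MB) left after a number of memtable flushes."""
    sizes = []
    for _flush in range(flushes):
        sizes.append(flush_size_mb)
        compacted = True
        while compacted:  # one merge can make the next tier eligible too
            compacted = False
            for size, count in Counter(sizes).items():
                if count >= threshold:
                    for _n in range(threshold):
                        sizes.remove(size)
                    sizes.append(size * threshold)
                    compacted = True
                    break
    return sorted(sizes)


print(size_tiered(4))   # → [5, 5, 5, 5]
print(size_tiered(5))   # → [25]
print(size_tiered(7))   # → [5, 5, 25]
print(size_tiered(25))  # → [125]
```

Note that between compactions a row can have fragments in every one of the remaining SSTables, which is why reads may touch many files under this strategy.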
Level Tiered Compaction
In this compaction process, SSTables of a fixed, relatively small size (5MB by default in the Cassandra implementation described here; later releases raised the default) are created and grouped into “levels”. Within each level, SSTables are guaranteed to be non-overlapping, meaning that a given row appears in at most one SSTable per level. Each level is ten times as large as the previous.
L0: newly flushed SSTables of about 5MB each
L1: up to about 50MB in total (roughly ten 5MB SSTables)
L2: up to about 500MB in total (roughly a hundred 5MB SSTables)
New SSTables added at level L(n) are immediately compacted with the overlapping SSTables at level L(n+1). Once level L(n+1) fills up, the extra SSTables are promoted to level L(n+2), and subsequent SSTables generated at level L(n+1) are compacted with the SSTables in level L(n+2).
This strategy ensures that about 90% of all reads are satisfied from a single SSTable, assuming row sizes are roughly uniform.
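The level sizes follow from simple arithmetic, sketched below. The sketch reads the 5MB, 50MB and 500MB figures as the total capacity of each level, which is an assumption about the intent, and the helper names `level_capacity_mb` and `levels_needed` are hypothetical.

```python
def level_capacity_mb(level, sstable_size_mb=5, fanout=10):
    """Capacity of a level when each level is `fanout` times the previous one."""
    return sstable_size_mb * fanout ** level


def levels_needed(data_mb, sstable_size_mb=5, fanout=10):
    """Number of levels above L0 needed to hold `data_mb` of data."""
    level, total = 0, 0
    while total < data_mb:
        level += 1
        total += level_capacity_mb(level, sstable_size_mb, fanout)
    return level


print([level_capacity_mb(n) for n in range(4)])  # → [5, 50, 500, 5000]
print(levels_needed(1000))                        # → 3
```

Because SSTables within a level do not overlap, a row can appear in at most one SSTable per level. With about 1GB of data and these numbers, a read would therefore consult at most three SSTables above L0, however many files the column family contains.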
Comparison at a high level
- Size tiered compaction is better suited to write-heavy applications. Because SSTables are not immediately compacted with the next level, write performance is better.
- With size tiered compaction, read performance suffers severely for column families that have very wide rows with frequent updates (for example, manual index rows). I have observed a case where a single read accessed almost 800 SSTables for a column family with a wide row containing more than a million columns, which led to frequent readTimeOuts! (This is also a lesson for data model design and explains why wide rows with millions of columns should be avoided!)
- Level tiered compaction is better suited to column families with heavy read operations and/or heavy update operations! (Important: note the difference between a write and an update!)
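As a small illustration of acting on this comparison, the helper below builds the CQL 3 statement that sets the compaction strategy for a table. The function name, the `read_or_update_heavy` flag and the `shop.orders` table are made up for the example; only the `compaction` option and the two strategy class names come from Cassandra, and older CQL or CLI versions use a different syntax.

```python
def compaction_cql(keyspace, table, read_or_update_heavy):
    """Build an ALTER TABLE statement choosing a compaction strategy.

    The rule of thumb here is only the high-level comparison above,
    not a tuning recommendation.
    """
    if read_or_update_heavy:
        strategy = "LeveledCompactionStrategy"
    else:
        strategy = "SizeTieredCompactionStrategy"
    return (
        f"ALTER TABLE {keyspace}.{table} "
        f"WITH compaction = {{'class': '{strategy}'}};"
    )


# Hypothetical keyspace and table names.
print(compaction_cql("shop", "orders", read_or_update_heavy=True))
```

The generated statement can be run from cqlsh; it is worth testing a strategy change on a non-production cluster first, since switching strategies triggers recompaction of existing SSTables.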