If you are not yet familiar with collections in Cassandra CQL, the following page explains them in excellent detail:
However, if you have been using Cassandra since the pre-CQL days, you will have worked with the low-level Thrift API, and you may be tempted to ask what the data looks like in Cassandra’s internal storage structure, which the Thrift API exposes quite directly.
I still carry a hangover from my extensive work with the Thrift API, so I always want to know what my CQL data looks like in that internal storage structure.
Below is a CQL column family containing the different collection types (set, list and map), followed by how its data looks in the internal storage structure:
Internal Storage Structure: (CLI result)
=> (name=, value=, timestamp=1411701396643000)
=> (name=emails:62616767696e7340676d61696c2e636f6d, value=, timestamp=1411701396643000)
=> (name=emails:664062616767696e732e636f6d, value=, timestamp=1411701396643000)
=> (name=first_name, value=46726f646f, timestamp=1411701396643000)
=> (name=last_name, value=42616767696e73, timestamp=1411701396643000)
=> (name=numbermap:00000001, value=0000000b, timestamp=1411703133238000)
=> (name=numbermap:00000002, value=0000000c, timestamp=1411703133238000)
=> (name=numbers:534eaca0452c11e4932561c636c97db3, value=00000001, timestamp=1411701740650000)
=> (name=numbers:534eaca1452c11e4932561c636c97db3, value=00000002, timestamp=1411701740650000)
=> (name=numbers:534eaca2452c11e4932561c636c97db3, value=00000003, timestamp=1411701740650000)
=> (name=todo:00000139f7134980, value=76616c756531, timestamp=1411702558812000)
=> (name=top_places:a3300bb0452c11e4932561c636c97db3, value=726976656e64656c6c, timestamp=1411701874667000)
=> (name=top_places:a3300bb1452c11e4932561c636c97db3, value=726f68616e, timestamp=1411701874667000)
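The hex strings in the CLI result can be decoded back into readable values. The snippet below is a minimal sketch: it assumes the text columns are UTF-8 strings and the numeric columns are 4-byte big-endian integers, which matches the hex shown above. The helper names `decode_text` and `decode_int` are made up for this illustration and are not part of any Cassandra tool.

```python
import struct


def decode_text(hex_str):
    """Decode a hex string from the CLI output as UTF-8 text (assumed encoding)."""
    return bytes.fromhex(hex_str).decode("utf-8")


def decode_int(hex_str):
    """Decode a hex string as a 4-byte big-endian signed integer (assumed encoding)."""
    return struct.unpack(">i", bytes.fromhex(hex_str))[0]


# Hex strings copied from the CLI result above.
print(decode_text("46726f646f"))                          # Frodo
print(decode_text("42616767696e73"))                      # Baggins
print(decode_text("62616767696e7340676d61696c2e636f6d"))  # baggins@gmail.com
print(decode_text("726f68616e"))                          # rohan
print(decode_int("0000000b"))                             # 11
```

Read this way, the part of a set's column name after `emails:` is the element itself, and the list and map column values are the stored elements.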
Some important points to note:
- The ‘set’ field (emails) does not have any column values in the CLI result. Because a set stores unique items, each element is stored as part of the column name only, shown hex-encoded (for example, 664062616767696e732e636f6d is f@baggins.com). These are the serialized bytes of the element, not hash values!
- In contrast, because a ‘list’ field (numbers/top_places) may hold duplicate values, the actual value of each list element is stored in the column value. The column name carries a time-based UUID instead, so duplicate elements do not overwrite each other and the list keeps its order!
- ‘map’ field (numbermap/todo): the hex-encoded key is used in the column name and the hex-encoded value is stored in the column value.
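The three layouts noted above can be mimicked in a few lines of Python. This is only a toy model of the naming scheme seen in the CLI result, not Cassandra's actual storage code. The functions `set_cells`, `list_cells` and `map_cells` are hypothetical names, text is assumed to be UTF-8, and map keys and values are assumed to be 4-byte integers as in numbermap.

```python
import uuid


def set_cells(column, items):
    # Set: the element lives in the cell name and the cell value stays empty.
    # Duplicates collapse because they would produce the same name.
    encoded = sorted({item.encode("utf-8") for item in items})
    return [(f"{column}:{raw.hex()}", "") for raw in encoded]


def list_cells(column, items):
    # List: the name carries a fresh time-based UUID and the element goes in
    # the value, so repeated elements get distinct cells.
    return [(f"{column}:{uuid.uuid1().hex}", item.encode("utf-8").hex())
            for item in items]


def map_cells(column, mapping):
    # Map: key in the cell name, value in the cell value (4-byte ints assumed).
    return [(f"{column}:{key.to_bytes(4, 'big').hex()}",
             value.to_bytes(4, "big").hex())
            for key, value in sorted(mapping.items())]


emails = set_cells("emails", ["f@baggins.com", "baggins@gmail.com", "f@baggins.com"])
places = list_cells("top_places", ["rohan", "rohan"])
numbermap = map_cells("numbermap", {1: 11, 2: 12})

for name, value in emails + numbermap:
    print(f"=> (name={name}, value={value})")
```

With these inputs, the `emails` and `numbermap` cells come out with the same names and values as the corresponding CLI rows, while the two identical `rohan` entries end up in separate cells because each one gets its own UUID.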