Sunday, July 28, 2019

How disk/memory space affects long column names in Cassandra

Will a long column name affect disk space in Cassandra/DataStax? Yes.

A column is usually stored as a tuple of name, value, and timestamp. If the column name is long, that extra space has to be allocated again for every value that is persisted.
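To put rough numbers on the paragraph above, here is a minimal Python sketch of that tuple model. It assumes each cell stores the column name, the value, and an 8-byte timestamp with no other overhead. This is a toy illustration of the description above, not Cassandra's real SSTable encoding (storage formats differ between versions), and the helper names `cell_size` and `name_overhead` are hypothetical.

```python
# Toy model of the layout described above: every stored cell repeats the
# column name next to its value and timestamp.
# ASSUMPTION: 8-byte timestamp, no other per-cell overhead. This is an
# illustration only, not Cassandra's actual on-disk format.
TIMESTAMP_BYTES = 8


def cell_size(column_name: str, value: str) -> int:
    """Bytes for one (name, value, timestamp) tuple in this toy model."""
    return (len(column_name.encode("utf-8"))
            + len(value.encode("utf-8"))
            + TIMESTAMP_BYTES)


def name_overhead(column_name: str, cells: int) -> int:
    """Bytes spent only on repeating the column name across `cells` cells."""
    return len(column_name.encode("utf-8")) * cells


long_name = "columnoftablet1inthekeyspacetestpp"  # column used in the demo below
short_name = "c"                                 # hypothetical short alternative

for name in (long_name, short_name):
    # Size of one cell holding 'k', and name bytes repeated over a million cells.
    print(name, cell_size(name, "k"), name_overhead(name, 1_000_000))
```

Under this model the 34-character name costs 34 bytes per cell, so a million cells spend about 34 MB on the name alone, versus about 1 MB for a one-character name.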

We also need to take the key cache, row cache, and memtables into account, since they hold the same data and therefore use more space in memory as well.

A simple demonstration of disk space utilization:

CREATE TABLE test_pp.t1 (columnoftablet1inthekeyspacetestpp text PRIMARY KEY);

INSERT INTO test_pp.t1 (columnoftablet1inthekeyspacetestpp) VALUES ('a;sdkfjalksjfdl;ewjiekdnvasdif');

$ nodetool flush
$ ls -l *Data.db
-rw-r--r-- 1 cassandra cassandra 59 Jun 17 11:58 mc-1-big-Data.db

INSERT INTO test_pp.t1 (columnoftablet1inthekeyspacetestpp) VALUES ('k');

$ nodetool flush
$ ls -l *Data.db
-rw-r--r-- 1 cassandra cassandra 30 Jun 17 11:59 mc-2-big-Data.db

We can see the difference in data file size: the first flush produced a 59-byte file and the second a 30-byte file.
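A quick check of the arithmetic behind the two listings, as a small Python sketch using only the sizes and values shown above. Both inserts use the same column name, so the 29-byte gap between the files lines up with the 29-character difference between the two inserted values. Isolating the cost of the column name itself would need a comparison in which only the name changes, for example a second table with a short column name receiving the same value.

```python
# Sizes reported by `ls -l` in the two listings above.
first_sstable = 59   # mc-1-big-Data.db, after inserting the 30-character value
second_sstable = 30  # mc-2-big-Data.db, after inserting 'k'

first_value = "a;sdkfjalksjfdl;ewjiekdnvasdif"
second_value = "k"

file_gap = first_sstable - second_sstable             # difference in file size
value_gap = len(first_value) - len(second_value)      # difference in value length
print(file_gap, value_gap)  # both are 29
```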
