DSBulk (DataStax Bulk Loader) is used to load, unload, and count data in a Cassandra database.
The command below runs an unload using a configuration file; a load is run the same way with dsbulk load and the load configuration file.
nohup dsbulk unload -h '100.37.24.174,100.37.24.175' -maxErrors 1000 -u cassandra -p cassandra -f /datastax/toolbox/dsbulk-1.3.3/conf/unload_details.conf &
The parameter values used below performed well in our environment; tune them for your own cluster.
Configuration file to unload data:
[cassandra@localhost]$ cat unload_details.conf
dsbulk {
  connector.name = "json"
  connector.json.url = "/cassandra/backup/dsbulk_unload/"
  connector.json.fileNameFormat = "output-%0,6d.json"
  connector.json.maxRecords = 10000
  connector.json.generatorFeatures = { ESCAPE_NON_ASCII: true, QUOTE_FIELD_NAMES: true }
  schema.keyspace = "bypramod_keyspace"
  schema.table = "bypramod_table"
  executor.maxPerSecond = 300
  executor.maxInFlight = 50
  executor.continuousPaging.enabled = false
  driver.query.fetchSize = 5000
  driver.policy.maxRetries = 30
  driver.socket.readTimeout = 240000
}
Configuration file to load data:
[cassandra@localhost]$ cat load_details.conf
dsbulk {
  connector.name = "json"
  schema.keyspace = "bypramod_keyspace"
  schema.table = "bypramod_table2"
  connector.json.url = "/cassandra/backup/dsbulk_unload/"
  connector.json.fileNameFormat = "output-%0,6d.json"
  connector.json.maxRecords = 10000
  connector.json.generatorFeatures = { ESCAPE_NON_ASCII: true, QUOTE_FIELD_NAMES: true }
  executor.maxPerSecond = 2500
  executor.maxInFlight = 30
  executor.continuousPaging.enabled = false
  driver.query.fetchSize = 100
  driver.policy.maxRetries = 30
}
Note:
The format string "%0,6d" follows the rules of Java's String.format(), which DSBulk applies to fileNameFormat; the number substituted into it is the output file counter. Here's a breakdown of each part:
%: Starts the format specifier.
0: A flag that requests zero-padding. If the formatted number is shorter than the field width, it is padded with zeros on the left.
,: A flag that requests locale-specific grouping separators (for example, 1234 is rendered as 1,234 in a US locale). It does not separate the 0 flag from the width.
6: The minimum field width. The formatted value takes up at least 6 characters, including padding and any grouping separators.
d: The conversion specifier for a decimal integer.
In summary, "%0,6d" formats the file counter as a decimal integer, zero-padded on the left to at least 6 characters, so a counter of 1 becomes 000001. Because of the , flag, counters of 1000 and above also contain a grouping separator (1234 becomes 01,234 in a US locale); use "%06d" if the file names should contain digits only.
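The effect of the format string can be checked outside DSBulk with a few lines of Java. This is a minimal sketch, assuming the file names follow plain String.format() semantics; the class FileNameFormatDemo and the helper fileName are hypothetical names used only for illustration and are not part of DSBulk. Locale.US is pinned so that the grouping separator is always a comma.

```java
import java.util.Locale;

public class FileNameFormatDemo {

    // Hypothetical helper (not a DSBulk API): applies a fileNameFormat-style
    // pattern to a file counter using String.format() semantics.
    static String fileName(String pattern, int counter) {
        return String.format(Locale.US, pattern, counter);
    }

    public static void main(String[] args) {
        int[] counters = {1, 42, 1234};
        for (int counter : counters) {
            // Left column: pattern from the configuration above ("," flag set).
            // Right column: the same pattern without the grouping flag.
            System.out.println(
                fileName("output-%0,6d.json", counter)
                + "   "
                + fileName("output-%06d.json", counter));
        }
    }
}
```

Running the class prints each counter formatted with both patterns side by side, which makes it easy to see where the grouping flag starts to change the file names.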
Configuration file to connect over SSL:
dsbulk {
  connector.name = "csv"
}
datastax-java-driver {
  advanced {
    ssl-engine-factory {
      class = DefaultSslEngineFactory
      truststore-password = "truststore"
      truststore-path = "/cassandra/keystores/client.truststore"
      hostname-validation = "false"
    }
  }
}
***