Some learnings: February 2018

Wednesday, February 21, 2018

Performance digging in Cassandra

1) Verify NTP: Cassandra resolves conflicting writes by timestamp (last write wins), so servers with different clocks can make the wrong write win.
2) Problems with streaming/Repair
3) Cleanup when required.
4) Slow queries -> check compactions, histograms, tracing.
5) Nodes Failing

A few Linux commands:
iostat -> Disk-level statistics (iostat -dmx 1 10)
Here,
avgqu-sz (average disk queue length),
svctm (service time)
htop -> process overview
iftop & netstat & ss -> network utilities
dstat -> All of the above in one tool
strace -> system call tracing, for hardcore debugging
jstat -gcutil 89760 250 10000 -> GC utilization of JVM process 89760, sampled every 250 ms, 10000 samples

Monitoring tools:
Munin, Nagios, Icinga => Graphing and alerting on system metrics and application metrics.

Diagnosing Problems:
a) Seeing weird consistency issues, even at consistency level ALL? It's possible you're dealing with a clock sync issue.
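
Why clock skew bites even at consistency level ALL can be shown with a toy model of last-write-wins. This is a simplified sketch, not Cassandra's actual code: the Cell class, the write and reconcile helpers, and the 5-second skew are all made up for illustration, and ties simply go to the first argument here.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    value: str
    timestamp_us: int  # write timestamp in microseconds


def write(value, real_time_s, clock_offset_s):
    """Stamp a write using the (possibly skewed) clock of the machine assigning the timestamp."""
    return Cell(value, int((real_time_s + clock_offset_s) * 1_000_000))


def reconcile(a, b):
    """Last write wins: keep the cell with the higher timestamp (ties go to `a` in this sketch)."""
    return a if a.timestamp_us >= b.timestamp_us else b


# Hypothetical scenario: the first write is stamped by a machine whose clock runs 5 s fast.
first = write("old", real_time_s=100.0, clock_offset_s=5.0)
# Two real seconds later, the second write is stamped by a machine with a correct clock.
second = write("new", real_time_s=102.0, clock_offset_s=0.0)

winner = reconcile(first, second)
print(winner.value)  # prints "old": the earlier write wins because its timestamp is larger
```

Every replica applies the same timestamp comparison, so raising the consistency level does not help; only fixing the clocks does.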

Things to Know:
a) Throughput is largely determined by how often garbage collection runs and pauses the application.

b) jstat -gc <pid> 250ms 0 prints the status of all generations every quarter second.

c) Collecting garbage is fast; copying objects between the eden/survivor spaces and between generations is slow.

d) The PID (thread ID) from top.out can be matched directly to a thread in jstack.out via the "nid" field. PID 17290 in this case, converted to hex, is 0x438A.
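
The thread ID to nid lookup in d) can be scripted. The sketch below converts a decimal thread ID to the hex form and searches a thread dump for the matching line. The function names are hypothetical, and SAMPLE_DUMP is an invented, shortened stand-in for real jstack output.

```python
import re


def tid_to_nid(tid):
    """Convert a decimal thread ID (as shown by top) to the hex form used by jstack's nid field."""
    return format(tid, "#x")  # 17290 -> '0x438a'


def find_thread(jstack_text, tid):
    """Return the first thread-dump line whose nid matches the given thread ID, or None."""
    pattern = re.compile(r"\bnid=" + re.escape(tid_to_nid(tid)) + r"\b", re.IGNORECASE)
    for line in jstack_text.splitlines():
        if pattern.search(line):
            return line
    return None


# Invented two-thread dump, trimmed to the header lines (illustrative only).
SAMPLE_DUMP = '''
"example-thread-1" daemon prio=5 nid=0x438a runnable
"example-thread-2" daemon prio=5 nid=0x438b waiting on condition
'''

if __name__ == "__main__":
    print(tid_to_nid(17290))
    print(find_thread(SAMPLE_DUMP, 17290))
```

Matching is case-insensitive, so it does not matter whether the hex value was written as 438A or 438a.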

***