Some learnings: September 2023

Tuesday, September 26, 2023

How to identify the exact amount of data stored in TigerGraph DB

The commands below help you gather information about the data stored in a TigerGraph database. They differ in granularity and precision, so here is what each one does and how precise it is:

1. `kill -SIGUSR2 $(pgrep gped)`:

- This command sends the SIGUSR2 signal to the TigerGraph GPE (Graph Processing Engine) process.

- The GPE process, when it receives this signal, writes information about the graph's vertex count, edge count, total memory usage, and on-disk size to a file called "topology_memory.txt" in the GPE logs directory.

- The information provided by this command is relatively precise and directly reflects the current state of the graph data in the GPE.

2. `curl -X POST "http://localhost:9000/builtins/your_graph_name" -d '{"function":"stat_vertex_number","type":"*"}'`:

- This command sends a request to TigerGraph's built-in REST endpoint to get the number of vertices in your graph.

- With `"type":"*"` it reports the vertex count for every vertex type in the specified graph.

- You can replace `"your_graph_name"` with the actual name of your graph.

3. `gstatusgraph`:

- This command provides a general overview of the graph's status, including the number of partitions, replicas, and more.

- While it can give you an idea of the overall state of the graph, it may not provide as detailed and precise information as the previous commands.

4. `du -sh /tigergraph/db/data/gstore`:

- This command checks the disk space usage of the TigerGraph data directory.

- It reports the space used by everything under that directory, which includes data, configuration files, and other related files.

- The figure can therefore be larger than the actual data size, for example because of configuration files, replicas, and files kept for data consistency.

In summary, the first two commands (`kill -SIGUSR2` and the `curl` command) are the most precise and report specific information about the graph's data. The `gstatusgraph` command offers a general overview, while the `du` command reports the disk space used by the database and may overstate the data size. Depending on your needs, you can use one or more of these commands to monitor a TigerGraph database.
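
As a complement to the single `du -sh` total, a per-subdirectory breakdown can show where the space goes. This is a minimal sketch, assuming GNU `du` and `sort`; the helper name `dir_breakdown` is made up for illustration, and the default path is the one used above.

```shell
#!/usr/bin/env bash
# dir_breakdown is a hypothetical helper, not a TigerGraph tool.
# It prints the size of a directory and of each immediate subdirectory,
# sorted by human-readable size.
dir_breakdown() {
  local dir="${1:-/tigergraph/db/data/gstore}"
  du -h --max-depth=1 "$dir" | sort -h
}

# Only run against the default path if it exists on this machine.
if [ -d /tigergraph/db/data/gstore ]; then
  dir_breakdown
fi
```

Pass another directory as the first argument to inspect it instead of the default.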

***

Commands to verify whether the attached disk is a spinning disk (HDD) or a solid-state drive (SSD)

Use either of the commands provided below to determine if a disk is a spinning disk (HDD) or a solid-state drive (SSD). Both commands will provide the same information by checking the "rotational" property of the disk. If it returns 1, it's an HDD, and if it returns 0, it's an SSD.

$ lsblk -d -o name,rota

NAME ROTA

sda 0

sdb 0

sdc 0

(or)

$ cat /sys/block/sd*/queue/rotational

Note: To list the available storage devices, run "lsblk".
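
The `cat` variant prints only the 0/1 values without device names, so a small loop can pair each device with its type. This is a minimal sketch; the function name `disk_types` and its optional directory argument are made up for illustration (the argument exists only so the function can be pointed at a test directory). On virtual machines or behind RAID controllers the flag may not reflect the physical hardware.

```shell
#!/usr/bin/env bash
# Print "<device> <HDD|SSD>" for every block device that exposes the
# rotational flag. The optional argument defaults to /sys/block.
disk_types() {
  local root="${1:-/sys/block}" flag dev
  for flag in "$root"/*/queue/rotational; do
    [ -r "$flag" ] || continue        # skip if the glob matched nothing
    dev="${flag%/queue/rotational}"   # strip the trailing path components
    dev="${dev##*/}"                  # keep only the device name
    case "$(cat "$flag")" in
      1) echo "$dev HDD" ;;
      0) echo "$dev SSD" ;;
      *) echo "$dev unknown" ;;
    esac
  done
}

disk_types
```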

***

Friday, September 15, 2023

rsync

$ cd

$ mkdir bypramod

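$ sudo rsync -a --delete bypramod/ /mount/file

Because bypramod is a newly created, empty directory, syncing it with --delete removes everything under /mount/file, which empties that directory.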

***

Friday, September 8, 2023

Google Logging (glog) library: output format

Google Logging (glog) is a C++14 library that implements application-level logging. The library provides logging APIs based on C++-style streams and various helper macros.

Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg

With the year included in the prefix: [IWEF]yyyymmdd hh:mm:ss.uuuuuu threadid file:line] msg

[IWEF] ==> severity, one letter: I (INFO), W (WARNING), E (ERROR) or F (FATAL)

yyyymmdd ==> year, month and day

hh:mm:ss.uuuuuu ==> hours, minutes, seconds and microseconds

threadid ==> ID of the thread that wrote the message

file:line ==> source file name and line number

msg ==> the actual log message, which may span multiple lines

Official link: https://github.com/google/glog
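
To make the format concrete, the fields can be pulled apart with a regular expression. This is a minimal sketch in bash; the function name `parse_glog_line` and the sample log line are made up for illustration. It assumes the closing bracket that glog writes after file:line, accepts the date with or without the year, and handles only the first line of a multiline message.

```shell
#!/usr/bin/env bash
# Split one glog line into its fields and print them as key=value pairs.
# Returns 1 if the line does not match the expected prefix.
parse_glog_line() {
  local re='^([IWEF])([0-9]{4}|[0-9]{8}) ([0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]{6}) +([0-9]+) ([^:]+):([0-9]+)[]] (.*)$'
  if [[ "$1" =~ $re ]]; then
    printf 'severity=%s\ndate=%s\ntime=%s\nthread=%s\nfile=%s\nline=%s\nmsg=%s\n' \
      "${BASH_REMATCH[@]:1:7}"
  else
    echo "not a glog line: $1" >&2
    return 1
  fi
}

# Hypothetical sample line, not taken from a real log.
parse_glog_line 'I20230908 10:15:30.123456 12345 main.cc:42] Server started'
```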

***

Wednesday, September 6, 2023

Introduction to Transactions in Databases

A transaction is a logical, atomic unit of work that contains one or more SQL statements.

A transaction groups SQL statements so that they are either all committed, which means they are applied to the database, or all rolled back, which means they are undone from the database. Oracle Database assigns every transaction a unique identifier called a transaction ID.

All Oracle transactions obey the basic properties of a database transaction, known as ACID properties. ACID is an acronym for the following:

Atomicity

All tasks of a transaction are performed or none of them are. There are no partial transactions. For example, if a transaction starts updating 100 rows, but the system fails after 20 updates, then the database rolls back the changes to these 20 rows.

Consistency

The transaction takes the database from one consistent state to another consistent state. For example, in a banking transaction that debits a savings account and credits a checking account, a failure must not cause the database to credit only one account, which would lead to inconsistent data.

Isolation

The effect of a transaction is not visible to other transactions until the transaction is committed. For example, one user updating the hr.employees table does not see the uncommitted changes to employees made concurrently by another user. Thus, it appears to users as if transactions are executing serially.

Durability

Changes made by committed transactions are permanent. After a transaction completes, the database ensures through its recovery mechanisms that changes from the transaction are not lost.

The use of transactions is one of the most important ways that a database management system differs from a file system.

Credits: Oracle documentation.

***

Command to get process elapsed time in Linux

Command: $ ps -p 17176 -o etime

Here, 17176 is the ID of the process of interest, which you can find with "ps -ef | grep chrome".
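
The same lookup can be wrapped in a small function for use in scripts. This is a minimal sketch; the function name `elapsed_time` is made up for illustration, and it defaults to the current shell so that it runs without a known process ID. The trailing `=` after `etime` suppresses the header line.

```shell
#!/usr/bin/env bash
# Print the elapsed running time of a process as [[dd-]hh:]mm:ss.
# Defaults to the current shell ($$) when no PID is given.
elapsed_time() {
  local pid="${1:-$$}"
  ps -p "$pid" -o etime= | tr -d ' '   # strip the padding ps adds
}

elapsed_time
```

With the example above, the call would be `elapsed_time 17176`.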

***

Using 'egrep -v' to Filter Comments and Empty Lines in Linux

Command: egrep -v "(^#.*|^$)" myfile.txt

This command prints every line of myfile.txt except those that match the regular expression, so it filters out comment lines and empty lines. Let's break down what each part does:

egrep: This command performs pattern matching using extended regular expressions and prints matching lines to the standard output. It is equivalent to grep -E; recent GNU grep releases mark egrep as obsolescent, so grep -E is the safer spelling in scripts.

-v: This option inverts the matching behavior. In other words, it tells egrep to exclude lines that match the specified pattern and print all the others.

"(^#.*|^$)": This is the regular expression pattern enclosed in double quotes. Let's break down this pattern:

(^#.*|^$): This is a logical OR (|) between two sub-patterns enclosed in parentheses.

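^#.*: This sub-pattern matches lines that start with a # character (comments). Comments indented with spaces or tabs are not matched.

^$: This sub-pattern matches empty lines, that is, lines containing no characters at all. Lines that contain only whitespace are not matched.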

So, when you run the egrep -v "(^#.*|^$)" command, it will search through the input text and exclude lines that are either comments (start with #) or empty lines. It will print all other lines from the input, effectively filtering out comments and empty lines.
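
The filter can be tried on a small inline sample. This is a minimal sketch using `grep -E`; the function name `strip_comments` and the sample input are made up for illustration. The pattern is a slightly broader variant of the one above, because it also drops indented comments and whitespace-only lines.

```shell
#!/usr/bin/env bash
# Print the input without comment lines and blank lines.
# Reads the files given as arguments, or standard input if there are none.
strip_comments() {
  grep -E -v '^[[:space:]]*(#|$)' "$@"
}

strip_comments <<'EOF'
# a comment
key1 = value1

   # an indented comment
key2 = value2
EOF
# prints:
# key1 = value1
# key2 = value2
```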

***