Tuesday, October 15, 2024

GPE down on TigerGraph DB 3.9.3-3 due to vertex deletions and schema changes

Cause: GPE did not come online after schema changes in which a few vertex types were deleted, among other changes. As a result, the GPE schema file "config.yaml" was out of sync across the nodes in the cluster.

Fixed in: 3.10.1 and later

# Location of the GPE schema file (config.yaml)

$ cd /tigergraph/data/gstore/0/part/


# Identify the latest catalog backup available on each node

$ grun_p all "ls -l $(gadmin config get System.DataRoot)/gsql/backup"
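To pick the newest backup on a node without reading the listing above by eye, a small helper like the one below can be run on each node. It is a minimal sketch that assumes the most recently modified `*.zip` in the backup directory is the latest catalog; `latest_catalog_backup` is a hypothetical name, not a TigerGraph command.

```shell
# Hypothetical helper: print the most recently modified *.zip in a backup
# directory. Assumption: newest mtime == latest catalog backup.
latest_catalog_backup() {
  local backup_dir=$1 newest
  # ls -t sorts by modification time, newest first; errors (no match) are hidden.
  newest=$(ls -1t "$backup_dir"/*.zip 2>/dev/null | head -n 1)
  if [ -z "$newest" ]; then
    echo "no catalog backups found in $backup_dir" >&2
    return 1
  fi
  echo "$newest"
}

# Usage on one node:
# latest_catalog_backup "$(gadmin config get System.DataRoot)/gsql/backup"
```

Comparing the printed file's timestamp across nodes then shows which node holds the newest catalog.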


# Find the latest catalog zip file on any node in the cluster and fetch it to a central location, such as node m1 (the example below fetches it from m3).

$ LATEST_CATALOG_BACKUP_NAME=<Catalog file>.zip

$ mkdir /tmp/latest-catalog-bak

$ gfetch m3 $(gadmin config get System.DataRoot)/gsql/backup/$LATEST_CATALOG_BACKUP_NAME /tmp/latest-catalog-bak

$ unzip /tmp/latest-catalog-bak/$LATEST_CATALOG_BACKUP_NAME -d /tmp/latest-catalog-bak/


# Verify the graph config versions in the catalog pulled above.

$ cd /tmp/latest-catalog-bak/catalog.x

$ grep "GraphConfigVersion:" 0/GraphCatalog.yaml

$ grep "GraphConfigVersion:" 1/GraphCatalog.yaml


## Similarly, check the other latest catalogs for the correct version.
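When a catalog has more than two numbered subdirectories, the per-file `grep` calls above can be looped. This is a sketch assuming the layout shown above (`catalog.x/0/GraphCatalog.yaml`, `catalog.x/1/GraphCatalog.yaml`, ...); `print_graph_config_versions` is a hypothetical helper name.

```shell
# Hypothetical helper: print "<dir>: <GraphConfigVersion line>" for every
# numbered subdirectory of an extracted catalog (layout assumed from above).
print_graph_config_versions() {
  local catalog_dir=$1 f
  for f in "$catalog_dir"/[0-9]*/GraphCatalog.yaml; do
    [ -f "$f" ] || continue    # glob matched nothing
    # Prefix each result with its directory name (0, 1, ...).
    printf '%s: ' "$(basename "$(dirname "$f")")"
    grep -m 1 "GraphConfigVersion:" "$f" || echo "(no GraphConfigVersion found)"
  done
}

# Usage: print_graph_config_versions /tmp/latest-catalog-bak/catalog.x
```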


# Check the GPE schema files in the data directory

$ cd /tigergraph/data/gstore/0/part

$ view config.yaml

$ view config.yaml.old

$ view config.yaml.staging.old
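Because the root cause is config.yaml drifting between nodes, comparing checksums across the cluster is a quick way to confirm whether the copies differ. The sketch below reads `md5sum` output on stdin and succeeds only if all checksums match; `checksums_match` is hypothetical, and it assumes the per-node headers grun_p prints contain no 32-hex-digit tokens.

```shell
# Hypothetical helper: read md5sum output (for example, from grun_p) on stdin
# and succeed only if every 32-hex-digit checksum in it is identical.
# Assumption: grun_p's per-node headers contain no 32-hex-digit tokens.
checksums_match() {
  local distinct
  distinct=$(grep -oE '[0-9a-f]{32}' | sort -u | wc -l | tr -d ' ')
  if [ "$distinct" -eq 0 ]; then
    echo "no checksums found in input" >&2
    return 1
  elif [ "$distinct" -eq 1 ]; then
    echo "config.yaml is identical on all nodes"
  else
    echo "config.yaml differs across nodes ($distinct distinct checksums)" >&2
    return 1
  fi
}

# Usage on the cluster:
# grun_p all "md5sum $(gadmin config get System.DataRoot)/gstore/0/part/config.yaml" | checksums_match
```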


# In the GSQL logs, check for successful schema changes

$ grun_p all "grep -i 'RunSchemaChange.*succeed' /tigergraph/logs/gsql/*"


# Check the GPE logs, specifically the .out files, for schema changes.

$ cd /tigergraph/logs/gpe

$ ls -ltrh *.out


# Verify a specific graph schema version (304 in this example).

$ cat *.out | grep -i "Graph schema version 304"

$ grep -r "Graph schema version 304" *.out
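Instead of searching for one known version, the highest schema version GPE has reported can be pulled out of the logs directly. This is a sketch that assumes the log message format matches the "Graph schema version 304" string searched for above; `latest_gpe_schema_version` is a hypothetical helper.

```shell
# Hypothetical helper: print the highest "Graph schema version N" found in
# the GPE .out logs of a directory (case-insensitive, like the grep above).
latest_gpe_schema_version() {
  local log_dir=$1
  grep -iohE 'graph schema version [0-9]+' "$log_dir"/*.out 2>/dev/null \
    | awk '{ print $4 }' \
    | sort -n \
    | tail -n 1
}

# Usage: latest_gpe_schema_version /tigergraph/logs/gpe
```

Empty output means no matching line was found in any `.out` file.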


# Update config.yaml and sync it across all nodes.


Now, update the catalog


$ LATEST_CATALOG_BACKUP_PATH=/tmp/latest-catalog-bak/catalog.20240920-102901

$ GSQL_BIN=$(gadmin config get System.AppRoot --file ~/.tg.cfg)/dev/gdk/gsql/lib/.tg_dbs_gsqld.jar

$ GPE_SCHEMA_PATH=$(gadmin config get System.DataRoot)/gstore/0/part

$ GPE_SCHEMA_META_FILE=$GPE_SCHEMA_PATH/config.yaml

$ $GSQL_BIN \
    -r 1 \
    --verifydict $LATEST_CATALOG_BACKUP_PATH \
    --gpeschema $GPE_SCHEMA_META_FILE



Expected output:

======= UPGRADE_OLD_VERSION: null =======

Successfully finished verifying catalog.
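To avoid restarting the cluster when verification did not succeed, the verification command can be wrapped so the restart runs only after the success message appears. This is a minimal sketch; `verify_catalog_ok` is hypothetical, and it assumes a successful run both exits 0 and prints the message shown above.

```shell
# Hypothetical wrapper: run the verification command given as arguments,
# print its output, and succeed only if it exited 0 AND printed the
# success message shown above.
verify_catalog_ok() {
  local output status=0
  output=$("$@" 2>&1) || status=$?
  printf '%s\n' "$output"
  [ "$status" -eq 0 ] &&
    printf '%s\n' "$output" | grep -qF "Successfully finished verifying catalog."
}

# Usage (variables as defined in the steps above):
# verify_catalog_ok $GSQL_BIN -r 1 \
#   --verifydict $LATEST_CATALOG_BACKUP_PATH \
#   --gpeschema $GPE_SCHEMA_META_FILE && gadmin restart all -y
```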

# Restart all services

$ gadmin restart all -y
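After the restart, GPE can take a while to come back. A generic retry loop lets you wait for it instead of re-checking by hand. This is a sketch: `wait_for` is a hypothetical helper, and the usage line assumes `gadmin status gpe` prints "Online" once GPE is up, which should be confirmed against your version's output.

```shell
# Hypothetical helper: retry a command every <interval> seconds until it
# succeeds or roughly <timeout> seconds of waiting have elapsed.
wait_for() {
  local timeout=$1 interval=$2 waited=0
  shift 2
  [ "$interval" -gt 0 ] || interval=1   # avoid a busy loop
  until "$@"; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out after ${timeout}s waiting for: $*" >&2
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
}

# Assumed usage (assumes `gadmin status gpe` reports "Online" when GPE is up):
# wait_for 300 10 sh -c 'gadmin status gpe | grep -q Online'
```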


***