Monitoring Metrics
You can check the status and available resources of a cluster and learn about its real-time resource consumption through the GaussDB(DWS) monitoring items.
Table 1 describes GaussDB(DWS) monitoring metrics.
Monitored Object | Metric | Description | Value Range | Monitoring Period (Raw Data) |
---|---|---|---|---|
Cluster Overview | Cluster Status | Status of a cluster. | Normal/Abnormal/Degraded | 30s |
Cluster Overview | Nodes | Number of available nodes and total number of nodes (Available/Total) in a cluster. | >= 0 | 60s |
Cluster Overview | CNs | Number of CNs in a cluster. | >= 0 | 60s |
Cluster Overview | Databases | Number of created databases in a cluster. | >= 0 | 90s |
Resource Consumption | CPU Usage | Average real-time CPU usage of all nodes in a cluster. | 0% to 100% | 30s |
Resource Consumption | Memory Usage | Average real-time memory usage of all nodes in a cluster. | 0% to 100% | 30s |
Resource Consumption | Disk Usage | Average real-time disk usage of all nodes in a cluster. | 0% to 100% | 30s |
Resource Consumption | Disk I/O | Average real-time disk I/O of all nodes in a cluster. | >= 0 KB/s | 30s |
Resource Consumption | Network I/O | Average real-time network I/O of all NICs in a cluster. | >= 0 KB/s | 30s |
Top 5 Time-Consuming Queries | Query ID | ID of a query, which is automatically generated by the database. | >= 0 | 180s |
Top 5 Time-Consuming Queries | SQL Statement | Query statement executed by a user. | String | 180s |
Top 5 Time-Consuming Queries | Execution Time | Execution time of a query statement (unit: ms). | >= 0 ms | 180s |
Top 5 Queries with Most Data Written to Disk | Query ID | ID of a query, which is automatically generated by the database. | >= 0 | 180s |
Top 5 Queries with Most Data Written to Disk | SQL Statement | Query statement executed by a user. | String | 180s |
Top 5 Queries with Most Data Written to Disk | Data Written to Disk | Data to be written to disks after a user runs a statement (unit: MB). | >= 0 MB | 180s |
Cluster Resource Metrics | CPU Usage | Average CPU usage of all nodes in a cluster. | 0% to 100% | 30s |
Cluster Resource Metrics | Memory Usage | Average memory usage of all nodes in a cluster. | 0% to 100% | 30s |
Cluster Resource Metrics | Disk Usage | Average usage of all disks in a cluster. | 0% to 100% | 30s |
Cluster Resource Metrics | Disk I/O Usage | Average I/O usage of all disks in a cluster. | 0% to 100% | 30s |
Cluster Resource Metrics | Network I/O Usage | Average I/O usage of all NICs in a cluster. | 0% to 100% | 30s |
Key Database Metrics | Cluster Status | Cluster running status. | Normal/Degraded/Abnormal | 30s |
Key Database Metrics | Cluster Abnormal CNs | Number of abnormal CNs in the cluster. | >= 0 | 60s |
Key Database Metrics | Cluster Read-only | Whether the cluster is in the read-only state. | Yes/No | 30s |
Key Database Metrics | Concurrent Sessions | Number of concurrent sessions in a cluster within a specified period. | >= 0 | 30s |
Key Database Metrics | Concurrent Queries | Number of concurrent queries in a cluster within a specified period. | >= 0 | 30s |
Node Monitoring-Overview | Node Name | Name of a node in a cluster. | String | 30s |
Node Monitoring-Overview | CPU Usage | CPU usage of a host. | 0% to 100% | 30s |
Node Monitoring-Overview | Memory Usage | Memory usage of a host. | 0% to 100% | 30s |
Node Monitoring-Overview | Average Disk Usage (%) | Disk usage of a host. | 0% to 100% | 30s |
Node Monitoring-Overview | IP Address | Service IP address of a host. | String | 30s |
Node Monitoring-Overview | Disk I/O | Disk I/O of a host (unit: KB/s). | >= 0 KB/s | 30s |
Node Monitoring-Overview | TCP Protocol Stack Retransmission Rate | Retransmission rate of TCP packets per unit time. | 0% to 100% | 30s |
Node Monitoring-Overview | Status | Running status of a host. | Online/Offline | 30s |
Node Monitoring-Disks | Node Name | Name of a node in a cluster. | String | 30s |
Node Monitoring-Disks | Disk Name | Name of a disk on a host. | String | 30s |
Node Monitoring-Disks | Disk Capacity | Disk capacity of a host (unit: GB). | >= 0 GB | 30s |
Node Monitoring-Disks | Disk Usage | Disk usage of a host. | 0% to 100% | 30s |
Node Monitoring-Disks | Disk Read Rate | Disk read rate of a host (unit: KB/s). | >= 0 KB/s | 30s |
Node Monitoring-Disks | Disk Write Rate | Disk write rate of a host (unit: KB/s). | >= 0 KB/s | 30s |
Node Monitoring-Disks | I/O Wait Time (await, ms) | Average waiting time for each I/O request (unit: ms). | >= 0 ms | 30s |
Node Monitoring-Disks | I/O Service Time (svctm, ms) | Average processing time for each I/O request (unit: ms). | >= 0 ms | 30s |
Node Monitoring-Disks | I/O Utility (util, %) | Disk I/O usage of a host. | 0% to 100% | 30s |
Node Monitoring-Network | Node Name | Name of a node in a cluster. | String | 30s |
Node Monitoring-Network | NIC Name | Name of a NIC on a host. | String | 30s |
Node Monitoring-Network | NIC Status | NIC status. | up/down | 30s |
Node Monitoring-Network | NIC Speed | Working rate of a NIC (unit: Mbit/s). | >= 0 | 30s |
Node Monitoring-Network | Received Packets | Number of packets received by a NIC. | >= 0 | 30s |
Node Monitoring-Network | Sent Packets | Number of packets sent by a NIC. | >= 0 | 30s |
Node Monitoring-Network | Lost Packets Received | Number of packets lost by a NIC on the receive side. | >= 0 | 30s |
Node Monitoring-Network | Receive Rate | Number of bytes received by a NIC per unit of time (unit: KB/s). | >= 0 KB/s | 30s |
Node Monitoring-Network | Transmit Rate | Number of bytes sent by a NIC per unit of time (unit: KB/s). | >= 0 KB/s | 30s |
Database Monitoring | Database Name | Name of the database created by a user in a cluster. | String | 60s |
Database Monitoring | Usage | Used capacity of the current database (unit: GB). | >= 0 GB | 86400s |
Database Monitoring | Users | Number of users in the current database. | >= 0 | 30s |
Database Monitoring | Sessions | Number of sessions in the current database. | >= 0 | 30s |
Database Monitoring | Applications | Number of applications in the current database. | >= 0 | 30s |
Database Monitoring | Queries | Number of active queries in the current database. | >= 0 | 30s |
Database Monitoring | Scanning Rows | Number of rows returned by the full table scan query in the current database. | >= 0 | 60s |
Database Monitoring | Index Query Rows | Number of rows returned by the index query in the current database. | >= 0 | 60s |
Database Monitoring | Inserted Rows | Number of rows inserted in the current database. | >= 0 | 60s |
Database Monitoring | Updated Rows | Number of rows updated in the current database. | >= 0 | 60s |
Database Monitoring | Deleted Rows | Number of rows deleted from the current database. | >= 0 | 60s |
Database Monitoring | Executed Transactions | Number of transaction executions on the current database. | >= 0 | 60s |
Database Monitoring | Transaction Rollbacks | Number of transactions in the current database that have been rolled back. | >= 0 | 60s |
Database Monitoring | Deadlocks | Number of deadlocks detected in the current database. | >= 0 | 60s |
Database Monitoring | Temporary Files | Number of temporary files created in the current database. | >= 0 | 60s |
Database Monitoring | Temporary File Capacity | Size of temporary files written by the current database (unit: GB). | >= 0 GB | 60s |
Performance Monitoring | Cluster CPU Usage | Average CPU usage of all nodes in a cluster. | 0% to 100% | 30s |
Performance Monitoring | Cluster Memory Usage | Average memory usage of all nodes in a cluster. | 0% to 100% | 30s |
Performance Monitoring | Cluster Disk Usage | Average disk usage of all nodes in a cluster. | 0% to 100% | 30s |
Performance Monitoring | Cluster Disk I/O | Average I/O usage of all disks in a cluster. | 0% to 100% | 30s |
Performance Monitoring | Cluster Network I/O | Average I/O usage of all NICs in a cluster. | 0% to 100% | 30s |
Performance Monitoring | Cluster Status | Historical trend of the cluster status. | Normal/Abnormal/Degraded | 30s |
Performance Monitoring | Cluster Read-only | Historical trend of the cluster read-only status. | Yes/No | 30s |
Performance Monitoring | Cluster Abnormal CNs | Historical trend of the number of abnormal CNs in the cluster. | >= 0 | 60s |
Performance Monitoring | Cluster Abnormal DNs | Historical trend of the number of abnormal DNs in the cluster. | >= 0 | 60s |
Performance Monitoring | Cluster CPU Usage of DNs | Average CPU usage of all DNs in a cluster. | 0% to 100% | 60s |
Performance Monitoring | Cluster Sessions | Historical trend of the number of sessions in a cluster. | >= 0 | 30s |
Performance Monitoring | Cluster Queries | Historical trend of the number of queries in a cluster. | >= 0 | 30s |
Performance Monitoring | Cluster Deadlocks | Historical trend of the number of deadlocks in a cluster. | >= 0 | 60s |
Performance Monitoring | Cluster TPS | Average number of transactions per second of all databases in a cluster. Formula: (delta_xact_commit + delta_xact_rollback)/current_collect_rate | >= 0 | 60s |
Performance Monitoring | Cluster QPS | Average number of concurrent requests per second of all databases in a cluster. Formula: delta_query_count/current_collect_rate | >= 0 | 60s |
Performance Monitoring | Database Sessions | Historical trend of the number of sessions on a single database in a cluster. | >= 0 | 30s |
Performance Monitoring | Database Queries | Historical trend of the number of queries on a single database in a cluster. | >= 0 | 30s |
Performance Monitoring | Database Inserted Rows | Historical trend of the number of rows inserted into a single database in a cluster. | >= 0 | 60s |
Performance Monitoring | Database Updated Rows | Historical trend of the number of updated rows in a single database in a cluster. | >= 0 | 60s |
Performance Monitoring | Database Deleted Rows | Historical trend of the number of deleted rows in a single database in a cluster. | >= 0 | 60s |
Performance Monitoring | Database Capacity | Historical trend of the capacity of a single database in a cluster. | >= 0 | 86400s |
Live Session | Session ID | ID of the current session (query thread ID). | String | 30s |
Live Session | User Name | Name of the user who executes the current session. | String | 30s |
Live Session | Database Name | Name of the database connected to the current session. | String | 30s |
Live Session | Session Duration | Duration of the current session (unit: ms). | >= 0 ms | 30s |
Live Session | Application Name | Name of the application that creates the current session. | String | 30s |
Live Session | Queries | Number of SQL statements executed in the current session. | >= 0 | 30s |
Live Session | Latest Query Duration | Duration for executing the previous SQL statement in the current session (unit: ms). | >= 0 ms | 30s |
Live Session | Client IP Address | IP address of the client that initiates the current session. | String | 30s |
Live Session | Connected CN | Connected CN of the current session. | String | 30s |
Live Session | Session Status | Execution status of the current session. | Running/Idle/Retry | 30s |
Real-Time Query | Query ID | Query ID of a current query statement, which is a unique identifier allocated by the kernel to each query statement. | String | 30s |
Real-Time Query | User Name | Name of the user who submits the current query statement. | String | 30s |
Real-Time Query | Database Name | Name of the database corresponding to the current query statement. | String | 30s |
Real-Time Query | Application Name | Name of the application corresponding to the current query statement. | String | 30s |
Real-Time Query | Resource Pool | Name of the resource pool for the current query statement. | String | 30s |
Real-Time Query | Submitted | Timestamp when the current query statement is submitted. | String | 30s |
Real-Time Query | Blocking Time | Waiting time before the current query statement is executed (unit: ms). | >= 0 ms | 30s |
Real-Time Query | Execution Time | Execution time of the current query statement (unit: ms). | >= 0 ms | 30s |
Real-Time Query | CPU Time | Total CPU time spent by the current query statement on all DNs (unit: ms). | >= 0 ms | 30s |
Real-Time Query | CPU Time Skew | CPU time skew of the current query statement among all DNs. | 0% to 100% | 30s |
Real-Time Query | Statement | Query statement that is being executed. | String | 30s |
Real-Time Query | Connected CN | Name of the CN that submits the current query statement. | String | 30s |
Real-Time Query | Client IP Address | IP address of the client that submits the current query statement. | String | 30s |
Real-Time Query | Lane | Lane where the current query statement is located. | Fast Lane/Slow Lane | 30s |
Real-Time Query | Query Status | Query status of the statement that is being executed. | String | 30s |
Real-Time Query | Session ID | Session ID of the current query statement, which is a unique identifier allocated by the kernel to each client connection. | String | 30s |
Real-Time Query | Queuing Status | Status of the current query execution in the database, indicating whether the query is queued in the resource pool. | Yes/No | 30s |
Historical Query | Query ID | Query ID of a query statement, which is a unique identifier allocated by the kernel to each query statement. | String | 180s |
Historical Query | User Name | Name of the user who submits a query statement. | String | 180s |
Historical Query | Application Name | Application name corresponding to a query statement. | String | 180s |
Historical Query | Database Name | Name of the database corresponding to a query statement. | String | 180s |
Historical Query | Resource Pool | Name of the resource pool for a query statement. | String | 180s |
Historical Query | Submitted | Timestamp when a query statement is submitted. | String | 180s |
Historical Query | Blocking Time | Waiting time before a query statement is executed (unit: ms). | >= 0 ms | 180s |
Historical Query | Execution Time | Execution time of a query statement (unit: ms). | >= 0 ms | 180s |
Historical Query | CPU Time | Total CPU time spent by a query statement on all DNs (unit: ms). | >= 0 ms | 180s |
Historical Query | CPU Time Skew | CPU time skew of a query statement executed on all DNs. | 0% to 100% | 180s |
Historical Query | Statement | Query statements to be parsed. | String | 180s |
Slow Instance Monitoring | Slow Instance | Number of slow instances detected at the current time point. | >= 0 | 240s |
Slow Instance Monitoring | Detected | Time when a slow instance is detected for the first time. | String | 240s |
Slow Instance Monitoring | Node Name | Name of the node where the slow instance is deployed. | String | 240s |
Slow Instance Monitoring | Instance | Name of an instance. | String | 240s |
Slow Instance Monitoring | Slow Node Detections (within 24 hours) | Number of times that a slow instance is detected within 24 hours. | >= 0 | 240s |
Resource Pool Monitoring | Resource Pool | Name of a resource pool in a cluster. | String | 120s |
Resource Pool Monitoring | CPU Usage | Real-time CPU usage of a resource pool. | 0% to 100% | 120s |
Resource Pool Monitoring | CPU Resource | CPU usage quota of a resource pool. | 0% to 100% | 120s |
Resource Pool Monitoring | Real-Time Concurrent Short Queries | Simple concurrency in a resource pool. | >= 0 | 120s |
Resource Pool Monitoring | Concurrent Short Queries | Quota for simple concurrency in a resource pool. | >= 0 | 120s |
Resource Pool Monitoring | Real-Time Concurrent Queries | Real-time complex concurrency in a resource pool. | >= 0 | 120s |
Resource Pool Monitoring | Query Concurrency | Quota for complex concurrency in a resource pool. | >= 0 | 120s |
Resource Pool Monitoring | Storage | Storage resource quota of a resource pool. | >= 0 | 120s |
Resource Pool Monitoring | Disk Usage | Disk usage of a resource pool. | 0% to 100% | 120s |
Resource Pool Monitoring | Memory | Memory quota of a resource pool. | >= 0 | 120s |
Resource Pool Monitoring | Memory Usage | Memory usage of a resource pool. | 0% to 100% | 120s |
Queries Waiting in a Resource Pool | User | Name of the user of the waiting query. | String | 120s |
Queries Waiting in a Resource Pool | Application | Name of the application of the waiting query. | String | 120s |
Queries Waiting in a Resource Pool | Database | Name of the database of the waiting query. | String | 120s |
Queries Waiting in a Resource Pool | Queuing Status | Execution status of a query in the database (CCN/CN/DN). | String | 120s |
Queries Waiting in a Resource Pool | Wait Time | Waiting time of the waiting query (unit: ms). | >= 0 ms | 120s |
Queries Waiting in a Resource Pool | Resource Pool | Resource pool of the waiting query. | String | 120s |
Queries Waiting in a Resource Pool | Statement | Statement of the waiting query. | String | 120s |
Circuit Breaking Queries | Query ID | Query ID of the circuit breaking query statement. | String | 120s |
Circuit Breaking Queries | Query Statement | Query statement that triggers circuit breaking. | String | 120s |
Circuit Breaking Queries | Blocking Time | Blocking time before the query statement triggers circuit breaking (unit: ms). | >= 0 ms | 120s |
Circuit Breaking Queries | Execution Time | Execution time before the query statement triggers circuit breaking (unit: ms). | >= 0 ms | 120s |
Circuit Breaking Queries | CPU Time | Average CPU time consumed by each DN before the query statement triggers circuit breaking (unit: ms). | >= 0 ms | 120s |
Circuit Breaking Queries | CPU Skew | Skew rate of CPU time consumed by each DN before the query statement triggers circuit breaking. | 0% to 100% | 120s |
Circuit Breaking Queries | Exception Handling | Handling method after the query statement triggers circuit breaking. | Abort/Degrade | 120s |
Circuit Breaking Queries | Status | Circuit breaking handling status of a query statement. | Executing/Completed | 120s |
SQL Tuning | Query ID | ID of the current query (query logic ID). | String | 180s |
SQL Tuning | Database | Name of the database where the current query is executed. | String | 180s |
SQL Tuning | Schema Name | Name of the current query schema. | String | 180s |
SQL Tuning | User Name | Name of the user who performs the query. | String | 180s |
SQL Tuning | Client | Name of the client that initiates the current query. | String | 180s |
SQL Tuning | Client IP Address | IP address of the client that initiates the current query. | String | 180s |
SQL Tuning | Running Time | Execution time of the current query (unit: ms). | >= 0 ms | 180s |
SQL Tuning | CPU Time | CPU time of the current query (unit: ms). | >= 0 ms | 180s |
SQL Tuning | Scale-Out Started | Start time of the current query. | Timestamp | 180s |
SQL Tuning | Completed | End time of the current query. | Timestamp | 180s |
SQL Tuning | Details | Details about the current query. | String | 180s |
INODE | Inode Usage | Disk inode usage. | 0% to 100% | 30s |
SCHEMA | Schema Usage | Database schema usage. | 0% to 100% | 3600s |
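
The Cluster TPS and Cluster QPS formulas in the table can be illustrated with a short Python sketch. This is not the service's implementation. It assumes that `current_collect_rate` is the collection interval in seconds (60s for these metrics) and that the deltas are differences between two consecutive readings of cumulative counters. The `CounterSample` class and the function names are hypothetical and exist only for this example.

```python
from dataclasses import dataclass


@dataclass
class CounterSample:
    """Cumulative counters at one collection point (hypothetical structure)."""
    xact_commit: int    # committed transactions so far
    xact_rollback: int  # rolled-back transactions so far
    query_count: int    # queries executed so far


def _check_interval(collect_interval_s: float) -> None:
    # Assumption: current_collect_rate is the collection interval in seconds.
    if collect_interval_s <= 0:
        raise ValueError("collection interval must be positive")


def cluster_tps(prev: CounterSample, curr: CounterSample,
                collect_interval_s: float) -> float:
    """(delta_xact_commit + delta_xact_rollback) / current_collect_rate"""
    _check_interval(collect_interval_s)
    delta_commit = curr.xact_commit - prev.xact_commit
    delta_rollback = curr.xact_rollback - prev.xact_rollback
    return (delta_commit + delta_rollback) / collect_interval_s


def cluster_qps(prev: CounterSample, curr: CounterSample,
                collect_interval_s: float) -> float:
    """delta_query_count / current_collect_rate"""
    _check_interval(collect_interval_s)
    return (curr.query_count - prev.query_count) / collect_interval_s


# Two made-up samples taken 60 seconds apart.
prev = CounterSample(xact_commit=1000, xact_rollback=10, query_count=5000)
curr = CounterSample(xact_commit=1540, xact_rollback=70, query_count=6200)

print(cluster_tps(prev, curr, 60))  # (540 + 60) / 60 = 10.0
print(cluster_qps(prev, curr, 60))  # 1200 / 60 = 20.0
```

Both values are averages over one collection window, so a burst shorter than the window is smoothed out in the reported figure.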