Monitoring Clusters Using Cloud Eye

Function

This section describes how to check cluster metrics on Cloud Eye. By monitoring cluster metrics, you can identify when a database cluster became abnormal and then use the database logs to analyze the activity that may have caused the problem, which helps you improve database performance. This section also lists the metrics that Cloud Eye can monitor, along with their namespace and dimensions. You can use the Cloud Eye management console or APIs to query the monitoring metrics and alarms generated by GaussDB(DWS). For details, see the User Guide and API Reference of Cloud Eye.

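This section is organized as follows: Namespace, Cluster Monitoring Metrics, Dimensions, Cluster and Node Monitoring Information, Comparing the Monitoring Metrics of Multiple Nodes, and Creating Alarm Rules.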

Namespace

SYS.DWS

Cluster Monitoring Metrics

With the GaussDB(DWS) monitoring metrics provided by Cloud Eye, you can obtain information about the running status and performance of a cluster, down to the level of individual nodes.

Table 1 describes GaussDB(DWS) monitoring metrics.

Table 1 GaussDB(DWS) monitoring metrics

| Metric ID | Name | Description | Value Range | Monitored Object | Monitoring Period (Raw Data) |
| --- | --- | --- | --- | --- | --- |
| dws001_shared_buffer_hit_ratio | Cache Hit Ratio | Percentage of data volume obtained from memory | 0% to 100% | Data warehouse cluster | 4 minutes |
| dws002_in_memory_sort_ratio | In-memory Sort Ratio | Percentage of data volume that is sorted in memory | 0% to 100% | Data warehouse cluster | 4 minutes |
| dws003_physical_reads | File Reads | Total number of database file reads | > 0 | Data warehouse cluster | 4 minutes |
| dws004_physical_writes | File Writes | Total number of database file writes | > 0 | Data warehouse cluster | 4 minutes |
| dws005_physical_reads_per_second | File Reads per Second | Number of database file reads per second | >= 0 | Data warehouse cluster | 4 minutes |
| dws006_physical_writes_per_second | File Writes per Second | Number of database file writes per second | >= 0 | Data warehouse cluster | 4 minutes |
| dws007_db_size | Data Volume | Total size of data in the database, in MB | >= 0 MB | Data warehouse cluster | 4 minutes |
| dws008_active_sql_count | Active SQL Count | Number of active SQL statements in the database | >= 0 | Data warehouse cluster | 4 minutes |
| dws009_session_count | Session Count | Number of sessions that access the database | >= 0 | Data warehouse cluster | 4 minutes |
| dws010_cpu_usage | CPU Usage | CPU usage of each node in the cluster, as a percentage | 0% to 100% | Data warehouse node | 1 minute |
| dws011_mem_usage | Memory Usage | Memory usage of each node in the cluster, as a percentage | 0% to 100% | Data warehouse node | 1 minute |
| dws012_iops | IOPS | Number of I/O requests processed by each node in the cluster per second | >= 0 | Data warehouse node | 1 minute |
| dws013_bytes_in | Network Input Throughput | Data input to each node in the cluster per second over the network, in byte/s | >= 0 bytes/s | Data warehouse node | 1 minute |
| dws014_bytes_out | Network Output Throughput | Data sent to the network per second from each node in the cluster, in byte/s | >= 0 bytes/s | Data warehouse node | 1 minute |
| dws015_disk_usage | Disk Usage | Disk usage of each node in the cluster, as a percentage | 0% to 100% | Data warehouse node | 1 minute |
| dws016_disk_total_size | Total Disk Size | Total disk space of each node in the cluster, in GB | 100 to 2000 GB | Data warehouse node | 1 minute |
| dws017_disk_used_size | Used Disk Space | Used disk space of each node in the cluster, in GB | 0 to 3600 GB | Data warehouse node | 1 minute |
| dws018_disk_read_throughput | Disk Read Throughput | Data volume read from each disk in the cluster per second, in byte/s | >= 0 bytes/s | Data warehouse node | 1 minute |
| dws019_disk_write_throughput | Disk Write Throughput | Data volume written to each disk in the cluster per second, in byte/s | >= 0 bytes/s | Data warehouse node | 1 minute |
| dws020_avg_disk_sec_per_read | Average Time per Disk Read | Average time taken by each disk read, in seconds | > 0 s | Data warehouse node | 1 minute |
| dws021_avg_disk_sec_per_write | Average Time per Disk Write | Average time taken by each disk write, in seconds | > 0 s | Data warehouse node | 1 minute |
| dws022_avg_disk_queue_length | Average Disk Queue Length | Average I/O queue length of a disk | >= 0 | Data warehouse node | 1 minute |
| dws_024_dn_diskio_util | DN I/O usage | Average disk I/O usage of DNs in a cluster | 0% to 100% | Data warehouse instance | 1 minute |
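
The monitored object and raw-data period in Table 1 follow the metric numbering, so they can be summarized in a small lookup. The following Python sketch is illustrative only: the function name describe_metric is hypothetical, the numeric ranges are read off Table 1 rather than taken from any Cloud Eye API, and the dimension keys are those listed under Dimensions. The metric dws_024_dn_diskio_util uses a different ID pattern and monitored object, so the sketch deliberately does not cover it.

```python
import re

# Assumption: these ranges are read off Table 1, not published as an API contract.
CLUSTER_METRICS = range(1, 10)   # dws001 to dws009, monitored per cluster
NODE_METRICS = range(10, 23)     # dws010 to dws022, monitored per node


def describe_metric(metric_id: str) -> dict:
    """Return monitored object, dimension key, and raw-data period for a metric ID."""
    match = re.match(r"dws(\d{3})_", metric_id)
    if match is None:
        raise ValueError(f"unrecognized metric ID: {metric_id}")
    number = int(match.group(1))
    if number in CLUSTER_METRICS:
        return {
            "object": "Data warehouse cluster",
            "dimension": "datastore_id",
            "period_seconds": 4 * 60,
        }
    if number in NODE_METRICS:
        return {
            "object": "Data warehouse node",
            "dimension": "dws_instance_id",
            "period_seconds": 60,
        }
    raise ValueError(f"metric ID outside the ranges covered here: {metric_id}")


print(describe_metric("dws010_cpu_usage"))
```

A lookup like this is mainly useful for choosing the right dimension key and a sensible polling interval before querying a metric.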

Dimensions

| Key | Value |
| --- | --- |
| datastore_id | Data warehouse cluster ID |
| dws_instance_id | Data warehouse node ID |
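
To show how the namespace, metric ID, and dimension fit together in a query, the following Python sketch assembles a request URL without sending it. Everything specific to the API here is an assumption to be checked against the Cloud Eye API Reference: the path /V1.0/{project_id}/metric-data, the query parameter names, and the "key,value" dimension format. The endpoint, project ID, and cluster ID are hypothetical placeholders.

```python
from urllib.parse import urlencode

NAMESPACE = "SYS.DWS"  # namespace given in this section


def build_metric_query(endpoint, project_id, metric_name, dim_key, dim_value,
                       start_ms, end_ms, period, stat):
    """Build a metric-data query URL (assumed path and parameter names)."""
    params = [
        ("namespace", NAMESPACE),
        ("metric_name", metric_name),
        ("dim.0", f"{dim_key},{dim_value}"),  # assumed "key,value" format
        ("from", start_ms),                   # start time, epoch milliseconds
        ("to", end_ms),                       # end time, epoch milliseconds
        ("period", period),                   # aggregation period (assumed unit: seconds)
        ("filter", stat),                     # aggregation method, for example "average"
    ]
    return f"{endpoint}/V1.0/{project_id}/metric-data?{urlencode(params)}"


# Hypothetical values for illustration only.
url = build_metric_query(
    endpoint="https://ces.example.com",
    project_id="project-id",
    metric_name="dws001_shared_buffer_hit_ratio",
    dim_key="datastore_id",
    dim_value="cluster-id",
    start_ms=1700000000000,
    end_ms=1700003600000,
    period=300,
    stat="average",
)
print(url)
```

For a node-level metric such as dws010_cpu_usage, the dimension key would be dws_instance_id and the value a node ID.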

Cluster and Node Monitoring Information

  1. Log in to the GaussDB(DWS) management console.

  2. View the cluster information. In the cluster list, click View Metric in the Operation column of the target cluster. The Cloud Eye management console opens and displays the cluster monitoring information by default.

    You can also select a specific monitoring metric and time range to view the performance curve.

  3. View the node information. Click image1 to return to the Cloud Eye management console. On the Data Warehouse Nodes tab page in the right pane, you can view metrics of each node in the cluster.

    You can also select a specific monitoring metric and time range to view the performance curve.

    Cloud Eye can also compare the monitoring metrics of multiple nodes. For details, see Comparing the Monitoring Metrics of Multiple Nodes.

Comparing the Monitoring Metrics of Multiple Nodes

  1. In the left navigation pane of the Cloud Eye management console, choose Dashboard > Panels.

  2. On the page that is displayed, click Create Panel. In the displayed dialog box, enter the name and click OK.

  3. Click Add Graph in the upper right corner.

  4. In the displayed dialog box, configure the title and monitoring metrics.

    Note

    You can add multiple monitoring metrics by clicking Add Metric.

    The following example shows how to set the parameters to compare the CPU usage of multiple nodes.

    Table 2 Configuration example

    | Parameter | Example Value |
    | --- | --- |
    | Resource Type | DWS |
    | Dimension | Data Warehouse Node |
    | Monitored Object | dws-demo-dws-cn-cn-2-1, dws-demo-dws-cn-cn-1-1, dws-demo-dws-dn-1-1 |
    | Metric | CPU Usage |

  5. Click OK.

    You can then view the corresponding monitoring graph on the Panels page. Move the cursor over the graph and click image2 in the upper right corner to zoom in on the graph and view detailed metric comparison data.
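
The same kind of comparison can be done offline once the metric data has been exported. The following Python sketch ranks nodes by average CPU usage. It is an illustration only: the function name compare_nodes is hypothetical, and the sample values are made up for the example, not measured data. Only the node names are taken from Table 2.

```python
from statistics import mean


def compare_nodes(series_by_node):
    """Return (node, average, peak) tuples sorted by average usage, highest first."""
    summary = [
        (node, mean(values), max(values))
        for node, values in series_by_node.items()
    ]
    return sorted(summary, key=lambda row: row[1], reverse=True)


# Made-up CPU usage samples (percent), one value per monitoring period.
samples = {
    "dws-demo-dws-cn-cn-1-1": [35.0, 40.0, 45.0],
    "dws-demo-dws-dn-1-1": [70.0, 80.0, 90.0],
}

for node, average, peak in compare_nodes(samples):
    print(f"{node}: average {average:.1f}%, peak {peak:.1f}%")
```

Sorting by average puts the busiest node first, which is usually the one to inspect when load is unevenly distributed.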

Creating Alarm Rules

By setting GaussDB(DWS) alarm rules, you can customize the monitored objects and notification policies and check the running status of GaussDB(DWS) at any time.

A GaussDB(DWS) alarm rule includes the alarm rule name, monitored object, metric, threshold, monitoring interval, and whether to send a notification. This section describes how to set GaussDB(DWS) alarm rules.
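
The components of an alarm rule listed above can be pictured as a simple record. The following Python sketch is a hypothetical model for illustration: the class and field names are assumptions and do not correspond to a Cloud Eye API schema, and the rule name is invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlarmRule:
    """Hypothetical record of the components that make up an alarm rule."""
    name: str               # alarm rule name
    monitored_object: str   # cluster or node the rule applies to
    metric: str             # metric ID from Table 1
    threshold: float        # value that triggers the alarm
    interval_seconds: int   # monitoring interval
    notify: bool            # whether to send a notification

    def __post_init__(self):
        # Reject intervals that cannot describe a real monitoring period.
        if self.interval_seconds <= 0:
            raise ValueError("interval_seconds must be positive")


rule = AlarmRule(
    name="dws-cpu-high",  # invented name
    monitored_object="dws-demo-dws-dn-1-1",
    metric="dws010_cpu_usage",
    threshold=80.0,
    interval_seconds=60,
    notify=True,
)
print(rule)
```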

  1. Log in to the GaussDB(DWS) management console.

  2. In the navigation pane on the left, click Clusters.

  3. Locate the row containing the target cluster and click View Metric in the Operation column. The Cloud Eye management console opens and displays the GaussDB(DWS) monitoring information.

    The status of the target cluster must be Available. Otherwise, you cannot create alarm rules.

  4. In the left navigation pane of the Cloud Eye management console, choose Alarm Management > Alarm Rules.

  5. On the Alarm Rules page, click Create Alarm Rule in the upper right corner.

  6. On the Create Alarm Rule page, set parameters as prompted.

    1. Configure the rule name and description.

    2. Configure the alarm parameters as prompted.

      Table 3 Configuring alarm parameters

      | Parameter | Description | Example Value |
      | --- | --- | --- |
      | Resource Type | Name of the cloud service resource for which the alarm rule is configured. | Data Warehouse Service |
      | Dimension | Metric dimension of the alarm rule. You can select Data Warehouse Nodes or Data Warehouses. | Data Warehouse Node |
      | Monitoring Scope | Resource scope to which the alarm rule applies. Select Specific resources, select one or more monitored objects (the IDs of the cluster instances or nodes you have created), and click image8 to synchronize them to the right pane. | Specific resources |
      | Method | Select Use template or Create manually as required. If no alarm template is available, set Method to Create manually and configure the related parameters to create an alarm rule. If alarm rule templates are available, set Method to Use template to create alarm rules quickly. | Create manually |
      | Template | This parameter is valid only when Use template is selected. Select the template to be imported. If no alarm template is available, click Create Custom Template to create one that meets your requirements. | - |
      | Alarm Policy | This parameter is valid only when Create manually is selected. Set the policy that triggers an alarm. For example, trigger an alarm if the CPU usage is greater than or equal to 80% for 3 consecutive periods. Table 1 lists the GaussDB(DWS) monitoring metrics. | - |
      | Alarm Severity | Severity of an alarm. Valid values are Critical, Major, Minor, and Informational. | Major |

    3. Configure the alarm notification parameters as prompted.

      Table 4 Configuring alarm notifications

      | Parameter | Description | Example Value |
      | --- | --- | --- |
      | Alarm Notification | Whether to notify users when alarms are triggered. Notifications can be sent as emails, text messages, or HTTP/HTTPS requests to servers. You can enable (recommended) or disable Alarm Notification. | Enable |
      | Notification Object | Name of the topic to which the alarm notification is sent. If you enable Alarm Notification, you need to select a topic. If no suitable topic is available, create one first. Creating a topic invokes the Simple Message Notification (SMN) service. For details about how to create a topic, see the Simple Message Notification User Guide. | - |
      | Trigger Condition | Condition for triggering the alarm notification. You can select Generated alarm, Cleared alarm, or both. | - |

    4. After the configuration is complete, click Next.

      After the alarm rule is created, if the metric data reaches the specified threshold, Cloud Eye will immediately inform you that an exception has occurred.
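
The example policy in Table 3 (CPU usage greater than or equal to 80% for 3 consecutive periods) can be expressed as a short check over a series of samples. The following Python sketch only illustrates the meaning of such a policy. How Cloud Eye evaluates rules internally is not described in this section, and the function name alarm_triggered and the sample values are hypothetical.

```python
def alarm_triggered(values, threshold=80.0, consecutive=3):
    """Return True if `consecutive` samples in a row are at or above `threshold`."""
    streak = 0
    for value in values:
        # Extend the streak on a breach; reset it on any sample below the threshold.
        streak = streak + 1 if value >= threshold else 0
        if streak >= consecutive:
            return True
    return False


# Made-up CPU usage samples (percent), one per monitoring period.
print(alarm_triggered([70, 85, 90, 80]))      # three breaches in a row
print(alarm_triggered([85, 90, 70, 95, 99]))  # streak interrupted by 70
```

Requiring several consecutive periods rather than a single sample keeps short spikes from raising an alarm.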