Cluster Topology

Overview

The cluster topology shows all the nodes in a cluster. For each node, you can check its status, processes, and IP addresses.

Note

  • You can check the topology structure and node processes.

  • Only cluster versions 8.0.0 and later can display the topology structure. Only cluster versions 8.2.0 and later can display node processes.

Viewing the Cluster Topology

  1. Log in to the GaussDB(DWS) management console.

  2. In the cluster list, click the name of a cluster.

  3. On the Cluster Details page, click the Cluster Topology tab.

  4. In the upper part of the page, select IP Address or Node Name as the search category, then enter an IP address or node name in the search box to locate the matching node in the cluster topology.

    image1
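The console describes this lookup as an exact search by IP address or node name. The following Python snippet is a minimal sketch of that matching behavior only. The Node class, its fields, and the sample addresses are hypothetical and are not part of the console or of any GaussDB(DWS) API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical record for one node shown in the topology."""
    name: str
    ips: List[str] = field(default_factory=list)


def search_topology(nodes: List[Node], category: str, keyword: str) -> List[Node]:
    """Return the nodes that match the keyword exactly in the chosen category."""
    keyword = keyword.strip()
    if category == "Node Name":
        return [n for n in nodes if n.name == keyword]
    if category == "IP Address":
        return [n for n in nodes if keyword in n.ips]
    raise ValueError(f"unknown search category: {category!r}")


# Sample data, made up for illustration (documentation address ranges).
nodes = [
    Node("host-1", ["192.0.2.10", "198.51.100.10"]),
    Node("host-2", ["192.0.2.11", "198.51.100.11"]),
]
matches = search_topology(nodes, "IP Address", "192.0.2.11")
print([n.name for n in matches])
```

Because the match is exact, a partial keyword such as "host" returns no nodes in this sketch.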

Topology Overview

image2

This figure shows a topology. The elements marked in the figure are as follows:

  1. Public IP address of the ELB bound to the cluster. If no public IP address is bound to the ELB, the service address is displayed.

  2. EIP bound to the cluster.

  3. Search category. You can perform an exact search by IP address or node name.

  4. Rings in the cluster.

  5. A ring. Each ring occupies a line. An icon in a ring indicates a node.

  6. A node. The node type is displayed in the upper right corner of the icon and is currently either CN or DN. CN is displayed if a CN process runs on the node. Otherwise, DN is displayed.

  7. Node details, including the node name, status, IP addresses, and task process. Node details are displayed when you hover your cursor over a node icon.
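The rule for the node type label in item 6 can be written as a one-line check. This is a minimal sketch, assuming the processes on a node are available as a list of name strings such as "CN" or "DN" (a hypothetical representation). How the console treats a CCN is not stated here, so the sketch only looks for the literal name CN.

```python
from typing import Iterable


def node_type_label(processes: Iterable[str]) -> str:
    """Return "CN" if a CN process runs on the node, otherwise "DN"."""
    return "CN" if "CN" in set(processes) else "DN"


print(node_type_label(["CMS", "CN", "DN"]))  # node that hosts a CN process
print(node_type_label(["GTM", "DN"]))        # node without a CN process
```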

Terms in the Topology View

Table 1 Cluster structure description

Name | Description | Usage

ELB

  Description: Elastic Load Balance (ELB) automatically distributes incoming traffic across multiple backend servers based on the listening rules you configure.

  Usage: If the internal IP address or EIP of a CN is used to connect to a GaussDB(DWS) cluster, a failure of that CN causes the cluster connection to fail. If a private domain name is used for the connection, connection failures can be avoided by polling.

  However, private domain names cannot be used for public network access, and requests cannot be forwarded if a CN fails. ELB is therefore used to avoid a single point of failure at the CN. For details, see Associating and Disassociating ELB.

EIP

  Description: The Elastic IP (EIP) service provides static public IP addresses and scalable bandwidth that enable your cloud resources to communicate with the Internet.

  Usage: EIPs can be bound to or unbound from ECSs, BMSs, virtual IP addresses, load balancers, and NAT gateways.

Ring

  Description: A security ring is used to isolate faulty servers. A fault in a ring does not affect servers outside the ring.

  Usage: Data on a DN has multiple copies in a ring and is not lost even if the DN server is faulty.

  For example, if Server1 in a ring is faulty, the standby DN1 on Server2, the standby DN2 on Server3, and the standby DN3 on Server3 are still running. The loads of the servers in the ring remain balanced.

  A cluster can run properly as long as the number of faulty servers does not exceed the number of rings.

  Note

  The ring is the minimum unit for a scale-out. When you scale out a cluster, the number of added nodes must be a multiple of the ring quantity.
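The two numeric rules stated for rings can be checked with simple arithmetic. The sketch below takes the wording at face value. The function and parameter names are hypothetical, and ring_quantity stands for whatever value the note means by "ring quantity". It is not a description of how GaussDB(DWS) validates a scale-out request.

```python
def cluster_can_run(faulty_servers: int, ring_count: int) -> bool:
    """Rule as stated: faulty servers must not exceed the number of rings."""
    if faulty_servers < 0 or ring_count < 1:
        raise ValueError("faulty_servers must be >= 0 and ring_count >= 1")
    return faulty_servers <= ring_count


def is_valid_scale_out(added_nodes: int, ring_quantity: int) -> bool:
    """Rule as stated: added nodes must be a positive multiple of ring_quantity."""
    if ring_quantity < 1:
        raise ValueError("ring_quantity must be >= 1")
    return added_nodes > 0 and added_nodes % ring_quantity == 0


print(cluster_can_run(faulty_servers=2, ring_count=3))
print(is_valid_scale_out(added_nodes=6, ring_quantity=3))
print(is_valid_scale_out(added_nodes=4, ring_quantity=3))
```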

Table 2 Node IP addresses

Name | Description | Usage

Manage IP

  Description: IP address used by a data warehouse node to communicate with the management plane.

  Usage: The management plane uses it to deliver commands, and the node uses it to report node status and monitoring information.

Traffic IP

  Description: IP address of a data warehouse node for external access.

  Usage: This IP address can be bound to an EIP or ELB, or connected directly to a VPC.

Internal IP

  Description: IP address used for communication inside a data warehouse cluster.

  Usage: -

Internalmgnt IP

  Description: IP address used by nodes to send internal management commands in a data warehouse cluster.

  Usage: -
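If you record the four addresses of a node in your own tooling, a small validated structure keeps them apart. The following is a minimal sketch that uses Python's standard ipaddress module. The NodeAddresses class and its field names are hypothetical and simply mirror the four rows of Table 2. The sample addresses are placeholders from a documentation range.

```python
import ipaddress
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class NodeAddresses:
    """Hypothetical container for the four IP addresses of one node."""
    manage_ip: str        # communication with the management plane
    traffic_ip: str       # external access (EIP, ELB, or VPC)
    internal_ip: str      # communication inside the cluster
    internalmgnt_ip: str  # internal management commands

    def __post_init__(self) -> None:
        # Reject any value that is not a valid IPv4 or IPv6 address.
        for f in fields(self):
            value = getattr(self, f.name)
            try:
                ipaddress.ip_address(value)
            except ValueError as exc:
                raise ValueError(f"{f.name}: {exc}") from exc


node = NodeAddresses(
    manage_ip="192.0.2.1",
    traffic_ip="192.0.2.2",
    internal_ip="192.0.2.3",
    internalmgnt_ip="192.0.2.4",
)
print(node.traffic_ip)
```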

Table 3 Node processes

Name | Description | Usage

CMS

  Description: A Cluster Manager (CM) manages and monitors the running status of functional units and physical resources in the distributed system, ensuring system stability.

  CM Server (CMS) is a module of CM.

  Usage: A CM consists of CM Agent, OM Monitor, and CM Server.

    • CM Agent monitors the running status of primary and standby GTMs, CNs, and primary and standby DNs on the host, and reports the status to CM Server. It also executes the arbitration instructions delivered by CM Server. A CM Agent process runs on each server.

    • OM Monitor monitors scheduled tasks of CM Agent and restarts CM Agent when it stops. If CM Agent cannot be restarted, the server becomes unavailable and you need to rectify the fault manually.

      Note

      A failure to restart CM Agent is most likely caused by insufficient system resources, which rarely happens.

    • CM Server checks whether the system is running normally based on the instance status reported by CM Agent. If an exception occurs, CM Server delivers recovery commands to CM Agent.

  GaussDB(DWS) deploys CM Server in primary/standby mode to ensure system HA. CM Agent connects to the primary CM Server. If the primary CM Server is faulty, the standby CM Server is promoted to primary, which prevents a single point of failure in CM Server.

GTM

  Description: A Global Transaction Manager (GTM) generates and maintains globally unique information, such as transaction IDs, transaction snapshots, and timestamps.

  Usage: A cluster includes only one pair of GTMs: one primary GTM and one standby GTM.

CN

  Description: A Coordinator (CN) receives access requests from applications and returns execution results to the client. It also splits tasks and allocates task fragments to different DNs for parallel processing.

  Usage: CNs in a cluster have equivalent roles and return the same result for the same DML statement. Load balancers can be added between CNs and applications so that CNs are transparent to applications. If a CN is faulty, the load balancer connects its applications to another CN.

  CNs need to connect to each other in the distributed transaction architecture. To avoid heavy load on GTMs caused by excessive threads, configure no more than 10 CNs in a cluster.

CCN

  Description: Central Coordinator (CCN)

  Usage: GaussDB(DWS) handles the global resource load in a cluster using the Central Coordinator (CCN) for adaptive dynamic load management. When the cluster is started for the first time, the CM selects the CN with the smallest ID as the CCN. If the CCN is faulty, CM replaces it with a new one.

DN

  Description: A Data Node (DN) stores data in row-store, column-store, or hybrid mode, executes data query tasks, and returns execution results to CNs.

  Usage: There are multiple DNs in the cluster. Each DN stores part of the data. If DNs are not deployed in primary/standby mode and a DN is faulty, the data on that DN becomes inaccessible.
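The division of work between a CN and the DNs described in Table 3 can be pictured as a scatter-gather pattern. The sketch below is a toy model under stated assumptions, not GaussDB(DWS) code. Rows are placed on DNs by hashing a key, which is an assumed distribution rule. Threads stand in for DNs, and all function names are hypothetical.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Row = Tuple[str, int]  # (key, amount)


def distribute(rows: List[Row], dn_count: int) -> List[List[Row]]:
    """Place each row on one DN by hashing its key (assumed distribution rule)."""
    if dn_count < 1:
        raise ValueError("dn_count must be >= 1")
    shards: List[List[Row]] = [[] for _ in range(dn_count)]
    for key, amount in rows:
        shards[zlib.crc32(key.encode("utf-8")) % dn_count].append((key, amount))
    return shards


def dn_partial_sum(shard: List[Row]) -> int:
    """Work done on one DN: aggregate only the rows stored locally."""
    return sum(amount for _, amount in shard)


def cn_total(shards: List[List[Row]]) -> int:
    """Work done on the CN: send the task to every DN in parallel, then merge."""
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        partials = list(pool.map(dn_partial_sum, shards))
    return sum(partials)


rows = [("a", 10), ("b", 20), ("c", 30), ("d", 40)]
shards = distribute(rows, dn_count=3)
print(cn_total(shards))
```

In this model, removing one non-empty shard before calling cn_total corresponds to a faulty DN without a standby: the rows stored on that DN no longer contribute to the result.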