ALM-45433 ClickHouse AZ Topology Exception

Description

Enabling the cross-AZ HA function for a cluster where ClickHouse is already deployed does not change the existing ClickHouse topology. This alarm is generated when, after cross-AZ HA is enabled, all backup nodes of a shard are in the same AZ, so cross-AZ HA does not take effect for that shard.

This alarm is automatically cleared when the system detects that all shards meet the cross-AZ HA deployment requirements.

Attribute

Alarm ID    Alarm Severity    Auto Clear
45433       Critical          Yes

Parameters

Name           Meaning
Source         Specifies the cluster for which the alarm is generated.
ServiceName    Specifies the service for which the alarm is generated.
HostName       Specifies the host for which the alarm is generated.

Impact on the System

The current deployment of the ClickHouse service does not support cross-AZ HA.

Possible Causes

After cross-AZ HA is enabled, all backup nodes of a shard are in the same AZ.

Procedure

Modify the AZ of backup nodes.

  1. Log in to the node where the client is installed as the client installation user. Run the following command to switch to the client installation directory:

    cd {Client installation path}

  2. Run the following command to configure environment variables:

    source bigdata_env

  3. Run the following command to authenticate the user (skip this step in normal mode):

    kinit {Component service user}

  4. Run the following command to log in to the client tool:

    zkCli.sh -server {Service IP address of the node where the ZooKeeper instance resides}:{Client port}

  5. Run the following command to view the current topology:

    get /clickhouse/topo

    Note

    If multiple ClickHouse services are installed, run the get /clickhouse{-n}/topo command instead. For example, for the ClickHouse-1 service, run the get /clickhouse-1/topo command.

    [zk: 192.168.20.36:24002(CONNECTED) 0] get /clickhouse/topo
    
    <topo>
      <mcluster>
        <shard id="14" index="1">
          <server id="15">
            <replica>1</replica>
            <az>AZ1</az>
            <host>192-168-20-205</host>
            <port>21427</port>
          </server>
          <server id="16">
            <replica>2</replica>
            <az>AZ1</az>
            <host>192-168-20-2205</host>
            <port>21427</port>
          </server>
        </shard>
      </mcluster>
    </topo>
    
  6. Select a host from the shard whose backup nodes are all in the same AZ, and deploy that host in another AZ.

  7. Log in to FusionInsight Manager, click Host, select the host you deployed in step 6, and choose More > Reinstall to reinstall the host.

  8. Choose Cluster > Cross-AZ HA, click Configure AZ and Policy, and change the AZ information of the reinstalled host to the AZ planned in step 6.

  9. Wait for five minutes and check whether the alarm is cleared.

    • If yes, no further action is required.

    • If no, go to step 10 (Collect fault information).
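The topology check in step 5 can also be done programmatically once the output of get /clickhouse/topo has been saved. The following is a minimal sketch, not part of the product: the function name find_single_az_shards is hypothetical, the sample is a trimmed copy of the output shown in step 5, and the script assumes that only shards with two or more servers are relevant to the alarm.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def find_single_az_shards(topo_xml: str) -> Dict[str, List[str]]:
    """Return {shard id: [AZ of each server]} for shards whose servers all share one AZ."""
    root = ET.fromstring(topo_xml.strip())
    offenders: Dict[str, List[str]] = {}
    for shard in root.iter("shard"):
        azs = [(server.findtext("az") or "").strip()
               for server in shard.findall("server")]
        # Assumption: a shard needs at least two servers to be checked,
        # and it violates cross-AZ HA when every server reports the same AZ.
        if len(azs) > 1 and len(set(azs)) == 1:
            offenders[shard.get("id", "?")] = azs
    return offenders


# Trimmed copy of the sample topology from step 5 (host and port omitted).
SAMPLE_TOPO = """
<topo>
  <mcluster>
    <shard id="14" index="1">
      <server id="15"><replica>1</replica><az>AZ1</az></server>
      <server id="16"><replica>2</replica><az>AZ1</az></server>
    </shard>
  </mcluster>
</topo>
"""

if __name__ == "__main__":
    for shard_id, azs in find_single_az_shards(SAMPLE_TOPO).items():
        print(f"shard {shard_id}: all servers are in {azs[0]}")
```

For the sample topology, shard 14 is reported because both of its servers are in AZ1. After one host is moved to another AZ and the AZ information is updated, the same check should return no shards.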

Collect fault information.

  10. On FusionInsight Manager, choose O&M > Log > Download.

  11. Expand the drop-down list next to the Service field. In the Services dialog box that is displayed, select ClickHouseServer for the target cluster.

  12. Expand the Hosts list. In the Select Host dialog box that is displayed, select the abnormal host, and click OK.

  13. Click the icon in the upper right corner, and set Start Date and End Date for log collection to 1 hour before and 1 hour after the alarm generation time, respectively. Then, click Download.

  14. Contact O&M personnel and provide the collected logs.
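The log collection window described above (1 hour on either side of the alarm generation time) can be worked out as follows. This is an illustrative sketch only: the helper name log_window and the alarm time used are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Tuple


def log_window(alarm_time: datetime,
               margin: timedelta = timedelta(hours=1)) -> Tuple[datetime, datetime]:
    """Return (start, end) covering `margin` before and after the alarm time."""
    return alarm_time - margin, alarm_time + margin


if __name__ == "__main__":
    # Hypothetical alarm generation time, for illustration only.
    start, end = log_window(datetime(2024, 1, 15, 10, 30))
    print("Start Date:", start.isoformat(sep=" "))
    print("End Date:  ", end.isoformat(sep=" "))
```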

Alarm Clearing

This alarm is automatically cleared after the fault is rectified.