ALM-19000 HBase Service Unavailable

Description

This alarm is generated when the HBase service is unavailable. The alarm module checks the HBase service status every 120 seconds.

This alarm is cleared when the HBase service recovers.
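The raise/clear behavior described above can be modeled as a small state transition. The following is an illustrative model only, not the FusionInsight alarm module. `next_alarm_action`, `monitor_hbase`, and the probe `hbase_is_available` are hypothetical names. Only the 120-second check interval comes from this document.

```bash
# Illustrative model of the ALM-19000 lifecycle (hypothetical, not the real alarm module).
# $1: "yes" if ALM-19000 is currently raised, "no" otherwise
# $2: "yes" if the latest status check found HBase available, "no" otherwise
# Prints the resulting action: raise, clear, or none.
next_alarm_action() {
  local raised=$1 available=$2
  if [ "$raised" = no ] && [ "$available" = no ]; then
    echo raise    # service became unavailable: generate the alarm
  elif [ "$raised" = yes ] && [ "$available" = yes ]; then
    echo clear    # service recovered: clear the alarm automatically
  else
    echo none     # state unchanged
  fi
}

CHECK_INTERVAL_SECONDS=120

# Polling loop sketch; hbase_is_available is a hypothetical probe you would supply.
monitor_hbase() {
  local raised=no available action
  while true; do
    if hbase_is_available; then available=yes; else available=no; fi
    action=$(next_alarm_action "$raised" "$available")
    case "$action" in
      raise) raised=yes; echo "ALM-19000 raised" ;;
      clear) raised=no;  echo "ALM-19000 cleared" ;;
    esac
    sleep "$CHECK_INTERVAL_SECONDS"
  done
}
```

Because the model keeps the raised state between checks, a service that stays unavailable produces one alarm rather than a new alarm every 120 seconds.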

Attribute

| Alarm ID | Alarm Severity | Automatically Cleared |
|---|---|---|
| 19000 | Critical | Yes |

Parameters

| Name | Meaning |
|---|---|
| Source | Specifies the cluster for which the alarm is generated. |
| ServiceName | Specifies the service for which the alarm is generated. |
| RoleName | Specifies the role for which the alarm is generated. |
| HostName | Specifies the host for which the alarm is generated. |

Impact on the System

Operations, such as reading or writing data and creating tables, cannot be performed.

Possible Causes

  • The ZooKeeper service is abnormal.

  • The HDFS service is abnormal.

  • The HBase service is abnormal.

  • The network is abnormal.

Procedure

Check the ZooKeeper service status.

  1. On FusionInsight Manager, check whether the running status of ZooKeeper is Normal in the service list.

    • If yes, go to 5.

    • If no, go to 2.

  2. In the alarm list, check whether ALM-13000 ZooKeeper Service Unavailable exists.

    • If yes, go to 3.

    • If no, go to 5.

  3. Rectify the fault by following the steps provided in ALM-13000 ZooKeeper Service Unavailable.

  4. Wait several minutes and check whether the alarm is cleared.

    • If yes, no further action is required.

    • If no, go to 5.
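The ZooKeeper branch above reduces to a two-input decision. The following minimal sketch encodes it for use in a triage script. `zookeeper_next_step` and its inputs are hypothetical; you would read the values from FusionInsight Manager yourself.

```bash
# Hypothetical helper that mirrors the ZooKeeper branch of this procedure.
# $1: ZooKeeper running status shown in the service list (for example, Normal)
# $2: "yes" if ALM-13000 ZooKeeper Service Unavailable is in the alarm list
zookeeper_next_step() {
  local status=$1 alm13000=$2
  if [ "$status" != Normal ] && [ "$alm13000" = yes ]; then
    # Rectify ZooKeeper first; if ALM-19000 persists afterwards, go on to HDFS.
    echo "rectify ALM-13000"
  else
    # ZooKeeper is Normal, or it is abnormal without ALM-13000: check HDFS next.
    echo "check HDFS"
  fi
}
```

For example, `zookeeper_next_step Abnormal yes` prints `rectify ALM-13000`. Every other combination leads to the HDFS checks.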

Check the HDFS service status.

  5. In the alarm list, check whether ALM-14000 HDFS Service Unavailable exists.

    • If yes, go to 6.

    • If no, go to 8.

  6. Rectify the fault by following the steps provided in ALM-14000 HDFS Service Unavailable.

  7. Wait several minutes and check whether the alarm is cleared.

    • If yes, no further action is required.

    • If no, go to 8.

  8. On the FusionInsight Manager portal, choose Cluster > Name of the desired cluster > Services > HDFS and check whether Safe Mode is ON.

    • If yes, go to 9.

    • If no, go to 12.

  9. Log in to the HDFS client node as user root, run cd to switch to the client installation directory, and run source bigdata_env.

    If the cluster runs in security mode, perform security authentication: obtain the password of user hdfs from the administrator, run the kinit hdfs command, and enter the password as prompted.

  10. Run the following command to manually exit safe mode:

    hdfs dfsadmin -safemode leave

  11. Wait several minutes and check whether the alarm is cleared.

    • If yes, no further action is required.

    • If no, go to 12.
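You can also check and exit safe mode from the client shell instead of checking Safe Mode on the portal. The following minimal sketch assumes the client environment prepared above: bigdata_env is sourced and, in security mode, kinit hdfs has succeeded. `hdfs dfsadmin -safemode get` and `hdfs dfsadmin -safemode leave` are standard HDFS commands. The function names are hypothetical.

```bash
# Hypothetical helper: succeeds if `hdfs dfsadmin -safemode get` output
# (passed as $1) reports safe mode ON for any NameNode. With NameNode HA the
# output has one "Safe mode is ... in <host>" line per NameNode.
safemode_is_on() {
  case "$1" in
    *"Safe mode is ON"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Sketch: query safe mode and leave it only if HDFS reports it is ON.
leave_safemode_if_on() {
  local state
  state=$(hdfs dfsadmin -safemode get) || return 1
  if safemode_is_on "$state"; then
    hdfs dfsadmin -safemode leave
  else
    echo "HDFS is not in safe mode; nothing to do."
  fi
}
```

Running `leave_safemode_if_on` repeatedly is harmless. It only issues `leave` when safe mode is actually ON.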

Check the HBase service status.

  12. On the FusionInsight Manager portal, choose Cluster > Name of the desired cluster > Services > HBase.

  13. Check whether there is one active HMaster and one standby HMaster.

    • If yes, go to 15.

    • If no, go to 14.

  14. Click Instances, select the HMaster whose status is not Active, and choose More > Restart Instance. Then check again whether there is one active HMaster and one standby HMaster.

    • If yes, go to 15.

    • If no, go to 21.

  15. Choose Cluster > Name of the desired cluster > Services > HBase > HMaster(Active) to go to the HMaster WebUI.

    Note

    By default, the admin user does not have permissions to manage other components. If the native UI of a component cannot be opened, or its content is incomplete, because of insufficient permissions, manually create a user with permissions to manage that component.

  16. Check whether at least one RegionServer exists under Region Servers.

    • If yes, go to 17.

    • If no, go to 21.

  17. Choose Tables > System Tables, as shown in Figure 1, and check whether hbase:meta, hbase:namespace, and hbase:acl exist in the Table Name column.

    • If yes, go to 18.

    • If no, go to 19.

    **Figure 1** HBase system table

  18. As shown in Figure 1, click the hbase:meta, hbase:namespace, and hbase:acl hyperlinks and check whether each page is displayed properly. If the pages are displayed properly, the tables are normal.

    • If yes, go to 19.

    • If no, go to 23.

    Note

    In normal mode, ACL is disabled for HBase by default, and the hbase:acl table is generated only after ACL is manually enabled. Check hbase:acl only in that case. In other scenarios, this table does not need to be checked.

  19. View the HMaster startup status.

    As shown in Figure 2, if a task in the RUNNING state exists under Tasks, HMaster is still starting, and the State column shows how long HMaster has been in the RUNNING state. As shown in Figure 3, if the state is COMPLETE, HMaster has started.

    Check whether HMaster has been in the RUNNING state for a long time.

    **Figure 2** HMaster is being started

    **Figure 3** HMaster is started

    • If yes, go to 20.

    • If no, go to 21.

  20. On the HMaster WebUI, check whether an hbase:meta region has remained in the Region in Transition state for a long time.

    **Figure 4** Region in Transition

    • If yes, go to 21.

    • If no, go to 22.

  21. Provided that services are not affected, log in to the FusionInsight Manager portal, choose Cluster > Name of the desired cluster > Services > HBase > More > Restart Service, enter the administrator password, and click OK. Check whether the service is restarted successfully.

    • If yes, go to 22.

    • If no, go to 23.

  22. Wait several minutes and check whether the alarm is cleared.

    • If yes, no further action is required.

    • If no, go to 23.
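The system-table check and the ACL note above can be combined into one helper that lists which required tables are absent. The following is a minimal sketch. `missing_system_tables` is hypothetical, and its first argument is assumed to be the names copied from the Table Name column.

```bash
# Hypothetical helper: print the required HBase system tables that are missing.
# $1: table names copied from the Table Name column (whitespace-separated)
# $2: "yes" if ACL has been enabled for HBase, so hbase:acl is also expected
missing_system_tables() {
  local listed required t missing=""
  # Normalize the list to " name1 name2 ... " so each name is matched whole.
  listed=" $(printf '%s ' $1)"
  required="hbase:meta hbase:namespace"
  if [ "$2" = yes ]; then
    required="$required hbase:acl"
  fi
  for t in $required; do
    case "$listed" in
      *" $t "*) ;;                 # table is present
      *) missing="$missing $t" ;;  # table is absent
    esac
  done
  echo "${missing# }"
}
```

For example, `missing_system_tables "hbase:meta hbase:namespace" yes` prints `hbase:acl`. An empty result means all tables expected for the cluster's ACL setting are listed.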

Check the network connection between HMaster and dependent components.

  23. On FusionInsight Manager, choose Cluster > Name of the desired cluster > Services > HBase.

  24. Click Instances to display the instance list, and record the management IP address of HMaster(Active).

  25. Use the IP address recorded in 24 to log in as user omm to the host where the active HMaster runs.

  26. Run the ping command to check whether the host running the active HMaster can communicate with the hosts running the dependent components. (The dependent components include ZooKeeper, HDFS, and Yarn. Obtain the IP addresses of their hosts in the same way as the IP address of the active HMaster.)

    • If yes, go to 29.

    • If no, go to 27.

  27. Contact the administrator to restore the network.

  28. In the alarm list, check whether ALM-19000 HBase Service Unavailable is cleared.

    • If yes, no further action is required.

    • If no, go to 29.
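The ping check above can be scripted for several dependency hosts at once. The following minimal sketch assumes Linux iputils ping, where -W is a timeout in seconds. `check_reachability` and the `PING_CMD` override, which allows a dry run, are hypothetical.

```bash
# Hypothetical helper: ping each host once and report whether it answers.
# PING_CMD may be overridden; the default sends one echo request with a 2 s timeout.
# Returns non-zero if any host is unreachable.
check_reachability() {
  local host unreachable=0
  for host in "$@"; do
    # PING_CMD is intentionally unquoted so it splits into command and options.
    if ${PING_CMD:-ping -c 1 -W 2} "$host" >/dev/null 2>&1; then
      echo "OK   $host"
    else
      echo "FAIL $host"
      unreachable=1
    fi
  done
  return "$unreachable"
}
```

As user omm on the active HMaster host, you could run, for example, `check_reachability 192.0.2.11 192.0.2.12 192.0.2.13` with the recorded ZooKeeper, HDFS, and Yarn host addresses in place of these placeholders. Any `FAIL` line points to a link the administrator needs to restore.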

Collect fault information.

  29. On FusionInsight Manager, choose O&M > Log > Download.

  30. From the Service drop-down list, select the following services in the required cluster:

    • ZooKeeper

    • HDFS

    • HBase

  31. Click the icon in the upper right corner and set Start Date and End Date for log collection to 10 minutes before and 10 minutes after the alarm generation time, respectively. Then click Download.

  32. Contact O&M personnel and send them the collected logs.
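The log collection window above is simple timestamp arithmetic. The following minimal sketch assumes GNU date (coreutils) and an alarm time in YYYY-MM-DD HH:MM:SS format. `log_window` is a hypothetical helper. Parsing and formatting both use UTC, so the 10-minute offsets keep the same wall-clock meaning as the input.

```bash
# Hypothetical helper: print the Start Date and End Date for log collection.
# $1: alarm generation time, for example "2024-05-01 12:00:00"
# $2: margin in seconds (optional; defaults to 600 s = 10 minutes)
# Requires GNU date for -d; UTC is used for both parsing and formatting.
log_window() {
  local alarm_time=$1 margin=${2:-600} epoch
  epoch=$(date -u -d "$alarm_time" +%s) || return 1
  date -u -d "@$((epoch - margin))" '+%Y-%m-%d %H:%M:%S'   # Start Date
  date -u -d "@$((epoch + margin))" '+%Y-%m-%d %H:%M:%S'   # End Date
}
```

For an alarm generated at 2024-05-01 00:05:00, `log_window "2024-05-01 00:05:00"` prints `2024-04-30 23:55:00` and `2024-05-01 00:15:00`, so the window correctly crosses midnight.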

Alarm Clearing

After the fault is rectified, the system automatically clears this alarm.