ALM-45000 HetuEngine Service Unavailable

Description

The system checks the HetuEngine service status every 300 seconds. This alarm is generated when the HetuEngine service is unavailable.

This alarm is cleared when the HetuEngine service recovers.

Attribute

Alarm ID    Alarm Severity    Auto Clear
45000       Critical          Yes

Parameters

Name           Meaning
Source         Specifies the cluster for which the alarm is generated.
ServiceName    Specifies the service for which the alarm is generated.
RoleName       Specifies the role for which the alarm is generated.
HostName       Specifies the host for which the alarm is generated.

Impact on the System

HetuEngine tasks fail to execute.

Possible Causes

  • The KrbServer service is abnormal.

  • The ZooKeeper service is abnormal.

  • The HDFS service is abnormal.

  • The Yarn service is abnormal.

  • The DBService service is abnormal.

  • The Hive service is abnormal.

  • There are no HSBroker instances in HetuEngine.

Procedure

Check the KrbServer service status.

  1. On FusionInsight Manager, choose O&M > Alarm > Alarm.

  2. In the alarm list, check whether the "ALM-25500 KrbServer Service Unavailable" alarm is generated.

    • If yes, go to 3.

    • If no, go to 5.

  3. Clear "ALM-25500 KrbServer Service Unavailable" according to the alarm help.

  4. In the alarm list, check whether the alarm "ALM-45000 HetuEngine Service Unavailable" is cleared.

    • If yes, no further action is required.

    • If no, go to 5.

Check the ZooKeeper service status.

  5. In the alarm list, check whether the "ALM-12007 Process Fault" alarm is generated.

    • If yes, go to 6.

    • If no, go to 9.

  6. In the alarm list, click the icon (image1) in the row that contains the "Process Fault" alarm. In Location Information, check whether the service for which the alarm is generated is ZooKeeper.

    • If yes, go to 7.

    • If no, go to 9.

  7. Clear "ALM-12007 Process Fault" according to the alarm help.

  8. In the alarm list, check whether the "ALM-45000 HetuEngine Service Unavailable" alarm is cleared.

    • If yes, no further action is required.

    • If no, go to 9.

Check the HDFS service status.

  9. In the alarm list, check whether the "ALM-14000 HDFS Service Unavailable" alarm is generated.

    • If yes, go to 10.

    • If no, go to 12.

  10. Clear "ALM-14000 HDFS Service Unavailable" according to the alarm help.

  11. In the alarm list, check whether the "ALM-45000 HetuEngine Service Unavailable" alarm is cleared.

    • If yes, no further action is required.

    • If no, go to 12.

Check the YARN service status.

  12. In the alarm list, check whether the "ALM-18000 YARN Service Unavailable" alarm is generated.

    • If yes, go to 13.

    • If no, go to 15.

  13. Clear "ALM-18000 YARN Service Unavailable" according to the alarm help.

  14. In the alarm list, check whether the "ALM-45000 HetuEngine Service Unavailable" alarm is cleared.

    • If yes, no further action is required.

    • If no, go to 15.

Check the DBService service status.

  15. In the alarm list, check whether the "ALM-27001 DBService Service Unavailable" alarm is generated.

    • If yes, go to 16.

    • If no, go to 18.

  16. Clear "ALM-27001 DBService Service Unavailable" according to the alarm help.

  17. In the alarm list, check whether the "ALM-45000 HetuEngine Service Unavailable" alarm is cleared.

    • If yes, no further action is required.

    • If no, go to 18.

Check the Hive service status.

  18. In the alarm list, check whether the "ALM-16004 Hive Service Unavailable" alarm is generated.

    • If yes, go to 19.

    • If no, go to 21.

  19. Clear "ALM-16004 Hive Service Unavailable" according to the alarm help.

  20. In the alarm list, check whether the "ALM-45000 HetuEngine Service Unavailable" alarm is cleared.

    • If yes, no further action is required.

    • If no, go to 21.
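
The alarm checks above follow a fixed order: KrbServer, ZooKeeper, HDFS, YARN, DBService, then Hive. The sketch below is one way to express that order in code, for example when triaging an exported alarm list. It is illustrative only and does not call any FusionInsight Manager API; the function name and the dictionary keys "title" and "service" are assumptions made for this example.

```python
# Dependency alarms in the order the procedure checks them.
# The second element is the service name that must appear in the alarm's
# Location Information, or None when any occurrence of the alarm counts.
DEPENDENCY_ALARMS = [
    ("ALM-25500 KrbServer Service Unavailable", None),
    ("ALM-12007 Process Fault", "ZooKeeper"),
    ("ALM-14000 HDFS Service Unavailable", None),
    ("ALM-18000 YARN Service Unavailable", None),
    ("ALM-27001 DBService Service Unavailable", None),
    ("ALM-16004 Hive Service Unavailable", None),
]


def next_alarm_to_clear(active_alarms):
    """Return the title of the first dependency alarm to clear, or None.

    active_alarms is a list of dicts with the hypothetical keys
    "title" (alarm title) and "service" (service from Location Information).
    """
    for title, required_service in DEPENDENCY_ALARMS:
        for alarm in active_alarms:
            if alarm["title"] != title:
                continue
            if required_service is None or alarm.get("service") == required_service:
                return title
    return None


# Example: the Process Fault alarm belongs to another service, so it is skipped.
active = [
    {"title": "ALM-12007 Process Fault", "service": "Flume"},
    {"title": "ALM-14000 HDFS Service Unavailable", "service": "HDFS"},
]
print(next_alarm_to_clear(active))  # ALM-14000 HDFS Service Unavailable
```

If the function returns None, none of the dependency alarms is active and the procedure continues with the HSBroker instance check.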

Check whether there are no HSBroker instances in HetuEngine.

  21. On FusionInsight Manager, choose Cluster > Name of the desired cluster > Services > HetuEngine. On the page that is displayed, click the Instance tab.

  22. Check whether there are no HSBroker instances.

    • If yes, click Add Instance to add one.

    • If no, go to 24.

  23. In the alarm list, check whether the "ALM-45000 HetuEngine Service Unavailable" alarm is cleared.

    • If yes, no further action is required.

    • If no, go to 24.

Check the network connection between HetuEngine and ZooKeeper, HDFS, YARN, DBService, and Hive.

  24. On FusionInsight Manager, choose Cluster > Name of the desired cluster > Services > HetuEngine. On the page that is displayed, click the Instance tab.

  25. Click the host name in the HSBroker row and record the management IP address in the Basic Information area.

  26. Log in to the host where HSBroker resides as user omm using the IP address obtained in 25.

  27. Run the ping command to check whether the network connection between the host where HSBroker resides and the hosts where ZooKeeper, HDFS, Yarn, DBService, and Hive reside is normal.

    • If yes, go to 30.

    • If no, go to 28.

  28. Contact the network administrator to restore the network.

  29. In the alarm list, check whether the "ALM-45000 HetuEngine Service Unavailable" alarm is cleared.

    • If yes, no further action is required.

    • If no, go to 30.
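
The ping check above can be scripted when several hosts need to be tested. The following is a minimal sketch, assuming a Linux host where the ping command accepts the -c (count) and -W (timeout in seconds) options and Python 3 is available. The function names are hypothetical, and the IP addresses in the usage note are placeholders from the documentation range, not real cluster addresses.

```python
import subprocess


def ping_command(host, count=3, timeout_seconds=2):
    """Build a Linux ping command that sends a few packets to one host."""
    return ["ping", "-c", str(count), "-W", str(timeout_seconds), host]


def unreachable_hosts(hosts, run=subprocess.run):
    """Return the hosts that did not answer ping.

    The run parameter defaults to subprocess.run and can be replaced
    with a stub so the function can be exercised without a network.
    """
    failed = []
    for host in hosts:
        result = run(
            ping_command(host),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        # ping exits with a non-zero status when no reply was received.
        if result.returncode != 0:
            failed.append(host)
    return failed
```

For example, calling unreachable_hosts(["192.0.2.10", "192.0.2.11"]) on the HSBroker host returns the subset of addresses that could not be reached. An empty list corresponds to the "If yes" branch of the ping step; a non-empty list corresponds to the "If no" branch.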

Collect fault information.

  30. On FusionInsight Manager, choose O&M > Log > Download.

  31. Expand the Service drop-down list. In the Services dialog box that is displayed, select HetuEngine under the target cluster name, and click OK.

  32. Expand the Hosts drop-down list. In the Select Host dialog box that is displayed, select the hosts to which the role belongs, and click OK.

  33. Click the icon (image2) in the upper right corner, and set Start Date and End Date for log collection to 30 minutes before and 30 minutes after the alarm generation time, respectively. Then, click Download.

  34. Contact O&M personnel and provide the collected logs.
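
The log collection window in the steps above is the alarm generation time minus and plus 30 minutes. A small sketch of that arithmetic follows; the function name and the sample alarm time are made up for illustration.

```python
from datetime import datetime, timedelta


def log_window(alarm_time, margin_minutes=30):
    """Return (start, end) for log collection around the alarm time."""
    margin = timedelta(minutes=margin_minutes)
    return alarm_time - margin, alarm_time + margin


# Hypothetical alarm generated at 10:05 on 2024-01-15.
start, end = log_window(datetime(2024, 1, 15, 10, 5))
print(start)  # 2024-01-15 09:35:00
print(end)    # 2024-01-15 10:35:00
```

The two values correspond to Start Date and End Date in the log download dialog.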

Alarm Clearing

After the fault is rectified, the system automatically clears this alarm.

Reference

None