ALM-12037 NTP Server Abnormal

Description

The system checks the NTP server status every 60 seconds. This alarm is generated when the NTP server is detected as abnormal in 10 consecutive checks.

This alarm is cleared when the NTP server recovers.
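
The trigger rule above can be sketched as a small state function. This is only an illustration of the "10 consecutive abnormal checks" logic and not the product's implementation: the names THRESHOLD and next_state are hypothetical, and treating a single successful check as recovery is an assumption.

```shell
#!/bin/sh
# Hypothetical sketch of the alarm trigger rule; not FusionInsight code.
# THRESHOLD mirrors the "10 consecutive times" stated in the description.
THRESHOLD=10

# next_state <consecutive_failures_so_far> <check_result: ok|fail>
# Prints "<new_count> <alarm_state>", where alarm_state is "raised" or "clear".
next_state() {
    count=$1
    result=$2
    if [ "$result" = "ok" ]; then
        # Assumption: one successful check counts as recovery and resets the streak.
        echo "0 clear"
        return
    fi
    count=$((count + 1))
    if [ "$count" -ge "$THRESHOLD" ]; then
        echo "$count raised"
    else
        echo "$count clear"
    fi
}
```

With one check every 60 seconds, the alarm in this sketch is raised only after the server has been abnormal for roughly ten minutes without interruption.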

Attribute

Alarm ID    Alarm Severity    Auto Clear
12037       Major             Yes

Parameters

Name           Meaning
Source         Specifies the cluster or system for which the alarm is generated.
ServiceName    Specifies the service for which the alarm is generated.
RoleName       Specifies the role for which the alarm is generated.
HostName       Specifies the IP address of the NTP server for which the alarm is generated.

Impact on the System

The NTP server configured on the active OMS node is abnormal. In this case, the active OMS node cannot synchronize time with the NTP server and a time offset may be generated in the cluster.

Possible Causes

  • The NTP server network is abnormal.

  • The NTP server authentication fails.

  • The NTP server time cannot be obtained.

  • The time obtained from the NTP server is not continuously updated.

Procedure

Check the NTP server network.

  1. On FusionInsight Manager, choose O&M > Alarm > Alarms and click the icon (image1) in the row that contains this alarm.

  2. View the alarm additional information to check whether the NTP server fails to be pinged.

    • If yes, go to 3.

    • If no, go to 4.

  3. Contact the network administrator to check the network configuration and ensure that the network between the NTP server and the active OMS node is normal. Then, check whether the alarm is cleared.

    • If yes, no further action is required.

    • If no, go to 4.
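
As a quick manual cross-check of the ping result reported in the alarm additional information, the reachability test can be repeated from the active OMS node. The following is a minimal sketch, assuming that ping is installed and that ICMP is permitted between the node and the NTP server. The function name check_ntp_ping and the PING_CMD override are hypothetical, and 192.0.2.10 is a placeholder address.

```shell
#!/bin/sh
# Hypothetical helper; not part of FusionInsight.
# PING_CMD can be overridden, for example to use a wrapper script.
PING_CMD=${PING_CMD:-ping}

# check_ntp_ping <ntp_server_ip>
# Sends three echo requests and prints "reachable" or "unreachable".
check_ntp_ping() {
    if "$PING_CMD" -c 3 "$1" >/dev/null 2>&1; then
        echo "reachable"
    else
        echo "unreachable"
    fi
}

# Example (placeholder address):
#   check_ntp_ping 192.0.2.10
```

A successful ping only shows that the host answers ICMP. It does not prove that the NTP service itself is responding, so the authentication and time checks that follow still apply.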

Check whether the NTP server authentication fails.

  4. Log in to the active OMS node as user root.

  5. Run the following commands to check the status of the resources on the active and standby nodes:

    su - omm

    sh ${BIGDATA_HOME}/om-server/om/sbin/status-oms.sh

    • If "chrony" is displayed in the ResName column of the command output, go to 6.

    • If "ntp" is displayed in the ResName column, go to 7.

    Note

    If both "chrony" and "ntp" are displayed in the ResName column of the command output, the NTP service mode is being switched. Wait for 10 minutes and perform 5 again. If both "chrony" and "ntp" are still displayed in the ResName column, contact O&M personnel.

  6. Run the command chronyc sources to check whether the NTP server authentication fails.

    If the value of Reach for the NTP server is 0, the connection or authentication fails.

    • If yes, go to 12.

    • If no, go to 8.

  7. Run the command ntpq -np to check whether the NTP server authentication fails.

    If refid of the NTP server is .AUTH., the authentication fails.

    • If yes, go to 12.

    • If no, go to 8.
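
The two checks above (Reach equal to 0 in chronyc sources, refid equal to .AUTH. in ntpq -np) can also be applied to the command output with awk. This is a sketch that assumes the usual column layout of both tools: Reach as the fifth whitespace-separated field of a chronyc source line, and refid as the second field of an ntpq peer line. The function names chrony_unreachable and ntpq_auth_failed are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers that read command output on stdin; not part of FusionInsight.

# chrony_unreachable: prints the name or address of every source whose Reach is 0.
# Source lines start with a mode character (^, = or #); header lines are skipped.
chrony_unreachable() {
    awk '/^[=#^]/ && NF >= 5 && $5 == "0" { print $2 }'
}

# ntpq_auth_failed: prints every remote whose refid is .AUTH.
# A leading tally character such as * or + is stripped from the remote field.
ntpq_auth_failed() {
    awk '$2 == ".AUTH." { r = $1; sub(/^[^0-9A-Za-z:]/, "", r); print r }'
}

# Example usage on the active OMS node:
#   chronyc sources | chrony_unreachable
#   ntpq -np | ntpq_auth_failed
```

If either function prints the address of the configured NTP server, the outcome corresponds to the "If yes" branch of the matching step.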

Check whether the time can be obtained from the NTP server.

  8. View the alarm additional information to check whether the time fails to be obtained from the NTP server.

    • If yes, go to 9.

    • If no, go to 10.

  9. Contact the provider of the NTP server to rectify the NTP server fault. After the NTP server is normal, check whether the alarm is cleared.

    • If yes, no further action is required.

    • If no, go to 10.

Check whether the time obtained from the NTP server is not continuously updated.

  10. View the alarm additional information to check whether the time obtained from the NTP server is not continuously updated.

    • If yes, go to 11.

    • If no, go to 12.

  11. Contact the provider of the NTP server to rectify the NTP server fault. After the NTP server is normal, check whether the alarm is cleared.

    • If yes, no further action is required.

    • If no, go to 12.

Collect fault information.

  12. On FusionInsight Manager, choose O&M > Log > Download.

  13. In Service, select NodeAgent and OmmServer, and click OK.

  14. Click the icon (image2) in the upper right corner, and set Start Date and End Date for log collection to 30 minutes before and 30 minutes after the alarm generation time, respectively. Then, click Download.

  15. Contact O&M personnel and send the collected logs.
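
The Start Date and End Date for log collection can be worked out from the alarm generation time as follows. This is a minimal sketch that assumes GNU date (the -d option) and a timestamp in "YYYY-MM-DD HH:MM:SS" form. The function name log_window is hypothetical and the example timestamp is a placeholder.

```shell
#!/bin/sh
# Hypothetical helper; assumes GNU date. Not part of FusionInsight.

# log_window "<alarm generation time>"
# Prints the start time (30 minutes earlier) and the end time (30 minutes later),
# one per line, in the same local time zone as the input.
log_window() {
    epoch=$(date -d "$1" +%s) || return 1
    date -d "@$((epoch - 1800))" '+%Y-%m-%d %H:%M:%S'
    date -d "@$((epoch + 1800))" '+%Y-%m-%d %H:%M:%S'
}

# Example (placeholder time):
#   log_window "2024-05-01 12:00:00"
```

Converting to epoch seconds first avoids relying on relative-time strings such as "30 minutes ago", which date can parse ambiguously when they are combined with an absolute timestamp.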

Alarm Clearing

After the fault is rectified, the system automatically clears this alarm.