ALM-12012 NTP Service Is Abnormal

Description

This alarm is generated when the NTP service on the current node fails to synchronize time with the NTP service on the active OMS node. The alarm is cleared when time synchronization succeeds.

Attribute

Alarm ID    Alarm Severity    Automatically Cleared
12012       Major             Yes

Parameters

Parameter      Description
ServiceName    Specifies the service for which the alarm is generated.
RoleName       Specifies the role for which the alarm is generated.
HostName       Specifies the host for which the alarm is generated.

Impact on the System

The time on the node is inconsistent with the time on other nodes in the cluster. Therefore, some MRS applications on the node may not run properly.

Possible Causes

  • The NTP service on the current node cannot start properly.
  • The current node fails to synchronize time with the NTP service on the active OMS node.
  • The authentication key used by the NTP service on the current node is inconsistent with that on the active OMS node.
  • The time offset between the node and the NTP service on the active OMS node is large.

Procedure

  1. Check the NTP service on the current node.

    1. Check whether the ntpd process is running on the node. Log in to the node and run the sudo su - root command to switch to user root. Then run the following command and check whether the command output contains the ntpd process:

      ps -ef | grep ntpd | grep -v grep

      • If yes, go to Step 2, substep 1.
      • If no, go to substep 2 of this step.
    2. Run service ntp start to start the NTP service.
    3. Wait 10 minutes and check whether the alarm is cleared.
      • If yes, no further action is required.
      • If no, go to Step 2, substep 1.
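The process check in Step 1 can be wrapped in a small helper. The following is a minimal sketch, not part of MRS: the function name has_ntpd is hypothetical, and it only inspects a process listing passed on standard input.

```shell
#!/bin/sh
# has_ntpd (hypothetical helper): succeeds if the process listing read
# from standard input contains an ntpd process.
has_ntpd() {
    # Drop the grep command's own entry, then look for "ntpd" as a whole word.
    grep -v 'grep' | grep -qw 'ntpd'
}

# Intended use on the node, as user root (commands taken from Step 1):
#   if ps -ef | has_ntpd; then
#       echo "ntpd is running"
#   else
#       service ntp start
#   fi
```

The helper reads from standard input so that the same check can be run against a saved listing as well as against live ps -ef output.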

  2. Check whether the current node can synchronize time properly with the NTP service on the active OMS node.

    1. Check whether the node can synchronize time with the NTP service on the active OMS node based on Additional Info of the alarm.
    2. Check whether the synchronization with the NTP service on the active OMS node is faulty.

      Log in to the alarm node and run the sudo su - root command to switch the user. Then run the ntpq -np command.

      In the command output, if an asterisk (*) exists before the IP address of the NTP service on the active OMS node, the synchronization is in the normal state. The command output is as follows:

           remote           refid      st t when poll reach   delay   offset  jitter
      ==============================================================================
      *10.10.10.162    .LOCL.           1 u    1   16  377    0.270   -1.562   0.014

      If no asterisk (*) exists before the IP address of the NTP service on the active OMS node and the value of refid is .INIT., the synchronization is abnormal. The command output is as follows:

           remote           refid      st t when poll reach   delay   offset  jitter
      ==============================================================================
       10.10.10.162    .INIT.           1 u    1   16  377    0.270   -1.562   0.014
    3. Rectify the fault, wait 10 minutes, and then check whether the alarm is cleared.
      • If yes, no further action is required.
      • If no, go to Step 3.

      An NTP synchronization failure is usually related to the system firewall. If the firewall can be disabled, disable it and check whether the fault is rectified. If the firewall cannot be disabled, check the firewall configuration policies and ensure that UDP port 123 is open. Firewall configuration differs between systems, so follow the policies specific to each system.
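The asterisk and refid check described in Step 2 can also be scripted. The sketch below is illustrative only: the function name ntp_peer_state and the state labels it prints are hypothetical, and it assumes that the active OMS node appears in the ntpq -np output as an IPv4 address.

```shell
#!/bin/sh
# ntp_peer_state (hypothetical helper): reads `ntpq -np` output from standard
# input and prints the state of the peer whose IP address is given as $1.
#   synced        the peer line starts with an asterisk (*)
#   init          no asterisk and refid is .INIT.
#   not-selected  the peer is listed, but neither case above applies
#   missing       the peer is not listed at all
ntp_peer_state() {
    awk -v ip="$1" '
        {
            remote = $1
            tally = ""
            # A leading non-digit character is a tally code such as "*" or "+".
            if (remote ~ /^[^0-9]/) {
                tally = substr(remote, 1, 1)
                remote = substr(remote, 2)
            }
            if (remote == ip) {
                found = 1
                if (tally == "*") print "synced"
                else if ($2 == ".INIT.") print "init"
                else print "not-selected"
                exit
            }
        }
        END { if (!found) print "missing" }
    '
}

# Intended use on the alarm node, as user root:
#   ntpq -np | ntp_peer_state 10.10.10.162
```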

  3. Check whether the key value authenticated by the NTP service on the current node is consistent with that on the active OMS node.

    Run cat /etc/ntp/ntpkeys on the current node and check whether the authentication key whose key index is 1 is the same as the corresponding key of the NTP service on the active OMS node.
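To compare the keys without displaying them on screen, a digest of the index 1 entry can be computed on each node and the two digests compared. This is a sketch under assumptions: the function name key1_digest is hypothetical, the key file is assumed to contain lines of the form "index type key", and the sha256sum utility is assumed to be installed.

```shell
#!/bin/sh
# key1_digest (hypothetical helper): prints the SHA-256 digest of the entry
# with key index 1 in the NTP key file given as $1.
# Assumption: each key line has the form "<index> <type> <key>".
key1_digest() {
    awk '$1 == "1" { print $2, $3 }' "$1" | sha256sum | awk '{ print $1 }'
}

# Intended use: run on the alarm node and on the active OMS node, then
# compare the two printed digests.
#   key1_digest /etc/ntp/ntpkeys
```

If the two digests differ, the index 1 keys on the two nodes are not the same.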

  4. Check whether the time offset between the node and the NTP service on the active OMS node is large.

    1. Check whether the time offset is large in Additional Info of the alarm.
    2. On the Host page, select the host of the node, and choose More > Stop All Roles to stop all the services on the node.

      If the time on the alarm node is earlier than that on the NTP service of the active OMS node, adjust the time on the alarm node to match that on the NTP service of the active OMS node. Then choose More > Start All Roles to start the services on the node.

      If the time on the alarm node is later than that on the NTP service of the active OMS node, wait for a period equal to the time offset to elapse, and then adjust the time on the alarm node. Then choose More > Start All Roles to start the services on the node.

      NOTE:

      If you adjust the time before the time offset has elapsed, data loss may occur.

    3. Wait 10 minutes and check whether the alarm is cleared.
      • If yes, no further action is required.
      • If no, go to Step 5.
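The decision in Step 4 depends only on whether the alarm node is earlier or later than the active OMS node. The following sketch makes that rule explicit; the function name offset_action and its messages are hypothetical, and both arguments are assumed to be Unix timestamps in seconds taken at the same moment.

```shell
#!/bin/sh
# offset_action (hypothetical helper): compares the alarm node time ($1) with
# the active OMS node time ($2), both in seconds since the epoch, and prints
# the action that Step 4 calls for.
offset_action() {
    diff=$(( $1 - $2 ))
    if [ "$diff" -lt 0 ]; then
        # Node is earlier: the time can be adjusted right away.
        echo "earlier by $(( -diff ))s: adjust now"
    elif [ "$diff" -gt 0 ]; then
        # Node is later: wait for the offset to elapse before adjusting.
        echo "later by ${diff}s: wait ${diff}s, then adjust"
    else
        echo "in sync"
    fi
}

# Example with fixed values: the node is 60 seconds ahead of the OMS node.
#   offset_action 1700000060 1700000000
```

Stop all roles on the node before adjusting the time, and start them again afterwards, as described in Step 4.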

  5. Collect fault information.

    1. On MRS Manager, choose System > Export Log.
    2. Contact technical support engineers for help. For details, see technical support.

Related Information

N/A