ALM-14012 HDFS JournalNode Data Is Not Synchronized

Description

On the active NameNode, the system checks data synchronization on all JournalNodes in the cluster every 5 minutes. This alarm is generated when data on a JournalNode is not synchronized with that on other JournalNodes.

This alarm is cleared within 5 minutes after data on the JournalNodes is synchronized.
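This page does not say how synchronization is measured. As an illustration only, the sketch below assumes that each JournalNode reports the ID of the last edit-log transaction it has written, and it flags any node that trails the most advanced node by more than a tolerance. The function name, the `max_lag` parameter, and the sample IP addresses are hypothetical and are not part of MRS.

```python
from typing import Dict, List


def find_unsynchronized(last_txids: Dict[str, int], max_lag: int = 0) -> List[str]:
    """Return the JournalNode IPs whose last written transaction ID trails
    the highest reported ID by more than max_lag (hypothetical criterion)."""
    if not last_txids:
        return []
    newest = max(last_txids.values())
    return sorted(ip for ip, txid in last_txids.items() if newest - txid > max_lag)


# Hypothetical sample: the third node stopped receiving edits.
sample = {"192.168.0.11": 5042, "192.168.0.12": 5042, "192.168.0.13": 4810}
print(find_unsynchronized(sample))  # ['192.168.0.13']
```

A nonzero `max_lag` would tolerate nodes that are briefly behind during normal writes; whether the real check allows any lag is not stated here.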

Attribute

Alarm ID    Alarm Severity    Automatically Cleared
14012       Major             Yes

Parameters

Parameter      Description
ServiceName    Specifies the service for which the alarm is generated.
RoleName       Specifies the role for which the alarm is generated.
IP             Specifies the service IP address of the JournalNode instance for which the alarm is generated.

Impact on the System

If data on more than half of the JournalNodes is not synchronized, the NameNode cannot work correctly, making the HDFS service unavailable.
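The threshold in the statement above can be checked with simple arithmetic. The sketch below assumes the majority rule used by the HDFS Quorum Journal Manager, where the synchronized JournalNodes must form a strict majority of all JournalNodes. The function name is hypothetical.

```python
def has_write_quorum(total_journalnodes: int, unsynchronized: int) -> bool:
    """Return True if the synchronized JournalNodes still form a strict majority."""
    if total_journalnodes <= 0 or not 0 <= unsynchronized <= total_journalnodes:
        raise ValueError("counts out of range")
    healthy = total_journalnodes - unsynchronized
    return healthy > total_journalnodes // 2


# With three JournalNodes, show the result for 0 to 3 unsynchronized nodes.
for unsynced in range(4):
    print(unsynced, has_write_quorum(3, unsynced))
```

With three JournalNodes, one unsynchronized node is tolerated and two are not, which matches the "more than half" condition described above.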

Possible Causes

  • The JournalNode instance has not been started or has been stopped.
  • The JournalNode instance is working incorrectly.
  • The network of the JournalNode is unreachable.
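Each cause listed above corresponds to one observable condition. The sketch below is a hypothetical helper, not an MRS API, that maps the three observations to the first matching cause in the same order as the list. The class and field names are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class JournalNodeState:
    started: bool    # Operating Status is Started
    healthy: bool    # Health Status is Good
    reachable: bool  # the active NameNode can reach the service IP address


def likely_cause(state: JournalNodeState) -> str:
    """Return the first matching cause, checked in the order of the list above."""
    if not state.started:
        return "instance not started or stopped"
    if not state.healthy:
        return "instance working incorrectly"
    if not state.reachable:
        return "network unreachable"
    return "undetermined; collect logs and contact technical support"


print(likely_cause(JournalNodeState(started=True, healthy=True, reachable=False)))
# network unreachable
```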

Procedure

  1. Check whether the JournalNode instance has been started.

    1. Log in to MRS Manager and click Alarm. In the alarm list, click the alarm.
    2. In the Alarm Details area, check Location and obtain the IP address of the JournalNode that generated the alarm.
    3. Choose Service > HDFS > Instance. In the instance list, click the JournalNode that generated the alarm and check whether Operating Status of the node is Started.
      • If yes, go to 2.1.
      • If no, go to 1.4.
    4. Select the JournalNode instance and choose More > Start Instance to start it.
    5. Wait 5 minutes and check whether the alarm is cleared.
      • If yes, no further action is required.
      • If no, go to Step 4.

  2. Check whether the JournalNode instance is working correctly.

    1. Check whether Health Status of the JournalNode instance is Good.
      • If yes, go to 3.1.
      • If no, go to 2.2.
    2. Select the JournalNode instance and choose More > Restart Instance to restart it.
    3. Wait 5 minutes and check whether the alarm is cleared.
      • If yes, no further action is required.
      • If no, go to Step 4.

  3. Check whether the network of the JournalNode is reachable.

    1. On the MRS Manager portal, choose Service > HDFS > Instance to check the service IP address of the active NameNode.
    2. Log in to the active NameNode.
    3. Run the ping <service IP address of the JournalNode> command to check whether the request times out or the JournalNode is unreachable from the active NameNode.
      • If yes, go to 3.4.
      • If no, go to Step 4.
    4. Contact public cloud O&M personnel to rectify the network fault. Wait 5 minutes and check whether the alarm is cleared.
      • If yes, no further action is required.
      • If no, go to Step 4.

  4. Collect fault information.

    1. On MRS Manager, choose System > Export Log.
    2. Contact technical support engineers for help. For details, see Technical Support.
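The reachability check in step 3 can also be scripted when it has to be repeated for several JournalNodes. The following is a minimal sketch, assuming a Linux host with the iputils `ping` utility, where `-c` sets the number of echo requests and `-W` sets the reply timeout in seconds. The function names and default values are hypothetical. The `run` parameter exists only so that the command runner can be replaced in tests.

```python
import subprocess
from typing import Callable, List


def build_ping_command(ip: str, count: int = 3, timeout_s: int = 2) -> List[str]:
    """Build a Linux ping command: -c is the number of echo requests,
    -W is the time to wait for each reply, in seconds."""
    return ["ping", "-c", str(count), "-W", str(timeout_s), ip]


def is_reachable(
    ip: str,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> bool:
    """Return True if ping exits with status 0, meaning a reply was received."""
    result = run(
        build_ping_command(ip),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0
```

Run on the active NameNode, `is_reachable` would be called with the service IP address of the JournalNode obtained from the alarm details. A `False` result corresponds to the timeout or unreachable case in step 3.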

Related Information

N/A