ALM-12045 Network Read Packet Dropped Rate Exceeds the Threshold

Description

The system checks the network read packet dropped rate every 30 seconds and compares it with the threshold (0.5% by default). This alarm is generated when the rate exceeds the threshold in several consecutive checks (5 by default).

To change the threshold, choose System > Threshold Configuration > Device > Host > Network Reading > Network Read Packet Rate Information > Read Packet Dropped Rate.

When the hit number is 1, this alarm is cleared when the network read packet dropped rate is less than or equal to the threshold. When the hit number is greater than 1, this alarm is cleared when the network read packet dropped rate is less than or equal to 90% of the threshold.
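
The raise and clear rules above can be sketched as a small shell function. This is an illustration only, not MRS Manager's implementation: the function name alarm_states is hypothetical, and the sketch assumes that the hit number is the count of consecutive checks that must exceed the threshold.

```shell
# alarm_states THRESHOLD_PERCENT HIT_NUMBER
# Reads one dropped-rate sample (in percent) per line on stdin, one line per
# 30-second check, and prints "<sample> <state>", where state is "raised" or
# "normal". Hypothetical helper: a sketch of the documented rule only.
alarm_states() {
    awk -v threshold="$1" -v hits="$2" '
    BEGIN {
        threshold += 0; hits += 0
        # Clearing level: the threshold itself when the hit number is 1,
        # otherwise 90% of the threshold.
        clear_level = (hits == 1) ? threshold : threshold * 0.9
        streak = 0; raised = 0
    }
    NF == 0 { next }                      # ignore blank lines
    {
        rate = $1 + 0
        # Count consecutive checks above the threshold.
        if (rate > threshold) streak++; else streak = 0
        if (!raised && streak >= hits) raised = 1
        else if (raised && rate <= clear_level) raised = 0
        print $1, (raised ? "raised" : "normal")
    }'
}
```

With the defaults (alarm_states 0.5 5), five consecutive samples of 0.6 raise the alarm on the fifth sample, a following sample of 0.48 keeps it raised because 0.48 is above 90% of the threshold, and a sample of 0.4 clears it.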

Alarm detection is disabled by default. Before enabling it, follow section "Check the system environment" to confirm that alarm sending can be enabled.

Attribute

Alarm ID    Alarm Severity    Automatically Cleared
12045       Major             Yes

Parameters

Parameter          Description
ServiceName        Specifies the service for which the alarm is generated.
RoleName           Specifies the role for which the alarm is generated.
HostName           Specifies the host for which the alarm is generated.
NetworkCardName    Specifies the network port for which the alarm is generated.

Trigger Condition

Generates an alarm when the actual indicator value exceeds the specified threshold.

Impact on the System

The service performance deteriorates or services time out.

Precautions: In SUSE (kernel 3.0 or later) or Red Hat 7.2, the kernel changed how it counts read and discarded packets, so this alarm may be generated even when the network is normal. Services are not adversely affected. Follow section "Check the system environment" to determine whether the alarm is caused by this behavior.
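
To look at the kernel counters by hand, the sketch below computes a read packet dropped rate from two snapshots of /proc/net/dev. It rests on assumptions: the function names are hypothetical, and the formula (dropped / (received + dropped)) is an assumed definition, because the exact formula that MRS Manager uses is not stated here.

```shell
# rx_counters IFACE FILE
# Print "<rx_packets> <rx_drop>" for IFACE from a FILE in /proc/net/dev format.
rx_counters() {
    awk -v ifc="$1" '
    { sub(/:/, " ") }                 # tolerate "eth0:123" with no space after the colon
    $1 == ifc { print $3, $5; found = 1 }
    END { exit !found }' "$2"
}

# rx_drop_percent IFACE BEFORE AFTER
# Print the percentage of read packets dropped between two snapshots.
# Assumed formula: dropped / (received + dropped) * 100.
rx_drop_percent() {
    before=$(rx_counters "$1" "$2") || return 1
    after=$(rx_counters "$1" "$3") || return 1
    echo "$before $after" | awk '{
        pkts = $3 - $1; drops = $4 - $2
        total = pkts + drops
        if (total <= 0) print "0.00"
        else printf "%.2f\n", drops * 100 / total
    }'
}

# Example (eth0 is a placeholder interface name):
#   cat /proc/net/dev > /tmp/dev.before; sleep 30; cat /proc/net/dev > /tmp/dev.after
#   rx_drop_percent eth0 /tmp/dev.before /tmp/dev.after
```

Comparing the printed value with the configured threshold gives a rough cross-check of the alarm, within the limits of the assumed formula.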

Possible Causes

  • An OS exception occurs.
  • The NIC is configured in active/standby bond mode.
  • The alarm threshold is set improperly.
  • The network is abnormal.

Procedure

Check the network packet dropped rate.

  1. Use PuTTY to log in as user omm to any cluster node for which the alarm is not generated, and run the ping IP address -c 100 command to check whether network packet loss occurs. The following sample output uses -c 5 for brevity.

    # ping 10.10.10.12 -c 5
    PING 10.10.10.12 (10.10.10.12) 56(84) bytes of data.
    64 bytes from 10.10.10.12: icmp_seq=1 ttl=64 time=0.033 ms
    64 bytes from 10.10.10.12: icmp_seq=2 ttl=64 time=0.034 ms
    64 bytes from 10.10.10.12: icmp_seq=3 ttl=64 time=0.021 ms
    64 bytes from 10.10.10.12: icmp_seq=4 ttl=64 time=0.033 ms
    64 bytes from 10.10.10.12: icmp_seq=5 ttl=64 time=0.030 ms
    --- 10.10.10.12 ping statistics ---
    5 packets transmitted, 5 received, 0% packet loss, time 4001ms
    rtt min/avg/max/mdev = 0.021/0.030/0.034/0.006 ms
    NOTE:
    • IP address: indicates the value of HostName in the alarm location information. To query the value of OM IP and Business IP, click Host on MRS Manager.
    • -c: specifies the number of ping packets to send. Use 100 for this check.
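
The packet loss figure can be pulled out of the ping summary line with a short filter. The sketch below is a minimal example, assuming the summary format shown in the sample output above; the function name ping_loss_percent is hypothetical.

```shell
# ping_loss_percent
# Read ping output on stdin and print the packet-loss percentage taken from
# the summary line ("... N% packet loss ..."). Fails if no such line is found.
ping_loss_percent() {
    awk '/packet loss/ {
        for (i = 1; i <= NF; i++)
            if ($i ~ /%$/) { sub(/%/, "", $i); print $i; found = 1 }
    }
    END { exit !found }'
}

# Example (replace the address with the HostName value from the alarm):
#   ping 10.10.10.12 -c 100 | ping_loss_percent
```

A non-zero result from a node that has no alarm suggests packet loss on the path to the alarmed host rather than a counting artifact on that host.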

Check the system environment.

  2. Use PuTTY to log in as user omm to the active OMS node or the node for which the alarm is generated.
  3. Run the cat /etc/*-release command to check the OS type.

    • If EulerOS is used, go to Step 4.
      # cat /etc/*-release
      EulerOS release 2.0 (SP2)
      EulerOS release 2.0 (SP2)
    • If SUSE is used, go to Step 5.
      # cat /etc/*-release
      SUSE Linux Enterprise Server 11 (x86_64)
      VERSION = 11
      PATCHLEVEL = 3
    • If another OS is used, go to Step 10.

  4. Run the cat /etc/euleros-release command to check whether the OS version is EulerOS 2.2.

    # cat /etc/euleros-release
    EulerOS release 2.0 (SP2)
    • If yes, the alarm sending function cannot be enabled. Go to Step 6.

  5. Run the cat /proc/version command to check whether the SUSE kernel version is 3.0 or later.

    # cat /proc/version
    Linux version 3.0.101-63-default (geeko@buildhost) (gcc version 4.3.4 [gcc-4_3-branch revision 152973] (SUSE Linux) ) #1 SMP Tue Jun 23 16:02:31 UTC 2015 (4b89d0c)
    • If yes, the alarm sending function cannot be enabled. Go to Step 6.
    • If no, go to Step 10.

  6. Log in to MRS Manager and choose System > Configuration > Threshold Configuration.
  7. In the navigation tree of the Threshold Configuration page, choose Network Reading > Network Read Packet Rate Information > Read Packet Dropped Rate. In the area on the right, check whether Send Alarm is selected.

    • If yes, the alarm sending function has been enabled. Go to Step 8.
    • If no, the alarm sending function has been disabled. Go to Step 9.

  8. In the area on the right, deselect Send Alarm to disable the checking of Network Read Packet Dropped Rate Exceeds the Threshold.
  9. On the Alarm page of MRS Manager, search for the 12045 alarm. If the alarm is not cleared automatically, clear it manually. No further action is required.

    NOTE:

    The ID of alarm Network Read Packet Dropped Rate Exceeds the Threshold is 12045.
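
The kernel version check performed with cat /proc/version can also be scripted. The following is a minimal sketch, assuming the file begins with "Linux version <major>.<minor>..." as in the sample output above; the function name kernel_major is hypothetical.

```shell
# kernel_major [FILE]
# Print the major kernel version found in FILE (default: /proc/version),
# which is expected to start with "Linux version <major>.<minor>...".
kernel_major() {
    awk '$1 == "Linux" && $2 == "version" {
        split($3, v, ".")
        print v[1] + 0
        exit
    }' "${1:-/proc/version}"
}

# Example: report whether the kernel is 3.0 or later.
#   if [ "$(kernel_major)" -ge 3 ]; then echo "kernel 3.0 or later"; fi
```

On a SUSE node, a result of 3 or higher corresponds to the "If yes" branch of the kernel version check.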

Check whether the NIC is configured in active/standby bond mode.

  10. Use PuTTY to log in to the alarm node as user omm. Run the ls -l /proc/net/bonding command to check whether the /proc/net/bonding directory exists on the alarm node.

    • If yes, the NIC is configured in active/standby bond mode, as shown in the following output. Go to Step 11.
      # ls -l /proc/net/bonding/
      total 0
      -r--r--r-- 1 root root 0 Oct 11 17:35 bond0
    • If no, the NIC is not configured in active/standby bond mode, as shown in the following output. Go to Step 13.
      # ls -l /proc/net/bonding/
      ls: cannot access /proc/net/bonding/: No such file or directory

  11. Run the cat /proc/net/bonding/bond0 command and check whether the value of Bonding Mode is fault-tolerance.

    NOTE:

    bond0 indicates the name of the bond configuration file. Use the file name queried in Step 10 in practice.

    # cat /proc/net/bonding/bond0 
    Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
    
    Bonding Mode: fault-tolerance (active-backup)
    Primary Slave: eth1 (primary_reselect always)
    Currently Active Slave: eth1
    MII Status: up
    MII Polling Interval (ms): 100
    Up Delay (ms): 0
    Down Delay (ms): 0
    
    Slave Interface: eth0
    MII Status: up
    Speed: 1000 Mbps
    Duplex: full
    Link Failure Count: 1
    Slave queue ID: 0
    
    Slave Interface: eth1
    MII Status: up
    Speed: 1000 Mbps
    Duplex: full
    Link Failure Count: 1
    Slave queue ID: 0
    • If yes, the NIC is configured in active/standby bond mode. Go to Step 12.
    • If no, the NIC is not configured in active/standby bond mode. Go to Step 13.

  12. Check whether the NIC specified by the NetworkCardName parameter in the alarm details is the standby NIC.

    • If yes, manually clear the alarm on the Alarms page, because an alarm on the standby NIC cannot be cleared automatically. No further action is required.
    • If no, go to Step 13.
      NOTE:

      To determine whether a NIC is the standby NIC: in the /proc/net/bonding/bond0 configuration file, check whether the NIC name in the NetworkCardName parameter matches a Slave Interface entry but differs from Currently Active Slave (the current active NIC). If so, the NIC is a standby NIC.
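
The bond checks above can be sketched as two shell helpers that read a bond status file such as /proc/net/bonding/bond0. The function names bond_mode and is_standby_nic are hypothetical, and the sketch assumes the "Key: value" layout shown in the sample output above.

```shell
# bond_mode FILE
# Print the value of the "Bonding Mode" line, for example
# "fault-tolerance (active-backup)".
bond_mode() {
    awk -F': ' '$1 == "Bonding Mode" { print $2; exit }' "$1"
}

# is_standby_nic NIC FILE
# Succeed (exit status 0) when NIC is listed as a Slave Interface but is not
# the Currently Active Slave, which is the standby test described above.
is_standby_nic() {
    awk -F': ' -v nic="$1" '
    $1 == "Currently Active Slave" { active = $2 }
    $1 == "Slave Interface" && $2 == nic { member = 1 }
    END { exit !(member && active != nic) }' "$2"
}

# Example (eth0 stands for the NetworkCardName value from the alarm):
#   case "$(bond_mode /proc/net/bonding/bond0)" in
#       fault-tolerance*) is_standby_nic eth0 /proc/net/bonding/bond0 && echo "eth0 is standby" ;;
#   esac
```

Applied to the sample output above, eth1 is the active NIC, so an alarm whose NetworkCardName is eth0 would be treated as an alarm on the standby NIC.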

Check whether the threshold is set properly.

  13. Log in to MRS Manager and check whether the alarm threshold is set properly. (By default, 0.5% is a proper value. However, users can configure the value as required.)

  14. Based on actual usage, choose System > Threshold Configuration > Device > Host > Network Reading > Network Read Packet Rate Information > Read Packet Dropped Rate to modify the alarm threshold.

    For details, see Figure 1.

    Figure 1 Setting alarm thresholds

  15. Wait 5 minutes and check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, go to Step 16.

Check whether the network is normal.

  16. Contact the system administrator to check whether the network is abnormal.

    • If yes, go to Step 17 to rectify the network fault.
    • If no, go to Step 18.

  17. Wait 5 minutes and check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, go to Step 18.

Collect fault information.

  18. On MRS Manager, choose System > Export Log.
  19. Contact technical support engineers for help. For details, see technical support.

Related Information

N/A