ALM-38001 Insufficient Kafka Disk Space

Description

The system checks the Kafka disk usage every 60 seconds and compares it with the threshold. This alarm is generated when the disk usage exceeds the threshold.

To modify the threshold, users can choose System > Threshold Configuration on MRS Manager.

This alarm is cleared when the Kafka disk usage is lower than or equal to the threshold.
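The raise and clear rule described above can be illustrated with a small POSIX shell sketch. It is not how MRS Manager implements the check. The function names, the data directory path, and the 80% threshold in the example are hypothetical, and usage is read with `df -P` for the partition that holds the given path.

```shell
#!/bin/sh
# Hypothetical sketch of the ALM-38001 raise/clear rule; not MRS Manager code.

# alarm_state <usage_percent> <threshold_percent>
# Prints "raise" when usage exceeds the threshold. Prints "clear" when usage is
# lower than or equal to the threshold, matching the clearing rule above.
alarm_state() {
  if [ "$1" -gt "$2" ]; then
    echo raise
  else
    echo clear
  fi
}

# partition_usage <path>
# Prints the use% of the partition holding <path> as an integer, using the
# POSIX "df -P" format, where column 5 is the capacity (for example "42%").
partition_usage() {
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# watch_partition <path> <threshold_percent>
# Re-evaluates the rule every 60 seconds, like the check described above.
# Runs until interrupted (Ctrl+C).
watch_partition() {
  while :; do
    usage=$(partition_usage "$1")
    echo "$(date '+%Y-%m-%d %H:%M:%S') $1 ${usage}% -> $(alarm_state "$usage" "$2")"
    sleep 60
  done
}

# Example (hypothetical data directory and threshold):
# watch_partition /path/to/kafka/data 80
```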

Attribute

Alarm ID: 38001
Alarm Severity: Major
Automatically Cleared: Yes

Parameters

ServiceName: Specifies the service for which the alarm is generated.
RoleName: Specifies the role for which the alarm is generated.
HostName: Specifies the host for which the alarm is generated.
PartitionName: Specifies the disk partition where the alarm is generated.

Trigger Condition

The alarm is generated when the actual indicator value exceeds the specified threshold.

Impact on the System

Kafka fails to write data to the disks.

Possible Causes

  • The Kafka disk configurations (such as disk count and disk size) are insufficient for the data volume.
  • The data retention period is long and historical data occupies a large space.
  • Services are improperly planned. As a result, data is unevenly distributed and some disks are full.
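For the third cause, one rough way to spot uneven distribution is to compare the use% of the partitions behind each Kafka data directory. The helper below is a hypothetical sketch, not an MRS tool, and the paths in the example are placeholders for the directories configured in log.dirs.

```shell
#!/bin/sh
# usage_spread <path> [<path>...]   (at least one path)
# Prints the lowest and highest use% among the partitions that hold the given
# paths, plus the difference between them. A large spread suggests that data
# is unevenly distributed across the Kafka disks.
usage_spread() {
  min=101
  max=-1
  for p in "$@"; do
    # POSIX "df -P" output: column 5 is the capacity, for example "42%".
    u=$(df -P "$p" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }')
    [ "$u" -lt "$min" ] && min=$u
    [ "$u" -gt "$max" ] && max=$u
  done
  echo "min=${min}% max=${max}% spread=$((max - min))"
}

# Example (hypothetical data directories, one per disk):
# usage_spread /path/to/kafka/data1 /path/to/kafka/data2 /path/to/kafka/data3
```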

Procedure

  1. Log in to MRS Manager and click Alarm.
  2. In the alarm list, click the alarm and view the HostName and PartitionName of the alarm in Location of Alarm Details.
  3. In Hosts, click the host obtained in 2.
  4. Check whether the Disk area contains the PartitionName of the alarm.
    • If yes, go to 5.
    • If no, manually clear the alarm and no further action is required.
  5. In the Disk area, check whether the usage of the alarmed partition has reached 100%.
    • If yes, go to 6.
    • If no, go to 8.
  6. In Instance, choose Broker > Instance Configuration. On the Instance Configuration page that is displayed, set Type to All and query the data directory parameter log.dirs.
  7. Choose Service > Kafka > Instance. On the Kafka Instance page that is displayed, stop the Broker instance on the host obtained in 2. Then log in to the alarm node and manually delete the data directory queried in 6. After all subsequent operations are complete, start the Broker instance.
  8. Choose Service > Kafka > Service Configuration. The Kafka Configuration page is displayed.
  9. Check whether disk.adapter.enable is true.
    • If yes, go to 11.
    • If no, change the value to true and go to 10.
  10. Check whether the adapter.topic.min.retention.hours parameter, indicating the minimum data retention period, is properly configured.
    • If yes, go to 11.
    • If no, set it to a proper value and go to 11.
    NOTE:

    If the retention period cannot be adjusted for certain topics, the topics can be added to disk.adapter.topic.blacklist.

  11. Wait 10 minutes and check whether the disk usage is reduced.
    • If yes, wait until the alarm is cleared.
    • If no, go to 12.
  12. Go to the Kafka Topic Monitor page and query the data retention period configured for Kafka. Determine whether the retention period needs to be shortened based on service requirements and data volume.
    • If yes, go to 13.
    • If no, go to 14.
  13. Find the topics with large data volumes based on the disk partition obtained in 2. Log in to the Kafka client and manually shorten the data retention period for these topics using the following command, where the retention period is specified in milliseconds:

    kafka-topics.sh --zookeeper <ZooKeeper address>:24002/kafka --alter --topic <Topic name> --config retention.ms=<Retention period>

  14. Check whether partitions are properly configured for topics. For example, if the number of partitions for a topic with a large data volume is smaller than the number of disks, data may be unevenly distributed to the disks and the usage of some disks will reach the upper limit.
    NOTE:

    To identify topics with large data volumes, log in to the nodes obtained in 2, go to the data directory queried in 6 (the original log.dirs value, before any modification), and check the disk space occupied by the partitions of each topic.

    • If the partitions are improperly configured, go to 15.
    • If the partitions are properly configured, go to 16.
  15. On the Kafka client, add partitions to the topics.

    kafka-topics.sh --zookeeper <ZooKeeper address>:24002/kafka --alter --topic <Topic name> --partitions=<New total number of partitions>

    NOTE:

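    It is recommended that the new number of partitions be a multiple of the number of Kafka disks. The --partitions value is the total partition count after the change and must be larger than the current count, because Kafka does not support reducing the number of partitions of a topic.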

    This operation may not quickly clear the alarm. Data will be gradually balanced among the disks.

  16. Check whether the cluster capacity needs to be expanded.
    • If yes, add nodes to the cluster and go to 17.
    • If no, go to 17.
  17. Wait a moment and then check whether the alarm is cleared.
    • If yes, no further action is required.
    • If no, go to 18.
  18. Contact technical support engineers for help.
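Step 5 above can also be checked from a shell on the alarm node. This minimal sketch assumes shell access to the node and takes any path on the alarmed partition (for example, a Kafka data directory). The function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for step 5 (not part of MRS).
# "df -P" prints one POSIX-format line per file system: column 5 is the
# capacity ("NN%") and column 6 is the mount point. Mount points that contain
# spaces are not handled.

# partition_of <path>: prints the mount point of the partition holding <path>
partition_of() {
  df -P "$1" | awk 'NR == 2 { print $6 }'
}

# partition_usage <path>: prints the partition's use% as an integer
partition_usage() {
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# step5_next <path>: prints the next procedure step,
# 6 if the partition is at 100% usage, otherwise 8
step5_next() {
  if [ "$(partition_usage "$1")" -ge 100 ]; then
    echo 6
  else
    echo 8
  fi
}

# Example (hypothetical path): step5_next /path/to/kafka/data
```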
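The log.dirs value queried in step 6 is a comma-separated list of data directories. If the Broker settings are also available on the node as a server.properties-style file (an assumption, since this guide does not give its location), a sketch like the following lists the directories one per line. The function name is hypothetical.

```shell
#!/bin/sh
# log_dirs_from <properties_file>
# Prints each Kafka data directory listed in log.dirs, one per line.
# Assumption: settings are plain "key=value" lines; ":" separators and
# "\" line continuations are not handled.
log_dirs_from() {
  # Keep the last log.dirs line (a later duplicate key overrides an earlier
  # one), drop the key, split the comma-separated value, and trim whitespace.
  grep '^[[:space:]]*log\.dirs[[:space:]]*=' "$1" | tail -n 1 |
    sed 's/^[[:space:]]*log\.dirs[[:space:]]*=//' |
    tr ',' '\n' |
    sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' |
    grep -v '^$'
}

# Example (hypothetical file path): log_dirs_from /path/to/server.properties
```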
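Because retention.ms in the step 13 command is expressed in milliseconds, converting from hours is an easy place to make mistakes. This sketch only prints the command for review and does not run it. The function names, ZooKeeper host, and topic name are hypothetical.

```shell
#!/bin/sh
# hours_to_ms <hours>: converts a retention period in hours to milliseconds
hours_to_ms() {
  echo $(( $1 * 3600 * 1000 ))
}

# retention_cmd <zookeeper_host> <topic> <hours>
# Prints (does not run) the step 13 command with retention.ms filled in.
retention_cmd() {
  echo "kafka-topics.sh --zookeeper $1:24002/kafka --alter --topic $2 --config retention.ms=$(hours_to_ms "$3")"
}

# Example (hypothetical host and topic):
# retention_cmd 192.168.0.10 my_topic 72
```

For example, `retention_cmd 192.168.0.10 my_topic 72` prints the command with `retention.ms=259200000`, which corresponds to 72 hours.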
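The note in step 14 suggests checking how much disk space the partitions of each topic occupy. Kafka stores each partition in a directory named `<topic>-<partition>` under a data directory, so a sketch like the following totals the space per topic. The function name and path are hypothetical, and sizes come from `du -sk` in kilobytes.

```shell
#!/bin/sh
# topic_sizes <data_dir>
# Prints "<size_in_KB> <topic>" for each topic under one Kafka data directory,
# largest first. The trailing "-<partition>" of each partition directory name
# is stripped so that all partitions of a topic are summed together.
topic_sizes() {
  for d in "$1"/*/; do
    [ -d "$d" ] || continue            # skip when the glob matches nothing
    name=$(basename "$d")
    printf '%s %s\n' "$(du -sk "$d" | cut -f1)" "${name%-*}"
  done |
    awk '{ kb[$2] += $1 } END { for (t in kb) print kb[t], t }' |
    sort -rn
}

# Example (hypothetical path): topic_sizes /path/to/kafka/data | head -n 10
```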
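The partition checks in steps 14 and 15 reduce to simple arithmetic. This sketch assumes the disk count equals the number of log.dirs entries on the broker (one data directory per disk). The helpers flag a topic with fewer partitions than disks and suggest the next partition count that is a multiple of the disk count. Both function names are hypothetical.

```shell
#!/bin/sh
# partitions_below_disks <partitions> <disks>
# Succeeds (exit status 0) if the topic has fewer partitions than there are
# disks, which step 14 describes as a cause of uneven disk usage.
partitions_below_disks() {
  [ "$1" -lt "$2" ]
}

# next_partition_count <current_partitions> <disk_count>
# Prints the smallest multiple of <disk_count> that is larger than the
# current partition count, since --partitions must exceed the current count.
next_partition_count() {
  echo $(( ($1 / $2 + 1) * $2 ))
}
```

For example, for a topic with 3 partitions on a broker with 4 disks, `next_partition_count 3 4` prints 4; for a topic that already has 4 partitions, it prints 8.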

Related Information

N/A