Shrinking a Cluster

You can reduce the number of Core or Task nodes based on service requirements to shrink a cluster, so that MRS provides storage and computing capabilities that match your needs at lower O&M costs.

Background

An MRS cluster supports a maximum of 502 nodes: two Master nodes by default, and up to 500 Core and Task nodes combined by default. The minimum number of Core nodes is three. If more than 500 Core and Task nodes are required, contact technical support engineers or invoke a background interface to modify the database.

The number of Core and Task nodes can be reduced, but Master nodes cannot be removed. After you specify the target number of nodes for the shrink operation, the system automatically selects the nodes to delete. A cluster must retain at least three Core nodes, while the number of Task nodes can be reduced to zero.
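The resulting rule for a valid shrink target can be summarized in a few lines. The following Python sketch is a hypothetical pre-check for illustration only; the function name and arguments are assumptions, and MRS enforces these limits itself.

    def is_valid_shrink_target(node_type, current_core, current_task, target):
        # Hypothetical pre-check; MRS performs its own validation.
        if node_type == "core":
            if target < 3:
                return False, "at least 3 Core nodes must remain"
            new_core, new_task = target, current_task
        elif node_type == "task":
            if target < 0:
                return False, "Task node count cannot be negative"
            new_core, new_task = current_core, target
        else:
            return False, "only Core and Task nodes can be shrunk"
        if new_core + new_task > 500:
            return False, "Core + Task nodes exceed the default limit of 500"
        return True, "ok"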

Node selection policy
  • Service components such as ZooKeeper, DBService, KrbServer, and LdapServer are fundamental to stable cluster running, so the nodes where they are deployed cannot be deleted.
  • Core nodes store cluster service data. When a cluster is shrunk, the data on the nodes to be deleted must be fully migrated to other nodes. Therefore, follow-up operations, such as removing the nodes from MRS Manager and deleting the ECSs, can be performed only after all instances on those nodes are decommissioned. To avoid decommission failures, prefer healthy nodes that store a small volume of data and whose instances can be decommissioned. For example, if DataNodes are installed on Core nodes in an analysis cluster, healthy DataNodes that store a small volume of data are preferred.
  • Task nodes are computing nodes and do not store cluster data, so no data migration is involved. When shrinking a cluster, nodes whose health status is Bad, Unknown, or Partially Healthy are preferred for deletion. You can view the health status of nodes on the instance management page after logging in to MRS Manager. The selection preference for both node types is sketched after this list.
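A minimal sketch of the selection preference described above, assuming each node exposes its health status, stored data volume, and whether its instances can be decommissioned (the field names are illustrative assumptions, not MRS APIs):

    # Preferred deletion order for Task nodes by health status.
    TASK_DELETE_PREFERENCE = {"Bad": 0, "Unknown": 1, "Partially Healthy": 2, "Healthy": 3}

    def pick_core_nodes(nodes, count):
        # Prefer healthy Core nodes that store little data, so decommissioning
        # (data migration) is least likely to fail. Field names are illustrative.
        candidates = [n for n in nodes
                      if n["health"] == "Healthy" and n["decommissionable"]]
        candidates.sort(key=lambda n: n["data_volume_bytes"])
        return candidates[:count]

    def pick_task_nodes(nodes, count):
        # Prefer unhealthy Task nodes first; no data migration is involved.
        ranked = sorted(nodes, key=lambda n: TASK_DELETE_PREFERENCE.get(n["health"], 3))
        return ranked[:count]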

Cluster shrinking verification policies

Component decommissioning restrictions vary. Cluster shrinking is allowed only when all component decommissioning restrictions are met. Table 1 describes these restrictions; a sketch of the corresponding checks follows the table.

Table 1 Component decommissioning restrictions

  • HDFS/DataNode
    Restriction: The total volume of HDFS data must not exceed 80% of the total HDFS capacity of the shrunk cluster.
    Cause: This ensures that there is sufficient space to store existing data and that some space remains reserved.
  • HBase/RegionServer
    Restriction: The total available memory of RegionServers on the remaining nodes must be greater than 1.2 times the memory used by RegionServers on the nodes to be deleted.
    Cause: Regions on a node to be decommissioned are migrated to other nodes, so the available memory on the other nodes must be sufficient to bear the migrated regions.
  • Kafka/Broker
    Restriction: After shrinking, the number of nodes must not be less than the maximum number of topic replicas, and the used Kafka disk space must not exceed 80% of the total Kafka disk space of the cluster.
    Cause: This avoids insufficient disk space after cluster shrinking.
  • Storm/Supervisor
    Restriction: The number of slots in the shrunk cluster must be sufficient to run the submitted jobs.
    Cause: This prevents resources from being insufficient to run streaming processing tasks.
  • Flume/FlumeServer
    Restriction: If FlumeServer is installed on a node and Flume tasks have been configured on it, the node cannot be deleted.
    Cause: This prevents the deployed service applications from being deleted by mistake.
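The HDFS, HBase, and Kafka restrictions in Table 1 are simple threshold checks that can be restated in code. The following Python sketch is illustrative only; the parameter names are assumptions, and MRS evaluates the actual metrics itself during shrink verification.

    def check_hdfs(used_bytes, hdfs_capacity_after_shrink_bytes):
        # HDFS/DataNode: data must not exceed 80% of the shrunk cluster's HDFS capacity.
        return used_bytes <= 0.8 * hdfs_capacity_after_shrink_bytes

    def check_regionserver(free_mem_on_remaining_nodes, used_mem_on_deleted_nodes):
        # HBase/RegionServer: remaining RegionServers need more than 1.2 times the
        # memory used by RegionServers on the nodes to be deleted.
        return free_mem_on_remaining_nodes > 1.2 * used_mem_on_deleted_nodes

    def check_kafka(brokers_after_shrink, max_topic_replicas,
                    used_disk_bytes, kafka_disk_after_shrink_bytes):
        # Kafka/Broker: keep at least as many brokers as the largest replication
        # factor, and stay at or below 80% Kafka disk usage.
        return (brokers_after_shrink >= max_topic_replicas
                and used_disk_bytes <= 0.8 * kafka_disk_after_shrink_bytes)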

Procedure

  1. Log in to the MRS management console.
  2. In the upper-left corner of the management console, select a region and project.
  3. Choose Clusters > Active Clusters, select a running cluster, and click its name to switch to the cluster information page.
  4. Click Resize Cluster to go to the Resize Cluster page.

    This operation can be performed only on a running cluster in which all nodes are running.

  5. Set Node Type to Core Node or Task Node and configure the Nodes After Resize parameter.
  6. On the Resize Cluster page, click OK.
  7. In the Shrink Node dialog box, click OK.

    The cluster shrinking process is described as follows:
    • During shrinking: The cluster status is Shrinking. Submitted jobs continue to be executed, and you can submit new jobs, but you cannot shrink or terminate the cluster again during this period. You are advised not to restart the cluster or modify the cluster configuration.
    • Successful shrinking: The cluster status changes to Running. You are charged for the resources used after the node reduction.
    • Failed shrinking: The cluster status remains Running. You can still execute jobs and shrink the cluster again.

    After the cluster is shrunk successfully, you can view the node information on the cluster information page.
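If you drive this procedure programmatically rather than through the console, you generally wait for the cluster to leave the Shrinking state before issuing another resize request. The following is a minimal polling sketch, assuming a hypothetical get_cluster_status(cluster_id) helper that wraps whichever query interface you use; it is not an actual MRS SDK call.

    import time

    def wait_for_shrink(cluster_id, get_cluster_status, timeout_s=3600, interval_s=30):
        # get_cluster_status is a user-supplied helper (hypothetical), for example
        # a wrapper around whichever MRS query interface you use.
        deadline = time.time() + timeout_s
        while time.time() < deadline:
            status = get_cluster_status(cluster_id)
            if status != "Shrinking":
                # Running means the shrink finished (successfully or not); check the
                # node information on the cluster information page afterwards.
                return status
            time.sleep(interval_s)
        raise TimeoutError("cluster still shrinking after %d seconds" % timeout_s)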