Creating a Custom Topology Cluster

The analysis cluster, streaming cluster, and hybrid cluster provided by MRS use fixed templates to deploy cluster processes. Therefore, you cannot customize service processes on management nodes and control nodes.

A custom cluster provides the following functions:

  • Separated deployment of the management and control roles: The management role and control role are deployed on different Master nodes.

  • Co-deployment of the management and control roles: The management and control roles are co-deployed on the Master node.

  • ZooKeeper is deployed on an independent node to improve reliability.

  • Components are deployed separately to avoid resource contention.

Roles in an MRS cluster:

  • Management Node (MN): the node where Manager, the management system of the MRS cluster, is installed. Manager provides a unified access entry and centrally manages the nodes and services deployed in the cluster.

  • Control Node (CN): controls and monitors how data nodes store and receive data and report process status, and provides other public functions. Control roles in MRS include HMaster, HiveServer, ResourceManager, NameNode, JournalNode, and SlapdServer.

  • Data Node (DN): executes the instructions sent by the management node, reports task status, stores data, and provides other public functions. Data roles in MRS include DataNode, RegionServer, and NodeManager.

Customizing a Cluster

  1. Log in to the MRS console.

  2. Click Create Cluster. The page for creating a cluster is displayed.

  3. Click the Custom Config tab.

  4. Configure basic cluster information. For details about the parameters, see Software Configurations.

    • Region: Retain the default value.

    • Cluster Name: You can use the default name. However, you are advised to include a project name abbreviation or a date so that the cluster is easy to identify, for example, mrs_20180321.

    • Cluster Type: Use the default value.

    • Cluster Version: Select the latest version, which is the default value.

    • Component: Select components such as Spark2x, HBase, and Hive for the analysis cluster. For a streaming cluster, select components such as Kafka and Storm. For a hybrid cluster, you can select the components of the analysis cluster and streaming cluster based on service requirements.

    • Component Port: Retain the default value, Open source.

  5. Click Next. Configure hardware information.

    • AZ: Retain the default value.

    • VPC: Retain the default value. If there is no available VPC, click View VPC to access the VPC console and create a new VPC.

    • Subnet: Retain the default value.

    • Security Group: Select Auto create.

    • EIP: Select Bind later.

    • CPU Architecture: Retain the default value.

    • Common Template: Select a template based on service requirements.

    • Cluster Nodes

      • Node Count: the number of nodes you want to purchase. For MRS 3.x clusters, the default value is 3. You can set the value as you need.

      • Instance Specifications: Retain the default settings for master and core nodes or select proper specifications based on service requirements.

      • System Disk: Retain the default Ultra-high I/O and storage capacity.

      • Data Disk: Retain the default Ultra-high I/O, storage capacity, and quantity.

    • Topology Adjustment: If the deployment mode of the selected common template does not meet your requirements, or you need to manually install instances that are not deployed by default, set Topology Adjustment to Enable and adjust the instance deployment based on service requirements. For details, see Topology Adjustment for a Custom Cluster.

  6. Click Next and set advanced options.

    For details about the parameters, see Advanced Options.

  7. Click Next.

    • Configure: Confirm the parameters configured in the Configure Software, Configure Hardware, and Set Advanced Options areas.

    • Select the check box for Secure Communications.

  8. Click Create Now.

    If Kerberos authentication is enabled for a cluster, check whether Kerberos authentication is required. If yes, click Continue. If no, click Back to disable Kerberos authentication and then create a cluster.

  9. Click Back to Cluster List to view the cluster status.

    It takes some time to create a cluster. The initial status of the cluster is Starting. After the cluster has been created successfully, the cluster status becomes Running.
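The console steps above can also be scripted against the MRS API. The sketch below only assembles an illustrative request body; the field names (cluster_name, cluster_type, components, node_groups) are assumptions modeled on typical cluster-creation APIs, so consult the MRS API reference for the real endpoint and schema:

```python
import json

def build_custom_cluster_request(name, components, master_count=3, core_count=3):
    """Sketch of a custom-cluster creation payload.

    The field names here are illustrative, NOT the exact MRS API schema.
    """
    return {
        "cluster_name": name,
        "cluster_type": "CUSTOM",
        # Components are chosen per cluster type, e.g. Spark2x/HBase/Hive
        # for analysis, Kafka/Storm for streaming.
        "components": ",".join(components),
        "node_groups": [
            {"group_name": "master_node_default_group", "node_num": master_count},
            {"group_name": "core_node_analysis_group", "node_num": core_count},
        ],
    }

req = build_custom_cluster_request("mrs_20180321",
                                   ["Hadoop", "Spark2x", "HBase", "Hive"])
print(json.dumps(req, indent=2))
```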

Custom Cluster Template Description

Table 1 Common templates for custom clusters

  • Compact

    The management and control roles are co-deployed on the Master nodes, and data instances are deployed in the same node group. This deployment mode applies to clusters with fewer than 100 nodes and reduces costs.

    Node range:

      • The number of Master nodes is greater than or equal to 3 and less than or equal to 11.

      • The total number of node groups is less than or equal to 10, and the total number of nodes in non-Master node groups is less than or equal to 10,000.

  • OMS-separate

    The management and control roles are deployed on different Master nodes, and data instances are deployed in the same node group. This deployment mode applies to clusters with 100 to 500 nodes and delivers better performance in high-concurrency load scenarios.

    Node range:

      • The number of Master nodes is greater than or equal to 5 and less than or equal to 11.

      • The total number of node groups is less than or equal to 10, and the total number of nodes in non-Master node groups is less than or equal to 10,000.

  • Full-size

    The management and control roles are deployed on different Master nodes, and data instances are deployed in different node groups. This deployment mode applies to clusters with more than 500 nodes. Components can be deployed separately, supporting a larger cluster scale.

    Node range:

      • The number of Master nodes is greater than or equal to 9 and less than or equal to 11.

      • The total number of node groups is less than or equal to 10, and the total number of nodes in non-Master node groups is less than or equal to 10,000.
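The Master-node and node-group limits in Table 1 can be pre-checked before a planned layout is entered in the console. This helper is a sketch (the function and constant names are hypothetical, not part of any MRS tooling); the numbers come straight from the table:

```python
# Per-template Master-node limits from Table 1.
TEMPLATES = {
    "Compact":      {"master_min": 3, "master_max": 11},
    "OMS-separate": {"master_min": 5, "master_max": 11},
    "Full-size":    {"master_min": 9, "master_max": 11},
}

def check_template(template, master_nodes, node_groups, non_master_nodes):
    """Return a list of violated Table 1 constraints (empty means the layout fits)."""
    limits = TEMPLATES[template]
    problems = []
    if not limits["master_min"] <= master_nodes <= limits["master_max"]:
        problems.append(
            f"{template} needs {limits['master_min']} to "
            f"{limits['master_max']} Master nodes"
        )
    if node_groups > 10:
        problems.append("at most 10 node groups in total")
    if non_master_nodes > 10000:
        problems.append("at most 10,000 nodes in non-Master node groups")
    return problems
```

For example, 5 Master nodes fail the Full-size check (which needs at least 9) but satisfy OMS-separate.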

Table 2 Node deployment schemes of a custom MRS cluster

  • Management nodes, control nodes, and data nodes are deployed separately. (This scheme requires at least eight nodes.)

    Applicable scenarios:

      • MN x 2 + CN x 9 + DN x n: recommended when the number of data nodes is 500-2000.

      • MN x 2 + CN x 5 + DN x n: recommended when the number of data nodes is 100-500.

      • MN x 2 + CN x 3 + DN x n: recommended when the number of data nodes is 30-100.

    Networking rules:

      • If the number of nodes in the cluster exceeds 200, the nodes are distributed to different subnets, and the subnets are interconnected at Layer 3 through core switches. Each subnet can contain a maximum of 200 nodes, and nodes must be allocated to the subnets in a balanced manner.

      • If the number of nodes is 200 or less, the nodes are deployed in the same subnet and are interconnected at Layer 2 through aggregation switches.

  • Management nodes and control nodes are co-deployed, and data nodes are deployed separately.

    Applicable scenario:

      • (MN+CN) x 3 + DN x n: recommended when the number of data nodes is 3-30.

    Networking rule: Nodes in the cluster are deployed in the same subnet and are interconnected at Layer 2 through aggregation switches.

  • Management nodes, control nodes, and data nodes are co-deployed.

    Applicable scenario: clusters with fewer than 6 nodes. This scheme requires at least three nodes.

    Note: This scheme is not recommended for production or commercial environments.

      • Co-deploying management, control, and data nodes greatly affects cluster performance and reliability.

      • If the number of nodes meets the requirements, deploy data nodes separately.

      • If there are not enough nodes to deploy data nodes separately, use dual-plane networking to isolate management-network traffic from service-network traffic. This prevents excessive data volumes on the service plane and ensures that management operations are delivered correctly.

    Networking rule: Nodes in the cluster are deployed in the same subnet and are interconnected at Layer 2 through aggregation switches.
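The scheme choice in Table 2 is driven purely by the data-node count, so it can be expressed as a small lookup. This is a hypothetical helper, not MRS tooling; boundary counts such as exactly 100 appear in two rows of the table, and this sketch resolves them toward the larger scheme:

```python
def recommend_scheme(data_nodes):
    """Map a data-node count to the recommended deployment scheme from Table 2."""
    if data_nodes >= 500:
        return "MN x 2 + CN x 9 + DN x n"
    if data_nodes >= 100:
        return "MN x 2 + CN x 5 + DN x n"
    if data_nodes >= 30:
        return "MN x 2 + CN x 3 + DN x n"
    if data_nodes >= 3:
        return "(MN+CN) x 3 + DN x n"
    # Fewer than 3 data nodes: only full co-deployment fits,
    # which Table 2 does not recommend for production.
    return "co-deployed (not recommended for production)"
```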

Topology Adjustment for a Custom Cluster

Table 3 Topology adjustment

Service

Dependency

Role

Role Deployment Suggestions

Description

OMSServer

-

OMSServer

This role is deployed on the Master node and cannot be modified.

-

CDL

(applicable only to MRS 3.2.0)

  • Depends on Kafka.

  • Depends on DBService.

CC(CDLConnector)

This role can be deployed in all node groups.

Number of role instances to be deployed: 1 to 256

It is recommended that the number of CDLConnector instances to be deployed be the same as the number of Broker roles.

CS(CDLService)

This role can be deployed in all node groups.

Number of role instances to be deployed: 1 or 2

-

ClickHouse

Depends on ZooKeeper.

CHS (ClickHouseServer)

This role can be deployed on all nodes.

Number of role instances to be deployed: an even number ranging from 2 to 256

A non-Master node group with this role assigned is considered as a Core node.

CLB (ClickHouseBalancer)

This role can be deployed on all nodes.

Number of role instances to be deployed: 2 to 256

-

ZooKeeper

-

QP(quorumpeer)

This role can be deployed on the Master node only.

Number of role instances to be deployed: 3 to 9, with the step size of 2

-

Hadoop

Depends on ZooKeeper.

NN(NameNode)

This role can be deployed on the Master node only.

Number of role instances to be deployed: 2

The NameNode and ZKFC processes are deployed on the same server for cluster HA.

HFS (HttpFS)

This role can be deployed on the Master node only.

Number of role instances to be deployed: 0 to 10

-

JN(JournalNode)

This role can be deployed on the Master node only.

Number of role instances to be deployed: 3 to 60, with the step size of 2

-

DN(DataNode)

This role can be deployed on all nodes.

Number of role instances to be deployed: 3 to 10,000

A non-Master node group with this role assigned is considered as a Core node.

RM(ResourceManager)

This role can be deployed on the Master node only.

Number of role instances to be deployed: 2

-

NM(NodeManager)

This role can be deployed on all nodes.

Number of role instances to be deployed: 3 to 10,000

-

JHS(JobHistoryServer)

This role can be deployed on the Master node only.

Number of role instances to be deployed: 1 to 2

-

TLS(TimelineServer)

This role can be deployed on the Master node only.

Number of role instances to be deployed: 0 to 1

-

Presto

Depends on Hive.

PCD(Coordinator)

This role can be deployed on the Master node only.

Number of role instances to be deployed: 2

-

PWK(Worker)

This role can be deployed on all nodes.

Number of role instances to be deployed: 1 to 10,000

-

Spark2x

  • Depends on Hadoop.

  • Depends on Hive.

  • Depends on ZooKeeper.

JS2X(JDBCServer2x)

This role can be deployed on the Master node only.

Number of role instances to be deployed: 2 to 10

-

JH2X(JobHistory2x)

This role can be deployed on the Master node only.

Number of role instances to be deployed: 2

-

SR2X(SparkResource2x)

This role can be deployed on the Master node only.

Number of role instances to be deployed: 2 to 50

-

IS2X(IndexServer2x)

(Optional) This role can be deployed on the Master node only.

Number of role instances to be deployed: 0 to 2, with the step size of 2

-

HBase

Depends on Hadoop.

HM(HMaster)

This role can be deployed on the Master node only.

Number of role instances to be deployed: 2

-

TS(ThriftServer)

This role can be deployed on all nodes.

Number of role instances to be deployed: 0 to 10,000

-

RT(RESTServer)

This role can be deployed on all nodes.

Number of role instances to be deployed: 0 to 10,000

-

RS(RegionServer)

This role can be deployed on all nodes.

Number of role instances to be deployed: 3 to 10,000

-

TS1(Thrift1Server)

This role can be deployed on all nodes.

Number of role instances to be deployed: 0 to 10,000

If the Hue service is installed in a cluster and HBase needs to be used on the Hue web UI, install this instance for the HBase service.

Hive

  • Depends on Hadoop.

  • Depends on DBService.

MS(MetaStore)

This role can be deployed on the Master node only.

Number of role instances to be deployed: 2 to 10

-

WH (WebHCat)

This role can be deployed on the Master node only.

Number of role instances to be deployed: 1 to 10

-

HS(HiveServer)

This role can be deployed on the Master node only.

Number of role instances to be deployed: 2 to 80

-

Hue

Depends on DBService.

H(Hue)

This role can be deployed on the Master node only.

Number of role instances to be deployed: 2

-

Sqoop

Depends on Hadoop.

SC(SqoopClient)

This role can be deployed on all nodes.

Number of role instances to be deployed: 1 to 10,000

-

Kafka

Depends on ZooKeeper.

B(Broker)

This role can be deployed on all nodes.

Number of role instances to be deployed: 3 to 10,000

-

Flume

-

MS(MonitorServer)

This role can be deployed on the Master node only.

Number of role instances to be deployed: 1 to 2

-

F(Flume)

This role can be deployed on all nodes.

Number of role instances to be deployed: 1 to 10,000

A non-Master node group with this role assigned is considered as a Core node.

Tez

  • Depends on Hadoop.

  • Depends on DBService.

  • Depends on ZooKeeper.

TUI(TezUI)

This role can be deployed on the Master node only.

Number of role instances to be deployed: 1 to 2

-

Flink

  • Depends on ZooKeeper.

  • Depends on Hadoop.

FR(FlinkResource)

This role can be deployed on all nodes.

Number of role instances to be deployed: 1 to 10,000

-

FS(FlinkServer)

This role can be deployed on all nodes.

Number of role instances to be deployed: 0 to 2

-

Oozie

  • Depends on Hadoop.

  • Depends on DBService.

  • Depends on ZooKeeper.

O(oozie)

This role can be deployed on the Master node only.

Number of role instances to be deployed: 2

-

Impala

  • Depends on Hadoop.

  • Depends on Hive.

  • Depends on DBService.

  • Depends on ZooKeeper.

StateStore

This role can be deployed on the Master node only.

Number of role instances to be deployed: 1

-

Catalog

This role can be deployed on the Master node only.

Number of role instances to be deployed: 1

-

Impalad

This role can be deployed on all nodes.

Number of role instances to be deployed: 1 to 10,000

-

Kudu

-

KuduMaster

This role can be deployed on the Master node only.

Number of role instances to be deployed: 3 or 5

-

KuduTserver

This role can be deployed on all nodes.

Number of role instances to be deployed: 3 to 10,000

-

Ranger

Depends on DBService.

RA(RangerAdmin)

This role can be deployed on the Master node only.

Number of role instances to be deployed: 1 to 2

-

USC(UserSync)

This role can be deployed on the Master node only.

Number of role instances to be deployed: 1

-

TSC (TagSync)

This role can be deployed on all nodes.

Number of role instances to be deployed: 0 to 1

-

IoTDB

(applicable only to MRS 3.2.0)

Depends on KerbServer.

IS (IoTDBServer)

This role can be deployed in all node groups.

Number of role instances to be deployed: 3 to 256

It is recommended that IoTDBServer be deployed independently and not co-deployed with other data nodes.

CN(ConfigNode)

This role can be deployed in the Master node group only.

Number of role instances to be deployed: 3 to 9, with the step size of 2

-

JobGateway

  • Depends on Hadoop.

  • Depends on DBService.

JS(JobServer)

This role can be deployed in the Master node group only.

Number of role instances to be deployed: 2 to 10

-

JB(JobBalancer)

This role can be deployed in all node groups.

Number of role instances to be deployed: 2

-

Guardian

  • Depends on Hadoop.

  • Depends on ZooKeeper.

TS(TokenServer)

This role can be deployed in all node groups.

Number of role instances to be deployed: 2 to 100

The Guardian component needs to be installed only when OBS is connected.

Doris

Depends on LdapServer.

FE

Number of role instances to be deployed: 1 to 199, with the step size of 2

The FE native port conflicts with the Yarn ResourceManager native port (8030). Do not deploy FE and Yarn ResourceManager on the same node.

BE

This role can be deployed in all node groups.

Number of role instances to be deployed: 3 to 200

-

DBroker

This role can be deployed in all node groups.

Number of role instances to be deployed: a maximum of 200

-

DBalancer

Optional. This role can be deployed in all node groups.

Number of role instances to be deployed: 2 to 9

-
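Several roles in Table 3 constrain the instance count with both a range and a step size (for example, quorumpeer: 3 to 9 with a step of 2, so 3, 5, 7, or 9 instances). A generic check for such a rule is straightforward; this is a sketch, not part of MRS:

```python
def valid_instance_count(count, minimum, maximum, step=1):
    """Check a role instance count against a Table 3 rule of the form
    'minimum to maximum, with the step size of step': valid counts are
    minimum, minimum + step, minimum + 2*step, ... up to maximum."""
    return minimum <= count <= maximum and (count - minimum) % step == 0

# Examples drawn from Table 3:
#   ZooKeeper quorumpeer: 3 to 9, step 2  -> 3, 5, 7, 9
#   Hadoop JournalNode:   3 to 60, step 2 -> 3, 5, ..., 59
#   Hive MetaStore:       2 to 10 (step 1 by default)
```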