
Creating a Cluster and Running a Job

Function

This API is used to create an MRS cluster and submit a job in the cluster. This API is incompatible with Sahara.

Users can create a maximum of 10 clusters at a time.

Before using the API, you need to obtain the resources listed in Table 1.

Table 1 Obtaining resources

• VPC: See the operation guide in Virtual Private Cloud > Querying VPCs and VPC > Creating a VPC in the API Reference of the virtual private cloud.
• Subnet: See the operation guide in Subnet > Querying Subnets and Subnet > Creating a Subnet in the API Reference of the virtual private cloud.
• Key pair: See the operation guide in ECS SSH Key Management > Querying SSH Key Pairs (Native OpenStack API) and ECS SSH Key Management > Creating and Importing an SSH Key Pair (Native OpenStack API) in the API Reference of the elastic cloud server.
• Region: For information about regions and available zones, see Regions and Endpoints.
• Version information: Currently, MRS 1.3.0, MRS 1.5.0, MRS 1.6.0, MRS 1.6.3, and MRS 1.7.2 are supported. The latest version of MRS is used by default; currently, the latest version is MRS 1.7.2.
• Component information:
  • Information about components supported by MRS 1.5.0, MRS 1.6.0, MRS 1.6.3, and MRS 1.7.2
    Components of an analysis cluster:
    • Hadoop
    • Spark
    • HBase
    • Hive
    • Hue
    • Loader
    Components of a streaming cluster:
    • Kafka
    • Storm
    • Flume
  • Information about components supported by MRS 1.3.0
    Components of an analysis cluster:
    • Hadoop
    • Spark
    • HBase
    • Hive
    • Hue
    NOTE:

    The request has the Hue component only when safe_mode is set to 1.

    Components of a streaming cluster:
    • Kafka
    • Storm

URI

  • Format:

    POST /v1.1/{project_id}/run-job-flow

  • Parameter description
    Table 2 URI parameter description

    Parameter

    Mandatory or Not

    Description

    project_id

    Yes

    Project ID. For details on how to obtain the project ID, see Obtaining a Project ID.
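    For illustration, a minimal Python sketch of assembling and calling this URI follows. The endpoint host, project ID, and token below are placeholders, not real values (obtain the endpoint from Regions and Endpoints and the token from the IAM token API), and X-Auth-Token is assumed here as the usual token-based authentication header.

    # Sketch only: endpoint, project ID, and token are placeholders.
    import requests

    ENDPOINT = "https://mrs.example.com"             # hypothetical MRS endpoint
    PROJECT_ID = "0123456789abcdef0123456789abcdef"  # see Obtaining a Project ID
    TOKEN = "..."                                    # IAM user token

    url = f"{ENDPOINT}/v1.1/{PROJECT_ID}/run-job-flow"
    headers = {
        "Content-Type": "application/json",
        "X-Auth-Token": TOKEN,                       # assumed auth header
    }
    # body: a JSON request as shown in the Request section below
    # response = requests.post(url, headers=headers, json=body)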

Request

  • Example:
    Enabling the cluster HA
    {
        "billing_type": 12, 
        "data_center": "eu-de", 
        "master_node_num": 2, 
        "master_node_size": "s1.8xlarge.linux.mrs", 
        "core_node_num": 3, 
        "core_node_size": "c2.2xlarge.linux.mrs", 
        "available_zone_id": "bf84aba586ce4e948da0b97d9a7d62fb", 
        "cluster_name": "newcluster", 
        "vpc": "vpc1", 
        "vpc_id": "5b7db34d-3534-4a6e-ac94-023cd36aaf74", 
        "subnet_id": "815bece0-fd22-4b65-8a6e-15788c99ee43", 
        "subnet_name": "subnet", 
        "security_groups_id": "845bece1-fd22-4b45-7a6e-14338c99ee43",
        "tags": [
           {
              "key": "key1",
              "value":"value1"
           },
           {
              "key": "key2",
              "value": "value2"
           }
        ],
        "cluster_version": "MRS 1.7.2",
        "cluster_type": 0,
        "master_data_volume_type": "SATA",
        "master_data_volume_size": 100,
        "master_data_volume_count": 1,
        "core_data_volume_type": "SATA",
        "core_data_volume_size": 100,
        "core_data_volume_count": 2,
    
        "node_public_cert_name": "SSHkey-bba1", 
        "safe_mode": 0, 
        "log_collection": 1,
        "task_node_groups": [
            {
              
              "node_num": 2,
              "node_size": "s1.xlarge.linux.mrs",
              "data_volume_type": "SATA",
              "data_volume_count": 1,
              "data_volume_size": 700,
              "auto_scaling_policy": 
               {
                 "auto_scaling_enable": true,
                 "min_capacity": 1,
                 "max_capacity": "3",
                 "rules": [
                  {
                   "name": "default-expand-1",
                   "adjustment_type": "scale_out",
                   "cool_down_minutes": 5,
                   "scaling_adjustment": 1,
                   "trigger": {
                     "metric_name": "YARNMemoryAvailablePercentage",
                     "metric_value": "25",
                     "comparison_operator": "LT",
                     "evaluation_periods": 10
                    }
                 },
                 {
                   "name": "default-shrink-1",
                   "adjustment_type": "scale_in",
                   "cool_down_minutes": 5,
                   "scaling_adjustment": 1,
                   "trigger": {
                     "metric_name": "YARNMemoryAvailablePercentage",
                     "metric_value": "70",
                     "comparison_operator": "GT",
                     "evaluation_periods": 10
                   }
                 }
               ]
              }
            }
        ],
        "component_list": [
            {
               "component_name": "Hadoop" 
             }, 
            {
               "component_name": "Spark" 
             }, 
            {
               "component_name": "HBase" 
             }, 
            {
                "component_name": "Hive"
             }
        ], 
        "add_jobs": [
            {
                "job_type": 1, 
                "job_name": "tenji111", 
                "jar_path": "s3a://bigdata/program/hadoop-mapreduce-examples-2.7.2.jar", 
                "arguments": "wordcount", 
                "input": "s3a://bigdata/input/wd_1k/", 
                "output": "s3a://bigdata/ouput/", 
                "job_log": "s3a://bigdata/log/", 
                "shutdown_cluster": true, 
                "file_action": "", 
                "submit_job_once_cluster_run": true, 
                "hql": "", 
                "hive_script_path": ""
            }
        ],
    "bootstrap_scripts": [
             {
                 "name":"Modify os config",
                 "uri":"s3a://XXX/modify_os_config.sh",
                 "parameters":"param1 param2",
                 "nodes":[
                     "master",
                     "core",
                     "task"
                 ],
                 "active_master":"false",
                 "before_component_start":"true",
                 "fail_action":"continue"
             },
             {
                 "name":"Install zepplin",
                 "uri":"s3a://XXX/zeppelin_install.sh",
                 "parameters":"",
                 "nodes":[
                 "master"
                 ],
                 "active_master":"true",
                 "before_component_start":"false",
                 "fail_action":"continue"
             }
        ]
    }

    Disabling the cluster HA when creating the smallest cluster

    {
        "billing_type": 12, 
        "data_center": "eu-de", 
        "master_node_num": 1, 
        "master_node_size": "s1.8xlarge.linux.mrs", 
        "core_node_num": 1, 
        "core_node_size": "c2.2xlarge.linux.mrs", 
        "available_zone_id": "bf84aba586ce4e948da0b97d9a7d62fb", 
        "cluster_name": "newcluster", 
        "vpc": "vpc1", 
        "vpc_id": "5b7db34d-3534-4a6e-ac94-023cd36aaf74", 
        "subnet_id": "815bece0-fd22-4b65-8a6e-15788c99ee43", 
        "subnet_name": "subnet", 
        "security_groups_id": "845bece1-fd22-4b45-7a6e-14338c99ee43",
        "tags": [
           {
              "key": "key1",
              "value":"value1"
           },
           {
              "key": "key2",
              "value": "value2"
           }
        ],
        "cluster_version": "MRS 1.7.2",
        "cluster_type": 0,
        "master_data_volume_type": "SATA",
        "master_data_volume_size": 100,
        "master_data_volume_count": 1,
        "core_data_volume_type": "SATA",
        "core_data_volume_size": 100,
        "core_data_volume_count": 1,
        "node_public_cert_name": "SSHkey-bba1", 
        "safe_mode": 0, 
        "log_collection": 1,
        "component_list": [
            {
               "component_name": "Hadoop" 
             }, 
            {
               "component_name": "Spark" 
             }, 
            {
               "component_name": "HBase" 
             }, 
            {
                "component_name": "Hive"
             }
        ], 
        "add_jobs": [
            {
                "job_type": 1, 
                "job_name": "tenji111", 
                "jar_path": "s3a://bigdata/program/hadoop-mapreduce-examples-2.7.2.jar", 
                "arguments": "wordcount", 
                "input": "s3a://bigdata/input/wd_1k/", 
                "output": "s3a://bigdata/ouput/", 
                "job_log": "s3a://bigdata/log/", 
                "shutdown_cluster": true, 
                "file_action": "", 
                "submit_job_once_cluster_run": true, 
                "hql": "", 
                "hive_script_path": ""
            }
        ],
    "bootstrap_scripts": [
             {
                 "name":"Install zepplin",
                 "uri":"s3a://XXX/zeppelin_install.sh",
                 "parameters":"",
                 "nodes":[
                 "master"
                 ],
                 "active_master":"false",
                 "before_component_start":"false",
                 "fail_action":"continue"
             }
        ]
    }
  • Parameter description
    Table 3 Request parameter description

    Parameter

    Mandatory or Not

    Type

    Description

    billing_type

    Yes

    Integer

    The value is 12, indicating on-demand payment.

    data_center

    Yes

    String

    Cluster region information. Obtain the value from Regions and Endpoints.

    master_node_num

    Yes

    Integer

    Number of Master nodes

    Set this parameter to 2 to enable cluster HA or to 1 to disable it, as in the two request examples above.

    master_node_size

    Yes

    String

    Instance specification of a Master node. MRS determines host specifications by CPU, memory, and disk space; the supported specifications below are best matches drawn from several years of commissioning experience.

    • Master nodes support h1.2xlarge.4, h1.4xlarge.4, h1.8xlarge.4, c2.4xlarge, s1.4xlarge, s1.8xlarge, c3.2xlarge.2, c3.xlarge.4, c3.2xlarge.4, c3.4xlarge.2, c3.4xlarge.4, c3.8xlarge.4, and c3.15xlarge.4.
    • Core nodes of a streaming cluster support s1.xlarge, c2.2xlarge, c2.4xlarge, s1.4xlarge, s1.8xlarge, d1.8xlarge, h1.2xlarge.4, h1.4xlarge.4, h1.8xlarge.4, c3.2xlarge.2, c3.xlarge.4, c3.2xlarge.4, c3.4xlarge.2, c3.4xlarge.4, c3.8xlarge.4, and c3.15xlarge.4.
    • Core nodes of an analysis cluster support c2.2xlarge, c2.4xlarge, s1.xlarge, s1.4xlarge, s1.8xlarge, d1.xlarge, d1.2xlarge, d1.4xlarge, d1.8xlarge, h1.2xlarge.4, h1.4xlarge.4, h1.8xlarge.4, c3.2xlarge.2, c3.xlarge.4, c3.2xlarge.4, c3.4xlarge.2, c3.4xlarge.4, c3.8xlarge.4, c3.15xlarge.4, d2.xlarge.8, d2.2xlarge.8, d2.4xlarge.8, and d2.8xlarge.8.
    • Task nodes support c2.2xlarge, c2.4xlarge, s1.xlarge, s1.4xlarge, s1.8xlarge, h1.2xlarge.4, h1.4xlarge.4, h1.8xlarge.4, c3.2xlarge.2, c3.xlarge.4, c3.2xlarge.4, c3.4xlarge.2, c3.4xlarge.4, c3.8xlarge.4, and c3.15xlarge.4.

    The following provides specification details.

    • General Computing S1 > 4 vCPUs 16 GB | s1.xlarge
      • CPU: 4-core
      • Memory: 16 GB
      • System Disk: 40 GB
    • Disk-intensive D1 > 4 vCPUs 32 GB | d1.xlarge
      • CPU: 4-core
      • Memory: 32 GB
      • System Disk: 40 GB
      • Data Disk: 1.8 TB x 3 HDDs
    • General Computing C2 > 8 vCPUs 16 GB | c2.2xlarge
      • CPU: 8-core
      • Memory: 16 GB
      • System Disk: 40 GB
    • Disk-intensive D1 > 8 vCPUs 64 GB | d1.2xlarge
      • CPU: 8-core
      • Memory: 64 GB
      • System Disk: 40 GB
      • Data Disk: 1.8 TB x 6 HDDs
    • General Computing C2 > 16 vCPUs 32 GB | c2.4xlarge
      • CPU: 16-core
      • Memory: 32 GB
      • System Disk: 40 GB
    • General Computing S1 > 16 vCPUs 64 GB | s1.4xlarge
      • CPU: 16-core
      • Memory: 64 GB
      • System Disk: 40 GB
    • Disk-intensive D1 > 16 vCPUs 128 GB | d1.4xlarge
      • CPU: 16-core
      • Memory: 128 GB
      • System Disk: 40 GB
      • Data Disk: 1.8 TB x 12 HDDs
    • General Computing S1 > 32 vCPUs 128 GB | s1.8xlarge
      • CPU: 32-core
      • Memory: 128 GB
      • System Disk: 40 GB
    • General computing-plus kvm C3 > 4 vCPUs 16 GB | c3.xlarge.4
      • CPU: 4-core
      • Memory: 16 GB
      • System Disk: 40 GB
    • General computing-plus kvm C3 > 8 vCPUs 32 GB | c3.2xlarge.4
      • CPU: 8-core
      • Memory: 32 GB
      • System Disk: 40 GB
    • General computing-plus kvm C3 > 16 vCPUs 32 GB | c3.4xlarge.2
      • CPU: 16-core
      • Memory: 32 GB
      • System Disk: 40 GB
    • General computing-plus kvm C3 > 16 vCPUs 64 GB | c3.4xlarge.4
      • CPU: 16-core
      • Memory: 64 GB
      • System Disk: 40 GB
    • General computing-plus kvm C3 > 32 vCPUs 128 GB | c3.8xlarge.4
      • CPU: 32-core
      • Memory: 128 GB
      • System Disk: 40 GB
    • General computing-plus kvm C3 > 60 vCPUs 256 GB | c3.15xlarge.4
      • CPU: 60-core
      • Memory: 256 GB
      • System Disk: 40 GB
    • General computing-plus kvm C3 > 8 vCPUs 16 GB | c3.2xlarge.2
      • CPU: 8-core
      • Memory: 16 GB
      • System Disk: 40 GB
    • Disk-intensive D1 > 36 vCPUs 256 GB | d1.8xlarge
      • CPU: 36-core
      • Memory: 256 GB
      • System Disk: 40 GB
      • Data Disk: 1.8 TB x 24 HDDs
    • High-Performance Computing H1 > 8 vCPUs 32 GB | h1.2xlarge
      • CPU: 8-core
      • Memory: 32 GB
      • System Disk: 40 GB
    • High-Performance Computing H1 > 16 vCPUs 64 GB | h1.4xlarge
      • CPU: 16-core
      • Memory: 64 GB
      • System Disk: 40 GB
    • High-Performance Computing H1 > 32 vCPUs 128 GB | h1.8xlarge
      • CPU: 32-core
      • Memory: 128 GB
      • System Disk: 40 GB
    • Disk-intensive kvm D2 > 4 vCPUs 32 GB | d2.xlarge.8
      • CPU: 4-core
      • Memory: 32 GB
      • System Disk: 40 GB
      • Data Disk: 1.8 TB x 2 HDDs
    • Disk-intensive kvm D2 > 8 vCPUs 64 GB | d2.2xlarge.8
      • CPU: 8-core
      • Memory: 64 GB
      • System Disk: 40 GB
      • Data Disk: 1.8 TB x 4 HDDs
    • Disk-intensive kvm D2 > 16 vCPUs 128 GB | d2.4xlarge.8
      • CPU: 16-core
      • Memory: 128 GB
      • System Disk: 40 GB
      • Data Disk: 1.8 TB x 8 HDDs
    • Disk-intensive kvm D2 > 32 vCPUs 256 GB | d2.8xlarge.8
      • CPU: 32-core
      • Memory: 256 GB
      • System Disk: 40 GB
      • Data Disk: 1.8 TB x 16 HDDs

    core_node_num

    Yes

    Integer

    Number of Core nodes

    Value range: 1 to 500

    A maximum of 500 Core nodes are supported by default. If more than 500 Core nodes are required, contact technical support engineers or invoke background APIs to modify the database.

    core_node_size

    Yes

    String

    Instance specification of a Core node

    The configuration method of this parameter is identical to that of master_node_size.

    available_zone_id

    Yes

    String

    ID of an available zone. Obtain the value from Regions and Endpoints.

    AZ1 (eu-de-01): bf84aba586ce4e948da0b97d9a7d62fb

    AZ2 (eu-de-02): bf84aba586ce4e948da0b97d9a7d62fc

    cluster_name

    Yes

    String

    Cluster name, which is globally unique and contains only 1 to 64 letters, digits, hyphens (-), and underscores (_).

    vpc

    Yes

    String

    Name of the VPC where the subnet is located

    Obtain the VPC name from the management console as follows:

    1. Register an account and log in to the management console.
    2. Click Virtual Private Cloud and select Virtual Private Cloud from the left list.

    On the Virtual Private Cloud page, obtain the VPC name from the list.

    vpc_id

    Yes

    String

    ID of the VPC where the subnet is located

    Obtain the VPC ID from the management console as follows:

    1. Register an account and log in to the management console.
    2. Click Virtual Private Cloud and select Virtual Private Cloud from the left list.

    On the Virtual Private Cloud page, obtain the VPC ID from the list.

    subnet_id

    Yes

    String

    Subnet ID

    Obtain the subnet ID from the management console as follows:

    1. Register an account and log in to the management console.
    2. Click Virtual Private Cloud and select Virtual Private Cloud from the left list.

    On the Virtual Private Cloud page, obtain the subnet ID from the list.

    subnet_name

    Yes

    String

    Name of the subnet

    Obtain the subnet name from the management console as follows:

    1. Register an account and log in to the management console.
    2. Click Virtual Private Cloud and select Virtual Private Cloud from the left list.

    On the Virtual Private Cloud page, obtain the subnet name from the list.

    security_groups_id

    No

    String

    Security group ID of the cluster

    • If this parameter is left blank, MRS automatically creates a security group whose name starts with mrs_{cluster_name}.
    • If this parameter is specified, the cluster is created with that security group. The ID passed must be a security group ID owned by the current tenant.

    tags

    No

    Array

    Cluster tag information

    • A cluster allows a maximum of 10 tags. A tag name (key) must be unique in a cluster.
    • A tag key or value cannot contain the following special characters: =*<>\,|/
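    To make the tag rules concrete, a minimal client-side check might look like the sketch below; the helper name is illustrative, not part of the API.

    # Sketch of the tag rules above; validate_tags is a hypothetical helper.
    def validate_tags(tags: list) -> bool:
        forbidden = set("=*<>\\,|/")            # disallowed characters
        if len(tags) > 10:                      # at most 10 tags per cluster
            return False
        keys = [t["key"] for t in tags]
        if len(keys) != len(set(keys)):         # tag keys must be unique
            return False
        # Neither keys nor values may contain forbidden characters.
        return all(not (forbidden & set(t["key"] + t["value"])) for t in tags)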

    cluster_version

    No

    String

    Version of the clusters

    Currently, MRS 1.3.0, MRS 1.5.0, MRS 1.6.0, MRS 1.6.3, and MRS 1.7.2 are supported. The latest version of MRS is used by default; currently, the latest version is MRS 1.7.2.

    cluster_type

    No

    Integer

    Type of clusters

    • 0: analysis cluster
    • 1: streaming cluster

    The default value is 0.

    master_data_volume_type

    No

    String

    Data disk storage type of the Master node, supporting SATA, SAS, and SSD currently

    (This parameter is a multi-disk parameter available for MRS 1.6.0 or later versions.)

    NOTE:

    Possible configurations are: volume_type only, both master_data_volume_type and core_data_volume_type, or all three parameters.

    master_data_volume_size

    No

    Integer

    Data disk size of the Master node. Disks can be purchased at the same time when a cluster is created to enlarge storage capacity.

    Value range: 100 GB to 32000 GB

    (This parameter is a multi-disk parameter available for MRS 1.6.0 or later versions.)

    master_data_volume_count

    No

    Integer

    Number of data disks of the Master node

    The value can be set to 1 only.

    (This parameter is a multi-disk parameter available for MRS 1.6.0 or later versions.)

    core_data_volume_type

    No

    String

    Data disk storage type of the Core node, supporting SATA, SAS, and SSD currently

    (This parameter is a multi-disk parameter available for MRS 1.6.0 or later versions.)

    NOTE:

    Possible configurations are: volume_type only, both master_data_volume_type and core_data_volume_type, or all three parameters.

    core_data_volume_size

    No

    Integer

    Data disk size of the Core node. Disks can be purchased at the same time when a cluster is created to enlarge storage capacity.

    Value range: 100 GB to 32000 GB

    (This parameter is a multi-disk parameter available for MRS 1.6.0 or later versions.)

    core_data_volume_count

    No

    Integer

    Number of data disks of the Core node

    Value range: 1 to 10

    (This parameter is a multi-disk parameter available for MRS 1.6.0 or later versions.)

    volume_type

    No

    String

    Type of disks

    SATA, SAS, and SSD are supported.

    • SATA: common I/O
    • SAS: high-speed I/O
    • SSD: Ultrahigh-speed I/O

    MRS 1.6.0 allows creation of multi-disk clusters, so multi-disk parameters are available. For compatibility, the original disk parameters are retained as well. Use disk parameters as follows:

    • For versions earlier than MRS 1.6.0, only volume_type and volume_size can be used as disk parameters.
    • For MRS 1.6.0 or later versions, multi-disk parameters can be used in addition to volume_type and volume_size, and are recommended.
    • If volume_type and volume_size coexist with multi-disk parameters, the system reads volume_type and volume_size first.
      NOTE:

      Possible configurations are: volume_type only, both master_data_volume_type and core_data_volume_type, or all three parameters.

    volume_size

    No

    Integer

    Data disk size of the Core node

    Users can add disks to expand storage capacity when creating a cluster. There are the following scenarios:

    • Separation of data storage and computing: Data is stored in the OBS system. Costs of clusters are relatively low but computing performance is poor. The clusters can be deleted at any time. It is recommended when data computing is not frequently performed.
    • Integration of data storage and computing: Data is stored in the HDFS system. Costs of clusters are relatively high but computing performance is good. The clusters cannot be deleted in a short term. It is recommended when data computing is frequently performed.

    Value range: 100 GB to 32000 GB

    This parameter is not recommended for MRS 1.6.0 or later. For details, see description about the volume_type parameter.

    node_public_cert_name

    Yes

    String

    Name of a key pair

    You can use a key to log in to the Master node in the cluster.

    safe_mode

    No

    Integer

    MRS cluster running mode

    • 0: common mode

      The value indicates that Kerberos authentication is disabled, and users can use all functions provided by the cluster.

    • 1: safe mode

      The value indicates that Kerberos authentication is enabled. Common users cannot use the file management or job management functions of an MRS cluster and cannot view cluster resource usage or the job records of Hadoop and Spark. To use these functions, users must obtain the relevant permissions from the MRS Manager administrator.

      The request has the cluster_admin_secret parameter only when safe_mode is set to 1.

    cluster_admin_secret

    No

    String

    Indicates the password of the MRS Manager administrator.

    The password for MRS 1.6.0 or later version:

    • Must contain 8 to 32 characters.
    • Must contain at least three types of the following:
      • Lowercase letters
      • Uppercase letters
      • Digits
      • Special characters of `~!@#$%^&*()-_=+\|[{}];:'",<.>/?
      • Spaces
    • Must be different from the username.
    • Must be different from the username written in reverse order.

    The password for MRS 1.5.0:

    • Must contain 6 to 32 characters.
    • Must contain at least two types of the following:
      • Lowercase letters
      • Uppercase letters
      • Digits
      • Special characters of `~!@#$%^&*()-_=+\|[{}];:'",<.>/?
      • Spaces
    • Must be different from the username.
    • Must be different from the username written in reverse order.

    The password for MRS 1.3.0:

    • Must contain 8 to 64 characters.
    • Must contain at least four types of the following:
      • Lowercase letters
      • Uppercase letters
      • Digits
      • Special characters of `~!@#$%^&*()-_=+\|[{}];:'",<.>/?
      • Spaces
    • Must be different from the username.
    • Must be different from the username written in reverse order.

    This parameter needs to be configured only when safe_mode is set to 1.
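    To make the MRS 1.6.0 or later password rules concrete, a minimal client-side sketch follows; the function is illustrative only and not part of the API.

    # Sketch of the MRS 1.6.0+ cluster_admin_secret rules; illustrative only.
    def is_valid_admin_secret(password: str, username: str) -> bool:
        special = "`~!@#$%^&*()-_=+\\|[{}];:'\",<.>/?"
        if not 8 <= len(password) <= 32:        # 8 to 32 characters
            return False
        classes = [                             # character classes present
            any(c.islower() for c in password),
            any(c.isupper() for c in password),
            any(c.isdigit() for c in password),
            any(c in special for c in password),
            " " in password,
        ]
        if sum(classes) < 3:                    # at least three classes
            return False
        # Must differ from the username and the username in reverse order.
        return password not in (username, username[::-1])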

    log_collection

    No

    Integer

    Indicates whether logs are collected when cluster installation fails.

    • 0: not collected
    • 1: collected

    The default value is 0.

    If log_collection is set to 1, OBS buckets will be created to collect the MRS logs, and you will be charged for these buckets.

    task_node_groups

    No

    Array

    List of Task nodes

    For the parameter description, see Table 4.

    component_list

    Yes

    Array

    Service component list

    For the parameter description, see Table 5.

    add_jobs

    No

    Array

    You can submit a job when you create a cluster to save time and use MRS easily. Only one job can be added. For details about job parameters, see Table 6.

    bootstrap_scripts

    No

    Array

    Bootstrap action script information. For details, see Table 11.

    MRS 1.7.2 or later supports this parameter.
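    Gathering only the parameters that Table 3 marks as mandatory, a minimal request body looks like the following sketch; every value is a placeholder copied from the examples in this section.

    # Minimal request body (mandatory parameters of Table 3 only);
    # all values are placeholders taken from the examples above.
    minimal_body = {
        "billing_type": 12,                      # on-demand payment
        "data_center": "eu-de",
        "master_node_num": 1,                    # 1 disables HA; 2 enables it
        "master_node_size": "s1.8xlarge.linux.mrs",
        "core_node_num": 1,
        "core_node_size": "c2.2xlarge.linux.mrs",
        "available_zone_id": "bf84aba586ce4e948da0b97d9a7d62fb",  # AZ1 (eu-de-01)
        "cluster_name": "newcluster",
        "vpc": "vpc1",
        "vpc_id": "5b7db34d-3534-4a6e-ac94-023cd36aaf74",
        "subnet_id": "815bece0-fd22-4b65-8a6e-15788c99ee43",
        "subnet_name": "subnet",
        "node_public_cert_name": "SSHkey-bba1",  # key pair name
        "component_list": [{"component_name": "Hadoop"}],
    }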

    Table 4 task_node_groups parameter description

    Parameter

    Mandatory or Not

    Type

    Description

    node_num

    Yes

    Integer

    Number of Task nodes. The value ranges from 0 to 100 and the default value is 0. The total number of Core and Task nodes cannot exceed 500.

    node_size

    Yes

    String

    Instance specification of a Task node

    For details about how to configure the parameter, see the configuration description of master_node_size.

    data_volume_type

    Yes

    String

    Data disk storage type of the Task node, supporting SATA, SAS, and SSD currently

    • SATA: common I/O
    • SAS: High-speed I/O
    • SSD: Ultrahigh-speed I/O

    data_volume_count

    Yes

    Integer

    Number of data disks of the Task node

    Value range: 0 to 10

    data_volume_size

    Yes

    Integer

    Data disk size of the Task node

    Value range: 100 GB to 32000 GB

    auto_scaling_policy

    No

    AutoScalingPolicy

    Auto scaling policy. For details, see Table 7.

    Table 5 component_list parameter description

    Parameter

    Mandatory or Not

    Type

    Description

    component_name

    Yes

    String

    Component name

    Currently, Hadoop, Spark, HBase, Hive, Hue, Loader, Flume, Kafka and Storm are supported.

    Table 6 add_jobs parameter description

    Parameter

    Mandatory or Not

    Type

    Description

    job_type

    Yes

    Integer

    Job type

    • 1: MapReduce
    • 2: Spark
    • 3: Hive Script
    • 4: HiveQL (not supported currently)
    • 5: DistCp, importing and exporting data (not supported in this API currently).
    • 6: Spark Script
    • 7: Spark SQL, submitting Spark SQL statements (not supported in this API currently).
      NOTE:

      Spark and Hive jobs can be added only to clusters that include the Spark and Hive components.

    job_name

    Yes

    String

    Job name

    It contains only 1 to 64 letters, digits, hyphens (-), and underscores (_).

    NOTE:

    Identical job names are allowed but not recommended.

    jar_path

    No

    String

    Path of the .jar file or .sql file for program execution

    The parameter must meet the following requirements:

    • Contains a maximum of 1023 characters, excluding special characters such as ;|&><'$. The address cannot be empty or full of spaces.
    • The path varies depending on the file system:
      • OBS: The path must start with s3a://.
      • HDFS: The path starts with /.
    • Spark Script must end with .sql, while MapReduce and Spark Jar must end with .jar. The extensions sql and jar are case-insensitive.

    arguments

    No

    String

    Key parameter for program execution

    The parameter is specified by the function of the user's program. MRS is only responsible for loading the parameter.

    The parameter contains a maximum of 2047 characters, excluding special characters such as ;|&>'<$, and can be empty.

    input

    No

    String

    Path for inputting data

    The path varies depending on the file system:
    • OBS: The path must start with s3a://.
    • HDFS: The path starts with /.

    The parameter contains a maximum of 1023 characters, excluding special characters such as ;|&>'<$, and can be empty.

    output

    No

    String

    Path for outputting data

    The path varies depending on the file system:
    • OBS: The path must start with s3a://.
    • HDFS: The path starts with /.

    If the path does not exist, the system automatically creates it.

    The parameter contains a maximum of 1023 characters, excluding special characters such as ;|&>'<$, and can be empty.

    job_log

    No

    String

    Path for storing job logs that record job running status.

    The path varies depending on the file system:
    • OBS: The path must start with s3a://.
    • HDFS: The path starts with /.

    The parameter contains a maximum of 1023 characters, excluding special characters such as ;|&>'<$, and can be empty.

    shutdown_cluster

    No

    Bool

    Whether to delete the cluster after the jobs are complete

    • true: Yes
    • false: No

    file_action

    No

    String

    Data import and export

    • import
    • export

    submit_job_once_cluster_run

    Yes

    Bool

    • true: A job is submitted when a cluster is created.
    • false: A job is submitted separately.

    The parameter is set to true in this example.

    hql

    No

    String

    HiveQL statement

    hive_script_path

    Yes

    String

    SQL program path

    This parameter is needed by Spark Script and Hive Script jobs only and must meet the following requirements:

    • Contains a maximum of 1023 characters, excluding special characters such as ;|&><'$. The address cannot be empty or full of spaces.
    • The path varies depending on the file system:
      • OBS: The path must start with s3a://.
      • HDFS: The path starts with /.
    • Ends with .sql. The extension sql is case-insensitive.
    NOTE:

    Files and programs encrypted by KMS are not supported in the OBS path.

    Table 7 auto_scaling_policy parameter description

    Parameter

    Mandatory or Not

    Type

    Description

    auto_scaling_enable

    Yes

    Boolean

    Whether the auto scaling rule is enabled

    min_capacity

    Yes

    Integer

    Minimum number of nodes left in the node group

    Value range: 0 to 500

    max_capacity

    Yes

    Integer

    Maximum number of nodes in the node group

    Value range: 0 to 500

    rules

    Yes

    List

    List of auto scaling rules. For details, see Table 8.

    Table 8 rules parameter description

    Parameter

    Mandatory or Not

    Type

    Description

    name

    Yes

    String

    Name of an auto scaling rule

    It contains only 1 to 64 letters, digits, hyphens (-), and underscores (_).

    Rule names must be unique in a node group.

    description

    No

    String

    Description about an auto scaling rule

    Contains a maximum of 1024 characters.

    adjustment_type

    Yes

    String

    Elastic scaling rule adjustment type. Possible values are:

    • scale_out: cluster expansion
    • scale_in: cluster shrink

    cool_down_minutes

    Yes

    Integer

    Cool-down time after an auto scaling rule is triggered, during which no further auto scaling operation is performed. The unit is minute.

    Value range: 0 to 10080 (10080 minutes equal one week)

    scaling_adjustment

    Yes

    Integer

    Number of nodes that can be adjusted once

    Value range: 0 to 100

    trigger

    Yes

    Trigger

    Rule triggering condition. For details, see Table 9.

    Table 9 trigger parameter description

    Parameter

    Mandatory or Not

    Type

    Description

    metric_name

    Yes

    String

    Metric name.

    The triggering condition is evaluated against the value of the metric with this name.

    Contains a maximum of 64 characters.

    Table 10 lists the supported metric names.

    metric_value

    Yes

    String

    Metric value.

    Metric threshold to trigger a rule. The value can be an integer or a number with at most two decimal places.

    Table 10 provides value types and value ranges of metric_value.

    comparison_operator

    Yes

    String

    Metric judgment logic operator, including:

    • LT: less than
    • GT: greater than
    • LTOE: less than or equal to
    • GTOE: greater than or equal to

    evaluation_periods

    Yes

    Integer

    Number of consecutive five-minute periods, during which a metric threshold is reached

    Value range: 1 to 288
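    As a worked example of Tables 7 to 9, the sketch below rebuilds the scale-out rule from the request example: with evaluation_periods set to 10, YARNMemoryAvailablePercentage must stay below 25 for ten consecutive five-minute periods (50 minutes) before one node is added, after which the rule cools down for 5 minutes.

    # Sketch of an auto_scaling_policy object per Tables 7-9.
    scale_out_rule = {
        "name": "default-expand-1",
        "adjustment_type": "scale_out",       # add nodes
        "cool_down_minutes": 5,               # pause after each scaling action
        "scaling_adjustment": 1,              # adjust one node at a time
        "trigger": {
            "metric_name": "YARNMemoryAvailablePercentage",
            "metric_value": "25",             # threshold, percent sign removed
            "comparison_operator": "LT",      # trigger when the metric is < 25
            "evaluation_periods": 10,         # 10 consecutive 5-minute periods
        },
    }
    auto_scaling_policy = {
        "auto_scaling_enable": True,
        "min_capacity": 1,                    # never shrink below 1 node
        "max_capacity": 3,                    # never expand above 3 nodes
        "rules": [scale_out_rule],            # a matching scale_in rule can be added
    }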

    Table 10 Auto scaling metrics

    Cluster Type

    Metric Name

    Type

    Description

    Streaming cluster

    StormSlotAvailable

    Integer

    Number of available Storm slots

    Value range: 0 to 2147483647

    StormSlotAvailablePercentage

    Percentage

    Percentage of available Storm slots, that is, the proportion of available slots to total slots

    Value range: 0 to 100

    StormSlotUsed

    Integer

    Number of the used Storm slots

    Value range: 0 to 2147483647

    StormSlotUsedPercentage

    Percentage

    Percentage of the used Storm slots, that is, the proportion of the used slots to total slots

    Value range: 0 to 100

    Analysis cluster

    YARNAppPending

    Integer

    Number of pending tasks on YARN

    Value range: 0 to 2147483647

    YARNAppPendingRatio

    Ratio

    Ratio of pending tasks on YARN, that is, the ratio of pending tasks to running tasks on YARN

    Value range: 0 to 2147483647

    YARNAppRunning

    Integer

    Number of running tasks on YARN

    Value range: 0 to 2147483647

    YARNContainerAllocated

    Integer

    Number of containers allocated to YARN

    Value range: 0 to 2147483647

    YARNContainerPending

    Integer

    Number of pending containers on YARN

    Value range: 0 to 2147483647

    YARNContainerPendingRatio

    Ratio

    Ratio of pending containers on YARN, that is, the ratio of pending containers to running containers on YARN

    Value range: 0 to 2147483647

    YARNCPUAllocated

    Integer

    Number of virtual CPUs (vCPUs) allocated to YARN

    Value range: 0 to 2147483647

    YARNCPUAvailable

    Integer

    Number of available vCPUs on YARN

    Value range: 0 to 2147483647

    YARNCPUAvailablePercentage

    Percentage

    Percentage of available vCPUs on YARN, that is, the proportion of available vCPUs to total vCPUs

    Value range: 0 to 100

    YARNCPUPending

    Integer

    Number of pending vCPUs on YARN

    Value range: 0 to 2147483647

    YARNMemoryAllocated

    Integer

    Memory allocated to YARN. The unit is MB.

    Value range: 0 to 2147483647

    YARNMemoryAvailable

    Integer

    Available memory on YARN. The unit is MB.

    Value range: 0 to 2147483647

    YARNMemoryAvailablePercentage

    Percentage

    Percentage of available memory on YARN, that is, the proportion of available memory to total memory on YARN

    Value range: 0 to 100

    YARNMemoryPending

    Integer

    Pending memory on YARN

    Value range: 0 to 2147483647

    NOTE:

    When the value type is percentage or ratio in Table 10, the value can be accurate to two decimal places. A percentage metric value is a decimal with the percent sign (%) removed; for example, 16.80 represents 16.80%.

    Table 11 bootstrap_scripts parameter description

    Parameter

    Mandatory or Not

    Type

    Description

    name

    Yes

    String

    Name of a bootstrap action script. The name must be unique within a cluster.

    The value can contain only digits, letters, spaces, hyphens (-), and underscores (_) and cannot start with a space.

    The value contains 1 to 64 characters.

    uri

    Yes

    String

    Path of the shell script. Set this parameter to an OBS bucket path or a local VM path.

    • OBS bucket path: Enter a script path manually. For example, enter the path of the public sample script provided by MRS.

      Example: s3a://bootstrap/presto/presto-install.sh

      If dualroles is installed, the parameter of the presto-install.sh script is dualroles; if worker is installed, the parameter is worker. Based on typical Presto usage, you are advised to install dualroles on the active Master nodes and worker on the Core nodes.

    • Local VM path: Enter a script path. The script path must start with a slash (/) and end with .sh.

    parameters

    No

    String

    Bootstrap action script parameters

    nodes

    Yes

    Array of String

    Type of a node where the bootstrap action script is executed, including Master, Core, and Task

    active_master

    No

    Boolean

    Whether the bootstrap action script runs only on active Master nodes.

    The default value is false, indicating that the bootstrap action script can run on all Master nodes.

    before_component_start

    No

    Boolean

    Time when the bootstrap action script is executed. The script can be executed either before or after the components are started.

    The default value is false, indicating that the bootstrap action script is executed after the component is started.

    fail_action

    Yes

    String

    Whether to continue executing subsequent scripts and creating the cluster after the bootstrap action script fails to execute.

    • continue: Continue to execute subsequent scripts.
    • errorout: Stop the action.

    The default value is errorout, indicating that the action is stopped.
    NOTE:

    You are advised to set this parameter to continue in the debugging phase so that the cluster can continue to be installed and started no matter whether the bootstrap action is successful.
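    For reference, a bootstrap_scripts entry matching Table 11 can be assembled as in the sketch below; the OBS path is a placeholder, and the string values for active_master and before_component_start mirror the request examples above (Table 11 declares them as Boolean).

    # Sketch of a bootstrap_scripts entry per Table 11; the URI is a placeholder.
    bootstrap_script = {
        "name": "Install zeppelin",                    # unique within the cluster
        "uri": "s3a://my-bucket/zeppelin_install.sh",  # hypothetical OBS path
        "parameters": "",                              # no script arguments
        "nodes": ["master"],                           # run on Master nodes only
        "active_master": "false",                      # run on all Master nodes
        "before_component_start": "false",             # run after components start
        "fail_action": "continue",                     # keep going if the script fails
    }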

Response

  • Example:
    {
        "cluster_id": "da1592c2-bb7e-468d-9ac9-83246e95447a", 
        "result": true,
        "msg": ""
    }
  • Parameter description
    Table 12 Response parameter description

    Parameter

    Mandatory or Not

    Type

    Description

    cluster_id

    Yes

    String

    Cluster ID, which is returned by the system after the cluster is created.

    result

    Yes

    Bool

    Operation result

    • true: operation succeeded
    • false: operation failed

    msg

    No

    String

    System message, which can be empty.

Status Code

Table 13 describes the status code of this API.

Table 13 Status code

• 200: The cluster is successfully created.

For the description about error status codes, see section Status Codes.
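Putting the request and response together, a hedged end-to-end sketch follows; the endpoint, project ID, token, and body are placeholders, as in the earlier sketches in this section.

    # End-to-end sketch; all values are placeholders.
    import requests

    ENDPOINT = "https://mrs.example.com"
    PROJECT_ID = "0123456789abcdef0123456789abcdef"
    TOKEN = "..."
    body = {"billing_type": 12}              # use a full request body as shown above

    response = requests.post(
        f"{ENDPOINT}/v1.1/{PROJECT_ID}/run-job-flow",
        headers={"Content-Type": "application/json", "X-Auth-Token": TOKEN},
        json=body,
    )
    if response.status_code == 200:          # 200: cluster successfully created
        data = response.json()
        if data["result"]:
            print("cluster_id:", data["cluster_id"])
        else:
            print("creation failed:", data.get("msg", ""))
    else:
        print("unexpected status:", response.status_code, response.text)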