Creating a Resource Pool
This section describes how to create a dedicated resource pool.
Creating a Dedicated Resource Pool
Log in to the ModelArts management console. In the navigation pane, choose Dedicated Resource Pools > Elastic Cluster.
Note
The old version of dedicated resource pools is available only to users who have already used it. For these users, the new version is displayed as Dedicated Resource Pools (New).
On the Resource Pools tab, click Create and configure the parameters.
Parameter
Sub-Parameter
Description
Name
N/A
Name of a dedicated resource pool.
Only lowercase letters, digits, and hyphens (-) are allowed. The value must start with a lowercase letter and cannot end with a hyphen (-).
Description
N/A
Brief description of a dedicated resource pool.
Billing Mode
N/A
You can select Pay-per-use.
Resource Pool Type
N/A
Only Physical is available.
Job Type
N/A
Select job types supported by the resource pool based on service requirements. DevEnviron, Training Job, and Inference Service are supported.
Network
N/A
Network in which the target service instance is deployed. The instance can exchange data with other cloud service resources in the same network.
Select a network from the drop-down list. If no network is available, click Create on the right to create one. For details, see Creating a Network.
Specification Configuration
N/A
You can add multiple specifications. Restrictions:
Each flavor must be unique.
The CPU architectures of multiple flavors must be the same. For example, all are x86 or Arm.
You are advised to select only one GPU or NPU flavor for distributed training. If multiple GPU or NPU flavors are selected, the distributed training speed will be affected because the parameter network planes of different flavors are not interconnected.
A maximum of 10 flavors can be added to a resource pool.
Specifications
Select the required specifications. Because of system overhead, the resources actually available are fewer than those stated in the specifications. After a dedicated resource pool is created, you can view the actual available resources on the Nodes tab of the resource pool details page.
AZ
You can select Automatically allocated or Specifies AZ. An AZ is a physical region where resources use independent power supplies and networks. AZs are physically isolated but interconnected over an intranet.
Automatically allocated: AZs are automatically allocated.
Specifies AZ: Specify AZs for resource pool nodes. To ensure system disaster recovery, deploy all nodes in the same AZ. You can set the number of nodes in an AZ.
Nodes
Select the number of nodes in a dedicated resource pool. More nodes mean higher computing performance.
If AZ is set to Specifies AZ, you do not need to configure Nodes.
Note
It is a good practice to create no more than 30 nodes at a time. Otherwise, the creation may fail due to traffic limiting.
Custom Driver
N/A
This parameter is available only when a GPU flavor is selected. Enable this function and select a GPU driver.
GPU Driver
N/A
This parameter is available only when custom driver is enabled. Select a GPU accelerator driver.
Advanced Options
N/A
Enable Configure now to set the cluster specifications and node distribution.
Cluster Specifications
N/A
Cluster Scale: maximum number of nodes that the cluster can manage. After the cluster is created, it can be scaled out but cannot be scaled in.
You can select Default or Custom.
Master Distribution
N/A
Distribution locations of controller nodes. You can select Random or Custom.
Random: Use the AZs randomly allocated by the system.
Custom: Select AZs for controller nodes.
Distribute controller nodes in different AZs for disaster recovery.
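The naming rule and the flavor restrictions in the table above can be checked before the form is submitted. The following is a minimal sketch, not part of ModelArts: FlavorSpec, is_valid_pool_name, and check_specs are hypothetical names, the flavor identifiers are placeholders, and applying the 30-node advice to the total across all flavors is an assumption. The table states no length limit for the name, so none is enforced here.

```python
import re
from dataclasses import dataclass

# Name rule from the table: lowercase letters, digits, and hyphens only;
# starts with a lowercase letter; does not end with a hyphen.
NAME_PATTERN = re.compile(r"[a-z](?:[a-z0-9-]*[a-z0-9])?")

MAX_FLAVORS = 10               # maximum number of flavors per resource pool
ADVISED_NODES_PER_REQUEST = 30  # advised upper bound for nodes created at a time


@dataclass(frozen=True)
class FlavorSpec:
    """Hypothetical record for one entry of Specification Configuration."""
    flavor: str  # placeholder flavor identifier, not a real flavor name
    arch: str    # CPU architecture, for example "x86" or "arm"
    nodes: int   # number of nodes requested for this flavor


def is_valid_pool_name(name: str) -> bool:
    """Return True if the name satisfies the rule stated for the Name parameter."""
    return NAME_PATTERN.fullmatch(name) is not None


def check_specs(specs: list[FlavorSpec]) -> list[str]:
    """Return the violated restrictions; an empty list means none were found."""
    problems = []
    if len(specs) > MAX_FLAVORS:
        problems.append(f"at most {MAX_FLAVORS} flavors can be added")
    flavors = [s.flavor for s in specs]
    if len(set(flavors)) != len(flavors):
        problems.append("each flavor must be unique")
    if len({s.arch for s in specs}) > 1:
        problems.append("all flavors must use the same CPU architecture")
    # Assumption: the advice is applied to the total node count of one request.
    if sum(s.nodes for s in specs) > ADVISED_NODES_PER_REQUEST:
        problems.append(
            f"more than {ADVISED_NODES_PER_REQUEST} nodes at a time; "
            "creation may fail due to traffic limiting"
        )
    return problems
```

For example, is_valid_pool_name("pool-a1") is accepted, while "Pool", "1pool", and "pool-" are rejected. The node-count entry reflects a recommendation rather than a hard limit, so a caller may choose to treat it as a warning.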
Click Next. Confirm the information and click Submit.
While a resource pool is being created, an icon is displayed next to it. Click the icon to view details. Failures to create, modify, or run a resource pool are recorded in Failure Records.
After a resource pool is created, its status changes to Running. Tasks can be delivered to the resource pool only when the number of available nodes is greater than 0.
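The readiness condition can be expressed as a small check that a script runs before it submits work. This is a sketch under stated assumptions: PoolState, can_accept_tasks, and wait_until_ready are hypothetical names, the state is supplied by a caller-provided function rather than by any real ModelArts API, and requiring the Running status in addition to a nonzero node count is an assumption.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class PoolState:
    """Hypothetical snapshot of what the console shows for a resource pool."""
    status: str           # "Running" is the only status named in this section
    available_nodes: int  # number of nodes currently available


def can_accept_tasks(pool: PoolState) -> bool:
    """Tasks can be delivered only when available nodes exceed 0.

    Assumption: the pool must also be in the Running status.
    """
    return pool.status == "Running" and pool.available_nodes > 0


def wait_until_ready(
    fetch_state: Callable[[], PoolState],
    attempts: int = 10,
    interval_s: float = 30.0,
) -> bool:
    """Poll fetch_state up to `attempts` times; return True once the pool is ready."""
    for attempt in range(attempts):
        if can_accept_tasks(fetch_state()):
            return True
        if attempt < attempts - 1:
            time.sleep(interval_s)  # wait before the next poll
    return False
```

The fetch_state callable is injected so that the polling logic stays independent of how the state is obtained, for example by reading it from the console or from a management API.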