Deploying a Model as a Service

Function

This API is used to deploy a model as a service.

URI

POST /v1/{project_id}/services

Table 1 describes the required parameters.

Table 1 Parameters

| Parameter | Mandatory | Type | Description |
| --- | --- | --- | --- |
| project_id | Yes | String | Project ID. For details about how to obtain the project ID, see Obtaining a Project ID. |
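As an illustration, the URI above can be composed in client code. The sketch below is a minimal Python helper with placeholder endpoint and project ID values; the X-Auth-Token header named in the comment is an assumption about the surrounding platform, not a parameter of this API.

```python
def build_service_url(endpoint: str, project_id: str) -> str:
    """Compose the URI for POST /v1/{project_id}/services."""
    return f"{endpoint.rstrip('/')}/v1/{project_id}/services"

# Placeholder values; a real request would also carry an auth header
# such as {"X-Auth-Token": "<token>"} and a JSON request body (Table 2).
url = build_service_url("https://endpoint", "my-project-id")
# url == "https://endpoint/v1/my-project-id/services"
```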

Request Body

Table 2 describes the request parameters.

Table 2 Parameters

| Parameter | Mandatory | Type | Description |
| --- | --- | --- | --- |
| service_name | Yes | String | Service name, 1 to 64 characters. Only letters, digits, hyphens (-), and underscores (_) are allowed. |
| description | No | String | Service description, up to 100 characters. Left blank by default. |
| infer_type | Yes | String | Inference mode. Options: real-time (a real-time service, which can be stopped on a schedule) and batch (a batch service, whose configured tasks run in batches; the service stops automatically when the tasks are complete). |
| workspace_id | No | String | ID of the workspace to which the service belongs. Defaults to 0, the default workspace. |
| vpc_id | No | String | ID of the VPC where a real-time service instance is deployed. Left blank by default, in which case ModelArts allocates a dedicated VPC to each user so that users are isolated from one another. To access other service components in a specific VPC from the service instance, set this parameter to that VPC's ID. Once configured, the VPC cannot be changed. If both vpc_id and cluster_id are configured, only the dedicated resource pool takes effect. |
| subnet_network_id | No | String | Subnet ID. Left blank by default; mandatory when vpc_id is configured. Enter the network ID shown in the subnet details on the VPC console. A subnet provides dedicated network resources that are isolated from other networks. |
| security_group_id | No | String | Security group ID. Left blank by default; mandatory when vpc_id is configured. A security group is a virtual firewall that provides secure network access control policies for service instances. It must contain at least one inbound rule that permits TCP traffic from source address 0.0.0.0/0 on port 8080. |
| cluster_id | No | String | ID of a dedicated resource pool. Left blank by default, meaning no dedicated resource pool is used. Ensure the pool is running properly before deploying services to it. When set, the cluster's network configuration is used and vpc_id does not take effect. If cluster_id is also set in a real-time config entry, that config-level value takes precedence. |
| config | Yes | config array corresponding to infer_type | Model running configuration. If infer_type is batch, only one model can be configured. If infer_type is real-time, multiple models can be configured with traffic weights assigned based on service requirements, but the model versions must differ. If infer_type is real-time, see Table 3; if batch, see Table 4. |
| schedule | No | schedule array | Service scheduling configuration, available only for real-time services. Not set by default, in which case the service keeps running without a scheduled stop. See Table 5. |
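The constraints in Table 2 can be checked on the client before the request is sent. The following Python sketch encodes only the rules stated above; the function name and error messages are illustrative, not part of the API.

```python
import re

# Allowed service_name characters per Table 2: letters, digits, hyphens, underscores.
NAME_RE = re.compile(r"^[A-Za-z0-9_-]{1,64}$")

def validate_body(body: dict) -> list:
    """Return a list of Table 2 constraint violations (empty if the body looks valid)."""
    errors = []
    if not NAME_RE.match(body.get("service_name", "")):
        errors.append("service_name must be 1-64 letters, digits, hyphens, or underscores")
    if body.get("infer_type") not in ("real-time", "batch"):
        errors.append("infer_type must be 'real-time' or 'batch'")
    if len(body.get("description", "")) > 100:
        errors.append("description exceeds 100 characters")
    if body.get("vpc_id") and not (body.get("subnet_network_id") and body.get("security_group_id")):
        errors.append("subnet_network_id and security_group_id are mandatory when vpc_id is set")
    if not body.get("config"):
        errors.append("config is mandatory")
    return errors
```

A body that satisfies Table 2 yields an empty list; each violated rule adds one message.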

Table 3 config parameters of real-time

| Parameter | Mandatory | Type | Description |
| --- | --- | --- | --- |
| model_id | Yes | String | Model ID |
| weight | Yes | Integer | Traffic weight allocated to the model. Mandatory only when infer_type is real-time. The weights across all configs must sum to 100. |
| specification | Yes | String | Resource specifications. Select specifications based on service requirements. |
| custom_spec | No | Object | Custom specifications. Set this parameter when using a dedicated resource pool. See Table 6. |
| instance_count | Yes | Integer | Number of instances deployed for the model. Must be greater than 0. |
| envs | No | Map<String, String> | Environment variable key-value pairs required for running the model. Left blank by default. To keep data secure, do not put sensitive information such as plaintext passwords in environment variables. |
| cluster_id | No | String | ID of a dedicated resource pool. Left blank by default, meaning no dedicated resource pool is used. |
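The weight rule in Table 3 can be verified with a one-line check before deployment; this is an illustrative sketch, and the function name is not part of the API.

```python
def weights_valid(config: list) -> bool:
    """True if the traffic weights of all real-time model configs sum to 100."""
    return sum(entry.get("weight", 0) for entry in config) == 100

# Two model versions splitting traffic 70/30, as in the multi-version sample below:
# weights_valid([{"weight": 70}, {"weight": 30}]) -> True
```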

Table 4 config parameters of batch

| Parameter | Mandatory | Type | Description |
| --- | --- | --- | --- |
| model_id | Yes | String | Model ID |
| specification | Yes | String | Resource flavor. Options: modelarts.vm.cpu.2u and modelarts.vm.gpu.p4 |
| instance_count | Yes | Integer | Number of instances deployed for the model. |
| envs | No | Map<String, String> | Environment variable key-value pairs required for running the model. Left blank by default. To keep data secure, do not put sensitive information such as plaintext passwords in environment variables. |
| src_type | No | String | Data source type. The only supported value is ManifestFile. Left blank by default, meaning only the files in the src_path directory are read. If set to ManifestFile, src_path must point to a specific manifest file, which can list multiple data paths. |
| src_path | Yes | String | OBS path of the input data of the batch job |
| dest_path | Yes | String | OBS path of the output data of the batch job |
| req_uri | Yes | String | Inference API called by the batch task: a REST API in the model image. Select an API URI from the model's config.json file. If a ModelArts built-in inference image is used, set this parameter to /. |
| mapping_type | Yes | String | Mapping type of the input data: file or csv. With file, each inference request corresponds to one file in the input data path; in this mode, req_uri of the model can have only one input parameter, and its type must be file. With csv, each inference request corresponds to one row of a CSV file; in this mode, the input data path may contain only CSV files, and mapping_rule must map each parameter in the inference request body to a CSV column index. |
| mapping_rule | No | Map | Mapping between input parameters and CSV data. Mandatory only when mapping_type is csv. The rule is derived from the input parameters (input_params) in the model configuration file config.json. When type is string, number, integer, or boolean, configure the index parameter for that field. The index must be an integer starting from 0; if an index does not comply with this rule, the corresponding parameter is ignored in the request. Once the rule is configured, the CSV data must be comma-separated. |
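To make the csv mapping concrete, the sketch below converts one CSV row into an inference request body using the index assignments from a mapping_rule; the function name, and the assumption that all inputs have type number, are illustrative.

```python
import csv
import io

def row_to_request(row: list, index_map: dict) -> dict:
    """Build the inference request body for one CSV row.

    index_map maps each input parameter name to its CSV column index,
    as given by the "index" fields of mapping_rule.
    """
    return {"data": {"req_data": [
        {name: float(row[idx]) for name, idx in index_map.items()}
    ]}}

# Index assignments matching the csv sample later in this section.
index_map = {"input5": 0, "input4": 1, "input3": 2, "input2": 3, "input1": 4}
row = next(csv.reader(io.StringIO("5,4,3,2,1")))
body = row_to_request(row, index_map)
# body == {"data": {"req_data": [{"input5": 5.0, "input4": 4.0,
#          "input3": 3.0, "input2": 2.0, "input1": 1.0}]}}
```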

Table 5 schedule parameters

| Parameter | Mandatory | Type | Description |
| --- | --- | --- | --- |
| type | Yes | String | Scheduling type. Only the value stop is supported. |
| time_unit | Yes | String | Scheduling time unit. Options: DAYS, HOURS, MINUTES. |
| duration | Yes | Integer | Number of time units. For example, to stop the service after two hours, set time_unit to HOURS and duration to 2. |
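The schedule in Table 5 simply stops the service a fixed interval after it starts. A minimal Python sketch of that arithmetic, assuming the start time is known on the client:

```python
from datetime import datetime, timedelta

# One timedelta per Table 5 time unit.
UNIT = {
    "DAYS": timedelta(days=1),
    "HOURS": timedelta(hours=1),
    "MINUTES": timedelta(minutes=1),
}

def stop_time(start: datetime, schedule: dict) -> datetime:
    """Return when a service started at `start` stops under the given schedule entry."""
    if schedule["type"] != "stop":
        raise ValueError("only scheduling type 'stop' is supported")
    return start + schedule["duration"] * UNIT[schedule["time_unit"]]

# stop_time(datetime(2024, 1, 1, 12, 0),
#           {"type": "stop", "time_unit": "HOURS", "duration": 2})
# -> datetime(2024, 1, 1, 14, 0)
```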

Table 6 custom_spec parameters

| Parameter | Mandatory | Type | Description |
| --- | --- | --- | --- |
| cpu | Yes | Float | Number of required CPUs |
| memory | Yes | Integer | Required memory capacity, in MB |
| gpu_p4 | No | Float | Number of GPUs; decimal values are allowed. Not used by default. |

Response Body

Table 7 describes the response parameters.

Table 7 Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| service_id | String | Service ID |
| resource_ids | Array of strings | Resource IDs generated for the deployed models |

Samples

The following samples show how to deploy different types of services.

  • Sample request: Creating a real-time service

    POST    https://endpoint/v1/{project_id}/services
    {
      "service_name": "mnist",
      "description": "mnist service",
      "infer_type": "real-time",
      "config": [
        {
          "model_id": "xxxmodel-idxxx",
          "weight": "100",
          "specification": "modelarts.vm.cpu.2u",
          "instance_count": 1
        }
      ]
    }
    
  • Sample request: Creating a real-time service and configuring multi-version traffic distribution

    {
      "service_name": "mnist",
      "description": "mnist service",
      "infer_type": "real-time",
      "config": [
        {
          "model_id": "xxxmodel-idxxx",
          "weight": "70",
          "specification": "modelarts.vm.cpu.2u",
          "instance_count": 1,
          "envs":
          {
              "model_name": "mxnet-model-1",
              "load_epoch": "0"
          }
        },
        {
          "model_id": "xxxxxx",
          "weight": "30",
          "specification": "modelarts.vm.cpu.2u",
          "instance_count": 1
        }
      ]
    }
    
  • Sample request: Creating a real-time service in a dedicated resource pool with custom specifications

    {
        "service_name": "realtime-demo",
        "description": "",
        "infer_type": "real-time",
        "cluster_id": "8abf68a969c3cb3a0169c4acb24b0000",
        "config": [{
            "model_id": "eb6a4a8c-5713-4a27-b8ed-c7e694499af5",
            "weight": "100",
            "cluster_id": "8abf68a969c3cb3a0169c4acb24b0000",
            "specification": "custom",
            "custom_spec": {
                "cpu": 1.5,
                "memory": 7500,
                "gpu_p4": 0,
    
            },
            "instance_count": 1
        }]
    }
    
  • Sample request: Creating a real-time service and setting it to automatically stop

    {
        "service_name": "service-demo",
        "description": "demo",
        "infer_type": "real-time",
        "config": [{
            "model_id": "xxxmodel-idxxx",
            "weight": "100",
            "specification": "modelarts.vm.cpu.2u",
            "instance_count": 1
        }],
        "schedule": [{
            "type": "stop",
            "time_unit": "HOURS",
            "duration": 1
        }]
    }
    
  • Sample request: Creating a batch service and setting mapping_type to file

    {
    "service_name": "batchservicetest",
    "description": "",
    "infer_type": "batch",
    "cluster_id": "8abf68a969c3cb3a0169c4acb24b****",
    "config": [{
        "model_id": "598b913a-af3e-41ba-a1b5-bf065320f1e2",
        "specification": "modelarts.vm.cpu.2u",
        "instance_count": 1,
        "src_path": "https://infers-data.obs.xxxx.com/xgboosterdata/",
        "dest_path": "https://infers-data.obs.dxxxx.com/output/",
        "req_uri": "/",
        "mapping_type": "file"
    }]
    }
    
  • Sample request: Creating a batch service and setting mapping_type to csv

    {
    "service_name": "batchservicetest",
    "description": "",
    "infer_type": "batch",
    "config": [{
        "model_id": "598b913a-af3e-41ba-a1b5-bf065320f1e2",
        "specification": "modelarts.vm.cpu.2u",
        "instance_count": 1,
        "src_path": "https://infers-data.obs.xxxx.com/xgboosterdata/",
        "dest_path": "https://infers-data.obs.xxxx.com.com/output/",
        "req_uri": "/",
        "mapping_type": "csv",
        "mapping_rule": {
            "type": "object",
            "properties": {
                "data": {
                    "type": "object",
                    "properties": {
                        "req_data": {
                            "type": "array",
                            "items": [{
                                "type": "object",
                                "properties": {
                                    "input5": {
                                        "type": "number",
                                        "index": 0
                                    },
                                    "input4": {
                                        "type": "number",
                                        "index": 1
                                    },
                                    "input3": {
                                        "type": "number",
                                        "index": 2
                                    },
                                    "input2": {
                                        "type": "number",
                                        "index": 3
                                    },
                                    "input1": {
                                        "type": "number",
                                        "index": 4
                                    }
                                }
                            }]
                        }
                    }
                }
            }
        }
    }]
    }
    

    The format of the inference request body described in mapping_rule is as follows:

    {
    "data": {
        "req_data": [{
            "input1": 1,
            "input2": 2,
            "input3": 3,
            "input4": 4,
            "input5": 5
        }]
    }
    }
    
  • Sample response

    {
      "service_id": "10eb0091-887f-4839-9929-cbc884f1e20e",
      "resource_ids": [     "INF-f878991839647358@1598319442708"   ]
    }
    

Status Code

For details about the status code, see Table 1.

Error Codes

See Error Codes.