Deploying a Model as a Service

Function

This API is used to deploy a model as a service.

URI

POST /v1/{project_id}/services

Table 1 describes the required parameters.

Table 1 Parameters

| Parameter | Mandatory | Type | Description |
| --- | --- | --- | --- |
| project_id | Yes | String | Project ID. For details about how to obtain the project ID, see Obtaining a Project ID. |
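As an illustration, the URI above can be composed in client code. The sketch below is a minimal Python helper with placeholder endpoint and project ID values; the X-Auth-Token header named in the comment is an assumption about the surrounding platform, not a parameter of this API.

```python
def build_service_url(endpoint: str, project_id: str) -> str:
    """Compose the URI for POST /v1/{project_id}/services."""
    return f"{endpoint.rstrip('/')}/v1/{project_id}/services"

# Placeholder values; a real request would also carry an auth header
# such as {"X-Auth-Token": "<token>"} and a JSON request body (Table 2).
url = build_service_url("https://endpoint", "my-project-id")
# url == "https://endpoint/v1/my-project-id/services"
```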

Request Body

Table 2 describes the request parameters.

Table 2 Parameters

| Parameter | Mandatory | Type | Description |
| --- | --- | --- | --- |
| service_name | Yes | String | Service name, 1 to 64 characters. Only letters, digits, hyphens (-), and underscores (_) are allowed. |
| description | No | String | Service description, up to 100 characters. Left blank by default. |
| infer_type | Yes | String | Inference mode. Options: real-time (a real-time service, which can be stopped on a schedule) and batch (a batch service, whose configured tasks run in batches; the service stops automatically when the tasks are complete). |
| workspace_id | No | String | ID of the workspace to which the service belongs. Defaults to 0, the default workspace. |
| vpc_id | No | String | ID of the VPC where a real-time service instance is deployed. Left blank by default, in which case ModelArts allocates a dedicated VPC to each user so that users are isolated from one another. To access other service components in a specific VPC from the service instance, set this parameter to that VPC's ID. Once configured, the VPC cannot be changed. If both vpc_id and cluster_id are configured, only the dedicated resource pool takes effect. |
| subnet_network_id | No | String | Subnet ID. Left blank by default; mandatory when vpc_id is configured. Enter the network ID shown in the subnet details on the VPC console. A subnet provides dedicated network resources that are isolated from other networks. |
| security_group_id | No | String | Security group ID. Left blank by default; mandatory when vpc_id is configured. A security group is a virtual firewall that provides secure network access control policies for service instances. It must contain at least one inbound rule that permits TCP traffic from source address 0.0.0.0/0 on port 8080. |
| cluster_id | No | String | ID of a dedicated resource pool. Left blank by default, meaning no dedicated resource pool is used. Ensure the pool is running properly before deploying services to it. When set, the cluster's network configuration is used and vpc_id does not take effect. If cluster_id is also set in a real-time config entry, that config-level value takes precedence. |
| config | Yes | config array corresponding to infer_type | Model running configuration. If infer_type is batch, only one model can be configured. If infer_type is real-time, multiple models can be configured with traffic weights assigned based on service requirements, but the model versions must differ. If infer_type is real-time, see Table 3; if batch, see Table 4. |
| schedule | No | schedule array | Service scheduling configuration, available only for real-time services. Not set by default, in which case the service keeps running without a scheduled stop. See Table 5. |
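The constraints in Table 2 can be checked on the client before the request is sent. The following Python sketch encodes only the rules stated above; the function name and error messages are illustrative, not part of the API.

```python
import re

# Allowed service_name characters per Table 2: letters, digits, hyphens, underscores.
NAME_RE = re.compile(r"^[A-Za-z0-9_-]{1,64}$")

def validate_body(body: dict) -> list:
    """Return a list of Table 2 constraint violations (empty if the body looks valid)."""
    errors = []
    if not NAME_RE.match(body.get("service_name", "")):
        errors.append("service_name must be 1-64 letters, digits, hyphens, or underscores")
    if body.get("infer_type") not in ("real-time", "batch"):
        errors.append("infer_type must be 'real-time' or 'batch'")
    if len(body.get("description", "")) > 100:
        errors.append("description exceeds 100 characters")
    if body.get("vpc_id") and not (body.get("subnet_network_id") and body.get("security_group_id")):
        errors.append("subnet_network_id and security_group_id are mandatory when vpc_id is set")
    if not body.get("config"):
        errors.append("config is mandatory")
    return errors
```

A body that satisfies Table 2 yields an empty list; each violated rule adds one message.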

Table 3 config parameters of real-time

| Parameter | Mandatory | Type | Description |
| --- | --- | --- | --- |
| model_id | Yes | String | Model ID |
| weight | Yes | Integer | Traffic weight allocated to the model. Mandatory only when infer_type is real-time. The weights across all configs must sum to 100. |
| specification | Yes | String | Resource specifications. Select specifications based on service requirements. |
| custom_spec | No | Object | Custom specifications. Set this parameter when using a dedicated resource pool. See Table 6. |
| instance_count | Yes | Integer | Number of instances deployed for the model. Must be greater than 0. |
| envs | No | Map<String, String> | Environment variable key-value pairs required for running the model. Left blank by default. To keep data secure, do not put sensitive information such as plaintext passwords in environment variables. |
| cluster_id | No | String | ID of a dedicated resource pool. Left blank by default, meaning no dedicated resource pool is used. |
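The weight rule in Table 3 can be verified with a one-line check before deployment; this is an illustrative sketch, and the function name is not part of the API.

```python
def weights_valid(config: list) -> bool:
    """True if the traffic weights of all real-time model configs sum to 100."""
    return sum(entry.get("weight", 0) for entry in config) == 100

# Two model versions splitting traffic 70/30, as in the multi-version sample below:
# weights_valid([{"weight": 70}, {"weight": 30}]) -> True
```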

Table 4 config parameters of batch

| Parameter | Mandatory | Type | Description |
| --- | --- | --- | --- |
| model_id | Yes | String | Model ID |
| specification | Yes | String | Resource flavor. Options: modelarts.vm.cpu.2u and modelarts.vm.gpu.p4 |
| instance_count | Yes | Integer | Number of instances deployed for the model. |
| envs | No | Map<String, String> | Environment variable key-value pairs required for running the model. Left blank by default. To keep data secure, do not put sensitive information such as plaintext passwords in environment variables. |
| src_type | No | String | Data source type. The only supported value is ManifestFile. Left blank by default, meaning only the files in the src_path directory are read. If set to ManifestFile, src_path must point to a specific manifest file, which can list multiple data paths. |
| src_path | Yes | String | OBS path of the input data of the batch job |
| dest_path | Yes | String | OBS path of the output data of the batch job |
| req_uri | Yes | String | Inference API called by the batch task: a REST API in the model image. Select an API URI from the model's config.json file. If a ModelArts built-in inference image is used, set this parameter to /. |
| mapping_type | Yes | String | Mapping type of the input data: file or csv. With file, each inference request corresponds to one file in the input data path; in this mode, req_uri of the model can have only one input parameter, and its type must be file. With csv, each inference request corresponds to one row of a CSV file; in this mode, the input data path may contain only CSV files, and mapping_rule must map each parameter in the inference request body to a CSV column index. |
| mapping_rule | No | Map | Mapping between input parameters and CSV data. Mandatory only when mapping_type is csv. The rule is derived from the input parameters (input_params) in the model configuration file config.json. When type is string, number, integer, or boolean, configure the index parameter for that field. The index must be an integer starting from 0; if an index does not comply with this rule, the corresponding parameter is ignored in the request. Once the rule is configured, the CSV data must be comma-separated. |
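To make the csv mapping concrete, the sketch below converts one CSV row into an inference request body using the index assignments from a mapping_rule; the function name, and the assumption that all inputs have type number, are illustrative.

```python
import csv
import io

def row_to_request(row: list, index_map: dict) -> dict:
    """Build the inference request body for one CSV row.

    index_map maps each input parameter name to its CSV column index,
    as given by the "index" fields of mapping_rule.
    """
    return {"data": {"req_data": [
        {name: float(row[idx]) for name, idx in index_map.items()}
    ]}}

# Index assignments matching the csv sample later in this section.
index_map = {"input5": 0, "input4": 1, "input3": 2, "input2": 3, "input1": 4}
row = next(csv.reader(io.StringIO("5,4,3,2,1")))
body = row_to_request(row, index_map)
# body == {"data": {"req_data": [{"input5": 5.0, "input4": 4.0,
#          "input3": 3.0, "input2": 2.0, "input1": 1.0}]}}
```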

Table 5 schedule parameters

| Parameter | Mandatory | Type | Description |
| --- | --- | --- | --- |
| type | Yes | String | Scheduling type. Only the value stop is supported. |
| time_unit | Yes | String | Scheduling time unit. Options: DAYS, HOURS, MINUTES. |
| duration | Yes | Integer | Number of time units. For example, to stop the service after two hours, set time_unit to HOURS and duration to 2. |
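The schedule in Table 5 simply stops the service a fixed interval after it starts. A minimal Python sketch of that arithmetic, assuming the start time is known on the client:

```python
from datetime import datetime, timedelta

# One timedelta per Table 5 time unit.
UNIT = {
    "DAYS": timedelta(days=1),
    "HOURS": timedelta(hours=1),
    "MINUTES": timedelta(minutes=1),
}

def stop_time(start: datetime, schedule: dict) -> datetime:
    """Return when a service started at `start` stops under the given schedule entry."""
    if schedule["type"] != "stop":
        raise ValueError("only scheduling type 'stop' is supported")
    return start + schedule["duration"] * UNIT[schedule["time_unit"]]

# stop_time(datetime(2024, 1, 1, 12, 0),
#           {"type": "stop", "time_unit": "HOURS", "duration": 2})
# -> datetime(2024, 1, 1, 14, 0)
```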

Table 6 custom_spec parameters

| Parameter | Mandatory | Type | Description |
| --- | --- | --- | --- |
| cpu | Yes | Float | Number of required CPUs |
| memory | Yes | Integer | Required memory capacity, in MB |
| gpu_p4 | No | Float | Number of GPUs; decimal values are allowed. Not used by default. |

Response Body

Table 7 describes the response parameters.

Table 7 Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| service_id | String | Service ID |
| resource_ids | Array of strings | Resource IDs generated for the deployed models |

Samples

The following samples show how to deploy different types of services.

  • Sample request: Creating a real-time service

    POST    https://endpoint/v1/{project_id}/services
    {
      "service_name": "mnist",
      "description": "mnist service",
      "infer_type": "real-time",
      "config": [
        {
          "model_id": "xxxmodel-idxxx",
          "weight": "100",
          "specification": "modelarts.vm.cpu.2u",
          "instance_count": 1
        }
      ]
    }
    
  • Sample request: Creating a real-time service and configuring multi-version traffic distribution

    {
      "service_name": "mnist",
      "description": "mnist service",
      "infer_type": "real-time",
      "config": [
        {
          "model_id": "xxxmodel-idxxx",
          "weight": "70",
          "specification": "modelarts.vm.cpu.2u",
          "instance_count": 1,
          "envs":
          {
              "model_name": "mxnet-model-1",
              "load_epoch": "0"
          }
        },
        {
          "model_id": "xxxxxx",
          "weight": "30",
          "specification": "modelarts.vm.cpu.2u",
          "instance_count": 1
        }
      ]
    }
    
  • Sample request: Creating a real-time service in a dedicated resource pool with custom specifications

    {
        "service_name": "realtime-demo",
        "description": "",
        "infer_type": "real-time",
        "cluster_id": "8abf68a969c3cb3a0169c4acb24b0000",
        "config": [{
            "model_id": "eb6a4a8c-5713-4a27-b8ed-c7e694499af5",
            "weight": "100",
            "cluster_id": "8abf68a969c3cb3a0169c4acb24b0000",
            "specification": "custom",
            "custom_spec": {
                "cpu": 1.5,
                "memory": 7500,
                "gpu_p4": 0,
    
            },
            "instance_count": 1
        }]
    }
    
  • Sample request: Creating a real-time service and setting it to automatically stop

    {
        "service_name": "service-demo",
        "description": "demo",
        "infer_type": "real-time",
        "config": [{
            "model_id": "xxxmodel-idxxx",
            "weight": "100",
            "specification": "modelarts.vm.cpu.2u",
            "instance_count": 1
        }],
        "schedule": [{
            "type": "stop",
            "time_unit": "HOURS",
            "duration": 1
        }]
    }
    
  • Sample request: Creating a batch service and setting mapping_type to file

    {
    "service_name": "batchservicetest",
    "description": "",
    "infer_type": "batch",
    "cluster_id": "8abf68a969c3cb3a0169c4acb24b****",
    "config": [{
        "model_id": "598b913a-af3e-41ba-a1b5-bf065320f1e2",
        "specification": "modelarts.vm.cpu.2u",
        "instance_count": 1,
        "src_path": "https://infers-data.obs.xxxx.com/xgboosterdata/",
        "dest_path": "https://infers-data.obs.dxxxx.com/output/",
        "req_uri": "/",
        "mapping_type": "file"
    }]
    }
    
  • Sample request: Creating a batch service and setting mapping_type to csv

    {
    "service_name": "batchservicetest",
    "description": "",
    "infer_type": "batch",
    "config": [{
        "model_id": "598b913a-af3e-41ba-a1b5-bf065320f1e2",
        "specification": "modelarts.vm.cpu.2u",
        "instance_count": 1,
        "src_path": "https://infers-data.obs.xxxx.com/xgboosterdata/",
        "dest_path": "https://infers-data.obs.xxxx.com.com/output/",
        "req_uri": "/",
        "mapping_type": "csv",
        "mapping_rule": {
            "type": "object",
            "properties": {
                "data": {
                    "type": "object",
                    "properties": {
                        "req_data": {
                            "type": "array",
                            "items": [{
                                "type": "object",
                                "properties": {
                                    "input5": {
                                        "type": "number",
                                        "index": 0
                                    },
                                    "input4": {
                                        "type": "number",
                                        "index": 1
                                    },
                                    "input3": {
                                        "type": "number",
                                        "index": 2
                                    },
                                    "input2": {
                                        "type": "number",
                                        "index": 3
                                    },
                                    "input1": {
                                        "type": "number",
                                        "index": 4
                                    }
                                }
                            }]
                        }
                    }
                }
            }
        }
    }]
    }
    

    The format of the inference request body described in mapping_rule is as follows:

    {
    "data": {
        "req_data": [{
            "input1": 1,
            "input2": 2,
            "input3": 3,
            "input4": 4,
            "input5": 5
        }]
    }
    }
    
  • Sample response

    {
      "service_id": "10eb0091-887f-4839-9929-cbc884f1e20e",
      "resource_ids": [     "INF-f878991839647358@1598319442708"   ]
    }
    

Status Code

For details about the status code, see Table 1.

Error Codes

See Error Codes.