Querying a List of Training Job Versions

Function

This API is used to query the versions of a specified training job based on the job ID.

URI

GET /v1/{project_id}/training-jobs/{job_id}/versions

Table 1 describes the required parameters.

Table 1 URI parameters

Parameter

Mandatory

Type

Description

project_id

Yes

String

Project ID. For details about how to obtain the project ID, see Obtaining a Project ID.

job_id

Yes

Long

ID of a training job

Table 2 Query parameters

Parameter

Mandatory

Type

Description

per_page

No

Integer

Number of job versions displayed on each page. Value range: [1, 1000]. Default value: 10

page

No

Integer

Index of the page to be queried

  • If paging is required, set page to 1.

  • The default value of page is 0, indicating that the results are not paged.
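The URI and query parameters above can be assembled as in the following minimal sketch. The function name build_versions_url and the host value passed as endpoint are illustrative assumptions, not part of the API. Authentication headers are not covered here.

```python
from urllib.parse import urlencode


def build_versions_url(endpoint, project_id, job_id, per_page=None, page=None):
    """Build the request URL for the version-list query.

    Hypothetical helper: optional query parameters are omitted when not
    given, so that the service applies its documented defaults.
    """
    # per_page is documented with the value range [1, 1000].
    if per_page is not None and not 1 <= per_page <= 1000:
        raise ValueError("per_page must be in the range [1, 1000]")
    path = f"https://{endpoint}/v1/{project_id}/training-jobs/{job_id}/versions"
    query = {}
    if per_page is not None:
        query["per_page"] = per_page
    if page is not None:
        query["page"] = page
    return f"{path}?{urlencode(query)}" if query else path


# "endpoint" and "my-project-id" are placeholders.
print(build_versions_url("endpoint", "my-project-id", 10, per_page=5, page=1))
```

Rejecting an out-of-range per_page on the client side is a design choice in this sketch. The service may validate the value on its own.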

Request Body

None

Response Body

Table 3 describes the response parameters.

Table 3 Parameters

Parameter

Type

Description

is_success

Boolean

Whether the request is successful

error_message

String

Error message of a failed API call.

This parameter is not included when the API call succeeds.

error_code

String

Error code of a failed API call. For details, see Error Codes.

This parameter is not included when the API call succeeds.

job_id

Long

ID of a training job

job_name

String

Name of a training job

job_desc

String

Description of a training job

version_count

Long

Number of versions of a training job

versions

JSON Array

Version parameters of a training job. For details, see Table 4.
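A client would typically check is_success before reading the other fields, because error_code and error_message are present only when the call fails. The following is a minimal sketch of that check. The exception class TrainingJobApiError and the function parse_versions_response are hypothetical names introduced for illustration.

```python
import json


class TrainingJobApiError(Exception):
    """Hypothetical exception type for a failed call (not part of the API)."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message


def parse_versions_response(body):
    """Decode a response body and return it, or raise if the call failed."""
    data = json.loads(body)
    if not data.get("is_success", False):
        # error_code and error_message are included only on failure.
        raise TrainingJobApiError(data.get("error_code"), data.get("error_message"))
    return data


ok = parse_versions_response('{"is_success": true, "job_id": 10, "versions": []}')
print(ok["job_id"], len(ok["versions"]))
```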

Table 4 versions parameters

Parameter

Type

Description

version_id

Long

Version ID of a training job

version_name

String

Version name of a training job

pre_version_id

Long

ID of the previous version of a training job

engine_type

Long

Engine type of a training job

engine_name

String

Name of the engine selected for a training job

engine_id

Long

ID of the engine selected for a training job

engine_version

String

Version of the engine selected for a training job

status

Integer

Status of a training job

app_url

String

Code directory of a training job

boot_file_url

String

Boot file of a training job

create_time

Long

Time when a training job is created

parameter

JSON Array

Running parameters of a training job. This parameter is a container environment variable when a training job uses a custom image. For details, see Table 5.

duration

Long

Training job running duration, in milliseconds

spec_id

Long

ID of the resource specifications selected for a training job

core

String

Number of cores of the resource specifications

cpu

String

CPU memory of the resource specifications

gpu_num

Integer

Number of GPUs of the resource specifications

gpu_type

String

GPU type of the resource specifications

worker_server_num

Integer

Number of workers in a training job

data_url

String

Dataset of a training job

train_url

String

OBS path of the training job output file

log_url

String

OBS URL of the logs of a training job. By default, this parameter is left blank. Example value: /usr/log/

dataset_version_id

String

Dataset version ID of a training job

dataset_id

String

Dataset ID of a training job

data_source

JSON Array

Datasets of a training job. For details, see Table 6.

model_id

Long

Model ID of a training job

model_metric_list

String

Model metrics of a training job, returned as a JSON-formatted string (see the sample response). For details, see Table 7.

system_metric_list

String

System monitoring metrics of a training job, returned as a JSON-formatted string (see the sample response). For details, see Table 8.

user_image_url

String

SWR URL of a custom image used by a training job

user_command

String

Boot command used to start the container of a custom image of a training job

resource_id

String

Charged resource ID of a training job

dataset_name

String

Dataset name of a training job

start_time

Long

Training start time

volumes

JSON Array

Storage volume that can be used by a training job. For details, see Table 13.

dataset_version_name

String

Dataset version name of a training job

pool_name

String

Name of a resource pool

pool_id

String

ID of a resource pool

nas_mount_path

String

Local mount path of SFS Turbo (NAS). Example value: /home/work/nas

nas_share_addr

String

Shared path of SFS Turbo (NAS). Example value: 192.168.8.150:/

nas_type

String

Only NFS is supported. Example value: nfs
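The timing fields in Table 4 are plain integers. The sketch below converts them into readable values. duration is documented as milliseconds. Treating create_time as milliseconds since the Unix epoch (UTC) is an assumption based on the magnitude of the value in the sample response, and summarize_version is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def summarize_version(version):
    """Return readable timing fields for one entry of the versions array."""
    # Assumption: create_time counts milliseconds since the Unix epoch (UTC).
    created = EPOCH + timedelta(milliseconds=version["create_time"])
    # duration is documented as the running duration in milliseconds.
    duration = timedelta(milliseconds=version["duration"])
    return {
        "version_name": version["version_name"],
        "created_utc": created,
        "duration": duration,
    }


summary = summarize_version(
    {"version_name": "V0004", "create_time": 1524189990635, "duration": 532003}
)
print(summary["created_utc"].isoformat(), summary["duration"])
```

Adding a timedelta to a fixed epoch keeps the arithmetic in integers, which avoids the rounding that can occur when a millisecond timestamp is divided by 1000 as a float.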

Table 5 parameter parameters

Parameter

Type

Description

label

String

Parameter name

value

String

Parameter value

Table 6 data_source parameters

Parameter

Type

Description

dataset_id

String

Dataset ID of a training job

dataset_version

String

Dataset version ID of a training job

type

String

Dataset type

  • obs: Data from OBS is used.

  • dataset: Data from a specified dataset is used.

data_url

String

OBS bucket path

Table 7 model_metric_list parameters

Parameter

Type

Description

metric

JSON Array

Validation metrics of a classification of a training job. For details, see Table 9.

total_metric

JSON

Overall validation parameters of a training job. For details, see Table 11.

Table 8 system_metric_list parameters

Parameter

Type

Description

cpuUsage

Array

CPU usage of a training job

memUsage

Array

Memory usage of a training job

gpuUtil

Array

GPU usage of a training job
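In the sample response, system_metric_list arrives as a JSON-formatted string whose arrays hold numeric values encoded as strings. The following sketch decodes such a string and reports the peak value of each series. It assumes every array element can be parsed as a number, as in the sample, and peak_usage is a hypothetical helper name.

```python
import json


def peak_usage(system_metric_list):
    """Return the maximum sample of each metric series in the string."""
    metrics = json.loads(system_metric_list)
    peaks = {}
    for name, samples in metrics.items():
        # Skip scalar entries such as "interval" in the sample response.
        if isinstance(samples, list):
            peaks[name] = max((float(s) for s in samples), default=0.0)
    return peaks


raw = '{"cpuUsage":["0","3.10","5.76","0"],"memUsage":["0","0.77","2.09","0"],"gpuUtil":["0","0.25","0.88","0"],"interval":1}'
print(peak_usage(raw))
```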

Table 9 metric parameters

Parameter

Type

Description

metric_values

JSON

Validation metrics of a classification of a training job. For details, see Table 10.

reserved_data

JSON

Reserved parameter

metric_meta

JSON

Classification of a training job, including the classification ID and name

Table 10 metric_values parameters

Parameter

Type

Description

recall

Float

Recall of a classification of a training job

precision

Float

Precision of a classification of a training job

accuracy

Float

Accuracy of a classification of a training job

Table 11 total_metric parameters

Parameter

Type

Description

total_metric_meta

JSON Array

Reserved parameter

total_reserved_data

JSON Array

Reserved parameter

total_metric_values

JSON Array

Overall validation metrics of a training job. For details, see Table 12.

Table 12 total_metric_values parameters

Parameter

Type

Description

f1_score

Float

F1 score of a training job. This parameter is used only by some preset algorithms and is automatically generated. It is for reference only.

recall

Float

Total recall of a training job

precision

Float

Total precision of a training job

accuracy

Float

Total accuracy of a training job
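The metric tables above describe the decoded content of model_metric_list, which the sample response delivers as a JSON-formatted string. The sketch below extracts the overall and per-class values. It assumes the shape shown in the sample response, where total_metric_values is a single JSON object, and the function names are hypothetical.

```python
import json

METRIC_NAMES = ("f1_score", "recall", "precision", "accuracy")


def total_metrics(model_metric_list):
    """Return the overall metrics that are present in the string."""
    parsed = json.loads(model_metric_list)
    # Assumption: total_metric_values is an object, as in the sample response.
    totals = parsed.get("total_metric", {}).get("total_metric_values", {})
    return {name: totals[name] for name in METRIC_NAMES if name in totals}


def per_class_metrics(model_metric_list):
    """Return a list of (metric_meta, metric_values) pairs, one per class."""
    parsed = json.loads(model_metric_list)
    return [
        (item.get("metric_meta", {}), item.get("metric_values", {}))
        for item in parsed.get("metric", [])
    ]


raw_metrics = json.dumps({
    "metric": [{
        "metric_values": {"recall": 0.005833, "precision": 0.000178, "accuracy": 0.000937},
        "reserved_data": {},
        "metric_meta": {"class_name": 0, "class_id": 0},
    }],
    "total_metric": {
        "total_metric_meta": {},
        "total_reserved_data": {},
        "total_metric_values": {"recall": 0.005833, "id": 0, "precision": 0.000178, "accuracy": 0.000937},
    },
})
print(total_metrics(raw_metrics))
```

f1_score is looked up but not required, because it is generated only by some preset algorithms.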

Table 13 volumes parameters

Parameter

Type

Description

nfs

JSON

Storage volume of the shared file system type. Only training jobs running in a resource pool that is connected to the shared file system network support this type of storage volume. For details, see Table 14.

host_path

JSON

Storage volume of the host file system type. Only training jobs running in a dedicated resource pool support this type of storage volume. For details, see Table 15.

Table 14 nfs parameters

Parameter

Type

Description

id

String

ID of an SFS Turbo file system

src_path

String

Address of an SFS Turbo file system

dest_path

String

Local path of a training job

read_only

Boolean

Whether dest_path is read-only

  • true: read-only permission

  • false: read/write permission (default)

Table 15 host_path parameters

Parameter

Type

Description

src_path

String

Local path of a host

dest_path

String

Local path of a training job

read_only

Boolean

Whether dest_path is read-only

  • true: read-only permission

  • false: read/write permission (default)
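Each entry of the volumes array carries either an nfs object or a host_path object, as the sample response shows. The following sketch flattens the array into one record per mount. The function name describe_volumes and the tuple layout are illustrative assumptions.

```python
def describe_volumes(volumes):
    """Return (kind, src_path, dest_path, read_only) for every mount."""
    mounts = []
    for volume in volumes:
        for kind in ("nfs", "host_path"):
            spec = volume.get(kind)
            if spec is None:
                continue
            # read_only defaults to false when the field is absent.
            mounts.append(
                (kind, spec["src_path"], spec["dest_path"], spec.get("read_only", False))
            )
    return mounts


sample_volumes = [
    {"nfs": {"id": "43b37236-9afa-4855-8174-32254b9562e7",
             "src_path": "192.168.8.150:/",
             "dest_path": "/home/work/nas",
             "read_only": False}},
    {"host_path": {"src_path": "/root/work", "dest_path": "/home/mind"}},
]
for mount in describe_volumes(sample_volumes):
    print(mount)
```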

Samples

The following example queries the first page of version details for the training job whose job_id is 10, with five records displayed on each page.

  • Sample request

    GET https://endpoint/v1/{project_id}/training-jobs/10/versions?per_page=5&page=1
    
  • Successful sample response

    {
        "is_success": true,
        "job_id": 10,
        "job_name": "testModelArtsJob",
        "job_desc": "testModelArtsJob desc",
        "version_count": 2,
        "versions": [
            {
                "version_id": 10,
                "version_name": "V0004",
                "pre_version_id": 5,
                "engine_type": 1,
                "engine_name": "TensorFlow",
                "engine_id": 1,
                "engine_version": "TF-1.4.0-python2.7",
                "status": 10,
                "app_url": "/usr/app/",
                "boot_file_url": "/usr/app/boot.py",
                "create_time": 1524189990635,
                "parameter": [
                    {
                        "label": "learning_rate",
                        "value": 0.01
                    }
                ],
                "duration": 532003,
                "spec_id": 1,
                "core": 2,
                "cpu": 8,
                "gpu_num": 2,
                "gpu_type": "P100",
                "worker_server_num": 1,
                "data_url": "/usr/data/",
                "train_url": "/usr/train/",
                "log_url": "/usr/log/",
                "dataset_version_id": "2ff0d6ba-c480-45ae-be41-09a8369bfc90",
                "dataset_id": "38277e62-9e59-48f4-8d89-c8cf41622c24",
                "data_source": [
                    {
                        "type": "obs",
                        "data_url": "/qianjiajun-test/minst/data/"
                    }
                ],
                "user_image_url": "100.125.5.235:20202/jobmng/custom-cpu-base:1.0",
                "user_command": "bash -x /home/work/run_train.sh python /home/work/user-job-dir/app/mnist/mnist_softmax.py --data_url /home/work/user-job-dir/app/mnist_data",
                "model_id": 1,
                "model_metric_list": "{\"metric\":[{\"metric_values\":{\"recall\":0.005833,\"precision\":0.000178,\"accuracy\":0.000937},\"reserved_data\":{},\"metric_meta\":{\"class_name\":0,\"class_id\":0}}],\"total_metric\":{\"total_metric_meta\":{},\"total_reserved_data\":{},\"total_metric_values\":{\"recall\":0.005833,\"id\":0,\"precision\":0.000178,\"accuracy\":0.000937}}}",
                "system_metric_list": "{\"cpuUsage\":[\"0\",\"3.10\",\"5.76\",\"0\",\"0\",\"0\",\"0\"],\"memUsage\":[\"0\",\"0.77\",\"2.09\",\"0\",\"0\",\"0\",\"0\"],\"gpuUtil\":[\"0\",\"0.25\",\"0.88\",\"0\",\"0\",\"0\",\"0\"],\"gpuMemUsage\":[\"0\",\"0.65\",\"6.01\",\"0\",\"0\",\"0\",\"0\"],\"diskReadRate\":[\"0\",\"91811.07\",\"38846.63\",\"0\",\"0\",\"0\",\"0\"],\"diskWriteRate\":[\"0\",\"2.23\",\"0.94\",\"0\",\"0\",\"0\",\"0\"],\"recvBytesRate\":[\"0\",\"5770405.50\",\"2980077.75\",\"0\",\"0\",\"0\",\"0\"],\"sendBytesRate\":[\"0\",\"12607.17\",\"10487410.00\",\"0\",\"0\",\"0\",\"0\"],\"interval\":1}",
                "dataset_name": "dataset-test",
                "dataset_version_name": "dataset-version-test",
                "start_time": 1563172362000,
                "volumes": [
                    {
                        "nfs": {
                            "id": "43b37236-9afa-4855-8174-32254b9562e7",
                            "src_path": "192.168.8.150:/",
                            "dest_path": "/home/work/nas",
                            "read_only": false
                        }
                    },
                    {
                        "host_path": {
                            "src_path": "/root/work",
                            "dest_path": "/home/mind",
                            "read_only": false
                        }
                    }
                ],
                "pool_id": "pool9928813f",
                "pool_name": "p100",
                "nas_mount_path": "/home/work/nas",
                "nas_share_addr": "192.168.8.150:/",
                "nas_type": "nfs"
            }
        ]
    }
    
  • Failed sample response

    {
        "is_success": false,
        "error_message": "Error string",
        "error_code": "ModelArts.0105"
    }
    

Status Code

For details about the status code, see Status Code.

Error Codes

See Error Codes.