Querying the Dataset Version List

Function

This API is used to query the version list of a specific dataset.

Debugging

You can debug this API through automatic authentication in API Explorer or use the SDK sample code generated by API Explorer.

URI

GET /v2/{project_id}/datasets/{dataset_id}/versions

Table 1 Path Parameters

Parameter    Mandatory  Type    Description
dataset_id   Yes        String  Dataset ID.
project_id   Yes        String  Project ID. For details about how to obtain a project ID, see Obtaining a Project ID and Name.

Table 2 Query Parameters

Parameter

Mandatory

Type

Description

status

No

Integer

Status of a dataset version. Options:

  • 0: creating

  • 1: running

  • 2: deleting

  • 3: deleted

  • 4: error

train_evaluate_ratio

No

String

Split ratio range used to filter versions. The value consists of the minimum and maximum split ratios separated by a comma, for example, 0.0,1.0. Only versions whose split ratio falls within this range are returned. Note: If this parameter is left blank or omitted, versions are not filtered by split ratio.

version_format

No

Integer

Format of a dataset version. Options:

  • 0: default format

  • 1: CarbonData (supported only by table datasets)

  • 2: CSV

offset

No

Integer

Start page for pagination display. The default value is 0.

limit

No

Integer

Maximum number of records returned on each page. The value ranges from 1 to 1000. The default value is 1000.
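As a minimal sketch of how the path and query parameters above combine into a request URL, the following Python helper assembles one with the standard library only. The function name build_versions_url, its keyword arguments, and the host used in the usage note are illustrative assumptions, not part of the API.

```python
from urllib.parse import urlencode


def build_versions_url(endpoint, project_id, dataset_id, *, status=None,
                       ratio_range=None, version_format=None,
                       offset=None, limit=None):
    """Assemble the GET URL for the version list (hypothetical helper)."""
    if limit is not None and not 1 <= limit <= 1000:
        raise ValueError("limit must be between 1 and 1000")
    params = {}
    if status is not None:
        params["status"] = status
    if ratio_range is not None:
        low, high = ratio_range
        # train_evaluate_ratio is "min,max", for example 0.0,1.0
        params["train_evaluate_ratio"] = f"{low},{high}"
    if version_format is not None:
        params["version_format"] = version_format
    if offset is not None:
        params["offset"] = offset
    if limit is not None:
        params["limit"] = limit
    base = f"https://{endpoint}/v2/{project_id}/datasets/{dataset_id}/versions"
    # safe="," keeps the comma of the ratio range literal instead of %2C
    query = urlencode(params, safe=",")
    return f"{base}?{query}" if query else base
```

For example, build_versions_url("example.com", "p1", "d1", status=1, limit=50) yields a URL ending in /versions?status=1&limit=50. Optional parameters that are not supplied are omitted so that the server-side defaults apply.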

Request Parameters

None

Response Parameters

Status code: 200

Table 3 Response body parameters

Parameter     Type                             Description
total_number  Integer                          Total number of dataset versions.
versions      Array of DatasetVersion objects  Dataset version list.
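Because limit caps each page at 1000 records, a client may need several requests to collect every version. The sketch below shows one way to page through results until total_number versions have been seen. It assumes that offset is a page index, as the query parameter table describes it, and that the caller supplies fetch_page, a hypothetical callable that performs the request and returns the decoded response body.

```python
def iter_versions(fetch_page, limit=1000):
    """Yield every version by requesting successive pages.

    fetch_page(offset, limit) is a hypothetical callable that returns the
    decoded response body (a dict with "total_number" and "versions").
    """
    page = 0
    seen = 0
    while True:
        body = fetch_page(page, limit)
        versions = body.get("versions") or []
        yield from versions
        seen += len(versions)
        # Stop on an empty page or once all reported versions were yielded.
        if not versions or seen >= body.get("total_number", 0):
            return
        page += 1  # assumption: offset counts pages, not records
```

Passing the fetch step in as a callable keeps the loop independent of any HTTP library and makes it easy to exercise with canned data.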

Table 4 DatasetVersion

Parameter

Type

Description

add_sample_count

Integer

Number of added samples.

analysis_cache_path

String

Cache path for feature analysis.

analysis_status

Integer

Status of a feature analysis task. Options:

  • 0: initialized

  • 1: running

  • 2: completed

  • 3: failed

analysis_task_id

String

ID of a feature analysis task.

annotated_sample_count

Integer

Number of labeled samples in the version.

annotated_sub_sample_count

Integer

Number of labeled subsamples.

clear_hard_property

Boolean

Whether to clear hard example properties during release. Options:

  • true: Clear hard example properties. (Default value)

  • false: Do not clear hard example properties.

code

String

Status code of a preprocessing task such as rotation and cropping.

create_time

Long

Time when a version is created.

crop

Boolean

Whether to crop the image. This field is valid only for object detection datasets whose labeling boxes are rectangular. Options:

  • true: Crop the image.

  • false: Do not crop the image. (Default value)

crop_path

String

Path for storing cropped files.

crop_rotate_cache_path

String

Temporary directory for executing the rotation and cropping task.

data_analysis

Map<String,Object>

Feature analysis result in JSON format.

data_path

String

Path for storing data.

data_statistics

Map<String,Object>

Sample statistics on a dataset, including the statistics on sample metadata in JSON format.

data_validate

Boolean

Whether data is validated by the validation algorithm before release. Options:

  • true: The data has been validated.

  • false: The data has not been validated.

deleted_sample_count

Integer

Number of deleted samples.

deletion_stats

Map<String,Integer>

Deletion reason statistics.

description

String

Description of a version.

export_images

Boolean

Whether to export images to the version output directory during release. Options:

  • true: Export images to the version output directory.

  • false: Do not export images to the version output directory. (Default value)

extract_serial_number

Boolean

Whether to parse the subsample number during release. The field is valid for the healthcare dataset. Options:

  • true: Parse the subsample number.

  • false: Do not parse the subsample number. (Default value)

include_dataset_data

Boolean

Whether to include the source data of a dataset during release. Options:

  • true: The source data of a dataset is included.

  • false: The source data of a dataset is not included.

is_current

Boolean

Whether this version is the current version of the dataset. Options:

  • true: This is the current version.

  • false: This is not the current version.

label_stats

Array of LabelStats objects

Label statistics list of a released version.

label_type

String

Label type of a released version. Options:

  • multi: Multi-label samples are included.

  • single: All samples are single-labeled.

manifest_cache_input_path

String

Input path for the manifest file cache during version release.

manifest_path

String

Path for storing the manifest file with the released version.

message

String

Task information recorded during release (for example, error information).

modified_sample_count

Integer

Number of modified samples.

previous_annotated_sample_count

Integer

Number of labeled samples of parent versions.

previous_total_sample_count

Integer

Total samples of parent versions.

previous_version_id

String

Parent version ID.

processor_task_id

String

ID of a preprocessing task such as rotation and cropping.

processor_task_status

Integer

Status of a preprocessing task such as rotation and cropping. Options:

  • 0: initialized

  • 1: running

  • 2: completed

  • 3: failed

  • 4: stopped

  • 5: timeout

  • 6: deletion failed

  • 7: stop failed

remove_sample_usage

Boolean

Whether to clear the existing usage information of a dataset during release. Options:

  • true: Clear the existing usage information of a dataset. (Default value)

  • false: Do not clear the existing usage information of a dataset.

rotate

Boolean

Whether to rotate the image. Options:

  • true: Rotate the image.

  • false: Do not rotate the image. (Default value)

rotate_path

String

Path for storing the rotated file.

sample_state

String

Sample status. Options:

  • __ALL__: labeled

  • __NONE__: unlabeled

  • __UNCHECK__: to be checked

  • __ACCEPTED__: accepted

  • __REJECTED__: rejected

  • __UNREVIEWED__: to be reviewed

  • __REVIEWED__: reviewed

  • __WORKFORCE_SAMPLED__: accepted data sampled

  • __WORKFORCE_SAMPLED_UNCHECK__: samples to be checked

  • __WORKFORCE_SAMPLED_CHECKED__: samples checked

  • __WORKFORCE_SAMPLED_ACCEPTED__: samples accepted

  • __WORKFORCE_SAMPLED_REJECTED__: samples rejected

  • __AUTO_ANNOTATION__: to be checked

start_processor_task

Boolean

Whether to start a data analysis task during release. Options:

  • true: Start a data analysis task during release.

  • false: Do not start a data analysis task during release. (Default value)

status

Integer

Status of a dataset version. Options:

  • 0: creating

  • 1: running

  • 2: deleting

  • 3: deleted

  • 4: error

tags

Array of strings

Key identifier list of the dataset. The labeling type is used as the default label when the labeling task releases a version. For example, ["Image","Object detection"].

task_type

Integer

Labeling task type of the released version, which is the same as the dataset type.

total_sample_count

Integer

Total number of samples in the version.

total_sub_sample_count

Integer

Total number of subsamples generated from the parent samples.

train_evaluate_sample_ratio

String

Ratio used to split samples into training and validation sets during version release. The default value is 1.00, indicating that all samples in the released version are used for training.

update_time

Long

Time when a version is updated.

version_format

String

Format of a dataset version. Options:

  • Default: default format

  • CarbonData: CarbonData (supported only by table datasets)

  • CSV: CSV

version_id

String

Dataset version ID.

version_name

String

Dataset version name.

with_column_header

Boolean

Whether the first row in the released CSV file contains column names. This field is valid for the table dataset. Options:

  • true: The first row in the released CSV file contains column names.

  • false: The first row in the released CSV file does not contain column names.
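Several DatasetVersion fields are integer codes. As a small sketch, the codes listed above for status and processor_task_status can be mirrored as Python enumerations so that client code compares names rather than bare numbers. The class names and the describe helper are hypothetical; the values are taken from the option lists in this table.

```python
from enum import IntEnum


class VersionStatus(IntEnum):
    """Values of the status field."""
    CREATING = 0
    RUNNING = 1
    DELETING = 2
    DELETED = 3
    ERROR = 4


class ProcessorTaskStatus(IntEnum):
    """Values of the processor_task_status field."""
    INITIALIZED = 0
    RUNNING = 1
    COMPLETED = 2
    FAILED = 3
    STOPPED = 4
    TIMEOUT = 5
    DELETION_FAILED = 6
    STOP_FAILED = 7


def describe(version):
    """Return "name: status" for one DatasetVersion dict (hypothetical helper)."""
    status = VersionStatus(version["status"])
    return f'{version.get("version_name", "?")}: {status.name.lower()}'
```

An unknown code raises ValueError when passed to the enumeration, which surfaces values that this list does not cover instead of silently ignoring them.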

Table 5 LabelStats

Parameter

Type

Description

attributes

Array of LabelAttribute objects

Multi-dimensional attribute of a label. For example, if the label is music, attributes such as style and artist may be included.

count

Integer

Number of labels.

name

String

Label name.

property

LabelProperty object

Basic attribute key-value pair of a label, such as color and shortcut keys.

sample_count

Integer

Number of samples containing the label.

type

Integer

Label type. Options:

  • 0: image classification

  • 1: object detection

  • 3: image segmentation

  • 100: text classification

  • 101: named entity recognition

  • 102: text triplet relationship

  • 103: text triplet entity

  • 200: sound classification

  • 201: speech content

  • 202: speech paragraph labeling

  • 600: video labeling
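To illustrate how the LabelStats fields fit together, the sketch below turns a label_stats array into rows of label name, readable label type, and sample count, sorted by sample count. The LABEL_TYPES mapping copies the type codes listed above; the function name and the sample input in the usage note are made up for illustration.

```python
LABEL_TYPES = {
    0: "image classification",
    1: "object detection",
    3: "image segmentation",
    100: "text classification",
    101: "named entity recognition",
    102: "text triplet relationship",
    103: "text triplet entity",
    200: "sound classification",
    201: "speech content",
    202: "speech paragraph labeling",
    600: "video labeling",
}


def summarize_labels(label_stats):
    """Return (name, type name, sample_count) rows, largest count first."""
    rows = []
    for stat in label_stats or []:
        type_name = LABEL_TYPES.get(stat.get("type"), "unknown")
        rows.append((stat.get("name"), type_name, stat.get("sample_count", 0)))
    return sorted(rows, key=lambda row: row[2], reverse=True)
```

For instance, an input with a hypothetical label "cat" (type 0, sample_count 4) and a label "dog" (type 0, sample_count 6) produces the "dog" row first.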

Table 6 LabelAttribute

Parameter

Type

Description

default_value

String

Default value of a label attribute.

id

String

Label attribute ID.

name

String

Label attribute name.

type

String

Label attribute type. Options:

  • text: text

  • select: single-choice drop-down list

values

Array of LabelAttributeValue objects

List of label attribute values.

Table 7 LabelAttributeValue

Parameter  Type    Description
id         String  Label attribute value ID.
value      String  Label attribute value.

Table 8 LabelProperty

Parameter

Type

Description

@modelarts:color

String

Default attribute: Label color, which is a hexadecimal code of the color. By default, this parameter is left blank. Example: #FFFFF0.

@modelarts:default_shape

String

Default attribute: Default shape of an object detection label (dedicated attribute). By default, this parameter is left blank. Options:

  • bndbox: rectangle

  • polygon: polygon

  • circle: circle

  • line: straight line

  • dashed: dashed line

  • point: point

  • polyline: polyline

@modelarts:from_type

String

Default attribute: Type of the head entity in the triplet relationship label. This attribute must be specified when a relationship label is created. This parameter is used only for the text triplet dataset.

@modelarts:rename_to

String

Default attribute: The new name of the label.

@modelarts:shortcut

String

Default attribute: Label shortcut key. By default, this parameter is left blank. For example: D.

@modelarts:to_type

String

Default attribute: Type of the tail entity in the triplet relationship label. This attribute must be specified when a relationship label is created. This parameter is used only for the text triplet dataset.

Example Requests

Querying the Version List of a Specific Dataset

GET https://{endpoint}/v2/{project_id}/datasets/{dataset_id}/versions
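The request above can be issued from Python with the standard library, as sketched below. This page does not describe authentication, so the X-Auth-Token header is an assumption (token-based authentication); the function names are hypothetical as well.

```python
import json
import urllib.request


def build_request(url, token):
    """Create the GET request object; nothing is sent until urlopen runs."""
    return urllib.request.Request(
        url,
        method="GET",
        # Assumption: the service accepts a user token in X-Auth-Token.
        headers={"X-Auth-Token": token, "Content-Type": "application/json"},
    )


def list_versions(url, token, timeout=30):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(build_request(url, token), timeout=timeout) as resp:
        return json.load(resp)
```

A non-2xx response makes urlopen raise urllib.error.HTTPError, whose code attribute carries the HTTP status code (for example 401, 403, or 404).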

Example Responses

Status code: 200

OK

{
  "total_number" : 3,
  "versions" : [ {
    "version_id" : "54IXbeJhfttGpL46lbv",
    "version_name" : "V003",
    "version_format" : "Default",
    "previous_version_id" : "eSOKEQaXhKzxN00WKoV",
    "status" : 1,
    "create_time" : 1605930512183,
    "total_sample_count" : 10,
    "annotated_sample_count" : 10,
    "total_sub_sample_count" : 0,
    "annotated_sub_sample_count" : 0,
    "manifest_path" : "/test-obs/classify/output/dataset-f9e8-gfghHSokody6AJigS5A/annotation/V003/V003.manifest",
    "data_path" : "/test-obs/classify/output/dataset-f9e8-gfghHSokody6AJigS5A/annotation/V003/data/",
    "is_current" : true,
    "analysis_status" : 3,
    "train_evaluate_sample_ratio" : "0.8",
    "remove_sample_usage" : false,
    "export_images" : false,
    "description" : "",
    "task_type" : 0,
    "extract_serial_number" : false
  }, {
    "version_id" : "eSOKEQaXhKzxN00WKoV",
    "version_name" : "V002",
    "version_format" : "Default",
    "previous_version_id" : "vlGvUqOcxxGPIB0ugeE",
    "status" : 1,
    "create_time" : 1605691027084,
    "total_sample_count" : 10,
    "annotated_sample_count" : 10,
    "total_sub_sample_count" : 0,
    "annotated_sub_sample_count" : 0,
    "manifest_path" : "/test-obs/classify/output/dataset-f9e8-gfghHSokody6AJigS5A/annotation/V002/V002.manifest",
    "data_path" : "/test-obs/classify/output/dataset-f9e8-gfghHSokody6AJigS5A/annotation/V002/data/",
    "is_current" : false,
    "analysis_status" : 3,
    "train_evaluate_sample_ratio" : "0.9999",
    "remove_sample_usage" : false,
    "export_images" : false,
    "description" : "",
    "task_type" : 0,
    "extract_serial_number" : false
  }, {
    "version_id" : "vlGvUqOcxxGPIB0ugeE",
    "version_name" : "V001",
    "version_format" : "Default",
    "status" : 1,
    "create_time" : 1605690687346,
    "total_sample_count" : 10,
    "annotated_sample_count" : 10,
    "total_sub_sample_count" : 0,
    "annotated_sub_sample_count" : 0,
    "manifest_path" : "/test-obs/classify/output/dataset-f9e8-gfghHSokody6AJigS5A/annotation/V001/V001.manifest",
    "data_path" : "/test-obs/classify/output/dataset-f9e8-gfghHSokody6AJigS5A/annotation/V001/data/",
    "is_current" : false,
    "analysis_status" : 3,
    "train_evaluate_sample_ratio" : "0.99",
    "remove_sample_usage" : false,
    "export_images" : false,
    "description" : "",
    "task_type" : 0,
    "extract_serial_number" : false
  } ]
}
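As a sketch of how a client might consume this response, the code below picks the version flagged with is_current, converts its create_time to a timestamp, and follows previous_version_id back to the first version. It assumes that create_time is in milliseconds since the Unix epoch, which the example values suggest but this page does not state. The helper names are hypothetical, and sample is a trimmed copy of the response above.

```python
from datetime import datetime, timezone

sample = {
    "total_number": 3,
    "versions": [
        {"version_id": "54IXbeJhfttGpL46lbv", "version_name": "V003",
         "previous_version_id": "eSOKEQaXhKzxN00WKoV",
         "create_time": 1605930512183, "is_current": True},
        {"version_id": "eSOKEQaXhKzxN00WKoV", "version_name": "V002",
         "previous_version_id": "vlGvUqOcxxGPIB0ugeE",
         "create_time": 1605691027084, "is_current": False},
        {"version_id": "vlGvUqOcxxGPIB0ugeE", "version_name": "V001",
         "create_time": 1605690687346, "is_current": False},
    ],
}


def current_version(body):
    """Return the version whose is_current flag is true, or None."""
    for version in body.get("versions", []):
        if version.get("is_current"):
            return version
    return None


def created_at(version):
    """Convert create_time to a UTC datetime (assumes epoch milliseconds)."""
    return datetime.fromtimestamp(version["create_time"] / 1000, tz=timezone.utc)


def lineage(body):
    """List version names from the current version back to its oldest parent."""
    by_id = {v["version_id"]: v for v in body.get("versions", [])}
    names, seen = [], set()
    node = current_version(body)
    while node is not None and node["version_id"] not in seen:
        seen.add(node["version_id"])
        names.append(node["version_name"])
        # A version without previous_version_id ends the chain.
        node = by_id.get(node.get("previous_version_id"))
    return names
```

With the sample data, lineage(sample) walks from V003 through V002 to V001, where the chain ends because V001 has no previous_version_id.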

Status Codes

Status Code  Description
200          OK
401          Unauthorized
403          Forbidden
404          Not Found

Error Codes

See Error Codes.