Creating a Dataset Export Task

Function

This API is used to create a dataset export task that exports a dataset to OBS or to a new dataset.

URI

POST /v2/{project_id}/datasets/{dataset_id}/export-tasks

Table 1 Path Parameters

Parameter

Mandatory

Type

Description

dataset_id

Yes

String

Dataset ID.

project_id

Yes

String

Project ID. For details about how to obtain the project ID, see Obtaining a Project ID.
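The two path parameters are substituted into the URI template above. The following is a minimal Python sketch of building the request URL. The endpoint host is a hypothetical placeholder, because the regional ModelArts endpoint is not given in this section.

```python
from urllib.parse import quote

# Hypothetical placeholder: substitute the ModelArts endpoint for your region.
ENDPOINT = "https://modelarts.example-region.example.com"


def export_task_url(endpoint: str, project_id: str, dataset_id: str) -> str:
    """Build the URL for POST /v2/{project_id}/datasets/{dataset_id}/export-tasks."""
    if not project_id or not dataset_id:
        raise ValueError("project_id and dataset_id are both mandatory")
    # Percent-encode each segment so unexpected characters cannot alter the path.
    return (
        f"{endpoint.rstrip('/')}/v2/{quote(project_id, safe='')}"
        f"/datasets/{quote(dataset_id, safe='')}/export-tasks"
    )


print(export_task_url(ENDPOINT, "my-project-id", "my-dataset-id"))
```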

Request Parameters

Table 2 Request body parameters

Parameter

Mandatory

Type

Description

annotation_format

No

String

Labeling format. The options are as follows:

  • VOC: VOC

  • COCO: COCO

dataset_id

No

String

Dataset ID.

dataset_type

No

Integer

Dataset type. The options are as follows:

  • 0: image classification

  • 1: object detection

  • 100: text classification

  • 101: named entity recognition

  • 102: text triplet

  • 200: sound classification

  • 201: speech content

  • 202: speech paragraph labeling

  • 400: table dataset

  • 600: video labeling

  • 900: custom format

export_format

No

Integer

Format of the exported directory. The options are as follows:

  • 1: tree structure, for example, cat/1.jpg, rabbit/2.jpg.

  • 2: tile structure, for example, 1.jpg, 1.txt; 2.jpg, 2.txt.

export_params

No

ExportParams object

Parameters of a dataset export task.

export_type

No

Integer

Export type. The options are as follows:

  • 0: labeled

  • 1: unlabeled

  • 2: all

  • 3: conditional search

path

No

String

Export output path.

sample_state

No

String

Sample status. The options are as follows:

  • ALL: labeled

  • NONE: unlabeled

  • UNCHECK: pending acceptance

  • ACCEPTED: accepted

  • REJECTED: rejected

  • UNREVIEWED: pending review

  • REVIEWED: reviewed

  • WORKFORCE_SAMPLED: sampled

  • WORKFORCE_SAMPLED_UNCHECK: sampling unchecked

  • WORKFORCE_SAMPLED_CHECKED: sampling checked

  • WORKFORCE_SAMPLED_ACCEPTED: sampling accepted

  • WORKFORCE_SAMPLED_REJECTED: sampling rejected

  • AUTO_ANNOTATION: auto-labeled, to be confirmed

source_type_header

No

String

Prefix of the OBS paths in the exported labeling file. The default value is obs://, and you can set it to s3://. Image paths starting with obs:// cannot be parsed during training, so set the prefix to s3:// if the exported manifest file will be used for training.

status

No

Integer

Task status.

task_id

No

String

Task ID.

version_format

No

String

Format of a dataset version. The options are as follows:

  • Default: default format

  • CarbonData: CarbonData (supported only by table datasets)

  • CSV: CSV

version_id

No

String

Dataset version ID.

with_column_header

No

Boolean

Whether to write the column names in the first line of the CSV file during export. This field is valid only for table datasets. The options are as follows:

  • true: Write the column name in the first line of the CSV file. (Default value)

  • false: Do not write the column name in the first line of the CSV file.
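All request body fields are optional, so a client typically sends only the ones it sets. The sketch below models Table 2 as a Python dataclass that drops unset fields and sends the integer codes for export_type and export_format. The class and enum names are hypothetical, and the validation covers only the option lists documented above.

```python
import json
from dataclasses import asdict, dataclass
from enum import IntEnum
from typing import Any, Optional


class ExportType(IntEnum):
    LABELED = 0
    UNLABELED = 1
    ALL = 2
    CONDITIONAL_SEARCH = 3


class ExportFormat(IntEnum):
    TREE = 1  # e.g. cat/1.jpg, rabbit/2.jpg
    TILE = 2  # e.g. 1.jpg, 1.txt; 2.jpg, 2.txt


@dataclass
class ExportTaskRequest:
    # None means "omit the field and let the server apply its default".
    path: Optional[str] = None
    export_type: Optional[ExportType] = None
    export_format: Optional[ExportFormat] = None
    annotation_format: Optional[str] = None    # "VOC" or "COCO"
    version_format: Optional[str] = None       # "Default", "CarbonData" or "CSV"
    version_id: Optional[str] = None
    with_column_header: Optional[bool] = None  # table datasets only
    source_type_header: Optional[str] = None   # "obs://" (default) or "s3://"
    export_params: Optional[dict] = None

    def __post_init__(self) -> None:
        if self.annotation_format not in (None, "VOC", "COCO"):
            raise ValueError(f"unsupported annotation_format: {self.annotation_format}")
        if self.version_format not in (None, "Default", "CarbonData", "CSV"):
            raise ValueError(f"unsupported version_format: {self.version_format}")

    def to_json(self) -> str:
        body: dict[str, Any] = {}
        for key, value in asdict(self).items():
            if value is None:
                continue
            # Send enum members as their plain integer codes.
            body[key] = int(value) if isinstance(value, IntEnum) else value
        return json.dumps(body)


request = ExportTaskRequest(
    path="/test-obs/daoChu/",
    export_type=ExportType.CONDITIONAL_SEARCH,
    export_params={"sample_state": "", "export_dest": "DIR"},
)
print(request.to_json())
```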

Table 3 ExportParams

Parameter

Mandatory

Type

Description

clear_hard_property

No

Boolean

Whether to clear hard example attributes. The options are as follows:

  • true: Clear hard example attributes. (Default value)

  • false: Do not clear hard example attributes.

export_dataset_version_format

No

String

Format of the dataset version to which data is exported.

export_dataset_version_name

No

String

Name of the dataset version to which data is exported.

export_dest

No

String

Export destination. The options are as follows:

  • DIR: Export data to OBS. (Default value)

  • NEW_DATASET: Export data to a new dataset.

export_new_dataset_name

No

String

Name of the new dataset to which data is exported.

export_new_dataset_work_path

No

String

Working directory of the new dataset to which data is exported.

ratio_sample_usage

No

Boolean

Whether to randomly split the samples into a training set and a validation set based on the specified ratio (train_sample_ratio). The options are as follows:

  • true: Split the samples into a training set and a validation set.

  • false: Do not split the samples. (Default value)

sample_state

No

String

Sample status. The options are as follows:

  • ALL: labeled

  • NONE: unlabeled

  • UNCHECK: pending acceptance

  • ACCEPTED: accepted

  • REJECTED: rejected

  • UNREVIEWED: pending review

  • REVIEWED: reviewed

  • WORKFORCE_SAMPLED: sampled

  • WORKFORCE_SAMPLED_UNCHECK: sampling unchecked

  • WORKFORCE_SAMPLED_CHECKED: sampling checked

  • WORKFORCE_SAMPLED_ACCEPTED: sampling accepted

  • WORKFORCE_SAMPLED_REJECTED: sampling rejected

  • AUTO_ANNOTATION: auto-labeled, to be confirmed

samples

No

Array of strings

List of IDs of the samples to export.

search_conditions

No

Array of SearchCondition objects

Search conditions used to select the samples to export. Multiple search conditions are combined with OR.

train_sample_ratio

No

String

Proportion of samples assigned to the training set (the rest go to the validation set) when the specified version is released. The default value is 1.00, indicating that all samples in the released version are assigned to the training set.
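The ExportParams object is where the export destination is chosen. The following Python sketch assembles it under a few stated assumptions: it requires both export_new_dataset_name and export_new_dataset_work_path for NEW_DATASET (the example request supplies both, but the table marks them optional), treats the ratio as a value in [0, 1] formatted like the documented default "1.00", and enables ratio_sample_usage whenever a ratio is given. The function name is hypothetical.

```python
from typing import Any, Optional

SAMPLE_STATES = {
    "ALL", "NONE", "UNCHECK", "ACCEPTED", "REJECTED", "UNREVIEWED", "REVIEWED",
    "WORKFORCE_SAMPLED", "WORKFORCE_SAMPLED_UNCHECK", "WORKFORCE_SAMPLED_CHECKED",
    "WORKFORCE_SAMPLED_ACCEPTED", "WORKFORCE_SAMPLED_REJECTED", "AUTO_ANNOTATION",
}


def build_export_params(
    export_dest: str = "DIR",
    new_dataset_name: Optional[str] = None,
    new_dataset_work_path: Optional[str] = None,
    sample_state: Optional[str] = None,
    train_ratio: Optional[float] = None,
) -> dict[str, Any]:
    """Assemble an ExportParams object (Table 3)."""
    if export_dest not in ("DIR", "NEW_DATASET"):
        raise ValueError(f"unknown export_dest: {export_dest}")
    params: dict[str, Any] = {"export_dest": export_dest}

    if export_dest == "NEW_DATASET":
        # Assumption: require both fields, as the NEW_DATASET example request does.
        if not new_dataset_name or not new_dataset_work_path:
            raise ValueError("NEW_DATASET exports need a dataset name and a work path")
        params["export_new_dataset_name"] = new_dataset_name
        params["export_new_dataset_work_path"] = new_dataset_work_path

    if sample_state is not None:
        # The example requests send an empty string, so it is passed through unchanged.
        if sample_state and sample_state not in SAMPLE_STATES:
            raise ValueError(f"unknown sample_state: {sample_state}")
        params["sample_state"] = sample_state

    if train_ratio is not None:
        # Assumption: the ratio lies in [0, 1]; it is sent as a string such as "1.00".
        if not 0.0 <= train_ratio <= 1.0:
            raise ValueError("train_ratio must be between 0 and 1")
        params["train_sample_ratio"] = f"{train_ratio:.2f}"
        # Assumption: a custom ratio is only meaningful with random splitting enabled.
        params["ratio_sample_usage"] = True

    return params
```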

Table 4 SearchCondition

Parameter

Mandatory

Type

Description

coefficient

No

String

Filter by coefficient of difficulty.

frame_in_video

No

Integer

A frame in the video.

hard

No

String

Whether a sample is a hard sample. The options are as follows:

  • 0: non-hard sample

  • 1: hard sample

import_origin

No

String

Filter by data source.

kvp

No

String

CT dosage. Samples are filtered by dosage.

label_list

No

SearchLabels object

Label search criteria.

labeler

No

String

Labeler.

metadata

No

SearchProp object

Search by sample attribute.

parent_sample_id

No

String

Parent sample ID.

sample_dir

No

String

Directory where data samples are stored (the directory must end with a slash (/)). Only samples in the specified directory are searched for. Recursive search of directories is not supported.

sample_name

No

String

Search by sample name, including the file name extension.

sample_time

No

String

When a sample is added to the dataset, an index is created based on the last modification time (accurate to day) of the sample on OBS. You can search for the sample based on the time. The options are as follows:

  • month: Search for samples added from 30 days ago to the current day.

  • day: Search for samples added from yesterday (one day ago) to the current day.

  • yyyyMMdd-yyyyMMdd: Search for samples added within a specified period of at most 30 days, in the format start date-end date. For example, 20190901-20190915 searches for samples added from September 1 to September 15, 2019.

score

No

String

Search by confidence.

slice_thickness

No

String

DICOM layer thickness. Samples are filtered by layer thickness.

study_date

No

String

DICOM scanning time.

time_in_video

No

String

A time point in the video.
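Two SearchCondition fields carry format rules that are easy to get wrong: sample_time ranges are limited to 30 days, and sample_dir must end with a slash. The following Python sketch checks both on the client side. The helper names are hypothetical, and counting both endpoints toward the 30-day limit is an assumption, since the text does not say whether the limit is inclusive.

```python
from datetime import datetime

DATE_FMT = "%Y%m%d"


def sample_time_range(start: str, end: str) -> str:
    """Validate and format a yyyyMMdd-yyyyMMdd value for sample_time."""
    d0 = datetime.strptime(start, DATE_FMT).date()
    d1 = datetime.strptime(end, DATE_FMT).date()
    if d1 < d0:
        raise ValueError("end date must not precede start date")
    # Assumption: "at most 30 days" counts both the start and the end date.
    if (d1 - d0).days + 1 > 30:
        raise ValueError("sample_time ranges may cover at most 30 days")
    return f"{d0.strftime(DATE_FMT)}-{d1.strftime(DATE_FMT)}"


def sample_dir_value(directory: str) -> str:
    # sample_dir must end with "/"; subdirectories are not searched recursively.
    return directory if directory.endswith("/") else directory + "/"


print(sample_time_range("20190901", "20190915"))
```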

Table 5 SearchLabels

Parameter

Mandatory

Type

Description

labels

No

Array of SearchLabel objects

List of label search criteria.

op

No

String

Relationship between multiple labels. op is mandatory when searching for multiple labels and can be left blank when searching for only one label. The options are as follows:

  • OR: OR operation

  • AND: AND operation
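The rule that op is needed only when more than one label is searched can be enforced when building the label_list object. A minimal Python sketch (the helper name is hypothetical):

```python
from typing import Any, Optional


def build_label_list(label_names: list[str], op: Optional[str] = None) -> dict[str, Any]:
    """Build a SearchLabels object from label names."""
    if not label_names:
        raise ValueError("at least one label is required")
    if op is not None and op not in ("AND", "OR"):
        raise ValueError("op must be 'AND' or 'OR'")
    # op is mandatory for multiple labels and optional for a single label.
    if len(label_names) > 1 and op is None:
        raise ValueError("op is required when searching for multiple labels")
    result: dict[str, Any] = {"labels": [{"name": name} for name in label_names]}
    if op is not None:
        result["op"] = op
    return result


print(build_label_list(["cat", "dog"], op="OR"))
```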

Table 6 SearchLabel

Parameter

Mandatory

Type

Description

name

No

String

Label name.

op

No

String

Operation type between multiple attributes. The options are as follows:

  • OR: OR operation

  • AND: AND operation

property

No

Map<String,Array<String>>

Label attributes, as an object that stores arbitrary key-value pairs. Each key is an attribute name, and each value is a list of attribute values. If a value is null, the attribute is not filtered by value. Otherwise, a match on any value in the list satisfies the search.

type

No

Integer

Label type. The options are as follows:

  • 0: image classification

  • 1: object detection

  • 100: text classification

  • 101: named entity recognition

  • 102: text triplet relationship

  • 103: text triplet entity

  • 200: speech classification

  • 201: speech content

  • 202: speech paragraph labeling

  • 600: video classification
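The matching rules for property and op can be illustrated with a small local function. This is only a sketch of the documented semantics, not the server's implementation. It assumes that a null value list still requires the attribute to be present on the label, a point the description leaves open.

```python
from typing import Optional


def property_matches(
    sample_props: dict[str, str],
    criteria: dict[str, Optional[list[str]]],
    op: str = "AND",
) -> bool:
    """Illustrate SearchLabel.property matching against one label's attributes."""
    results = []
    for key, allowed in criteria.items():
        if key not in sample_props:
            results.append(False)
        elif allowed is None:
            # Null value list: the attribute is not filtered by value.
            # Assumption: the attribute must still exist on the label.
            results.append(True)
        else:
            # Otherwise, any value in the list is a match.
            results.append(sample_props[key] in allowed)
    # op combines the per-attribute results.
    return all(results) if op == "AND" else any(results)


print(property_matches({"color": "red"}, {"color": ["red", "blue"]}))
```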

Table 7 SearchProp

Parameter

Mandatory

Type

Description

op

No

String

Relationship between attribute values. The options are as follows:

  • AND: AND relationship

  • OR: OR relationship

props

No

Map<String,Array<String>>

Search criteria of an attribute. Multiple search criteria can be set.
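Putting the search objects together: a conditional export sets export_type to 3 and places the SearchCondition list inside export_params, where separate conditions are combined with OR. The Python sketch below builds such a body. The output path, label name, and metadata attribute names (camera, weather) are hypothetical values chosen for illustration.

```python
import json


def conditional_export_body(path: str, conditions: list[dict]) -> dict:
    # export_type 3 = conditional search; multiple conditions are combined with OR.
    return {
        "path": path,
        "export_type": 3,
        "export_params": {"export_dest": "DIR", "search_conditions": conditions},
    }


# Condition 1: hard samples carrying the (hypothetical) image classification label "cat".
hard_cats = {
    "hard": "1",
    "label_list": {"labels": [{"name": "cat", "type": 0}]},
}
# Condition 2: a SearchProp filter; attribute names and values are hypothetical.
tagged = {
    "metadata": {"op": "AND", "props": {"camera": ["front"], "weather": ["sunny", "cloudy"]}},
}

body = conditional_export_body("/test-obs/export/", [hard_cats, tagged])
print(json.dumps(body, indent=2))
```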

Response Parameters

Status code: 200

Table 8 Response body parameters

Parameter

Type

Description

create_time

Long

Time when the task was created.

error_code

String

Error code.

error_msg

String

Error message.

export_format

Integer

Format of the exported directory. The options are as follows:

  • 1: tree structure, for example, cat/1.jpg, rabbit/2.jpg.

  • 2: tile structure, for example, 1.jpg, 1.txt; 2.jpg, 2.txt.

export_params

ExportParams object

Parameters of a dataset export task.

export_type

Integer

Export type. The options are as follows:

  • 0: labeled

  • 1: unlabeled

  • 2: all

  • 3: conditional search

finished_sample_count

Integer

Number of completed samples.

path

String

Export output path.

progress

Float

Task progress, as a percentage.

status

String

Task status. The options are as follows:

  • INIT: initialized

  • RUNNING: running

  • FAILED: failed

  • SUCCESSED: completed

task_id

String

Task ID.

total_sample_count

Integer

Total number of samples.

update_time

Long

Time when the task was last updated.

version_format

String

Format of a dataset version. The options are as follows:

  • Default: default format

  • CarbonData: CarbonData (supported only by table datasets)

  • CSV: CSV

version_id

String

Dataset version ID.
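Every field in Table 8 may be absent (the example response below returns only task_id), so a client should read them defensively. The following Python sketch parses the body into a dataclass and treats FAILED and SUCCESSED as final states. The class name is hypothetical, and treating these two as the only terminal states is an assumption based on the listed options.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Note the spelling: the API reports completion as "SUCCESSED".
TERMINAL_STATES = {"SUCCESSED", "FAILED"}


@dataclass
class ExportTaskStatus:
    task_id: Optional[str]
    status: Optional[str]
    progress: Optional[float]
    finished_sample_count: Optional[int]
    total_sample_count: Optional[int]

    @classmethod
    def from_json(cls, text: str) -> "ExportTaskStatus":
        data = json.loads(text)
        # Missing fields become None instead of raising KeyError.
        return cls(
            task_id=data.get("task_id"),
            status=data.get("status"),
            progress=data.get("progress"),
            finished_sample_count=data.get("finished_sample_count"),
            total_sample_count=data.get("total_sample_count"),
        )

    def is_done(self) -> bool:
        return self.status in TERMINAL_STATES

    def sample_fraction(self) -> Optional[float]:
        # Fraction of samples processed, or None if the counts are unavailable.
        if not self.total_sample_count or self.finished_sample_count is None:
            return None
        return self.finished_sample_count / self.total_sample_count
```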

Table 9 ExportParams

Parameter

Type

Description

clear_hard_property

Boolean

Whether to clear hard example attributes. The options are as follows:

  • true: Clear hard example attributes. (Default value)

  • false: Do not clear hard example attributes.

export_dataset_version_format

String

Format of the dataset version to which data is exported.

export_dataset_version_name

String

Name of the dataset version to which data is exported.

export_dest

String

Export destination. The options are as follows:

  • DIR: Export data to OBS. (Default value)

  • NEW_DATASET: Export data to a new dataset.

export_new_dataset_name

String

Name of the new dataset to which data is exported.

export_new_dataset_work_path

String

Working directory of the new dataset to which data is exported.

ratio_sample_usage

Boolean

Whether to randomly split the samples into a training set and a validation set based on the specified ratio (train_sample_ratio). The options are as follows:

  • true: Split the samples into a training set and a validation set.

  • false: Do not split the samples. (Default value)

sample_state

String

Sample status. The options are as follows:

  • ALL: labeled

  • NONE: unlabeled

  • UNCHECK: pending acceptance

  • ACCEPTED: accepted

  • REJECTED: rejected

  • UNREVIEWED: pending review

  • REVIEWED: reviewed

  • WORKFORCE_SAMPLED: sampled

  • WORKFORCE_SAMPLED_UNCHECK: sampling unchecked

  • WORKFORCE_SAMPLED_CHECKED: sampling checked

  • WORKFORCE_SAMPLED_ACCEPTED: sampling accepted

  • WORKFORCE_SAMPLED_REJECTED: sampling rejected

  • AUTO_ANNOTATION: auto-labeled, to be confirmed

samples

Array of strings

List of IDs of the samples to export.

search_conditions

Array of SearchCondition objects

Search conditions used to select the samples to export. Multiple search conditions are combined with OR.

train_sample_ratio

String

Proportion of samples assigned to the training set (the rest go to the validation set) when the specified version is released. The default value is 1.00, indicating that all samples in the released version are assigned to the training set.

Table 10 SearchCondition

Parameter

Type

Description

coefficient

String

Filter by coefficient of difficulty.

frame_in_video

Integer

A frame in the video.

hard

String

Whether a sample is a hard sample. The options are as follows:

  • 0: non-hard sample

  • 1: hard sample

import_origin

String

Filter by data source.

kvp

String

CT dosage. Samples are filtered by dosage.

label_list

SearchLabels object

Label search criteria.

labeler

String

Labeler.

metadata

SearchProp object

Search by sample attribute.

parent_sample_id

String

Parent sample ID.

sample_dir

String

Directory where data samples are stored (the directory must end with a slash (/)). Only samples in the specified directory are searched for. Recursive search of directories is not supported.

sample_name

String

Search by sample name, including the file name extension.

sample_time

String

When a sample is added to the dataset, an index is created based on the last modification time (accurate to day) of the sample on OBS. You can search for the sample based on the time. The options are as follows:

  • month: Search for samples added from 30 days ago to the current day.

  • day: Search for samples added from yesterday (one day ago) to the current day.

  • yyyyMMdd-yyyyMMdd: Search for samples added within a specified period of at most 30 days, in the format start date-end date. For example, 20190901-20190915 searches for samples added from September 1 to September 15, 2019.

score

String

Search by confidence.

slice_thickness

String

DICOM layer thickness. Samples are filtered by layer thickness.

study_date

String

DICOM scanning time.

time_in_video

String

A time point in the video.

Table 11 SearchLabels

Parameter

Type

Description

labels

Array of SearchLabel objects

List of label search criteria.

op

String

Relationship between multiple labels. op is mandatory when searching for multiple labels and can be left blank when searching for only one label. The options are as follows:

  • OR: OR operation

  • AND: AND operation

Table 12 SearchLabel

Parameter

Type

Description

name

String

Label name.

op

String

Operation type between multiple attributes. The options are as follows:

  • OR: OR operation

  • AND: AND operation

property

Map<String,Array<String>>

Label attributes, as an object that stores arbitrary key-value pairs. Each key is an attribute name, and each value is a list of attribute values. If a value is null, the attribute is not filtered by value. Otherwise, a match on any value in the list satisfies the search.

type

Integer

Label type. The options are as follows:

  • 0: image classification

  • 1: object detection

  • 100: text classification

  • 101: named entity recognition

  • 102: text triplet relationship

  • 103: text triplet entity

  • 200: speech classification

  • 201: speech content

  • 202: speech paragraph labeling

  • 600: video classification

Table 13 SearchProp

Parameter

Type

Description

op

String

Relationship between attribute values. The options are as follows:

  • AND: AND relationship

  • OR: OR relationship

props

Map<String,Array<String>>

Search criteria of an attribute. Multiple search criteria can be set.

Example Requests

  • Creating an Export Task (Exporting Data to OBS)

    {
      "path" : "/test-obs/daoChu/",
      "export_type" : 3,
      "export_params" : {
        "sample_state" : "",
        "export_dest" : "DIR"
      }
    }
    
  • Creating an Export Task (Exporting Data to a New Dataset)

    {
      "path" : "/test-obs/classify/input/",
      "export_type" : 3,
      "export_params" : {
        "sample_state" : "",
        "export_dest" : "NEW_DATASET",
        "export_new_dataset_name" : "dataset-export-test",
        "export_new_dataset_work_path" : "/test-obs/classify/output/"
      }
    }
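
The example bodies above can be sent with the Python standard library. The sketch below only builds the request so it can be inspected before sending. The endpoint, IDs, and token are hypothetical placeholders, and token authentication through the X-Auth-Token header is an assumption, because this section does not describe authentication.

```python
import json
import urllib.request


def build_export_request(endpoint: str, project_id: str, dataset_id: str,
                         token: str, body: dict) -> urllib.request.Request:
    url = f"{endpoint.rstrip('/')}/v2/{project_id}/datasets/{dataset_id}/export-tasks"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: IAM token authentication; not described in this section.
            "X-Auth-Token": token,
        },
        method="POST",
    )


# Body copied from "Creating an Export Task (Exporting Data to OBS)".
obs_export = {
    "path": "/test-obs/daoChu/",
    "export_type": 3,
    "export_params": {"sample_state": "", "export_dest": "DIR"},
}
req = build_export_request("https://modelarts.example.com", "my-project-id",
                           "my-dataset-id", "my-token", obs_export)
# To send it: urllib.request.urlopen(req, timeout=30)
```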
    

Example Responses

Status code: 200

OK

{
  "task_id" : "rF9NNoB56k5rtYKg2Y7"
}
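A successful call returns the ID of the new task, which is needed to track the export later. The Python sketch below extracts it and, as a precaution, surfaces the error_code and error_msg fields from Table 8 if they appear. The function name and the error value used in the test are hypothetical.

```python
import json


def task_id_from_response(text: str) -> str:
    data = json.loads(text)
    # error_code and error_msg are listed in Table 8; raise if the server sets them.
    if data.get("error_code"):
        raise RuntimeError(f"{data['error_code']}: {data.get('error_msg', '')}")
    task_id = data.get("task_id")
    if not task_id:
        raise ValueError("response contains no task_id")
    return task_id


print(task_id_from_response('{"task_id" : "rF9NNoB56k5rtYKg2Y7"}'))
```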

Status Codes

Status Code

Description

200

OK

401

Unauthorized

403

Forbidden

404

Not Found
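With urllib, any non-2xx status listed above is raised as urllib.error.HTTPError. The sketch below turns it into an exception carrying the documented description and the response body. The helper names are hypothetical, and the body content of error responses is not specified in this section.

```python
import urllib.error
import urllib.request

# Descriptions taken from the status code table above.
STATUS_DESCRIPTIONS = {200: "OK", 401: "Unauthorized", 403: "Forbidden", 404: "Not Found"}


def describe_status(code: int) -> str:
    return STATUS_DESCRIPTIONS.get(code, f"Unexpected HTTP status {code}")


def send(req: urllib.request.Request, timeout: float = 30.0) -> bytes:
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        # HTTPError carries the status code and can be read like a response.
        body = err.read().decode("utf-8", errors="replace")
        raise RuntimeError(f"{err.code} {describe_status(err.code)}: {body}") from err
```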

Error Codes

See Error Codes.