Creating a Dataset¶
Function¶
This API is used to create a dataset.
Debugging¶
You can debug this API through automatic authentication in API Explorer or use the SDK sample code generated by API Explorer.
URI¶
POST /v2/{project_id}/datasets
Parameter | Mandatory | Type | Description |
---|---|---|---|
project_id | Yes | String | Project ID. For details about how to obtain a project ID, see Obtaining a Project ID and Name. |
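As a minimal sketch of how the URI template is filled in, the following Python helper substitutes a project ID into the path. The endpoint host and the project ID are hypothetical placeholders, and the function name is introduced here only for illustration.

```python
from urllib.parse import quote


def build_create_dataset_url(endpoint: str, project_id: str) -> str:
    """Return the full URL for POST /v2/{project_id}/datasets."""
    if not project_id:
        raise ValueError("project_id is mandatory")
    # Percent-encode the path segment so unexpected characters cannot alter the path.
    return f"{endpoint.rstrip('/')}/v2/{quote(project_id, safe='')}/datasets"


# Hypothetical endpoint and project ID, for illustration only.
url = build_create_dataset_url("https://modelarts.example.com", "0123456789abcdef")
print(url)
```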
Request Parameters¶
Parameter | Mandatory | Type | Description |
---|---|---|---|
data_format | No | String | Data format. Options:
|
data_sources | Yes | Array of DataSource objects | Input dataset path, which is used to synchronize source data (such as images, text files, and audio files) in the directory and its subdirectories to the dataset. For a table dataset, this parameter indicates the import directory. The work directory of a table dataset cannot be an OBS path in a KMS-encrypted bucket. Only one data source can be imported at a time. |
dataset_name | Yes | String | Dataset name. The value contains 1 to 100 characters. Only letters, digits, underscores (_), and hyphens (-) are allowed, for example, dataset-9f3b. |
dataset_type | No | Integer | Dataset type. Options:
|
description | No | String | Dataset description. The value is empty by default. The description contains 0 to 256 characters and does not support the following special characters: |
import_annotations | No | Boolean | Indicates whether to automatically import the labeling information in the input directory. Object detection, image classification, and text classification are supported. The options are as follows:
|
import_data | No | Boolean | Whether to import data. This parameter is used only for table datasets. Options:
|
label_format | No | LabelFormat object | Label format information. This parameter is used only for text datasets. |
labels | No | Array of Label objects | Dataset label list. |
managed | No | Boolean | Whether to host a dataset. Options:
|
schema | No | Array of Field objects | Schema list. |
work_path | Yes | String | Output dataset path, which is used to store output files such as label files.
|
work_path_type | Yes | Integer | Type of the dataset output path. The default value is 0, indicating an OBS bucket. |
workforce_information | No | WorkforceInformation object | Team labeling information. |
workspace_id | No | String | Workspace ID. If no workspace is created, the default value is 0. If a workspace is created and used, use the actual value. |
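The mandatory parameters and the documented length limits can be checked on the client before a request is sent. The sketch below is one possible way to do that, not an official SDK call: the helper name is hypothetical, and it assumes that "letters" in the dataset_name rule means ASCII letters.

```python
import re

# Assumption: "letters" means ASCII letters; the table does not say whether
# other alphabets are accepted.
DATASET_NAME_RE = re.compile(r"[A-Za-z0-9_-]{1,100}")


def build_dataset_body(dataset_name, data_path, work_path, work_path_type=0, **optional):
    """Assemble a request body containing the four mandatory parameters."""
    if not DATASET_NAME_RE.fullmatch(dataset_name):
        raise ValueError("dataset_name must be 1 to 100 letters, digits, '_' or '-'")
    if len(optional.get("description", "")) > 256:
        raise ValueError("description must not exceed 256 characters")
    body = {
        "dataset_name": dataset_name,
        # Only one data source can be imported at a time.
        "data_sources": [{"data_path": data_path}],
        "work_path": work_path,
        "work_path_type": work_path_type,
    }
    body.update(optional)  # for example dataset_type, description, labels
    return body


body = build_dataset_body(
    "dataset-9f3b",
    "/test-obs/classify/input/animals/",
    "/test-obs/classify/output/",
    dataset_type=0,
)
```

Optional parameters are passed through unchanged, so the caller remains responsible for their values.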
DataSource parameters
Parameter | Mandatory | Type | Description |
---|---|---|---|
data_path | No | String | Data source path. |
data_type | No | Integer | Data type. Options:
|
schema_maps | No | Array of SchemaMap objects | Schema mapping information corresponding to the table data. |
source_info | No | SourceInfo object | Information required for importing a table data source. |
with_column_header | No | Boolean | Whether the first row in the file contains column names. This field is valid only for table datasets. Options:
|
SchemaMap parameters
Parameter | Mandatory | Type | Description |
---|---|---|---|
dest_name | No | String | Name of the destination column. |
src_name | No | String | Name of the source column. |
SourceInfo parameters
Parameter | Mandatory | Type | Description |
---|---|---|---|
cluster_id | No | String | MRS cluster ID. You can log in to the MRS console to view the information. |
cluster_mode | No | String | Running mode of an MRS cluster. Options:
|
cluster_name | No | String | MRS cluster name. You can log in to the MRS console to view the information. |
database_name | No | String | Name of the database to which the table dataset is imported. |
input | No | String | HDFS path of the table dataset, for example, /datasets/demo. |
ip | No | String | IP address of your GaussDB(DWS) cluster. |
port | No | String | Port number of your GaussDB(DWS) cluster. |
queue_name | No | String | DLI queue name of a table dataset. |
subnet_id | No | String | Subnet ID of an MRS cluster. |
table_name | No | String | Name of the table to which a table dataset is imported. |
user_name | No | String | Username, which is mandatory for GaussDB(DWS) data. |
user_password | No | String | User password, which is mandatory for GaussDB(DWS) data. |
vpc_id | No | String | ID of the VPC where an MRS cluster resides. |
LabelFormat parameters
Parameter | Mandatory | Type | Description |
---|---|---|---|
label_type | No | String | Label type of text classification. Options:
|
text_label_separator | No | String | Separator between labels. By default, a comma (,) is used as the separator. The separator needs to be escaped. The separator can contain only one character, such as a letter, a digit, or any of the following special characters: |
text_sample_separator | No | String | Separator between the text and label. By default, the Tab key is used as the separator. The separator needs to be escaped. The separator can contain only one character, such as a letter, a digit, or any of the following special characters: |
Label parameters
Parameter | Mandatory | Type | Description |
---|---|---|---|
attributes | No | Array of LabelAttribute objects | Multi-dimensional attribute of a label. For example, if the label is music, attributes such as style and artist may be included. |
name | No | String | Label name. |
property | No | LabelProperty object | Basic attribute key-value pair of a label, such as color and shortcut keys. |
type | No | Integer | Label type. Options:
|
LabelAttribute parameters
Parameter | Mandatory | Type | Description |
---|---|---|---|
default_value | No | String | Default value of a label attribute. |
id | No | String | Label attribute ID. You can obtain the ID by querying the label list. |
name | No | String | Label attribute name. The value contains a maximum of 64 characters and cannot contain the following characters: <>=&"' |
type | No | String | Label attribute type. Options:
|
values | No | Array of LabelAttributeValue objects | List of label attribute values. |
LabelAttributeValue parameters
Parameter | Mandatory | Type | Description |
---|---|---|---|
id | No | String | Label attribute value ID. |
value | No | String | Label attribute value. |
LabelProperty parameters
Parameter | Mandatory | Type | Description |
---|---|---|---|
@modelarts:color | No | String | Default attribute: Label color, which is a hexadecimal code of the color. By default, this parameter is left blank. Example: #FFFFF0. |
@modelarts:default_shape | No | String | Default attribute: Default shape of an object detection label (dedicated attribute). By default, this parameter is left blank. Options:
|
@modelarts:from_type | No | String | Default attribute: Type of the head entity in the triplet relationship label. This attribute must be specified when a relationship label is created. This parameter is used only for the text triplet dataset. |
@modelarts:rename_to | No | String | Default attribute: The new name of the label. |
@modelarts:shortcut | No | String | Default attribute: Label shortcut key. By default, this parameter is left blank. For example: D. |
@modelarts:to_type | No | String | Default attribute: Type of the tail entity in the triplet relationship label. This attribute must be specified when a relationship label is created. This parameter is used only for the text triplet dataset. |
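A label entry combines a name, a type, and an optional property object keyed by the @modelarts attributes listed above. The following sketch shows one way to assemble such an entry. The helper name is hypothetical, and the six-digit hexadecimal check is an assumption inferred from the color examples in this document (#FFFFF0, #3399ff).

```python
import re

# Assumption: colors are written as "#" followed by six hexadecimal digits.
HEX_COLOR_RE = re.compile(r"#[0-9A-Fa-f]{6}")


def make_label(name, label_type, color=None, shortcut=None):
    """Build one entry for the labels array of the request body."""
    prop = {}
    if color is not None:
        if not HEX_COLOR_RE.fullmatch(color):
            raise ValueError("color must look like #3399ff")
        prop["@modelarts:color"] = color
    if shortcut is not None:
        prop["@modelarts:shortcut"] = shortcut
    label = {"name": name, "type": label_type}
    if prop:
        # Omit the property object entirely when no attribute is set.
        label["property"] = prop
    return label


# Type 0 matches the image classification request example in this document.
rabbits = make_label("Rabbits", 0, color="#3399ff")
```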
Field parameters
Parameter | Mandatory | Type | Description |
---|---|---|---|
description | No | String | Schema description. |
name | No | String | Schema name. |
schema_id | No | Integer | Schema ID. |
type | No | String | Schema value type. |
WorkforceInformation parameters
Parameter | Mandatory | Type | Description |
---|---|---|---|
data_sync_type | No | Integer | Synchronization type. Options:
|
repetition | No | Integer | Number of persons who label each sample. The minimum value is 1. |
synchronize_auto_labeling_data | No | Boolean | Whether to synchronously update auto labeling data. Options:
|
synchronize_data | No | Boolean | Whether to synchronize updated data, such as uploading files, synchronizing data sources, and assigning imported unlabeled files to team members. Options:
|
task_id | No | String | ID of a team labeling task. |
task_name | Yes | String | Name of a team labeling task. The name contains 1 to 64 characters, including only letters, digits, underscores (_), and hyphens (-). |
workforces_config | No | WorkforcesConfig object | Manpower assignment of a team labeling task. You can delegate the assignment to the administrator or assign the manpower yourself. |
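The constraints on task_name and repetition in the table above can be enforced before the object is attached to a request. This is a minimal sketch with a hypothetical helper name, again assuming that "letters" means ASCII letters.

```python
import re

# Assumption: "letters" means ASCII letters.
TASK_NAME_RE = re.compile(r"[A-Za-z0-9_-]{1,64}")


def make_workforce_information(task_name, repetition=None, **optional):
    """Build the workforce_information object for a team labeling task."""
    if not TASK_NAME_RE.fullmatch(task_name):
        raise ValueError("task_name must be 1 to 64 letters, digits, '_' or '-'")
    info = {"task_name": task_name}
    if repetition is not None:
        if repetition < 1:
            raise ValueError("repetition must be at least 1")
        info["repetition"] = repetition
    info.update(optional)  # for example synchronize_data, workforces_config
    return info


info = make_workforce_information("task-01", repetition=2)
```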
WorkforcesConfig parameters
Parameter | Mandatory | Type | Description |
---|---|---|---|
agency | No | String | Administrator. |
workforces | No | Array of WorkforceConfig objects | List of teams that execute labeling tasks. |
WorkforceConfig parameters
Parameter | Mandatory | Type | Description |
---|---|---|---|
workers | No | Array of Worker objects | List of labeling team members. |
workforce_id | No | String | ID of a labeling team. |
workforce_name | No | String | Name of a labeling team. The value contains 0 to 1024 characters and does not support the following special characters: !<>=&"' |
Worker parameters
Parameter | Mandatory | Type | Description |
---|---|---|---|
create_time | No | Long | Creation time. |
description | No | String | Labeling team member description. The value contains 0 to 256 characters and does not support the following special characters: |
email | No | String | Email address of a labeling team member. |
role | No | Integer | Role. Options:
|
status | No | Integer | Current login status of a labeling team member. Options:
|
update_time | No | Long | Update time. |
worker_id | No | String | ID of a labeling team member. |
workforce_id | No | String | ID of a labeling team. |
Response Parameters¶
Status code: 201
Parameter | Type | Description |
---|---|---|
dataset_id | String | Dataset ID. |
error_code | String | Error code. |
error_msg | String | Error message. |
import_task_id | String | ID of an import task. |
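A caller typically reads dataset_id from the response body and treats a populated error_code as a failure. The sketch below illustrates that reading of the table above. It assumes that error_code and error_msg appear only when the request fails, which this section does not state explicitly, and the function name is hypothetical.

```python
import json


def parse_create_dataset_response(body_text):
    """Return (dataset_id, import_task_id) or raise if the body reports an error."""
    data = json.loads(body_text)
    # Assumption: error fields are present only on failure.
    if data.get("error_code"):
        raise RuntimeError(f"{data['error_code']}: {data.get('error_msg', '')}")
    # import_task_id may be absent, for example when no import is started.
    return data["dataset_id"], data.get("import_task_id")


dataset_id, import_task_id = parse_create_dataset_response(
    '{ "dataset_id" : "WxCREuCkBSAlQr9xrde" }'
)
```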
Example Requests¶
Creating an Image Classification Dataset
{
  "workspace_id" : "0",
  "dataset_name" : "dataset-457f",
  "dataset_type" : 0,
  "data_sources" : [ {
    "data_type" : 0,
    "data_path" : "/test-obs/classify/input/animals/"
  } ],
  "description" : "",
  "work_path" : "/test-obs/classify/output/",
  "work_path_type" : 0,
  "labels" : [ {
    "name" : "Rabbits",
    "type" : 0,
    "property" : {
      "@modelarts:color" : "#3399ff"
    }
  }, {
    "name" : "Bees",
    "type" : 0,
    "property" : {
      "@modelarts:color" : "#3399ff"
    }
  } ]
}
Creating an Object Detection Dataset
{
  "workspace_id" : "0",
  "dataset_name" : "dataset-95a6",
  "dataset_type" : 1,
  "data_sources" : [ {
    "data_type" : 0,
    "data_path" : "/test-obs/detect/input/animals/"
  } ],
  "description" : "",
  "work_path" : "/test-obs/detect/output/",
  "work_path_type" : 0,
  "labels" : [ {
    "name" : "Rabbits",
    "type" : 1,
    "property" : {
      "@modelarts:color" : "#3399ff"
    }
  }, {
    "name" : "Bees",
    "type" : 1,
    "property" : {
      "@modelarts:color" : "#3399ff"
    }
  } ]
}
Creating a Table Dataset
{
  "workspace_id" : "0",
  "dataset_name" : "dataset-de83",
  "dataset_type" : 400,
  "data_sources" : [ {
    "data_type" : 0,
    "data_path" : "/test-obs/table/input/",
    "with_column_header" : true
  } ],
  "description" : "",
  "work_path" : "/test-obs/table/output/",
  "work_path_type" : 0,
  "schema" : [
    { "schema_id" : 1, "name" : "150", "type" : "STRING" },
    { "schema_id" : 2, "name" : "4", "type" : "STRING" },
    { "schema_id" : 3, "name" : "setosa", "type" : "STRING" },
    { "schema_id" : 4, "name" : "versicolor", "type" : "STRING" },
    { "schema_id" : 5, "name" : "virginica", "type" : "STRING" }
  ],
  "import_data" : true
}
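To show how one of these request bodies might be wrapped in an HTTP call, the sketch below prepares a POST request with the Python standard library without sending it. The endpoint, project ID, and token value are hypothetical, and the X-Auth-Token header name is an assumption because this section does not describe authentication headers.

```python
import json
import urllib.request


def make_create_dataset_request(url, token, body):
    """Prepare (but do not send) the POST request that creates a dataset."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: token-based authentication via this header.
            "X-Auth-Token": token,
        },
    )


# Reduced version of the image classification example body.
example_body = {
    "workspace_id": "0",
    "dataset_name": "dataset-457f",
    "dataset_type": 0,
    "data_sources": [{"data_type": 0, "data_path": "/test-obs/classify/input/animals/"}],
    "work_path": "/test-obs/classify/output/",
    "work_path_type": 0,
}

# Hypothetical endpoint, project ID, and token.
request = make_create_dataset_request(
    "https://modelarts.example.com/v2/0123456789abcdef/datasets",
    "example-token",
    example_body,
)
```

Passing the prepared object to urllib.request.urlopen would send it. That call is left out here so the sketch runs without network access.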
Example Responses¶
Status code: 201
Created
{
"dataset_id" : "WxCREuCkBSAlQr9xrde"
}
Status Codes¶
Status Code | Description |
---|---|
201 | Created |
401 | Unauthorized |
403 | Forbidden |
404 | Not Found |
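Client code can map the status codes in this table to their descriptions when logging or reporting a result. The helper below is a small illustrative sketch. Its names are hypothetical, and the fallback text for codes outside the table is an assumption.

```python
# Status codes and descriptions copied from the table above.
STATUS_DESCRIPTIONS = {
    201: "Created",
    401: "Unauthorized",
    403: "Forbidden",
    404: "Not Found",
}


def describe_status(status_code):
    """Return (succeeded, description) for an HTTP status code."""
    description = STATUS_DESCRIPTIONS.get(status_code, "Undocumented status code")
    # Only 201 is listed as a successful outcome for this API.
    return status_code == 201, description


succeeded, description = describe_status(201)
```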
Error Codes¶
See Error Codes.