Creating a Dataset¶
This API is used to create a dataset.
You can debug this API through automatic authentication in API Explorer or use the SDK sample code generated by API Explorer.
POST /v2/{project_id}/datasets
Parameter | Mandatory | Type | Description |
project_id | Yes | String | Project ID. For details about how to obtain a project ID, see Obtaining a Project ID and Name. |
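As a rough illustration of how the path parameter fits into the URI, the following sketch builds the request URL. The endpoint host and the project ID shown here are hypothetical placeholders, not values from this reference.

```python
from urllib.parse import quote


def dataset_create_url(endpoint: str, project_id: str) -> str:
    """Return the URL for POST /v2/{project_id}/datasets.

    `endpoint` is a hypothetical service host; substitute the real one
    for your region.
    """
    if not project_id:
        raise ValueError("project_id is mandatory")
    # Percent-encode the ID so unexpected characters cannot alter the path.
    return f"https://{endpoint}/v2/{quote(project_id, safe='')}/datasets"


# Placeholder values for illustration only.
url = dataset_create_url("modelarts.example.com", "my-project-id")
```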
Request Parameters¶
Parameter | Mandatory | Type | Description |
data_format | No | String | Data format. Options:
data_sources | Yes | Array of DataSource objects | Input dataset path, which is used to synchronize source data (such as images, text files, and audio files) in the directory and its subdirectories to the dataset. For a table dataset, this parameter indicates the import directory. The work directory of a table dataset cannot be an OBS path in a KMS-encrypted bucket. Only one data source can be imported at a time. |
dataset_name | Yes | String | Dataset name. |
dataset_type | No | Integer | Dataset type. Options:
description | No | String | Dataset description. The value is empty by default. The description contains 0 to 256 characters and does not support the following special characters: |
import_annotations | No | Boolean | Whether to automatically import the labeling information in the input directory. Object detection, image classification, and text classification are supported. Options:
import_data | No | Boolean | Whether to import data. This parameter is used only for table datasets. Options:
label_format | No | LabelFormat object | Label format information. This parameter is used only for text datasets. |
labels | No | Array of Label objects | Dataset label list. |
managed | No | Boolean | Whether to host a dataset. Options:
schema | No | Array of Field objects | Schema list. |
work_path | Yes | String | Output dataset path, which is used to store output files such as label files.
work_path_type | Yes | Integer | Type of the dataset output path. Options:
workforce_information | No | WorkforceInformation object | Team labeling information. |
workspace_id | No | String | Workspace ID. If no workspace is created, the default value is 0. If a workspace is created and used, use the actual value. |
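The constraints in the table above can be checked on the client before a request is sent. The following is a minimal sketch, assuming the body is held as a Python dict. The function name and the wording of the messages are illustrative, and the service may enforce further rules not covered here.

```python
from typing import List

# Fields marked "Yes" in the Mandatory column of the request body table.
MANDATORY_FIELDS = ("data_sources", "dataset_name", "work_path", "work_path_type")


def validate_create_body(body: dict) -> List[str]:
    """Return a list of problems found in a dataset creation body."""
    problems = [
        f"missing mandatory field: {name}"
        for name in MANDATORY_FIELDS
        if name not in body
    ]
    # Only one data source can be imported at a time.
    if len(body.get("data_sources", [])) > 1:
        problems.append("only one data source can be imported at a time")
    # The description contains 0 to 256 characters.
    if len(body.get("description", "")) > 256:
        problems.append("description exceeds 256 characters")
    return problems
```

An empty list means that none of these particular checks failed; it does not guarantee that the service accepts the request.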
DataSource
Parameter | Mandatory | Type | Description |
data_path | No | String | Data source path. |
data_type | No | Integer | Data type. Options:
schema_maps | No | Array of SchemaMap objects | Schema mapping information corresponding to the table data. |
source_info | No | SourceInfo object | Information required for importing a table data source. |
with_column_header | No | Boolean | Whether the first row of the file contains column names. This field applies only to table datasets. Options:
SchemaMap
Parameter | Mandatory | Type | Description |
dest_name | No | String | Name of the destination column. |
src_name | No | String | Name of the source column. |
SourceInfo
Parameter | Mandatory | Type | Description |
cluster_id | No | String | ID of an MRS cluster. |
cluster_mode | No | String | Running mode of an MRS cluster. Options:
cluster_name | No | String | Name of an MRS cluster. |
database_name | No | String | Name of the database to which the table dataset is imported. |
input | No | String | HDFS path of a table dataset. |
ip | No | String | IP address of your GaussDB(DWS) cluster. |
port | No | String | Port number of your GaussDB(DWS) cluster. |
queue_name | No | String | DLI queue name of a table dataset. |
subnet_id | No | String | Subnet ID of an MRS cluster. |
table_name | No | String | Name of the table to which a table dataset is imported. |
user_name | No | String | Username, which is mandatory for GaussDB(DWS) data. |
user_password | No | String | User password, which is mandatory for GaussDB(DWS) data. |
vpc_id | No | String | ID of the VPC where an MRS cluster resides. |
LabelFormat
Parameter | Mandatory | Type | Description |
label_type | No | String | Label type of text classification. Options:
text_label_separator | No | String | Separator between labels. By default, a comma (,) is used as the separator. The separator needs to be escaped. The separator can contain only one character, such as a letter, a digit, or any of the following special characters: |
text_sample_separator | No | String | Separator between the text and label. By default, the Tab key is used as the separator. The separator needs to be escaped. The separator can contain only one character, such as a letter, a digit, or any of the following special characters: |
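To illustrate how the two separators relate, the following sketch splits one line of a labeled text file using the documented defaults (Tab between text and labels, comma between labels). The line layout (text first, then the labels) and the function name are assumptions made for illustration; this reference does not specify the file format in detail.

```python
def split_labeled_line(line, sample_sep="\t", label_sep=","):
    """Split one line into (text, labels) under the assumed layout."""
    # Everything before the first sample separator is treated as the text.
    text, _, label_part = line.rstrip("\n").partition(sample_sep)
    # Drop empty entries so a line without labels yields an empty list.
    labels = [label for label in label_part.split(label_sep) if label]
    return text, labels


text, labels = split_labeled_line("great movie\tpositive,short\n")
```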
Label
Parameter | Mandatory | Type | Description |
attributes | No | Array of LabelAttribute objects | Multi-dimensional attribute of a label. For example, if the label is music, attributes such as style and artist may be included. |
name | No | String | Label name. |
property | No | LabelProperty object | Basic attribute key-value pair of a label, such as color and shortcut keys. |
type | No | Integer | Label type. Options:
LabelAttribute
Parameter | Mandatory | Type | Description |
default_value | No | String | Default value of a label attribute. |
id | No | String | Label attribute ID. |
name | No | String | Label attribute name. |
type | No | String | Label attribute type. Options:
values | No | Array of LabelAttributeValue objects | List of label attribute values. |
LabelAttributeValue
Parameter | Mandatory | Type | Description |
id | No | String | Label attribute value ID. |
value | No | String | Label attribute value. |
LabelProperty
Parameter | Mandatory | Type | Description |
@modelarts:color | No | String | Default attribute: Label color, which is a hexadecimal code of the color. By default, this parameter is left blank. Example: #FFFFF0. |
@modelarts:default_shape | No | String | Default attribute: Default shape of an object detection label (dedicated attribute). By default, this parameter is left blank. Options:
@modelarts:from_type | No | String | Default attribute: Type of the head entity in the triplet relationship label. This attribute must be specified when a relationship label is created. This parameter is used only for the text triplet dataset. |
@modelarts:rename_to | No | String | Default attribute: The new name of the label. |
@modelarts:shortcut | No | String | Default attribute: Label shortcut key. By default, this parameter is left blank. For example: D. |
@modelarts:to_type | No | String | Default attribute: Type of the tail entity in the triplet relationship label. This attribute must be specified when a relationship label is created. This parameter is used only for the text triplet dataset. |
Field
Parameter | Mandatory | Type | Description |
description | No | String | Schema description. |
name | No | String | Schema name. |
schema_id | No | Integer | Schema ID. |
type | No | String | Schema value type. |
WorkforceInformation
Parameter | Mandatory | Type | Description |
data_sync_type | No | Integer | Synchronization type. Options:
repetition | No | Integer | Number of persons who label each sample. The minimum value is 1. |
synchronize_auto_labeling_data | No | Boolean | Whether to synchronously update auto labeling data. Options:
synchronize_data | No | Boolean | Whether to synchronize updated data, such as uploading files, synchronizing data sources, and assigning imported unlabeled files to team members. Options:
task_id | No | String | ID of a team labeling task. |
task_name | Yes | String | Name of a team labeling task. |
workforces_config | No | WorkforcesConfig object | Manpower assignment of a team labeling task. You can delegate the assignment to the administrator or assign the manpower yourself. |
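The two constraints stated in this table (task_name is mandatory and repetition is at least 1) can be captured in a small helper. This is a sketch only; the function name is hypothetical and the sample task name is made up.

```python
def workforce_information(task_name, repetition=1, **optional):
    """Build a workforce_information object for a team labeling task."""
    if not task_name:
        raise ValueError("task_name is mandatory")
    if repetition < 1:
        raise ValueError("repetition must be at least 1")
    # Optional fields (for example synchronize_data) are passed through as-is.
    return {"task_name": task_name, "repetition": repetition, **optional}


# "task-demo" is an illustrative name, not a value from this reference.
info = workforce_information("task-demo", repetition=2, synchronize_data=True)
```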
WorkforcesConfig
Parameter | Mandatory | Type | Description |
agency | No | String | Administrator |
workforces | No | Array of WorkforceConfig objects | List of teams that execute labeling tasks. |
WorkforceConfig
Parameter | Mandatory | Type | Description |
workers | No | Array of Worker objects | List of labeling team members. |
workforce_id | No | String | ID of a labeling team. |
workforce_name | No | String | Name of a labeling team. The value contains 0 to 1024 characters and does not support the following special characters: !<>=&"' |
Worker
Parameter | Mandatory | Type | Description |
create_time | No | Long | Creation time. |
description | No | String | Labeling team member description. The value contains 0 to 256 characters and does not support the following special characters: |
No | String | Email address of a labeling team member. | |
role | No | Integer | Role. Options:
status | No | Integer | Current login status of a labeling team member. Options:
update_time | No | Long | Update time. |
worker_id | No | String | ID of a labeling team member. |
workforce_id | No | String | ID of a labeling team. |
Response Parameters¶
Status code: 201
Parameter | Type | Description |
dataset_id | String | Dataset ID. |
error_code | String | Error code. |
error_msg | String | Error message. |
import_task_id | String | ID of an import task. |
Example Requests¶
Creating an Image Classification Dataset
{ "workspace_id" : "0", "dataset_name" : "dataset-457f", "dataset_type" : 0, "data_sources" : [ { "data_type" : 0, "data_path" : "/test-obs/classify/input/animals/" } ], "description" : "", "work_path" : "/test-obs/classify/output/", "work_path_type" : 0, "labels" : [ { "name" : "Rabbits", "type" : 0, "property" : { "@modelarts:color" : "#3399ff" } }, { "name" : "Bees", "type" : 0, "property" : { "@modelarts:color" : "#3399ff" } } ] }
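The same body can be assembled in Python, which avoids repeating the label structure by hand. This is a minimal sketch that mirrors the request above; the helper name classification_label is hypothetical.

```python
import json


def classification_label(name, color="#3399ff"):
    """Return a Label object with type 0, as used in the request above."""
    return {"name": name, "type": 0, "property": {"@modelarts:color": color}}


body = {
    "workspace_id": "0",
    "dataset_name": "dataset-457f",
    "dataset_type": 0,
    "data_sources": [
        {"data_type": 0, "data_path": "/test-obs/classify/input/animals/"}
    ],
    "description": "",
    "work_path": "/test-obs/classify/output/",
    "work_path_type": 0,
    "labels": [classification_label(name) for name in ("Rabbits", "Bees")],
}

# Serialize to bytes, ready to be used as an HTTP request body.
payload = json.dumps(body).encode("utf-8")
```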
Creating an Object Detection Dataset
{ "workspace_id" : "0", "dataset_name" : "dataset-95a6", "dataset_type" : 1, "data_sources" : [ { "data_type" : 0, "data_path" : "/test-obs/detect/input/animals/" } ], "description" : "", "work_path" : "/test-obs/detect/output/", "work_path_type" : 0, "labels" : [ { "name" : "Rabbits", "type" : 1, "property" : { "@modelarts:color" : "#3399ff" } }, { "name" : "Bees", "type" : 1, "property" : { "@modelarts:color" : "#3399ff" } } ] }
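For readers who call the API without an SDK, the following sketch wraps the object detection body in an HTTP request using only the Python standard library. The endpoint host and project ID are hypothetical placeholders, and the Content-Type header is an assumption based on the JSON bodies shown here. Authentication headers are omitted because this section does not describe them, so the request is built but not sent.

```python
import json
from urllib.request import Request

# Hypothetical placeholders; neither value comes from this reference.
ENDPOINT = "modelarts.example.com"
PROJECT_ID = "my-project-id"

body = {
    "workspace_id": "0",
    "dataset_name": "dataset-95a6",
    "dataset_type": 1,
    "data_sources": [
        {"data_type": 0, "data_path": "/test-obs/detect/input/animals/"}
    ],
    "description": "",
    "work_path": "/test-obs/detect/output/",
    "work_path_type": 0,
    "labels": [
        {"name": name, "type": 1, "property": {"@modelarts:color": "#3399ff"}}
        for name in ("Rabbits", "Bees")
    ],
}

request = Request(
    url=f"https://{ENDPOINT}/v2/{PROJECT_ID}/datasets",
    data=json.dumps(body).encode("utf-8"),
    headers={"Content-Type": "application/json"},  # assumed, not documented here
    method="POST",
)
# After adding the required authentication headers, the request could be
# sent with urllib.request.urlopen(request).
```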
Creating a Table Dataset
{ "workspace_id" : "0", "dataset_name" : "dataset-de83", "dataset_type" : 400, "data_sources" : [ { "data_type" : 0, "data_path" : "/test-obs/table/input/", "with_column_header" : true } ], "description" : "", "work_path" : "/test-obs/table/output/", "work_path_type" : 0, "schema" : [ { "schema_id" : 1, "name" : "150", "type" : "STRING" }, { "schema_id" : 2, "name" : "4", "type" : "STRING" }, { "schema_id" : 3, "name" : "setosa", "type" : "STRING" }, { "schema_id" : 4, "name" : "versicolor", "type" : "STRING" }, { "schema_id" : 5, "name" : "virginica", "type" : "STRING" } ], "import_data" : true }
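The schema list in the table example follows a regular pattern: consecutive schema_id values starting at 1, one entry per column. A sketch of a helper that generates it from column names is shown below; the function name build_schema is hypothetical.

```python
def build_schema(column_names, value_type="STRING"):
    """Return a list of Field objects numbered from 1, one per column."""
    return [
        {"schema_id": index, "name": name, "type": value_type}
        for index, name in enumerate(column_names, start=1)
    ]


# Column names taken from the table dataset example above.
schema = build_schema(["150", "4", "setosa", "versicolor", "virginica"])
```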
Example Responses¶
Status code: 201
{ "dataset_id" : "WxCREuCkBSAlQr9xrde" }
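A client typically needs the dataset ID from a successful response and the error fields otherwise. The following sketch handles both cases using the response parameters listed earlier. The function name is hypothetical, and raising RuntimeError for every status other than 201 is a design choice made for illustration.

```python
import json


def parse_create_response(status, raw):
    """Return (dataset_id, import_task_id) for a 201 response, else raise."""
    data = json.loads(raw) if raw else {}
    if status == 201:
        # import_task_id may be absent, as in the example response.
        return data["dataset_id"], data.get("import_task_id")
    raise RuntimeError(
        f"HTTP {status}: {data.get('error_code')} {data.get('error_msg')}"
    )


dataset_id, import_task_id = parse_create_response(
    201, '{ "dataset_id" : "WxCREuCkBSAlQr9xrde" }'
)
```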
Status Codes¶
Status Code | Description |
201 | Created |
401 | Unauthorized |
403 | Forbidden |
404 | Not Found |
Error Codes¶
See Error Codes.