Querying Details About a Dataset
Function
This API is used to query details about a dataset.
Debugging
You can debug this API through automatic authentication in API Explorer or use the SDK sample code generated by API Explorer.
URI
GET /v2/{project_id}/datasets/{dataset_id}
Parameter | Mandatory | Type | Description |
---|---|---|---|
dataset_id | Yes | String | Dataset ID. |
project_id | Yes | String | Project ID. For details about how to obtain a project ID, see Obtaining a Project ID and Name. |
Parameter | Mandatory | Type | Description |
---|---|---|---|
check_running_task | No | Boolean | Whether to detect tasks (including initialization tasks) that are running in a dataset. |
running_task_type | No | Integer | Type of the running tasks (including initialization tasks) to be detected. |
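The URI and the optional query parameters above can be combined as in the following minimal sketch. The function name `build_dataset_url`, the host value passed as `endpoint`, and the lowercase serialization of the Boolean value are assumptions made for illustration; they are not defined by this API reference.

```python
from urllib.parse import quote, urlencode


def build_dataset_url(endpoint, project_id, dataset_id,
                      check_running_task=None, running_task_type=None):
    """Build the GET URL for querying dataset details.

    `endpoint` is a placeholder host name. Optional query parameters are
    appended only when a value is supplied.
    """
    path = "/v2/{}/datasets/{}".format(quote(project_id, safe=""),
                                       quote(dataset_id, safe=""))
    query = {}
    if check_running_task is not None:
        # Assumption: Booleans are sent as lowercase "true"/"false".
        query["check_running_task"] = "true" if check_running_task else "false"
    if running_task_type is not None:
        query["running_task_type"] = str(running_task_type)

    url = "https://{}{}".format(endpoint, path)
    if query:
        url += "?" + urlencode(query)
    return url
```

Calling `build_dataset_url("example.com", "p1", "d1")` returns the bare path, and passing `check_running_task=True` appends the corresponding query string.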
Request Parameters
None
Response Parameters
Status code: 200
Parameter | Type | Description |
---|---|---|
annotated_sample_count | Integer | Number of labeled samples in a dataset. |
annotated_sub_sample_count | Integer | Number of labeled subsamples. |
content_labeling | Boolean | Whether to enable content labeling for the speech paragraph labeling dataset. This function is enabled by default. |
create_time | Long | Time when a dataset is created. |
current_version_id | String | Current version ID of a dataset. |
current_version_name | String | Current version name of a dataset. The value is a string of 1 to 32 characters consisting of letters, digits, underscores (_), and hyphens (-). |
data_format | String | Data format. |
data_sources | Array of DataSource objects | Data source list. |
data_statistics | Map<String,Object> | Sample statistics on a dataset, including the statistics on sample metadata. |
data_update_time | Long | Time when a sample and a label are updated. |
dataset_format | Integer | Dataset format. |
dataset_id | String | Dataset ID. |
dataset_name | String | Dataset name. |
dataset_tags | Array of strings | Key identifier list of a dataset, for example, ["Image","Object detection"]. |
dataset_type | Integer | Dataset type. |
dataset_version_count | Integer | Number of dataset versions. |
deleted_sample_count | Integer | Number of deleted samples. |
deletion_stats | Map<String,Integer> | Deletion reason statistics. |
description | String | Dataset description. |
enterprise_project_id | String | Enterprise project ID. |
exist_running_task | Boolean | Whether the dataset contains running (including initialization) tasks. |
exist_workforce_task | Boolean | Whether the dataset contains team labeling tasks. |
feature_supports | Array of strings | List of features supported by the dataset. Currently, only the value 0 is supported, indicating that the OBS file size is limited. |
import_data | Boolean | Whether to import data. |
import_task_id | String | ID of an import task. |
inner_annotation_path | String | Path for storing the labeling result of a dataset. |
inner_data_path | String | Path for storing the internal data of a dataset. |
inner_log_path | String | Path for storing internal logs of a dataset. |
inner_task_path | String | Path for storing internal tasks of a dataset. |
inner_temp_path | String | Path for storing internal temporary files of a dataset. |
inner_work_path | String | Output directory of a dataset. |
label_task_count | Integer | Number of labeling tasks. |
labels | Array of Label objects | Dataset label list. |
loading_sample_count | Integer | Number of loading samples. |
managed | Boolean | Whether a dataset is hosted. |
next_version_num | Integer | Sequence number of the next version of a dataset. |
running_tasks_id | Array of strings | ID list of running (including initialization) tasks. |
schema | Array of Field objects | Schema list. |
status | Integer | Dataset status. |
third_path | String | Third-party path. |
total_sample_count | Integer | Total number of dataset samples. |
total_sub_sample_count | Integer | Total number of subsamples generated from the parent samples. For example, the key frame images extracted from a video labeling dataset are counted as subsamples. |
unconfirmed_sample_count | Integer | Number of auto labeling samples to be confirmed. |
update_time | Long | Time when a dataset is updated. |
versions | Array of DatasetVersion objects | Dataset version information. Currently, only the current version information of a dataset is recorded. |
work_path | String | Output dataset path, which is used to store output files such as label files. The path is an OBS path in the format of /Bucket name/File path. For example: /obs-bucket. |
work_path_type | Integer | Type of the dataset output path. The default value is 0, indicating an OBS bucket. |
workforce_descriptor | WorkforceDescriptor object | Team labeling information. |
workforce_task_count | Integer | Number of team labeling tasks of a dataset. |
workspace_id | String | Workspace ID. If no workspace is created, the default value is 0. If a workspace is created and used, use the actual value. |
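As an illustration of how the response fields above might be consumed, the following sketch derives a labeling ratio and a readable creation time from a parsed response body. The helper name `summarize_dataset` is hypothetical, and treating `create_time` as milliseconds since the Unix epoch is an inference from the magnitude of the value in the example response, not something this reference states.

```python
from datetime import datetime, timezone


def summarize_dataset(details):
    """Summarize selected fields of a parsed dataset-details response."""
    total = details.get("total_sample_count", 0)
    annotated = details.get("annotated_sample_count", 0)

    created_ms = details.get("create_time")
    # Assumption: create_time is expressed in milliseconds since the epoch.
    created = (datetime.fromtimestamp(created_ms / 1000, tz=timezone.utc)
               if created_ms is not None else None)

    return {
        "name": details.get("dataset_name"),
        # Guard against division by zero for an empty dataset.
        "labeled_ratio": annotated / total if total else 0.0,
        "to_confirm": details.get("unconfirmed_sample_count", 0),
        "created_utc": created,
    }
```

Fields that are absent from the response fall back to neutral defaults, so the helper can also be applied to partial responses.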
Parameter | Type | Description |
---|---|---|
data_path | String | Data source path. |
data_type | Integer | Data type. |
schema_maps | Array of SchemaMap objects | Schema mapping information corresponding to the table data. |
source_info | SourceInfo object | Information required for importing a table data source. |
with_column_header | Boolean | Whether the first row in the file is a column name. This field is valid for the table dataset. |
Parameter | Type | Description |
---|---|---|
dest_name | String | Name of the destination column. |
src_name | String | Name of the source column. |
Parameter | Type | Description |
---|---|---|
cluster_id | String | MRS cluster ID. You can log in to the MRS console to view the information. |
cluster_mode | String | Running mode of an MRS cluster. |
cluster_name | String | MRS cluster name. You can log in to the MRS console to view the information. |
database_name | String | Name of the database to which the table dataset is imported. |
input | String | HDFS path of the table dataset. For example, /datasets/demo. |
ip | String | IP address of your GaussDB(DWS) cluster. |
port | String | Port number of your GaussDB(DWS) cluster. |
queue_name | String | DLI queue name of a table dataset. |
subnet_id | String | Subnet ID of an MRS cluster. |
table_name | String | Name of the table to which a table dataset is imported. |
user_name | String | Username, which is mandatory for GaussDB(DWS) data. |
user_password | String | User password, which is mandatory for GaussDB(DWS) data. |
vpc_id | String | ID of the VPC where an MRS cluster resides. |
Parameter | Type | Description |
---|---|---|
attributes | Array of LabelAttribute objects | Multi-dimensional attribute of a label. For example, if the label is music, attributes such as style and artist may be included. |
name | String | Label name. |
property | LabelProperty object | Basic attribute key-value pair of a label, such as color and shortcut keys. |
type | Integer | Label type. |
Parameter | Type | Description |
---|---|---|
description | String | Schema description. |
name | String | Schema name. |
schema_id | Integer | Schema ID. |
type | String | Schema value type. |
Parameter | Type | Description |
---|---|---|
add_sample_count | Integer | Number of added samples. |
analysis_cache_path | String | Cache path for feature analysis. |
analysis_status | Integer | Status of a feature analysis task. |
analysis_task_id | String | ID of a feature analysis task. |
annotated_sample_count | Integer | Number of samples with labeled versions. |
annotated_sub_sample_count | Integer | Number of labeled subsamples. |
clear_hard_property | Boolean | Whether to clear hard example properties during release. |
code | String | Status code of a preprocessing task such as rotation and cropping. |
create_time | Long | Time when a version is created. |
crop | Boolean | Whether to crop the image. This field is valid only for the object detection dataset whose labeling box is in the rectangle shape. |
crop_path | String | Path for storing cropped files. |
crop_rotate_cache_path | String | Temporary directory for executing the rotation and cropping task. |
data_analysis | Map<String,Object> | Feature analysis result in JSON format. |
data_path | String | Path for storing data. |
data_statistics | Map<String,Object> | Sample statistics on a dataset, including the statistics on sample metadata in JSON format. |
data_validate | Boolean | Whether data is validated by the validation algorithm before release. |
deleted_sample_count | Integer | Number of deleted samples. |
deletion_stats | Map<String,Integer> | Deletion reason statistics. |
description | String | Description of a version. |
export_images | Boolean | Whether to export images to the version output directory during release. |
extract_serial_number | Boolean | Whether to parse the subsample number during release. The field is valid for the healthcare dataset. |
include_dataset_data | Boolean | Whether to include the source data of a dataset during release. |
is_current | Boolean | Whether the current dataset version is used. |
label_stats | Array of LabelStats objects | Label statistics list of a released version. |
label_type | String | Label type of a released version. |
manifest_cache_input_path | String | Input path for the manifest file cache during version release. |
manifest_path | String | Path for storing the manifest file with the released version. |
message | String | Task information recorded during release (for example, error information). |
modified_sample_count | Integer | Number of modified samples. |
previous_annotated_sample_count | Integer | Number of labeled samples of parent versions. |
previous_total_sample_count | Integer | Total samples of parent versions. |
previous_version_id | String | Parent version ID. |
processor_task_id | String | ID of a preprocessing task such as rotation and cropping. |
processor_task_status | Integer | Status of a preprocessing task such as rotation and cropping. |
remove_sample_usage | Boolean | Whether to clear the existing usage information of a dataset during release. |
rotate | Boolean | Whether to rotate the image. |
rotate_path | String | Path for storing the rotated file. |
sample_state | String | Sample status. |
start_processor_task | Boolean | Whether to start a data analysis task during release. |
status | Integer | Status of a dataset version. |
tags | Array of strings | Key identifier list of the dataset. The labeling type is used as the default label when the labeling task releases a version. For example, ["Image","Object detection"]. |
task_type | Integer | Labeling task type of the released version, which is the same as the dataset type. |
total_sample_count | Integer | Total number of version samples. |
total_sub_sample_count | Integer | Total number of subsamples generated from the parent samples. |
train_evaluate_sample_ratio | String | Ratio for splitting training and validation data during version release. The default value is 1.00, indicating that all data in the released version is used as the training set. |
update_time | Long | Time when a version is updated. |
version_format | String | Format of a dataset version. |
version_id | String | Dataset version ID. |
version_name | String | Dataset version name. |
with_column_header | Boolean | Whether the first row in the released CSV file is a column name. This field is valid for the table dataset. |
Parameter | Type | Description |
---|---|---|
attributes | Array of LabelAttribute objects | Multi-dimensional attribute of a label. For example, if the label is music, attributes such as style and artist may be included. |
count | Integer | Number of labels. |
name | String | Label name. |
property | LabelProperty object | Basic attribute key-value pair of a label, such as color and shortcut keys. |
sample_count | Integer | Number of samples containing the label. |
type | Integer | Label type. |
Parameter | Type | Description |
---|---|---|
default_value | String | Default value of a label attribute. |
id | String | Label attribute ID. You can obtain the ID by querying the label list. |
name | String | Label attribute name. The value contains a maximum of 64 characters and cannot contain the following characters: <>=&"' |
type | String | Label attribute type. |
values | Array of LabelAttributeValue objects | List of label attribute values. |
Parameter | Type | Description |
---|---|---|
id | String | Label attribute value ID. |
value | String | Label attribute value. |
Parameter | Type | Description |
---|---|---|
@modelarts:color | String | Default attribute: Label color, which is a hexadecimal code of the color. By default, this parameter is left blank. Example: #FFFFF0. |
@modelarts:default_shape | String | Default attribute: Default shape of an object detection label (dedicated attribute). By default, this parameter is left blank. |
@modelarts:from_type | String | Default attribute: Type of the head entity in the triplet relationship label. This attribute must be specified when a relationship label is created. This parameter is used only for the text triplet dataset. |
@modelarts:rename_to | String | Default attribute: The new name of the label. |
@modelarts:shortcut | String | Default attribute: Label shortcut key. By default, this parameter is left blank. For example: D. |
@modelarts:to_type | String | Default attribute: Type of the tail entity in the triplet relationship label. This attribute must be specified when a relationship label is created. This parameter is used only for the text triplet dataset. |
Parameter | Type | Description |
---|---|---|
current_task_id | String | ID of a team labeling task. |
current_task_name | String | Name of a team labeling task. |
reject_num | Integer | Number of rejected samples. |
repetition | Integer | Number of persons who label each sample. The minimum value is 1. |
is_synchronize_auto_labeling_data | Boolean | Whether to synchronously update auto labeling data. |
is_synchronize_data | Boolean | Whether to synchronize updated data, such as uploading files, synchronizing data sources, and assigning imported unlabeled files to team members. |
workers | Array of Worker objects | List of labeling team members. |
workforce_id | String | ID of a labeling team. |
workforce_name | String | Name of a labeling team. |
Parameter | Type | Description |
---|---|---|
create_time | Long | Creation time. |
description | String | Labeling team member description. The value contains 0 to 256 characters and does not support the following special characters: |
email | String | Email address of a labeling team member. |
role | Integer | Role. |
status | Integer | Current login status of a labeling team member. |
update_time | Long | Update time. |
worker_id | String | ID of a labeling team member. |
workforce_id | String | ID of a labeling team. |
Example Requests
Querying Details About a Dataset
GET https://{endpoint}/v2/{project_id}/datasets/{dataset_id}
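The request above could be issued from Python roughly as follows. This is a sketch under stated assumptions: the helper names `make_request` and `fetch_dataset` are hypothetical, and passing the credential in an `X-Auth-Token` header is an assumption about the authentication scheme, which this section does not describe.

```python
import json
import urllib.request


def make_request(endpoint, project_id, dataset_id, token):
    """Prepare (but do not send) the GET request for dataset details."""
    url = "https://{}/v2/{}/datasets/{}".format(endpoint, project_id, dataset_id)
    req = urllib.request.Request(url, method="GET")
    # Assumption: token-based authentication via the X-Auth-Token header.
    req.add_header("X-Auth-Token", token)
    req.add_header("Content-Type", "application/json")
    return req


def fetch_dataset(req, timeout=30):
    """Send the prepared request and decode the JSON response body."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Separating request construction from sending keeps the URL and header logic easy to check without network access; `fetch_dataset(make_request(...))` performs the actual call.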
Example Responses
Status code: 200
OK
{
  "dataset_id" : "gfghHSokody6AJigS5A",
  "dataset_name" : "dataset-f9e8",
  "dataset_type" : 0,
  "data_format" : "Default",
  "next_version_num" : 4,
  "status" : 1,
  "data_sources" : [ {
    "data_type" : 0,
    "data_path" : "/test-obs/classify/input/animals/"
  } ],
  "create_time" : 1605690595404,
  "update_time" : 1605690595404,
  "description" : "",
  "current_version_id" : "54IXbeJhfttGpL46lbv",
  "current_version_name" : "V003",
  "total_sample_count" : 10,
  "annotated_sample_count" : 10,
  "unconfirmed_sample_count" : 0,
  "work_path" : "/test-obs/classify/output/",
  "inner_work_path" : "/test-obs/classify/output/dataset-f9e8-gfghHSokody6AJigS5A/",
  "inner_annotation_path" : "/test-obs/classify/output/dataset-f9e8-gfghHSokody6AJigS5A/annotation/",
  "inner_data_path" : "/test-obs/classify/output/dataset-f9e8-gfghHSokody6AJigS5A/data/",
  "inner_log_path" : "/test-obs/classify/output/dataset-f9e8-gfghHSokody6AJigS5A/logs/",
  "inner_temp_path" : "/test-obs/classify/output/dataset-f9e8-gfghHSokody6AJigS5A/temp/",
  "inner_task_path" : "/test-obs/classify/output/dataset-f9e8-gfghHSokody6AJigS5A/task/",
  "work_path_type" : 0,
  "workspace_id" : "0",
  "enterprise_project_id" : "0",
  "workforce_task_count" : 0,
  "feature_supports" : [ "0" ],
  "managed" : false,
  "import_data" : false,
  "label_task_count" : 1,
  "dataset_format" : 0,
  "dataset_version_count" : 3,
  "content_labeling" : true,
  "labels" : [ {
    "name" : "Rabbits",
    "type" : 0,
    "property" : {
      "@modelarts:color" : "#3399ff"
    }
  }, {
    "name" : "Bees",
    "type" : 0,
    "property" : {
      "@modelarts:color" : "#3399ff"
    }
  } ]
}
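The following sketch shows one way to read the example response above: it splits `work_path` into a bucket name and a file path (following the /Bucket name/File path format described for `work_path`) and collects the color of each label. The helper names `split_obs_path` and `label_colors` are hypothetical, and the embedded JSON is a shortened excerpt of the example response.

```python
import json

# Shortened excerpt of the example response, for illustration only.
EXAMPLE_RESPONSE = """
{
  "dataset_name": "dataset-f9e8",
  "work_path": "/test-obs/classify/output/",
  "labels": [
    {"name": "Rabbits", "type": 0, "property": {"@modelarts:color": "#3399ff"}},
    {"name": "Bees", "type": 0, "property": {"@modelarts:color": "#3399ff"}}
  ]
}
"""


def split_obs_path(path):
    """Split '/bucket/file/path/' into (bucket, file_path)."""
    parts = path.strip("/").split("/", 1)
    bucket = parts[0]
    file_path = parts[1] if len(parts) > 1 else ""
    return bucket, file_path


def label_colors(details):
    """Map each label name to its @modelarts:color value (None if unset)."""
    return {
        label["name"]: label.get("property", {}).get("@modelarts:color")
        for label in details.get("labels", [])
    }


details = json.loads(EXAMPLE_RESPONSE)
bucket, file_path = split_obs_path(details["work_path"])
colors = label_colors(details)
```

A path that contains only a bucket, such as /obs-bucket, yields an empty file path.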
Status Codes
Status Code | Description |
---|---|
200 | OK |
401 | Unauthorized |
403 | Forbidden |
404 | Not Found |
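A client might translate the status codes above into exceptions as sketched below. The exception class `DatasetQueryError` and the function `check_status` are hypothetical names introduced for this example; only the code-to-description mapping comes from the table.

```python
# Status codes and descriptions as listed in the table above.
STATUS_MEANINGS = {
    200: "OK",
    401: "Unauthorized",
    403: "Forbidden",
    404: "Not Found",
}


class DatasetQueryError(Exception):
    """Raised when the dataset query does not return status code 200."""

    def __init__(self, status_code, meaning):
        super().__init__("{} {}".format(status_code, meaning))
        self.status_code = status_code
        self.meaning = meaning


def check_status(status_code):
    """Return None for 200; raise DatasetQueryError for any other code."""
    if status_code == 200:
        return None
    meaning = STATUS_MEANINGS.get(status_code, "Unexpected status")
    raise DatasetQueryError(status_code, meaning)
```

Codes that are not listed in the table are reported as "Unexpected status" rather than being silently accepted.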
Error Codes
See Error Codes.