Querying the Dataset List

Function

This API queries, page by page, the created datasets that meet the search criteria.

URI

GET /v2/{project_id}/datasets

**Table 1** Path Parameters
Parameter	Mandatory	Type	Description
project_id	Yes	String	Project ID. For details about how to obtain the project ID, see Obtaining a Project ID.

**Table 2** Query Parameters
Parameter	Mandatory	Type	Description
check_running_task	No	Boolean	Whether to detect tasks (including initialization tasks) that are running in a dataset. The options are as follows: true: Detect tasks (including initialization tasks) that are running in the dataset. false: Do not detect tasks (including initialization tasks) that are running in the dataset. (Default value)
contain_versions	No	Boolean	Whether the dataset contains a version.
dataset_type	No	Integer	Dataset type. The options are as follows: 0: image classification 1: object detection 100: text classification 101: named entity recognition 102: text triplet 200: sound classification 201: speech content 202: speech paragraph labeling 400: table dataset 600: video labeling 900: custom format
file_preview	No	Boolean	Whether a dataset supports preview when it is queried. The options are as follows: true: Preview is supported and the list of four dataset files is returned. false: Preview is not supported. (Default value)
limit	No	Integer	Maximum number of records returned on each page. The value ranges from 1 to 100. The default value is 10.
offset	No	Integer	Start page of the paging list. The default value is 0.
order	No	String	Sorting sequence of the query. The options are as follows: asc: ascending order desc: descending order (default value)
running_task_type	No	Integer	Type of the running tasks (including initialization tasks) to be detected. The options are as follows: 0: auto labeling 1: pre-labeling 2: export 3: version switch 4: manifest file export 5: manifest file import 6: version publishing 7: auto grouping 10: one-click model deployment (default value)
search_content	No	String	Fuzzy search keyword. By default, this parameter is left blank.
sort_by	No	String	Sorting mode of the query. The options are as follows: create_time: Sort by creation time. (Default value) dataset_name: Sort by dataset name.
support_export	No	Boolean	Whether to return only datasets that can be exported (datasets of image classification, object detection, and custom format). If this parameter is left blank or set to false, no filtering is performed. The options are as follows: true: Return only datasets that can be exported. false: Do not filter datasets by whether they can be exported. (Default value)
train_evaluate_ratio	No	String	Version split ratio for dataset filtering. The numbers before and after the comma indicate the minimum and maximum split ratios, and only versions whose split ratios fall within this range are selected, for example, 0.0,1.0. Note: If this parameter is left blank or not specified, the system does not filter datasets based on the version split ratio.
version_format	No	Integer	Dataset version format for dataset filtering. This parameter is used to filter datasets that meet the filter criteria. The options are as follows: 0: default format 1: CarbonData (supported only by table datasets) 2: CSV
with_labels	No	Boolean	Whether to return dataset labels. The options are as follows: true: Return label information. false: Do not return label information. (Default value)
workspace_id	No	String	Workspace ID. If no workspace is created, the default value is 0. If a workspace is created and used, use the actual value.
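
The query parameters in Table 2 can be assembled into a query string programmatically. The following is a minimal sketch, not an official client: the function name build_dataset_list_query is hypothetical, and it only checks the documented limit range and order values before encoding. Booleans are rendered as lowercase true/false to match the example request.

```python
from urllib.parse import urlencode

# Documented bounds for `limit` (Table 2).
LIMIT_MIN, LIMIT_MAX = 1, 100


def build_dataset_list_query(**params):
    """Return a URL query string for GET /v2/{project_id}/datasets.

    Parameters set to None are dropped, booleans become "true"/"false",
    and `limit` and `order` are checked against the documented values.
    This helper is an illustrative assumption, not part of the API.
    """
    limit = params.get("limit")
    if limit is not None and not LIMIT_MIN <= limit <= LIMIT_MAX:
        raise ValueError(f"limit must be between {LIMIT_MIN} and {LIMIT_MAX}")

    order = params.get("order")
    if order is not None and order not in ("asc", "desc"):
        raise ValueError("order must be 'asc' or 'desc'")

    encoded = {}
    for key, value in params.items():
        if value is None:
            continue  # omit parameters that were not set
        if isinstance(value, bool):
            value = "true" if value else "false"
        encoded[key] = value
    return urlencode(encoded)


query = build_dataset_list_query(
    offset=0, limit=10, sort_by="create_time", order="desc",
    dataset_type=0, file_preview=True,
)
print(query)
```

Keeping validation next to encoding means an out-of-range limit fails locally instead of producing a rejected request.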

Request Parameters

None

Response Parameters

Status code: 200

**Table 3** Response body parameters
Parameter	Type	Description
datasets	Array of DatasetAndFilePreview objects	Dataset list queried by page.
total_number	Integer	Total number of datasets.
workspaceId	String	Workspace ID. If no workspace is created, the default value is 0. If a workspace is created and used, use the actual value.
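
Because the response carries total_number alongside one page of datasets, a caller can walk all pages until every dataset has been read. The sketch below makes two assumptions that this page does not confirm: offset is treated as a zero-based page index (following the "Start page" description), and fetch_page is a hypothetical callable that performs the GET request and returns the decoded body. A fake fetcher stands in for the HTTP call so that the example runs offline.

```python
import math


def iter_datasets(fetch_page, limit=10):
    """Yield every dataset by requesting pages until total_number is covered.

    fetch_page(offset, limit) is a hypothetical callable returning the
    response body as a dict with "datasets" and "total_number".
    Assumption: offset is a zero-based page index.
    """
    offset = 0
    while True:
        body = fetch_page(offset, limit)
        yield from body.get("datasets", [])
        total_pages = math.ceil(body.get("total_number", 0) / limit)
        offset += 1
        if offset >= total_pages:
            break


# Stand-in for a real HTTP call: 25 fake datasets served page by page.
_FAKE = [{"dataset_id": f"id-{i}"} for i in range(25)]


def fake_fetch(offset, limit):
    start = offset * limit
    return {"total_number": len(_FAKE), "datasets": _FAKE[start:start + limit]}


ids = [d["dataset_id"] for d in iter_datasets(fake_fetch, limit=10)]
print(len(ids))
```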

**Table 4** DatasetAndFilePreview
Parameter	Type	Description
annotated_sample_count	Integer	Number of labeled samples in a dataset.
annotated_sub_sample_count	Integer	Number of labeled subsamples.
content_labeling	Boolean	Whether to enable content labeling for the speech paragraph labeling dataset. This function is enabled by default.
create_time	Long	Time when a dataset is created.
current_version_id	String	Current version ID of a dataset.
current_version_name	String	Current version name of a dataset.
data_format	String	Data format.
data_sources	Array of DataSource objects	Data source list.
data_statistics	Map<String,Object>	Sample statistics on a dataset, including the statistics on sample metadata in JSON format.
data_update_time	Long	Time when a sample and a label are updated.
data_url	String	Data path for training.
dataset_format	Integer	Dataset format. The options are as follows: 0: file 1: table
dataset_id	String	Dataset ID.
dataset_name	String	Dataset name.
dataset_tags	Array of strings	Key identifier list of a dataset, for example, ["Image","Object detection"].
dataset_type	Integer	Dataset type. The options are as follows: 0: image classification 1: object detection 100: text classification 101: named entity recognition 102: text triplet 200: sound classification 201: speech content 202: speech paragraph labeling 400: table dataset 600: video labeling 900: custom format
dataset_version_count	Integer	Number of versions of a dataset.
deleted_sample_count	Integer	Number of deleted samples.
deletion_stats	Map<String,Integer>	Deletion reason statistics.
description	String	Dataset description.
enterprise_project_id	String	Enterprise project ID.
exist_running_task	Boolean	Whether the dataset contains running (including initialization) tasks. The options are as follows: true: The dataset contains running tasks. false: The dataset does not contain running tasks.
exist_workforce_task	Boolean	Whether the dataset contains team labeling tasks. The options are as follows: true: The dataset contains team labeling tasks. false: The dataset does not contain team labeling tasks.
feature_supports	Array of strings	List of features supported by the dataset. Currently, only the value 0 is supported, indicating that the OBS file size is limited.
import_data	Boolean	Whether to import data. The options are as follows: true: Import data. false: Do not import data.
import_task_id	String	ID of an import task.
inner_annotation_path	String	Path for storing the labeling result of a dataset.
inner_data_path	String	Path for storing the internal data of a dataset.
inner_log_path	String	Path for storing internal logs of a dataset.
inner_task_path	String	Path for storing internal tasks of a dataset.
inner_temp_path	String	Path for storing internal temporary files of a dataset.
inner_work_path	String	Output directory of a dataset.
label_task_count	Integer	Number of labeling tasks.
labels	Array of Label objects	Dataset label list.
loading_sample_count	Integer	Number of loading samples.
managed	Boolean	Whether a dataset is hosted. The options are as follows: true: The dataset is hosted. false: The dataset is not hosted.
next_version_num	Integer	Sequence number of the next version of a dataset.
running_tasks_id	Array of strings	ID list of running (including initialization) tasks.
samples	Array of AnnotationFile objects	Sample list.
schema	Array of Field objects	Schema list.
status	Integer	Dataset status. The options are as follows: 0: creating dataset 1: normal dataset 2: deleting dataset 3: deleted dataset 4: abnormal dataset 5: synchronizing dataset 6: releasing dataset 7: dataset in version switching 8: importing dataset
third_path	String	Third-party path.
total_sample_count	Integer	Total number of dataset samples.
total_sub_sample_count	Integer	Total number of subsamples generated from the parent samples. For example, the total number of key frame images extracted from the video labeling dataset is that of subsamples.
unconfirmed_sample_count	Integer	Number of auto labeling samples to be confirmed.
update_time	Long	Time when a dataset is updated.
versions	Array of DatasetVersion objects	Dataset version information. Currently, only the current version information of a dataset is recorded.
work_path	String	Output dataset path, which is used to store output files such as label files. The path is an OBS path in the format of /Bucket name/File path. For example: /obs-bucket.
work_path_type	Integer	Type of the dataset output path. The options are as follows: 0: OBS bucket (default value)
workforce_descriptor	WorkforceDescriptor object	Team labeling information.
workforce_task_count	Integer	Number of team labeling tasks of a dataset.
workspace_id	String	Workspace ID. If no workspace is created, the default value is 0. If a workspace is created and used, use the actual value.
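
The integer codes for dataset_type and status listed in Table 4 are easier to work with as named constants. The following sketch maps the documented values to Python enums; the class names, member names, and the describe helper are illustrative choices, not identifiers defined by the API.

```python
from enum import IntEnum


class DatasetType(IntEnum):
    """dataset_type codes as listed in Table 4."""
    IMAGE_CLASSIFICATION = 0
    OBJECT_DETECTION = 1
    TEXT_CLASSIFICATION = 100
    NAMED_ENTITY_RECOGNITION = 101
    TEXT_TRIPLET = 102
    SOUND_CLASSIFICATION = 200
    SPEECH_CONTENT = 201
    SPEECH_PARAGRAPH_LABELING = 202
    TABLE = 400
    VIDEO_LABELING = 600
    CUSTOM_FORMAT = 900


class DatasetStatus(IntEnum):
    """status codes as listed in Table 4."""
    CREATING = 0
    NORMAL = 1
    DELETING = 2
    DELETED = 3
    ABNORMAL = 4
    SYNCHRONIZING = 5
    RELEASING = 6
    VERSION_SWITCHING = 7
    IMPORTING = 8


def describe(dataset):
    """Return a one-line, human-readable summary of a dataset entry."""
    kind = DatasetType(dataset["dataset_type"]).name.lower()
    state = DatasetStatus(dataset["status"]).name.lower()
    return f"{dataset['dataset_name']}: {kind} ({state})"


print(describe({"dataset_name": "dataset-f9e8", "dataset_type": 0, "status": 1}))
```

An unknown code raises ValueError, which makes an undocumented value visible instead of silently mislabeling it.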

**Table 5** DataSource
Parameter	Type	Description
data_path	String	Data source path.
data_type	Integer	Data type. The options are as follows: 0: OBS bucket (default value) 1: GaussDB(DWS) 2: DLI 3: RDS 4: MRS 5: AI Gallery 6: Inference service
schema_maps	Array of SchemaMap objects	Schema mapping information corresponding to the table data.
source_info	SourceInfo object	Information required for importing a table data source.
with_column_header	Boolean	Whether the first row in the file is a column name. This field is valid for the table dataset. The options are as follows: true: The first row in the file is the column name. false: The first row in the file is not the column name.

**Table 6** SchemaMap
Parameter	Type	Description
dest_name	String	Name of the destination column.
src_name	String	Name of the source column.

**Table 7** SourceInfo
Parameter	Type	Description
cluster_id	String	ID of an MRS cluster.
cluster_mode	String	Running mode of an MRS cluster. The options are as follows: 0: normal cluster 1: security cluster
cluster_name	String	Name of an MRS cluster.
database_name	String	Name of the database to which the table dataset is imported.
input	String	HDFS path of a table dataset.
ip	String	IP address of your GaussDB(DWS) cluster.
port	String	Port number of your GaussDB(DWS) cluster.
queue_name	String	DLI queue name of a table dataset.
subnet_id	String	Subnet ID of an MRS cluster.
table_name	String	Name of the table to which a table dataset is imported.
user_name	String	Username, which is mandatory for GaussDB(DWS) data.
user_password	String	User password, which is mandatory for GaussDB(DWS) data.
vpc_id	String	ID of the VPC where an MRS cluster resides.

**Table 8** Label
Parameter	Type	Description
attributes	Array of LabelAttribute objects	Multi-dimensional attribute of a label. For example, if the label is music, attributes such as style and artist may be included.
name	String	Label name.
property	LabelProperty object	Basic attribute key-value pair of a label, such as color and shortcut keys.
type	Integer	Label type. The options are as follows: 0: image classification 1: object detection 100: text classification 101: named entity recognition 102: text triplet relationship 103: text triplet entity 200: speech classification 201: speech content 202: speech paragraph labeling 600: video classification

**Table 9** LabelProperty
Parameter	Type	Description
@modelarts:color	String	Default attribute: Label color, which is a hexadecimal code of the color. By default, this parameter is left blank. Example: #FFFFF0.
@modelarts:default_shape	String	Default attribute: Default shape of an object detection label (dedicated attribute). By default, this parameter is left blank. The options are as follows: bndbox: rectangle polygon: polygon circle: circle line: straight line dashed: dotted line point: point polyline: polyline
@modelarts:from_type	String	Default attribute: Type of the head entity in the triplet relationship label. This attribute must be specified when a relationship label is created. This parameter is used only for the text triplet dataset.
@modelarts:rename_to	String	Default attribute: The new name of the label.
@modelarts:shortcut	String	Default attribute: Label shortcut key. By default, this parameter is left blank. For example: D.
@modelarts:to_type	String	Default attribute: Type of the tail entity in the triplet relationship label. This attribute must be specified when a relationship label is created. This parameter is used only for the text triplet dataset.

**Table 10** AnnotationFile
Parameter	Type	Description
create_time	Long	Time when a sample is created.
dataset_id	String	Dataset ID.
depth	Integer	Number of image sample channels.
file_Name	String	Sample name.
file_id	String	Sample ID.
file_type	String	File type.
height	Integer	Image sample height.
size	Long	Image sample size.
tags	Map<String,String>	Label information of a sample.
url	String	OBS address of the preview sample.
width	Integer	Image sample width.

**Table 11** Field
Parameter	Type	Description
description	String	Schema description.
name	String	Schema name.
schema_id	Integer	Schema ID.
type	String	Schema value type.

**Table 12** DatasetVersion
Parameter	Type	Description
add_sample_count	Integer	Number of added samples.
annotated_sample_count	Integer	Number of labeled samples in a version.
annotated_sub_sample_count	Integer	Number of labeled subsamples.
clear_hard_property	Boolean	Whether to clear hard example properties during release. The options are as follows: true: Clear hard example properties. (Default value) false: Do not clear hard example properties.
code	String	Status code of a preprocessing task such as rotation and cropping.
create_time	Long	Time when a version is created.
crop	Boolean	Whether to crop the image. This field is valid only for the object detection dataset whose labeling box is in the rectangle shape. The options are as follows: true: Crop the image. false: Do not crop the image. (Default value)
crop_path	String	Path for storing cropped files.
crop_rotate_cache_path	String	Temporary directory for executing the rotation and cropping task.
data_path	String	Path for storing data.
data_statistics	Map<String,Object>	Sample statistics on a dataset, including the statistics on sample metadata in JSON format.
data_validate	Boolean	Whether data is validated by the validation algorithm before release. The options are as follows: true: The data has been validated. false: The data has not been validated.
deleted_sample_count	Integer	Number of deleted samples.
deletion_stats	Map<String,Integer>	Deletion reason statistics.
description	String	Description of a version.
export_images	Boolean	Whether to export images to the version output directory during release. The options are as follows: true: Export images to the version output directory. false: Do not export images to the version output directory. (Default value)
extract_serial_number	Boolean	Whether to parse the subsample number during release. The field is valid for the healthcare dataset. The options are as follows: true: Parse the subsample number. false: Do not parse the subsample number. (Default value)
include_dataset_data	Boolean	Whether to include the source data of a dataset during release. The options are as follows: true: The source data of a dataset is included. false: The source data of a dataset is not included.
is_current	Boolean	Whether this version is the current version of the dataset. The options are as follows: true: This version is the current version. false: This version is not the current version.
label_stats	Array of LabelStats objects	Label statistics list of a released version.
label_type	String	Label type of a released version. The options are as follows: multi: Multi-label samples are included. single: All samples are single-labeled.
manifest_cache_input_path	String	Input path for the manifest file cache during version release.
manifest_path	String	Path for storing the manifest file with the released version.
message	String	Task information recorded during release (for example, error information).
modified_sample_count	Integer	Number of modified samples.
previous_annotated_sample_count	Integer	Number of labeled samples of parent versions.
previous_total_sample_count	Integer	Total number of samples of parent versions.
previous_version_id	String	Parent version ID.
processor_task_id	String	ID of a preprocessing task such as rotation and cropping.
processor_task_status	Integer	Status of a preprocessing task such as rotation and cropping. The options are as follows: 0: initialized 1: running 2: completed 3: failed 4: stopped 5: timeout 6: deletion failed 7: stop failed
remove_sample_usage	Boolean	Whether to clear the existing usage information of a dataset during release. The options are as follows: true: Clear the existing usage information of a dataset. (Default value) false: Do not clear the existing usage information of a dataset.
rotate	Boolean	Whether to rotate the image. The options are as follows: true: Rotate the image. false: Do not rotate the image. (Default value)
rotate_path	String	Path for storing the rotated file.
sample_state	String	Sample status. The options are as follows: ALL: labeled NONE: unlabeled UNCHECK: pending acceptance ACCEPTED: accepted REJECTED: rejected UNREVIEWED: pending review REVIEWED: reviewed WORKFORCE_SAMPLED: sampled WORKFORCE_SAMPLED_UNCHECK: sampling unchecked WORKFORCE_SAMPLED_CHECKED: sampling checked WORKFORCE_SAMPLED_ACCEPTED: sampling accepted WORKFORCE_SAMPLED_REJECTED: sampling rejected AUTO_ANNOTATION: to be confirmed
status	Integer	Status of a dataset version. The options are as follows: 0: creating 1: running 2: deleting 3: deleted 4: error
tags	Array of strings	Key identifier list of the dataset. The labeling type is used as the default label when the labeling task releases a version. For example, ["Image","Object detection"].
task_type	Integer	Labeling task type of the released version, which is the same as the dataset type.
total_sample_count	Integer	Total number of version samples.
total_sub_sample_count	Integer	Total number of subsamples generated from the parent samples.
train_evaluate_sample_ratio	String	Ratio for splitting labeled samples into training and validation sets during version release. The default value is 1.00, indicating that all labeled samples are placed in the training set.
update_time	Long	Time when a version is updated.
version_format	String	Format of a dataset version. The options are as follows: Default: default format CarbonData: CarbonData (supported only by table datasets) CSV: CSV
version_id	String	Dataset version ID.
version_name	String	Dataset version name.
with_column_header	Boolean	Whether the first row in the released CSV file is a column name. This field is valid for the table dataset. The options are as follows: true: The first row in the released CSV file is a column name. false: The first row in the released CSV file is not a column name.

**Table 13** LabelStats
Parameter	Type	Description
attributes	Array of LabelAttribute objects	Multi-dimensional attribute of a label. For example, if the label is music, attributes such as style and artist may be included.
count	Integer	Number of labels.
name	String	Label name.
property	LabelProperty object	Basic attribute key-value pair of a label, such as color and shortcut keys.
sample_count	Integer	Number of samples containing the label.
type	Integer	Label type. The options are as follows: 0: image classification 1: object detection 100: text classification 101: named entity recognition 102: text triplet relationship 103: text triplet entity 200: speech classification 201: speech content 202: speech paragraph labeling 600: video classification

**Table 14** LabelAttribute
Parameter	Type	Description
default_value	String	Default value of a label attribute.
id	String	Label attribute ID.
name	String	Label attribute name.
type	String	Label attribute type. The options are as follows: text: text select: single-choice drop-down list
values	Array of LabelAttributeValue objects	List of label attribute values.

**Table 15** LabelAttributeValue
Parameter	Type	Description
id	String	Label attribute value ID.
value	String	Label attribute value.

**Table 16** WorkforceDescriptor
Parameter	Type	Description
current_task_id	String	ID of a team labeling task.
current_task_name	String	Name of a team labeling task.
reject_num	Integer	Number of rejected samples.
repetition	Integer	Number of persons who label each sample. The minimum value is 1.
is_synchronize_auto_labeling_data	Boolean	Whether to synchronously update auto labeling data. The options are as follows: true: Update auto labeling data synchronously. false: Do not update auto labeling data synchronously.
is_synchronize_data	Boolean	Whether to synchronize updated data, such as uploading files, synchronizing data sources, and assigning imported unlabeled files to team members. The options are as follows: true: Synchronize updated data to team members. false: Do not synchronize updated data to team members.
workers	Array of Worker objects	List of labeling team members.
workforce_id	String	ID of a labeling team.
workforce_name	String	Name of a labeling team.

**Table 17** Worker
Parameter	Type	Description
create_time	Long	Creation time.
description	String	Labeling team member description. The value contains 0 to 256 characters and does not support the following special characters: `^!<>=&"'`
email	String	Email address of a labeling team member.
role	Integer	Role. The options are as follows: 0: labeling personnel 1: reviewer 2: team administrator 3: dataset owner
status	Integer	Current login status of a labeling team member. The options are as follows: 0: The invitation email has not been sent. 1: The invitation email has been sent but the user has not logged in. 2: The user has logged in. 3: The labeling team member has been deleted.
update_time	Long	Update time.
worker_id	String	ID of a labeling team member.
workforce_id	String	ID of a labeling team.

Example Requests

Querying the Dataset List

GET https://{endpoint}/v2/{project_id}/datasets?offset=0&limit=10&sort_by=create_time&order=desc&dataset_type=0&file_preview=true
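
The same request can be prepared from code. The sketch below only builds the request object and does not send it. The endpoint, project ID, and token values are placeholders, and the X-Auth-Token header is an assumption about the authentication scheme, which this page does not describe; check the service's authentication documentation before relying on it.

```python
from urllib.request import Request


def make_list_request(endpoint, project_id, token, query):
    """Build (but do not send) the GET request for the dataset list."""
    url = f"https://{endpoint}/v2/{project_id}/datasets?{query}"
    # Assumption: token-based authentication through an X-Auth-Token header.
    return Request(url, headers={"X-Auth-Token": token}, method="GET")


request = make_list_request(
    endpoint="modelarts.example.com",  # hypothetical endpoint
    project_id="my-project-id",        # hypothetical project ID
    token="my-token",                  # hypothetical token
    query="offset=0&limit=10&sort_by=create_time&order=desc"
          "&dataset_type=0&file_preview=true",
)
print(request.get_method(), request.full_url)
```

Passing the prepared object to urllib.request.urlopen would send it; that step is omitted here so that the example runs without network access.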

Example Responses

Status code: 200

{
  "total_number" : 1,
  "datasets" : [ {
    "dataset_id" : "gfghHSokody6AJigS5A",
    "dataset_name" : "dataset-f9e8",
    "dataset_type" : 0,
    "data_format" : "Default",
    "next_version_num" : 4,
    "status" : 1,
    "data_sources" : [ {
      "data_type" : 0,
      "data_path" : "/test-obs/classify/input/catRabbit4/"
    } ],
    "create_time" : 1605690595404,
    "update_time" : 1605690595404,
    "description" : "",
    "current_version_id" : "54IXbeJhfttGpL46lbv",
    "current_version_name" : "V003",
    "total_sample_count" : 10,
    "annotated_sample_count" : 10,
    "work_path" : "/test-obs/classify/output/",
    "inner_work_path" : "/test-obs/classify/output/dataset-f9e8-gfghHSokody6AJigS5A/",
    "inner_annotation_path" : "/test-obs/classify/output/dataset-f9e8-gfghHSokody6AJigS5A/annotation/",
    "inner_data_path" : "/test-obs/classify/output/dataset-f9e8-gfghHSokody6AJigS5A/data/",
    "inner_log_path" : "/test-obs/classify/output/dataset-f9e8-gfghHSokody6AJigS5A/logs/",
    "inner_temp_path" : "/test-obs/classify/output/dataset-f9e8-gfghHSokody6AJigS5A/temp/",
    "inner_task_path" : "/test-obs/classify/output/dataset-f9e8-gfghHSokody6AJigS5A/task/",
    "work_path_type" : 0,
    "workspace_id" : "0",
    "enterprise_project_id" : "0",
    "exist_running_task" : false,
    "exist_workforce_task" : false,
    "running_tasks_id" : [ ],
    "workforce_task_count" : 0,
    "feature_supports" : [ "0" ],
    "managed" : false,
    "import_data" : false,
    "ai_project" : "default-ai-project",
    "label_task_count" : 1,
    "dataset_format" : 0,
    "dataset_version" : "v1",
    "content_labeling" : true,
    "samples" : [ {
      "url" : "https://test-obs.obs.xxx.com:443/classify/input/catRabbit4/15.jpg?AccessKeyId=vprCCTY1NmHudlvC0bXr&Expires=1606100112&Signature=tuUo9jl6lqoMKAwNBz5g8dxO%2FdE%3D",
      "create_time" : 1605690596035
    }, {
      "url" : "https://test-obs.obs.xxx.com:443/classify/input/catRabbit4/8.jpg?AccessKeyId=vprCCTY1NmHudlvC0bXr&Expires=1606100112&Signature=NITOdBnkUXtdnKuEgDzZpkQzNfM%3D",
      "create_time" : 1605690596046
    }, {
      "url" : "https://test-obs.obs.xxx.com:443/classify/input/catRabbit4/9.jpg?AccessKeyId=vprCCTY1NmHudlvC0bXr&Expires=1606100112&Signature=%2BwUo1BL38%2F2d7p7anPi4fNzm1VU%3D",
      "create_time" : 1605690596050
    }, {
      "url" : "https://test-obs.obs.xxx.com:443/classify/input/catRabbit4/7.jpg?AccessKeyId=vprCCTY1NmHudlvC0bXr&Expires=1606100112&Signature=tOrHfcWo%2FEJ0wRzfi1M5Wk2MrXg%3D",
      "create_time" : 1605690596043
    } ],
    "files" : [ {
      "url" : "https://test-obs.obs.xxx.com:443/classify/input/catRabbit4/15.jpg?AccessKeyId=vprCCTY1NmHudlvC0bXr&Expires=1606100112&Signature=tuUo9jl6lqoMKAwNBz5g8dxO%2FdE%3D",
      "create_time" : 1605690596035
    }, {
      "url" : "https://test-obs.obs.xxx.com:443/classify/input/catRabbit4/8.jpg?AccessKeyId=vprCCTY1NmHudlvC0bXr&Expires=1606100112&Signature=NITOdBnkUXtdnKuEgDzZpkQzNfM%3D",
      "create_time" : 1605690596046
    }, {
      "url" : "https://test-obs.obs.xxx.com:443/classify/input/catRabbit4/9.jpg?AccessKeyId=vprCCTY1NmHudlvC0bXr&Expires=1606100112&Signature=%2BwUo1BL38%2F2d7p7anPi4fNzm1VU%3D",
      "create_time" : 1605690596050
    }, {
      "url" : "https://test-obs.obs.xxx.com:443/classify/input/catRabbit4/7.jpg?AccessKeyId=vprCCTY1NmHudlvC0bXr&Expires=1606100112&Signature=tOrHfcWo%2FEJ0wRzfi1M5Wk2MrXg%3D",
      "create_time" : 1605690596043
    } ]
  } ]
}
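
A response like the one above can be reduced to the fields a caller usually needs. The following sketch parses a trimmed copy of the example response; the summarize helper is hypothetical, the sample URL is shortened by dropping its signed query string, and the timestamps are assumed to be Unix epoch milliseconds in UTC, which the example values suggest but this page does not state.

```python
import json
from datetime import datetime, timezone

# Trimmed copy of the example response (signed URL parameters omitted).
RESPONSE_TEXT = """
{
  "total_number": 1,
  "datasets": [{
    "dataset_id": "gfghHSokody6AJigS5A",
    "dataset_name": "dataset-f9e8",
    "create_time": 1605690595404,
    "total_sample_count": 10,
    "annotated_sample_count": 10,
    "samples": [
      {"url": "https://test-obs.obs.xxx.com:443/classify/input/catRabbit4/15.jpg",
       "create_time": 1605690596035}
    ]
  }]
}
"""


def summarize(dataset):
    """Extract name, creation time, labeling progress, and preview URLs."""
    # Assumption: create_time is epoch milliseconds (UTC).
    created = datetime.fromtimestamp(dataset["create_time"] // 1000, tz=timezone.utc)
    total = dataset["total_sample_count"]
    labeled = dataset["annotated_sample_count"]
    return {
        "name": dataset["dataset_name"],
        "created_utc": created.isoformat(),
        "labeled_fraction": labeled / total if total else 0.0,
        "preview_urls": [s["url"] for s in dataset.get("samples", [])],
    }


body = json.loads(RESPONSE_TEXT)
summaries = [summarize(d) for d in body["datasets"]]
print(summaries[0]["name"], summaries[0]["created_utc"])
```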

Status Codes

Status Code	Description
200	OK
401	Unauthorized
403	Forbidden
404	Not Found

Error Codes

See Error Codes.

last updated: 2024-06-20 00:39