Adding Samples in Batches¶

Function¶

This API is used to add samples in batches.

Debugging¶

You can debug this API through automatic authentication in API Explorer or use the SDK sample code generated by API Explorer.

URI¶

POST /v2/{project_id}/datasets/{dataset_id}/data-annotations/samples

**Table 1** Path Parameters¶
Parameter	Mandatory	Type	Description
dataset_id	Yes	String	Dataset ID.
project_id	Yes	String	Project ID. For details about how to obtain a project ID, see Obtaining a Project ID and Name.
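
As a minimal sketch of how the two path parameters fit into the URI, the helper below fills in `project_id` and `dataset_id`. The endpoint host, the helper name `build_samples_url`, and the example IDs are assumptions for illustration; this section does not name the service host.

```python
from urllib.parse import quote

# Hypothetical endpoint: this section does not name the service host.
ENDPOINT = "https://modelarts.example.com"


def build_samples_url(endpoint: str, project_id: str, dataset_id: str) -> str:
    """Fill the two mandatory path parameters into the batch-upload URI."""
    if not project_id or not dataset_id:
        raise ValueError("project_id and dataset_id are both mandatory")
    # Percent-encode each ID so it cannot alter the path structure.
    return (
        f"{endpoint.rstrip('/')}/v2/{quote(project_id, safe='')}"
        f"/datasets/{quote(dataset_id, safe='')}/data-annotations/samples"
    )


print(build_samples_url(ENDPOINT, "my-project-id", "my-dataset-id"))
```

The request itself is a POST to this URL; authentication headers are outside the scope of this section and are not shown.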

Request Parameters¶

**Table 2** Request body parameters¶
Parameter	Mandatory	Type	Description
final_annotation	No	Boolean	Whether to import labels directly to the final result. Options: true: Import labels to the labeled dataset (default value). false: Import labels to the to-be-confirmed dataset. Currently, to-be-confirmed datasets support only image classification and object detection.
label_format	No	LabelFormat object	Label format. This parameter is used only for text datasets.
samples	No	Array of Sample objects	Sample list.

**Table 3** LabelFormat¶
Parameter	Mandatory	Type	Description
label_type	No	String	Label type of text classification. Options: 0: Labels and text are stored in separate files, and the label file is identified by the fixed suffix _result. For example, if the text file is abc.txt, the label file is abc_result.txt. 1: Labels and text are stored in the same file (default value). Use text_sample_separator to specify the separator between the text and the labels, and text_label_separator to specify the separator between labels.
text_label_separator	No	String	Separator between labels. By default, a comma (,) is used as the separator. The separator needs to be escaped. The separator can contain only one character, such as a letter, a digit, or any of the following special characters: `!@#$%^&*_=\|?/':.;,`
text_sample_separator	No	String	Separator between the text and label. By default, a tab character is used as the separator. The separator needs to be escaped. The separator can contain only one character, such as a letter, a digit, or any of the following special characters: `!@#$%^&*_=\|?/':.;,`
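
The two label layouts in Table 3 can be illustrated with a short sketch. The function names are hypothetical, and splitting at the first occurrence of the text/label separator is an assumption; this section does not say how the service treats a separator that also appears inside the text.

```python
from pathlib import PurePosixPath


def label_file_for(text_file: str) -> str:
    """label_type 0: the label file carries the fixed suffix _result."""
    p = PurePosixPath(text_file)
    return str(p.with_name(f"{p.stem}_result{p.suffix}"))


def split_text_and_labels(line: str,
                          text_sample_separator: str = "\t",
                          text_label_separator: str = ","):
    """label_type 1: text and labels share one line of the same file."""
    # Assumption: split at the FIRST text/label separator on the line.
    text, found, label_part = line.rstrip("\n").partition(text_sample_separator)
    if not found:
        return text, []  # no separator means the line carries no labels
    labels = [item for item in label_part.split(text_label_separator) if item]
    return text, labels


# LabelFormat object using the documented defaults (Tab and comma).
label_format = {
    "label_type": "1",
    "text_sample_separator": "\t",
    "text_label_separator": ",",
}

print(label_file_for("abc.txt"))
print(split_text_and_labels("It is a good day\tpositive,weather"))
```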

**Table 4** Sample¶
Parameter	Mandatory	Type	Description
data	No	Object	Byte data of sample files. The type is java.nio.ByteBuffer. When the API is called, the string converted from the byte data is uploaded.
data_source	No	DataSource object	Data source.
encoding	No	String	Encoding type of sample files, which is used to upload .txt or .csv files. The value can be UTF-8, GBK, or GB2312. The default value is UTF-8.
labels	No	Array of SampleLabel objects	Sample label list.
metadata	No	SampleMetadata object	Key-value pair of the sample metadata attribute.
name	No	String	Name of sample files. The value contains 0 to 1,024 characters and cannot contain special characters (`!<>=&"'`).
sample_type	No	Integer	Sample type. Options: 0: image; 1: text; 2: speech; 4: table; 6: video; 9: custom format
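
The `sample_type` codes and the `name` constraints in Table 4 lend themselves to a small client-side check before upload. This is a sketch only: the enum and function names are hypothetical, and it assumes the parentheses around the forbidden characters are punctuation rather than part of the forbidden set.

```python
from enum import IntEnum


class SampleType(IntEnum):
    """sample_type codes listed in Table 4."""
    IMAGE = 0
    TEXT = 1
    SPEECH = 2
    TABLE = 4
    VIDEO = 6
    CUSTOM = 9


# Assumption: parentheses are not part of the forbidden set.
FORBIDDEN_NAME_CHARS = set("!<>=&\"'")


def check_sample_name(name: str) -> str:
    """Return name unchanged if it satisfies the documented constraints."""
    if len(name) > 1024:
        raise ValueError("name must contain at most 1,024 characters")
    bad = sorted(set(name) & FORBIDDEN_NAME_CHARS)
    if bad:
        raise ValueError(f"name contains forbidden characters: {bad}")
    return name


sample = {"name": check_sample_name("2.jpg"), "sample_type": int(SampleType.IMAGE)}
print(sample)
```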

**Table 5** DataSource¶
Parameter	Mandatory	Type	Description
data_path	No	String	Data source path.
data_type	No	Integer	Data type. Options: 0: OBS bucket (default value); 1: GaussDB(DWS); 2: DLI; 3: RDS; 4: MRS; 5: AI Gallery; 6: Inference service
schema_maps	No	Array of SchemaMap objects	Schema mapping information corresponding to the table data.
source_info	No	SourceInfo object	Information required for importing a table data source.
with_column_header	No	Boolean	Whether the first row in the file is a column name. This field is valid for the table dataset. Options: true: The first row in the file is the column name. false: The first row in the file is not the column name.

**Table 6** SchemaMap¶
Parameter	Mandatory	Type	Description
dest_name	No	String	Name of the destination column.
src_name	No	String	Name of the source column.

**Table 7** SourceInfo¶
Parameter	Mandatory	Type	Description
cluster_id	No	String	ID of an MRS cluster.
cluster_mode	No	String	Running mode of an MRS cluster. Options: 0: normal cluster; 1: security cluster
cluster_name	No	String	Name of an MRS cluster.
database_name	No	String	Name of the database to which the table dataset is imported.
input	No	String	HDFS path of a table dataset.
ip	No	String	IP address of your GaussDB(DWS) cluster.
port	No	String	Port number of your GaussDB(DWS) cluster.
queue_name	No	String	DLI queue name of a table dataset.
subnet_id	No	String	Subnet ID of an MRS cluster.
table_name	No	String	Name of the table to which a table dataset is imported.
user_name	No	String	Username, which is mandatory for GaussDB(DWS) data.
user_password	No	String	User password, which is mandatory for GaussDB(DWS) data.
vpc_id	No	String	ID of the VPC where an MRS cluster resides.
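
To show how Tables 5 to 7 nest, here is a hedged sketch that assembles a DataSource for a GaussDB(DWS) table import. Which SourceInfo fields the service actually requires for each source type is an assumption drawn from the field descriptions; the function name and all example values are hypothetical.

```python
def dws_data_source(ip, port, database_name, table_name,
                    user_name, user_password, column_map=None):
    """Assemble a DataSource dict for a GaussDB(DWS) table import."""
    if not user_name or not user_password:
        # Table 7 marks both fields as mandatory for GaussDB(DWS) data.
        raise ValueError("user_name and user_password are required for GaussDB(DWS)")
    source = {
        "data_type": 1,  # 1: GaussDB(DWS), per Table 5
        "source_info": {
            "ip": ip,
            "port": str(port),  # port is typed as String in Table 7
            "database_name": database_name,
            "table_name": table_name,
            "user_name": user_name,
            "user_password": user_password,
        },
    }
    if column_map:
        # Each SchemaMap pairs a source column with a destination column.
        source["schema_maps"] = [
            {"src_name": src, "dest_name": dest} for src, dest in column_map.items()
        ]
    return source


example = dws_data_source("192.0.2.10", 8000, "demo_db", "demo_table",
                          "demo_user", "demo_password", {"col_a": "feature_a"})
print(example["schema_maps"])
```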

**Table 8** SampleLabel¶
Parameter	Mandatory	Type	Description
annotated_by	No	String	Video labeling method, which is used to distinguish whether a video is labeled manually or automatically. Options: human: manual labeling; auto: automatic labeling
id	No	String	Label ID.
name	No	String	Label name.
property	No	SampleLabelProperty object	Attribute key-value pair of the sample label, such as the object shape and shape feature.
score	No	Float	Confidence.
type	No	Integer	Label type. Options: 0: image classification; 1: object detection; 3: image segmentation; 100: text classification; 101: named entity recognition; 102: text triplet relationship; 103: text triplet entity; 200: sound classification; 201: speech content; 202: speech paragraph labeling; 600: video labeling

**Table 9** SampleLabelProperty¶
Parameter	Mandatory	Type	Description
@modelarts:content	No	String	Speech text content, which is a default attribute dedicated to the speech label (including the speech content and speech start and end points).
@modelarts:end_index	No	Integer	End position of the text, which is a default attribute dedicated to the named entity label. The end position does not include the character corresponding to the value of end_index. Example: If the text is "Barack Hussein Obama II (born August 4, 1961) is an attorney and politician.", start_index and end_index of Barack Hussein Obama II are 0 and 23, respectively. If the text is "Hope is the thing with feathers", start_index and end_index of Hope are 0 and 4, respectively.
@modelarts:end_time	No	String	Speech end time, which is a default attribute dedicated to the speech start/end point label, in the format of hh:mm:ss.SSS. (hh indicates hour; mm indicates minute; ss indicates second; and SSS indicates millisecond.)
@modelarts:feature	No	Object	Shape feature, which is a default attribute dedicated to the object detection label, of type List. The upper left corner of the image is used as the coordinate origin [0, 0]. Each coordinate point is represented by [x, y], where x indicates the horizontal coordinate and y indicates the vertical coordinate (both x and y are >=0). The format of each shape is as follows: bndbox: consists of two points, for example, [[0,10],[50,95]]. The upper left vertex of the rectangle is the first point, and the lower right vertex is the second point. That is, the x-coordinate of the first point must be less than the x-coordinate of the second point, and the y-coordinate of the first point must be less than the y-coordinate of the second point. polygon: consists of multiple points that are connected in sequence to form a polygon, for example, [[0,100],[50,95],[10,60],[500,400]]. circle: consists of the center and radius, for example, [[100,100],[50]]. line: consists of two points, for example, [[0,100],[50,95]]. The first point is the start point, and the second point is the end point. dashed: consists of two points, for example, [[0,100],[50,95]]. The first point is the start point, and the second point is the end point. point: consists of one point, for example, [[0,100]]. polyline: consists of multiple points, for example, [[0,100],[50,95],[10,60],[500,400]].
@modelarts:from	No	String	ID of the head entity in the triplet relationship label, which is a default attribute dedicated to the triplet relationship label.
@modelarts:hard	No	String	Whether the sample is labeled as a hard sample, which is a default attribute. Options: 0/false: not a hard sample; 1/true: hard sample
@modelarts:hard_coefficient	No	String	Coefficient of difficulty of each label level, which is a default attribute. The value range is [0,1].
@modelarts:hard_reasons	No	String	Reasons that the sample is a hard sample, which is a default attribute. Use a hyphen (-) to separate every two hard sample reason IDs, for example, 3-20-21-19. Options: 0: No target objects are identified. 1: The confidence is low. 2: The clustering result based on the training dataset is inconsistent with the prediction result. 3: The prediction result is greatly different from the data of the same type in the training dataset. 4: The prediction results of multiple consecutive similar images are inconsistent. 5: There is a large offset between the image resolution and the feature distribution of the training dataset. 6: There is a large offset between the aspect ratio of the image and the feature distribution of the training dataset. 7: There is a large offset between the brightness of the image and the feature distribution of the training dataset. 8: There is a large offset between the saturation of the image and the feature distribution of the training dataset. 9: There is a large offset between the color richness of the image and the feature distribution of the training dataset. 10: There is a large offset between the definition of the image and the feature distribution of the training dataset. 11: There is a large offset between the number of frames of the image and the feature distribution of the training dataset. 12: There is a large offset between the standard deviation of area of image frames and the feature distribution of the training dataset. 13: There is a large offset between the aspect ratio of image frames and the feature distribution of the training dataset. 14: There is a large offset between the area portion of image frames and the feature distribution of the training dataset. 15: There is a large offset between the edge of image frames and the feature distribution of the training dataset. 16: There is a large offset between the brightness of image frames and the feature distribution of the training dataset. 17: There is a large offset between the definition of image frames and the feature distribution of the training dataset. 18: There is a large offset between the stack of image frames and the feature distribution of the training dataset. 19: The data enhancement result based on GaussianBlur is inconsistent with the prediction result of the original image. 20: The data enhancement result based on fliplr is inconsistent with the prediction result of the original image. 21: The data enhancement result based on Crop is inconsistent with the prediction result of the original image. 22: The data enhancement result based on flipud is inconsistent with the prediction result of the original image. 23: The data enhancement result based on scale is inconsistent with the prediction result of the original image. 24: The data enhancement result based on translate is inconsistent with the prediction result of the original image. 25: The data enhancement result based on shear is inconsistent with the prediction result of the original image. 26: The data enhancement result based on superpixels is inconsistent with the prediction result of the original image. 27: The data enhancement result based on sharpen is inconsistent with the prediction result of the original image. 28: The data enhancement result based on add is inconsistent with the prediction result of the original image. 29: The data enhancement result based on invert is inconsistent with the prediction result of the original image. 30: The data is predicted to be abnormal.
@modelarts:shape	No	String	Object shape, which is a default attribute dedicated to the object detection label and is left empty by default. Options: bndbox: rectangle; polygon: polygon; circle: circle; line: straight line; dashed: dotted line; point: point; polyline: polyline
@modelarts:source	No	String	Speech source, which is a default attribute dedicated to the speech start/end point label and can be set to a speaker or narrator.
@modelarts:start_index	No	Integer	Start position of the text, which is a default attribute dedicated to the named entity label. The start value begins from 0, including the character corresponding to the value of start_index.
@modelarts:start_time	No	String	Speech start time, which is a default attribute dedicated to the speech start/end point label, in the format of hh:mm:ss.SSS. (hh indicates hour; mm indicates minute; ss indicates second; and SSS indicates millisecond.)
@modelarts:to	No	String	ID of the tail entity in the triplet relationship label, which is a default attribute dedicated to the triplet relationship label.
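
Several conventions in Table 9 are easy to get wrong by one: the exclusive `end_index`, the corner ordering of `bndbox`, the `hh:mm:ss.SSS` time format, and the hyphen-separated `hard_reasons` string. The helpers below are a sketch of those rules only; their names and the label names in the usage lines are hypothetical.

```python
def entity_span(text: str, entity: str):
    """Return (start_index, end_index); end_index is exclusive, as in Table 9."""
    start = text.index(entity)
    return start, start + len(entity)


def check_bndbox(feature):
    """Validate [[x1, y1], [x2, y2]]: upper left point first, lower right second."""
    (x1, y1), (x2, y2) = feature
    if min(x1, y1, x2, y2) < 0:
        raise ValueError("coordinates must be >= 0")
    if not (x1 < x2 and y1 < y2):
        raise ValueError("first point must be above and to the left of the second")
    return feature


def format_time(total_ms: int) -> str:
    """Render a millisecond offset as hh:mm:ss.SSS."""
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


def join_hard_reasons(reason_ids) -> str:
    """Label-level hard_reasons is a hyphen-separated string of reason IDs."""
    return "-".join(str(reason_id) for reason_id in reason_ids)


text = "Hope is the thing with feathers"
start, end = entity_span(text, "Hope")
ner_label = {
    "name": "word",  # hypothetical label name
    "type": 101,     # 101: named entity recognition
    "property": {"@modelarts:start_index": start, "@modelarts:end_index": end},
}
box_label = {
    "name": "cat",   # hypothetical label name
    "type": 1,       # 1: object detection
    "property": {
        "@modelarts:shape": "bndbox",
        "@modelarts:feature": check_bndbox([[0, 10], [50, 95]]),
    },
}
print(text[start:end], format_time(3_723_004), join_hard_reasons([3, 20, 21, 19]))
```

Because `end_index` is exclusive, the pair maps directly onto a Python slice: `text[start_index:end_index]` yields the entity text.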

**Table 10** SampleMetadata¶
Parameter	Mandatory	Type	Description
@modelarts:import_origin	No	Integer	Sample source, which is a built-in attribute.
@modelarts:hard	No	Double	Whether the sample is labeled as a hard sample, which is a default attribute. Options: 0: non-hard sample; 1: hard sample
@modelarts:hard_coefficient	No	Double	Coefficient of difficulty of each sample level, which is a default attribute. The value range is [0,1].
@modelarts:hard_reasons	No	Array of integers	ID of a hard sample reason, which is a default attribute. Options: 0: No object is identified. 1: The confidence is low. 2: The clustering result based on the training dataset is inconsistent with the prediction result. 3: The prediction result is greatly different from the data of the same type in the training dataset. 4: The prediction results of multiple consecutive similar images are inconsistent. 5: There is a large offset between the image resolution and the feature distribution of the training dataset. 6: There is a large offset between the aspect ratio of the image and the feature distribution of the training dataset. 7: There is a large offset between the brightness of the image and the feature distribution of the training dataset. 8: There is a large offset between the saturation of the image and the feature distribution of the training dataset. 9: There is a large offset between the color richness of the image and the feature distribution of the training dataset. 10: There is a large offset between the definition of the image and the feature distribution of the training dataset. 11: There is a large offset between the number of frames of the image and the feature distribution of the training dataset. 12: There is a large offset between the standard deviation of area of image frames and the feature distribution of the training dataset. 13: There is a large offset between the aspect ratio of image frames and the feature distribution of the training dataset. 14: There is a large offset between the area portion of image frames and the feature distribution of the training dataset. 15: There is a large offset between the edge of image frames and the feature distribution of the training dataset. 16: There is a large offset between the brightness of image frames and the feature distribution of the training dataset. 17: There is a large offset between the definition of image frames and the feature distribution of the training dataset. 18: There is a large offset between the stack of image frames and the feature distribution of the training dataset. 19: The data enhancement result based on GaussianBlur is inconsistent with the prediction result of the original image. 20: The data enhancement result based on fliplr is inconsistent with the prediction result of the original image. 21: The data enhancement result based on Crop is inconsistent with the prediction result of the original image. 22: The data enhancement result based on flipud is inconsistent with the prediction result of the original image. 23: The data enhancement result based on scale is inconsistent with the prediction result of the original image. 24: The data enhancement result based on translate is inconsistent with the prediction result of the original image. 25: The data enhancement result based on shear is inconsistent with the prediction result of the original image. 26: The data enhancement result based on superpixels is inconsistent with the prediction result of the original image. 27: The data enhancement result based on sharpen is inconsistent with the prediction result of the original image. 28: The data enhancement result based on add is inconsistent with the prediction result of the original image. 29: The data enhancement result based on invert is inconsistent with the prediction result of the original image. 30: The data is predicted to be abnormal.
@modelarts:size	No	Array of objects	Image size (width, height, and depth of the image), which is a default attribute, with type of List<Integer>. In the list, the first number indicates the width (pixels), the second number indicates the height (pixels), and the third number indicates the depth (the depth can be left blank and the default value is 3). For example, [100,200,3] and [100,200] are both valid. Note: This parameter is mandatory only when the sample label list contains the object detection label.
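
The `@modelarts:size` rule in Table 10 (depth optional, size mandatory only alongside object detection labels) can be sketched as a pre-upload check. Function names are hypothetical, and rejecting non-positive dimensions is an assumption that this section does not state.

```python
OBJECT_DETECTION = 1  # label type code for object detection, per Table 8


def image_size(width: int, height: int, depth=None):
    """Build the @modelarts:size list; depth may be omitted (service default: 3)."""
    if width <= 0 or height <= 0:
        # Assumption: zero or negative dimensions are never meaningful.
        raise ValueError("width and height must be positive")
    return [width, height] if depth is None else [width, height, depth]


def check_size_present(sample: dict) -> dict:
    """Raise if an object detection sample lacks @modelarts:size."""
    has_detection = any(
        label.get("type") == OBJECT_DETECTION for label in sample.get("labels", [])
    )
    metadata = sample.get("metadata", {})
    if has_detection and "@modelarts:size" not in metadata:
        raise ValueError("@modelarts:size is mandatory with object detection labels")
    return sample


sample = check_size_present({
    "name": "2.jpg",
    "labels": [{"name": "cat", "type": OBJECT_DETECTION}],  # hypothetical label
    "metadata": {"@modelarts:size": image_size(100, 200, 3)},
})
print(sample["metadata"])
```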

Response Parameters¶

Status code: 200

**Table 11** Response body parameters¶
Parameter	Type	Description
error_code	String	Error code.
error_msg	String	Error message.
results	Array of UploadSampleResp objects	Response list for adding samples in batches.
success	Boolean	Whether the operation is successful. Options: true: successful; false: failed

**Table 12** UploadSampleResp¶
Parameter	Type	Description
error_code	String	Error code.
error_msg	String	Error message.
info	String	Description.
name	String	Name of a sample file.
success	Boolean	Whether the operation is successful. Options: true: successful; false: failed

Example Requests¶

Adding Samples in Batches

{
  "samples" : [ {
    "name" : "2.jpg",
    "data" : "/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAAgGBgcGBQgHBwcJCQgKDBQNDAsLDBkSEw8UHRofHh0aHBwgJC4nICIsIxwcKDcpLDAxNDQ0Hyc5PTgyPC4zNDL/2wBDAQkJCQwLDBgNDRgyIRwhMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjL/wAARCAA1AJUDASIAAhEBAxEB/8QAHwAAAQUBAQEBAQEAAAAAAAAAAAECAwQFBgcICQoL/8QAtRAAAgEDAwIEAwUFBAQAAAF9AQIDAAQRBRIhMUEGE1FhByJxFDKBkaEII0KxwRVS0fAkM2JyggkKFhcYGRolJicoKSo0NTY3ODk6Q0RFRkdISUpTVFVWV1hZWmNkZWZnaGlqc3R1dnd4eXqDhIWGh4iJipKTlJWWl5iZmqKjpKWmp6ipqrKztLW2t7i5usLDxMXGx8jJytLT1NXW19jZ2uHi4+Tl5ufo6erx8vP09fb3+Pn6/8QAHwEAAwEBAQEBAQEBAQAAAAAAAAECAwQFBgcICQoL"
  } ]
}
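
The `data` value in the request above begins with `/9j/`, which is how the Base64 encoding of a JPEG file starts, so the sketch below assumes that `data` carries the Base64 text of the file bytes. The helper name is hypothetical, and a four-byte JPEG header stands in for a real file so that the example is self-contained.

```python
import base64
import json


def make_sample(name: str, raw: bytes, sample_type: int = 0) -> dict:
    """Build one Sample entry; assumes data is the Base64 text of the file bytes."""
    return {
        "name": name,
        "data": base64.b64encode(raw).decode("ascii"),
        "sample_type": sample_type,  # 0: image
    }


# Stand-in for file contents: the first four bytes of a JPEG file.
jpeg_header = bytes([0xFF, 0xD8, 0xFF, 0xE0])

body = {
    "final_annotation": True,  # import labels to the labeled dataset
    "samples": [make_sample("2.jpg", jpeg_header)],
}
payload = json.dumps(body)
print(payload)
```

For a real upload, `raw` would come from reading the file in binary mode, for example `pathlib.Path("2.jpg").read_bytes()`. Sending the payload requires authentication headers that this section does not describe.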

Example Responses¶

Status code: 200

{
  "success" : true,
  "results" : [ {
    "success" : true,
    "name" : "/test-obs/classify/input/animals/2.jpg",
    "info" : "960585877c92d63911ba555ab3129d36"
  } ]
}
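
Because the response reports success both for the whole batch and for each sample, a caller may want to collect the failed entries. The following sketch uses only the fields in Tables 11 and 12; the function name is hypothetical, and the error code in the second response is a made-up placeholder, not a real service error code.

```python
import json


def failed_samples(response: dict):
    """Return (name, error_code, error_msg) for every sample that failed."""
    return [
        (item.get("name"), item.get("error_code"), item.get("error_msg"))
        for item in response.get("results", [])
        if not item.get("success")
    ]


# The example response shown above.
ok_response = json.loads("""
{
  "success" : true,
  "results" : [ {
    "success" : true,
    "name" : "/test-obs/classify/input/animals/2.jpg",
    "info" : "960585877c92d63911ba555ab3129d36"
  } ]
}
""")

# Hypothetical failure; the error code and message are placeholders.
bad_response = {
    "success": False,
    "results": [
        {"success": False, "name": "3.jpg",
         "error_code": "EXAMPLE.0001", "error_msg": "placeholder message"},
    ],
}

print(failed_samples(ok_response))
print(failed_samples(bad_response))
```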

Status Codes¶

Status Code	Description
200	OK
401	Unauthorized
403	Forbidden
404	Not Found

Error Codes¶

See Error Codes.

last updated: 2024-11-28 19:02 UTC - commit: 7e602c6c50500da4fb13dea67589040fd3f7deb8