Querying the Sample List

Function

This API is used to query the sample list by page.

URI

GET /v2/{project_id}/datasets/{dataset_id}/data-annotations/samples

Table 1 Path Parameters

Parameter

Mandatory

Type

Description

dataset_id

Yes

String

Dataset ID.

project_id

Yes

String

Project ID. For details about how to obtain the project ID, see Obtaining a Project ID.

Table 2 Query Parameters

Parameter

Mandatory

Type

Description

email

No

String

Email address of a labeling team member.

high_score

No

String

Upper confidence limit. The default value is 1.

label_name

No

String

Label name.

label_type

No

Integer

Labeling type. The options are as follows:

  • 0: image classification

  • 1: object detection

  • 100: text classification

  • 101: named entity recognition

  • 102: text triplet

  • 200: sound classification

  • 201: speech content

  • 202: speech paragraph labeling

  • 400: table dataset

  • 600: video labeling

  • 900: custom format

limit

No

Integer

Maximum number of records returned on each page. The value ranges from 1 to 100. The default value is 10.

locale

No

String

Language. The options are as follows:

  • en-us: English (default value)

low_score

No

String

Lower confidence limit. The default value is 0.

offset

No

Integer

Start page of the paging list. The default value is 0.

order

No

String

Sorting sequence of the query. The options are as follows:

  • asc: ascending order

  • desc: descending order (default value)

preview

No

Boolean

Whether to support preview. The options are as follows:

  • true: Preview is supported.

  • false: Preview is not supported.

process_parameter

No

String

Image resizing setting, which is the same as the OBS resizing setting. For example, image/resize,m_lfit,h_200 indicates that the target image is resized proportionally and the height is set to 200 pixels.

sample_state

No

String

Sample status. The options are as follows:

  • ALL: labeled

  • NONE: unlabeled

  • UNCHECK: pending acceptance

  • ACCEPTED: accepted

  • REJECTED: rejected

  • UNREVIEWED: pending review

  • REVIEWED: reviewed

  • WORKFORCE_SAMPLED: sampled

  • WORKFORCE_SAMPLED_UNCHECK: sampling unchecked

  • WORKFORCE_SAMPLED_CHECKED: sampling checked

  • WORKFORCE_SAMPLED_ACCEPTED: sampling accepted

  • WORKFORCE_SAMPLED_REJECTED: sampling rejected

  • AUTO_ANNOTATION: to be confirmed

sample_type

No

Integer

Sample file type. The options are as follows:

  • 0: image

  • 1: text

  • 2: audio

  • 4: table

  • 6: video

  • 9: custom format (default value)

search_conditions

No

String

Multi-dimensional search condition, which must be URL-encoded. Multiple search conditions are combined with AND.

version_id

No

String

Dataset version ID.
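
The constraints in Table 2 can be checked on the client before a request is sent. The following is a minimal sketch and not part of the API: the helper name build_sample_query is hypothetical, and it enforces only the rules stated in the table (limit from 1 to 100, order either asc or desc). It assumes that values such as search_conditions are passed unencoded, so that URL encoding is applied exactly once.

```python
from urllib.parse import urlencode


def build_sample_query(limit=10, offset=0, order="desc", **filters):
    """Return a URL-encoded query string for the sample list API.

    Hypothetical helper: only rules documented in Table 2 are checked.
    """
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    if order not in ("asc", "desc"):
        raise ValueError("order must be 'asc' or 'desc'")

    params = {"limit": limit, "offset": offset, "order": order}
    for name, value in filters.items():
        if value is None:
            # Omit unset filters so that the documented defaults apply.
            continue
        if isinstance(value, bool):
            # Send booleans such as preview as lowercase true/false.
            value = "true" if value else "false"
        params[name] = value
    return urlencode(params)
```

For example, build_sample_query(limit=50, label_name="car", sample_state="NONE") returns limit=50&offset=0&order=desc&label_name=car&sample_state=NONE.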

Request Parameters

None

Response Parameters

Status code: 200

Table 3 Response body parameters

Parameter

Type

Description

sample_count

Integer

Number of samples.

samples

Array of DescribeSampleResp objects

Sample list.
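
Because the sample list is returned by page, a client usually loops over offset until no more samples are returned. The sketch below illustrates one way to do this under stated assumptions: fetch_page is a caller-supplied, hypothetical function that sends the request and returns the decoded response body, and offset is treated as a page index because Table 2 describes it as the start page. Verify both assumptions against the actual service behavior.

```python
def iter_samples(fetch_page, limit=10):
    """Yield samples page by page until a short or empty page is returned.

    fetch_page(offset=..., limit=...) is a hypothetical callable that
    returns the response body as a dict with a "samples" list.
    """
    offset = 0
    while True:
        body = fetch_page(offset=offset, limit=limit)
        samples = body.get("samples") or []
        for sample in samples:
            yield sample
        if len(samples) < limit:
            # A page shorter than limit is taken to be the last page.
            break
        # Assumption: offset counts pages, not individual records.
        offset += 1
```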

Table 4 DescribeSampleResp

Parameter

Type

Description

check_accept

Boolean

Whether the acceptance is passed, which is used for team labeling. The options are as follows:

  • true: The acceptance is passed.

  • false: The acceptance is not passed.

check_comment

String

Acceptance comment, which is used for team labeling.

check_score

String

Acceptance score, which is used for team labeling.

deletion_reasons

Array of strings

Reason for deleting a sample, which is used for healthcare.

hard_details

Map<String,HardDetail>

Details about hard samples, including the description, causes, and handling suggestions.

labelers

Array of Worker objects

List of labelers assigned to the sample. It records the team members to whom the sample is allocated for team labeling.

labels

Array of SampleLabel objects

Sample label list.

metadata

SampleMetadata object

Key-value pair of the sample metadata attribute.

review_accept

Boolean

Whether to accept the review, which is used for team labeling. The options are as follows:

  • true: accepted

  • false: rejected

review_comment

String

Review comment, which is used for team labeling.

review_score

String

Review score, which is used for team labeling.

sample_data

Array of strings

Sample data list.

sample_dir

String

Sample path.

sample_id

String

Sample ID.

sample_name

String

Sample name.

sample_size

Long

Sample size or text length, in bytes.

sample_status

String

Sample status. The options are as follows:

  • ALL: labeled

  • NONE: unlabeled

  • UNCHECK: pending acceptance

  • ACCEPTED: accepted

  • REJECTED: rejected

  • UNREVIEWED: pending review

  • REVIEWED: reviewed

  • WORKFORCE_SAMPLED: sampled

  • WORKFORCE_SAMPLED_UNCHECK: sampling unchecked

  • WORKFORCE_SAMPLED_CHECKED: sampling checked

  • WORKFORCE_SAMPLED_ACCEPTED: sampling accepted

  • WORKFORCE_SAMPLED_REJECTED: sampling rejected

  • AUTO_ANNOTATION: to be confirmed

sample_time

Long

Sample time, which is the time when the sample file in OBS was last modified.

sample_type

Integer

Sample type. The options are as follows:

  • 0: image

  • 1: text

  • 2: speech

  • 4: table

  • 6: video

  • 9: custom format

score

String

Comprehensive score, which is used for team labeling.

source

String

Source address of sample data.

sub_sample_url

String

Subsample URL, which is used for healthcare.

worker_id

String

ID of a labeling team member, which is used for team labeling.

Table 5 HardDetail

Parameter

Type

Description

alo_name

String

Alias.

id

Integer

Reason ID.

reason

String

Reason description.

suggestion

String

Handling suggestion.

Table 6 Worker

Parameter

Type

Description

create_time

Long

Creation time.

description

String

Labeling team member description. The value contains 0 to 256 characters and does not support the following special characters: ^!<>=&"'

email

String

Email address of a labeling team member.

role

Integer

Role. The options are as follows:

  • 0: labeling personnel

  • 1: reviewer

  • 2: team administrator

  • 3: dataset owner

status

Integer

Current login status of a labeling team member. The options are as follows:

  • 0: The invitation email has not been sent.

  • 1: The invitation email has been sent but the user has not logged in.

  • 2: The user has logged in.

  • 3: The labeling team member has been deleted.

update_time

Long

Update time.

worker_id

String

ID of a labeling team member.

workforce_id

String

ID of a labeling team.

Table 7 SampleLabel

Parameter

Type

Description

annotated_by

String

Video labeling method, which is used to distinguish whether a video is labeled manually or automatically. The options are as follows:

  • human: manual labeling

  • auto: automatic labeling

id

String

Label ID.

name

String

Label name.

property

SampleLabelProperty object

Attribute key-value pair of the sample label, such as the object shape and shape feature.

score

Float

Confidence.

type

Integer

Label type. The options are as follows:

  • 0: image classification

  • 1: object detection

  • 100: text classification

  • 101: named entity recognition

  • 102: text triplet relationship

  • 103: text triplet entity

  • 200: speech classification

  • 201: speech content

  • 202: speech paragraph labeling

  • 600: video classification

Table 8 SampleLabelProperty

Parameter

Type

Description

@modelarts:content

String

Speech text content, which is a default attribute dedicated to the speech label (including the speech content and speech start and end points).

@modelarts:end_index

Integer

End position of the text, which is a default attribute dedicated to the named entity label. The end position does not include the character corresponding to the value of end_index. Examples are as follows.

  • If the text content is "Barack Hussein Obama II (born August 4, 1961) is an American attorney and politician.", the start_index and end_index values of "Barack Hussein Obama II" are 0 and 23, respectively.

  • If the text content is "By the end of 2018, the company has more than 100 employees.", the start_index and end_index values of "By the end of 2018" are 0 and 18, respectively.

@modelarts:end_time

String

Speech end time, which is a default attribute dedicated to the speech start/end point label, in the format of hh:mm:ss.SSS. (hh indicates hour; mm indicates minute; ss indicates second; and SSS indicates millisecond.)

@modelarts:feature

Object

Shape feature, which is a default attribute dedicated to the object detection label and is of the List type. The upper left corner of an image is used as the coordinate origin [0,0]. Each coordinate point is represented by [x, y]. x indicates the horizontal coordinate, and y indicates the vertical coordinate (both x and y are greater than or equal to 0). The format of each shape is as follows:

  • bndbox: consists of two points, for example, [[0,10],[50,95]]. The first point is located at the upper left corner of the rectangle and the second point is located at the lower right corner of the rectangle. That is, both the X coordinate and the Y coordinate of the first point must be smaller than those of the second point.

  • polygon: consists of multiple points that are connected in sequence to form a polygon, for example, [[0,100],[50,95],[10,60],[500,400]].

  • circle: consists of the center point and radius, for example, [[100,100],[50]].

  • line: consists of two points, for example, [[0,100],[50,95]]. The first point is the start point, and the second point is the end point.

  • dashed: consists of two points, for example, [[0,100],[50,95]]. The first point is the start point, and the second point is the end point.

  • point: consists of one point, for example, [[0,100]].

  • polyline: consists of multiple points, for example, [[0,100],[50,95],[10,60],[500,400]].

@modelarts:from

String

ID of the head entity in the triplet relationship label, which is a default attribute dedicated to the triplet relationship label.

@modelarts:hard

String

Whether the sample is labeled as a hard sample, which is a default attribute. The options are as follows:

  • 0/false: not a hard sample

  • 1/true: hard sample

@modelarts:hard_coefficient

String

Coefficient of difficulty of each label level, which is a default attribute. The value range is [0,1].

@modelarts:hard_reasons

String

Reasons that the sample is a hard sample, which is a default attribute. Use a hyphen (-) to separate every two hard sample reason IDs, for example, 3-20-21-19. The options are as follows:

  • 0: No target objects are identified.

  • 1: The confidence is low.

  • 2: The clustering result based on the training dataset is inconsistent with the prediction result.

  • 3: The prediction result is greatly different from the data of the same type in the training dataset.

  • 4: The prediction results of multiple consecutive similar images are inconsistent.

  • 5: There is a large offset between the image resolution and the feature distribution of the training dataset.

  • 6: There is a large offset between the aspect ratio of the image and the feature distribution of the training dataset.

  • 7: There is a large offset between the brightness of the image and the feature distribution of the training dataset.

  • 8: There is a large offset between the saturation of the image and the feature distribution of the training dataset.

  • 9: There is a large offset between the color richness of the image and the feature distribution of the training dataset.

  • 10: There is a large offset between the definition of the image and the feature distribution of the training dataset.

  • 11: There is a large offset between the number of frames of the image and the feature distribution of the training dataset.

  • 12: There is a large offset between the standard deviation of area of image frames and the feature distribution of the training dataset.

  • 13: There is a large offset between the aspect ratio of image frames and the feature distribution of the training dataset.

  • 14: There is a large offset between the area portion of image frames and the feature distribution of the training dataset.

  • 15: There is a large offset between the edge of image frames and the feature distribution of the training dataset.

  • 16: There is a large offset between the brightness of image frames and the feature distribution of the training dataset.

  • 17: There is a large offset between the definition of image frames and the feature distribution of the training dataset.

  • 18: There is a large offset between the stack of image frames and the feature distribution of the training dataset.

  • 19: The data enhancement result based on GaussianBlur is inconsistent with the prediction result of the original image.

  • 20: The data enhancement result based on fliplr is inconsistent with the prediction result of the original image.

  • 21: The data enhancement result based on Crop is inconsistent with the prediction result of the original image.

  • 22: The data enhancement result based on flipud is inconsistent with the prediction result of the original image.

  • 23: The data enhancement result based on scale is inconsistent with the prediction result of the original image.

  • 24: The data enhancement result based on translate is inconsistent with the prediction result of the original image.

  • 25: The data enhancement result based on shear is inconsistent with the prediction result of the original image.

  • 26: The data enhancement result based on superpixels is inconsistent with the prediction result of the original image.

  • 27: The data enhancement result based on sharpen is inconsistent with the prediction result of the original image.

  • 28: The data enhancement result based on add is inconsistent with the prediction result of the original image.

  • 29: The data enhancement result based on invert is inconsistent with the prediction result of the original image.

  • 30: The data is predicted to be abnormal.

@modelarts:shape

String

Object shape, which is a default attribute dedicated to the object detection label and is left empty by default. The options are as follows:

  • bndbox: rectangle

  • polygon: polygon

  • circle: circle

  • line: straight line

  • dashed: dotted line

  • point: point

  • polyline: polyline

@modelarts:source

String

Speech source, which is a default attribute dedicated to the speech start/end point label and can be set to a speaker or narrator.

@modelarts:start_index

Integer

Start position of the text, which is a default attribute dedicated to the named entity label. The start value begins from 0, including the character corresponding to the value of start_index.

@modelarts:start_time

String

Speech start time, which is a default attribute dedicated to the speech start/end point label, in the format of hh:mm:ss.SSS. (hh indicates hour; mm indicates minute; ss indicates second; and SSS indicates millisecond.)

@modelarts:to

String

ID of the tail entity in the triplet relationship label, which is a default attribute dedicated to the triplet relationship label.
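
Several attributes in Table 8 use compact string or index conventions. The helpers below are an illustrative sketch only; their names are hypothetical and they are not part of any SDK. They show that end_index is exclusive (as in a Python slice), how a hyphen-separated @modelarts:hard_reasons value can be split into reason IDs, and how an hh:mm:ss.SSS time can be converted to milliseconds.

```python
def entity_text(text, start_index, end_index):
    """Return the labeled span; end_index is exclusive, like a Python slice."""
    return text[start_index:end_index]


def parse_hard_reasons(value):
    """Split a hyphen-separated string such as '3-20-21-19' into reason IDs."""
    return [int(part) for part in value.split("-") if part]


def speech_time_to_ms(value):
    """Convert a time in hh:mm:ss.SSS format to milliseconds."""
    hours, minutes, seconds = value.split(":")
    whole, _, millis = seconds.partition(".")
    total_seconds = (int(hours) * 60 + int(minutes)) * 60 + int(whole)
    # Assumes exactly three millisecond digits, as the format specifies.
    return total_seconds * 1000 + int(millis or 0)
```

With the first example text from the table, entity_text(text, 0, 23) returns "Barack Hussein Obama II", and parse_hard_reasons("3-20-21-19") returns [3, 20, 21, 19].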

Table 9 SampleMetadata

Parameter

Type

Description

@modelarts:hard

Double

Whether the sample is labeled as a hard sample, which is a default attribute. The options are as follows:

  • 0: non-hard sample

  • 1: hard sample

@modelarts:hard_coefficient

Double

Coefficient of difficulty of each sample level, which is a default attribute. The value range is [0,1].

@modelarts:hard_reasons

Array of integers

ID of a hard sample reason, which is a default attribute. The options are as follows:

  • 0: No target objects are identified.

  • 1: The confidence is low.

  • 2: The clustering result based on the training dataset is inconsistent with the prediction result.

  • 3: The prediction result is greatly different from the data of the same type in the training dataset.

  • 4: The prediction results of multiple consecutive similar images are inconsistent.

  • 5: There is a large offset between the image resolution and the feature distribution of the training dataset.

  • 6: There is a large offset between the aspect ratio of the image and the feature distribution of the training dataset.

  • 7: There is a large offset between the brightness of the image and the feature distribution of the training dataset.

  • 8: There is a large offset between the saturation of the image and the feature distribution of the training dataset.

  • 9: There is a large offset between the color richness of the image and the feature distribution of the training dataset.

  • 10: There is a large offset between the definition of the image and the feature distribution of the training dataset.

  • 11: There is a large offset between the number of frames of the image and the feature distribution of the training dataset.

  • 12: There is a large offset between the standard deviation of area of image frames and the feature distribution of the training dataset.

  • 13: There is a large offset between the aspect ratio of image frames and the feature distribution of the training dataset.

  • 14: There is a large offset between the area portion of image frames and the feature distribution of the training dataset.

  • 15: There is a large offset between the edge of image frames and the feature distribution of the training dataset.

  • 16: There is a large offset between the brightness of image frames and the feature distribution of the training dataset.

  • 17: There is a large offset between the definition of image frames and the feature distribution of the training dataset.

  • 18: There is a large offset between the stack of image frames and the feature distribution of the training dataset.

  • 19: The data enhancement result based on GaussianBlur is inconsistent with the prediction result of the original image.

  • 20: The data enhancement result based on fliplr is inconsistent with the prediction result of the original image.

  • 21: The data enhancement result based on Crop is inconsistent with the prediction result of the original image.

  • 22: The data enhancement result based on flipud is inconsistent with the prediction result of the original image.

  • 23: The data enhancement result based on scale is inconsistent with the prediction result of the original image.

  • 24: The data enhancement result based on translate is inconsistent with the prediction result of the original image.

  • 25: The data enhancement result based on shear is inconsistent with the prediction result of the original image.

  • 26: The data enhancement result based on superpixels is inconsistent with the prediction result of the original image.

  • 27: The data enhancement result based on sharpen is inconsistent with the prediction result of the original image.

  • 28: The data enhancement result based on add is inconsistent with the prediction result of the original image.

  • 29: The data enhancement result based on invert is inconsistent with the prediction result of the original image.

  • 30: The data is predicted to be abnormal.

@modelarts:size

Array of objects

Image size (width, height, and depth of the image), which is a default attribute and is of the List type. In the list, the first number indicates the width (pixels), the second number indicates the height (pixels), and the third number indicates the depth (the depth can be left blank and the default value is 3). For example, [100,200,3] and [100,200] are both valid. Note: This parameter is mandatory only when the sample label list contains the object detection label.
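
The @modelarts:size rule (two or three numbers, with depth defaulting to 3) can be normalized as in the following sketch. The function name parse_size is hypothetical and is used only for illustration.

```python
def parse_size(size):
    """Return (width, height, depth) from an @modelarts:size list."""
    if len(size) == 2:
        width, height = size
        depth = 3  # Documented default when the depth is left blank.
    elif len(size) == 3:
        width, height, depth = size
    else:
        raise ValueError("size must contain two or three numbers")
    return int(width), int(height), int(depth)
```

Both parse_size([100, 200, 3]) and parse_size([100, 200]) return (100, 200, 3).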

Example Requests

Querying the Sample List by Page

GET https://{endpoint}/v2/{project_id}/datasets/{dataset_id}/data-annotations/samples
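
The request URL above can be assembled as in the following sketch. It rests on assumptions: the function name samples_url is hypothetical, the endpoint in the usage line is a placeholder, and authentication headers are omitted because this section does not describe them.

```python
from urllib.parse import quote, urlencode


def samples_url(endpoint, project_id, dataset_id, **query):
    """Build the GET URL for querying the sample list (hypothetical helper)."""
    path = "/v2/{}/datasets/{}/data-annotations/samples".format(
        quote(project_id, safe=""), quote(dataset_id, safe="")
    )
    url = "https://{}{}".format(endpoint, path)
    if query:
        # Optional query parameters from Table 2, for example limit or offset.
        url += "?" + urlencode(query)
    return url
```

For example, samples_url("modelarts.example.com", "my-project", "my-dataset", limit=10, offset=0) returns https://modelarts.example.com/v2/my-project/datasets/my-dataset/data-annotations/samples?limit=10&offset=0.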

Example Responses

Status code: 200

OK

{
  "sample_count" : 2,
  "samples" : [ {
    "sample_id" : "012f99f3cf405860130b6ed2350c2228",
    "sample_type" : 0,
    "labels" : [ {
      "name" : "car",
      "type" : 0,
      "property" : { }
    } ],
    "source" : "https://test-obs.obs.xxx.com:443/image/aifood/%E5%86%B0%E6%BF%80%E5%87%8C/36502.jpg?AccessKeyId=RciyO7xxxxxxxxxxyUH&Expires=1606296688&x-image-process=image%2Fresize%2Cm_lfit%2Ch_200&Signature=icyvHhFew9vnmy3zh1uZMP15Mbg%3D",
    "metadata" : {
      "@modelarts:import_origin" : 0
    },
    "sample_time" : 1589190552106,
    "sample_status" : "MANUAL_ANNOTATION",
    "annotated_by" : "human/test_123/test_123",
    "labelers" : [ {
      "email" : "xxx@xxx.com",
      "worker_id" : "5d8d4033b428fed5ac158942c33940a2",
      "role" : 0
    } ]
  }, {
    "sample_id" : "0192f3acfb000666033a0f85c21577c7",
    "sample_type" : 0,
    "labels" : [ {
      "name" : "car",
      "type" : 0,
      "property" : { }
    } ],
    "source" : "https://test-obs.obs.xxx.com:443/image/aifood/%E5%86%B0%E6%BF%80%E5%87%8C/36139.jpg?AccessKeyId=RciyO7xxxxxxxxxxyUH&Expires=1606296688&x-image-process=image%2Fresize%2Cm_lfit%2Ch_200&Signature=RRr9r2cghLCXk%2B0%2BfHtYJi8eZ4k%3D",
    "metadata" : {
      "@modelarts:import_origin" : 0
    },
    "sample_time" : 1589190543327,
    "sample_status" : "MANUAL_ANNOTATION",
    "annotated_by" : "human/test_123/test_123",
    "labelers" : [ {
      "email" : "xxx@xxx.com",
      "worker_id" : "a2abd3f27b4e92c593c15282f8b6bd29",
      "role" : 0
    } ]
  } ]
}
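
A response body like the one above can be summarized on the client. The following is a minimal sketch that uses only the Python standard library; the function name summarize_samples is hypothetical. It counts label names and sample statuses and tolerates fields that are absent from a sample.

```python
import json
from collections import Counter


def summarize_samples(body_text):
    """Count label names and sample statuses in a sample list response."""
    body = json.loads(body_text)
    label_counts = Counter()
    status_counts = Counter()
    for sample in body.get("samples", []):
        status_counts[sample.get("sample_status")] += 1
        for label in sample.get("labels", []):
            label_counts[label.get("name")] += 1
    return body.get("sample_count"), label_counts, status_counts
```

Applied to the example response, the function returns a sample count of 2, with the label car counted twice and the status MANUAL_ANNOTATION counted twice.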

Status Codes

Status Code

Description

200

OK

401

Unauthorized

403

Forbidden

404

Not Found

Error Codes

See Error Codes.