Import Operation

After a dataset is created, you can synchronize its data directly. Alternatively, you can add more data by importing it into the dataset. Data can be imported from an OBS directory or from a manifest file.

Prerequisites

  • You have created a dataset.

  • You have stored the data to be imported in OBS. To import from a manifest file, you have also stored the manifest file in OBS.

  • The OBS buckets and ModelArts are in the same region.

Import Modes

There are two import modes: OBS path and Manifest file.

  • OBS path: The data to be imported is already stored in an OBS directory. Select an OBS path that you can access. The directory structure under the OBS path must comply with the specifications. For details, see Specifications for Importing Data from an OBS Directory. Only the following dataset types support the OBS path import mode: image classification, object detection, text classification, table, and sound classification.

  • Manifest file: The dataset is described by a file in the manifest format, and data is imported from that file. The manifest file defines the mapping between labeling objects and content, and must already be uploaded to OBS. For details about the specifications of the manifest file, see Specifications for Importing the Manifest File.

Note

Before importing an object detection dataset, ensure that the labeled regions in the labeling file do not extend beyond the boundaries of the original image. Otherwise, the import may fail.
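The check described in the note can be run locally before an import. The following is a minimal sketch, assuming each label is an axis-aligned box in pixel coordinates and that the image width and height are already known. The `Box` structure and both function names are hypothetical and are not part of ModelArts. The sketch treats a box edge that lies exactly on the image border as valid, which is also an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Box:
    """Axis-aligned labeling box in pixel coordinates (hypothetical structure)."""
    xmin: int
    ymin: int
    xmax: int
    ymax: int


def box_fits_image(box: Box, width: int, height: int) -> bool:
    """Return True if the box lies entirely inside a width x height image."""
    return (0 <= box.xmin <= box.xmax <= width
            and 0 <= box.ymin <= box.ymax <= height)


def out_of_range_boxes(boxes: Iterable[Box], width: int, height: int) -> List[Box]:
    """Collect the boxes that extend beyond the image and may cause the import to fail."""
    return [b for b in boxes if not box_fits_image(b, width, height)]
```

Running `out_of_range_boxes` over every labeled image and fixing any box it returns is one way to catch this problem before starting the import.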

Table 1 Import modes supported by datasets

| Dataset Type | Importing Data from an OBS Path | Importing Data from a Manifest File |
|---|---|---|
| Image classification | Supported. Follow the format specifications described in Image Classification. | Supported. Follow the format specifications described in Image Classification. |
| Object detection | Supported. Follow the format specifications described in Object Detection. | Supported. Follow the format specifications described in Object Detection. |
| Sound classification | Supported. Follow the format specifications described in Sound Classification. | Supported. Follow the format specifications described in Sound Classification. |
| Speech labeling | N/A | Supported. Follow the format specifications described in Speech Labeling. |
| Speech paragraph labeling | N/A | Supported. Follow the format specifications described in Speech Paragraph Labeling. |
| Text classification | Supported. Follow the format specifications described in Text Classification. | Supported. Follow the format specifications described in Text Classification. |
| Named entity recognition | N/A | Supported. Follow the format specifications described in Named Entity Recognition. |
| Text triplet | N/A | Supported. Follow the format specifications described in Text Triplet. |
| Video | N/A | Supported. Follow the format specifications described in Video Labeling. |
| Free format | N/A | N/A |
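
The support matrix in Table 1 can also be written as a small lookup, which is handy when a script needs to decide which import mode to use for a dataset type. This is only an illustrative sketch: the mode keys `"obs_path"` and `"manifest"` and the function name are hypothetical labels chosen here, not ModelArts identifiers.

```python
# Import modes per dataset type, transcribed from Table 1.
# The mode keys are hypothetical labels used only in this sketch.
SUPPORTED_IMPORT_MODES = {
    "image classification": {"obs_path", "manifest"},
    "object detection": {"obs_path", "manifest"},
    "sound classification": {"obs_path", "manifest"},
    "speech labeling": {"manifest"},
    "speech paragraph labeling": {"manifest"},
    "text classification": {"obs_path", "manifest"},
    "named entity recognition": {"manifest"},
    "text triplet": {"manifest"},
    "video": {"manifest"},
    "free format": set(),
}


def supports_import_mode(dataset_type: str, mode: str) -> bool:
    """Return True if Table 1 lists the mode as supported for the dataset type."""
    modes = SUPPORTED_IMPORT_MODES.get(dataset_type.strip().lower(), set())
    return mode in modes
```

For example, `supports_import_mode("Video", "obs_path")` returns `False`, while `supports_import_mode("Video", "manifest")` returns `True`.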

Importing Data from an OBS Path

The import parameters shown on the GUI vary by dataset type. The following uses an image classification dataset as an example.

  1. Log in to the ModelArts management console. In the left navigation pane, choose Data Management > Datasets. The Datasets page is displayed.

  2. Locate the row that contains the desired dataset and choose More > Import in the Operation column.

    Alternatively, you can click the dataset name to go to the Dashboard tab page of the dataset, and click Import in the upper right corner.

  3. In the Import dialog box, set Import Mode to OBS path and set OBS path to the path where the data is stored. Then click OK.

    After the import succeeds, the data is automatically synchronized to the dataset. On the Datasets page, you can click the dataset name to view its details and label the data.

Importing Data from a Manifest File

The import parameters shown on the GUI vary by dataset type. The following uses an object detection dataset as an example. Datasets of the table type cannot be imported from a manifest file.

  1. Log in to the ModelArts management console. In the left navigation pane, choose Data Management > Datasets. The Datasets page is displayed.

  2. Locate the row that contains the desired dataset and choose More > Import in the Operation column.

    Alternatively, you can click the dataset name to go to the Dashboard tab page of the dataset, and click Import in the upper right corner.

  3. In the Import dialog box, set the parameters as follows and click OK.

    • Import Mode: Select Manifest file.

    • Manifest file: Select the OBS path where the manifest file is stored.

    • Import by Label: The system automatically obtains the labels of the dataset. You can click Add Label to add a label or click the deletion icon on the right to delete a label. This field is optional. After importing a dataset, you can add or delete labels during data labeling.

    • Import labels: If this parameter is selected, the labels defined in the manifest file are imported to the ModelArts dataset.

    • Import only hard examples: If this parameter is selected, only hard examples are imported from the manifest file. Hard examples are the entries whose hard attribute is true in the manifest file.

    After the import succeeds, the data is automatically synchronized to the dataset. On the Datasets page, you can click the dataset name to go to the Dashboard tab page of the dataset, and click Label in the upper right corner. On the dataset details page that is displayed, you can view the data and label it.
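
To preview what the Import only hard examples option would keep, the selection rule can be sketched as a simple filter. This sketch assumes each manifest entry has already been parsed into a Python dict and carries a top-level `hard` key whose value is either a boolean or the string "true". The actual field layout is defined in Specifications for Importing the Manifest File and may differ, and the function names are hypothetical.

```python
from typing import Iterable, List


def is_hard_example(entry: dict) -> bool:
    """Return True if the entry's hard attribute is true.

    Assumption: the attribute is a top-level key named "hard" that holds
    either a boolean or a string such as "true".
    """
    value = entry.get("hard", False)
    if isinstance(value, str):
        return value.strip().lower() == "true"
    return value is True


def select_hard_examples(entries: Iterable[dict]) -> List[dict]:
    """Keep only the entries that would count as hard examples."""
    return [entry for entry in entries if is_hard_example(entry)]
```

Entries without a `hard` attribute are treated as ordinary examples in this sketch and are filtered out.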