Uploading Data to OBS

Scenarios

Before importing data from OBS to a cluster, prepare the source data files and upload them to OBS. If the data files are already stored on OBS, skip step 1 (Upload data to OBS) in Uploading Data to OBS.

Preparing Data Files

Prepare the source data files to be uploaded to OBS. DWS supports source data files only in CSV, TEXT, or ORC format.

If the data cannot be saved in CSV format, save it as a text file instead.

NOTE:

As described in How Data Is Imported, if a source data file is large, split it evenly into multiple files before storing them on OBS. Import performance is best when the number of files is an integer multiple of the number of DNs.
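
The splitting advice in the note above can be sketched in Python. This is a minimal illustration, not a DWS tool: the function names split_evenly and write_parts, the files_per_dn parameter, and the product_info base name are hypothetical choices for this example. It assumes that each record occupies exactly one line, that the file has no header row, and that the order of records does not matter for the import.

```python
from pathlib import Path


def split_evenly(lines, dn_count, files_per_dn=1):
    """Distribute lines round-robin into dn_count * files_per_dn parts.

    The part count is always an integer multiple of dn_count, and the
    part sizes differ by at most one line.
    """
    if dn_count < 1 or files_per_dn < 1:
        raise ValueError("dn_count and files_per_dn must be at least 1")
    part_count = dn_count * files_per_dn
    parts = [[] for _ in range(part_count)]
    for index, line in enumerate(lines):
        parts[index % part_count].append(line)
    return parts


def write_parts(parts, out_dir, base_name="product_info"):
    """Write each part to <out_dir>/<base_name>.<n> and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for number, part in enumerate(parts):
        path = out_dir / f"{base_name}.{number}"
        # Lines are assumed to keep their trailing newline characters.
        path.write_text("".join(part), encoding="utf-8")
        paths.append(path)
    return paths
```

For example, splitting 10 lines for a cluster with 3 DNs yields three parts holding 4, 3, and 3 lines. The resulting files can then be uploaded to OBS as described in the next section.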

Uploading Data to OBS

  1. Upload data to OBS.

    Store the source data files to be imported in the OBS bucket in advance.

    1. Log in to the OBS management console.

      Click Service List and choose Object Storage Service to open the OBS management console.

    2. Create a bucket.

      For details about how to create a bucket, see OBS Console Operation Guide > Managing Buckets > Creating a Bucket in the Object Storage Service User Guide.

      For example, create two buckets named mybucket and mybucket02.

    3. Create a folder.

      For details, see OBS Console Operation Guide > Managing Objects > Creating a Folder in the Object Storage Service User Guide.

      For example:

      • Create a folder named input_data in the mybucket OBS bucket.
      • Create a folder named input_data in the mybucket02 OBS bucket.
    4. Upload the files.

      For details, see OBS Console Operation Guide > Managing Objects > Uploading a File in the Object Storage Service User Guide.

      For example:

      • Upload the following data files to the input_data folder in the mybucket OBS bucket:
        product_info.0
        product_info.1
      • Upload the following data file to the input_data folder in the mybucket02 OBS bucket:
        product_info.2

  2. Obtain the OBS path for storing source data files.

    After the source data files are uploaded to an OBS bucket, a globally unique access path is generated. The OBS path of the source data files is the value of the location parameter used for creating a foreign table.

    The OBS path in the location parameter consists of the obs:// prefix, a bucket name, and a file path, in the following format:

    obs://<bucket_name>/<file_path>

    For the example files uploaded in step 1, the OBS paths are as follows:

    obs://mybucket/input_data/product_info.0
    obs://mybucket/input_data/product_info.1
    obs://mybucket02/input_data/product_info.2

  3. Grant the user who will import data the read permission on the OBS buckets.

    To import data from OBS to a cluster, the user must have the read permission on the OBS buckets where the source data files are located. You can configure the ACL of each OBS bucket to grant the read permission to a specific user.

    For details, see OBS Console Operation Guide > Bucket Permissions > Setting ACL Permissions for Buckets in the Object Storage Service User Guide.
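
The path format from step 2, obs://<bucket_name>/<file_path>, can be assembled programmatically. The following Python sketch is illustrative only: obs_path is a hypothetical helper, not part of any OBS or DWS SDK, and the bucket and file names are the example values used in this section.

```python
def obs_path(bucket_name, file_path):
    """Build obs://<bucket_name>/<file_path> from a bucket name and a file path."""
    if not bucket_name or "/" in bucket_name:
        raise ValueError("bucket_name must be non-empty and must not contain '/'")
    # Drop leading slashes so the result never contains an empty path segment.
    return f"obs://{bucket_name}/{file_path.lstrip('/')}"


# Example files from step 1, grouped by bucket.
uploads = {
    "mybucket": ["input_data/product_info.0", "input_data/product_info.1"],
    "mybucket02": ["input_data/product_info.2"],
}

# Candidate values for the location parameter of a foreign table.
locations = [
    obs_path(bucket, path)
    for bucket, paths in uploads.items()
    for path in paths
]

for location in locations:
    print(location)
```

Run as written, the script prints the three OBS paths listed in step 2.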