Managing Data Files

If Kerberos authentication is disabled for a cluster, you can use the File Management page to create and delete directories and to import, export, and delete files.

Background

Data to be processed by MRS is stored in either OBS or HDFS. OBS provides massive, highly reliable, and secure data storage at a low cost. You can view, manage, and use data through OBS Console or OBS Browser. You can also manage or access data through the REST APIs, which can be used on their own or integrated with service programs.

Before creating jobs, upload your local data to OBS. MRS can then import the data from OBS into HDFS for computing and analysis. After the analysis and computing are complete, you can either store the data in HDFS or export it to OBS. Both HDFS and OBS can store data compressed in bz2 or gz format.
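Because HDFS and OBS can store bz2- or gz-compressed data, large local files can be compressed before they are uploaded to OBS. The following is a minimal Python sketch using only the standard library gzip and bz2 modules; the function name compress_for_upload is hypothetical, and the upload itself (through OBS Console, OBS Browser, or the REST APIs) is not shown.

```python
import bz2
import gzip
import shutil
from pathlib import Path


def compress_for_upload(src: Path, fmt: str = "gz") -> Path:
    """Compress a local file into <name>.gz or <name>.bz2 before uploading it to OBS.

    Only the two formats named in the MRS documentation are accepted.
    """
    openers = {"gz": gzip.open, "bz2": bz2.open}
    if fmt not in openers:
        raise ValueError(f"unsupported format {fmt!r}; use 'gz' or 'bz2'")
    dst = src.with_name(f"{src.name}.{fmt}")
    # Stream in chunks so large inputs are never loaded into memory at once.
    with open(src, "rb") as fin, openers[fmt](dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst
```

For example, compress_for_upload(Path("sales.csv"), "bz2") writes sales.csv.bz2 next to the original file. Streaming with shutil.copyfileobj keeps memory use flat regardless of file size.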

Importing Data

MRS can import data from OBS into HDFS. This function is recommended for small amounts of data, because the upload speed decreases as the file size increases.

Both files and folders containing files can be imported. The operations are as follows:

  1. Log in to the MRS management console.
  2. Click the icon in the upper-left corner of the management console and select a region and project.
  3. Choose Clusters > Active Clusters, select a cluster, and click its name to switch to the cluster information page.
  4. Click File Management and go to the File Management tab page.
  5. Select HDFS File List.
  6. Click the data storage directory, for example, bd_app1.

    bd_app1 is just an example. The storage directory can be any directory on the page. You can create a directory by clicking Create Folder.

    The name of the created directory must meet the following requirements:

    • Contains a maximum of 255 characters, and the full path contains a maximum of 1023 characters.
    • Cannot be empty.
    • Cannot contain special characters (/:*?"<>|\;&,'`!{}[]$).
    • Cannot start or end with a period (.).
  7. Click Import Data to configure the paths for HDFS and OBS.
    NOTE:

    When configuring the OBS or HDFS path, click Browse, select the file path, and click OK.

    • The path for OBS
      • Must start with s3a://. s3a:// is used by default.
      • Files and programs encrypted by the KMS cannot be imported.
      • Empty folders cannot be imported.
      • Directories and file names can contain letters, Chinese characters, digits, hyphens (-), or underscores (_), but cannot contain special characters (/:*?"<>|\;&,'`!{}[]$).
      • Directories and file names cannot start or end with spaces, but can have spaces between other characters.
      • The full path of OBS contains a maximum of 1023 characters.
    • The path for HDFS
      • It starts with /user by default.
      • Directories and file names can contain letters, Chinese characters, digits, hyphens (-), or underscores (_), but cannot contain special characters (/:*?"<>|\;&,'`!{}[]$).
      • Directories and file names cannot start or end with spaces, but can have spaces between other characters.
      • The full path of HDFS contains a maximum of 1023 characters.
      • When you import data, the HDFS path text box is prefilled by default with the parent HDFS directory selected in HDFS File List.
  8. Click OK.

    View the import progress in File Operation Record. MRS runs the data import as a DistCp job, so you can check whether the job succeeded in Job Management > Job.
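
The folder-name rules in step 6 can be checked locally before you click Create Folder. The Python sketch below encodes those four rules; the function folder_name_problems and its parent parameter (defaulting to /user) are hypothetical, and the console remains the final judge of what it accepts.

```python
from __future__ import annotations

# Special characters the console rejects in folder names (from step 6).
FORBIDDEN = frozenset("/:*?\"<>|\\;&,'`!{}[]$")
MAX_NAME_LEN = 255
MAX_PATH_LEN = 1023


def folder_name_problems(name: str, parent: str = "/user") -> list[str]:
    """Return the documented folder-name rules that `name` breaks.

    `parent` is a hypothetical stand-in for the directory the folder is
    created in; it is only used for the full-path length check.
    """
    if not name:
        return ["name is empty"]
    problems = []
    if len(name) > MAX_NAME_LEN:
        problems.append(f"name exceeds {MAX_NAME_LEN} characters")
    full_path = parent.rstrip("/") + "/" + name
    if len(full_path) > MAX_PATH_LEN:
        problems.append(f"full path exceeds {MAX_PATH_LEN} characters")
    bad = sorted(set(name) & FORBIDDEN)
    if bad:
        problems.append("contains special characters: " + "".join(bad))
    if name.startswith(".") or name.endswith("."):
        problems.append("starts or ends with a period")
    return problems


print(folder_name_problems("bd_app1"))    # []
print(folder_name_problems(".tmp$data"))  # ['contains special characters: $', 'starts or ends with a period']
```

Returning every broken rule, rather than stopping at the first, lets you fix a name in one pass.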

Exporting Data

After data is processed and analyzed, you can either store the data in HDFS or export it to the OBS system.

Both files and folders containing files can be exported. The operations are as follows:

  1. Log in to the MRS management console.
  2. Click the icon in the upper-left corner of the management console and select a region and project.
  3. Choose Clusters > Active Clusters, select a cluster, and click its name to switch to the cluster information page.
  4. Click File Management and go to the File Management tab page.
  5. Select HDFS File List.
  6. Click the data storage directory, for example, bd_app1.
  7. Click Export Data and configure the paths for HDFS and OBS.
    NOTE:

    When configuring the OBS or HDFS path, click Browse, select the file path, and click OK.

    • The path for OBS
      • Must start with s3a://. s3a:// is used by default.
      • Directories and file names can contain letters, Chinese characters, digits, hyphens (-), or underscores (_), but cannot contain special characters (/:*?"<>|\;&,'`!{}[]$).
      • Directories and file names cannot start or end with spaces, but can have spaces between other characters.
      • The full path of OBS contains a maximum of 1023 characters.
    • The path for HDFS
      • It starts with /user by default.
      • Directories and file names can contain letters, Chinese characters, digits, hyphens (-), or underscores (_), but cannot contain special characters (/:*?"<>|\;&,'`!{}[]$).
      • Directories and file names cannot start or end with spaces, but can have spaces between other characters.
      • The full path of HDFS contains a maximum of 1023 characters.
      • When you export data, the HDFS path text box is prefilled by default with the parent HDFS directory selected in HDFS File List.
    NOTE:

    Ensure that the folder to be exported is not empty. An empty folder exported to OBS becomes a file: its name changes, for example, from test to test-$folder$, and its type is file.

  8. Click OK.

    View the export progress in File Operation Record. MRS runs the data export as a DistCp job, so you can check whether the job succeeded in Job Management > Job.
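
The OBS and HDFS path rules for Import Data and Export Data are the same apart from the s3a:// prefix, so one check can serve both dialog boxes. The Python sketch below validates a pair of paths and then builds a plain hadoop distcp command line, reflecting that MRS runs both operations as DistCp jobs. It rejects only the special characters listed above rather than enforcing an allow-list of letters, Chinese characters, digits, hyphens, and underscores, so names such as data.csv pass; KMS-encrypted files and empty folders cannot be detected from a path string. The function names, the bucket name, and the assumption that no extra DistCp options are needed are illustrative, not taken from MRS.

```python
from __future__ import annotations

# Special characters the console rejects in OBS and HDFS directory and file
# names. "/" is left out because it separates the names within a path.
FORBIDDEN_IN_NAMES = frozenset(":*?\"<>|\\;&,'`!{}[]$")
MAX_PATH_LEN = 1023
OBS_PREFIX = "s3a://"


def path_problems(path: str, kind: str) -> list[str]:
    """Check an OBS (kind="obs") or HDFS (kind="hdfs") path against the documented rules."""
    if kind not in ("obs", "hdfs"):
        raise ValueError("kind must be 'obs' or 'hdfs'")
    problems = []
    body = path
    if kind == "obs":
        if path.startswith(OBS_PREFIX):
            body = path[len(OBS_PREFIX):]
        else:
            problems.append(f"OBS path must start with {OBS_PREFIX}")
    if len(path) > MAX_PATH_LEN:
        problems.append(f"full path exceeds {MAX_PATH_LEN} characters")
    # Check each directory or file name; empty pieces from "//" or a trailing "/" are skipped.
    for name in filter(None, body.split("/")):
        bad = sorted(set(name) & FORBIDDEN_IN_NAMES)
        if bad:
            problems.append(f"{name!r} contains special characters: {''.join(bad)}")
        if name != name.strip(" "):
            problems.append(f"{name!r} starts or ends with a space")
    return problems


def distcp_argv(direction: str, obs_path: str, hdfs_path: str) -> list[str]:
    """Validate both paths, then build a plain `hadoop distcp` command line.

    Import copies OBS -> HDFS and export copies HDFS -> OBS. Any extra options
    MRS passes to its own DistCp job are unknown, so none are added here.
    """
    if direction not in ("import", "export"):
        raise ValueError("direction must be 'import' or 'export'")
    problems = path_problems(obs_path, "obs") + path_problems(hdfs_path, "hdfs")
    if problems:
        raise ValueError("; ".join(problems))
    src, dst = (obs_path, hdfs_path) if direction == "import" else (hdfs_path, obs_path)
    return ["hadoop", "distcp", src, dst]


print(distcp_argv("export", "s3a://bucket/output", "/user/bd_app1/output"))
# ['hadoop', 'distcp', '/user/bd_app1/output', 's3a://bucket/output']
```

Validating before building the command means a bad path fails fast with every problem listed, instead of surfacing later as an Abnormal or Failed record.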

Viewing File Operation Records

When importing or exporting data on the MRS management console, you can choose File Management > File Operation Record to view the import or export progress.

Table 1 lists the parameters in file operation records.

Table 1 Parameters in file operation records

  • Created: Time when the data import or export started.
  • Source Path: Source path of the data.
    • In data import, Source Path is the OBS path.
    • In data export, Source Path is the HDFS path.
  • Target Path: Target path of the data.
    • In data import, Target Path is the HDFS path.
    • In data export, Target Path is the OBS path.
  • Status: Status of the data import or export operation.
    • Running
    • Completed
    • Terminated
    • Abnormal
  • Duration (min): Total time taken by the data import or export, in minutes.
  • Result: Result of the data import or export.
    • Successful
    • Failed
  • Operation: View Log. Click View Log to view the log information of the job. For details, see Viewing Job Configurations and Logs.
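
The parameters in Table 1 map naturally onto a small record type, which can help when you tally operations or pick out the ones whose logs need attention. In this Python sketch, the class and field names are hypothetical, the enums mirror the Status and Result values listed in the table, and the direction property relies on the table's rule that an import reads from an OBS path while an export writes to one.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class Status(Enum):
    RUNNING = "Running"
    COMPLETED = "Completed"
    TERMINATED = "Terminated"
    ABNORMAL = "Abnormal"


class Result(Enum):
    SUCCESSFUL = "Successful"
    FAILED = "Failed"


@dataclass
class FileOperationRecord:
    created: datetime      # when the import or export started
    source_path: str
    target_path: str
    status: Status
    duration_min: float    # total time taken, in minutes
    # Assumption: no result is shown while an operation is still running.
    result: Optional[Result] = None

    @property
    def direction(self) -> str:
        """Import reads from OBS (s3a://); export writes to OBS."""
        if self.source_path.startswith("s3a://"):
            return "import"
        if self.target_path.startswith("s3a://"):
            return "export"
        raise ValueError("neither path is an OBS (s3a://) path")


def failed(records: list[FileOperationRecord]) -> list[FileOperationRecord]:
    """Return the records whose Result is Failed."""
    return [r for r in records if r.result is Result.FAILED]
```

For instance, failed(records) returns the operations whose Result is Failed; for each of them, View Log is the next place to look.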