How Do I Get My Data into OBS or HDFS?

MRS can process data in OBS and HDFS. You can get your data into OBS or HDFS as follows:

  1. Upload local data to OBS.

    1. Log in to the OBS console.

    2. Create a parallel file system named userdata on OBS and create the program, input, output, and log folders in the file system.

      1. Choose Parallel File System > Create Parallel File System and create a file system named userdata.

      2. In the OBS file system list, click the file system name userdata, choose Files > Create Folder, and create the program, input, output, and log folders.

    3. Upload data to the userdata file system.

      1. Go to the program folder and click Upload File.

      2. Click Add File and select the user program file.

      3. Click Upload.

      4. Upload the user data files to the input folder in the same way.

  2. Import OBS data to HDFS.

    You can import OBS data to HDFS only when Kerberos Authentication is disabled and the cluster is running.

    1. Log in to the MRS console.

    2. Click the name of the target cluster.

    3. On the page that is displayed, click the Files tab page and then click HDFS File List.

    4. Select a data directory, for example, bd_app1.

      The bd_app1 directory is only an example. You can use any directory on the page or create a new one.

    5. Click Import Data, then click Browse to select the source OBS path and the destination HDFS path.

    6. Click OK.

      You can view the file upload progress on the File Operation Records tab page.
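The two procedures above are console click-throughs, but the decisions they involve can be written down as a small planning sketch. The Python below is illustrative only and does not call any OBS or MRS API. It assumes that folders are addressed by keys ending in "/" and that OBS locations are written as obs://<file system>/<key>. It also encodes the two import preconditions stated above: Kerberos authentication disabled and the cluster running. The function names, the "running" status string, and the example file and HDFS paths are hypothetical.

```python
from pathlib import PurePosixPath

# Values from the steps above: the example file system and its four folders.
FILE_SYSTEM = "userdata"
FOLDERS = ("program", "input", "output", "log")


def folder_key(folder: str) -> str:
    """Object key for a folder (assumption: folder keys end with '/')."""
    return folder.strip("/") + "/"


def obs_uri(key: str, file_system: str = FILE_SYSTEM) -> str:
    """Build an obs:// URI for a key in the file system (assumed URI form)."""
    return f"obs://{file_system}/{key}"


def upload_plan(program_file: str, data_files: list[str]) -> list[str]:
    """Target URIs: the program goes to program/, data files go to input/.

    Local paths are assumed to be POSIX-style; only the base name is kept.
    """
    keys = [f"program/{PurePosixPath(program_file).name}"]
    keys += [f"input/{PurePosixPath(f).name}" for f in data_files]
    return [obs_uri(k) for k in keys]


def can_import_from_obs(kerberos_enabled: bool, cluster_status: str) -> bool:
    """Import is possible only with Kerberos off and the cluster running.

    The status string "running" is a hypothetical stand-in for the
    console's cluster state.
    """
    return not kerberos_enabled and cluster_status.lower() == "running"


def import_request(kerberos_enabled: bool, cluster_status: str,
                   obs_path: str, hdfs_path: str) -> dict:
    """Pair an OBS source with an HDFS destination after checking preconditions."""
    if not can_import_from_obs(kerberos_enabled, cluster_status):
        raise ValueError("import needs Kerberos disabled and a running cluster")
    if not obs_path.startswith("obs://"):
        raise ValueError("OBS path should start with obs://")
    if not hdfs_path.startswith("/"):
        raise ValueError("HDFS path should be absolute")
    return {"source": obs_path, "destination": hdfs_path}
```

For example, upload_plan("/home/user/wordcount.jar", ["/home/user/input.txt"]) returns ['obs://userdata/program/wordcount.jar', 'obs://userdata/input/input.txt']. The sketch rejects an import request on a Kerberos-enabled cluster instead of attempting it, which mirrors the restriction in the import procedure.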