• Data Warehouse Service

dws
  1. Help Center
  2. Data Warehouse Service
  3. Developer Guide
  4. Data Import
  5. Importing Data from MRS to a Cluster
  6. Preparing Data in an MRS Cluster

Preparing Data in an MRS Cluster

Before importing data from MRS to a DWS cluster, you must have:

  1. Created an MRS cluster.
  2. Created the Hive/Spark ORC table in the MRS cluster and stored the table data to the HDFS path corresponding to the table.

If you have completed the preparations, skip this section.

In this tutorial, the Hive ORC table will be created in the MRS cluster as an example to complete the preparation work. The process for creating the Spark ORC table in the MRS cluster and the SQL syntax are similar to those of Hive.

Data File

The sample data of the product_info.txt data file is as follows:

100,XHDK-A-1293-#fJ3,2017-09-01,A,2017 Autumn New Shirt Women,red,M,328,2017-09-04,715,good
205,KDKE-B-9947-#kL5,2017-09-01,A,2017 Autumn New Knitwear Women,pink,L,584,2017-09-05,406,very good!
300,JODL-X-1937-#pV7,2017-09-01,A,2017 autumn new T-shirt men,red,XL,1245,2017-09-03,502,Bad.
310,QQPX-R-3956-#aD8,2017-09-02,B,2017 autumn new jacket women,red,L,411,2017-09-05,436,It's really super nice
150,ABEF-C-1820-#mC6,2017-09-03,B,2017 Autumn New Jeans Women,blue,M,1223,2017-09-06,1200,The seller's packaging is exquisite
200,BCQP-E-2365-#qE4,2017-09-04,B,2017 autumn new casual pants men,black,L,997,2017-09-10,301,The clothes are of good quality.
250,EABE-D-1476-#oB1,2017-09-10,A,2017 autumn new dress women,black,S,841,2017-09-15,299,Follow the store for a long time.
108,CDXK-F-1527-#pL2,2017-09-11,A,2017 autumn new dress women,red,M,85,2017-09-14,22,It's really amazing to buy
450,MMCE-H-4728-#nP9,2017-09-11,A,2017 autumn new jacket women,white,M,114,2017-09-14,22,Open the package and the clothes have no odor
260,OCDA-G-2817-#bD3,2017-09-12,B,2017 autumn new woolen coat women,red,L,2004,2017-09-15,826,Very favorite clothes
980,ZKDS-J-5490-#cW4,2017-09-13,B,2017 Autumn New Women's Cotton Clothing,red,M,112,2017-09-16,219,The clothes are small
98,FKQB-I-2564-#dA5,2017-09-15,B,2017 autumn new shoes men,green,M,4345,2017-09-18,5473,The clothes are thick and it's better this winter.
150,DMQY-K-6579-#eS6,2017-09-21,A,2017 autumn new underwear men,yellow,37,2840,2017-09-25,5831,This price is very cost effective
200,GKLW-l-2897-#wQ7,2017-09-22,A,2017 Autumn New Jeans Men,blue,39,5879,2017-09-25,7200,The clothes are very comfortable to wear
300,HWEC-L-2531-#xP8,2017-09-23,A,2017 autumn new shoes women,brown,M,403,2017-09-26,607,good
100,IQPD-M-3214-#yQ1,2017-09-24,B,2017 Autumn New Wide Leg Pants Women,black,M,3045,2017-09-27,5021,very good.
350,LPEC-N-4572-#zX2,2017-09-25,B,2017 Autumn New Underwear Women,red,M,239,2017-09-28,407,The seller's service is very good
110,NQAB-O-3768-#sM3,2017-09-26,B,2017 autumn new underwear women,red,S,6089,2017-09-29,7021,The color is very good 
210,HWNB-P-7879-#tN4,2017-09-27,B,2017 autumn new underwear women,red,L,3201,2017-09-30,4059,I like it very much and the quality is good.
230,JKHU-Q-8865-#uO5,2017-09-29,C,2017 Autumn New Clothes with Chiffon Shirt,black,M,2056,2017-10-02,3842,very good

  

Creating a Hive ORC Table in an MRS Cluster

  1. Create an MRS cluster.

    For details, see Creating an MRS Data Source Connection in Data Warehouse Service User Guide.

  2. Log in to the Hive client of the MRS cluster.
    1. Log in to a Master node.

      For details, see Logging In to a Master Node in MapReduce Service User Guide.

    2. Run the following command to switch the user:
      sudo su - omm
    3. Run the following command to go to the client directory:
      cd /opt/client
    4. Run the following command to configure the environment variables:
      source bigdata_env
    5. Run the following command to start the Hive client:
      beeline
  3. Create a database demo on Hive.

    Run the following command to create the database demo:

    CREATE DATABASE demo;
  4. Create table product_info of the Hive TEXTFILE type in the database demo and import the Data File (product_info.txt) to the HDFS path corresponding to the table.

    Run the following command to switch to the database demo:

    USE demo;

    Run the following command to create table product_info and define the table fields based on data in the Data File.

    DROP TABLE product_info;
    
    CREATE TABLE product_info 
    (    
        product_price                int            not null,
        product_id                   char(30)       not null,
        product_time                 date           ,
        product_level                char(10)       ,
        product_name                 varchar(200)   ,
        product_type1                varchar(20)    ,
        product_type2                char(10)       ,
        product_monthly_sales_cnt    int            ,
        product_comment_time         date           ,
        product_comment_num          int        ,
        product_comment_content      varchar(200)                   
    ) 
    row format delimited fields terminated by ',' 
    stored as TEXTFILE;

    For details about how to import data to an MRS cluster, see Managing Data Files in MapReduce Service User Guide.

  5. Create a Hive ORC table named product_info_orc in the database demo.

    Run the following command to create the Hive ORC table product_info_orc. The table fields are the same as those of the product_info table created in the previous step.

    DROP TABLE product_info_orc;
    
    CREATE TABLE product_info_orc
    (    
        product_price                int            not null,
        product_id                   char(30)       not null,
        product_time                 date           ,
        product_level                char(10)       ,
        product_name                 varchar(200)   ,
        product_type1                varchar(20)    ,
        product_type2                char(10)       ,
        product_monthly_sales_cnt    int            ,
        product_comment_time         date           ,
        product_comment_num          int            ,
        product_comment_content      varchar(200)                   
    ) 
    row format delimited fields terminated by ',' 
    stored as orc;
  6. Insert data in the product_info table to the Hive ORC table product_info_orc.
    insert into product_info_orc select * from product_info;

    Query table product_info_orc.

    select * from product_info_orc;

    If data displayed in the Data File can be queried, the data has been successfully inserted to the ORC table.