• Data Warehouse Service

dws
  1. Help Center
  2. Data Warehouse Service
  3. Developer Guide
  4. Tutorial: Tuning Table Design
  5. Step 6: Creating Another Table and Loading Data

Step 6: Creating Another Table and Loading Data

After selecting a storage mode, compression level, distribution mode, and distribution column for each table, use these attributes to create tables and reload data. Compare the system performance before and after the table recreation.

  1. Delete the tables created before.

    DROP TABLE store_sales;
    DROP TABLE date_dim;
    DROP TABLE store;
    DROP TABLE item;
    DROP TABLE time_dim;
    DROP TABLE promotion;
    DROP TABLE customer_demographics;
    DROP TABLE customer_address;
    DROP TABLE household_demographics;
    DROP TABLE customer;
    DROP TABLE income_band;
    
    DROP FOREIGN TABLE obs_from_store_sales_001;
    DROP FOREIGN TABLE obs_from_date_dim_001;
    DROP FOREIGN TABLE obs_from_store_001;
    DROP FOREIGN TABLE obs_from_item_001;
    DROP FOREIGN TABLE obs_from_time_dim_001;
    DROP FOREIGN TABLE obs_from_promotion_001;
    DROP FOREIGN TABLE obs_from_customer_demographics_001;
    DROP FOREIGN TABLE obs_from_customer_address_001;
    DROP FOREIGN TABLE obs_from_household_demographics_001;
    DROP FOREIGN TABLE obs_from_customer_001;
    DROP FOREIGN TABLE obs_from_income_band_001;

  2. Create tables and specify storage and distribution modes for them.

    Only the syntax for recreating the store_sales table is provided for simplicity. To recreate all the other tables, copy the syntax in Creating a Secondary Table After Design Tuning.

    CREATE TABLE store_sales
    (
        ss_sold_date_sk           integer                       ,
        ss_sold_time_sk           integer                       ,
        ss_item_sk                integer               not null,
        ss_customer_sk            integer                       ,
        ss_cdemo_sk               integer                       ,
        ss_hdemo_sk               integer                       ,
        ss_addr_sk                integer                       ,
        ss_store_sk               integer                       ,
        ss_promo_sk               integer                       ,
        ss_ticket_number          bigint               not null,
        ss_quantity               integer                       ,
        ss_wholesale_cost         decimal(7,2)                  ,
        ss_list_price             decimal(7,2)                  ,
        ss_sales_price            decimal(7,2)                  ,
        ss_ext_discount_amt       decimal(7,2)                  ,
        ss_ext_sales_price        decimal(7,2)                  ,
        ss_ext_wholesale_cost     decimal(7,2)                  ,
        ss_ext_list_price         decimal(7,2)                  ,
        ss_ext_tax                decimal(7,2)                  ,
        ss_coupon_amt             decimal(7,2)                  ,
        ss_net_paid               decimal(7,2)                  ,
        ss_net_paid_inc_tax       decimal(7,2)                  ,
        ss_net_profit             decimal(7,2)                  
    ) 
    WITH (ORIENTATION = column,COMPRESSION=middle)
    DISTRIBUTE BY hash (ss_item_sk);

  3. Load sample data into these tables.
  4. Record the loading time in the benchmark tables.

    Benchmark

    Before

    After

    Loading time (11 tables)

    341584ms

    257241ms

    Occupied storage space

    Store_Sales

    42GB

      

    Date_Dim

    11MB

      

    Store

    232kB

      

    Item

    110MB

      

    Time_Dim

    11MB

      

    Promotion

    256kB

      

    Customer_Demographics

    171MB

      

    Customer_Address

    170MB

      

    Household_Demographics

    504kB

      

    Customer

    441MB

      

    Income_Band

    88kB

      

    Total storage space

    42GB

      

    Query execution time

    Query 1

    14552.05ms

      

    Query 2

    27952.36ms

      

    Query 3

    17721.15ms

      

    Total execution time

    60225.56ms

      

  5. Run the ANALYZE command to update statistics.

    ANALYZE;

    If ANALYZE is returned, the execution is successful.

    ANALYZE

  6. Check for data skew.

    For a hash table, an improper distribution key may cause data skew or poor I/O performance on certain DNs. Therefore, you need to check the table to ensure that data is evenly distributed on each DN. You can run the following SQL statements to check for data skew:

    SELECT a.count,b.node_name FROM (SELECT count(*) AS count,xc_node_id FROM table_name GROUP BY xc_node_id) a, pgxc_node b WHERE a.xc_node_id=b.node_id ORDER BY a.count desc;

    xc_node_id corresponds to a DN. Generally, over 5% difference between the amount of data on different DNs is regarded as data skew. If the difference is over 10%, choose another distribution column. Multiple distribution columns can be selected in DWS to evenly distribute data.