Step 8: Evaluating the Results

Compare the loading time, storage space usage, and query execution time before and after the table tuning.

The following table shows example results from the cluster used in this tutorial. Your results will differ, but should show similar improvements.

| Benchmark | Before | After | Change | Percentage (%) |
|---|---|---|---|---|
| Loading time (11 tables) | 341584 ms | 257241 ms | -84343 ms | -24.7% |
| Occupied storage space | | | | |
| Store_Sales | 42 GB | 14 GB | -28 GB | -66.7% |
| Date_Dim | 11 MB | 27 MB | 16 MB | 145.5% |
| Store | 232 KB | 4352 KB | 4120 KB | 1775.9% |
| Item | 110 MB | 259 MB | 149 MB | 135.5% |
| Time_Dim | 11 MB | 14 MB | 13 MB | 118.2% |
| Promotion | 256 KB | 3200 KB | 2944 KB | 1150% |
| Customer_Demographics | 171 MB | 11 MB | -160 MB | -93.6% |
| Customer_Address | 170 MB | 27 MB | -143 MB | -84.1% |
| Household_Demographics | 504 KB | 1280 KB | 704 KB | 139.7% |
| Customer | 441 MB | 111 MB | -330 MB | -74.8% |
| Income_Band | 88 KB | 896 KB | 808 KB | 918.2% |
| Total storage space | 42 GB | 15 GB | -27 GB | -64.3% |
| Query execution time | | | | |
| Query 1 | 14552.05 ms | 1783.353 ms | -12768.697 ms | -87.7% |
| Query 2 | 27952.36 ms | 14247.803 ms | -13704.557 ms | -49.0% |
| Query 3 | 17721.15 ms | 11441.659 ms | -6279.491 ms | -35.4% |
| Total execution time | 60225.56 ms | 27472.815 ms | -32752.745 ms | -54.4% |
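The Change and Percentage columns follow directly from the Before and After values: the change is After minus Before, and the percentage is that change relative to Before. As a minimal sketch (the function name `change_and_percentage` is hypothetical and not part of DWS), the following Python snippet recomputes two of the rows from the figures in the table:

```python
def change_and_percentage(before, after):
    """Return (absolute change, percentage change relative to `before`).

    A negative result means the value went down after tuning.
    """
    change = after - before
    percentage = change / before * 100
    return change, round(percentage, 1)


# Loading time for the 11 tables, in milliseconds.
print(change_and_percentage(341584, 257241))

# Total query execution time, in milliseconds.
change, pct = change_and_percentage(60225.56, 27472.815)
print(round(change, 3), pct)
```

The same calculation can be applied to your own measurements to compare them with the example results.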

  • The loading time was reduced by 24.7%.

    The distribution mode has a clear impact on data loading. Hash distribution improves loading efficiency, whereas replication reduces it. When CPU and I/O resources are sufficient, the compression level has little impact on loading efficiency. Typically, loading a column-store table is more efficient than loading a row-store table.

  • The storage space usage was reduced by 64.3%.

    Compression, column storage, and hash distribution all save storage space. A replication table increases storage usage but reduces network overhead. Using replication for small tables is an effective way to trade a small amount of space for better performance.

  • The total query execution time decreased by 54.4%.

    Query performance is improved by optimizing the storage mode, distribution mode, and distribution columns. In statistical analysis queries on tables with many columns, column storage can improve query performance. In a hash-distributed table, the I/O resources of every node are used during reads and writes, which improves the read/write speed of the table.

    Often, query performance can be improved further by rewriting queries and configuring workload management (WLM). For details, see Query Performance Tuning Overview and Resource Load Management Overview.
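A percentage reduction in execution time is not the same number as the resulting increase in speed. As a rough sketch (the helper name `speedup_factor` is hypothetical), the speed-up implied by the total execution times in the table can be computed by dividing the time before tuning by the time after tuning:

```python
def speedup_factor(before_ms, after_ms):
    """Return how many times faster the workload runs after tuning."""
    return before_ms / after_ms


# Total execution time of the three queries, in milliseconds.
total = speedup_factor(60225.56, 27472.815)
print(f"{total:.2f}x")
```

For the example cluster, a 54.4% reduction in total execution time means that the three queries together run roughly 2.2 times as fast as before tuning.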