• Data Warehouse Service

dws
  1. Help Center
  2. Data Warehouse Service
  3. Developer Guide
  4. Tutorial: Tuning Table Design
  5. Step 5: Selecting a Distribution Column

Step 5: Selecting a Distribution Column

If your table is distributed using hash, select a proper distribution key to prevent data skew or poor I/O performance on certain DNs.

You are advised to select distribution keys for each hash table based on the following rules:

  1. The values of the distribution column should be discrete so that data can be evenly distributed on each DN. For example, select the primary key of a hash table as a distribution column or the ID card number as the distribution column in an employee information table.
  2. Do not select the column where a constant filter exists. For example, if a constant constraint (for example, zqdh= '000001') exists on the zqdh column in some queries on the dwcjk table, you are not advised to use zqdh as the distribution column.
  3. Select the join conditions in a query as distribution keys. In this way, Join tasks can be pushed down to DNs for execution, and the communication data between DNs is reduced.

Based on the best practice rules, select the primary key of each table as the hash table distribution key.

Table Name

Number of Records

Distribution Mode

Distribution Key

Store_Sales

287997024

Hash

ss_item_sk

Date_Dim

73049

Replication

-

Store

402

Replication

-

Item

204000

Replication

-

Time_Dim

86400

Replication

-

Promotion

1000

Replication

-

Customer_Demographics

1920800

Hash

cd_demo_sk

Customer_Address

1000000

Hash

ca_address_sk

Household_Demographics

7200

Replication

-

Customer

1981703

Hash

c_customer_sk

Income_Band

20

Replication

-