• Data Warehouse Service

dws
  1. Help Center
  2. Data Warehouse Service
  3. Developer Guide
  4. Tutorial: Tuning Table Design
  5. Step 4: Selecting Distribution Modes

Step 4: Selecting Distribution Modes

DWS uses a massively parallel processing (MPP) system of the shared-nothing architecture. The MPP performs horizontal partitioning to store tuples in the service data table on all DNs using proper distribution policies. Currently, user tables can be distributed in Replication or Hash mode.

  • Replication: stores a full table on each DN. Full data in a table stored on each DN avoids data redistribution during the join operation. This reduces network costs and plan segment (each having a thread), but generates much redundant data. Generally, replication is only used for small dimension tables.
  • Hash: A distribution key must be specified for a user table. If a record is inserted, the system performs hash computing based on values in the distribute column and then stores data on the related DN. In a hash table, I/O resources on each node can be used during I/O read/write, which greatly improve the read/write speed of a table. Generally, a large table (containing over 1 million records) is defined as a hash table.

Based on table sizes provided in Step 2: Testing System Performance of the Initial Table and Establishing a Baseline, set the distribution mode as follows.

Table Name

Number of Rows

Distribution Mode

Store_Sales

287997024

Hash

Date_Dim

73049

Replication

Store

402

Replication

Item

204000

Replication

Time_Dim

86400

Replication

Promotion

1000

Replication

Customer_Demographics

1920800

Hash

Customer_Address

1000000

Hash

Household_Demographics

7200

Replication

Customer

1981703

Hash

Income_Band

20

Replication