Selecting a Distribution Mode

Replication copies the full data of a table to every DN in the cluster. It is suitable for tables with small record sets. Because each DN stores the full table, no data needs to be redistributed during a join. This reduces network overhead and the number of plan segments (each of which runs in its own thread), but it also generates a large amount of redundant data. Generally, replication is used only for small dimension tables.
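The trade-off described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not how DWS is implemented: the cluster is modeled as plain Python lists, and the names `NUM_DNS`, `replicate`, and `local_join` are hypothetical helpers introduced for illustration.

```python
# Toy model of a replication table. Each "DN" is just a Python list of rows.
# NUM_DNS, replicate and local_join are hypothetical names, not DWS APIs.
NUM_DNS = 3  # assumed cluster size


def replicate(rows, num_dns):
    """Store a full, independent copy of the table on every DN."""
    return [list(rows) for _ in range(num_dns)]


# Small dimension table: (region_id, region_name)
dim_region = [(1, "north"), (2, "south")]
dim_on_dns = replicate(dim_region, NUM_DNS)

# Slices of a large fact table that already live on each DN: (order_id, region_id)
fact_on_dns = [
    [(100, 1), (101, 2)],
    [(102, 2)],
    [(103, 1)],
]


def local_join(fact_rows, dim_rows):
    """Join on region_id using only the data already present on this DN."""
    lookup = dict(dim_rows)
    return [(order_id, lookup[region_id]) for order_id, region_id in fact_rows]


# Every DN can complete its part of the join without fetching rows from other DNs.
joined = [local_join(fact, dim) for fact, dim in zip(fact_on_dns, dim_on_dns)]

# The cost: the dimension table is stored once per DN.
stored_dim_rows = sum(len(dim) for dim in dim_on_dns)
```

In this sketch the join needs no data movement between nodes, while the number of stored dimension rows grows with the number of DNs, which is why the approach only pays off for small tables.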

In a hash table, hash values are generated for one or more columns. The storage location of a tuple is determined by the mapping between DNs and hash values. Because the data is spread across nodes, reads and writes can use the I/O resources of every node, which greatly improves the read/write speed of the table. Generally, a table containing a large amount of data is defined as a hash table.
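The mapping from a hash value to a DN can be sketched as follows. This is a minimal illustration under assumptions: the actual hash function and the hash-to-DN mapping used by DWS are internal and are not described here, so CRC32 with a modulo is swapped in as a stand-in, and `NUM_DNS`, `dn_for`, and `distribute` are hypothetical names.

```python
import zlib

# Toy model of hash distribution. CRC32 modulo the DN count is a stand-in;
# the real hash function and DN mapping in DWS are not specified here.
NUM_DNS = 4  # assumed cluster size


def dn_for(key, num_dns=NUM_DNS):
    """Map a distribution-column value to a DN index using a stable hash."""
    digest = zlib.crc32(str(key).encode("utf-8"))
    return digest % num_dns


def distribute(rows, key_index, num_dns=NUM_DNS):
    """Place each row on the DN selected by hashing its distribution column."""
    dns = [[] for _ in range(num_dns)]
    for row in rows:
        dns[dn_for(row[key_index], num_dns)].append(row)
    return dns


# A larger "fact" table: (order_id, customer_id), distributed on order_id.
orders = [(order_id, order_id % 7) for order_id in range(1000)]
placement = distribute(orders, key_index=0)

# The location of any tuple can be recomputed from its key alone.
target_dn = dn_for(42)
```

Because the location is a pure function of the distribution column, each row is stored exactly once and a lookup by key touches a single DN. How evenly the rows spread depends on the chosen column, so a column with many distinct values is the safer choice in this model.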

| Policy | Description | Application Scenario |
|---|---|---|
| Hash | Table data is distributed on all DNs in the cluster in hash mode. | Fact tables containing a large amount of data |
| Replication | Full data in a table is stored on each DN in the cluster. | Small tables and dimension tables |

As shown in Figure 1, T1 is a replication table and T2 is a hash table.

Figure 1 Replication table and hash table