• Data Warehouse Service

  1. Help Center
  2. Data Warehouse Service
  3. Developer Guide
  4. Query Performance Optimization
  5. Tuning Queries
  6. Reviewing and Modifying a Table Definition
  7. Reviewing and Modifying a Table Definition

Reviewing and Modifying a Table Definition

In a distributed framework, data is distributed on DNs. Data on one or more DNs is stored on a physical storage device. To properly define a table, you must:

  1. Evenly distribute data on each DN to avoid the available capacity decrease of a cluster caused by insufficient storage space of the storage device associated with a DN. Select proper distribution columns to avoid data skew.
  2. Evenly assign table scanning tasks on each DN to avoid that a DN is overloaded by the table scanning tasks. Do not select columns in the equivalent filter of a base table.
  3. Reduce the data volume scanned by using the partition pruning mechanism.
  4. Avoid the use of random I/O by using clustering or partial clustering.
  5. Avoid data shuffle to reduce the network pressure by selecting the join-condition or group by column as the distribution key.

The distribution key is the core for defining a table. Follow Figure 1 to define a table. The table definition is created during the database design and is reviewed and modified during the SQL statement optimization.

Figure 1 Procedure of defining a table