• Data Warehouse Service

dws
  1. Help Center
  2. Data Warehouse Service
  3. Developer Guide
  4. Query Performance Optimization
  5. Optimization Cases
  6. Setting cost_param and Optimizing Query Performance

Setting cost_param and Optimizing Query Performance

Symptom 1

If cost_param & bit0 is set to 1, an improved mechanism is used for estimating the selection rate of non-equi-joins. This method is more accurate for estimating the selection rate of joins between two identical tables. The following example describes the optimization scenario when bit0 of cost_param is set to 1.

Note: The selection rate indicates the percentage for which the number of rows meeting the join conditions account of the JOIN results when the JOIN relationship is established between two tables.

The table structure is as follows:

CREATE TABLE LINEITEM
(
L_ORDERKEYBIGINT NOT NULL
, L_PARTKEYBIGINT NOT NULL
, L_SUPPKEYBIGINT NOT NULL
, L_LINENUMBERBIGINT NOT NULL
, L_QUANTITYDECIMAL(15,2) NOT NULL
, L_EXTENDEDPRICEDECIMAL(15,2) NOT NULL
, L_DISCOUNTDECIMAL(15,2) NOT NULL
, L_TAXDECIMAL(15,2) NOT NULL
, L_RETURNFLAGCHAR(1) NOT NULL
, L_LINESTATUSCHAR(1) NOT NULL
, L_SHIPDATEDATE NOT NULL
, L_COMMITDATEDATE NOT NULL
, L_RECEIPTDATE DATE NOT NULL
, L_SHIPINSTRUCT CHAR(25) NOT NULL
, L_SHIPMODECHAR(10) NOT NULL
, L_COMMENTVARCHAR(44) NOT NULL
) with (orientation = column, COMPRESSION = MIDDLE) distribute by hash(L_ORDERKEY);

CREATE TABLE ORDERS
(
O_ORDERKEYBIGINT NOT NULL
, O_CUSTKEYBIGINT NOT NULL
, O_ORDERSTATUSCHAR(1) NOT NULL
, O_TOTALPRICEDECIMAL(15,2) NOT NULL
, O_ORDERDATEDATE NOT NULL
, O_ORDERPRIORITYCHAR(15) NOT NULL
, O_CLERKCHAR(15) NOT NULL
, O_SHIPPRIORITYBIGINT NOT NULL
, O_COMMENTVARCHAR(79) NOT NULL
)with (orientation = column, COMPRESSION = MIDDLE) distribute by hash(O_ORDERKEY);

The query statements are as follows:

explain verbose select
count(*) as numwait 
from
lineitem l1,
orders 
where
o_orderkey = l1.l_orderkey
and o_orderstatus = 'F'
and l1.l_receiptdate > l1.l_commitdate
and not exists (
select
*
from
lineitem l3
where
l3.l_orderkey = l1.l_orderkey
and l3.l_suppkey <> l1.l_suppkey
and l3.l_receiptdate > l3.l_commitdate
)
order by
numwait desc;

When bit0 of cost_param is 0, the execution plan is shown as follows:

Optimization Analysis 1

These queries are from Anti Join connected in the lineitem table. When cost_param & bit0 is 1, the estimated number of Anti Join rows greatly differ from that of the actual number of rows so that the query performance deteriorates. You can estimate the number of Anti Join rows more accurately by setting cost_param & bit0 to 1 to improve the query performance. The optimized execution plan is as follows:

Symptom 2

When cost_param & bit1 is 1, the selection rate is estimated based on multiple filter criteria. The lowest selection rate among all filter criteria, but not the product of the selection rates for two tables under a specific filter criterion, is used as the total selection rate. This method is more accurate when a close correlation exists between the columns to be filtered. The following example describes the optimization scenario when cost_param & bit1 is 1.

The table structure is as follows:

CREATE TABLE NATION
(
N_NATIONKEYINT NOT NULL
, N_NAMECHAR(25) NOT NULL
, N_REGIONKEYINT NOT NULL
, N_COMMENTVARCHAR(152)
) distribute by replication;
CREATE TABLE SUPPLIER
(
S_SUPPKEYBIGINT NOT NULL
, S_NAMECHAR(25) NOT NULL
, S_ADDRESSVARCHAR(40) NOT NULL
, S_NATIONKEYINT NOT NULL
, S_PHONECHAR(15) NOT NULL
, S_ACCTBALDECIMAL(15,2) NOT NULL
, S_COMMENTVARCHAR(101) NOT NULL
) distribute by hash(S_SUPPKEY);
CREATE TABLE PARTSUPP
(
PS_PARTKEYBIGINT NOT NULL
, PS_SUPPKEYBIGINT NOT NULL
, PS_AVAILQTYBIGINT NOT NULL
, PS_SUPPLYCOSTDECIMAL(15,2)NOT NULL
, PS_COMMENTVARCHAR(199) NOT NULL
)distribute by hash(PS_PARTKEY);

The query statements are as follows:

explain verbose select
nation,
sum(amount) as sum_profit 
from
(
select
n_name as nation,
l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount
from
supplier,
lineitem,
partsupp,
nation
where
s_suppkey = l_suppkey
and ps_suppkey = l_suppkey
and ps_partkey = l_partkey
and s_nationkey = n_nationkey
) as profit 
group by nation 
order by nation;

When bit1 of cost_param is is 0, the execution plan is shown as follows:

Optimization Analysis 2

In the preceding queries, the hash join criteria of the supplier, lineitem, and partsupp tables are setting lineitem.l_suppkey to supplier.s_suppkey and lineitem.l_partkey to partsupp.ps_partkey. Two filter criteria exist in the hash join conditions. lineitem.l_suppkey in the first filter criteria and lineitem.l_partkey in the second filter criteria are two columns with strong relationship of the lineitem table. In this situation, when you estimate the rate of the hash join conditions, if cost_param & bit1 is 0, the selection rate is estimated based on multiple filter criteria. The lowest selection rate among all filter criteria, but not the product of the selection rates for two tables under a specific filter criteria, is used as the total selection rate. This method is more accurate when a close correlation exists between the columns to be filtered. The plan after optimization is shown as follows: