• Data Warehouse Service

dws
  1. Help Center
  2. Data Warehouse Service
  3. Developer Guide
  4. Query Performance Optimization
  5. Tuning Queries
  6. Typical SQL Optimization Methods
  7. Optimizing Statement Pushdown

Optimizing Statement Pushdown

Statement Pushdown

Currently, the DWS optimizer can use three methods to develop statement execution policies in the distributed framework: generating a statement pushdown plan, a distributed execution plan, or a distributed execution plan for sending statements.

  1. A statement pushdown plan pushes query statements from a CN down to DNs for execution and returns the execution results to the CN.
  2. In a distributed execution plan, a CN compiles and optimizes query statements, generates a plan tree, and then sends the plan tree to DNs for execution. After the statements have been executed, execution results will be returned to the CN.
  3. A distributed execution plan for sending statements pushes queries that can be pushed down (mostly base table scanning statements) to DNs for execution. Then, the plan obtains the intermediate results and sends them to the CN, on which the remaining queries are to be executed.

The third policy sends many intermediate results from the DNs to a CN for further execution. In this case, the CN performance bottleneck (in bandwidth, storage, and computing) is caused by statements that cannot be pushed down to DNs. Therefore, you are not advised to use the query statements that only the third policy is applicable to.

Statements cannot be pushed down to DNs if they have Functions That Do Not Support Pushdown or Syntax That Does Not Support Pushdown. Generally, you can rewrite the execution statements to solve the problem.

Viewing Whether the Execution Plan Has Been Pushed Down to DNs

Perform the following procedure to quickly determine whether the execution plan can be pushed down to DNs:

  1. Set the GUC parameter enable_fast_query_shipping to off to use the distributed framework policy for the query optimizer.

    SET enable_fast_query_shipping = off;

  2. View the execution plan.

    If the execution plan contains Data Node Scan, the SQL statements cannot be pushed down to DNs. If the execution plan contains Streaming, the SQL statements can be pushed down to DNs.

    For example:

    select
    concat(ss_item_sk, ss_ticket_number) 
    from store_sales 
    order by 1 
    limit 100; 

    In the following execution plan, the SQL statement cannot be pushed down to DNs.

      

Syntax That Does Not Support Pushdown

SQL syntax that does not support pushdown is described using the following table definition examples:

CREATE TABLE CUSTOMER1
(
    C_CUSTKEY     BIGINT NOT NULL
  , C_NAME        VARCHAR(25) NOT NULL
  , C_ADDRESS     VARCHAR(40) NOT NULL
  , C_NATIONKEY   INT NOT NULL
  , C_PHONE       CHAR(15) NOT NULL
  , C_ACCTBAL     DECIMAL(15,2)   NOT NULL
  , C_MKTSEGMENT  CHAR(10) NOT NULL
  , C_COMMENT     VARCHAR(117) NOT NULL
)
DISTRIBUTE BY hash(C_CUSTKEY);
CREATE TABLE test_stream(a int, b float);--float does not support redistribution.
CREATE TABLE sal_emp ( c1 integer[] ) DISTRIBUTE BY replication;
  • The returning statement cannot be pushed down.
    explain update customer1 set C_NAME = 'a' returning c_name;
                                   QUERY PLAN                                           
    ------------------------------------------------------------------
     Update on customer1  (cost=0.00..0.00 rows=30 width=187)
       Node/s: All datanodes
       Node expr: c_custkey
       ->  Data Node Scan on customer1 "_REMOTE_TABLE_QUERY_"  (cost=0.00..0.00 rows=30 width=187)
             Node/s: All datanodes
    (5 rows)
  • Queries of recursive range tables using WITH RECURSIVE cannot be pushed downed.
    explain WITH RECURSIVE customer1_rescursive (c_name,c_nationkey) AS (SELECT 1,c_nationkey FROM customer1 WHERE c_custkey = 1) SELECT * FROM customer1_rescursive;
                                             QUERY PLAN                                          
    ------------------------------------------------------------------
     CTE Scan on customer1_rescursive  (cost=0.00..20.00 rows=1000 width=8)
       CTE customer1_rescursive
         ->  Data Node Scan on customer1 "_REMOTE_TABLE_QUERY_"  (cost=0.00..0.00 rows=1 width=4)
               Node/s: dn_6003_6004
    (4 rows)
  • Aggregate functions using the ORDER BY statement cannot be pushed down.
    explain verbose select count ( c_custkey order by c_custkey) from customer1;
                                   
                             QUERY PLAN                                        
    ------------------------------------------------------------------ Aggregate  (cost=2.50..2.51 rows=1 width=8)
       Output: count(customer1.c_custkey ORDER BY customer1.c_custkey)
       ->  Data Node Scan on customer1 "_REMOTE_TABLE_QUERY_"  (cost=0.00..0.00 rows=30 width=8)
             Output: customer1.c_custkey
             Node/s: All datanodes
             Remote query: SELECT c_custkey FROM ONLY public.customer1 WHERE true
    (6 rows)
  • If columns in count(distinct expr) do not support redistribution, they do not support pushdown.
    explain verbose select count(distinct b) from test_stream;
                                              QUERY PLAN                                           
    ------------------------------------------------------------------ Aggregate  (cost=2.50..2.51 rows=1 width=8)
       Output: count(DISTINCT test_stream.b)
       ->  Data Node Scan on test_stream "_REMOTE_TABLE_QUERY_"  (cost=0.00..0.00 rows=30 width=8)
             Output: test_stream.b
             Node/s: All datanodes
             Remote query: SELECT b FROM ONLY public.test_stream WHERE true
    (6 rows)
  • In a statement using count(distinct) and group by, if the column specified using group by does not support redistribution, the statement does not support pushdown.

  • Statements using distinct on cannot be pushed down.
    explain verbose select distinct on (c_custkey) c_custkey from customer1 order by c_custkey;
                                                QUERY PLAN                                             
    ------------------------------------------------------------------ Unique  (cost=49.83..54.83 rows=30 width=8)
       Output: customer1.c_custkey
       ->  Sort  (cost=49.83..52.33 rows=30 width=8)
             Output: customer1.c_custkey
             Sort Key: customer1.c_custkey
             ->  Data Node Scan on customer1 "_REMOTE_TABLE_QUERY_"  (cost=0.00..0.00 rows=30 width=8)
                   Output: customer1.c_custkey
                   Node/s: All datanodes
                   Remote query: SELECT c_custkey FROM ONLY public.customer1 WHERE true
    (9 rows)
  • In a statement using FULL JOIN, if the column specified using JOIN does not support redistribution, the statement does not support pushdown.
    explain select * from test_stream t1 full join test_stream t2 on t1.a=t2.b;
                                                  QUERY PLAN                                              
    ------------------------------------------------------------------ Hash Full Join  (cost=0.38..0.82 rows=30 width=24)
       Hash Cond: ((t1.a)::double precision = t2.b)
       ->  Data Node Scan on test_stream "_REMOTE_TABLE_QUERY_"  (cost=0.00..0.00 rows=30 width=12)
             Node/s: All datanodes
       ->  Hash  (cost=0.00..0.00 rows=30 width=12)
             ->  Data Node Scan on test_stream "_REMOTE_TABLE_QUERY_"  (cost=0.00..0.00 rows=30 width=12)
                   Node/s: All datanodes
    (7 rows)
  • Subplans do not support pushdown.

    Subplans, related subqueries, and subqueries in the target list and Where clause.

  • Does not support the pushdown of EXPR_SUBLINK.

    Sublinks are used to represent subqueries in expressions. EXPR_SUBLINK is a parameter returned by a subquery and is used to create search criteria for its parent query. It is usually used in expr op (SELECT single_target_list from table...).

  • Does not support array expression pushdown.
    explain verbose select array[c_custkey,1] from customer1 order by c_custkey;
    
                              QUERY PLAN                                                    
    ------------------------------------------------------------------ Sort  (cost=49.83..52.33 rows=30 width=8)
       Output: (ARRAY[customer1.c_custkey, 1::bigint]), customer1.c_custkey
       Sort Key: customer1.c_custkey
       ->  Data Node Scan on "__REMOTE_SORT_QUERY__"  (cost=0.00..0.00 rows=30 width=8)
             Output: (ARRAY[customer1.c_custkey, 1::bigint]), customer1.c_custkey
             Node/s: All datanodes
             Remote query: SELECT ARRAY[c_custkey, 1::bigint], c_custkey FROM ONLY public.customer1 WHERE true ORDER BY 2
    (7 rows)

Functions That Do Not Support Pushdown

This module describes the variability of functions. In DWS, every function has a volatility classification, with the possibilities being:

  • IMMUTABLE

    Indicates that the function always returns the same result if the parameter values are the same.

  • STABLE

    Indicates that the function cannot modify the database, and that within a single table scan it will consistently return the same result for the same parameter values, but that its result varies by SQL statements.

  • VOLATILE

    Indicates that the function value can change even within a single table scan, so no optimizations can be made.

To view the variability of a function, query the provolatile column of pg_proc. If this column displays i, the function can be pushed down. If the column displays v or s, the function cannot be pushed down to DNs.

Take function regexp_like as an example. Two records are returned after pg_proc is queried. For the first record, i is displayed in the provolatile column, so this function can be pushed down to DNs. For the second record, v is displayed in the provolatile column, so this function cannot be pushed down to DNs.

select proname,pronamespace,pronargs,prorettype,proargtypes,provolatile from pg_proc where proname = 'regexp_like';

proname   | pronamespace | pronargs | prorettype | proargtypes | provolatile 
-------------+--------------+----------+------------+-------------+-------------
 regexp_like |           11 |        2 |         16 | 25 25       | i
 regexp_like |           11 |        3 |         16 | 25 25 25    | v
(2 rows)

In scenarios where a function does not support pushdown, perform one of the following as required:

  • If it is a system function, replace it with a functionally equivalent one.
  • If it is a customized function, analyze the actual service scenario and check whether the function variability definition can be set to the immutable attribute.

Example 1: Replace Functions That Cannot Be Pushed Down to DNs

Some functions cannot be pushed down (for example, if _REMTE_TABLE_QUERY is returned after you run the following command, the function cannot be pushed down), but they may have alternatives that can be pushed down to DNs. You are advised to select functions that support pushdown.

Take data tables in TPCDS as an example. To query the state and city names of all call centers and concatenate them into strings for output, you can execute the following statement:

SELECT concat(cc_state, cc_city)
FROM call_center;

The execution plan is:

From the execution plan, we can see that the query statement is deaggregated into two parts, and the following part is first pushed down to DNs for execution:

SELECT cc_state, cc_city FROM ONLY call_center WHERE true

After the execution, state and city names are obtained. Then, the concat function is executed on a CN to concatenate the state and city names into strings for output. In this way, resources on the CN are consumed, resulting in low CN performance.

In this case, you can modify the statement as follows:

SELECT cc_state || cc_city
FROM call_center;

It has same function as the original statement, and its execution plan is as follows:

The modified statement can be fully pushed down to DNs for execution to improve efficiency.

Example 2: User-defined Function

Define a user-defined function that generates fixed output for a certain input as the immutable type.

Take the sales information of TPCDS as an example. If you want to write a function to calculate the discount data of a product, you can define the function as follows:

CREATE FUNCTION func_percent_2 (NUMERIC, NUMERIC) RETURNS NUMERIC
AS 'SELECT $1 / $2 WHERE $2 > 0.01'
LANGUAGE SQL
VOLATILE;

Run the following statement:

SELECT func_percent_2(ss_sales_price, ss_list_price)
FROM store_sales;

The execution plan is as follows:

func_percent_2 is not pushed down, and ss_sales_price and ss_list_price are executed on a CN. In this case, a large amount of resources on the CN is consumed, and the performance deteriorates as a result.

In this example, the function returns certain output when certain input is entered. Therefore, we can modify the function to the following one:

CREATE FUNCTION func_percent_1 (NUMERIC, NUMERIC) RETURNS NUMERIC
AS 'SELECT $1 / $2 WHERE $2 > 0.01'
LANGUAGE SQL
IMMUTABLE;

Run the following statement:

SELECT func_percent_1(ss_sales_price, ss_list_price)
FROM store_sales;

The execution plan is as follows:

func_percent_1 is pushed down to DNs for quicker execution. (In TPCDS 1000X, where three CNs and 18 DNs are used, the query efficiency is improved by over 100 times).

Example 3: Pushing Down the Sorting Operation

For details, see Case: Pushing Down Sort Operations to DNs.