• Data Warehouse Service

dws
  1. Help Center
  2. Data Warehouse Service
  3. Developer Guide
  4. SQL Reference
  5. SQL Syntax
  6. SELECT

SELECT

Function

SELECT retrieves rows from a table or view.

Serving as an overlaid filter for a database table, SELECT using SQL keywords retrieves required data from data tables.

Precautions

  • Using SELECT can join HDFS and ordinary tables, but cannot join ordinary and GDS foreign tables. That is, A SELECT statement cannot contain both ordinary and GDS foreign tables.
  • The user must have the SELECT permission on every column used in the SELECT command.
  • UPDATE permission is required when using FOR UPDATE or FOR SHARE.

Syntax

  • Querying data
[ WITH [ RECURSIVE ] with_query [, ...] ]
SELECT [/*+ plan_hint */] [ ALL | DISTINCT [ ON ( expression [, ...] ) ] ]
{ * | {expression [ [ AS ] output_name ]} [, ...] }
[ FROM from_item [, ...] ]
[ WHERE condition ]
[ GROUP BY grouping_element [, ...] ]
[ HAVING condition [, ...] ]
[ WINDOW {window_name AS ( window_definition )} [, ...] ]
[ { UNION | INTERSECT | EXCEPT | MINUS } [ ALL | DISTINCT ] select ]
[ ORDER BY {expression [ [ ASC | DESC | USING operator ] | nlssort_expression_clause ] [ NULLS { FIRST | LAST } ]} [, ...] ]
[ LIMIT { count | ALL } ]
[ OFFSET start [ ROW | ROWS ] ]
[ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY ]
[ {FOR { UPDATE | SHARE } [ OF table_name [, ...] ] [ NOWAIT ]} [...] ];
NOTE:

In condition and expression, you can use the aliases of expressions in targetlist in compliance with the following rules:

  • Reference only in the same level.
  • Only reference aliases in the targetlist.
  • Reference a prior expression in a subsequent expression.
  • The volatile function cannot be used.
  • The Window function cannot be used.
  • Do not reference an alias in the join on condition.
  • An error is reported if the targetlist contains multiple referenced aliases.
  • The subquery with_query is as follows:
    with_query_name [ ( column_name [, ...] ) ]
        AS ( {select | values | insert | update | delete} )
  • The specified query source from_item is as follows:
    {[ ONLY ] table_name [ * ] [ partition_clause ] [ [ AS ] alias [ ( column_alias [, ...] ) ] ]
    |( select ) [ AS ] alias [ ( column_alias [, ...] ) ]
    |with_query_name [ [ AS ] alias [ ( column_alias [, ...] ) ] ]
    |function_name ( [ argument [, ...] ] ) [ AS ] alias [ ( column_alias [, ...] | column_definition [, ...] ) ]
    |function_name ( [ argument [, ...] ] ) AS ( column_definition [, ...] )
    |from_item [ NATURAL ] join_type from_item [ ON join_condition | USING ( join_column [, ...] ) ]}
  • The group clause is as follows:
    ( )
    | expression
    | ( expression [, ...] )
    | ROLLUP ( { expression | ( expression [, ...] ) } [, ...] )
    | CUBE ( { expression | ( expression [, ...] ) } [, ...] )
    | GROUPING SETS ( grouping_element [, ...] )
  • The specified partition partition_clause is as follows:
    PARTITION { ( partition_name ) | 
            FOR (  partition_value [, ...] ) }
    NOTE:

    You can specify partitions only for ordinary tables.

  • The sorting order nlssort_expression_clause is as follows:
    NLSSORT ( column_name, ' NLS_SORT = { SCHINESE_PINYIN_M | generic_m_ci } ' )
  • Simplified query syntax, equivalent to select * from table_name.
    TABLE { ONLY {(table_name)| table_name} | table_name [ * ]};

Parameter Description

  • WITH [ RECURSIVE ] with_query [, ...]

    The WITH clause allows you to specify one or more subqueries that can be referenced by name in the primary query, equal to temporary table.

    If RECURSIVE is specified, it allows a SELECT subquery to reference itself by name.

    The detailed format of with_query is as follows: with_query_name [ ( column_name [, ...] ) ] AS ( {select | values | insert | update | delete} )

    • with_query_name specifies the name of the result set generated by a subquery. Such names can be used to access the result sets of subqueries in a query.
    • column_name specifies the column name displayed in the subquery result set.
    • Each subquery can be a SELECT, VALUES, INSERT, UPDATE or DELETE statement.
  • plan_hint clause

    Follows the SELECT keyword in the /*+ */ format. This clause is used to optimize the plan generated by the statement block corresponding to SELECT. For details, see Plan Hint Optimization.

  • ALL

    Specifies that all rows meeting the requirements are returned. This is the default behavior, so you can omit this key word.

  • DISTINCT [ ON ( expression [, ...] ) ]

    Removes all duplicate rows from the SELECT result so one row is kept from each group of duplicates.

    ON ( expression [, ...] ) is only reserved for the first row among all the rows with the same result calculated using given expressions.

    DISTINCT ON expression is explained with the same rule of ORDER BY. Unless you use ORDER BY to guarantee that the required row appears first, you cannot know what the first row is.

  • SELECT list

    Indicates columns to be queried. Some or all columns (using wildcard character *) can be queried.

    You may use the AS output_name clause to give an alias for an output column. The alias is used for the displaying of the output column.

    Column names may be either of:

    • Manually input column names which are spaced using commas (,).
    • Fields computed in the FROM clause.
  • FROM clause

    Indicates one or more source tables for SELECT.

    The FROM clause can contain the following elements:

    • table_name

      Indicates the name (optionally schema-qualified) of an existing table or view, for example, schema_name.table_name.

    • alias

      Gives a temporary alias to a table to facilitate the quotation by other queries.

      An alias is used for brevity or to eliminate ambiguity for self-joins. When an alias is provided, it completely hides the actual name of the table or function.

    • column_alias

      Specifies the column alias.

    • PARTITION

      Queries data in the specified partition in a partitioned table.

    • partition_name

      Specifies the name of a partition.

    • partition_value

      Specifies the value of the specified partition key. If there are many partition keys, use the PARTITION FOR clause to specify the value of the only partition key you want to use.

    • subquery

      Performs a subquery in the FROM clause. A temporary table is created to save subquery results.

    • with_query_name

      WITH clause can also be the source of FROM clause and can be referenced with the name queried by executing WITH.

    • function_name

      Function name. Function calls can appear in the FROM clause.

    • join_type

      There are five types below:

      • [ INNER ] JOIN

        A JOIN clause combines two FROM items. Use parentheses if necessary to determine the order of nesting. In the absence of parentheses, JOIN nests left-to-right.

        In any case, JOIN binds more tightly than the commas separating FROM items.

      • LEFT [ OUTER ] JOIN

        Returns all rows in the qualified Cartesian product (all combined rows that pass its join condition), and pluses one copy of each row in the left-hand table for which there was no right-hand row that passed the join condition. This left-hand row is extended to the full width of the joined table by inserting NULL values for the right-hand columns. Note that only the JOIN clause's own condition is considered while deciding which rows have matches. Outer conditions are applied afterwards.

      • RIGHT [ OUTER ] JOIN

        Returns all the joined rows, plus one row for each unmatched right-hand row (extended with NULL on the left).

        This is just a notational convenience, since you could convert it to a LEFT OUTER JOIN by switching the left and right inputs.

      • FULL [ OUTER ] JOIN

        Returns all the joined rows, pluses one row for each unmatched left-hand row (extended with NULL on the right), and pluses one row for each unmatched right-hand row (extended with NULL on the left).

      • CROSS JOIN

        CROSS JOIN is equivalent to INNER JOIN ON (TRUE), which means no rows are removed by qualification. These join types are just a notational convenience, since they do nothing you could not do with plain FROM and WHERE.

        NOTE:

        For the INNER and OUTER join types, a join condition must be specified, namely exactly one of NATURAL ON, join_condition, or USING (join_column [, ...]). For CROSS JOIN, none of these clauses can appear.

      CROSS JOIN and INNER JOIN produce a simple Cartesian product, the same result as you get from listing the two items at the top level of FROM.

    • ON join_condition

      A join condition to define which rows have matches in joins. Example: ON left_table.a = right_table.a

    • USING(join_column[, ...])

      ON left_table.a = right_table.a AND left_table.b = right_table.b ... abbreviation. Corresponding columns must have the same name.

    • NATURAL

      NATURAL is a shorthand for a USING list that mentions all columns in the two tables that have the same names.

    • from item

      Specifies the name of the query source object connected.

  • WHERE clause

    The WHERE clause forms an expression for row selection to narrow down the query range of SELECT. The condition is any expression that evaluates to a result of Boolean type. Rows that do not satisfy this condition will be eliminated from the output.

    To Where clause, if special character "%", "_", or "\" is queried in LIKE, add the slash "\" before each character.

  • GROUP BY clause

    Condenses query results into a single row or selected rows that share the same values for the grouped expressions.

    • CUBE ( { expression | ( expression [, ...] ) } [, ...] )

      A CUBE grouping is an extension to the GROUP BY clause that creates subtotals for all of the possible combinations of the given list of grouping columns (or expressions). In terms of multidimensional analysis, CUBE generates all the subtotals that could be calculated for a data cube with the specified dimensions. Notice that n elements of a CUBE translate to 2n grouping sets. For example, if the CUBE clause contains three expressions (n = 3), the number of result sets = 2n = 23 = 8 groups. Rows grouped on the values of n expressions are called regular rows, and the rest are called superaggregate rows.

    • GROUPING SETS ( grouping_element [, ...] )

      GROUPING SETS is another extension to the GROUP BY clause. It allows users to specify multiple GROUP BY options. This improves efficiency by trimming away unnecessary data. After you specify the set of groups that you want to create using a GROUPING SETS expression within a GROUP BY clause, the database does not need to compute a whole ROLLUP or CUBE.

    If the SELECT list expression quotes some ungrouped fields and no aggregate function is used, an error is displayed. This is because multiple values may be returned for ungrouped fields.

  • HAVING clause

    Selects special groups by working with the GROUP BY clause. The HAVING clause compares some attributes of groups with a constant. Only groups that matching the logical expression in the HAVING clause are extracted.

  • WINDOW clause

    The general format is WINDOW window_name AS ( window_definition ) [, ...]. window_name is a name can be referenced by window_definition. window_definition can be expressed in the following forms:

    [ existing_window_name ]

    [ PARTITION BY expression [, ...] ]

    [ ORDER BY expression [ ASC | DESC | USING operator ] [ NULLS { FIRST | LAST } ] [, ...] ]

    [ frame_clause ]

    frame_clause defines a window frame for the window function. The window function (not all window functions) depends on window frame and window frame is a set of relevant rows of the current query row. frame_clause can be expressed in the following forms:

    [ RANGE | ROWS ] frame_start

    [ RANGE | ROWS ] BETWEEN frame_start AND frame_end

    frame_start and frame_end can be expressed in the following forms:

    UNBOUNDED PRECEDING

    value PRECEDING

    CURRENT ROW

    value FOLLOWING

    UNBOUNDED FOLLOWING

    For the query of column storage table, only row_number window function is supported, frame_clause is not supported.

  • UNION clause

    Computes the set union of the rows returned by the involved SELECT statements.

    The UNION clause has the following constraints:

    • By default, the result of UNION does not contain any duplicate rows unless the ALL option is specified.
    • Multiple UNION operators in the same SELECT statement are evaluated left to right, unless otherwise specified by parentheses.
    • FOR UPDATE cannot be specified either for a UNION result or for any input of a UNION.

    General expression:

    select_statement UNION [ALL] select_statement

    • select_statement can be any SELECT statement without an ORDER BY, LIMIT, FOR UPDATE, or FOR SHARE statement.
    • ORDER BY and LIMIT in parentheses can be attached in a sub-expression.
  • INTERSECT clause

    Computes the set intersection of rows returned by the involved SELECT statements. The result of INTERSECT does not contain any duplicate rows.

    The INTERSECT clause has the following constraints:

    • Multiple INTERSECT operators in the same SELECT statement are evaluated left to right, unless otherwise specified by parentheses.
    • Processing INTERSECT preferentially when UNION and INTERSECT operations are executed for results of multiple SELECT statements.

    General format:

    select_statement INTERSECT select_statement

    select_statement can be any SELECT statement without a FOR UPDATE clause.

  • EXCEPT clause

    EXCEPT clause has the following common form:

    select_statement EXCEPT [ ALL ] select_statement

    select_statement can be any SELECT statement without a FOR UPDATE clause.

    The EXCEPT operator computes the set of rows that are in the result of the left SELECT statement but not in the result of the right one.

    The result of EXCEPT does not contain any duplicate rows unless the ALL option is specified. To execute ALL, a row that has m duplicates in the left table and n duplicates in the right table will appear max(m-n,0) duplicates in the result set.

    Multiple EXCEPT operators in the same SELECT statement are evaluated left to right, unless parentheses dictate otherwise. EXCEPT binds at the same level as UNION.

    Currently, FOR UPDATE and FOR SHARE cannot be specified either for an EXCEPT result or for any input of an EXCEPT.

  • MINUS clause

    Has the same function and syntax as EXCEPT clause.

  • ORDER BY clause

    Sorts data retrieved by SELECT in descending or ascending order. If the ORDER BY expression contains multiple columns:

    • If two columns are equal according to the leftmost expression, they are compared according to the next expression and so on.
    • If they are equal according to all specified expressions, they are returned in an implementation-dependent order.
    • Columns sorted by ORDER BY must be contained in the result retrieved by SELECT.

    To support Chinese pinyin order or case insensitive order, specify the UTF-8 or GBK encoding mode during database initiation. The commands are as follows:

    initdb – E UTF8 – D../data – locale=zh_CN.UTF-8 or initdb – E GBK-D../data – locale=zh_CN.GBK.

  • LIMIT clause

    Consists of two independent sub-clauses:

    LIMIT { count | ALL }

    OFFSET start count specifies the maximum number of rows to return, while start specifies the number of rows to skip before starting to return rows. When both are specified, start rows are skipped before starting to count the count rows to be returned.

  • OFFSET clause

    The SQL: 2008 standard has introduced a different clause:

    OFFSET start { ROW | ROWS }

    start specifies the number of rows to skip before starting to return rows.

  • FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY

    If count is omitted in a FETCH clause, it defaults to 1.

  • FOR UPDATE clause

    Locks rows retrieved by SELECT. This ensures that the rows cannot be modified or deleted by other transactions until the current transaction ends. That is, other transactions that attempt UPDATE, DELETE, or SELECT FOR UPDATE of these rows will be blocked until the current transaction ends.

    To avoid waiting for the committing of other transactions, you can apply NOWAIT. Rows to which NOWAIT applies cannot be immediately locked. After SELECT FOR UPDATE NOWAIT is executed, an error is reported.

    FOR SHARE behaves similarly, except that it acquires a shared rather than exclusive lock on each retrieved row. A shared lock blocks other transaction from performing UPDATE, DELETE, or SELECT FOR UPDATE on these rows, but it does not prevent them from performing SELECT FOR SHARE.

    If specified tables are named in FOR UPDATE or FOR SHARE, then only rows coming from those tables are locked; any other tables used in SELECT are simply read as usual. Otherwise, locking all tables in the command.

    If FOR UPDATE or FOR SHARE is applied to a view or sub-query, it affects all tables used in the view or sub-query.

    Multiple FOR UPDATE and FOR SHARE clauses can be written if it is necessary to specify different locking behaviors for different tables.

    If the same table is mentioned (or implicitly affected) by both FOR UPDATE and FOR SHARE clauses, it is processed as FOR UPDATE. Similarly, a table is processed as NOWAIT if that is specified in any of the clauses affecting it.

    • For SQL statements containing FOR UPDATE or FOR SHARE, if their execution plans cannot be implemented, an error is reported. If their execution plans can be implemented, they are issued to DNs for execution.
    • The query of column storage table does not support for update/share.
  • NLS_SORT

    Indicates a field to be ordered in a special mode. Currently, only the Chinese Pinyin order and case insensitive order are supported.

    Value range:

    • SCHINESE_PINYIN_M, sorted by Pinyin order. To use this sort method, specify GBK as the encoding format when you create the database. If you do not do so, this value is invalid.
    • generic_m_ci, case-insensitive order.
  • PARTITION clause

    Queries data in the specified partition in a partitioned table.

Example

-- Obtain the temp_t temporary table by a subquery and query all records in this table:
WITH temp_t(name,isdba) AS (SELECT usename,usesuper FROM pg_user) SELECT * FROM temp_t;

-- Query all the r_reason_sk records in the tpcds.reason table and de-duplicate them.
SELECT DISTINCT(r_reason_sk) FROM tpcds.reason;

-- Example of a LIMIT subquery: Obtain a record from the table.
SELECT * FROM tpcds.reason LIMIT 1;

-- Query all records and sort them in alphabetic order.
SELECT r_reason_desc FROM tpcds.reason ORDER BY r_reason_desc;

-- Use table aliases to obtain data from the pg_user and pg_user_status tables:
SELECT a.usename,b.locktime FROM pg_user a,pg_user_status b WHERE a.usesysid=b.roloid;

-- Example of the FULL JOIN subquery: Join data in the pg_user and pg_user_status tables.
SELECT a.usename,b.locktime,a.usesuper FROM pg_user a FULL JOIN pg_user_status b on a.usesysid=b.roloid;

-- Example of the GROUP BY subquery: Filter data based on query conditions, and group the results.
SELECT r_reason_id, AVG(r_reason_sk) FROM tpcds.reason GROUP BY r_reason_id HAVING AVG(r_reason_sk) > 25;

-- Example of the GROUP BY CUBE subquery: Filter data based on query conditions, and group the results.
SELECT r_reason_id,r_reason_desc,AVG(r_reason_sk) FROM tpcds.reason GROUP BY CUBE(r_reason_id,r_reason_sk);

-- Example of the GROUP BY GROUPING SETS subquery: Filter data based on query conditions, and group the results.
SELECT r_reason_id,r_reason_desc,AVG(r_reason_sk) FROM tpcds.reason GROUP BY GROUPING SETS((r_reason_id,r_reason_sk),r_reason_sk);

-- Example of the UNION subquery: Merge the names started with W or N in the r_reason_desc column in the tpcds.reason table.
SELECT r_reason_sk, tpcds.reason.r_reason_desc
    FROM tpcds.reason
    WHERE tpcds.reason.r_reason_desc LIKE 'W%'
UNION
SELECT r_reason_sk, tpcds.reason.r_reason_desc
    FROM tpcds.reason
    WHERE tpcds.reason.r_reason_desc LIKE 'N%';

-- Example of the NLS_SORT subquery: Sort by Chinese Pinyin.
SELECT * FROM tpcds.reason ORDER BY NLSSORT( r_reason_desc, 'NLS_SORT = SCHINESE_PINYIN_M');


-- Case-insensitive order:
SELECT * FROM tpcds.reason ORDER BY NLSSORT( r_reason_desc, 'NLS_SORT = generic_m_ci');

-- Create the table tpcds.reason_p:
CREATE TABLE tpcds.reason_p
(
  r_reason_sk integer,
  r_reason_id character(16),
  r_reason_desc character(100)
)
PARTITION BY RANGE (r_reason_sk)
(
  partition P_05_BEFORE values less than (05),
  partition P_15 values less than (15),
  partition P_25 values less than (25),
  partition P_35 values less than (35),
  partition P_45_AFTER values less than (MAXVALUE)
)
;

-- Insert data:
INSERT INTO tpcds.reason_p values(3,'AAAAAAAABAAAAAAA','reason 1'),(10,'AAAAAAAABAAAAAAA','reason 2'),(4,'AAAAAAAABAAAAAAA','reason 3'),(10,'AAAAAAAABAAAAAAA','reason 4'),(10,'AAAAAAAABAAAAAAA','reason 5'),(20,'AAAAAAAACAAAAAAA','reason 6'),(30,'AAAAAAAACAAAAAAA','reason 7');

-- Example of the PARTITION subquery: Obtain data from the P_05_BEFORE partition in the tpcds.reason_p table.
 SELECT * FROM tpcds.reason_p PARTITION (P_05_BEFORE);
 r_reason_sk |   r_reason_id    |   r_reason_desc                   
-------------+------------------+------------------------------------
           4 | AAAAAAAABAAAAAAA | reason 3                          
           3 | AAAAAAAABAAAAAAA | reason 1                          
(2 rows)

-- Example of the GROUP BY subquery: Group records in the tpcds.reason_p table by r_reason_id, and count the number of records in each group.
SELECT COUNT(*),r_reason_id FROM tpcds.reason_p GROUP BY r_reason_id;
 count |   r_reason_id    
-------+------------------
     2 | AAAAAAAACAAAAAAA
     5 | AAAAAAAABAAAAAAA
(2 rows)

-- Example of the GROUP BY CUBE subquery: Filter data based on query conditions, and group the results.
SELECT * FROM tpcds.reason GROUP BY  CUBE (r_reason_id,r_reason_sk,r_reason_desc);

-- Example of the GROUP BY GROUPING SETS subquery: Filter data based on query conditions, and group the results.
SELECT * FROM tpcds.reason GROUP BY  GROUPING SETS ((r_reason_id,r_reason_sk),r_reason_desc);

-- Example of the HAVING subquery: Group records in the tpcds.reason_p table by r_reason_id, count the number of records in each group, and display only values whose number of r_reason_id is greater than 2.
SELECT COUNT(*) c,r_reason_id FROM tpcds.reason_p GROUP BY r_reason_id HAVING c>2;
 c |   r_reason_id    
---+------------------
 5 | AAAAAAAABAAAAAAA
(1 row)

-- Example of the IN subquery: Group records in the tpcds.reason_p table by r_reason_id, count the number of records in each group, and display only the numbers of records whose r_reason_id is AAAAAAAABAAAAAAA or AAAAAAAADAAAAAAA.
SELECT COUNT(*),r_reason_id FROM tpcds.reason_p GROUP BY r_reason_id HAVING r_reason_id IN('AAAAAAAABAAAAAAA','AAAAAAAADAAAAAAA'); 
count |   r_reason_id    
-------+------------------
     5 | AAAAAAAABAAAAAAA
(1 row)

-- Example of the INTERSECT subquery: Query records whose r_reason_id is AAAAAAAABAAAAAAA and whose r_reason_sk is smaller than 5.
SELECT * FROM tpcds.reason_p WHERE r_reason_id='AAAAAAAABAAAAAAA' INTERSECT SELECT * FROM tpcds.reason_p WHERE r_reason_sk<5;
 r_reason_sk |   r_reason_id    |     r_reason_desc                 
-------------+------------------+------------------------------------
           4 | AAAAAAAABAAAAAAA | reason 3                           
           3 | AAAAAAAABAAAAAAA | reason 1                           
(2 rows)

-- Example of the EXCEPT subquery: Query records whose r_reason_id is AAAAAAAABAAAAAAA and whose r_reason_sk is greater than or equal to 4.
SELECT * FROM tpcds.reason_p WHERE r_reason_id='AAAAAAAABAAAAAAA' EXCEPT SELECT * FROM tpcds.reason_p WHERE r_reason_sk<4;
r_reason_sk |   r_reason_id    |      r_reason_desc                  
-------------+------------------+------------------------------------
          10 | AAAAAAAABAAAAAAA | reason 2                          
          10 | AAAAAAAABAAAAAAA | reason 5                          
          10 | AAAAAAAABAAAAAAA | reason 4                          
           4 | AAAAAAAABAAAAAAA | reason 3                          
(4 rows)

-- Delete the tables:
DROP TABLE tpcds.reason_p;