• Data Warehouse Service

  1. Help Center
  2. Data Warehouse Service
  3. Developer Guide
  4. Querying External Data
  5. Querying Data on OBS
  6. Creating a Foreign Table

Creating a Foreign Table

After steps in Creating a Foreign Server are performed, create an OBS foreign table in the DWS database to access the data stored on OBS. An OBS foreign table is read-only. It can only be queried using SELECT.

Creating a Foreign Table

The syntax for creating a foreign table is as follows. For details, see CREATE FOREIGN TABLE (SQL on Hadoop or OBS).

( [ { column_name type_name 
    [ { [CONSTRAINT constraint_name] NULL |
    [CONSTRAINT constraint_name] NOT NULL |
      column_constraint [...]} ] |
      table_constraint [, ...]} [, ...] ] ) 
    SERVER dfs_server 
    OPTIONS ( { option_name ' value ' } [, ...] ) 
    [ PARTITION BY ( column_name ) [ AUTOMAPPED ] ] ;

For example, when creating a foreign table named product_info_ext_obs, set parameters in the syntax as follows:

  • table_name

    Specifies the name of the foreign table to be created.

  • Table field definitions
    • column_name: specifies the name of a column in the foreign table.
    • type_name: specifies the data type of the column.

    Multiple columns are separate by commas (,).

    The number of fields and field types in the foreign table must be the same as those in the data stored on OBS For details about the data types, see Data Types.

  • SERVER dfs_server

    This parameter specifies the foreign server name of the foreign table. This server must exist. The foreign server connects to OBS to read data by setting its foreign server.

    Enter the name of the foreign server created by following steps in Creating a Foreign Server.

  • OPTIONS parameters

    These are parameters associated with the foreign table. The key parameters are as follows:

    • format: indicates the file format on OBS. Only the ORC format is supported.
    • foldername: This parameter is mandatory. It specifies the OBS path of the source data file. You only need to enter /Bucket name/Directory level, which does not contain the IP address or domain name of OBS.
      You can perform 2 in Preparing Data on OBS to obtain the complete OBS path of the source data file. The path may be in either of the following formats, which contain the bucket name and the directory level.
      • https://IP address or domain name of the storage server/Bucket name/Directory level/Object name
      • https://Bucket name.Domain name/Directory level/Object name
    • totalrows: This parameter is optional. It does not indicate the total rows of the imported data. Because OBS may store many files, it is slow to analyze data. This parameter allows you to set an estimated value so that the optimizer can estimate the table size according to the value. Generally, query efficiency is relatively high when the estimated value is almost the same as the actual value.
    • encoding: specifies the encoding format of source data files in foreign tables. Its default value is utf8. This parameter is mandatory for OBS foreign tables.

    This clause is mandatory. Currently, OBS foreign tables support only the ROUNDROBIN distribution mode.

    It indicates that when a foreign table reads data from the data source, each node in the data warehouse cluster randomly reads some data and integrates the random data to a complete data set.

  • Other parameters in the syntax

    Other parameters are optional. You can set them as required. In this example, you do not need to set these parameters. For details, see CREATE FOREIGN TABLE (SQL on Hadoop or OBS).

Based on the preceding settings, the command for creating the foreign table is as follows:

DROP FOREIGN TABLE IF EXISTS product_info_ext_obs;

---Create an OBS foreign table that does not contain partition columns. The foreign server associated with the table is obs_server, the file format on OBS corresponding to the table is ORC, and the data storage path on OBS is/mybucket/data/.

CREATE FOREIGN TABLE product_info_ext_obs
    product_price                integer        not null,
    product_id                   char(30)       not null,
    product_time                 date           ,
    product_level                char(10)       ,
    product_name                 varchar(200)   ,
    product_type1                varchar(20)    ,
    product_type2                char(10)       ,
    product_monthly_sales_cnt    integer        ,
    product_comment_time         date           ,
    product_comment_num          integer        ,
    product_comment_content      varchar(200)                      
) SERVER obs_server 
format 'orc', 
foldername '/mybucket/demo.db/product_info_orc/',
encoding 'utf8',
totalrows '10'