• Data Warehouse Service

dws
  1. Help Center
  2. Data Warehouse Service
  3. Developer Guide
  4. SQL Reference
  5. SQL Syntax
  6. CREATE FOREIGN TABLE (for OBS Import and Export)

CREATE FOREIGN TABLE (for OBS Import and Export)

Function

CREATE FOREIGN TABLE creates an OBS foreign table in the current database for parallel data import and export.

An OBS foreign table can be set to READ ONLY or WRITE ONLY. The default value is READ ONLY. To import data to the cluster, use READ ONLY for the foreign table. To export data, use WRITE ONLY.

Precautions

  • The foreign table is owned by the user who runs the command.
  • The distribution mode of an OBS foreign table does not need to be explicitly specified. The default mode is ROUNDROBIN.
  • The OBS table takes effect only for Informational Constraint.

Syntax

CREATE FOREIGN TABLE [ IF NOT EXISTS  ] table_name 
( { column_name type_name [column_constraint ]
    | LIKE source_table | table_constraint [, ...]} [, ...] ) 
SEVER gsmpp_server 
OPTIONS (  { option_name ' value '  }  [, ...] ) 
[  { WRITE ONLY  |  READ ONLY  }] 
[ WITH error_table_name | LOG INTO error_table_name] 
[PER NODE REJECT LIMIT 'value']  ;
  • column_constraint is as follows:
    [CONSTRAINT constraint_name]
    {PRIMARY KEY | UNIQUE}
    [NOT ENFORCED [ENABLE QUERY OPTIMIZATION | DISABLE QUERY OPTIMIZATION] | ENFORCED]
  • table_constraint is as follows:
    [CONSTRAINT constraint_name]
    {PRIMARY KEY | UNIQUE} (column_name)
    [NOT ENFORCED [ENABLE QUERY OPTIMIZATION | DISABLE QUERY OPTIMIZATION] | ENFORCED]

Parameter Overview

CREATE FOREIGN TABLE provides multiple parameters, which are classified as follows:

Parameter Description

  • IF NOT EXISTS

    Does not throw an error if a table with the same name exists. A notice is issued in this case.

  • table_name

    Specifies the name of the foreign table to be created.

    Valid value: a string compliant with the naming convention.

  • column_name

    Specifies the name of a column in the foreign table.

    Valid value: a string compliant with the naming convention.

  • type_name

    Specifies the data type of the column.

  • SERVER gsmpp_server

    Specifies the server name of the foreign table. In the OBS foreign table, its server gsmpp_server is created by the initial database.

  • OPTIONS ( { option_name ' value ' } [, ...] )

    Specifies parameters of foreign table data.

    • encrypt

      Specifies whether to enable HTTPS for data transfer. The default value is on indicating that HTTPS is enabled. If it is disabled, HTTP is used.

    • access_key

      Indicates the access key (AK, obtained from the user information on the console) used for the OBS access protocol. When you create a foreign table, its AK value is encrypted and saved to the metadata table of the database.

    • secret_access_key:

      Indicates the secret access key (SK, obtained from the user information on the console) used for the OBS access protocol. When you create a foreign table, its SK value is encrypted and saved to the metadata table of the database.

    • chunksize

      Specifies the cache read by each OBS thread on a DN. Its value range is 8 to 512 in the unit of MB. Its default value is 64.

    • location

      Specifies the data source location of a foreign table. Currently, only URLs are allowed. Multiple URLs are separated using vertical bars (|).

      NOTE:
      • FThe URL of a read-only foreign table (the default permission is read-only) can end with the path prefix or the full path of the target object in the format of obs://Bucket/Prefix or full path Prefix indicates the prefix of an object path, for example, obs://mybucket/tpch/nation/.
      • If the region parameter is explicitly specified in obs://Bucket/Prefix, the value of region will be read. If the region parameter is not specified, the value of defaultRegion will be read.
      • The URL of a writable foreign table does not need to contain a file name. You can specify only one data source location for a foreign table. The directory corresponding to the location must be created before you specify the location.
      • URLs specified for a read-only foreign table must be different.

      When importing and exporting data, you are advised to use the location parameter as follows:

      • You are advised to specify a file name for location during data import. If you only specify an OBS bucket or directory, all text files in it will be imported. An error message will be reported if the data format is incorrect. If you set fault tolerance, a large amount of data may be imported to the fault-tolerant table.
      • Multiple files in an OBS bucket can be imported at the same time. The matched files are imported based on the file name prefix.
        For example, you can identify and import the following two files after specifying the prefix mybucket/input_data/product_info in location:
        mybucket/input_data/product_info.0
        mybucket/input_data/product_info.1
      • If you specify a file name, for example, 1.csv, then other files (like 1.csv1 or 1.csv22) starting with 1.csv in the bucket or directory where 1.csv resides will be automatically imported. That is, files, such as 1.csv1 and 1.csv22, are automatically imported.
      • During import, location supports multiple OBS paths, which are separated with vertical bars (|).
      • During data export, a directory is generated for location by default. If you specify only a file name, the system automatically creates a directory whose name starts with the file name and then generates the file that stores the exported data. The file name is automatically generated by DWS.
      • You can specify one path for location only during data export.
    • region

      (Optional) Specifies the value of regionCode, region information on the cloud.

      If the region parameter is explicitly specified, the value of region will be read. If the region parameter is not specified, the value of defaultRegion will be read.

      NOTE:

      Note the following when setting parameters for importing or exporting OBS foreign tables in TEXT or CSV format:

      • The location parameter is mandatory. The prefixes gsobs and obs indicate file locations on OBS. The gsobs prefix should be followed by obs url, bucket, and prefix. The obs prefix should be followed by bucket or prefix.
      • The data sources of multiple buckets are separated by vertical bars (|), for example, LOCATION 'obs://bucket1/folder/ | obs://bucket2/'. The database scans all objects in the specified folders.
    • format

      Specifies the format of the data source file in a foreign table.

      Valid value: CSV and TEXT. The default value is TEXT. DWS only supports CSV and TEXT formats.

      • CSV (comma-separated format):
        • The CSV file can process linefeeds efficiently, but cannot process certain special characters very well.
        • A CSV file is composed of records that are separated as columns by delimiters. Each record shares the same column sequence.
      • TEXT (text format):
        • Records are separated as columns by linefeed. The TEXT file can process special characters efficiently, but cannot process linefeeds well.
    • header

      Specifies whether the data file contains a table header. header is available only for CSV files.

      When data is imported, if header is on, the first row of the data file will be identified as the header and ignored. If header is off, the first row is identified as data.

      Valid value: true, on, false, and off. The default value is false or off.

    • delimiter

      Specifies the column delimiter of data. Use the default delimiter if it is not set. The default delimiter of TEXT is a tab and that of CSV is a comma (,).

      NOTE:
      • The delimiter of TEXT cannot be \r or \n.
      • A delimiter cannot be the same as the null value. The delimiter for the CSV format cannot be same as the quote value.
      • The delimiter for the TEXT format data cannot contain any of the following characters: \.abcdefghijklmnopqrstuvwxyz0123456789.
      • The data length of a single row should be less than 1 GB. If the delimiters are too long and there are too many rows, the length of valid data will be affected.
      • You are advised to use a multi-character, such as the combination of the dollar sign ($), caret (^), and ampersand (&), or invisible characters, such as 0x07, 0x08, and 0x1b as the delimiter.

      Value range:

      The value of delimiter can be a multi-character delimiter whose length is less than or equal to 10 bytes.

    • quote

      Specifies the quotation mark for the CSV format. The default value is a double quotation mark (").

      NOTE:
      • The quote value cannot be the same as the delimiter or null value.
      • The quote value must be a single-byte character.
      • Invisible characters are recommended as quote values, such as 0x07, 0x08, and 0x1b.
    • escape

      Specifies escape characters when the CSV format is used. The value must be a single-byte character.

      The default value is double quotation marks (""). If it is the same as the quote value, it will be replaced with \0.

    • null
      Specifies the string that represents a null value.
      NOTE:
      • The null value cannot be \r or \n. The maximum length is 100 characters.
      • The null value cannot be the same as the delimiter or quote value.

      Value range:

      • The default value is \N for the TEXT format.
      • The default value for the CSV format is an empty string without quotation marks.
    • noescaping

      Specifies whether to escape the backslash (\) and its following characters in the TEXT format.

      NOTE:

      noescaping is available only for the TEXT format.

      Valid value: true, on, false, and off. The default value is false or off.

    • encoding

      Specifies the encoding of a data file, that is, the encoding used to parse, check, and generate a data file. Its default value is the default client_encoding value of the current database.

      Before you import foreign tables, it is recommended that you set the encoding to the file encoding format, or a format matching the character set of the file. Otherwise, unnecessary parsing and check errors may occur, leading to import errors, rollback, or even invalid data import. Before exporting foreign tables, you are also advised to specify this parameter, because the export result using the default character set may not be what you expect.

      If this parameter is not specified when you create a foreign table, a warning message will be displayed on the client.

      NOTE:

      Currently, OBS cannot parse a file using multiple character sets during foreign table import.

      Currently, OBS cannot write a file using multiple character sets during foreign table export.

    • fill_missing_fields

      Specifies how to handle the problem that the last column of a row in the source file is lost during data import.

      Valid value: true, on, false, and off. The default value is false or off.

      • If this parameter is set to true or on and the last column of a data row in a data source file is lost, the column will be replaced with NULL and no error message will be generated.
      • If this parameter is set to false or off and the last column of a data row in a data source file is lost, the following error information will be displayed:
        missing data for column "tt"
    • ignore_extra_data

      Specifies whether to ignore excessive columns when the number of columns in a source data file exceeds that defined in the foreign table. This parameter is available only for data import.

      Valid value: true, on, false, and off. The default value is false or off.

      • If this parameter is set to true or on and the number of data source files exceeds the number of foreign table columns, excessive columns will be ignored.
      • If this parameter is set to false or off and the number of data source files exceeds the number of foreign table columns, the following error information will be displayed:
        extra data after last expected column

      If the linefeed at the end of a row is lost and this parameter is set to true, data in the next row will be ignored.

    • reject_limit

      Specifies the maximum number of data format errors allowed during a data import task. If the number of errors does not reach the maximum number, the data import task can still be executed.

      You are advised to replace this syntax with PER NODE REJECT LIMIT 'value'.

      Examples of data format errors include the following: a column is lost, an extra column exists, a data type is incorrect, and encoding is incorrect. Once a non-data format error occurs, the whole data import process is stopped.

      Value range: an integer and unlimited.

      The default value is 0, indicating that error information is returned immediately.

    • eol

      Specifies the linefeed style of the exported data file.

      This syntax is available only for the WRITE ONLY foreign table.

      Valid value: 0x0D0A and 0x0A.

    • date_format

      Specifies the DATE format for data import. This syntax is available only for READ ONLY foreign tables.

      Valid value: any valid DATE value. For details, see Date and Time Processing Functions and Operators.

      NOTE:

      If Oracle is specified as the compatible database, the DATE format is TIMESTAMP. For details, see timestamp_format below.

    • time_format

      Specifies the TIME format for data import. This syntax is available only for READ ONLY foreign tables.

      Valid value: any valid TIME format. Time zones cannot be used. For details, see Date and Time Processing Functions and Operators.

    • timestamp_format

      Specifies the TIMESTAMP format for data import. This syntax is available only for READ ONLY foreign tables.

      Valid value: any valid TIMESTAMP format. Time zones are not supported. For details, see Date and Time Processing Functions and Operators.

    • smalldatetime_format

      Specifies the SMALLDATETIME format for data import. This syntax is available only for READ ONLY foreign tables.

      Valid value: any valid SMALLDATETIME value. For details, see Date and Time Processing Functions and Operators.

    • compatible_illegal_chars

      Specifies whether to enable fault tolerance on invalid characters during data import. This syntax is available only for READ ONLY foreign tables.

      Valid value: true, on, false, and off. The default value is false or off.

      • If this parameter is set to true or on, invalid characters are tolerated and imported to the database after conversion.
      • If this parameter is set to false or off and an error occurs when there are invalid characters, the import will be interrupted.

      On a Windows platform, if OBS reads data files using the TEXT format, 0x1A will be treated as an EOF symbol and a parsing error will occur. It is the implementation constraint of the Windows platform. Since OBS on a Windows platform does not support BINARY read, the data can be read by OBS on a Linux platform.

      NOTE:

      The rule of error tolerance for invalid characters imported is as follows:

      (1) \0 is converted to a space.

      (2) Other invalid characters are converted to question marks.

      (3) If compatible_illegal_chars is set to true or on, invalid characters are tolerated. If NULL, DELIMITER, QUOTE, and ESCAPE are set to a spaces or question marks, errors like "illegal chars conversion may confuse COPY escape 0x20" will be displayed to prompt users to change parameter values that cause confusion, preventing import errors.

  • READ ONLY

    Specifies whether a foreign table is read-only. This parameter is available only for data import.

  • WRITE ONLY

    Specifies whether a foreign table is write-only. This parameter is available only for data import.

  • WITH error_table_name

    Specifies the table where data format errors generated during parallel data import are recorded. You can query the error information table after data is imported to obtain error details. This parameter is available only after reject_limit is set.

    NOTE:

    To be compatible with postgres open source interfaces, you are advised to replace this syntax with LOG INTO. When this parameter is specified, an error table is automatically created.

    Valid value: a string compliant with the naming convention.

  • LOG INTO error_table_name

    Specifies the table where data format errors generated during parallel data import are recorded. You can query the error information table after data is imported to obtain error details.

    NOTE:
    • This parameter is available only after PER NODE REJECT LIMIT is set.
    • When this parameter is specified, an error table is automatically created.

    Valid value: a string compliant with the naming convention.

  • PER NODE REJECT LIMIT 'value'

    Specifies the maximum number of data format errors on each DN during data import. If the number of errors exceeds the specified value on any DN, data import fails, an error is reported, and the system exits data import.

    This syntax specifies the error tolerance of a single node.

    Examples of data format errors include the following: a column is lost, an extra column exists, a data type is incorrect, and encoding is incorrect. When a non-data format error occurs, the whole data import process stops.

    Valid value: an integer and unlimited. The default value is 0, indicating that error information is returned immediately.

  • NOT ENFORCED

    Specifies the constraint to be an informational constraint. This constraint is guaranteed by the user instead of the database.

  • ENFORCED

    The default value is ENFORCED. ENFORCED is a reserved parameter and is currently not supported.

  • PRIMARY KEY (column_name)

    Specifies the informational constraint on column_name.

    Value range: a string. It must comply with the naming convention, and the value of column_name must exist.

  • ENABLE QUERY OPTIMIZATION

    Optimizes the query plan using an informational constraint.

  • DISABLE QUERY OPTIMIZATION

    Disables the optimization of the query plan using an informational constraint.

Example

-- Create a foreign table to import data in the TXT format from OBS to the row_tbl table.
drop foreign table if exists OBS_ft;
NOTICE:  foreign table "obs_ft" does not exist, skipping
DROP FOREIGN TABLE
create foreign table OBS_ft( a int, b int)SERVER gsmpp_server OPTIONS (location 'obs://gaussdbcheck/obs_ddl/test_case_data/txt_obs_informatonal_test001',format 'text',encoding 'utf8',chunksize '32', encrypt 'on',ACCESS_KEY 'access_key_value_to_be_replaced',SECRET_ACCESS_KEY 'secret_access_key_value_to_be_replaced',delimiter E'\x08') read only;
CREATE FOREIGN TABLE
drop table row_tbl;
DROP TABLE
create table row_tbl( a int, b int);
NOTICE:  The 'DISTRIBUTE BY' clause is not specified. Using 'a' as the distribution column by default.
HINT:  Please use 'DISTRIBUTE BY' clause to specify suitable data distribution column.
CREATE TABLE
insert into row_tbl select * from OBS_ft;
INSERT 0 3