• Data Warehouse Service

dws
  1. Help Center
  2. Data Warehouse Service
  3. Developer Guide
  4. SQL Reference
  5. SQL Syntax
  6. CREATE FOREIGN TABLE (for GDS Import and Export)

CREATE FOREIGN TABLE (for GDS Import and Export)

CREATE FOREIGN TABLE creates a GDS foreign table.

Function

CREATE FOREIGN TABLE creates a GDS foreign table in the current database for concurrent data import and export. The GDS foreign table can be read-only or write-only, used for concurrent data import and export, respectively. The OBS foreign table is read-only by default.

Precautions

  • The foreign table is owned by the user who runs the command.
  • The distribution mode of a GDS foreign table does not need to be explicitly specified. The default mode is ROUNDROBIN.
  • All constraints (including column and row constraints) are invalid to the GDS foreign table.

Syntax

CREATE FOREIGN TABLE [ IF NOT EXISTS  ] table_name 
    ( [  { column_name type_name POSITION(offset,length) | LIKE source_table } [, ...]  ] ) 
    SERVER gsmpp_server 
    OPTIONS (  { option_name ' value '  }  [, ...] ) 
    [  { WRITE ONLY  |  READ ONLY  }] 
    [ WITH error_table_name | LOG INTO error_table_name] 
    [REMOTE LOG 'name'] 
    [PER NODE REJECT LIMIT 'value']
    [ TO { GROUP groupname | NODE ( nodename [, ... ] ) } ];

Parameter Overview

CREATE FOREIGN TABLE provides multiple parameters, which are classified as follows:

Parameter Description

  • IF NOT EXISTS

    Does not throw an error if a table with the same name exists. A notice is issued in this case.

  • table_name

    Specifies the name of the foreign table to be created.

    Valid value: a string compliant with the naming convention.

  • column_name

    Specifies the name of a column in the foreign table.

    Valid value: a string compliant with the naming convention.

  • type_name

    Specifies the data type of the column.

  • POSITION(offset,length)

    Specifies the location of each column in the source data file in fixed length mode.

    NOTE:

    offset is the start of the column in the source file, and length is the length of the column.

    Valid value: offset must be greater than 0 bytes, and its unit is byte.

    The length of each record must be less than 1 GB. By default, columns not in the file are replaced with null.

  • SERVER gsmpp_server

    Specifies the server name of the foreign table. The server gsmpp_server of the foreign table is created by the initial database.

  • OPTIONS ( { option_name ' value ' } [, ...] )

    Specifies parameters of foreign table data.

    • location

      Specifies the data source location of a foreign table. Currently, only URLs are allowed. Multiple URL files are separated using vertical bars (|).

      NOTE:
      • For a read-only foreign table imported by GDS from a remote server in parallel, its URL must end with its corresponding schema or file name. (Read-only is the default file attribute.)

        Example: gsfs://192.168.0.90:5000/* or gsfs:// 192.168.0.90:5000/* | gsfs:// 192.168.0.91:5000/*

      • For a writable foreign table imported by GDS from a remote server in parallel, its URL does not need to contain a file name. You can specify multiple remote URLs, for example, gsfs:// 192.168.0.90:5000/, to export data. If the number of URLs is less than or equal to the number of DNs, data exported through a foreign table will be evenly distributed to each URL. If the number of URLs is greater than the number of DNs, data will be evenly distributed to the first nth URLs, where n indicates the number of DNs. Blank data files will be created under the rest of the URLs.
      • For a read-only foreign table imported by GDS from a remote server in parallel, the number of URLs must be less than the number of DNs, and URLs in the same location cannot be used.
      • If the URL begins with gsfss://, data is imported and exported in encryption mode, and DOP cannot exceed 10.
    • format

      Specifies the format of the source data file in a foreign table.

      Valid value: CSV, TEXT, and FIXED. The default value is TEXT. DWS only supports CSV and TEXT formats.

      • The CSV file can process linefeeds efficiently, but cannot process certain special characters very well.
      • The TEXT file can process certain special characters efficiently, but cannot process linefeeds very well.
      • The FIXED file can process linefeeds in data columns efficiently, but cannot process special characters very well.
      NOTE:

      FIXED is defined as follows: (POSITION must be specified for each column when FIXED is used.)

      1. The column length of each record is the same.
      2. Spaces are added to short columns. Digit types must be left-aligned, and columns must be right-aligned.
      3. No delimiters are used between columns.
    • header

      Specifies whether the data file contains a table header. header is used only for CSV and FIXED files.

      When data is imported, if header is on, the first row of the data file will be identified as the header and ignored. If header is off, the first row is identified as data.

      When data is exported, if header is on, fileheader must be specified. fileheader specifies the format of the exported header file. If header is off, the exported file does not include a header.

      Valid value: true, on, false, and off. The default value is false or off.

    • fileheader

      Specifies a file that defines the contents in the header for exported data. The file contains data description of each column.

      For example, to add a header in a file containing product information, define the file as follows:

      The information of products.\n
      • Available only when the header is on or true.
      • In Remote mode, the definition file must be put to the working directory of the GDS (the -d directory specified when the GDS is started).
      • The definition file can contain only one row of header information, and end with a linefeed. Excess rows will be discarded. (Header information cannot contain a linefeed).
      • The length of the definition file including the linefeed cannot exceed 1 MB.
    • out_filename_prefix

      Specifies the name prefix of the data file exported using GDS from a write-only foreign table.

      • The file name prefix must be valid and compliant with the restrictions of the file system in the physical environment where the GDS is deployed. Otherwise, the file will fail to be created.
        • The file name prefix can contain only lowercase letters, uppercase letters, digits, and underscore (_).
        • The file name prefix cannot contain feature fields reserved for the Windows and Linux OS, including but not limited to:

          "con","aux","nul","prn","com0","com1","com2","com3","com4","com5","com6","com7","com8","com9","lpt0","lpt1","lpt2","lpt3","lpt4","lpt5","lpt6","lpt7","lpt8","lpt9"

        • The total length of the absolute path consisting of the exported file prefix, the path specified by gds –d, and .dat.x should be as required by the file system where the GDS is deployed.
        • It is required that the prefix can be correctly parsed and identified by the receiver of the data file (including but not limited to the original database where it was exported). Identify and modify the option that causes the file name resolution problem (if any).
      • To export files in high concurrency scenarios, do not use the same file name prefix for them. Otherwise, the exported files may overwrite each other or be lost in the OS or file system.
    • delimiter

      Specifies the column delimiter of data, and uses the default delimiter if it is not set. The default delimiter of TEXT is a tab and that of CSV is a comma (,). No delimiter is used in FIXED format.

      NOTE:
      • A delimiter cannot be \r or \n.
      • A delimiter cannot be the same as the null value. The delimiter for the CSV format cannot be same as the quote value.
      • The delimiter for the TEXT format data cannot contain any of the following characters: \.abcdefghijklmnopqrstuvwxyz0123456789.
      • The data length of a single row should be less than 1 GB. If the delimiters are too long and there are too many rows, the length of valid data will be affected.
      • You are advised to use a multi-character, such as the combination of the dollar sign ($), caret (^), and ampersand (&), or invisible characters, such as 0x07, 0x08, and 0x1b as the delimiter.

      Valid value:

      The value of delimiter can be a multi-character delimiter whose length is less than or equal to 10 bytes.

    • quote

      Specifies the quotation mark for the CSV format. The default value is a double quotation mark (").

      NOTE:
      • The quote value cannot be the same as the delimiter or null value.
      • The quote value must be a single-byte character.
      • Invisible characters are recommended as quote values, such as 0x07, 0x08, and 0x1b.
    • escape

      Specifies escape characters when the CSV format is used. The value must be a single-byte character.

      The default value is double quotation marks (""). If it is the same as the quote value, it will be replaced with \0.

    • null
      Specifies the string that represents a null value.
      NOTE:
      • The null value cannot be \r or \n. The maximum length is 100 characters.
      • The null value cannot be the same as the delimiter or quote value.

      Valid value:

      • The default value is \n for the TEXT format.
      • The default value for the CSV format is an empty string without quotation marks.
    • noescaping

      Specifies in TEXT format, whether to escape the backslash (\) and its following characters.

      NOTE:

      noescaping is available only for the TEXT format.

      Valid value: true, on, false, and off. The default value is false or off.

    • encoding

      Specifies the encoding of a data file, that is, the encoding used to parse, check, and generate a data file. Its default value is the default client_encoding value of the current database.

      Before you import foreign tables, it is recommended that you set the encoding to the file encoding format, or a format matching the character set of the file. Otherwise, unnecessary parsing and check errors may occur, leading to import errors, rollback, or even invalid data import. Before exporting foreign tables, you are also advised to specify this parameter, because the export result using the default character set may not be what you expect.

      If this parameter is not specified when you create a foreign table, a warning message will be displayed on the client.

      NOTE:

      Currently, GDS cannot parse or write in a file using multiple encoding formats during foreign table import or export.

    • fill_missing_fields

      Specifies how to handle the problem that the last column of a row in the source file is lost during data import.

      Valid value: true, on, false, and off. The default value is false or off.

      • If this parameter is set to true or on and the last column of a data row in a source data file is lost, the column will be replaced with NULL and no error message will be generated.
      • If this parameter is set to false or off and the last column of a data row in a source data file is lost, the following error information will be displayed:
        missing data for column "tt"
    • ignore_extra_data

      Specifies whether to ignore excessive columns when the number of source data files exceeds the number of foreign table columns. This parameter is available for data import.

      Valid value: true, on, false, and off. The default value is false or off.

      • If this parameter is set to true or on and the number of source data files exceeds the number of foreign table columns, excessive columns will be ignored.
      • If this parameter is set to false or off and the number of source data files exceeds the number of foreign table columns, the following error information will be displayed:
        extra data after last expected column

      If the linefeed at the end of a row is lost and this parameter is set to true, data in the next row will be ignored.

    • reject_limit

      Specifies the maximum number of data format errors allowed during a data import task. If the number of errors does not reach the maximum number, the data import task can still be executed.

      You are advised to replace this syntax with PER NODE REJECT LIMIT 'value'.

      Examples of data format errors include the following: a column is lost, an extra column exists, a data type is incorrect, and encoding is incorrect. Once a non-data format error occurs, the whole data import process is stopped.

      Valid value: a positive integer and unlimited

      The default value is 0, indicating that error information is returned immediately.

      NOTE:

      Enclose positive integer values with single quotation marks ('').

    • mode

      Specifies the data import strategy during a specific data import process.

      Valid value:

      Normal (default): All file types (CSV, TEXT, FIXED) are supported. To import data, enable GDS first.

    • eol

      Specifies the linefeed style of the exported data file.

      This syntax is available only for the WRITE ONLY foreign table.

      Valid value: 0x0D0A and 0x0A.

    • fix

      Specifies the length of fixed-length data. The unit is byte. This syntax is available only for READ ONLY foreign tables.

      Valid value: Less than 1GB and greater than or equal to Total length specified by POSITION (The total length is the sum of offset and length in the last column of the table definition.)

    • out_fix_alignment

      Specifies how the columns of the types BYTEAOID, CHAROID, NAMEOID, TEXTOID, BPCHAROID, VARCHAROID, NVARCHAR2OID, and CSTRINGOID are aligned during fixed-length export.

      Valid value: align_left and align_right.

      Default value: align_right

      The bytea data type must be in hexadecimal format (for example, \XXXX) or octal format (for example, \XXX\XXX\XXX). The data to be imported must be left-aligned (that is, the column data starts with two formats but not a space). Therefore, if the exported file needs to be imported using a GDS table and the data length is less than the specified length of the GDS table, the exported file must be aligned to the left. Otherwise, an error will be reported during the import.

    • date_format

      Specifies the DATE format for data import. This syntax is available only for READ ONLY foreign tables.

      Valid value: any valid DATE format. For details, see Date and Time Processing Functions and Operators.

      NOTE:

      If Oracle is specified as the compatible database, the DATE format is TIMESTAMP. For details, see timestamp_format below.

    • time_format

      Specifies the TIME format for data import. This syntax is available only for READ ONLY foreign tables.

      Valid value: any valid TIME format. Time zones cannot be used. For details, see Date and Time Processing Functions and Operators.

    • timestamp_format

      Specifies the TIMESTAMP format for data import. This syntax is available only for READ ONLY foreign tables.

      Valid value: any valid TIMESTAMP format. Time zones are not supported. For details, see Date and Time Processing Functions and Operators.

    • smalldatetime_format

      Specifies the SMALLDATETIME format for data import. This syntax is available only for READ ONLY foreign tables.

      Valid value: any valid SMALLDATETIME value. For details, see Date and Time Processing Functions and Operators.

    • compatible_illegal_chars

      Specifies whether to enable fault tolerance on invalid characters during data import. This syntax is available only for READ ONLY foreign tables.

      Valid value: true, on, false, and off. The default value is false or off.

      • If this parameter is set to true or on, invalid characters are tolerated and imported to the database after conversion.
      • If this parameter is set to false or off and an error occurs when there are invalid characters, the import will be interrupted.

      On a Windows platform, if GDS reads data files using the Text format, 0x1A will be treated as an EOF symbol and parse error will occur. It is the implementation constraint of a Windows platform. Since GDS does not support BINARY read, the data can be read by GDS on a Linux platform.

      NOTE:

      The rule of error tolerance for invalid characters imported is as follows:

      (1) \0 is converted to a space.

      (2) Other invalid characters are converted to question marks.

      (3) If compatible_illegal_chars is set to true or on, invalid characters are tolerated. If NULL, DELIMITER, QUOTE, and ESCAPE are set to a spaces or question marks, errors like "illegal chars conversion may confuse COPY escape 0x20" will be displayed to prompt users to change parameter values that cause confusion, preventing import errors.

  • READ ONLY

    Specifies whether a foreign table is read-only. This parameter is available only for data import.

  • WRITE ONLY

    Specifies whether a foreign table is write-only. This parameter is available only for data import.

  • WITH error_table_name

    Specifies the table where data format errors generated during parallel data import are recorded. You can query the error information table after data is imported to obtain error details. This parameter is available only after reject_limit is set.

    NOTE:

    To be compatible with postgres open source interfaces, you are advised to replace this syntax with LOG INTO.

    Valid value: a string compliant with the naming convention.

  • LOG INTO error_table_name

    Specifies the table where data format errors generated during parallel data import are recorded. You can query the error information table after data is imported to obtain error details.

    NOTE:

    This parameter is available only after PER NODE REJECT LIMIT is set.

    Valid value: a string compliant with the naming convention.

  • REMOTE LOG 'name'

    The data format error information is saved as files in GDS. name is the prefix of the error data file.

  • PER NODE REJECT LIMIT 'value'

    Specifies the maximum number of data format errors on each DN during data import. If the number of errors exceeds the specified value on any DN, data import fails, an error is reported, and the system exits data import.

    This syntax specifies the error tolerance of a single node.

    Examples of data format errors include the following: a column is lost, an extra column exists, a data type is incorrect, and encoding is incorrect. When a non-data format error occurs, the whole data import process stops.

    Valid value: an integer and unlimited. The default value is 0, indicating that error information is returned immediately.

  • TO { GROUP groupname | NODE ( nodename [, ... ] ) }

    Currently, TO GROUP cannot be used. TO NODE is used for internal scale-out tools. Common users should not use this mode.

Example

-- Create a foreign table to import data from GDS servers 192.168.0.90 and 192.168.0.91 in TEXT format. Record errors that occur during data import to the err_HR_staffS table.
CREATE FOREIGN TABLE foreign_HR_staffS
(
  staff_ID       NUMBER(6) ,
  FIRST_NAME     VARCHAR2(20),
  LAST_NAME      VARCHAR2(25),
  EMAIL          VARCHAR2(25),
  PHONE_NUMBER   VARCHAR2(20),
  HIRE_DATE      DATE,
  employment_ID  VARCHAR2(10),
  SALARY         NUMBER(8,2),
  COMMISSION_PCT NUMBER(2,2),
  MANAGER_ID     NUMBER(6),
  section_ID  NUMBER(4)
) SERVER gsmpp_server OPTIONS (location 'gsfs://192.168.0.90:5000/* | gsfs://192.168.0.91:5000/*', format 'TEXT', delimiter E'\x08',  null '') WITH err_HR_staffS;
-- Create a foreign table to import data from GDS servers 192.168.0.90 and 192.168.0.91 in TEXT format and record error messages in the import process to the err_HR_staffS table. A maximum of two data format errors are allowed during the import.
CREATE FOREIGN TABLE foreign_HR_staffS_ft1
(
  staff_ID       NUMBER(6) ,
  FIRST_NAME     VARCHAR2(20),
  LAST_NAME      VARCHAR2(25),
  EMAIL          VARCHAR2(25),
  PHONE_NUMBER   VARCHAR2(20),
  HIRE_DATE      DATE,
  employment_ID  VARCHAR2(10),
  SALARY         NUMBER(8,2),
  COMMISSION_PCT NUMBER(2,2),
  MANAGER_ID     NUMBER(6),
  section_ID  NUMBER(4)
) SERVER gsmpp_server OPTIONS (location 'gsfs://192.168.0.90:5000/* | gsfs://192.168.0.91:5000/*', format 'TEXT', delimiter E'\x08',  null '',reject_limit '2') WITH err_HR_staffS_ft3;
-- Delete the foreign table:
DROP FOREIGN TABLE foreign_HR_staffS;
DROP FOREIGN TABLE foreign_HR_staffS_ft1;