Migrating Data from DDS to DWS

Scenario

CDM allows you to migrate data from DDS to other data sources. This section describes how to use CDM to migrate data from DDS to DWS. The procedure includes four steps:

  1. Creating a CDM Cluster and Binding an EIP to the Cluster

  2. Creating a DDS Link

  3. Creating a DWS Link

  4. Creating a Migration Job

Prerequisites

  • DWS and DDS are available.

  • You have obtained the IP addresses, port numbers, database names, usernames, and passwords for connecting to the DWS and DDS databases. In addition, you must have read, write, and delete permissions on the DDS and DWS databases.
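
  To verify these prerequisites before creating any CDM resources, you can test both connections directly. The following is a minimal sketch in Python, assuming hypothetical endpoints (dds.example.com:8635 and dws.example.com:8000, where 8635 and 8000 are commonly used default ports of DDS and DWS) and a placeholder migrator account; substitute your actual connection details.

  ```python
  # All hostnames, ports, and credentials below are placeholders.
  from pymongo import MongoClient  # pip install pymongo
  import psycopg2                  # pip install psycopg2-binary

  # DDS is MongoDB-compatible, so a standard MongoDB driver can verify access.
  dds = MongoClient("mongodb://migrator:secret@dds.example.com:8635/admin")
  print("DDS version:", dds.server_info()["version"])  # fails if host or credentials are wrong

  # DWS is PostgreSQL-compatible, so psycopg2 can verify access.
  dws = psycopg2.connect(host="dws.example.com", port=8000,
                         dbname="postgres", user="migrator", password="secret")
  with dws.cursor() as cur:
      cur.execute("SELECT version();")
      print("DWS version:", cur.fetchone()[0])
  dws.close()
  ```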

Creating a CDM Cluster and Binding an EIP to the Cluster

  1. Create a CDM cluster by following the instructions in Creating a Cluster.

    The key configurations are as follows:

    • Select the CDM cluster flavor based on the amount of data to be migrated. Generally, cdm.medium meets the requirements of most migration scenarios.

    • If DDS and DWS are deployed in the same VPC, create the CDM cluster in that VPC as well, with no EIP bound. The CDM cluster's subnet and security group can be the same as those of the DDS or DWS cluster. Alternatively, configure security group rules that allow the CDM cluster to access the other service's cluster (DWS or DDS).

    • If DDS and DWS are not deployed in the same VPC, create the CDM cluster in the same VPC as DDS and bind an EIP to it so that it can access the DWS cluster.

  2. After the CDM cluster is created, on the Cluster Management page, click Bind EIP in the Operation column to bind an EIP to the cluster. The CDM cluster uses the EIP to access DWS. If DDS and DWS are in the same VPC, do not bind an EIP to the CDM cluster.

    Note

    If SSL encryption is configured for the access channel of a local data source, CDM cannot connect to the data source using the EIP.

Creating a Migration Job

  1. Choose Table/File Migration > Create Job to create a data migration job.

  2. Configure the required job information:

    • Job Name: Enter a unique job name.

    • Source Job Configuration

      • Source Link Name: Select the mongo_link link created in Creating a DDS Link.

      • Database Name: Select the database whose data is to be migrated.

      • Collection Name: Enter the name of the MongoDB collection in DDS, which is similar to a table name in a relational database. (See the sketch after this configuration list for one way to confirm the database and collection names.)

    • Destination Job Configuration

      • Destination Link Name: Select the dwslink link created in Creating a DWS Link.

      • Schema/Tablespace: Select the schema in the DWS database to which data is to be written.

      • Table Name: Name of the table to which data is to be written. You can enter a table name that does not exist yet; CDM automatically creates the table in DWS.

      • Clear Data Before Import: Choose whether to clear data in the destination table before data import.
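
    If you are unsure of the exact database or collection names to enter above, you can list them from a MongoDB client first. This is a short pymongo sketch reusing the placeholder endpoint and credentials from Prerequisites; mydb and mycollection are illustrative names only.

    ```python
    from pymongo import MongoClient

    client = MongoClient("mongodb://migrator:secret@dds.example.com:8635/admin")

    # List user databases and their collections, skipping system databases.
    for db_name in client.list_database_names():
        if db_name in ("admin", "local", "config"):
            continue
        print(db_name, "->", client[db_name].list_collection_names())

    # Peek at one document to preview the fields that will appear in the
    # field mapping step ("mydb" and "mycollection" are illustrative).
    print(client["mydb"]["mycollection"].find_one())
    ```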

  3. Click Next. The Map Field tab page is displayed. CDM automatically maps table fields at the migration source and destination. Check whether the field mapping is correct.

    • If the field mapping is incorrect, click the row where the field is located and drag the field to adjust the mapping.

    • When importing data to DWS, you must manually select the DWS distribution columns. You are advised to select them according to the following principles (see the DDL sketch after this list):

      1. Use the primary key as the distribution column.

      2. If the table uses a composite primary key (multiple columns combined), specify all primary-key columns as distribution columns.

      3. If the table has no primary key and you do not select a distribution column, DWS uses the first column as the distribution column by default, which introduces a risk of data skew.

    • If you want to convert the content of the source fields, perform the operations in this step. In this example, field conversion is not required.
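
    For reference, the distribution-column choice corresponds to the DISTRIBUTE BY HASH clause in DWS table DDL. The sketch below applies principle 2 to a hypothetical orders table with a composite primary key; the table and column names are illustrative, and multi-column hash distribution requires a DWS version that supports it.

    ```python
    import psycopg2

    conn = psycopg2.connect(host="dws.example.com", port=8000,
                            dbname="postgres", user="migrator", password="secret")
    with conn.cursor() as cur:
        # Composite primary key, so all primary-key columns are distribution columns.
        cur.execute("""
            CREATE TABLE IF NOT EXISTS public.orders (
                order_id bigint,
                line_no  integer,
                amount   numeric(12, 2),
                PRIMARY KEY (order_id, line_no)
            )
            DISTRIBUTE BY HASH (order_id, line_no);
        """)
    conn.commit()
    conn.close()
    ```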

  4. Click Next and set task parameters. Generally, retain the default values of all parameters.

    In this step, you can configure the following optional functions:

    • Retry Upon Failure: Determines whether the job automatically retries if it fails to execute. Retain the default value Never.

    • Group: Select the group to which the job belongs. The default group is DEFAULT. On the Job Management page, jobs can be displayed, started, or exported by group.

    • Schedule Execution: To configure scheduled jobs, see Scheduling Job Execution. Retain the default value No.

    • Concurrent Extractors: Enter the number of extractors to be concurrently executed. Retain the default value 1.

    • Write Dirty Data: Enable this option if data that fails to be processed, or that is filtered out during job execution, should be written to OBS for later inspection. Create an OBS link before writing dirty data. Retain the default value No so that dirty data is not recorded.

    • Delete Job After Completion: Retain the default value Do not delete.

  5. Click Save and Run. The Job Management page is displayed, on which you can view the job execution progress and result.

  6. After the job is successfully executed, in the Operation column of the job, click Historical Record to view the job's historical execution records and read/write statistics.

    On the Historical Record page, click Log to view the job logs.
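
    As a complement to the read/write statistics, a simple count comparison between source and destination is a quick sanity check. This sketch reuses the placeholder endpoints and illustrative names from the earlier sketches.

    ```python
    from pymongo import MongoClient
    import psycopg2

    # Source: count documents in the migrated DDS collection.
    dds = MongoClient("mongodb://migrator:secret@dds.example.com:8635/admin")
    src_count = dds["mydb"]["mycollection"].count_documents({})

    # Destination: count rows in the corresponding DWS table.
    dws = psycopg2.connect(host="dws.example.com", port=8000,
                           dbname="postgres", user="migrator", password="secret")
    with dws.cursor() as cur:
        cur.execute("SELECT count(*) FROM public.mycollection;")
        dst_count = cur.fetchone()[0]
    dws.close()

    print(f"source documents: {src_count}, destination rows: {dst_count}")
    if src_count != dst_count:
        print("Counts differ; check the job logs and any recorded dirty data.")
    ```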