Creating an Entire Database Migration Job¶
Scenario¶
CDM supports entire DB migration between homogeneous and heterogeneous data sources. The migration principles are the same as those in Table/File Migration Jobs. Each type of Elasticsearch, each key prefix of Redis, or each collection of MongoDB can be executed concurrently as a subtask.
Supported Data Sources in Entire DB Migration lists the data sources supporting entire database migration.
Field Mapping in Automatic Table Creation¶
CDM automatically creates tables at the destination during database migration. Figure 1 describes the field mapping between the DWS tables created by CDM and source tables. For example, if you use CDM to migrate the Oracle database to DWS, CDM automatically creates a table on DWS and maps the NUMBER(3,0) field of the Oracle database to the SMALLINT field of DWS.
Prerequisites¶
You have created links according to Creating Links.
The CDM cluster can communicate with the data source.
Procedure¶
Log in to the management console and choose Service List > Cloud Data Migration. In the left navigation pane, choose Cluster Management. Locate the target cluster and click Job Management.
Choose Entire DB Migration > Create Job. The page for configuring the job is displayed.
Configure the related parameters of the source database according to Table 1.
¶ Source Database
Parameter
Description
Example Value
DWS
FusionInsight LibrA
MySQL
PostgreSQL
SQL Server
Oracle
IBM Db2
SAP HANA
MyCAT
Dameng database
Schema/Tablespace
Name of the schema or tablespace from which data will be extracted. This parameter is displayed when Use SQL Statement is set to No. Click the icon next to the text box to go to the page for selecting a schema or directly enter a schema or tablespace.
If the desired schema or tablespace is not displayed, check whether the login account has the permissions required to query metadata.
schema
WHERE Clause
WHERE clause used to specify the tables to be extracted. This parameter applies to all subtables in the entire DB migration. If this parameter is not set, the entire table is extracted. If the table to be migrated does not contain the fields specified by the WHERE clause, the migration will fail.
You can set a date macro variable to extract data generated on a specific date.
age > 18 and age <= 60
Null in Partition Column
Whether a partition field can be null
Yes
Hive
Database Name
Name of the database to be migrated. The user configured in the source link must have the permission to read the database.
hivedb
HBase
CloudTable
Start Time
Start time (included). The format is yyyy-MM-dd hh:mm:ss. The dateformat time macro variable function is supported. Examples: 2017-12-31 20:00:00, ${dateformat(yyyy-MM-dd, -1, DAY)} 02:00:00, and ${dateformat(yyyy-MM-dd HH:mm:ss, -1, DAY)}
-
End Time
End time (excluded) The format is yyyy-MM-dd hh:mm:ss. The dateformat time macro variable function is supported. Examples: 2018-01-01 20:00:00, ${dateformat(yyyy-MM-dd, -1, DAY)} 02:00:00, and ${dateformat(yyyy-MM-dd HH:mm:ss, -1, DAY)}
-
Redis
Key Filter Character
Filter character used to determine the keys to be migrated For example, if the value of this parameter is a*, all asterisks (
*
) will be migrated.-
DDS
MongoDB
Database Name
Name of the database from which data is to be migrated. The user configured in the source link must have the permission to read the database.
mongodb
Query Filter
Filter used to match documents. Example: {HTTPStatusCode:{$gt:"400",$lt:"500"},HTTPMethod:"GET"}
-
Elasticsearch
CSS
Index
Index of the data to be extracted. The value can be a wildcard character. Multiple indexes that meet the wildcard condition can be migrated at a time. For example, if this parameter is set to cdm*, CDM migrates all indexes starting with cdm, such as cdm01, cdmB3, cdm_45 and so on.
If multiple indexes are migrated at the same time, Index cannot be configured at the migration destination.
cdm*
Configure the related parameters, from Table 2, for the destination cloud service.
¶ Source Database
Parameter
Description
Example Value
DWS
FusionInsight LibrA
MySQL
PostgreSQL
SQL Server
-
For details about the destination job parameters required for entire DB migration to a relational database, see To a Common Relational Database.
schema
MRS HIVE
-
For details about the destination job parameters required for entire DB migration to MRS HIVE, see To Hive.
hivedb
MRS HBase
CloudTable
-
For details about the destination job parameters required for entire DB migration to MRS HBase or CloudTable, see To HBase/CloudTable.
Yes
MRS HDFS
-
For details about the destination job parameters required for entire DB migration to MRS HDFS, see To HDFS.
-
OBS
-
For details about the destination job parameters required for entire database migration to OBS, see To OBS.
-
DCS
-
For details about the destination job parameters required for entire database migration to DCS, see To DCS.
-
DDS
Database Name
Name of the database from which data is to be migrated. The user configured in the source link must have the permission to read the database.
mongodb
Migration Behavior
Add
If there is already a value, replace it; otherwise, add a value.
Replace
-
CSS
Index
Index of the data to be extracted. The value can be a wildcard character. Multiple indexes that meet the wildcard condition can be migrated at a time. For example, if this parameter is set to cdm*, CDM migrates all indexes starting with cdm, such as cdm01, cdmB3, cdm_45 and so on.
If multiple indexes are migrated at the same time, Index cannot be configured at the migration destination.
cdm*
If a relational database is migrated, after job parameters are configured, click Next to access the page for selecting tables. You can select the tables to be migrated to the migration destination based on your requirements.
Click Next and set job parameters.
Table 3 describes related parameters.
¶ Parameter
Description
Example Value
Concurrent Tables
Number of tables to be concurrently executed
3
Concurrent Extractors
Number of extractors to be concurrently executed. Generally, retain the default value.
1
Write Dirty Data
Whether to record dirty data. By default, this parameter is set to No.
Yes
Write Dirty Data Link
This parameter is only displayed when Write Dirty Data is set to Yes.
Only links to OBS support dirty data writes.
obs_link
OBS Bucket
This parameter is only displayed when Write Dirty Data Link is a link to OBS.
Name of the OBS bucket to which the dirty data will be written.
dirtydata
Dirty Data Directory
This parameter is only displayed when Write Dirty Data is set to Yes.
Directory for storing dirty data on OBS. Dirty data is saved only when this parameter is configured.
You can go to this directory to query data that fails to be processed or is filtered out during job execution, and check the source data that does not meet conversion or cleaning rules.
/user/dirtydir
Max. Error Records in a Single Shard
This parameter is only displayed when Write Dirty Data is set to Yes.
When the number of error records of a single map exceeds the upper limit, the job will automatically terminate and the imported data cannot be rolled back. You are advised to use a temporary table as the destination table. After the data is imported, rename the table or combine it into the final data table.
0
Click Save or Save and Run.
When the job starts running, a sub-job will be generated for each table. You can click the job name to view the sub-job list.