Reference: Job Splitting Dimensions
CDM splits a job along a dimension that depends on the data source. Table 1 lists the splitting dimension for each data source.
Data Source Category | Data Source | Job Splitting Rule |
---|---|---|
Data warehouse | GaussDB(DWS) | |
Data warehouse | Data Lake Insight (DLI) | |
Hadoop | MRS HDFS | Jobs can be split based on files. |
Hadoop | MRS HBase | Jobs can be split based on HBase regions. |
Hadoop | MRS Hive | |
Hadoop | FusionInsight HDFS | Jobs can be split based on files. |
Hadoop | FusionInsight HBase | Jobs can be split based on HBase regions. |
Hadoop | FusionInsight Hive | |
Hadoop | Apache HDFS | Jobs can be split based on files. |
Hadoop | Apache HBase | Jobs can be split based on HBase regions. |
Hadoop | Apache Hive | |
Object storage | Object Storage Service (OBS) | Jobs can be split based on files. |
File system | FTP | Jobs can be split based on files. |
File system | SFTP | Jobs can be split based on files. |
File system | HTTP | Jobs can be split based on files. |
Relational database | RDS for MySQL | |
Relational database | RDS for PostgreSQL | |
Relational database | RDS for SQL Server | |
Relational database | MySQL | |
Relational database | PostgreSQL | |
Relational database | Microsoft SQL Server | |
Relational database | Oracle | |
Relational database | SAP HANA | |
Relational database | Database shard | Each backend connects to a subjob, which can be split based on primary keys. |
NoSQL | Distributed Cache Service (DCS) | Jobs cannot be split. |
NoSQL | Redis | Jobs cannot be split. |
NoSQL | Document Database Service (DDS) | Jobs cannot be split. |
NoSQL | MongoDB | Jobs cannot be split. |
NoSQL | Cassandra | Jobs can be split based on the token range of Cassandra. |
Message system | Apache Kafka | Jobs can be split based on topics. |
Message system | DMS Kafka | Jobs can be split based on topics. |
Message system | MRS Kafka | Jobs can be split based on topics. |
Search | Elasticsearch | Jobs cannot be split. |
Search | Cloud Search Service (CSS) | Jobs cannot be split. |
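
When planning migrations in a script, it can help to have the rules in the table available as a lookup. The following is a minimal sketch only: `SPLIT_DIMENSION`, `split_dimension`, and `can_split` are hypothetical names chosen for illustration and are not part of any CDM API. Data sources whose rule is not stated in the table are deliberately left out.

```python
from typing import Optional

# Splitting dimension per data source, transcribed from the table above.
# None means the table states that jobs cannot be split.
# Hypothetical helper data, not a CDM API.
SPLIT_DIMENSION = {
    "MRS HDFS": "files",
    "MRS HBase": "HBase regions",
    "FusionInsight HDFS": "files",
    "FusionInsight HBase": "HBase regions",
    "Apache HDFS": "files",
    "Apache HBase": "HBase regions",
    "Object Storage Service (OBS)": "files",
    "FTP": "files",
    "SFTP": "files",
    "HTTP": "files",
    "Database shard": "primary keys (one subjob per backend)",
    "Distributed Cache Service (DCS)": None,
    "Redis": None,
    "Document Database Service (DDS)": None,
    "MongoDB": None,
    "Cassandra": "token range",
    "Apache Kafka": "topics",
    "DMS Kafka": "topics",
    "MRS Kafka": "topics",
    "Elasticsearch": None,
    "Cloud Search Service (CSS)": None,
}


def split_dimension(source: str) -> Optional[str]:
    """Return the splitting dimension for a data source, or None if it cannot be split.

    Raises KeyError for sources that are not in the lookup above.
    """
    return SPLIT_DIMENSION[source]


def can_split(source: str) -> bool:
    """Return True if the table lists a splitting dimension for the data source."""
    return SPLIT_DIMENSION[source] is not None
```

For example, `can_split("Redis")` is `False`, while `split_dimension("Cassandra")` returns `"token range"`. A source missing from the lookup raises `KeyError`, so an unstated rule is never mistaken for "cannot be split".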