Job Parameter Description
When you create a job in a specified cluster (see Creating a Job in a Specified Cluster) or create and execute a job in a random cluster (see Creating and Executing a Job in a Random Cluster), the driver-config-values parameter specifies the job configuration. It covers the following functions:
Retry upon Failure: If a job fails to be executed, you can choose whether to automatically restart the job.
Job Group: CDM allows you to group jobs. You can filter, delete, start, or export jobs by group.
Schedule Execution: Specify whether to run the job as a scheduled task.
Concurrent Extractors: Enter the number of concurrent extractors.
Write Dirty Data: Specify this parameter if data that fails to be processed, or that is filtered out during job execution, needs to be written to OBS for future viewing. Before writing dirty data, create an OBS link.
Delete Job After Completion: Specify whether to delete a job after the job is executed.
Sample JSON File
"driver-config-values": {
  "configs": [
    {
      "inputs": [
        {
          "name": "throttlingConfig.numExtractors",
          "value": "1"
        },
        {
          "name": "throttlingConfig.numLoaders",
          "value": "1"
        },
        {
          "name": "throttlingConfig.recordDirtyData",
          "value": "false"
        }
      ],
      "name": "throttlingConfig"
    },
    {
      "inputs": [],
      "name": "jarConfig"
    },
    {
      "inputs": [
        {
          "name": "schedulerConfig.isSchedulerJob",
          "value": "false"
        },
        {
          "name": "schedulerConfig.disposableType",
          "value": "NONE"
        }
      ],
      "name": "schedulerConfig"
    },
    {
      "inputs": [],
      "name": "transformConfig"
    },
    {
      "inputs": [
        {
          "name": "retryJobConfig.retryJobType",
          "value": "NONE"
        }
      ],
      "name": "retryJobConfig"
    }
  ]
}
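The sample nests each setting as a name/value pair inside a named config group. As a minimal sketch of how a client might read such a structure, the following Python snippet flattens it into a simple lookup table. The helper name flatten_driver_config is hypothetical and not part of the CDM API, and the embedded JSON is a trimmed copy of the sample, wrapped in braces so that it parses on its own.

```python
import json


def flatten_driver_config(driver_config_values):
    """Return {input name: value} for every input in every config group.

    Hypothetical helper for illustration; it is not part of the CDM API.
    """
    flat = {}
    for group in driver_config_values.get("configs", []):
        for item in group.get("inputs", []):
            flat[item["name"]] = item["value"]
    return flat


# Trimmed copy of the sample, wrapped in braces so it is valid JSON on its own.
sample = json.loads("""
{
  "driver-config-values": {
    "configs": [
      {
        "inputs": [
          {"name": "throttlingConfig.numExtractors", "value": "1"},
          {"name": "throttlingConfig.recordDirtyData", "value": "false"}
        ],
        "name": "throttlingConfig"
      },
      {"inputs": [], "name": "jarConfig"},
      {
        "inputs": [
          {"name": "retryJobConfig.retryJobType", "value": "NONE"}
        ],
        "name": "retryJobConfig"
      }
    ]
  }
}
""")

settings = flatten_driver_config(sample["driver-config-values"])
print(settings["throttlingConfig.numExtractors"])  # prints: 1
```

Note that every value in the sample is a string, including numbers and Booleans, so the lookup returns "1" rather than the integer 1.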
Parameter Description
Parameter | Mandatory | Type | Description |
---|---|---|---|
throttlingConfig.numExtractors | No | Integer | Maximum number of concurrent extraction jobs. For example, 20. |
groupJobConfig.groupName | No | Enumeration | Group to which a job belongs. The default group is DEFAULT. |
throttlingConfig.numLoaders | No | Integer | Maximum number of loading jobs. For example, 5. This parameter is available only when HBase or Hive serves as the destination data source. |
throttlingConfig.recordDirtyData | No | Boolean | Whether to write dirty data. For example, true. |
throttlingConfig.writeToLink | No | String | Link to which dirty data is written. Currently, dirty data can be written only to OBS or HDFS. For example, obslink. |
throttlingConfig.obsBucket | No | String | Name of the OBS bucket to which dirty data is written. This parameter is valid only when dirty data is written to OBS. For example, dirtyData. |
throttlingConfig.dirtyDataDirectory | No | String | Directory to which dirty data is written. |
throttlingConfig.maxErrorRecords | No | String | Maximum number of error records in a single shard. When the number of error records of a map exceeds the upper limit, the task automatically ends. The imported data will not be rolled back. |
schedulerConfig.isSchedulerJob | No | Boolean | Whether to enable a scheduled task. For example, true. |
schedulerConfig.cycleType | No | String | Cycle type of a scheduled task. The values referenced in this table are minute, hour, week, and month. |
schedulerConfig.cycle | No | Integer | Cycle of a scheduled task. If cycleType is set to minute and cycle is set to 10, the scheduled task is executed every 10 minutes. |
schedulerConfig.runAt | No | String | Time when a scheduled task is triggered in a cycle. This parameter is valid only when cycleType is set to hour, week, or month. |
schedulerConfig.startDate | No | String | Start time of a scheduled task. For example, 2018-01-24 19:56:19. |
schedulerConfig.stopDate | No | String | End time of a scheduled task. For example, 2018-01-27 23:59:00. If you do not set the end time, the scheduled task is always executed and will never stop. |
schedulerConfig.disposableType | No | Enumeration | Whether to delete a job after the job is executed. For example, NONE, as in the sample JSON file. |
retryJobConfig.retryJobType | No | Enumeration | Whether to automatically retry if a job fails to be executed. For example, NONE, as in the sample JSON file. |
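To illustrate how the parameters in the table combine, the following Python sketch builds a driver-config-values structure for a job that runs every 10 minutes and writes dirty data to OBS. The helper build_driver_config is hypothetical. It assumes that each parameter belongs to the config group named by the text before its first dot and that all values are sent as strings, as in the sample JSON file. The parameter values are the examples from the table. Whether the service accepts this exact combination, or requires empty groups such as jarConfig and transformConfig, is not confirmed here.

```python
import json


def build_driver_config(params):
    """Group flat "group.key" parameters into the nested configs layout.

    Hypothetical helper for illustration; it is not part of the CDM API.
    """
    groups = {}
    for name, value in params.items():
        # Assumption: the group name is the part before the first dot.
        group, _, _ = name.partition(".")
        # Assumption: values are strings, as in the sample JSON file.
        if isinstance(value, bool):
            text = "true" if value else "false"
        else:
            text = str(value)
        groups.setdefault(group, []).append({"name": name, "value": text})
    return {
        "configs": [
            {"inputs": inputs, "name": group}
            for group, inputs in groups.items()
        ]
    }


driver_config = build_driver_config({
    "throttlingConfig.numExtractors": 20,
    "throttlingConfig.recordDirtyData": True,
    "throttlingConfig.writeToLink": "obslink",
    "throttlingConfig.obsBucket": "dirtyData",
    "schedulerConfig.isSchedulerJob": True,
    "schedulerConfig.cycleType": "minute",
    "schedulerConfig.cycle": 10,
    "schedulerConfig.startDate": "2018-01-24 19:56:19",
    "schedulerConfig.stopDate": "2018-01-27 23:59:00",
})

print(json.dumps({"driver-config-values": driver_config}, indent=2))
```

Keeping the parameters in a flat dictionary makes it easy to check them against the table before nesting them into the request body.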