Using Loader from Scratch¶
You can use Loader to import data from an SFTP server to HDFS.
This section applies to MRS clusters earlier than 3.x.
Prerequisites¶
You have prepared service data.
You have created an analysis cluster.
Procedure¶
Access the Loader page.
Access the cluster details page.
For versions earlier than MRS 1.9.2, log in to MRS Manager and choose Services.
For MRS 1.9.2 or later, click the cluster name on the MRS console and choose Components.
Choose Hue. Under Hue Summary, locate Hue Web UI and click Hue (Active). The Hue web UI is displayed.
Choose Data Browsers > Sqoop.
The job management tab page is displayed by default on the Loader page.
On the Loader page, click Manage links.
Click New link and create an sftp-connector link. For details, see File Server Link.
Click New link, enter the link name, select hdfs-connector, and create an hdfs-connector link.
On the Loader page, click Manage jobs.
Click New Job.
In Connection, set parameters.
In From, configure the job parameters for the source link.
For details, see ftp-connector or sftp-connector.
In To, configure the job parameters for the target link.
For details, see hdfs-connector.
In Task Config, set job running parameters.
| Parameter | Description |
| --- | --- |
| Extractors | Number of Map tasks. |
| Loaders | Number of Reduce tasks. This parameter is displayed only when the destination is HBase or Hive. |
| Max. Error Records in a Single Shard | Error record threshold. If the number of error records in a single Map task exceeds the threshold, the task stops automatically and the data already obtained is not returned. Note: For MySQL and MPPDB of generic-jdbc-connector, data is read and written in batches by default, and at most one error is recorded for each batch. |
| Dirty Data Directory | Directory for saving dirty data. If you leave this parameter blank, dirty data is not saved. |
Click Save.
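The following Python sketch illustrates how the Task Config parameters Max. Error Records in a Single Shard and Dirty Data Directory are described to behave. It is an illustration only, not Loader's implementation: the names `run_map_task`, `run_batched_task`, and `ThresholdExceeded` are hypothetical, a plain list stands in for the dirty data directory, and a `ValueError` stands in for any record that cannot be processed.

```python
from typing import Callable, Iterable, List, Optional, Sequence, Tuple


class ThresholdExceeded(Exception):
    """Hypothetical signal: a task recorded more errors than the threshold."""


def run_map_task(
    records: Iterable[str],
    convert: Callable[[str], object],
    max_error_records: int,
    dirty_data: Optional[List[str]] = None,
) -> List[object]:
    """Process records one by one, counting failures against the threshold.

    dirty_data stands in for the Dirty Data Directory. None models a blank
    parameter, in which case bad records are not saved anywhere.
    """
    converted: List[object] = []
    errors = 0
    for record in records:
        try:
            converted.append(convert(record))
        except ValueError:
            errors += 1
            if dirty_data is not None:
                dirty_data.append(record)
            if errors > max_error_records:
                # The task stops; data obtained so far is not returned.
                raise ThresholdExceeded(
                    f"{errors} error records exceed threshold {max_error_records}"
                )
    return converted


def run_batched_task(
    batches: Iterable[Sequence[str]],
    write_batch: Callable[[Sequence[str]], None],
    max_error_records: int,
) -> Tuple[int, int]:
    """Batch mode: a failed batch counts as one error, however many rows are bad."""
    written = 0
    errors = 0
    for batch in batches:
        try:
            write_batch(batch)
            written += len(batch)
        except ValueError:
            errors += 1  # recorded once at most for the whole batch
            if errors > max_error_records:
                raise ThresholdExceeded(
                    f"{errors} failed batches exceed threshold {max_error_records}"
                )
    return written, errors


if __name__ == "__main__":
    dirty: List[str] = []
    rows = run_map_task(["1", "x", "3"], int, max_error_records=1, dirty_data=dirty)
    print(rows, dirty)  # prints: [1, 3] ['x']
```

In this model, a threshold of 0 stops the task on the first bad record, while in batch mode a batch containing several bad rows still counts as a single error. This is why a batched job can tolerate more bad rows than the threshold suggests.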