Enabling Cross-Cluster Copy

Scenario

DistCp is used to copy data stored in HDFS from one cluster to another. DistCp depends on the cross-cluster copy function, which is disabled by default and must be enabled on both clusters.

This section describes how to enable cross-cluster copy.
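For context, once the function is enabled, a copy is typically started with the hadoop distcp command, which takes a source URI and a target URI. The Python sketch below only assembles that command line. The nameservice names hacluster (local cluster) and haclusterX (peer cluster) and the example paths are assumptions for illustration; check the nameservice names actually configured in your clusters before running anything.

```python
import shlex
from typing import List


def build_distcp_command(src_nameservice: str, src_path: str,
                         dst_nameservice: str, dst_path: str) -> List[str]:
    """Assemble a `hadoop distcp <src> <dst>` argument list.

    Assumption: "hacluster" is the local nameservice and "haclusterX" is the
    peer nameservice; both names are illustrative, not confirmed values.
    """
    for path in (src_path, dst_path):
        if not path.startswith("/"):
            raise ValueError(f"HDFS path must be absolute: {path!r}")
    src = f"hdfs://{src_nameservice}{src_path}"
    dst = f"hdfs://{dst_nameservice}{dst_path}"
    return ["hadoop", "distcp", src, dst]


if __name__ == "__main__":
    # Hypothetical source and target directories.
    cmd = build_distcp_command("hacluster", "/user/data", "haclusterX", "/user/backup")
    # Print a shell-safe command line that can be reviewed before running it.
    print(shlex.join(cmd))
```

Returning an argument list rather than a single string keeps the command safe to pass to subprocess.run without shell quoting issues.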

Impact on the System

Enabling the cross-cluster copy function requires a Yarn restart. Yarn cannot be accessed while it restarts.

Prerequisites

The hadoop.rpc.protection parameter must be set to the same data transmission mode in both HDFS clusters: either privacy (encryption enabled) or authentication (encryption disabled).

Note

To check the current value, go to the All Configurations page by referring to Modifying Cluster Service Configuration Parameters and search for hadoop.rpc.protection.

For versions earlier than MRS 3.x, choose Components > HDFS > Service Configuration on the cluster details page, switch Basic to All, and search for hadoop.rpc.protection.
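If you have copies of each cluster's client configuration files, the prerequisite can also be checked with a script. A minimal Python sketch, assuming hadoop.rpc.protection appears in a core-site.xml file that uses the standard Hadoop configuration layout; the inline XML samples are hypothetical, and a missing parameter is treated as a failed check rather than guessed.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def get_property(xml_text: str, name: str) -> Optional[str]:
    """Return a property value from Hadoop-style XML configuration text, or None if absent."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == name:
            return prop.findtext("value", default="").strip()
    return None


def rpc_protection_matches(local_xml: str, peer_xml: str) -> bool:
    """True only if both configs set hadoop.rpc.protection to the same supported mode."""
    allowed = {"privacy", "authentication"}  # the two modes named in the prerequisite
    local = get_property(local_xml, "hadoop.rpc.protection")
    peer = get_property(peer_xml, "hadoop.rpc.protection")
    return local is not None and local == peer and local in allowed


if __name__ == "__main__":
    # Hypothetical core-site.xml fragments for the two clusters.
    LOCAL = """<configuration>
  <property><name>hadoop.rpc.protection</name><value>privacy</value></property>
</configuration>"""
    PEER = LOCAL.replace("privacy", "authentication")
    print(rpc_protection_matches(LOCAL, LOCAL))  # True: same mode on both sides
    print(rpc_protection_matches(LOCAL, PEER))   # False: modes differ
```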

Procedure

  1. Log in to the service page.

    For versions earlier than MRS 1.9.2: Log in to MRS Manager and choose Services.

    For MRS 1.9.2 or later: Click the cluster name on the MRS console and choose Components.

  2. Go to the All Configurations page of the Yarn service. For details, see Modifying Cluster Service Configuration Parameters.

    Note

    If the Components tab is unavailable, complete IAM user synchronization first. (On the Dashboard page, click Synchronize on the right side of IAM User Sync to synchronize IAM users.)

  3. In the navigation pane, choose Yarn > Distcp.

  4. Set dfs.namenode.rpc-address.haclusterX.remotenn1 to the service IP address and RPC port number of one NameNode instance in the peer cluster, and set dfs.namenode.rpc-address.haclusterX.remotenn2 to the service IP address and RPC port number of the other NameNode instance in the peer cluster. Enter each value in the IP address:port format.

    Note

    For MRS 1.9.2 or later, log in to the MRS console, click the cluster name, and choose Components > HDFS > Instances to obtain the service IP address of the NameNode instance.

    You can also log in to FusionInsight Manager in MRS 3.x clusters, and choose Cluster > Name of the desired cluster > Services > HDFS > Instance to obtain the service IP address of the NameNode instance.

    The dfs.namenode.rpc-address.haclusterX.remotenn1 and dfs.namenode.rpc-address.haclusterX.remotenn2 parameters do not distinguish between active and standby NameNode instances. The default NameNode RPC port is 9820 and cannot be modified on MRS Manager.

    For example, 10.1.1.1:9820 and 10.1.1.2:9820.

    Note

    For MRS 1.6.2 or earlier, the default port number is 25000. For details, see List of Open Source Component Ports.

  5. Save the configuration. On the Dashboard tab page, choose More > Restart Service to restart the Yarn service.

    When Operation succeeded is displayed, click Finish. The Yarn service has been restarted successfully.

  6. Log in to the other cluster and repeat the preceding operations.
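The two values entered in step 4 can be sanity-checked before they are saved. A minimal Python sketch, assuming IPv4 service addresses; the function names are hypothetical helpers, not part of MRS. It checks the IP address:port format and port range, rejects two entries that point to the same NameNode instance, and pairs each value with its property name.

```python
import ipaddress
from typing import Dict, Tuple


def parse_endpoint(value: str) -> Tuple[str, int]:
    """Split an 'IP address:port' string and validate both parts (IPv4 assumed)."""
    host, sep, port_text = value.strip().rpartition(":")
    if not sep or not host:
        raise ValueError(f"expected 'IP address:port', got {value!r}")
    ipaddress.IPv4Address(host)  # raises a ValueError subclass for an invalid address
    port = int(port_text)        # raises ValueError for a non-numeric port
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return host, port


def remote_namenode_properties(nn1: str, nn2: str,
                               nameservice: str = "haclusterX") -> Dict[str, str]:
    """Map the remotenn1/remotenn2 property names to validated endpoints."""
    ep1, ep2 = parse_endpoint(nn1), parse_endpoint(nn2)
    if ep1 == ep2:
        raise ValueError("remotenn1 and remotenn2 must point to different NameNode instances")
    prefix = f"dfs.namenode.rpc-address.{nameservice}"
    return {
        f"{prefix}.remotenn1": f"{ep1[0]}:{ep1[1]}",
        f"{prefix}.remotenn2": f"{ep2[0]}:{ep2[1]}",
    }


if __name__ == "__main__":
    # Example values from step 4 (default RPC port 9820; 25000 for MRS 1.6.2 or earlier).
    for key, value in remote_namenode_properties("10.1.1.1:9820", "10.1.1.2:9820").items():
        print(f"{key} = {value}")
```

Because the two properties do not distinguish active and standby NameNode instances, the check only requires the entries to differ, not to be in any particular order.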