Introduction

Overview

MRS Manager provides backup and recovery for user data and system data. Backup is performed per component and covers Manager data (including OMS data and LdapServer data), Hive user data, component metadata stored in DBService, and HDFS metadata.

Backup and recovery tasks are performed in the following scenarios:

  • Routine backups are performed to keep system and component data safe.
  • If the system becomes faulty, backup data can be used to restore it.
  • If the active cluster fails completely, a mirror cluster identical to the active cluster must be created, and backup data can be used to restore data to it.

Table 1 Backing up metadata

| Backup Type | Backup Content |
| --- | --- |
| OMS | Back up database data (excluding alarm data) and configuration data in the cluster management system. |
| LdapServer | Back up user information, including the username, password, key, password policy, and group information. |
| DBService | Back up metadata of the component (Hive) managed by DBService. |
| NameNode | Back up HDFS metadata. |

Principles

Task

Before a backup or recovery can run, you must create a backup or recovery task and set its parameters, such as the task name, the data source, and the type of directory in which backup files are saved. While Manager is recovering HDFS, Hive, or NameNode data, the cluster cannot be accessed.

Each backup task can back up several data sources and generates an independent backup file for each of them. All backup files generated by one task form a backup file set, which can be used in recovery tasks. Backup files can be stored on a Linux local disk, in HDFS of the local cluster, or in HDFS of the standby cluster. Backup tasks support both full and incremental backup policies: HDFS and Hive backup tasks support incremental backup, while OMS, LdapServer, DBService, and NameNode backup tasks support only full backup.
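The task model described above (a named task, one or more data sources, a storage location, and a full or incremental policy with per-source restrictions) can be sketched as follows. This is a hypothetical Python model for illustration only: `BackupTask`, `Target`, `Policy`, and the `<task>/<source>.bak` file naming are assumptions, not MRS interfaces or real backup file names.

```python
from dataclasses import dataclass
from enum import Enum


class Target(Enum):
    # The three storage locations listed in the text.
    LOCAL_DISK = "Linux local disk"
    LOCAL_HDFS = "HDFS of the local cluster"
    STANDBY_HDFS = "HDFS of the standby cluster"


class Policy(Enum):
    FULL = "full"
    INCREMENTAL = "incremental"


# Per the text: HDFS and Hive support incremental backup;
# OMS, LdapServer, DBService, and NameNode support full backup only.
INCREMENTAL_CAPABLE = {"HDFS", "Hive"}
FULL_ONLY = {"OMS", "LdapServer", "DBService", "NameNode"}


@dataclass
class BackupTask:
    """Hypothetical model of a backup task (not an MRS API)."""
    name: str
    sources: list
    target: Target
    policy: Policy = Policy.FULL

    def validate(self) -> None:
        for src in self.sources:
            if src not in INCREMENTAL_CAPABLE | FULL_ONLY:
                raise ValueError(f"unknown data source: {src}")
            if self.policy is Policy.INCREMENTAL and src not in INCREMENTAL_CAPABLE:
                raise ValueError(f"{src} supports only full backup")

    def backup_file_set(self) -> dict:
        # One independent backup file per data source; together they form
        # the backup file set. The naming scheme is an assumption.
        self.validate()
        return {src: f"{self.name}/{src}.bak" for src in self.sources}


task = BackupTask("nightly", ["OMS", "NameNode"], Target.LOCAL_DISK)
print(task.backup_file_set())
```

Requesting an incremental policy for a task that includes OMS, LdapServer, DBService, or NameNode fails validation, mirroring the restriction stated above.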

NOTE:

The rules for task execution are as follows:

  • While a task is running, it cannot be started again, and no other task can be started at the same time.
  • The interval between automatic executions of a periodic task must be greater than 120 s; otherwise, the task is postponed to the next period. Manual tasks can be executed at any interval.
  • When a periodic task is due to run automatically, the current time must not be more than 120 s later than the task start time; otherwise, the task is postponed to the next period.
  • A locked periodic task cannot run automatically and must be unlocked manually.
  • Before an OMS, LdapServer, DBService, or NameNode backup task starts, ensure that the LocalBackup partition on the active management node has more than 20 GB of available space; otherwise, the backup task cannot start.
  • When planning backup and recovery tasks, select the data to back up or recover based on the service logic, the data storage structure, and the associations between databases and tables.
  • By default, the system creates a periodic backup task named default, which runs every 24 hours and performs a full backup of OMS, LdapServer, DBService, and NameNode data to the Linux local disk.
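The automatic-execution rules above can be summarized as a single decision function. The sketch below is a hypothetical model, not MRS code: `PeriodicTask`, `can_auto_run`, and the use of epoch seconds are assumptions, and it reads the 120 s interval rule as the time elapsed since the task's previous automatic run.

```python
from dataclasses import dataclass
from typing import Optional

LIMIT_S = 120       # both 120 s limits from the rules above
MIN_FREE_GB = 20    # LocalBackup free space must exceed this
LOCAL_BACKUP_SOURCES = {"OMS", "LdapServer", "DBService", "NameNode"}


@dataclass
class PeriodicTask:
    """Hypothetical periodic task state (not an MRS API)."""
    sources: set
    scheduled_start: float            # start time of the current period, epoch seconds
    last_run: Optional[float] = None  # previous automatic execution, if any
    locked: bool = False


def can_auto_run(task: PeriodicTask, now: float, another_task_running: bool,
                 local_backup_free_gb: float) -> str:
    """Return "run", or the reason the task does not start automatically."""
    if task.locked:
        return "locked: must be unlocked manually"
    if another_task_running:
        return "busy: only one task can run at a time"
    if task.last_run is not None and now - task.last_run <= LIMIT_S:
        return "postponed: interval not greater than 120 s"
    if now - task.scheduled_start > LIMIT_S:
        return "postponed: more than 120 s after the start time"
    if task.sources & LOCAL_BACKUP_SOURCES and local_backup_free_gb <= MIN_FREE_GB:
        return "blocked: LocalBackup needs more than 20 GB free"
    return "run"


task = PeriodicTask({"OMS"}, scheduled_start=1000.0)
print(can_auto_run(task, now=1060.0, another_task_running=False, local_backup_free_gb=50.0))
```

The checks run in a fixed order, so a task that breaks several rules reports only the first one it hits.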

Snapshot

The system uses snapshot technology to back up data quickly. Supported snapshots include HDFS snapshots.

An HDFS snapshot is a read-only backup of HDFS at a specified time point. It is used in data backup, misoperation protection, and disaster recovery.

Snapshots can be enabled for any HDFS directory so that snapshot files can be created for it. Before creating a snapshot of a directory, the system automatically enables the snapshot function for that directory. Creating snapshots does not affect HDFS operations. A maximum of 65,536 snapshots can be created for each HDFS directory.

Once a snapshot exists for an HDFS directory, the directory cannot be deleted or renamed until all of its snapshots are deleted. Snapshots also cannot be created for any parent directory or subdirectory of that directory.
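The snapshot constraints above can be illustrated with a toy in-memory model. On a plain HDFS cluster the corresponding operations are `hdfs dfsadmin -allowSnapshot`, `hdfs dfs -createSnapshot`, and `hdfs dfs -deleteSnapshot`; the `SnapshotRegistry` class below is a hypothetical sketch that only mimics the rules (automatic enabling, no nested snapshottable directories, the 65,536 limit, and no delete or rename while snapshots exist). It does not talk to HDFS.

```python
MAX_SNAPSHOTS_PER_DIR = 65536


class SnapshotRegistry:
    """Toy model of the HDFS snapshot rules described above (not the HDFS API)."""

    def __init__(self) -> None:
        # snapshottable directory -> names of its existing snapshots
        self.snapshots = {}

    @staticmethod
    def _nested(a: str, b: str) -> bool:
        # True if one path equals or contains the other.
        a, b = a.rstrip("/") + "/", b.rstrip("/") + "/"
        return a.startswith(b) or b.startswith(a)

    def create_snapshot(self, directory: str, name: str) -> None:
        if directory not in self.snapshots:
            # Enable snapshots automatically, unless a parent directory or
            # subdirectory is already snapshottable.
            for other in self.snapshots:
                if self._nested(directory, other):
                    raise ValueError(f"{directory} overlaps snapshottable {other}")
            self.snapshots[directory] = []
        names = self.snapshots[directory]
        if len(names) >= MAX_SNAPSHOTS_PER_DIR:
            raise ValueError(f"{directory} already has {MAX_SNAPSHOTS_PER_DIR} snapshots")
        if name in names:
            raise ValueError(f"snapshot {name} already exists")
        names.append(name)

    def delete_snapshot(self, directory: str, name: str) -> None:
        self.snapshots[directory].remove(name)

    def can_delete_or_rename(self, directory: str) -> bool:
        # Allowed only once the directory has no remaining snapshots.
        return not self.snapshots.get(directory)
```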

DistCp

Distributed copy (DistCp) is a tool for copying large amounts of data within the HDFS of one cluster or between the HDFS instances of different clusters. In HBase, HDFS, or Hive backup or recovery tasks, if data is backed up to HDFS of the standby cluster, the system invokes DistCp to perform the copy. The active and standby clusters must run the same MRS version.

DistCp uses MapReduce for data distribution, error handling, recovery, and reporting. It expands the specified list of source files and directories into input for map tasks, and each map task copies one partition of the files in that list.
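To make the idea of map tasks copying partitions of the source list concrete, the sketch below splits a listing of `(path, size)` pairs into contiguous chunks of roughly equal byte size, one chunk per map task. It is a simplified illustration in the spirit of DistCp's default `uniformsize` copy strategy, not its actual implementation; `split_copy_listing` and the example paths are hypothetical.

```python
def split_copy_listing(files, num_maps):
    """Split (path, size_bytes) pairs into at most num_maps contiguous chunks
    of roughly equal total size, one chunk per map task."""
    if num_maps < 1:
        raise ValueError("num_maps must be at least 1")
    target = sum(size for _, size in files) / num_maps
    chunks, current, current_bytes = [], [], 0
    for path, size in files:
        # Close the current chunk when adding this file would exceed the
        # per-map byte budget, as long as there are map tasks left.
        if current and current_bytes + size > target and len(chunks) < num_maps - 1:
            chunks.append(current)
            current, current_bytes = [], 0
        current.append(path)
        current_bytes += size
    if current:
        chunks.append(current)
    return chunks


listing = [("/data/a", 100), ("/data/b", 100), ("/data/c", 100), ("/data/d", 100)]
print(split_copy_listing(listing, 2))  # [['/data/a', '/data/b'], ['/data/c', '/data/d']]
```

Balancing by bytes rather than by file count keeps one map task from receiving all of the large files.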

To use DistCp to copy data between the HDFS instances of two clusters, configure a cross-cluster trust relationship and enable the cross-cluster replication function on both clusters.

Local quick recovery

After DistCp is used to back up the HDFS and Hive data of the local cluster to HDFS of the standby cluster, HDFS of the local cluster retains snapshots of the backed-up data. Users can create local quick recovery tasks that recover data from these snapshot files in HDFS of the local cluster.
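The text does not describe how local quick recovery is implemented internally. In plain HDFS, the files of a snapshot are readable under `<snapshottable directory>/.snapshot/<snapshot name>/`, so restoring a file amounts to copying it from that read-only path back to its live location (for example with `hdfs dfs -cp`). The hypothetical helper below only computes the snapshot-side path; the directory and snapshot names in the usage line are made up.

```python
from pathlib import PurePosixPath


def snapshot_path(snapshottable_dir: str, snapshot_name: str, live_path: str) -> str:
    """Return where live_path's content is visible inside a snapshot.

    HDFS exposes snapshots read-only under <dir>/.snapshot/<name>/.
    Raises ValueError if live_path is not under snapshottable_dir.
    """
    root = PurePosixPath(snapshottable_dir)
    relative = PurePosixPath(live_path).relative_to(root)
    return str(root / ".snapshot" / snapshot_name / relative)


# Hypothetical names: snapshot "s1" of /user/hive/warehouse.
print(snapshot_path("/user/hive/warehouse", "s1", "/user/hive/warehouse/db1/t1/part-0"))
```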

Specifications

Table 2 Backup and recovery feature specifications

| Item | Specifications |
| --- | --- |
| Maximum number of backup or recovery tasks | 100 |
| Number of concurrently running tasks | 1 |
| Maximum number of waiting tasks | 199 |
| Maximum size of backup files on a Linux local disk (GB) | 600 |

Table 3 Specifications of the default task

| Item | OMS | LdapServer | DBService | NameNode |
| --- | --- | --- | --- | --- |
| Backup period | 1 hour | 1 hour | 1 hour | 1 hour |
| Maximum number of backups | 2 | 2 | 2 | 2 |
| Maximum size of a backup file | 10 MB | 20 MB | 100 MB | 1.5 GB |
| Maximum size of disk space used | 20 MB | 40 MB | 200 MB | 3 GB |

Save path of backup data (all four components): Data save path/LocalBackup/ on the active and standby management nodes
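In Table 3, the maximum disk space for each component appears to equal the maximum number of backups multiplied by the maximum backup file size (for example, 2 × 10 MB = 20 MB for OMS). The short check below verifies that relationship and totals the worst-case footprint of the default task's backups. It assumes 1 GB = 1024 MB when converting the NameNode values; the variable and function names are illustrative only.

```python
# Values taken from Table 3, converted to MB assuming 1 GB = 1024 MB.
MAX_BACKUPS = 2
MAX_FILE_MB = {"OMS": 10, "LdapServer": 20, "DBService": 100, "NameNode": 1.5 * 1024}
MAX_DISK_MB = {"OMS": 20, "LdapServer": 40, "DBService": 200, "NameNode": 3 * 1024}


def disk_budget_mb(component: str) -> float:
    # Worst case: every retained backup is at the maximum file size.
    return MAX_BACKUPS * MAX_FILE_MB[component]


for component in MAX_FILE_MB:
    print(component, disk_budget_mb(component), MAX_DISK_MB[component])

total_mb = sum(disk_budget_mb(c) for c in MAX_FILE_MB)
print(f"total: {total_mb:.0f} MB (about {total_mb / 1024:.2f} GB)")
```

Under these assumptions the worst-case total is 3,332 MB (about 3.25 GB), which is small compared with the more than 20 GB of free LocalBackup space that the task rules require before such a backup task can start.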

NOTE:

The administrator must regularly transfer the backup data of the default task to an external cluster based on the enterprise's O&M requirements.