Importing Data from MRS to GaussDB(DWS)

Importing Data from MRS to a Data Warehouse Cluster

MapReduce Service (MRS) is a big data cluster service built on the open-source Hadoop ecosystem. It provides storage and analysis capabilities for massive volumes of data to meet your data storage and processing requirements. For details about MRS, see the MapReduce Service User Guide.

You can use Hive or Spark in an MRS analysis cluster to store massive volumes of service data. Hive and Spark data files are stored in HDFS. When a data warehouse cluster and an MRS cluster are on the same network, you can connect the data warehouse cluster to the MRS cluster, read data from the HDFS files, and write the data to GaussDB(DWS).

Import Process

Perform the following operations to import data from MRS to a data warehouse cluster:

  1. In the data warehouse cluster, create an MRS data source connection. For details, see Creating an MRS Data Source Connection.

    • Multiple MRS data sources can exist on the same network, but a GaussDB(DWS) cluster can connect to only one MRS cluster at a time.

  2. Create an HDFS foreign table to query data in the MRS cluster through the APIs of a foreign server.

    For details, see "Data Import > Importing Data from MRS to a Cluster" in the Data Warehouse Service (DWS) Developer Guide.

  3. (Optional) When the HDFS configuration of the MRS cluster changes, update the MRS data source configuration on GaussDB(DWS). For details, see Updating the MRS Data Source Configuration.
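
As a rough illustration of step 2, the following Python sketch only assembles the text of the SQL statements that a foreign-table import typically involves; it does not connect to any cluster. Every name in it is hypothetical: the table names, the column list, the foreign server name hdfs_server_example, and the HDFS folder path. The OPTIONS keys (format, encoding, foldername) and the DISTRIBUTE BY ROUNDROBIN clause reflect one reading of the HDFS foreign table syntax and should be checked against the Data Warehouse Service (DWS) Developer Guide before use.

```python
def build_hdfs_foreign_table_ddl(table, columns, server, folder, file_format="orc"):
    """Return the text of a CREATE FOREIGN TABLE statement (nothing is executed).

    Assumption: the option names below follow the HDFS foreign table syntax
    described in the DWS Developer Guide; verify them there before use.
    Inputs are interpolated directly, so pass trusted identifiers only.
    """
    column_lines = ",\n".join(f"    {name} {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE FOREIGN TABLE {table}\n"
        f"(\n{column_lines}\n)\n"
        f"SERVER {server}\n"
        f"OPTIONS (format '{file_format}', encoding 'utf8', foldername '{folder}')\n"
        "DISTRIBUTE BY ROUNDROBIN;"
    )


def build_import_statement(target_table, foreign_table):
    """Return an INSERT ... SELECT that copies foreign-table rows into a local table."""
    return f"INSERT INTO {target_table} SELECT * FROM {foreign_table};"


if __name__ == "__main__":
    # All values here are hypothetical placeholders, not real cluster objects.
    ddl = build_hdfs_foreign_table_ddl(
        table="foreign_product_info",
        columns=[("product_id", "char(10)"), ("product_price", "integer")],
        server="hdfs_server_example",
        folder="/user/hive/warehouse/demo.db/product_info_orc/",
    )
    print(ddl)
    print(build_import_statement("product_info", "foreign_product_info"))
```

The generated statements would be run in a SQL client connected to the data warehouse cluster, after the MRS data source connection from step 1 exists and a local target table with matching columns has been created.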