What Are the Differences Between a Data Warehouse and the Hadoop Big Data Platform?
The Hadoop big data platform can be regarded as a next-generation data warehousing system. It has the characteristics of modern data warehouses and is widely used by enterprises. Because massively parallel processing (MPP) is scalable, MPP-based data warehousing systems are sometimes also classified as big data platforms.
However, data warehouses and the Hadoop platform differ considerably in functionality and user experience, depending on the scenario. For details, see the following table.
| Dimension | Hadoop Platform | Data Warehouse |
| --- | --- | --- |
| Number of compute nodes | | |
| Data volume | Over 10 PB | Max 10 PB |
| Data types | Relational, semi-relational, unstructured (voice, images, and video) | |
| | Medium to high | |
| Application development API | SQL and other programming language APIs, such as MapReduce | Standard database SQL |
| | Unlimited, with comprehensive programming APIs | Limited, supported by UDFs |
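The difference in application development APIs noted in the comparison can be illustrated with a minimal sketch. It computes the same per-region total twice: once with a single standard SQL statement and once with explicit map, shuffle, and reduce steps. Everything here is an assumption made for illustration: the sample data and function names are hypothetical, Python's built-in `sqlite3` module stands in for a SQL engine, and plain Python functions stand in for a MapReduce job. It does not use the GaussDB(DWS) or Hadoop APIs.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (region, amount) sales records.
SALES = [("north", 120), ("south", 80), ("north", 30), ("east", 50), ("south", 20)]


def total_by_region_sql(rows):
    """Declarative style: one standard SQL statement; the engine plans the work."""
    conn = sqlite3.connect(":memory:")  # in-memory stand-in for a SQL engine
    try:
        conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        cursor = conn.execute(
            "SELECT region, SUM(amount) FROM sales GROUP BY region"
        )
        return dict(cursor.fetchall())
    finally:
        conn.close()


def total_by_region_mapreduce(rows):
    """Programmatic style: the developer writes the map, shuffle, and reduce steps."""
    # Map: emit one (key, value) pair per record.
    mapped = [(region, amount) for region, amount in rows]
    # Shuffle: group the emitted values by key.
    grouped = defaultdict(list)
    for key, value in mapped:
        grouped[key].append(value)
    # Reduce: combine each key's values into a single result.
    return {key: sum(values) for key, values in grouped.items()}


if __name__ == "__main__":
    print(total_by_region_sql(SALES))
    print(total_by_region_mapreduce(SALES))
```

Both functions return the same totals. The SQL version leaves execution planning to the engine, while the map/reduce version exposes every step to the developer, which is more work but allows arbitrary custom logic at each stage.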
Data warehouses and the Hadoop platform complement each other, each serving different scenarios. GaussDB(DWS) on the public cloud integrates seamlessly with Hadoop-based MRS on the public cloud, providing SQL-over-Hadoop to share data across platforms and services. With GaussDB(DWS), you can manage massive data in a data warehouse while benefiting from the openness, convenience, and innovation of the Hadoop platform. You can also continue to use the upper-layer applications of conventional data warehouses, especially BI applications.