Read
Hudi provides three views for read operations: the real-time view, the read-optimized view, and the incremental view. Select the view that matches your query requirements.
Hudi supports multiple query engines, including Spark and Hive. For details, see Table 1 and Table 2.
Table 1

Query Engine | Real-time View/Read-optimized View | Incremental View |
---|---|---|
Hive | Y | Y |
Spark (SparkSQL) | Y | Y |
Spark (SparkDataSource API) | Y | Y |

Table 2

Query Engine | Real-time View | Incremental View | Read-optimized View |
---|---|---|---|
Hive | Y | Y | Y |
Spark (SparkSQL) | Y | Y | Y |
Spark (SparkDataSource API) | Y | Y | Y |
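
As a rough illustration of how a view might be selected through the Spark DataSource API, the sketch below builds the read options for each view. It assumes that the real-time, read-optimized, and incremental views correspond to the `snapshot`, `read_optimized`, and `incremental` values of Hudi's `hoodie.datasource.query.type` option, and that an incremental read takes its starting commit from `hoodie.datasource.read.begin.instanttime`. Neither option is named in this section, and `hudi_read_options`, `VIEW_TO_QUERY_TYPE`, and `base_path` are hypothetical names used only for the example.

```python
# Minimal sketch: builds Spark DataSource read options for a Hudi view.
# It only assembles a dict, so it runs without a Spark session.

# Assumed mapping from the view names in this section to query types.
VIEW_TO_QUERY_TYPE = {
    "real-time": "snapshot",
    "read-optimized": "read_optimized",
    "incremental": "incremental",
}


def hudi_read_options(view, begin_instant=None):
    """Return read options for the given view (hypothetical helper)."""
    if view not in VIEW_TO_QUERY_TYPE:
        raise ValueError(f"unknown view: {view!r}")
    options = {"hoodie.datasource.query.type": VIEW_TO_QUERY_TYPE[view]}
    if view == "incremental":
        # An incremental read needs a commit time to start from.
        if begin_instant is None:
            raise ValueError("an incremental read needs a begin instant time")
        options["hoodie.datasource.read.begin.instanttime"] = begin_instant
    return options


# Usage with an existing SparkSession (base_path is an assumed table location):
# df = spark.read.format("hudi").options(**hudi_read_options("real-time")).load(base_path)
```

Keeping the mapping in one place makes it easy to adjust if a given Hudi version uses different option values.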
Caution
Currently, partition inference is not supported when the Spark DataSource API is used to read Hudi data. For example, when the DataSource API is used to query a bootstrap table, the partition field may be missing from the result or may be displayed as null.
To query the incremental view, set hoodie.hudicow.consume.mode to INCREMENTAL. This parameter takes effect only for queries on the incremental view; it cannot be used for queries on other types of Hudi tables or on other tables. To restore the configuration afterward, set hoodie.hudicow.consume.mode to SNAPSHOT or to any value other than INCREMENTAL.
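
The switch between incremental and snapshot mode can be sketched as a small helper that generates the session-level `set` statements. This is an illustration only. It assumes that `hudicow` in the parameter name is the table name, and that the companion settings `hoodie.<table>.consume.start.timestamp` and `hoodie.<table>.consume.max.commits` (not mentioned in this section) are available in your Hudi version. The function name `consume_mode_statements` and the sample timestamp are hypothetical.

```python
# Minimal sketch: generates Hive session statements for one Hudi table.
# It only builds strings; nothing is sent to a query engine.

def consume_mode_statements(table, mode="INCREMENTAL",
                            start_timestamp=None, max_commits=None):
    """Build `set` statements for a table's consume mode (hypothetical helper)."""
    prefix = f"hoodie.{table}.consume"
    statements = [f"set {prefix}.mode={mode};"]
    if mode == "INCREMENTAL":
        # Assumed companion settings; they are only meaningful in incremental mode.
        if start_timestamp is not None:
            statements.append(f"set {prefix}.start.timestamp={start_timestamp};")
        if max_commits is not None:
            statements.append(f"set {prefix}.max.commits={max_commits};")
    return statements


# Enable incremental mode for the table, starting from a sample commit time.
for statement in consume_mode_statements(
        "hudicow", start_timestamp="20210308143839", max_commits=3):
    print(statement)

# Restore the configuration once the incremental query is done.
for statement in consume_mode_statements("hudicow", mode="SNAPSHOT"):
    print(statement)
```

Generating the restore statement from the same helper makes it harder to leave a session in incremental mode by accident.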