Write Configuration

Table 1 Write configuration

Parameter

Description

Default Value

hoodie.datasource.write.table.name

Specifies the name of the Hudi table to be written.

None

hoodie.datasource.write.operation

Specifies the operation type for writing to the Hudi table. Currently, upsert, delete, insert, bulk_insert, bootstrap, insert_overwrite, and insert_overwrite_table are supported.

  • upsert: updates and inserts data.

  • delete: deletes data.

  • insert: inserts data.

  • bulk_insert: imports data in bulk during initial table creation. Do not use upsert or insert for the initial load.

  • bootstrap: directly transforms a Parquet table into a Hudi table.

  • insert_overwrite: performs insert and overwrite operations on static partitions.

  • insert_overwrite_table: performs insert and overwrite operations on dynamic partitions. It does not immediately delete or overwrite the entire table. Instead, it logically overwrites the metadata of the Hudi table, and Hudi removes the obsolete data through the clean mechanism. It is more efficient than combining bulk_insert with overwrite.

upsert

hoodie.datasource.write.table.type

Specifies the Hudi table type. Once the table type is specified, it cannot be modified. The value can be COPY_ON_WRITE or MERGE_ON_READ.

COPY_ON_WRITE

hoodie.datasource.write.precombine.field

Specifies the field used to merge and deduplicate rows with the same key before write.

ts

hoodie.datasource.write.payload.class

Specifies the class used to merge the incoming records with the existing records during an update. This parameter can be customized: you can write your own class to implement custom merge logic.

org.apache.hudi.OverwriteWithLatestAvroPayload

hoodie.datasource.write.recordkey.field

Specifies the primary key of the Hudi table. The Hudi table must have a unique primary key.

uuid

hoodie.datasource.write.partitionpath.field

Specifies the partition key. This parameter is used together with hoodie.datasource.write.keygenerator.class to meet the requirements of different partition scenarios.

partitionpath

hoodie.datasource.write.hive_style_partitioning

Specifies whether partition paths use the same style as Hive (for example, field=value). You are advised to set this parameter to true.

false

hoodie.datasource.write.keygenerator.class

Specifies the key generator class, which generates the record key and partition path when used together with hoodie.datasource.write.partitionpath.field and hoodie.datasource.write.recordkey.field.

org.apache.hudi.keygen.SimpleKeyGenerator
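
The parameters in Table 1 are usually passed together as write options. The following is a minimal sketch in Python, assuming the options are later handed to a Spark DataFrame writer. The helper name build_hudi_write_options, the table name example_table, and the path in the final comment are hypothetical and used only for illustration. Depending on the Hudi version, a real job may need further options that are not listed in Table 1 (such as hoodie.table.name).

```python
# Operation types listed in Table 1.
HUDI_WRITE_OPERATIONS = {
    "upsert", "delete", "insert", "bulk_insert", "bootstrap",
    "insert_overwrite", "insert_overwrite_table",
}

# Table types accepted by hoodie.datasource.write.table.type.
HUDI_TABLE_TYPES = {"COPY_ON_WRITE", "MERGE_ON_READ"}


def build_hudi_write_options(table_name,
                             operation="upsert",
                             table_type="COPY_ON_WRITE",
                             record_key="uuid",
                             precombine_field="ts",
                             partition_field="partitionpath"):
    """Hypothetical helper: collect the Table 1 parameters into a dict.

    The keyword defaults mirror the default values in Table 1.
    """
    if operation not in HUDI_WRITE_OPERATIONS:
        raise ValueError("unsupported write operation: " + operation)
    if table_type not in HUDI_TABLE_TYPES:
        raise ValueError("unsupported table type: " + table_type)
    return {
        "hoodie.datasource.write.table.name": table_name,
        "hoodie.datasource.write.operation": operation,
        "hoodie.datasource.write.table.type": table_type,
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.precombine.field": precombine_field,
        "hoodie.datasource.write.partitionpath.field": partition_field,
        # Table 1 advises true, although the default is false.
        "hoodie.datasource.write.hive_style_partitioning": "true",
        "hoodie.datasource.write.keygenerator.class":
            "org.apache.hudi.keygen.SimpleKeyGenerator",
    }


# Initial load of a new table: use bulk_insert rather than upsert or insert.
options = build_hudi_write_options("example_table", operation="bulk_insert")

# Assumption: with a SparkSession and a DataFrame `df` available, the dict
# could be passed to the Hudi data source, for example:
# df.write.format("hudi").options(**options).mode("append").save("/tmp/example_table")
```

Validating the operation and table type before the write keeps a misspelled value from reaching the job, which matters most for the table type because it cannot be changed once the table exists.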