Spark2x Open Source New Features
Purpose
Compared with Spark 1.5, Spark2x introduces the following new open-source features and concepts:
DataSet: For details, see SparkSQL and DataSet Principle.
Spark SQL Native DDL/DML: For details, see SparkSQL and DataSet Principle.
SparkSession: For details, see SparkSession Principle.
Structured Streaming: For details, see Structured Streaming Principle.
Optimizing Small Files
Optimizing the Aggregate Algorithm
Optimizing Datasource Tables
Merging CBO