Active/standby switchovers of YARN ResourceManager were frequent.
The NodeManager health check of YARN was too sensitive.
The YARN health check incorrectly collected the health status of standby nodes. As a result, a false alarm indicating that the service was unavailable was reported.
LDAPServer data could not be synchronized.
Hive execution failed after the MRS 3.1.2-LTS.2.6 patch was installed.
A thread leak occurred when HiveServer connected to Guardian.
Hive column values that were too long failed to be written to an ORC file.
It took a long time to clear temporary files after Hive tasks failed or were terminated abnormally.
Hive failed to start because external metadata had been configured for Hive.
The /var/log/ directory became full because the hiveserver.out log of Hive was not compressed.
It took a long time to add fields to Hive partition tables.
The rand function should generate random numbers ranging from 0 to 1 but returned only values around 0.72 (see the first sketch after this list).
After the WebHCat process of Hive was killed, it could not be automatically restarted and no alarm was reported.
An exception occurred when Kafka automatically restarted after Kerberos authentication failed.
The Spring packages in the Hudi and Spark directories were incompatible.
After a quota was configured for ZooKeeper, an alarm indicating that the top-level quota failed to be set was still displayed.
The client IP address needed to be printed in the logs of the old Guardian instance.
When MemArtsCC used the TPC-DS test suite to write 10 TB of data, cc-sidecar restarted repeatedly while the task was running.
The cc-sidecar process became faulty after the MemArtsCC bare metal server had run stably for a long time.
Residual files needed to be quickly cleared when Spark jobs failed in the architecture with decoupled storage and compute.
Spark printed error logs.
The JobHistory process of Spark was suspended and did not perform self-healing; no alarm was reported.
The loaded partition was null when speculative execution was enabled for Spark (see the configuration sketch after this list).
During fault injection, the Spark JDBCServer process entered the Z state: (1) the process did not perform self-healing; (2) no process exception alarm was generated; (3) Spark tasks failed to be submitted, and no alarm was reported for unavailable Spark applications.
After the JDBCServer process of Spark was killed, self-healing occurred within 7 minutes but no alarm was reported, posing reliability risks.
The JDBCServer process of Spark was suspended; the process did not perform self-healing, no alarm was reported, and Spark applications failed to be submitted.
No event was reported when Spark stopped the JDBCServer instance, and the JDBCServer.log file contained a warning indicating that the event failed to be reported.
Some Spark jobs could not run due to Spring package conflicts after the 2.10 patch was installed.
After the Spark JobHistory process entered the Z state, the process disappeared unexpectedly and did not perform self-healing. In addition, no alarm was reported, leaving reliability risks.
After the Spark JobHistory process was killed, the process automatically recovered within 5 minutes and no alarm was reported.
The JAR package on the Spark server was not replaced after Spark2x patch installation.
Spark failed to write data to event logs (see the configuration sketch after this list).
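
For the rand issue above, the expected contract is a uniform distribution over [0, 1). The following is a minimal sketch, not the fixed code: it samples Spark SQL's rand(), which shares Hive's contract, and checks that the values spread across the range instead of clustering near a single value such as 0.72. The application name and local master are illustrative assumptions.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch: verify that rand() spreads across [0, 1) rather than
// clustering near one value such as 0.72. Illustrative only.
object RandRangeCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("rand-range-check") // assumption: illustrative name
      .master("local[*]")          // assumption: local run, not an MRS cluster
      .getOrCreate()

    val samples = spark.range(10000)
      .selectExpr("rand() AS r")
      .collect()
      .map(_.getDouble(0))

    // A healthy rand() yields a minimum near 0 and a maximum near 1.
    println(f"min=${samples.min}%.4f max=${samples.max}%.4f")
    spark.stop()
  }
}
```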
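For the speculative-execution and event-log issues above, both behaviors are governed by standard Spark settings (spark.speculation and spark.eventLog.*). The sketch below only shows how they are enabled when building a session; the application name, master, and HDFS path are assumptions, not MRS defaults.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch: enable speculative execution and event logging through
// standard Spark settings. Values marked as assumptions are illustrative.
val spark = SparkSession.builder()
  .appName("speculation-and-eventlog-demo")    // assumption: illustrative name
  .master("yarn")                              // assumption: YARN deployment
  .config("spark.speculation", "true")         // re-launch slow tasks speculatively
  .config("spark.speculation.quantile", "0.75") // fraction of tasks finished before speculating
  .config("spark.eventLog.enabled", "true")    // write application event logs
  .config("spark.eventLog.dir", "hdfs:///tmp/spark-eventlogs") // assumption: hypothetical path
  .getOrCreate()
```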