MRS 3.1.2-LTS.2 Patch Description

Basic Information

Patch Version: MRS 3.1.2-LTS.2.14

Release Date: 2024-01-19

Pre-Installation Operations

If an MRS cluster node is faulty or the network is disconnected, isolate the node first. Otherwise, the patch installation will fail.
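
Before starting the installation, you can quickly confirm that every cluster node is reachable, as in the minimal sketch below. The host names are hypothetical placeholders; node isolation itself is performed on the MRS console.

    # Hypothetical node names; replace with the actual hosts of your cluster.
    for host in node-master1 node-core1 node-core2; do
        if ping -c 2 -W 3 "$host" > /dev/null 2>&1; then
            echo "$host is reachable"
        else
            echo "$host is unreachable; isolate it on the MRS console before installing the patch"
        fi
    done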

New Features and Optimizations

  • MRS Manager

    The MRS client management function is enhanced. You can install patches on the MRS client.

    MRS supports O&M inspections.

    MRS integrates the rolling restart of StoreWorker and StoreMaster. Rolling restart policies are added to Manager.

Resolved Issues

Resolved issues in MRS 3.1.2-LTS.2.14:

  • MRS Manager

    • The standby OMS node reported an alarm indicating that the FMS resource was abnormal.

    • Subsequent capacity expansion failed due to residual IP addresses in the HOSTS_OS_PATCH_STATE table.

    • CES monitoring was inconsistent with YARN monitoring.

    • Active/standby OMS switchovers were frequent.

    • The host resource overview for a specified period could not be viewed because the monitoring data was empty.

    • The disk monitoring metrics were incorrectly calculated.

  • Component

    • Active/standby switchovers of YARN ResourceManager were frequent.

    • The NodeManager health check of YARN was too sensitive.

    • The YARN health check incorrectly collected the health status of standby nodes. As a result, an alarm indicating that the service was unavailable was reported.

    • LDAPServer data could not be synchronized.

    • Hive execution failed after the MRS 3.1.2-LTS.2.6 patch was installed.

    • A thread leak occurred when HiveServer connected to Guardian.

    • Hive column values that were too long failed to be written into an ORC file.

    • It took a long time to clear temporary files after Hive tasks failed or terminated abnormally.

    • Hive failed to start because external metadata had been configured for Hive.

    • The /var/log/ directory became full because the hiveserver.out log of Hive was not compressed.

    • It took a long time to add fields to Hive partition tables.

    • The rand function should have generated random values ranging from 0 to 1 but generated only values around 0.72.

    • After the WebHCat process of Hive was killed, it could not be automatically started and no alarm was reported.

    • An exception occurred when Kafka automatically restarted after Kerberos authentication failed.

    • The Spring packages in the Hudi and Spark directories were incompatible.

    • After a quota was configured for ZooKeeper, an alarm indicating that the top-level quota failed to be set was still displayed.

    • The client IP address needed to be printed in the logs of the old Guardian instance.

    • When MemArtsCC used the TPCDS test suite to write 10 TB of data, cc-sidecar restarted repeatedly while tasks were running.

    • The cc-sidecar process became faulty after the MemArtsCC bare metal server had run stably for a long time.

    • Residual files needed to be quickly cleared when Spark jobs failed in the architecture with decoupled storage and compute.

    • Spark printed error logs.

    • The JobHistory process of Spark was suspended and did not perform self-healing, and no alarm was reported.

    • The loaded partition was null when speculative execution was enabled for Spark.

    • The Spark JDBCServer process was injected into the Z state: 1. The process did not perform self-healing during fault injection. 2. No process exception alarm was generated. 3. Spark tasks failed to be submitted, and no alarm was reported for unavailable Spark applications.

    • After the JDBC process of Spark was killed, self-healing was performed within 7 minutes but no alarm was reported, posing reliability risks.

    • The JDBCServer process of Spark was suspended, the process did not perform self-healing, no alarm was reported, and Spark applications failed to be submitted.

    • No event was reported when Spark stopped the JDBCServer instance, and the JDBCServer.log file contained a warning indicating that the event failed to be reported.

    • Some Spark jobs could not run due to Spring package conflicts after the 2.10 patch was installed.

    • After the Spark JobHistory process entered the Z state, the process disappeared unexpectedly and did not perform self-healing. In addition, no alarm was reported, posing reliability risks.

    • After the Spark JobHistory process was killed, the process automatically recovered within 5 minutes and no alarm was reported.

    • The JAR package on the Spark server was not replaced after Spark2x patch installation.

    • Spark failed to write data to Eventlogs.

Compatibility with Other Patches

The MRS 3.1.2-LTS.2.14 patch can resolve all the issues detected in MRS 3.1.2-LTS.2.

Impact of Patch Installation

After the MRS 3.1.2-LTS.2.14 patch is installed, a message may be displayed indicating that the client patch package failed to be generated. To solve this problem, perform the following steps (a consolidated command sketch follows the list):

  1. Log in to the active OMS node of the cluster.

  2. Switch to user omm.

    su - omm

  3. Log in to the MRS management console and choose Clusters > Active Clusters in the navigation pane. Click the cluster name and click Patches. Check the latest patch version and run the following script:

    sh /opt/Bigdata/patches/{Patch version}/generate_client_patch.sh

  4. If "generate client patch success" is displayed, the patch package is successfully generated. If "ERROR" is displayed, the patch package failed to be generated. In this case, perform step a. to locate the fault.

    1. View the /opt/Bigdata/patches/log/generate_client_patch.log file to locate the failure cause.
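
For reference, the commands in the preceding steps can be run in sequence after switching to user omm on the active OMS node. This is a minimal sketch; the patch version value below is a placeholder that must be replaced with the latest version shown on the Patches page.

    # Placeholder only; set to the latest patch version shown on the Patches page.
    PATCH_VERSION="<patch version from the Patches page>"

    # Regenerate the client patch package (step 3).
    sh "/opt/Bigdata/patches/${PATCH_VERSION}/generate_client_patch.sh"

    # If "ERROR" is displayed instead of "generate client patch success",
    # check the failure cause in the log file (step 4).
    cat /opt/Bigdata/patches/log/generate_client_patch.log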