Upgrading the Cluster Version

Three upgrade types are supported: same-version upgrade, cross-engine upgrade, and cross-version upgrade. A same-version upgrade updates the kernel patch of a cluster to fix problems or optimize performance. A cross-engine upgrade converts an Elasticsearch cluster into an OpenSearch cluster. A cross-version upgrade moves a cluster to a later version to gain new features or consolidate versions.

Description

Principle

Nodes in the cluster are upgraded one by one so that services are not interrupted. The upgrade process is as follows: bring a node offline, migrate its data to other nodes, create a new node of the target version, and attach the NIC of the offline node to the new node so that the node IP address is retained. After the new node joins the cluster, the remaining nodes are upgraded in the same way, one after another. If the cluster holds a large amount of data, the upgrade duration depends mainly on how long data migration takes.
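The rolling replacement described above can be illustrated with a small simulation. This is a hedged sketch, not the service's actual implementation: the node names, shard counts, and single-target data migration are simplified assumptions.

```python
# Illustrative simulation of the rolling node-replacement upgrade: each node
# is brought offline, its data is migrated to a remaining node, and a new
# node of the target version takes its place (same name, standing in for the
# retained IP address). All names and numbers are hypothetical.

def rolling_upgrade(nodes, target_version):
    """Replace nodes one at a time, draining data before each replacement."""
    nodes = {name: dict(info) for name, info in nodes.items()}
    for name in list(nodes):
        offline = nodes.pop(name)              # bring the node offline
        shards = offline["shards"]
        for other in nodes.values():           # migrate its data away
            other["shards"] += shards
            break                              # simplified: one migration target
        nodes[name] = {"version": target_version, "shards": 0}  # new node, same IP
    return nodes

cluster = {
    "node-1": {"version": "7.6.2", "shards": 10},
    "node-2": {"version": "7.6.2", "shards": 10},
    "node-3": {"version": "7.6.2", "shards": 10},
}
upgraded = rolling_upgrade(cluster, "7.10.2")
assert all(n["version"] == "7.10.2" for n in upgraded.values())  # all replaced
assert sum(n["shards"] for n in upgraded.values()) == 30         # no data lost
```

The key property the simulation demonstrates is that the cluster never loses data and never shrinks below its original node count during the upgrade, which is why at least three data nodes are required.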

Process

  1. Pre-Upgrade Check

  2. Creating a Snapshot

  3. Creating an Upgrade Task

Version Restrictions

The supported target version varies according to the current cluster version. For details, see Table 1.

Table 1 Version restrictions

| Current Version | Target Version |
| --- | --- |
| Elasticsearch 6.2.3 | Elasticsearch 6.5.4 or 6.8.23 |
| Elasticsearch 6.5.4 | Elasticsearch 6.8.23 |
| Elasticsearch 6.8.23 | Elasticsearch 7.6.2 or 7.10.2 |
| Elasticsearch 7.1.1 | Elasticsearch 7.6.2 or 7.10.2 |
| Elasticsearch 7.6.2 | Elasticsearch 7.10.2 |
| Elasticsearch 7.9.3 | Elasticsearch 7.10.2 |

Note:

  • Elasticsearch 7.6.2 and 7.10.2 are mainstream cluster versions. You are advised to upgrade your clusters to these two versions. The supported target versions are displayed in the drop-down list of Target Image.

  • Elasticsearch clusters of version 5.X cannot be upgraded across versions. Elasticsearch clusters of versions 6.2.3 and 6.5.4 can be upgraded to 6.8.23 and then to 7.X.X.
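The supported upgrade paths, including the two-hop path noted above for 6.2.3 and 6.5.4 clusters, can be sketched as a small graph search. This is an illustrative sketch: the version map simply restates Table 1, and the function name is an assumption, not part of the service.

```python
# Hedged sketch: the upgrade paths from Table 1 modeled as a directed graph.
# find_path() performs a breadth-first search for the shortest chain of
# supported upgrades, showing that 6.2.3 reaches 7.10.2 only via 6.8.23.
from collections import deque

UPGRADE_PATHS = {
    "6.2.3": ["6.5.4", "6.8.23"],
    "6.5.4": ["6.8.23"],
    "6.8.23": ["7.6.2", "7.10.2"],
    "7.1.1": ["7.6.2", "7.10.2"],
    "7.6.2": ["7.10.2"],
    "7.9.3": ["7.10.2"],
}

def find_path(current, target):
    """Return the shortest chain of supported upgrades, or None."""
    queue = deque([[current]])
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in UPGRADE_PATHS.get(path[-1], []):
            queue.append(path + [nxt])
    return None

print(find_path("6.2.3", "7.10.2"))  # ['6.2.3', '6.8.23', '7.10.2']
```

Note that a path such as 7.9.3 to 7.6.2 returns None: each upgrade only moves forward through the versions listed in Table 1.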

Constraints

  • A maximum of 20 clusters can be upgraded at the same time. You are advised to perform the upgrade during off-peak hours.

  • Clusters that have ongoing tasks cannot be upgraded.

  • Once started, an upgrade task cannot be stopped until it succeeds or fails.

  • During the upgrade, nodes are replaced one by one. Requests sent to a node that is being replaced may fail. In this case, you are advised to access the cluster through the VPC Endpoint service or a dedicated load balancer.

  • During the upgrade, the Kibana and Cerebro components are rebuilt and cannot be accessed. In addition, because different Kibana versions are incompatible with each other, you may fail to access Kibana during the upgrade. Kibana can be accessed again after the cluster is successfully upgraded.

Pre-Upgrade Check

To ensure a successful upgrade, you must check the items listed in the following table before performing an upgrade.

Table 2 Pre-upgrade checklist

| Check Item | Check Method | Description | Normal Status |
| --- | --- | --- | --- |
| Cluster status | System check | After an upgrade task is started, the system automatically checks the cluster status. Clusters whose status is green or yellow can provide services properly and have no unallocated primary shards. | The cluster status is Available. |
| Node quantity | System check | After an upgrade task is started, the system automatically checks the number of nodes. The total number of data nodes and cold data nodes in the cluster must be at least 3 so that services are not interrupted. | The total number of data nodes and cold data nodes in the cluster is greater than or equal to 3. |
| Disk capacity | System check | After an upgrade task is started, the system automatically checks the disk capacity. During the upgrade, nodes are brought offline one by one and then re-created. Ensure that the remaining nodes have enough disk capacity to hold all the data of the node that is brought offline. | After a node is brought offline, the remaining nodes can hold all data of the cluster. |
| Replica allocation | System check | Check whether the largest number of primary and replica shard copies of any index in the cluster can still be allocated to the remaining data nodes and cold data nodes. This prevents replica allocation failures after a node is brought offline during the upgrade. | The maximum number of primary and replica shard copies of any index, plus 1, is less than or equal to the total number of data nodes and cold data nodes before the upgrade. |
| Data backup | System check | Before the upgrade, back up data to prevent data loss caused by upgrade faults. When submitting an upgrade task, you can choose whether the system checks that all indexes have been backed up. | Data has been backed up. |
| Resources | System check | After an upgrade task is started, the system automatically checks resources. Resources will be created during the upgrade, so ensure that they are available and sufficient. | Resources are available and sufficient. |
| Custom plugins | System and manual check | Perform this check only when custom plugins are installed in the source cluster. If the cluster has custom plugins, upload all plugin packages of the target version on the plugin management page before the upgrade so that they can be installed on the new nodes during the upgrade. Otherwise, the custom plugins will be lost after the cluster is upgraded. After an upgrade task is started, the system automatically checks whether the plugin packages have been uploaded, but you must verify that the uploaded packages are correct. | The plugin packages of the cluster to be upgraded have been uploaded to the plugin list. |
| Custom configurations | System check | During the upgrade, the system automatically synchronizes the content of the cluster configuration file elasticsearch.yml. | Custom cluster configurations are not lost after the upgrade. |
| Non-standard operations | Manual check | Check whether any non-standard operations were performed on the cluster. Non-standard operations are manual changes that are not recorded and cannot be carried over automatically during the upgrade, such as modifications to the kibana.yml configuration file, the system configuration, or routes. | Some non-standard operations are compatible: for example, security plugin modifications are retained through metadata, and system configuration modifications are retained through images. Others, such as modifications to kibana.yml, cannot be retained, so back up such files in advance. |
| Compatibility check | System and manual check | After a cross-version upgrade task is started, the system automatically checks whether the source and target versions have incompatible configurations. If custom plugins are installed, their version compatibility must be checked manually. | Configurations before and after the cross-version upgrade are compatible. |

Note: If an uploaded plugin package is incorrect or incompatible, it cannot be installed automatically during the upgrade, and the upgrade task fails. To restore the cluster, terminate the upgrade task and repair the node that failed to be upgraded by Replacing a Specified Node. After the upgrade is complete, the status of each custom plugin is reset to Uploaded.
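Several of the system checks above can also be verified manually through the standard Elasticsearch REST API (for example, GET _cluster/health and GET &lt;index&gt;/_settings) before submitting the upgrade task. The sketch below is illustrative only: the function name, the sample payloads, and the exact replica-allocation rule are assumptions derived from the checklist, not the service's own check logic.

```python
# Hedged sketch of three automated pre-upgrade checks, evaluated against
# sample Elasticsearch API responses. Field names follow the public
# Elasticsearch REST API; the payloads themselves are made up.

def precheck(health, index_settings, data_node_count):
    """Return a list of failed check items (an empty list means ready)."""
    failures = []
    # Cluster status: green or yellow means no unassigned primary shards.
    if health["status"] not in ("green", "yellow"):
        failures.append("cluster status")
    # Node quantity: at least 3 data/cold-data nodes.
    if data_node_count < 3:
        failures.append("node quantity")
    # Replica allocation: the largest number of copies (1 primary + replicas)
    # of any index, plus 1, must not exceed the data node count, or shards
    # cannot be placed while a node is offline during the upgrade.
    max_copies = max(1 + int(s["number_of_replicas"])
                     for s in index_settings.values())
    if max_copies + 1 > data_node_count:
        failures.append("replica allocation")
    return failures

health = {"status": "green"}                       # from GET _cluster/health
settings = {"logs-2024": {"number_of_replicas": "1"},
            "metrics": {"number_of_replicas": "1"}}
print(precheck(health, settings, data_node_count=3))  # []
```

A red cluster, fewer than three data nodes, or too many replicas per index would each add an entry to the returned list, mirroring the failure conditions in Table 2.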

Creating an Upgrade Task

  1. Log in to the CSS management console.

  2. In the navigation pane on the left, choose Clusters. On the cluster list page that is displayed, click the name of a cluster.

  3. On the displayed basic cluster information page, click Version Upgrade.

  4. On the displayed page, set upgrade parameters.

    Table 3 Upgrade parameters

    | Parameter | Description |
    | --- | --- |
    | Upgrade Type | Same-version upgrade: upgrades the kernel patch of the cluster; the cluster version number remains unchanged. Cross-version upgrade: upgrades the cluster to a later version. |
    | Target Image | Image of the target version. When you select an image, the image name and target version details are displayed. The supported target versions are displayed in the Target Image drop-down list. If no target image can be selected, the possible causes are as follows: the current cluster is already of the latest version; the cluster was created before 2023 and contains vector indexes; or images of the new version have not been released in the current region. |
    | Agency | Select an IAM agency to grant the upgrade permission to the current account. If no agency is available, click Create Agency to go to the IAM console and create one. Note: the selected agency must be assigned the Tenant Administrator or VPC Administrator policy. |

  5. After setting the parameters, click Submit. Choose whether to enable Check full index snapshot and Perform cluster load detection, and then click OK.

    If a cluster is overloaded, the upgrade task may be suspended or fail. Enabling cluster load detection helps avoid such failures. If any of the following thresholds is exceeded during detection, wait or reduce the load before upgrading. If you urgently need to upgrade and understand the risk of upgrade failure, you can disable cluster load detection. The load detection items are as follows:

    • nodes.thread_pool.search.queue < 1000: Check whether the maximum number of search queues is less than 1000.

    • nodes.thread_pool.write.queue < 200: Check whether the maximum number of write queues is less than 200.

    • nodes.process.cpu.percent < 90: Check whether the maximum CPU usage is less than 90%.

    • nodes.os.cpu.load_average/Number of CPU cores < 80%: Check whether the ratio of the maximum load average to the number of CPU cores is less than 80%.

  6. View the upgrade task in the task list. If the task status is Running, you can expand the task list and click View Progress to view the upgrade progress.

    If the task status is Failed, you can retry or terminate the task.

    • Retry the task: Click Retry in the Operation column.

    • Terminate the task: Click Terminate in the Operation column.

      Important

      • Same-version upgrade: If the upgrade task status is Failed, you can terminate the upgrade task.

      • Cross-version upgrade: You can terminate an upgrade task only when the task status is Failed and no node has been successfully upgraded.

      After an upgrade task is terminated, the Task Status of the cluster is rolled back to the status before the upgrade, and other tasks in the cluster are not affected.
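The four load detection thresholds from step 5 can be sketched against the per-node statistics returned by the standard Elasticsearch GET _nodes/stats API. The function name and sample figures below are illustrative assumptions; the field paths follow the public node stats response format.

```python
# Hedged sketch of the cluster load detection items, applied to one node's
# statistics (as returned under "nodes.<id>" by GET _nodes/stats). The
# sample payload is made up; the thresholds match the documented checks.

def load_check(node_stats, cpu_cores):
    """Return True if a node passes all four load-detection thresholds."""
    return (node_stats["thread_pool"]["search"]["queue"] < 1000      # search queue
            and node_stats["thread_pool"]["write"]["queue"] < 200    # write queue
            and node_stats["process"]["cpu"]["percent"] < 90         # CPU usage
            and node_stats["os"]["cpu"]["load_average"]["1m"]        # load / cores
                / cpu_cores < 0.8)

sample = {
    "thread_pool": {"search": {"queue": 12}, "write": {"queue": 3}},
    "process": {"cpu": {"percent": 45}},
    "os": {"cpu": {"load_average": {"1m": 2.4}}},
}
print(load_check(sample, cpu_cores=8))  # True
```

Every node must pass all four thresholds; a single node with, say, a search queue of 1000 or more would cause the detection to report the cluster as overloaded.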