Pre-upgrade Check
The system automatically checks a cluster before upgrading it. If the cluster does not meet the pre-upgrade check conditions, the upgrade cannot continue. To avoid risks, you can run the pre-upgrade check yourself based on the check items and solutions described in this section.
No. | Description |
---|---|
1 | |
2 | Check whether the target cluster is under upgrade management. |
3 | |
4 | Check whether the current HelmRelease record contains discarded Kubernetes APIs that are not supported by the target cluster version. If it does, the Helm chart may be unavailable after the upgrade. |
5 | Check whether your master nodes can be accessed using SSH. |
6 | Check the node pool status. |
7 | Check whether the Protocol & Port of the worker node security groups is set to ICMP: All, and whether the security group rule whose source is the master node security group has been deleted. |
8 | Check whether nodes need to be migrated. |
9 | Check whether there are discarded resources in the cluster. |
10 | Read the version compatibility differences and make sure your services are not affected by them. Patch upgrades do not involve version compatibility differences. |
11 | Check whether cce-agent on the current node is the latest version. |
12 | Check whether the CPU usage of the node exceeds 90%. |
13 | |
14 | |
15 | |
16 | Check whether the owner and owner group of the files in the /var/paas directory used by CCE are both paas. |
17 | Check whether kubelet on the node is running properly. |
18 | Check whether the memory usage of the node exceeds 90%. |
19 | Check whether the clock synchronization server (ntpd or chronyd) on the node is running properly. |
20 | Check whether the OS kernel version of the node is supported by CCE. |
21 | Check whether the master nodes in your cluster have more than 2 CPU cores. |
22 | Check whether Python commands are available on the node. |
23 | Check whether the nodes in the cluster are ready. |
24 | Check whether journald on the node is running properly. |
25 | Check whether the containerd.sock file exists on the node. This file affects the startup of the container runtime on EulerOS. |
26 | This is not a typical check item. It indicates that an internal error occurred during the pre-upgrade check. |
27 | Check whether inaccessible mount points exist on the node. |
28 | Check whether the taint needed for cluster upgrade exists on the node. |
29 | Check whether there are any compatibility restrictions on the current Everest add-on. |
30 | Check whether there are compatibility limitations between the current and target cce-controller-hpa add-on versions. |
31 | Check whether the current cluster version and the target version support the enhanced CPU policy. |
32 | Check whether the container runtime and network components on the worker nodes are healthy. |
33 | Check whether cluster components, such as Kubernetes components, container runtime components, and network components, are running properly before the upgrade. |
34 | Check whether the resources of Kubernetes components, such as etcd and kube-controller-manager, exceed the upper limit. |
35 | The system scans the audit logs of the past day to check whether any user has called APIs that are deprecated in the target Kubernetes version. Note: Audit logs cover a limited time range, so this check item is only an auxiliary method. Deprecated APIs may have been used in the cluster without appearing in the audit logs of the past day. Check API usage carefully. |
36 | If IPv6 is enabled for a CCE Turbo cluster, check whether the target cluster version supports IPv6. |
37 | Check whether NetworkManager on the node is running properly. |
38 | Check the ID file format. |
39 | When you upgrade a cluster to v1.19 or later, the system checks whether the following configuration files have been modified on the backend: |
40 | Check whether the configuration files of key components exist on the node. |
41 | Check whether the current CoreDNS key configuration (Corefile) differs from the Helm release record. Any difference may be overwritten during the add-on upgrade, affecting domain name resolution in the cluster. |
42 | Check whether the sudo commands and sudo-related files on the node are working. |
43 | Check whether the key commands that the node upgrade depends on are working. |
44 | Check whether the docker.sock or containerd.sock file is directly mounted to pods on the node. During an upgrade, Docker or containerd restarts and the sock file on the host changes, but the sock file mounted to the pods does not change accordingly. As a result, your services cannot access Docker or containerd because the sock files are inconsistent. After the pods are rebuilt, the sock file is mounted to them again and the issue is resolved. |
45 | Check whether the certificate used by an HTTPS load balancer has been modified on ELB. |
46 | Check whether the default mount directory and soft link on the node have been manually mounted or modified. |
47 | Check whether user paas is allowed to log in to the node. |
48 | Check whether the load balancer associated with a Service is allocated a private IPv4 address. |
49 | Check the historical upgrade records of the cluster and confirm that the current version of the cluster meets the requirements for upgrading to the target version. |
50 | Check whether the CIDR block of the cluster management plane is the same as that configured on the backbone network. |
51 | The GPU add-on is involved in the upgrade, which may affect GPU driver installation when a GPU node is created. |
52 | Check whether the default system parameter settings on your nodes have been modified. |
53 | Check whether there is residual package version data in the current cluster. |
54 | Check whether the commands required for the upgrade are available on the node. |
55 | Check whether swap has been enabled on cluster nodes. |
56 | Check whether there are compatibility issues that may occur during the nginx-ingress upgrade. |
57 | Check whether the service pods running on a containerd node are restarted when containerd is upgraded. |
58 | Check whether the configuration of the CCE AI Suite add-on in the cluster has been intrusively modified. If so, the cluster upgrade may fail. |
59 | Check whether GPU service pods in the cluster are rebuilt when kubelet is restarted during the cluster upgrade. |
60 | If access control is configured, check whether the configurations are correct. |
61 | Check whether the flavor of the master nodes in the cluster is the same as the actual flavor of these nodes. |
62 | Check whether the number of available IP addresses in the cluster subnet supports a rolling upgrade. |
63 | Check whether an alarm is generated when a cluster is upgraded to v1.27 or later. Do not use Docker in clusters of versions later than 1.27. |
64 | Check whether an alarm is generated when a cluster is upgraded to v1.27 or later. Do not use Docker in clusters of versions later than 1.27. |
65 | Check the number of images on your node. If there are more than 1000 images, Docker takes a long time to start, affecting the standard Docker output and functions such as Nginx. |
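Check items 12 and 18 compare node CPU and memory usage against 90%. A minimal sketch of how such a threshold check could be computed on a Linux node from `/proc/meminfo` and two samples of `/proc/stat`. Treating used memory as `MemTotal - MemAvailable`, and CPU busy time as everything except `idle` and `iowait`, are common conventions assumed here rather than taken from CCE; the helper names are hypothetical.

```python
from typing import Tuple

THRESHOLD_PERCENT = 90.0  # limit named in check items 12 and 18


def memory_usage_percent(meminfo_text: str) -> float:
    """Used memory (MemTotal - MemAvailable) as a percentage of MemTotal."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[key.strip()] = int(parts[0])  # /proc/meminfo values are in kB
    total, available = fields["MemTotal"], fields["MemAvailable"]
    return 100.0 * (total - available) / total


def _cpu_times(stat_text: str) -> Tuple[int, int]:
    """Return (idle, total) jiffies from the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        if line.startswith("cpu "):
            # Fields: user nice system idle iowait irq softirq steal. Guest time is
            # already counted in user/nice, so later fields are skipped.
            values = [int(v) for v in line.split()[1:9]]
            idle = values[3] + (values[4] if len(values) > 4 else 0)
            return idle, sum(values)
    raise ValueError("no aggregate 'cpu' line in /proc/stat content")


def cpu_usage_percent(stat_before: str, stat_after: str) -> float:
    """CPU busy percentage between two /proc/stat samples."""
    idle1, total1 = _cpu_times(stat_before)
    idle2, total2 = _cpu_times(stat_after)
    elapsed = total2 - total1
    if elapsed <= 0:
        return 0.0
    return 100.0 * (1.0 - (idle2 - idle1) / elapsed)


def exceeds_threshold(usage_percent: float, threshold: float = THRESHOLD_PERCENT) -> bool:
    """True if usage is strictly above the threshold ('exceeds 90%')."""
    return usage_percent > threshold
```

On a node, read `/proc/meminfo` once and read `/proc/stat` twice about a second apart (for example with `time.sleep(1)` in between), then pass the file contents to these functions.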
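Check item 16 can be approximated by walking the directory tree and comparing each entry's owner and group with `paas`. The sketch below is an assumption-based illustration using only the Python standard library (`os`, `pwd`, `grp`; Unix only). It inspects entries with `lstat`, so symbolic links are judged by their own ownership rather than their targets'. The function name and return format are hypothetical.

```python
import grp
import os
import pwd
from typing import List, Optional


def _uid(name: str) -> Optional[int]:
    try:
        return pwd.getpwnam(name).pw_uid
    except KeyError:
        return None  # no such user on this node, so no file can match


def _gid(name: str) -> Optional[int]:
    try:
        return grp.getgrnam(name).gr_gid
    except KeyError:
        return None  # no such group on this node


def find_wrong_ownership(root: str = "/var/paas", owner: str = "paas", group: str = "paas") -> List[str]:
    """Return sorted paths under root (root included) that are not owned by owner:group."""
    expected_uid, expected_gid = _uid(owner), _gid(group)
    paths = [root]
    for dirpath, dirnames, filenames in os.walk(root):
        # os.walk does not descend into symlinked directories by default,
        # but they still appear in dirnames and are checked here.
        paths.extend(os.path.join(dirpath, name) for name in dirnames + filenames)
    mismatched = []
    for path in paths:
        st = os.lstat(path)  # do not follow symlinks
        if st.st_uid != expected_uid or st.st_gid != expected_gid:
            mismatched.append(path)
    return sorted(mismatched)
```

An empty list means every entry is owned by paas:paas; any returned path is a candidate for the ownership problem this check item describes.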
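Several node checks (items 17, 19, 24, and 37) ask whether a system service is running. A minimal sketch using `systemctl is-active --quiet`, which exits with status 0 when a unit is active. The unit names are assumptions: `systemd-journald`, `NetworkManager`, `chronyd`, and `ntpd` are the usual names on systemd-based distributions, and `kubelet` assumes kubelet runs as a systemd unit of that name, which may not match CCE nodes. The probe is injectable so the logic can be exercised without systemd.

```python
import subprocess
from typing import Callable, List

# Assumed unit names; adjust them to what the node actually uses.
REQUIRED_UNITS = ("kubelet", "systemd-journald", "NetworkManager")
CLOCK_SYNC_UNITS = ("chronyd", "ntpd")  # at least one of these must be active


def is_active(unit: str) -> bool:
    """True if `systemctl is-active --quiet <unit>` exits with status 0."""
    result = subprocess.run(["systemctl", "is-active", "--quiet", unit], check=False)
    return result.returncode == 0


def failed_service_checks(probe: Callable[[str], bool] = is_active) -> List[str]:
    """Return the units (or unit alternatives) that are not running."""
    problems = [unit for unit in REQUIRED_UNITS if not probe(unit)]
    if not any(probe(unit) for unit in CLOCK_SYNC_UNITS):
        problems.append(" or ".join(CLOCK_SYNC_UNITS))
    return problems
```

Treating chronyd and ntpd as alternatives mirrors the wording of check item 19, which accepts either clock synchronization server.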
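Check item 23 (node readiness) can be reproduced from the Kubernetes API: a node is ready when its `Ready` condition has status `True`. The sketch below parses the JSON printed by `kubectl get nodes -o json`; the function names are hypothetical and it assumes `kubectl` is configured for the cluster.

```python
import json
import subprocess
from typing import Any, Dict, List


def not_ready_nodes(node_list: Dict[str, Any]) -> List[str]:
    """Return names of nodes whose Ready condition is missing or not 'True'."""
    names = []
    for node in node_list.get("items", []):
        name = node.get("metadata", {}).get("name", "<unknown>")
        conditions = node.get("status", {}).get("conditions") or []
        ready = next((c for c in conditions if c.get("type") == "Ready"), None)
        if ready is None or ready.get("status") != "True":
            names.append(name)
    return names


def load_nodes_from_kubectl() -> Dict[str, Any]:
    """Fetch the node list with kubectl (requires a configured kubeconfig)."""
    out = subprocess.run(
        ["kubectl", "get", "nodes", "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)
```

Typical use is `not_ready_nodes(load_nodes_from_kubectl())`; an empty result means every node reports Ready.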
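For check item 41, the comparison itself is a plain text diff. A sketch with Python's `difflib`, assuming you already have both copies of the Corefile: the live one (for example from the `coredns` ConfigMap in `kube-system`, which is the upstream default name and an assumption for CCE) and the one recorded in the Helm release. Trailing whitespace is ignored to reduce noise; the function name is hypothetical.

```python
import difflib
from typing import List


def _normalize(corefile: str) -> List[str]:
    # Ignore trailing whitespace so cosmetic edits do not count as drift.
    return [line.rstrip() + "\n" for line in corefile.splitlines()]


def corefile_drift(live: str, recorded: str) -> str:
    """Unified diff from the Helm-recorded Corefile to the live one; empty if equivalent."""
    return "".join(
        difflib.unified_diff(
            _normalize(recorded),
            _normalize(live),
            fromfile="helm-release/Corefile",
            tofile="live/Corefile",
        )
    )
```

A non-empty result means the live Corefile contains changes that, according to the check description, may be overwritten during the add-on upgrade.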
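Check item 44 looks for pods that mount the runtime sock file itself through a `hostPath` volume. A minimal sketch over the JSON from `kubectl get pods -A -o json`: it flags volumes whose host path ends in `docker.sock` or `containerd.sock`. Matching by file name is an assumption. Mounting the parent directory instead (for example `/var/run`) is not flagged, since a directory mount generally reflects a re-created sock file. The function name is hypothetical.

```python
import os
from typing import Any, Dict, List

RUNTIME_SOCKS = {"docker.sock", "containerd.sock"}


def pods_mounting_runtime_sock(pod_list: Dict[str, Any]) -> List[str]:
    """Return 'namespace/name' for pods with a hostPath volume pointing directly at a runtime sock file."""
    hits = []
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        volumes = pod.get("spec", {}).get("volumes") or []
        for volume in volumes:
            host_path = (volume.get("hostPath") or {}).get("path", "")
            if os.path.basename(host_path.rstrip("/")) in RUNTIME_SOCKS:
                hits.append(f"{meta.get('namespace', 'default')}/{meta.get('name', '<unknown>')}")
                break  # one matching volume is enough to flag the pod
    return hits
```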
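Check item 55 can be answered on a Linux node from `/proc/swaps`, whose first line is a column header and whose remaining lines each describe an active swap device or file. A minimal sketch; the function names are hypothetical.

```python
def swap_enabled(proc_swaps_text: str) -> bool:
    """True if /proc/swaps content lists at least one active swap area."""
    lines = [line for line in proc_swaps_text.splitlines() if line.strip()]
    return len(lines) > 1  # the first line is the header


def node_swap_enabled(path: str = "/proc/swaps") -> bool:
    """Read /proc/swaps on the local node (Linux only)."""
    with open(path, encoding="utf-8") as f:
        return swap_enabled(f.read())
```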