Key Parameters of the GPU Add-on

Check Items

Check whether some configurations of the CCE AI Suite add-on installed in a cluster are intrusively modified. If yes, the upgrade may fail.

Solutions

  1. Use kubectl to access the cluster.

  2. Run the following command to obtain the add-on instance details:

    kubectl get ds nvidia-driver-installer -nkube-system -oyaml
    
  3. Check whether the value of UpdateStrategy is changed to OnDelete. If yes, change it to RollingUpdate.

  4. Check whether the value of NVIDIA_DRIVER_DOWNLOAD_URL is the same as the add-on IP address on the add-on details page. If no, change the value on the web page.