CCE Node Problem Detector
Add-on Overview
CCE Node Problem Detector (node-problem-detector, NPD) is an add-on that monitors abnormal events on cluster nodes and connects to a third-party monitoring platform. It runs as a daemon on each node, collects node problems from various daemons, and reports them to the API server. NPD can run as a DaemonSet or as a standalone daemon.
Add-on Parameters
Table 1 Add-on parameters

Parameter | Mandatory | Type | Description |
---|---|---|---|
basic | No | object | Basic configuration parameters, which do not need to be specified |
flavor | Yes | Table 2 object | Flavor parameters |
custom | Yes | Table 3 object | Custom parameters |

Table 2 flavor

Parameter | Mandatory | Type | Description |
---|---|---|---|
description | No | String | Add-on description |
name | Yes | String | Add-on specification name. The value is fixed at Single-instance. |
replicas | Yes | String | Number of pods. The default value is 1. |
resources | Yes | resources object | Container resource (CPU and memory) quotas |

Table 3 custom

Parameter | Mandatory | Type | Description |
---|---|---|---|
feature_gates | No | String | Feature gates, used to enable beta features |
multiAZBalance | No | Bool | Multi-AZ deployment |
multiAZEnabled | No | Bool | Whether to deploy the add-on pods in multiple AZs. The default value is false. If set to true, cross-AZ deployment is enforced. If set to false, cross-AZ deployment is preferred but not required. |
npc | Yes | Table 5 object | node-problem-controller (NPC) configuration |
tolerations | No | List<Object> Table 6 | Tolerations of the add-on |
node_match_expressions | No | List<Object> Table 7 | Node affinity configuration of the add-on |

Table 4 resources

Parameter | Mandatory | Type | Description |
---|---|---|---|
limitsCpu | Yes | String | CPU size limit (unit: m) |
limitsMem | Yes | String | Memory size limit (unit: Mi) |
name | Yes | String | Name of the component that the quotas apply to. The example request uses node-problem-controller and node-problem-detector. |
requestsCpu | Yes | String | Requested CPU size (unit: m) |
requestsMem | Yes | String | Requested memory size (unit: Mi) |
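
The resource fields above are strings with unit suffixes, so comparing them requires parsing first. The following is a minimal sketch, not part of the add-on itself, that checks one `resources` entry for missing fields and for requests that exceed limits. The function names are hypothetical, and only the `m` and `Mi` units listed in the table are accepted.

```python
REQUIRED_FIELDS = ("name", "limitsCpu", "limitsMem", "requestsCpu", "requestsMem")


def parse_cpu_millicores(value: str) -> int:
    """Convert a CPU quantity such as "100m" to an integer number of millicores."""
    value = value.strip()
    if not value.endswith("m"):
        raise ValueError(f"expected a CPU value in m, got {value!r}")
    return int(value[:-1])


def parse_memory_mib(value: str) -> int:
    """Convert a memory quantity such as "300Mi" to an integer number of MiB."""
    value = value.strip()
    if not value.endswith("Mi"):
        raise ValueError(f"expected a memory value in Mi, got {value!r}")
    return int(value[:-2])


def check_resources(entry: dict) -> list:
    """Return a list of human-readable problems; an empty list means the entry looks fine."""
    problems = [f"missing {field}" for field in REQUIRED_FIELDS if field not in entry]
    if problems:
        return problems
    if parse_cpu_millicores(entry["requestsCpu"]) > parse_cpu_millicores(entry["limitsCpu"]):
        problems.append("requestsCpu exceeds limitsCpu")
    if parse_memory_mib(entry["requestsMem"]) > parse_memory_mib(entry["limitsMem"]):
        problems.append("requestsMem exceeds limitsMem")
    return problems


entry = {
    "name": "node-problem-detector",
    "limitsCpu": "100m",
    "limitsMem": "300Mi",
    "requestsCpu": "30m",
    "requestsMem": "100Mi",
}
print(check_resources(entry))
```

Running such a check before submitting the request can catch a swapped request and limit early, though the service may apply further validation that this sketch does not model.
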

Table 5 npc

Parameter | Mandatory | Type | Description |
---|---|---|---|
maxTaintedNode | Yes | String or Int | Maximum number of nodes that NPC can taint when the same fault occurs on multiple nodes. This cap limits the impact of the fault. The value can be an integer or a percentage. |
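
Because `maxTaintedNode` accepts either an integer or a percentage, it helps to see how a percentage might translate into a node count. The sketch below is illustrative only: the function name is hypothetical, and rounding down and capping at the cluster size are assumptions, since the source does not state how NPC rounds.

```python
import math


def resolve_max_tainted(max_tainted_node, total_nodes: int) -> int:
    """Turn an int or a percentage string such as "10%" into a node count."""
    if isinstance(max_tainted_node, int):
        count = max_tainted_node
    else:
        text = str(max_tainted_node).strip()
        if text.endswith("%"):
            # Assumption: fractional results are rounded down.
            count = math.floor(total_nodes * float(text[:-1]) / 100)
        else:
            count = int(text)
    # Assumption: the result is clamped to the range [0, total_nodes].
    return max(0, min(count, total_nodes))


print(resolve_max_tainted("10%", 50))
print(resolve_max_tainted(3, 50))
```

Under these assumptions, `"10%"` in a 50-node cluster allows taints on up to 5 nodes, and in a 25-node cluster on up to 2.
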

Table 6 tolerations

Parameter | Mandatory | Type | Description |
---|---|---|---|
key | No | String | Taint key |
effect | No | String | Taint effect, for example NoExecute |
operator | No | String | Operator, for example Exists |
tolerationSeconds | No | Int | Toleration time window (unit: s) |
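
The toleration fields above have the same shape as standard Kubernetes tolerations. Assuming the add-on passes them through unchanged, the following sketch shows how a toleration is matched against a node taint under the usual Kubernetes rules. The function name is hypothetical, and the `value` field it reads is part of Kubernetes tolerations but is not listed in the table above.

```python
def tolerates(toleration: dict, taint: dict) -> bool:
    """Return True if the toleration matches the taint (Kubernetes matching rules)."""
    effect = toleration.get("effect", "")
    # An empty effect matches every taint effect.
    if effect and effect != taint.get("effect"):
        return False
    key = toleration.get("key", "")
    operator = toleration.get("operator", "Equal")  # Kubernetes defaults to Equal
    if not key:
        # An empty key is only valid with Exists and then matches every taint key.
        return operator == "Exists"
    if key != taint.get("key"):
        return False
    if operator == "Exists":
        return True
    if operator == "Equal":
        return toleration.get("value", "") == taint.get("value", "")
    return False


toleration = {
    "key": "node.kubernetes.io/not-ready",
    "operator": "Exists",
    "effect": "NoExecute",
    "tolerationSeconds": 60,
}
taint = {"key": "node.kubernetes.io/not-ready", "effect": "NoExecute"}
print(tolerates(toleration, taint))
```

With `tolerationSeconds` set to 60, a pod that tolerates a `NoExecute` taint stays bound to the node for 60 seconds after the taint is added and is then evicted.
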

Table 7 node_match_expressions

Parameter | Mandatory | Type | Description |
---|---|---|---|
key | No | String | Node label key |
values | No | List<String> | Node label values to match |
operator | No | String | Operator |
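
The `key`, `operator`, and `values` fields mirror Kubernetes node affinity match expressions. As a rough sketch, assuming the add-on follows the standard Kubernetes semantics, the function below evaluates a list of expressions against a node's labels. The function name and the `example.com/pool` label are hypothetical, and the `Gt` and `Lt` operators are left out for brevity.

```python
def matches_node(labels: dict, expressions: list) -> bool:
    """Return True if the node labels satisfy every match expression (logical AND)."""
    for expr in expressions:
        key = expr["key"]
        operator = expr["operator"]
        values = expr.get("values", [])
        present = key in labels
        if operator == "In":
            ok = present and labels[key] in values
        elif operator == "NotIn":
            # A node without the label also satisfies NotIn.
            ok = not present or labels[key] not in values
        elif operator == "Exists":
            ok = present
        elif operator == "DoesNotExist":
            ok = not present
        else:
            raise ValueError(f"unsupported operator: {operator}")
        if not ok:
            return False
    return True


node_labels = {"example.com/pool": "monitoring"}
expressions = [
    {"key": "example.com/pool", "operator": "In", "values": ["monitoring", "system"]}
]
print(matches_node(node_labels, expressions))
```

An empty list, as in the example request, places no constraint, so every node matches.
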
Example Request
{
    "kind": "Addon",
    "apiVersion": "v3",
    "metadata": {
        "annotations": {
            "addon.install/type": "install"
        }
    },
    "spec": {
        "clusterID": "b78fb690-b82c-11ee-83cf-0255ac100b0f",
        "version": "1.18.48",
        "addonTemplateName": "npd",
        "values": {
            "basic": {
                "image_version": "1.18.48",
                "swr_addr": "***",
                "swr_user": "***",
                "rbac_enabled": true,
                "cluster_version": "v1.23"
            },
            "flavor": {
                "description": "custom resources",
                "name": "custom-resources",
                "replicas": 2,
                "resources": [
                    {
                        "limitsCpu": "100m",
                        "limitsMem": "300Mi",
                        "name": "node-problem-controller",
                        "requestsCpu": "30m",
                        "requestsMem": "100Mi"
                    },
                    {
                        "limitsCpu": "100m",
                        "limitsMem": "300Mi",
                        "name": "node-problem-detector",
                        "requestsCpu": "30m",
                        "requestsMem": "100Mi"
                    }
                ],
                "category": [
                    "CCE",
                    "Turbo"
                ]
            },
            "custom": {
                "annotations": {},
                "common": {},
                "feature_gates": "",
                "multiAZBalance": false,
                "multiAZEnabled": false,
                "node_match_expressions": [],
                "npc": {
                    "maxTaintedNode": "10%"
                },
                "tolerations": [
                    {
                        "key": "node.kubernetes.io/not-ready",
                        "operator": "Exists",
                        "effect": "NoExecute",
                        "tolerationSeconds": 60
                    },
                    {
                        "key": "node.kubernetes.io/unreachable",
                        "operator": "Exists",
                        "effect": "NoExecute",
                        "tolerationSeconds": 60
                    }
                ]
            }
        }
    }
}
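
A request body like the one above can also be assembled programmatically. The sketch below builds a reduced body containing only the parameters that the tables mark as mandatory (`flavor` and `custom.npc`) and reuses the values from the example request. The function name is hypothetical, and whether the service accepts a body without `basic` and the optional `custom` fields is an assumption based on the tables alone. Sending the request is left to whichever HTTP client you use.

```python
import json


def build_npd_install_body(cluster_id: str, version: str, max_tainted_node="10%") -> dict:
    """Assemble an NPD install request body with the mandatory parameters only."""
    resources = [
        {
            "name": component,
            "limitsCpu": "100m",
            "limitsMem": "300Mi",
            "requestsCpu": "30m",
            "requestsMem": "100Mi",
        }
        for component in ("node-problem-controller", "node-problem-detector")
    ]
    return {
        "kind": "Addon",
        "apiVersion": "v3",
        "metadata": {"annotations": {"addon.install/type": "install"}},
        "spec": {
            "clusterID": cluster_id,
            "version": version,
            "addonTemplateName": "npd",
            "values": {
                "flavor": {
                    "name": "custom-resources",
                    "replicas": 2,
                    "resources": resources,
                },
                "custom": {"npc": {"maxTaintedNode": max_tainted_node}},
            },
        },
    }


body = build_npd_install_body("b78fb690-b82c-11ee-83cf-0255ac100b0f", "1.18.48")
print(json.dumps(body, indent=4))
```

Keeping the body in a function makes it easy to vary `maxTaintedNode` or the cluster ID per environment while the rest of the structure stays fixed.
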