Host Metrics and Dimensions
Metric | Description | Value Range | Unit |
---|---|---|---|
Total CPU cores (aom_node_cpu_limit_core) | Total number of CPU cores requested for a measured object | >= 1 | Cores |
Used CPU cores (aom_node_cpu_used_core) | Number of CPU cores used by a measured object | >= 0 | Cores |
CPU usage (aom_node_cpu_usage) | CPU usage of a measured object | 0-100 | % |
Available physical memory (aom_node_memory_free_megabytes) | Available physical memory of a measured object | >= 0 | MB |
Available virtual memory (aom_node_virtual_memory_free_megabytes) | Available virtual memory of a measured object | >= 0 | MB |
Total GPU memory (aom_node_gpu_memory_free_megabytes) | Total GPU memory of a measured object | > 0 | MB |
GPU memory usage (aom_node_gpu_memory_usage) | Percentage of the used GPU memory to the total GPU memory | 0-100 | % |
Used GPU memory (aom_node_gpu_memory_used_megabytes) | GPU memory used by a measured object | >= 0 | MB |
GPU usage (aom_node_gpu_usage) | GPU usage of a measured object | 0-100 | % |
Physical memory usage (aom_node_memory_usage) | Percentage of the used physical memory to the total physical memory | 0-100 | % |
Host status (aom_node_status) | Host status | | N/A |
NTP offset (aom_node_ntp_offset_ms) | Offset between the local time of the host and the NTP server time. The closer the NTP offset is to 0, the closer the local time of the host is to the time of the NTP server. | | ms |
NTP server status (aom_node_ntp_server_status) | Whether the host is connected to the NTP server | 0 or 1 | N/A |
NTP synchronization status (aom_node_ntp_status) | Whether the local time of the host is synchronized with the NTP server time | 0 or 1 | N/A |
Processes (aom_node_process_number) | Number of processes on a measured object | >= 0 | N/A |
GPU temperature (aom_node_gpu_temperature_centigrade) | GPU temperature of a measured object | | °C |
Total physical memory (aom_node_memory_total_megabytes) | Total physical memory requested for a measured object | >= 0 | MB |
Total virtual memory (aom_node_virtual_memory_total_megabytes) | Total virtual memory requested for a measured object | >= 0 | MB |
Virtual memory usage (aom_node_virtual_memory_usage) | Percentage of the used virtual memory to the total virtual memory | 0-100 | % |
Threads (aom_node_current_threads_num) | Number of threads created on a host | >= 0 | N/A |
Max. threads (aom_node_sys_max_threads_num) | Maximum number of threads that can be created on a host | >= 0 | N/A |
Total physical disk space (aom_node_phy_disk_total_capacity_megabytes) | Total disk space of a host | >= 0 | MB |
Used disk space (aom_node_physical_disk_total_used_megabytes) | Used disk space of a host | >= 0 | MB |
Hosts (aom_billing_hostUsed) | Number of hosts connected per day | >= 0 | N/A |
Note
Physical memory usage = (Physical memory capacity - Available physical memory capacity)/Physical memory capacity
Virtual memory usage = ((Physical memory capacity + Total virtual memory capacity) - (Available physical memory capacity + Available virtual memory capacity))/(Physical memory capacity + Total virtual memory capacity)
The virtual memory of a VM is 0 MB by default. If no virtual memory is configured, the total and available virtual memory are both 0, so the physical memory usage shown on the monitoring page is the same as the virtual memory usage.
The total and used physical disk space count only the file systems on local disk partitions. File systems mounted to the host over the network (such as JuiceFS, NFS, and SMB) are not counted.
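The two memory formulas in the note can be checked with a minimal Python sketch. The function and parameter names below are hypothetical and are not part of any AOM API. The inputs correspond to the values of aom_node_memory_total_megabytes, aom_node_memory_free_megabytes, aom_node_virtual_memory_total_megabytes, and aom_node_virtual_memory_free_megabytes. Multiplying by 100 is an assumption, made so that the result matches the 0-100 % range listed in the table.

```python
def physical_memory_usage(phys_total_mb: float, phys_free_mb: float) -> float:
    """Physical memory usage in percent, following the formula in the note."""
    if phys_total_mb <= 0:
        raise ValueError("physical memory capacity must be positive")
    # Assumption: scale the ratio by 100 to match the 0-100 % value range.
    return (phys_total_mb - phys_free_mb) / phys_total_mb * 100.0


def virtual_memory_usage(
    phys_total_mb: float,
    phys_free_mb: float,
    virt_total_mb: float = 0.0,
    virt_free_mb: float = 0.0,
) -> float:
    """Virtual memory usage in percent, following the formula in the note.

    Virtual memory defaults to 0 MB, which mirrors the default for a VM.
    """
    capacity = phys_total_mb + virt_total_mb
    if capacity <= 0:
        raise ValueError("combined memory capacity must be positive")
    available = phys_free_mb + virt_free_mb
    return (capacity - available) / capacity * 100.0


# Hypothetical host: 8192 MB physical memory, 2048 MB of it available.
print(physical_memory_usage(8192, 2048))             # 75.0
# No virtual memory configured: both usages coincide.
print(virtual_memory_usage(8192, 2048))              # 75.0
# 4096 MB of virtual memory, all of it available.
print(virtual_memory_usage(8192, 2048, 4096, 4096))  # 50.0
```

With no virtual memory configured, the second call returns the same value as the first, which is the behavior the note describes for the monitoring page.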
Dimension | Description |
---|---|
clusterId | Cluster ID |
clusterName | Cluster name |
gpuName | GPU name |
gpuID | GPU ID |
npuName | NPU name |
npuID | NPU ID |
hostID | Host ID |
nameSpace | Cluster namespace |
nodeIP | Host IP address |
hostName | Host name |