Supported Events
Source | Namespace | Name | ID | Severity | Description | Handling Suggestion | Impact |
---|---|---|---|---|---|---|---|
GaussDB | SYS.GAUSSDBV5 | Process status alarm | ProcessStatusAlarm | Major | Key GaussDB processes exit, including CMS/CMA, ETCD, GTM, CN, and DN processes. | Wait until the process is automatically recovered or a primary/standby failover is automatically performed, then check whether services have recovered. If not, contact SRE engineers. | If processes on primary nodes are faulty, services are interrupted and then rolled back. If processes on standby nodes are faulty, services are not affected. |
GaussDB | SYS.GAUSSDBV5 | Component status alarm | ComponentStatusAlarm | Major | Key GaussDB components do not respond, including CMA, ETCD, GTM, CN, and DN components. | Wait until the process is automatically recovered or a primary/standby failover is automatically performed, then check whether services have recovered. If not, contact SRE engineers. | If processes on primary nodes do not respond, neither do the services. If processes on standby nodes are faulty, services are not affected. |
GaussDB | SYS.GAUSSDBV5 | Cluster status alarm | ClusterStatusAlarm | Major | The cluster is abnormal. Possible faults: the cluster is read-only, the majority of ETCD members are faulty, or cluster resources are unevenly distributed. | Contact SRE engineers. | If the cluster is read-only, only read requests are processed. If the majority of ETCD members are faulty, the cluster is unavailable. If resources are unevenly distributed, instance performance and reliability deteriorate. |
GaussDB | SYS.GAUSSDBV5 | Hardware resource alarm | HardwareResourceAlarm | Major | A major hardware fault occurs in the instance, such as disk damage or a GTM network fault. | Contact SRE engineers. | Some or all services are affected. |
GaussDB | SYS.GAUSSDBV5 | Status transition alarm | StateTransitionAlarm | Major | One of the following events occurs in the instance: DN build attempt, DN build failure, forcible DN promotion, primary/standby DN switchover/failover, or primary/standby GTM switchover/failover. | Wait until the fault is automatically rectified and check whether services have recovered. If not, contact SRE engineers. | Some services are interrupted. |
GaussDB | SYS.GAUSSDBV5 | Other abnormal alarm | OtherAbnormalAlarm | Major | Disk usage exceeds the threshold. | Monitor workload changes and scale up storage as needed. | If the used space exceeds the threshold, storage cannot be scaled up. |
GaussDB | SYS.GAUSSDBV5 | Instance running status abnormal | TaurusInstanceRunningStatusAbnormal | Major | This is a key alarm event. It is reported when an instance is faulty due to a disaster or a server failure. | Submit a service ticket. | The database service may be unavailable. |
GaussDB | SYS.GAUSSDBV5 | Instance running status recovered | TaurusInstanceRunningStatusRecovered | Major | GaussDB provides an HA tool to automatically or manually rectify the catastrophic fault. This event is reported after the fault is rectified. | No further action is required. | None |
GaussDB | SYS.GAUSSDBV5 | Node status abnormal | TaurusNodeRunningStatusAbnormal | Major | This is a key alarm event. It is reported when a database node is faulty due to a disaster or a server failure. | Check whether the database service is available and submit a service ticket. | The database service may be unavailable. |
GaussDB | SYS.GAUSSDBV5 | Node recovered | TaurusNodeRunningStatusRecovered | Major | GaussDB provides an HA tool to automatically or manually rectify the catastrophic fault. This event is reported after the fault is rectified. | No further action is required. | None |
GaussDB | SYS.GAUSSDBV5 | Instance creation failure | GaussDBV5CreateInstanceFailed | Major | Instances fail to be created because the quota is insufficient or underlying resources are exhausted. | Release instances that are no longer used and try to provision new instances again, or submit a service ticket to adjust the quota. | Instances fail to be created. |
GaussDB | SYS.GAUSSDBV5 | Node adding failure | GaussDBV5ExpandClusterFailed | Major | The underlying resources are insufficient. | Submit a service ticket to ask O&M personnel to coordinate resources, then delete the node that failed to be added and add a new one. | None |
GaussDB | SYS.GAUSSDBV5 | Storage scale-up failure | GaussDBV5EnlargeVolumeFailed | Major | The underlying resources are insufficient. | Submit a service ticket. O&M personnel will coordinate resources in the background, after which you can scale up the storage space again. | Services may be interrupted. |
GaussDB | SYS.GAUSSDBV5 | Reboot failure | GaussDBV5RestartInstanceFailed | Major | The network is abnormal. | Retry the reboot operation or submit a service ticket to O&M personnel. | The database service may be unavailable. |
GaussDB | SYS.GAUSSDBV5 | Full backup failure | GaussDBV5FullBackupFailed | Major | The backup files fail to be exported or uploaded. | Submit a service ticket to O&M personnel. | Data cannot be backed up. |
GaussDB | SYS.GAUSSDBV5 | Differential backup failure | GaussDBV5DifferentialBackupFailed | Major | The backup files fail to be exported or uploaded. | Submit a service ticket to O&M personnel. | Data cannot be backed up. |
GaussDB | SYS.GAUSSDBV5 | Backup deletion failure | GaussDBV5DeleteBackupFailed | Major | Backup files fail to be cleared. | Submit a service ticket to O&M personnel. | There may be residual OBS files. |
GaussDB | SYS.GAUSSDBV5 | EIP binding failure | GaussDBV5BindEIPFailed | Major | The EIP is already in use or EIP resources are insufficient. | Submit a service ticket to O&M personnel. | The instance cannot be accessed from the Internet. |
GaussDB | SYS.GAUSSDBV5 | EIP unbinding failure | GaussDBV5UnbindEIPFailed | Major | The network or the EIP service is faulty. | Unbind the EIP again or submit a service ticket to O&M personnel. | Residual IP resources may be generated. |
GaussDB | SYS.GAUSSDBV5 | Parameter template application failure | GaussDBV5ApplyParamFailed | Major | Changing a parameter group times out. | Change the parameter group again. | None |
GaussDB | SYS.GAUSSDBV5 | Parameter modification failure | GaussDBV5UpdateInstanceParamGroupFailed | Major | Changing a parameter group times out. | Change the parameter group again. | None |
GaussDB | SYS.GAUSSDBV5 | Backup and restoration failure | GaussDBV5RestoreFromBcakupFailed | Major | The underlying resources are insufficient or backup files fail to be downloaded. | Submit a service ticket. | The database service may be unavailable if the restoration fails. |
GaussDB | SYS.GAUSSDBV5 | Hot patch installation failure | GaussDBV5UpgradeHotfixFailed | Major | Generally, this fault is caused by an error reported during a kernel upgrade. | View the error information about the workflow and redo or skip the job. | None |
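
The handling suggestions in the table can be encoded as a small lookup for an on-call script. The following is a minimal sketch, not an official API: the action labels, the `ACTIONS` mapping, and the `suggest_action` helper are hypothetical names introduced for illustration, and only some of the table's event IDs are covered. Falling back to a service ticket for unlisted IDs is an assumption, not documented behavior.

```python
# Hypothetical routing table built from some rows of the event table.
# The action labels are illustrative names, not part of any GaussDB API.
ACTIONS = {
    # "Wait for automatic recovery, then contact SRE engineers if needed."
    "ProcessStatusAlarm": "wait_then_escalate",
    "ComponentStatusAlarm": "wait_then_escalate",
    "StateTransitionAlarm": "wait_then_escalate",
    # "Contact SRE engineers."
    "ClusterStatusAlarm": "escalate",
    "HardwareResourceAlarm": "escalate",
    # "Retry the operation or submit a service ticket."
    "GaussDBV5RestartInstanceFailed": "retry_or_ticket",
    "GaussDBV5UnbindEIPFailed": "retry_or_ticket",
    # "Change the parameter group again."
    "GaussDBV5ApplyParamFailed": "retry",
    "GaussDBV5UpdateInstanceParamGroupFailed": "retry",
    # "No further action is required."
    "TaurusInstanceRunningStatusRecovered": "none",
    "TaurusNodeRunningStatusRecovered": "none",
}


def suggest_action(event_id: str) -> str:
    """Return the action label for an event ID.

    Assumption: IDs not listed in ACTIONS fall back to "ticket".
    """
    return ACTIONS.get(event_id, "ticket")


if __name__ == "__main__":
    for event_id in ("ProcessStatusAlarm", "GaussDBV5FullBackupFailed"):
        print(event_id, "->", suggest_action(event_id))
```

Keying on the event ID rather than the display name keeps the lookup stable if the names are reworded, and the IDs are copied verbatim from the table, including their original spelling.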