Alarm Status of a Deployed Real-Time Service

Symptom

A deployed real-time service is in the Alarm state.

Solution

Predictions sent to a real-time service in the Alarm state may fail. Perform the following checks to locate the fault, and then deploy the service again:

  1. Check whether there are too many prediction requests on the backend.

    If you call APIs for prediction, check the volume of requests being sent. A large number of prediction requests can put the real-time service into the Alarm state.

  2. Check whether the service memory is working properly.

    Check whether the inference code causes a memory overflow or a memory leak.

  3. Check whether the model is running properly.

    If the model fails, for example because the associated resources are faulty, check the inference logs.
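
For the memory check in step 2, one possible approach is to run the inference function repeatedly in a local test and measure how much memory stays allocated afterwards. The following is a minimal sketch that uses Python's standard tracemalloc module. The functions leaky_predict and stateless_predict are hypothetical stand-ins for your own inference code, and the call count and buffer size are arbitrary assumptions, not values from the service.

```python
import tracemalloc


def leaky_predict(cache, size):
    # Hypothetical inference function with a leak: every request buffer
    # is appended to a long-lived list and never released.
    buf = bytearray(size)
    cache.append(buf)
    return len(buf)


def stateless_predict(size):
    # Hypothetical inference function without a leak: the buffer is
    # released as soon as the call returns.
    buf = bytearray(size)
    return len(buf)


def memory_growth(func, calls=200):
    """Return the net number of traced bytes still allocated after `calls` runs."""
    tracemalloc.start()
    before, _peak = tracemalloc.get_traced_memory()
    for _ in range(calls):
        func()
    after, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before


if __name__ == "__main__":
    cache = []
    print("leaky growth (bytes):", memory_growth(lambda: leaky_predict(cache, 10_000)))
    print("stateless growth (bytes):", memory_growth(lambda: stateless_predict(10_000)))
```

If the growth keeps increasing with the number of calls, the inference code is probably holding on to memory between requests. Growth that stays near zero suggests the cause lies elsewhere, for example in the request volume or the model resources.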