Meaning
This alert is triggered when a core service logs more than 500 messages at error or critical level within a 5-minute window, which indicates that the service may be experiencing serious issues. The alert fires at critical severity once this condition has persisted for 1 minute.
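The firing condition can be illustrated with a small sliding-window counter. This is a minimal sketch, not the alerting system's actual rule: the class name `ErrorLogAlert` and its methods are hypothetical, and it assumes timestamps are given in seconds and recorded in non-decreasing order. Only the three numbers (500 messages, 5 minutes, 1 minute) come from the description above.

```python
from collections import deque

WINDOW_SECONDS = 5 * 60  # counting window from the alert description
THRESHOLD = 500          # alert condition: MORE than 500 messages in the window
HOLD_SECONDS = 60        # condition must persist this long before firing


class ErrorLogAlert:
    """Hypothetical evaluator for one service's error/critical log volume."""

    def __init__(self):
        self.timestamps = deque()   # times of error/critical messages in the window
        self.breach_started = None  # when the count first exceeded the threshold

    def record(self, ts):
        """Record one error- or critical-level message seen at time ts (seconds)."""
        self.timestamps.append(ts)

    def evaluate(self, now):
        """Return True if the alert should be firing at time now (seconds)."""
        # Drop messages that have aged out of the 5-minute window.
        while self.timestamps and self.timestamps[0] <= now - WINDOW_SECONDS:
            self.timestamps.popleft()

        if len(self.timestamps) > THRESHOLD:
            if self.breach_started is None:
                self.breach_started = now
            # Fire only once the breach has lasted the full hold period.
            return now - self.breach_started >= HOLD_SECONDS

        # Condition cleared: reset the hold timer.
        self.breach_started = None
        return False
```

In this sketch a burst of 501 messages does not fire on the first evaluation; it fires only if the count is still above 500 when evaluated at least 60 seconds later, and it clears as soon as the messages age out of the window.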
Full context
A surge in error or critical-level logs typically points to unhandled exceptions, infrastructure problems, or severe application failures.
Impact
- Service degradation or failure.
- Potential impact on system stability or user experience.
- Log storage may fill rapidly.
Diagnosis
- Identify the affected service from the alert label.
- Check logs for recurring exceptions or error patterns.
- Correlate with deployment, scaling, or dependency changes.
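When checking logs for recurring patterns, it can help to collapse variable parts of each message (IDs, durations, addresses) so that repeated errors group together. The following is a rough sketch under stated assumptions: it assumes each log line has the form `<timestamp> <LEVEL> <message>` separated by single spaces, which may not match your service's log format, and the function names `normalize` and `top_error_patterns` are hypothetical.

```python
import re
from collections import Counter

# Levels that count toward the alert; the exact spelling in your logs is an assumption.
LEVELS = ("ERROR", "CRITICAL")


def normalize(message):
    """Replace variable tokens so similar messages collapse into one pattern."""
    message = re.sub(r"0x[0-9a-fA-F]+", "<hex>", message)  # hex addresses
    message = re.sub(r"\d+", "<n>", message)               # any remaining digit runs
    return message.strip()


def top_error_patterns(lines, limit=5):
    """Return the most frequent error/critical message patterns with their counts."""
    counts = Counter()
    for line in lines:
        # Assumed layout: "<timestamp> <LEVEL> <message>"
        parts = line.split(" ", 2)
        if len(parts) == 3 and parts[1] in LEVELS:
            counts[normalize(parts[2])] += 1
    return counts.most_common(limit)
```

A pattern that dominates the output is a reasonable starting point for correlating with recent deployment, scaling, or dependency changes.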
Mitigation
- Restart or isolate the affected service.
- Revert recent changes if applicable.
- Escalate to engineering if the root cause is not immediately clear or the service remains unstable.