Meaning
This alert fires when, within a 5-minute window, more than 10% of all requests to a specific ingress return a 4xx or 5xx status code and at least 10 requests have failed in this way. Common client-side errors such as 404 and 403 are not counted as failures. The alert may indicate backend failures or routing issues.
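The condition above can be sketched as a small function. This is an illustration only, not the actual alert rule: the function name, the input shape, and the exact exclusion list are assumptions (the text names 404 and 403 only as examples), and it assumes excluded responses still count toward total requests.

```python
from collections import Counter

# Assumption: 403 and 404 are the only excluded codes; the real rule may exclude more.
EXCLUDED_CODES = {403, 404}
RATIO_THRESHOLD = 0.10  # "more than 10% of total requests"
MIN_FAILED = 10         # "at least 10 such failed requests"


def should_alert(status_counts):
    """Return True if the alert condition holds for one ingress.

    status_counts maps HTTP status code -> request count observed
    for that ingress in a single 5-minute window.
    """
    total = sum(status_counts.values())
    if total == 0:
        return False  # no traffic, so no error ratio to evaluate
    failed = sum(
        count
        for code, count in status_counts.items()
        if 400 <= code <= 599 and code not in EXCLUDED_CODES
    )
    # Both parts must hold: the ratio is exceeded AND there are enough failures.
    return failed / total > RATIO_THRESHOLD and failed >= MIN_FAILED


# 100 requests in total, 15 counted failures (the five 404s are ignored).
window = Counter({200: 80, 404: 5, 500: 12, 502: 3})
print(should_alert(window))
```

The minimum-count check keeps low-traffic ingresses from alerting on a handful of errors: nine failures out of nineteen requests exceed the ratio but do not meet the count.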
Full context
This alert focuses on ingress-level error rates that exceed acceptable thresholds. It filters out expected or benign 4xx status codes and highlights spikes that might suggest actual misbehavior of the application or infrastructure.
Impact
- Degraded user experience or failed API calls.
- Possible backend application crashes or misconfiguration.
- Possible broken ingress routing or failing health checks.
Diagnosis
- Identify the ingress from the alert.
- Use ingress logs and metrics to determine which service is behind the failures.
- Check backend application logs and health probes.
- Validate recent deployment changes or configuration updates.
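As a rough illustration of the second diagnosis step, the sketch below counts failed requests per backend service from already-parsed ingress log records. The record shape (dictionaries with `service` and `status` keys) and the function name are hypothetical; real ingress logs must first be parsed into a comparable form, and the exclusion list is assumed to match the alert's.

```python
from collections import Counter

# Assumption: same exclusions as the alert (403 and 404 given as examples).
EXCLUDED_CODES = {403, 404}


def failures_by_service(records):
    """Count alert-relevant failures per backend service, highest first.

    records is an iterable of dicts with hypothetical keys:
      "service": name of the backend that served the request
      "status":  HTTP status code as an int
    """
    counts = Counter()
    for rec in records:
        status = rec["status"]
        if 400 <= status <= 599 and status not in EXCLUDED_CODES:
            counts[rec["service"]] += 1
    return counts.most_common()


# Made-up sample records for demonstration only.
sample = [
    {"service": "checkout", "status": 502},
    {"service": "checkout", "status": 500},
    {"service": "catalog", "status": 404},
    {"service": "catalog", "status": 503},
    {"service": "checkout", "status": 200},
]
for service, failed in failures_by_service(sample):
    print(service, failed)
```

The service at the top of the list is a reasonable first place to check application logs and health probes.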
Mitigation
- Roll back recent changes if their timing correlates with the onset of errors.
- Restart or scale backend services if unhealthy.
- Investigate ingress annotations or rewrite rules that may be misconfigured.
- Escalate if persistent failures affect production traffic.
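To decide whether a rollback is warranted, it can help to list the changes that landed shortly before the errors began. The following is a minimal sketch under stated assumptions: the function name and the 30-minute lookback are illustrative choices, and the timestamps would come from whatever deployment history is available.

```python
from datetime import datetime, timedelta


def deploys_before_spike(deploy_times, spike_start, lookback=timedelta(minutes=30)):
    """Return deployment times within `lookback` before the error spike, oldest first.

    deploy_times: iterable of datetime objects, one per deployment or config change
    spike_start:  datetime at which the error rate began to rise
    lookback:     how far back to look (30 minutes is an arbitrary default)
    """
    earliest = spike_start - lookback
    return sorted(t for t in deploy_times if earliest <= t <= spike_start)


# Made-up timestamps for demonstration only.
spike = datetime(2024, 1, 1, 12, 0)
deploys = [
    datetime(2024, 1, 1, 9, 15),   # too early to be a likely cause
    datetime(2024, 1, 1, 11, 50),  # 10 minutes before the spike
    datetime(2024, 1, 1, 12, 20),  # after the spike began
]
for t in deploys_before_spike(deploys, spike):
    print(t.isoformat())
```

A match here only suggests correlation; backend logs should still confirm the cause before or after rolling back.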