NGINXTooManyErrorCodesPerIngress

Meaning

This alert fires when more than 10% of the total requests for a specific ingress return 4xx or 5xx status codes (excluding common client-side errors such as 404 and 403) and at least 10 such requests fail within 5 minutes. It may indicate backend failures or routing issues.
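The firing condition can be sketched in a few lines of Python to make the two thresholds explicit. This is only an illustration of the logic described above, not the alert's actual rule: the function name `should_fire` and the input shape (a mapping of status code to request count for one ingress over the 5-minute window) are hypothetical, and the excluded set is assumed to be exactly 403 and 404, since the text names those only as examples.

```python
# Assumption: 403 and 404 are the only excluded codes; the real rule may exclude more.
EXCLUDED_CODES = {403, 404}
ERROR_RATIO_THRESHOLD = 0.10  # "more than 10% of total requests"
MIN_FAILED_REQUESTS = 10      # "at least 10 such failed requests"


def should_fire(status_counts):
    """Return True if the counts for one ingress and one window meet the condition.

    status_counts maps an HTTP status code (int) to the number of requests
    that returned it during the window (hypothetical input shape).
    """
    # Excluded codes still count toward the total; they are just not "failed".
    total = sum(status_counts.values())
    failed = sum(
        count
        for code, count in status_counts.items()
        if 400 <= code <= 599 and code not in EXCLUDED_CODES
    )
    if total == 0:
        return False
    # Both conditions must hold: an absolute floor and a strict ratio threshold.
    return failed >= MIN_FAILED_REQUESTS and failed / total > ERROR_RATIO_THRESHOLD
```

The absolute floor of 10 failed requests keeps low-traffic ingresses from alerting on a handful of errors, while the ratio keeps high-traffic ingresses from alerting on a small background error rate.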

Full context

This alert focuses on ingress-level error rates that exceed acceptable thresholds. It filters out expected or benign 4xx status codes and highlights spikes that might suggest actual misbehavior of the application or infrastructure.

Impact

  • Degraded user experience or failed API calls.
  • Potential backend application crashes or misconfigurations.
  • May indicate broken ingress routing or health check failures.

Diagnosis

  • Identify the ingress from the alert.
  • Use ingress logs and metrics to determine which service is behind the failures.
  • Check backend application logs and health probes.
  • Review recent deployments and configuration updates for changes that coincide with the errors.
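As a rough aid for the log-based step above, the following sketch counts error responses per upstream and status code from controller access-log lines. It assumes the lines follow the ingress-nginx default access-log layout, in which the status follows the quoted request and the upstream name is the first bracketed field after the user agent; a custom log format would need a different pattern. The function name `errors_by_upstream` is hypothetical, and 403/404 are excluded only to mirror the alert description.

```python
import re
from collections import Counter

# Assumption: ingress-nginx default access-log layout. Adjust for custom formats.
LOG_PATTERN = re.compile(
    r'"[^"]*" (?P<status>\d{3}) \d+ "[^"]*" "[^"]*" \S+ \S+ \[(?P<upstream>[^\]]*)\]'
)

# Assumption: mirrors the codes the alert description names as excluded.
EXCLUDED_CODES = {403, 404}


def errors_by_upstream(lines):
    """Count 4xx/5xx responses per (upstream name, status code)."""
    counts = Counter()
    for line in lines:
        match = LOG_PATTERN.search(line)
        if match is None:
            continue  # skip lines that are not access-log entries
        status = int(match.group("status"))
        if 400 <= status <= 599 and status not in EXCLUDED_CODES:
            counts[(match.group("upstream"), status)] += 1
    return counts
```

Calling `errors_by_upstream(lines).most_common(5)` on a recent slice of controller logs lists the upstreams producing the most errors, which narrows down which backend service to inspect first.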

Mitigation

  • Roll back recent changes if they correlate with the onset of the errors.
  • Restart or scale backend services if unhealthy.
  • Investigate ingress annotations or rewrite rules that may be misconfigured.
  • Escalate if persistent failures affect production traffic.