KnovvuCoreErrorCriticalLogRules

Prev Next

Meaning

This alert is triggered when a core service logs more than 500 messages at error or critical level within a 5-minute window. It indicates that the service may be experiencing serious issues. The alert fires at a critical level if this condition persists for 1 minute.

Full context

A surge in error or critical-level logs typically points to unhandled exceptions, infrastructure problems, or severe application failures.

Impact

  • Service degradation or failure.
  • Potential impact on system stability or user experience.
  • Log storage may fill rapidly.

Diagnosis

  • Identify the affected service from the alert label.
  • Check logs for recurring exceptions or error patterns.
  • Correlate with deployment, scaling, or dependency changes.

Mitigation

  • Restart or isolate the affected service.
  • Revert recent changes if applicable.
  • Escalate to engineering if the root cause is not immediately clear or service is unstable.