KnovvuCoreFatalLogRules


Meaning

This alert fires at critical severity when a core service logs more than one fatal-level message within a 1-minute window. Fatal logs indicate unrecoverable errors that typically crash the service or cause major disruption.
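As a rough illustration of the condition described above, the following Python sketch counts fatal-level entries per service over the last minute and reports the services that exceed the threshold. It is not the actual alert rule: the function name `services_over_threshold` and the `(timestamp, service, level)` tuple shape are assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=1)   # evaluation window from the alert description
THRESHOLD = 1                   # alert fires when the count is strictly greater


def services_over_threshold(events, now):
    """Return the services with more than THRESHOLD fatal logs in the window.

    `events` is an iterable of (timestamp, service, level) tuples; this
    shape is hypothetical and only serves the illustration.
    """
    counts = defaultdict(int)
    for ts, service, level in events:
        # Count only fatal entries that fall inside (now - WINDOW, now].
        if level.lower() == "fatal" and now - WINDOW < ts <= now:
            counts[service] += 1
    return sorted(s for s, n in counts.items() if n > THRESHOLD)
```

A single fatal entry inside the window does not satisfy the condition; a second one from the same service within the same minute does.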

Full context

Fatal log entries usually reflect catastrophic failures such as unhandled exceptions, corrupted state, or forced shutdowns.

Impact

  • Service likely crashed or entered a failed state.
  • Immediate disruption of functionality or data processing.
  • Potential cascading impact on dependent systems.

Diagnosis

  • Identify the affected service from the alert label.
  • Check that service's logs for the fatal entries and their stack traces.
  • Correlate with system events like deployments, resource spikes, or dependency failures.
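The diagnosis steps above can be sketched in Python, assuming the logs are available as JSON lines with `timestamp`, `service`, and `level` fields. That log format, the helper names `fatal_entries` and `events_near`, and the 5-minute correlation margin are all assumptions for illustration; adapt them to whatever log store and event source are actually in use.

```python
import json
from datetime import datetime, timedelta


def fatal_entries(lines, service):
    """Return parsed fatal-level records for one service, skipping bad lines."""
    found = []
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore lines that are not valid JSON
        if not isinstance(record, dict):
            continue
        level = str(record.get("level", "")).lower()
        if record.get("service") == service and level == "fatal":
            found.append(record)
    return found


def events_near(fatal, events, margin=timedelta(minutes=5)):
    """Return names of events that happened shortly before the first fatal log.

    `events` is a list of (name, datetime) pairs, e.g. deployments; the
    pair shape and the default margin are hypothetical.
    """
    if not fatal:
        return []
    first = min(datetime.fromisoformat(r["timestamp"]) for r in fatal)
    return [name for name, ts in events if first - margin <= ts <= first]
```

An event that lands just before the first fatal entry, such as a deployment, is a natural first candidate to examine, though proximity in time alone does not establish the cause.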

Mitigation

  • Restart the service immediately if it has stopped.
  • Revert recent changes if fatal errors were introduced during deployment.
  • Escalate to engineering for root cause analysis and remediation.