Monitoring and Observability

14 Feb 2025

This document outlines the observability infrastructure for Knovvu applications, which is built on three pillars: metrics, traces, and logs. Each pillar provides a distinct, complementary perspective on system health and operations.

Pillar	Definition
Metrics	Metrics are numerical data points collected over time, representing system performance and resource utilization (e.g., CPU usage, memory consumption, or request latency). They enable trend analysis, performance monitoring, and alerting when thresholds are breached.
Traces	Traces capture the lifecycle of individual requests or transactions as they propagate through distributed systems. By visualizing the path and timing of operations across services, traces help identify bottlenecks, dependencies, and inefficiencies in complex architectures.
Logs	Logs are structured or unstructured text records that capture discrete events within a system. They provide detailed context about what happened, when, and where, making them essential for diagnosing issues and understanding system behavior at specific points in time.

Goals of Observability

Early Issue Detection
Faster Troubleshooting
Performance Optimization
Scalability Readiness

Metrics

Architecture

Knovvu applications expose metrics in Prometheus format. Prometheus scrapes these metrics from the applications, using service discovery to track targets dynamically, and stores them as time-series data for analysis. Grafana queries Prometheus via PromQL to visualize metrics, while Alertmanager processes alerts and routes notifications to Opsgenie and email. This setup supports real-time monitoring, alerting, and incident management to maintain application reliability.

[Figure: metrics architecture diagram (Metrics.drawio.png)]
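
To illustrate what "metrics in Prometheus format" means on the wire, here is a minimal sketch that renders a counter in the Prometheus text exposition format and serves it on a `/metrics` path using only the Python standard library. The metric name `demo_http_requests_total`, its labels, the sample values, and the port are hypothetical and are not taken from any Knovvu application; real applications would normally use a Prometheus client library instead.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def render_metrics(request_counts):
    """Render a counter in the Prometheus text exposition format."""
    lines = [
        "# HELP demo_http_requests_total Total HTTP requests handled (hypothetical metric).",
        "# TYPE demo_http_requests_total counter",
    ]
    for (method, status), value in sorted(request_counts.items()):
        lines.append(
            f'demo_http_requests_total{{method="{method}",status="{status}"}} {value}'
        )
    return "\n".join(lines) + "\n"


# Hypothetical in-memory counter values keyed by (method, status).
REQUEST_COUNTS = {("GET", "200"): 1027, ("POST", "500"): 3}


class MetricsHandler(BaseHTTPRequestHandler):
    """Serve the rendered metrics on /metrics so a scraper can pull them."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(REQUEST_COUNTS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Start the demo endpoint (blocks until interrupted)."""
    HTTPServer(("0.0.0.0", port), MetricsHandler).serve_forever()
```

Calling `serve()` starts the endpoint; Prometheus would then be pointed at it through a scrape target or service discovery, as described above.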

Alerts

Infrastructure alerts

For infrastructure alerts, the rule definitions provided by the Kube Prometheus Stack are used. These rules cover cluster health, node status, workload performance, and the health of Kubernetes components. They are defined in the following manifest files:

alertmanager-prometheusRule.yaml
kubePrometheus-prometheusRule.yaml
kubeStateMetrics-prometheusRule.yaml
kubernetesControlPlane-prometheusRule.yaml
nodeExporter-prometheusRule.yaml
prometheus-prometheusRule.yaml
prometheusOperator-prometheusRule.yaml

Application and business alerts

Application and business alerts are defined by Sestek per application.

Alerting for customers managing their own Kubernetes clusters

When customers manage their own Kubernetes clusters, Sestek does not deploy the Prometheus stack, because it requires cluster-wide access and configuration privileges. In these cases, customers are expected to monitor Knovvu applications using their own Prometheus-based monitoring stack. Contact Sestek for guidelines on defining Prometheus alerts based on Knovvu application metrics within customer-managed clusters.
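
As a rough illustration of what a customer-defined alert could look like, the sketch below builds a Prometheus alerting rule file as a Python dictionary and serializes it to JSON, which Prometheus can load because JSON is also valid YAML. The metric name `demo_http_requests_total`, the alert name, the threshold, and the helper `build_rule_file` are all hypothetical; the actual Knovvu metric names and recommended thresholds come from the Sestek guidelines, not from this example.

```python
import json


def build_rule_file(group_name, alert, expr, duration, severity, summary):
    """Return a Prometheus alerting rule file as a plain dictionary."""
    return {
        "groups": [
            {
                "name": group_name,
                "rules": [
                    {
                        "alert": alert,
                        "expr": expr,          # PromQL condition
                        "for": duration,       # how long it must hold before firing
                        "labels": {"severity": severity},
                        "annotations": {"summary": summary},
                    }
                ],
            }
        ]
    }


# Hypothetical metric: fires when more than 5% of requests return a 5xx status.
ERROR_RATIO_EXPR = (
    'sum(rate(demo_http_requests_total{status=~"5.."}[5m])) '
    "/ sum(rate(demo_http_requests_total[5m])) > 0.05"
)

rule_file = build_rule_file(
    group_name="demo-application.rules",
    alert="DemoHighErrorRate",
    expr=ERROR_RATIO_EXPR,
    duration="10m",
    severity="critical",
    summary="More than 5% of requests are failing",
)

# JSON output can be saved as a rule file and referenced from rule_files.
rule_file_text = json.dumps(rule_file, indent=2)
```

If the customer's stack uses the Prometheus Operator, the same `groups` list would go under the `spec` of a `PrometheusRule` resource instead of a standalone rule file.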

Dashboards

Infrastructure dashboards

For infrastructure dashboards, the dashboards provided by the Kube Prometheus Stack are used. They are defined in the manifest files of the Kube Prometheus Stack.

Application and business dashboards

Application and business dashboards are defined by Sestek per application.

A sample dashboard displaying Knovvu Virtual Agent Orchestrator API requests in Grafana can be found below:

[Figure: sample Grafana dashboard (sample dashboard.png)]
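
A dashboard panel like this is ultimately backed by a PromQL query. As a hedged sketch, the snippet below builds the URL for an instant query against the Prometheus HTTP API (`/api/v1/query`). The Prometheus address and the metric name `demo_http_requests_total` are assumptions for illustration; the queries behind the actual Knovvu dashboards are not documented here.

```python
from urllib.parse import urlencode


def build_query_url(base_url, promql):
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


# Hypothetical query: per-status request rate over the last five minutes.
PROMQL = "sum by (status) (rate(demo_http_requests_total[5m]))"

# Hypothetical in-cluster Prometheus address.
url = build_query_url("http://prometheus:9090/", PROMQL)
```

The resulting URL can be fetched with any HTTP client; Grafana issues equivalent requests through its Prometheus data source.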

Notifications

Knovvu applications use Opsgenie as the alert management platform to ensure timely incident response. Alerts generated by Prometheus are routed to Opsgenie, where they are prioritized and escalated, and the appropriate teams are notified for quick resolution.

When Kubernetes is managed by Sestek, email notifications are configured to be sent to a Sestek email address, which relays these alerts into Opsgenie.

Notifications for customers managing their own Kubernetes clusters

For customers managing their own Kubernetes clusters, please contact Sestek to receive guidelines on integrating notifications with the Sestek Opsgenie system.

Traces

Architecture

Knovvu applications use OpenTelemetry for tracing. By default, traces are sent directly to Elastic APM. However, if customers require integration with different tracing systems, Knovvu applications can be configured to send traces to an OpenTelemetry Collector, enabling compatibility with other observability platforms.

Tracing Data

Tracing in Knovvu applications provides insights into application performance, dependencies, and errors. The key types of information available through tracing are:

Request Flow & Latency Tracking
Service Dependencies
Performance Metrics
Distributed Tracing

A sample trace of Knovvu Virtual Agent Orchestrator API requests can be found below:

[Figure: sample trace view (sample trace.png)]

Logs

Architecture

Knovvu applications use Elasticsearch as the central logging system. The logging architecture consists of two main approaches depending on the application type. Some applications, like .NET applications, write logs directly to Elasticsearch. Other applications, like core C++ applications, send logs to Logstash, which processes and transforms the logs before forwarding them to Elasticsearch.

Log Data

All Knovvu applications follow a structured logging approach to ensure consistency and ease of analysis. Each log entry typically includes:

Timestamp
Log Level (e.g., INFO, WARN, ERROR)
Service Name (the application generating the log)
Message (descriptive log message)
Pod Information
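
The fields listed above can be sketched as a JSON log formatter built on Python's standard `logging` module. The JSON key names, the `JsonFormatter` and `build_logger` helpers, and the use of the `HOSTNAME` environment variable as the pod name are assumptions for illustration; the exact schema used by Knovvu applications is not specified here.

```python
import json
import logging
import os
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Format each record as one JSON object with a fixed set of fields."""

    def __init__(self, service_name, pod_name):
        super().__init__()
        self.service_name = service_name
        self.pod_name = pod_name

    def format(self, record):
        entry = {
            "timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            "level": record.levelname,      # e.g. INFO, WARNING, ERROR
            "service": self.service_name,
            "message": record.getMessage(),
            "pod": self.pod_name,
        }
        return json.dumps(entry)


def build_logger(service_name, stream=None):
    """Return a logger that writes structured JSON lines to the given stream."""
    # In Kubernetes, HOSTNAME defaults to the pod name.
    pod_name = os.environ.get("HOSTNAME", "unknown")
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter(service_name, pod_name))
    logger = logging.getLogger(service_name)
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

For example, `build_logger("demo-service").error("upstream timeout after %s ms", 3000)` writes a single JSON line containing all five fields. Note that Python names its warning level `WARNING` rather than `WARN`.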

Retention

The Knovvu platform defines log retention based on both age and storage size. These values are configurable; the defaults are 7 days and 250 GB.
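
The following sketch shows one way an age limit and a storage limit could be combined: entries older than the age limit are dropped first, then the oldest remaining entries are dropped until the total fits the storage limit. How the Knovvu platform actually applies the two limits is not described in this document, so the ordering, the daily-index model, and the names `LogIndex` and `select_expired` are assumptions.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class LogIndex:
    """A hypothetical daily log index with its creation day and size."""
    name: str
    day: date
    size_gb: float


def select_expired(indices, today, max_age_days=7, max_total_gb=250.0):
    """Return the indices to delete under an age limit and a size limit."""
    cutoff = today - timedelta(days=max_age_days)
    # Step 1: everything older than the age limit expires.
    expired = [i for i in indices if i.day < cutoff]
    kept = sorted((i for i in indices if i.day >= cutoff), key=lambda i: i.day)
    # Step 2: drop the oldest remaining indices until the total size fits.
    total = sum(i.size_gb for i in kept)
    while kept and total > max_total_gb:
        oldest = kept.pop(0)
        expired.append(oldest)
        total -= oldest.size_gb
    return expired
```

With the default limits, an index is removed either because it is more than 7 days old or because the retained data would otherwise exceed 250 GB.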
