This document outlines the observability infrastructure for Knovvu applications, which is built on three pillars: metrics, traces, and logs. Each pillar provides a distinct, complementary perspective on system health and operations.
Pillar | Definition |
---|---|
Metrics | Metrics are numerical data points collected over time, representing system performance and resource utilization (e.g., CPU usage, memory consumption, or request latency). They enable trend analysis, performance monitoring, and alerting when thresholds are breached. |
Traces | Traces capture the lifecycle of individual requests or transactions as they propagate through distributed systems. By visualizing the path and timing of operations across services, traces help identify bottlenecks, dependencies, and inefficiencies in complex architectures. |
Logs | Logs are structured or unstructured text records that capture discrete events within a system. They provide detailed context about what happened, when, and where, making them essential for diagnosing issues and understanding system behavior at specific points in time. |
Goals of Observability
- Early Issue Detection
- Faster Troubleshooting
- Performance Optimization
- Scalability Readiness
Metrics
Architecture
Knovvu applications expose metrics in Prometheus format. Prometheus pulls (scrapes) these metrics from the applications, using service discovery to track targets dynamically, and stores them as time-series data for analysis. Grafana queries Prometheus via PromQL to visualize the metrics, while Alertmanager processes alerts and routes notifications to Opsgenie and email. Together, these components provide real-time monitoring, alerting, and incident management to maintain application reliability.
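To illustrate what "metrics in Prometheus format" means, the following is a minimal sketch that renders counters in the Prometheus text exposition format using only the Python standard library. The metric name, labels, and helper functions are hypothetical and are not taken from any Knovvu application; a real service would normally use an official Prometheus client library instead.

```python
from typing import Dict, Tuple

# Hypothetical in-memory store: (metric name, sorted label pairs) -> value.
Counters = Dict[Tuple[str, Tuple[Tuple[str, str], ...]], float]


def inc(counters: Counters, name: str, labels: Dict[str, str], amount: float = 1.0) -> None:
    """Increment a labeled counter, creating it on first use."""
    key = (name, tuple(sorted(labels.items())))
    counters[key] = counters.get(key, 0.0) + amount


def render(counters: Counters, help_text: Dict[str, str]) -> str:
    """Render all counters in the Prometheus text exposition format.

    Simplification: label values are not escaped, so they must not
    contain quotes, backslashes, or newlines.
    """
    lines = []
    seen = set()
    for (name, labels), value in sorted(counters.items()):
        if name not in seen:
            # HELP and TYPE lines are emitted once per metric family.
            seen.add(name)
            lines.append(f"# HELP {name} {help_text.get(name, '')}".rstrip())
            lines.append(f"# TYPE {name} counter")
        label_str = ",".join(f'{k}="{v}"' for k, v in labels)
        sample = f"{name}{{{label_str}}}" if label_str else name
        lines.append(f"{sample} {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    store: Counters = {}
    inc(store, "example_http_requests_total", {"method": "GET", "status": "200"})
    print(render(store, {"example_http_requests_total": "Total HTTP requests."}))
```

An application would serve this text on an HTTP endpoint (conventionally `/metrics`), which Prometheus then scrapes on a fixed interval.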
Alerts
Infrastructure alerts
For infrastructure alerts, the rule definitions provided by the Kube Prometheus Stack are used. These rules cover cluster health, node status, workload performance, and the health of Kubernetes components. They are defined in the following manifest files:
- alertmanager-prometheusRule.yaml
- kubePrometheus-prometheusRule.yaml
- kubeStateMetrics-prometheusRule.yaml
- kubernetesControlPlane-prometheusRule.yaml
- nodeExporter-prometheusRule.yaml
- prometheus-prometheusRule.yaml
- prometheusOperator-prometheusRule.yaml
Application and business alerts
Application and business alerts are defined by Sestek per application.
Alerting for customers managing their own Kubernetes clusters
When customers manage their own Kubernetes clusters, Sestek does not deploy the Prometheus stack, because doing so requires cluster-wide access and configuration privileges. In these cases, customers are expected to monitor Knovvu applications using their own Prometheus-based monitoring stack. Please contact Sestek to receive guidelines on defining Prometheus alerts based on Knovvu application metrics within customer-managed clusters.
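As a rough illustration of what a customer-defined alert could look like, the sketch below builds a `PrometheusRule` manifest as a Python dictionary. It assumes the customer's stack runs the Prometheus Operator (which provides the `monitoring.coreos.com/v1` `PrometheusRule` resource). The metric name, alert name, threshold, and namespace are all hypothetical placeholders; the actual metrics and recommended thresholds come from the Sestek guidelines.

```python
import json


def build_alert_rule(metric: str, threshold: float, namespace: str = "monitoring") -> dict:
    """Build a PrometheusRule manifest as a plain dict.

    `metric` is a placeholder counter name, not a real Knovvu metric.
    """
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "PrometheusRule",
        "metadata": {"name": "knovvu-example-rules", "namespace": namespace},
        "spec": {
            "groups": [
                {
                    "name": "knovvu-example",
                    "rules": [
                        {
                            # Fires when the per-second rate over 5 minutes
                            # stays above the threshold for 10 minutes.
                            "alert": "ExampleHighErrorRate",
                            "expr": f"sum(rate({metric}[5m])) > {threshold}",
                            "for": "10m",
                            "labels": {"severity": "warning"},
                            "annotations": {
                                "summary": "Error rate above threshold (example rule)"
                            },
                        }
                    ],
                }
            ]
        },
    }


if __name__ == "__main__":
    # JSON is valid YAML, so this output can be applied with kubectl as-is.
    print(json.dumps(build_alert_rule("example_request_errors_total", 0.5), indent=2))
```

Stacks that run plain Prometheus without the operator can use the content of `spec` (the `groups` list) directly as a rule file.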
Dashboards
Infrastructure dashboards:
For infrastructure dashboards, the dashboards provided by the Kube Prometheus Stack are used. They can be found in the manifest files at the following link.
Application and business dashboards:
Application and business dashboards are defined by Sestek per application.
A sample dashboard displaying Knovvu Virtual Agent Orchestrator API requests in Grafana can be found below:
Notifications
Knovvu applications use Opsgenie as the alert management platform to ensure timely incident response. Alerts generated by Prometheus are routed to Opsgenie, where they are prioritized and escalated, and the appropriate teams are notified for quick resolution.
When Kubernetes is managed by Sestek, email notifications are configured to be sent to a Sestek email address, which relays these alerts into Opsgenie.
Notifications for customers managing their own Kubernetes clusters
For customers managing their own Kubernetes clusters, please contact Sestek to receive guidelines on integrating notifications with the Sestek Opsgenie system.
Traces
Architecture
Knovvu applications use OpenTelemetry for tracing. By default, traces are sent directly to Elastic APM. However, if customers require integration with different tracing systems, Knovvu applications can be configured to send traces to an OpenTelemetry Collector, enabling compatibility with other observability platforms.
Tracing Data
Tracing in Knovvu applications provides insights into application performance, dependencies, and errors. Here are the key types of information that can be received through tracing:
- Request Flow & Latency Tracking
- Service Dependencies
- Performance Metrics
- Distributed Tracing
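Request flow and distributed tracing rely on every service passing the same trace identifier downstream. The sketch below shows how that works with the W3C Trace Context `traceparent` header, which is the default propagation format in OpenTelemetry. It is an illustration only: it assumes default propagation is in use, handles only version `00` headers, and the function and class names are hypothetical. In practice the OpenTelemetry SDK does this automatically.

```python
import re
import secrets
from typing import NamedTuple, Optional

# Version 00 layout: 00-<32 hex trace-id>-<16 hex parent span-id>-<2 hex flags>
_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


class SpanContext(NamedTuple):
    trace_id: str
    span_id: str
    sampled: bool


def parse_traceparent(header: str) -> Optional[SpanContext]:
    """Parse an incoming traceparent header; return None if it is invalid."""
    match = _TRACEPARENT.match(header.strip())
    if match is None:
        return None
    trace_id, span_id, flags = match.groups()
    # All-zero identifiers are defined as invalid by the specification.
    if set(trace_id) == {"0"} or set(span_id) == {"0"}:
        return None
    return SpanContext(trace_id, span_id, bool(int(flags, 16) & 0x01))


def child_traceparent(parent: SpanContext) -> str:
    """Build the header for an outgoing call: same trace ID, new span ID."""
    span_id = secrets.token_hex(8)  # 8 random bytes -> 16 hex characters
    flags = "01" if parent.sampled else "00"
    return f"00-{parent.trace_id}-{span_id}-{flags}"


if __name__ == "__main__":
    incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
    ctx = parse_traceparent(incoming)
    if ctx is not None:
        print(child_traceparent(ctx))
```

Because the trace ID is preserved across hops while each hop gets its own span ID, the tracing backend can reassemble the spans into a single request timeline and derive the service dependency graph.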
A sample dashboard displaying Knovvu Virtual Agent Orchestrator API requests can be found below:
Logs
Architecture
Knovvu applications use Elasticsearch as the central logging system. The logging architecture consists of two main approaches depending on the application type. Some applications, like .NET applications, write logs directly to Elasticsearch. Other applications, like core C++ applications, send logs to Logstash, which processes and transforms the logs before forwarding them to Elasticsearch.
Log Data
All Knovvu applications follow a structured logging approach to ensure consistency and ease of analysis. Each log entry typically includes:
- Timestamp
- Log Level (e.g., INFO, WARN, ERROR)
- Service Name (the application generating the log)
- Message (descriptive log message)
- Pod Information
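The fields above can be sketched as a JSON log formatter built on Python's standard `logging` module. This is only an illustration of structured logging: the JSON key names, the service name, and the use of the `HOSTNAME` environment variable for the pod name are assumptions, not the actual Knovvu log schema.

```python
import json
import logging
import os
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Format each log record as one JSON object per line."""

    def __init__(self, service_name: str, pod_name: str) -> None:
        super().__init__()
        self.service_name = service_name
        self.pod_name = pod_name

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            # UTC timestamp in ISO 8601 format.
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            # Python reports WARNING rather than WARN.
            "level": record.levelname,
            "service": self.service_name,
            "message": record.getMessage(),
            "pod": self.pod_name,
        }
        return json.dumps(entry)


if __name__ == "__main__":
    # In Kubernetes, HOSTNAME defaults to the pod name.
    pod = os.environ.get("HOSTNAME", "unknown-pod")
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter("example-service", pod))
    logger = logging.getLogger("example")
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.warning("upstream call took %d ms", 1200)
```

Emitting one JSON object per line keeps every field individually searchable once the entry is indexed in Elasticsearch.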
Retention
The Knovvu platform defines log retention based on both age and storage size. Both values are configurable; the defaults are 7 days and 250 GB.
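One way to picture a combined age and storage policy is sketched below. How the platform actually combines the two limits is not specified here, so this example assumes that data older than the age limit is removed first and that the oldest remaining data is then removed until the total fits within the storage limit. The `LogIndex` type and the index names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List


@dataclass(frozen=True)
class LogIndex:
    name: str
    day: date
    size_gb: float


def indices_to_delete(
    indices: List[LogIndex],
    today: date,
    max_age_days: int = 7,
    max_total_gb: float = 250.0,
) -> List[str]:
    """Return the names of indices to delete, oldest first."""
    ordered = sorted(indices, key=lambda i: i.day)
    cutoff = today - timedelta(days=max_age_days)

    # Step 1: everything older than the age limit goes.
    doomed = [i for i in ordered if i.day < cutoff]
    kept = [i for i in ordered if i.day >= cutoff]

    # Step 2: drop the oldest remaining indices until the total size fits.
    total = sum(i.size_gb for i in kept)
    while kept and total > max_total_gb:
        oldest = kept.pop(0)
        total -= oldest.size_gb
        doomed.append(oldest)

    return [i.name for i in doomed]


if __name__ == "__main__":
    sample = [
        LogIndex("logs-2024.01.01", date(2024, 1, 1), 10.0),
        LogIndex("logs-2024.01.08", date(2024, 1, 8), 200.0),
        LogIndex("logs-2024.01.09", date(2024, 1, 9), 100.0),
    ]
    print(indices_to_delete(sample, today=date(2024, 1, 10)))
```

In the sample data, the first index exceeds the 7-day limit, and the second is removed because the two remaining indices together exceed 250 GB.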