Distributed Logging & Monitoring Pipeline with Loki, Prometheus, Grafana and Automated Alerting on a Slack Channel

What Is a Distributed Logging and Monitoring System?

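A distributed logging system is a centralized approach to collecting, aggregating, and analysing log data from multiple services, applications, and infrastructure components running across different servers or containers in a distributed environment. A monitoring system complements it by collecting numeric metrics over time (for example CPU usage, request rates and latencies) and raising alerts when they cross defined thresholds. The key characteristics of such a system are: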

Centralized collection: Gathering logs from various sources into a single location
Unified view: Providing a consolidated perspective of system behaviour across all components
Scalability: Handling massive volumes of log data generated by distributed systems
Real-time monitoring: Enabling quick detection and diagnosis of issues across the entire infrastructure
Search and analysis: Offering powerful tools to query and analyse logs for troubleshooting and performance optimization

Key components typically include log collectors (agents running on each node), a central storage system, and visualization/query interfaces for log analysis.
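To make the path from log collectors to central storage concrete, here is a minimal, stdlib-only Python sketch of what a collector does before shipping lines to Loki: it formats each record as one JSON line and groups the lines into labelled streams, in the JSON shape accepted by Loki's HTTP push endpoint (`/loki/api/v1/push`). The label names and values (`app`, `job`, `level`) and the function names are hypothetical, and the source does not say which agent ships logs in this project; in practice this job is usually done by an agent such as Promtail or Docker's Loki logging driver rather than hand-written code.

```python
import json
import logging

# Hypothetical static labels; the real label set depends on the collector config.
DEFAULT_LABELS = {"app": "nexcell-api", "job": "fastapi"}


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON line so it stays searchable."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })


def build_loki_push_payload(records, labels=None):
    """Group log records into Loki streams keyed by their label set (hypothetical helper)."""
    labels = DEFAULT_LABELS if labels is None else labels
    formatter = JsonFormatter()
    streams = {}
    for record in records:
        # One stream per unique label set; the log level becomes a label.
        stream_labels = dict(labels, level=record.levelname.lower())
        key = tuple(sorted(stream_labels.items()))
        stream = streams.setdefault(key, {"stream": stream_labels, "values": []})
        # Loki expects [timestamp in nanoseconds as a string, log line] pairs.
        ts_ns = str(int(record.created * 1_000_000_000))
        stream["values"].append([ts_ns, formatter.format(record)])
    return {"streams": list(streams.values())}


# Usage: build a payload from a hand-made record (no network call is made here).
example = logging.makeLogRecord(
    {"name": "api", "levelname": "ERROR", "msg": "db timeout", "created": 1700000000.0}
)
body = json.dumps(build_loki_push_payload([example]))
```

Only a few low-cardinality values (service, level) are used as labels, while the message text stays in the log line itself; Loki indexes labels rather than line contents, so putting free-form text into labels would multiply the number of streams.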

Project Overview

Nexcell Monitoring is a distributed logging and monitoring system built around a FastAPI application, Loki, Prometheus and Grafana, all deployed with Docker Compose. It collects metrics, aggregates logs, visualises data in real time, and sends an alert to Slack when a threshold is breached and a follow-up notification when the condition resolves.
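The "alert when breached, notify again when resolved" behaviour comes down to reporting only state transitions, not every evaluation. A minimal Python sketch, assuming a single metric compared against a fixed threshold and a Slack incoming webhook as the delivery channel; the webhook URL is a placeholder, and `ThresholdAlert`, `slack_message` and `build_request` are hypothetical names:

```python
import json
import urllib.request
from dataclasses import dataclass

# Placeholder only: a real Slack incoming-webhook URL is issued per channel.
SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/XXX/YYY/ZZZ"


@dataclass
class ThresholdAlert:
    """Tracks one metric against a threshold and reports state changes only."""
    name: str
    threshold: float
    firing: bool = False

    def evaluate(self, value: float):
        """Return 'firing' or 'resolved' on a transition, otherwise None."""
        breached = value > self.threshold
        if breached and not self.firing:
            self.firing = True
            return "firing"
        if not breached and self.firing:
            self.firing = False
            return "resolved"
        return None


def slack_message(alert: ThresholdAlert, status: str, value: float) -> dict:
    """Build the JSON body for a Slack incoming webhook (a simple 'text' message)."""
    return {
        "text": f"[{status.upper()}] {alert.name}: "
                f"value={value:.2f}, threshold={alert.threshold:.2f}"
    }


def build_request(payload: dict, url: str = SLACK_WEBHOOK_URL) -> urllib.request.Request:
    """Prepare, but do not send, the POST request for the webhook."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Usage: only the breach (85.0) and the recovery (60.0) produce notifications.
cpu_alert = ThresholdAlert("High CPU usage", threshold=80.0)
for sample in [40.0, 85.0, 90.0, 60.0]:
    status = cpu_alert.evaluate(sample)
    if status:
        request = build_request(slack_message(cpu_alert, status, sample))
        # urllib.request.urlopen(request)  # enable only with a real webhook URL
```

In the deployed stack this logic would normally live in Prometheus Alertmanager or Grafana alerting rather than in application code; the source does not say which one sends the Slack messages. Real alert rules also usually require the condition to hold for some duration before firing, which this sketch omits.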

Expected Outputs

Centralized logging across the Docker containers
Real-time metrics collection via Prometheus
Log storage and querying via Loki
A Grafana performance monitoring dashboard with six panels: CPU usage, memory usage, request throughput, P95 latency (response time), failure rate, and error logs
Slack channel integration that alerts on critical issues and notifies again when they are resolved
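The P95 latency and failure-rate panels show derived values rather than raw samples. Assuming the FastAPI app exports request durations as a Prometheus histogram (the source does not name the metric; `http_request_duration_seconds` is a hypothetical choice), a P95 panel would typically run a query such as `histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))`. The sketch below reproduces the core of that calculation in plain Python: find the bucket containing the 95th-percentile rank and interpolate linearly inside it. It is a simplified version of PromQL's logic (it assumes sorted, non-negative bucket bounds ending in `+Inf`), and the failure-rate helper assumes "failed" means, for example, HTTP 5xx responses counted over the same time window.

```python
import math


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    Simplified PromQL-style estimate: bounds must be sorted ascending,
    non-negative, and the last bound must be +Inf.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = buckets[-1][1]  # the +Inf bucket holds the total observation count
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Rank falls in the +Inf bucket: report the largest finite bound.
                return prev_bound
            # Linear interpolation inside the bucket that contains the rank.
            in_bucket = count - prev_count
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / in_bucket
        prev_bound, prev_count = bound, count
    return prev_bound


def failure_rate(total_requests: float, failed_requests: float) -> float:
    """Fraction of requests that failed in the window (0.0 when there was no traffic)."""
    return failed_requests / total_requests if total_requests else 0.0


# Usage: cumulative latency buckets in seconds, 100 requests in total.
latency_buckets = [(0.1, 50), (0.25, 80), (0.5, 95), (1.0, 100), (math.inf, 100)]
p95_seconds = histogram_quantile(0.95, latency_buckets)
error_share = failure_rate(total_requests=1200, failed_requests=30)
```

Because the estimate interpolates within a bucket, its accuracy depends on the bucket boundaries: a P95 that lands in a wide bucket is only approximate, so the bounds should bracket the latency targets the dashboard is meant to track.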

Architecture Diagram

Nexcell Monitoring.png