What is a Distributed Logging and Monitoring System?
A distributed logging system is a centralized approach to collecting, aggregating, and analysing log data from multiple services, applications, and infrastructure components running across different servers or containers in a distributed environment. It typically provides:
- Centralized collection: Gathering logs from various sources into a single location
- Unified view: Providing a consolidated perspective of system behaviour across all components
- Scalability: Handling massive volumes of log data generated by distributed systems
- Real-time monitoring: Enabling quick detection and diagnosis of issues across the entire infrastructure
- Search and analysis: Offering powerful tools to query and analyse logs for troubleshooting and performance optimization
Key components typically include log collectors (agents running on each node), a central storage system, and visualization/query interfaces for log analysis.
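The relationship between these three components can be sketched in a few lines of Python. This is only an illustrative in-memory model, not how Loki or any real agent works: `LogRecord`, `CentralStore`, and `Collector` are hypothetical names invented for the example, and a real system would ship logs over the network to persistent storage.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class LogRecord:
    """One log line tagged with the node and service that produced it."""
    node: str
    service: str
    level: str
    message: str


class CentralStore:
    """Hypothetical in-memory stand-in for the central storage system."""

    def __init__(self) -> None:
        self._records: list[LogRecord] = []

    def ingest(self, record: LogRecord) -> None:
        self._records.append(record)

    def query(self, *, service: str | None = None,
              level: str | None = None) -> list[LogRecord]:
        """Return matching records in ingestion order (the query interface)."""
        return [
            r for r in self._records
            if (service is None or r.service == service)
            and (level is None or r.level == level)
        ]


class Collector:
    """Hypothetical agent that runs on one node and forwards its logs."""

    def __init__(self, node: str, store: CentralStore) -> None:
        self.node = node
        self.store = store

    def emit(self, service: str, level: str, message: str) -> None:
        self.store.ingest(LogRecord(self.node, service, level, message))


# Two nodes forward logs to the same store, which gives a unified view.
store = CentralStore()
node_a = Collector("node-a", store)
node_b = Collector("node-b", store)
node_a.emit("api", "ERROR", "upstream timeout")
node_b.emit("api", "INFO", "request served")
node_b.emit("worker", "ERROR", "job failed")

errors = store.query(level="ERROR")
api_logs = store.query(service="api")
```

Querying by `level` returns errors from both nodes in a single call, which is the "unified view" property described above.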
Project Overview
Nexcell Monitoring is a distributed logging and monitoring system built with a FastAPI application, Loki, Prometheus, and Grafana, all deployed via Docker Compose. It collects metrics, aggregates logs, visualises data in real time, and sends alerts to Slack when thresholds are breached and again when the issue is resolved.
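As a rough illustration of the log side of this pipeline, the sketch below shows one way a Python service could emit structured, one-JSON-object-per-line logs using only the standard library. It is an assumption that the application logs in JSON at all; the logger name `nexcell.api` and the helper `build_logger` are hypothetical, and an in-memory buffer stands in for the container's stdout.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        return json.dumps(payload)


def build_logger(name: str, stream) -> logging.Logger:
    """Hypothetical helper: attach a JSON handler that writes to `stream`."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate output via the root logger
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


buffer = io.StringIO()  # stand-in for stdout in this example
log = build_logger("nexcell.api", buffer)
log.info("request served in %d ms", 42)

entry = json.loads(buffer.getvalue())
```

Structured lines like this tend to be easier to filter by field in a log query tool than free-form text, though the choice of fields here is purely illustrative.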
Expected Outputs
- Centralized logging through Docker
- Real-time metrics collection via Prometheus
- Log storage via Loki
- Performance monitoring dashboards with six panels: CPU usage, memory usage, request throughput, P95 latency (response time), failure rate, and error logs
- Slack channel integration that sends alerts for critical issues and notifies when they are resolved
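To make the P95 latency, failure rate, and fire/resolve alerting ideas concrete, here is a minimal sketch under stated assumptions. It computes a nearest-rank 95th percentile from raw samples (Prometheus usually estimates quantiles from histogram buckets instead), treats 5xx responses as failures, and builds a message only when an alert changes state. `ThresholdAlert`, the 200 ms threshold, and the message wording are hypothetical; the `{"text": ...}` shape follows the basic Slack incoming-webhook payload, and nothing is actually sent.

```python
from __future__ import annotations


def p95(latencies_ms: list[float]) -> float:
    """Nearest-rank 95th percentile of raw latency samples."""
    if not latencies_ms:
        raise ValueError("no samples")
    ordered = sorted(latencies_ms)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer math
    return ordered[rank - 1]


def failure_rate(status_codes: list[int]) -> float:
    """Fraction of responses with a 5xx status code."""
    if not status_codes:
        return 0.0
    return sum(code >= 500 for code in status_codes) / len(status_codes)


class ThresholdAlert:
    """Hypothetical alert that reports only on state changes."""

    def __init__(self, name: str, threshold: float) -> None:
        self.name = name
        self.threshold = threshold
        self.firing = False

    def evaluate(self, value: float) -> dict | None:
        """Return a Slack-style payload when the alert fires or resolves."""
        breached = value > self.threshold
        if breached == self.firing:
            return None  # no state change, so stay quiet
        self.firing = breached
        state = "FIRING" if breached else "RESOLVED"
        text = f"[{state}] {self.name}: {value:.2f} (threshold {self.threshold:.2f})"
        return {"text": text}


samples = [float(ms) for ms in range(1, 101)]  # 1 ms .. 100 ms
latency_p95 = p95(samples)
rate = failure_rate([200, 200, 500, 503])

alert = ThresholdAlert("P95 latency (ms)", threshold=200.0)  # assumed threshold
messages = [alert.evaluate(v) for v in (120.0, 250.0, 300.0, 150.0)]
```

Reporting only on transitions means a sustained breach produces one firing message and one resolved message rather than a notification on every evaluation.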
Architectural Diagram
