Distributed Tracing System
Lightweight distributed tracing for microservices with span aggregation and visualization.
Built a distributed tracing system for following requests as they flow across microservices. Trace context is propagated in request headers, so spans emitted by different services can be correlated into a single trace. Spans are aggregated with minimal overhead and stored for later analysis. Includes a UI for viewing traces, analyzing latency, and identifying errors in complex distributed systems.
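Header-based context propagation can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the header names `x-trace-id` and `x-parent-span-id`, the `SpanContext` type, and the `extract`/`inject` helpers are all hypothetical. A production system might instead use the standardized W3C Trace Context `traceparent` header.

```python
import secrets
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical header names, chosen for illustration only.
TRACE_HEADER = "x-trace-id"
PARENT_HEADER = "x-parent-span-id"


@dataclass(frozen=True)
class SpanContext:
    trace_id: str                         # shared by every span in one request
    span_id: str                          # unique to this unit of work
    parent_span_id: Optional[str] = None  # span of the calling service, if any


def extract(headers: Dict[str, str]) -> SpanContext:
    """Continue the caller's trace if headers are present, else start a new one."""
    normalized = {key.lower(): value for key, value in headers.items()}
    trace_id = normalized.get(TRACE_HEADER) or secrets.token_hex(16)
    return SpanContext(
        trace_id=trace_id,
        span_id=secrets.token_hex(8),
        parent_span_id=normalized.get(PARENT_HEADER),
    )


def inject(ctx: SpanContext, headers: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the outgoing headers carrying the current context."""
    outgoing = dict(headers)
    outgoing[TRACE_HEADER] = ctx.trace_id
    outgoing[PARENT_HEADER] = ctx.span_id  # the callee's parent is this span
    return outgoing
```

A service would call `extract` on each incoming request and `inject` on each outgoing call, so every downstream span carries the same trace ID and points back to the span that caused it.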

The system uses probabilistic sampling to reduce storage costs while keeping slow or failing requests visible. Each span records metadata about the services, databases, and external APIs it calls. A query interface filters traces by latency, error status, or service, which makes bottlenecks easier to identify.
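One way to combine probabilistic sampling with guaranteed visibility into slow or failing requests is to decide once a trace has finished. The sketch below assumes exactly that. The `keep_trace` function, its parameters, and the 500 ms default threshold are hypothetical, and the original system may make its sampling decision differently.

```python
import hashlib


def keep_trace(
    trace_id: str,
    sample_rate: float,
    duration_ms: float,
    has_error: bool,
    slow_threshold_ms: float = 500.0,  # assumed cutoff for a "slow" request
) -> bool:
    """Decide whether a finished trace should be stored."""
    # Failing or slow traces are always kept, regardless of the sample rate.
    if has_error or duration_ms >= slow_threshold_ms:
        return True

    # Hash the trace ID to a 64-bit bucket so that every service reaches the
    # same decision for the same trace without any coordination.
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")

    # Keep roughly `sample_rate` of the remaining traces.
    return bucket < sample_rate * 2**64
```

Hashing the trace ID rather than drawing a random number keeps the decision deterministic, so a trace is either stored whole or dropped whole rather than leaving partial traces behind.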