Distributed tracing with Envoy, Kuma, Grafana Agent, and Jaeger
As a cloud service provider, we treat observability as a critical subject because it is closely tied to the availability of the services running on our platform. We need to understand everything that is happening on the platform so that we can troubleshoot errors as quickly as possible and fix performance issues. A year ago, while the platform was still in private beta, we ran into a tough reliability issue: users were getting random 500 errors when accessing their applications.