I would like to share a recent case study from our organization, Logz.io. We battled a very serious performance issue, and the solution turned out to be a small code change with a huge impact on all of the HTTP endpoints in our platform.
Among other topics, I’ll cover:
Some may say we are experts in observability, and mainly in logging; after all, that’s what we do for a living.
We record a custom field, processingTime (the amount of time a backend endpoint takes to process a request), in our access logs and then visualize it over time in Kibana.
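To make the idea concrete, here is a minimal sketch of how a field like processingTime could end up in an access log. It is not our production code: the function name timed_access_log, the emit callback, the JSON layout, the other field names, and the choice of milliseconds are all assumptions made for illustration. Only the processingTime field name comes from our setup.

```python
import json
import time


def timed_access_log(handler, method, path, emit=print):
    """Run handler() and emit one JSON access-log line that includes processingTime.

    Hypothetical helper: `handler` is any zero-argument callable returning
    (status, body); `emit` receives the serialized log line.
    """
    start = time.perf_counter()
    status = 500  # logged as-is if the handler raises before returning
    try:
        status, body = handler()
        return status, body
    finally:
        # Assumed unit: milliseconds, rounded for readability in dashboards.
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        emit(json.dumps({
            "method": method,
            "path": path,
            "status": status,
            "processingTime": round(elapsed_ms, 3),
        }))


if __name__ == "__main__":
    def slow_endpoint():
        time.sleep(0.05)  # stand-in for real endpoint work
        return 200, "ok"

    timed_access_log(slow_endpoint, "GET", "/example")
```

Because the field is a plain number on every log line, a log shipper can index it as a numeric field, which is what makes it possible to chart averages or percentiles per endpoint over time.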
As a result, we were pretty confident that the root cause of all of it had to be in one of our core cache components.
We got a really cool and meaningful view of our system, which helps us better understand our business flows and, more importantly in our case, quickly identify any bottlenecks.