Category: Software

LightStep sponsored this post, in anticipation of Chaos Conf, Oct. 6-8, 2020. By offering an opportunity to practice handling failures in a safe way, chaos helps developers build more robust code as well as gain confidence in their ability to respond to failures. Distributed tracing can support this by helping to find root causes and other contributing factors: traces encode causality within a distributed system, so that you can use them to trace failures as they propagate across services.

Like other kinds of testing, you can think about test coverage as a way to measure progress in your chaos exercises.

That way you can ensure that you are getting good coverage — and not just hope that attacks are targeting both new and old versions — without increasing the number of faults.

Related Articles