While development and QA staff have primary responsibility for testing, site reliability engineers (SREs) are typically responsible for the overall system and its availability. Unit tests are usually created and run in silos within individual teams, but integration, end-to-end and system tests must run in a fuller environment, one that brings together the work of multiple teams, multiple technologies and a more realistic deployment setup.
Because the SRE team is tasked with creating and implementing tools that increase site reliability and performance, it has a role to play in enabling testing against the full distributed system.
Both developers and QA engineers actively create and maintain tests, and building them for distributed systems is notoriously difficult. Trace-based testing is the “easy button,” as it relies on work your organization has already done by implementing distributed tracing.
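To make the idea concrete, here is a minimal sketch of what a trace-based test can look like: trigger a request, collect the spans the tracing instrumentation already emits, and assert on them. Everything in it is an assumption for illustration, including the `Span` shape, the service names (`api-gateway`, `payments`, `orders-db`), the 100 ms budget and the hard-coded sample trace. A real setup would fetch the trace from its tracing backend rather than build it by hand.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    """One unit of work recorded by a tracer (hypothetical, simplified shape)."""
    name: str
    service: str
    duration_ms: float
    attributes: dict = field(default_factory=dict)


def check_checkout_trace(trace, max_db_ms=100.0):
    """Return a list of failure messages; an empty list means the trace passed."""
    failures = []

    # Assertion 1: the request must have reached the database service.
    db_spans = [s for s in trace if s.service == "orders-db"]
    if not db_spans:
        failures.append("no span recorded for orders-db")

    # Assertion 2: every database span must finish within the latency budget.
    for span in db_spans:
        if span.duration_ms > max_db_ms:
            failures.append(
                f"{span.name} took {span.duration_ms} ms (budget {max_db_ms} ms)"
            )

    # Assertion 3: every span from the payments service must report HTTP 200.
    for span in trace:
        if span.service == "payments":
            status = span.attributes.get("http.status_code")
            if status != 200:
                failures.append(f"{span.name} returned {status}")

    return failures


# Hypothetical trace, standing in for one fetched from a tracing backend
# after sending a checkout request through the full system.
sample_trace = [
    Span("POST /checkout", "api-gateway", 180.0, {"http.status_code": 200}),
    Span("POST /charge", "payments", 90.0, {"http.status_code": 200}),
    Span("INSERT orders", "orders-db", 12.5),
]

print(check_checkout_trace(sample_trace))
```

The point of the sketch is that the test contains no mocks and no service-specific hooks. It inspects only data the tracing instrumentation already produces, which is why one test can cover behavior that spans several teams' services.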