I recently joined a new early-stage scale-up, Firefly (https://gofirefly.io), and had to build their site reliability engineering (SRE) strategy (https://thenewstack.io/5-tips-to-improve-your-sre-incident-metrics/) from the ground up. I hope this post will be useful both to first SREs at their companies and to those joining larger teams, and that it gives a better understanding of the SRE role in production operations.
However, for an SRE, it is important to do so without affecting other critical aspects of engineering, such as velocity or stability.
The SRE will introduce “production principles”: guidelines on running code in production and on how the way code is written affects the systems.
Reducing complexity in all of these core areas is what eventually leads to improved performance and stability, while maintaining a healthy error budget.
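To make "a healthy error budget" concrete, here is a minimal sketch of the arithmetic. It assumes the common convention that the error budget is the share of requests allowed to fail under an availability target (1 minus the SLO). The function name and the numbers are hypothetical and only illustrate the idea.

```python
def error_budget_remaining(slo_target: float, total_requests: int, failed_requests: int) -> float:
    """Return the unspent fraction of the error budget for a window.

    Assumption: the budget is (1 - slo_target) * total_requests failed requests.
    A negative result means the budget is overspent.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        return 1.0  # no traffic in the window, so nothing has been spent
    allowed_failures = (1.0 - slo_target) * total_requests
    return 1.0 - failed_requests / allowed_failures


# Hypothetical window: 99.9% target, 1,000,000 requests, 250 failures.
remaining = error_budget_remaining(0.999, 1_000_000, 250)
print(f"Error budget remaining: {remaining:.0%}")
```

With these hypothetical numbers, about 1,000 failures are allowed, so 250 failures leave roughly three quarters of the budget. A team could track this value per window and slow down risky releases as it approaches zero.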