
Here’s a list of the ways that SREs from companies like Dropbox, Amazon, and Netflix have prepared for peak traffic this holiday season. Reviewing past incidents is a powerful way to understand how your system has failed before, and it offers a lot of insight into how the system actually behaves in production. Armed with this insight, you’ll be more confident when an outage occurs.

Now that we’re working from home, it’s important to run a dress rehearsal so that we find gaps in our process before we end up troubleshooting an incident from the living room in the middle of Thanksgiving.

This can be for a number of reasons: inadequate tooling, hesitance to test in production, or perhaps even laziness.
