No matter how well run an application is, it will eventually have problems. The causes vary: code or infrastructure changes, bugs in your application, unexpected input from users, backhoe incidents, even cosmic rays. Sooner or later, something will go wrong.

A common misunderstanding of “incident response” is that it’s really about “outage response” — dealing with times when your application becomes unavailable.

A robust incident response tool and the right mindset to eliminate incidents are two key factors in operating a well-designed application.
