To avoid the high cost of downtime, mission-critical applications in the cloud need to be resilient to degradation of cloud provider APIs and services. In 2021, AWS launched AWS Fault Injection Simulator (FIS) (https://docs.aws.amazon.com/fis/latest/userguide/what-is.html), a fully managed service for running fault injection experiments on workloads in AWS to improve their reliability (https://docs.aws.amazon.com/wellarchitected/latest/reliability-pillar/welcome.html) and resilience.

To inject failures, we use AWS Systems Manager (AWS SSM) (https://docs.aws.amazon.com/systems-manager/latest/userguide/what-is-systems-manager.html) Automation (https://docs.aws.amazon.com/systems-manager/latest/userguide/systems-manager-automation.html) to attach and detach IAM policies at the API fault boundary, and FIS to orchestrate the workflow.

This shows the SSM document ARN to be used for fault injection and the JSON parameters passed to the SSM document specifying the IAM Role to modify and the IAM Policy to use.

It shows the SSM execution role permitting access to use SSM automation documents as well as modify IAM roles and policies via the SSM document.
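
Because the SSM execution role can modify IAM roles and policies, its IAM permissions deserve tight scoping. The following is an added example, not code from the original post: it builds the IAM part of such a role's policy (the SSM automation permissions are left out) pinned to one role and one fault policy, and lints a policy for broader grants. The ARNs, Sid and function names are constructed placeholders.

```python
import json
from fnmatch import fnmatchcase

# Constructed placeholder ARNs (assumptions, not values from the original page).
ROLE_ARN = "arn:aws:iam::111122223333:role/ExampleWorkloadRole"
FAULT_POLICY_ARN = "arn:aws:iam::111122223333:policy/ExampleApiFaultDenyPolicy"
IAM_WRITE = ("iam:attachrolepolicy", "iam:detachrolepolicy")


def as_list(value):
    return value if isinstance(value, list) else [value]


def scoped_execution_policy(role_arn, policy_arn):
    """Allow attaching/detaching exactly one managed policy on exactly one role."""
    return {"Version": "2012-10-17", "Statement": [{
        "Sid": "FaultPolicyOnly",
        "Effect": "Allow",
        "Action": ["iam:AttachRolePolicy", "iam:DetachRolePolicy"],
        "Resource": role_arn,
        "Condition": {"ArnEquals": {"iam:PolicyARN": policy_arn}},
    }]}


def lint_execution_policy(policy):
    """Return findings for Allow statements that can attach/detach more than intended."""
    findings = []
    for stmt in as_list(policy.get("Statement", [])):
        actions = [a.lower() for a in as_list(stmt.get("Action", []))]
        # Action wildcards such as "iam:*" also grant the attach/detach calls.
        grants = any(fnmatchcase(w, a) for w in IAM_WRITE for a in actions)
        if stmt.get("Effect") != "Allow" or not grants:
            continue
        sid = stmt.get("Sid", "<no Sid>")
        if any("*" in r for r in as_list(stmt.get("Resource", "*"))):
            findings.append(f"{sid}: Resource is not pinned to one role ARN")
        # ArnEquals accepts wildcards too, so the value itself must be wildcard-free.
        pinned = [v for op, keys in stmt.get("Condition", {}).items()
                  if op in ("ArnEquals", "StringEquals")
                  for k, vals in keys.items() if k.lower() == "iam:policyarn"
                  for v in as_list(vals)]
        if not pinned or any("*" in v for v in pinned):
            findings.append(f"{sid}: iam:PolicyARN is not pinned to one policy ARN")
    return findings


if __name__ == "__main__":
    print(json.dumps(scoped_execution_policy(ROLE_ARN, FAULT_POLICY_ARN), indent=2))
```
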
