
Jupyter notebooks, an open source tool originally designed for the scientific research community, could be a valuable aid in helping site reliability engineers (SREs) and other operational folk investigate, document and even recreate fixes for site incidents, Zadka said.

“Incident resolution is by nature exploratory,” Zadka said, an activity well supported by the notebook format.

As much of the data science community knows, Project Jupyter provides a web interface for users to create and share documents with live code, equations, visualizations and text.

Another piece of code can then be written to query the AWS API, which can describe all the instances along with the identifying tags that the SRE put in place earlier. Once the EC2 instance is identified, the next step is to connect to it via SSH to confirm that it has a core dump file, which indicates that the service crashed. To finish the job, the core file is moved elsewhere.
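The steps described above could be sketched in notebook cells roughly as follows. This is an illustration under stated assumptions, not the code Zadka showed: the tag key and value, the core file path, the host names and the helper function names are all hypothetical. The response dictionary follows the shape returned by boto3's `describe_instances()`, but a canned sample is used here so the sketch runs without AWS credentials.

```python
import shlex

# Hypothetical tag and path names, chosen only for illustration.
TAG_KEY = "service"
TAG_VALUE = "checkout"
CORE_PATH = "/var/crash/core"


def instances_by_tag(response, key, value):
    """Return (instance id, public DNS name) pairs tagged key=value.

    `response` has the shape returned by boto3's
    describe_instances(): Reservations -> Instances -> Tags.
    """
    matches = []
    for reservation in response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            tags = {t["Key"]: t["Value"] for t in instance.get("Tags", [])}
            if tags.get(key) == value:
                matches.append(
                    (instance["InstanceId"], instance.get("PublicDnsName", ""))
                )
    return matches


def check_core_command(host, core_path):
    """Build an ssh command that lists the core file if it exists."""
    return ["ssh", host, "ls -l " + shlex.quote(core_path)]


def move_core_command(host, core_path, destination):
    """Build an ssh command that moves the core file out of the way."""
    remote = "mv " + shlex.quote(core_path) + " " + shlex.quote(destination)
    return ["ssh", host, remote]


# In a live notebook the response would come from AWS, for example:
#   import boto3
#   response = boto3.client("ec2").describe_instances()
# A canned response keeps this sketch runnable without credentials.
sample_response = {
    "Reservations": [
        {
            "Instances": [
                {
                    "InstanceId": "i-0aaa",
                    "PublicDnsName": "web-1.example.com",
                    "Tags": [{"Key": "service", "Value": "checkout"}],
                },
                {
                    "InstanceId": "i-0bbb",
                    "PublicDnsName": "web-2.example.com",
                    "Tags": [{"Key": "service", "Value": "search"}],
                },
            ]
        }
    ]
}

for instance_id, host in instances_by_tag(sample_response, TAG_KEY, TAG_VALUE):
    print(instance_id, check_core_command(host, CORE_PATH))
```

In a real investigation, each command list would be passed to something like `subprocess.run(cmd, capture_output=True, text=True)`, so that the output lands in the notebook next to the code that produced it.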

Once done, the engineer can export the notebook to HTML, where it can be viewed as a presentation by anyone and attached to the incident ticket.
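One common way to do such an export is the `jupyter nbconvert` command-line tool. The sketch below only builds the command; the notebook file name `incident.ipynb` and the helper name are assumptions made for illustration.

```python
def nbconvert_command(notebook_path, output_format="html"):
    """Build a `jupyter nbconvert` command for the given notebook."""
    return ["jupyter", "nbconvert", "--to", output_format, notebook_path]


# Hypothetical notebook file name.
cmd = nbconvert_command("incident.ipynb")
print(" ".join(cmd))
```

Passing `cmd` to `subprocess.run(cmd, check=True)` would perform the conversion on a machine with Jupyter installed. Using `"slides"` as the output format instead produces a Reveal.js slideshow, which may suit the presentation use case better.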
