incident_management · saas · workflow

Canva auto-generates Post Incident Review summaries with GPT-4-chat

Canva's reliability engineers wrote Post Incident Review (PIR) summaries by hand after every incident. Over time the summaries became inconsistent, and reviewers often lacked the context needed to review them quickly and effectively, which created ongoing toil for the engineering team.

How it works

Common implementation structure

How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages appear in “Tools commonly seen” below.

Stage 1 · Fetch PIR from Confluence

The workflow starts by fetching the incident report from Confluence and parsing the HTML to extract the PIR content as raw text.
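
The parsing half of this stage can be sketched with Python's standard library alone. This is a minimal illustration, not Canva's implementation: the function name pir_html_to_text, the tag sets, and the sample HTML are all hypothetical, and the sketch assumes the page body has already been fetched from Confluence as an HTML string.

```python
from html.parser import HTMLParser

# Assumption: these tag sets are illustrative choices, not taken from the source.
SKIP_TAGS = {"script", "style"}  # content inside these tags is dropped
BLOCK_TAGS = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}


class _TextExtractor(HTMLParser):
    """Collects visible text and inserts line breaks around block-level tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag in BLOCK_TAGS:
            self._chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag in BLOCK_TAGS:
            self._chunks.append("\n")

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped tag.
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        raw = "".join(self._chunks)
        # Collapse runs of whitespace within each line and drop empty lines.
        lines = (" ".join(line.split()) for line in raw.splitlines())
        return "\n".join(line for line in lines if line)


def pir_html_to_text(html):
    """Hypothetical helper: convert a PIR page's HTML body to raw text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


if __name__ == "__main__":
    sample = "<h1>PIR: Login outage</h1><p>Impact lasted <b>42</b> minutes.</p>"
    print(pir_html_to_text(sample))
```

The resulting raw text is what a later stage would pass to the model as input. Keeping one line per block element preserves the report's rough structure without carrying any markup into the prompt.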

Tools used

GPT-4-chat, Confluence, Jira, data warehouse

Outcome

After approximately two months in production, most AI-generated PIR summaries remain unaltered by engineers, which suggests the team accepts the quality of GPT-4's output. The process has significantly improved the efficiency and consistency of PIR summarization and reduced operational toil for reliability engineers.

What failed first

A fine-tuned GPT model was evaluated as a candidate approach but discarded. The available training examples were insufficient to produce summaries that captured the specific details needed, and manual comparison showed the fine-tuned model underperformed both GPT completion and GPT chat at accurately determining impact duration and correlating incident phases.

Results

Volume: $0.06

Cost replaced: $0.6

Running since: approximately two months before article publication

Source

https://www.canva.dev/blog/engineering/summarise-post-incident-reviews-with-gpt4/

Grounding & classification

Source type: technical build writeup

28 fields verified against source quotes, 1 dropped as unverifiable.

document ai, summarization, knowledge base, failure mode described, human review described, metric backed, named customer, production runtime claimed, tools described, workflow described, software, accuracy improvement, employee productivity, time saved, technical build writeup, back office ops, incident management, case to summary, document to record