Workflow Deployment

Incident Response

Structured response workflow when a production issue is detected

deploymentincidentproduction
CLAUDE.md

When a production incident is detected:

  1. Acknowledge the incident. Assign an incident lead if the team has a rotation.
  2. Assess severity: who is affected, how many users, is data at risk?
  3. If a recent deploy caused the issue, roll back immediately. Investigate after restoring service.
  4. If the cause is unknown, check: error logs, monitoring dashboards, recent deploys, infrastructure changes, external service status.
  5. Communicate status to stakeholders: what’s happening, who’s affected, ETA for resolution.
  6. Fix the immediate issue. Optimize for restoring service, not for perfect code.
  7. After resolution, write a postmortem: what happened, root cause, timeline, what prevented faster detection, and action items to prevent recurrence.

Copy this workflow into your CLAUDE.md or agent config file so your agent follows this process automatically.

get crystl