Incident Response Framework
AuthorityRail's operational doctrine runs in four stages: Detection → Classification → Communication → Recovery. The SEV ladder used internally is published here so customers and auditors can check the response actions they observe against it.
Last revised: 2026-05-17 · Version v1.0
Severity ladder
| SEV | Definition | Customer notification SLA | RCA SLA |
|---|---|---|---|
| SEV-1 | Core execution authority surface broken or compromised. Mass DENY, signing failure, security incident with customer data exposure. | < 1 hour | 5 business days |
| SEV-2 | Single non-gate service down with degraded impact, or sustained gate performance degradation. Partial outage. | < 4 hours | 10 business days |
| SEV-3 | Single-customer impact, minor degradation, non-customer-facing internal issue. | < 24 hours (direct) | Internal log; published only if customer requests |
| SEV-4 | Cosmetic, minor, no customer impact. | Monthly summary if relevant | None |
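
For readers who want to check observed response times against the ladder, the table can be encoded as data. The following is a minimal sketch, not AuthorityRail tooling: the names `SLA`, `LADDER`, `notification_deadline`, and `rca_due_date` are hypothetical, and it assumes that business days mean Monday through Friday with no holiday calendar and that counting starts the day after the incident.

```python
from dataclasses import dataclass
from datetime import date, datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class SLA:
    notify_within: Optional[timedelta]  # None: no per-incident notification deadline
    rca_business_days: Optional[int]    # None: no RCA publication deadline


# Values copied from the severity ladder table.
LADDER = {
    "SEV-1": SLA(timedelta(hours=1), 5),
    "SEV-2": SLA(timedelta(hours=4), 10),
    "SEV-3": SLA(timedelta(hours=24), None),  # RCA is an internal log by default
    "SEV-4": SLA(None, None),                 # monthly summary only, if relevant
}


def notification_deadline(sev: str, detected_at: datetime) -> Optional[datetime]:
    """Upper bound on when customers should have been notified, if any."""
    window = LADDER[sev].notify_within
    return detected_at + window if window is not None else None


def rca_due_date(sev: str, incident_day: date) -> Optional[date]:
    """Date the RCA is due, counting Monday-Friday only (assumption: no holidays)."""
    remaining = LADDER[sev].rca_business_days
    if remaining is None:
        return None
    day = incident_day
    while remaining > 0:
        day += timedelta(days=1)
        if day.weekday() < 5:  # 0-4 are Monday-Friday
            remaining -= 1
    return day
```

Under these assumptions, a SEV-1 incident on Monday 2026-05-18 gives `rca_due_date("SEV-1", date(2026, 5, 18)) == date(2026, 5, 25)`, the following Monday.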
How customers are notified
- status.authorityrail.com — public status page; the first place to check for current state.
- Targeted email from [email protected] for incidents affecting specific tenants.
- Post-incident RCA published at trust.authorityrail.com/incident-reports/<YYYY-MM-DD>-<short-tag>.md within the SLA above.
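
The RCA location follows a fixed `<YYYY-MM-DD>-<short-tag>.md` pattern, so a report path can be derived from an incident date and tag. This is a minimal sketch: the function name `rca_report_path` is hypothetical, and the rule that tags are lowercase alphanumeric words joined by hyphens is an assumption, not something this page specifies.

```python
import re
from datetime import date

BASE = "trust.authorityrail.com/incident-reports"  # location stated on this page


def rca_report_path(incident_day: date, short_tag: str) -> str:
    """Build <BASE>/<YYYY-MM-DD>-<short-tag>.md for one incident."""
    # Assumption: tags are lowercase alphanumeric words joined by hyphens.
    if not re.fullmatch(r"[a-z0-9]+(-[a-z0-9]+)*", short_tag):
        raise ValueError(f"unexpected short tag: {short_tag!r}")
    return f"{BASE}/{incident_day.isoformat()}-{short_tag}.md"
```

For example, `rca_report_path(date(2026, 5, 17), "signer-failure")` returns `trust.authorityrail.com/incident-reports/2026-05-17-signer-failure.md`. The tag and date here are illustrative only and do not refer to a real report.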
Detection layer
- Sentry application monitoring (per Sprint Closure #2).
- Better Stack uptime monitoring on 8 customer-facing surfaces (per Sprint Closure #8).
- PagerDuty on-call routing that re-pages continually until acknowledged for SEV-1 (per Sprint Closure #9).
- Customer reports via [email protected] or [email protected].
Recovery layer — runbooks
Five critical runbooks are maintained against the highest-impact failure modes:
- Signer failure (SEV-1)
- Supabase outage (SEV-1 full / SEV-2 degraded)
- Railway outage (SEV-1 / SEV-2)
- Deployment rollback (SEV varies)
- Customer key compromise (SEV-2 single / SEV-1 systemic)
Disaster recovery
Supabase Point-In-Time Recovery (PITR) is targeted for activation per Sprint Closure #5 (Pro tier upgrade pending). The restore drill is documented in docs/operations/disaster-recovery/supabase-pitr-restore-drill-2026-05-17.md.
Until PITR is activated, recovery relies on Supabase managed backups on the AR-managed project. Multi-region active-active deployment is sequenced for Q1 2027 per the operational maturity gap analysis.
Postmortem doctrine
Every postmortem is blameless. On-call is currently a single person (today: Sammy Jones, founder), so the goal is to make the system harder to break next time, not to assign fault. Postmortems use the canonical template at docs/operations/incident-response/rca-template.md.
Status page state mapping
- SEV-1 → Major outage banner; affected component red.
- SEV-2 → Partial outage banner; affected component amber.
- SEV-3 (multi-customer) → Degraded performance; affected component amber.
- SEV-4 → no status page state change.
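
The mapping above can be written as a small lookup. This is a sketch only: `PageState` and `status_page_state` are hypothetical names, and treating a single-customer SEV-3 as "no status page change" is an assumption inferred from the list covering only the multi-customer case.

```python
from typing import NamedTuple, Optional


class PageState(NamedTuple):
    banner: str
    component_color: str


def status_page_state(sev: str, multi_customer: bool = False) -> Optional[PageState]:
    """Map a severity to the public status page state; None means no change."""
    if sev == "SEV-1":
        return PageState("Major outage", "red")
    if sev == "SEV-2":
        return PageState("Partial outage", "amber")
    if sev == "SEV-3" and multi_customer:
        return PageState("Degraded performance", "amber")
    if sev in ("SEV-3", "SEV-4"):
        # Assumption: single-customer SEV-3 is handled by direct notification only.
        return None
    raise ValueError(f"unknown severity: {sev!r}")
```

For instance, `status_page_state("SEV-3", multi_customer=True)` yields `PageState("Degraded performance", "amber")`, while `status_page_state("SEV-4")` yields `None`.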
Related documents
The internal counterpart of this page lives under docs/operations/incident-response/ in the AuthorityRail repository. The two are kept in sync; this page is the canonical public version.