08 · Failure Taxonomy
Multi-Chain Failure Taxonomy
Execution, verification, and settlement are separated. Liveness failures look like safety failures in the UI. 'Rollback' is usually economic compensation, not state rewind.
By John Wright-Nyingifa · Product Designer building infrastructure for DeFi, DePIN, and autonomous agents.

Live Signal · March 2026
$3B stolen in 119 hacks by mid-2025 (50%+ increase over 2024). Bybit: $1.5B, largest crypto theft ever (Safe{Wallet} JS injection). 88% of stolen funds from private key compromises. Oracle attacks: 13% of DeFi exploits. Starknet: 9-hour + 4-hour outages. Kroma: permanent shutdown. Only 4.6% of stolen bridge funds voluntarily returned.
Multi-chain systems fail in ways that are hard to describe because execution, verification, and settlement are separated. Liveness failures look like safety failures in the UI. And "rollback" is usually economic compensation, not state rewind.
This page is a taxonomy of failures and the UX responses that keep users safe, informed, and unblocked.
Failure Categories
Ten categories of multi-chain failure, each with distinct user symptoms:
1. Verification delay: the system executed, but proof or finality is not complete. User symptom: "I can't withdraw yet." Signals: prover queue, congestion, slow finality.
2. Remote exec failure: the destination action failed after the source succeeded. User symptom: "Funds left, nothing arrived." Causes: contract revert, slippage, out-of-gas, destination congestion.
3. Settlement mismatch: different layers temporarily disagree about what is final. User symptom: status flips or remains ambiguous. Causes: reorgs, delayed inclusion, dispute windows.
4. Wrong chain: the action executed on an unintended domain. User symptom: the asset appears but is unusable. Causes: UI ambiguity, wallet mis-selection, misconfigured routing.
5. Timeout: the deadline for some phase was exceeded. User symptom: "stuck" status. Causes: relayer outage, prover backlog, gas spikes.
6. Sequencer stall: inclusion halts or slows materially. User symptom: transactions are not included. Causes: outage, leader failover, censorship.
7. DA failure: data is not available, so verification cannot proceed. User symptom: finalization paused. Causes: DA network outage, publishing failure.
8. Reorg: previously seen inclusion is undone. User symptom: "it confirmed, then unconfirmed." Causes: probabilistic finality, congestion.
9. Gas exhaustion: execution or submission becomes uneconomical or stuck. User symptom: long delay or failure. Causes: sudden market activity, auction dynamics.
10. Permission rejected: the action is blocked by policy, allowlist, or limits. User symptom: "action not allowed." Causes: spending limits, contract denylist, missing approvals.
INCIDENT TIMELINE (2025-2026)
Date        Project       Loss     Cause
Feb 2025    Bybit         $1.5B    Safe{Wallet} JS injection
Nov 2025    Moonwell      $1M      Chainlink oracle malfunction
Feb 2026    CrossCurve    $3M      Spoofed cross-chain messages
Feb 2026    IoTeX         $8.8M    Private key compromise

L2 OUTAGES
Date         Chain       Duration     Notes
Sept 2025    Starknet    9 hours      Grinta upgrade, chain reorgs
Jan 2026     Starknet    4 hours      Block production halt
June 2025    Kroma       Permanent    Shut down, funds at risk

Recovery Patterns
Seven response patterns, each for a different failure context:
1. Fallback. When route A fails predictably: auto-switch to route B within the same constraints. Show what changed and why.
2. Retry. Transient liveness issues: retry with backoff. Show the retry count and the next attempt time window.
3. Re-route. A dependency is down (relayer, bridge, sequencer): choose an alternative path. Show the tradeoff (time, fee, trust).
4. Refund claim. Remote execution failed but funds are recoverable: provide a "Claim refund" action. Show eligibility conditions and expected time.
5. Safe halt. The system cannot safely continue: cancel pending steps and settle to a stable state. Show what was executed and what was not.
6. Notification. Notify on: phase changes, long delays beyond the estimate, action required (claim, approve, re-confirm).
7. Postmortem. After resolution: what happened, the impact, how the user was protected, and what changed to prevent recurrence. Bybit set the standard with two public forensic reports.
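The retry pattern above asks the UI to show a retry count and a time window for the next attempt. A minimal sketch of one way to compute both, assuming exponential backoff with jitter; the names `backoff_window`, `next_retry`, and `status_copy`, and every default (2s base, 60s cap, 25% jitter, 5 attempts) are illustrative assumptions, not values from any particular bridge or relayer:

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class RetryStatus:
    attempt: int        # 1-based retry count shown to the user
    max_attempts: int
    earliest_s: float   # earliest delay before the next attempt, in seconds
    latest_s: float     # latest delay before the next attempt, in seconds


def backoff_window(attempt, base_s=2.0, cap_s=60.0, jitter=0.25):
    """Return the (earliest, latest) delay in seconds before retry `attempt`."""
    nominal = min(cap_s, base_s * 2 ** (attempt - 1))  # doubles each attempt, capped
    return nominal * (1 - jitter), nominal * (1 + jitter)


def next_retry(attempt, max_attempts=5, rng=random.random):
    """Return (delay_s, RetryStatus), or None once retries are exhausted."""
    if attempt > max_attempts:
        return None  # caller should escalate, e.g. cancel or offer a refund
    earliest, latest = backoff_window(attempt)
    delay = earliest + (latest - earliest) * rng()  # jittered delay inside the window
    return delay, RetryStatus(attempt, max_attempts, earliest, latest)


def status_copy(status):
    """User-facing line: the retry count plus the window, not a false-precision timestamp."""
    return (f"Retry {status.attempt} of {status.max_attempts}: "
            f"next attempt in {status.earliest_s:.0f}-{status.latest_s:.0f}s")
```

Showing the window rather than the exact jittered delay keeps the copy honest: the user sees a range the system can actually commit to.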
FAILURE → RECOVERY MAPPING
Verification delay → Phase UI + "usable soon" / "final later"
Remote exec failure → Safe claim (refund) or checkpoint resume
Settlement mismatch → Clear status + wait for resolution
Wrong chain → Re-route with explicit tradeoffs
Timeout → Retry with backoff → Cancel → Refund
Sequencer stall → "Stuck" state + forced inclusion if available
DA failure → Queue actions + incident banner
Reorg → Re-confirm state + explain what changed
Gas exhaustion → Wait or re-submit at new price
Permission rejected → Clear reason + next action to resolve

PRINCIPLE
┌─────────────────────────────────────────────┐
│ "Safely halted" requires PATIENCE           │
│ "Failed" requires ACTION                    │
│ Most UX treats both the same.               │
└─────────────────────────────────────────────┘
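The mapping above can be kept as data so that every failure category is guaranteed a recovery response in the UI. The sketch below is one possible encoding, not a prescribed implementation: the `Failure` enum, `RECOVERY` table, `TIMEOUT_LADDER`, and both helper functions are hypothetical names, while the strings are copied from the mapping itself.

```python
from enum import Enum


class Failure(Enum):
    VERIFICATION_DELAY = "Verification delay"
    REMOTE_EXEC_FAILURE = "Remote exec failure"
    SETTLEMENT_MISMATCH = "Settlement mismatch"
    WRONG_CHAIN = "Wrong chain"
    TIMEOUT = "Timeout"
    SEQUENCER_STALL = "Sequencer stall"
    DA_FAILURE = "DA failure"
    REORG = "Reorg"
    GAS_EXHAUSTION = "Gas exhaustion"
    PERMISSION_REJECTED = "Permission rejected"


# Recovery response per failure category, taken from the mapping above.
RECOVERY = {
    Failure.VERIFICATION_DELAY: 'Phase UI + "usable soon" / "final later"',
    Failure.REMOTE_EXEC_FAILURE: "Safe claim (refund) or checkpoint resume",
    Failure.SETTLEMENT_MISMATCH: "Clear status + wait for resolution",
    Failure.WRONG_CHAIN: "Re-route with explicit tradeoffs",
    Failure.TIMEOUT: "Retry with backoff, then cancel, then refund",
    Failure.SEQUENCER_STALL: '"Stuck" state + forced inclusion if available',
    Failure.DA_FAILURE: "Queue actions + incident banner",
    Failure.REORG: "Re-confirm state + explain what changed",
    Failure.GAS_EXHAUSTION: "Wait or re-submit at new price",
    Failure.PERMISSION_REJECTED: "Clear reason + next action to resolve",
}

# Timeout is the only entry that escalates through several steps.
TIMEOUT_LADDER = ("Retry with backoff", "Cancel", "Refund")


def recovery_for(failure):
    """Look up the recovery response; a KeyError means a category was left unmapped."""
    return RECOVERY[failure]


def next_timeout_step(current):
    """Return the step after `current` on the timeout ladder, or None after the last one."""
    i = TIMEOUT_LADDER.index(current)
    return TIMEOUT_LADDER[i + 1] if i + 1 < len(TIMEOUT_LADDER) else None
```

Keeping the table exhaustive over the enum makes a missing recovery path a test failure rather than a blank state in production.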
UX Language Guide
Avoid ambiguity in status copy:
Prefer: "Included by sequencer," "Data available," "Verified onchain," "Finalized for withdrawal." Each state maps to a system phase.
Avoid: "Confirmed" without specifying what kind, and "Completed" when finality is pending. These create false confidence.
Glossary
Liveness failure: the system is not making progress (stall, timeout).
Safety failure: the system produced an invalid result (exploit, reorg).
Rollback: refund or compensation, not a literal state rewind.
Fallback: auto-switching to a predefined alternative on failure.
Postmortem: an explanation of what happened, published after an incident resolves.