Fail-Safe Theorem

Tom McLaughlin

(noun) The theory that a system designed to fail safely will fail by failing to fail safely.

The payment system was timing out, so clients began retrying failed transactions, compounding the original overload.

The Fail-Safe Theorem captures a fundamental paradox in system design: safety mechanisms themselves become points of failure, often in ways that defeat their intended purpose. The theorem posits that the complexity required to implement fail-safe behavior introduces new failure modes that may be more dangerous than the original risks.
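The retry storm from the usage example can be made concrete with a little arithmetic: if every failed attempt is retried, the expected number of attempts per request grows as a geometric series in the failure rate. A minimal sketch (the function name and parameters are illustrative, not from the original text):

```python
def effective_attempts(failure_rate: float, max_retries: int) -> float:
    """Expected attempts per request when each failed attempt is
    retried immediately, up to max_retries times.

    Geometric series: 1 + f + f^2 + ... + f^max_retries
    """
    return sum(failure_rate ** k for k in range(max_retries + 1))

# At a 90% failure rate (an overloaded service), just 3 retries
# nearly quadruple the load the struggling service must absorb.
print(round(effective_attempts(0.9, 3), 3))  # 3.439
```

The "safety" mechanism (retrying) multiplies traffic exactly when the system can least afford it, which is the theorem in miniature.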

Examples:

  • Database failover systems that corrupt data during the failover process
  • Retry mechanisms that further overload the already-failing system they were meant to work around
  • Emergency brake systems that lock wheels and cause skidding
  • Network redundancy protocols that amplify cascading failures
  • Circuit breakers that fail to trip when overloaded
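Several of the examples above share a common mitigation: fail fast instead of piling on. A minimal circuit-breaker sketch in Python (the class, method, and parameter names are illustrative, not from any particular library):

```python
import time


class CircuitBreaker:
    """Minimal circuit breaker: opens after `threshold` consecutive
    failures, then allows a single probe after `cooldown` seconds."""

    def __init__(self, threshold=3, cooldown=30.0):
        self.threshold = threshold
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at = None  # monotonic timestamp when tripped

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.cooldown:
                # Open: fail fast instead of hammering the backend.
                raise RuntimeError("circuit open: failing fast")
            # Half-open: cooldown elapsed, let one probe through.
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()
            raise
        # Success resets the breaker to closed.
        self.failures = 0
        self.opened_at = None
        return result
```

Note the irony the theorem predicts: the breaker itself is new state that can misbehave. A threshold set too high never trips; a cooldown set too low turns the half-open probes into yet another retry storm.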