10 of 11: The Regime Detection Track Record
The Headline Numbers
Since 1997, there have been 11 major S&P 500 drawdowns — declines of 7% or more from peak to trough. Our structural regime detection system identified 10 of 11 before they began, with a median lead time of 47 calendar days.
But detection rate alone doesn't tell the full story. Just as important is what followed each of the 43 ALERT episodes, not only the ones tied to named events.
The Full ALERT Record
Over 28 years, the system entered ALERT 43 times. Of those, 36 were followed by a decline of 5% or more within 60 trading days — an 84% hit rate. Seven were not.
| Outcome Tier | Count | Description |
|---|---|---|
| Major drawdown (>20%) | 1 | COVID crash, GFC |
| Correction (10–20%) | 20 | Rate hike cycles, European debt crisis, etc. |
| Mild drawdown (5–10%) | 15 | Volmageddon, China devaluation, etc. |
| False positive (<5%) | 7 | ALERT fired, no significant drawdown followed |
The 84% figure is the number that matters most for practical use. It means roughly 5 out of 6 ALERT signals have been followed by a meaningful decline. It also means approximately 1 in 6 were not — the system is not infallible.
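The arithmetic behind the 84% figure can be reproduced from the tier table. The sketch below is only illustrative: the counts are copied from the table above, while the dictionary keys and variable names are made up for this example and are not part of the detection engine.

```python
# Outcome counts copied from the tier table; key names are illustrative.
tiers = {
    "major": 1,            # drawdown > 20%
    "correction": 20,      # 10-20%
    "mild": 15,            # 5-10%
    "false_positive": 7,   # < 5%
}

total_alerts = sum(tiers.values())
hits = total_alerts - tiers["false_positive"]

hit_rate = hits / total_alerts
false_positive_share = tiers["false_positive"] / total_alerts

print(f"Hit rate: {hits}/{total_alerts} = {hit_rate:.1%}")
print(f"False positives: {tiers['false_positive']}/{total_alerts} = {false_positive_share:.1%}")
```

Rounded to whole percentages, this gives the 84% hit rate and 16% false positive share quoted in this post.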
We publish the false positive count because transparency about when the system is wrong matters as much as showing when it's right.
The Named Event Record
The table below shows every major drawdown since 1997, whether the system detected it, and how much warning it provided.
| # | Event | Date | Drawdown | Result | ALERT Lead |
|---|---|---|---|---|---|
| 1 | Asian Financial Crisis | Oct 1997 | -11.2% | HIT | 256 days |
| 2 | LTCM / Russian Default | Aug 1998 | -19.0% | HIT | 13 days |
| 3 | Dot-Com Peak | Mar 2000 | -11.4% | HIT | 80 days |
| 4 | September 11 | Sep 2001 | -21.3% | MISS | — |
| 5 | GFC Bear Start | Oct 2007 | -17.8% | HIT | 243 days |
| 6 | GFC / Lehman | Sep 2008 | -48.4% | HIT | 0 days* |
| 7 | European Debt Crisis | Aug 2011 | -18.4% | HIT | 157 days |
| 8 | China Devaluation | Aug 2015 | -13.0% | HIT | 3 days |
| 9 | Volmageddon | Feb 2018 | -10.1% | HIT | 3 days |
| 10 | COVID-19 Crash | Mar 2020 | -33.7% | HIT | 56 days |
| 11 | 2022 Rate Hike Cycle | Jan 2022 | -23.0% | HIT | 38 days |
*Lehman: ALERT triggered on the event date itself. The broader GFC had already been detected at the GFC Bear Start (event 5), where the ALERT led the October 2007 event date by 243 days.
Lead times ranged from same-day (Lehman, which was part of an already-detected crisis) to 256 days (Asian Financial Crisis). The median of 47 days represents roughly seven weeks of advance warning.
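For readers who want to check the median, here is a small sketch that recomputes it from the ALERT Lead column. The lead times are copied from the table above (the September 11 miss has no lead time and is excluded); the dictionary and variable names are illustrative only.

```python
from statistics import median

# ALERT lead times in calendar days for the 10 detected events, from the table.
lead_days = {
    "Asian Financial Crisis": 256,
    "LTCM / Russian Default": 13,
    "Dot-Com Peak": 80,
    "GFC Bear Start": 243,
    "GFC / Lehman": 0,
    "European Debt Crisis": 157,
    "China Devaluation": 3,
    "Volmageddon": 3,
    "COVID-19 Crash": 56,
    "2022 Rate Hike Cycle": 38,
}

# With an even number of values, median() averages the two middle ones.
median_lead = median(lead_days.values())
shortest = min(lead_days.values())
longest = max(lead_days.values())

print(f"Median lead: {median_lead} days (range {shortest} to {longest})")
```

With ten values, the median is the average of the fifth and sixth sorted lead times (38 and 56 days), which is where the 47-day figure comes from.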
The Only Miss: September 11, 2001
The system detected only WATCH-level signals in the months before 9/11 but never escalated to ALERT. This is exactly the kind of event the system cannot detect by design — a sudden exogenous shock with no statistical precursors in market microstructure. No amount of structural analysis will front-run a terrorist attack.
The system is built to detect endogenous deterioration: the slow breakdown of market stability that precedes most major drawdowns. For that class of event, the record is 10 for 10.
The False Positives
Seven of 43 ALERT episodes were not followed by a 5%+ drawdown. We consider these false positives and track them with the same rigor as the detections.
In most cases, the system detected genuine structural stress that resolved without a significant decline — the market absorbed the shock. This is an inherent property of any early warning system: not every fire alarm means the building burns down, but you still want the alarm.
The false positive share of 16% (7 of 43 ALERTs) is a cost of the system's sensitivity. A less sensitive system would produce fewer false alarms but would also miss more drawdowns. We believe the tradeoff is favorable for investors focused on risk management: 84% of ALERTs were followed by a meaningful decline, and 10 of 11 major drawdowns were detected, at the cost of occasional false signals.
How It Works
The detection system monitors two independent channels of structural market health:
Channel 1 — Stability: Tracks whether S&P 500 returns are losing their normal self-correcting behavior. When stability metrics deteriorate beyond historical norms, it signals structural weakening — a market becoming more fragile.
Channel 2 — Distribution: Monitors the symmetry of market outcomes. When the balance of risk shifts toward the downside beyond historical norms, it signals elevated tail risk.
Each channel operates independently. When a single channel reaches its threshold, the system moves to DETERIORATING status. When both channels confirm, it moves to ALERT. This dual-confirmation requirement filters out noise from single-channel fluctuations, and that filtering is the primary driver of the system's relatively low false positive rate.
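The dual-confirmation rule can be sketched as a small function. This is a minimal illustration, not the production logic: it assumes each channel has already been reduced to a yes/no "threshold breached" flag, the `NORMAL` label for the no-breach state is a placeholder name, and the WATCH level mentioned earlier is left out because this post does not define it.

```python
from enum import Enum


class Status(Enum):
    NORMAL = 0         # placeholder name: neither channel past its threshold
    DETERIORATING = 1  # exactly one channel past its threshold
    ALERT = 2          # both channels past their thresholds


def classify(stability_breached: bool, distribution_breached: bool) -> Status:
    """Map the two independent channel flags to a status level."""
    breaches = int(stability_breached) + int(distribution_breached)
    return Status(breaches)


print(classify(True, False))  # one channel only
print(classify(True, True))   # both channels confirm
```

Because ALERT requires both flags, a noisy reading in one channel can at most raise the status to DETERIORATING.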
Methodology Notes
Detection criterion: System reached ALERT-level status within 90 calendar days preceding the event date.
Drawdown threshold: 7% or greater S&P 500 peak-to-trough drawdown (60-day lookback for peak, 120-day forward window for trough).
ALERT hit rate criterion: 5% or greater forward drawdown within 60 trading days of ALERT entry.
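The two drawdown definitions above can be sketched in a few lines. This is one possible reading under stated assumptions, not the validation script itself: prices are a plain list of daily closes, windows are counted in trading days (list positions), the forward drawdown is measured from the close on the ALERT entry day to the lowest close in the following window, and all function and parameter names are hypothetical.

```python
def forward_drawdown(closes, entry_idx, window=60):
    """Worst decline from the entry close over the next `window` closes (<= 0)."""
    entry = closes[entry_idx]
    future = closes[entry_idx + 1 : entry_idx + 1 + window]
    if not future:
        return 0.0
    return min(min(future) / entry - 1.0, 0.0)


def is_hit(closes, entry_idx, threshold=0.05, window=60):
    """True if the forward drawdown reaches the threshold (5% by default)."""
    return forward_drawdown(closes, entry_idx, window) <= -threshold


def event_drawdown(closes, event_idx, lookback=60, forward=120):
    """Peak-to-trough decline: peak in the lookback window, trough in the forward window."""
    peak = max(closes[max(0, event_idx - lookback) : event_idx + 1])
    trough = min(closes[event_idx : event_idx + forward + 1])
    return trough / peak - 1.0


# Toy price paths, invented purely for illustration.
falling = [100, 98, 94, 97]
steady = [100, 101, 97, 102]
print(is_hit(falling, 0), is_hit(steady, 0))
```

The two measures use different reference points (the ALERT entry close versus the prior peak) and different windows, so the same episode can land in different size buckets depending on which one is applied.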
Validation: All statistics were independently validated on March 18, 2026, via a full replay of the detection engine on historical data from 1997–present. The validation script and internal report are maintained alongside the production codebase.
Important caveat: All detection results shown here are based on backtested analysis developed with the benefit of hindsight. The model parameters and thresholds were calibrated on historical data. Past detection does not guarantee future performance. This is a research tool, not investment advice.
What changed from the original version of this post: The original version (published March 17) highlighted a zero false positive rate across 4 recent production-period episodes. While accurate for that narrow timeframe, it was misleading when presented alongside the full 28-year backtest. This revision replaces that framing with the complete 43-episode record (36/43 hit rate, 7 false positives), which is both more statistically meaningful and more honest. We'd rather show you the full picture — including the misses — than a cherry-picked subset.