Technology Apr 23, 2026 · 8 min read

I Backtested My Own GEX Product Across 8 Years of SPY. Most of It Is Just VIX.

DEV Community
by tomasz dobrowolski

I sell a dealer-exposure API. GEX, DEX, VEX, CHEX — the whole Greek-exposure stack. So when I tell you the backtest on my own product is mostly a VIX proxy, that is not a competitive hit piece. It is the result I got when I ran the test honestly, and it is the test I wanted before I bought any of this from anyone else.

Most options dashboards present four dealer-exposure Greeks as four independent signals. The pitch is intuitive: dealers hedge, those hedges create flows, and those flows forecast vol, returns, or IV changes. The mechanical story is real. The predictive story is much thinner once you control for what a trader can already see on a free VIX chart.

I pre-registered the hypotheses. Then I ran the statistics. Here is what 1,972 SPY trading days say.

What I tested

One row per SPY end-of-day (EOD) snapshot. Each row carries dealer-signed GEX, DEX, VEX, CHEX, gamma flip, VIX, ATM IV, realized volatility, and forward outcomes. Positive exposure means dealers are net long that Greek.

| Item | Setting |
| --- | --- |
| Universe | SPY only |
| Window | 2018-04-16 to 2026-04-02 |
| Sample | 1,972 EOD snapshots; 1,971 usable next-day outcomes |
| Primary signals | GEX, DEX, VEX, CHEX |
| Primary outcomes | Next-day realized vol, next-day return, next-day ATM IV change |
| Controls | VIX, then VIX + ATM IV |
| Primary tests | Quintile sorts, top-minus-bottom t-tests, Spearman rank correlation |

The question is not "do dealer exposures mean anything?" They do. The question is: do they add predictive information after you already know VIX and ATM IV?

Naive GEX looks excellent

Sort every day by GEX into quintiles. Measure next-day realized vol as the annualized absolute log return, |log_return| * sqrt(252), reported in percent.

| GEX quintile | n | Mean net_gex ($B) | Mean next-day RV (%) | Median next-day RV (%) |
| --- | --- | --- | --- | --- |
| Q1 (most negative) | 395 | -7.91 | 16.97 | 13.54 |
| Q2 (moderately negative) | 394 | -2.79 | 18.57 | 12.91 |
| Q3 (roughly neutral) | 394 | +0.25 | 12.71 | 10.29 |
| Q4 (moderately positive) | 394 | +2.97 | 9.24 | 6.45 |
| Q5 (most positive) | 394 | +6.50 | 6.34 | 4.86 |

Q5 minus Q1: -10.63 vol points, t = -13.00, p = 1.0e-33. Spearman ρ = -0.36 on n=1,971.

This is the chart that sells a subscription. It is also not enough. A strong raw GEX backtest can still be a volatility-regime proxy — negative-GEX and high-VIX days often describe the same market state.

(Fact check: the Q1/Q2 mean inversion is real. It comes from outlier-heavy COVID-era days that inflate the Q2 mean; the medians decline cleanly from Q1 to Q5. The rank correlation matters more than strict mean monotonicity.)
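
A minimal sketch of the quintile sort, top-minus-bottom t-test, and Spearman test described in this section. It is not the script behind the table: the column names `net_gex` and `next_rv`, the Welch (unequal-variance) form of the t-test, and the synthetic demo data are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from scipy import stats

def quintile_sort(df, signal="net_gex", outcome="next_rv"):
    """Bucket days into signal quintiles and compare the extreme buckets."""
    d = df[[signal, outcome]].dropna().copy()
    d["q"] = pd.qcut(d[signal], 5, labels=["Q1", "Q2", "Q3", "Q4", "Q5"])
    table = d.groupby("q", observed=True)[outcome].agg(["count", "mean", "median"])
    top = d.loc[d["q"] == "Q5", outcome]
    bottom = d.loc[d["q"] == "Q1", outcome]
    # Welch's t-test (assumption): no equal-variance requirement across buckets.
    t, p_t = stats.ttest_ind(top, bottom, equal_var=False)
    rho, p_rho = stats.spearmanr(d[signal], d[outcome])
    summary = {"diff": top.mean() - bottom.mean(), "t": t, "p": p_t,
               "rho": rho, "p_rho": p_rho}
    return table, summary

# Synthetic demo data (not the article's dataset): return volatility shrinks as GEX rises.
rng = np.random.default_rng(0)
n = 1000
gex = rng.normal(size=n)
log_ret = rng.normal(scale=0.01 * np.exp(-0.5 * gex))
demo = pd.DataFrame({
    "net_gex": gex,
    # |log_return| * sqrt(252), scaled by 100 to express it in percent
    "next_rv": np.abs(log_ret) * np.sqrt(252) * 100,
})
table, summary = quintile_sort(demo)
print(table)
print(summary)
```

Reporting the median next to the mean is what makes an outlier-driven inversion like Q1/Q2 visible.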

The control that changes the story

Regress both signal and outcome on the control set, then Spearman-correlate the residuals. Bonferroni significance for the four primary tests is p < 0.0125.

| Signal → outcome | Raw ρ (p) | After VIX (p) | After VIX + ATM IV (p) |
| --- | --- | --- | --- |
| GEX → next-day RV | -0.36 (4.6e-60) | -0.14 (1.2e-9) | -0.03 (0.18) |
| DEX → next-day return | -0.03 (0.19) | +0.01 (0.69) | +0.02 (0.40) |
| VEX → next-day ATM IV change | -0.16 (2.1e-13) | -0.05 (0.02) | -0.01 (0.77) |
| CHEX → next-day return | -0.05 (0.03) | -0.01 (0.63) | -0.00 (0.93) |

The raw signals are mostly volatility signals wearing more sophisticated labels. GEX survives the VIX-only control, but at ~40% of the raw magnitude. Add ATM IV and the GEX residual drops to ρ = -0.03, p = 0.18. DEX, VEX, and CHEX do not carry robust independent predictive information in these residualized rank tests.

GEX top-minus-bottom realized-vol difference tells the same story: -10.63 vol points raw, -3.15 after VIX, -0.99 after VIX + ATM IV (p = 0.25).

The honest result: GEX is not fake. The raw effect is real. The incremental information after VIX and ATM IV is the part that fails. Useful regime descriptor ≠ independent forecasting edge.
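
The residualized test in this section (regress signal and outcome on the controls, then rank-correlate the residuals) can be sketched as follows. This is an illustration under stated assumptions, not the article's code: the function names are hypothetical and the data is synthetic, built so that signal and outcome are linked only through the control.

```python
import numpy as np
from scipy import stats

def residualize(y, controls):
    """OLS residuals of y on an intercept plus the control column(s)."""
    X = np.column_stack([np.ones(len(y)), controls])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

def residual_spearman(signal, outcome, controls):
    """Spearman correlation between the two sets of residuals."""
    return stats.spearmanr(residualize(signal, controls),
                           residualize(outcome, controls))

BONFERRONI_ALPHA = 0.05 / 4  # four primary tests -> 0.0125

# Synthetic data: both series load on "vix", with no direct link between them.
rng = np.random.default_rng(1)
n = 2000
vix = rng.normal(size=n)
signal = -0.7 * vix + rng.normal(size=n)
outcome = 0.8 * vix + rng.normal(size=n)

raw_rho, raw_p = stats.spearmanr(signal, outcome)
res_rho, res_p = residual_spearman(signal, outcome, vix)
print(f"raw rho={raw_rho:.2f} (p={raw_p:.1e})")
print(f"residual rho={res_rho:.2f} (p={res_p:.2f}), alpha={BONFERRONI_ALPHA}")
```

For the VIX + ATM IV control, pass a two-column array such as `np.column_stack([vix, atm_iv])` as `controls`.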

GEX × VIX double-sort

Double-sort: first split by VIX quintile, then within each VIX bucket split by GEX. Cells = mean next-day realized vol (%).

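| VIX \ GEX | Q1 most negative | Q2 | Q3 | Q4 | Q5 most positive |
| --- | --- | --- | --- | --- | --- |
| V1 lowest VIX | 8.02 | 7.32 | 6.62 | 5.22 | 5.07 |
| V2 | 11.70 | 10.30 | 8.85 | 6.48 | 6.04 |
| V3 | 12.00 | 12.05 | 12.15 | 9.62 | 8.56 |
| V4 | 15.87 | 15.35 | 17.18 | 12.31 | 8.06 |
| V5 highest VIX | 20.57 | 24.91 | 37.69 | 21.71 | 15.91 |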

V1 and V2 are textbook. V3 and V4 still show much lower RV in the positive-GEX buckets, but the decline is not strictly monotonic. V5, the highest-VIX bucket, is a non-monotonic mess.

Practical takeaway: GEX's predictive value is mostly a calm-to-moderate phenomenon. In the top VIX quintile, where an extra risk signal would matter most, this double-sort does not show a clean edge.
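
A conditional double-sort of this kind might look like the sketch below. The column names (`vix`, `net_gex`, `next_rv`) and the synthetic data are assumptions; the key detail is that GEX quintile edges are recomputed inside each VIX bucket rather than taken from the full sample.

```python
import numpy as np
import pandas as pd

def double_sort(df, first="vix", second="net_gex", outcome="next_rv", k=5):
    """Mean outcome per (first-sort bucket, conditional second-sort bucket)."""
    d = df[[first, second, outcome]].dropna().copy()
    d["v"] = pd.qcut(d[first], k, labels=False) + 1
    # Conditional sort: bucket the second variable within each first-sort bucket.
    d["g"] = d.groupby("v")[second].transform(
        lambda s: pd.qcut(s, k, labels=False) + 1
    )
    return d.pivot_table(index="v", columns="g", values=outcome, aggfunc="mean")

# Synthetic demo data (not the article's dataset).
rng = np.random.default_rng(2)
n = 2000
vix = rng.lognormal(mean=3.0, sigma=0.3, size=n)
gex = -0.1 * (vix - vix.mean()) + rng.normal(size=n)
rv = 0.6 * vix - 1.0 * gex + rng.gamma(2.0, 2.0, size=n)
demo = pd.DataFrame({"vix": vix, "net_gex": gex, "next_rv": rv})

grid = double_sort(demo)
print(grid.round(2))
```

Rows are VIX buckets (1 = lowest) and columns are GEX buckets within that row (1 = most negative), matching the layout of the table above.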

Exposure verdicts

GEX: weak survivor. Useful as a regime descriptor; modestly incremental over VIX alone; not significant after VIX + ATM IV. Raw backtest is real, but it is mostly a vol-regime backtest.

DEX: dead on arrival. Top-minus-bottom next-day return difference: 0.00%, t = 0.04, p = 0.97. Raw Spearman ρ = -0.03, p = 0.19. DEX does not predict next-day SPY direction in this EOD test.

VEX: mostly a VIX proxy. Raw ρ = -0.16, p = 2.1e-13. But VEX correlates with VIX at +0.72 and with ATM IV at +0.76. After controlling for VIX + ATM IV, residualized VEX falls to ρ = -0.01, p = 0.77.

CHEX: weak and fragile. One interesting raw result: sign agreement between CHEX and next-day return is 54.9% on n=1,967, p = 1.5e-5. But the pre-registered residualized rank test collapses under VIX control (ρ = -0.01, p = 0.63) and under VIX + ATM IV (ρ = -0.00, p = 0.93). A separate OLS specification does show a significant CHEX coefficient after VIX + AR(1) controls, so the narrower conclusion is this: CHEX is not robust in the rank/quintile framework used for the article's primary claim.
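
As a rough sanity check on the CHEX sign-agreement figure, a two-sided binomial test against a 50% coin flip lands in the same range as the reported p-value. Two assumptions are involved: the hit count is backed out from the rounded 54.9% rate, and a binomial test is only one plausible way to test sign agreement, not necessarily the test used for the article.

```python
from scipy import stats

n = 1967
hit_rate = 0.549
# Approximate count: the article reports only the rounded hit rate.
hits = round(hit_rate * n)

# Two-sided exact binomial test against a fair coin (p = 0.5).
result = stats.binomtest(hits, n, p=0.5, alternative="two-sided")
print(hits, result.pvalue)
```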

Why they overlap

Spearman rank correlations across all 1,972 observations:

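|  | GEX | DEX | VEX | CHEX | VIX | ATM IV |
| --- | --- | --- | --- | --- | --- | --- |
| GEX | +1.00 | +0.73 | -0.54 | +0.59 | -0.49 | -0.63 |
| DEX | +0.73 | +1.00 | -0.89 | +0.68 | -0.58 | -0.67 |
| VEX | -0.54 | -0.89 | +1.00 | -0.65 | +0.72 | +0.76 |
| CHEX | +0.59 | +0.68 | -0.65 | +1.00 | -0.39 | -0.46 |
| VIX | -0.49 | -0.58 | +0.72 | -0.39 | +1.00 | +0.91 |
| ATM IV | -0.63 | -0.67 | +0.76 | -0.46 | +0.91 | +1.00 |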

DEX ↔ VEX = -0.89. VEX ↔ VIX = +0.72. VIX ↔ ATM IV = +0.91. That does not make the exposure Greeks useless, but it makes them dangerous to treat as independent features. If someone sells GEX/DEX/VEX/CHEX as four unrelated predictive signals, this correlation matrix is the rebuttal. They are four views of the same options chain, and much of what they capture is already in volatility level. One-and-a-half signals, not four.
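
Checking a feature set for this kind of overlap takes a few lines. The sketch below builds a Spearman matrix and lists the pairs above a cutoff; the 0.7 threshold, the helper name, and the synthetic columns are all hypothetical choices for illustration.

```python
import numpy as np
import pandas as pd

def redundant_pairs(df, threshold=0.7):
    """Spearman matrix plus the column pairs with |rho| >= threshold."""
    corr = df.corr(method="spearman")
    cols = list(corr.columns)
    pairs = [
        (a, b, round(float(corr.loc[a, b]), 2))
        for i, a in enumerate(cols)
        for b in cols[i + 1:]
        if abs(corr.loc[a, b]) >= threshold
    ]
    return corr, pairs

# Synthetic demo: DEX and VEX share one factor with opposite signs.
rng = np.random.default_rng(3)
n = 1000
factor = rng.normal(size=n)
demo = pd.DataFrame({
    "DEX": -factor + 0.3 * rng.normal(size=n),
    "VEX": factor + 0.3 * rng.normal(size=n),
    "noise": rng.normal(size=n),
})
corr, pairs = redundant_pairs(demo)
print(corr.round(2))
print(pairs)
```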

High-VIX regime test

The regime split is where a GEX backtest matters most for real trading decisions. The high-VIX row was pre-registered, not cherry-picked.

| Regime | n | Top-minus-bottom RV diff | t | p | ρ |
| --- | --- | --- | --- | --- | --- |
| All days | 1,971 | -10.63 | -13.00 | 1.0e-33 | -0.36 |
| Pre-COVID | 463 | -11.43 | -6.78 | 4.5e-10 | -0.41 |
| COVID shock | 72 | -59.88 | -3.88 | 0.001 | -0.50 |
| Post-COVID | 1,436 | -9.62 | -9.58 | 8.7e-20 | -0.32 |
| Low-VIX days | 1,478 | -8.09 | -11.81 | 3.9e-28 | -0.32 |
| High-VIX days | 493 | -1.89 | -0.78 | 0.44 | -0.02 |

Across all days GEX looks excellent. Inside high-VIX days the effect is statistically absent. That is the trading lesson: GEX is best at labeling calm regimes. It is not a clean crisis detector in this EOD SPY sample.
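
A regime split is the same rank test run per subsample. In the sketch below the regime label, the top-quartile VIX cutoff, and the column names are assumptions (the article does not state its high-VIX threshold), and the synthetic data is built so the signal only works on low-VIX days.

```python
import numpy as np
import pandas as pd
from scipy import stats

def by_regime(df, regime_col, signal, outcome):
    """Spearman rho and p-value of signal vs outcome within each regime."""
    rows = []
    for name, g in df.groupby(regime_col):
        rho, p = stats.spearmanr(g[signal], g[outcome])
        rows.append({"regime": name, "n": len(g), "rho": rho, "p": p})
    return pd.DataFrame(rows).set_index("regime")

# Synthetic demo data (not the article's dataset).
rng = np.random.default_rng(4)
n = 2000
vix = rng.lognormal(3.0, 0.3, size=n)
gex = rng.normal(size=n)
high = vix >= np.quantile(vix, 0.75)  # hypothetical cutoff: top VIX quartile
rv = np.where(high, rng.normal(size=n), -gex + rng.normal(size=n))
demo = pd.DataFrame({
    "net_gex": gex,
    "next_rv": rv,
    "regime": np.where(high, "high_vix", "low_vix"),
})
out = by_regime(demo, "regime", "net_gex", "next_rv")
print(out)
```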

Train/test stability

A 70/30 chronological split looks impressive at first glance:

| Split | Window | n | Top-minus-bottom diff | ρ |
| --- | --- | --- | --- | --- |
| In-sample | 2018-04 → 2023-11 | 1,380 | -10.77 | -0.367 |
| Out-of-sample | 2023-12 → 2026-04 | 591 | -10.79 | -0.374 |

Naive read: GEX is exceptionally robust. Cautious read: GEX is proxying a stable volatility-regime variable, and VIX + ATM IV absorb the effect. The residualized controls point to the second explanation.
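
For reference, a chronological 70/30 split means cutting the date-sorted panel at a fixed position with no shuffling, so every out-of-sample day comes after every in-sample day. The sketch below uses hypothetical column names and synthetic data, and rounding the cut point is an assumption.

```python
import numpy as np
import pandas as pd
from scipy import stats

def chrono_split(df, frac=0.7, ts="ts"):
    """Split into earliest `frac` of rows (train) and the remainder (test)."""
    d = df.sort_values(ts).reset_index(drop=True)
    cut = round(len(d) * frac)
    return d.iloc[:cut], d.iloc[cut:]

# Synthetic demo, rows deliberately shuffled to show the split re-sorts by date.
rng = np.random.default_rng(5)
n = 1000
gex = rng.normal(size=n)
demo = pd.DataFrame({
    "ts": pd.bdate_range("2018-04-16", periods=n),
    "net_gex": gex,
    "next_rv": -0.5 * gex + rng.normal(size=n),
}).sample(frac=1.0, random_state=0)

train, test = chrono_split(demo)
rho_in, _ = stats.spearmanr(train["net_gex"], train["next_rv"])
rho_out, _ = stats.spearmanr(test["net_gex"], test["next_rv"])
print(len(train), len(test), round(rho_in, 3), round(rho_out, 3))
```

Stable ρ across the two halves says the relationship persists over time. It does not say the relationship is independent of VIX, which is why the residualized controls still matter.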

What traders should do with this

  • Use GEX as a regime label, not a standalone forecast. Positive GEX describes a calmer market state. It does not, by itself, survive the VIX + ATM IV control as an independent next-day realized-vol signal.
  • Do not count the exposure stack as four independent factors. DEX and VEX are near mirror images in this sample; VEX is tightly linked to vol level.
  • Demand orthogonal tests. A vendor chart that beats a coin flip is not enough. The right question is whether the signal adds anything after obvious baselines — VIX, ATM IV, prior return, outcome persistence.
  • Correlation ≠ tradable PnL. Even GEX's VIX-only residual effect (ρ = -0.14) needs transaction costs, latency, data availability, and execution rules before it is a strategy.

Limitations

  1. EOD panel only. 16:00 ET snapshots vs next-day outcomes. The canonical intraday CHEX claim — charm flow into the last hour — needs a separate minute-level study.
  2. SPY only. Deepest, most-hedged ETF. Single stocks and index options may behave differently.
  3. Linear residual controls. OLS residualization. A nonlinear model could recover interactions not measured here.
  4. Correlation, not PnL. Statistical association, not executable trades after costs.
  5. 0DTE deserves its own article. Post-COVID includes the 0DTE era, but this is not a dedicated intraday 0DTE mechanics test.
  6. Dealer-sign convention matters. Positive = dealers net long that Greek. Opposite conventions invert signs but not conclusions.
  7. Stress obs are limited. COVID, 2022, 2025 provide meaningful turmoil, but high-VIX inference still rests on fewer days than calm-market inference.

How to reproduce

Three endpoints on the FlashAlpha Historical API, looped across 1,972 trading days, joined on (ts, symbol), outcomes shifted -1.

# Dealer-exposure summary (GEX, DEX, VEX, CHEX, gamma flip, walls)
curl -H "X-Api-Key: $KEY" \
  "https://historical.flashalpha.com/v1/exposure/summary/SPY?at=2024-06-14T20:00:00Z"

# Stock summary (spot, VIX context, ATM IV, realized vol)
curl -H "X-Api-Key: $KEY" \
  "https://historical.flashalpha.com/v1/stock/SPY/summary?at=2024-06-14T20:00:00Z"

# VRP snapshot (implied-vs-realized spread, regime, harvest score)
curl -H "X-Api-Key: $KEY" \
  "https://historical.flashalpha.com/v1/vrp/SPY?at=2024-06-14T20:00:00Z"

Residualize signals and outcomes on VIX, then on VIX + ATM IV. Re-run the quintile sorts, Spearman tests, and regime splits.
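
Once the per-day responses are flattened into tables, the join and the -1 shift could look like the sketch below. It makes no API calls, and every column name (`ts`, `symbol`, `spot`, `atm_iv`, `net_gex`) is a hypothetical stand-in, because the response schema is not shown here.

```python
import numpy as np
import pandas as pd

def build_panel(exposure, stock):
    """Join per-day tables on (ts, symbol) and attach next-day outcomes."""
    panel = exposure.merge(stock, on=["ts", "symbol"], how="inner",
                           validate="one_to_one")
    panel = panel.sort_values(["symbol", "ts"]).reset_index(drop=True)
    g = panel.groupby("symbol")
    # shift(-1) pulls the next trading day's value onto today's row.
    panel["next_ret"] = np.log(g["spot"].shift(-1) / panel["spot"])
    panel["next_rv"] = panel["next_ret"].abs() * np.sqrt(252)
    panel["next_iv_chg"] = g["atm_iv"].shift(-1) - panel["atm_iv"]
    return panel

# Tiny made-up example: four days in, three usable next-day outcomes out.
ts = pd.to_datetime(["2024-06-11", "2024-06-12", "2024-06-13", "2024-06-14"])
exposure = pd.DataFrame({"ts": ts, "symbol": "SPY",
                         "net_gex": [1.0, -2.0, 0.5, 3.0]})
stock = pd.DataFrame({"ts": ts, "symbol": "SPY",
                      "spot": [100.0, 101.0, 99.0, 100.0],
                      "atm_iv": [0.12, 0.13, 0.15, 0.14]})
panel = build_panel(exposure, stock)
print(panel)
```

The last row has no next-day outcome, which is the same reason 1,972 snapshots yield 1,971 usable observations.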

TL;DR

Gamma exposure works at the raw regime level, but the independent GEX edge is much smaller than the marketing version. Across 1,972 SPY days, raw GEX → next-day realized vol is ρ = -0.36. After VIX + ATM IV controls, it is ρ = -0.03 with p = 0.18. DEX has no next-day return signal, VEX is mostly a VIX/IV proxy, and CHEX is fragile in residualized rank tests. Use dealer exposure as context. Do not treat it as four clean standalone alpha signals.

Originally published at flashalpha.com. The full version includes the raw CSVs, pre-registered hypotheses file, and Python re-run scripts for every table above.

This article was originally published by DEV Community and written by tomasz dobrowolski.