retail attention as a volatility predictor and momentum amplifier

For Math 585 (Algorithmic Trading) this spring, our team investigated whether retail attention, measured by Wikipedia pageviews and WallStreetBets discussion volume, predicts stock return volatility and amplifies momentum. We ran nine statistical validation tests on 40 U.S. equities over 2018–2026, deployed a live strategy on QuantConnect, and honestly documented every approach that didn't work before the one that did.

The short answer: attention robustly predicts next-day absolute returns beyond GARCH(1,1) forecasts (OOS t = 6.00, 94.4% of tickers positive), and attention in uptrend regimes has a strong positive directional effect that attention in isolation does not. The deployed strategy hit Sharpe ratios of 0.56 (IS), 1.83 (OOS-A), 0.78 (OOS-B), and 1.04 through the COVID crash, all with max drawdown ≤ 10.3%.

the central insight

We started by trying to use Reddit sentiment as a directional signal. FinBERT on WallStreetBets posts achieved roughly 33% accuracy, no better than random guessing among three classes. WSB is written in slang, sarcasm, and emoji-laden prose that FinBERT was never trained on. We considered GPT-4/Claude for better classification, but 1.6M posts at $0.01–0.03 per post would have run $16,000–$48,000. A student research project budget is definitely not that!

This constraint forced a useful reframe. Rather than chasing sentiment, we focused on attention volume: does the sheer count of posts or pageviews, regardless of tone, contain tradeable information about price movement intensity? This is a cleaner question. Attention is measured objectively; it doesn't require a model to interpret it.

The key finding: attention volume predicts volatility magnitude. Attention combined with an existing trend predicts direction. Attention alone is ambiguous or slightly negative on signed returns.

data pipeline (my contribution)

I built the ingestion and signal pipeline that merged two heterogeneous sources: 1.6M Reddit posts (Pushshift archive, irregular timestamps) and Wikimedia REST API pageview counts (daily, UTC). The pipeline resolves ticker aliases (GOOG/GOOGL and similar) to 40 canonical identifiers using a hand-curated mapping table. Weekend Reddit activity is realigned to the following trading day using business-day reindexing. Missing Wikipedia counts are forward-filled for up to three consecutive days; longer gaps are flagged as unavailable.
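A minimal pandas sketch of the two alignment rules above. The column names `ticker` and `created`, the function names, and the toy data are all assumptions for illustration, not the pipeline's actual schema. The sketch also ignores exchange holidays, which a real pipeline would have to handle.

```python
import pandas as pd

def roll_weekends_forward(dates: pd.Series) -> pd.Series:
    """Move Saturday/Sunday dates to the following Monday (holidays ignored)."""
    dow = dates.dt.dayofweek  # Monday=0 ... Sunday=6
    shift_days = (dow == 5) * 2 + (dow == 6) * 1
    return dates + pd.to_timedelta(shift_days, unit="D")

def daily_post_counts(posts: pd.DataFrame) -> pd.Series:
    """Count posts per (ticker, trading day) after weekend realignment."""
    day = roll_weekends_forward(posts["created"].dt.normalize()).rename("date")
    return posts.groupby(["ticker", day]).size()

def fill_wiki_gaps(views: pd.Series, limit: int = 3) -> pd.DataFrame:
    """Forward-fill at most `limit` consecutive gaps; flag whatever is still missing."""
    filled = views.ffill(limit=limit)
    return pd.DataFrame({"views": filled, "unavailable": filled.isna()})

# Toy data (hypothetical): one Friday post, then Saturday, Sunday, and Monday posts.
posts = pd.DataFrame({
    "ticker": ["AAPL"] * 4,
    "created": pd.to_datetime(["2024-01-05 15:00", "2024-01-06 02:00",
                               "2024-01-07 23:30", "2024-01-08 14:00"]),
})
counts = daily_post_counts(posts)  # Friday keeps 1 post, Monday absorbs the weekend

# Toy pageviews with a four-day gap: the first three days are filled, the fourth is flagged.
views = pd.Series([100.0, None, None, None, None, 50.0],
                  index=pd.bdate_range("2024-01-08", periods=6))
wiki = fill_wiki_gaps(views)
```

Keeping the `unavailable` flag separate from the filled values lets downstream code decide whether to drop those rows or fall back to the other source.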

The final output, qc_signal_v3.csv, contains 75,855 ticker-day rows of daily z-scores, composite attention flags, and spike indicators. It's uploaded to the QuantConnect Object Store via an HMAC-SHA256 authenticated API call for live strategy consumption.

I also built low-latency GPU inference infrastructure on Modal: a T4 GPU running FinBERT at batch_size=64 in a debian_slim container, dispatched via RPC with results returned as pickle. We didn't end up using it in the final strategy (the sentiment signal was too noisy), but the infrastructure was designed to be reactivated for incremental live inference.

signal construction

For each ticker and source (wiki, reddit), we compute a 30-day rolling z-score shifted one day to avoid look-ahead bias:

$$ z_{i,t} = \operatorname{clip}\!\left( \frac{x_{i,t} - \mu_{i,\,t-30:t-1}}{\sigma_{i,\,t-30:t-1}},\ -10,\ 10 \right) $$
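The formula above can be sketched in pandas as follows. This is an illustration under assumptions, not the pipeline's code: it treats the 30-day window as the 30 observations ending at t−1, uses the sample standard deviation, and maps a zero standard deviation to a missing value. The function name `rolling_z` is hypothetical.

```python
import numpy as np
import pandas as pd

def rolling_z(x: pd.Series, window: int = 30, clip: float = 10.0) -> pd.Series:
    """Rolling z-score whose baseline excludes the current observation."""
    # shift(1) moves the window statistics forward one step, so the value at t
    # is compared against the mean and std of observations t-30 .. t-1 only.
    mu = x.rolling(window).mean().shift(1)
    sigma = x.rolling(window).std().shift(1)
    z = (x - mu) / sigma.replace(0.0, np.nan)  # avoid dividing by zero
    return z.clip(-clip, clip)

# Toy series: 30 ordinary days followed by one extreme day.
x = pd.Series([float(v) for v in range(30)] + [1000.0])
z = rolling_z(x)  # the first 30 values are NaN; the last one is clipped to 10
```

Because the extreme day is excluded from its own baseline, it cannot inflate the standard deviation it is measured against.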

The composite signal uses IC-weighted blending. The raw information coefficient analysis on IS data gives Wiki IC = 0.025 and Reddit IC = 0.975, but this overfits to the 9.3% of ticker-days where Reddit data actually exists. For the remaining 90.7% (wiki-only), we stabilize to w_wiki = 0.60, w_reddit = 0.40.

High attention and spike flags:

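$$ \text{HighAttention}_{i,t} = \mathbf{1}\{ z^{\text{comp}}_{i,t} \ge 2.5 \} $$
$$ \text{FirstDaySpike}_{i,t} = \mathbf{1}\{ z^{\text{comp}}_{i,t} \ge 3.0 \} \cdot \mathbf{1}\{ z^{\text{comp}}_{i,t-1} < 3.0 \} $$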
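The two flags can be sketched for a single ticker's composite z-score series as below. The function and column names are hypothetical, and treating a missing previous-day value as "below threshold" is an assumption.

```python
import pandas as pd

def attention_flags(z_comp: pd.Series, high: float = 2.5, spike: float = 3.0) -> pd.DataFrame:
    """Build the high-attention and first-day-spike indicators for one ticker."""
    above = z_comp >= spike
    # Previous day's state; the first row has no history and counts as "not above".
    prev_above = above.shift(1, fill_value=False).astype(bool)
    return pd.DataFrame({
        "high_attention": z_comp >= high,
        "first_day_spike": above & ~prev_above,
    })

# Toy composite z-scores: a two-day spike, a quiet day, then a fresh spike.
z_comp = pd.Series([1.0, 2.6, 3.2, 3.5, 2.0, 3.1])
flags = attention_flags(z_comp)
```

In the toy series the second day of the run (3.5) is high-attention but not a first-day spike, while the final 3.1 fires again because the day before it was below 3.0.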

nine statistical tests

Jerry ran all nine tests. I'm listing the ones that shaped strategy design most directly.

Granger causality (ADL, 5 lags). With Benjamini-Hochberg FDR correction at q = 0.05, the composite signal Granger-causes absolute returns in 57.9% of tickers OOS, up from 28.9% IS. Because OOS significance exceeds IS, the result is unlikely to be an artifact of in-sample fitting: the relationship is stronger out of sample.

Signal	IS Raw	IS FDR	OOS Raw	OOS FDR
Composite Z → |Return|	44.7%	28.9%	63.2%	57.9%
Wiki Z → |Return|	34.2%	15.8%	42.1%	36.8%
Reddit Z → |Return|	48.6%	43.2%	42.9%	22.9%

Reddit alone overfits (43.2% IS → 22.9% OOS). The composite signal generalizes because Wikipedia provides stable, broad coverage while Reddit adds value only when it's available.
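To show how the "Raw" and "FDR" columns can differ, here is a small pure-Python sketch of the Benjamini-Hochberg step-up procedure applied to per-ticker p-values. The p-values are made up for illustration and the function name is hypothetical.

```python
def benjamini_hochberg(pvals, q=0.05):
    """Return one boolean per hypothesis: True where it is rejected at FDR level q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    # Find the largest rank k (1-based) with p_(k) <= (k / m) * q,
    # then reject every hypothesis ranked at or below k.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank / m * q:
            k_max = rank
    reject = [False] * m
    for idx in order[:k_max]:
        reject[idx] = True
    return reject

# Hypothetical p-values for ten tickers.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
raw = [p <= 0.05 for p in pvals]   # uncorrected: 5 of 10 significant
fdr = benjamini_hochberg(pvals)    # corrected: only the 2 smallest survive
```

The share of tickers that survive the correction is what the FDR columns report, which is why they are never larger than the corresponding raw columns.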

GARCH benchmark. If our attention signal just captures the same volatility clustering that GARCH already models, it adds nothing. We fit GARCH(1,1) on each ticker's IS returns, produced OOS forecasts, then compared GARCH forecast errors on high-attention vs. low-attention days. Forecast errors averaged 0.732 percentage points higher on high-attention days (t = 6.00), and the gap was positive for 94.4% of tickers. GARCH systematically underestimates volatility when attention is elevated.
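One way the comparison above could be computed is sketched below. It assumes the GARCH forecasts are already available and expressed on the same scale as |return|, and that the reported t-statistic is a one-sample test across per-ticker gaps; the text does not spell out either detail, so both are assumptions, as are the function names and toy numbers.

```python
from math import sqrt
from statistics import mean, stdev

def attention_gap(abs_ret, forecast, high_flag):
    """Mean forecast error on high-attention days minus the mean on other days."""
    errors = [r - f for r, f in zip(abs_ret, forecast)]  # positive = GARCH too low
    high = [e for e, h in zip(errors, high_flag) if h]
    low = [e for e, h in zip(errors, high_flag) if not h]
    return mean(high) - mean(low)

def cross_sectional_t(gaps):
    """One-sample t-statistic for H0: the average per-ticker gap is zero."""
    return mean(gaps) / (stdev(gaps) / sqrt(len(gaps)))

# Toy single-ticker example (percent units, hypothetical values).
gap = attention_gap(abs_ret=[1.0, 3.0, 1.2, 2.8],
                    forecast=[1.1, 1.5, 1.1, 1.6],
                    high_flag=[False, True, False, True])

# Toy per-ticker gaps, then the two summary numbers of the kind quoted above.
gaps = [0.5, 0.9, 0.7, -0.1, 1.0]
t_stat = cross_sectional_t(gaps)
share_positive = sum(g > 0 for g in gaps) / len(gaps)
```

A positive gap means GARCH under-forecasts more on high-attention days than on ordinary days for that ticker.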

Distributional regime. High-attention days (z ≥ 2.5, 3.78% of observations) show mean |return| of 2.571% versus 1.664% in low-attention periods, a 1.92× volatility ratio confirmed by a KS test at p = 3.6×10⁻²⁸. This directly justifies the short-reduction rule: holding a short during nearly double normal volatility is not worth it.

Panel regression with interaction. The core finding for strategy design comes from the directional panel regression (pooled, HC1 standard errors):

Variable	Coef.	p
Attention (composite z)	−0.0008	0.012
Momentum (63d)	+0.0124	<10⁻¹⁸
Trend (above SMA200)	+0.0014	<10⁻⁴
Attn × Trend	+0.0019	2.7×10⁻⁷

Attention alone has a slightly negative directional coefficient. Attention in an uptrend regime has a strongly positive coefficient. This is the whole basis for the strategy: attention doesn't predict direction, but it amplifies existing momentum.

Signal orthogonalization. We regressed attention on momentum and 21-day historical volatility and kept the residual. 58% of the raw Spearman correlation with |return| survives (ρ = 0.048 vs. 0.082 raw, both p < 10⁻¹⁹). Attention is not just a proxy for momentum or vol.
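A minimal NumPy sketch of the orthogonalization step, assuming an ordinary least-squares regression with an intercept (the text does not say which estimator was used). The data are simulated and the function names are hypothetical; the rank-based Spearman helper does not handle ties.

```python
import numpy as np

def orthogonalize(attention, momentum, hist_vol):
    """Residual of attention after regressing on an intercept, momentum, and vol."""
    X = np.column_stack([np.ones(len(attention)), momentum, hist_vol])
    beta, *_ = np.linalg.lstsq(X, attention, rcond=None)
    return attention - X @ beta

def spearman(a, b):
    """Spearman rank correlation for tie-free data (Pearson correlation of ranks)."""
    ranks_a = np.argsort(np.argsort(a))
    ranks_b = np.argsort(np.argsort(b))
    return float(np.corrcoef(ranks_a, ranks_b)[0, 1])

# Simulated data: attention is partly explained by momentum and volatility.
rng = np.random.default_rng(0)
n = 500
momentum = rng.normal(size=n)
hist_vol = np.abs(rng.normal(size=n))
own_info = rng.normal(size=n)
attention = 0.5 * momentum + 0.8 * hist_vol + own_info
abs_ret = np.abs(0.3 * own_info + rng.normal(size=n))

resid = orthogonalize(attention, momentum, hist_vol)
rho_raw = spearman(attention, abs_ret)
rho_resid = spearman(resid, abs_ret)
```

By construction the residual is uncorrelated with both regressors, so whatever rank correlation with |return| remains cannot be attributed to momentum or historical volatility in the linear sense.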

five attempts that didn't work

The paper's strategy iteration section is the most honest part of it, and I want to preserve that here.

Attempt 1: ATM straddles on spike days. The reasoning: with GARCH t = 6.00, our signal clearly predicts vol better than GARCH, so buy straddles when vol is about to spike. The result: Sharpe ≈ −0.6. The problem: implied volatility is not GARCH. The options market already prices in that high-attention days are volatile. Our signal beats GARCH but we never tested whether it beats IV. The variance risk premium ate the entire edge.

Attempt 2: delta-hedged version. Buy calls, short stock, isolate gamma P&L. Same problem compounded by bid-ask spreads and discrete hedging error. Uncapped drawdown hit 41% before we stopped it.

Attempt 3: attention overlay on 12-1 momentum. Boost weight +8% when a long has a spike. Underperformed pure momentum by 0.3 Sharpe points. The spike boost adds variance (spikes coincide with high-vol days) without adding return, because we hadn't yet run the panel regression showing the attn×trend interaction only fires in uptrend regimes.

Attempt 4: market-wide risk alarm. When multiple tickers spiked simultaneously, reduce gross exposure and hedge with short SPY/QQQ. Drawdown 25%. The hedge bled money through 2018–2021's bull market. With 40 tickers at a 3.65% spike rate, expected ≈ 1.5 spikes per day, so the alarm was almost always on.

Attempt 5: event-driven, entry only on spike. Enter when a spike fires with trend and momentum filters, size by inverse vol. Sharpe ≈ −0.5. First-day spikes after filtering are only ≈ 0.9 per day across 40 tickers, too rare to support an event-driven architecture while running a continuous index hedge.

Each failure pointed to the same conclusion: momentum supplies continuous direction; attention modulates position size and manages risk. Not the other way around.

the final strategy

Every 7 trading days: rank all 40 tickers by 12-1 momentum (skipping the most recent 21 days), go long the top 3 that pass the 200-day MA filter, and short the bottom 2 that trade below their 50-day MA. Then apply the attention overlay:

Spike boost: if a long has an active spike (z ≥ 3.0), add +8% to weight. Justified by the attn×trend interaction (p = 2.7×10⁻⁷).
Short reduction: if a short has z ≥ 2.5, halve its weight. Justified by the 1.92× vol ratio; don't hold shorts in elevated-vol regimes.
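The two overlay rules can be sketched as a function over signed target weights. The base weights, the short tickers, and the z-scores below are hypothetical, and reading "+8%" as eight percentage points added to the weight is an assumption. Position caps are not applied here.

```python
def apply_attention_overlay(weights, z_comp, spike_z=3.0, high_z=2.5, boost=0.08):
    """Adjust signed weights (longs > 0, shorts < 0) using composite z-scores."""
    adjusted = {}
    for ticker, w in weights.items():
        z = z_comp.get(ticker, 0.0)  # no attention data: treat as neutral
        if w > 0 and z >= spike_z:
            w += boost               # spike boost on longs
        elif w < 0 and z >= high_z:
            w *= 0.5                 # halve shorts when attention is elevated
        adjusted[ticker] = w
    return adjusted

# Hypothetical base weights: three longs and two shorts.
base = {"NVDA": 0.09, "AMD": 0.09, "GOOG": 0.09, "SHORT_A": -0.06, "SHORT_B": -0.06}
z_today = {"NVDA": 3.4, "AMD": 1.0, "SHORT_A": 2.7, "SHORT_B": 0.3}
target = apply_attention_overlay(base, z_today)
```

In this toy case NVDA is boosted to about 0.17, SHORT_A is cut to −0.03, and the other three positions are unchanged.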

Caps: 20% per name, 27% long gross, 12% short gross, 25% net. Daily loss stop at −2%; drawdown halt at −7% from peak, resume at −3.5%.
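The drawdown halt is a hysteresis rule: it trips at −7% from the running peak and only clears once drawdown recovers to −3.5%. A pure-Python sketch follows. Which equity series feeds the rule while trading is halted is not stated above, so the function simply takes a series as given; the function name and the toy equity curve are hypothetical.

```python
def halt_states(equity, halt_dd=-0.07, resume_dd=-0.035):
    """Return one boolean per observation: True while trading is halted."""
    peak = equity[0]
    halted = False
    states = []
    for value in equity:
        peak = max(peak, value)
        drawdown = value / peak - 1.0
        if not halted and drawdown <= halt_dd:
            halted = True    # breached the halt threshold
        elif halted and drawdown >= resume_dd:
            halted = False   # recovered past the resume threshold
        states.append(halted)
    return states

# Toy equity curve: falls 7.5% from its peak, then recovers to within 3% of it.
equity = [100.0, 95.0, 92.5, 94.0, 97.0, 96.0]
states = halt_states(equity)
```

The gap between the two thresholds keeps the strategy from flipping on and off when drawdown hovers near −7%: at −6% (the fourth point) it stays halted, and it resumes only at −3%.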

backtest results

Metric	IS (2018–22)	OOS-A (May–Nov 23)	OOS-B (Jan–Jul 25)	COVID (Feb–May 20)
Sharpe Ratio	0.556	1.826	0.784	1.036
Annual Return	8.3%	31.7%	20.0%	20.3%
Max Drawdown	9.9%	4.8%	10.3%	7.0%
Sortino Ratio	0.608	2.402	0.890	1.342
Win Rate	63%	75%	71%	61%

OOS-A and OOS-B Sharpe ratios both exceed IS. That's the thing that matters most here: no sign of overfitting. The COVID period (SPY −34% peak-to-trough) produced a Sharpe of 1.036 with only 7.0% drawdown; the drawdown halt at −7% caught it.

All parameters were frozen before any OOS backtest and are identical across all periods. The full frozen parameter table is in the paper.

paper trading (7 days)

Alicia and Jerry deployed the strategy live on QuantConnect on April 21, 2026. The paper trading instance ran on a 12-ticker universe (slightly different from the 40-ticker backtest parameters) and held NVDA, AMD, and GOOG. Over 6 trading days it returned +1.54% against a starting equity of $10M with max drawdown under 2%. Too short to be statistically meaningful, but consistent with the backtest profile.

what didn't work and what's honest

The data limitations are real. Wikipedia visitors are not all investors; a Tesla pageview spike might be students or journalists, not traders. Reddit's 17.7% ticker tag rate means 82.3% of posts are untagged and excluded. No intraday data, no bid-ask spreads, no vol surface. We can't run a 2008 stress test because the data starts in 2018.

What the evidence supports: attention Granger-causes volatility, high-attention days are nearly 2× more volatile, the signal survives orthogonalization, and factor alpha is significant. What it doesn't support: using attention as a standalone directional signal. It needs momentum and a trend filter to do anything useful directionally.

The full paper with all nine tests, the strategy iteration history, and the extended stochastic volatility framework is available on request.