Full Documentation

METHODOLOGY

APEX is designed to be auditable. Every weight is published, every limitation is documented, and every structural decision is justified below. This is not a black box — it is a reasoned framework built on peer-reviewed research and deliberate trade-offs.

Design Philosophy
01
Composite Over Single-Metric
No single statistic captures player value. APEX deliberately blends five independent estimator families rather than defaulting to whichever metric is currently fashionable. Each pillar acts as a check on the others.
02
Era-Neutral via Z-Scoring
Pace, shot selection, and defensive schemes change every decade. APEX normalizes every metric against contemporaries in the same season pool, not fixed historical baselines. A great player in 2003 and a great player in 2025 are each measured against their own era.
03
Transparent Weighting
Weights are published and justified. Where peer-reviewed research supports a specific allocation, we use it. Where expert consensus shapes a decision, we document that too. Opacity is not precision — it is just opacity.
04
Deliberate Scope
APEX measures regular season individual value for qualified players. It does not measure playoff performance, career value, leadership, or anything not captured in publicly available data. Knowing what you don't measure matters as much as knowing what you do.
What APEX Measures
| APEX Measures | APEX Does NOT Measure |
| --- | --- |
| Regular season individual value | Playoff performance |
| Current season (era-neutral) | Career or historical value |
| On-court impact across five pillars | Leadership, culture, intangibles |
| Qualified players (≥1,000 minutes, ≥20 GP) | Two-way, 10-day, or G-League players |
| Publicly available metric data | Tracking-based shot quality, off-ball positioning |
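The qualification rule can be expressed as a simple filter. The sketch below is illustrative only: the `PlayerSeason` record, its field names, and the sample players are assumptions, not APEX's actual schema. Only the two thresholds (≥1,000 minutes, ≥20 GP) come from the text, and contract-type exclusions (two-way, 10-day, G-League) are not modeled.

```python
from dataclasses import dataclass

MIN_MINUTES = 1000  # threshold stated in the scope table
MIN_GAMES = 20      # threshold stated in the scope table


@dataclass
class PlayerSeason:
    """Hypothetical record for one player-season (field names are assumptions)."""
    name: str
    minutes: int
    games_played: int


def is_qualified(p: PlayerSeason) -> bool:
    """A player qualifies only when BOTH thresholds are met."""
    return p.minutes >= MIN_MINUTES and p.games_played >= MIN_GAMES


# Made-up sample pool, used only to exercise the filter.
pool = [
    PlayerSeason("Player A", 2400, 70),  # meets both thresholds
    PlayerSeason("Player B", 950, 40),   # too few minutes
    PlayerSeason("Player C", 1100, 19),  # too few games
]
qualified = [p.name for p in pool if is_qualified(p)]
```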
Scored Metrics — APEX V4.2
Metric Detail
■  OFFENSIVE IMPACT — 32%
O-EPM
~13% overall
40% of pillar
What It Measures

Offensive Estimated Plus/Minus from Dunks & Threes (off column). The offensive-side component of EPM — a RAPM-family estimator isolating individual offensive win-probability contribution from teammates and opponents. V3.0 uses the offensive component only to avoid encoding defensive signal in the Offensive Impact pillar. Deshpande & Jensen (2016) identify RAPM-family metrics as more predictive than box-score alternatives at equivalent sample sizes.

Known Limitations

Full methodology is unpublished — no external peer review. Single-season estimates carry wider confidence intervals for players in system transitions or limited high-leverage minutes. The regression-to-prior can suppress historically extreme peaks. Combined with O-LEBRON and OBPM to reduce dependence on any single unpublished methodology.

O-LEBRON
~11% overall
35% of pillar
What It Measures

Offensive component of LEBRON (BBall-Index). Measures offensive win-probability contribution with explicit luck adjustment for single-season shooting variance. V3.0 uses the offensive-side component only — prior versions used full two-way LEBRON, which encodes defensive signal and caused double-counting against the Defensive Impact pillar. Franks et al. (2016) establish the theoretical basis for luck-adjusted on/off estimation.

Known Limitations

Despite luck adjustment, partially team-contaminated for primary offensive options in elite systems. The RAPM component depends on lineup frequency; infrequent lineup combinations produce estimates with higher uncertainty. Replacing full LEBRON with O-LEBRON removes the defensive channel but also narrows the signal to offensive contexts only.

OBPM
~8% overall
25% of pillar
What It Measures

Offensive Box Plus/Minus (Basketball-Reference). Offensive-side estimate of individual contribution per 100 possessions using box score inputs only. The most historically available offensive metric, with data from 1973-74 onward — essential for the 40-season backtest. V3.0 uses OBPM rather than full BPM to keep the Offensive Impact pillar free of defensive signal. Jewell et al. (JQAS) identify BPM as the box-score metric most robustly significant at high minutes loads; OBPM inherits this reproducibility advantage.

Known Limitations

Box-score construction misses off-ball offensive contributions: spacing, screening, and movement without the ball. Does not adjust for teammate or opponent quality. Switching from full BPM to OBPM removes the defensive-side signal but reduces the pillar's ability to capture two-way contributors through this metric alone — the design intent, since Defensive Impact carries that channel separately.

■  SHOT QUALITY — 14%
Rel TS+
~8% overall
60% of pillar
What It Measures

Relative True Shooting Plus (BBall-Index). TS% relative to league average, adjusted for shot quality — location, contest level, and shot type — relative to the player's own shot distribution. A Rel TS+ of +5 means the player scores 5 points per 100 attempts above what an average player would score on the same shot diet. Raw TS% was dropped in v3.0 because it is already encoded inside O-EPM, O-LEBRON, and OBPM; Rel TS+ is retained at 60% because its difficulty-adjustment provides signal not captured in those composite inputs.

Known Limitations

Proprietary adjustment methodology using discrete location and contest categories rather than continuous tracking data; within-category variance is treated as noise. Cannot fully resolve the creativity trade-off: a player who manufactures a high-quality contested mid-range off a screen and one who catches and shoots from the same spot receive similar credit, despite different generative difficulty.

USG% (interaction)
~6% overall
40% of pillar
What It Measures

Usage Rate interaction modifier (v1.6). Applied as a within-pillar multiplier: penalizes Shot Quality scores for high-volume seasons with below-average efficiency, and rewards efficient high-usage seasons. Addresses volume-scorer inflation — a player using 35%+ of possessions at below-average efficiency should not receive the same Shot Quality score as an equally efficient player at lower volume. Computed via linear regression of TS% on USG% within each K-means offensive archetype group. Raised from 10% to 40% pillar weight in v3.0 as TS% was dropped and USG interaction now serves as the second primary signal.

Known Limitations

Modifier thresholds derived from backtest optimization, not external peer review. Does not distinguish forced high usage (team's sole offensive option) from self-selected volume — both receive equivalent adjustments at the same efficiency level. Edge cases near the penalty threshold may introduce minor discontinuities in scoring.
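The regression step behind the USG% interaction can be sketched as follows. This is a minimal illustration, assuming an ordinary least-squares fit of TS% on USG% inside one archetype group; the function names and the sample values are hypothetical. The text does not specify how the resulting residual is converted into the within-pillar multiplier, so that step is not attempted here.

```python
from statistics import fmean


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y on x."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def efficiency_residuals(usg, ts):
    """TS% minus the TS% the group's fitted line predicts at that USG%."""
    slope, intercept = fit_line(usg, ts)
    return [t - (slope * u + intercept) for u, t in zip(usg, ts)]


# Hypothetical archetype group (made-up USG% and TS% values).
usg = [18.0, 22.0, 26.0, 30.0, 35.0]
ts = [58.0, 57.0, 59.5, 55.0, 54.0]
residuals = efficiency_residuals(usg, ts)
```

A positive residual marks a season that is more efficient than the group's usage trend predicts; a negative residual marks the high-volume, below-trend case the modifier is meant to penalize.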

■  DEFENSIVE IMPACT — 28%
D-LEBRON
~13% overall
47% of pillar
What It Measures

Defensive component of LEBRON (BBall-Index). Regression-adjusted and luck-adjusted — explicitly accounts for single-season opponent shooting variance that inflates raw on/off defensive metrics. Primary metric at 47% within the Defense pillar: its luck-adjustment framework handles team-context noise more reliably than raw RAPM components in high-minute, single-season samples (Franks et al., 2016). Normalized within BBall-Index Defensive Role peer groups to compare defenders against players in structurally similar assignments.

Known Limitations

Partially team-contaminated despite luck adjustment — anchor bigs in strong defensive schemes can record inflated D-LEBRON through the team-defense halo effect. Does not capture deterrence: altered shots, off-ball positioning, and perimeter disruption are invisible to on/off methods. Byman (2023) demonstrates off-ball movement features dominate predictive defensive RAPM models (R²=0.848) — a dimension no current on/off estimator can access.

D-EPM
~12% overall
43% of pillar
What It Measures

Defensive component of EPM (Dunks & Threes). RAPM-family estimator using regularized adjusted plus/minus to isolate individual defensive contribution from team context. Ranked as the most-trusted public defensive metric by EPM/RPM co-creator Steve Ilardi and first in a HoopsHype survey of approximately 30 NBA front-office executives (2024). Correlated with D-LEBRON (Pearson r ≈ 0.65) but captures independent variance — each metric identifies defenders the other systematically misses (Terner & Franks, 2021). Raised to 43% pillar weight in v2.6 to reflect growing external validation.

Known Limitations

RAPM estimates require large samples to stabilize; single-season figures carry meaningful uncertainty for players in role or system transitions. Does not capture deterrence or off-ball positioning, the same structural gap as D-LEBRON. Full methodology is unpublished, the same transparency limitation as O-EPM in the Offensive Impact pillar. Correlation with D-LEBRON creates partial redundancy, though the independent variance each captures justifies retaining both.

DBPM
~3% overall
10% of pillar
What It Measures

Defensive Box Plus/Minus (Basketball-Reference). Box-score estimate of individual defensive impact per 100 possessions. Retained at minimal weight (10% within Defense pillar) solely as a Bayesian stabilizer — prevents D-LEBRON and D-EPM from overreacting to single-season lineup outliers in small samples. Franks, D'Amour, Cervone & Bornn (JQAS, 2016) establish that box-score inputs fail to predict individual defensive RAPM at population level, which is why it is excluded from higher weighting.

Known Limitations

Systematically biased against players assigned the hardest defensive matchups — guarding elite scorers produces worse DBPM through proximity effects, not worse defense. Jewell, Page & Reese (JQAS) confirm box-score defensive counting stats show null predictive significance at the high minutes loads that characterize the APEX qualified pool. Used only as a stabilizer; never a meaningful signal in isolation at this player tier.

■  CREATION & PLAYMAKING — 16%
AST%
~7% overall
45% of pillar
What It Measures

Assist Percentage — estimated percentage of teammate field goals a player assisted while on the floor. The most direct box-score proxy for primary creation volume. Retained in v1.7 after backtest confirmed net contribution: removing AST% over 40 seasons produced a net loss of 1 correct MVP prediction. Per Deshpande & Jensen (2016), AST% provides complementary signal to RAPM-based estimates in separating playmaking impact from scoring impact — a distinction the offensive composites partially conflate.

Known Limitations

Does not distinguish creation quality — a simple two-man game dump-off and a contested skip pass for a corner three count identically. Structurally rewards players in ball-movement systems over equally skilled creators in isolation-heavy offenses, where fewer touch opportunities reduce assist volume. Within-archetype normalization partially corrects for system context but does not resolve stylistic asymmetries in assist accumulation.

AST/TOV
~5% overall
30% of pillar
What It Measures

Assist-to-Turnover ratio — assists generated per turnover committed. Captures decision-making efficiency alongside creation volume: a player with 8 assists and 4 turnovers (2.0) is scored below one with 6 assists and 1.5 turnovers (4.0) despite higher raw volume. Separates efficient creators from high-volume ball-handlers who inflate AST% at the cost of ball security. Combined with invTOV, provides a two-dimensional view of playmaking discipline within the pillar.

Known Limitations

Structurally favors non-primary handlers and conservative passers over high-volume creators in fast-paced offensive systems. A lead point guard with full-possession responsibility faces inherently more turnover pressure per touch than a secondary playmaker with limited ball-handling duties — the ratio format amplifies this asymmetry. Within-archetype normalization mitigates but does not fully resolve this structural difference.

invTOV
~2% overall
15% of pillar
What It Measures

Inverted Turnover Percentage — TOV% inverted so that higher values indicate better ball security. Applied as a penalty signal: players with disproportionate turnover rates relative to their archetype peer group receive a lower pillar contribution. Complements AST/TOV by penalizing absolute turnover rate, preventing high-AST% players with heavy turnover loads from being fully rewarded on volume alone. Weight reduced from 20% (v2.x) to 15% in v3.0.

Known Limitations

Highly correlated with usage rate by construction — primary ball-handlers face the most turnover exposure per possession as a structural feature of their role, not purely a decision-making signal. High-USG playmakers are penalized more steeply than spot-up shooters even at equal decision-making skill levels. Within-archetype normalization is the primary mitigation.
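The two ball-security signals can be illustrated with a few lines of arithmetic. This is a sketch under stated assumptions: the text says TOV% is "inverted" without giving the exact transform, so simple negation is assumed here, and population standard deviation is assumed for the peer-group z-score. The sample group is made up.

```python
from statistics import fmean, pstdev


def ast_to_tov(assists: float, turnovers: float) -> float:
    """Assists generated per turnover committed."""
    return assists / turnovers


def inverted_tov_z(tov_pcts):
    """Negate TOV% (assumed inversion) so higher is better, then z-score in-group."""
    inverted = [-t for t in tov_pcts]
    mean, sd = fmean(inverted), pstdev(inverted)
    return [(v - mean) / sd for v in inverted]


# The two players from the AST/TOV description above.
high_volume = ast_to_tov(8, 4)     # 2.0
efficient = ast_to_tov(6, 1.5)     # 4.0

# Hypothetical archetype group of three TOV% values.
z = inverted_tov_z([9.0, 12.0, 15.0])
```

The lowest TOV% in the group receives the highest inverted z-score, which is the direction the penalty signal needs.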

FTA/FGA
~2% overall
10% of pillar
What It Measures

Free throw attempt rate — free throw attempts per field goal attempt. Moved to Creation & Playmaking in v3.0: foul-drawing is a shot-creation and aggression proxy, not a shooting-efficiency metric (which is how it was classified in v2.x). A stable, repeatable individual skill tied to driving aggression and body control. Kubatko et al. (2007) confirm FTA rate as a persistent offensive differentiator across seasons — players who draw fouls at a high rate are generating offensive creation opportunities independent of field goal results.

Known Limitations

Partially reflects playing style rather than pure creation quality. Free throws drawn on three-point foul-baiting attempts inflate FTA/FGA without representing genuine drive-based creation. High FTA/FGA combined with below-average FT% reduces net scoring value from those possessions; this interaction is captured upstream by the O-EPM and O-LEBRON composites but not within this pillar directly.

■  PHYSICAL CONTRIBUTION — 10%
REB%
~10% overall
100% of pillar
What It Measures

Total Rebound Percentage — estimated percentage of available rebounds (offensive + defensive) secured while on the floor. The sole Physical Contribution metric following the v1.7 audit that removed STL% and BLK% on Jewell et al. (JQAS) null-significance grounds. Pillar weight raised from 8% to 10% in v3.0. Per Franks et al. (2016), total rebounding is among the most position-independent predictors of possession gain at the individual level. Above-average REB% at guard and wing positions represents genuine physical versatility — contested board capacity that translates across positional assignments.

Known Limitations

Structurally advantages centers and power forwards with greater rebound proximity by design. K-means archetype normalization significantly mitigates this by grouping rim-dominant bigs together, measuring their elite REB% against each other rather than against perimeter players. Within-cluster position variance means the bias is reduced, not eliminated. True positional versatility — switchability, transition contributions, multi-assignment defensive coverage — is not captured by REB% alone and remains a V3 scope item.

WS/48 and VORP appear in player cards as reference context but are not scored — excluded to avoid double-counting within the BPM metric family. Weights are approximate; exact pillar values are published in The Model.
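The "~% overall" figures quoted for each metric follow from multiplying the pillar weight by the metric's share of its pillar. The sketch below recomputes them from the percentages listed above; the dictionary layout and variable names are illustrative assumptions.

```python
# Pillar weight (percent of model) and each metric's share of its pillar (percent),
# copied from the metric listings above.
PILLARS = {
    "Offensive Impact": (32, {"O-EPM": 40, "O-LEBRON": 35, "OBPM": 25}),
    "Shot Quality": (14, {"Rel TS+": 60, "USG% (interaction)": 40}),
    "Defensive Impact": (28, {"D-LEBRON": 47, "D-EPM": 43, "DBPM": 10}),
    "Creation & Playmaking": (
        16,
        {"AST%": 45, "AST/TOV": 30, "invTOV": 15, "FTA/FGA": 10},
    ),
    "Physical Contribution": (10, {"REB%": 100}),
}

# Overall weight of a metric = pillar weight x share of pillar / 100.
overall = {
    metric: pillar_weight * share / 100
    for pillar_weight, metrics in PILLARS.values()
    for metric, share in metrics.items()
}

total_pillar_weight = sum(w for w, _ in PILLARS.values())
```

For example, O-EPM works out to 32 × 40 / 100 = 12.8, which the listing rounds to ~13%.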

Removed Metrics

Five metrics were removed from active scoring across versions v1.6–v2.0. Each removal is evidence-based: the metric failed to demonstrate predictive signal at the qualified-player tier, introduced more noise than information, or conflicted with a more robust alternative. They are documented below for transparency and reproducibility.

Metric Detail
DWS
Removed v1.7
Was in Defense
What It Measured

Defensive Win Shares (Basketball-Reference). Distributed team defensive efficiency to individual players based on playing time and box-score inputs — blocks, steals, defensive rebounds — using the Dean Oliver win-shares framework.

Why Removed

Jewell, Page & Reese (JQAS) demonstrate null predictive significance for box-score defensive counting stats at high minutes loads. DWS inherits this limitation — it distributes team defensive credit via the same null-signal metrics, meaning it primarily reflects assignment opportunity and team quality rather than individual defensive impact. D-LEBRON's luck-adjusted on/off framework is a more direct individual signal.

STL%
Removed v1.7
Was in Defense
What It Measured

Steal Percentage — percentage of opponent possessions ending in a player steal while on the floor. Intended to proxy for perimeter defensive activity and anticipation.

Why Removed

Jewell et al. (JQAS) find null significance at the high minutes threshold where APEX operates. Steals are strongly influenced by defensive scheme, opponent ball-handling quality, and assignment — a help-side defender positioned in passing lanes records more steals than an equal-quality on-ball defender assigned to the screener. The metric primarily tracks opportunity and scheme, not individual defensive skill. Additionally, steals carry a non-trivial luck component (tempo, opponent decision-making) that cannot be adjusted at the box-score level.

BLK%
Removed v1.7
Was in Defense
What It Measured

Block Percentage — percentage of opponent two-point field goal attempts blocked while on the floor. Intended to capture rim protection and deterrence contribution.

Why Removed

Jewell et al. (JQAS) — same null-significance finding. More critically, blocks are a proxy for shot alteration, not a measure of it. Byman (2023) demonstrates that off-ball movement features dominate predictive individual defensive RAPM models at R²=0.848, while block counts have negligible marginal contribution. Retaining BLK% created a partial double-count with D-LEBRON for elite rim protectors while failing to capture the actual deterrence effect (altered-but-not-blocked shots) that represents the primary defensive value of elite rim protectors.

DefOnOff
Removed v1.7
Was in Defense
What It Measured

Defensive On/Off differential — team defensive rating when the player is on the floor minus when off. Used at 85–90% of the Defense pillar weight in v1.6–v1.7 before being superseded by D-LEBRON.

Why Removed

Severely team-contaminated without adjustment: a dominant defender on a team with strong second-unit defenders records a flat or negative DefOnOff; a mediocre defender surrounded by weak backups records a strong positive. Franks et al. (2016) identify this as the fundamental problem with raw on/off metrics. D-LEBRON's luck adjustment was specifically designed to address this; once D-LEBRON was integrated and validated, raw DefOnOff was removed from scoring and retained only as player-card context.

DARKO DPM
Removed v2.0
Was in Defense / Impact
What It Measured

Daily Adjusted and Regressed Kalman Optimized projections (Sill, 2010 framework). A Bayesian Kalman-filter metric designed to stabilize player estimates rapidly and project future performance while accounting for lineup context.

Why Removed

The DARKO creator explicitly notes the metric "cannot meaningfully be used in backwards-looking MVP debates" — its Kalman filter pulls outlier peaks toward the long-run prior, compressing the very multi-sigma seasons that APEX is designed to identify. A dominant 2021-22 Jokić season or a 2024-25 SGA peak is exactly the kind of outlier DARKO suppresses by design. The metric's primary purpose is projection and rapid stabilization for in-season use, not retrospective season assessment. EPM (RAPM-family, no Kalman suppression) serves the Impact pillar's retrospective evaluation goal more accurately.

Scoring Formula
apex_score_v3.0.pseudo
// Qualified pool: players with ≥1,000 minutes AND ≥20 games in the current season

// Step 1 — Collect raw metrics from Basketball-Reference, BBall-Index, DunksAndThrees
metrics = collect_all_sources()

// Step 2 — Z-score normalization
//   Offensive Impact:              full qualified pool (O-EPM, O-LEBRON, OBPM)
//   Shot Quality, Creation & Playmaking, Physical Contribution: within K-means offensive archetype (8 clusters)
//   Defensive Impact:              within BBall-Index defensive role group
z(metric) = (player_value - peer_group_mean) / peer_group_std_dev

// Step 3 — Clamp to ±3.5 to limit outlier distortion
z_clamped = clamp(z, -3.5, +3.5)

// Step 4 — Scale to 0–100
scaled(metric) = (z_clamped + 3.5) / 7.0 × 100

// Step 5 — Compute each pillar score as a weighted average of its metrics
pillar_score = (Σ (scaled(metric_i) × metric_weight_i) / 100) × pillar_weight  // metric weights are within-pillar percentages

// Step 6 — Apply GP^0.75 availability modifier (not a binary cutoff)
availability = min(1.0, (games_played / 65) ^ 0.75)  // full credit at 65+ games

// Step 7 — Final APEX Score
APEX_score = Σ(pillar_scores) × availability

// Context adjustments applied as modifiers (not scored independently):
//   · K-means archetype normalization — v3.0; 8 clusters on 6 style markers
//   · NOI/CV modifier                — ±5% cap on Scoring pillar (v1.6, empirical validation pending)

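Z-scores measure how far above or below average a player is in each metric, expressed in standard deviations. A z-score of +2 means the player is two standard deviations above the peer-group average, which corresponds to roughly the top 2% of the peer group if the metric is approximately normally distributed. The ±3.5 clamp limits how much one historically extreme season can distort the scale: any value beyond ±3.5 standard deviations is scored at the 0 or 100 endpoint. Because the clamp is applied after the mean and standard deviation are computed, it does not remove the pool-level effect described under Small-Pool Z-Score Distortion.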
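Steps 2 through 5 of the pseudocode can be sketched in Python as below. This is an illustration, not the production implementation: the function names are hypothetical, population standard deviation is an assumption (the text does not say which estimator is used), `pillar_weight` is assumed to be a fraction such as 0.32, and a peer group with zero variance is not handled.

```python
from statistics import fmean, pstdev

CLAMP = 3.5  # Step 3 bound from the pseudocode


def scaled_scores(values):
    """Steps 2-4: z-score within one peer group, clamp, map to 0-100."""
    mean = fmean(values)
    sd = pstdev(values)  # population std dev: an assumption
    scaled = []
    for v in values:
        z = (v - mean) / sd
        z = max(-CLAMP, min(CLAMP, z))
        scaled.append((z + CLAMP) / (2 * CLAMP) * 100)
    return scaled


def pillar_score(scaled_by_metric, metric_weights_pct, pillar_weight):
    """Step 5: weighted average of a pillar's metrics, times the pillar weight."""
    weighted_avg = sum(
        scaled_by_metric[m] * w for m, w in metric_weights_pct.items()
    ) / 100
    return weighted_avg * pillar_weight


# Hypothetical peer group of three raw values for one metric.
example_scaled = scaled_scores([1.0, 2.0, 3.0])

# A player at exactly z = +2 maps to (2 + 3.5) / 7 * 100 on the 0-100 scale.
z2_scaled = (2 + CLAMP) / (2 * CLAMP) * 100

# Offensive Impact for a player who is exactly average (50) on all three metrics.
offense = pillar_score(
    {"O-EPM": 50.0, "O-LEBRON": 50.0, "OBPM": 50.0},
    {"O-EPM": 40, "O-LEBRON": 35, "OBPM": 25},
    pillar_weight=0.32,
)
```

Under these assumptions, an exactly average player contributes 50 × 0.32 = 16 points from Offensive Impact, and a player who is average on every metric would total 50 before the availability modifier.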

GP^0.75 Availability Modifier
| Games Played | Modifier |
| --- | --- |
| 65+ (full credit) | 1.000 |
| 58 | 0.918 |
| 50 | 0.821 |
| 40 | 0.695 |
| 30 | 0.560 |
| 20 | 0.413 |
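The modifier values in the table can be reproduced directly from the Step 6 formula. The sketch below does that and applies Step 7 to a set of made-up pillar scores; the function names and the example pillar scores are hypothetical.

```python
FULL_CREDIT_GAMES = 65  # games needed for a modifier of 1.0 (from Step 6)


def availability(games_played: int) -> float:
    """Step 6: (GP / 65) ^ 0.75, capped at 1.0."""
    return min(1.0, (games_played / FULL_CREDIT_GAMES) ** 0.75)


def apex_score(pillar_scores, games_played: int) -> float:
    """Step 7: sum of pillar scores scaled by the availability modifier."""
    return sum(pillar_scores) * availability(games_played)


# Recompute the published table, rounded to three decimals.
modifier_table = {
    gp: round(availability(gp), 3) for gp in (82, 65, 58, 50, 40, 30, 20)
}

# Hypothetical pillar scores summing to 60.0, for a 50-game season.
example = apex_score([20.0, 8.0, 16.0, 10.0, 6.0], 50)
```

Because the exponent is below 1, the penalty grows more slowly than the share of games missed: a 50-game season (about 77% of 65) keeps about 82% of the score.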
Known Limitations
  1. Defense Is the Model's Weakest Point
    Box score defensive stats (STL%, BLK%) are noisy and gameable. DefOnOff is team-contaminated — it reflects lineup quality as much as individual impact. Off-ball movement data, shown to dominate OLS coefficients in tracking studies (Byman, 2023), is behind a paid API. APEX almost certainly undervalues elite off-ball defenders.
  2. Team Context Contamination
    On/off metrics conflate player quality and lineup quality. A dominant player on a bad team is systematically undervalued; a role player on a great team is overvalued. This is a structural problem shared by all public impact estimators — not unique to APEX.
  3. Small-Pool Z-Score Distortion
    With ~150–180 qualified players, one historically extreme season can shift the pool mean and widen the standard deviation, compressing or expanding scores for everyone else. Early-season scores are directional, not definitive.
  4. Availability Modifier Is Agnostic to Reason
    A player rested for load management is penalized identically to one who suffered a season-ending injury. Injury-adjusted availability requires injury report data not currently integrated. This will be addressed in a future version.
  5. Defensive Deterrence Is Invisible
    BLK% captures blocks — not altered shots. A rim protector who changes 15 shots per game and blocks 4 is scored on the 4. The deterrence value is real, substantial, and currently unscored in any public metric.
  6. Era-Neutral with Caveats
    Z-scoring handles pace and efficiency shifts well, but structural changes — the three-point revolution, positional fluidity, load management — create subtle normalization issues that compound with historical distance. Cross-era comparisons should be treated cautiously.
  7. NOI/CV Modifier Not Empirically Validated
    The ±5% cap on the NOI/CV consistency modifier (v1.6) is a containment measure, not a confidence expression. Linear weights are theoretically derived, not regression-fitted. The cap will be revised only after empirical calibration against historical data.
  8. ~~Rim-Running Bigs Overrated Under Full-Pool Z-Scoring~~ (Resolved in V2.3)
    Addressed in v2.3 via K-means offensive archetype normalization (8 clusters, Brill et al. 2023). Rim runners (Gobert, Allen, Duren, Kessler) now form their own peer group — their elite TS% and rebounding are measured against each other, compressing within-group variance rather than inflating scores against guards and wings. Remaining open limitation: the Defense pillar still uses BBall-Index defRole groups, so rim deterrence and assignment-quality gaps are deferred to V3.
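
The small-pool distortion described in the list above can be demonstrated with a toy example. The values are made up and the pool is far smaller than the real one, so the effect is exaggerated; population standard deviation is an assumption.

```python
from statistics import fmean, pstdev


def z_scores(values):
    """Z-score every value against the mean and std dev of the same pool."""
    mean, sd = fmean(values), pstdev(values)
    return [(v - mean) / sd for v in values]


# Hypothetical raw metric values for a six-player pool.
pool = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]

# The same pool after one historically extreme season is added.
with_outlier = pool + [12.0]

z_before = z_scores(pool)[5]          # the 3.0 player, original pool
z_after = z_scores(with_outlier)[5]   # the same player, outlier included
```

In this toy pool the player's z-score falls from about 1.46 to about 0.20 although the underlying value has not changed, because the outlier raises the mean and widens the standard deviation.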
Academic Grounding

Each key model decision is grounded in peer-reviewed research. The decision line states what APEX specifically did — or chose not to do — based on each finding.

Franks, D'Amour, Cervone & Bornn (2016) — JQAS Vol. 12(4)
Three meta-criteria for useful composite metrics: stability, discrimination, and independence. RAPM-family metrics score highest on all three. On/off metrics are discriminative but lower-stability.
→ Five-pillar architecture built to satisfy the independence criterion — each pillar captures genuinely distinct variance. DBPM retained at 10% of Defense solely as a Bayesian stabilizer per their framework, not as a signal metric.
Terner & Franks (2021) — Annual Review of Statistics and Its Application, Vol. 8
Most comprehensive peer-reviewed basketball analytics survey. DRtg and DWS are "particularly sensitive to teammate performance and thus not reliable measures of individual" defense. RAPM-family metrics validated as the superior individual value framework.
→ DWS removed from scoring in v1.7. Multi-estimator blending (D-LEBRON + D-EPM + DBPM) adopted over any single defensive metric. This is the primary citation when challenged on RAPM-family metric centrality.
Deshpande & Jensen (2016) — JQAS Vol. 12(2)
Box-score metrics fail to account for game context. RAPM family is the superior individual value framework at equivalent sample sizes.
→ Offensive Impact pillar anchored to RAPM-family estimators (O-EPM, O-LEBRON, OBPM) at 32% of the model. PER excluded entirely — it is a pure box-score composite with no context adjustment.
Jewell, Page & Reese — JQAS
BPM is the most robustly significant box-score metric at high minutes loads. STL%, BLK%, and AST% show null predictive significance at the same tier.
→ STL% and BLK% removed from Physical Contribution in v1.7. OBPM/DBPM retained because they pass significance at high minutes. AST% kept despite the null finding because backtest confirmed net +1 correct MVP prediction over 40 seasons.
Byman (2023)
Off-ball movement features dominate OLS coefficients in defensive RAPM prediction models (R²=0.848). Block counts have negligible marginal contribution to predictive defensive RAPM.
→ BLK% excluded from scoring and documented as the formal basis for why no current box-score defensive metric captures individual defensive value. Identified as the key unresolved limitation of the Defense pillar.
HoopsHype survey of ~30 NBA executives (2024) + Steve Ilardi (EPM/RPM co-creator)
D-EPM ranked #1 most-trusted public defensive metric among front-office respondents. Independently endorsed as "the obvious gold standard" by the EPM co-creator.
→ D-EPM added to Defense pillar in v2.5 at 35%, raised to 43% in v2.6 as external validation accumulated. D-LEBRON retained as primary (47%) for its luck-adjustment framework.
Kubatko et al. (2007) — Journal of Quantitative Analysis in Sports, Vol. 3(1)
TS% is the most predictive single-metric proxy for offensive scoring efficiency. FTA rate is a persistent, stable individual offensive differentiator across seasons.
→ TS% and FTA/FGA included from v1.0. Raw TS% later removed in v3.0 (already encoded inside O-EPM/O-LEBRON/OBPM); Relative TS+ retained as the difficulty-adjusted variant providing independent signal. FTA/FGA moved from Shot Quality to Creation & Playmaking in v3.0 as the correct conceptual home.
Brill et al. (2023)
K=8 functional archetypes via K-means clustering on 48 player variables produce stable, interpretable offensive role groups.
→ K-means 8-cluster normalization adopted in v2.3. Shot Quality, Creation & Playmaking, and Physical Contribution metrics z-scored within K-means offensive archetype peer groups instead of the full pool. Resolved rim-running big inflation.
Dehesa et al. (2019) — Kinesiology, 51(1)
In close games (margin ≤8 pts), box score KPIs show no statistically significant cluster separation (all p>.05, ES≈0), while NET/ON/OFF metrics dominate (F>1,499, p<.001).
→ Supports Offensive Impact pillar's 32% weight and RAPM-family centrality. Documented as a known limitation of the Scoring and Playmaking pillars — box-score metrics lose discrimination in high-leverage contexts.
ACM ICSTPA (2024) — ML SHAP analysis, 1947–2024 MVP data
SHAP feature ranking on CatBoost MVP predictor: VORP #1, PPG #2, PER #3, USG% #4, BPM #5. VORP is the top predictor of MVP voting, not of player value.
→ VORP kept display-only. Including it would introduce voter narrative into a value model designed to be independent of voters. This is the formal basis for the display/scored distinction.
Gong, Whitehead et al. (2024) — JQAS
Gap between offensive position groups has narrowed significantly over 2015–2022. Hierarchical Bayesian plus/minus confirms individual contribution separates from team context more cleanly with RAPM methods.
→ Supports position normalization (v1.2) and era-relative z-scoring over fixed historical baselines. Contemporary peer groups preferred over cross-era comparisons.
Data Sources
| Source | What It Provides | Key Caveat |
| --- | --- | --- |
| Basketball-Reference | BPM, DBPM, OBPM, WS/48, VORP, TS%, AST%, TOV%, REB%, USG%, FTA/FGA, AST/TOV, GP, minutes | Primary source. Scraped via automated pipeline. Full history to 1973-74. Free. Rate-limited to 4 seconds/request. On/off tables are JS-rendered and not accessible via scraping. |
| BBall-Index | LEBRON, D-LEBRON, O-LEBRON, Relative True Shooting%, Offensive Role, Defensive Role | $5/month Data Tools package. Obtained via manual Excel export from Leaderboards Tool each season. Offensive Role replaced by K-means clustering in v2.3 (retained as display metadata). |
| Dunks & Threes | EPM (Estimated Plus/Minus) | v3.0: O-EPM (offensive component only, off column). 40% of the Offensive Impact pillar. Replaces full two-way EPM to eliminate defensive signal from the offensive pillar. |
Backtest — Full Results

APEX v3.0 was backtested against 40 seasons of MVP voting (1985–86 through 2024–25). The model correctly identifies the MVP winner in 25 of 40 seasons, or 62.5% accuracy. The 25-season window (2000–01 onwards, where O-EPM and O-LEBRON are available) shows higher accuracy; the pre-2000 era is constrained to OBPM+DBPM only. Accuracy is 5 percentage points (two seasons) lower than v2.6 (27 of 40, 67.5%). That drop is the cost of eliminating the double-counting architecture: the model is structurally cleaner even though two season picks change.
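
The headline accuracy figures can be checked with a line of arithmetic. In the sketch below, the count of 27 correct seasons for v2.6 is inferred from the stated 67.5% over 40 seasons rather than taken from the text, and the function name is hypothetical.

```python
def hit_rate(correct: int, seasons: int) -> float:
    """Share of seasons, in percent, where the model's top pick won MVP."""
    return 100 * correct / seasons


v30 = hit_rate(25, 40)        # stated result for v3.0
v26 = hit_rate(27, 40)        # 27 inferred from 67.5% of 40 seasons
gap_points = v26 - v30        # difference in percentage points
gap_seasons = 27 - 25         # difference in correctly picked seasons
```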

The Misses — An Honest Accounting
Season Actual MVP APEX Pick Result
APEX's misses are analytically defensible. In each case, the model's pick had stronger underlying numbers. What APEX cannot capture is voter sentiment, narrative momentum, and fatigue toward repeat candidates. These factors are real — they are simply not quantifiable. APEX tracks player value, not award probability. The v1.6 voter-adjusted prediction layer (a separate, clearly labeled output) addresses this distinction explicitly.