Case-bundle schema (v1) — data/caseN.js

This schema is the contract between the backtest exporter (`scripts/export_cases.py`, which replays one held-out station-day at every decision hour through the production stage-1 path) and the demo page `reports/webapp/index.html` (the Examples 1–3 tabs).

Each file is a short JS script that registers ONE strict-JSON object on `window.CASES`:

```javascript
// generated by the backtest case exporter — do not edit
window.CASES = window.CASES || {};
window.CASES["case1"] = { ...strict JSON, no comments, no trailing commas... };
```

The page loads `data/case1.js` … `data/case3.js`. A missing file simply leaves that tab in its “case pending” state (the `<script>` request 404s harmlessly).
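
Because each bundle must be strict JSON, one way to lint a generated file is to pull the object literal out of the assignment and hand it to `JSON.parse`, which rejects comments and trailing commas. The sketch below does this with a regular expression. `parseBundleFile` is a hypothetical helper: it is not part of the exporter or the page, and it assumes the assignment is the last statement in the file.

```javascript
// Hypothetical lint helper: extract the case id and the object literal from a
// data/caseN.js text and parse both as strict JSON.
function parseBundleFile(text) {
  // Matches `window.CASES["caseN"] = {...};` as the final statement.
  const m = text.match(/window\.CASES\[("case\d+")\]\s*=\s*(\{[\s\S]*\})\s*;?\s*$/);
  if (!m) throw new Error("no window.CASES[...] assignment found");
  // JSON.parse throws on comments and trailing commas, so success means the
  // object body is strict JSON.
  return { id: JSON.parse(m[1]), bundle: JSON.parse(m[2]) };
}

const sample = [
  "window.CASES = window.CASES || {};",
  'window.CASES["case1"] = {"schema": "case-bundle v1", "station": "KDEN"};',
].join("\n");

const { id, bundle } = parseBundleFile(sample);
console.log(id, bundle.station); // case1 KDEN
```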

Top-level object

| field | type | meaning |
|---|---|---|
| `schema` | str | literal `"case-bundle v1"` |
| `station` | str | ICAO id, e.g. `"KDEN"` |
| `name` | str | display name, e.g. `"Denver, CO"` |
| `date` | str | climate date (LST), `"YYYY-MM-DD"` |
| `tz_label` | str | display label, e.g. `"LST (UTC-7)"`; the page never does tz math |
| `settle` | obj | `{"high": int, "low": int}`: the final official CLI integers |
| `model_version` | str | git short hash of the model that produced the bundle |
| `obs` | list | realized observation curve over the WHOLE day (see below) |
| `hours` | list | one entry per decision hour, ascending (see below) |
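
A consumer could sanity-check the top-level object before rendering it. The following is a minimal sketch that reads its rules directly off the table above. `checkTopLevel` is a hypothetical name, and "one entry per decision hour, ascending" is interpreted as strictly increasing `hour` values.

```javascript
// Hypothetical consumer-side checks on the top-level bundle object.
function checkTopLevel(b) {
  const errors = [];
  if (b.schema !== "case-bundle v1") errors.push("schema must be \"case-bundle v1\"");
  if (!/^\d{4}-\d{2}-\d{2}$/.test(b.date)) errors.push("date must be YYYY-MM-DD");
  const s = b.settle || {};
  if (!Number.isInteger(s.high) || !Number.isInteger(s.low)) {
    errors.push("settle.high and settle.low must be integers");
  }
  // One entry per decision hour, ascending: treated here as strictly increasing.
  const hrs = (b.hours || []).map((h) => h.hour);
  for (let i = 1; i < hrs.length; i++) {
    if (!(hrs[i] > hrs[i - 1])) errors.push(`hours not ascending at index ${i}`);
  }
  return errors;
}

// Made-up values, for illustration only.
const ok = {
  schema: "case-bundle v1",
  date: "2024-07-15",
  settle: { high: 91, low: 62 },
  hours: [{ hour: -6 }, { hour: 6 }, { hour: 12 }],
};
console.log(checkTopLevel(ok)); // []
```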

`obs` entries (display only, so 15-min subsampling is fine):

| field | type | meaning |
|---|---|---|
| `t` | str | `"HH:MM"`, local standard time |
| `f` | num | observed temperature, °F |
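
Since `obs` is display only, any reasonable thinning works. One possible rule, shown here as an assumption rather than what the exporter does, keeps the first observation in each 15-minute bucket. `subsample15` is a hypothetical helper that assumes the entries are already sorted by `t`.

```javascript
// Hypothetical thinning rule: keep the first observation in each 15-min bucket.
// Assumes entries look like {t: "HH:MM", f: number} and are sorted by time.
function subsample15(obs) {
  const seen = new Set();
  return obs.filter(({ t }) => {
    const [hh, mm] = t.split(":").map(Number);
    const bucket = hh * 4 + Math.floor(mm / 15); // 96 buckets per day
    if (seen.has(bucket)) return false;
    seen.add(bucket);
    return true;
  });
}

const raw = [
  { t: "00:00", f: 50.0 }, { t: "00:05", f: 50.2 }, { t: "00:14", f: 50.1 },
  { t: "00:15", f: 49.9 }, { t: "00:31", f: 49.5 },
];
console.log(subsample15(raw).map((o) => o.t)); // [ '00:00', '00:15', '00:31' ]
```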

`hours[i]` — one decision hour

| field | type | meaning |
|---|---|---|
| `hour` | int | decision hour, local standard (0–23; a negative value is a PRE-DAY decision, e.g. `-6` = 18:00 LST the previous evening) |
| `high` | obj/null | side document (below); `null` = not priced at this hour |
| `low` | obj/null | same shape as `high` |
| `warnings` | list | strings: the tick’s warnings (schema-v1 `warnings`) |
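
The negative-hour encoding is plain arithmetic on a 24-hour clock: a pre-day hour `h < 0` corresponds to clock hour `24 + h` on the previous evening. The sketch below is a hypothetical display helper, `hourLabel`, and it assumes `-24 <= hour <= 23`. The page would append `tz_label` itself.

```javascript
// Hypothetical display helper: negative hours count back from local-standard
// midnight, so -6 is 18:00 the previous evening. Assumes -24 <= hour <= 23.
function hourLabel(hour) {
  const prevDay = hour < 0;
  const h = prevDay ? 24 + hour : hour;
  return `${String(h).padStart(2, "0")}:00${prevDay ? " (previous day)" : ""}`;
}

console.log(hourLabel(-6)); // 18:00 (previous day)
console.log(hourLabel(9));  // 09:00
```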

Side document — `hours[i].high` / `.low`

Mirrors `SideDoc.to_dict()` in `realtime/schema.py` (schema v1) exactly, so the exporter can serialize the backtest’s per-hour `SideDoc` objects as-is:

| field | type | meaning |
|---|---|---|
| `pmf_prior` | obj/null | forecast-only PMF, `{"<int °F>": prob}` (6 dp, zeros dropped); `null` when no ensemble data |
| `pmf` | obj | obs-corrected PMF, same encoding |
| `brackets` | list | bracket dicts (below); `[]` when no market was listed |
| `diagnostics` | obj | at minimum the fields below; extra keys are allowed and ignored |
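
PMFs are encoded as objects keyed by the integer °F as a string. A reader must sort the keys numerically. JS object key order is not enough here, because keys such as `"-1"` are not array indices and keep insertion order. Because masses are rounded to 6 dp and zeros are dropped, the total is only approximately 1. `decodePmf` is a hypothetical helper.

```javascript
// Hypothetical decoder: {"<int °F>": prob} -> [[tempF, prob], ...] sorted by tempF,
// plus the total mass (close to, not exactly, 1 after 6-dp rounding).
function decodePmf(pmf) {
  const rows = Object.entries(pmf)
    .map(([k, p]) => [Number(k), p])
    .sort((a, b) => a[0] - b[0]); // numeric sort; negative keys need it
  const total = rows.reduce((acc, [, p]) => acc + p, 0);
  return { rows, total };
}

const { rows, total } = decodePmf({ "2": 0.5, "-1": 0.25, "0": 0.25 });
console.log(rows.map(([f]) => f).join(","), total); // -1,0,2 1
```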

Bracket dict

| field | type | meaning |
|---|---|---|
| `ticker` | str | market ticker (or a reconstructed `RECON-*` id when historical strike tables are unavailable) |
| `type` | str | `"less"` / `"between"` / `"greater"` |
| `lo` | int/null | INCLUSIVE payout lower bound (`null` for `less`) |
| `hi` | int/null | INCLUSIVE payout upper bound (`null` for `greater`) |
| `fair_value_prior` | num/null | mass of `pmf_prior` over the payout set |
| `fair_value` | num/null | mass of `pmf` (obs-corrected) over the payout set |
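
With inclusive bounds, a bracket's fair value is the PMF mass over `lo <= t <= hi`, with a `null` bound treated as open. So `less` pays for `t <= hi` and `greater` pays for `t >= lo`. The sketch below uses a hypothetical helper, `bracketMass`, with made-up PMF values. It relies only on `lo`/`hi` because the nulls already encode the bracket type.

```javascript
// Hypothetical helper: PMF mass over a bracket's INCLUSIVE payout set.
// A null bound is open: less -> t <= hi, greater -> t >= lo, between -> lo <= t <= hi.
function bracketMass(pmf, br) {
  let mass = 0;
  for (const [k, p] of Object.entries(pmf)) {
    const t = Number(k);
    const aboveLo = br.lo == null || t >= br.lo;
    const belowHi = br.hi == null || t <= br.hi;
    if (aboveLo && belowHi) mass += p;
  }
  return mass;
}

// Made-up PMF, for illustration only.
const pmf = { "88": 0.1, "89": 0.2, "90": 0.3, "91": 0.4 };
console.log(bracketMass(pmf, { type: "between", lo: 89, hi: 90 })); // 0.5
```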

Diagnostics (minimum set)

| field | type | meaning |
|---|---|---|
| `p_lock` | num | P(realized extreme already dominates) |
| `locked` | bool | lock declared |
| `ess` | num/null | effective sample size of the member weights |
| `n_members` | int | members entering the weighting |
| `n_obs` | int | observations seen up to the hour |
| `sources_used` | list | obs sources, e.g. `["synoptic_5min"]` |
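
The schema does not say how `ess` is computed. If it is Kish's effective sample size, (Σw)² / Σw², then it equals `n_members` for uniform weights and falls to 1 when a single member carries all the weight. The sketch below makes that assumption explicit. `kishEss` is a hypothetical name, and it returns `null` when there is no weight, matching the `num/null` type.

```javascript
// ASSUMPTION: ess is Kish's effective sample size, (sum w)^2 / sum(w^2).
// The schema does not state the formula; kishEss is a hypothetical helper.
function kishEss(weights) {
  const s = weights.reduce((a, w) => a + w, 0);
  const s2 = weights.reduce((a, w) => a + w * w, 0);
  return s2 > 0 ? (s * s) / s2 : null; // null when there is no weight at all
}

console.log(kishEss([1, 1, 1, 1])); // 4
console.log(kishEss([1, 0, 0, 0])); // 1
```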

Diagnostics (optional stage-0 / consensus keys)

The production daemon (and the exporter mirroring it) additionally emits the stage-0 EMOS-consensus diagnostics below. All are OPTIONAL: bundles exported before the stage-0 prior landed simply lack them and the page renders unchanged (no strip, no prior badge).

| field | type | meaning |
|---|---|---|
| `prior_source` | str | `"stage0"`: `pmf_prior` is the trained stage-0 EMOS consensus; `"raw_pool"`: the forecast-only equal-system pool fallback |
| `systems_summary` | obj/null | per-system consensus view `{system: {point_f, sd_f, n_members, weight_mass, init_utc}}` (`realtime.loop._systems_summary`): `point_f` = member-mean in-day extreme °F (1 dp), `sd_f` = member sd (2 dp; `null` for deterministic 1-member systems, e.g. MOS), `n_members` = members entering the feature, `weight_mass` = the system’s posterior F-draw mass (`null` for systems absent from the draw, e.g. MOS), `init_utc` = ISO init of the newest run used. Renders as the per-system strip under the PMF chart |
| `stage0` | obj/null | stage-0 EMOS predictive diagnostics when `prior_source == "stage0"`: `mu_f`, `sigma_f`, `pattern`, `ladder_level`, `s2bar_f2`, `n_fit`, `dist`, `t_df`; `null` on the raw-pool fallback |
| `anchor` | obj/null | the moment anchor mapping the posterior F arm onto the stage-0 prior (`stage1_truncate.compute_anchor`): `m0_eff`, `gamma`, `mu_w`, `sig_d`, `s_w`, `m0`, `s0`, `rbar`; `null` when anchoring is off, refused, or the prior is raw-pool |
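
Because every key in this group is optional, a reader should treat "absent" the same as "not shown". The sketch below shows that pattern with a hypothetical view-model helper, `stage0View`. Its output names (`priorBadge`, `strip`, `hasAnchor`) are illustrative and are not the page's actual code.

```javascript
// Hypothetical view-model helper: every optional key falls back to "not shown",
// so bundles exported before the stage-0 prior render unchanged.
function stage0View(diag) {
  const summary = diag.systems_summary ?? null;
  const strip = summary === null
    ? []
    : Object.entries(summary).map(([system, s]) => ({
        system,
        pointF: s.point_f,
        sdF: s.sd_f,               // null for deterministic 1-member systems (e.g. MOS)
        weightMass: s.weight_mass, // null for systems absent from the F draw
      }));
  return {
    priorBadge: diag.prior_source ?? null, // "stage0", "raw_pool", or null (older bundle)
    strip,
    hasAnchor: diag.anchor != null,
  };
}

// An older bundle carrying only the minimum diagnostics (made-up values).
console.log(stage0View({ p_lock: 0.1, locked: false, n_members: 30, n_obs: 40 }));
// { priorBadge: null, strip: [], hasAnchor: false }
```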

Exporter notes

- One bundle per held-out day. The day must come from the holdout split (`dataset/splits.is_held_out_day`), never from the training days.
- Run the production path (`stage1_truncate.fair_value_pmf` / `prior_pmf` plus `kalshi_map`-shaped brackets) at each decision hour, with `available_at` honoring the no-leak rule L1, exactly as `backtest/runner.py` does.
- Keep bundles under ~300 KB: subsample `obs` to 15 min, drop PMF entries < 1e-6 (already the schema-v1 convention), and limit `hours` to ~12 entries.
- `data/case1.js` through `data/case3.js` in this directory are REAL exports from `scripts/export_cases.py` (the original pre-exporter placeholder bundle has been replaced). Each replays backfilled GEFS/HRRR-lag/NAM-nest trajectory archives, archived 1-min obs, and the official NWS CLI settlement through the walk-forward backtest stage-1 path. Historical strike tables are not archived, so the brackets are the standard 6-bracket structure reconstructed around the corrected PMF’s median (see each bundle’s meta).
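
The size rules above can be checked before writing a file. This is a sketch only. The real exporter is Python, and `compactPmf`, `bundleBytes` and `BUDGET_BYTES` are hypothetical names. `compactPmf` drops entries below 1e-6 and then rounds the survivors to 6 dp. `bundleBytes` measures the UTF-8 size of the emitted script, ignoring the one-line header comment. Reading "~300 KB" as 300 KiB is an assumption.

```javascript
// Hypothetical pre-flight helpers mirroring the exporter notes.

// Drop PMF entries below 1e-6, then round the survivors to 6 dp.
function compactPmf(pmf) {
  const out = {};
  for (const [k, p] of Object.entries(pmf)) {
    if (p >= 1e-6) out[k] = Math.round(p * 1e6) / 1e6;
  }
  return out;
}

// UTF-8 size of the emitted data/caseN.js text (header comment line not counted).
function bundleBytes(id, bundle) {
  const text =
    "window.CASES = window.CASES || {};\n" +
    `window.CASES[${JSON.stringify(id)}] = ${JSON.stringify(bundle)};\n`;
  return new TextEncoder().encode(text).length;
}

const BUDGET_BYTES = 300 * 1024; // "~300 KB"; KiB rather than kB is an assumption
console.log(compactPmf({ "90": 0.12345678, "91": 8e-7 })); // { '90': 0.123457 }
```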

Xingjian (Ken) Yan