StateTrace

State-Bottlenecked Reasoning for Verifiable Referee Decision Tracing in Sports Broadcast Video

This repository contains the reference implementation for the StateTrace research paper. It provides the complete state-bottlenecked reasoning architecture, three-stage training pipeline, transition verification framework, and evaluation tools described in the paper.

StateTrace is the third paper in the sports adjudication trilogy:

RuleGround — Perception to predicates: grounds raw video into structured game-state predicates
RefTrace — Evidence to traces: generates verifiable reasoning traces over a rule knowledge graph
StateTrace (this repo) — Traces to state transitions: adds explicit state bottleneck for per-step verification

Overview

StateTrace extends RefTrace with explicit state-bottlenecked reasoning for automated referee decision analysis. Where RefTrace compresses its reasoning history with a HistoryEncoder (a Transformer over action/result pairs), StateTrace introduces a typed state schema s_t = (E_t, V_t, R_t, C_t, D_t) that serves as a compact, verifiable bottleneck between reasoning steps. The typed state enables deterministic per-transition verification, state-derived action masking, dense per-step reward signals, and a novel third training stage (Stage III, transition RL) that uses those dense rewards. We validate on NFL broadcast video using the NFL-MH benchmark.
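
To make "deterministic per-transition verification" concrete, here is a minimal sketch of one kind of check that a typed state makes possible. Everything in it is an assumption for illustration: the dictionary layout, the `verify_transition` helper, the `"undecided"` status value, and the two invariants (collections only grow; only STOP may change the decision) are hypothetical and are not taken from the paper or the repository.

```python
from typing import Dict, List, Set, Union

# Hypothetical state layout: set-valued components plus a decision status.
StateDict = Dict[str, Union[Set[str], str]]

GROW_ONLY = ("events_seen", "candidate_rule_ids", "video_segments")


def verify_transition(prev: StateDict, action: str, nxt: StateDict) -> List[str]:
    """Return a list of violated invariants (empty means the step is valid)."""
    errors = []
    # Assumed invariant 1: collected evidence is never silently dropped.
    for key in GROW_ONLY:
        if not prev[key] <= nxt[key]:
            errors.append(f"{key} lost entries")
    # Assumed invariant 2: only STOP may change the decision status.
    if prev["decision_status"] != nxt["decision_status"] and action != "STOP":
        errors.append("decision changed before STOP")
    return errors


s0 = {"events_seen": {"e1"}, "candidate_rule_ids": set(),
      "video_segments": set(), "decision_status": "undecided"}
s1 = {"events_seen": {"e1"}, "candidate_rule_ids": {"r1"},
      "video_segments": set(), "decision_status": "undecided"}
bad = {"events_seen": set(), "candidate_rule_ids": {"r1"},
       "video_segments": set(), "decision_status": "CALL_CORRECT"}

ok_errors = verify_transition(s0, "GET_RULE", s1)    # valid step
bad_errors = verify_transition(s1, "GET_VIDEO", bad)  # two violations
```

Because the check is a pure function of (s_t, action, s_{t+1}), the same inputs always produce the same verdict, which is the property the word "deterministic" refers to here.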

Key Results on NFL-MH-Core

Configuration	VTA	DA	TV	IAR
StateTrace-7B	74.8	90.1	96.3	2.1%
RefTrace-7B (baseline)	72.1	89.2	--	--
w/o Stage III (ablation)	73.2	89.8	91.7	5.4%
w/o action masking	72.9	89.5	88.2	8.9%
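
The contribution of each component can be read off the table by subtracting the ablation rows from the full model. The short script below does that arithmetic; the numbers are copied from the table, while the dictionary layout and the `delta` helper are only illustrative.

```python
# Rows copied from the table above: (VTA, DA, TV, IAR%).
results = {
    "StateTrace-7B": (74.8, 90.1, 96.3, 2.1),
    "w/o Stage III": (73.2, 89.8, 91.7, 5.4),
    "w/o action masking": (72.9, 89.5, 88.2, 8.9),
}
metrics = ("VTA", "DA", "TV", "IAR")


def delta(full: str, ablation: str) -> dict:
    """Full-model score minus ablation score, in points, per metric."""
    return {
        m: round(a - b, 1)
        for m, a, b in zip(metrics, results[full], results[ablation])
    }


stage3_gain = delta("StateTrace-7B", "w/o Stage III")
masking_gain = delta("StateTrace-7B", "w/o action masking")
print(stage3_gain)   # {'VTA': 1.6, 'DA': 0.3, 'TV': 4.6, 'IAR': -3.3}
print(masking_gain)  # {'VTA': 1.9, 'DA': 0.6, 'TV': 8.1, 'IAR': -6.8}
```

On these numbers, removing action masking moves TV and IAR more (8.1 and 6.8 points) than removing Stage III does (4.6 and 3.3 points).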

Architecture

---
config:
  layout: elk
  look: neo
  theme: neo
---
flowchart TB
    Q["<b>Query</b><br>(play description)"] --> QE
    Video["<b>Video Frames</b><br>[B, T, C, H, W]"] --> VLM

    subgraph Policy ["StateTracePolicy"]
        VLM["<b>QwenVLBackbone</b><br>Qwen3-VL-8B"] --> GE["<b>GraphEncoder</b><br>Heterogeneous GAT"]
        QE["<b>QueryEncoder</b><br>Sentence-T5"]
        SE["<b>StateEncoder</b><br>2-layer Transformer"]
        GE & QE & SE --> F["<b>FusionLayer</b>"]
        F --> AH["<b>ActionHead</b><br>+ action mask A(s_t)"]
    end

    subgraph Act ["Action Space"]
        direction LR
        A1["GET_RULE"] & A2["GET_STATE"] & A3["GET_EVENTS"]
        A4["GET_VIDEO"] & A5["VERIFY_TEMPORAL"] & A6["STOP"]
    end

    AH --> Act
    Act -->|"execute"| GSTH["<b>GSTH</b><br>Game-State Trace Hypergraph<br>4 node types · 6 edge types"]
    GSTH -->|"result"| UPD["<b>U_θ</b><br>State Update"]
    UPD -->|"s_{t+1}"| SE
    A6 -->|"decision"| D["<b>CALL_CORRECT · CALL_INCORRECT · NO_FOUL</b>"]
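
The `+ action mask A(s_t)` annotation on the ActionHead can be illustrated with a small sketch: actions that the current state rules out have their logits set to negative infinity before the softmax, so they receive zero probability. The six action names come from the diagram. The masking rules in `allowed_actions` are invented for illustration; the actual A(s_t) is defined by the paper and the code.

```python
import math
from typing import Dict, List

ACTIONS = ["GET_RULE", "GET_STATE", "GET_EVENTS",
           "GET_VIDEO", "VERIFY_TEMPORAL", "STOP"]


def allowed_actions(state: Dict[str, list]) -> List[bool]:
    """Hypothetical A(s_t): one boolean per action, in ACTIONS order."""
    has_events = bool(state["events_seen"])
    has_rules = bool(state["candidate_rule_ids"])
    return [
        True,                      # GET_RULE: always available
        True,                      # GET_STATE: always available
        True,                      # GET_EVENTS: always available
        has_events,                # GET_VIDEO: needs an event to look at
        has_events,                # VERIFY_TEMPORAL: needs events to order
        has_events and has_rules,  # STOP: needs evidence and a rule
    ]


def masked_softmax(logits: List[float], mask: List[bool]) -> List[float]:
    """Softmax over allowed actions; masked actions get probability 0."""
    masked = [x if ok else -math.inf for x, ok in zip(logits, mask)]
    peak = max(masked)  # finite as long as at least one action is allowed
    exps = [math.exp(x - peak) for x in masked]
    total = sum(exps)
    return [e / total for e in exps]


# Empty state: only the three retrieval actions remain selectable.
state = {"events_seen": [], "candidate_rule_ids": []}
probs = masked_softmax([0.0] * len(ACTIONS), allowed_actions(state))
```

Masking before normalization, rather than zeroing probabilities afterwards, keeps the result a proper distribution over the allowed actions.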

State Schema (5 components, 13 fields)

Component	Symbol	Description	Fields
Entities	E_t	Events and candidates	`events_seen`, `candidate_event_ids`
Evidence	V_t	Retrieved evidence	`retrieved_state_features`, `video_segments`, `evidence_bindings`, `justification_pointers`
Rules	R_t	Candidate rules	`candidate_rule_ids`
Constraints	C_t	Open constraints	`open_constraints`, `temporal_requirements`, `conflicts`
Decision	D_t	Decision status	`decision_status`
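
As a reading aid, the table maps naturally onto a dataclass. The sketch below is not the repository's `statetrace.state.State`: the name `StateSketch`, the field types, the defaults, and the `to_text` helper are all assumptions, and only the field names and their grouping come from the table.

```python
from dataclasses import dataclass, field, fields
from typing import Dict, List


@dataclass
class StateSketch:
    # E_t: entities
    events_seen: List[str] = field(default_factory=list)
    candidate_event_ids: List[str] = field(default_factory=list)
    # V_t: evidence
    retrieved_state_features: Dict[str, str] = field(default_factory=dict)
    video_segments: List[str] = field(default_factory=list)
    evidence_bindings: Dict[str, str] = field(default_factory=dict)
    justification_pointers: List[str] = field(default_factory=list)
    # R_t: rules
    candidate_rule_ids: List[str] = field(default_factory=list)
    # C_t: constraints
    open_constraints: List[str] = field(default_factory=list)
    temporal_requirements: List[str] = field(default_factory=list)
    conflicts: List[str] = field(default_factory=list)
    # D_t: decision (the "undecided" default is a placeholder value)
    decision_status: str = "undecided"

    def to_text(self) -> str:
        """Render one `name: value` line per field, in declaration order."""
        return "\n".join(
            f"{f.name}: {getattr(self, f.name)!r}" for f in fields(self)
        )


s = StateSketch(events_seen=["e1"], candidate_rule_ids=["r1"])
text = s.to_text()
```

A flat, typed record like this is what makes the state easy to serialize as structured text and to compare field by field between steps.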

Quick Start

# Clone and install
git clone https://github.com/sreevadde/statetrace.git && cd statetrace
pip install -e ".[dev]"

# Build the Game-State Trace Hypergraph
statetrace build-gsth --config configs/base.yaml

# Stage 1: Supervised Fine-Tuning
statetrace train --config configs/training/sft.yaml

# Stage 2: Terminal RL (GRPO)
statetrace train --config configs/training/grpo.yaml

# Stage 3: Transition RL (novel)
statetrace train --config configs/training/transition_rl.yaml

# Evaluate on NFL-MH-Core
statetrace eval --config configs/base.yaml \
    --checkpoint checkpoints/stage3/final.pt

Python API

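import torch
from omegaconf import OmegaConf

from statetrace.models import StateTracePolicy
from statetrace.graph import GSTH
from statetrace.state import State, StateSerializer

# Load a pre-built GSTH
gsth = GSTH.load("data/gsth/gsth.pkl")
pyg_data = gsth.to_pyg()

# Load the config (assumes from_config accepts an OmegaConf config)
cfg = OmegaConf.load("configs/base.yaml")

# Build the policy from config
policy = StateTracePolicy.from_config(cfg)

# Placeholder clip with arbitrary sizes; replace with real frames (B, T, C, H, W)
video_tensor = torch.zeros(1, 8, 3, 224, 224)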

# Initialize state and serialize
state = State()
serialized = StateSerializer().serialize(state)

# Run a single reasoning step
dist = policy(
    query="Was the defensive pass interference call correct?",
    gsth_data=pyg_data,
    video_frames=video_tensor,      # (B, T, C, H, W)
    serialized_state=serialized,    # structured text
)

# Sample an action with state-derived masking
action = policy.select_action(
    query="Was the defensive pass interference call correct?",
    gsth_data=pyg_data,
    video_frames=video_tensor,
    state=state,                    # used for A(s_t) mask
)

Configuration

StateTrace uses OmegaConf for hierarchical YAML configuration with CLI overrides.

configs/
├── base.yaml               # Default hyperparameters for all components
├── model/
│   ├── base.yaml            # Qwen3-VL-8B (default) + LoRA rank 64
│   ├── large.yaml           # Qwen3-VL-32B
│   ├── small.yaml           # Qwen3-VL-2B
│   ├── qwen25.yaml          # Qwen2.5-VL-7B (paper baseline)
│   └── qwen35.yaml          # Qwen3.5-9B (latest, unified VL)
├── training/
│   ├── sft.yaml             # Stage 1: supervised fine-tuning
│   ├── grpo.yaml            # Stage 2: terminal RL with NTR
│   └── transition_rl.yaml   # Stage 3: transition RL (novel)
└── nfl/
    ├── core.yaml            # NFL-MH-Core (frame-accurate, expert-labeled)
    ├── auto.yaml            # NFL-MH-Auto (broadcast-scale, auto-extracted)
    └── combined.yaml        # Combined Core + Auto

Override any parameter from the command line:

statetrace train --config configs/training/sft.yaml \
    --set training.lr=1e-5 \
    --set training.batch_size=8
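
Each `--set key=value` names a nested config entry by its dotted path. The stdlib-only sketch below shows the idea on a plain dictionary. It is not how the CLI is implemented (that goes through OmegaConf, whose value parsing may differ), and `apply_override` and the starting values are hypothetical.

```python
import ast
from typing import Any, Dict


def apply_override(cfg: Dict[str, Any], override: str) -> None:
    """Apply one 'a.b.c=value' override to a nested dict, in place."""
    path, raw = override.split("=", 1)
    *parents, leaf = path.split(".")
    node = cfg
    for key in parents:
        node = node.setdefault(key, {})  # create missing sections
    try:
        node[leaf] = ast.literal_eval(raw)  # Python literals: numbers, lists
    except (ValueError, SyntaxError):
        node[leaf] = raw  # anything else stays a string


# Placeholder starting values, not the repository defaults.
cfg = {"training": {"lr": 1e-4, "batch_size": 4}}
for item in ["training.lr=1e-5", "training.batch_size=8"]:
    apply_override(cfg, item)
print(cfg)  # {'training': {'lr': 1e-05, 'batch_size': 8}}
```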

Reproduction

Hardware Requirements

Training: 4x NVIDIA A100 80GB (SFT ~6h, GRPO ~10h, Stage III ~8h)
Inference: 1x A100 40GB (or 2x A6000)
GSTH construction: CPU-only, ~15 minutes

Reproducing Paper Results

# Build GSTH from play-by-play data
statetrace build-gsth --config configs/base.yaml

# Stage 1: SFT on expert reasoning traces
statetrace train --config configs/training/sft.yaml

# Stage 2: Terminal RL with Normalized Trace Reward
statetrace train --config configs/training/grpo.yaml

# Stage 3: Transition RL with dense rewards
statetrace train --config configs/training/transition_rl.yaml

# Evaluate on NFL-MH-Core test set
statetrace eval --config configs/nfl/core.yaml \
    --checkpoint checkpoints/stage3/final.pt

# Evaluate on NFL-MH-Auto test set
statetrace eval --config configs/nfl/auto.yaml \
    --checkpoint checkpoints/stage3/final.pt

Results are deterministic given a fixed seed (training.seed=42). For the additional seeds reported in the paper, rerun with --set training.seed=43 and --set training.seed=44.
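
A small driver can generate the per-seed commands instead of typing them by hand. This sketch only builds the argument lists from the flags shown above. The `seed_commands` helper is hypothetical, and which stages need to be rerun per seed is a question for the paper's protocol, not something this snippet decides.

```python
import shlex
from typing import List, Sequence

SEEDS = (42, 43, 44)


def seed_commands(config: str, seeds: Sequence[int] = SEEDS) -> List[List[str]]:
    """One `statetrace train` argv per seed, using the --set override."""
    return [
        ["statetrace", "train", "--config", config,
         "--set", f"training.seed={seed}"]
        for seed in seeds
    ]


commands = seed_commands("configs/training/sft.yaml")
for argv in commands:
    # Print shell-ready lines; pass argv to subprocess.run to execute.
    print(shlex.join(argv))
```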

Citation

@article{vadde2026statetrace,
  title   = {StateTrace: State-Bottlenecked Reasoning for Verifiable
             Referee Decision Tracing in Sports Broadcast Video},
  author  = {Vadde, Sree Krishna},
  journal = {arXiv preprint},
  year    = {2026}
}

License

MIT
