You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Exact software and hardware versions used to produce all results in the paper.
All experiments were run on both platforms to verify cross-platform reproducibility.
Local Development Machine
Component
Version
OS
macOS 15.7.3 (Darwin 24.6.0 arm64)
Python
3.13.7 (local venv)
pip
25.2
GCP Compute Instance
Component
Version
Instance type
g2-standard-8
OS
Ubuntu 22.04.5 LTS
Kernel
6.8.0-1052-gcp
GPU
NVIDIA L4 (23,034 MiB VRAM)
GPU Driver
570.211.01
CUDA (PyTorch)
12.4
Compute Capability
8.9
Python
3.11.15
Zone
us-central1-a
Project
creditlab-491502
Python Dependencies (pinned)
All experiments use these exact versions, recorded in requirements-lock.txt:
Collection policy RNG: Seeded once per run (not per step). Different rollouts within the same run produce independent trajectories.
Table policy RNG: Seeded once per policy instance for random fallback at unseen states.
Sweep seeds: Explicit in config files. Each seed produces a complete train→score→eval pipeline.
vLLM inference: Temperature=0.7 introduces stochasticity. LLM experiments are not bit-reproducible across runs but are statistically reproducible (same ranking across seeds).