Config-Driven Quantitative Research & Trading Platform
A modular, reproducible platform for market data ingestion, feature engineering, machine learning / reinforcement learning research, systematic evaluation, and controlled execution.
Quantitative research systems often break down when moving from experimentation to repeatable evaluation and production-grade workflows. Ad-hoc scripts, notebook-driven experiments, and inconsistent data handling make it difficult to trust results or compare strategies over time.
Quanto was built to address this gap by providing a deterministic, end-to-end research and execution framework that emphasizes:
- reproducibility
- explicit data lineage
- experiment governance
- clean separation between research and execution
The goal is not "black-box alpha," but transparent, auditable ML-driven trading research.
- Phase 1 (current): research, backtests, qualification gates, and shadow/paper execution only.
- Phase 2: live execution via broker adapters (Alpaca), with strict risk controls.
- Non-goals: no HFT/tick-level ingestion, no AutoML.
- Multi-vendor ingestion pipelines (equities, options, derived signals)
- Deterministic manifests and canonical schemas
- Versioned raw and processed datasets to support replay and audits
- Modular feature pipelines for tabular and time-series data
- Consistent feature/label generation across experiments
- Explicit control of lookahead, alignment, and leakage boundaries
- Reinforcement learning research environments (FinRL-based)
- PPO policy training and reproducible evaluation
- Config-driven experiment definitions for repeatable runs
- Automated backtesting pipelines
- Qualification gates and evaluation reports
- Artifact generation for metrics, diagnostics, and comparisons
- Research / execution boundary enforced by design
- Shadow execution and deterministic replay
- Risk controls and allocation logic isolated from research code
- Structured metrics output
- Diagnostics and plotting utilities for experiment analysis
- Designed to support long-running comparative research
Run the dashboard locally:
```bash
streamlit run monitoring/experiment_dashboard.py
```

Then open the URL Streamlit prints (typically http://localhost:8501).
The dashboard provides regime-slice and experiment-comparison views.
Feature sets are versioned contracts. For options, the default surface set is intentionally dense and availability-aware (OI, volume, IVX, IVR + coverage flags), and excludes sparse fields. When leakage is a concern, use lagged variants that align options surfaces to the prior trading session.
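As a concrete illustration, here is a minimal pandas sketch of that lagging step. The column names (`oi`, `volume`, `ivx`, `ivr`) and the function itself are hypothetical; the platform's feature pipeline defines its own contracts.

```python
import pandas as pd

def lag_options_surface(surface: pd.DataFrame, lag_sessions: int = 1) -> pd.DataFrame:
    """Shift per-symbol options-surface values to the prior trading session(s)."""
    surface = surface.sort_values(["symbol", "date"])
    cols = ["oi", "volume", "ivx", "ivr"]  # hypothetical surface columns
    lagged = surface.copy()
    # groupby().shift() keeps row alignment: dates stay in place while
    # values move back by `lag_sessions` observed sessions per symbol.
    lagged[cols] = lagged.groupby("symbol")[cols].shift(lag_sessions)
    lagged["surface_lagged"] = True  # coverage/provenance flag
    return lagged
```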
```mermaid
flowchart LR
subgraph Ingest
direction LR
Vendors[Data Vendors] --> Router[Ingestion Router]
Configs[Configs] --> Router
Router --> Rest[REST]
Router --> Flat[Flatfile]
Router --> Stream[Stream]
Rest --> Raw[Raw Data]
Flat --> Raw
Stream --> Raw
Raw --> Canonical[Canonical Store]
end
subgraph Features
direction LR
Canonical --> FeatureEng[Feature Engine]
FeatureEng --> Regime[Regime Features]
FeatureEng --> Options[Options Surface]
end
subgraph Training
direction LR
FeatureEng --> Env[RL Env]
Env --> PPO[PPO]
PPO --> Eval[Evaluation]
end
subgraph Governance
direction LR
Eval --> Qualify[Qualification]
Qualify --> Promote[Promotion]
end
subgraph Execution
direction LR
Promote --> Shadow[Shadow Replay]
Promote --> ExecCtrl[Exec Controller]
ExecCtrl --> Risk[Risk Engine]
Risk --> Orders[Order Router]
Orders --> Broker[Broker Adapter]
Broker --> Fills[Fills]
Fills --> Ledger[Ledger]
end
subgraph Monitoring
direction LR
Eval --> Metrics[Metrics]
Shadow --> Metrics
Ledger --> Metrics
Metrics --> Plots[Plots]
Metrics --> Alerts[Alerts]
end
This architecture is designed to keep research, governance, and execution cleanly separated:
- Ingestion and canonicalization build auditable, vendor‑agnostic datasets.
- Features and training operate only on canonical data.
- Qualification gates decide what is promotion‑eligible.
- Execution uses promoted artifacts only, with explicit risk controls.
- Monitoring closes the loop with metrics and diagnostics.
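One lightweight way to enforce the research/execution boundary is a test that scans research modules for imports of execution packages. The sketch below is purely illustrative; the package prefixes are hypothetical, not Quanto's actual layout.

```python
import ast
from pathlib import Path

# Hypothetical forbidden package prefixes for research code.
FORBIDDEN_PREFIXES = ("execution", "broker")

def forbidden_imports(research_root: str) -> list[tuple[str, str]]:
    """Return (file, module) pairs where research code imports execution code."""
    violations = []
    for path in Path(research_root).rglob("*.py"):
        tree = ast.parse(path.read_text())
        for node in ast.walk(tree):
            names = []
            if isinstance(node, ast.Import):
                names = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.module:
                names = [node.module]
            for name in names:
                if name.startswith(FORBIDDEN_PREFIXES):
                    violations.append((str(path), name))
    return violations
```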
```mermaid
flowchart TB
subgraph RawLayer
RawEq[Raw Equity OHLCV]
RawOpt[Raw Options Data]
RawFund[Raw Fundamentals]
end
subgraph CanonicalLayer
CanonEq[Canonical Equity]
CanonOpt[Canonical Options]
CanonFund[Canonical Fundamentals]
end
subgraph DerivedLayer
DerSurface[Derived Options Surface]
end
subgraph FeaturesLayer
FeatCore[Feature Sets Core V1]
FeatRegime[Regime Feature Set]
end
subgraph ExperimentsLayer
Spec[Experiment Spec]
Run[Run Experiment]
Metrics[Evaluation Metrics]
RegimeSlices[Regime Slices]
end
subgraph PromotionLayer
QualReport[Qualification Report]
PromoRecord[Promotion Record]
end
subgraph ShadowLayer
Replay[Shadow Replay]
MetricsSim[Metrics Sim]
end
subgraph ExecutionLayer
ExecRun[Paper Runs]
Orders[Orders]
FillLog[Fills]
ExecMetrics[Execution Metrics]
end
RawEq --> CanonEq
RawOpt --> CanonOpt
RawFund --> CanonFund
CanonEq --> DerSurface
CanonOpt --> DerSurface
CanonEq --> FeatCore
CanonOpt --> FeatCore
CanonEq --> FeatRegime
Spec --> Run
FeatCore --> Run
FeatRegime --> Run
Run --> Metrics
Run --> RegimeSlices
Metrics --> QualReport
RegimeSlices --> QualReport
QualReport --> PromoRecord
PromoRecord --> Replay
Replay --> MetricsSim
PromoRecord --> ExecRun
ExecRun --> Orders
Orders --> FillLog
FillLog --> ExecMetrics
```
This diagram is the end-to-end data story. Raw vendor data is normalized into canonical datasets, then transformed into feature sets and experiment artifacts. Qualification and promotion outputs are stored alongside shadow and execution metrics so every decision can be replayed and audited.
Governance is regime-aware: high-volatility drawdown and exposure are enforced as hard gates, while global performance (e.g., Sharpe) is treated as soft evidence. Shadow replay produces deterministic execution evidence and should be treated as a promotion prerequisite, not a tuning loop.
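A minimal sketch of such a gate, with illustrative report fields and thresholds (not the platform's actual qualification schema):

```python
def qualify(report: dict,
            max_highvol_drawdown: float = 0.15,
            max_highvol_exposure: float = 0.60) -> dict:
    """Regime-aware gate: high-vol limits are hard, Sharpe is soft evidence."""
    highvol = report["regime_slices"]["high_volatility"]
    hard_pass = (
        highvol["max_drawdown"] <= max_highvol_drawdown
        and highvol["gross_exposure"] <= max_highvol_exposure
    )
    # Global Sharpe is recorded as soft evidence; it informs review but
    # cannot override a hard-gate failure.
    return {
        "qualified": hard_pass,
        "hard_gates": {"high_volatility": hard_pass},
        "soft_evidence": {"sharpe": report["metrics"]["sharpe"]},
    }
```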
```mermaid
sequenceDiagram
box Research
participant Spec
participant Runner
participant Eval
end
box Governance
participant Qualify
participant Promote
end
box Execution
participant Shadow
end
Spec->>Runner: run experiment
Runner->>Eval: metrics and regime slices
Eval->>Qualify: qualification report
Qualify->>Promote: promotion record
Promote->>Shadow: replay metrics
```
This lifecycle highlights governance: evaluation feeds regression comparison and qualification, which gates promotion and any execution path. Shadow replay is evidence, not optimization.
Configs are YAML files. Below is an annotated example.
```yaml
name: core_v1_regime_ppo_demo
symbols: ["AAPL", "MSFT", "NVDA"]   # universe (1..50 typical)
start_date: "2022-01-01"            # inclusive
end_date: "2025-12-31"              # inclusive
feature_set: core_v1_regime         # observation set
regime_feature_set: regime_v1_1     # optional override
evaluation_split:
  train_ratio: 0.80                 # 0.5–0.9 typical
  test_ratio: 0.20                  # 0.1–0.5 typical
  test_window_months: 3             # allowed: 1, 3, 4, 6, 12
policy: ppo                         # ppo | sac | sma | equal_weight
policy_params:
  timesteps: 200000                 # 50k–2M typical
  learning_rate: 3.0e-4             # 1e-5–1e-3 typical
  gamma: 0.99                       # 0.90–0.999
reward_version: reward_v2           # registered reward id
max_turnover_1d: 0.30               # 0.05–0.50 typical
execution:
  enabled: true
  default_order_type: market        # market | limit | stop_loss | trailing_stop
```

Clone the repository and install dependencies:

```bash
git clone https://github.com/skyliquid22/Quanto
cd Quanto
pip install -r requirements.txt
```

Environment prerequisites (set only what you use):
```bash
export POLYGON_API_KEY=...
export IVOLATILITY_API_KEY=...
export PYTHON=/usr/bin/python3
```
Install dependencies and launch the CLI (either the `quanto` entry point or the module form):

```bash
pip install -r requirements.txt
quanto --help
quanto
python -m cli.app
```

Minimal interactive example:
```text
quanto
doctor
ingest -h
```

Within the quanto prompt, each command forwards to its underlying script. Use `-h` to view help:
```text
quanto> ingest -h
ingest: Ingest raw vendor data into .quanto_data/raw using a config file.

Usage:
  ingest --config <path> --domain <domain> [--mode auto|rest|flat_file] [--run-id <id>] [--data-root <path>] [--force] [--dry-run]

Parameters:
  --config (path, required) - Ingestion config file (YAML).
  --domain (str, required) - Domain to ingest (e.g., equity_ohlcv).
  --mode (str, optional, default=auto) - Force ingestion mode (auto, rest, flat_file).
  --run-id (str, optional) - Optional deterministic run id.
  --data-root (path, optional) - Override QUANTO data root.
  --force (flag, optional) - Overwrite existing manifest for run-id.
  --dry-run (flag, optional) - Resolve routing and print summary without writing data.

Returns:
  JSON summary to stdout; raw files + manifest written under .quanto_data/raw/<vendor>/<domain>/.

Example:
  ingest --config configs/ingest/polygon_equity_backfill.yml --domain equity_ohlcv --mode rest

Forwards to: python -m scripts.ingest
```
Example: ingest equities (raw layer) and view a typical JSON status:
```text
quanto> ingest --config configs/ingest/polygon_equity_backfill.yml --domain equity_ohlcv --mode rest
{
  "adapter": "PolygonEquityAdapter",
  "config_path": "configs/ingest/polygon_equity_backfill.yml",
  "domain": "equity_ohlcv",
  "files_written": [
    {"path": ".../AAPL/daily/2022.parquet", "records": 252}
  ],
  "manifest_path": ".../equity_ohlcv-<run_id>.json",
  "mode": "rest",
  "run_id": "equity_ohlcv-<run_id>",
  "status": "succeeded",
  "vendor": "polygon"
}
```
Example: ingest insider trades (dedicated pipeline):
```text
quanto> ingest-insiders --config configs/ingest/financialdatasets_insider_smoke.yml --run-id insider_trades-smoke
{
  "adapter": "FinancialDatasetsAdapter",
  "config_path": "configs/ingest/financialdatasets_insider_smoke.yml",
  "domain": "insider_trades",
  "files_written": [
    {"path": ".../insider_trades/AAPL/2023/12/31.parquet", "records": 100}
  ],
  "manifest_path": ".../insider_trades-smoke.json",
  "mode": "rest",
  "run_id": "insider_trades-smoke",
  "status": "succeeded",
  "vendor": "financialdatasets"
}
```
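When scripting around the CLI, status payloads like the ones above can be verified programmatically. A minimal sketch, assuming the JSON shape shown in these examples:

```python
import json

def check_ingest(payload: str) -> int:
    """Validate an ingest status payload and return the record count."""
    status = json.loads(payload)
    assert status["status"] == "succeeded", status["run_id"]
    total = sum(f["records"] for f in status["files_written"])
    print(f'{status["run_id"]}: {len(status["files_written"])} files, {total} records')
    return total
```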
Build canonical shards from raw inputs:

```text
quanto> build-canonical --start-date 2022-01-01 --end-date 2025-12-31 --domains equity_ohlcv
```
Run an experiment from a spec:

```text
quanto> run-experiment --spec configs/experiments/core_v1_regime_slices_ppo.yml
```
Run a sweep (multi‑experiment grid):

```text
quanto> run-sweep --sweep configs/sweeps/core_v1_primary_regime_baselines.yml
```
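Conceptually, a sweep expands a base spec against a parameter grid, producing one experiment per combination. A hypothetical sketch of that expansion (the real sweep config schema may differ):

```python
from itertools import product

def expand_grid(base: dict, grid: dict[str, list]) -> list[dict]:
    """Expand a base spec against a parameter grid into per-experiment specs."""
    keys = list(grid)
    specs = []
    for combo in product(*(grid[k] for k in keys)):
        spec = dict(base)
        spec.update(zip(keys, combo))
        specs.append(spec)
    return specs

# e.g. expand_grid({"policy": "ppo"},
#                  {"learning_rate": [1e-4, 3e-4], "gamma": [0.99, 0.995]})
# yields four experiment specs.
```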
Evaluate or shadow‑replay a trained model:

```text
quanto> evaluate --experiment-id <EXPERIMENT_ID>
quanto> run-shadow --experiment-id <EXPERIMENT_ID> --replay --start-date 2024-01-01 --end-date 2024-12-31
```
Generate a user-facing report:

```text
quanto> monitor --experiment-id <EXPERIMENT_ID>
```
All experiments are driven by explicit configuration files, not ad-hoc flags.
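For orientation, here is a minimal sketch of loading and sanity-checking a spec like the annotated example above; the checks are illustrative, not the platform's actual validators.

```python
import yaml

def load_spec(path: str) -> dict:
    """Load an experiment spec and apply illustrative sanity checks."""
    with open(path) as fh:
        spec = yaml.safe_load(fh)
    split = spec["evaluation_split"]
    assert split["test_window_months"] in (1, 3, 4, 6, 12)  # allowed windows
    assert spec["policy"] in ("ppo", "sac", "sma", "equal_weight")
    assert 0 < split["train_ratio"] < 1 and 0 < split["test_ratio"] < 1
    return spec
```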
Output locations (default under `.quanto_data/`):

- `experiments/<EXPERIMENT_ID>/evaluation/metrics.json`
- `experiments/<EXPERIMENT_ID>/evaluation/regime_slices.json`
- `experiments/<EXPERIMENT_ID>/promotion/qualification_report.json`
- `promotions/<tier>/<EXPERIMENT_ID>.json`
- `shadow/<EXPERIMENT_ID>/<replay_id>/metrics_sim.json`
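These stable paths make cross-experiment comparison a matter of globbing. A minimal sketch that collects evaluation metrics, assuming the default layout above:

```python
import json
from pathlib import Path

def collect_metrics(data_root: str = ".quanto_data") -> dict[str, dict]:
    """Map experiment id -> evaluation metrics using the default layout."""
    results = {}
    for path in Path(data_root, "experiments").glob("*/evaluation/metrics.json"):
        experiment_id = path.parts[-3]  # experiments/<ID>/evaluation/metrics.json
        results[experiment_id] = json.loads(path.read_text())
    return results
```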
Use the health reporter for deterministic coverage and NaN diagnostics:
- Calendar modes: `union`, `intersection`, or `symbol` (data-derived, no external calendars).
- Strict mode: fail fast when missing or NaN thresholds are exceeded.
- Outputs: canonical and feature summaries under `.quanto_data/monitoring/data_health/<run_id>/`.
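The data-derived calendar modes amount to set operations over each symbol's observed dates. A minimal sketch, assuming per-symbol frames with a `date` column (not the reporter's actual implementation):

```python
from functools import reduce
import pandas as pd

def derive_calendar(frames: dict[str, pd.DataFrame], mode: str = "union"):
    """Derive trading calendars purely from observed per-symbol dates."""
    per_symbol = {
        sym: pd.DatetimeIndex(df["date"].unique()).sort_values()
        for sym, df in frames.items()
    }
    if mode == "symbol":
        return per_symbol  # one calendar per symbol
    combine = pd.DatetimeIndex.union if mode == "union" else pd.DatetimeIndex.intersection
    return reduce(combine, per_symbol.values()).sort_values()
```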
Troubleshooting:

- Missing canonical shards: run `scripts.build_canonical_datasets` with the correct date range and verify raw inputs exist.
- NaN-heavy features: check coverage flags and consider lagged variants for leakage checks.
- Qualification skips: ensure baseline and candidate artifacts exist, then re-run `scripts.qualify_experiment`.
- Shadow replay gaps: confirm the experiment was promoted and the replay window has data coverage.
- **Reproducibility First**: every experiment can be replayed using stored manifests and configs.
- **Research ≠ Execution**: clear boundaries prevent research shortcuts from leaking into execution logic.
- **Config-Driven Control**: pipelines are parameterized and versioned, enabling safe iteration.
- **Auditability Over Convenience**: favor traceability and clarity over opaque automation.
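In that spirit, run ids can be derived deterministically from resolved configs so that identical inputs map to identical artifacts. An illustrative sketch (not necessarily how Quanto computes its ids):

```python
import hashlib
import json

def deterministic_run_id(domain: str, config: dict) -> str:
    """Derive a stable run id by hashing a canonical form of the config."""
    digest = hashlib.sha256(
        json.dumps(config, sort_keys=True).encode()
    ).hexdigest()[:12]
    return f"{domain}-{digest}"
```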
- Unit and integration tests for core components
- Explicit validation steps for data schemas and experiment outputs
- Structured logging and artifact storage to support debugging and review
Quanto is an active research and engineering project used to explore systematic ML-driven trading workflows. Some components are production-hardened, while others are research-oriented by design.
The platform prioritizes clarity, correctness, and extensibility over short-term optimization.
- ML engineers interested in end-to-end applied systems
- Quantitative researchers who value reproducibility and evaluation rigor
- Engineers exploring ML/RL workflows beyond notebooks

