Allora Forge Builder Kit

Build, evaluate, and deploy ML inference workers on the Allora Network.

What is Allora?

Allora is a decentralized AI network that coordinates predictions across many independent ML models. Rather than relying on a single model, the network aggregates inferences from competing workers and weights them by historical accuracy, producing a combined output designed to outperform any individual contributor.
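The idea of weighting inferences by historical accuracy can be sketched in a few lines. This is a toy illustration only: the inverse-loss weighting rule, the function name `combine_inferences`, and the sample numbers are assumptions made for the example, not Allora's actual aggregation mechanism.

```python
def combine_inferences(inferences, historical_losses, eps=1e-12):
    """Weighted average of worker inferences; lower past loss gives a larger weight."""
    if not inferences or len(inferences) != len(historical_losses):
        raise ValueError("need exactly one historical loss per inference")
    # Assumption: inverse-loss weighting, chosen only to illustrate the idea.
    raw = [1.0 / (loss + eps) for loss in historical_losses]
    total = sum(raw)
    weights = [r / total for r in raw]
    combined = sum(w * x for w, x in zip(weights, inferences))
    return combined, weights


# Hypothetical log-return inferences from three workers and their past losses.
inferences = [0.010, 0.014, -0.020]
historical_losses = [0.5, 1.0, 4.0]

combined, weights = combine_inferences(inferences, historical_losses)
print("weights:", [round(w, 3) for w in weights])
print("combined inference:", round(combined, 5))
```

The worker with the lowest historical loss dominates the combined value, while the least accurate worker still contributes a small share.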

The network is organized into topics. Each topic defines a prediction task (e.g. "8-hour BTC/USD log return") and runs a continuous lifecycle:

  1. Submission window opens — the network pings all registered workers for their inference
  2. Workers respond with a prediction value
  3. Evaluation window runs for the topic's time horizon (e.g. 8 hours)
  4. Scores are revealed — workers are ranked by loss against the ground truth, and rewards are distributed
time ──────────────────────────────────────────────────────────────────►

  ◄── submission ──►◄─────────────── evaluation period (e.g. 8h) ──────►
  │                 │                                                    │
open             close                                               scores
workers          predictions                                        revealed
polled           locked                                           + rewarded

All live topics today are crypto market predictions across assets like BTC, ETH, SOL, and NEAR. New topics are added over time.

What is the Allora Forge?

The Allora Model Forge is the hub for ML practitioners to compete, earn rewards, and build reputation on the network. Workers start on testnet to establish a track record, then graduate to mainnet where top performers earn ALLO token rewards.

This toolkit handles everything between your model and the network: data, feature engineering, evaluation, wallet management, and worker deployment.

What you get

  • Workflow API — backfill historical data → engineer features → build training datasets
  • Evaluation — grade your model against Allora's scoring methodology before deploying
  • Deployment tooling — wallet creation, faucet funding, worker lifecycle management
  • Monitoring dashboard — web UI showing submission history, on-chain scores, and live logs
  • Topic discovery — query all live topics on testnet and mainnet

Zero to deploy

Step 1 — Clone and install

git clone https://github.com/allora-network/allora-forge-builder-kit.git
cd allora-forge-builder-kit

python3.11 -m venv .venv
source .venv/bin/activate

python -m pip install .
python -m pip install -r requirements.txt

Get a free API key from developer.allora.network and save it:

echo "UP-..." > .allora_api_key

# Load into env without displaying the value
export ALLORA_API_KEY=$(cat .allora_api_key)

To persist across terminal sessions, add to your shell profile:

echo 'export ALLORA_API_KEY=$(cat /path/to/allora-forge-builder-kit/.allora_api_key)' >> ~/.bashrc

No API key? Use data_source="binance" in AlloraMLWorkflow() to pull data from Binance instead.

Step 2 — Train a model

cd notebooks

# Topic 69 — 1-day BTC/USD price prediction (1h bars, ~3 min)
python example_topic_69_bitcoin_walkthrough.py

# Topic 77 — 5-min BTC/USD price prediction (5m bars, ~2 min)
python example_topic_77_bitcoin_5min_walkthrough.py

Each script backfills historical data, engineers features, trains and evaluates a model, and saves a predict.pkl artifact.

Step 3 — Deploy a worker

# Still in notebooks/
python deploy_worker.py

On first run, WorkerManager creates a wallet, writes the key file to worker_keys/, and requests testnet ALLO from the faucet automatically. The worker process starts and begins polling the chain for open submission windows.

Faucet activity is logged, not printed. If a worker fails to start, check worker_logs/ for the subprocess output — faucet requests, balance checks, and on-chain errors all appear there.

Step 4 — Monitor and manage workers

# Web dashboard (recommended)
python -m allora_forge_builder_kit.web_dashboard

Open http://localhost:8787 — auto-refreshes every 5 seconds, shows all workers with submission timelines, on-chain scores, and live log tails.

Pass --host 0.0.0.0 to expose on all interfaces. An auth token is printed to stderr; append it as ?token=... in the URL.

# CLI dashboard — text summary of all workers
python -m allora_forge_builder_kit.workerctl dashboard

Worker management via the Python API:

from allora_forge_builder_kit import WorkerManager

wm = WorkerManager(reconcile_on_start=False)

# See all workers and their status
for w in wm.status_all():
    print(w['topic_id'], w['address'], w['status'])

# Stop a worker (keeps it registered, can be restarted)
wm.stop_worker(topic_id=69, address="allo1...")

# Start a stopped worker
wm.start_worker(topic_id=69, address="allo1...")

# Remove a worker entirely (stops it and deletes the record)
wm.remove_worker(topic_id=69, address="allo1...", force=True)

# Stop all running workers
wm.stop_all()

# Restart all enabled workers (e.g. after a reboot)
wm.start_all()

# Tail a worker's log
lines = wm.get_worker_log_tail(topic_id=69, address="allo1...", lines=50)
print("\n".join(lines))
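
With several workers running, it can help to condense the output of `status_all()` into counts. The helper below is a sketch, not part of the toolkit: it only assumes each record is a dict with the `topic_id`, `address`, and `status` keys used above, and the sample records (including the status strings) are hypothetical.

```python
from collections import Counter, defaultdict


def summarize_workers(statuses):
    """Group worker records by status and by topic."""
    records = list(statuses)  # allow any iterable, including generators
    by_status = Counter(r["status"] for r in records)
    by_topic = defaultdict(list)
    for r in records:
        by_topic[r["topic_id"]].append(r["address"])
    return dict(by_status), dict(by_topic)


# Hypothetical records shaped like the ones printed in the loop above.
sample = [
    {"topic_id": 69, "address": "allo1aaa", "status": "running"},
    {"topic_id": 69, "address": "allo1bbb", "status": "stopped"},
    {"topic_id": 77, "address": "allo1ccc", "status": "running"},
]

by_status, by_topic = summarize_workers(sample)
print(by_status)
print(by_topic)
```

In real use the call would be `summarize_workers(wm.status_all())`.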

Step 5 — Deploy other topics

TOPIC_ID=42 python deploy_worker.py   # deploy topic 42
TOPIC_ID=77 python deploy_worker.py   # deploy topic 77

Discover available topics:

from allora_forge_builder_kit import AlloraTopicDiscovery

d = AlloraTopicDiscovery(api_key="UP-...", network="testnet")
for t in d.get_all_topics():
    print(t.topic_id, t.raw.get("topic_name"), t.epoch_length, t.loss_method)
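
Rather than pasting the key into source code, it can be read from the `ALLORA_API_KEY` environment variable or the `.allora_api_key` file created in Step 1. The helper name `load_api_key` and its fallback order are assumptions for this sketch; the toolkit may already resolve the key in its own way.

```python
import os
from pathlib import Path


def load_api_key(env_var="ALLORA_API_KEY", key_file=".allora_api_key"):
    """Return the API key from the environment, falling back to a key file."""
    key = os.environ.get(env_var, "").strip()
    if key:
        return key
    path = Path(key_file)
    if path.is_file():
        key = path.read_text().strip()
        if key:
            return key
    raise RuntimeError(
        f"No API key found: set {env_var} or create {key_file}"
    )
```

The result can then be passed along as `AlloraTopicDiscovery(api_key=load_api_key(), network="testnet")`.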

Topic reference

Playground topics (testnet only) are the recommended starting point — no whitelist required.

| Testnet ID | Name | Notes |
|---|---|---|
| 69 | BTC/USD - 1 Day Price Prediction | Playground; example walkthroughs use this |
| 77 | BTC/USD - 5 Min Price Prediction | Playground Fast |
Mainnet topics and their testnet equivalents:

| Mainnet ID | Mainnet Name | Testnet ID | Testnet Name |
|---|---|---|---|
| 1 | BTC/USD - Log Returns - 8h | 64 | 8h BTC/USD Log-Return (5min updates) |
| 2 | ETH/USD - Log Returns - 8h | Missing | |
| 3 | SOL/USD - Log Returns - 8h | 57 | 8h SOL/USD Log-Return (inactive) |
| 9 | ETH/USD - Price Prediction - 8h | 41 | ETH/USD - 8h Price Prediction |
| 10 | SOL/USD - Price Prediction - 8h | 38 | SOL/USD - 8h Price Prediction |
| 14 | BTC/USD - Price Prediction - 8h | 42 | BTC/USD - 8h Price Prediction |
| 15 | BTC/USD - Log Returns - 24h | 61 | 1 day BTC/USD Log-Return Prediction |
| 16 | ETH/USD - Log Returns - 24h | 63 | 1 day ETH/USD Log-Return Prediction |
| 17 | SOL/USD - Log Returns - 24h | 62 | 1 day SOL/USD Log-Return Prediction |
| 18 | BTC/USD - Log Returns - 20m | Missing | |
| 19 | NEAR/USD - Log Returns - 8h | 71 | 8h NEAR/USD Log-Return Prediction |

Python API (quick reference)

from allora_forge_builder_kit import AlloraMLWorkflow

# Build a training dataset
workflow = AlloraMLWorkflow(
    tickers=["btcusd"],
    topic_id=69,
    interval="1h",
    n_input_bars=48,
    n_target_bars=24,
)
workflow.backfill(days=500)
df = workflow.get_full_feature_target_dataframe()

# Evaluate a predict function
# predict_fn is your own model's prediction callable; it is not defined in this snippet
from allora_forge_builder_kit import PerformanceEvaluator
evaluator = PerformanceEvaluator(workflow)
grade = evaluator.evaluate(predict_fn)

The learning problem

Framing forecasting as supervised learning

At any point in time $t$, the model observes a window of $N$ past bars as input features $\mathbf{x} \in \mathbb{R}^d$ and predicts a future outcome $y$ — a price or log return over the next $H$ bars. By sliding this window across the full history, a single time series becomes thousands of labeled examples $(\mathbf{x}_i, y_i)$, turning forecasting into a standard supervised learning problem.

The AlloraMLWorkflow handles this construction: backfill() fetches historical data, get_full_feature_target_dataframe() builds the feature matrix and target vector, ready for any scikit-learn compatible model.
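
To make the sliding-window idea concrete, here is a minimal NumPy sketch that is independent of the toolkit's implementation. It uses close prices only, normalizes each window by its last close, and takes the log return over the next `n_target_bars` as the target; the function name `make_supervised` and these simplifications are assumptions for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def make_supervised(close, n_input_bars, n_target_bars):
    """Turn one price series into (X, y) pairs with a sliding window."""
    close = np.asarray(close, dtype=float)
    n_examples = len(close) - n_input_bars - n_target_bars + 1
    if n_examples <= 0:
        raise ValueError("series too short for the requested window sizes")
    # Row i holds close[i : i + n_input_bars].
    windows = sliding_window_view(close, n_input_bars)[:n_examples]
    last_close = windows[:, -1]
    # Normalize each window so its last close equals 1.0.
    X = windows / last_close[:, None]
    # Target: log return from the last input bar to n_target_bars later.
    future = close[n_input_bars + n_target_bars - 1:][:n_examples]
    y = np.log(future / last_close)
    return X, y


close = np.arange(1.0, 11.0)  # toy prices 1, 2, ..., 10
X, y = make_supervised(close, n_input_bars=3, n_target_bars=2)
print(X.shape, y.shape)
```

Each row of `X` only contains bars at or before the prediction time, and each `y` only uses a bar strictly after it, which is the property that keeps the labels free of lookahead.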

Empirical risk minimization

The standard recipe is to pick a model $f$ by minimizing empirical (in-sample) loss:

$$f^* = \arg\min_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^{n} \ell(y_i, f(\mathbf{x}_i))$$

The ERM assumption is that training and deployment data share the same distribution — so a model that fits well in-sample will generalize out-of-sample. This is a reasonable working assumption in many domains.
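
As a concrete instance (an illustration, not something the toolkit prescribes): if $\mathcal{F}$ is the set of linear models $f_{\mathbf{w}}(\mathbf{x}) = \mathbf{w}^\top \mathbf{x}$ and $\ell$ is squared error, the ERM problem above reduces to ordinary least squares:

$$\mathbf{w}^* = \arg\min_{\mathbf{w} \in \mathbb{R}^d} \frac{1}{n} \sum_{i=1}^{n} \left(y_i - \mathbf{w}^\top \mathbf{x}_i\right)^2 = (X^\top X)^{-1} X^\top \mathbf{y}$$

Here $X \in \mathbb{R}^{n \times d}$ stacks the $\mathbf{x}_i^\top$ as rows, $\mathbf{y}$ stacks the $y_i$, and the closed form assumes $X^\top X$ is invertible.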

Why finance makes this hard

Financial markets violate the ERM assumption routinely:

  • Regime changes — volatility regimes, macro shocks, and structural breaks mean the distribution of returns today can look nothing like last year's.
  • Non-stationarity — correlations, volatility, and return distributions all drift over time.
  • Low signal-to-noise — crypto returns are heavily noise-dominated, making it easy to fit noise rather than signal.

The practical consequence is that overfitting is the default failure mode. A model can keep lowering in-sample loss while out-of-sample loss rises, because added complexity fits noise rather than signal. Traditional remedies (early stopping, depth limits, regularization, conservative learning rates) are especially important here.
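
A small synthetic experiment illustrates the in-sample versus out-of-sample gap. Everything here is an assumption made for the demo: the data is simulated (a weak linear signal buried in noise), and polynomial degree stands in for model complexity. It is not drawn from the toolkit or from market data.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 60)
# Weak signal (slope 0.1) with much larger noise, mimicking a low signal-to-noise target.
y = 0.1 * x + rng.normal(scale=1.0, size=x.size)

# Time-ordered split: fit on the first 40 points, evaluate on the last 20.
x_train, y_train = x[:40], y[:40]
x_test, y_test = x[40:], y[40:]


def mse(coeffs, xs, ys):
    """Mean squared error of a fitted polynomial on (xs, ys)."""
    return float(np.mean((np.polyval(coeffs, xs) - ys) ** 2))


results = {}
for degree in (1, 3, 5, 8):
    coeffs = np.polyfit(x_train, y_train, degree)
    results[degree] = (mse(coeffs, x_train, y_train), mse(coeffs, x_test, y_test))

for degree, (train_loss, test_loss) in results.items():
    print(f"degree={degree}  in-sample MSE={train_loss:.3f}  out-of-sample MSE={test_loss:.3f}")
```

In-sample loss can only go down as the degree grows, because each polynomial family contains the previous one. The out-of-sample column is the number to watch.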

Walk-forward validation

To measure true out-of-sample performance, the toolkit uses walk-forward cross-validation: train on data up to time $t$, evaluate on data strictly after $t$, advance the window, repeat. This respects temporal ordering (no lookahead leakage) and produces a realistic sample of out-of-sample predictions. The evaluation metrics in the next section are computed entirely on these held-out predictions.
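
The splitting scheme can be sketched as a small generator. This is a generic expanding-window version written for illustration; the function name, the parameters, and the optional `gap` (useful when targets span several bars and would otherwise overlap the test set) are assumptions, and the toolkit's own splitter may differ.

```python
def walk_forward_splits(n_samples, initial_train, test_size, step=None, gap=0):
    """Yield (train_indices, test_indices) with every test index after every train index."""
    if initial_train <= 0 or test_size <= 0 or gap < 0:
        raise ValueError("initial_train and test_size must be positive, gap non-negative")
    step = test_size if step is None else step
    if step <= 0:
        raise ValueError("step must be positive")
    train_end = initial_train
    while train_end + gap + test_size <= n_samples:
        test_start = train_end + gap
        yield range(0, train_end), range(test_start, test_start + test_size)
        train_end += step


folds = list(walk_forward_splits(n_samples=10, initial_train=4, test_size=2))
for train_idx, test_idx in folds:
    print(f"train 0..{train_idx[-1]}  test {list(test_idx)}")
```

Collecting the predictions made on each test range gives a single held-out series on which metrics can be computed.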

The model builder's job

The example notebooks use LightGBM (gradient boosting over decision trees) with conservative defaults as a starting point. Gradient boosting is a strong tabular baseline — it handles non-linearity and feature interactions well and is relatively robust to scale.

From here, improving your score comes down to three levers:

  1. Feature engineering — what information goes into $\mathbf{x}$. The base features are normalized OHLCV ratios (last-close normalized to 1.0). Adding technical indicators (RSI, MACD, realized volatility), log-return series, or cross-asset signals is where most alpha lives.
  2. Model and regularization — early stopping, tree depth, learning rate, and subsampling to keep variance in check.
  3. Maximizing out-of-sample metrics — the evaluation suite (DA, Pearson $r$, WRMSE, CZAR) is the scorecard, not in-sample loss. A higher grade means better generalization and a higher expected score on the Allora network.

For structured methodology guidance on each of these levers, see the Model creation skills section.
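
As a starting point for the feature-engineering lever, the sketch below computes three of the indicators mentioned above from a close-price series. The function names are hypothetical, and the RSI uses a simple moving average of gains and losses rather than Wilder's smoothing; treat it as one possible formulation, not the toolkit's feature set.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def log_returns(close):
    """One-bar log returns; output is one element shorter than the input."""
    return np.diff(np.log(np.asarray(close, dtype=float)))


def realized_volatility(returns, window):
    """Square root of the sum of squared returns over each trailing window."""
    squared = np.asarray(returns, dtype=float) ** 2
    return np.sqrt(sliding_window_view(squared, window).sum(axis=1))


def rsi_simple(close, window=14):
    """RSI from simple averages: 100 * avg_gain / (avg_gain + avg_loss)."""
    delta = np.diff(np.asarray(close, dtype=float))
    gains = np.clip(delta, 0.0, None)
    losses = np.clip(-delta, 0.0, None)
    avg_gain = sliding_window_view(gains, window).mean(axis=1)
    avg_loss = sliding_window_view(losses, window).mean(axis=1)
    denom = avg_gain + avg_loss
    safe = np.where(denom > 0, denom, 1.0)  # avoid division by zero on flat windows
    return np.where(denom > 0, 100.0 * avg_gain / safe, 50.0)


# Toy series growing 1% per bar.
close = 100.0 * 1.01 ** np.arange(20)
r = log_returns(close)
vol = realized_volatility(r, window=5)
rsi = rsi_simple(close, window=14)
print(r[:3], vol[:3], rsi[:3])
```

Every function uses only trailing data, so the value at each position depends on bars at or before that position. When joining these arrays to a feature matrix, align them by their last bar to keep that property.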


Evaluation metrics

PerformanceEvaluator scores your model on 7 primary metrics before you deploy. Each has a pass/fail threshold. The composite score (out of 7) maps to a letter grade.

| # | Metric | Threshold | What it measures |
|---|---|---|---|
| 1 | Directional Accuracy (DA) | ≥ 52% | Fraction of predictions where the sign (up/down) matches the actual return |
| 2 | DA CI Lower Bound | ≥ 0.50 | Lower bound of the 95% Wilson confidence interval for DA, adjusted for autocorrelation; ensures the edge isn't a statistical fluke |
| 3 | DA p-value | < 0.05 | One-tailed z-test (H₀: DA = 50%) with continuity correction and autocorrelation-aware effective sample size |
| 4 | Pearson r | ≥ 0.05 | Linear correlation between predicted and actual returns |
| 5 | Pearson p-value | < 0.05 | Statistical significance of the Pearson correlation |
| 6 | WRMSE Improvement | ≥ 5% | Weighted RMSE vs. a zero-prediction baseline, where errors are weighted by the magnitude of actual returns (bigger moves count more) |
| 7 | CZAR Improvement | ≥ 10% | Cumulative Z-scored Absolute Return: the fraction of z-scored directional return captured vs. a perfect oracle. 0 = random guessing, 1 = perfect |
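
For intuition, simplified versions of several of these metrics can be written in plain Python. These are textbook formulations, not `PerformanceEvaluator`'s code: they omit the autocorrelation adjustment and continuity correction, skip zero-valued pairs when computing DA, and define WRMSE improvement as `1 - model / baseline` with weights `|y|`. All of those choices are assumptions for this sketch.

```python
import math


def directional_accuracy(y_true, y_pred):
    """Share of predictions whose sign matches the actual return (zeros skipped)."""
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if t != 0 and p != 0]
    hits = sum((t > 0) == (p > 0) for t, p in pairs)
    return hits / len(pairs), len(pairs)


def wilson_lower_bound(p_hat, n, z=1.96):
    """Lower end of the Wilson score interval (z=1.96 for roughly 95%)."""
    denom = 1.0 + z * z / n
    centre = p_hat + z * z / (2.0 * n)
    margin = z * math.sqrt(p_hat * (1.0 - p_hat) / n + z * z / (4.0 * n * n))
    return (centre - margin) / denom


def pearson_r(y_true, y_pred):
    """Sample Pearson correlation between actual and predicted values."""
    n = len(y_true)
    mt, mp = sum(y_true) / n, sum(y_pred) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mt) ** 2 for t in y_true)
    var_p = sum((p - mp) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def wrmse_improvement(y_true, y_pred):
    """Relative reduction in |y|-weighted RMSE versus always predicting zero."""
    weights = [abs(t) for t in y_true]
    total = sum(weights)
    model = math.sqrt(sum(w * (t - p) ** 2 for w, t, p in zip(weights, y_true, y_pred)) / total)
    baseline = math.sqrt(sum(w * t ** 2 for w, t in zip(weights, y_true)) / total)
    return 1.0 - model / baseline


# Hypothetical actual and predicted returns.
y_true = [0.02, -0.01, 0.03, -0.02, 0.01, -0.03]
y_pred = [0.01, -0.02, 0.01, 0.01, 0.02, -0.01]

da, n = directional_accuracy(y_true, y_pred)
print("DA:", round(da, 3), "on", n, "predictions")
print("Wilson lower bound:", round(wilson_lower_bound(da, n), 3))
print("Pearson r:", round(pearson_r(y_true, y_pred), 3))
print("WRMSE improvement:", round(wrmse_improvement(y_true, y_pred), 3))
```

With only six predictions the Wilson lower bound sits well below 0.50 even though DA itself is high, which is why the table pairs the raw accuracy with a confidence bound and a p-value.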

Grading:

| Points (out of 7) | Grade |
|---|---|
| 7 | A+ |
| 6 | A |
| 5 | B+ |
| 4 | B |
| 3 | C |
| 2 | D |
| ≤ 1 | F |
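
The pass/fail thresholds and the grade scale can be combined into a short scoring sketch. The metric key names and the convention of expressing percentages as fractions (52% as 0.52) are assumptions for this example; `PerformanceEvaluator` may name and scale its outputs differently.

```python
import operator

# (comparison, threshold) per metric, taken from the thresholds listed above.
CHECKS = {
    "da": (operator.ge, 0.52),
    "da_ci_lower": (operator.ge, 0.50),
    "da_pvalue": (operator.lt, 0.05),
    "pearson_r": (operator.ge, 0.05),
    "pearson_pvalue": (operator.lt, 0.05),
    "wrmse_improvement": (operator.ge, 0.05),
    "czar_improvement": (operator.ge, 0.10),
}

GRADES = {7: "A+", 6: "A", 5: "B+", 4: "B", 3: "C", 2: "D"}


def count_points(metrics):
    """One point for each metric that clears its threshold."""
    return sum(1 for name, (op, thr) in CHECKS.items() if op(metrics[name], thr))


def letter_grade(points):
    """Map a 0..7 point total to a letter grade; 0 and 1 both map to F."""
    if not 0 <= points <= len(CHECKS):
        raise ValueError("points must be between 0 and 7")
    return GRADES.get(points, "F")


# Hypothetical metric values: the two Pearson checks fail, the rest pass.
metrics = {
    "da": 0.55,
    "da_ci_lower": 0.51,
    "da_pvalue": 0.03,
    "pearson_r": 0.04,
    "pearson_pvalue": 0.20,
    "wrmse_improvement": 0.06,
    "czar_improvement": 0.12,
}

points = count_points(metrics)
print(points, letter_grade(points))
```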

Model creation skills

The allora_research_model_skills/ bundle contains three Claude Code skills for building financial prediction models. Each enters model design from a different angle:

| Skill | Entry point |
|---|---|
| forge-hypothesis-driven | Start from a theory about what moves markets (deductive) |
| forge-signal-discovery | Start from interesting data, discover what is predictable (inductive) |
| forge-robustness-first | Start from validation gates, work backwards to a design that survives them (adversarial) |

All three produce a complete, runnable pipeline and satisfy the same nine methodology principles. See allora_research_model_skills/README.md for selection guidance.


File map

| Path | Purpose |
|---|---|
| notebooks/example_topic_69_bitcoin_walkthrough.py | End-to-end example for topic 69: data → features → model → artifact |
| notebooks/example_topic_77_bitcoin_5min_walkthrough.py | End-to-end example for topic 77: 5-min BTC prediction |
| notebooks/deploy_worker.py | Deploy any topic with WorkerManager (TOPIC_ID=N python deploy_worker.py) |
| notebooks/deploy_worker_raw.py | Minimal SDK-only deployment reference (no WorkerManager) |
| notebooks/feature_engineering_example.py | Standalone feature engineering reference |
| allora_forge_builder_kit/workflow.py | Data + feature pipeline |
| allora_forge_builder_kit/evaluation.py | Model scoring (7 primary metrics + grading) |
| allora_forge_builder_kit/topic_discovery.py | Query live topics on testnet/mainnet |
| allora_forge_builder_kit/worker_manager.py | Wallet creation, key management, process lifecycle |
| allora_forge_builder_kit/worker_monitor.py | On-chain event tracking |
| allora_forge_builder_kit/web_dashboard.py | Web monitoring UI |
| allora_research_model_skills/ | Methodology skills for building generalizable financial models (hypothesis-driven, signal-discovery, robustness-first) |

Testing

pytest tests/test_data_managers.py -v -m "not integration"

# Full suite (requires network)
export RUN_INTEGRATION_TESTS=1
pytest -v

License

MIT
