This directory contains examples and configurations for the ForecastLabAI data seeder.

```bash
# Generate standard test dataset
uv run python scripts/seed_random.py --full-new --seed 42 --confirm

# Verify data was created
uv run python scripts/seed_random.py --status

# Check data integrity
uv run python scripts/seed_random.py --verify

# Query via API (quote the URL so the shell does not treat & as a background operator)
curl "http://localhost:8123/analytics/kpis?start_date=2024-01-01&end_date=2024-12-31"
```

| Flag | Description |
|---|---|
| `--full-new` | Generate complete dataset from scratch |
| `--delete` | Delete generated data |
| `--append` | Append data to existing dataset |
| `--status` | Show current data counts |
| `--verify` | Verify data integrity |


| Option | Default | Description |
|---|---|---|
| `--seed` | 42 | Random seed for reproducibility |
| `--stores` | 10 | Number of stores to generate |
| `--products` | 50 | Number of products to generate |
| `--start-date` | 2024-01-01 | Start of date range |
| `--end-date` | 2024-12-31 | End of date range |
| `--sparsity` | 0.0 | Fraction of missing combinations |
| `--scenario` | (none) | Pre-built scenario name |
| `--config` | (none) | Path to YAML config file |
| `--scope` | all | Deletion scope (all/facts/dimensions) |
| `--batch-size` | 1000 | Batch insert size |

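
To get a feel for how the sizing options interact, here is a rough sketch that estimates the number of daily fact rows a run might produce. It assumes one row per retained store/product combination per day and an inclusive date range; neither assumption is confirmed by the seeder, and `expected_rows` is a hypothetical helper, not part of `seed_random.py`.

```python
from datetime import date


def expected_rows(stores: int, products: int, start: date, end: date,
                  sparsity: float = 0.0) -> int:
    """Estimate daily fact rows for a seeding run.

    Assumption: one row per kept store/product combination per day,
    with `sparsity` read as the fraction of combinations left out.
    """
    days = (end - start).days + 1            # treat the date range as inclusive
    combinations = stores * products         # full store x product grid
    kept = round(combinations * (1.0 - sparsity))
    return kept * days


# Defaults from the options table: 10 stores, 50 products, calendar year 2024
dense = expected_rows(10, 50, date(2024, 1, 1), date(2024, 12, 31))
half = expected_rows(10, 50, date(2024, 1, 1), date(2024, 12, 31), sparsity=0.5)
print(dense, half)
```

An estimate like this can help when choosing `--batch-size`, since it indicates roughly how many insert batches a run would need.
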

| Flag | Description |
|---|---|
| `--confirm` | Required for destructive operations |
| `--dry-run` | Preview without executing |

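
The interplay of the two safety flags can be illustrated with a small `argparse` sketch. This is not the actual logic of `seed_random.py`; the `plan` function and its return strings are hypothetical, and the precedence shown (a dry run never needs `--confirm`) is an assumption based on the usage examples in this document.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Only the flags relevant to the safety check are modelled here.
    parser = argparse.ArgumentParser(prog="seed_sketch")
    parser.add_argument("--delete", action="store_true")
    parser.add_argument("--confirm", action="store_true")
    parser.add_argument("--dry-run", action="store_true")
    return parser


def plan(argv: list[str]) -> str:
    """Decide what a destructive command would do (illustrative only)."""
    args = build_parser().parse_args(argv)
    if not args.delete:
        return "nothing to do"
    if args.dry_run:
        return "preview only"              # a preview changes nothing, so no confirmation
    if not args.confirm:
        return "refused: pass --confirm"   # destructive action without confirmation
    return "delete"


print(plan(["--delete", "--dry-run"]))
```
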

| Scenario | Description | Use Case |
|---|---|---|
| `retail_standard` | Normal retail patterns with mild seasonality | General development and testing |
| `holiday_rush` | Q4 surge with Black Friday/Christmas peaks | Seasonal forecasting validation |
| `high_variance` | Noisy, unpredictable data with anomalies | Model robustness testing |
| `stockout_heavy` | Frequent stockouts (25% probability) | Inventory modeling scenarios |
| `new_launches` | 100 products with launch ramp patterns | Launch forecasting validation |
| `sparse` | 50% missing combinations, random gaps | Gap handling and missing data tests |

```bash
# Use a pre-built scenario
uv run python scripts/seed_random.py --full-new \
  --scenario holiday_rush \
  --stores 15 \
  --confirm

# Use a YAML config file
uv run python scripts/seed_random.py --full-new \
  --config examples/seed/config_holiday.yaml \
  --confirm
```

All generated data is deterministic given the same seed:

```bash
# These produce identical datasets
uv run python scripts/seed_random.py --full-new --seed 42 --confirm
uv run python scripts/seed_random.py --delete --confirm
uv run python scripts/seed_random.py --full-new --seed 42 --confirm
```

Add data for additional time periods without affecting existing records:

```bash
# First, generate initial dataset
uv run python scripts/seed_random.py --full-new \
  --start-date 2024-01-01 \
  --end-date 2024-12-31 \
  --seed 42 \
  --confirm

# Later, append Q1 2025
uv run python scripts/seed_random.py --append \
  --start-date 2025-01-01 \
  --end-date 2025-03-31 \
  --seed 43
```

```bash
# Delete everything
uv run python scripts/seed_random.py --delete --confirm

# Delete only fact tables (keep dimensions)
uv run python scripts/seed_random.py --delete --scope facts --confirm

# Preview what would be deleted
uv run python scripts/seed_random.py --delete --dry-run
```

See `config_holiday.yaml` for a complete example of YAML configuration.

```yaml
dimensions:
  stores:
    count: 10
    regions: ["North", "South", "East", "West"]
    types: ["supermarket", "express", "warehouse"]
  products:
    count: 50
    categories: ["Beverage", "Snack", "Dairy"]
    brands: ["BrandA", "BrandB", "Generic"]
date_range:
  start: "2024-01-01"
  end: "2024-12-31"
time_series:
  base_demand: 100
  trend: "linear"       # none, linear, exponential
  trend_slope: 0.001    # daily % change
  noise_sigma: 0.15     # demand variance
retail:
  promotion_probability: 0.1
  stockout_probability: 0.02
  promotion_lift: 1.3
sparsity:
  missing_combinations_pct: 0.0
  random_gaps_per_series: 0
holidays:
  - date: "2024-12-25"
    name: "Christmas Day"
    multiplier: 0.3
seed: 42
```

The seeder generates realistic time-series data with:
- No trend: Stationary demand
- Linear trend: Gradual growth or decline
- Exponential trend: Accelerating growth
- Weekly seasonality: Different demand by day of week (Mon-Sun)
- Monthly seasonality: Optional multipliers by month
- Holiday effects: Special multipliers for specific dates
- Gaussian noise with configurable variance
- Random spikes and dips for anomaly testing
- Promotion lift during promotional periods
- Stockout handling (zero sales or backlog)
- Price elasticity effects
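
The way several of these components could combine, and why a fixed seed makes the output reproducible, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the seeder's implementation: `generate_series` is a hypothetical function, the components are assumed to multiply together, the weekly multipliers are made-up values, and only the parameter names `base_demand`, `trend_slope`, and `noise_sigma` are borrowed from the YAML example.

```python
import random
from datetime import date, timedelta

# Hypothetical Mon-Sun multipliers, chosen only for illustration
WEEKLY = (0.9, 0.9, 0.95, 1.0, 1.1, 1.3, 1.2)


def generate_series(seed, start, days, base_demand=100.0, trend_slope=0.001,
                    noise_sigma=0.15, weekly=WEEKLY, holidays=None):
    """Return a list of daily demand values for one store/product pair."""
    rng = random.Random(seed)          # private RNG: same seed, same sequence
    holidays = holidays or {}
    series = []
    for i in range(days):
        day = start + timedelta(days=i)
        trend = 1.0 + trend_slope * i              # linear trend
        season = weekly[day.weekday()]             # weekday() is 0 for Monday
        holiday = holidays.get(day, 1.0)           # per-date multiplier
        noise = rng.gauss(1.0, noise_sigma)        # multiplicative Gaussian noise
        demand = base_demand * trend * season * holiday * noise
        series.append(max(0, round(demand)))       # demand cannot be negative
    return series


first = generate_series(42, date(2024, 1, 1), 30)
second = generate_series(42, date(2024, 1, 1), 30)
print(first == second)  # the same seed reproduces the same series
```

Using a dedicated `random.Random(seed)` instance rather than the module-level functions keeps the sequence independent of any other code that draws random numbers.
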
After seeding, you can:
- Explore data: Use `/analytics/kpis` and `/analytics/drilldowns`
- Train models: Call `/forecasting/train` with store/product IDs
- Run backtests: Call `/backtesting/run` to validate models
- Test RAG: Index documents and query with `/rag/*` endpoints
- Use agents: Create sessions and chat with `/agents/*` endpoints
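
As a starting point for exploring the seeded data from Python, the sketch below builds the same `/analytics/kpis` request shown in the `curl` example. The host, port, and query parameters come from that example; the helper names `kpi_url` and `fetch_kpis` are hypothetical, and the JSON response format is an assumption.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8123"  # host and port taken from the curl example


def kpi_url(start_date: str, end_date: str, base_url: str = BASE_URL) -> str:
    """Build the KPI query URL with properly encoded query parameters."""
    query = urlencode({"start_date": start_date, "end_date": end_date})
    return f"{base_url}/analytics/kpis?{query}"


def fetch_kpis(start_date: str, end_date: str):
    """Call the endpoint; assumes the API is running and returns JSON."""
    with urlopen(kpi_url(start_date, end_date), timeout=10) as response:
        return json.load(response)


# Building the URL needs no running server; call fetch_kpis once the API is up.
print(kpi_url("2024-01-01", "2024-12-31"))
```
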