Command-line interface for the QuantMini data pipeline.
```bash
pip install quantmini
```

After installation, the `quantmini` command will be available globally.
```bash
# Initialize configuration
quantmini config init

# Edit credentials (add your Polygon.io API keys)
nano config/credentials.yaml

# Run daily pipeline
quantmini pipeline daily --data-type stocks_daily

# Query data
quantmini data query --data-type stocks_daily \
  --symbols AAPL MSFT \
  --fields date close volume \
  --start-date 2024-01-01 \
  --end-date 2024-01-31
```

Initialize configuration files.
```bash
quantmini config init

# Force overwrite existing files
quantmini config init --force
```

Creates:

- `config/credentials.yaml` - API credentials template
- `config/pipeline_config.yaml` - Pipeline configuration
- `config/system_profile.yaml` - System hardware profile
Show current configuration.
```bash
quantmini config show
```

Show system hardware profile.

```bash
quantmini config profile
```

Set configuration value.

```bash
quantmini config set pipeline.mode streaming
quantmini config set processing.chunk_size 50000
```

Get configuration value.

```bash
quantmini config get pipeline.mode
quantmini config get processing.use_polars
```

Download data from Polygon.io S3.
```bash
quantmini data download \
  --data-type stocks_daily \
  --start-date 2024-01-01 \
  --end-date 2024-01-31 \
  --output data/raw
```

Options:

- `-t, --data-type`: Type of data (`stocks_daily`, `stocks_minute`, `options_daily`, `options_minute`)
- `-s, --start-date`: Start date (YYYY-MM-DD)
- `-e, --end-date`: End date (YYYY-MM-DD)
- `-o, --output`: Output directory (default: `data/raw`)
Ingest data from landing layer to bronze layer (validated Parquet).
```bash
quantmini data ingest \
  --data-type stocks_daily \
  --start-date 2024-01-01 \
  --end-date 2024-01-31 \
  --mode polars \
  --incremental
```

Transformation: Landing → Bronze

- Input: Raw CSV.GZ from `landing/polygon-s3/{data_type}/`
- Output: Validated Parquet in `bronze/{data_type}/`
- Process: Schema enforcement, type checking, data validation

Options:

- `-t, --data-type`: Type of data
- `-s, --start-date`: Start date
- `-e, --end-date`: End date
- `-m, --mode`: Ingestion mode (`polars` or `streaming`, default: `polars`)
- `--incremental/--full`: Incremental or full ingestion (default: incremental)

Modes:

- `polars`: 5-10x faster, recommended for most systems
- `streaming`: Memory-efficient, for systems with <32GB RAM
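The memory tradeoff behind `streaming` mode can be illustrated with a plain-Python chunked reader: instead of materializing a whole CSV.GZ file, it yields fixed-size batches of rows, so peak memory is bounded by the chunk size. This is a sketch only; the `iter_chunks` helper is hypothetical and not part of the quantmini package.

```python
import csv
import gzip
from itertools import islice

def iter_chunks(path, chunk_size=50_000):
    """Yield lists of row dicts, holding at most chunk_size rows in memory."""
    with gzip.open(path, "rt", newline="") as fh:
        reader = csv.DictReader(fh)
        while True:
            chunk = list(islice(reader, chunk_size))
            if not chunk:
                break
            yield chunk
```

Each chunk can then be validated and appended to a Parquet dataset before the next one is read, which is why this style suits machines with limited RAM.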
Transform bronze layer to silver layer (add features).
```bash
quantmini data enrich \
  --data-type stocks_daily \
  --start-date 2024-01-01 \
  --end-date 2024-01-31 \
  --incremental
```

Transformation: Bronze → Silver

- Input: Validated Parquet from `bronze/{data_type}/`
- Output: Feature-enriched Parquet in `silver/{data_type}/`
- Process: Calculate technical indicators, returns, alpha factors
Features Added:
- Returns (1d, 5d, 20d)
- Alpha factors
- Price features
- Volume features
- Volatility
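For intuition, horizon returns like `return_1d` are simple ratios of closing prices. A minimal pure-Python sketch follows; the `simple_returns` helper is hypothetical and not the package's actual implementation.

```python
def simple_returns(closes, horizon=1):
    """return_{h}d[i] = close[i] / close[i - h] - 1; None while history is too short."""
    return [
        closes[i] / closes[i - horizon] - 1 if i >= horizon else None
        for i in range(len(closes))
    ]
```

The 5d and 20d variants are the same formula with `horizon=5` and `horizon=20`.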
Transform silver layer to gold layer (Qlib binary format).
```bash
quantmini data convert \
  --data-type stocks_daily \
  --start-date 2024-01-01 \
  --end-date 2024-01-31 \
  --incremental
```

Transformation: Silver → Gold

- Input: Feature-enriched Parquet from `silver/{data_type}/`
- Output: ML-ready Qlib binary in `gold/qlib/{data_type}/`
- Process: Convert to optimized binary format for ML training/backtesting

Outputs Qlib-compatible binary format:

- `gold/qlib/{data_type}/instruments/all.txt`
- `gold/qlib/{data_type}/calendars/day.txt`
- `gold/qlib/{data_type}/features/{symbol}/{feature}.bin`
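To make the per-feature `.bin` files less opaque, here is a toy reader/writer for a flat float32 series with a small header. The layout shown (one little-endian float32 header holding a start calendar index, then the values) is illustrative only and is not guaranteed to match Qlib's exact on-disk format; both helpers are hypothetical.

```python
import struct

def write_feature_bin(path, start_index, values):
    """Write one float32 header (start calendar index) followed by the series."""
    with open(path, "wb") as fh:
        fh.write(struct.pack("<f", float(start_index)))
        fh.write(struct.pack(f"<{len(values)}f", *values))

def read_feature_bin(path):
    """Read the header and series back as (int, list[float])."""
    with open(path, "rb") as fh:
        raw = fh.read()
    floats = struct.unpack(f"<{len(raw) // 4}f", raw)
    return int(floats[0]), list(floats[1:])
```

The appeal of a format like this is that one feature of one symbol is a single contiguous array, so backtests can memory-map or slice it without parsing.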
Query enriched data.
```bash
quantmini data query \
  --data-type stocks_daily \
  --symbols AAPL MSFT GOOGL \
  --fields date close volume return_1d alpha_daily \
  --start-date 2024-01-01 \
  --end-date 2024-01-31 \
  --output results.csv
```

Options:

- `-t, --data-type`: Type of data
- `-s, --symbols`: Symbols to query (can specify multiple)
- `-f, --fields`: Fields to query (can specify multiple)
- `--start-date`: Start date
- `--end-date`: End date
- `-o, --output`: Output CSV file (default: print to stdout)
- `-l, --limit`: Limit number of rows
Example with multiple symbols/fields:
```bash
quantmini data query \
  --data-type stocks_daily \
  -s AAPL -s MSFT -s GOOGL -s AMZN \
  -f date -f close -f volume -f return_1d \
  --start-date 2024-01-01 \
  --end-date 2024-12-31 \
  -o portfolio_data.csv
```

Show ingestion status.
```bash
# Show all data types
quantmini data status

# Show specific data type
quantmini data status --data-type stocks_daily

# Filter by date range
quantmini data status \
  --data-type stocks_daily \
  --start-date 2024-01-01 \
  --end-date 2024-01-31
```

Run the complete Medallion Architecture pipeline: Landing → Bronze → Silver → Gold.
```bash
quantmini pipeline run \
  --data-type stocks_daily \
  --start-date 2024-01-01 \
  --end-date 2024-01-31
```

Full Pipeline Flow:
- Landing: Download raw CSV.GZ from source
- Bronze: Ingest to validated Parquet
- Silver: Enrich with calculated features
- Gold: Convert to ML-ready Qlib binary
Options:
- `-t, --data-type`: Type of data
- `-s, --start-date`: Start date
- `-e, --end-date`: End date
- `--skip-ingest`: Skip ingestion step
- `--skip-enrich`: Skip enrichment step
- `--skip-convert`: Skip conversion step
Example - only enrich and convert:
```bash
quantmini pipeline run \
  --data-type stocks_daily \
  --start-date 2024-01-01 \
  --end-date 2024-01-31 \
  --skip-ingest
```

Run daily update.

```bash
# Update yesterday's data
quantmini pipeline daily --data-type stocks_daily

# Update last 3 days
quantmini pipeline daily --data-type stocks_daily --days 3
```

Options:

- `-t, --data-type`: Type of data
- `-d, --days`: Number of days to update (default: 1)
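The date window a daily update covers can be sketched as below. The `daily_update_range` helper is hypothetical, and it assumes the semantics that the update ends at yesterday and reaches back `--days` calendar days; the real command may handle market calendars differently.

```python
from datetime import date, timedelta

def daily_update_range(days=1, today=None):
    """Return (start, end) ISO dates: end is yesterday, start is `days` days back."""
    today = today or date.today()
    end = today - timedelta(days=1)
    start = today - timedelta(days=days)
    return start.isoformat(), end.isoformat()
```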
Automatically:
- Ingests recent data
- Adds features
- Converts to Qlib format
Backfill missing data.
```bash
quantmini pipeline backfill \
  --data-type stocks_daily \
  --start-date 2024-01-01 \
  --end-date 2024-12-31
```

Automatically detects and processes only missing dates.
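Missing-date detection amounts to scanning the requested range for trading days absent from the already-ingested set. A minimal sketch follows; `missing_weekdays` is a hypothetical illustration that treats every weekday as a trading day and ignores market holidays, which the real backfill presumably handles.

```python
from datetime import date, timedelta

def missing_weekdays(start, end, ingested):
    """Weekdays in [start, end] that are not in the ingested set."""
    out, day = [], start
    while day <= end:
        if day.weekday() < 5 and day not in ingested:
            out.append(day)
        day += timedelta(days=1)
    return out
```

Only the returned dates would then be pushed through ingest → enrich → convert, which is what makes backfill cheap to re-run.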
Validate Qlib binary format conversion.
```bash
quantmini validate binary --data-type stocks_daily
```

Checks:
- Instruments file format
- Calendar file format
- Binary files exist and are valid
- Metadata file exists
Validate bronze and silver layer Parquet data integrity.
```bash
quantmini validate parquet --data-type stocks_daily
```

Validates:

- Bronze layer: `bronze/{data_type}/` - validated raw data
- Silver layer: `silver/{data_type}/` - feature-enriched data
Shows:
- Total partitions per layer
- Total size per layer
- Date range coverage
- Symbol count
- Schema consistency across partitions
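Schema consistency across partitions boils down to grouping partitions by their column signature: one group means a consistent dataset, more than one means a drifted schema. A minimal sketch, using a hypothetical `group_by_schema` helper:

```python
def group_by_schema(partition_columns):
    """Map each distinct column signature to the partitions that use it."""
    groups = {}
    for partition, columns in partition_columns.items():
        groups.setdefault(tuple(columns), []).append(partition)
    return groups
```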
Validate configuration files.
```bash
# Check configuration
quantmini validate config

# Check and fix
quantmini validate config --fix
```

Schema validation and diagnostics commands for ensuring data consistency.
Validate production schema consistency across all datasets.
```bash
# Validate all datasets with default data root
quantmini schema validate

# Validate with custom data root
quantmini schema validate --data-root /Volumes/sandisk/quantmini-lake
```

Checks:
- Parquet datasets (stocks_daily, stocks_minute, options_daily, options_minute)
- Enriched datasets
- Qlib binary data
Reports:
- File counts
- Schema consistency
- Feature counts
- Date ranges
Diagnose schema inconsistencies for a specific data type.
```bash
# Diagnose stocks_daily (checks both parquet and enriched)
quantmini schema diagnose --data-type stocks_daily

# Diagnose only parquet
quantmini schema diagnose --data-type stocks_daily --dataset parquet

# Diagnose only enriched
quantmini schema diagnose --data-type stocks_daily --dataset enriched

# With custom data root
quantmini schema diagnose --data-type stocks_daily --data-root /custom/path
```

Shows:
- Number of different schemas detected
- Example files for each schema
- Column-by-column differences
- Date ranges for each schema
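The column-by-column differences reported here are essentially a set difference between two schemas. A minimal sketch with a hypothetical `column_diff` helper:

```python
def column_diff(schema_a, schema_b):
    """Return (columns only in a, columns only in b), each sorted."""
    a, b = set(schema_a), set(schema_b)
    return sorted(a - b), sorted(b - a)
```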
Fix schema inconsistencies by re-ingesting data with correct schema.
```bash
# Fix stocks_daily schema
quantmini schema fix --data-type stocks_daily

# Fix with custom date range
quantmini schema fix \
  --data-type stocks_daily \
  --start-date 2020-01-01 \
  --end-date 2025-12-31

# Fix all data types
quantmini schema fix --data-type all

# Dry run (show what would be done without doing it)
quantmini schema fix --data-type stocks_daily --dry-run

# With custom data root
quantmini schema fix --data-type stocks_daily --data-root /custom/path
```

Options:

- `-t, --data-type`: Data type to fix (`stocks_daily`, `stocks_minute`, `options_daily`, `options_minute`, `all`)
- `-s, --start-date`: Start date (default: 2020-10-16)
- `-e, --end-date`: End date (default: today)
- `--data-root`: Custom data root directory
- `--dry-run`: Show what would be done without making changes
Verify Qlib binary format compatibility and data integrity.
```bash
# Verify Qlib data
quantmini schema verify-qlib

# With custom data root
quantmini schema verify-qlib --data-root /Volumes/sandisk/quantmini-lake
```

Comprehensive verification includes:
- File structure check (instruments, calendars, features)
- Instruments count
- Calendar validation
- Feature file verification
- Qlib initialization test
- Data query test
- Comparison with enriched parquet
Reports:
- Number of instruments
- Number of trading days
- Number of binary files
- Sample data queries
- Data consistency verification
```bash
# 1. Initialize configuration
quantmini config init

# 2. Edit credentials
nano config/credentials.yaml
# Add your Polygon.io access_key_id and secret_access_key

# 3. Run backfill for historical data
quantmini pipeline run \
  --data-type stocks_daily \
  --start-date 2020-01-01 \
  --end-date 2024-12-31

# 4. Validate conversion
quantmini validate binary --data-type stocks_daily
```

```bash
# Run as a daily cron job
0 9 * * * /usr/local/bin/quantmini pipeline daily --data-type stocks_daily
```

Or manually:

```bash
quantmini pipeline daily --data-type stocks_daily
```

```bash
# 1. Download data
quantmini data download \
  --data-type stocks_daily \
  --start-date 2024-01-01 \
  --end-date 2024-01-31

# 2. Ingest with custom settings
quantmini data ingest \
  --data-type stocks_daily \
  --start-date 2024-01-01 \
  --end-date 2024-01-31 \
  --mode streaming  # Use for <32GB RAM systems

# 3. Add features
quantmini data enrich \
  --data-type stocks_daily \
  --start-date 2024-01-01 \
  --end-date 2024-01-31

# 4. Convert to Qlib
quantmini data convert \
  --data-type stocks_daily \
  --start-date 2024-01-01 \
  --end-date 2024-01-31

# 5. Query results
quantmini data query \
  --data-type stocks_daily \
  --symbols AAPL MSFT \
  --fields date close return_1d alpha_daily \
  --start-date 2024-01-01 \
  --end-date 2024-01-31
```

```bash
# 1. Ensure data is available
quantmini data status --data-type stocks_daily

# 2. Query features for backtesting
quantmini data query \
  --data-type stocks_daily \
  --symbols AAPL MSFT GOOGL AMZN META \
  --fields date close return_1d alpha_daily volatility_20d \
  --start-date 2020-01-01 \
  --end-date 2024-12-31 \
  --output backtest_data.csv

# 3. Use in your backtesting script
python my_backtest.py --data backtest_data.csv
```

Override configuration with environment variables:
```bash
export PIPELINE_MODE=streaming
export MAX_MEMORY_GB=16
export DATA_ROOT=/path/to/data

quantmini pipeline daily --data-type stocks_daily
```

Available Variables:

- `PIPELINE_MODE`: Override processing mode (`streaming`, `batch`, `parallel`)
- `MAX_MEMORY_GB`: Override max memory limit
- `LOG_LEVEL`: Override log level (`DEBUG`, `INFO`, `WARNING`, `ERROR`)
- `DATA_ROOT`: Override data root directory
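The precedence rule here (environment variables win over config-file values) can be sketched as a simple merge. `load_overrides` is a hypothetical illustration, not the package's actual config loader, and it leaves all values as strings the way the environment delivers them.

```python
import os

OVERRIDABLE = ("PIPELINE_MODE", "MAX_MEMORY_GB", "LOG_LEVEL", "DATA_ROOT")

def load_overrides(defaults, env=None):
    """Start from file defaults, then let matching environment variables win."""
    env = os.environ if env is None else env
    merged = dict(defaults)
    for key in OVERRIDABLE:
        if key in env:
            merged[key] = env[key]
    return merged
```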
```bash
# Check system profile
quantmini config profile
# Recommended mode shown in output
```

All commands show progress bars and status updates:

```
📊 Ingesting stocks_daily from 2024-01-01 to 2024-01-31...
Mode: polars, Incremental: True
Downloading [####################################] 100%
✅ Ingested 21 dates
   Total records: 156,789
   Total size: 45.23 MB
   Time: 23.45s
   Success rate: 100.0%
```
```bash
# Validate everything
quantmini validate config
quantmini validate parquet --data-type stocks_daily
quantmini validate binary --data-type stocks_daily

# Check data status
quantmini data status --data-type stocks_daily

# View configuration
quantmini config show
```

```bash
# For systems with <32GB RAM
quantmini config set pipeline.mode streaming
quantmini config set processing.chunk_size 50000

# For high-performance systems
quantmini config set pipeline.mode batch
quantmini config set processing.use_polars true
```

Get help for any command:
```bash
quantmini --help
quantmini data --help
quantmini data ingest --help
quantmini pipeline --help
```

See the `examples/` directory for complete Python examples using the CLI's underlying modules.
- Documentation: https://quantmini.readthedocs.io/
- Issues: https://github.com/nittygritty-zzy/quantmini/issues
- Email: zheyuan28@gmail.com