System Requirements:
- Python 3.10+
- Docker & Docker Compose
- Node.js 18+ (for some tools)
- Minimum 16GB RAM, 100GB disk space
Required Accounts & API Keys:
# Financial data providers (choose at least one)
ALPHA_VANTAGE_API_KEY=your_key_here
FINNHUB_API_KEY=your_key_here
POLYGON_API_KEY=your_key_here

# ML & AI services
MLFLOW_TRACKING_URI=your_mlflow_server
ANTHROPIC_API_KEY=your_key_here  # For Task Master

# Cloud providers (optional, for production)
AWS_ACCESS_KEY_ID=your_key
AWS_SECRET_ACCESS_KEY=your_secret
AZURE_CLIENT_ID=your_client_id
GOOGLE_APPLICATION_CREDENTIALS=path/to/service_account.json
# Clone and setup
git clone <repository_url>
cd samp
# Copy environment template
cp .env.example .env
# Edit .env with your API keys
nano .env
# Make setup script executable
chmod +x scripts/setup.sh
# Run automated setup
./scripts/setup.sh

# Start all services
docker-compose up -d
# Check service status
docker-compose ps
# View logs
docker-compose logs -f quantstream-api

Service URLs after startup:
- Dashboard: http://localhost:8501
- API: http://localhost:8000
- Grafana: http://localhost:3000 (admin/admin)
- Prometheus: http://localhost:9090
- Jupyter: http://localhost:8888 (token in logs)
- MLflow: http://localhost:5000
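
To confirm the stack is up before moving on, a short loop can poll the main endpoints. A minimal sketch, assuming the default ports listed above and that curl is installed:

# Poll the key endpoints until each responds (Ctrl+C to abort)
for url in http://localhost:8000/health http://localhost:8501 http://localhost:3000 http://localhost:9090 http://localhost:5000; do
  until curl -sf -o /dev/null "$url"; do echo "waiting for $url ..."; sleep 5; done
  echo "$url is up"
done
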
# Install Terraform
curl -fsSL https://apt.releases.hashicorp.com/gpg | sudo apt-key add -
sudo apt-add-repository "deb [arch=amd64] https://apt.releases.hashicorp.com $(lsb_release -cs) main"
sudo apt-get update && sudo apt-get install terraform
# Configure cloud credentials
aws configure # For AWS
az login # For Azure
gcloud auth login # For GCP
# Deploy infrastructure
cd infrastructure/
./scripts/deployment/deploy.sh -e prod -c aws
# Deploy application
kubectl apply -f k8s/

# Create namespace
kubectl create namespace quantstream
# Deploy with Helm
helm install quantstream ./charts/quantstream -n quantstream
# Check deployment
kubectl get pods -n quantstream
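
If pods stay in Pending or CrashLoopBackOff, standard kubectl inspection narrows it down; the pod names below are placeholders:

# Inspect services and debug a failing pod
kubectl get svc -n quantstream
kubectl describe pod <pod-name> -n quantstream
kubectl logs <pod-name> -n quantstream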

# Initialize swarm
docker swarm init
# Deploy stack
docker stack deploy -c docker-compose.prod.yml quantstream
# Check services
docker service ls
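
To follow the logs of a single service in the stack, use docker service logs with a name reported by docker service ls:

# Tail logs for one service (use a name from the `docker service ls` output)
docker service logs -f <service-name>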

# Method 1: Using Python directly
cd src/ingestion/
python -m main --config ../../config/ingestion/sources_config.yaml
# Method 2: Using Docker
docker-compose exec quantstream-ingestion python -m src.ingestion.main
# Method 3: Using API
curl -X POST http://localhost:8000/ingestion/start \
-H "Content-Type: application/json" \
-d '{"sources": ["alpha_vantage", "finnhub"]}'
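
To confirm data is actually flowing once ingestion starts, one option is to read a few messages straight from Kafka. A sketch, assuming the market-data topic referenced in the performance tuning section below:

# Read a handful of messages to verify the pipeline is producing data
docker-compose exec kafka kafka-console-consumer.sh \
  --bootstrap-server localhost:9092 \
  --topic market-data \
  --from-beginning --max-messages 5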

# Start streaming ETL
cd src/etl/
python -m main run --config ../../config/etl/streaming_config.yaml
# Or via Docker
docker-compose exec quantstream-etl python -m src.etl.main run

# Start Streamlit dashboard
cd src/dashboard/
streamlit run app.py --server.port 8501
# Or access via Docker
# Already running at http://localhost:8501

# Train anomaly detection models
cd src/ml/
python -m training.train_models --config ../../config/ml/training_configs.yaml
# Or via API
curl -X POST http://localhost:8000/ml/train \
-H "Content-Type: application/json" \
-d '{"model_type": "ensemble", "retrain": true}'

- Data Sources (config/ingestion/sources_config.yaml):
sources:
  alpha_vantage:
    api_key: ${ALPHA_VANTAGE_API_KEY}
    rate_limit: 5  # requests per minute
    symbols: ["AAPL", "GOOGL", "MSFT", "TSLA"]

- ETL Pipeline (config/etl/streaming_config.yaml):
spark:
  app_name: "QuantStream ETL"
  master: "local[*]"
streaming:
  trigger_interval: "10 seconds"
  watermark_delay: "1 minute"

- ML Models (config/ml/model_configs.yaml):
models:
  isolation_forest:
    contamination: 0.1
    max_samples: 1000
  lstm_autoencoder:
    sequence_length: 60
    encoding_dim: 32
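
The ${...} placeholders in these files are expected to be filled in from environment variables. To preview a config with the values substituted, envsubst (part of gettext) works, assuming the variables are exported in your shell:

# Preview the data-sources config with environment variables resolved
envsubst < config/ingestion/sources_config.yaml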

# Core application
QUANTSTREAM_ENV=development # development, staging, production
LOG_LEVEL=INFO
DEBUG=true
# Database connections
POSTGRES_URL=postgresql://user:pass@localhost:5432/quantstream
REDIS_URL=redis://localhost:6379/0
# Kafka
KAFKA_BOOTSTRAP_SERVERS=localhost:9092
KAFKA_SCHEMA_REGISTRY=http://localhost:8081
# Storage
DELTA_LAKE_PATH=./data/delta
S3_BUCKET=quantstream-data
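
Before starting services, a quick pre-flight check that the variables you depend on are actually set can save debugging time. A minimal sketch; adjust the list to the providers and services you use:

# Fail fast if any required variable is missing
for var in ALPHA_VANTAGE_API_KEY POSTGRES_URL KAFKA_BOOTSTRAP_SERVERS; do
  [ -n "${!var}" ] || echo "Missing required variable: $var"
done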

# Check all services
curl http://localhost:8000/health
# Specific component health
curl http://localhost:8000/health/ingestion
curl http://localhost:8000/health/etl
curl http://localhost:8000/health/ml

# Application logs
docker-compose logs -f quantstream-api
docker-compose logs -f quantstream-etl
docker-compose logs -f quantstream-ml
# System logs
tail -f logs/quantstream.log
tail -f logs/ingestion.log
tail -f logs/etl.log

Troubleshooting:

- API Keys Not Working:
# Verify environment variables
echo $ALPHA_VANTAGE_API_KEY
echo $FINNHUB_API_KEY
# Test API connectivity
curl "https://www.alphavantage.co/query?function=GLOBAL_QUOTE&symbol=AAPL&apikey=$ALPHA_VANTAGE_API_KEY"
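
Alpha Vantage typically signals a bad key or a rate limit inside the JSON body (a "Note" or "Error Message" field) rather than through the HTTP status, so inspect the response:

# A key or rate-limit problem usually shows up as a "Note" or "Error Message" field
curl -s "https://www.alphavantage.co/query?function=GLOBAL_QUOTE&symbol=AAPL&apikey=$ALPHA_VANTAGE_API_KEY" \
  | grep -iE '"(Note|Error Message)"' && echo "API key problem or rate limit reached"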

- Out of Memory:

# Increase Docker memory limits
# Edit docker-compose.yml
services:
  quantstream-etl:
    mem_limit: 4g
# Or adjust Spark configuration
export SPARK_EXECUTOR_MEMORY=2g
export SPARK_DRIVER_MEMORY=2g

- Slow Performance:
# Check system resources
docker stats
# Monitor Kafka lag
docker-compose exec kafka kafka-consumer-groups.sh \
--bootstrap-server localhost:9092 \
--describe --group quantstream-etl
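
A LAG value that keeps growing means the ETL consumer is falling behind; watching it over time makes the trend obvious:

# Re-run the lag check every 10 seconds
watch -n 10 "docker-compose exec -T kafka kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group quantstream-etl"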

# config/etl/streaming_config.yaml
spark:
  sql.adaptive.enabled: true
  sql.adaptive.coalescePartitions.enabled: true
  serializer: "org.apache.spark.serializer.KryoSerializer"
  sql.adaptive.skewJoin.enabled: true

# Increase partition count for better parallelism
docker-compose exec kafka kafka-topics.sh \
--bootstrap-server localhost:9092 \
--alter --topic market-data \
--partitions 12
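
After altering the topic, confirm the new partition count took effect:

# Describe the topic to verify the partition count
docker-compose exec kafka kafka-topics.sh \
  --bootstrap-server localhost:9092 \
  --describe --topic market-data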

-- Create indexes for better query performance
CREATE INDEX idx_quotes_symbol_timestamp ON quotes(symbol, timestamp);
CREATE INDEX idx_trades_timestamp ON trades(timestamp);
-- Enable connection pooling
ALTER SYSTEM SET max_connections = 200;
ALTER SYSTEM SET shared_buffers = '256MB';
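
Both max_connections and shared_buffers only take effect after a server restart, so restart the database container afterwards (assuming the service is named postgres, as in the backup commands below):

# Restart PostgreSQL so the ALTER SYSTEM settings take effect
docker-compose restart postgres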

# Unit tests
pytest tests/ -v
# Integration tests
pytest tests/integration/ -v --integration
# Performance tests
pytest tests/performance/ -v --performance
# Chaos engineering tests
./scripts/chaos_tests.sh
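
If pytest-cov is installed, a coverage report can be added to the unit-test run; the --cov=src path is an assumption, adjust it to the actual package layout:

# Unit tests with a coverage report (requires pytest-cov)
pytest tests/ -v --cov=src --cov-report=term-missing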

# Install k6
sudo apt install k6
# Run load tests
k6 run tests/load/api_load_test.js
k6 run tests/load/dashboard_load_test.js

# Pull latest changes
git pull origin main
# Rebuild containers
docker-compose build --no-cache
# Restart services
docker-compose down && docker-compose up -d

# Run migrations
python -m alembic upgrade head
# Or via Docker
docker-compose exec quantstream-api alembic upgrade head

# Backup PostgreSQL
docker-compose exec postgres pg_dump -U quantstream quantstream > backup.sql
# Backup Delta Lake data
aws s3 sync ./data/delta s3://quantstream-backup/delta/$(date +%Y%m%d)/
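
To restore, feed the dump back through psql; a sketch assuming the same user and database names used in the backup command:

# Restore PostgreSQL from a previous dump
docker-compose exec -T postgres psql -U quantstream quantstream < backup.sql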

Documentation:

- Architecture Documentation: docs/ARCHITECTURE.md
- API Documentation: http://localhost:8000/docs (Swagger UI)
- Performance Benchmarks: docs/BENCHMARKS.md
- Troubleshooting Guide: docs/TROUBLESHOOTING.md
For issues and questions:
- Check the troubleshooting section above
- Review logs for error details
- Check system resources (CPU, memory, disk)
- Verify API keys and network connectivity
- Consult the documentation in the docs/ directory
Happy trading! 📈