A comprehensive real-time analytics platform for quantitative data processing, machine learning, and financial market analysis.
QuantStream Analytics Platform is a scalable, production-ready system designed for real-time data ingestion, processing, and analysis. Built with modern data engineering practices, it provides a complete solution for quantitative analytics with integrated machine learning capabilities, feature stores, and interactive dashboards.
The platform follows a microservices architecture with the following key components:
- Data Ingestion: Real-time streaming data ingestion using Apache Kafka (see the sketch after this list)
- ETL Pipeline: Scalable data processing with Apache Spark and Delta Lake
- Machine Learning: MLflow-integrated model training and deployment
- Feature Store: Redis-based feature serving with PostgreSQL offline storage
- Dashboard: Interactive Streamlit-based analytics dashboard
- Monitoring: Comprehensive observability with Prometheus and Grafana
- API Layer: FastAPI-based REST services
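As an illustration of the ingestion path, the sketch below publishes a single JSON market tick with kafka-python. The topic name market-ticks and the localhost:9092 broker address are assumptions for local development, not fixed parts of the platform.

```python
import json
from datetime import datetime, timezone

from kafka import KafkaProducer  # pip install kafka-python

# Assumed local broker and topic; adjust to match your .env settings.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

tick = {
    "symbol": "AAPL",
    "price": 189.42,
    "volume": 1200,
    "timestamp": datetime.now(timezone.utc).isoformat(),
}

producer.send("market-ticks", value=tick)
producer.flush()
```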
Key capabilities include:

- Real-time Data Processing: Stream processing with Apache Spark and Kafka
- Delta Lake Integration: ACID transactions and time travel for data lakes (see the sketch after this list)
- ML Pipeline: End-to-end machine learning workflow with MLflow
- Feature Engineering: Scalable feature computation and serving
- Interactive Dashboards: Real-time analytics visualization
- Monitoring & Alerting: Production-grade observability
- Containerized Deployment: Docker and Docker Compose support
- Infrastructure as Code: Terraform configurations for cloud deployment
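To illustrate the ACID write and time-travel read features, here is a minimal PySpark sketch using the delta-spark package. The table path is illustrative; in this repository, processed data could just as well land in MinIO.

```python
from delta import configure_spark_with_delta_pip
from pyspark.sql import SparkSession

# Configure a local Spark session with the Delta Lake extensions.
builder = (
    SparkSession.builder.appName("delta-time-travel")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
)
spark = configure_spark_with_delta_pip(builder).getOrCreate()

# Assumed local path; not a fixed location in the platform.
path = "data/processed/ticks_delta"

# ACID append, then read both the latest state and an earlier version.
df = spark.createDataFrame([("AAPL", 189.42)], ["symbol", "price"])
df.write.format("delta").mode("append").save(path)

latest = spark.read.format("delta").load(path)
version_0 = spark.read.format("delta").option("versionAsOf", 0).load(path)
```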
To run the platform locally you will need:

- Docker and Docker Compose
- Python 3.9+
- Git
To get started:

- Clone the repository

  ```bash
  git clone <repository-url>
  cd quantstream-analytics
  ```

- Set up environment variables

  ```bash
  cp .env.example .env
  # Edit .env with your configuration
  ```

- Start the development environment

  ```bash
  docker-compose up -d
  ```

- Install Python dependencies (optional, for local development)

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  pip install -r requirements.txt
  pip install -e .
  ```
Once the containers are running, you can access:
- API Documentation: http://localhost:8000/docs
- Streamlit Dashboard: http://localhost:8501
- Spark UI: http://localhost:8080
- MLflow UI: http://localhost:5000
- Grafana: http://localhost:3000 (admin/admin)
- Prometheus: http://localhost:9090
- MinIO Console: http://localhost:9001 (minioadmin/minioadmin)
- Jupyter Lab: http://localhost:8888
- Airflow: http://localhost:8082 (admin/admin)
```
quantstream-analytics/
├── src/                   # Main source code
│   ├── ingestion/         # Data ingestion components
│   ├── etl/               # ETL pipeline code
│   ├── ml/                # Machine learning models
│   ├── features/          # Feature store
│   ├── dashboard/         # Dashboard and UI
│   └── monitoring/        # Observability
├── infrastructure/        # Terraform IaC
├── tests/                 # Testing suite
│   ├── unit/              # Unit tests
│   ├── integration/       # Integration tests
│   └── e2e/               # End-to-end tests
├── config/                # Configuration files
├── scripts/               # Utility scripts
├── docs/                  # Documentation
├── data/                  # Sample data and schemas
│   ├── raw/               # Raw data samples
│   ├── processed/         # Processed data samples
│   └── schemas/           # Data schemas
├── requirements.txt       # Python dependencies
├── pyproject.toml         # Project configuration
├── Dockerfile             # Multi-stage Docker build
├── docker-compose.yml     # Local development environment
├── .env.example           # Environment variables template
└── README.md              # This file
```
The platform is built on the following stack:

- Apache Spark: Distributed data processing
- Delta Lake: Data lake storage with ACID transactions
- Apache Kafka: Real-time streaming platform
- Redis: In-memory data structure store for caching and features
- MLflow: ML lifecycle management (see the training sketch after this list)
- scikit-learn: Machine learning algorithms
- XGBoost/LightGBM: Gradient boosting frameworks
- Pandas/NumPy: Data manipulation and numerical computing
- FastAPI: Modern Python web framework
- Streamlit: Interactive dashboard framework
- Uvicorn: ASGI server
- Pydantic: Data validation and settings management
- PostgreSQL: Primary relational database
- MinIO: S3-compatible object storage
- Redis: Feature store and caching
- Delta Lake: Data lake storage format
- Prometheus: Metrics collection
- Grafana: Metrics visualization
- OpenTelemetry: Distributed tracing
- Structlog: Structured logging
- Docker: Containerization
- Terraform: Infrastructure as Code
- Apache Airflow: Workflow orchestration
- pytest: Testing framework
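As a rough sketch of how a training run can be tracked against the bundled MLflow server, the example below logs a parameter, a metric, and a scikit-learn model. The experiment name and synthetic data are placeholders, not the platform's actual pipeline.

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Assumed tracking server from the local docker-compose stack.
mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("quantstream-demo")

# Synthetic data as a stand-in for engineered features.
X, y = make_regression(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

with mlflow.start_run():
    model = RandomForestRegressor(n_estimators=100, random_state=42)
    model.fit(X_train, y_train)
    mae = mean_absolute_error(y_test, model.predict(X_test))

    mlflow.log_param("n_estimators", 100)
    mlflow.log_metric("mae", mae)
    mlflow.sklearn.log_model(model, artifact_path="model")
```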
Run the test suite with pytest:

```bash
# Run all tests
pytest

# Run with coverage
pytest --cov=src --cov-report=html

# Run specific test categories
pytest -m unit
pytest -m integration
pytest -m e2e
```
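The -m flags above select tests by marker. A hedged sketch of what marked tests might look like (the marker names and the redis_client fixture are assumptions about the test suite's configuration, with markers registered in pyproject.toml):

```python
import pytest


@pytest.mark.unit
def test_moving_average():
    # Pure in-process logic, no external services required.
    prices = [1.0, 2.0, 3.0, 4.0]
    assert sum(prices) / len(prices) == pytest.approx(2.5)


@pytest.mark.integration
def test_feature_roundtrip(redis_client):
    # Talks to a real Redis instance provided by a fixture (assumed name).
    redis_client.set("feature:AAPL:volatility", "0.23")
    assert redis_client.get("feature:AAPL:volatility") == b"0.23"
```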
The project uses several tools for code quality:

```bash
# Format code
black src/ tests/

# Sort imports
isort src/ tests/

# Type checking
mypy src/

# Linting
flake8 src/ tests/

# Run all quality checks
pre-commit run --all-files
```

A typical development workflow:

- Create a feature branch
- Make your changes
- Run tests and quality checks
- Submit a pull request
Key environment variables (see .env.example for full list):
- DATABASE_URL: PostgreSQL connection string
- REDIS_URL: Redis connection string
- KAFKA_BOOTSTRAP_SERVERS: Kafka broker addresses
- MLFLOW_TRACKING_URI: MLflow server URI
- S3_ENDPOINT_URL: Object storage endpoint
Each service can be configured through environment variables or configuration files in the config/ directory.
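Since Pydantic is already part of the stack, one natural way to consume these variables is a settings object. This is a sketch assuming the pydantic-settings package; the class and default values are illustrative, not the platform's actual configuration module.

```python
from pydantic_settings import BaseSettings, SettingsConfigDict  # pip install pydantic-settings


class Settings(BaseSettings):
    # Field names map (case-insensitively) to the environment variables above.
    model_config = SettingsConfigDict(env_file=".env", extra="ignore")

    database_url: str = "postgresql://localhost:5432/quantstream"
    redis_url: str = "redis://localhost:6379/0"
    kafka_bootstrap_servers: str = "localhost:9092"
    mlflow_tracking_uri: str = "http://localhost:5000"
    s3_endpoint_url: str = "http://localhost:9000"


settings = Settings()
print(settings.kafka_bootstrap_servers)
```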
To deploy with Docker Compose:

```bash
# Build and start all services
docker-compose up --build

# Scale specific services
docker-compose up --scale spark-worker=3

# Production deployment
docker-compose -f docker-compose.yml -f docker-compose.prod.yml up -d
```

For cloud deployments, use the Terraform configurations in the infrastructure/ directory:

```bash
cd infrastructure
terraform init
terraform plan
terraform apply
```

The platform includes comprehensive monitoring:
- Application Metrics: Custom business metrics via Prometheus (see the sketch after this list)
- Infrastructure Metrics: System and container metrics
- Distributed Tracing: Request tracing with OpenTelemetry
- Logging: Structured logging with correlation IDs
- Alerting: Grafana alerts and notifications
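For the custom business metrics, a minimal sketch with prometheus_client is shown below. The metric names and port are assumptions; the platform's real definitions would presumably live under src/monitoring/.

```python
import random
import time

from prometheus_client import Counter, Histogram, start_http_server

# Illustrative metric names, not the platform's actual ones.
TICKS_PROCESSED = Counter(
    "quantstream_ticks_processed_total", "Number of market ticks processed", ["symbol"]
)
PROCESSING_LATENCY = Histogram(
    "quantstream_tick_processing_seconds", "Time spent processing one tick"
)


def process_tick(symbol: str) -> None:
    with PROCESSING_LATENCY.time():
        time.sleep(random.uniform(0.001, 0.01))  # stand-in for real work
    TICKS_PROCESSED.labels(symbol=symbol).inc()


if __name__ == "__main__":
    start_http_server(8001)  # exposes /metrics for Prometheus to scrape
    while True:
        process_tick("AAPL")
```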
Security features include:
- JWT Authentication: Token-based authentication
- CORS Configuration: Cross-origin resource sharing setup
- Rate Limiting: API rate limiting and throttling
- Environment Isolation: Separate configurations for different environments
- Secrets Management: Secure handling of sensitive configuration
The platform provides comprehensive API documentation:
- OpenAPI/Swagger: Interactive API docs at /docs
- ReDoc: Alternative API documentation at /redoc
- API Versioning: Versioned API endpoints
- Authentication: JWT-based API authentication (see the sketch after this list)
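To show how a versioned, JWT-protected endpoint might be wired up, here is a hedged sketch using FastAPI and PyJWT. The route, secret handling, and token payload are illustrative, not the platform's actual API.

```python
import jwt  # pip install pyjwt
from fastapi import Depends, FastAPI, HTTPException
from fastapi.security import HTTPAuthorizationCredentials, HTTPBearer

app = FastAPI(title="QuantStream API")
bearer = HTTPBearer()
SECRET_KEY = "change-me"  # in practice, loaded from settings / secrets management


def current_user(creds: HTTPAuthorizationCredentials = Depends(bearer)) -> dict:
    # Validate the bearer token and return its claims.
    try:
        return jwt.decode(creds.credentials, SECRET_KEY, algorithms=["HS256"])
    except jwt.PyJWTError:
        raise HTTPException(status_code=401, detail="Invalid or expired token")


@app.get("/api/v1/features/{symbol}")
def read_features(symbol: str, user: dict = Depends(current_user)) -> dict:
    # Illustrative response; a real handler would query the feature store.
    return {"symbol": symbol, "requested_by": user.get("sub")}
```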
Contributions are welcome:

- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add some amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
For support and questions:
- Create an issue in the repository
- Check the documentation in the docs/ directory
- Review the API documentation at /docs when the services are running
Planned enhancements:

- Kubernetes deployment support
- Advanced ML model serving
- Real-time anomaly detection
- Enhanced security features
- Multi-tenant support
- Advanced visualization features
Thanks to:

- Apache Spark community
- MLflow contributors
- FastAPI developers
- Streamlit team
- All open-source contributors