Flit ML Project: Complete Phase Implementation Plan
Overview
Build ML components for the flit ecosystem, focusing on predictive models for financial risk assessment in a BNPL (buy now, pay later) product. This is the first ML project at Flit, requiring a research-first approach followed by production infrastructure.
Phase -1: Data Infrastructure (whitehackr/flit-data-platform#9)
Objectives
Generate and store 3 months of synthetic BNPL data
Set up data warehouse for ML research
Build data pipeline for ongoing data collection
Deliverables
Data Generation Pipeline
Data Warehouse Setup
Data Pipeline
Technical Tasks
Tooling Required
Database : google-cloud-bigquery, pandas-gbq
Orchestration : apache-airflow
Validation : great-expectations
Simtom Enhancement : Date range API feature (e.g. /stream/bnpl?start_date=2024-06-01&end_date=2024-09-01)
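To make the "3 months of synthetic BNPL data" objective concrete, here is a minimal stdlib-only sketch of what a generated record might look like. The field names and the ~8% default rate are assumptions for illustration, not simtom's actual schema; the real records come from the simtom API and land in BigQuery via pandas-gbq.

```python
import random
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class BnplTransaction:
    # Hypothetical fields; the real schema is defined by simtom.
    transaction_id: str
    event_date: str
    amount: float
    installments: int
    defaulted: bool

def generate_transactions(start: date, end: date, n: int, seed: int = 42) -> list[BnplTransaction]:
    """Generate n synthetic transactions spread uniformly across a date range."""
    rng = random.Random(seed)  # seeded for reproducible research datasets
    span = (end - start).days
    rows = []
    for i in range(n):
        event = start + timedelta(days=rng.randrange(span))
        rows.append(BnplTransaction(
            transaction_id=f"txn-{i:06d}",
            event_date=event.isoformat(),
            amount=round(rng.uniform(20, 500), 2),
            installments=rng.choice([3, 4, 6, 12]),
            defaulted=rng.random() < 0.08,  # ~8% base default rate, an assumption
        ))
    return rows

rows = generate_transactions(date(2024, 6, 1), date(2024, 9, 1), n=1000)
print(len(rows), rows[0].event_date)
```

A frame of such rows would then be validated (great-expectations) and loaded to BigQuery before Phase 0 begins.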
Phase 0: Research & Discovery
Objectives
Understand BNPL data patterns and business problem
Define ML problem clearly (classification vs regression, target variable)
Establish baseline performance metrics
Identify best performing model architectures
Deliverables
Data Understanding Report
Problem Definition Document
Model Experimentation Results
Technical Tasks (some of these notebooks could be combined depending on logical workflow and I/O overhead)
Tooling Required
Jupyter Notebook : Interactive development
Additional packages : matplotlib, seaborn, plotly, mlflow
Model libraries : lightgbm, catboost
Data processing : pandas, numpy (already included)
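The "baseline performance metrics" objective can start before any model is trained. One sketch, assuming a binary default label: a majority-class baseline, which every candidate model (lightgbm, catboost, etc.) must beat to justify itself.

```python
from collections import Counter

def majority_class_baseline(y_true: list[int]) -> tuple[int, float]:
    """Accuracy of always predicting the most common class.

    Any trained model must beat this number to be worth keeping.
    """
    majority_label, count = Counter(y_true).most_common(1)[0]
    return majority_label, count / len(y_true)

# With a ~8% default rate, always predicting "no default" is already ~92%
# accurate, which is why raw accuracy is a weak target metric for imbalanced
# BNPL data and precision/recall or AUC should be tracked alongside it.
labels = [0] * 92 + [1] * 8
pred, acc = majority_class_baseline(labels)
print(pred, acc)  # → 0 0.92
```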
Phase 1: Production Infrastructure
Objectives
Build production-ready ML serving infrastructure
Implement model versioning and deployment pipeline
Create monitoring and observability
Deliverables
Model Serving API
Model Registry System
Monitoring Dashboard
Technical Tasks
Core Architecture : Base classes, model registry, plugin system
API Development : FastAPI endpoints, async request handling
Model Deployment : Model loading, caching, version management
Data Pipeline : Real-time feature engineering, validation
Monitoring : Metrics collection, alerting, dashboards
Testing : Unit tests, integration tests, load tests
Tooling Required
FastAPI : Already included
Model Storage : joblib, pickle, or mlflow model registry
Monitoring : prometheus, grafana or simple logging
Caching : redis (optional for model caching)
Container : docker for deployment
Phase 2: Real-time Processing
Objectives
Handle streaming data from simtom
Implement real-time feature engineering
Build batch prediction capabilities
Deliverables
Streaming Data Pipeline
Real-time Prediction Service
Technical Tasks
Tooling Required
Streaming : asyncio, httpx (already included)
Message Queue : celery + redis or simple async queues
Feature Store : redis or in-memory with persistence
Batch Processing : pandas for data processing
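The "simple async queues" option above can be sketched with stdlib asyncio alone: a producer standing in for the simtom stream (in practice an httpx client) feeding a consumer that computes features. Field names are hypothetical, as in the Phase -1 sketch.

```python
import asyncio

async def producer(queue: asyncio.Queue, events: list[dict]) -> None:
    """Stand-in for the simtom stream; in practice this would read via httpx."""
    for event in events:
        await queue.put(event)
    await queue.put(None)  # sentinel: stream finished

async def consumer(queue: asyncio.Queue, out: list[dict]) -> None:
    """Pull raw events and derive features one at a time."""
    while (event := await queue.get()) is not None:
        out.append({
            "transaction_id": event["transaction_id"],
            "amount_per_installment": event["amount"] / event["installments"],
        })

async def run_pipeline(events: list[dict]) -> list[dict]:
    queue: asyncio.Queue = asyncio.Queue(maxsize=100)  # bounded: back-pressure on the producer
    features: list[dict] = []
    await asyncio.gather(producer(queue, events), consumer(queue, features))
    return features

events = [{"transaction_id": "t1", "amount": 120.0, "installments": 4}]
print(asyncio.run(run_pipeline(events)))
```

If throughput outgrows a single process, the same producer/consumer shape maps onto celery + redis without changing the feature logic.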
Phase 3: Model Operations (MLOps)
Objectives
Automated model retraining pipeline
A/B testing framework
Model performance monitoring
Deliverables
Automated Training Pipeline
A/B Testing Framework
Performance Monitoring
Technical Tasks
Tooling Required
Scheduling : celery beat or cron jobs
Experiment Management : Custom A/B testing or mlflow
Monitoring : evidently for data drift, custom metrics
Alerting : slack webhooks or email notifications
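The custom A/B testing option hinges on one primitive: deterministic bucketing, so a user sees the same model variant on every request without storing assignments. A minimal sketch (experiment and user identifiers are illustrative):

```python
import hashlib

def assign_variant(user_id: str, experiment: str, treatment_share: float = 0.1) -> str:
    """Deterministically bucket a user into control or treatment.

    Hashing (experiment, user_id) keeps the split stable across requests
    and statistically independent between experiments.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return "treatment" if bucket < treatment_share else "control"

# The same user always lands in the same arm:
print(assign_variant("user-123", "risk-model-v2"))
print(assign_variant("user-123", "risk-model-v2"))
```

The serving layer would route treatment traffic to the candidate model version and log outcomes per arm for the performance-monitoring deliverable.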
Phase 4: Advanced Features
Objectives
Model interpretability and explainability
Advanced model architectures
Integration with flit ecosystem
Deliverables
Model Explainability
Advanced Models
Ecosystem Integration
Technical Tasks
Tooling Required
Explainability : shap, lime, eli5
Deep Learning : pytorch or tensorflow (if needed)
Visualization : streamlit for dashboards
Integration : Custom APIs, database connectors
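Before wiring in shap for tree models, the additive-attribution idea it implements can be computed exactly for a linear model: each feature's contribution is its weight times its deviation from a baseline input, and the contributions sum to the change in the model's output. Weights and features below are hypothetical.

```python
def linear_attributions(weights: dict[str, float],
                        baseline: dict[str, float],
                        x: dict[str, float]) -> dict[str, float]:
    """Per-feature contributions for a linear model.

    For f(x) = sum(w_i * x_i) + b, the contribution of feature i relative to a
    baseline input is w_i * (x_i - baseline_i); the contributions sum exactly to
    f(x) - f(baseline). Under feature independence this matches SHAP's linear case.
    """
    return {name: w * (x[name] - baseline[name]) for name, w in weights.items()}

weights = {"amount": 0.002, "installments": 0.05}   # hypothetical risk weights
baseline = {"amount": 100.0, "installments": 4.0}   # e.g. training-set means
x = {"amount": 400.0, "installments": 12.0}
print(linear_attributions(weights, baseline, x))    # amount: 0.6, installments: 0.4
```

The same "contribution per feature, summing to the prediction delta" contract is what a risk-explanation API would expose regardless of the explainer behind it.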
Phase 5: Production Deployment (1-2 weeks)
Objectives
Deploy to production environment
Load testing and performance optimization
Documentation and handover
Deliverables
Production Deployment
Documentation
Performance Validation
Technical Tasks
Tooling Required
Deployment : railway CLI, docker
Load Testing : locust or wrk
Documentation : mkdocs or simple markdown
Monitoring : Production monitoring setup
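Load testing feeds directly into the Phase 1 success criterion of <100ms latency. Locust or wrk produce the raw timings; the acceptance check itself can be sketched with the stdlib, using a tail percentile rather than the mean (which hides slow outliers):

```python
import math

def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile (the convention load-testing tools commonly report)."""
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[max(rank - 1, 0)]

def meets_latency_target(latencies_ms: list[float], target_ms: float = 100.0) -> bool:
    """Pass only if the p95 latency is under the target, not just the mean."""
    return percentile(latencies_ms, 95) < target_ms

# 90 fast requests plus 10 slow outliers: the mean (61ms) looks fine,
# but the p95 exposes the tail and fails the 100ms target.
latencies = [40.0] * 90 + [250.0] * 10
print(percentile(latencies, 95), meets_latency_target(latencies))  # → 250.0 False
```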
Technology Stack Summary
Core ML Stack
Python : 3.11+ (already set)
ML Libraries : scikit-learn, xgboost, pandas, numpy (already included)
API : FastAPI + uvicorn (already included)
Validation : Pydantic (already included)
Additional Requirements by Phase
Phase -1 : google-cloud-bigquery, pandas-gbq, apache-airflow, great-expectations
Phase 0 : jupyter, matplotlib, seaborn, plotly, mlflow
Phase 1 : redis (optional), prometheus (optional)
Phase 2 : celery (optional), message queue
Phase 3 : evidently, experiment tracking
Phase 4 : shap, streamlit, pytorch (optional)
Phase 5 : locust, mkdocs
Infrastructure
Development : Poetry + virtual env (already set)
Database : BigQuery
Deployment : Railway
Storage : GCP Cloud Storage
Monitoring : Simple logging initially, then proper monitoring
Dependencies
Simtom API Enhancement : Need to contribute the date range feature to the simtom project before Phase -1
BigQuery Setup : GCP project and BigQuery dataset creation
Airflow Environment : Local Airflow setup or cloud-managed Airflow
Success Criteria
Phase -1 : 3 months of quality BNPL data in BigQuery
Phase 0 : Clear model recommendations with >75% baseline accuracy
Phase 1 : Production API with <100ms latency, 99%+ uptime
Phase 2 : Handle 100+ predictions/second
Phase 3 : Automated retraining and A/B testing
Phase 4 : Model explainability and advanced features
Phase 5 : Full production deployment on Railway
This issue will be updated as we progress through each phase. Each phase will have its own sub-issues for detailed tracking.