I build production computer vision and deep learning systems. Currently at Vetology Innovations in Chicago, where I've taken models from raw research ideas to deployed production infrastructure at scale. I've shipped 50+ models across classification, detection, and segmentation, trained on A100 GPUs on datasets of 400K+ images.
My focus spans self-supervised learning, foundation models, vision-language models, generative models, and MLOps. I care about the full stack: from pretraining and architecture design to optimized inference in production.
currently : Data Scientist @ Vetology Innovations, Chicago
building : foundation models, vision-language systems, production CV pipelines
background : MS Data Science, Stevens Institute of Technology
interests : SSL pretraining · vision-language · generative models · efficient inference
Self-Supervised & Foundation Models
ssl-comparison: BYOL, SimCLR, MAE, DINOv2, MoCo on the same datasets, apples to apples
mae-pretraining: MAE from scratch with reconstruction visualization
medical-foundation-model: domain-specific foundation model pretrained on 100K+ images
ijepa-implementation: clean I-JEPA implementation
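For context on the SSL comparison work above, here is a minimal NumPy sketch of the InfoNCE objective that SimCLR-style contrastive methods optimize. The function name, temperature value, and toy shapes are illustrative assumptions, not code from these repos:

```python
import numpy as np

def info_nce_loss(z1, z2, temperature=0.1):
    """InfoNCE loss between two batches of embeddings.

    z1, z2: (N, D) arrays; z1[i] and z2[i] are two augmented views
    of the same image (the positive pair); all other rows are negatives.
    """
    # L2-normalize so dot products are cosine similarities.
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature          # (N, N) similarity matrix
    # Row-wise log-softmax; the positive pair sits on the diagonal.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))
```

Perfectly aligned views drive the loss toward zero, while unrelated pairs push it toward log N, which is why larger batches give harder negatives.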
Vision-Language & Multimodal
llava-medical-vqa: LLaVA fine-tuned for domain-specific visual Q&A
clip-finetuning: domain-specific CLIP/SigLIP fine-tuning with zero-shot evaluation
vlm-comparison: LLaVA vs InternVL vs Qwen-VL vs PaliGemma, benchmarked head to head
visual-rag: retrieval-augmented generation over image embeddings
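The zero-shot evaluation in the CLIP fine-tuning work boils down to a similarity lookup. A minimal sketch, assuming precomputed image and per-class text embeddings (function name and temperature are illustrative):

```python
import numpy as np

def zero_shot_classify(image_emb, text_embs, temperature=0.07):
    """CLIP-style zero-shot classification.

    image_emb: (D,) embedding of one image.
    text_embs: (C, D) one text embedding per class prompt.
    Returns a (C,) probability vector over classes.
    """
    # Normalize so the dot product is cosine similarity.
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = txt @ img / temperature
    probs = np.exp(logits - logits.max())     # stable softmax
    return probs / probs.sum()
```

No classifier head is trained; swapping the class set is just swapping the text prompts, which is what makes zero-shot evaluation cheap.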
Generative Models
diffusion-model-from-scratch: DDPM from scratch with a full training pipeline
diffusion-transformer: DiT implementation with scalable-architecture experiments
controlnet-finetuning: ControlNet fine-tuning for conditioned image generation
gan-progression: DCGAN-to-StyleGAN2 progression with training stability
Detection, Tracking & Segmentation
yolo-benchmark: YOLOv8 vs v9 vs v10 vs YOLO-World speed/accuracy benchmark
medical-image-segmentation: UNet++, DeepLabV3+, and SegFormer in an end-to-end pipeline
sam2-finetuning: SAM2 fine-tuned on custom domains with prompt strategies
open-vocabulary-detection: Grounding DINO and YOLO-World for zero-shot detection
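Every accuracy number in detection benchmarks like the ones above rests on box IoU, the matching criterion behind mAP. A minimal sketch for axis-aligned boxes in (x1, y1, x2, y2) format:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Intersection rectangle corners.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)
```

A prediction typically counts as a true positive when its IoU with a ground-truth box clears a threshold (0.5 for classic mAP@50, averaged over 0.5 to 0.95 for COCO-style mAP).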
Agentic AI
vision-agent · medical-diagnosis-agent · visual-rag · auto-image-annotator · image-search-engine · multi-agent-ml-experiments
Detection, Tracking & Video
yolo-benchmark · open-vocabulary-detection · sam2-video-tracking · multi-object-tracking · action-recognition · temporal-action-localization
Generative Models
diffusion-model-from-scratch · diffusion-transformer · controlnet-finetuning · medical-image-synthesis · gan-progression · dreambooth-finetuning
Autonomous Vehicles & 3D Vision
lidar-object-detection · multi-sensor-fusion · 3d-object-detection · depth-estimation · 3d-scene-reconstruction · occupancy-prediction
Efficient Models & Edge AI
model-compression · tensorrt-optimization · onnx-deployment · quantization-aware-training · knowledge-distillation · edge-vision-pipeline
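As a taste of what the quantization work above involves: post-training int8 quantization maps float weights to integers plus a scale. A minimal symmetric per-tensor sketch (function names are illustrative; real toolchains like TensorRT or ONNX Runtime handle this per-channel with calibration):

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: w ~= scale * q."""
    scale = np.abs(w).max() / 127.0           # map the largest |weight| to 127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float tensor from int8 values."""
    return q.astype(np.float32) * scale
```

The round-trip error is bounded by half the scale, which is why per-channel scales (one per output channel) recover most of the accuracy that a single per-tensor scale loses.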
MLOps & Infrastructure
ml-project-template · distributed-training · triton-inference-server · model-monitoring · ci-cd-for-ml · experiment-tracking
Deep Learning Fundamentals
transformer-from-scratch · vision-transformer-from-scratch · autograd-engine · paper-implementations · diffusion-math-walkthrough · backpropagation-from-scratch
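The autograd-engine and backpropagation-from-scratch projects center on one idea: record the computation graph, then apply the chain rule in reverse topological order. A minimal scalar sketch in the spirit of micrograd (class and method names are illustrative):

```python
class Value:
    """Minimal scalar autograd node: + and * build a graph, and
    backward() propagates gradients through it via the chain rule."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def _backward():
            # d(a+b)/da = d(a+b)/db = 1
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def _backward():
            # d(a*b)/da = b, d(a*b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # Topologically sort the graph, then run local backward rules
        # from the output back to the leaves.
        topo, seen = [], set()
        def build(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    build(p)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()
```

Gradients accumulate with `+=` because a node can feed several downstream ops; that one detail is where most from-scratch implementations first go wrong.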