AI Engineer | Data Engineering | Software Engineer
MS in Artificial Intelligence, Northeastern University
AI Engineer @ IpserLab
Open to AI Engineer, Data Engineer, and Software Engineer roles (Visa Sponsorship)
I build production AI systems that combine LLMs, data pipelines, and scalable backend infrastructure.
My interests include:
- AI Agents & LLM Systems
- Retrieval Augmented Generation (RAG)
- Data Engineering Pipelines
- Backend Infrastructure for AI systems
- Applied Machine Learning
Python • PyTorch • Scikit-learn • MLflow • LangChain • LangGraph • RAG
Prompt Engineering • Agent Orchestration • Function Calling • Vector Databases
FastAPI • REST APIs • Docker • Kubernetes • CI/CD
Kafka • Apache Airflow • SQL • ETL / ELT Pipelines • Data Modeling
PostgreSQL • Redis • Pinecone • Vector Databases
AI system that automatically builds machine learning models from datasets.
Capabilities
- Dataset analysis
- Feature engineering
- Model training
- Evaluation and comparison
Technologies
Python • MLflow • LLM orchestration
Visual platform for building AI workflows using node-based pipelines.
Example pipeline
Web Scraper → Embeddings → LLM → Database
Technologies
React Flow • Python • API orchestration
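As a rough illustration of the node-based idea, the example pipeline can be sketched as a chain of nodes, each passing its output to the next. This is a minimal sketch, not the project's actual code: `Node`, `Pipeline`, and the four step functions are hypothetical names, and each step is a placeholder for a real scraper, embedding model, LLM call, and database write.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Node:
    """One step in the workflow: a label plus a function applied to the previous output."""
    name: str
    run: Callable[[Any], Any]


@dataclass
class Pipeline:
    """A linear chain of nodes executed in order (hypothetical, for illustration)."""
    nodes: List[Node] = field(default_factory=list)

    def execute(self, payload: Any) -> Any:
        for node in self.nodes:
            payload = node.run(payload)
        return payload


# Placeholder steps: assumptions standing in for real integrations.
def scrape(url: str) -> str:
    return f"text fetched from {url}"


def embed(text: str) -> Dict[str, Any]:
    # Toy "embedding": the text length, standing in for a real vector.
    return {"text": text, "vector": [float(len(text))]}


def summarize(doc: Dict[str, Any]) -> Dict[str, Any]:
    # Stand-in for an LLM call: upper-cases the text.
    return {**doc, "summary": doc["text"].upper()}


STORE: List[Dict[str, Any]] = []  # in-memory stand-in for a database


def save(doc: Dict[str, Any]) -> Dict[str, Any]:
    STORE.append(doc)
    return doc


pipeline = Pipeline([
    Node("Web Scraper", scrape),
    Node("Embeddings", embed),
    Node("LLM", summarize),
    Node("Database", save),
])
result = pipeline.execute("https://example.com")
```

Keeping each node a plain function makes it easy to reorder or swap steps, which is roughly what a visual editor would do when nodes are rewired on the canvas.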
End-to-end data pipeline using AWS services.
Architecture
Spotify API
↓
AWS Lambda
↓
S3 Data Lake
↓
Transformation
↓
Analytics Dashboard
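The stages above can be sketched in plain Python to show how data moves between them. This is only an illustrative sketch under stated assumptions: the function names are hypothetical, the sample records are made up, the Spotify API call is replaced by hard-coded data, and the S3 data lake is replaced by an in-memory dict. A real version would call the Spotify Web API from a Lambda handler and write objects to S3.

```python
import json
from collections import Counter
from typing import Dict, List


def extract() -> List[Dict]:
    """Stand-in for the Spotify API call; returns made-up sample records."""
    return [
        {"track": "Song A", "artist": "Artist 1", "duration_ms": 210000},
        {"track": "Song B", "artist": "Artist 2", "duration_ms": 185500},
        {"track": "Song C", "artist": "Artist 1", "duration_ms": 240000},
    ]


def load_raw(records: List[Dict], lake: Dict[str, str], key: str) -> None:
    """Stand-in for landing raw JSON in the data lake; `lake` is a plain dict."""
    lake[key] = json.dumps(records)


def transform(lake: Dict[str, str], key: str) -> List[Dict]:
    """Parse the raw JSON and derive track duration in whole seconds."""
    rows = json.loads(lake[key])
    return [
        {
            "track": r["track"],
            "artist": r["artist"],
            "duration_s": r["duration_ms"] // 1000,
        }
        for r in rows
    ]


def tracks_per_artist(rows: List[Dict]) -> Dict[str, int]:
    """Example aggregate that a dashboard could chart."""
    return dict(Counter(r["artist"] for r in rows))


lake: Dict[str, str] = {}
load_raw(extract(), lake, "raw/tracks.json")
rows = transform(lake, "raw/tracks.json")
summary = tracks_per_artist(rows)
```

Landing the raw payload before transforming it means the transformation can be re-run later without calling the API again.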
- Developed Python backend APIs integrating LLM agents with external data sources
- Built RAG pipelines over structured and unstructured datasets, achieving 95% deterministic responses
- Designed document ingestion pipelines, reducing manual research time by 45%
- Implemented ML evaluation and monitoring using MLflow and Airflow
- Built backend APIs supporting 10K+ monthly users
- Implemented Kafka streaming pipelines, reducing analytics latency by 60%
- Developed CI/CD pipelines using Docker and Kubernetes
- Developed scalable enterprise applications using Python and SQL for large-scale data processing
- Built data integration workflows improving data processing efficiency across internal systems
- Collaborated with cross-functional engineering teams to design and deploy production-grade backend services
- Optimized database queries and batch processing pipelines to improve system performance and reliability
Spotify Data Engineering Pipeline
https://medium.com/@nikhil-datasolutions/building-a-spotify-etl-pipeline-with-aws-from-api-to-dashboard-81a647ae5bcd
LinkedIn
https://linkedin.com/in/nikhil-doye
GitHub
https://github.com/Nikhil-Doye
Email
nikhil.doye@gmail.com
Always excited to collaborate on AI agents, LLM systems, and scalable data platforms.
- Ranked World No. 1 SQL Developer (Practice) on HackerRank.
- Won a Silver Medal at NeuroHack by building a BERT model to understand service ticket descriptions, then using HDBSCAN for automated ticket categorization.
- Earned a Bronze Medal in a Kaggle competition on predicting student drop-out rates, using machine learning models such as logistic regression and random forest, then applying hyperparameter tuning to improve performance.
- Auto_ML_Agent: Build machine learning pipelines with AI.
- WorkFlow_Builder: Open-source, AI-powered, no-code agentic workflow builder.
- Hotel_Reservation_Prediction: End-to-end MLOps project to predict whether a customer will cancel a reservation.
- Thematic Maps with Tableau: Spatial data visualization using NHANES.
- Noise Pollution Analysis: NYC dataset analysis with compelling storytelling.
When I'm not analyzing data, you'll likely find me experimenting with recipes in the kitchen or exploring the hidden gems of Boston. Data and good food are my favorite combination!
Let's collaborate and create something amazing together!
