Problem:
Toastmasters members often struggle to find accurate, up-to-date information on Pathways, roles, and club processes, because it is scattered across multiple documents.
What I built:
A Retrieval-Augmented Generation (RAG) system that provides document-grounded answers, improving accuracy and reducing hallucinations in responses.
Key features:
- Three-stage pipeline: Ingestion → Retrieval & Generation → Evaluation
- Query classification for improved retrieval routing
- Metadata-driven filtering to narrow search scope
- Reranking to improve relevance of retrieved documents
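A minimal, dependency-free sketch of how the ingestion, routing, filtering, reranking, and evaluation pieces above can fit together. Everything here is a hypothetical stand-in, not the project's actual code. A keyword rule plays the query classifier, a `category` tag plays the metadata filter, bag-of-words cosine similarity plays vector search, and query-term coverage plays the reranker (a real system would more likely use embeddings and a cross-encoder). The LLM generation step is omitted.

```python
import math
import re
from collections import Counter

# Ingestion (toy): hypothetical chunks, each tagged with an id and a "category" metadata field.
DOCS = [
    {"id": "pw-1", "category": "pathways",
     "text": "Pathways is the Toastmasters education program with learning paths and levels."},
    {"id": "role-timer", "category": "roles",
     "text": "The Timer tracks speech times and signals green, yellow and red."},
    {"id": "role-ge", "category": "roles",
     "text": "The General Evaluator evaluates the whole meeting and the evaluators."},
    {"id": "club-elections", "category": "club_processes",
     "text": "Club officers hold elections and submit the officer list each year."},
]

# Hypothetical keyword rules standing in for a trained query classifier.
ROUTES = {
    "pathways": {"pathways", "path", "level", "project"},
    "roles": {"timer", "evaluator", "role", "grammarian"},
    "club_processes": {"election", "officer", "dues", "membership"},
}
STOPWORDS = {"a", "an", "and", "are", "do", "does", "how", "is", "of", "the", "to", "what", "when"}


def tokenize(text):
    """Lowercase word tokens with stopwords removed."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def classify(query):
    """Route the query to the category with the most keyword hits (None if no hits)."""
    tokens = set(tokenize(query))
    best, best_hits = None, 0
    for category, keywords in ROUTES.items():
        hits = len(tokens & keywords)
        if hits > best_hits:
            best, best_hits = category, hits
    return best


def cosine(a, b):
    """Cosine similarity of two bag-of-words Counters (stand-in for embedding vectors)."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, k=3):
    """Metadata filter on the predicted category, then top-k chunks by cosine similarity."""
    category = classify(query)
    pool = [d for d in DOCS if category is None or d["category"] == category]
    q = Counter(tokenize(query))
    ranked = sorted(pool, key=lambda d: cosine(q, Counter(tokenize(d["text"]))), reverse=True)
    return ranked[:k]


def rerank(query, candidates):
    """Stand-in reranker: re-score candidates by the share of query terms each one contains."""
    q = set(tokenize(query))

    def coverage(d):
        return len(q & set(tokenize(d["text"]))) / len(q) if q else 0.0

    return sorted(candidates, key=coverage, reverse=True)


def hit_rate_at_k(labeled_queries, k=1):
    """Evaluation: fraction of queries whose expected chunk id is in the top-k reranked results."""
    hits = 0
    for query, expected_id in labeled_queries:
        top = rerank(query, retrieve(query))[:k]
        hits += any(d["id"] == expected_id for d in top)
    return hits / len(labeled_queries)


query = "What does the timer do?"
print(classify(query))                          # roles
print(rerank(query, retrieve(query))[0]["id"])  # role-timer
EVAL = [("What does the timer do?", "role-timer"),
        ("When are officer elections held?", "club-elections"),
        ("What levels does Pathways have?", "pw-1")]
print(hit_rate_at_k(EVAL))                      # 1.0
```

Routing before retrieval shrinks the candidate pool, so the similarity search and the reranker only compare chunks from the relevant category.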
Tech: Python, LLMs, RAG, Vector Search
GitHub
Goal:
Develop strong intuition for data behavior, predictive modeling, and generalization through hands-on implementation of core ML techniques and exploratory analysis.
What I worked on:
- Exploratory Data Analysis (NYC Taxi Dataset):
  Analyzed large-scale trip data to uncover financial and temporal patterns.
  - Data preparation: sampling, cleaning, preprocessing
  - Analysis: revenue trends, peak demand hours, seasonal effects
  - Insights: operational and business strategies to optimize profitability
- Linear Regression (Car Price Prediction):
  Built predictive models for car prices and studied the effect of regularisation.
  - Applied Ridge and Lasso to handle multicollinearity
  - Used Lasso regularisation for feature selection and to improve generalization
- Logistic Regression (Employee Attrition Prediction):
  Modeled employee attrition as a classification problem.
  - Data preprocessing and feature selection
  - Model training and evaluation
  - Intuitive data visualisations to interpret predictions and decision boundaries
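The attrition item above (preprocessing, training, evaluation) might look roughly like this in scikit-learn. It is a sketch on synthetic data from `make_classification` with an imbalanced positive ("left") class chosen for illustration; the feature count, class weights, and `class_weight="balanced"` setting are assumptions, not the project's configuration, and feature selection and plotting are left out.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for an HR table: class 1 = "left", roughly 16% of rows.
X, y = make_classification(n_samples=1000, n_features=8, n_informative=4,
                           weights=[0.84], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)  # keep the class ratio in both splits

# Scale features, then fit a logistic model that reweights the minority class.
clf = make_pipeline(StandardScaler(),
                    LogisticRegression(class_weight="balanced", max_iter=1000))
clf.fit(X_train, y_train)

proba = clf.predict_proba(X_test)[:, 1]          # predicted probability of attrition
auc = roc_auc_score(y_test, proba)
print(classification_report(y_test, clf.predict(X_test), digits=3))
print(f"ROC AUC: {auc:.3f}")

# On standardized inputs, each coefficient is the change in log-odds per one-std increase.
coefs = clf[-1].coef_.ravel()
print(coefs.round(2))
```

Because attrition data is usually imbalanced, ROC AUC and per-class recall are more informative here than plain accuracy.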
Focus areas:
- Bias–variance trade-off and overfitting
- Role of regularisation in controlling model complexity
- Translating data patterns into interpretable insights
- Using data visualisation as a reasoning and diagnostic tool
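To make the regularisation focus area (and the Ridge/Lasso car-price work above) concrete: Ridge adds an L2 penalty on the coefficients and Lasso an L1 penalty, which can drive some coefficients exactly to zero. The sketch below uses hypothetical synthetic "car" features in which engine size and horsepower are nearly collinear and one feature is pure noise; the `alpha` values are illustrative, not tuned (in practice they would come from cross-validation, e.g. `RidgeCV` or `LassoCV`).

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic data: horsepower is almost a linear function of engine size.
rng = np.random.default_rng(42)
n = 300
engine_size = rng.normal(2.0, 0.5, n)
horsepower = 100 * engine_size + rng.normal(0, 5, n)   # strongly collinear with engine_size
age = rng.uniform(0, 15, n)
noise_feature = rng.normal(0, 1, n)                     # unrelated to price
X = np.column_stack([engine_size, horsepower, age, noise_feature])
price = 8000 * engine_size - 900 * age + rng.normal(0, 1500, n)

X_train, X_test, y_train, y_test = train_test_split(X, price, test_size=0.25, random_state=0)

models = {
    "ols": LinearRegression(),
    "ridge": Ridge(alpha=10.0),   # L2 penalty: shrinks and spreads weight across correlated features
    "lasso": Lasso(alpha=500.0),  # L1 penalty: can set weak or redundant coefficients to zero
}
results = {}
for name, model in models.items():
    pipe = make_pipeline(StandardScaler(), model).fit(X_train, y_train)
    results[name] = (pipe.score(X_test, y_test), pipe[-1].coef_)
    score, coef = results[name]
    print(f"{name:5s} test R^2={score:.3f} coefs={np.round(coef, 1)}")
```

Raising `alpha` shrinks the coefficients further, which typically lowers variance at the cost of some bias; comparing test R² across a few `alpha` values is a quick way to see that trade-off.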
Tech: Python, NumPy, Pandas, Matplotlib, Scikit-learn
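For the NYC Taxi EDA item, the sampling, cleaning, and peak-hour / monthly-revenue steps could be sketched with Pandas as below. The data is synthetic, and the column names (`tpep_pickup_datetime`, `trip_distance`, `total_amount`) are assumed to follow the public TLC yellow-taxi schema; the cleaning rules are illustrative only.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for one year of yellow-taxi trips (column names assumed from the TLC schema).
rng = np.random.default_rng(0)
n = 1_000
pickup = pd.Timestamp("2023-01-01") + pd.to_timedelta(rng.integers(0, 365 * 24 * 60, n), unit="min")
trips = pd.DataFrame({
    "tpep_pickup_datetime": pickup,
    "trip_distance": rng.exponential(3.0, n),
    "total_amount": rng.normal(20, 8, n),
})
trips.loc[::50, "total_amount"] = np.nan  # inject some missing values to clean up


def prepare(df, frac=0.5, seed=0):
    """Sample rows, drop missing values and non-positive distances/amounts, add time features."""
    sample = df.sample(frac=frac, random_state=seed).dropna()
    sample = sample[(sample["trip_distance"] > 0) & (sample["total_amount"] > 0)].copy()
    sample["hour"] = sample["tpep_pickup_datetime"].dt.hour
    sample["month"] = sample["tpep_pickup_datetime"].dt.month
    return sample


clean = prepare(trips)
trips_per_hour = clean.groupby("hour").size().sort_values(ascending=False)  # peak demand hours
revenue_per_month = clean.groupby("month")["total_amount"].sum()            # seasonal revenue
print(trips_per_hour.head(3))
print(revenue_per_month.round(2))
```

On the real dataset, sampling first keeps the exploration fast, and the same group-by pattern extends to day-of-week or pickup-zone breakdowns.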
Repos:

