I'm a data enthusiast with hands-on experience in Machine Learning, Deep Learning, Natural Language Processing (NLP), Data Analysis, and Data Visualization.
My background is in clinical research, and I bring that domain knowledge directly into my data science projects — bridging healthcare expertise with rigorous analytical methods using Python, SQL, TensorFlow/Keras, NLP libraries, and Tableau.
End-to-end data science pipeline on 5,000 ClinicalTrials.gov records — EDA, NLP, XGBoost, SHAP, and SQL analytics.
This is my most comprehensive project, combining my clinical research background with a full data science workflow:
| Notebook | Focus |
|---|---|
| 📊 EDA & Data Acquisition | HuggingFace streaming, XML parsing, feature engineering |
| 📝 NLP & Text Analytics | TF-IDF, Sentence Transformers, BART zero-shot, NER |
| 🤖 Machine Learning | XGBoost + hyperparameter tuning (ROC-AUC: 0.68) |
| 🗄️ SQL Analytics | SQLite, window functions, multi-CTE sponsor scorecard |
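To give a feel for the SQL Analytics notebook, here is a minimal sketch of a multi-CTE query with a window function, run through Python's built-in `sqlite3` module. The table name `trials`, its columns, and the toy rows are hypothetical and not taken from the project data; it also assumes the bundled SQLite is version 3.25 or newer, which is when window functions became available.

```python
import sqlite3

# Hypothetical toy rows (sponsor, trial_id, status); not the project's real data.
rows = [
    ("SponsorA", "T1", "Completed"),
    ("SponsorA", "T2", "Terminated"),
    ("SponsorA", "T3", "Completed"),
    ("SponsorB", "T4", "Completed"),
    ("SponsorB", "T5", "Completed"),
    ("SponsorC", "T6", "Withdrawn"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trials (sponsor TEXT, trial_id TEXT, status TEXT)")
conn.executemany("INSERT INTO trials VALUES (?, ?, ?)", rows)

query = """
WITH sponsor_counts AS (          -- CTE 1: per-sponsor totals
    SELECT sponsor,
           COUNT(*) AS n_trials,
           SUM(status = 'Completed') AS n_completed
    FROM trials
    GROUP BY sponsor
),
sponsor_rates AS (                -- CTE 2: completion rate per sponsor
    SELECT sponsor,
           n_trials,
           1.0 * n_completed / n_trials AS completion_rate
    FROM sponsor_counts
)
SELECT sponsor,
       n_trials,
       completion_rate,
       RANK() OVER (ORDER BY completion_rate DESC) AS rate_rank  -- window function
FROM sponsor_rates
ORDER BY rate_rank, sponsor;
"""

scorecard = conn.execute(query).fetchall()
for row in scorecard:
    print(row)

conn.close()
```

Each CTE handles one step (counting, then computing a rate), so the final `SELECT` only has to rank sponsors. The actual scorecard in the notebook may use different metrics and columns.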
Key results: An XGBoost model predicted clinical trial completion from registration metadata alone (ROC-AUC: 0.68), and SHAP explainability identified phase and collaborator presence as the strongest completion signals. A TF-IDF baseline trained on the free-text summaries alone reached a comparable ROC-AUC of 0.69.
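For readers less familiar with the metric, ROC-AUC can be read as the probability that a randomly chosen positive example (here, a completed trial) receives a higher score than a randomly chosen negative one. The sketch below computes it directly from that pairwise definition. The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not the project's code or data; the notebooks most likely rely on a library implementation such as scikit-learn's.

```python
def roc_auc(labels, scores):
    """Share of (positive, negative) pairs where the positive scores higher.

    Ties count as half a win. Labels are expected to be 1 or 0.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy example: 1 = completed, 0 = not completed.
labels = [1, 1, 1, 0, 0]
scores = [0.9, 0.6, 0.4, 0.5, 0.3]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so scores in the high 0.6 range indicate a modest but real signal.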
Tech: Python · XGBoost · SHAP · HuggingFace Transformers · Sentence Transformers · SQLite · Plotly · scikit-learn
- Cat vs Dog CNN Image Classifier — End-to-end CNN with TensorFlow/Keras for binary image classification, including data augmentation and evaluation.
- Bank Customer Churn Prediction (ANN) — Artificial Neural Network predicting customer churn for retention strategy insights.
- Vehicle Market Segmentation — K-Means & Hierarchical Clustering to segment vehicles by specification.
- Startup Profit Prediction — Compared multiple regression models to predict startup profitability.
- Drug Classification — Classification models for predicting drug effectiveness.
- Employee Attrition & Retention Analysis — End-to-end HR analytics with EDA, statistical analysis, Tableau dashboards, and predictive modelling.
- Customer Sentiment Analysis — NLP project on hotel reviews using VADER, TF-IDF, and neural networks.
- Python Data Analysis Project — Exploring datasets with Python to uncover patterns and trends.
- SQL Data Analysis Project — SQL-based analysis on real-world datasets for actionable insights.
- Tableau Dashboards — Interactive Tableau dashboards visualizing sales performance.
- Seattle Airbnb Analysis — Analyzing Airbnb listings and pricing trends.
Languages & Tools: Python · SQL · TensorFlow · Keras · Tableau · NLTK · Pandas · NumPy · Scikit-learn · XGBoost · SHAP · HuggingFace Transformers · Matplotlib · Seaborn · Plotly · SQLite · WordCloud
Techniques: Regression · Classification · Clustering · Artificial Neural Networks · Convolutional Neural Networks · NLP · Sentiment Analysis · SHAP Explainability · Data Cleaning · Data Visualization · ETL Pipelines
📌 All projects are fully reproducible with notebooks and environment files included. Explore my repositories for the full workflow.