Junior Data Scientist
My practice in data science and business intelligence centres on translating raw data into strategic insight. With a strong foundation in business administration, I evaluate analytical opportunities through a commercial lens, design KPI frameworks, and deliver executive-ready narratives that drive informed decision-making.
I engineer end‑to‑end machine learning and deep learning solutions—from rigorous data cleaning and feature engineering to model deployment—solving classification, regression, and clustering challenges. I emphasise robustness, interpretability, and scalability to ensure AI systems generate measurable value.
A deep engagement with natural language processing, grounded in computational linguistics and formal study of language and psychology, enables me to model unstructured text, uncover semantic patterns, and build systems that understand and generate human language at scale.
My multidisciplinary education (an MBA, a major in Spanish language and literature, and a minor in psychology) provides a distinct analytical advantage. Business acumen aligns projects with strategy, while psychological and linguistic training refines experimental design, textual analysis, and user‑centric thinking. Together they strengthen my work in data science, artificial intelligence, and natural language processing.
Data Science • Machine Learning • Deep Learning • Artificial Intelligence • Natural Language Processing • Business Intelligence
Technical: Python, SQL, Scikit‑learn, XGBoost, FastAPI, Streamlit, Tableau, BigQuery, ETL, Data Modeling, Feature Engineering, Model Deployment, Version Control (Git), MLOps
Predicting term deposit subscriptions from a Portuguese bank's direct marketing campaign using a Random Forest classifier (ROC‑AUC 0.906). The solution is deployed as an interactive Streamlit app supporting both single and batch predictions.
- Logic: Engineered binary features from categorical and numerical data (e.g., previous campaign success, contact month, job type), selected a subset via correlation analysis to reduce multicollinearity, and trained a Random Forest that achieves strong recall on the minority class.
- Live Demo: Streamlit App
- Code: GitHub Repository
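
The correlation-based selection step described above could look roughly like the following. This is a minimal sketch on synthetic data, not the repository's code: the helper `drop_correlated`, the 0.9 threshold, and the column names are all assumptions chosen for illustration.

```python
import numpy as np


def drop_correlated(X, names, threshold=0.9):
    """Greedy filter (hypothetical helper): keep a feature only if its absolute
    Pearson correlation with every already-kept feature is <= threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in kept):
            kept.append(j)
    return [names[j] for j in kept]


# Synthetic binary features standing in for the engineered flags.
rng = np.random.default_rng(0)
n = 500
prev_success = rng.integers(0, 2, n).astype(float)
contact_may = rng.integers(0, 2, n).astype(float)

# A near-duplicate of prev_success: identical except for 10 flipped rows.
prev_contacted = prev_success.copy()
prev_contacted[:10] = 1.0 - prev_contacted[:10]

X = np.column_stack([prev_success, contact_may, prev_contacted])
names = ["prev_success", "contact_may", "prev_contacted"]

selected = drop_correlated(X, names)
print(selected)
```

Because the filter is greedy, the result depends on column order: the first feature of a highly correlated pair is kept and the later one is dropped.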
Using WHO health and socioeconomic indicators to predict life expectancy with a Random Forest regressor (R² = 0.958, RMSE = 1.93 years). Served via a FastAPI REST API with a lightweight HTML frontend.
- Logic: Created polynomial and interaction features (e.g., mortality‑BMI cross, log‑HIV), checked for multicollinearity with the variance inflation factor (VIF), and built a model that explains over 95% of the variance. The API accepts JSON input and returns instant predictions.
- Live Demo: FastAPI App
- Code: GitHub Repository
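
The VIF check mentioned above can be sketched with NumPy alone. This is an illustrative example on synthetic data, not the project's implementation: the function `vif`, the distributions, and the variable names are assumptions. Each VIF is computed as 1 / (1 - R²), where R² comes from regressing one feature on all the others with an intercept.

```python
import numpy as np


def vif(X):
    """Variance inflation factor for each column of X: 1 / (1 - R^2), with R^2
    from an OLS regression of that column on the remaining columns."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])  # add intercept
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - resid.var() / y.var()
        out[j] = 1.0 / (1.0 - r2)
    return out


# Synthetic stand-ins for the health indicators (values are made up).
rng = np.random.default_rng(1)
n = 400
mortality = rng.uniform(50.0, 250.0, n)
bmi = rng.uniform(18.0, 35.0, n)
log_hiv = np.log(rng.lognormal(0.0, 1.0, n))

# Uncentred interaction term, which tends to be collinear with its parents.
mortality_x_bmi = mortality * bmi

X = np.column_stack([mortality, bmi, mortality_x_bmi, log_hiv])
vifs = vif(X)
for name, value in zip(["mortality", "bmi", "mortality_x_bmi", "log_hiv"], vifs):
    print(f"{name}: {value:.2f}")
```

In this toy setup the interaction term shows a high VIF while the independent `log_hiv` column stays close to 1, which is the kind of signal that would prompt centring or dropping a feature.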
Unsupervised segmentation of credit card users into five behavioural clusters using K‑Means, with an interactive Streamlit app that classifies new customers and suggests business strategies.
- Logic: Engineered domain‑driven features (cash advance ratio, one‑off purchase share, purchase frequency), refined them through multimodality checks and correlation analysis, then applied K‑Means (k=5) after comparing its silhouette scores with those of DBSCAN. Each segment receives a targeted business strategy.
- Live Demo: Streamlit App
- Code: GitHub Repository
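
A rough sketch of the segmentation workflow above, using scikit-learn on synthetic data. It is not the repository's code: the helper `engineer`, the raw column names, the distributions, and the range of k values scanned are assumptions, and the DBSCAN comparison is omitted for brevity.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def engineer(purchases, oneoff_purchases, cash_advance, purchase_frequency):
    """Build the three ratio-style features (hypothetical helper)."""
    eps = 1e-9  # guards against division by zero
    return np.column_stack([
        cash_advance / (cash_advance + purchases + eps),  # cash advance ratio
        oneoff_purchases / (purchases + eps),             # one-off purchase share
        purchase_frequency,
    ])


# Synthetic account data; real distributions will differ.
rng = np.random.default_rng(42)
n = 600
purchases = rng.gamma(2.0, 500.0, n)
oneoff_purchases = purchases * rng.uniform(0.0, 1.0, n)
cash_advance = rng.gamma(1.5, 400.0, n)
purchase_frequency = rng.uniform(0.0, 1.0, n)

scaler = StandardScaler()
X = scaler.fit_transform(
    engineer(purchases, oneoff_purchases, cash_advance, purchase_frequency)
)

# Silhouette score for several candidate cluster counts.
scores = {}
for k in range(2, 8):
    labels_k = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels_k)

# Final model with k=5, as in the project description.
model = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X)
labels = model.labels_

# Assign a new customer to a segment with the same scaler and model.
new_features = engineer(
    np.array([1200.0]), np.array([300.0]), np.array([800.0]), np.array([0.4])
)
segment = int(model.predict(scaler.transform(new_features))[0])
print(scores, segment)
```

Reusing the fitted scaler for new customers keeps their features on the same scale as the training data, which K-Means distance calculations depend on.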
- MBA, University of Tehran
- BA in Spanish Language & Literature, Minor in Psychology, University of Tehran
- Google Advanced Data Analytics Professional Certificate
- Google Business Intelligence Professional Certificate
- IBM Machine Learning Professional Certificate
- Email: mohammaderfanrashidi@gmail.com
- LinkedIn: linkedin.com/in/mohammad-erfan-rashidi-4a50b0284
- WhatsApp: +98 935 217 0440
- Portfolio: mohammaderfanrashidi.github.io
Committed to delivering robust, interpretable, and impactful AI solutions.