NadiaRozman/README.md

Hi, I'm Nadia! ☺️

I'm a data enthusiast with hands-on experience in Machine Learning, Deep Learning, Natural Language Processing (NLP), Data Analysis, and Data Visualization.

My background is in clinical research, and I bring that domain knowledge directly into my data science projects — bridging healthcare expertise with rigorous analytical methods using Python, SQL, TensorFlow/Keras, NLP libraries, and Tableau.


🌟 Featured Project

End-to-end data science pipeline on 5,000 ClinicalTrials.gov records — EDA, NLP, XGBoost, SHAP, and SQL analytics.

This is my most comprehensive project, combining my clinical research background with a full data science workflow:

| Notebook | Focus |
|---|---|
| 📊 EDA & Data Acquisition | HuggingFace streaming, XML parsing, feature engineering |
| 📝 NLP & Text Analytics | TF-IDF, Sentence Transformers, BART zero-shot, NER |
| 🤖 Machine Learning | XGBoost + hyperparameter tuning (ROC-AUC: 0.68) |
| 🗄️ SQL Analytics | SQLite, window functions, multi-CTE sponsor scorecard |
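
To give a feel for the SQL Analytics notebook's "multi-CTE sponsor scorecard" with window functions, here is a minimal sketch using Python's built-in `sqlite3` module. The `trials` table, its columns, and all rows are hypothetical and invented for illustration; they are not taken from the project, and the actual queries in the repository may look quite different.

```python
import sqlite3

# Hypothetical schema and rows, invented purely for illustration.
# Window functions assume the bundled SQLite is version 3.25 or newer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE trials (sponsor TEXT, status TEXT);
    INSERT INTO trials VALUES
        ('Sponsor A', 'Completed'), ('Sponsor A', 'Completed'),
        ('Sponsor A', 'Terminated'),
        ('Sponsor B', 'Completed'), ('Sponsor B', 'Terminated'),
        ('Sponsor C', 'Completed');
""")

query = """
WITH totals AS (              -- CTE 1: trial counts per sponsor
    SELECT sponsor,
           COUNT(*) AS n_trials,
           SUM(status = 'Completed') AS n_completed
    FROM trials
    GROUP BY sponsor
),
rates AS (                    -- CTE 2: completion rate per sponsor
    SELECT sponsor, n_trials,
           1.0 * n_completed / n_trials AS completion_rate
    FROM totals
)
SELECT sponsor,
       n_trials,
       ROUND(completion_rate, 2) AS rate_rounded,
       RANK() OVER (ORDER BY completion_rate DESC) AS rate_rank  -- window function
FROM rates
ORDER BY rate_rank, sponsor;
"""

scorecard = conn.execute(query).fetchall()
for row in scorecard:
    print(row)
```

Each CTE handles one step (counting, then computing a rate), and the final `RANK() OVER (...)` orders sponsors without collapsing the rows, which is the usual reason to reach for a window function over a plain `GROUP BY`.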

Key results: Predicted clinical trial completion from registration metadata alone; SHAP explainability identified phase and collaborator presence as the strongest completion signals. NLP baseline (TF-IDF) achieved ROC-AUC of 0.69 from free-text summaries alone.
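
For readers less familiar with the metric quoted above, ROC-AUC can be read as the probability that a randomly chosen positive example (here, a completed trial) receives a higher score than a randomly chosen negative one. The sketch below computes it directly from that definition in plain Python. The function name `roc_auc` and the labels and scores are made up for illustration; the project itself presumably relies on a library routine such as scikit-learn's `roc_auc_score`, which is an assumption on my part.

```python
def roc_auc(labels, scores):
    """Fraction of (positive, negative) pairs where the positive scores higher.

    Ties count as half a win. Labels are expected to be 0 or 1.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up example: 1 = completed, 0 = not completed.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

On this scale 0.5 corresponds to random ranking and 1.0 to a perfect one, so values of 0.68 and 0.69 indicate a real but modest signal.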

Tech: Python · XGBoost · SHAP · HuggingFace Transformers · Sentence Transformers · SQLite · Plotly · scikit-learn


🔹 Other Projects

Machine Learning

Analytics & NLP

Visualization


🔹 Skills

Languages & Tools: Python · SQL · TensorFlow · Keras · Tableau · NLTK · Pandas · NumPy · Scikit-learn · XGBoost · SHAP · HuggingFace Transformers · Matplotlib · Seaborn · Plotly · SQLite · WordCloud

Techniques: Regression · Classification · Clustering · Artificial Neural Networks · Convolutional Neural Networks · NLP · Sentiment Analysis · SHAP Explainability · Data Cleaning · Data Visualization · ETL Pipelines


📌 All projects are fully reproducible with notebooks and environment files included. Explore my repositories for the full workflow.

🔗 Connect: LinkedIn · GitHub

Pinned

  1. Clinical_Trials_Analysis (Public)

    End-to-end data science pipeline on 5,000 ClinicalTrials.gov records — EDA, NLP, XGBoost, SHAP, and SQL analytics.

    Jupyter Notebook

  2. Analytics_Portfolio_Dual_Projects (Public)

    Analytics Portfolio showcasing two end-to-end data science projects: Employee Attrition Analysis and Customer Sentiment Analysis, including EDA, NLP, ML, and Tableau dashboards.

    Jupyter Notebook 1

  3. Cat-vs-Dog-CNN-Image-Classifier (Public)

    Binary image classification of cats vs dogs using CNNs built with Keras/TensorFlow, including training visualization.

    Jupyter Notebook 1

  4. ANN_Bank_Customer_Churn_Prediction (Public)

    Predicting bank customer churn using Artificial Neural Networks (ANN) with scikit-learn.

    Jupyter Notebook 1

  5. ML_Clustering_Vehicle_Market_Segmentation (Public)

    Unsupervised clustering analysis of vehicle specifications using Hierarchical and K-Means methods (educational dataset).

    Jupyter Notebook 1

  6. Python_Data_Analysis_Project (Public)

    Comprehensive Python analysis of job market trends, in‑demand skills, and pay for data analysts.

    Jupyter Notebook 1