NLP + Machine Learning project identifying student math misconceptions from open-ended responses.
Techniques used: TF-IDF, embeddings, logistic regression, deep learning baselines, and full model evaluation.
- Project Overview
- Dataset
- Notebook
- Results
- How to Run
- Requirements
- Kaggle Integration
- License
- Acknowledgments
This project explores natural language processing (NLP) and machine learning models for automatically detecting student math misconceptions from open-ended responses. It compares several vectorization and modeling techniques, spanning traditional ML and deep learning baselines, and benchmarks and explains each model's performance.
- Source: MAP Charting Student Math Misunderstandings Competition (Kaggle)
- Format: Open-text student responses with associated misconception labels.
- Availability: Download directly from the Kaggle competition page named above (requires a Kaggle account and acceptance of the competition rules).
Project workflow, modeling, and evaluation are contained in the main Jupyter notebook, Term_Project_geissinger_final.ipynb.
The notebook includes data loading, preprocessing, feature engineering, modeling (TF-IDF, embeddings, logistic regression, deep learning), and evaluation.
Detailed markdown cells throughout the notebook explain each step.
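As a rough illustration of the TF-IDF and logistic regression stage mentioned above, the following sketch builds a scikit-learn pipeline on a handful of made-up responses. It is not taken from the notebook: the example texts, the label names, and the hyperparameters are all assumptions chosen only to keep the example self-contained.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline

# Hypothetical toy data; the real competition data has its own columns and labels.
texts = [
    "I added the numerators and the denominators",
    "add the tops and add the bottoms",
    "the bigger denominator means the bigger fraction",
    "9 is bigger than 4 so one ninth is bigger",
    "I added top and bottom numbers together",
    "a larger bottom number makes a larger fraction",
]
labels = [
    "adds_across", "adds_across", "bigger_denominator",
    "bigger_denominator", "adds_across", "bigger_denominator",
]

# TF-IDF over unigrams and bigrams, followed by a linear classifier.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), lowercase=True),
    LogisticRegression(max_iter=1000),
)
model.fit(texts, labels)

# Scoring on the training texts is only a smoke check, not a real evaluation.
predictions = model.predict(texts)
train_macro_f1 = f1_score(labels, predictions, average="macro")
print(f"Macro F1 on the toy training set: {train_macro_f1:.2f}")
```

For a meaningful score, the same pipeline would need to be fit on a training split and evaluated on held-out responses.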
- Main Findings:
- (Insert your best model’s performance summary here, e.g., “The best logistic regression model achieved X% F1-score.”)
- (Comment on key insights or interesting failure cases if desired.)
- Sample Outputs:
- (Optionally add images or output snippets here.)
Locally:
- Clone this repository:
  git clone https://github.com/LanaGeis/MAP-Student-Math-Misunderstandings_Kaggle.git
  cd MAP-Student-Math-Misunderstandings_Kaggle
- (Optional) Set up a Python virtual environment.
- Install dependencies:
  pip install -r requirements.txt
- Download the dataset from the Kaggle competition page and place it in a data/ directory in this repository.
- Open the notebook and run the cells in order:
  jupyter notebook Term_Project_geissinger_final.ipynb
On Kaggle:
- If you wish to run this notebook on Kaggle, create a new Notebook and upload Term_Project_geissinger_final.ipynb.
- Attach the MAP Charting Student Math Misunderstandings dataset from the “Add Data” sidebar.
- Make sure the data-loading code references the correct Kaggle input paths (attached data is mounted under /kaggle/input/).
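One way to keep the data-loading code working both locally and on Kaggle is to resolve the data directory at runtime. The sketch below is an assumption about how this could be done, not code from the notebook; `resolve_data_dir` is a hypothetical helper, and it deliberately avoids hard-coding the competition's folder name.

```python
from pathlib import Path


def resolve_data_dir(kaggle_root="/kaggle/input", local_dir="data"):
    """Return the directory that should contain the competition files."""
    root = Path(kaggle_root)
    if root.is_dir():
        # On Kaggle, each attached dataset is mounted as a subdirectory.
        attached = sorted(p for p in root.iterdir() if p.is_dir())
        if attached:
            return attached[0]
    # Fall back to the local data/ directory used in the local setup steps.
    return Path(local_dir)


data_dir = resolve_data_dir()
print("Reading data from:", data_dir)
```

Picking the first attached dataset alphabetically is a simplification; if several datasets are attached, pass the exact folder explicitly instead.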
See requirements.txt for all dependencies.
Key packages:
- Python 3.8+
- numpy, pandas, scikit-learn, matplotlib, seaborn
- tensorflow or pytorch (for deep learning models)
- tqdm, nltk, sentence-transformers
On Kaggle, most of these dependencies are pre-installed.
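To confirm that an environment (local or Kaggle) has the key packages available before running the notebook, a quick check along these lines may help. This is a minimal sketch, not part of the repository; `missing_packages` is a hypothetical helper, and the list covers only the packages named above, leaving out tensorflow and pytorch because either one may be used.

```python
from importlib.util import find_spec

# Import names for the key packages listed above (scikit-learn imports as
# "sklearn", sentence-transformers as "sentence_transformers").
KEY_PACKAGES = [
    "numpy", "pandas", "sklearn", "matplotlib", "seaborn",
    "tqdm", "nltk", "sentence_transformers",
]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(KEY_PACKAGES)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All key packages are importable.")
```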
- Dataset: MAP Charting Student Math Misunderstandings Competition Data
- Notebook: This notebook is not yet published on Kaggle as of this release.
To publish:
- Go to your Kaggle account, “Code” → “+ New Notebook”.
- Upload Term_Project_geissinger_final.ipynb.
- Attach the required dataset and run all cells.
- Publish/share when ready.
MIT License.
See LICENSE for details.
- Professor Brett Werner, Bellevue University, for feedback and review.