This project applies Explainable AI (XAI) techniques to a Student Dropout dataset, covering pre-, in- and post-modeling explanations, as well as an analysis of their quality. The project was developed for the "Advanced Topics on Machine Learning" course, in the 1st semester of the 1st year of the Master's Degree in Artificial Intelligence.
The objective of this project is to analyze and compare multiple XAI approaches, evaluating:
- The type of insights provided by each technique;
- The consistency of the explanations;
- The differences between interpretable (glass-box) and complex (black-box) models.
Notebooks:
task_1_1_all_data_analysis.ipynb
task_1_2_data_analysis.ipynb
This task focuses on exploratory data analysis before model training.
The dataset is analyzed to understand:
- Feature distributions;
- Relationships between features and the target variable;
- Potential data issues and relevant patterns.
These insights support informed decisions in later modeling stages.
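The three checks listed above can be sketched with pandas. This is only an illustration on a toy frame: the column names `age_at_enrollment`, `approved_units` and `Target` are assumptions and may not match the columns in `datasets/data.csv`.

```python
import pandas as pd

# Hypothetical toy data; the real dataset has different (and many more) columns.
df = pd.DataFrame({
    "age_at_enrollment": [18, 19, 25, 32, 20, 41, 18, 22],
    "approved_units": [6, 5, 2, 0, 6, 1, 5, 3],
    "Target": ["Graduated", "Graduated", "Dropout", "Dropout",
               "Graduated", "Dropout", "Graduated", "Dropout"],
})

# Feature distributions: count, mean, std and quartiles of numeric columns.
summary = df.describe()

# Class balance of the target variable, as proportions.
class_balance = df["Target"].value_counts(normalize=True)

# Relationship between features and target: per-class feature means.
means_by_target = df.groupby("Target").mean(numeric_only=True)

# Potential data issues: missing values per column.
missing = df.isna().sum()

print(class_balance)
print(means_by_target)
print(missing)
```

On the real data, the same calls would be applied to the frame loaded from the CSV files in `datasets`.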
Notebook:
task_2_in_modelling.ipynb
In this task, an interpretable (glass-box) model is trained.
The analysis focuses on:
- Feature importance;
- Model parameters and learned relationships;
- The interpretability offered directly by the model.
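As a rough illustration of what a glass-box model exposes directly, the sketch below fits a shallow scikit-learn decision tree and reads its feature importances and learned rules. It uses synthetic data and placeholder feature names, and the `max_depth=3` setting is an assumption rather than the configuration used in the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in for the preprocessed dropout data (assumption).
X, y = make_classification(
    n_samples=300, n_features=5, n_informative=3, n_redundant=0, random_state=0
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]

# A shallow tree stays small enough to read as a set of rules.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Impurity-based feature importance, one value per feature.
importances = dict(zip(feature_names, tree.feature_importances_))

# The learned decision rules as plain text.
rules = export_text(tree, feature_names=feature_names)

print(importances)
print(rules)
```

Limiting the depth trades some accuracy for rules that can be inspected by hand, which is the point of the in-modeling analysis.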
Notebooks:
task_3_and_4_mlp.ipynb
task_3_and_4_xgboost.ipynb
This stage involves training black-box models (MLP and XGBoost) and applying post-hoc XAI techniques, including:
- Simplification-based methods to approximate model behavior;
- Feature-based explanation techniques;
- Example-based explanations for individual predictions.
The explanations obtained from different methods are compared and discussed.
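One common simplification-based method is a global surrogate: an interpretable model trained to imitate the black box's predictions. The sketch below is a minimal version on synthetic data, assuming a scikit-learn `MLPClassifier` as the black box and a depth-3 decision tree as the surrogate; it is not necessarily the method or configuration used in the notebooks.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the dropout data (assumption).
X, y = make_classification(
    n_samples=500, n_features=6, n_informative=3, n_redundant=0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Black-box model whose behavior we want to approximate.
black_box = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0)
black_box.fit(X_train, y_train)

# The surrogate is trained on the black box's predictions, not the true labels.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X_train, black_box.predict(X_train))

# Fidelity: how often the surrogate agrees with the black box on unseen data.
fidelity = accuracy_score(black_box.predict(X_test), surrogate.predict(X_test))
print(f"surrogate fidelity: {fidelity:.2f}")
```

A surrogate with low fidelity explains little about the black box, so its rules should only be interpreted when agreement is high.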
This task evaluates the quality of the generated explanations using functionally-grounded metrics.
The results are analyzed to assess explanation reliability and to suggest possible interpretability improvements.
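A functionally-grounded check needs no human evaluators: it tests an explanation against the model itself. The sketch below shows one such check, a simple faithfulness test that neutralizes the features an explanation ranks highest and lowest and compares how much the predictions move. The random forest, the use of permutation importance as the explanation, and the helper `prob_shift_after_masking` are all illustrative assumptions, not the metrics used in the report.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

# Synthetic stand-in data: 3 informative features and 3 noise features.
X, y = make_classification(
    n_samples=400, n_features=6, n_informative=3, n_redundant=0, random_state=0
)
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Explanation under evaluation: a global feature ranking.
result = permutation_importance(model, X, y, n_repeats=5, random_state=0)
ranking = np.argsort(result.importances_mean)[::-1]  # most important first


def prob_shift_after_masking(features):
    """Mean absolute change in P(class 1) after replacing features by their mean."""
    X_masked = X.copy()
    X_masked[:, features] = X[:, features].mean(axis=0)
    original = model.predict_proba(X)[:, 1]
    masked = model.predict_proba(X_masked)[:, 1]
    return float(np.abs(original - masked).mean())


# A faithful ranking should hurt more when its top features are masked.
top_shift = prob_shift_after_masking(ranking[:2])
bottom_shift = prob_shift_after_masking(ranking[-2:])
print(f"top-2 masked: {top_shift:.3f}, bottom-2 masked: {bottom_shift:.3f}")
```

If masking the top-ranked features changes the predictions no more than masking the bottom-ranked ones, the explanation is unlikely to reflect what the model actually relies on.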
TAACproject
├── datasets
│ ├── data_all_pca_21_components # The 21 components generated by PCA on the original dataset
│ ├── data_all_preprocessed.csv # Original dataset, preprocessed
│ ├── data_all.csv # Original dataset
│ ├── data_preprocessed.csv # Original dataset without "Enrolled", preprocessed
│ └── data.csv # Original dataset without "Enrolled"
├── pickle_jar
│ └── mlp_model.pkl # Saved MLP model
├── ProjectStatment # The project statement
├── Report_TAAC__DS2_G3.pdf # The project report
├── task_1_1_all_data_analysis.ipynb # Task 1 with all data ("Enrolled", "Graduated", "Dropout")
├── task_1_2_data_analysis.ipynb # Task 1 without "Enrolled"
├── task_2_in_modelling.ipynb # Task 2 with a Decision Tree as the glass-box model
├── task_3_and_4_mlp.ipynb # Task 3 and 4 with MLP
└── task_3_and_4_xgboost.ipynb # Task 3 and 4 with XGBoost
This repository contains all the code and analyses developed throughout the project.
This course is part of the first semester of the first year of the Master's Degree in Artificial Intelligence at FEUP and FCUP in the academic year 2025/2026. You can find more information about this course at the following link: