# GlassBoxDriver: Post-Hoc XAI for Autonomous Vehicle Actions (Explainable End-to-End Autonomous Driving)
GlassBoxDriver is a post-hoc XAI system for autonomous vehicle decision making, built on EfficientNet-B0 and nuScenes Mini. It analyses driving footage frame-by-frame, predicts actions in real-time, and explains each decision via Grad-CAM heatmaps and a steering arc overlay showing predicted direction and turn degree.
GlassBoxDriver features a closed-loop human feedback pipeline: uncertain frames are auto-flagged, a human corrects them via the Streamlit UI, and the model retrains on those corrections, continuously improving from real-world errors.
The full pipeline:
- Audit: AI analyses every frame of a driving video or image set
- Flag: Low-confidence frames are automatically flagged
- Review: Human corrects AI mistakes through the UI
- Retrain: Model learns from human corrections
- Repeat: System continuously improves with use
| Page | Description |
|---|---|
| Home | Project overview and system architecture |
| Run Audit | Upload dashcam video/Images, get annotated output with Grad-CAM steering arc |
| Review Flags | View flagged uncertain frames and correct AI mistakes |
| Feedback Retrain | Merge human corrections into training data and retrain |
| Session Logs | View past audit sessions, action distribution charts, trust over time |
Real-time predictions are overlaid on game footage with a steering arc and probability bars (a known bug currently flips the left/right labels on the overlay):
5 Predicted Actions:
| Action | Trigger Condition |
|---|---|
| Go Straight | Default (no strong signal) |
| Brake | brake_switch active OR brake > 5 |
| Accelerate | throttle > 200 AND speed > 5 |
| Turn Left | steering > 0.3 rad |
| Turn Right | steering < -0.3 rad |
```bash
git clone https://github.com/Ravevx/GlassBoxDriver-Post-Hoc-XAI-for-Autonomous-Vehicle-Actions.git
cd GlassBoxDriver-Post-Hoc-XAI-for-Autonomous-Vehicle-Actions

conda create -n agent-local python=3.10
conda activate agent-local

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install streamlit opencv-python timm matplotlib tqdm mss pillow
```

- Go to https://www.nuscenes.org/nuscenes and open the Download page
- Download nuScenes Mini (approx. 4 GB), file: `v1.0-mini.tgz`
- On the same download page, download the CAN bus expansion for mini, file: `can_bus.zip`
Extract both downloads and arrange exactly like this:

```
data/
└── nuscenes/
    ├── can_bus/
    │   ├── scene-0061_steeranglefeedback.json
    │   ├── scene-0061_vehicle_monitor.json
    │   ├── scene-0553_steeranglefeedback.json
    │   └── ... (all scene JSON files)
    ├── sweeps/
    │   ├── CAM_FRONT/
    │   │   └── *.jpg
    │   ├── CAM_FRONT_LEFT/
    │   │   └── *.jpg
    │   └── CAM_FRONT_RIGHT/
    │       └── *.jpg
    └── samples/
        ├── CAM_FRONT/
        │   └── *.jpg
        ├── CAM_FRONT_LEFT/
        │   └── *.jpg
        └── CAM_FRONT_RIGHT/
            └── *.jpg
```
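A quick way to confirm the layout is in place before running the extractor (a hypothetical helper, not part of the repo; the default path assumes you run it from the repo root):

```python
from pathlib import Path

# Subfolders the extractor expects under the nuScenes root
EXPECTED = [
    "can_bus",
    "sweeps/CAM_FRONT", "sweeps/CAM_FRONT_LEFT", "sweeps/CAM_FRONT_RIGHT",
    "samples/CAM_FRONT", "samples/CAM_FRONT_LEFT", "samples/CAM_FRONT_RIGHT",
]

def missing_dirs(root="data/nuscenes"):
    """Return the expected subfolders that are missing under `root`."""
    root = Path(root)
    return [p for p in EXPECTED if not (root / p).is_dir()]

if __name__ == "__main__":
    missing = missing_dirs()
    print("Missing:", missing) if missing else print("Layout OK")
```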
Open `dataset.py` and update line 6 to your local nuScenes path:

```python
NUSCENES_ROOT = r"C:\your\path\to\data\nuscenes"
```

Project structure:

```
GlassBoxDriver/
│
├── app.py                  # Streamlit UI - main entry point
├── analyse.py              # XAI video audit engine
├── dataset.py              # nuScenes data extractor + labeller
├── train.py                # Model training script
├── screen_ai.py            # Live screen capture inference
├── balance_dataset.py      # Undersample classes to equal size
│
├── src/
│   ├── decision.py         # EfficientNet-B0 model definition + ACTIONS
│   ├── gradcam.py          # Grad-CAM heatmap generator
│   ├── flagging.py         # Uncertain frame flagging logic
│   └── feedback.py         # Human-in-the-loop retraining
│
├── utils/
│   ├── balance_dataset.py  # Undersample all classes to equal size
│   ├── check_canbus.py     # Inspect the CAN bus data structure
│   ├── check_dataset.py    # Verify images + labels are correctly paired
│   ├── fix_cleanup.py      # Delete all augmented files (keep only originals)
│   ├── aug_data.py         # Flip images to augment
│   └── review_app.py       # Human review UI for flagged frames
│
├── data/
│   ├── nuscenes/           # Put downloaded dataset here
│   ├── train/              # Auto-generated by dataset.py
│   │   ├── Go Straight/
│   │   ├── Brake/
│   │   ├── Accelerate/
│   │   ├── Turn Left/
│   │   └── Turn Right/
│   ├── flagged/            # Auto-generated during audit
│   └── video/              # Uploaded videos via Streamlit
│
├── models/
│   └── driving_cnn.pth     # Auto-saved after training
│
├── output/                 # Annotated audit videos saved here
├── logs/                   # Session CSV logs saved here
└── README.md
```
Follow these steps in order:
```bash
python dataset.py
```

Expected output:

```
Total can_bus records: 19722
Processing sweeps/CAM_FRONT: 1938 images
Processing sweeps/CAM_FRONT_LEFT: 1940 images
Processing sweeps/CAM_FRONT_RIGHT: 1934 images
Processing samples/CAM_FRONT: 404 images
...
Total images extracted: ~6400
Class distribution:
  Go Straight : 2200
  Brake       : 1900
  Accelerate  : 650
  Turn Left   : 750
  Turn Right  : 620
```
```bash
python balance_dataset.py
```

Undersamples all classes to match the smallest class count for perfectly balanced training.

```bash
python train.py
```

```bash
python analyse.py
```

```bash
streamlit run app.py
```

```bash
python screen_ai.py
```

Captures your full screen in real-time and predicts driving actions live. Press Q to quit.
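The screen-capture loop in `screen_ai.py` roughly follows this pattern (a hedged sketch using `mss` and plain NumPy; the real script's internals may differ):

```python
import numpy as np

def preprocess(frame_bgra, size=224):
    """Convert an mss BGRA screen grab to a normalized RGB array,
    using a cheap nearest-neighbour resize (no OpenCV needed here)."""
    rgb = frame_bgra[..., 2::-1]                 # BGRA -> RGB (drop alpha)
    h, w = rgb.shape[:2]
    ys = np.arange(size) * h // size             # row indices to sample
    xs = np.arange(size) * w // size             # column indices to sample
    return rgb[ys][:, xs].astype(np.float32) / 255.0

def capture_loop():
    """Grab the primary monitor in a loop and feed frames to the model."""
    import mss
    with mss.mss() as sct:
        monitor = sct.monitors[1]                # full primary screen
        while True:
            frame = np.array(sct.grab(monitor))  # (H, W, 4) BGRA uint8
            x = preprocess(frame)                # (224, 224, 3) float32
            # ... run model inference on x and draw the overlay here ...
```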
```
Input Image (224x224x3)
          |
EfficientNet-B0 Backbone (pretrained ImageNet)
          |
     1280 features
        /      \
 action_head         steering_head
 Linear(1280->256)   Linear(1280->1)
 ReLU + Dropout(0.3)
 Linear(256->5)
      |                   |
 5 class probs       steering angle
   (softmax)         (tanh * 30 deg)
```
```
dataset.py --> Extracts frames from nuScenes cameras
               Labels each frame using CAN bus sensor timestamp matching
               Saves labelled images to data/train/<action>/

train.py   --> Loads labelled images
               Fine-tunes EfficientNet-B0 on 5 driving action classes
               Uses WeightedRandomSampler to handle class imbalance
               Saves best model weights to models/driving_cnn.pth
```
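The class-imbalance handling in `train.py` presumably amounts to an inverse-frequency sampler along these lines (a sketch; names are illustrative):

```python
import torch
from torch.utils.data import WeightedRandomSampler

def make_balanced_sampler(labels):
    """Weight each sample inversely to its class frequency, so minority
    classes (e.g. Turn Right) are drawn as often as Go Straight."""
    labels = torch.as_tensor(labels)
    class_counts = torch.bincount(labels).float()   # samples per class
    sample_weights = 1.0 / class_counts[labels]     # per-sample weight
    return WeightedRandomSampler(sample_weights,
                                 num_samples=len(labels),
                                 replacement=True)

# Pass the sampler to the DataLoader instead of shuffle=True:
# loader = DataLoader(dataset, batch_size=32, sampler=make_balanced_sampler(labels))
```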
```
analyse.py --> Loads trained model
               Reads video frame by frame
               Runs inference on each frame
               Generates Grad-CAM heatmap every 5 frames
               Computes Trust Score
               Draws steering arc overlay on frame
               Saves annotated video + CSV log
               Flags uncertain frames for human review

app.py     --> Streamlit UI that ties all above together
               Allows video upload, audit, review, and retraining
```
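The Grad-CAM generator in `src/gradcam.py` boils down to the standard recipe below (a minimal sketch assuming a single-output classifier; the repo's implementation may differ):

```python
import torch
import torch.nn.functional as F

def grad_cam(model, target_layer, image, class_idx):
    """Standard Grad-CAM: weight the target layer's activations by the
    spatially averaged gradients of the class score, then ReLU + normalize."""
    acts, grads = {}, {}
    h1 = target_layer.register_forward_hook(lambda m, i, o: acts.update(a=o))
    h2 = target_layer.register_full_backward_hook(lambda m, gi, go: grads.update(g=go[0]))
    try:
        image = image.clone().requires_grad_(True)  # ensure gradients flow back
        logits = model(image)
        model.zero_grad()
        logits[0, class_idx].backward()
        weights = grads["g"].mean(dim=(2, 3), keepdim=True)  # GAP over gradients
        cam = F.relu((weights * acts["a"]).sum(dim=1))       # weighted channel sum
        cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
        return cam[0].detach()  # (H, W) heatmap in [0, 1]
    finally:
        h1.remove()
        h2.remove()
```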
```
Trust = (Confidence + Heatmap Concentration + (1 - Entropy)) / 3

Where:
  Confidence            = max class probability
  Heatmap Concentration = max(heatmap) - mean(heatmap)
  Entropy               = -sum(p * log(p)) / log(num_classes)

Score > 0.5 --> High trust, model is confident and focused
Score < 0.5 --> Low trust, frame flagged for human review
```
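In code, the trust score works out to roughly this (a sketch; the repo's exact implementation lives in `analyse.py`):

```python
import numpy as np

def trust_score(probs, heatmap, num_classes=5):
    """Trust = (confidence + heatmap concentration + (1 - normalized entropy)) / 3."""
    probs = np.asarray(probs, dtype=float)
    confidence = probs.max()
    # How "peaked" the Grad-CAM heatmap is: focused attention scores higher
    heatmap = np.asarray(heatmap, dtype=float)
    concentration = heatmap.max() - heatmap.mean()
    # Shannon entropy, normalized to [0, 1] by dividing by log(num_classes)
    entropy = -np.sum(probs * np.log(probs + 1e-12)) / np.log(num_classes)
    return (confidence + concentration + (1.0 - entropy)) / 3.0
```

A confident prediction with a focused heatmap scores above the 0.5 flagging threshold; a uniform distribution over a flat heatmap scores well below it.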
```
Image filename contains timestamp:

n015-2018-07-24__CAM_FRONT__1532402927612460.jpg
                            ^
                            Extract this number

Find nearest CAN bus record within 2 seconds of timestamp

CAN bus record contains:
  steering_rad, brake, brake_switch, throttle, speed

Apply get_label() rules:
  brake_switch in (2,3) OR brake > 5  --> Brake
  steering > 0.3 rad                  --> Turn Left
  steering < -0.3 rad                 --> Turn Right
  throttle > 200 AND speed > 5        --> Accelerate
  else                                --> Go Straight
```
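The rules above can be written directly as a rule cascade, checked in priority order (a sketch of `get_label()`; the record field names are assumed from the list above):

```python
def get_label(rec):
    """Map a CAN bus record (dict with steering_rad, brake, brake_switch,
    throttle, speed) to one of the 5 action labels, in priority order."""
    if rec["brake_switch"] in (2, 3) or rec["brake"] > 5:
        return "Brake"
    if rec["steering_rad"] > 0.3:        # positive steering angle = left turn
        return "Turn Left"
    if rec["steering_rad"] < -0.3:
        return "Turn Right"
    if rec["throttle"] > 200 and rec["speed"] > 5:
        return "Accelerate"
    return "Go Straight"

def frame_timestamp(filename):
    """Pull the microsecond timestamp out of a nuScenes image filename."""
    return int(filename.rsplit("__", 1)[-1].split(".")[0])

print(frame_timestamp("n015-2018-07-24__CAM_FRONT__1532402927612460.jpg"))
# 1532402927612460
```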
GlassBoxDriver is an ongoing project and still has known limitations we are actively trying to solve. If you are interested in improving this system, collaborating, or building on top of it — contributions are very welcome!
| Problem | Current State | What's Needed |
|---|---|---|
| Model predicts Go Straight too often | Domain shift from nuScenes to real/game footage | More diverse training data |
| Only 10 nuScenes Mini scenes used | ~6400 images, too small for generalization | Full nuScenes dataset (1000 scenes) |
| Labels come from CAN bus rules | Rigid threshold-based labelling | Learned or smoother label generation |
| Left/Right arc overlay is flipped | Known visual bug in steering arc | Fix in draw_steering_overlay() |
| No temporal context | Each frame predicted independently | Add LSTM or temporal attention |
| Game footage not in training data | Model never saw rendered graphics | Add GTA VC / BeamNG / sim data |
- Larger Dataset: Integrate full nuScenes (1000 scenes) or BDD100K for broader driving coverage and better generalization
- Model Architecture: Experiment with temporal models (LSTM, Transformer) that use sequences of frames instead of single frames for richer context
- Better Labelling: Replace hard threshold rules in `get_label()` with smoother or learned labels from steering angle regression
- Sim-to-Real Transfer: Add synthetic driving data from simulators like CARLA, BeamNG, or GTA V to improve game footage predictions
- Grad-CAM Improvements: Replace Grad-CAM with GradCAM++, EigenCAM or SHAP for more accurate and stable heatmaps
- Active Learning: Smarter flagging strategy that selects the most informative uncertain frames for human review
- Steering Arc Bug Fix: Left and Right labels on the overlay arc are currently inverted; needs a sign correction in `draw_steering_overlay()`
- Also open to any other contributions
If you want to collaborate, raise an issue or start a discussion on GitHub. ⭐ Star the repo if you find it useful: it helps others discover the project!
Built with curiosity and a lot of debugging. —@Ravevx
- Dataset: nuScenes Mini by Motional
- Model Backbone: EfficientNet-B0 via timm
- XAI Method: Grad-CAM