# GlassBoxDriver: Post-Hoc XAI for Autonomous Vehicle Actions (Explainable End-to-End Autonomous Driving)
GlassBoxDriver is a post-hoc XAI system for autonomous vehicle decision making, built on EfficientNet-B0 and nuScenes Mini. It analyses driving footage frame-by-frame, predicts actions in real-time, and explains each decision via Grad-CAM heatmaps and a steering arc overlay showing predicted direction and turn degree.
GlassBoxDriver features a closed-loop human feedback pipeline: uncertain frames are auto-flagged, a human corrects them via the Streamlit UI, and the model retrains on those corrections, continuously improving from real-world errors.
The full pipeline:
- Audit: AI analyses every frame of a driving video or image set
- Flag: Low-confidence frames are automatically flagged
- Review: Human corrects AI mistakes through the UI
- Retrain: Model learns from human corrections
- Repeat: System continuously improves with use
| Page | Description |
|---|---|
| Home | Project overview and system architecture |
| Run Audit | Upload dashcam video/Images, get annotated output with Grad-CAM steering arc |
| Review Flags | View flagged uncertain frames and correct AI mistakes |
| Feedback Retrain | Merge human corrections into training data and retrain |
| Session Logs | View past audit sessions, action distribution charts, trust over time |
Real-time predictions are overlaid on game footage with a steering arc and probability bars (a known bug currently flips the left/right labels on the overlay):
5 Predicted Actions:
| Action | Trigger Condition |
|---|---|
| Go Straight | Default (no strong signal) |
| Brake | brake_switch active OR brake > 5 |
| Accelerate | throttle > 200 AND speed > 5 |
| Turn Left | steering > 0.3 rad |
| Turn Right | steering < -0.3 rad |
```bash
git clone https://github.com/Ravevx/GlassBoxDriver-Post-Hoc-XAI-for-Autonomous-Vehicle-Actions.git
cd GlassBoxDriver-Post-Hoc-XAI-for-Autonomous-Vehicle-Actions

conda create -n agent-local python=3.10
conda activate agent-local

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install streamlit opencv-python timm matplotlib tqdm mss pillow
```

- Go to https://www.nuscenes.org/nuscenes and open the Download page
- Download nuScenes Mini (approx. 4 GB), file: `v1.0-mini.tgz`
- On the same download page, download the CAN bus expansion for mini, file: `can_bus.zip`
Extract both downloads and arrange exactly like this:

```
data/
└── nuscenes/
    ├── can_bus/
    │   ├── scene-0061_steeranglefeedback.json
    │   ├── scene-0061_vehicle_monitor.json
    │   ├── scene-0553_steeranglefeedback.json
    │   └── ... (all scene JSON files)
    ├── sweeps/
    │   ├── CAM_FRONT/
    │   │   └── *.jpg
    │   ├── CAM_FRONT_LEFT/
    │   │   └── *.jpg
    │   └── CAM_FRONT_RIGHT/
    │       └── *.jpg
    └── samples/
        ├── CAM_FRONT/
        │   └── *.jpg
        ├── CAM_FRONT_LEFT/
        │   └── *.jpg
        └── CAM_FRONT_RIGHT/
            └── *.jpg
```
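A quick way to confirm the layout is in place before running the extractor (a hypothetical helper, not part of the repo; the default path assumes you run it from the repo root):

```python
from pathlib import Path

# Subfolders the extractor expects under the nuScenes root
EXPECTED = [
    "can_bus",
    "sweeps/CAM_FRONT", "sweeps/CAM_FRONT_LEFT", "sweeps/CAM_FRONT_RIGHT",
    "samples/CAM_FRONT", "samples/CAM_FRONT_LEFT", "samples/CAM_FRONT_RIGHT",
]

def missing_dirs(root="data/nuscenes"):
    """Return the expected subfolders that are missing under `root`."""
    root = Path(root)
    return [p for p in EXPECTED if not (root / p).is_dir()]

if __name__ == "__main__":
    missing = missing_dirs()
    print("Missing:", missing) if missing else print("Layout OK")
```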
Open `dataset.py` and update line 6 to your local nuScenes path:

```python
NUSCENES_ROOT = r"C:\your\path\to\data\nuscenes"
```

Project structure:

```
GlassBoxDriver/
│
├── app.py                  # Streamlit UI - main entry point
├── analyse.py              # XAI video audit engine
├── dataset.py              # nuScenes data extractor + labeller
├── train.py                # Model training script
├── screen_ai.py            # Live screen capture inference
├── balance_dataset.py      # Undersample classes to equal size
│
├── src/
│   ├── decision.py         # EfficientNet-B0 model definition + ACTIONS
│   ├── gradcam.py          # Grad-CAM heatmap generator
│   ├── flagging.py         # Uncertain frame flagging logic
│   └── feedback.py         # Human-in-the-loop retraining
│
├── utils/
│   ├── balance_dataset.py  # Undersample all classes to equal size
│   ├── check_canbus.py     # Inspect the CAN bus data structure
│   ├── check_dataset.py    # Verify images + labels are correctly paired
│   ├── fix_cleanup.py      # Delete all augmented files (keep only originals)
│   ├── aug_data.py         # Flip images to augment
│   └── review_app.py       # Human review UI for flagged frames
│
├── data/
│   ├── nuscenes/           # Put downloaded dataset here
│   ├── train/              # Auto-generated by dataset.py
│   │   ├── Go Straight/
│   │   ├── Brake/
│   │   ├── Accelerate/
│   │   ├── Turn Left/
│   │   └── Turn Right/
│   ├── flagged/            # Auto-generated during audit
│   └── video/              # Uploaded videos via Streamlit
│
├── models/
│   └── driving_cnn.pth     # Auto-saved after training
│
├── output/                 # Annotated audit videos saved here
├── logs/                   # Session CSV logs saved here
└── README.md
```
Follow these steps in order:
```bash
python dataset.py
```

Expected output:

```
Total can_bus records: 19722
Processing sweeps/CAM_FRONT: 1938 images
Processing sweeps/CAM_FRONT_LEFT: 1940 images
Processing sweeps/CAM_FRONT_RIGHT: 1934 images
Processing samples/CAM_FRONT: 404 images
...
Total images extracted: ~6400
Class distribution:
  Go Straight : 2200
  Brake       : 1900
  Accelerate  : 650
  Turn Left   : 750
  Turn Right  : 620
```
```bash
python balance_dataset.py
```

Undersamples all classes to match the smallest class count for perfectly balanced training.

```bash
python train.py
```

```bash
python analyse.py
```

```bash
streamlit run app.py
```

```bash
python screen_ai.py
```

Captures your full screen in real-time and predicts driving actions live. Press Q to quit.
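The screen-capture loop in `screen_ai.py` roughly follows this pattern (a hedged sketch using `mss` and plain NumPy; the real script's internals may differ):

```python
import numpy as np

def preprocess(frame_bgra, size=224):
    """Convert an mss BGRA screen grab to a normalized RGB array,
    using a cheap nearest-neighbour resize (no OpenCV needed here)."""
    rgb = frame_bgra[..., 2::-1]                 # BGRA -> RGB (drop alpha)
    h, w = rgb.shape[:2]
    ys = np.arange(size) * h // size             # row indices to sample
    xs = np.arange(size) * w // size             # column indices to sample
    return rgb[ys][:, xs].astype(np.float32) / 255.0

def capture_loop():
    """Grab the primary monitor in a loop and feed frames to the model."""
    import mss
    with mss.mss() as sct:
        monitor = sct.monitors[1]                # full primary screen
        while True:
            frame = np.array(sct.grab(monitor))  # (H, W, 4) BGRA uint8
            x = preprocess(frame)                # (224, 224, 3) float32
            # ... run model inference on x and draw the overlay here ...
```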
```
Input Image (224x224x3)
          |
EfficientNet-B0 Backbone (pretrained ImageNet)
          |
     1280 features
        /      \
 action_head         steering_head
 Linear(1280->256)   Linear(1280->1)
 ReLU + Dropout(0.3)
 Linear(256->5)
      |                   |
 5 class probs       steering angle
   (softmax)         (tanh * 30 deg)
```
```
dataset.py --> Extracts frames from nuScenes cameras
               Labels each frame using CAN bus sensor timestamp matching
               Saves labelled images to data/train/<action>/

train.py   --> Loads labelled images
               Fine-tunes EfficientNet-B0 on 5 driving action classes
               Uses WeightedRandomSampler to handle class imbalance
               Saves best model weights to models/driving_cnn.pth
```
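The class-imbalance handling in `train.py` presumably amounts to an inverse-frequency sampler along these lines (a sketch; names are illustrative):

```python
import torch
from torch.utils.data import WeightedRandomSampler

def make_balanced_sampler(labels):
    """Weight each sample inversely to its class frequency, so minority
    classes (e.g. Turn Right) are drawn as often as Go Straight."""
    labels = torch.as_tensor(labels)
    class_counts = torch.bincount(labels).float()   # samples per class
    sample_weights = 1.0 / class_counts[labels]     # per-sample weight
    return WeightedRandomSampler(sample_weights,
                                 num_samples=len(labels),
                                 replacement=True)

# Pass the sampler to the DataLoader instead of shuffle=True:
# loader = DataLoader(dataset, batch_size=32, sampler=make_balanced_sampler(labels))
```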
```
analyse.py --> Loads trained model
               Reads video frame by frame
               Runs inference on each frame
               Generates Grad-CAM heatmap every 5 frames
               Computes Trust Score
               Draws steering arc overlay on frame
               Saves annotated video + CSV log
               Flags uncertain frames for human review

app.py     --> Streamlit UI that ties all above together
               Allows video upload, audit, review, and retraining
```
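The Grad-CAM generator in `src/gradcam.py` boils down to the standard recipe below (a minimal sketch assuming a single-output classifier; the repo's implementation may differ):

```python
import torch
import torch.nn.functional as F

def grad_cam(model, target_layer, image, class_idx):
    """Standard Grad-CAM: weight the target layer's activations by the
    spatially averaged gradients of the class score, then ReLU + normalize."""
    acts, grads = {}, {}
    h1 = target_layer.register_forward_hook(lambda m, i, o: acts.update(a=o))
    h2 = target_layer.register_full_backward_hook(lambda m, gi, go: grads.update(g=go[0]))
    try:
        image = image.clone().requires_grad_(True)  # ensure gradients flow back
        logits = model(image)
        model.zero_grad()
        logits[0, class_idx].backward()
        weights = grads["g"].mean(dim=(2, 3), keepdim=True)  # GAP over gradients
        cam = F.relu((weights * acts["a"]).sum(dim=1))       # weighted channel sum
        cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
        return cam[0].detach()  # (H, W) heatmap in [0, 1]
    finally:
        h1.remove()
        h2.remove()
```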
```
Trust = (Confidence + Heatmap Concentration + (1 - Entropy)) / 3

Where:
  Confidence            = max class probability
  Heatmap Concentration = max(heatmap) - mean(heatmap)
  Entropy               = -sum(p * log(p)) / log(num_classes)

Score > 0.5 --> High trust, model is confident and focused
Score < 0.5 --> Low trust, frame flagged for human review
```
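In code, the trust score works out to roughly this (a sketch; the repo's exact implementation lives in `analyse.py`):

```python
import numpy as np

def trust_score(probs, heatmap, num_classes=5):
    """Trust = (confidence + heatmap concentration + (1 - normalized entropy)) / 3."""
    probs = np.asarray(probs, dtype=float)
    confidence = probs.max()
    # How "peaked" the Grad-CAM heatmap is: focused attention scores higher
    heatmap = np.asarray(heatmap, dtype=float)
    concentration = heatmap.max() - heatmap.mean()
    # Shannon entropy, normalized to [0, 1] by dividing by log(num_classes)
    entropy = -np.sum(probs * np.log(probs + 1e-12)) / np.log(num_classes)
    return (confidence + concentration + (1.0 - entropy)) / 3.0
```

A confident prediction with a focused heatmap scores above the 0.5 flagging threshold; a uniform distribution over a flat heatmap scores well below it.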
```
Image filename contains timestamp:

n015-2018-07-24__CAM_FRONT__1532402927612460.jpg
                            ^
                            Extract this number

Find nearest CAN bus record within 2 seconds of timestamp

CAN bus record contains:
  steering_rad, brake, brake_switch, throttle, speed

Apply get_label() rules:
  brake_switch in (2,3) OR brake > 5  --> Brake
  steering > 0.3 rad                  --> Turn Left
  steering < -0.3 rad                 --> Turn Right
  throttle > 200 AND speed > 5        --> Accelerate
  else                                --> Go Straight
```
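The rules above can be written directly as a rule cascade, checked in priority order (a sketch of `get_label()`; the record field names are assumed from the list above):

```python
def get_label(rec):
    """Map a CAN bus record (dict with steering_rad, brake, brake_switch,
    throttle, speed) to one of the 5 action labels, in priority order."""
    if rec["brake_switch"] in (2, 3) or rec["brake"] > 5:
        return "Brake"
    if rec["steering_rad"] > 0.3:        # positive steering angle = left turn
        return "Turn Left"
    if rec["steering_rad"] < -0.3:
        return "Turn Right"
    if rec["throttle"] > 200 and rec["speed"] > 5:
        return "Accelerate"
    return "Go Straight"

def frame_timestamp(filename):
    """Pull the microsecond timestamp out of a nuScenes image filename."""
    return int(filename.rsplit("__", 1)[-1].split(".")[0])

print(frame_timestamp("n015-2018-07-24__CAM_FRONT__1532402927612460.jpg"))
# 1532402927612460
```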
GlassBoxDriver is an ongoing project and still has known limitations we are actively trying to solve. If you are interested in improving this system, collaborating, or building on top of it — contributions are very welcome!
| Problem | Current State | What's Needed |
|---|---|---|
| Model predicts Go Straight too often | Domain shift from nuScenes to real/game footage | More diverse training data |
| Only 10 nuScenes Mini scenes used | ~6400 images, too small for generalization | Full nuScenes dataset (1000 scenes) |
| Labels come from CAN bus rules | Rigid threshold-based labelling | Learned or smoother label generation |
| Left/Right arc overlay is flipped | Known visual bug in steering arc | Fix in draw_steering_overlay() |
| No temporal context | Each frame predicted independently | Add LSTM or temporal attention |
| Game footage not in training data | Model never saw rendered graphics | Add GTA VC / BeamNG / sim data |
- Larger Dataset: Integrate full nuScenes (1000 scenes) or BDD100K for broader driving coverage and better generalization
- Model Architecture: Experiment with temporal models (LSTM, Transformer) that use sequences of frames instead of single frames for richer context
- Better Labelling: Replace hard threshold rules in `get_label()` with smoother or learned labels from steering angle regression
- Sim-to-Real Transfer: Add synthetic driving data from simulators like CARLA, BeamNG, or GTA V to improve game footage predictions
- Grad-CAM Improvements: Replace Grad-CAM with GradCAM++, EigenCAM or SHAP for more accurate and stable heatmaps
- Active Learning: Smarter flagging strategy that selects the most informative uncertain frames for human review
- Steering Arc Bug Fix: Left and Right labels on the overlay arc are currently inverted; needs a sign correction in `draw_steering_overlay()`
- Also open to any other contributions
If you want to collaborate, raise an issue or start a discussion on GitHub. ⭐ Star the repo if you find it useful: it helps others discover the project!
Built with curiosity and a lot of debugging. —@Ravevx
- Dataset: nuScenes Mini by Motional
- Model Backbone: EfficientNet-B0 via timm
- XAI Method: Grad-CAM