ULPS: Uncertainty-Aware LLM-Guided Policy Shaping (MiniGrid UnlockPickup)

This repository contains the research code and experiment artifacts used in the paper:

"Uncertainty-Aware LLM-Guided Policy Shaping for Sparse-Reward Reinforcement Learning"

The project implements ULPS, a framework that combines:

A fine-tuned BERT-based LLM (action predictor)
Monte Carlo Dropout for uncertainty estimation
Entropy-based blending between the LLM policy and PPO policy
Baselines including Unguided PPO, Linear Decay, Q-Learning, and DQN
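
The uncertainty and blending components listed above can be illustrated with a minimal, standard-library sketch. It is not the code in MainCode/. The toy linear "predictor", the function names (mc_dropout_probs, normalized_entropy, blend), the dropout rate, and the specific blending rule (weight = 1 minus normalized entropy) are all assumptions made for illustration; the paper's exact formulation may differ.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mc_dropout_probs(features, weights, n_passes=20, drop_p=0.1, rng=None):
    """Average action probabilities over several stochastic forward passes.

    A toy linear layer stands in for the fine-tuned predictor (assumption).
    Dropout stays active at inference time, which is the core idea of
    Monte Carlo Dropout.
    """
    rng = rng or random.Random(0)
    mean = [0.0] * len(weights)
    for _ in range(n_passes):
        # Inverted dropout on the input features.
        kept = [0.0 if rng.random() < drop_p else f / (1.0 - drop_p)
                for f in features]
        logits = [sum(w * x for w, x in zip(row, kept)) for row in weights]
        probs = softmax(logits)
        mean = [m + p / n_passes for m, p in zip(mean, probs)]
    return mean


def normalized_entropy(probs):
    """Shannon entropy scaled to [0, 1]; requires at least two actions."""
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(len(probs))


def blend(llm_probs, ppo_probs):
    """Mix the two policies; a confident LLM (low entropy) gets more weight.

    The rule w = 1 - normalized entropy is a hypothetical choice.
    """
    w = min(1.0, max(0.0, 1.0 - normalized_entropy(llm_probs)))
    mixed = [w * a + (1.0 - w) * b for a, b in zip(llm_probs, ppo_probs)]
    total = sum(mixed)
    return [m / total for m in mixed], w


# Hypothetical inputs: 2 features, 3 actions.
llm = mc_dropout_probs([1.0, 0.5], [[0.2, -0.1], [0.0, 0.3], [-0.4, 0.1]])
policy, llm_weight = blend(llm, [0.2, 0.3, 0.5])
```

With this rule, a uniform LLM distribution yields a weight near 0, so the blended policy falls back to PPO, while a one-hot LLM distribution yields a weight of 1.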

Repository Structure

MainCode/: Main implementation of ULPS (Calibrated LLM + PPO) and experiment runners.
Qlearning/: Q-Learning baseline implementation and related utilities.
DQN/: DQN baseline implementation.
Ablation Study/: Additional scripts for ablation experiments.
results/: Pre-generated CSV logs, metrics, plots, and summaries used to create the figures/tables in the paper.

Notes on Reproducibility

This repository is provided as a research artifact.
The core architecture and training pipeline are included; however:

Some results (CSV files and plots) were generated during experimentation and are included under results/.
Not all result files are regenerated automatically by a single script.
Some plotting and summarization scripts assume the same output folder structure used during the experiments.
Several additional plots were generated during development, but only a subset was selected for the final paper for compactness.

Running Experiments (Main ULPS)

Example scripts:

MainCode/run_4x4_experiment.py
MainCode/run_8x8_experiment.py

These scripts generate episode-level CSV logs and summaries.

Output Files

The results/ folder contains:

episode-level logs (episode_metrics_*.csv)
entropy logs (entropy_values_*.csv)
calibration metrics (calibration_metrics_*.csv)
final plots (final_training_summary_*.png)
consolidated experiment summaries (experiment_auc_summary.csv)

These files were used to generate the tables and figures reported in the paper.
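
Because the column layout of these logs is not documented here, a cautious first step is to list the matching files and inspect their headers before writing any analysis code. The sketch below does this for the episode-level logs. The helper name summarize_logs and the default results directory are assumptions; only the file name pattern comes from the list above.

```python
import csv
from pathlib import Path


def summarize_logs(results_dir, pattern="episode_metrics_*.csv"):
    """Return {file name: (column names, row count)} for each matching CSV.

    No column names are assumed; the header is read from each file.
    """
    summary = {}
    for path in sorted(Path(results_dir).glob(pattern)):
        with path.open(newline="") as handle:
            reader = csv.DictReader(handle)
            n_rows = sum(1 for _ in reader)
            # fieldnames is None for an empty file, so fall back to [].
            summary[path.name] = (reader.fieldnames or [], n_rows)
    return summary
```

For example, summarize_logs("results") would report the header and number of rows for each episode log, and passing pattern="entropy_values_*.csv" would do the same for the entropy logs. Adjust the directory name to match your local checkout.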

Disclaimer

This codebase reflects the structure used during research and experimentation.
Some parts may require minor path adjustments depending on your local environment.
