What Does Reinforcement Learning Really Change in Visual Reasoning?
A Frankenstein-style analysis framework for understanding where reinforcement learning (RL) alters vision-language models (VLMs) and what changes across the training pipeline (Base → SFT/IN → RL).
- 📉 No monotonic improvement. Despite end-to-end benchmark gains, vision ability, reasoning ability, and vision-to-reasoning alignment do not improve monotonically from Base → IN → RL.
- 👁️ Consistent attention shift. Across diverse training recipes, RL consistently increases attention from reasoning tokens to visual tokens, primarily in mid-to-late transformer layers.
- 🧬 Structured parameter updates. RL refinements concentrate in mid-to-late layers, are transferable across recipes, and contribute primarily to improved vision-to-reasoning alignment and reasoning performance.
The framework consists of three analysis components and one validation step:
| Component | Goal | Method | Script |
|---|---|---|---|
| 🔬 Functional Localization | Where do vision/reasoning abilities reside? | Causal probing across transformer depth | localization_vision.py, localization_reasoning.py |
| 📊 Update Characterization | What does RL change structurally? | SVD spectral analysis & Frobenius norm comparison | characterisation_diversity.py, characterisation_energy.py |
| 🧩 Transferability Test | Are RL updates transferable? | Block-wise model merging | transferability_test.py |
| ❄️ Necessity Validation | Are mid-late layer updates necessary? | Block-wise parameter freezing during RL | OpenMMReasoner |
All experiments use Qwen/Qwen2.5-VL-7B-Instruct as the base model with the following IN+RL pairs:
| Recipe | IN (SFT) Model | RL Model |
|---|---|---|
| OpenMMReasoner | OpenMMReasoner/OpenMMReasoner-ColdStart | OpenMMReasoner/OpenMMReasoner-RL |
| Revisual-R1 | csfufu/Revisual-R1-Coldstart | csfufu/Revisual-R1-final |
| MMR1 | MMR1/MMR1-7B-SFT | MMR1/MMR1-7B-RL |
```bash
conda create -n frankenstein python=3.10
conda activate frankenstein
pip install -r requirements.txt
```

For experiments that use GPT-based grading (transferability test), set your OpenAI API key:

```bash
export OPENAI_API_KEY="your-api-key"
```

All experiments are launched from the project root via shell scripts in `runs/`.
Vision localization measures how swapping visual tokens at each layer affects the model's output:

```bash
bash runs/localization_vision.sh
```

The script evaluates four vision tasks (counting, OCR, grounding, recognition) using a custom dataset under `dataset/`. At each target layer, visual tokens from a source image are swapped into the KV cache, and the resulting output change rate is measured.
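The swap itself reduces to patching the cached entries at the visual-token positions of one layer with those computed from a different source image. The sketch below illustrates the idea on plain arrays; the function name, shapes, and `visual_slice` positions are illustrative, not the repo's actual `vision_token_swapper.py` API.

```python
import numpy as np

def swap_visual_tokens(kv_target, kv_source, visual_slice):
    """Replace the target sequence's cached visual-token entries at one
    layer with the source image's entries.

    kv_target, kv_source : (seq_len, d) arrays standing in for a single
    layer's key (or value) cache; visual_slice marks visual-token positions.
    """
    patched = kv_target.copy()
    patched[visual_slice] = kv_source[visual_slice]
    return patched

rng = np.random.default_rng(1)
kv_a = rng.normal(size=(32, 8))   # target sequence's cache at layer L
kv_b = rng.normal(size=(32, 8))   # source image's cache at layer L
vis = slice(4, 20)                # hypothetical visual-token positions
patched = swap_visual_tokens(kv_a, kv_b, vis)
```

In the real pipeline the same patch would be applied to the `past_key_values` of the chosen layer before decoding continues, so that downstream layers attend to the swapped visual evidence.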
Reasoning localization measures accuracy degradation when zero-ablating each layer on math benchmarks (GSM8K, MATH-500):

```bash
bash runs/localization_reasoning.sh
```

For each layer i, the attention + MLP output is set to zero (keeping the skip connection), isolating that layer's contribution to reasoning.
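Zero-ablation with an intact skip connection turns the target layer into the identity on the residual stream. A minimal sketch with a forward hook on a toy residual block (the `Block` module and sizes are illustrative stand-ins, not the repo's code):

```python
import torch
from torch import nn

class Block(nn.Module):
    """Toy residual block standing in for one transformer layer:
    out = x + f(x), where f plays the role of attention + MLP."""
    def __init__(self, d):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(d, d), nn.GELU(), nn.Linear(d, d))

    def forward(self, x):
        return x + self.f(x)

def zero_ablate(block):
    """Forward hook that zeroes the block's attention+MLP contribution
    while keeping the skip connection, making the block an identity map."""
    def hook(module, inputs, output):
        return inputs[0]  # pass the residual stream through unchanged
    return block.register_forward_hook(hook)

torch.manual_seed(0)
model = nn.Sequential(*[Block(16) for _ in range(4)])
x = torch.randn(2, 16)

with torch.no_grad():
    y_full = model(x)
    handle = zero_ablate(model[2])  # ablate layer 2 only
    y_ablated = model(x)
    handle.remove()
```

The accuracy drop between the full and ablated forward passes, measured per layer, is what localizes reasoning across depth.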
SVD analysis compares the singular value spectra of weight deltas between the IN and RL phases:

```bash
bash runs/characterisation_diversity.sh
```

Energy analysis compares Frobenius norms of weight deltas layer by layer:

```bash
bash runs/characterisation_energy.sh
```

Both scripts analyze all layers across the vision encoder, projector, and LLM backbone using a unified layer indexing scheme.
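The core computation for both analyses can be sketched for a single weight matrix: form the update delta, take its singular value spectrum, and measure its Frobenius-norm energy. The `delta_spectrum` helper and the 99%-energy effective-rank cutoff below are illustrative assumptions, not the scripts' exact definitions.

```python
import numpy as np

def delta_spectrum(w_in, w_rl):
    """Characterize the RL update for one layer's weight matrix.

    Returns the singular value spectrum of delta = W_RL - W_IN, its
    Frobenius norm (update 'energy'), and an effective rank: the number
    of singular values carrying 99% of the squared spectrum.
    """
    delta = w_rl - w_in
    s = np.linalg.svd(delta, compute_uv=False)  # descending singular values
    energy = np.linalg.norm(delta, "fro")
    cum = np.cumsum(s**2) / np.sum(s**2)
    eff_rank = int(np.searchsorted(cum, 0.99)) + 1
    return s, energy, eff_rank

# Toy example: a low-rank perturbation mimics a structured RL update.
rng = np.random.default_rng(0)
w_in = rng.normal(size=(64, 64))
w_rl = w_in + 0.01 * rng.normal(size=(8, 64)).T @ rng.normal(size=(8, 64))
s, energy, eff_rank = delta_spectrum(w_in, w_rl)
```

Running this per layer and comparing spectra and energies across depth is what surfaces the concentration of RL refinements in mid-to-late layers.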
Evaluates whether transplanting RL-refined layer blocks into IN models preserves functional improvements, testing multiple block-wise splicing configurations (early/mid/late thirds):

```bash
bash runs/run_eval_v1_openmmreasoner.sh
bash runs/run_eval_v1_revisual.sh
bash runs/run_eval_v1_mmr1.sh
```

Each run evaluates the original SFT/RL models alongside six Frankenstein variants on perception (General VQA), alignment (Math VQA), and reasoning (MATH-500) tasks.
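Block-wise splicing amounts to building a merged state dict: IN weights everywhere, except a contiguous range of LLM transformer blocks copied from the RL checkpoint. A minimal sketch, assuming Qwen2.5-VL-style parameter keys of the form `model.layers.{i}.` (the function name and toy state dicts are illustrative, not `transferability_test.py` itself):

```python
import re

def frankenstein_merge(sd_in, sd_rl, layers):
    """Return a merged state dict: IN weights everywhere, except the
    transformer blocks whose index falls in `layers`, taken from RL."""
    merged = dict(sd_in)
    pat = re.compile(r"model\.layers\.(\d+)\.")
    for key, tensor in sd_rl.items():
        m = pat.match(key)
        if m and int(m.group(1)) in layers:
            merged[key] = tensor
    return merged

# Toy 6-layer state dicts standing in for real checkpoints.
sd_in = {f"model.layers.{i}.mlp.w": f"IN_{i}" for i in range(6)}
sd_rl = {f"model.layers.{i}.mlp.w": f"RL_{i}" for i in range(6)}
merged = frankenstein_merge(sd_in, sd_rl, range(4, 6))  # transplant late third
```

Evaluating each such variant on perception, alignment, and reasoning tasks is what tests whether the RL refinements are transferable as self-contained blocks.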
```
Frankenstein-Analysis/
├── scripts/                          # Core analysis scripts
│   ├── localization_vision.py        # Vision localization via visual token swapping
│   ├── localization_reasoning.py     # Reasoning localization via zero ablation
│   ├── characterisation_diversity.py # Layer-wise SVD spectral analysis
│   ├── characterisation_energy.py    # Layer-wise Frobenius norm analysis
│   └── transferability_test.py       # Block-wise model merging evaluation
├── utils/
│   └── vision_token_swapper.py       # Visual token swapping utility (KV cache manipulation)
├── runs/                             # Shell scripts to launch experiments
├── dataset/                          # Vision functionality dataset (counting, OCR, grounding, recognition)
├── results/                          # Output directory (gitignored)
├── requirements.txt
├── LICENSE
└── README.md
```
All results are saved under `results/`, organized by experiment type and model:

```
results/
├── localization_vision/{Model_Name}/
├── localization_reasoning/{Model_Name}/{dataset}/
├── characterisation_diversity/{Recipe}/
├── characterisation_energy/{Recipe}/
└── transferability_test/{Recipe}/
```
```bibtex
@misc{li2026doesrlimprovevisual,
      title={What does RL improve for Visual Reasoning? A Frankenstein-Style Analysis},
      author={Xirui Li and Ming Li and Tianyi Zhou},
      year={2026},
      eprint={2602.12395},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2602.12395},
}
```

This project is licensed under the MIT License.
