This repo contains the code for my master's thesis, which analyzed expert specialization in the OLMoE-1B-7B-0924 Mixture-of-Experts (MoE) large language model.
## Setup

- To run the scripts you need to have uv installed. If you don't want to install it, you can instead run `pip install -r requirements.txt` and run the scripts the normal way.
- Dependencies are pinned to the versions I used, but newer and older versions might also work.
- All scripts produce plots as PDFs, as found in the thesis, and some also save result data as CSV or NumPy files.
- The scripts are not optimized for memory usage, so they can consume a lot of RAM. Consider running a script for only a single layer at a time by passing `--layers` (or `-l`), which almost all scripts support.
- To see all options a script accepts, run `uv run <script_name>.py --help`.
## Router Confidence

In this section I investigated how confident the router is in its decision to route tokens to experts.

To collect router values and calculate entropy values across layers, run for example:

```
uv run router_decisive.py --num_tokens 100000 --layers 1 7 15
```

To collect router weights (router logits), router softmax probabilities, and renormalized softmax probabilities, run for example:

```
uv run router_stats.py --num_tokens 100000 --layers 1
```
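For context on what these entropy values capture: the Shannon entropy of a token's router distribution is high when routing mass is spread across many experts and near zero when one expert receives almost everything. A minimal NumPy sketch (my own illustration, not the code from `router_decisive.py`):

```python
import numpy as np

def routing_entropy(logits):
    """Shannon entropy (in bits) of a router's softmax distribution.

    A uniform router over 64 experts gives log2(64) = 6 bits (maximally
    unsure); a router that puts all mass on one expert gives ~0 bits.
    """
    logits = np.asarray(logits, dtype=np.float64)
    # Numerically stable softmax over the expert dimension.
    z = logits - logits.max(axis=-1, keepdims=True)
    p = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    # Small epsilon avoids log2(0) for experts with zero probability.
    return -(p * np.log2(p + 1e-12)).sum(axis=-1)

# One token's router logits over 4 experts: fairly confident routing.
print(routing_entropy([4.0, 1.0, 0.5, 0.2]))
```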
## Expert Similarity

In this section I looked at the MoE expert outputs in activation space and at their weight matrices, and compared their similarities.

To compare expert output similarities in activation space via cosine similarity or CKA, run for example:

```
uv run sim.py --num_tokens 10000 --layers 1 --method CKA
```
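Linear CKA compares two sets of activations in a way that is invariant to rotation and isotropic scaling of the feature space. A small sketch of a generic linear-CKA implementation (the internals of `sim.py` may differ):

```python
import numpy as np

def linear_cka(X, Y):
    """Linear Centered Kernel Alignment between two activation matrices.

    X, Y: (num_tokens, dim) activations for the same tokens from two experts.
    Returns a similarity in [0, 1]; 1 means the representations match
    up to rotation and isotropic scaling.
    """
    X = X - X.mean(axis=0)  # center each feature over tokens
    Y = Y - Y.mean(axis=0)
    hsic = np.linalg.norm(X.T @ Y, "fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, "fro")
    norm_y = np.linalg.norm(Y.T @ Y, "fro")
    return hsic / (norm_x * norm_y)

rng = np.random.default_rng(0)
A = rng.normal(size=(1000, 64))
Q = np.linalg.qr(rng.normal(size=(64, 64)))[0]  # random orthogonal matrix
print(linear_cka(A, A))      # identical activations -> 1.0
print(linear_cka(A, A @ Q))  # rotated activations -> ~1.0
```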
To create 2D UMAP visualizations of activations from expert outputs, run for example:

```
uv run dim_reduc.py --num_tokens 10000 --layers 1 7 15
```

To compute some advanced MoE router metrics (Expert Selection Frequency, Domain Specialization, Expert Token Overlap), run for example:

```
uv run router_metrics.py --num_tokens 10000 --layers 1 7 15 --num_experts 64 --topk 8 --num_domains 22
```
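To illustrate the simplest of these metrics: Expert Selection Frequency measures how often each expert appears among a token's top-k choices. A toy sketch of my own (the exact definitions used in `router_metrics.py` may differ):

```python
import numpy as np

def expert_selection_frequency(topk_experts, num_experts):
    """Fraction of top-k routing slots assigned to each expert.

    topk_experts: (num_tokens, k) integer array of selected expert ids.
    A perfectly load-balanced router gives 1 / num_experts for every expert.
    """
    counts = np.bincount(np.asarray(topk_experts).ravel(), minlength=num_experts)
    return counts / counts.sum()

# Toy example: 3 tokens with top-2 routing over 4 experts.
selected = np.array([[0, 1], [0, 2], [1, 3]])
print(expert_selection_frequency(selected, 4))  # experts 0 and 1 picked most often
```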
## Expert Profiles and Interventions

In this section I selected individual experts, created a profile for each, and then causally investigated their effect on model outputs by modifying the routing value assigned to an expert.

To create an expert profile for a given expert as a JSON file, run:

```
uv run profiles.py --expert_id 9 --layer 1
```
To steer the model using the MoE router by boosting or ablating the routing weight (softmax value) for a specific expert, run for example:

```
uv run interv.py --prompt "Data collected by Barack Obama" --target_idx 4 --expert_id 9 --layer 1
```
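Conceptually, the intervention rescales one expert's softmax routing weight for a token. The sketch below shows that idea on a single routing distribution; the renormalization step is an assumption on my part, and `interv.py` hooks into the model's forward pass, so its exact mechanics differ from this standalone toy:

```python
import numpy as np

def steer_routing(probs, expert_id, scale):
    """Boost (scale > 1) or ablate (scale = 0) one expert's routing weight.

    probs: (num_experts,) softmax routing probabilities for one token.
    Renormalizes afterwards so the weights still sum to 1 (an assumption;
    the real intervention may leave the other weights untouched).
    """
    probs = np.asarray(probs, dtype=np.float64).copy()
    probs[expert_id] *= scale
    return probs / probs.sum()

p = np.array([0.5, 0.3, 0.2])
print(steer_routing(p, expert_id=1, scale=2.0))  # expert 1 gains mass
print(steer_routing(p, expert_id=1, scale=0.0))  # expert 1 ablated
```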