MinhDucBui/German-Dialect-Bias
Large Language Models Discriminate Against Speakers of German Dialects

arXiv preprint

We examine whether stereotypes about speakers of German dialects are mirrored by large language models (LLMs). We draw on the sociolinguistic literature on dialect perception to analyze traits commonly associated with dialect speakers. Based on these traits, we assess the dialect naming bias and dialect usage bias expressed by LLMs in two tasks: an association task and a decision task. To assess a model's dialect usage bias, we construct a novel evaluation corpus that pairs sentences from seven regional German dialects (e.g., Alemannic and Bavarian) with their standard German counterparts.

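The two bias settings can be illustrated with a minimal sketch. Everything below is an illustrative assumption (the example Bavarian sentence, the prompt wording, and the field names), not the repository's actual prompts or data:

```python
# Illustrative sketch of the two bias settings (all strings are assumptions,
# not the repository's actual prompts or data).

# One corpus entry: a dialect sentence paired with its standard German counterpart.
pair = {
    "dialect": "Bavarian",
    "dialect_text": "I hob des ned gwusst.",         # hypothetical Bavarian sentence
    "standard_text": "Ich habe das nicht gewusst.",  # standard German counterpart
}

def usage_prompt(text: str) -> str:
    # Dialect usage bias: the model sees the utterance itself and is
    # never told which variety it is.
    return f'Someone says: "{text}" Which traits describe this person?'

def naming_prompt(dialect: str) -> str:
    # Dialect naming bias: the dialect is named explicitly; no dialect text is shown.
    return f"Which traits describe a speaker of {dialect} German?"

print(usage_prompt(pair["dialect_text"]))
print(naming_prompt(pair["dialect"]))
```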


📂 Directory/File Structure Overview

  • scripts/__init__.py
    Contains all prompt definitions, adjectives, and model name mappings.

  • scripts/prompt_creation/create_tasks.py
    Script for creating prompt tasks based on your annotated data.

  • scripts/inference.py
    Runs the model inference to generate predictions.

  • scripts/eval_implicit.py
    Evaluates the model predictions for the association task.

  • scripts/extract_decision.py
    Extracts decision outputs from the model.

  • scripts/eval_decision.py
    Evaluates the model predictions for the decision task.

  • plots/
    Scripts to generate the final plots.


You can access all data and outputs generated in this project here: Drive.

0. Requirements

  • Python 3.12 (recommended)
  • Install dependencies with:
pip install -r requirements.txt

1. Data Creation Stage

Steps

  1. Annotation Data Placement

    • Add your annotation files to the directory: data/annotated_data.
  2. Generate Prompts

    • Run the prompt creation script:
      python scripts/prompt_creation/create_tasks.py
      
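The prompt-creation step turns annotated dialect/standard pairs into task files. As a rough sketch of that kind of transformation (the column names and prompt template are assumptions, not the actual behavior of create_tasks.py):

```python
import csv
import io

# Hypothetical annotated rows: each pairs a dialect sentence with its standard form.
annotated = [
    {"dialect": "Alemannic",
     "dialect_text": "I ha das nid gwüsst.",          # hypothetical Alemannic sentence
     "standard_text": "Ich habe das nicht gewusst."},
]

# Write one prompt row per variety, so each pair yields a matched prompt pair.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["id", "variety", "prompt"])
writer.writeheader()
for i, row in enumerate(annotated):
    for variety, text in (("dialect", row["dialect_text"]),
                          ("standard", row["standard_text"])):
        writer.writerow({
            "id": i,
            "variety": variety,
            "prompt": f'Someone says: "{text}" Which traits describe this person?',
        })
print(buf.getvalue())
```

Emitting both varieties from the same annotation keeps the two conditions perfectly parallel, so any difference in model behavior can be attributed to the variety rather than the content.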

2. Run Inference

Running the Inference

  1. Dialect Usage Bias Inference

    • Execute the following command:
      python scripts/inference.py --model_name $MODEL_PATH --output_folder output/association_usage --gt_file data/prompts/tasks/association_usage.csv
      
  2. Dialect Naming Bias Inference

    • Execute the following command:
      python scripts/inference.py --model_name $MODEL_PATH --output_folder output/association_naming --gt_file data/prompts/tasks/association_naming.csv
      

    Make sure the environment variable $MODEL_PATH is set to your desired model path.


3. Evaluation of the Association Task

  • Run the implicit bias evaluation script:
    python scripts/eval_association.py
    
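A common way to score such associations is to compare how often the model assigns positively vs. negatively connotated adjectives in each condition. A minimal sketch of that idea (the adjective lists, field names, and toy predictions are assumptions, not the evaluation script's actual metric):

```python
# Hypothetical predictions: which adjective the model associated with each speaker.
POSITIVE = {"educated", "friendly"}

predictions = [
    {"variety": "dialect", "adjective": "uneducated"},
    {"variety": "dialect", "adjective": "friendly"},
    {"variety": "standard", "adjective": "educated"},
    {"variety": "standard", "adjective": "friendly"},
]

def positive_rate(rows, variety):
    # Fraction of predictions for this variety that are positively connotated.
    hits = [r["adjective"] in POSITIVE for r in rows if r["variety"] == variety]
    return sum(hits) / len(hits)

# Bias score: how much more often the standard speaker gets positive traits.
gap = positive_rate(predictions, "standard") - positive_rate(predictions, "dialect")
print(f"positive-trait gap (standard - dialect): {gap:+.2f}")  # +0.50
```

A gap above zero means positive traits go to the standard German speaker more often than to the dialect speaker.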

4. Evaluation of Decision Task

Steps

  1. Extract Decisions

    • Run the extraction script:
      python scripts/extract_decision.py --model_name $MODEL_PATH
      
  2. Evaluate Extracted Decisions

    • Run the evaluation script for decisions:
      python scripts/eval_decision.py
      

Additional Notes

  • In scripts/__init__.py, we specify the model-name mapping function used by the evaluation scripts to clean up model names.
  • Outputs of all models will be shared in an online drive folder after acceptance (they are too large to upload directly).

License

This project is licensed under the xxxx - see the LICENSE.md file for details
