This repository contains the code used to explore the effectiveness of scalable multi-tenant dense retrieval deployments based on LoRA (Low-Rank Adaptation).
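LoRA keeps the pretrained weight matrix frozen and learns only a low-rank update, which is what makes per-tenant adapters cheap to store and serve. A minimal numpy sketch of the idea (the names `lora_forward`, `A`, `B` are illustrative, not this repository's actual code):

```python
import numpy as np

rng = np.random.default_rng(0)

d, r = 8, 2                          # hidden size and LoRA rank (r << d)
W = rng.normal(size=(d, d))          # frozen pretrained weight
A = rng.normal(size=(r, d)) * 0.01   # trainable down-projection
B = np.zeros((d, r))                 # trainable up-projection, zero-initialized

def lora_forward(x, alpha=16):
    # frozen path plus scaled low-rank update: (W + (alpha / r) * B @ A) @ x
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d)
# With B zero-initialized, the adapted model matches the base model exactly,
# so training starts from the pretrained behavior.
assert np.allclose(lora_forward(x), W @ x)
```

Only `A` and `B` (2 * r * d parameters here) are trained per tenant, instead of the full d * d matrix.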
```
.
├── beir_datasets/                  # automatically populated with BEIR datasets
├── beir_results/                   # results directory
├── utils/                          # utilities and helper functions
├── src/                            # core modules and components
├── eval_beir.py                    # evaluation
├── finetuning.py                   # fine-tuning and LoRA adaptation
├── prepare_contrastive_dataset.py  # hard-negative mining and preprocessing
├── scripts/                        # shell scripts
└── README.md
```
We recommend using uv to manage the environment and dependencies. Once uv is installed, run:

```bash
uv pip install -r requirements.txt
```

The `scripts/` directory contains `.sh` files that can be used to run experiments with the following dense retrieval models:
- Contriever: `facebook/contriever-msmarco`
- TAS-B: `sebastian-hofstaetter/distilbert-dot-tas_b-b256-msmarco`
- E5: `intfloat/e5-base-v2`
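All three models are bi-encoders: queries and documents are embedded independently and documents are ranked by a single similarity score. A self-contained sketch of that scoring step with synthetic embeddings (cosine similarity is used here for illustration; the actual similarity function depends on the model):

```python
import numpy as np

def retrieve(query_emb, doc_embs, k=3):
    """Rank documents by cosine similarity to the query (bi-encoder scoring)."""
    q = query_emb / np.linalg.norm(query_emb)
    d = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
    scores = d @ q                    # one similarity score per document
    topk = np.argsort(-scores)[:k]    # indices of the k best documents
    return topk, scores[topk]

rng = np.random.default_rng(1)
docs = rng.normal(size=(100, 16))
q = docs[42] + 0.01 * rng.normal(size=16)  # query almost identical to document 42
idx, _ = retrieve(q, docs)
assert idx[0] == 42
```

Because documents are encoded offline, only the query embedding and the similarity search happen at query time.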
The first step is to generate contrastive training data by mining hard negatives from the selected BEIR datasets:
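Hard negatives are the documents the retriever itself ranks highest for a query despite being labeled non-relevant; training against them is far more informative than training against random documents. A generic sketch of the mining step (the function name and shapes are illustrative, not the repository's implementation):

```python
import numpy as np

def mine_hard_negatives(q_embs, d_embs, positives, n_neg=4):
    """For each query, keep the top-scoring documents that are NOT the
    known positive: these look relevant to the model but are labeled
    non-relevant, which makes them 'hard' negatives."""
    scores = q_embs @ d_embs.T                 # (n_queries, n_docs) score matrix
    negatives = []
    for qi, pos in enumerate(positives):
        ranked = np.argsort(-scores[qi])       # documents sorted best-first
        hard = [d for d in ranked if d != pos][:n_neg]
        negatives.append(hard)
    return negatives

rng = np.random.default_rng(2)
d_embs = rng.normal(size=(50, 8))
q_embs = d_embs[[3, 7]] + 0.05 * rng.normal(size=(2, 8))  # queries near docs 3 and 7
negs = mine_hard_negatives(q_embs, d_embs, positives=[3, 7])
assert 3 not in negs[0] and 7 not in negs[1]   # positives never leak into negatives
```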
```bash
bash scripts/prepare_contrastive_dataset.sh
```

The second step is to fine-tune the dense retrieval models with either the TSFT or the LoRA-r method.
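Contrastive fine-tuning typically optimizes an InfoNCE-style objective: the positive document is scored against the mined hard negatives with a softmax cross-entropy. The exact loss used here lives in `finetuning.py`; the following is a generic, pure-Python sketch of the objective:

```python
import math

def info_nce(pos_score, neg_scores, temperature=0.05):
    """Contrastive loss: negative log-softmax of the positive score
    against the negative scores (log-sum-exp computed stably)."""
    logits = [pos_score / temperature] + [s / temperature for s in neg_scores]
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return -(logits[0] - log_z)

# The loss shrinks as the positive outscores the negatives.
easy = info_nce(pos_score=0.9, neg_scores=[0.1, 0.2])
hard = info_nce(pos_score=0.3, neg_scores=[0.1, 0.2])
assert easy < hard
```

Minimizing this loss pushes query embeddings toward their positives and away from the hard negatives mined in the previous step.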
To fine-tune with TSFT:

```bash
bash scripts/finetuning_tsft.sh
```

To fine-tune with LoRA-r:

```bash
bash scripts/finetuning_lorar.sh
```

To run inference and evaluation with TSFT-adapted dense models:

```bash
bash scripts/eval_tsft.sh
```

To run inference and evaluation with LoRA-r-adapted dense models:

```bash
bash scripts/eval_lora.sh
```

To run inference and evaluation with the zero-shot (unadapted) dense models:

```bash
bash scripts/eval_zs.sh
```

If you use this code or build upon it, please consider citing our work.
- Giulio Capecchi
- Francesca Pezzuti
- Cesare Campagnano
- Antonio Mallia
- Nicola Tonellotto