This repository contains the code used to explore the effectiveness of scalable multi-tenant dense retrieval deployments based on LoRA (Low-Rank Adaptation).
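LoRA keeps the pretrained weight matrix frozen and learns only a low-rank update, which is what makes per-tenant adapters cheap to store and serve. A minimal numpy sketch of the idea (the names `lora_forward`, `A`, `B` are illustrative, not this repository's actual code):

```python
import numpy as np

rng = np.random.default_rng(0)

d, r = 8, 2                          # hidden size and LoRA rank (r << d)
W = rng.normal(size=(d, d))          # frozen pretrained weight
A = rng.normal(size=(r, d)) * 0.01   # trainable down-projection
B = np.zeros((d, r))                 # trainable up-projection, zero-initialized

def lora_forward(x, alpha=16):
    # frozen path plus scaled low-rank update: (W + (alpha / r) * B @ A) @ x
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d)
# With B zero-initialized, the adapted model matches the base model exactly,
# so training starts from the pretrained behavior.
assert np.allclose(lora_forward(x), W @ x)
```

Only `A` and `B` (2 * r * d parameters here) are trained per tenant, instead of the full d * d matrix.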
```
.
├── beir_datasets/                  # automatically populated with BEIR datasets
├── beir_results/                   # results directory
├── utils/                          # utilities and helper functions
├── src/                            # core modules and components
├── eval_beir.py                    # evaluation
├── finetuning.py                   # fine-tuning and LoRA adaptation
├── prepare_contrastive_dataset.py  # hard-negative mining and preprocessing
├── scripts/                        # shell scripts
└── README.md
```
We recommend using uv to manage the environment and dependencies. Once uv is installed, run:

```bash
uv pip install -r requirements.txt
```

The `scripts/` directory contains `.sh` files that can be used to run experiments with the following dense retrieval models:
- Contriever: `facebook/contriever-msmarco`
- TAS-B: `sebastian-hofstaetter/distilbert-dot-tas_b-b256-msmarco`
- E5: `intfloat/e5-base-v2`
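All three models are bi-encoders: queries and documents are embedded independently and documents are ranked by a single similarity score. A self-contained sketch of that scoring step with synthetic embeddings (cosine similarity is used here for illustration; the actual similarity function depends on the model):

```python
import numpy as np

def retrieve(query_emb, doc_embs, k=3):
    """Rank documents by cosine similarity to the query (bi-encoder scoring)."""
    q = query_emb / np.linalg.norm(query_emb)
    d = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
    scores = d @ q                    # one similarity score per document
    topk = np.argsort(-scores)[:k]    # indices of the k best documents
    return topk, scores[topk]

rng = np.random.default_rng(1)
docs = rng.normal(size=(100, 16))
q = docs[42] + 0.01 * rng.normal(size=16)  # query almost identical to document 42
idx, _ = retrieve(q, docs)
assert idx[0] == 42
```

Because documents are encoded offline, only the query embedding and the similarity search happen at query time.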
The first step is to generate contrastive training data by mining hard negatives from the selected BEIR datasets:
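Hard negatives are the documents the retriever itself ranks highest for a query despite being labeled non-relevant; training against them is far more informative than training against random documents. A generic sketch of the mining step (the function name and shapes are illustrative, not the repository's implementation):

```python
import numpy as np

def mine_hard_negatives(q_embs, d_embs, positives, n_neg=4):
    """For each query, keep the top-scoring documents that are NOT the
    known positive: these look relevant to the model but are labeled
    non-relevant, which makes them 'hard' negatives."""
    scores = q_embs @ d_embs.T                 # (n_queries, n_docs) score matrix
    negatives = []
    for qi, pos in enumerate(positives):
        ranked = np.argsort(-scores[qi])       # documents sorted best-first
        hard = [d for d in ranked if d != pos][:n_neg]
        negatives.append(hard)
    return negatives

rng = np.random.default_rng(2)
d_embs = rng.normal(size=(50, 8))
q_embs = d_embs[[3, 7]] + 0.05 * rng.normal(size=(2, 8))  # queries near docs 3 and 7
negs = mine_hard_negatives(q_embs, d_embs, positives=[3, 7])
assert 3 not in negs[0] and 7 not in negs[1]   # positives never leak into negatives
```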
```bash
bash scripts/prepare_contrastive_dataset.sh
```

The second step is to fine-tune the dense retrieval models with either the TSFT or the LoRA-r method.
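Contrastive fine-tuning typically optimizes an InfoNCE-style objective: the positive document is scored against the mined hard negatives with a softmax cross-entropy. The exact loss used here lives in `finetuning.py`; the following is a generic, pure-Python sketch of the objective:

```python
import math

def info_nce(pos_score, neg_scores, temperature=0.05):
    """Contrastive loss: negative log-softmax of the positive score
    against the negative scores (log-sum-exp computed stably)."""
    logits = [pos_score / temperature] + [s / temperature for s in neg_scores]
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return -(logits[0] - log_z)

# The loss shrinks as the positive outscores the negatives.
easy = info_nce(pos_score=0.9, neg_scores=[0.1, 0.2])
hard = info_nce(pos_score=0.3, neg_scores=[0.1, 0.2])
assert easy < hard
```

Minimizing this loss pushes query embeddings toward their positives and away from the hard negatives mined in the previous step.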
To fine-tune with TSFT:

```bash
bash scripts/finetuning_tsft.sh
```

To fine-tune with LoRA-r:

```bash
bash scripts/finetuning_lorar.sh
```

To run inference and evaluation with TSFT-adapted dense models:

```bash
bash scripts/eval_tsft.sh
```

To run inference and evaluation with LoRA-r-adapted dense models:

```bash
bash scripts/eval_lora.sh
```

To run inference and evaluation with the zero-shot (unadapted) dense models:

```bash
bash scripts/eval_zs.sh
```

If you use this code or build upon it, please consider citing our work.
- Giulio Capecchi
- Francesca Pezzuti
- Cesare Campagnano
- Antonio Mallia
- Nicola Tonellotto