fpezzuti/lora-dense-retrieval

Dense Retrieval with Low-Rank Adaptation for Scalable Multi-Tenant Search

This repository contains the code used to explore the effectiveness of scalable multi-tenant dense retrieval deployments based on LoRA (Low-Rank Adaptation).
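The core idea behind multi-tenant serving with LoRA is that all tenants share one frozen base encoder, while each tenant stores only a small low-rank delta for the adapted weight matrices. As a rough illustration (not the repository's code; all names and shapes are hypothetical), a minimal numpy sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2  # hidden size, LoRA rank (r << d)

# Shared frozen base weight, loaded once and reused by every tenant.
W_base = rng.standard_normal((d, d))

def lora_delta(rank: int, dim: int) -> tuple[np.ndarray, np.ndarray]:
    """Per-tenant trainable factors; only these are stored per tenant."""
    A = rng.standard_normal((rank, dim)) * 0.01
    B = np.zeros((dim, rank))  # zero-init so adaptation starts as a no-op
    return A, B

tenants = {t: lora_delta(r, d) for t in ["tenant_a", "tenant_b"]}

def encode(x: np.ndarray, tenant: str, alpha: float = 1.0) -> np.ndarray:
    """Apply the tenant-specific effective weight W_base + (alpha/r) * B @ A."""
    A, B = tenants[tenant]
    W_eff = W_base + (alpha / r) * (B @ A)
    return W_eff @ x

x = rng.standard_normal(d)
print(np.allclose(encode(x, "tenant_a"), W_base @ x))  # True: B is zero-init
```

Each tenant stores only 2·d·r parameters instead of d·d, which is what makes per-tenant adaptation cheap at scale.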

🗂️ Repository Structure

.
├── beir_datasets/ # automatically populated with BEIR datasets
├── beir_results/ # results directory
├── utils/ # utilities and helper functions
├── src/ # core modules and components
├── eval_beir.py # evaluation
├── finetuning.py # fine-tuning and LoRA adaptation
├── prepare_contrastive_dataset.py # hard-negative mining and preprocessing
├── scripts/ # shell scripts
└── README.md

⚙️ Setup & Requirements

We recommend using uv to manage the environment and dependencies.

Once uv is installed, run

uv pip install -r requirements.txt

🔬 Replicating Our Experiments

The scripts/ directory contains the shell scripts needed to replicate our experiments. The replication pipeline consists of the following three steps:

1. Hard-Negative Mining

The first step is to generate contrastive training data by mining hard negatives from the selected BEIR datasets.

bash scripts/prepare_contrastive_dataset.sh
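The mining script handles the details; conceptually, a hard negative is a non-relevant document that the current retriever nonetheless scores highly for a query. A simplified numpy sketch of that selection step (the function name, shapes, and dot-product scoring are illustrative assumptions, not the repository's API):

```python
import numpy as np

def mine_hard_negatives(q_emb, doc_embs, positive_ids, k=2):
    """Return indices of the top-k highest-scoring documents that are
    NOT labeled relevant for this query, i.e. hard negatives."""
    scores = doc_embs @ q_emb                    # dot-product relevance
    ranked = np.argsort(-scores)                 # best-scoring first
    negatives = [int(i) for i in ranked if i not in positive_ids]
    return negatives[:k]

rng = np.random.default_rng(1)
q = rng.standard_normal(4)          # query embedding
docs = rng.standard_normal((6, 4))  # corpus embeddings
print(mine_hard_negatives(q, docs, positive_ids={0}))
```

Mining against the retriever's own scores (rather than sampling negatives at random) yields harder training examples and typically a stronger contrastive signal.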

2. Fine-tuning with TSFT or LoRA-r

The second step is to fine-tune the dense retrieval models, using either the TSFT or the LoRA-r method.

To fine-tune with TSFT:

bash scripts/finetuning_tsft.sh

To fine-tune with LoRA-r:

bash scripts/finetuning_lorar.sh
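TSFT and LoRA-r differ in which parameters are trained, but both optimize a contrastive objective over (query, positive, hard-negative) triples. A toy InfoNCE-style loss with dot-product similarity, to illustrate the objective (a hypothetical sketch, not the repository's exact loss):

```python
import numpy as np

def info_nce_loss(q, pos, negs, temperature=0.05):
    """Contrastive loss: push the positive's similarity above the negatives'.
    Returns -log p(positive) under a softmax over all candidates."""
    sims = np.concatenate([[q @ pos], negs @ q]) / temperature
    sims -= sims.max()                             # numerical stability
    log_probs = sims - np.log(np.exp(sims).sum())  # log-softmax
    return float(-log_probs[0])

rng = np.random.default_rng(2)
q = rng.standard_normal(4)
good = q + 0.1 * rng.standard_normal(4)  # positive close to the query
bad = rng.standard_normal((3, 4))        # mined hard negatives
print(info_nce_loss(q, good, bad))
```

Minimizing this loss pulls query and positive embeddings together while pushing the mined negatives away, which is what shapes the retrieval space during fine-tuning.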

3. Inference & IR Evaluation

To perform inference and evaluation of TSFT-adapted dense models:

bash scripts/eval_tsft.sh

To perform inference and evaluation of LoRA-r-adapted dense models:

bash scripts/eval_lora.sh

To perform inference and evaluation of dense models applied zero-shot (i.e., without fine-tuning):

bash scripts/eval_zs.sh
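Evaluation follows the standard BEIR protocol: encode the corpus and queries, retrieve by similarity, and score the ranking with metrics such as nDCG@10. For reference, a minimal nDCG@k computation (the repository relies on the BEIR toolkit; this standalone version is only to show what the metric measures):

```python
import numpy as np

def ndcg_at_k(ranked_doc_ids, relevance, k=10):
    """nDCG@k for one query. ranked_doc_ids is the retrieved ranking;
    relevance maps doc id -> graded relevance (missing ids count as 0)."""
    gains = np.array([relevance.get(d, 0) for d in ranked_doc_ids[:k]], dtype=float)
    dcg = float(np.sum(gains / np.log2(np.arange(2, len(gains) + 2))))
    ideal = np.array(sorted(relevance.values(), reverse=True)[:k], dtype=float)
    idcg = float(np.sum(ideal / np.log2(np.arange(2, len(ideal) + 2))))
    return dcg / idcg if idcg > 0 else 0.0

qrels = {"d1": 2, "d3": 1}  # graded relevance judgments for one query
print(ndcg_at_k(["d1", "d3", "d2"], qrels))  # perfect ranking -> 1.0
```

The log-discount means placing a relevant document lower in the ranking costs more than the raw gain suggests, so nDCG@10 rewards retrievers that surface relevant documents early.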

📚 Citation

If you use this code or build upon it, please consider citing our work.

👥 Contributors

  • Giulio Capecchi
  • Francesca Pezzuti
  • Cesare Campagnano
  • Antonio Mallia
  • Nicola Tonellotto
