This project implements Natural Language Inference (NLI) and Dense Passage Retrieval (DPR) using transformer models. The implementation includes fine-tuning DistilBERT for NLI, soft prompt tuning, and training DPR models for question-answer retrieval.
Fine-tune DistilBERT on the NLI task:
```bash
python scripts/run_part1_nli.py
```

This script:
- Loads the NLI dataset from `data/nli/`
- Fine-tunes a DistilBERT model for binary entailment classification
- Uses mixed precision training and gradient accumulation
- Implements learning rate warmup and scheduling (see the training-loop sketch after this list)
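As a rough illustration of how those three techniques fit together, here is a minimal sketch of such a loop using PyTorch AMP and the `transformers` linear-warmup scheduler. The `train_loader` argument and the 10% warmup fraction are assumptions for illustration, not values taken from the script:

```python
import torch
from torch.cuda.amp import GradScaler, autocast
from transformers import get_linear_schedule_with_warmup

def train(model, train_loader, epochs=10, accum_steps=2):
    """Mixed-precision fine-tuning with gradient accumulation and LR warmup.
    train_loader must yield dicts with input_ids, attention_mask, labels."""
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5, weight_decay=0.01)
    total = epochs * len(train_loader) // accum_steps
    scheduler = get_linear_schedule_with_warmup(
        optimizer,
        num_warmup_steps=int(0.1 * total),   # 10% warmup is an assumption
        num_training_steps=total)
    scaler = GradScaler()
    for _ in range(epochs):
        for step, batch in enumerate(train_loader):
            batch = {k: v.cuda() for k, v in batch.items()}
            with autocast():                            # fp16 forward pass
                loss = model(**batch).loss / accum_steps
            scaler.scale(loss).backward()               # accumulate gradients
            if (step + 1) % accum_steps == 0:           # 128 x 2 -> effective 256
                scaler.step(optimizer)
                scaler.update()
                scheduler.step()
                optimizer.zero_grad()
```

Dividing the loss by `accum_steps` keeps gradient magnitudes comparable to a single large batch, which is why the effective batch size doubles without extra memory.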
Train soft prompts on the frozen DistilBERT model:
```bash
python scripts/run_part2_prompting.py
```

This script:
- Freezes the fine-tuned DistilBERT model
- Trains soft prompts with different configurations (p=5, 10, 20)
- Compares prompt tuning performance across these lengths (see the sketch below)
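One common way to implement this (an assumption about this script's internals) is to prepend `p` trainable vectors to the token embeddings of the frozen model, so that only those vectors receive gradients. The checkpoint path and the initialization scale below are hypothetical:

```python
import torch
import torch.nn as nn
from transformers import AutoModelForSequenceClassification

class SoftPromptClassifier(nn.Module):
    """Prepends p trainable prompt vectors to a frozen DistilBERT classifier."""
    def __init__(self, base, p=10):
        super().__init__()
        self.base = base
        for param in self.base.parameters():     # freeze every base weight
            param.requires_grad = False
        dim = base.config.dim                    # 768 for DistilBERT
        self.prompt = nn.Parameter(torch.randn(p, dim) * 0.02)

    def forward(self, input_ids, attention_mask, labels=None):
        tok = self.base.get_input_embeddings()(input_ids)      # (B, T, H)
        B = input_ids.size(0)
        prompts = self.prompt.unsqueeze(0).expand(B, -1, -1)   # (B, p, H)
        mask = torch.cat([torch.ones(B, self.prompt.size(0),
                                     device=attention_mask.device),
                          attention_mask], dim=1)
        return self.base(inputs_embeds=torch.cat([prompts, tok], dim=1),
                         attention_mask=mask, labels=labels)

# "results/part1_nli" is a hypothetical path to the Part 1 checkpoint.
base = AutoModelForSequenceClassification.from_pretrained("results/part1_nli")
model = SoftPromptClassifier(base, p=10)
optimizer = torch.optim.AdamW([model.prompt], lr=1e-3)  # only the prompt trains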
Train and evaluate DPR models:
```bash
python scripts/run_part3_dpr.py
```

This script:
- Loads question-answer pairs from `data/qa/`
- Trains separate encoders for questions and passages
- Uses contrastive loss with in-batch negative sampling (sketched after this list)
- Evaluates using Recall@k and Mean Reciprocal Rank (MRR)
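The core of the in-batch objective is compact: each question's gold passage is the positive, every other passage in the batch is a negative, and the loss reduces to cross-entropy over a similarity matrix. A minimal sketch, assuming dot-product similarity (DPR's standard choice) and no temperature:

```python
import torch
import torch.nn.functional as F

def in_batch_contrastive_loss(q_emb, p_emb):
    """q_emb, p_emb: (B, D). Row i of p_emb is the gold passage for
    question i; the other B-1 rows act as negatives for free."""
    scores = q_emb @ p_emb.T                    # (B, B) similarity matrix
    targets = torch.arange(q_emb.size(0), device=q_emb.device)
    return F.cross_entropy(scores, targets)     # diagonal entries = positives
```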
Training Configuration:
- Model: DistilBERT (distilbert-base-uncased)
- Batch Size: 128 (effective 256 with gradient accumulation)
- Learning Rate: 2e-5
- Epochs: 10
- Weight Decay: 0.01
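If the script drives training through the Hugging Face `Trainer` (an assumption; a manual loop like the sketch above would work equally well), this configuration maps roughly to:

```python
from transformers import TrainingArguments

# Hypothetical mapping of the configuration above; output_dir and
# warmup_ratio are illustrative values, not taken from the script.
args = TrainingArguments(
    output_dir="results/part1_nli",
    per_device_train_batch_size=128,
    gradient_accumulation_steps=2,    # 128 x 2 = effective batch of 256
    learning_rate=2e-5,
    num_train_epochs=10,
    weight_decay=0.01,
    fp16=True,                        # mixed precision
    warmup_ratio=0.1,
)
```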
Performance:
| Epoch | Training Loss | Validation F1 Score | Global Step |
|---|---|---|---|
| 1 | 0.3963 | 0.9038 | 1,563 |
| 2 | 0.2460 | 0.9179 | 3,126 |
| 3 | 0.2175 | 0.9225 | 4,689 |
| 4 | 0.2000 | 0.9276 | 6,252 |
| 5 | 0.1866 | 0.9309 | 7,815 |
| 6 | 0.1763 | 0.9321 | 9,378 |
| 7 | 0.1665 | 0.9343 | 10,941 |
| 8 | 0.1583 | 0.9366 | 12,504 |
| 9 | 0.1507 | 0.9382 | 14,067 |
| 10 | 0.1443 | 0.9390 | 15,630 |
Example Evaluations:
The model was evaluated on 3 randomly selected validation examples:
- Example 1 ✓
  - Premise: "A group of kids listening to their band instructor, and reading music off their papers."
  - Hypothesis: "The kids are not reading."
  - True Label: Entailment (1)
  - Predicted: Entailment (1)
  - Confidence: 0.9541
- Example 2 ✓
  - Premise: "A man cooking food on the stove."
  - Hypothesis: "A man is making hot food."
  - True Label: Not Entailment (0)
  - Predicted: Not Entailment (0)
  - Confidence: 0.7855
- Example 3 ✗
  - Premise: "A baby in an indoor pool is using an inflatable tube on it's own"
  - Hypothesis: "The baby is inside"
  - True Label: Not Entailment (0)
  - Predicted: Entailment (1)
  - Confidence: 0.6960
Configuration:
- Frozen Model: DistilBERT (from Part 1)
- Prompt Lengths: p=5, 10, 20
- Learning Rate: 1e-3
- Epochs: 3
Performance by Configuration:

p = 5:
| Epoch | Train Loss | Train Acc | Val Loss | Val Acc |
|---|---|---|---|---|
| 1 | 0.1839 | 0.9275 | 0.1599 | 0.9375 |
| 2 | 0.1742 | 0.9321 | 0.1594 | 0.9376 |
| 3 | 0.1711 | 0.9335 | 0.1595 | 0.9373 |
p = 10:

| Epoch | Train Loss | Train Acc | Val Loss | Val Acc |
|---|---|---|---|---|
| 1 | 0.2168 | 0.9125 | 0.1647 | 0.9379 |
| 2 | 0.1836 | 0.9274 | 0.1613 | 0.9384 |
| 3 | 0.1747 | 0.9319 | 0.1600 | 0.9367 |
p = 20:

| Epoch | Train Loss | Train Acc | Val Loss | Val Acc |
|---|---|---|---|---|
| 1 | 0.2277 | 0.9075 | 0.1640 | 0.9366 |
| 2 | 0.1805 | 0.9288 | 0.1607 | 0.9372 |
| 3 | 0.1738 | 0.9319 | 0.1600 | 0.9373 |
Configuration:
- Model: ELECTRA-small (google/electra-small-discriminator)
- Contrastive Loss: In-batch negative sampling
- Embedding Dimension: 256
- Max Length: 16 tokens
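A minimal sketch of one encoder tower under this configuration. The first-token pooling and the linear projection head are assumptions about how the 256-dimensional embeddings are produced (ELECTRA-small's hidden size is already 256, so the projection only remaps it):

```python
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class Tower(nn.Module):
    """One DPR encoder: ELECTRA-small with a linear head to 256 dims."""
    def __init__(self, dim=256):
        super().__init__()
        self.backbone = AutoModel.from_pretrained(
            "google/electra-small-discriminator")
        self.proj = nn.Linear(self.backbone.config.hidden_size, dim)

    def forward(self, input_ids, attention_mask):
        out = self.backbone(input_ids=input_ids, attention_mask=attention_mask)
        return self.proj(out.last_hidden_state[:, 0])  # first-token pooling

tok = AutoTokenizer.from_pretrained("google/electra-small-discriminator")
q_enc, p_enc = Tower(), Tower()      # separate question and passage encoders
batch = tok(["When should garlic go in?"], max_length=16,
            truncation=True, padding="max_length", return_tensors="pt")
q_emb = q_enc(batch["input_ids"], batch["attention_mask"])   # (1, 256)
```

Keeping the question and passage encoders as separate towers lets passage embeddings be precomputed and indexed once, with only the question encoded at query time.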
Initial Evaluation (Before Training):
- Recall@3: 0.125 (12.5%)
- Mean Reciprocal Rank (MRR): 0.125
The DPR system retrieves relevant passages for questions using learned dense representations. Contrastive learning with in-batch negatives makes training efficient: every other passage in a batch serves as a negative, so no explicit negative mining is required.
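For reference, both evaluation metrics can be computed directly from the two embedding matrices. A minimal sketch, assuming passage i is the gold match for question i:

```python
import torch

def recall_and_mrr(q_emb, p_emb, k=3):
    """q_emb, p_emb: (N, D); passage i is the gold match for question i.
    Returns (Recall@k, MRR) over the N questions."""
    scores = q_emb @ p_emb.T                           # (N, N) similarities
    order = scores.argsort(dim=1, descending=True)     # passages ranked per query
    gold = torch.arange(q_emb.size(0)).unsqueeze(1)
    rank = (order == gold).int().argmax(dim=1)         # 0-based rank of gold
    recall_at_k = (rank < k).float().mean().item()
    mrr = (1.0 / (rank + 1).float()).mean().item()
    return recall_at_k, mrr
```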
Validation Examples:
Question 1:
- Title: "What type of beer is best for beer battered fish?"
- Body: "I was looking for a beer battered fish recipe the other day when I noticed most of the recipes don't state a style of beer to use. Some of the recipes use a significant amount of beer so I assume that some of the flavor profile from the beer will carry over to the fish. So I'm wondering, which style is ideal? Porter? IPA? Maybe a Hefeweizen?"
- True Answer: "The primary use of beer in a beer batter is its alcohol, which disrupts gluten formation and needs less heat than water to evaporate, improving the texture of the final crust. For flavor, most recipes using beer do best with a malty, low-bitterness beer, like a marzen, scotch ale, or (maybe) amber ale. Highly-hopped 'put hair on your chest' IPAs are a bad idea: you don't want that bitterness. Hefeweizen would be fine."
- Top Retrieved Passages: Retrieved passages were not directly relevant to the question, indicating the need for training.
Question 2:
- Title: "When sauteing should I put onion or garlic first?"
- Body: "Most of the dishes here in the Philippines involved sauteing. But I am a little bit confused on what should I put first, are there any advantages on it?"
- True Answer: "Onions always benefit from a few minutes on their own to soften and start sweetening. Garlic burns easily, especially when finely chopped or crushed, so in general should not be fried as long as onion. Having said that, when doing a quick stir fry or similar dish, you can throw in the garlic first for 10-20 seconds so that it flavours the oil."
- Top Retrieved Passages: Retrieved passages were not directly relevant to the question.
Question 3:
- Title: "How long do unrefrigerated opened canned peppers last?"
- Body: "I received a couple homemade cans of banana peppers that were canned with a jalapeno in each for some extra heat. I absolutely love the taste of them, but I am curious how long they can last once opened when there isn't access to refrigeration."
- True Answer: "Given the vinegar, these sound like pickled peppers. Pickled items are usually made to last, even when not refrigerated -- preservation was the original purpose of pickling..."
- Top Retrieved Passages: One of the top retrieved passages correctly matched the true answer about pickled peppers, demonstrating some retrieval capability even before training.

