ofbread/NLI-and-Dense-Passage-Retrieval

NLI and Dense Passage Retrieval Project

This project implements Natural Language Inference (NLI) and Dense Passage Retrieval (DPR) using transformer models. The implementation includes fine-tuning DistilBERT for NLI, soft prompt tuning, and training DPR models for question-answer retrieval.

Usage

Part 1: Natural Language Inference (NLI)

Fine-tune DistilBERT on the NLI task:

```shell
python scripts/run_part1_nli.py
```

This script:

  • Loads the NLI dataset from data/nli/
  • Fine-tunes a DistilBERT model for binary entailment classification
  • Uses mixed precision training and gradient accumulation
  • Implements learning rate warmup and scheduling
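
The warmup-and-scheduling step above can be sketched as a step-dependent multiplier on the base learning rate. This is a minimal sketch, assuming the common linear-warmup/linear-decay shape (as in Hugging Face's `get_linear_schedule_with_warmup`); the step counts are illustrative, not taken from the repo:

```python
def linear_warmup_decay(step: int, warmup_steps: int, total_steps: int) -> float:
    """Multiplier applied to the base learning rate (2e-5) at a given optimizer step."""
    if step < warmup_steps:
        # ramp linearly from 0 up to 1 over the warmup phase
        return step / max(1, warmup_steps)
    # then decay linearly from 1 back to 0 at the final step
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

# Gradient accumulation: two micro-batches of 128 are accumulated before
# each optimizer step, giving the effective batch size of 256.
micro_batch, accum_steps = 128, 2
effective_batch = micro_batch * accum_steps  # 256
```

Mixed precision (e.g. `torch.cuda.amp.autocast` plus a `GradScaler`) wraps the forward/backward passes and is orthogonal to this schedule.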

Part 2: Soft Prompt Tuning

Train soft prompts on the frozen DistilBERT model:

```shell
python scripts/run_part2_prompting.py
```

This script:

  • Freezes the fine-tuned DistilBERT model
  • Trains soft prompts with different configurations (p=5, 10, 20)
  • Compares prompt tuning performance
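
Mechanically, prompt tuning keeps every backbone weight frozen and learns only a small matrix of p prompt vectors that is prepended to the input embeddings. A minimal PyTorch sketch (class and method names are illustrative, not the repo's API):

```python
import torch
import torch.nn as nn

class SoftPrompt(nn.Module):
    """Learnable prompt vectors prepended to the frozen model's input embeddings."""

    def __init__(self, prompt_len: int = 10, hidden_dim: int = 768):
        super().__init__()
        # the only trainable tensor: (p, 768) for DistilBERT
        self.prompt = nn.Parameter(torch.randn(prompt_len, hidden_dim) * 0.02)

    def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
        # input_embeds: (batch, seq_len, hidden) from the frozen embedding layer
        batch = input_embeds.size(0)
        prompt = self.prompt.unsqueeze(0).expand(batch, -1, -1)
        # output: (batch, prompt_len + seq_len, hidden)
        return torch.cat([prompt, input_embeds], dim=1)
```

During training, only the prompt (and, if present, a classification head) is handed to the optimizer; the DistilBERT parameters stay frozen via `requires_grad_(False)`.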

Part 3: Dense Passage Retrieval (DPR)

Train and evaluate DPR models:

```shell
python scripts/run_part3_dpr.py
```

This script:

  • Loads question-answer pairs from data/qa/
  • Trains separate encoders for questions and passages
  • Uses contrastive loss with in-batch negative sampling
  • Evaluates using Recall@k and Mean Reciprocal Rank (MRR)
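
With in-batch negatives, passage i is the positive for question i and every other passage in the batch serves as a negative, which reduces the contrastive loss to a cross-entropy over the batch similarity matrix. A minimal sketch (the function name is illustrative):

```python
import torch
import torch.nn.functional as F

def in_batch_contrastive_loss(q_emb: torch.Tensor, p_emb: torch.Tensor) -> torch.Tensor:
    """q_emb, p_emb: (batch, dim) outputs of the question and passage encoders.

    Row i of p_emb is the positive for q_emb[i]; the other rows in the batch
    act as negatives, so no negatives need to be mined explicitly.
    """
    scores = q_emb @ p_emb.T                              # (batch, batch) dot products
    targets = torch.arange(q_emb.size(0), device=scores.device)
    return F.cross_entropy(scores, targets)               # push up the diagonal
```

The diagonal of `scores` holds the positive question-passage similarities; cross-entropy against `targets = [0, 1, ..., B-1]` maximizes them relative to the off-diagonal negatives.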

Results

Part 1: NLI Fine-tuning

Training Configuration:

  • Model: DistilBERT (distilbert-base-uncased)
  • Batch Size: 128 (effective 256 with gradient accumulation)
  • Learning Rate: 2e-5
  • Epochs: 10
  • Weight Decay: 0.01

Performance:

| Epoch | Training Loss | Validation F1 Score | Global Step |
|------:|--------------:|--------------------:|------------:|
| 1  | 0.3963 | 0.9038 | 1,563  |
| 2  | 0.2460 | 0.9179 | 3,126  |
| 3  | 0.2175 | 0.9225 | 4,689  |
| 4  | 0.2000 | 0.9276 | 6,252  |
| 5  | 0.1866 | 0.9309 | 7,815  |
| 6  | 0.1763 | 0.9321 | 9,378  |
| 7  | 0.1665 | 0.9343 | 10,941 |
| 8  | 0.1583 | 0.9366 | 12,504 |
| 9  | 0.1507 | 0.9382 | 14,067 |
| 10 | 0.1443 | 0.9390 | 15,630 |

Training:

(Figure: F1 Score over Epochs)

Example Evaluations:

The model was evaluated on 3 randomly selected validation examples:

  1. Example 1

    • Premise: "A group of kids listening to their band instructor, and reading music off their papers."
    • Hypothesis: "The kids are not reading."
    • True Label: Entailment (1)
    • Predicted: Entailment (1)
    • Confidence: 0.9541
  2. Example 2

    • Premise: "A man cooking food on the stove."
    • Hypothesis: "A man is making hot food."
    • True Label: Not Entailment (0)
    • Predicted: Not Entailment (0)
    • Confidence: 0.7855
  3. Example 3

    • Premise: "A baby in an indoor pool is using an inflatable tube on it's on"
    • Hypothesis: "The baby is inside"
    • True Label: Not Entailment (0)
    • Predicted: Entailment (1)
    • Confidence: 0.6960

Part 2: Soft Prompt Tuning

Configuration:

  • Frozen Model: DistilBERT (from Part 1)
  • Prompt Lengths: p=5, 10, 20
  • Learning Rate: 1e-3
  • Epochs: 3

Performance by Configuration:

Soft-5 (3,840 trainable parameters, 66,362,880 frozen)

| Epoch | Train Loss | Train Acc | Val Loss | Val Acc |
|------:|-----------:|----------:|---------:|--------:|
| 1 | 0.1839 | 0.9275 | 0.1599 | 0.9375 |
| 2 | 0.1742 | 0.9321 | 0.1594 | 0.9376 |
| 3 | 0.1711 | 0.9335 | 0.1595 | 0.9373 |

Soft-10 (7,680 trainable parameters, 66,362,880 frozen)

| Epoch | Train Loss | Train Acc | Val Loss | Val Acc |
|------:|-----------:|----------:|---------:|--------:|
| 1 | 0.2168 | 0.9125 | 0.1647 | 0.9379 |
| 2 | 0.1836 | 0.9274 | 0.1613 | 0.9384 |
| 3 | 0.1747 | 0.9319 | 0.1600 | 0.9367 |

Soft-20 (15,360 trainable parameters, 66,362,880 frozen)

| Epoch | Train Loss | Train Acc | Val Loss | Val Acc |
|------:|-----------:|----------:|---------:|--------:|
| 1 | 0.2277 | 0.9075 | 0.1640 | 0.9366 |
| 2 | 0.1805 | 0.9288 | 0.1607 | 0.9372 |
| 3 | 0.1738 | 0.9319 | 0.1600 | 0.9373 |
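
The trainable-parameter counts quoted for the three configurations follow directly from the prompt length times DistilBERT's hidden size of 768:

```python
hidden_dim = 768  # DistilBERT hidden size
counts = {p: p * hidden_dim for p in (5, 10, 20)}
# p=5 -> 3,840; p=10 -> 7,680; p=20 -> 15,360, matching the tables above
```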

(Figure: Prompt Tuning Comparison)

Part 3: Dense Passage Retrieval

Configuration:

  • Model: ELECTRA-small (google/electra-small-discriminator)
  • Contrastive Loss: In-batch negative sampling
  • Embedding Dimension: 256
  • Max Length: 16 tokens

Initial Evaluation (Before Training):

  • Recall@3: 0.125 (12.5%)
  • Mean Reciprocal Rank (MRR): 0.125
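
Both metrics are simple functions of the ranked passage list returned for each question; a minimal sketch (function names are illustrative):

```python
def recall_at_k(ranked_ids, gold_id, k):
    """1.0 if the gold passage appears in the top-k retrieved results, else 0.0."""
    return 1.0 if gold_id in ranked_ids[:k] else 0.0

def reciprocal_rank(ranked_ids, gold_id):
    """1/rank of the gold passage (0.0 if it is not retrieved at all)."""
    for rank, pid in enumerate(ranked_ids, start=1):
        if pid == gold_id:
            return 1.0 / rank
    return 0.0

# Corpus-level Recall@k and MRR are the means of these per-question values.
```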

These untrained numbers serve as a baseline: once trained, the DPR system retrieves relevant passages for questions using learned dense representations. The contrastive learning approach with in-batch negatives enables efficient training without explicitly mined negative examples.

Validation Examples:

Question 1:

  • Title: "What type of beer is best for beer battered fish?"
  • Body: "I was looking for a beer battered fish recipe the other day when I noticed most of the recipes don't state a style of beer to use. Some of the recipes use a significant amount of beer so I assume that some of the flavor profile from the beer will carry over to the fish. So I'm wondering, which style is ideal? Porter? IPA? Maybe a Hefeweizen?"
  • True Answer: "The primary use of beer in a beer batter is its alcohol, which disrupts gluten formation and needs less heat than water to evaporate, improving the texture of the final crust. For flavor, most recipes using beer do best with a malty, low-bitterness beer, like a marzen, scotch ale, or (maybe) amber ale. Highly-hopped 'put hair on your chest' IPAs are a bad idea: you don't want that bitterness. Hefeweizen would be fine."
  • Top Retrieved Passages: Retrieved passages were not directly relevant to the question, indicating the need for training.

Question 2:

  • Title: "When sauteing should I put onion or garlic first?"
  • Body: "Most of the dishes here in the Philippines involved sauteing. But I am a little bit confused on what should I put first, are there any advantages on it?"
  • True Answer: "Onions always benefit from a few minutes on their own to soften and start sweetening. Garlic burns easily, especially when finely chopped or crushed, so in general should not be fried as long as onion. Having said that, when doing a quick stir fry or similar dish, you can throw in the garlic first for 10-20 seconds so that it flavours the oil."
  • Top Retrieved Passages: Retrieved passages were not directly relevant to the question.

Question 3:

  • Title: "How long do unrefrigerated opened canned peppers last?"
  • Body: "I received a couple homemade cans of banana peppers that were canned with a jalapeno in each for some extra heat. I absolutely love the taste of them, but I am curious how long they can last once opened when there isn't access to refrigeration."
  • True Answer: "Given the vinegar, these sound like pickled peppers. Pickled items are usually made to last, even when not refrigerated -- preservation was the original purpose of pickling..."
  • Top Retrieved Passages: One of the top retrieved passages correctly matched the true answer about pickled peppers, demonstrating some retrieval capability even before training.
