<!DOCTYPE html>
<html>
<head>
    <meta charset="UTF-8">
    <title>Thresholding as a Decoding Strategy for LLMs</title>
    <style>
        body {
            font-family: Arial, sans-serif;
            background-color: #f8f8f8;
            margin: 0;
            padding: 0;
        }

        .container {
            max-width: 800px;
            margin: 0 auto;
            padding: 20px;
            background-color: #fff;
            box-shadow: 0 2px 5px rgba(0, 0, 0, 0.1);
        }

        h1 {
            text-align: center;
            color: #333;
        }

        p {
            line-height: 1.5;
        }

        .image-placeholder {
            display: block;
            width: 100%;
            height: 200px;
            background-color: #ccc;
            margin: 15px 0;
            text-align: center;
            line-height: 200px;
        }

        .members strong, .contact strong, .university strong {
            font-weight: bold;
        }

        .members, .contact, .university {
            font-weight: normal;
        }

        .project-img {
            width: 100%;
            height: auto;
            display: block;
            margin: 15px 0;
        }

    </style>
</head>


<body>
    <div class="container">
        <h1>Thresholding as a Decoding Strategy for LLMs</h1>
<p class="members"><strong>Project Members:</strong> Gaurav Sharma, Milind Kesar Thummala, Shikher Srivastava</p>
<p class="contact"><strong>Contact Email:</strong> gauravsharma2024@u.northwestern.edu, milindthummala2023@u.northwestern.edu, shikhersrivastava2023@u.northwestern.edu</p>
<p class="university"><strong>Course:</strong> COMP_SCI_496-0 Special Topics in Computer Science: Generative Deep Models, Northwestern University, Professor Bryan Pardo</p>


        <h2>Introduction</h2>
        <p>This project investigates decoding strategies for Large Language Models (LLMs), with a particular focus on Nucleus Sampling. We analyze its potential shortcomings and propose a new sampling strategy to overcome them. The crux of the project is a dynamic thresholding strategy based on perplexity.</p>

        <h2>Nucleus Sampling: The Current Approach</h2>
        <p>Nucleus Sampling, along with other methods such as Top-K Sampling, has been instrumental in the success of LLMs, but it is far from perfect. This project scrutinizes text generated with Nucleus Sampling to identify its shortcomings. We find that the method can exclude words that still carry significant probability, which can limit the model's creativity.</p>
        <img class="project-img" src="flat_distibution.png" alt="Image showing Nucleus Sampling shortcomings" style="width: 50%; height: auto;">
<img class="project-img" src="peaked_disrtibution.png" alt="Image showing Nucleus Sampling shortcomings" style="width: 50%; height: auto;">

        <p>Visualizations of how Nucleus Sampling can miss words with significant probability.</p>
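        <p>To make the shortcoming concrete, the following is a minimal sketch of the token filter behind Nucleus (top-p) Sampling, applied to a toy flat distribution. The function name <code>nucleus_filter</code>, the toy probabilities, and the value of <code>top_p</code> are illustrative assumptions, not code or data from this project.</p>

```python
import numpy as np

def nucleus_filter(probs, top_p=0.9):
    """Return a boolean mask of the tokens kept by nucleus (top-p) sampling."""
    probs = np.asarray(probs, dtype=float)
    # Token indices sorted from most to least probable (stable for ties).
    order = np.argsort(-probs, kind="stable")
    cumulative = np.cumsum(probs[order])
    # Size of the smallest prefix whose cumulative mass reaches top_p.
    cutoff = int(np.searchsorted(cumulative, top_p)) + 1
    keep = np.zeros(probs.shape, dtype=bool)
    keep[order[:cutoff]] = True
    return keep

# Toy flat distribution (made up for illustration): five equally likely tokens.
flat = np.array([0.2, 0.2, 0.2, 0.2, 0.2])
kept = nucleus_filter(flat, top_p=0.5)
print("kept tokens:", np.flatnonzero(kept))
print("dropped probabilities:", flat[~kept])
```

        <p>In this toy case the filter keeps three tokens and drops two others that are exactly as likely as the ones it kept, which is the kind of omission described above.</p>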

        <h2>Thresholding: A Proposed Solution</h2>
        <p>We present a novel decoding strategy, Threshold-based sampling. The key idea is to apply a threshold to the token probabilities and to sample only from tokens whose probability is higher than this threshold. This approach addresses the issues we found in Nucleus Sampling and Top-K Sampling.</p>
        <img class="project-img" src="threshold_sampling_example.png" alt="Image illustrating Thresholding method" style="width: 50%; height: auto;">
        <p>A demonstration of the Threshold-based sampling strategy, displaying its robustness to skewed distributions.</p>
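        <p>The idea can be sketched in a few lines. This is a hedged illustration rather than the project's implementation: the function name <code>threshold_sample</code>, the default threshold, and the fallback used when no token clears the threshold are all assumptions made for this example.</p>

```python
import numpy as np

def threshold_sample(probs, threshold=0.05, rng=None):
    """Sample a token index from the tokens whose probability exceeds threshold."""
    probs = np.asarray(probs, dtype=float)
    if rng is None:
        rng = np.random.default_rng()
    keep = probs > threshold
    if not keep.any():
        # Assumed fallback: if nothing clears the threshold, keep the top token.
        keep[np.argmax(probs)] = True
    # Zero out rejected tokens and renormalize the rest to sum to 1.
    filtered = np.where(keep, probs, 0.0)
    filtered = filtered / filtered.sum()
    token = int(rng.choice(len(probs), p=filtered))
    return token, filtered

# Toy next-token distribution (made up for illustration).
probs = [0.5, 0.3, 0.15, 0.04, 0.01]
token, filtered = threshold_sample(probs, threshold=0.05,
                                   rng=np.random.default_rng(0))
print(token, filtered)
```

        <p>Unlike a cumulative cutoff, the number of surviving tokens here depends only on how many tokens individually exceed the threshold, so tokens with equal probability are always kept or dropped together.</p>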

        <h2>Selecting the Threshold Value</h2>
        <p>Choosing an appropriate threshold value is crucial. The graphs below show the mean and standard deviation of perplexity as a function of a static threshold. They let us select a static threshold by comparing the perplexity of the generated text with the human perplexity of 12.38 across a range of threshold values, and choosing a value that enhances our model's performance.</p>
        <img class="project-img" src="GraphThresholdvsPerplexity.png" alt="Image showing graphs of average perplexity vs range of threshold values and variance vs range of threshold">
        <p>Graphs showing the relationship between average perplexity, variance, and a range of threshold values.</p>
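        <p>As a rough sketch of the selection step, perplexity can be computed as the exponential of the average negative log-probability of the generated tokens, and a static threshold can then be picked as the candidate whose mean perplexity is closest to the human value of 12.38. The helper names and the perplexity values in the example dictionary are hypothetical; they are not measurements from this project.</p>

```python
import math

def perplexity(token_probs):
    """Perplexity of a sequence, given the probability assigned to each token."""
    log_probs = [math.log(p) for p in token_probs]
    return math.exp(-sum(log_probs) / len(log_probs))

def closest_threshold(ppl_by_threshold, target_ppl=12.38):
    """Return the threshold whose mean perplexity is nearest to target_ppl."""
    return min(ppl_by_threshold,
               key=lambda t: abs(ppl_by_threshold[t] - target_ppl))

# A sequence whose tokens each had probability 0.25 has perplexity 4.
print(perplexity([0.25, 0.25, 0.25, 0.25]))

# Hypothetical mean perplexities per candidate threshold (made-up numbers).
example = {0.01: 20.0, 0.05: 13.0, 0.10: 8.0}
print(closest_threshold(example))
```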

        <h2>Dynamic Threshold Adjustment</h2>
        <p>In addition to static thresholding, we explore adjusting the threshold dynamically, based on the perplexity scores of previously generated words. We found that this method can help keep the perplexity close to the target perplexity, thereby enhancing the coherence and creativity of the generated text.</p>
        <img class="project-img" src="dynamic_thresholding.png" alt="Image demonstrating the logic for dynamic thresholding" style="width: 50%; height: auto;">
        <p>The difference between the average perplexity and the target perplexity can be used to update the current threshold.</p>
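        <p>One possible form of such an update is sketched below. The exact rule is not given in the text, so everything here is an assumption: the function name <code>update_threshold</code>, the proportional update, the <code>step</code> size, the clipping bounds, and the premise that raising the threshold lowers perplexity because fewer low-probability tokens survive.</p>

```python
def update_threshold(threshold, recent_ppl, target_ppl, step=0.001,
                     lo=0.0, hi=1.0):
    """Nudge the threshold toward the target perplexity (assumed rule)."""
    avg_ppl = sum(recent_ppl) / len(recent_ppl)
    # Positive error: recent text is more surprising than the target,
    # so raise the threshold; negative error lowers it.
    error = avg_ppl - target_ppl
    new_threshold = threshold + step * error
    # Keep the threshold inside a valid probability range.
    return min(max(new_threshold, lo), hi)

# Example with the human perplexity of 12.38 as the target and
# made-up perplexities for the most recently generated words.
raised = update_threshold(0.05, [20.0, 22.0], target_ppl=12.38)
lowered = update_threshold(0.05, [5.0], target_ppl=12.38)
print(raised, lowered)
```

        <p>A small <code>step</code> keeps the threshold from oscillating between generated words; a suitable value would have to be tuned empirically.</p>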

        <h2>Results</h2>
        <p>In our experiments, threshold-based sampling yielded more coherent text with less repetition than Nucleus Sampling and Top-K Sampling. Dynamic thresholding shows potential for controlling the perplexity of the generated text but requires further refinement. Overall, our results point to Threshold-based sampling as a promising alternative to current decoding strategies for LLMs.</p>
    </div>


</body>
</html>