<!doctype html>
<html lang="en">

<head>
  <meta charset="UTF-8" />
  <meta name="viewport" content="width=device-width, initial-scale=1" />
  <link rel="preconnect" href="https://fonts.googleapis.com" />
  <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin />
  <link href="https://fonts.googleapis.com/css2?family=JetBrains+Mono:wght@100;400&display=swap" rel="stylesheet" />
  <title>Benchmarks by EvalPlus Team</title>
  <link rel="icon" href="https://images.emojiterra.com/google/noto-emoji/unicode-15/color/1024px/1f9d1-1f4bb.png" />
  <link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/bootstrap@5.0.0/dist/css/bootstrap.min.css" />

  <style>
    body {
      font-family: "JetBrains Mono", monospace;
      background-color: #ffffff;
      color: #000000;
    }

    #content {
      width: 75%;
    }

    .block {
      height: 40%;
      width: 90%;
      max-width: 1000px;
      margin: 20px;
      padding: 20px;
      border: 1px solid #ccc;
      border-radius: 5px;
    }

    @media screen and (max-width: 1400px) {
      body {
        font-size: 1.9vw;
      }

      #content {
        width: 100%;
      }

      h1 {
        font-size: 2em;
      }

      h2 {
        font-size: 1.6em;
      }
    }
  </style>
</head>

<body>
  <div class="container-fluid d-flex flex-column align-items-center gap-3">
    <!-- EvalPlus logo -->
    <img src="assets/evalplus.png" alt="EvalPlus Logo" style="max-width: 180px; width: 20%" class="mt-5" />
    <h1 class="text-nowrap mt-5"><b>Benchmarks @ EvalPlus</b></h1>
    <div class="fs-5">
      <p>
        The EvalPlus team builds <i>high-quality</i> and <i>precise</i> evaluators for understanding
        LLM performance on code-related tasks:
      </p>
    </div>
    <div class="block">
      <h2 class="d-flex flex-row justify-content-center gap-3 text-nowrap">
        🔨 HumanEval+ &amp; MBPP+
      </h2>
      <p>
        HumanEval and MBPP originally came with only a limited number of tests. EvalPlus built
        HumanEval+ and MBPP+ by extending their tests by 80x and 35x, respectively, for more
        rigorous evaluation.
      </p>
      <a class="d-flex flex-row justify-content-center gap-3 text-nowrap" href="leaderboard.html">Go to EvalPlus
        Leaderboard</a>
    </div>
    <div class="block">
      <h2 class="d-flex flex-row justify-content-center gap-3 text-nowrap">
        🚀 EvalPerf: Code Efficiency Evaluation
      </h2>
      <p>
        Building on Differential Performance Evaluation, proposed in our COLM'24 paper, we rigorously
        evaluate the efficiency of LLM-generated code using performance-exercising coding tasks and test inputs.
      </p>
      <a class="d-flex flex-row justify-content-center gap-3 text-nowrap" href="evalperf.html">Go to EvalPerf Leaderboard</a>
    </div>
    <div class="block">
      <h2 class="d-flex flex-row justify-content-center gap-3 text-nowrap">
        📦 RepoQA: Long-Context Code Understanding
      </h2>
      <p>
        Repository understanding is crucial for intelligent code agents. With
        RepoQA, we are designing evaluators for long-context code
        understanding.
      </p>
      <a class="d-flex flex-row justify-content-center gap-3 text-nowrap" href="repoqa.html">Learn about RepoQA</a>
    </div>
  </div>
</body>

</html>