<!doctype html>
<html>
<head>
<meta charset="UTF-8" />
<link rel="preconnect" href="https://fonts.googleapis.com" />
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin />
<link href="https://fonts.googleapis.com/css2?family=JetBrains+Mono:wght@100;400&display=swap" rel="stylesheet" />
<title>Benchmarks by EvalPlus Team</title>
<link rel="icon" href="https://images.emojiterra.com/google/noto-emoji/unicode-15/color/1024px/1f9d1-1f4bb.png" />
<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/bootstrap@5.0.0/dist/css/bootstrap.min.css" />
<style>
body {
font-family: "JetBrains Mono", monospace;
background-color: #ffffff;
color: #000000;
}
#content {
width: 75%;
}
.block {
height: 40%;
width: 90%;
max-width: 1000px;
margin: 20px;
padding: 20px;
border: 1px solid #ccc;
border-radius: 5px;
}
@media screen and (max-width: 1400px) {
body {
font-size: 1.9vw;
}
#content {
width: 100%;
}
h1 {
font-size: 2em;
}
h2 {
font-size: 1.6em;
}
}
</style>
</head>
<body>
<div class="container-fluid d-flex flex-column align-items-center gap-3">
<!-- add an image -->
<img src="assets/evalplus.png" alt="EvalPlus Logo" style="max-width: 180px; width: 20%" class="mt-5" />
<h1 class="text-nowrap mt-5"><b>Benchmarks @ EvalPlus</b></h1>
<div class="fs-5">
<p>
The EvalPlus team aims to build <i>high-quality</i> and <i>precise</i> evaluators to understand
LLM performance on code-related tasks:
</p>
</div>
<div class="block">
<h2 class="d-flex flex-row justify-content-center gap-3 text-nowrap">
🔨 HumanEval+ & MBPP+
</h2>
<p>
HumanEval and MBPP originally shipped with limited tests. EvalPlus built
HumanEval+ and MBPP+ by extending their tests by 80x and 35x, respectively,
for rigorous evaluation.
</p>
<a class="d-flex flex-row justify-content-center gap-3 text-nowrap" href="leaderboard.html">Go to EvalPlus
Leaderboard</a>
</div>
<div class="block">
<h2 class="d-flex flex-row justify-content-center gap-3 text-nowrap">
🚀 EvalPerf: Code Efficiency Evaluation
</h2>
<p>
Based on Differential Performance Evaluation, proposed in our COLM'24 paper, we rigorously
evaluate the efficiency of LLM-generated code using performance-exercising coding tasks and test inputs.
</p>
<a class="d-flex flex-row justify-content-center gap-3 text-nowrap" href="evalperf.html">EvalPerf Leaderboard</a>
</div>
<div class="block">
<h2 class="d-flex flex-row justify-content-center gap-3 text-nowrap">
📦 RepoQA: Long-Context Code Understanding
</h2>
<p>
Repository understanding is crucial for intelligent code agents. With
RepoQA, we are designing evaluators for long-context code
understanding.
</p>
<a class="d-flex flex-row justify-content-center gap-3 text-nowrap" href="repoqa.html">Learn about RepoQA</a>
</div>
</div>
</body>
</html>