<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta name="description" content="">
<meta name="author" content="">
<link rel="shortcut icon" type="image/x-icon" href="../favicon.ico">
<title>Balancing Speech Understanding and Generation Using Continual Pre-training for Codec-based Speech LLM</title>
<!-- Fonts -->
<link href="https://fonts.googleapis.com/css?family=Abril+Fatface|Open+Sans" rel="stylesheet">
<!-- Bootstrap core CSS -->
<!--<link href="bootstrap/dist/css/bootstrap.min.css" rel="stylesheet">-->
<link href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/css/bootstrap.min.css" rel="stylesheet">
<!-- IE10 viewport hack for Surface/desktop Windows 8 bug -->
<!--<link href="bootstrap/assets/css/ie10-viewport-bug-workaround.css" rel="stylesheet">-->
<link href="https://maxcdn.bootstrapcdn.com/css/ie10-viewport-bug-workaround.css" rel="stylesheet">
<!-- Custom styles for this template -->
<link href="poster.css" rel="stylesheet">
<!-- Just for debugging purposes. Don't actually copy these 2 lines! -->
<!--[if lt IE 9]><script src="bootstrap/assets/js/ie8-responsive-file-warning.js"></script><![endif]-->
<!--<script src="bootstrap/assets/js/ie-emulation-modes-warning.js"></script>-->
<!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries -->
<!--[if lt IE 9]>
<script src="https://oss.maxcdn.com/html5shiv/3.7.3/html5shiv.min.js"></script>
<script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
<![endif]-->
</head>
<body>
<div class="container">
<div class="page-header">
<h1>
Balancing Speech Understanding and Generation Using Continual Pre-training for Codec-based Speech LLM
</h1>
<p class="centered">
Authors: Anonymous
</p>
<p class="centered">
Contact: None
</p>
<p class="centered">In Submission</p>
</div>
<h3>Abstract</h3>
<p>Speech language models have emerged as a versatile framework for both understanding and generating spoken language,
excelling across a broad spectrum of tasks. A key challenge in this field is achieving a balance between generation
quality and understanding capacity. To address this, we explore continual pre-training for speech language models built on
neural codecs. Our experiments continually pre-train a large text language
model on either speech-only or joint speech-text data. We then conduct extensive supervised fine-tuning of the continually
pre-trained models, alongside the original text model, on tasks such as speech recognition, text-to-speech, speech-to-text
translation, and speech-to-speech translation (S2ST). The results indicate that continual pre-training consistently improves the
performance of codec-based speech language models across both understanding and generation tasks, with particularly notable gains on the complex S2ST task.
</p>
<center><img src="codec-llm2.png" height="100%" width="100%"></center>
<h3>Speech-to-speech Demo</h3>
<div style="height: 300px">
<div class="row">
<div class="col-sm-3" style="height: 125px">Source Speech</div>
<div class="col-sm-3" style="height: 125px">Source Text</div>
<div class="col-sm-3" style="height: 125px">Target Translation</div>
<div class="col-sm-3" style="height: 125px">Target Speech</div>
</div>
<div class="row h-auto">
<div class="col-sm-3" style="height: 125px"><audio src="s2st/s2st_source-speech_YOU1000000183_S0000136.wav" controls></audio></div>
<div class="col-sm-3" style="height: 125px">but we may have one when the future and don't get worry about. it'll have one, sometime we will have one and well no one would tell you once that gonna happen it's.</div>
<div class="col-sm-3" style="height: 125px">但我们将来可能会经历一次,不用担心。有时我们会遇到一次,没有人会告诉你它会发生。</div>
<div class="col-sm-3" style="height: 125px"><audio src="s2st/s2st_target-speech_YOU1000000183_S0000136.wav" controls></audio></div>
</div>
<div class="row h-auto">
<div class="col-sm-3" style="height: 125px"><audio src="s2st/s2st_source-speech_YOU1000000184_S0000015.wav" controls></audio></div>
<div class="col-sm-3" style="height: 125px">we we both had a fair amount of experience in real estate and charlie made his early money in real estate um.</div>
<div class="col-sm-3" style="height: 125px">我们在房地产方面都有相当多的经验,查理早期是在房地产行业赚到钱的。</div>
<div class="col-sm-3" style="height: 125px"><audio src="s2st/s2st_target-speech_YOU1000000184_S0000015.wav" controls></audio></div>
</div>
<div class="row h-auto">
<div class="col-sm-3" style="height: 125px"><audio src="s2st/s2st_source-speech_YOU1000000184_S0000050.wav" controls></audio></div>
<div class="col-sm-3" style="height: 125px">by its nature real estate tends to be a very lousy investment for people who are taxed under sub-chapter c of the ah code relating to corporations so the combination of having it.</div>
<div class="col-sm-3" style="height: 125px">就其本质而言,房地产对于那些在与公司相关的法典第C分章下纳税的人来说,是一种非常糟糕的投资组合。</div>
<div class="col-sm-3" style="height: 125px"><audio src="s2st/s2st_target-speech_YOU1000000184_S0000050.wav" controls></audio></div>
</div>
<div class="row h-auto">
<div class="col-sm-3" style="height: 125px"><audio src="s2st/s2st_source-speech_YOU1000000185_S0000016.wav" controls></audio></div>
<div class="col-sm-3" style="height: 125px">what is the gates foundation doing to fight the corona virus and and why is it such a priority for you?</div>
<div class="col-sm-3" style="height: 125px">盖茨基金会在抗击新冠病毒中做了些什么?为什么对你来说这如此重要?</div>
<div class="col-sm-3" style="height: 125px"><audio src="s2st/s2st_target-speech_YOU1000000185_S0000016.wav" controls></audio></div>
</div>
<div class="row h-auto">
<div class="col-sm-3" style="height: 125px"><audio src="s2st/s2st_source-speech_YOU1000000185_S0000065.wav" controls></audio></div>
<div class="col-sm-3" style="height: 125px">and that makes people nervous about this time. so i think the audience will be so pleased to hear your optimism for this right but.</div>
<div class="col-sm-3" style="height: 125px">这让人们对这一次的疫情感到紧张。所以我想观众们会很高兴听到你的乐观态度,但是。</div>
<div class="col-sm-3" style="height: 125px"><audio src="s2st/s2st_target-speech_YOU1000000185_S0000065.wav" controls></audio></div>
</div>
<div class="row h-auto">
<div class="col-sm-3" style="height: 125px"><audio src="s2st/s2st_source-speech_YOU1000000186_S0000032.wav" controls></audio></div>
<div class="col-sm-3" style="height: 125px">and he said that one of the good qualities any n g o should have any person who wants to be part of a social, you know, service should have is that, ah.</div>
<div class="col-sm-3" style="height: 125px">他说任何非政府组织都应该具备的优秀品质之一就是任何想要成为社会一份子的人都应该具备的品质。</div>
<div class="col-sm-3" style="height: 125px"><audio src="s2st/s2st_target-speech_YOU1000000186_S0000032.wav" controls></audio></div>
</div>
<div class="row h-auto">
<div class="col-sm-3" style="height: 125px"><audio src="s2st/s2st_source-speech_YOU1000000186_S0000144.wav" controls></audio></div>
<div class="col-sm-3" style="height: 125px">and you haven't had the twenty or thirty year period of experience where you're really familiar with the territory.</div>
<div class="col-sm-3" style="height: 125px">尽管你对这个领域很熟悉,你没有二、三十年的经验。</div>
<div class="col-sm-3" style="height: 125px"><audio src="s2st/s2st_target-speech_YOU1000000186_S0000144.wav" controls></audio></div>
</div>
<div class="row h-auto">
<div class="col-sm-3" style="height: 125px"><audio src="s2st/s2st_source-speech_YOU1000000187_S0000084.wav" controls></audio></div>
<div class="col-sm-3" style="height: 125px">or maybe what you want is to write script for detective shows. it doesn't really matter you know what matters is to be is to know what you want.</div>
<div class="col-sm-3" style="height: 125px">或者你想要的是为侦探剧写剧本。这并不重要,重要的是知道自己想要什么。</div>
<div class="col-sm-3" style="height: 125px"><audio src="s2st/s2st_target-speech_YOU1000000187_S0000084.wav" controls></audio></div>
</div>
<div class="row h-auto">
<div class="col-sm-3" style="height: 125px"><audio src="s2st/s2st_source-speech_YOU1000000188_S0000040.wav" controls></audio></div>
<div class="col-sm-3" style="height: 125px">number six, night watch. arguably the most famous artwork of the rich museum in amsterdam.</div>
<div class="col-sm-3" style="height: 125px">第六名,《夜巡》。可以说是阿姆斯特丹国立博物馆最著名的艺术品。</div>
<div class="col-sm-3" style="height: 125px"><audio src="s2st/s2st_target-speech_YOU1000000188_S0000040.wav" controls></audio></div>
</div>
<div class="row h-auto">
<div class="col-sm-3" style="height: 125px"><audio src="s2st/s2st_source-speech_YOU1000000189_S0000108.wav" controls></audio></div>
<div class="col-sm-3" style="height: 125px">and we are allowing food industries to spray it with pesticides, to transport it super far, change its enzymes and nutritional benefits.</div>
<div class="col-sm-3" style="height: 125px">我们允许食品工业向食物喷洒杀虫剂,将其运输到很远的地方,改变它的酶和营养价值。</div>
<div class="col-sm-3" style="height: 125px"><audio src="s2st/s2st_target-speech_YOU1000000189_S0000108.wav" controls></audio></div>
</div>
<div class="row h-auto">
<div class="col-sm-3" style="height: 125px"><audio src="s2st/s2st_source-speech_YOU1000000191_S0000002.wav" controls></audio></div>
<div class="col-sm-3" style="height: 125px">videos every tuesday and friday. at odd occasion, we do this sunday video, what is going on? i'm gonna take a couple of polls.</div>
<div class="col-sm-3" style="height: 125px">每周二和周五都更新视频。我们偶尔会在周日更新视频。为何如此呢?我要做几个观众调查。</div>
<div class="col-sm-3" style="height: 125px"><audio src="s2st/s2st_target-speech_YOU1000000191_S0000002.wav" controls></audio></div>
</div>
<div class="row h-auto">
<div class="col-sm-3" style="height: 125px"><audio src="s2st/s2st_source-speech_YOU1000000192_S0000109.wav" controls></audio></div>
<div class="col-sm-3" style="height: 125px">so you what um so starting from the start there you just grab it in threes and it's just.</div>
<div class="col-sm-3" style="height: 125px">所以你从开始的地方把它分成三份,然后它就。</div>
<div class="col-sm-3" style="height: 125px"><audio src="s2st/s2st_target-speech_YOU1000000192_S0000109.wav" controls></audio></div>
</div>
<div class="row h-auto">
<div class="col-sm-3" style="height: 125px"><audio src="s2st/s2st_source-speech_YOU1000000192_S0000113.wav" controls></audio></div>
<div class="col-sm-3" style="height: 125px">and it just, if it fails, just resort to doing a simple a simple bun. just so it isn't so messy just put it up.</div>
<div class="col-sm-3" style="height: 125px">如果失败了,就做一个简单的丸子头。这样就不会太乱了,把它扎起来。</div>
<div class="col-sm-3" style="height: 125px"><audio src="s2st/s2st_target-speech_YOU1000000192_S0000113.wav" controls></audio></div>
</div>
<div class="row h-auto">
<div class="col-sm-3" style="height: 125px"><audio src="s2st/s2st_source-speech_YOU1000000192_S0000236.wav" controls></audio></div>
<div class="col-sm-3" style="height: 125px">you know this carpet's not good for it anyway. though i don't know how to dance so i'll just jump dad.</div>
<div class="col-sm-3" style="height: 125px">你也知道这地毯对它不好。我不会跳舞,所以我就蹦蹦跳跳啦,爸爸。</div>
<div class="col-sm-3" style="height: 125px"><audio src="s2st/s2st_target-speech_YOU1000000192_S0000236.wav" controls></audio></div>
</div>
</div>
</div> <!-- /container -->
<!-- IE10 viewport hack for Surface/desktop Windows 8 bug -->
<!--<script src="bootstrap/assets/js/ie10-viewport-bug-workaround.js"></script>-->
<script src="https://maxcdn.bootstrapcdn.com/js/ie10-viewport-bug-workaround.js"></script>
</body>
</html>