Hi, and thank you for your great work!
I was wondering whether the early-exit techniques introduced in the paper can be extended to language modeling, or whether they only apply to classification tasks. I think the main differences are that (1) language modeling has a much larger answer space, with a vocabulary of tens of thousands of tokens, and (2) language models output a probability distribution that is then sampled from. Could it be that the conservative predictions are no longer tight enough when there are so many possible sampling outcomes?
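To make the concern concrete, here is a minimal sketch (my own illustration, not the paper's method) of a per-layer confidence check for next-token prediction, where the "answer space" is the full vocabulary rather than a small label set. All names, sizes, and the threshold are hypothetical.

```python
import torch
import torch.nn.functional as F

vocab_size = 32_000   # typical LM vocabulary, far larger than a classification label set
hidden_dim = 768
num_layers = 12
threshold = 0.9       # hypothetical confidence threshold for exiting early

# Pretend these are the hidden states of one token position after each layer.
hidden_states = [torch.randn(hidden_dim) for _ in range(num_layers)]
# Shared output head projecting a hidden state to next-token logits.
lm_head = torch.nn.Linear(hidden_dim, vocab_size)

exit_layer = num_layers
for layer_idx, h in enumerate(hidden_states, start=1):
    probs = F.softmax(lm_head(h), dim=-1)
    # With tens of thousands of possible outcomes, the top-1 probability is
    # rarely this concentrated at early layers, so the exit condition may
    # fire late or not at all.
    if probs.max().item() >= threshold:
        exit_layer = layer_idx
        break

print(f"Exited at layer {exit_layer} of {num_layers}")
```

My guess is that a calibrated threshold over such a large distribution ends up being very conservative, which is what I was asking about above.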
I see that you have a later work (CALM) that addresses language models by enforcing the early-exit objective during training, but I think the approach used in CATs is more desirable because it is distribution-free and model-agnostic.
Thank you for your time!