Conversation
…into ensembling_layer
…into ensembling_layer
for more information, see https://pre-commit.ci
…into ensembling_layer
for more information, see https://pre-commit.ci
…into ensembling_layer
elk/metrics/eval.py
Outdated
return {**auroc_dict, **cal_acc_dict, **acc_dict, **cal_dict}


def calc_auroc(y_logits, y_true, ensembling, num_classes):
…into ensembling_layer
for more information, see https://pre-commit.ci
…into ensembling_layer
for more information, see https://pre-commit.ci
lauritowal
left a comment
Tests run forever on my machine. Need to check what is wrong there.
AlexTMallen
left a comment
Mainly just fix the handling of the multidataset case
❯ elk elicit gpt2 imdb amazon_polarity --max_examples 10 300 --debug --num_gpus 1
y_logits_collection.append(y_logits)


# get logits and ground_truth from middle to last layer
middle_index = len(layer_outputs) // 2
In some ways I think we should allow the layers over which we ensemble to be configurable; e.g., sometimes the last layers perform worse.
Yeah, it makes sense to make it configurable. However, I'm curious: how would you decide which layers to pick?
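One possible shape for this, as a minimal sketch: expose the ensembled range as fractions of network depth, with 0.5 to 1.0 reproducing the current middle-to-last behaviour. The function name and the `start_frac`/`end_frac` knobs are hypothetical, not existing elk options.

```python
import torch

def ensemble_layer_range(
    y_logits_per_layer: list[torch.Tensor],
    start_frac: float = 0.5,
    end_frac: float = 1.0,
) -> torch.Tensor:
    """Average logits over a configurable slice of the layer stack.

    start_frac/end_frac are hypothetical config knobs; 0.5 -> 1.0
    matches the current middle-to-last behaviour.
    """
    n_layers = len(y_logits_per_layer)
    lo = int(n_layers * start_frac)
    hi = max(lo + 1, int(n_layers * end_frac))
    # stack the selected layers and average over the layer dimension
    stacked = torch.stack(y_logits_per_layer[lo:hi])
    return stacked.mean(dim=0)
```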
middle_index = len(layer_outputs) // 2
y_logits_stacked = torch.stack(y_logits_collection[middle_index:])
# layer prompt_ensembling of the stacked logits
y_logits_stacked_mean = torch.mean(y_logits_stacked, dim=0)
It seems like the ensembling is done by taking the mean over layers, rather than concatenating. This isn't super clear from the comments/docstrings, and it's hard to tell from reading the code because the tensor shapes aren't commented.
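For illustration, a shape-commented sketch of what the mean-over-layers step appears to do. The (n, v, c) layout (examples × prompt variants × classes) is an assumption, not taken from the PR.

```python
import torch

def ensemble_middle_to_last(y_logits_per_layer: list[torch.Tensor]) -> torch.Tensor:
    """Average logits over the middle-to-last layers.

    Each entry is assumed to have shape (n, v, c): n examples,
    v prompt variants, c classes.
    """
    middle_index = len(y_logits_per_layer) // 2
    # stack the selected layers: (num_layers - middle_index, n, v, c)
    stacked = torch.stack(y_logits_per_layer[middle_index:])
    # mean over the layer dimension, not a concatenation: (n, v, c)
    return stacked.mean(dim=0)
```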
from enum import Enum


class PromptEnsembling(Enum):
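Only the class header is visible in this hunk; as a guess at the intent (the member values below are an assumption, not taken from the diff), the enum might enumerate the prompt-ensembling modes:

```python
from enum import Enum

class PromptEnsembling(Enum):
    # Hypothetical members; only the class header appears in the hunk.
    NONE = "none"
    PARTIAL = "partial"
    FULL = "full"
```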
elk/training/train.py
Outdated
    devices: list[str],
    world_size: int,
-) -> dict[str, pd.DataFrame]:
+) -> tuple[dict[str, pd.DataFrame], list[dict]]:
Same comment here regarding return type
elk/run.py
Outdated
try:
-   for df_dict in tqdm(mapper(func, layers), total=len(layers)):
+   for df_dict, layer_output in tqdm(
    for k, v in df_dict.items():
This doesn't write all the appropriate lines for:
❯ elk elicit gpt2 imdb amazon_polarity --max_examples 10 300 --debug --num_gpus 1
There should be evaluation results for both imdb and amazon_polarity in layer_ensembling_results.csv.
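A minimal sketch of the kind of per-dataset write-out being asked for (the helper name and column layout are assumptions, not elk's actual API): keep a "dataset" field on every row so a multi-dataset run produces lines for each dataset rather than only the first one.

```python
import csv
from pathlib import Path

def write_layer_ensembling_results(rows: list[dict], out_dir: Path) -> None:
    """Write one row per (dataset, metric set) to layer_ensembling_results.csv.

    Each dict is assumed to carry a "dataset" key (e.g. "imdb",
    "amazon_polarity") alongside its metric columns, so multi-dataset
    runs get one line per dataset.
    """
    if not rows:
        return
    fieldnames = list(rows[0].keys())
    with open(out_dir / "layer_ensembling_results.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```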
sorting remove comment
for more information, see https://pre-commit.ci
my fixes for layer ensembling
for more information, see https://pre-commit.ci
fix merge
f3319c1 to 64e762a
Ensembling from mid to last layer