From 5b49452b82422ae6a1ee34d7f6406875f31565f7 Mon Sep 17 00:00:00 2001 From: zhangkeliang Date: Mon, 4 Jan 2021 06:42:39 +0000 Subject: [PATCH 1/7] Add Transformer --- Transformer/README.md | 192 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 192 insertions(+) create mode 100644 Transformer/README.md diff --git a/Transformer/README.md b/Transformer/README.md new file mode 100644 index 0000000..17dd287 --- /dev/null +++ b/Transformer/README.md @@ -0,0 +1,192 @@ + +# Paddle Transformer 性能测试 + +此处给出了基于 Paddle 框架实现的 Transformer 任务的训练性能详细测试报告,包括执行环境、Paddle 版本、环境搭建、复现脚本、测试结果和测试日志。 + +相同环境下,其他深度学习框架的 Transformer 训练性能数据测试流程,请参考:[OtherReports](./OtherReports)。 + + +## 目录 +- [一、测试说明](#一测试说明) +- [二、环境介绍](#二环境介绍) + - [1.物理机环境](#1物理机环境) + - [2.Docker 镜像](#2docker-镜像) +- [三、环境搭建](#三环境搭建) +- [四、测试步骤](#四测试步骤) + - [1.单机(单卡、8卡)测试](#1单机单卡8卡测试) + - [2.多机(32卡)测试](#2多机32卡测试) +- [五、测试结果](#五测试结果) + - [1.Paddle训练性能](#1paddle训练性能) + - [2.与业内其它框架对比](#2与业内其它框架对比) +- [六、日志数据](#六日志数据) + - [1.单机(单卡、8卡)日志](#1单机单卡8卡日志) + + + +## 一、测试说明 + +我们统一使用了 **吞吐能力** 作为衡量性能的数据指标。**吞吐能力** 是业界公认的、最主流的框架性能考核指标,它直接体现了框架训练的速度。 + +在测试性能时,我们以 **words/sec** 作为训练期间的吞吐性能。在其它框架中,默认也均采用相同的计算方式。 + +测试中,我们选择如下3个维度,测试吞吐性能: + +- **卡数** + + 本次测试关注1卡、8卡、32卡情况下,模型训练的吞吐性能。选择的物理机是单机8卡配置。 + 因此,1卡、8卡测试在单机下完成。32卡在4台机器下完成。 + +- **FP32/AMP** + + FP32 和 AMP 是业界框架均支持的两种精度训练模式,也是衡量框架性能的混合精度量化训练的重要维度。 + 本次测试分别对 FP32 和 AMP 两种精度模式进行了测试。 + + +- **BatchSize** + + 本次测试,结合各框架具体情况,BatchSize 选用如下: + + | 参数 | PaddlePaddle | NGC PyTorch | + |:-----:|:-----:|:-----:| + | FP32 | 2560 | 2560 | + | AMP | 5120 | 5120 | + +关于其它一些参数的说明: + +- **XLA** + + 本次测试的原则是测试 Transformer 在 Paddle 下的最好性能表现,同时对比其与其它框架最好性能表现的优劣。 + + 因此,对于支持 XLA 的框架,我们默认打开 XLA 模式,以获得该框架最好的吞吐性能数据。 + +- **优化器** + + 在 Transformer 中,各个框架使用的优化器略有不同。NGC PyTorch 均支持 LAMBOptimizer,PaddlePaddle 默认使用的是 AdamOptimizer。LAMBOptimizer 优化器由于支持**梯度聚合策略**,在多机参数更新时,通信开销更低,性能会比原生的 AdamOptimizer 优化器更好一些。 + + 此处我们以各个框架默认使用的优化器为准,并测试模型的吞吐性能。Paddle 后续也会支持性能更优的 LAMBOptimizer 优化器。 + +## 二、环境介绍 +### 
1.物理机环境 + +- 单机(单卡、8卡) + - 系统:CentOS Linux release 7.5.1804 + - GPU:Tesla V100-SXM2-16GB * 8 + - CPU:Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz * 38 + - Driver Version: 450.80.02 + - 内存:432 GB + +- 多机(32卡) + - 系统:CentOS release 6.3 (Final) + - GPU:Tesla V100-SXM2-32GB * 8 + - CPU:Intel(R) Xeon(R) Gold 6271C CPU @ 2.60GHz * 48 + - Driver Version: 450.80.02 + - 内存:502 GB + +### 2.Docker 镜像 + +- **镜像版本**: `hub.baidubce.com/paddlepaddle/paddle-benchmark:cuda10.1-cudnn7-runtime-ubuntu16.04` +- **Paddle 版本**: `develop+613c46bc0745c8069c55686aef4adc775f9e27d1` +- **模型代码**:[PaddleNLP](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP) +- **CUDA 版本**: `10.1` +- **cuDnn 版本**: `7.6.5` + + +## 三、环境搭建 + +各深度学习框架在公开的 GitHub 仓库中给出了详细的 Docker 镜像和构建脚本,具体搭建流程请参考:[此处](./OtherReports)。 + +如下是 Paddle 测试环境的具体搭建流程: + +- **拉取代码** + ```bash + git clone https://github.com/PaddlePaddle/models.git + cd models && git checkout 5b4aef8ecef2c6f9a4ec81652a4138c623a754ba + ``` + + +- **构建镜像** + + ```bash + # 拉取镜像 + docker pull hub.baidubce.com/paddlepaddle/paddle:latest-dev-cuda11.0-cudnn8-gcc82 + + # 创建并进入容器 + nvidia-docker run --name=test_transformer_paddle -it \ + --net=host \ + --shm-size=1g \ + --ulimit memlock=-1 \ + --ulimit stack=67108864 \ + -e NVIDIA_VISIBLE_DEVICES=all \ + -v $PWD:/workspace/models \ + hub.baidubce.com/paddlepaddle/paddle:latest-dev-cuda11.0-cudnn8-gcc82 /bin/bash + ``` + +- **安装依赖** + ```bash + # 安装 PaddleNLP 中依赖库 + pip3.7 install -r PaddleNLP/requirements.txt + # 还需要额外安装两个依赖库 + pip3.7 install attrdict + pip3.7 install seqeval + ``` + +- **准备数据** + + 训练脚本会自动下载数据集到目录 `/root/.paddlenlp/datasets/machine_translation/WMT14ende/WMT14.en-de.tar.gz`。 + + +## 四、测试步骤 + +### 1.单机单卡测试 + +- **FP32 启动命令:** +``` +export CUDA_VISIBLE_DEVICES=0 && nohup python3.7 train.py > ./logs/transformer_bs2560_fp32_gpu1.log 2>&1 & +``` + +需要修改 `../configs/transformer.big.yaml` 的参数 `use_amp = False`。 + +- **AMP 启动命令:** +``` +export CUDA_VISIBLE_DEVICES=0 && nohup python3.7 train.py > 
./logs/transformer_bs5120_fp16_gpu1.log 2>&1 & +``` + +需要修改 `../configs/transformer.big.yaml` 的参数 `use_amp = True`。 + +## 五、测试结果 + +### 1.Paddle训练性能 + +- 训练吞吐率(words/sec)如下: + + |卡数 | FP32(BS=2560) | AMP(BS=5120) | + |:-----:|:-----:|:-----:| + |1 | ~ | ~ | + +### 2.与业内其它框架对比 + +- 说明: + - 同等执行环境下测试 + - 单位:`words/sec` + - BatchSize FP32下统一选择 2560、AMP下统一选择 5120 + + +- FP32测试 + + | 参数 | [PaddlePaddle](./Transformer) | [NGC PyTorch](./Transformer/OtherReports/PyTorch) | + |:-----:|:-----:|:-----:| + | GPU=1,BS=2560 | ~ | ~ | + + +- AMP测试 + + | 参数 | [PaddlePaddle](./Transformer) | [NGC PyTorch](./Transformer/OtherReports/PyTorch) | + |:-----:|:-----:|:-----:| + | GPU=1,BS=5120 | ~ | ~ | + + +## 六、日志数据 +### 1.单机(单卡、8卡)日志 + +- [单卡 bs=2560、FP32](./logs/transformer_bs2560_fp32_gpu1.log) +- [单卡 bs=5120、AMP](./logs/transformer_bs5120_fp16_gpu1.log) From 97a2f9dce01fc3e3f9378f6a231a95fb0097a2f9 Mon Sep 17 00:00:00 2001 From: zhangkeliang Date: Mon, 4 Jan 2021 08:25:01 +0000 Subject: [PATCH 2/7] Add OtherReports --- Transformer/OtherReports/PyTorch/README.md | 158 +++++++++++++++++++++ Transformer/OtherReports/README.md | 4 + 2 files changed, 162 insertions(+) create mode 100644 Transformer/OtherReports/PyTorch/README.md create mode 100644 Transformer/OtherReports/README.md diff --git a/Transformer/OtherReports/PyTorch/README.md b/Transformer/OtherReports/PyTorch/README.md new file mode 100644 index 0000000..ec3a096 --- /dev/null +++ b/Transformer/OtherReports/PyTorch/README.md @@ -0,0 +1,158 @@ + +# NGC PyTorch Transformer 性能复现 + + +此处给出了基于 [NGC PyTorch](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/Translation/Transformer) 实现的 Transformer 任务的详细复现流程,包括执行环境、PyTorch 版本、环境搭建、复现脚本、测试结果和测试日志。 + + +## 目录 +- [一、环境介绍](#一环境介绍) + - [1.物理机环境](#1物理机环境) + - [2.Docker 镜像](#2docker-镜像) +- [二、环境搭建](#二环境搭建) + - [1. 单机单卡环境搭建](#1-单机单卡环境搭建) + - [2. 多机(32卡)环境搭建](#2-多机32卡环境搭建) +- [三、测试步骤](#三测试步骤) + - [1. 单机单卡测试](#1-单机单卡测试) + - [2. 
多机(32卡)测试](#2-多机32卡测试) +- [四、测试结果](#四测试结果) +- [五、日志数据](#五日志数据) + - [1.单机(单卡、8卡)日志](#1单机单卡8卡日志) + + +## 一、环境介绍 + +### 1.物理机环境 + +我们使用了同一个物理机环境,对 [NGC PyTorch](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/Translation/Transformer) 的 Transformer 模型进行了测试,详细物理机配置,见[Paddle Transformer 性能测试](../../README.md#1物理机环境)。 + +### 2.Docker 镜像 + +NGC PyTorch 的代码仓库提供了自动构建 Docker 镜像的 [shell 脚本](https://github.com/NVIDIA/DeepLearningExamples/blob/master/PyTorch/Translation/Transformer/scripts/docker/build.sh),环境信息如下: + +- **镜像版本**: `nvcr.io/nvidia/pytorch:20.06-py3` +- **PyTorch 版本**: `1.6.0a0+9907a3e` +- **CUDA 版本**: `11.0` +- **cuDnn 版本**: `8.0.1` + +## 二、环境搭建 + +### 1. 单机单卡环境搭建 + +我们遵循了 NGC PyTorch 官网提供的 [Quick Start Guide](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/Translation/Transformer#quick-start-guide) 教程搭建了测试环境,主要过程如下: + +- **拉取代码** + + ```bash + git clone https://github.com/NVIDIA/DeepLearningExamples + cd DeepLearningExamples/PyTorch/Translation/Transformer + # 本次测试是在如下版本下完成的: + git checkout 99b1c898cead5603c945721162270c2fe077b4a2 + ``` + +- **构建镜像** + + ```bash + bash scripts/docker/build.sh # 构建镜像 + bash scripts/docker/launch.sh # 启动容器 + ``` + +- **准备数据** + + NGC PyTorch 提供单独的数据下载和预处理脚本 [scripts/run_preprocessing.sh](https://github.com/NVIDIA/DeepLearningExamples/blob/master/PyTorch/Translation/Transformer/scripts/run_preprocessing.sh)。在容器中执行如下命令,可以下载和制作 `WMT14 English-German` 数据集。 + + ```bash + bash scripts/run_preprocessing.sh + ``` + +## 三、测试步骤 + +### 1. 
单机单卡测试 + +- **FP32 训练命令:** + + 若测试单机单卡 batch_size=2560、FP32 的训练性能,执行如下命令: + + ``` + python3.7 train.py /data/wmt14_en_de_joined_dict \ + --arch transformer_wmt_en_de_big_t2t \ + --share-all-embeddings \ + --optimizer adam \ + --adam-betas '(0.9, 0.997)' \ + --adam-eps "1e-9" \ + --clip-norm 0.0 \ + --lr-scheduler inverse_sqrt \ + --warmup-init-lr 0.0 \ + --warmup-updates 4000 \ + --lr 0.0006 \ + --min-lr 0.0 \ + --dropout 0.1 \ + --weight-decay 0.0 \ + --criterion label_smoothed_cross_entropy \ + --label-smoothing 0.1 \ + --max-tokens 5120 \ + --seed 1 \ + --fuse-layer-norm \ + --save-dir ./checkpoints + ``` + +- **FP32 训练命令:** + + 若测试单机单卡 batch_size=5120、AMP O1 的训练性能,执行如下命令: + + ``` + python3.7 train.py /data/wmt14_en_de_joined_dict \ + --arch transformer_wmt_en_de_big_t2t \ + --share-all-embeddings \ + --optimizer adam \ + --adam-betas '(0.9, 0.997)' \ + --adam-eps "1e-9" \ + --clip-norm 0.0 \ + --lr-scheduler inverse_sqrt \ + --warmup-init-lr 0.0 \ + --warmup-updates 4000 \ + --lr 0.0006 \ + --min-lr 0.0 \ + --dropout 0.1 \ + --weight-decay 0.0 \ + --criterion label_smoothed_cross_entropy \ + --label-smoothing 0.1 \ + --max-tokens 5120 \ + --seed 1 \ + --fuse-layer-norm \ + --amp \ + --amp-level O1 \ + --save-dir ./checkpoints + ``` + +## 四、测试结果 + +> 单位: sequences/sec + +|卡数 | FP32(BS=32) | FP32(BS=48) | AMP(BS=64) | AMP(BS=96)| +|:-----:|:-----:|:-----:|:-----:|:-----:| +|1 | 128.53 | 128.92 | 524.48 | 543.76 | +|8 | 999.99 | 995.88 | 4058.34 |4208.12 | +|32 | 3994.1 | 3974.0 | 15941.1 | 16311.6| +|32[W/O AccGrad] | 2836.7 | 3180.0 | 10391.2 | 12061.6| +> 关于batch_size 从32增加到48时,8卡和32卡性能并没有提升的问题,我们反复重测了多次。若了解相关原因,欢迎issue我们。 + +## 五、日志数据 +### 1.单机(单卡、8卡)日志 + +- [单卡 bs=32、FP32](./logs/bert_base_lamb_pretraining.pyt_bert_pretraining_phase1_fp32_bs32_gpu1.log) +- [单卡 bs=48、FP32](./logs/bert_base_lamb_pretraining.pyt_bert_pretraining_phase1_fp32_bs48_gpu1.log) +- [单卡 bs=64、AMP](./logs/bert_base_lamb_pretraining.pyt_bert_pretraining_phase1_fp16_bs64_gpu1.log) +- [单卡 
bs=96、AMP](./logs/bert_base_lamb_pretraining.pyt_bert_pretraining_phase1_fp16_bs96_gpu1.log) +- [8卡 bs=32、FP32](./logs/bert_base_lamb_pretraining.pyt_bert_pretraining_phase1_fp32_bs32_gpu8.log) +- [8卡 bs=48、FP32](./logs/bert_base_lamb_pretraining.pyt_bert_pretraining_phase1_fp32_bs48_gpu8.log) +- [8卡 bs=64、AMP](./logs/bert_base_lamb_pretraining.pyt_bert_pretraining_phase1_fp16_bs64_gpu8.log) +- [8卡 bs=96、AMP](./logs/bert_base_lamb_pretraining.pyt_bert_pretraining_phase1_fp16_bs96_gpu8.log) +- [32卡 bs=32、FP32](./logs/bert_base_lamb_pretraining.pyt_bert_pretraining_phase1_fp32_bs32_gpu32.log) +- [32卡 bs=48、FP32](./logs/bert_base_lamb_pretraining.pyt_bert_pretraining_phase1_fp32_bs48_gpu32.log) +- [32卡 bs=64、AMP](./logs/bert_base_lamb_pretraining.pyt_bert_pretraining_phase1_fp16_bs64_gpu32.log) +- [32卡 bs=96、AMP](./logs/bert_base_lamb_pretraining.pyt_bert_pretraining_phase1_fp16_bs96_gpu32.log) +- [32卡 bs=32、FP32 no GradAcc](./logs/bert_base_lamb_pretraining.pyt_bert_pretraining_phase1_without_gradacc_fp32_bs32_gpu32.log) +- [32卡 bs=48、FP32 no GradAcc](./logs/bert_base_lamb_pretraining.pyt_bert_pretraining_phase1_without_gradacc_fp32_bs48_gpu32.log) +- [32卡 bs=64、AMP no GradAcc](./logs/bert_base_lamb_pretraining.pyt_bert_pretraining_phase1_without_gradacc_fp16_bs64_gpu32.log) +- [32卡 bs=96、AMP no GradAcc](./logs/bert_base_lamb_pretraining.pyt_bert_pretraining_phase1_without_gradacc_fp16_bs96_gpu32.log) diff --git a/Transformer/OtherReports/README.md b/Transformer/OtherReports/README.md new file mode 100644 index 0000000..d6c4067 --- /dev/null +++ b/Transformer/OtherReports/README.md @@ -0,0 +1,4 @@ +# README.md + +以下是业内其它框架在 Transformer 模型下的性能测试报告: +- [NGC PyTorch Transformer 性能复现](./PyTorch/) From 08f1af76c512bdd7629d04ea14bc62748ec150ca Mon Sep 17 00:00:00 2001 From: zhangkeliang Date: Mon, 4 Jan 2021 08:28:01 +0000 Subject: [PATCH 3/7] Updates README --- Transformer/OtherReports/PyTorch/README.md | 18 ++---------------- Transformer/README.md | 2 +- 2 files changed, 
3 insertions(+), 17 deletions(-) diff --git a/Transformer/OtherReports/PyTorch/README.md b/Transformer/OtherReports/PyTorch/README.md index ec3a096..f52492e 100644 --- a/Transformer/OtherReports/PyTorch/README.md +++ b/Transformer/OtherReports/PyTorch/README.md @@ -140,19 +140,5 @@ NGC PyTorch 的代码仓库提供了自动构建 Docker 镜像的的 [shell 脚 ## 五、日志数据 ### 1.单机(单卡、8卡)日志 -- [单卡 bs=32、FP32](./logs/bert_base_lamb_pretraining.pyt_bert_pretraining_phase1_fp32_bs32_gpu1.log) -- [单卡 bs=48、FP32](./logs/bert_base_lamb_pretraining.pyt_bert_pretraining_phase1_fp32_bs48_gpu1.log) -- [单卡 bs=64、AMP](./logs/bert_base_lamb_pretraining.pyt_bert_pretraining_phase1_fp16_bs64_gpu1.log) -- [单卡 bs=96、AMP](./logs/bert_base_lamb_pretraining.pyt_bert_pretraining_phase1_fp16_bs96_gpu1.log) -- [8卡 bs=32、FP32](./logs/bert_base_lamb_pretraining.pyt_bert_pretraining_phase1_fp32_bs32_gpu8.log) -- [8卡 bs=48、FP32](./logs/bert_base_lamb_pretraining.pyt_bert_pretraining_phase1_fp32_bs48_gpu8.log) -- [8卡 bs=64、AMP](./logs/bert_base_lamb_pretraining.pyt_bert_pretraining_phase1_fp16_bs64_gpu8.log) -- [8卡 bs=96、AMP](./logs/bert_base_lamb_pretraining.pyt_bert_pretraining_phase1_fp16_bs96_gpu8.log) -- [32卡 bs=32、FP32](./logs/bert_base_lamb_pretraining.pyt_bert_pretraining_phase1_fp32_bs32_gpu32.log) -- [32卡 bs=48、FP32](./logs/bert_base_lamb_pretraining.pyt_bert_pretraining_phase1_fp32_bs48_gpu32.log) -- [32卡 bs=64、AMP](./logs/bert_base_lamb_pretraining.pyt_bert_pretraining_phase1_fp16_bs64_gpu32.log) -- [32卡 bs=96、AMP](./logs/bert_base_lamb_pretraining.pyt_bert_pretraining_phase1_fp16_bs96_gpu32.log) -- [32卡 bs=32、FP32 no GradAcc](./logs/bert_base_lamb_pretraining.pyt_bert_pretraining_phase1_without_gradacc_fp32_bs32_gpu32.log) -- [32卡 bs=48、FP32 no GradAcc](./logs/bert_base_lamb_pretraining.pyt_bert_pretraining_phase1_without_gradacc_fp32_bs48_gpu32.log) -- [32卡 bs=64、AMP no GradAcc](./logs/bert_base_lamb_pretraining.pyt_bert_pretraining_phase1_without_gradacc_fp16_bs64_gpu32.log) -- [32卡 bs=96、AMP no 
GradAcc](./logs/bert_base_lamb_pretraining.pyt_bert_pretraining_phase1_without_gradacc_fp16_bs96_gpu32.log) +- [单卡 bs=2560、FP32](./logs/transformer.pyt_transformer_fp32_bs2560_gpu1.log) +- [单卡 bs=5120、AMP](./logs/transformer.pyt_transformer_amp_bs5120_gpu1.log) diff --git a/Transformer/README.md b/Transformer/README.md index 17dd287..9610748 100644 --- a/Transformer/README.md +++ b/Transformer/README.md @@ -189,4 +189,4 @@ export CUDA_VISIBLE_DEVICES=0 & nohup python3.7 train.py > ./logs/transformer_bs ### 1.单机(单卡、8卡)日志 - [单卡 bs=2560、FP32](./logs/transformer_bs2560_fp32_gpu1.log) -- [单卡 bs=5120、AMP](./logs/transformer_bs5120_fp16_gpu1.log) +- [单卡 bs=5120、AMP](./logs/transformer_bs5120_amp_gpu1.log) From dce817ae8944d00daa3b0bc70bd8ecd339ad450c Mon Sep 17 00:00:00 2001 From: zhangkeliang Date: Wed, 6 Jan 2021 08:16:35 +0000 Subject: [PATCH 4/7] update README with perf and add logs --- Transformer/OtherReports/PyTorch/README.md | 86 ++++++----- ...mer.pyt_transformer_amp_O2_bs5120_gpu1.log | 134 ++++++++++++++++++ ...ormer.pyt_transformer_fp32_bs2560_gpu1.log | 89 ++++++++++++ 3 files changed, 273 insertions(+), 36 deletions(-) create mode 100644 Transformer/OtherReports/PyTorch/logs/transformer.pyt_transformer_amp_O2_bs5120_gpu1.log create mode 100644 Transformer/OtherReports/PyTorch/logs/transformer.pyt_transformer_fp32_bs2560_gpu1.log diff --git a/Transformer/OtherReports/PyTorch/README.md b/Transformer/OtherReports/PyTorch/README.md index f52492e..74fb834 100644 --- a/Transformer/OtherReports/PyTorch/README.md +++ b/Transformer/OtherReports/PyTorch/README.md @@ -74,7 +74,13 @@ NGC PyTorch 的代码仓库提供了自动构建 Docker 镜像的的 [shell 脚 若测试单机单卡 batch_size=2560、FP32 的训练性能,执行如下命令: ``` - python3.7 train.py /data/wmt14_en_de_joined_dict \ + RESULTS_DIR='/results' + CHECKPOINTS_DIR='/results/checkpoints' + STAT_FILE=${RESULTS_DIR}/run_log.json + mkdir -p $CHECKPOINTS_DIR + + python /workspace/translation/train.py \ + /data/wmt14_en_de_joined_dict \ --arch 
transformer_wmt_en_de_big_t2t \ --share-all-embeddings \ --optimizer adam \ @@ -90,55 +96,63 @@ NGC PyTorch 的代码仓库提供了自动构建 Docker 镜像的的 [shell 脚 --weight-decay 0.0 \ --criterion label_smoothed_cross_entropy \ --label-smoothing 0.1 \ - --max-tokens 5120 \ + --max-tokens 2560 \ --seed 1 \ + --max-epoch 1 \ --fuse-layer-norm \ - --save-dir ./checkpoints + --log-interval 500 \ + --save-dir ${CHECKPOINTS_DIR} \ + --stat-file ${STAT_FILE} \ ``` -- **FP32 训练命令:** +- **AMP O2 训练命令:** - 若测试单机单卡 batch_size=5120、AMP O1 的训练性能,执行如下命令: + 若测试单机单卡 batch_size=5120、AMP O2 的训练性能,执行如下命令: ``` - python3.7 train.py /data/wmt14_en_de_joined_dict \ - --arch transformer_wmt_en_de_big_t2t \ - --share-all-embeddings \ - --optimizer adam \ - --adam-betas '(0.9, 0.997)' \ - --adam-eps "1e-9" \ - --clip-norm 0.0 \ - --lr-scheduler inverse_sqrt \ - --warmup-init-lr 0.0 \ - --warmup-updates 4000 \ - --lr 0.0006 \ - --min-lr 0.0 \ - --dropout 0.1 \ - --weight-decay 0.0 \ - --criterion label_smoothed_cross_entropy \ - --label-smoothing 0.1 \ - --max-tokens 5120 \ - --seed 1 \ - --fuse-layer-norm \ - --amp \ - --amp-level O1 \ - --save-dir ./checkpoints + RESULTS_DIR='/results' + CHECKPOINTS_DIR='/results/checkpoints' + STAT_FILE=${RESULTS_DIR}/run_log.json + mkdir -p $CHECKPOINTS_DIR + + python /workspace/translation/train.py \ + /data/wmt14_en_de_joined_dict \ + --arch transformer_wmt_en_de_big_t2t \ + --share-all-embeddings \ + --optimizer adam \ + --adam-betas '(0.9, 0.997)' \ + --adam-eps "1e-9" \ + --clip-norm 0.0 \ + --lr-scheduler inverse_sqrt \ + --warmup-init-lr 0.0 \ + --warmup-updates 4000 \ + --lr 0.0006 \ + --min-lr 0.0 \ + --dropout 0.1 \ + --weight-decay 0.0 \ + --criterion label_smoothed_cross_entropy \ + --label-smoothing 0.1 \ + --max-tokens 5120 \ + --seed 1 \ + --max-epoch 1 \ + --fuse-layer-norm \ + --amp \ + --amp-level O2 \ + --log-interval 500 \ + --save-dir ${RESULTS_DIR} \ + --stat-file ${STAT_FILE} \ ``` ## 四、测试结果 -> 单位: sequences/sec +> 单位: tokens/sec -|卡数 | FP32(BS=32) | 
FP32(BS=48) | AMP(BS=64) | AMP(BS=96)| -|:-----:|:-----:|:-----:|:-----:|:-----:| -|1 | 128.53 | 128.92 | 524.48 | 543.76 | -|8 | 999.99 | 995.88 | 4058.34 |4208.12 | -|32 | 3994.1 | 3974.0 | 15941.1 | 16311.6| -|32[W/O AccGrad] | 2836.7 | 3180.0 | 10391.2 | 12061.6| -> 关于batch_size 从32增加到48时,8卡和32卡性能并没有提升的问题,我们反复重测了多次。若了解相关原因,欢迎issue我们。 +|卡数 | FP32(BS=2560) | AMP O2(BS=5120) | +|:-----:|:-----:|:-----:| +|1 | 7893.1 | 30523.5 | ## 五、日志数据 ### 1.单机(单卡、8卡)日志 - [单卡 bs=2560、FP32](./logs/transformer.pyt_transformer_fp32_bs2560_gpu1.log) -- [单卡 bs=5120、AMP](./logs/transformer.pyt_transformer_amp_bs5120_gpu1.log) +- [单卡 bs=5120、AMP O2](./logs/transformer.pyt_transformer_amp_O2_bs5120_gpu1.log) diff --git a/Transformer/OtherReports/PyTorch/logs/transformer.pyt_transformer_amp_O2_bs5120_gpu1.log b/Transformer/OtherReports/PyTorch/logs/transformer.pyt_transformer_amp_O2_bs5120_gpu1.log new file mode 100644 index 0000000..58d4ec9 --- /dev/null +++ b/Transformer/OtherReports/PyTorch/logs/transformer.pyt_transformer_amp_O2_bs5120_gpu1.log @@ -0,0 +1,134 @@ +nohup: ignoring input +Namespace(adam_betas='(0.9, 0.997)', adam_eps=1e-09, adaptive_softmax_cutoff=None, amp=True, amp_level='O2', arch='transformer_wmt_en_de_big_t2t', attention_dropout=0.1, beam=4, bpe_codes=None, buffer_size=64, clip_norm=0.0, cpu=False, criterion='label_smoothed_cross_entropy', data='/data/wmt14_en_de_joined_dict', decoder_attention_heads=16, decoder_embed_dim=1024, decoder_embed_path=None, decoder_ffn_embed_dim=4096, decoder_layers=6, decoder_learned_pos=False, decoder_normalize_before=True, device_id=0, distributed_backend='nccl', distributed_init_method=None, distributed_port=-1, distributed_rank=0, distributed_world_size=1, do_sanity_check=False, dropout=0.1, enable_parallel_backward_allred_opt=False, enable_parallel_backward_allred_opt_correctness_check=False, encoder_attention_heads=16, encoder_embed_dim=1024, encoder_embed_path=None, encoder_ffn_embed_dim=4096, encoder_layers=6, 
encoder_learned_pos=False, encoder_normalize_before=True, fp16=False, fuse_dropout_add=False, fuse_layer_norm=True, fuse_relu_dropout=False, gen_subset='test', keep_interval_updates=-1, label_smoothing=0.1, left_pad_source=True, left_pad_target=False, lenpen=1, local_rank=0, log_interval=500, lr=[0.000846], lr_scheduler='inverse_sqrt', lr_shrink=0.1, max_epoch=1, max_len_a=0, max_len_b=200, max_positions=(1024, 1024), max_sentences=None, max_sentences_valid=None, max_source_positions=1024, max_target_positions=1024, max_tokens=5120, max_update=0, min_len=1, min_loss_scale=0.0001, min_lr=0.0, model_overrides='{}', momentum=0.99, nbest=1, no_beamable_mm=False, no_early_stop=False, no_epoch_checkpoints=False, no_save=False, no_token_positional_embeddings=False, num_shards=1, online_eval=False, optimizer='adam', pad_sequence=1, parallel_backward_allred_opt_threshold=0, path=None, prefix_size=0, print_alignment=False, profile=False, profiler_file=None, profiler_steps=100, quiet=False, raw_text=False, relu_dropout=0.1, remove_bpe=None, replace_unk=None, restore_file='checkpoint_last.pt', sampling=False, sampling_temperature=1, sampling_topk=-1, save_dir='/results', save_interval=1, save_interval_updates=0, save_predictions=False, score_reference=False, seed=1, sentence_avg=False, sentencepiece=False, shard_id=0, share_all_embeddings=True, share_decoder_input_output_embed=False, skip_invalid_size_inputs_valid_test=False, source_lang=None, stat_file='/results/run_log.json', target_bleu=0.0, target_lang=None, test_cased_bleu=False, train_subset='train', unkpen=0, unnormalized=False, update_freq=[1], valid_subset='valid', validate_interval=1, warmup_init_lr=0.0, warmup_updates=4000, weight_decay=0.0) +| [en] dictionary: 33712 types +| [de] dictionary: 33712 types +| /data/wmt14_en_de_joined_dict train 4575637 examples +| /data/wmt14_en_de_joined_dict valid 3000 examples +| /data/wmt14_en_de_joined_dict test 3003 examples +| num. 
model params: 210808832 +Selected optimization level O2: FP16 training with FP32 batchnorm and FP32 master weights. + +Defaults for this optimization level are: +enabled : True +opt_level : O2 +cast_model_type : torch.float16 +patch_torch_functions : False +keep_batchnorm_fp32 : True +master_weights : True +loss_scale : dynamic +Processing user overrides (additional kwargs that are not None)... +After processing overrides, optimization options are: +enabled : True +opt_level : O2 +cast_model_type : torch.float16 +patch_torch_functions : False +keep_batchnorm_fp32 : True +master_weights : True +loss_scale : dynamic +| model transformer_wmt_en_de_big_t2t, criterion LabelSmoothedCrossEntropyCriterion +| training on 1 GPUs +| max tokens per GPU = 5120 and max sentences per GPU = None +| Sentences are being padded to multiples of: 1 +| Sentences are being padded to multiples of: 1 +| Sentences are being padded to multiples of: 1 +Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 8192.0 +Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 4096.0 +Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2048.0 +Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1024.0 +Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 512.0 +Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 256.0 +Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 128.0 +Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 64.0 +Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 32.0 +Transformer | epoch 0 | step 500 |avg loss 11.929 |avg tokens 4553.874 |tokens/s 30151.741 |walltime 85.803 | +Gradient overflow. 
Skipping step, loss scaler 0 reducing loss scale to 16.0 +Transformer | epoch 0 | step 1000 |avg loss 10.219 |avg tokens 4542.972 |tokens/s 30034.481 |walltime 161.433 | +Transformer | epoch 0 | step 1500 |avg loss 9.310 |avg tokens 4444.522 |tokens/s 29779.187 |walltime 236.057 | +Transformer | epoch 0 | step 2000 |avg loss 8.666 |avg tokens 4501.520 |tokens/s 30076.303 |walltime 310.892 | +Transformer | epoch 0 | step 2500 |avg loss 7.975 |avg tokens 4560.646 |tokens/s 30385.137 |walltime 385.940 | +Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 16.0 +Transformer | epoch 0 | step 3000 |avg loss 7.589 |avg tokens 4581.066 |tokens/s 30327.427 |walltime 461.466 | +Transformer | epoch 0 | step 3500 |avg loss 7.380 |avg tokens 4509.460 |tokens/s 30196.587 |walltime 536.135 | +Transformer | epoch 0 | step 4000 |avg loss 7.277 |avg tokens 4461.996 |tokens/s 30072.400 |walltime 610.322 | +Transformer | epoch 0 | step 4500 |avg loss 7.163 |avg tokens 4593.226 |tokens/s 30906.309 |walltime 684.631 | +Transformer | epoch 0 | step 5000 |avg loss 7.294 |avg tokens 4504.228 |tokens/s 30265.299 |walltime 759.044 | +Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 16.0 +Transformer | epoch 0 | step 5500 |avg loss 7.283 |avg tokens 4549.098 |tokens/s 30656.508 |walltime 833.238 | +Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 8.0 +Transformer | epoch 0 | step 6000 |avg loss 7.275 |avg tokens 4529.492 |tokens/s 30358.172 |walltime 907.839 | +Transformer | epoch 0 | step 6500 |avg loss 7.408 |avg tokens 4514.662 |tokens/s 30332.363 |walltime 982.259 | +Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 4.0 +Transformer | epoch 0 | step 7000 |avg loss 7.559 |avg tokens 4523.974 |tokens/s 30411.125 |walltime 1056.639 | +Transformer | epoch 0 | step 7500 |avg loss 7.527 |avg tokens 4543.398 |tokens/s 30288.618 |walltime 1131.641 | +Gradient overflow. 
Skipping step, loss scaler 0 reducing loss scale to 2.0 +Transformer | epoch 0 | step 8000 |avg loss 7.543 |avg tokens 4531.322 |tokens/s 30204.047 |walltime 1206.653 | +Transformer | epoch 0 | step 8500 |avg loss 7.681 |avg tokens 4574.306 |tokens/s 30782.811 |walltime 1280.953 | +Transformer | epoch 0 | step 9000 |avg loss 7.736 |avg tokens 4495.478 |tokens/s 30609.895 |walltime 1354.384 | +Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.0 +Transformer | epoch 0 | step 9500 |avg loss 7.786 |avg tokens 4484.618 |tokens/s 30028.078 |walltime 1429.058 | +Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 0.5 +Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 0.25 +Transformer | epoch 0 | step 10000 |avg loss 7.691 |avg tokens 4567.118 |tokens/s 30780.733 |walltime 1503.246 | +Transformer | epoch 0 | step 10500 |avg loss 7.790 |avg tokens 4510.976 |tokens/s 30647.884 |walltime 1576.840 | +Transformer | epoch 0 | step 11000 |avg loss 7.752 |avg tokens 4499.432 |tokens/s 30318.893 |walltime 1651.042 | +Transformer | epoch 0 | step 11500 |avg loss 7.772 |avg tokens 4553.214 |tokens/s 30843.717 |walltime 1724.853 | +Transformer | epoch 0 | step 12000 |avg loss 7.826 |avg tokens 4472.098 |tokens/s 30739.117 |walltime 1797.595 | +Transformer | epoch 0 | step 12500 |avg loss 7.794 |avg tokens 4445.792 |tokens/s 30228.351 |walltime 1871.132 | +Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 0.25 +Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 0.125 +Transformer | epoch 0 | step 13000 |avg loss 7.757 |avg tokens 4550.220 |tokens/s 30678.936 |walltime 1945.291 | +Transformer | epoch 0 | step 13500 |avg loss 7.807 |avg tokens 4484.394 |tokens/s 30476.049 |walltime 2018.863 | +Transformer | epoch 0 | step 14000 |avg loss 7.827 |avg tokens 4520.988 |tokens/s 30552.921 |walltime 2092.850 | +Gradient overflow. 
Skipping step, loss scaler 0 reducing loss scale to 0.0625 +Transformer | epoch 0 | step 14500 |avg loss 7.762 |avg tokens 4521.436 |tokens/s 30523.482 |walltime 2166.914 | +Transformer | epoch 0 | step 15000 |avg loss 7.879 |avg tokens 4516.702 |tokens/s 30947.123 |walltime 2239.889 | +Transformer | epoch 0 | step 15500 |avg loss 7.848 |avg tokens 4499.284 |tokens/s 30559.256 |walltime 2313.505 | +Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 0.03125 +Transformer | epoch 0 | step 16000 |avg loss 7.874 |avg tokens 4557.068 |tokens/s 30914.484 |walltime 2387.209 | +Transformer | epoch 0 | step 16500 |avg loss 7.862 |avg tokens 4477.750 |tokens/s 30376.611 |walltime 2460.913 | +Transformer | epoch 0 | step 17000 |avg loss 7.814 |avg tokens 4606.024 |tokens/s 30842.483 |walltime 2535.583 | +Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 0.015625 +Transformer | epoch 0 | step 17500 |avg loss 7.869 |avg tokens 4479.544 |tokens/s 30338.165 |walltime 2609.410 | +Transformer | epoch 0 | step 18000 |avg loss 7.907 |avg tokens 4480.724 |tokens/s 30495.077 |walltime 2682.876 | +Transformer | epoch 0 | step 18500 |avg loss 7.845 |avg tokens 4512.074 |tokens/s 30558.811 |walltime 2756.702 | +Transformer | epoch 0 | step 19000 |avg loss 7.825 |avg tokens 4545.856 |tokens/s 30906.872 |walltime 2830.244 | +Transformer | epoch 0 | step 19500 |avg loss 7.840 |avg tokens 4546.442 |tokens/s 30527.025 |walltime 2904.710 | +Transformer | epoch 0 | step 20000 |avg loss 7.923 |avg tokens 4496.134 |tokens/s 30482.550 |walltime 2978.459 | +Transformer | epoch 0 | step 20500 |avg loss 7.905 |avg tokens 4519.676 |tokens/s 30679.300 |walltime 3052.119 | +Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 0.015625 +Transformer | epoch 0 | step 21000 |avg loss 7.958 |avg tokens 4509.232 |tokens/s 30261.188 |walltime 3126.624 | +Gradient overflow. 
Skipping step, loss scaler 0 reducing loss scale to 0.0078125 +Transformer | epoch 0 | step 21500 |avg loss 7.983 |avg tokens 4519.686 |tokens/s 30247.938 |walltime 3201.335 | +Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 0.00390625 +Transformer | epoch 0 | step 22000 |avg loss 8.078 |avg tokens 4499.402 |tokens/s 30601.066 |walltime 3274.852 | +Transformer | epoch 0 | step 22500 |avg loss 8.005 |avg tokens 4523.794 |tokens/s 30520.011 |walltime 3348.964 | +Transformer | epoch 0 | step 23000 |avg loss 8.006 |avg tokens 4512.090 |tokens/s 30523.122 |walltime 3422.876 | +Transformer | epoch 0 | step 23500 |avg loss 7.993 |avg tokens 4501.332 |tokens/s 30366.430 |walltime 3496.993 | +Transformer | epoch 0 | step 24000 |avg loss 8.012 |avg tokens 4482.898 |tokens/s 30488.550 |walltime 3570.511 | +Transformer | epoch 0 | step 24500 |avg loss 7.954 |avg tokens 4511.830 |tokens/s 30711.236 |walltime 3643.967 | +Transformer | epoch 0 | step 25000 |avg loss 7.939 |avg tokens 4555.644 |tokens/s 30817.959 |walltime 3717.879 | +Transformer | epoch 0 | step 25500 |avg loss 8.016 |avg tokens 4471.746 |tokens/s 30626.510 |walltime 3790.883 | +Transformer | epoch 0 | step 26000 |avg loss 7.950 |avg tokens 4516.412 |tokens/s 30559.760 |walltime 3864.778 | +Transformer | epoch 0 | step 26500 |avg loss 8.003 |avg tokens 4477.858 |tokens/s 30523.033 |walltime 3938.130 | +Transformer | epoch 0 | step 27000 |avg loss 7.933 |avg tokens 4532.400 |tokens/s 30811.621 |walltime 4011.680 | +Gradient overflow. 
Skipping step, loss scaler 0 reducing loss scale to 0.0078125 +Transformer | epoch 0 | step 27500 |avg loss 7.985 |avg tokens 4518.778 |tokens/s 30663.218 |walltime 4085.365 | +Transformer | epoch 0 | step 28000 |avg loss 7.990 |avg tokens 4587.856 |tokens/s 31275.274 |walltime 4158.711 | +Transformer | epoch 0 | step 28500 |avg loss 8.050 |avg tokens 4421.904 |tokens/s 30080.992 |walltime 4232.211 | +Transformer | epoch 0 | step 29000 |avg loss 8.012 |avg tokens 4549.126 |tokens/s 31214.659 |walltime 4305.079 | +Transformer | epoch 0 | step 29500 |avg loss 7.988 |avg tokens 4546.422 |tokens/s 31030.572 |walltime 4378.336 | +Transformer | epoch 0 | step 30000 |avg loss 8.006 |avg tokens 4524.482 |tokens/s 30744.507 |walltime 4451.918 | +Transformer | epoch 0 | step 30500 |avg loss 8.011 |avg tokens 4540.014 |tokens/s 30637.047 |walltime 4526.012 | +Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 0.0078125 +Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 0.00390625 +Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 0.001953125 +Transformer | epoch 0 | step 31000 |avg loss 8.004 |avg tokens 4498.210 |tokens/s 30299.802 |walltime 4600.240 | +Epoch time: 4661.986679553986 +Transformer | epoch 0 | step 31487 |avg loss 8.005 |avg tokens 4529.889 |tokens/s 30691.527 |walltime 4672.119 | +Validation loss on subset valid: 8.048188442107273 +/workspace/translation/fairseq/sequence_generator.py:376: UserWarning: Integer division of tensors using div or / is deprecated, and in a future release div will perform true division as in Python 3. Use true_divide or floor_divide (// in Python) instead. (Triggered internally at ../aten/src/ATen/native/BinaryOps.cpp:66.) 
+ torch.div(cand_indices, self.vocab_size, out=cand_beams) +| Translated 3000 sentences (98034 tokens) in 66.5s (45.11 sentences/s, 1474.17 tokens/s) +| Eval completed in: 87.92s | UNCASED BLEU 0.70 +| done training in 4765.1 seconds +Transformer | epoch 0 | step RUN |avg loss 8.048 |walltime 4775.538 | diff --git a/Transformer/OtherReports/PyTorch/logs/transformer.pyt_transformer_fp32_bs2560_gpu1.log b/Transformer/OtherReports/PyTorch/logs/transformer.pyt_transformer_fp32_bs2560_gpu1.log new file mode 100644 index 0000000..984a1dd --- /dev/null +++ b/Transformer/OtherReports/PyTorch/logs/transformer.pyt_transformer_fp32_bs2560_gpu1.log @@ -0,0 +1,89 @@ +nohup: ignoring input +Namespace(adam_betas='(0.9, 0.997)', adam_eps=1e-09, adaptive_softmax_cutoff=None, amp=False, amp_level='O1', arch='transformer_wmt_en_de_big_t2t', attention_dropout=0.1, beam=4, bpe_codes=None, buffer_size=64, clip_norm=0.0, cpu=False, criterion='label_smoothed_cross_entropy', data='/data/wmt14_en_de_joined_dict', decoder_attention_heads=16, decoder_embed_dim=1024, decoder_embed_path=None, decoder_ffn_embed_dim=4096, decoder_layers=6, decoder_learned_pos=False, decoder_normalize_before=True, device_id=0, distributed_backend='nccl', distributed_init_method=None, distributed_port=-1, distributed_rank=0, distributed_world_size=1, do_sanity_check=False, dropout=0.1, enable_parallel_backward_allred_opt=False, enable_parallel_backward_allred_opt_correctness_check=False, encoder_attention_heads=16, encoder_embed_dim=1024, encoder_embed_path=None, encoder_ffn_embed_dim=4096, encoder_layers=6, encoder_learned_pos=False, encoder_normalize_before=True, fp16=False, fuse_dropout_add=False, fuse_layer_norm=True, fuse_relu_dropout=False, gen_subset='test', keep_interval_updates=-1, label_smoothing=0.1, left_pad_source=True, left_pad_target=False, lenpen=1, local_rank=0, log_interval=500, lr=[0.0006], lr_scheduler='inverse_sqrt', lr_shrink=0.1, max_epoch=1, max_len_a=0, max_len_b=200, max_positions=(1024, 
1024), max_sentences=None, max_sentences_valid=None, max_source_positions=1024, max_target_positions=1024, max_tokens=2560, max_update=0, min_len=1, min_loss_scale=0.0001, min_lr=0.0, model_overrides='{}', momentum=0.99, nbest=1, no_beamable_mm=False, no_early_stop=False, no_epoch_checkpoints=False, no_save=False, no_token_positional_embeddings=False, num_shards=1, online_eval=False, optimizer='adam', pad_sequence=1, parallel_backward_allred_opt_threshold=0, path=None, prefix_size=0, print_alignment=False, profile=False, profiler_file=None, profiler_steps=100, quiet=False, raw_text=False, relu_dropout=0.1, remove_bpe=None, replace_unk=None, restore_file='checkpoint_last.pt', sampling=False, sampling_temperature=1, sampling_topk=-1, save_dir='/results/checkpoints', save_interval=1, save_interval_updates=0, save_predictions=False, score_reference=False, seed=1, sentence_avg=False, sentencepiece=False, shard_id=0, share_all_embeddings=True, share_decoder_input_output_embed=False, skip_invalid_size_inputs_valid_test=False, source_lang=None, stat_file='/results/run_log.json', target_bleu=0.0, target_lang=None, test_cased_bleu=False, train_subset='train', unkpen=0, unnormalized=False, update_freq=[1], valid_subset='valid', validate_interval=1, warmup_init_lr=0.0, warmup_updates=4000, weight_decay=0.0) +| [en] dictionary: 33712 types +| [de] dictionary: 33712 types +| /data/wmt14_en_de_joined_dict train 4575637 examples +| /data/wmt14_en_de_joined_dict valid 3000 examples +| /data/wmt14_en_de_joined_dict test 3003 examples +| num. 
model params: 210808832 +| NOTICE: your device may support faster training with --amp +| model transformer_wmt_en_de_big_t2t, criterion LabelSmoothedCrossEntropyCriterion +| training on 1 GPUs +| max tokens per GPU = 2560 and max sentences per GPU = None +| Sentences are being padded to multiples of: 1 +| Sentences are being padded to multiples of: 1 +| Sentences are being padded to multiples of: 1 +Transformer | epoch 0 | step 500 |avg loss 12.152 |avg tokens 2195.818 |tokens/s 8289.358 |walltime 142.634 | +Transformer | epoch 0 | step 1000 |avg loss 10.630 |avg tokens 2202.080 |tokens/s 8228.213 |walltime 276.447 | +Transformer | epoch 0 | step 1500 |avg loss 9.917 |avg tokens 2206.962 |tokens/s 8312.206 |walltime 409.201 | +Transformer | epoch 0 | step 2000 |avg loss 9.383 |avg tokens 2171.184 |tokens/s 8177.711 |walltime 541.951 | +Transformer | epoch 0 | step 2500 |avg loss 8.994 |avg tokens 2185.496 |tokens/s 8263.584 |walltime 674.188 | +Transformer | epoch 0 | step 3000 |avg loss 8.660 |avg tokens 2204.280 |tokens/s 8301.811 |walltime 806.947 | +Transformer | epoch 0 | step 3500 |avg loss 8.411 |avg tokens 2195.928 |tokens/s 8263.810 |walltime 939.811 | +Transformer | epoch 0 | step 4000 |avg loss 8.222 |avg tokens 2206.462 |tokens/s 8299.936 |walltime 1072.731 | +Transformer | epoch 0 | step 4500 |avg loss 8.082 |avg tokens 2182.408 |tokens/s 8276.575 |walltime 1204.574 | +Transformer | epoch 0 | step 5000 |avg loss 7.891 |avg tokens 2192.006 |tokens/s 8297.123 |walltime 1336.668 | +Transformer | epoch 0 | step 5500 |avg loss 7.858 |avg tokens 2150.818 |tokens/s 8207.926 |walltime 1467.689 | +Transformer | epoch 0 | step 6000 |avg loss 7.723 |avg tokens 2184.456 |tokens/s 8273.980 |walltime 1599.697 | +Transformer | epoch 0 | step 6500 |avg loss 7.624 |avg tokens 2188.844 |tokens/s 8280.595 |walltime 1731.864 | +Transformer | epoch 0 | step 7000 |avg loss 7.616 |avg tokens 2169.096 |tokens/s 8213.447 |walltime 1863.909 | +Transformer | epoch 0 | step 7500 
|avg loss 7.600 |avg tokens 2200.412 |tokens/s 8328.036 |walltime 1996.018 | +Transformer | epoch 0 | step 8000 |avg loss 7.586 |avg tokens 2179.380 |tokens/s 8275.324 |walltime 2127.697 | +Transformer | epoch 0 | step 8500 |avg loss 7.550 |avg tokens 2201.336 |tokens/s 8306.888 |walltime 2260.198 | +Transformer | epoch 0 | step 9000 |avg loss 7.437 |avg tokens 2186.126 |tokens/s 8228.255 |walltime 2393.040 | +Transformer | epoch 0 | step 9500 |avg loss 7.460 |avg tokens 2194.148 |tokens/s 8258.480 |walltime 2525.883 | +Transformer | epoch 0 | step 10000 |avg loss 7.474 |avg tokens 2192.180 |tokens/s 8247.883 |walltime 2658.776 | +Transformer | epoch 0 | step 10500 |avg loss 7.507 |avg tokens 2149.200 |tokens/s 8153.027 |walltime 2790.580 | +Transformer | epoch 0 | step 11000 |avg loss 7.589 |avg tokens 2169.804 |tokens/s 8250.802 |walltime 2922.070 | +Transformer | epoch 0 | step 11500 |avg loss 7.571 |avg tokens 2169.048 |tokens/s 8224.166 |walltime 3053.941 | +Transformer | epoch 0 | step 12000 |avg loss 7.559 |avg tokens 2196.918 |tokens/s 8321.068 |walltime 3185.950 | +Transformer | epoch 0 | step 12500 |avg loss 7.508 |avg tokens 2182.824 |tokens/s 8205.070 |walltime 3318.967 | +Transformer | epoch 0 | step 13000 |avg loss 7.531 |avg tokens 2203.356 |tokens/s 8294.488 |walltime 3451.788 | +Transformer | epoch 0 | step 13500 |avg loss 7.568 |avg tokens 2217.090 |tokens/s 8380.963 |walltime 3584.057 | +Transformer | epoch 0 | step 14000 |avg loss 7.592 |avg tokens 2166.636 |tokens/s 8187.261 |walltime 3716.374 | +Transformer | epoch 0 | step 14500 |avg loss 7.608 |avg tokens 2170.448 |tokens/s 8227.936 |walltime 3848.270 | +Transformer | epoch 0 | step 15000 |avg loss 7.622 |avg tokens 2201.498 |tokens/s 8309.194 |walltime 3980.743 | +Transformer | epoch 0 | step 15500 |avg loss 7.629 |avg tokens 2192.570 |tokens/s 8152.767 |walltime 4115.211 | +Transformer | epoch 0 | step 16000 |avg loss 7.591 |avg tokens 2207.126 |tokens/s 8229.432 |walltime 4249.311 | 
+Transformer | epoch 0 | step 16500 |avg loss 7.664 |avg tokens 2186.202 |tokens/s 8209.315 |walltime 4382.464 | +Transformer | epoch 0 | step 17000 |avg loss 7.657 |avg tokens 2189.744 |tokens/s 8197.156 |walltime 4516.032 | +Transformer | epoch 0 | step 17500 |avg loss 7.635 |avg tokens 2169.092 |tokens/s 8114.278 |walltime 4649.691 | +Transformer | epoch 0 | step 18000 |avg loss 7.679 |avg tokens 2165.366 |tokens/s 8174.168 |walltime 4782.142 | +Transformer | epoch 0 | step 18500 |avg loss 7.607 |avg tokens 2196.778 |tokens/s 8187.382 |walltime 4916.299 | +Transformer | epoch 0 | step 19000 |avg loss 7.680 |avg tokens 2184.738 |tokens/s 8244.761 |walltime 5048.791 | +Transformer | epoch 0 | step 19500 |avg loss 7.651 |avg tokens 2166.342 |tokens/s 8219.323 |walltime 5180.575 | +Transformer | epoch 0 | step 20000 |avg loss 7.670 |avg tokens 2161.914 |tokens/s 8225.962 |walltime 5311.983 | +Transformer | epoch 0 | step 20500 |avg loss 7.680 |avg tokens 2166.076 |tokens/s 8209.319 |walltime 5443.911 | +Transformer | epoch 0 | step 21000 |avg loss 7.748 |avg tokens 2191.680 |tokens/s 8306.427 |walltime 5575.837 | +Transformer | epoch 0 | step 21500 |avg loss 7.697 |avg tokens 2194.442 |tokens/s 8309.224 |walltime 5707.886 | +Transformer | epoch 0 | step 22000 |avg loss 7.689 |avg tokens 2204.234 |tokens/s 8307.733 |walltime 5840.547 | +Transformer | epoch 0 | step 22500 |avg loss 7.699 |avg tokens 2172.204 |tokens/s 8269.636 |walltime 5971.884 | +Transformer | epoch 0 | step 23000 |avg loss 7.635 |avg tokens 2172.254 |tokens/s 8226.100 |walltime 6103.918 | +Transformer | epoch 0 | step 23500 |avg loss 7.683 |avg tokens 2178.170 |tokens/s 8304.169 |walltime 6235.067 | +Transformer | epoch 0 | step 24000 |avg loss 7.701 |avg tokens 2163.650 |tokens/s 8237.664 |walltime 6366.394 | +Transformer | epoch 0 | step 24500 |avg loss 7.637 |avg tokens 2169.594 |tokens/s 8213.130 |walltime 6498.475 | +Transformer | epoch 0 | step 25000 |avg loss 7.607 |avg tokens 2197.396 
|tokens/s 8290.153 |walltime 6631.005 | +Transformer | epoch 0 | step 25500 |avg loss 7.616 |avg tokens 2205.076 |tokens/s 8286.256 |walltime 6764.061 | +Transformer | epoch 0 | step 26000 |avg loss 7.589 |avg tokens 2215.762 |tokens/s 8329.483 |walltime 6897.069 | +Transformer | epoch 0 | step 26500 |avg loss 7.615 |avg tokens 2203.484 |tokens/s 8313.073 |walltime 7029.600 | +Transformer | epoch 0 | step 27000 |avg loss 7.633 |avg tokens 2177.088 |tokens/s 8257.757 |walltime 7161.421 | +Transformer | epoch 0 | step 27500 |avg loss 7.626 |avg tokens 2186.434 |tokens/s 8254.484 |walltime 7293.860 | +Transformer | epoch 0 | step 28000 |avg loss 7.655 |avg tokens 2194.886 |tokens/s 8279.447 |walltime 7426.410 | +Transformer | epoch 0 | step 28500 |avg loss 7.636 |avg tokens 2194.806 |tokens/s 8327.890 |walltime 7558.184 | +Transformer | epoch 0 | step 29000 |avg loss 7.669 |avg tokens 2164.240 |tokens/s 8250.329 |walltime 7689.345 | +Transformer | epoch 0 | step 29500 |avg loss 7.639 |avg tokens 2199.542 |tokens/s 8324.032 |walltime 7821.465 | +Transformer | epoch 0 | step 30000 |avg loss 7.660 |avg tokens 2167.926 |tokens/s 8250.513 |walltime 7952.847 | +Transformer | epoch 0 | step 30500 |avg loss 7.661 |avg tokens 2195.226 |tokens/s 8314.417 |walltime 8084.860 | +Transformer | epoch 0 | step 31000 |avg loss 7.687 |avg tokens 2180.980 |tokens/s 8291.038 |walltime 8216.386 | +Transformer | epoch 0 | step 31500 |avg loss 7.632 |avg tokens 2180.762 |tokens/s 8259.813 |walltime 8348.397 | +Transformer | epoch 0 | step 32000 |avg loss 7.606 |avg tokens 2193.666 |tokens/s 8318.803 |walltime 8480.246 | +Transformer | epoch 0 | step 32500 |avg loss 7.658 |avg tokens 2165.796 |tokens/s 8270.194 |walltime 8611.186 | +Transformer | epoch 0 | step 33000 |avg loss 7.666 |avg tokens 2182.462 |tokens/s 8276.988 |walltime 8743.025 | +Transformer | epoch 0 | step 33500 |avg loss 7.631 |avg tokens 2200.074 |tokens/s 8322.422 |walltime 8875.203 | +Transformer | epoch 0 | step 34000 
|avg loss 7.577 |avg tokens 2211.412 |tokens/s 8341.737 |walltime 9007.754 | +Transformer | epoch 0 | step 34500 |avg loss 7.618 |avg tokens 2174.824 |tokens/s 8299.537 |walltime 9138.775 | +Transformer | epoch 0 | step 35000 |avg loss 7.602 |avg tokens 2174.564 |tokens/s 8271.678 |walltime 9270.221 | +Transformer | epoch 0 | step 35500 |avg loss 7.679 |avg tokens 2162.148 |tokens/s 8243.202 |walltime 9401.369 | +Transformer | epoch 0 | step 36000 |avg loss 7.601 |avg tokens 2165.980 |tokens/s 8197.896 |walltime 9533.474 | +Transformer | epoch 0 | step 36500 |avg loss 7.654 |avg tokens 2203.624 |tokens/s 8294.220 |walltime 9666.315 | +Transformer | epoch 0 | step 37000 |avg loss 7.662 |avg tokens 2163.496 |tokens/s 8162.756 |walltime 9798.838 | \ No newline at end of file From f30f8961183d2fa56eefd60c72a6f80bf32039ef Mon Sep 17 00:00:00 2001 From: zhangkeliang Date: Wed, 6 Jan 2021 10:36:04 +0000 Subject: [PATCH 5/7] Update pytorch fp32 log --- ...ormer.pyt_transformer_fp32_bs2560_gpu1.log | 67 ++++++++++++++++++- 1 file changed, 66 insertions(+), 1 deletion(-) diff --git a/Transformer/OtherReports/PyTorch/logs/transformer.pyt_transformer_fp32_bs2560_gpu1.log b/Transformer/OtherReports/PyTorch/logs/transformer.pyt_transformer_fp32_bs2560_gpu1.log index 984a1dd..2fbb731 100644 --- a/Transformer/OtherReports/PyTorch/logs/transformer.pyt_transformer_fp32_bs2560_gpu1.log +++ b/Transformer/OtherReports/PyTorch/logs/transformer.pyt_transformer_fp32_bs2560_gpu1.log @@ -86,4 +86,69 @@ Transformer | epoch 0 | step 35000 |avg loss 7.602 |avg tokens 2174.564 |tokens/ Transformer | epoch 0 | step 35500 |avg loss 7.679 |avg tokens 2162.148 |tokens/s 8243.202 |walltime 9401.369 | Transformer | epoch 0 | step 36000 |avg loss 7.601 |avg tokens 2165.980 |tokens/s 8197.896 |walltime 9533.474 | Transformer | epoch 0 | step 36500 |avg loss 7.654 |avg tokens 2203.624 |tokens/s 8294.220 |walltime 9666.315 | -Transformer | epoch 0 | step 37000 |avg loss 7.662 |avg tokens 2163.496 
|tokens/s 8162.756 |walltime 9798.838 | \ No newline at end of file +Transformer | epoch 0 | step 37000 |avg loss 7.662 |avg tokens 2163.496 |tokens/s 8162.756 |walltime 9798.838 | +Transformer | epoch 0 | step 37500 |avg loss 7.597 |avg tokens 2172.708 |tokens/s 8160.810 |walltime 9931.956 | +Transformer | epoch 0 | step 38000 |avg loss 7.569 |avg tokens 2200.082 |tokens/s 8258.275 |walltime 10065.161 | +Transformer | epoch 0 | step 38500 |avg loss 7.595 |avg tokens 2195.128 |tokens/s 8272.491 |walltime 10197.837 | +Transformer | epoch 0 | step 39000 |avg loss 7.565 |avg tokens 2222.478 |tokens/s 8259.310 |walltime 10332.381 | +Transformer | epoch 0 | step 39500 |avg loss 7.607 |avg tokens 2195.140 |tokens/s 8306.001 |walltime 10464.523 | +Transformer | epoch 0 | step 40000 |avg loss 7.575 |avg tokens 2185.690 |tokens/s 8245.382 |walltime 10597.063 | +Transformer | epoch 0 | step 40500 |avg loss 7.563 |avg tokens 2207.220 |tokens/s 8236.402 |walltime 10731.055 | +Transformer | epoch 0 | step 41000 |avg loss 7.560 |avg tokens 2187.070 |tokens/s 8160.796 |walltime 10865.054 | +Transformer | epoch 0 | step 41500 |avg loss 7.597 |avg tokens 2163.030 |tokens/s 8155.508 |walltime 10997.665 | +Transformer | epoch 0 | step 42000 |avg loss 7.563 |avg tokens 2152.882 |tokens/s 8121.644 |walltime 11130.205 | +Transformer | epoch 0 | step 42500 |avg loss 7.549 |avg tokens 2216.850 |tokens/s 8258.616 |walltime 11264.419 | +Transformer | epoch 0 | step 43000 |avg loss 7.590 |avg tokens 2175.198 |tokens/s 8150.641 |walltime 11397.857 | +Transformer | epoch 0 | step 43500 |avg loss 7.576 |avg tokens 2187.446 |tokens/s 8150.600 |walltime 11532.046 | +Transformer | epoch 0 | step 44000 |avg loss 7.541 |avg tokens 2193.696 |tokens/s 8226.812 |walltime 11665.372 | +Transformer | epoch 0 | step 44500 |avg loss 7.548 |avg tokens 2167.230 |tokens/s 8153.264 |walltime 11798.278 | +Transformer | epoch 0 | step 45000 |avg loss 7.520 |avg tokens 2162.548 |tokens/s 8161.142 |walltime 
11930.768 | +Transformer | epoch 0 | step 45500 |avg loss 7.479 |avg tokens 2178.632 |tokens/s 8151.418 |walltime 12064.403 | +Transformer | epoch 0 | step 46000 |avg loss 7.515 |avg tokens 2166.516 |tokens/s 8150.569 |walltime 12197.309 | +Transformer | epoch 0 | step 46500 |avg loss 7.544 |avg tokens 2177.210 |tokens/s 8173.620 |walltime 12330.494 | +Transformer | epoch 0 | step 47000 |avg loss 7.516 |avg tokens 2159.966 |tokens/s 8112.725 |walltime 12463.616 | +Transformer | epoch 0 | step 47500 |avg loss 7.445 |avg tokens 2183.310 |tokens/s 8201.379 |walltime 12596.723 | +Transformer | epoch 0 | step 48000 |avg loss 7.485 |avg tokens 2207.598 |tokens/s 8241.243 |walltime 12730.659 | +Transformer | epoch 0 | step 48500 |avg loss 7.527 |avg tokens 2174.254 |tokens/s 8205.941 |walltime 12863.139 | +Transformer | epoch 0 | step 49000 |avg loss 7.517 |avg tokens 2180.624 |tokens/s 8209.757 |walltime 12995.946 | +Transformer | epoch 0 | step 49500 |avg loss 7.535 |avg tokens 2156.342 |tokens/s 8178.281 |walltime 13127.779 | +Transformer | epoch 0 | step 50000 |avg loss 7.476 |avg tokens 2199.442 |tokens/s 8208.834 |walltime 13261.747 | +Transformer | epoch 0 | step 50500 |avg loss 7.512 |avg tokens 2164.196 |tokens/s 8200.566 |walltime 13393.702 | +Transformer | epoch 0 | step 51000 |avg loss 7.519 |avg tokens 2210.832 |tokens/s 8300.968 |walltime 13526.869 | +Transformer | epoch 0 | step 51500 |avg loss 7.526 |avg tokens 2170.272 |tokens/s 8154.514 |walltime 13659.940 | +Transformer | epoch 0 | step 52000 |avg loss 7.566 |avg tokens 2144.520 |tokens/s 8236.081 |walltime 13790.131 | +Transformer | epoch 0 | step 52500 |avg loss 7.477 |avg tokens 2173.838 |tokens/s 8202.748 |walltime 13922.638 | +Transformer | epoch 0 | step 53000 |avg loss 7.471 |avg tokens 2186.586 |tokens/s 8320.763 |walltime 14054.031 | +Transformer | epoch 0 | step 53500 |avg loss 7.493 |avg tokens 2162.470 |tokens/s 8276.680 |walltime 14184.667 | +Transformer | epoch 0 | step 54000 |avg loss 
7.511 |avg tokens 2185.144 |tokens/s 8339.763 |walltime 14315.675 | +Transformer | epoch 0 | step 54500 |avg loss 7.515 |avg tokens 2181.010 |tokens/s 8323.035 |walltime 14446.698 | +Transformer | epoch 0 | step 55000 |avg loss 7.471 |avg tokens 2164.734 |tokens/s 8169.513 |walltime 14579.186 | +Transformer | epoch 0 | step 55500 |avg loss 7.462 |avg tokens 2175.078 |tokens/s 8194.102 |walltime 14711.908 | +Transformer | epoch 0 | step 56000 |avg loss 7.465 |avg tokens 2165.984 |tokens/s 8260.911 |walltime 14843.007 | +Transformer | epoch 0 | step 56500 |avg loss 7.467 |avg tokens 2200.316 |tokens/s 8346.269 |walltime 14974.821 | +Transformer | epoch 0 | step 57000 |avg loss 7.519 |avg tokens 2158.848 |tokens/s 8250.587 |walltime 15105.651 | +Transformer | epoch 0 | step 57500 |avg loss 7.450 |avg tokens 2168.044 |tokens/s 8260.896 |walltime 15236.874 | +Transformer | epoch 0 | step 58000 |avg loss 7.454 |avg tokens 2158.620 |tokens/s 8225.355 |walltime 15368.092 | +Transformer | epoch 0 | step 58500 |avg loss 7.475 |avg tokens 2188.858 |tokens/s 8303.732 |walltime 15499.891 | +Transformer | epoch 0 | step 59000 |avg loss 7.450 |avg tokens 2168.490 |tokens/s 8143.818 |walltime 15633.029 | +Transformer | epoch 0 | step 59500 |avg loss 7.468 |avg tokens 2155.120 |tokens/s 8118.629 |walltime 15765.755 | +Transformer | epoch 0 | step 60000 |avg loss 7.389 |avg tokens 2186.216 |tokens/s 8323.939 |walltime 15897.077 | +Transformer | epoch 0 | step 60500 |avg loss 7.435 |avg tokens 2178.198 |tokens/s 8281.456 |walltime 16028.587 | +Transformer | epoch 0 | step 61000 |avg loss 7.452 |avg tokens 2154.616 |tokens/s 8227.945 |walltime 16159.520 | +Transformer | epoch 0 | step 61500 |avg loss 7.475 |avg tokens 2174.858 |tokens/s 8334.523 |walltime 16289.993 | +Transformer | epoch 0 | step 62000 |avg loss 7.472 |avg tokens 2162.480 |tokens/s 8133.308 |walltime 16422.933 | +Transformer | epoch 0 | step 62500 |avg loss 7.455 |avg tokens 2164.270 |tokens/s 8110.761 |walltime 
16556.352 | +Transformer | epoch 0 | step 63000 |avg loss 7.449 |avg tokens 2176.958 |tokens/s 8196.144 |walltime 16689.156 | +Transformer | epoch 0 | step 63500 |avg loss 7.462 |avg tokens 2174.884 |tokens/s 8277.476 |walltime 16820.530 | +Transformer | epoch 0 | step 64000 |avg loss 7.412 |avg tokens 2208.194 |tokens/s 8305.859 |walltime 16953.460 | +Transformer | epoch 0 | step 64500 |avg loss 7.463 |avg tokens 2184.822 |tokens/s 8153.564 |walltime 17087.439 | +Transformer | epoch 0 | step 65000 |avg loss 7.402 |avg tokens 2204.512 |tokens/s 8204.052 |walltime 17221.794 | +Epoch time: 17264.9287276268 +Transformer | epoch 0 | step 65198 |avg loss 7.538 |avg tokens 2144.318 |tokens/s 7983.551 |walltime 17274.975 | +Validation loss on subset valid: 7.380476935892614 +/workspace/translation/fairseq/sequence_generator.py:376: UserWarning: Integer division of tensors using div or / is deprecated, and in a future release div will perform true division as in Python 3. Use true_divide or floor_divide (// in Python) instead. (Triggered internally at ../aten/src/ATen/native/BinaryOps.cpp:66.) 
+ torch.div(cand_indices, self.vocab_size, out=cand_beams) +| Translated 3000 sentences (124565 tokens) in 75.7s (39.64 sentences/s, 1645.87 tokens/s) +| Eval completed in: 98.04s | UNCASED BLEU 1.45 +| done training in 17378.6 seconds +Transformer | epoch 0 | step RUN |avg loss 7.380 |walltime 17388.917 | From 4f527b6db6bca5604d7ba0d6780ff12dbd452867 Mon Sep 17 00:00:00 2001 From: zhangkeliang Date: Sun, 10 Jan 2021 11:38:55 +0000 Subject: [PATCH 6/7] update pytorch transformer logs --- ...mer.pyt_transformer_amp_O2_bs5120_gpu1.log | 3251 +++++++- ...ormer.pyt_transformer_fp32_bs2560_gpu1.log | 6665 ++++++++++++++++- 2 files changed, 9690 insertions(+), 226 deletions(-) diff --git a/Transformer/OtherReports/PyTorch/logs/transformer.pyt_transformer_amp_O2_bs5120_gpu1.log b/Transformer/OtherReports/PyTorch/logs/transformer.pyt_transformer_amp_O2_bs5120_gpu1.log index 58d4ec9..046200f 100644 --- a/Transformer/OtherReports/PyTorch/logs/transformer.pyt_transformer_amp_O2_bs5120_gpu1.log +++ b/Transformer/OtherReports/PyTorch/logs/transformer.pyt_transformer_amp_O2_bs5120_gpu1.log @@ -1,5 +1,5 @@ nohup: ignoring input -Namespace(adam_betas='(0.9, 0.997)', adam_eps=1e-09, adaptive_softmax_cutoff=None, amp=True, amp_level='O2', arch='transformer_wmt_en_de_big_t2t', attention_dropout=0.1, beam=4, bpe_codes=None, buffer_size=64, clip_norm=0.0, cpu=False, criterion='label_smoothed_cross_entropy', data='/data/wmt14_en_de_joined_dict', decoder_attention_heads=16, decoder_embed_dim=1024, decoder_embed_path=None, decoder_ffn_embed_dim=4096, decoder_layers=6, decoder_learned_pos=False, decoder_normalize_before=True, device_id=0, distributed_backend='nccl', distributed_init_method=None, distributed_port=-1, distributed_rank=0, distributed_world_size=1, do_sanity_check=False, dropout=0.1, enable_parallel_backward_allred_opt=False, enable_parallel_backward_allred_opt_correctness_check=False, encoder_attention_heads=16, encoder_embed_dim=1024, encoder_embed_path=None, 
encoder_ffn_embed_dim=4096, encoder_layers=6, encoder_learned_pos=False, encoder_normalize_before=True, fp16=False, fuse_dropout_add=False, fuse_layer_norm=True, fuse_relu_dropout=False, gen_subset='test', keep_interval_updates=-1, label_smoothing=0.1, left_pad_source=True, left_pad_target=False, lenpen=1, local_rank=0, log_interval=500, lr=[0.000846], lr_scheduler='inverse_sqrt', lr_shrink=0.1, max_epoch=1, max_len_a=0, max_len_b=200, max_positions=(1024, 1024), max_sentences=None, max_sentences_valid=None, max_source_positions=1024, max_target_positions=1024, max_tokens=5120, max_update=0, min_len=1, min_loss_scale=0.0001, min_lr=0.0, model_overrides='{}', momentum=0.99, nbest=1, no_beamable_mm=False, no_early_stop=False, no_epoch_checkpoints=False, no_save=False, no_token_positional_embeddings=False, num_shards=1, online_eval=False, optimizer='adam', pad_sequence=1, parallel_backward_allred_opt_threshold=0, path=None, prefix_size=0, print_alignment=False, profile=False, profiler_file=None, profiler_steps=100, quiet=False, raw_text=False, relu_dropout=0.1, remove_bpe=None, replace_unk=None, restore_file='checkpoint_last.pt', sampling=False, sampling_temperature=1, sampling_topk=-1, save_dir='/results', save_interval=1, save_interval_updates=0, save_predictions=False, score_reference=False, seed=1, sentence_avg=False, sentencepiece=False, shard_id=0, share_all_embeddings=True, share_decoder_input_output_embed=False, skip_invalid_size_inputs_valid_test=False, source_lang=None, stat_file='/results/run_log.json', target_bleu=0.0, target_lang=None, test_cased_bleu=False, train_subset='train', unkpen=0, unnormalized=False, update_freq=[1], valid_subset='valid', validate_interval=1, warmup_init_lr=0.0, warmup_updates=4000, weight_decay=0.0) +Namespace(adam_betas='(0.9, 0.997)', adam_eps=1e-09, adaptive_softmax_cutoff=None, amp=True, amp_level='O2', arch='transformer_wmt_en_de_big_t2t', attention_dropout=0.1, beam=4, bpe_codes=None, buffer_size=64, clip_norm=0.0, 
cpu=False, criterion='label_smoothed_cross_entropy', data='/data/wmt14_en_de_joined_dict', decoder_attention_heads=16, decoder_embed_dim=1024, decoder_embed_path=None, decoder_ffn_embed_dim=4096, decoder_layers=6, decoder_learned_pos=False, decoder_normalize_before=True, device_id=0, distributed_backend='nccl', distributed_init_method=None, distributed_port=-1, distributed_rank=0, distributed_world_size=1, do_sanity_check=False, dropout=0.1, enable_parallel_backward_allred_opt=False, enable_parallel_backward_allred_opt_correctness_check=False, encoder_attention_heads=16, encoder_embed_dim=1024, encoder_embed_path=None, encoder_ffn_embed_dim=4096, encoder_layers=6, encoder_learned_pos=False, encoder_normalize_before=True, fp16=False, fuse_dropout_add=False, fuse_layer_norm=True, fuse_relu_dropout=False, gen_subset='test', keep_interval_updates=-1, label_smoothing=0.1, left_pad_source=True, left_pad_target=False, lenpen=1, local_rank=0, log_interval=10, lr=[0.0006], lr_scheduler='inverse_sqrt', lr_shrink=0.1, max_epoch=1, max_len_a=0, max_len_b=200, max_positions=(1024, 1024), max_sentences=None, max_sentences_valid=None, max_source_positions=1024, max_target_positions=1024, max_tokens=5120, max_update=0, min_len=1, min_loss_scale=0.0001, min_lr=0.0, model_overrides='{}', momentum=0.99, nbest=1, no_beamable_mm=False, no_early_stop=False, no_epoch_checkpoints=True, no_save=False, no_token_positional_embeddings=False, num_shards=1, online_eval=False, optimizer='adam', pad_sequence=1, parallel_backward_allred_opt_threshold=0, path=None, prefix_size=0, print_alignment=False, profile=False, profiler_file=None, profiler_steps=100, quiet=False, raw_text=False, relu_dropout=0.1, remove_bpe=None, replace_unk=None, restore_file='checkpoint_last.pt', sampling=False, sampling_temperature=1, sampling_topk=-1, save_dir='/results', save_interval=1, save_interval_updates=0, save_predictions=False, score_reference=False, seed=1, sentence_avg=False, sentencepiece=False, shard_id=0, 
share_all_embeddings=True, share_decoder_input_output_embed=False, skip_invalid_size_inputs_valid_test=False, source_lang=None, stat_file='/results/run_log.json', target_bleu=0.0, target_lang=None, test_cased_bleu=False, train_subset='train', unkpen=0, unnormalized=False, update_freq=[1], valid_subset='valid', validate_interval=1, warmup_init_lr=0.0, warmup_updates=4000, weight_decay=0.0) | [en] dictionary: 33712 types | [de] dictionary: 33712 types | /data/wmt14_en_de_joined_dict train 4575637 examples @@ -39,96 +39,3171 @@ Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 512.0 Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 256.0 Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 128.0 Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 64.0 +Transformer | epoch 0 | step 10 |avg loss 16.126 |avg tokens 4746.000 |tokens/s 35038.571 |walltime 12.191 | +Transformer | epoch 0 | step 20 |avg loss 15.770 |avg tokens 4684.900 |tokens/s 33647.558 |walltime 13.583 | +Transformer | epoch 0 | step 30 |avg loss 15.009 |avg tokens 4825.900 |tokens/s 34159.802 |walltime 14.996 | +Transformer | epoch 0 | step 40 |avg loss 14.454 |avg tokens 4474.100 |tokens/s 33178.188 |walltime 16.344 | +Transformer | epoch 0 | step 50 |avg loss 13.947 |avg tokens 4491.300 |tokens/s 31753.365 |walltime 17.759 | +Transformer | epoch 0 | step 60 |avg loss 13.651 |avg tokens 4605.600 |tokens/s 33091.241 |walltime 19.151 | +Transformer | epoch 0 | step 70 |avg loss 13.348 |avg tokens 4797.300 |tokens/s 34942.870 |walltime 20.523 | +Transformer | epoch 0 | step 80 |avg loss 13.280 |avg tokens 4635.400 |tokens/s 34382.411 |walltime 21.872 | +Transformer | epoch 0 | step 90 |avg loss 13.005 |avg tokens 4480.700 |tokens/s 33355.492 |walltime 23.215 | +Transformer | epoch 0 | step 100 |avg loss 12.823 |avg tokens 4863.700 |tokens/s 35096.493 |walltime 24.601 | +Transformer | epoch 0 | step 110 |avg loss 12.749 |avg 
tokens 4102.000 |tokens/s 31155.690 |walltime 25.917 | +Transformer | epoch 0 | step 120 |avg loss 12.487 |avg tokens 4813.100 |tokens/s 34106.496 |walltime 27.329 | +Transformer | epoch 0 | step 130 |avg loss 12.542 |avg tokens 4012.100 |tokens/s 32296.169 |walltime 28.571 | +Transformer | epoch 0 | step 140 |avg loss 12.331 |avg tokens 4501.700 |tokens/s 30281.649 |walltime 30.058 | +Transformer | epoch 0 | step 150 |avg loss 12.247 |avg tokens 4478.300 |tokens/s 32942.441 |walltime 31.417 | +Transformer | epoch 0 | step 160 |avg loss 12.120 |avg tokens 4764.200 |tokens/s 34370.600 |walltime 32.803 | +Transformer | epoch 0 | step 170 |avg loss 12.058 |avg tokens 4419.500 |tokens/s 32407.459 |walltime 34.167 | +Transformer | epoch 0 | step 180 |avg loss 11.794 |avg tokens 4704.800 |tokens/s 33778.244 |walltime 35.560 | +Transformer | epoch 0 | step 190 |avg loss 11.786 |avg tokens 4298.700 |tokens/s 31521.934 |walltime 36.923 | +Transformer | epoch 0 | step 200 |avg loss 11.798 |avg tokens 4392.700 |tokens/s 33238.307 |walltime 38.245 | +Transformer | epoch 0 | step 210 |avg loss 11.835 |avg tokens 4111.300 |tokens/s 31215.326 |walltime 39.562 | +Transformer | epoch 0 | step 220 |avg loss 11.659 |avg tokens 4313.800 |tokens/s 31959.563 |walltime 40.912 | +Transformer | epoch 0 | step 230 |avg loss 11.517 |avg tokens 4865.800 |tokens/s 33261.408 |walltime 42.375 | Gradient overflow. 
Skipping step, loss scaler 0 reducing loss scale to 32.0
-Transformer | epoch 0 | step 500 |avg loss 11.929 |avg tokens 4553.874 |tokens/s 30151.741 |walltime 85.803 |
+Transformer | epoch 0 | step 240 |avg loss 11.529 |avg tokens 4560.800 |tokens/s 33214.053 |walltime 43.748 |
+Transformer | epoch 0 | step 250 |avg loss 11.790 |avg tokens 4366.200 |tokens/s 33111.064 |walltime 45.067 |
+Transformer | epoch 0 | step 260 |avg loss 11.506 |avg tokens 4891.600 |tokens/s 34715.377 |walltime 46.476 |
+Transformer | epoch 0 | step 270 |avg loss 11.312 |avg tokens 4857.800 |tokens/s 33278.178 |walltime 47.935 |
+Transformer | epoch 0 | step 280 |avg loss 11.483 |avg tokens 4607.100 |tokens/s 33665.398 |walltime 49.304 |
+Transformer | epoch 0 | step 290 |avg loss 11.686 |avg tokens 4808.600 |tokens/s 37783.748 |walltime 50.577 |
+Transformer | epoch 0 | step 300 |avg loss 11.469 |avg tokens 3985.000 |tokens/s 30623.014 |walltime 51.878 |
+Transformer | epoch 0 | step 310 |avg loss 11.169 |avg tokens 4752.000 |tokens/s 33436.313 |walltime 53.299 |
+Transformer | epoch 0 | step 320 |avg loss 11.513 |avg tokens 4491.200 |tokens/s 33480.785 |walltime 54.640 |
+Transformer | epoch 0 | step 330 |avg loss 11.302 |avg tokens 4595.600 |tokens/s 33653.251 |walltime 56.006 |
+Transformer | epoch 0 | step 340 |avg loss 11.316 |avg tokens 4527.900 |tokens/s 31440.500 |walltime 57.446 |
+Transformer | epoch 0 | step 350 |avg loss 11.020 |avg tokens 4840.000 |tokens/s 33913.716 |walltime 58.873 |
+Transformer | epoch 0 | step 360 |avg loss 11.511 |avg tokens 4818.300 |tokens/s 35176.149 |walltime 60.243 |
+Transformer | epoch 0 | step 370 |avg loss 11.293 |avg tokens 4466.600 |tokens/s 33062.992 |walltime 61.594 |
+Transformer | epoch 0 | step 380 |avg loss 11.341 |avg tokens 4497.300 |tokens/s 32846.960 |walltime 62.963 |
+Transformer | epoch 0 | step 390 |avg loss 11.247 |avg tokens 4554.000 |tokens/s 32269.633 |walltime 64.374 |
+Transformer | epoch 0 | step 400 |avg loss 11.653 |avg tokens 4280.600 |tokens/s 32310.123 |walltime 65.699 |
+Transformer | epoch 0 | step 410 |avg loss 11.334 |avg tokens 4490.200 |tokens/s 33496.966 |walltime 67.040 |
+Transformer | epoch 0 | step 420 |avg loss 11.104 |avg tokens 4122.700 |tokens/s 30178.140 |walltime 68.406 |
+Transformer | epoch 0 | step 430 |avg loss 11.144 |avg tokens 4399.400 |tokens/s 32006.086 |walltime 69.781 |
+Transformer | epoch 0 | step 440 |avg loss 11.508 |avg tokens 4768.500 |tokens/s 35651.838 |walltime 71.118 |
+Transformer | epoch 0 | step 450 |avg loss 11.424 |avg tokens 3972.100 |tokens/s 31119.198 |walltime 72.394 |
+Transformer | epoch 0 | step 460 |avg loss 10.948 |avg tokens 5023.200 |tokens/s 33595.041 |walltime 73.890 |
+Transformer | epoch 0 | step 470 |avg loss 11.135 |avg tokens 4543.900 |tokens/s 33703.496 |walltime 75.238 |
+Transformer | epoch 0 | step 480 |avg loss 11.207 |avg tokens 4486.900 |tokens/s 32995.963 |walltime 76.598 |
+Transformer | epoch 0 | step 490 |avg loss 11.245 |avg tokens 4767.300 |tokens/s 34667.245 |walltime 77.973 |
+Transformer | epoch 0 | step 500 |avg loss 11.002 |avg tokens 4836.000 |tokens/s 33899.965 |walltime 79.399 |
+Transformer | epoch 0 | step 510 |avg loss 11.269 |avg tokens 4769.600 |tokens/s 35429.579 |walltime 80.746 |
+Transformer | epoch 0 | step 520 |avg loss 11.036 |avg tokens 4485.800 |tokens/s 33690.383 |walltime 82.077 |
+Transformer | epoch 0 | step 530 |avg loss 10.964 |avg tokens 4718.800 |tokens/s 33832.329 |walltime 83.472 |
+Transformer | epoch 0 | step 540 |avg loss 11.074 |avg tokens 4225.400 |tokens/s 32076.307 |walltime 84.789 |
+Transformer | epoch 0 | step 550 |avg loss 11.007 |avg tokens 4537.400 |tokens/s 33559.775 |walltime 86.141 |
+Transformer | epoch 0 | step 560 |avg loss 10.751 |avg tokens 4657.600 |tokens/s 33550.070 |walltime 87.530 |
+Transformer | epoch 0 | step 570 |avg loss 10.834 |avg tokens 4406.300 |tokens/s 31874.870 |walltime 88.912 |
+Transformer | epoch 0 | step 580 |avg loss 10.693 |avg tokens 4371.300 |tokens/s 31265.990 |walltime 90.310 |
+Transformer | epoch 0 | step 590 |avg loss 10.742 |avg tokens 4391.900 |tokens/s 31394.991 |walltime 91.709 |
+Transformer | epoch 0 | step 600 |avg loss 10.627 |avg tokens 4589.800 |tokens/s 33011.008 |walltime 93.099 |
+Transformer | epoch 0 | step 610 |avg loss 10.767 |avg tokens 4798.900 |tokens/s 34847.248 |walltime 94.476 |
+Transformer | epoch 0 | step 620 |avg loss 10.631 |avg tokens 4643.200 |tokens/s 33270.498 |walltime 95.872 |
+Transformer | epoch 0 | step 630 |avg loss 10.842 |avg tokens 4290.200 |tokens/s 31839.011 |walltime 97.219 |
+Transformer | epoch 0 | step 640 |avg loss 10.475 |avg tokens 4483.200 |tokens/s 31661.664 |walltime 98.635 |
+Transformer | epoch 0 | step 650 |avg loss 10.846 |avg tokens 4729.900 |tokens/s 35205.221 |walltime 99.979 |
+Transformer | epoch 0 | step 660 |avg loss 10.238 |avg tokens 4681.700 |tokens/s 34710.452 |walltime 101.328 |
+Transformer | epoch 0 | step 670 |avg loss 10.669 |avg tokens 4824.000 |tokens/s 34783.967 |walltime 102.715 |
+Transformer | epoch 0 | step 680 |avg loss 10.543 |avg tokens 4452.400 |tokens/s 32562.815 |walltime 104.082 |
+Transformer | epoch 0 | step 690 |avg loss 10.600 |avg tokens 4292.800 |tokens/s 31421.308 |walltime 105.448 |
+Transformer | epoch 0 | step 700 |avg loss 10.574 |avg tokens 4044.200 |tokens/s 30816.658 |walltime 106.761 |
+Transformer | epoch 0 | step 710 |avg loss 10.364 |avg tokens 4747.600 |tokens/s 33733.493 |walltime 108.168 |
 Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 16.0
-Transformer | epoch 0 | step 1000 |avg loss 10.219 |avg tokens 4542.972 |tokens/s 30034.481 |walltime 161.433 |
-Transformer | epoch 0 | step 1500 |avg loss 9.310 |avg tokens 4444.522 |tokens/s 29779.187 |walltime 236.057 |
-Transformer | epoch 0 | step 2000 |avg loss 8.666 |avg tokens 4501.520 |tokens/s 30076.303 |walltime 310.892 |
-Transformer | epoch 0 | step 2500 |avg loss 7.975 |avg tokens 4560.646 |tokens/s 30385.137 |walltime 385.940 |
+Transformer | epoch 0 | step 720 |avg loss 10.516 |avg tokens 4380.400 |tokens/s 32143.674 |walltime 109.531 |
+Transformer | epoch 0 | step 730 |avg loss 10.568 |avg tokens 3953.800 |tokens/s 29980.779 |walltime 110.849 |
+Transformer | epoch 0 | step 740 |avg loss 10.555 |avg tokens 4533.800 |tokens/s 33274.287 |walltime 112.212 |
+Transformer | epoch 0 | step 750 |avg loss 10.457 |avg tokens 4559.400 |tokens/s 32721.335 |walltime 113.605 |
+Transformer | epoch 0 | step 760 |avg loss 10.249 |avg tokens 4836.800 |tokens/s 33730.740 |walltime 115.039 |
+Transformer | epoch 0 | step 770 |avg loss 10.343 |avg tokens 4763.100 |tokens/s 34194.955 |walltime 116.432 |
+Transformer | epoch 0 | step 780 |avg loss 10.395 |avg tokens 4412.300 |tokens/s 33118.190 |walltime 117.765 |
+Transformer | epoch 0 | step 790 |avg loss 10.527 |avg tokens 4735.200 |tokens/s 34774.233 |walltime 119.126 |
+Transformer | epoch 0 | step 800 |avg loss 10.108 |avg tokens 4525.600 |tokens/s 32368.493 |walltime 120.524 |
+Transformer | epoch 0 | step 810 |avg loss 10.239 |avg tokens 4537.300 |tokens/s 33175.856 |walltime 121.892 |
+Transformer | epoch 0 | step 820 |avg loss 10.265 |avg tokens 4601.600 |tokens/s 33712.621 |walltime 123.257 |
+Transformer | epoch 0 | step 830 |avg loss 10.208 |avg tokens 4701.700 |tokens/s 33460.452 |walltime 124.662 |
+Transformer | epoch 0 | step 840 |avg loss 10.351 |avg tokens 4362.600 |tokens/s 32459.191 |walltime 126.006 |
+Transformer | epoch 0 | step 850 |avg loss 10.125 |avg tokens 4365.300 |tokens/s 32250.124 |walltime 127.360 |
+Transformer | epoch 0 | step 860 |avg loss 9.974 |avg tokens 4655.900 |tokens/s 33270.347 |walltime 128.759 |
+Transformer | epoch 0 | step 870 |avg loss 10.121 |avg tokens 4686.300 |tokens/s 33791.857 |walltime 130.146 |
+Transformer | epoch 0 | step 880 |avg loss 10.011 |avg tokens 4841.900 |tokens/s 35836.069 |walltime 131.497 |
+Transformer | epoch 0 | step 890 |avg loss 10.374 |avg tokens 4208.500 |tokens/s 32453.906 |walltime 132.794 |
+Transformer | epoch 0 | step 900 |avg loss 10.275 |avg tokens 4439.500 |tokens/s 32694.333 |walltime 134.152 |
+Transformer | epoch 0 | step 910 |avg loss 9.973 |avg tokens 4735.800 |tokens/s 33914.537 |walltime 135.548 |
+Transformer | epoch 0 | step 920 |avg loss 10.000 |avg tokens 4521.000 |tokens/s 32537.690 |walltime 136.938 |
+Transformer | epoch 0 | step 930 |avg loss 10.111 |avg tokens 4485.700 |tokens/s 33455.363 |walltime 138.279 |
+Transformer | epoch 0 | step 940 |avg loss 9.921 |avg tokens 4760.800 |tokens/s 34896.032 |walltime 139.643 |
+Transformer | epoch 0 | step 950 |avg loss 10.233 |avg tokens 3946.700 |tokens/s 31071.759 |walltime 140.913 |
+Transformer | epoch 0 | step 960 |avg loss 10.103 |avg tokens 4745.200 |tokens/s 34734.370 |walltime 142.279 |
+Transformer | epoch 0 | step 970 |avg loss 9.959 |avg tokens 4967.000 |tokens/s 35530.255 |walltime 143.677 |
+Transformer | epoch 0 | step 980 |avg loss 9.936 |avg tokens 4290.400 |tokens/s 31348.897 |walltime 145.046 |
+Transformer | epoch 0 | step 990 |avg loss 9.988 |avg tokens 4642.100 |tokens/s 35094.220 |walltime 146.368 |
+Transformer | epoch 0 | step 1000 |avg loss 9.754 |avg tokens 4810.900 |tokens/s 34290.020 |walltime 147.771 |
+Transformer | epoch 0 | step 1010 |avg loss 10.000 |avg tokens 3619.400 |tokens/s 29283.656 |walltime 149.007 |
+Transformer | epoch 0 | step 1020 |avg loss 9.645 |avg tokens 4821.800 |tokens/s 33524.869 |walltime 150.446 |
+Transformer | epoch 0 | step 1030 |avg loss 9.921 |avg tokens 4113.200 |tokens/s 31635.296 |walltime 151.746 |
+Transformer | epoch 0 | step 1040 |avg loss 9.563 |avg tokens 4625.200 |tokens/s 32158.962 |walltime 153.184 |
+Transformer | epoch 0 | step 1050 |avg loss 9.873 |avg tokens 4734.700 |tokens/s 35668.132 |walltime 154.512 |
+Transformer | epoch 0 | step 1060 |avg loss 9.663 |avg tokens 4226.700 |tokens/s 31673.123 |walltime 155.846 |
+Transformer | epoch 0 | step 1070 |avg loss 9.932 |avg tokens 4131.800 |tokens/s 30654.116 |walltime 157.194 |
+Transformer | epoch 0 | step 1080 |avg loss 9.789 |avg tokens 4392.200 |tokens/s 33293.204 |walltime 158.513 |
+Transformer | epoch 0 | step 1090 |avg loss 9.908 |avg tokens 4145.900 |tokens/s 31073.521 |walltime 159.847 |
+Transformer | epoch 0 | step 1100 |avg loss 9.438 |avg tokens 4239.500 |tokens/s 30819.597 |walltime 161.223 |
+Transformer | epoch 0 | step 1110 |avg loss 9.591 |avg tokens 4679.700 |tokens/s 33224.365 |walltime 162.632 |
+Transformer | epoch 0 | step 1120 |avg loss 9.626 |avg tokens 4609.100 |tokens/s 33207.713 |walltime 164.020 |
+Transformer | epoch 0 | step 1130 |avg loss 9.505 |avg tokens 4714.600 |tokens/s 34218.349 |walltime 165.397 |
+Transformer | epoch 0 | step 1140 |avg loss 9.519 |avg tokens 4256.700 |tokens/s 31090.349 |walltime 166.766 |
+Transformer | epoch 0 | step 1150 |avg loss 9.939 |avg tokens 4098.100 |tokens/s 31426.236 |walltime 168.070 |
+Transformer | epoch 0 | step 1160 |avg loss 9.697 |avg tokens 3743.600 |tokens/s 28866.791 |walltime 169.367 |
+Transformer | epoch 0 | step 1170 |avg loss 9.763 |avg tokens 4452.500 |tokens/s 33579.217 |walltime 170.693 |
+Transformer | epoch 0 | step 1180 |avg loss 9.520 |avg tokens 4444.200 |tokens/s 32469.044 |walltime 172.062 |
+Transformer | epoch 0 | step 1190 |avg loss 9.310 |avg tokens 4816.700 |tokens/s 34174.796 |walltime 173.472 |
+Transformer | epoch 0 | step 1200 |avg loss 9.557 |avg tokens 4153.500 |tokens/s 30926.372 |walltime 174.815 |
+Transformer | epoch 0 | step 1210 |avg loss 9.369 |avg tokens 4412.700 |tokens/s 33176.485 |walltime 176.145 |
+Transformer | epoch 0 | step 1220 |avg loss 9.591 |avg tokens 4489.100 |tokens/s 32404.789 |walltime 177.530 |
+Transformer | epoch 0 | step 1230 |avg loss 9.561 |avg tokens 4406.300 |tokens/s 32029.135 |walltime 178.906 |
+Transformer | epoch 0 | step 1240 |avg loss 9.533 |avg tokens 4218.800 |tokens/s 31931.545 |walltime 180.227 |
+Transformer | epoch 0 | step 1250 |avg loss 9.580 |avg tokens 4411.800 |tokens/s 33438.106 |walltime 181.546 |
+Transformer | epoch 0 | step 1260 |avg loss 9.284 |avg tokens 4450.000 |tokens/s 30799.020 |walltime 182.991 |
+Transformer | epoch 0 | step 1270 |avg loss 9.537 |avg tokens 4509.000 |tokens/s 32194.169 |walltime 184.392 |
+Transformer | epoch 0 | step 1280 |avg loss 9.341 |avg tokens 4760.800 |tokens/s 33415.374 |walltime 185.816 |
+Transformer | epoch 0 | step 1290 |avg loss 9.461 |avg tokens 4139.700 |tokens/s 30606.113 |walltime 187.169 |
+Transformer | epoch 0 | step 1300 |avg loss 9.417 |avg tokens 4875.700 |tokens/s 33805.278 |walltime 188.611 |
+Transformer | epoch 0 | step 1310 |avg loss 9.421 |avg tokens 4744.100 |tokens/s 34161.620 |walltime 190.000 |
+Transformer | epoch 0 | step 1320 |avg loss 9.297 |avg tokens 4128.400 |tokens/s 30919.043 |walltime 191.335 |
+Transformer | epoch 0 | step 1330 |avg loss 9.529 |avg tokens 4701.300 |tokens/s 32351.164 |walltime 192.788 |
+Transformer | epoch 0 | step 1340 |avg loss 9.589 |avg tokens 4167.600 |tokens/s 30553.102 |walltime 194.153 |
+Transformer | epoch 0 | step 1350 |avg loss 8.984 |avg tokens 4708.800 |tokens/s 31807.418 |walltime 195.633 |
+Transformer | epoch 0 | step 1360 |avg loss 9.444 |avg tokens 4692.900 |tokens/s 33388.342 |walltime 197.038 |
+Transformer | epoch 0 | step 1370 |avg loss 9.460 |avg tokens 4811.200 |tokens/s 33887.145 |walltime 198.458 |
+Transformer | epoch 0 | step 1380 |avg loss 9.316 |avg tokens 4095.900 |tokens/s 30688.544 |walltime 199.793 |
+Transformer | epoch 0 | step 1390 |avg loss 9.344 |avg tokens 4153.500 |tokens/s 30052.989 |walltime 201.175 |
+Transformer | epoch 0 | step 1400 |avg loss 9.353 |avg tokens 4743.700 |tokens/s 33401.491 |walltime 202.595 |
+Transformer | epoch 0 | step 1410 |avg loss 9.674 |avg tokens 4357.600 |tokens/s 33703.471 |walltime 203.888 |
+Transformer | epoch 0 | step 1420 |avg loss 8.834 |avg tokens 4777.600 |tokens/s 33571.876 |walltime 205.311 |
+Transformer | epoch 0 | step 1430 |avg loss 9.224 |avg tokens 4613.700 |tokens/s 32664.391 |walltime 206.724 |
+Transformer | epoch 0 | step 1440 |avg loss 9.414 |avg tokens 4474.100 |tokens/s 32707.039 |walltime 208.092 |
+Transformer | epoch 0 | step 1450 |avg loss 9.315 |avg tokens 4552.100 |tokens/s 33737.404 |walltime 209.441 |
+Transformer | epoch 0 | step 1460 |avg loss 9.309 |avg tokens 4805.200 |tokens/s 34512.656 |walltime 210.833 |
+Transformer | epoch 0 | step 1470 |avg loss 9.169 |avg tokens 4808.300 |tokens/s 33589.590 |walltime 212.265 |
+Transformer | epoch 0 | step 1480 |avg loss 9.134 |avg tokens 4686.200 |tokens/s 31567.958 |walltime 213.749 |
+Transformer | epoch 0 | step 1490 |avg loss 9.217 |avg tokens 4018.100 |tokens/s 28899.444 |walltime 215.140 |
+Transformer | epoch 0 | step 1500 |avg loss 9.361 |avg tokens 4492.800 |tokens/s 32461.990 |walltime 216.524 |
+Transformer | epoch 0 | step 1510 |avg loss 9.389 |avg tokens 4327.400 |tokens/s 32516.897 |walltime 217.854 |
+Transformer | epoch 0 | step 1520 |avg loss 8.971 |avg tokens 4669.500 |tokens/s 33150.063 |walltime 219.263 |
+Transformer | epoch 0 | step 1530 |avg loss 8.802 |avg tokens 4531.100 |tokens/s 32868.701 |walltime 220.642 |
+Transformer | epoch 0 | step 1540 |avg loss 9.508 |avg tokens 4215.400 |tokens/s 31847.129 |walltime 221.965 |
+Transformer | epoch 0 | step 1550 |avg loss 9.310 |avg tokens 4473.100 |tokens/s 32664.238 |walltime 223.335 |
+Transformer | epoch 0 | step 1560 |avg loss 8.898 |avg tokens 4819.700 |tokens/s 35423.473 |walltime 224.695 |
+Transformer | epoch 0 | step 1570 |avg loss 8.805 |avg tokens 4379.700 |tokens/s 31692.652 |walltime 226.077 |
+Transformer | epoch 0 | step 1580 |avg loss 8.789 |avg tokens 4669.200 |tokens/s 33448.639 |walltime 227.473 |
+Transformer | epoch 0 | step 1590 |avg loss 9.126 |avg tokens 4354.100 |tokens/s 32170.916 |walltime 228.827 |
+Transformer | epoch 0 | step 1600 |avg loss 9.401 |avg tokens 3894.400 |tokens/s 30593.178 |walltime 230.099 |
+Transformer | epoch 0 | step 1610 |avg loss 9.375 |avg tokens 4300.400 |tokens/s 32759.204 |walltime 231.412 |
+Transformer | epoch 0 | step 1620 |avg loss 9.007 |avg tokens 4232.900 |tokens/s 31383.322 |walltime 232.761 |
+Transformer | epoch 0 | step 1630 |avg loss 8.846 |avg tokens 4570.000 |tokens/s 32881.085 |walltime 234.151 |
+Transformer | epoch 0 | step 1640 |avg loss 9.243 |avg tokens 4848.400 |tokens/s 36209.602 |walltime 235.490 |
+Transformer | epoch 0 | step 1650 |avg loss 8.983 |avg tokens 4497.100 |tokens/s 32279.773 |walltime 236.883 |
+Transformer | epoch 0 | step 1660 |avg loss 9.032 |avg tokens 4492.200 |tokens/s 33233.399 |walltime 238.235 |
+Transformer | epoch 0 | step 1670 |avg loss 8.858 |avg tokens 4453.700 |tokens/s 32450.211 |walltime 239.607 |
+Transformer | epoch 0 | step 1680 |avg loss 8.843 |avg tokens 4243.300 |tokens/s 31947.867 |walltime 240.935 |
+Transformer | epoch 0 | step 1690 |avg loss 9.009 |avg tokens 4299.500 |tokens/s 31036.814 |walltime 242.321 |
+Transformer | epoch 0 | step 1700 |avg loss 8.372 |avg tokens 4318.700 |tokens/s 31934.493 |walltime 243.673 |
+Transformer | epoch 0 | step 1710 |avg loss 9.384 |avg tokens 4495.800 |tokens/s 33432.547 |walltime 245.018 |
+Transformer | epoch 0 | step 1720 |avg loss 9.066 |avg tokens 4581.900 |tokens/s 33788.315 |walltime 246.374 |
+Transformer | epoch 0 | step 1730 |avg loss 9.198 |avg tokens 4460.300 |tokens/s 33758.617 |walltime 247.695 |
+Transformer | epoch 0 | step 1740 |avg loss 8.846 |avg tokens 4639.400 |tokens/s 33750.010 |walltime 249.070 |
+Transformer | epoch 0 | step 1750 |avg loss 8.389 |avg tokens 4440.200 |tokens/s 31012.164 |walltime 250.502 |
+Transformer | epoch 0 | step 1760 |avg loss 8.794 |avg tokens 4516.800 |tokens/s 33170.144 |walltime 251.863 |
+Transformer | epoch 0 | step 1770 |avg loss 9.018 |avg tokens 4522.000 |tokens/s 33697.826 |walltime 253.205 |
+Transformer | epoch 0 | step 1780 |avg loss 8.996 |avg tokens 4501.000 |tokens/s 32585.303 |walltime 254.586 |
+Transformer | epoch 0 | step 1790 |avg loss 9.172 |avg tokens 4117.000 |tokens/s 31761.249 |walltime 255.883 |
+Transformer | epoch 0 | step 1800 |avg loss 8.527 |avg tokens 4558.700 |tokens/s 32183.146 |walltime 257.299 |
+Transformer | epoch 0 | step 1810 |avg loss 9.026 |avg tokens 4614.500 |tokens/s 33958.940 |walltime 258.658 |
+Transformer | epoch 0 | step 1820 |avg loss 8.675 |avg tokens 4472.500 |tokens/s 33527.721 |walltime 259.992 |
+Transformer | epoch 0 | step 1830 |avg loss 8.932 |avg tokens 4461.200 |tokens/s 33113.280 |walltime 261.339 |
+Transformer | epoch 0 | step 1840 |avg loss 8.660 |avg tokens 4369.700 |tokens/s 31720.443 |walltime 262.717 |
+Transformer | epoch 0 | step 1850 |avg loss 8.720 |avg tokens 4823.600 |tokens/s 34585.173 |walltime 264.112 |
+Transformer | epoch 0 | step 1860 |avg loss 8.484 |avg tokens 4683.200 |tokens/s 33101.316 |walltime 265.526 |
+Transformer | epoch 0 | step 1870 |avg loss 9.073 |avg tokens 4443.800 |tokens/s 34893.113 |walltime 266.800 |
+Transformer | epoch 0 | step 1880 |avg loss 8.815 |avg tokens 4682.600 |tokens/s 33882.464 |walltime 268.182 |
+Transformer | epoch 0 | step 1890 |avg loss 8.678 |avg tokens 4498.200 |tokens/s 33251.079 |walltime 269.535 |
+Transformer | epoch 0 | step 1900 |avg loss 8.352 |avg tokens 4576.800 |tokens/s 32924.035 |walltime 270.925 |
+Transformer | epoch 0 | step 1910 |avg loss 8.127 |avg tokens 4909.700 |tokens/s 33876.966 |walltime 272.374 |
+Transformer | epoch 0 | step 1920 |avg loss 8.689 |avg tokens 4542.900 |tokens/s 33518.232 |walltime 273.729 |
+Transformer | epoch 0 | step 1930 |avg loss 8.434 |avg tokens 4310.000 |tokens/s 31194.560 |walltime 275.111 |
+Transformer | epoch 0 | step 1940 |avg loss 8.534 |avg tokens 4631.200 |tokens/s 33105.689 |walltime 276.510 |
+Transformer | epoch 0 | step 1950 |avg loss 8.670 |avg tokens 4509.200 |tokens/s 33644.493 |walltime 277.850 |
+Transformer | epoch 0 | step 1960 |avg loss 8.420 |avg tokens 4483.900 |tokens/s 32649.596 |walltime 279.224 |
+Transformer | epoch 0 | step 1970 |avg loss 8.113 |avg tokens 4791.100 |tokens/s 32415.623 |walltime 280.702 |
+Transformer | epoch 0 | step 1980 |avg loss 9.095 |avg tokens 4380.000 |tokens/s 33130.124 |walltime 282.024 |
+Transformer | epoch 0 | step 1990 |avg loss 8.388 |avg tokens 4672.000 |tokens/s 33311.430 |walltime 283.426 |
+Transformer | epoch 0 | step 2000 |avg loss 8.486 |avg tokens 4797.000 |tokens/s 34360.600 |walltime 284.822 |
+Transformer | epoch 0 | step 2010 |avg loss 8.171 |avg tokens 4725.900 |tokens/s 34628.583 |walltime 286.187 |
+Transformer | epoch 0 | step 2020 |avg loss 8.122 |avg tokens 4814.000 |tokens/s 33731.050 |walltime 287.614 |
+Transformer | epoch 0 | step 2030 |avg loss 8.055 |avg tokens 4819.800 |tokens/s 34453.098 |walltime 289.013 |
+Transformer | epoch 0 | step 2040 |avg loss 8.678 |avg tokens 4704.800 |tokens/s 34568.674 |walltime 290.374 |
+Transformer | epoch 0 | step 2050 |avg loss 8.512 |avg tokens 4561.100 |tokens/s 33809.340 |walltime 291.723 |
+Transformer | epoch 0 | step 2060 |avg loss 8.102 |avg tokens 4978.400 |tokens/s 35326.729 |walltime 293.133 |
+Transformer | epoch 0 | step 2070 |avg loss 8.600 |avg tokens 4304.400 |tokens/s 31737.176 |walltime 294.489 |
+Transformer | epoch 0 | step 2080 |avg loss 8.197 |avg tokens 4396.100 |tokens/s 32339.155 |walltime 295.848 |
+Transformer | epoch 0 | step 2090 |avg loss 8.153 |avg tokens 4608.100 |tokens/s 33178.925 |walltime 297.237 |
+Transformer | epoch 0 | step 2100 |avg loss 8.528 |avg tokens 4488.200 |tokens/s 33128.190 |walltime 298.592 |
+Transformer | epoch 0 | step 2110 |avg loss 8.745 |avg tokens 4167.900 |tokens/s 31933.857 |walltime 299.897 |
+Transformer | epoch 0 | step 2120 |avg loss 8.121 |avg tokens 4437.600 |tokens/s 31807.171 |walltime 301.292 |
+Transformer | epoch 0 | step 2130 |avg loss 8.385 |avg tokens 4524.900 |tokens/s 32950.924 |walltime 302.665 |
+Transformer | epoch 0 | step 2140 |avg loss 8.299 |avg tokens 4858.100 |tokens/s 34230.189 |walltime 304.085 |
+Transformer | epoch 0 | step 2150 |avg loss 8.655 |avg tokens 4491.400 |tokens/s 33855.816 |walltime 305.411 |
+Transformer | epoch 0 | step 2160 |avg loss 8.318 |avg tokens 4675.100 |tokens/s 34407.212 |walltime 306.770 |
+Transformer | epoch 0 | step 2170 |avg loss 8.376 |avg tokens 4459.400 |tokens/s 32708.943 |walltime 308.133 |
+Transformer | epoch 0 | step 2180 |avg loss 8.313 |avg tokens 4088.200 |tokens/s 31019.347 |walltime 309.451 |
+Transformer | epoch 0 | step 2190 |avg loss 8.036 |avg tokens 4880.600 |tokens/s 32846.162 |walltime 310.937 |
+Transformer | epoch 0 | step 2200 |avg loss 8.406 |avg tokens 4615.800 |tokens/s 33746.403 |walltime 312.305 |
+Transformer | epoch 0 | step 2210 |avg loss 7.841 |avg tokens 4213.400 |tokens/s 30084.656 |walltime 313.706 |
+Transformer | epoch 0 | step 2220 |avg loss 8.294 |avg tokens 4809.100 |tokens/s 34516.298 |walltime 315.099 |
+Transformer | epoch 0 | step 2230 |avg loss 7.755 |avg tokens 4847.200 |tokens/s 33652.713 |walltime 316.539 |
+Transformer | epoch 0 | step 2240 |avg loss 7.905 |avg tokens 4871.200 |tokens/s 34969.990 |walltime 317.932 |
+Transformer | epoch 0 | step 2250 |avg loss 7.781 |avg tokens 4487.800 |tokens/s 31500.850 |walltime 319.357 |
+Transformer | epoch 0 | step 2260 |avg loss 7.634 |avg tokens 4374.500 |tokens/s 30105.686 |walltime 320.810 |
+Transformer | epoch 0 | step 2270 |avg loss 8.269 |avg tokens 4482.900 |tokens/s 31474.004 |walltime 322.234 |
+Transformer | epoch 0 | step 2280 |avg loss 7.950 |avg tokens 4619.200 |tokens/s 32114.390 |walltime 323.673 |
+Transformer | epoch 0 | step 2290 |avg loss 7.941 |avg tokens 4869.600 |tokens/s 34127.827 |walltime 325.099 |
+Transformer | epoch 0 | step 2300 |avg loss 8.123 |avg tokens 4270.100 |tokens/s 31544.024 |walltime 326.453 |
+Transformer | epoch 0 | step 2310 |avg loss 8.163 |avg tokens 4845.700 |tokens/s 35150.439 |walltime 327.832 |
+Transformer | epoch 0 | step 2320 |avg loss 8.365 |avg tokens 3956.200 |tokens/s 29406.734 |walltime 329.177 |
+Transformer | epoch 0 | step 2330 |avg loss 8.393 |avg tokens 4580.200 |tokens/s 33870.141 |walltime 330.529 |
+Transformer | epoch 0 | step 2340 |avg loss 8.414 |avg tokens 3958.800 |tokens/s 29516.318 |walltime 331.871 |
+Transformer | epoch 0 | step 2350 |avg loss 8.193 |avg tokens 4497.200 |tokens/s 32695.095 |walltime 333.246 |
+Transformer | epoch 0 | step 2360 |avg loss 8.143 |avg tokens 4488.300 |tokens/s 33657.540 |walltime 334.580 |
+Transformer | epoch 0 | step 2370 |avg loss 7.613 |avg tokens 4794.400 |tokens/s 34347.426 |walltime 335.975 |
+Transformer | epoch 0 | step 2380 |avg loss 8.002 |avg tokens 4752.000 |tokens/s 33539.001 |walltime 337.392 |
+Transformer | epoch 0 | step 2390 |avg loss 8.041 |avg tokens 4304.100 |tokens/s 30262.708 |walltime 338.815 |
+Transformer | epoch 0 | step 2400 |avg loss 8.026 |avg tokens 4257.000 |tokens/s 31389.358 |walltime 340.171 |
+Transformer | epoch 0 | step 2410 |avg loss 7.873 |avg tokens 4707.900 |tokens/s 34204.251 |walltime 341.547 |
+Transformer | epoch 0 | step 2420 |avg loss 7.938 |avg tokens 4799.400 |tokens/s 34823.597 |walltime 342.925 |
+Transformer | epoch 0 | step 2430 |avg loss 8.522 |avg tokens 4149.300 |tokens/s 31745.835 |walltime 344.232 |
+Transformer | epoch 0 | step 2440 |avg loss 7.590 |avg tokens 4544.300 |tokens/s 32075.235 |walltime 345.649 |
+Transformer | epoch 0 | step 2450 |avg loss 7.795 |avg tokens 4848.400 |tokens/s 33993.205 |walltime 347.076 |
+Transformer | epoch 0 | step 2460 |avg loss 8.234 |avg tokens 4652.000 |tokens/s 34952.151 |walltime 348.406 |
+Transformer | epoch 0 | step 2470 |avg loss 7.248 |avg tokens 4695.100 |tokens/s 33390.542 |walltime 349.813 |
+Transformer | epoch 0 | step 2480 |avg loss 8.235 |avg tokens 4430.800 |tokens/s 33232.125 |walltime 351.146 |
+Transformer | epoch 0 | step 2490 |avg loss 7.872 |avg tokens 4577.000 |tokens/s 33708.692 |walltime 352.504 |
+Transformer | epoch 0 | step 2500 |avg loss 8.159 |avg tokens 4749.400 |tokens/s 35782.446 |walltime 353.831 |
+Transformer | epoch 0 | step 2510 |avg loss 7.633 |avg tokens 4614.400 |tokens/s 32243.205 |walltime 355.262 |
+Transformer | epoch 0 | step 2520 |avg loss 7.584 |avg tokens 4528.500 |tokens/s 31773.106 |walltime 356.687 |
+Transformer | epoch 0 | step 2530 |avg loss 7.509 |avg tokens 4942.600 |tokens/s 35740.372 |walltime 358.070 |
+Transformer | epoch 0 | step 2540 |avg loss 8.352 |avg tokens 4209.700 |tokens/s 31289.904 |walltime 359.416 |
+Transformer | epoch 0 | step 2550 |avg loss 7.797 |avg tokens 4218.800 |tokens/s 30715.591 |walltime 360.789 |
+Transformer | epoch 0 | step 2560 |avg loss 7.981 |avg tokens 4196.600 |tokens/s 31893.721 |walltime 362.105 |
+Transformer | epoch 0 | step 2570 |avg loss 8.450 |avg tokens 4717.000 |tokens/s 34909.646 |walltime 363.456 |
+Transformer | epoch 0 | step 2580 |avg loss 8.080 |avg tokens 4259.000 |tokens/s 31154.977 |walltime 364.823 |
+Transformer | epoch 0 | step 2590 |avg loss 7.697 |avg tokens 4557.600 |tokens/s 33330.875 |walltime 366.191 |
+Transformer | epoch 0 | step 2600 |avg loss 7.990 |avg tokens 4531.900 |tokens/s 32950.274 |walltime 367.566 |
+Transformer | epoch 0 | step 2610 |avg loss 8.172 |avg tokens 4809.800 |tokens/s 35167.471 |walltime 368.934 |
+Transformer | epoch 0 | step 2620 |avg loss 7.490 |avg tokens 4689.500 |tokens/s 32758.264 |walltime 370.365 |
+Transformer | epoch 0 | step 2630 |avg loss 7.590 |avg tokens 4837.000 |tokens/s 34670.630 |walltime 371.760 |
+Transformer | epoch 0 | step 2640 |avg loss 7.495 |avg tokens 4858.600 |tokens/s 35449.313 |walltime 373.131 |
+Transformer | epoch 0 | step 2650 |avg loss 7.535 |avg tokens 4916.600 |tokens/s 34870.433 |walltime 374.541 |
+Transformer | epoch 0 | step 2660 |avg loss 7.889 |avg tokens 4636.300 |tokens/s 34221.673 |walltime 375.896 |
+Transformer | epoch 0 | step 2670 |avg loss 7.533 |avg tokens 4866.200 |tokens/s 34540.100 |walltime 377.305 |
+Transformer | epoch 0 | step 2680 |avg loss 7.830 |avg tokens 4668.700 |tokens/s 34887.919 |walltime 378.643 |
+Transformer | epoch 0 | step 2690 |avg loss 7.486 |avg tokens 4719.500 |tokens/s 32859.344 |walltime 380.079 |
+Transformer | epoch 0 | step 2700 |avg loss 6.972 |avg tokens 4715.200 |tokens/s 33239.218 |walltime 381.498 |
+Transformer | epoch 0 | step 2710 |avg loss 8.274 |avg tokens 4519.100 |tokens/s 34766.434 |walltime 382.798 |
+Transformer | epoch 0 | step 2720 |avg loss 7.951 |avg tokens 4620.000 |tokens/s 33938.052 |walltime 384.159 |
+Transformer | epoch 0 | step 2730 |avg loss 7.713 |avg tokens 4460.700 |tokens/s 32929.884 |walltime 385.513 |
+Transformer | epoch 0 | step 2740 |avg loss 7.332 |avg tokens 4468.000 |tokens/s 32346.095 |walltime 386.895 |
+Transformer | epoch 0 | step 2750 |avg loss 7.626 |avg tokens 4680.800 |tokens/s 32972.895 |walltime 388.314 |
+Transformer | epoch 0 | step 2760 |avg loss 7.051 |avg tokens 4638.400 |tokens/s 32336.052 |walltime 389.749 |
 Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 16.0
-Transformer | epoch 0 | step 3000 |avg loss 7.589 |avg tokens 4581.066 |tokens/s 30327.427 |walltime 461.466 |
-Transformer | epoch 0 | step 3500 |avg loss 7.380 |avg tokens 4509.460 |tokens/s 30196.587 |walltime 536.135 |
-Transformer | epoch 0 | step 4000 |avg loss 7.277 |avg tokens 4461.996 |tokens/s 30072.400 |walltime 610.322 |
-Transformer | epoch 0 | step 4500 |avg loss 7.163 |avg tokens 4593.226 |tokens/s 30906.309 |walltime 684.631 |
-Transformer | epoch 0 | step 5000 |avg loss 7.294 |avg tokens 4504.228 |tokens/s 30265.299 |walltime 759.044 |
+Transformer | epoch 0 | step 2770 |avg loss 7.931 |avg tokens 4048.100 |tokens/s 31358.904 |walltime 391.040 |
+Transformer | epoch 0 | step 2780 |avg loss 7.093 |avg tokens 4778.100 |tokens/s 34109.005 |walltime 392.441 |
+Transformer | epoch 0 | step 2790 |avg loss 8.213 |avg tokens 3867.800 |tokens/s 29065.350 |walltime 393.771 |
+Transformer | epoch 0 | step 2800 |avg loss 8.368 |avg tokens 4538.000 |tokens/s 34155.619 |walltime 395.100 |
+Transformer | epoch 0 | step 2810 |avg loss 8.258 |avg tokens 4381.000 |tokens/s 32895.748 |walltime 396.432 |
+Transformer | epoch 0 | step 2820 |avg loss 7.887 |avg tokens 4419.200 |tokens/s 30584.038 |walltime 397.877 |
+Transformer | epoch 0 | step 2830 |avg loss 7.609 |avg tokens 4910.600 |tokens/s 35902.030 |walltime 399.244 |
+Transformer | epoch 0 | step 2840 |avg loss 7.111 |avg tokens 4520.400 |tokens/s 32068.476 |walltime 400.654 |
+Transformer | epoch 0 | step 2850 |avg loss 7.169 |avg tokens 4584.200 |tokens/s 32399.969 |walltime 402.069 |
+Transformer | epoch 0 | step 2860 |avg loss 7.508 |avg tokens 4672.600 |tokens/s 34072.400 |walltime 403.440 |
+Transformer | epoch 0 | step 2870 |avg loss 8.167 |avg tokens 4281.600 |tokens/s 32470.528 |walltime 404.759 |
+Transformer | epoch 0 | step 2880 |avg loss 7.708 |avg tokens 4483.200 |tokens/s 32676.675 |walltime 406.131 |
+Transformer | epoch 0 | step 2890 |avg loss 7.530 |avg tokens 4717.300 |tokens/s 34102.619 |walltime 407.514 |
+Transformer | epoch 0 | step 2900 |avg loss 7.137 |avg tokens 4648.500 |tokens/s 33343.547 |walltime 408.908 |
+Transformer | epoch 0 | step 2910 |avg loss 7.919 |avg tokens 4441.700 |tokens/s 32659.298 |walltime 410.268 |
+Transformer | epoch 0 | step 2920 |avg loss 7.457 |avg tokens 4958.700 |tokens/s 36156.479 |walltime 411.640 |
+Transformer | epoch 0 | step 2930 |avg loss 7.436 |avg tokens 4812.300 |tokens/s 33924.556 |walltime 413.058 |
+Transformer | epoch 0 | step 2940 |avg loss 7.072 |avg tokens 4592.800 |tokens/s 32267.930 |walltime 414.482 |
+Transformer | epoch 0 | step 2950 |avg loss 7.826 |avg tokens 4451.900 |tokens/s 32418.681 |walltime 415.855 |
+Transformer | epoch 0 | step 2960 |avg loss 7.419 |avg tokens 4556.400 |tokens/s 33337.415 |walltime 417.222 |
+Transformer | epoch 0 | step 2970 |avg loss 7.328 |avg tokens 4738.000 |tokens/s 33381.255 |walltime 418.641 |
+Transformer | epoch 0 | step 2980 |avg loss 7.796 |avg tokens 4520.100 |tokens/s 33684.822 |walltime 419.983 |
+Transformer | epoch 0 | step 2990 |avg loss 7.350 |avg tokens 4483.200 |tokens/s 32648.642 |walltime 421.356 |
+Transformer | epoch 0 | step 3000 |avg loss 7.114 |avg tokens 4767.100 |tokens/s 33806.731 |walltime 422.766 |
+Transformer | epoch 0 | step 3010 |avg loss 8.015 |avg tokens 4051.600 |tokens/s 30483.761 |walltime 424.095 |
+Transformer | epoch 0 | step 3020 |avg loss 7.407 |avg tokens 4708.000 |tokens/s 34186.289 |walltime 425.472 |
+Transformer | epoch 0 | step 3030 |avg loss 7.735 |avg tokens 4462.500 |tokens/s 33275.468 |walltime 426.813 |
+Transformer | epoch 0 | step 3040 |avg loss 8.014 |avg tokens 4632.900 |tokens/s 34140.011 |walltime 428.171 |
+Transformer | epoch 0 | step 3050 |avg loss 7.157 |avg tokens 4756.600 |tokens/s 32456.709 |walltime 429.636 |
+Transformer | epoch 0 | step 3060 |avg loss 7.096 |avg tokens 4733.800 |tokens/s 34275.017 |walltime 431.017 |
+Transformer | epoch 0 | step 3070 |avg loss 7.658 |avg tokens 3856.500 |tokens/s 29607.535 |walltime 432.320 |
+Transformer | epoch 0 | step 3080 |avg loss 7.368 |avg tokens 4400.200 |tokens/s 31204.605 |walltime 433.730 |
+Transformer | epoch 0 | step 3090 |avg loss 6.836 |avg tokens 4792.600 |tokens/s 33407.813 |walltime 435.164 |
+Transformer | epoch 0 | step 3100 |avg loss 6.803 |avg tokens 4781.700 |tokens/s 34391.868 |walltime 436.555 |
+Transformer | epoch 0 | step 3110 |avg loss 6.582 |avg tokens 4752.800 |tokens/s 32056.115 |walltime 438.037 |
+Transformer | epoch 0 | step 3120 |avg loss 7.675 |avg tokens 4234.800 |tokens/s 31688.885 |walltime 439.374 |
+Transformer | epoch 0 | step 3130 |avg loss 7.573 |avg tokens 4025.100 |tokens/s 29714.967 |walltime 440.728 |
+Transformer | epoch 0 | step 3140 |avg loss 7.778 |avg tokens 4137.600 |tokens/s 31505.319 |walltime 442.042 |
+Transformer | epoch 0 | step 3150 |avg loss 6.823 |avg tokens 4786.500 |tokens/s 33850.115 |walltime 443.456 |
+Transformer | epoch 0 | step 3160 |avg loss 7.023 |avg tokens 4760.600 |tokens/s 34356.453 |walltime 444.841 |
+Transformer | epoch 0 | step 3170 |avg loss 7.599 |avg tokens 4462.700 |tokens/s 32761.347 |walltime 446.204 |
+Transformer | epoch 0 | step 3180 |avg loss 7.854 |avg tokens 4450.100 |tokens/s 33395.225 |walltime 447.536 |
+Transformer | epoch 0 | step 3190 |avg loss 7.985 |avg tokens 3842.800 |tokens/s 30069.988 |walltime 448.814 |
+Transformer | epoch 0 | step 3200 |avg loss 7.287 |avg tokens 4288.300 |tokens/s 31142.476 |walltime 450.191 |
+Transformer | epoch 0 | step 3210 |avg loss 7.237 |avg tokens 4634.700 |tokens/s 33636.890 |walltime 451.569 |
+Transformer | epoch 0 | step 3220 |avg loss 6.952 |avg tokens 4686.100 |tokens/s 32797.921 |walltime 452.998 |
+Transformer | epoch 0 | step 3230 |avg loss 7.531 |avg tokens 4234.700 |tokens/s 32440.934 |walltime 454.303 |
+Transformer | epoch 0 | step 3240 |avg loss 7.860 |avg tokens 4545.400 |tokens/s 34359.268 |walltime 455.626 |
+Transformer | epoch 0 | step 3250 |avg loss 7.103 |avg tokens 4806.800 |tokens/s 34032.915 |walltime 457.038 |
+Transformer | epoch 0 | step 3260 |avg loss 7.277 |avg tokens 4752.200 |tokens/s 34622.551 |walltime 458.411 |
+Transformer | epoch 0 | step 3270 |avg loss 7.522 |avg tokens 4409.400 |tokens/s 33359.393 |walltime 459.733 |
+Transformer | epoch 0 | step 3280 |avg loss 6.787 |avg tokens 4706.300 |tokens/s 33375.253 |walltime 461.143 |
+Transformer | epoch 0 | step 3290 |avg loss 6.642 |avg tokens 4719.900 |tokens/s 32835.231 |walltime 462.580 |
+Transformer | epoch 0 | step 3300 |avg loss 7.541 |avg tokens 4489.000 |tokens/s 34013.844 |walltime 463.900 |
+Transformer | epoch 0 | step 3310 |avg loss 7.674 |avg tokens 4264.300 |tokens/s 32162.838 |walltime 465.226 |
+Transformer | epoch 0 | step 3320 |avg loss 7.795 |avg tokens 4098.600 |tokens/s 30844.632 |walltime 466.555 |
+Transformer | epoch 0 | step 3330 |avg loss 6.995 |avg tokens 4858.800 |tokens/s 34510.329 |walltime 467.963 |
+Transformer | epoch 0 | step 3340 |avg loss 7.839 |avg tokens 4445.100 |tokens/s 34616.481 |walltime 469.247 |
+Transformer | epoch 0 | step 3350 |avg loss 6.933 |avg tokens 4368.500 |tokens/s 31556.629 |walltime 470.631 |
+Transformer | epoch 0 | step 3360 |avg loss 7.044 |avg tokens 4671.500 |tokens/s 32886.412 |walltime 472.052 |
+Transformer | epoch 0 | step 3370 |avg loss 7.356 |avg tokens 4650.600 |tokens/s 33556.556 |walltime 473.438 |
+Transformer | epoch 0 | step 3380 |avg loss 7.533 |avg tokens 4269.000 |tokens/s 31649.809 |walltime 474.786 |
+Transformer | epoch 0 | step 3390 |avg loss 7.253 |avg tokens 4604.400 |tokens/s 34398.886 |walltime 476.125 |
+Transformer | epoch 0 | step 3400 |avg loss 7.210 |avg tokens 4721.500 |tokens/s 34003.528 |walltime 477.513 |
+Transformer | epoch 0 | step 3410 |avg loss 7.375 |avg tokens 4245.100 |tokens/s 31176.693 |walltime 478.875 |
+Transformer | epoch 0 | step 3420 |avg loss 7.515 |avg tokens 4493.800 |tokens/s 33424.157 |walltime 480.220 |
+Transformer | epoch 0 | step 3430 |avg loss 7.273 |avg tokens 4389.600 |tokens/s 32660.503 |walltime 481.564 |
+Transformer | epoch 0 | step 3440 |avg loss 7.081 |avg tokens 4570.700 |tokens/s 33195.097 |walltime 482.940 |
+Transformer | epoch 0 | step 3450 |avg loss 6.626 |avg tokens 4399.500 |tokens/s 29810.251 |walltime 484.416 |
+Transformer | epoch 0 | step 3460 |avg loss 6.934 |avg tokens 4854.500 |tokens/s 33558.176 |walltime 485.863 |
+Transformer | epoch 0 | step 3470 |avg loss 7.238 |avg tokens 4634.500 |tokens/s 34211.566 |walltime 487.218 |
+Transformer | epoch 0 | step 3480 |avg loss 7.880 |avg tokens 4730.900 |tokens/s 35686.212 |walltime 488.543 |
+Transformer | epoch 0 | step 3490 |avg loss 7.124 |avg tokens 4415.000 |tokens/s 31414.186 |walltime 489.949 |
+Transformer | epoch 0 | step 3500 |avg loss 7.123 |avg tokens 4884.900 |tokens/s 35511.490 |walltime 491.324 |
+Transformer | epoch 0 | step 3510 |avg loss 6.737 |avg tokens 4468.800 |tokens/s 31805.516 |walltime 492.729 |
+Transformer | epoch 0 | step 3520 |avg loss 7.529 |avg tokens 4194.200 |tokens/s 33044.664 |walltime 493.999 |
+Transformer | epoch 0 | step 3530 |avg loss 6.742 |avg tokens 4820.000 |tokens/s 33549.059 |walltime 495.435 |
+Transformer | epoch 0 | step 3540 |avg loss 7.086 |avg tokens 4648.200 |tokens/s 34265.930 |walltime 496.792 |
+Transformer | epoch 0 | step 3550 |avg loss 7.187 |avg tokens 4660.900 |tokens/s 33798.500 |walltime 498.171 |
+Transformer | epoch 0 | step 3560 |avg loss 6.790 |avg tokens 4922.800 |tokens/s 34274.136 |walltime 499.607 |
+Transformer | epoch 0 | step 3570 |avg loss 7.310 |avg tokens 3913.300 |tokens/s 29938.527 |walltime 500.914 |
+Transformer | epoch 0 | step 3580 |avg loss 7.155 |avg tokens 4667.700 |tokens/s 33793.329 |walltime 502.295 |
+Transformer | epoch 0 | step 3590 |avg loss 7.040 |avg tokens 4084.400 |tokens/s 30304.483 |walltime 503.643 |
+Transformer | epoch 0 | step 3600 |avg loss 6.995 |avg tokens 4491.500
|tokens/s 33215.150 |walltime 504.996 | +Transformer | epoch 0 | step 3610 |avg loss 6.585 |avg tokens 4800.000 |tokens/s 32732.731 |walltime 506.462 | +Transformer | epoch 0 | step 3620 |avg loss 6.549 |avg tokens 4767.200 |tokens/s 34518.637 |walltime 507.843 | +Transformer | epoch 0 | step 3630 |avg loss 6.860 |avg tokens 4487.500 |tokens/s 32332.868 |walltime 509.231 | +Transformer | epoch 0 | step 3640 |avg loss 6.897 |avg tokens 4341.900 |tokens/s 30919.364 |walltime 510.635 | +Transformer | epoch 0 | step 3650 |avg loss 6.567 |avg tokens 4433.600 |tokens/s 32403.850 |walltime 512.003 | +Transformer | epoch 0 | step 3660 |avg loss 7.501 |avg tokens 4456.700 |tokens/s 32946.502 |walltime 513.356 | +Transformer | epoch 0 | step 3670 |avg loss 7.245 |avg tokens 4502.700 |tokens/s 32583.347 |walltime 514.738 | +Transformer | epoch 0 | step 3680 |avg loss 7.503 |avg tokens 4347.000 |tokens/s 32299.108 |walltime 516.084 | +Transformer | epoch 0 | step 3690 |avg loss 7.477 |avg tokens 3903.700 |tokens/s 30279.754 |walltime 517.373 | +Transformer | epoch 0 | step 3700 |avg loss 7.805 |avg tokens 4325.200 |tokens/s 33502.999 |walltime 518.664 | +Transformer | epoch 0 | step 3710 |avg loss 7.744 |avg tokens 4150.000 |tokens/s 31217.011 |walltime 519.994 | +Transformer | epoch 0 | step 3720 |avg loss 7.811 |avg tokens 3794.700 |tokens/s 29666.120 |walltime 521.273 | +Transformer | epoch 0 | step 3730 |avg loss 6.851 |avg tokens 4585.400 |tokens/s 32851.268 |walltime 522.668 | +Transformer | epoch 0 | step 3740 |avg loss 7.134 |avg tokens 4395.900 |tokens/s 32804.928 |walltime 524.008 | +Transformer | epoch 0 | step 3750 |avg loss 6.869 |avg tokens 4758.000 |tokens/s 34041.969 |walltime 525.406 | +Transformer | epoch 0 | step 3760 |avg loss 6.761 |avg tokens 4435.200 |tokens/s 32343.691 |walltime 526.777 | +Transformer | epoch 0 | step 3770 |avg loss 7.139 |avg tokens 4706.800 |tokens/s 34890.211 |walltime 528.126 | +Transformer | epoch 0 | step 3780 |avg loss 7.318 |avg 
tokens 3980.400 |tokens/s 30201.504 |walltime 529.444 | +Transformer | epoch 0 | step 3790 |avg loss 6.595 |avg tokens 4795.000 |tokens/s 33587.752 |walltime 530.872 | +Transformer | epoch 0 | step 3800 |avg loss 6.870 |avg tokens 4566.500 |tokens/s 33339.372 |walltime 532.242 | +Transformer | epoch 0 | step 3810 |avg loss 7.230 |avg tokens 4135.300 |tokens/s 31603.943 |walltime 533.550 | +Transformer | epoch 0 | step 3820 |avg loss 7.314 |avg tokens 4482.500 |tokens/s 33334.689 |walltime 534.895 | +Transformer | epoch 0 | step 3830 |avg loss 7.200 |avg tokens 4706.400 |tokens/s 34495.895 |walltime 536.259 | +Transformer | epoch 0 | step 3840 |avg loss 6.611 |avg tokens 4627.200 |tokens/s 32860.454 |walltime 537.667 | +Transformer | epoch 0 | step 3850 |avg loss 6.667 |avg tokens 4498.700 |tokens/s 32003.512 |walltime 539.073 | +Transformer | epoch 0 | step 3860 |avg loss 7.144 |avg tokens 4257.300 |tokens/s 31725.238 |walltime 540.415 | +Transformer | epoch 0 | step 3870 |avg loss 7.424 |avg tokens 4747.900 |tokens/s 35744.909 |walltime 541.743 | +Transformer | epoch 0 | step 3880 |avg loss 6.491 |avg tokens 4851.800 |tokens/s 33916.642 |walltime 543.174 | +Transformer | epoch 0 | step 3890 |avg loss 7.497 |avg tokens 4493.200 |tokens/s 33632.230 |walltime 544.510 | +Transformer | epoch 0 | step 3900 |avg loss 7.400 |avg tokens 4631.700 |tokens/s 32946.761 |walltime 545.916 | +Transformer | epoch 0 | step 3910 |avg loss 7.071 |avg tokens 4482.600 |tokens/s 32279.337 |walltime 547.304 | +Transformer | epoch 0 | step 3920 |avg loss 6.460 |avg tokens 4790.700 |tokens/s 33289.042 |walltime 548.743 | +Transformer | epoch 0 | step 3930 |avg loss 7.308 |avg tokens 3834.100 |tokens/s 30141.113 |walltime 550.015 | +Transformer | epoch 0 | step 3940 |avg loss 6.737 |avg tokens 4577.900 |tokens/s 32170.602 |walltime 551.438 | +Transformer | epoch 0 | step 3950 |avg loss 6.743 |avg tokens 4858.100 |tokens/s 34757.411 |walltime 552.836 | +Transformer | epoch 0 | step 3960 |avg 
loss 7.441 |avg tokens 4213.900 |tokens/s 32575.547 |walltime 554.130 | +Transformer | epoch 0 | step 3970 |avg loss 6.877 |avg tokens 4324.600 |tokens/s 32693.877 |walltime 555.453 | +Transformer | epoch 0 | step 3980 |avg loss 7.460 |avg tokens 4116.600 |tokens/s 30693.990 |walltime 556.794 | +Transformer | epoch 0 | step 3990 |avg loss 6.944 |avg tokens 4243.300 |tokens/s 32278.886 |walltime 558.108 | +Transformer | epoch 0 | step 4000 |avg loss 6.721 |avg tokens 4820.800 |tokens/s 34926.693 |walltime 559.489 | +Transformer | epoch 0 | step 4010 |avg loss 6.686 |avg tokens 4525.100 |tokens/s 32689.315 |walltime 560.873 | +Transformer | epoch 0 | step 4020 |avg loss 6.411 |avg tokens 4737.600 |tokens/s 34357.975 |walltime 562.252 | +Transformer | epoch 0 | step 4030 |avg loss 7.351 |avg tokens 4585.600 |tokens/s 35532.105 |walltime 563.542 | +Transformer | epoch 0 | step 4040 |avg loss 7.014 |avg tokens 4099.900 |tokens/s 30785.490 |walltime 564.874 | +Transformer | epoch 0 | step 4050 |avg loss 7.016 |avg tokens 4795.300 |tokens/s 34877.425 |walltime 566.249 | +Transformer | epoch 0 | step 4060 |avg loss 6.579 |avg tokens 4415.300 |tokens/s 31925.983 |walltime 567.632 | +Transformer | epoch 0 | step 4070 |avg loss 7.076 |avg tokens 4089.100 |tokens/s 31426.231 |walltime 568.933 | +Transformer | epoch 0 | step 4080 |avg loss 6.892 |avg tokens 4679.100 |tokens/s 33748.445 |walltime 570.320 | +Transformer | epoch 0 | step 4090 |avg loss 6.381 |avg tokens 4786.400 |tokens/s 33645.798 |walltime 571.742 | +Transformer | epoch 0 | step 4100 |avg loss 7.068 |avg tokens 4735.300 |tokens/s 34169.775 |walltime 573.128 | +Transformer | epoch 0 | step 4110 |avg loss 7.094 |avg tokens 4287.200 |tokens/s 31580.112 |walltime 574.486 | +Transformer | epoch 0 | step 4120 |avg loss 7.020 |avg tokens 4061.600 |tokens/s 30696.523 |walltime 575.809 | +Transformer | epoch 0 | step 4130 |avg loss 6.941 |avg tokens 4720.200 |tokens/s 34004.465 |walltime 577.197 | +Transformer | epoch 0 
| step 4140 |avg loss 7.035 |avg tokens 4554.500 |tokens/s 33670.000 |walltime 578.550 | +Transformer | epoch 0 | step 4150 |avg loss 6.721 |avg tokens 4460.500 |tokens/s 32779.322 |walltime 579.910 | +Transformer | epoch 0 | step 4160 |avg loss 6.885 |avg tokens 4707.200 |tokens/s 34400.872 |walltime 581.279 | +Transformer | epoch 0 | step 4170 |avg loss 6.692 |avg tokens 4741.800 |tokens/s 34867.658 |walltime 582.639 | +Transformer | epoch 0 | step 4180 |avg loss 6.294 |avg tokens 4937.400 |tokens/s 33717.476 |walltime 584.103 | +Transformer | epoch 0 | step 4190 |avg loss 7.326 |avg tokens 4397.300 |tokens/s 33767.389 |walltime 585.405 | +Transformer | epoch 0 | step 4200 |avg loss 7.064 |avg tokens 4302.000 |tokens/s 31320.798 |walltime 586.779 | +Transformer | epoch 0 | step 4210 |avg loss 7.156 |avg tokens 4660.800 |tokens/s 33929.279 |walltime 588.152 | +Transformer | epoch 0 | step 4220 |avg loss 6.775 |avg tokens 4817.500 |tokens/s 34920.899 |walltime 589.532 | +Transformer | epoch 0 | step 4230 |avg loss 6.618 |avg tokens 4532.400 |tokens/s 32097.764 |walltime 590.944 | +Transformer | epoch 0 | step 4240 |avg loss 6.775 |avg tokens 4726.500 |tokens/s 33966.872 |walltime 592.336 | +Transformer | epoch 0 | step 4250 |avg loss 6.757 |avg tokens 4230.000 |tokens/s 31281.449 |walltime 593.688 | +Transformer | epoch 0 | step 4260 |avg loss 6.879 |avg tokens 4632.500 |tokens/s 33842.005 |walltime 595.057 | +Transformer | epoch 0 | step 4270 |avg loss 6.433 |avg tokens 4711.200 |tokens/s 34010.094 |walltime 596.442 | +Transformer | epoch 0 | step 4280 |avg loss 6.726 |avg tokens 4783.600 |tokens/s 34780.169 |walltime 597.817 | +Transformer | epoch 0 | step 4290 |avg loss 6.867 |avg tokens 4682.600 |tokens/s 34593.937 |walltime 599.171 | +Transformer | epoch 0 | step 4300 |avg loss 6.640 |avg tokens 4710.200 |tokens/s 33749.780 |walltime 600.567 | +Transformer | epoch 0 | step 4310 |avg loss 6.682 |avg tokens 4259.900 |tokens/s 31289.534 |walltime 601.928 | 
+Transformer | epoch 0 | step 4320 |avg loss 6.327 |avg tokens 4786.200 |tokens/s 33866.266 |walltime 603.341 | +Transformer | epoch 0 | step 4330 |avg loss 6.243 |avg tokens 4976.100 |tokens/s 35114.276 |walltime 604.758 | +Transformer | epoch 0 | step 4340 |avg loss 6.396 |avg tokens 4897.300 |tokens/s 35344.350 |walltime 606.144 | +Transformer | epoch 0 | step 4350 |avg loss 6.564 |avg tokens 4771.600 |tokens/s 34776.257 |walltime 607.516 | +Transformer | epoch 0 | step 4360 |avg loss 6.675 |avg tokens 4546.000 |tokens/s 33304.576 |walltime 608.881 | +Transformer | epoch 0 | step 4370 |avg loss 6.208 |avg tokens 4741.500 |tokens/s 33771.149 |walltime 610.285 | +Transformer | epoch 0 | step 4380 |avg loss 6.141 |avg tokens 4860.000 |tokens/s 34571.471 |walltime 611.691 | +Transformer | epoch 0 | step 4390 |avg loss 6.562 |avg tokens 4632.600 |tokens/s 33430.933 |walltime 613.077 | +Transformer | epoch 0 | step 4400 |avg loss 6.546 |avg tokens 4809.600 |tokens/s 34362.037 |walltime 614.476 | +Transformer | epoch 0 | step 4410 |avg loss 6.391 |avg tokens 4511.400 |tokens/s 33067.518 |walltime 615.841 | +Transformer | epoch 0 | step 4420 |avg loss 6.672 |avg tokens 4736.800 |tokens/s 33830.694 |walltime 617.241 | +Transformer | epoch 0 | step 4430 |avg loss 6.680 |avg tokens 4425.300 |tokens/s 32283.434 |walltime 618.611 | +Transformer | epoch 0 | step 4440 |avg loss 6.745 |avg tokens 4341.600 |tokens/s 31828.452 |walltime 619.976 | +Transformer | epoch 0 | step 4450 |avg loss 6.359 |avg tokens 4696.700 |tokens/s 33814.738 |walltime 621.364 | +Transformer | epoch 0 | step 4460 |avg loss 6.935 |avg tokens 4009.700 |tokens/s 30092.613 |walltime 622.697 | +Transformer | epoch 0 | step 4470 |avg loss 6.585 |avg tokens 4660.400 |tokens/s 33035.867 |walltime 624.108 | +Transformer | epoch 0 | step 4480 |avg loss 6.772 |avg tokens 4635.900 |tokens/s 33905.288 |walltime 625.475 | +Transformer | epoch 0 | step 4490 |avg loss 7.039 |avg tokens 4497.800 |tokens/s 33390.669 
|walltime 626.822 | +Transformer | epoch 0 | step 4500 |avg loss 6.258 |avg tokens 4759.200 |tokens/s 33632.382 |walltime 628.237 | +Transformer | epoch 0 | step 4510 |avg loss 6.214 |avg tokens 4565.000 |tokens/s 31526.353 |walltime 629.685 | +Transformer | epoch 0 | step 4520 |avg loss 6.735 |avg tokens 4777.600 |tokens/s 33857.317 |walltime 631.096 | +Transformer | epoch 0 | step 4530 |avg loss 6.277 |avg tokens 4808.800 |tokens/s 34561.055 |walltime 632.488 | +Transformer | epoch 0 | step 4540 |avg loss 6.162 |avg tokens 4684.800 |tokens/s 33404.880 |walltime 633.890 | +Transformer | epoch 0 | step 4550 |avg loss 7.171 |avg tokens 4766.500 |tokens/s 35597.284 |walltime 635.229 | +Transformer | epoch 0 | step 4560 |avg loss 6.559 |avg tokens 4534.300 |tokens/s 33034.328 |walltime 636.602 | +Transformer | epoch 0 | step 4570 |avg loss 6.405 |avg tokens 4792.000 |tokens/s 34489.690 |walltime 637.991 | +Transformer | epoch 0 | step 4580 |avg loss 6.429 |avg tokens 4493.700 |tokens/s 32349.849 |walltime 639.380 | +Transformer | epoch 0 | step 4590 |avg loss 6.620 |avg tokens 4432.400 |tokens/s 31676.877 |walltime 640.779 | +Transformer | epoch 0 | step 4600 |avg loss 6.727 |avg tokens 4740.000 |tokens/s 35088.916 |walltime 642.130 | +Transformer | epoch 0 | step 4610 |avg loss 6.411 |avg tokens 4532.700 |tokens/s 32168.254 |walltime 643.539 | +Transformer | epoch 0 | step 4620 |avg loss 6.685 |avg tokens 4344.100 |tokens/s 32695.910 |walltime 644.868 | +Transformer | epoch 0 | step 4630 |avg loss 7.441 |avg tokens 4440.600 |tokens/s 34300.411 |walltime 646.163 | +Transformer | epoch 0 | step 4640 |avg loss 7.260 |avg tokens 4544.200 |tokens/s 35104.719 |walltime 647.457 | +Transformer | epoch 0 | step 4650 |avg loss 6.515 |avg tokens 4390.800 |tokens/s 31793.906 |walltime 648.838 | +Transformer | epoch 0 | step 4660 |avg loss 6.328 |avg tokens 4532.100 |tokens/s 33556.624 |walltime 650.189 | +Transformer | epoch 0 | step 4670 |avg loss 7.202 |avg tokens 4505.100 
|tokens/s 34883.458 |walltime 651.480 | +Transformer | epoch 0 | step 4680 |avg loss 6.721 |avg tokens 4507.200 |tokens/s 34030.827 |walltime 652.805 | +Transformer | epoch 0 | step 4690 |avg loss 6.742 |avg tokens 4162.100 |tokens/s 30832.605 |walltime 654.154 | +Transformer | epoch 0 | step 4700 |avg loss 7.388 |avg tokens 3964.900 |tokens/s 30354.989 |walltime 655.461 | +Transformer | epoch 0 | step 4710 |avg loss 6.418 |avg tokens 4451.900 |tokens/s 32697.497 |walltime 656.822 | +Transformer | epoch 0 | step 4720 |avg loss 6.304 |avg tokens 4472.900 |tokens/s 33149.989 |walltime 658.172 | +Transformer | epoch 0 | step 4730 |avg loss 6.522 |avg tokens 4342.900 |tokens/s 31195.389 |walltime 659.564 | +Transformer | epoch 0 | step 4740 |avg loss 6.369 |avg tokens 4516.100 |tokens/s 33063.002 |walltime 660.930 | +Transformer | epoch 0 | step 4750 |avg loss 6.561 |avg tokens 4667.000 |tokens/s 33294.457 |walltime 662.331 | +Transformer | epoch 0 | step 4760 |avg loss 6.445 |avg tokens 4367.900 |tokens/s 31512.085 |walltime 663.717 | +Transformer | epoch 0 | step 4770 |avg loss 6.093 |avg tokens 4503.300 |tokens/s 32426.222 |walltime 665.106 | +Transformer | epoch 0 | step 4780 |avg loss 7.105 |avg tokens 4380.900 |tokens/s 32076.885 |walltime 666.472 | +Transformer | epoch 0 | step 4790 |avg loss 6.236 |avg tokens 4590.400 |tokens/s 32673.389 |walltime 667.877 | +Transformer | epoch 0 | step 4800 |avg loss 7.056 |avg tokens 4484.400 |tokens/s 33353.738 |walltime 669.221 | +Transformer | epoch 0 | step 4810 |avg loss 7.085 |avg tokens 4613.600 |tokens/s 34110.949 |walltime 670.574 | +Transformer | epoch 0 | step 4820 |avg loss 5.904 |avg tokens 4752.800 |tokens/s 33080.168 |walltime 672.011 | +Transformer | epoch 0 | step 4830 |avg loss 6.605 |avg tokens 4666.100 |tokens/s 33646.222 |walltime 673.398 | +Transformer | epoch 0 | step 4840 |avg loss 5.968 |avg tokens 4764.800 |tokens/s 34590.492 |walltime 674.775 | +Transformer | epoch 0 | step 4850 |avg loss 6.661 |avg 
tokens 4130.200 |tokens/s 31287.214 |walltime 676.095 | +Transformer | epoch 0 | step 4860 |avg loss 7.008 |avg tokens 4481.900 |tokens/s 33627.715 |walltime 677.428 | +Transformer | epoch 0 | step 4870 |avg loss 6.312 |avg tokens 4849.400 |tokens/s 35513.360 |walltime 678.793 | +Transformer | epoch 0 | step 4880 |avg loss 6.734 |avg tokens 4705.900 |tokens/s 35157.499 |walltime 680.132 | +Transformer | epoch 0 | step 4890 |avg loss 6.738 |avg tokens 3942.700 |tokens/s 29689.665 |walltime 681.460 | +Transformer | epoch 0 | step 4900 |avg loss 7.018 |avg tokens 3877.800 |tokens/s 29288.295 |walltime 682.784 | +Transformer | epoch 0 | step 4910 |avg loss 6.891 |avg tokens 4907.200 |tokens/s 36464.416 |walltime 684.130 | +Transformer | epoch 0 | step 4920 |avg loss 7.069 |avg tokens 4641.700 |tokens/s 35190.271 |walltime 685.449 | +Transformer | epoch 0 | step 4930 |avg loss 6.637 |avg tokens 4803.900 |tokens/s 33979.378 |walltime 686.863 | +Transformer | epoch 0 | step 4940 |avg loss 6.879 |avg tokens 4372.600 |tokens/s 33482.361 |walltime 688.168 | +Transformer | epoch 0 | step 4950 |avg loss 6.491 |avg tokens 4342.200 |tokens/s 32655.369 |walltime 689.498 | +Transformer | epoch 0 | step 4960 |avg loss 6.764 |avg tokens 4705.800 |tokens/s 33269.089 |walltime 690.913 | +Transformer | epoch 0 | step 4970 |avg loss 6.827 |avg tokens 4300.200 |tokens/s 31556.679 |walltime 692.275 | +Transformer | epoch 0 | step 4980 |avg loss 7.304 |avg tokens 3810.800 |tokens/s 30233.752 |walltime 693.536 | +Transformer | epoch 0 | step 4990 |avg loss 7.026 |avg tokens 4337.000 |tokens/s 33112.103 |walltime 694.846 | +Transformer | epoch 0 | step 5000 |avg loss 6.325 |avg tokens 4908.200 |tokens/s 34290.025 |walltime 696.277 | +Transformer | epoch 0 | step 5010 |avg loss 5.970 |avg tokens 4392.100 |tokens/s 31726.168 |walltime 697.661 | +Transformer | epoch 0 | step 5020 |avg loss 5.821 |avg tokens 4881.600 |tokens/s 33181.244 |walltime 699.133 | +Transformer | epoch 0 | step 5030 |avg 
loss 6.425 |avg tokens 4556.000 |tokens/s 33485.022 |walltime 700.493 | +Transformer | epoch 0 | step 5040 |avg loss 6.742 |avg tokens 4568.500 |tokens/s 33682.826 |walltime 701.849 | +Transformer | epoch 0 | step 5050 |avg loss 6.751 |avg tokens 4718.600 |tokens/s 34494.709 |walltime 703.217 | +Transformer | epoch 0 | step 5060 |avg loss 6.773 |avg tokens 4252.500 |tokens/s 31466.148 |walltime 704.569 | +Transformer | epoch 0 | step 5070 |avg loss 7.482 |avg tokens 4179.300 |tokens/s 31704.394 |walltime 705.887 | +Transformer | epoch 0 | step 5080 |avg loss 6.232 |avg tokens 4591.000 |tokens/s 33331.475 |walltime 707.264 | +Transformer | epoch 0 | step 5090 |avg loss 6.886 |avg tokens 4182.600 |tokens/s 30894.604 |walltime 708.618 | +Transformer | epoch 0 | step 5100 |avg loss 6.597 |avg tokens 4366.500 |tokens/s 32706.871 |walltime 709.953 | +Transformer | epoch 0 | step 5110 |avg loss 6.004 |avg tokens 4669.500 |tokens/s 32374.525 |walltime 711.396 | +Transformer | epoch 0 | step 5120 |avg loss 6.492 |avg tokens 4357.700 |tokens/s 32076.976 |walltime 712.754 | +Transformer | epoch 0 | step 5130 |avg loss 6.408 |avg tokens 4765.400 |tokens/s 33757.980 |walltime 714.166 | +Transformer | epoch 0 | step 5140 |avg loss 6.204 |avg tokens 4533.800 |tokens/s 33146.480 |walltime 715.534 | +Transformer | epoch 0 | step 5150 |avg loss 6.485 |avg tokens 4537.100 |tokens/s 32583.289 |walltime 716.926 | +Transformer | epoch 0 | step 5160 |avg loss 6.211 |avg tokens 4629.900 |tokens/s 31973.264 |walltime 718.374 | +Transformer | epoch 0 | step 5170 |avg loss 6.250 |avg tokens 4836.000 |tokens/s 34848.418 |walltime 719.762 | +Transformer | epoch 0 | step 5180 |avg loss 6.728 |avg tokens 4406.600 |tokens/s 32848.741 |walltime 721.103 | +Transformer | epoch 0 | step 5190 |avg loss 6.201 |avg tokens 4882.500 |tokens/s 34375.386 |walltime 722.524 | +Transformer | epoch 0 | step 5200 |avg loss 6.852 |avg tokens 4506.200 |tokens/s 32386.283 |walltime 723.915 | +Transformer | epoch 0 
| step 5210 |avg loss 6.863 |avg tokens 4086.800 |tokens/s 30949.696 |walltime 725.236 | +Transformer | epoch 0 | step 5220 |avg loss 6.337 |avg tokens 4533.300 |tokens/s 32565.528 |walltime 726.628 | +Transformer | epoch 0 | step 5230 |avg loss 6.697 |avg tokens 4540.300 |tokens/s 33906.317 |walltime 727.967 | +Transformer | epoch 0 | step 5240 |avg loss 6.212 |avg tokens 4854.900 |tokens/s 35201.992 |walltime 729.346 | +Transformer | epoch 0 | step 5250 |avg loss 7.036 |avg tokens 4628.300 |tokens/s 33889.781 |walltime 730.712 | +Transformer | epoch 0 | step 5260 |avg loss 6.785 |avg tokens 4877.400 |tokens/s 36331.127 |walltime 732.054 | +Transformer | epoch 0 | step 5270 |avg loss 5.979 |avg tokens 4707.200 |tokens/s 33231.269 |walltime 733.471 | +Transformer | epoch 0 | step 5280 |avg loss 6.419 |avg tokens 4299.100 |tokens/s 32599.489 |walltime 734.789 | +Transformer | epoch 0 | step 5290 |avg loss 7.462 |avg tokens 4819.600 |tokens/s 38133.741 |walltime 736.053 | +Transformer | epoch 0 | step 5300 |avg loss 6.671 |avg tokens 4576.400 |tokens/s 32925.710 |walltime 737.443 | +Transformer | epoch 0 | step 5310 |avg loss 6.267 |avg tokens 4565.000 |tokens/s 33479.204 |walltime 738.807 | +Transformer | epoch 0 | step 5320 |avg loss 5.991 |avg tokens 4465.000 |tokens/s 32357.367 |walltime 740.187 | +Transformer | epoch 0 | step 5330 |avg loss 6.912 |avg tokens 4213.900 |tokens/s 31755.055 |walltime 741.514 | +Transformer | epoch 0 | step 5340 |avg loss 6.476 |avg tokens 4311.200 |tokens/s 31435.076 |walltime 742.885 | +Transformer | epoch 0 | step 5350 |avg loss 5.940 |avg tokens 4700.800 |tokens/s 32495.952 |walltime 744.332 | +Transformer | epoch 0 | step 5360 |avg loss 6.107 |avg tokens 4734.300 |tokens/s 33398.020 |walltime 745.749 | +Transformer | epoch 0 | step 5370 |avg loss 5.952 |avg tokens 4915.100 |tokens/s 35085.425 |walltime 747.150 | +Transformer | epoch 0 | step 5380 |avg loss 6.302 |avg tokens 4554.600 |tokens/s 32856.845 |walltime 748.536 | 
+Transformer | epoch 0 | step 5390 |avg loss 6.790 |avg tokens 4544.200 |tokens/s 33879.843 |walltime 749.878 | +Transformer | epoch 0 | step 5400 |avg loss 6.977 |avg tokens 4242.700 |tokens/s 32843.586 |walltime 751.169 | +Transformer | epoch 0 | step 5410 |avg loss 6.533 |avg tokens 4043.400 |tokens/s 31307.057 |walltime 752.461 | +Transformer | epoch 0 | step 5420 |avg loss 6.427 |avg tokens 4282.200 |tokens/s 31739.374 |walltime 753.810 | +Transformer | epoch 0 | step 5430 |avg loss 6.344 |avg tokens 4971.500 |tokens/s 35303.378 |walltime 755.218 | +Transformer | epoch 0 | step 5440 |avg loss 5.684 |avg tokens 4505.600 |tokens/s 31516.082 |walltime 756.648 | +Transformer | epoch 0 | step 5450 |avg loss 5.851 |avg tokens 4814.400 |tokens/s 34867.599 |walltime 758.029 | +Transformer | epoch 0 | step 5460 |avg loss 6.770 |avg tokens 4751.300 |tokens/s 34913.233 |walltime 759.390 | +Transformer | epoch 0 | step 5470 |avg loss 6.286 |avg tokens 4740.500 |tokens/s 32912.721 |walltime 760.830 | +Transformer | epoch 0 | step 5480 |avg loss 6.591 |avg tokens 4551.500 |tokens/s 34013.618 |walltime 762.168 | +Transformer | epoch 0 | step 5490 |avg loss 6.068 |avg tokens 4830.100 |tokens/s 34891.090 |walltime 763.552 | +Transformer | epoch 0 | step 5500 |avg loss 6.544 |avg tokens 3980.900 |tokens/s 29906.560 |walltime 764.883 | +Transformer | epoch 0 | step 5510 |avg loss 6.257 |avg tokens 4471.000 |tokens/s 31757.101 |walltime 766.291 | +Transformer | epoch 0 | step 5520 |avg loss 5.824 |avg tokens 4612.600 |tokens/s 32113.868 |walltime 767.728 | +Transformer | epoch 0 | step 5530 |avg loss 6.028 |avg tokens 4534.000 |tokens/s 32894.670 |walltime 769.106 | +Transformer | epoch 0 | step 5540 |avg loss 6.510 |avg tokens 4762.200 |tokens/s 34724.407 |walltime 770.477 | +Transformer | epoch 0 | step 5550 |avg loss 6.209 |avg tokens 4795.800 |tokens/s 35087.391 |walltime 771.844 | +Transformer | epoch 0 | step 5560 |avg loss 5.999 |avg tokens 4810.800 |tokens/s 34568.943 
|walltime 773.236 | +Transformer | epoch 0 | step 5570 |avg loss 6.973 |avg tokens 4233.300 |tokens/s 31693.414 |walltime 774.572 | +Transformer | epoch 0 | step 5580 |avg loss 6.871 |avg tokens 3960.100 |tokens/s 29117.456 |walltime 775.932 | +Transformer | epoch 0 | step 5590 |avg loss 5.769 |avg tokens 4765.600 |tokens/s 33757.401 |walltime 777.343 | +Transformer | epoch 0 | step 5600 |avg loss 6.149 |avg tokens 4353.600 |tokens/s 30562.441 |walltime 778.768 | +Transformer | epoch 0 | step 5610 |avg loss 6.709 |avg tokens 4202.100 |tokens/s 32455.949 |walltime 780.063 | +Transformer | epoch 0 | step 5620 |avg loss 6.139 |avg tokens 4680.900 |tokens/s 33872.319 |walltime 781.445 | +Transformer | epoch 0 | step 5630 |avg loss 6.390 |avg tokens 4967.100 |tokens/s 35575.601 |walltime 782.841 | +Transformer | epoch 0 | step 5640 |avg loss 6.209 |avg tokens 4590.700 |tokens/s 33033.427 |walltime 784.231 | +Transformer | epoch 0 | step 5650 |avg loss 6.103 |avg tokens 4399.300 |tokens/s 31495.136 |walltime 785.627 | +Transformer | epoch 0 | step 5660 |avg loss 6.805 |avg tokens 4178.000 |tokens/s 32221.081 |walltime 786.924 | +Transformer | epoch 0 | step 5670 |avg loss 5.891 |avg tokens 4821.600 |tokens/s 32547.276 |walltime 788.405 | +Transformer | epoch 0 | step 5680 |avg loss 5.833 |avg tokens 4656.000 |tokens/s 32680.990 |walltime 789.830 | +Transformer | epoch 0 | step 5690 |avg loss 6.158 |avg tokens 4441.600 |tokens/s 32755.544 |walltime 791.186 | +Transformer | epoch 0 | step 5700 |avg loss 6.645 |avg tokens 4514.700 |tokens/s 33164.692 |walltime 792.547 | +Transformer | epoch 0 | step 5710 |avg loss 6.313 |avg tokens 4524.300 |tokens/s 33461.975 |walltime 793.899 | +Transformer | epoch 0 | step 5720 |avg loss 6.262 |avg tokens 4968.000 |tokens/s 35352.173 |walltime 795.305 | +Transformer | epoch 0 | step 5730 |avg loss 5.530 |avg tokens 4988.700 |tokens/s 34810.302 |walltime 796.738 | +Transformer | epoch 0 | step 5740 |avg loss 6.486 |avg tokens 4547.000 
|tokens/s 32766.429 |walltime 798.126 | +Transformer | epoch 0 | step 5750 |avg loss 6.304 |avg tokens 4774.100 |tokens/s 35505.614 |walltime 799.470 | +Transformer | epoch 0 | step 5760 |avg loss 6.330 |avg tokens 4508.100 |tokens/s 32324.670 |walltime 800.865 | +Transformer | epoch 0 | step 5770 |avg loss 6.541 |avg tokens 4745.100 |tokens/s 34489.001 |walltime 802.241 | +Transformer | epoch 0 | step 5780 |avg loss 5.746 |avg tokens 4813.100 |tokens/s 33700.839 |walltime 803.669 | +Transformer | epoch 0 | step 5790 |avg loss 6.412 |avg tokens 4738.500 |tokens/s 35573.229 |walltime 805.001 | +Transformer | epoch 0 | step 5800 |avg loss 6.051 |avg tokens 4603.000 |tokens/s 34040.281 |walltime 806.353 | +Transformer | epoch 0 | step 5810 |avg loss 6.475 |avg tokens 3978.900 |tokens/s 30711.898 |walltime 807.649 | +Transformer | epoch 0 | step 5820 |avg loss 6.154 |avg tokens 4246.100 |tokens/s 30681.392 |walltime 809.033 | +Transformer | epoch 0 | step 5830 |avg loss 6.222 |avg tokens 4585.900 |tokens/s 33688.549 |walltime 810.394 | +Transformer | epoch 0 | step 5840 |avg loss 6.577 |avg tokens 4145.200 |tokens/s 31150.999 |walltime 811.725 | +Transformer | epoch 0 | step 5850 |avg loss 5.875 |avg tokens 4870.400 |tokens/s 35468.163 |walltime 813.098 | +Transformer | epoch 0 | step 5860 |avg loss 6.333 |avg tokens 4675.600 |tokens/s 35654.954 |walltime 814.409 | +Transformer | epoch 0 | step 5870 |avg loss 6.475 |avg tokens 4060.900 |tokens/s 29793.146 |walltime 815.772 | +Transformer | epoch 0 | step 5880 |avg loss 6.390 |avg tokens 4357.100 |tokens/s 32543.652 |walltime 817.111 | +Transformer | epoch 0 | step 5890 |avg loss 5.875 |avg tokens 4756.300 |tokens/s 33992.203 |walltime 818.510 | +Transformer | epoch 0 | step 5900 |avg loss 6.451 |avg tokens 4540.600 |tokens/s 34248.243 |walltime 819.836 | +Transformer | epoch 0 | step 5910 |avg loss 6.269 |avg tokens 4254.600 |tokens/s 31368.777 |walltime 821.192 | +Transformer | epoch 0 | step 5920 |avg loss 6.142 |avg 
tokens 4638.100 |tokens/s 33855.831 |walltime 822.562 | +Transformer | epoch 0 | step 5930 |avg loss 6.294 |avg tokens 4795.700 |tokens/s 34374.694 |walltime 823.957 | +Transformer | epoch 0 | step 5940 |avg loss 6.776 |avg tokens 3767.200 |tokens/s 29573.109 |walltime 825.231 | +Transformer | epoch 0 | step 5950 |avg loss 6.180 |avg tokens 4590.200 |tokens/s 33876.533 |walltime 826.586 | +Transformer | epoch 0 | step 5960 |avg loss 6.497 |avg tokens 3981.200 |tokens/s 30110.323 |walltime 827.908 | +Transformer | epoch 0 | step 5970 |avg loss 6.139 |avg tokens 4529.100 |tokens/s 33412.311 |walltime 829.264 | +Transformer | epoch 0 | step 5980 |avg loss 6.084 |avg tokens 4394.700 |tokens/s 31306.714 |walltime 830.668 | +Transformer | epoch 0 | step 5990 |avg loss 5.898 |avg tokens 4481.600 |tokens/s 33081.593 |walltime 832.022 | +Transformer | epoch 0 | step 6000 |avg loss 6.134 |avg tokens 4864.300 |tokens/s 34119.176 |walltime 833.448 | +Transformer | epoch 0 | step 6010 |avg loss 5.550 |avg tokens 4601.600 |tokens/s 32869.347 |walltime 834.848 | +Transformer | epoch 0 | step 6020 |avg loss 5.935 |avg tokens 4697.600 |tokens/s 33036.842 |walltime 836.270 | +Transformer | epoch 0 | step 6030 |avg loss 5.678 |avg tokens 4615.100 |tokens/s 31383.485 |walltime 837.741 | +Transformer | epoch 0 | step 6040 |avg loss 6.405 |avg tokens 4142.700 |tokens/s 29639.062 |walltime 839.138 | +Transformer | epoch 0 | step 6050 |avg loss 5.656 |avg tokens 4808.200 |tokens/s 33972.594 |walltime 840.554 | +Transformer | epoch 0 | step 6060 |avg loss 6.159 |avg tokens 4390.100 |tokens/s 32421.750 |walltime 841.908 | +Transformer | epoch 0 | step 6070 |avg loss 5.955 |avg tokens 4571.900 |tokens/s 33008.949 |walltime 843.293 | +Transformer | epoch 0 | step 6080 |avg loss 6.726 |avg tokens 3983.800 |tokens/s 30522.523 |walltime 844.598 | +Transformer | epoch 0 | step 6090 |avg loss 6.258 |avg tokens 4429.200 |tokens/s 31929.178 |walltime 845.985 | +Transformer | epoch 0 | step 6100 |avg 
loss 6.508 |avg tokens 4248.800 |tokens/s 31564.177 |walltime 847.331 | +Transformer | epoch 0 | step 6110 |avg loss 6.673 |avg tokens 4191.800 |tokens/s 31000.229 |walltime 848.683 | +Transformer | epoch 0 | step 6120 |avg loss 5.883 |avg tokens 4347.300 |tokens/s 31177.231 |walltime 850.078 | +Transformer | epoch 0 | step 6130 |avg loss 6.358 |avg tokens 4275.200 |tokens/s 32507.501 |walltime 851.393 | +Transformer | epoch 0 | step 6140 |avg loss 6.194 |avg tokens 4477.800 |tokens/s 31911.575 |walltime 852.796 | +Transformer | epoch 0 | step 6150 |avg loss 6.618 |avg tokens 4063.100 |tokens/s 30194.400 |walltime 854.142 | +Transformer | epoch 0 | step 6160 |avg loss 6.011 |avg tokens 4832.200 |tokens/s 33669.595 |walltime 855.577 | +Transformer | epoch 0 | step 6170 |avg loss 5.923 |avg tokens 4673.300 |tokens/s 32553.971 |walltime 857.013 | +Transformer | epoch 0 | step 6180 |avg loss 6.972 |avg tokens 4442.300 |tokens/s 30494.681 |walltime 858.469 | +Transformer | epoch 0 | step 6190 |avg loss 6.026 |avg tokens 4277.100 |tokens/s 31423.541 |walltime 859.830 | +Transformer | epoch 0 | step 6200 |avg loss 6.854 |avg tokens 3508.700 |tokens/s 27014.778 |walltime 861.129 | +Transformer | epoch 0 | step 6210 |avg loss 6.862 |avg tokens 4281.600 |tokens/s 34015.529 |walltime 862.388 | +Transformer | epoch 0 | step 6220 |avg loss 5.894 |avg tokens 4976.400 |tokens/s 33634.510 |walltime 863.868 | +Transformer | epoch 0 | step 6230 |avg loss 6.006 |avg tokens 4799.200 |tokens/s 34976.559 |walltime 865.240 | +Transformer | epoch 0 | step 6240 |avg loss 5.853 |avg tokens 4651.200 |tokens/s 33384.272 |walltime 866.633 | +Transformer | epoch 0 | step 6250 |avg loss 5.889 |avg tokens 4634.700 |tokens/s 32601.033 |walltime 868.055 | +Transformer | epoch 0 | step 6260 |avg loss 5.712 |avg tokens 4682.400 |tokens/s 32775.142 |walltime 869.483 | +Transformer | epoch 0 | step 6270 |avg loss 6.192 |avg tokens 4843.400 |tokens/s 34696.599 |walltime 870.879 | +Transformer | epoch 0 
| step 6280 |avg loss 5.927 |avg tokens 4779.200 |tokens/s 35242.782 |walltime 872.235 | +Transformer | epoch 0 | step 6290 |avg loss 6.128 |avg tokens 4834.100 |tokens/s 34750.193 |walltime 873.626 | +Transformer | epoch 0 | step 6300 |avg loss 6.207 |avg tokens 4579.900 |tokens/s 32328.601 |walltime 875.043 | +Transformer | epoch 0 | step 6310 |avg loss 6.613 |avg tokens 4461.900 |tokens/s 33244.451 |walltime 876.385 | +Transformer | epoch 0 | step 6320 |avg loss 6.412 |avg tokens 4766.100 |tokens/s 35002.454 |walltime 877.747 | +Transformer | epoch 0 | step 6330 |avg loss 5.955 |avg tokens 4811.500 |tokens/s 34841.506 |walltime 879.128 | +Transformer | epoch 0 | step 6340 |avg loss 5.951 |avg tokens 4386.400 |tokens/s 31500.122 |walltime 880.520 | +Transformer | epoch 0 | step 6350 |avg loss 6.175 |avg tokens 4858.800 |tokens/s 35356.548 |walltime 881.895 | +Transformer | epoch 0 | step 6360 |avg loss 6.177 |avg tokens 4440.800 |tokens/s 33146.169 |walltime 883.234 | +Transformer | epoch 0 | step 6370 |avg loss 6.154 |avg tokens 4299.800 |tokens/s 31709.385 |walltime 884.590 | +Transformer | epoch 0 | step 6380 |avg loss 6.227 |avg tokens 4332.900 |tokens/s 32058.801 |walltime 885.942 | +Transformer | epoch 0 | step 6390 |avg loss 6.341 |avg tokens 4825.100 |tokens/s 35362.411 |walltime 887.306 | +Transformer | epoch 0 | step 6400 |avg loss 5.621 |avg tokens 4639.200 |tokens/s 32868.712 |walltime 888.718 | +Transformer | epoch 0 | step 6410 |avg loss 5.943 |avg tokens 4543.500 |tokens/s 32000.835 |walltime 890.138 | +Transformer | epoch 0 | step 6420 |avg loss 6.683 |avg tokens 4013.500 |tokens/s 31127.201 |walltime 891.427 | +Transformer | epoch 0 | step 6430 |avg loss 5.976 |avg tokens 4544.200 |tokens/s 33061.368 |walltime 892.801 | +Transformer | epoch 0 | step 6440 |avg loss 6.013 |avg tokens 4657.400 |tokens/s 34249.746 |walltime 894.161 | +Transformer | epoch 0 | step 6450 |avg loss 6.166 |avg tokens 4717.100 |tokens/s 34142.213 |walltime 895.543 | 
+Transformer | epoch 0 | step 6460 |avg loss 6.171 |avg tokens 4130.900 |tokens/s 30904.948 |walltime 896.880 | +Transformer | epoch 0 | step 6470 |avg loss 5.964 |avg tokens 4565.600 |tokens/s 32729.899 |walltime 898.274 | +Transformer | epoch 0 | step 6480 |avg loss 5.435 |avg tokens 4682.600 |tokens/s 33049.874 |walltime 899.691 | +Transformer | epoch 0 | step 6490 |avg loss 6.017 |avg tokens 4769.600 |tokens/s 34191.529 |walltime 901.086 | +Transformer | epoch 0 | step 6500 |avg loss 6.073 |avg tokens 4646.300 |tokens/s 34084.568 |walltime 902.449 | +Transformer | epoch 0 | step 6510 |avg loss 5.635 |avg tokens 4820.700 |tokens/s 33940.361 |walltime 903.870 | +Transformer | epoch 0 | step 6520 |avg loss 6.507 |avg tokens 4183.700 |tokens/s 31916.527 |walltime 905.181 | +Transformer | epoch 0 | step 6530 |avg loss 6.135 |avg tokens 4669.000 |tokens/s 33468.951 |walltime 906.576 | +Transformer | epoch 0 | step 6540 |avg loss 6.562 |avg tokens 4303.600 |tokens/s 32952.401 |walltime 907.882 | +Transformer | epoch 0 | step 6550 |avg loss 6.789 |avg tokens 4258.900 |tokens/s 32008.552 |walltime 909.212 | +Transformer | epoch 0 | step 6560 |avg loss 6.132 |avg tokens 4704.900 |tokens/s 33005.070 |walltime 910.638 | +Transformer | epoch 0 | step 6570 |avg loss 6.349 |avg tokens 4773.300 |tokens/s 35873.534 |walltime 911.968 | +Transformer | epoch 0 | step 6580 |avg loss 6.337 |avg tokens 4481.600 |tokens/s 33713.799 |walltime 913.298 | +Transformer | epoch 0 | step 6590 |avg loss 6.567 |avg tokens 4183.000 |tokens/s 31564.437 |walltime 914.623 | +Transformer | epoch 0 | step 6600 |avg loss 6.473 |avg tokens 4447.100 |tokens/s 34177.548 |walltime 915.924 | +Transformer | epoch 0 | step 6610 |avg loss 5.921 |avg tokens 4696.700 |tokens/s 34327.295 |walltime 917.292 | +Transformer | epoch 0 | step 6620 |avg loss 5.646 |avg tokens 4371.500 |tokens/s 31442.627 |walltime 918.683 | +Transformer | epoch 0 | step 6630 |avg loss 5.647 |avg tokens 4422.400 |tokens/s 31694.534 
|walltime 920.078 | +Transformer | epoch 0 | step 6640 |avg loss 6.089 |avg tokens 4593.800 |tokens/s 33224.309 |walltime 921.461 | +Transformer | epoch 0 | step 6650 |avg loss 6.081 |avg tokens 4787.400 |tokens/s 34639.272 |walltime 922.843 | +Transformer | epoch 0 | step 6660 |avg loss 5.975 |avg tokens 4567.900 |tokens/s 33314.151 |walltime 924.214 | +Transformer | epoch 0 | step 6670 |avg loss 6.056 |avg tokens 4498.900 |tokens/s 32022.217 |walltime 925.619 | +Transformer | epoch 0 | step 6680 |avg loss 6.162 |avg tokens 4368.000 |tokens/s 32709.853 |walltime 926.954 | +Transformer | epoch 0 | step 6690 |avg loss 6.533 |avg tokens 4604.300 |tokens/s 34920.591 |walltime 928.273 | +Transformer | epoch 0 | step 6700 |avg loss 6.223 |avg tokens 4301.800 |tokens/s 31802.154 |walltime 929.625 | +Transformer | epoch 0 | step 6710 |avg loss 5.670 |avg tokens 4702.600 |tokens/s 33646.834 |walltime 931.023 | +Transformer | epoch 0 | step 6720 |avg loss 6.010 |avg tokens 4368.700 |tokens/s 31862.783 |walltime 932.394 | +Transformer | epoch 0 | step 6730 |avg loss 6.296 |avg tokens 4250.700 |tokens/s 31780.983 |walltime 933.732 | +Transformer | epoch 0 | step 6740 |avg loss 6.114 |avg tokens 4324.000 |tokens/s 31710.245 |walltime 935.095 | +Transformer | epoch 0 | step 6750 |avg loss 6.394 |avg tokens 4482.000 |tokens/s 33430.924 |walltime 936.436 | +Transformer | epoch 0 | step 6760 |avg loss 5.774 |avg tokens 4498.600 |tokens/s 31677.286 |walltime 937.856 | +Transformer | epoch 0 | step 6770 |avg loss 5.979 |avg tokens 4635.300 |tokens/s 33805.030 |walltime 939.227 | +Transformer | epoch 0 | step 6780 |avg loss 6.193 |avg tokens 4844.200 |tokens/s 35181.006 |walltime 940.604 | +Transformer | epoch 0 | step 6790 |avg loss 6.064 |avg tokens 4699.300 |tokens/s 34053.961 |walltime 941.984 | +Transformer | epoch 0 | step 6800 |avg loss 6.417 |avg tokens 4374.000 |tokens/s 32663.424 |walltime 943.323 | +Transformer | epoch 0 | step 6810 |avg loss 6.146 |avg tokens 4399.600 
|tokens/s 32156.469 |walltime 944.691 | +Transformer | epoch 0 | step 6820 |avg loss 6.193 |avg tokens 4745.700 |tokens/s 33736.607 |walltime 946.098 | +Transformer | epoch 0 | step 6830 |avg loss 6.303 |avg tokens 4489.900 |tokens/s 32702.124 |walltime 947.471 | +Transformer | epoch 0 | step 6840 |avg loss 6.086 |avg tokens 4652.100 |tokens/s 33854.376 |walltime 948.845 | +Transformer | epoch 0 | step 6850 |avg loss 5.733 |avg tokens 4760.700 |tokens/s 34536.691 |walltime 950.224 | +Transformer | epoch 0 | step 6860 |avg loss 5.757 |avg tokens 4758.200 |tokens/s 33873.010 |walltime 951.628 | +Transformer | epoch 0 | step 6870 |avg loss 5.988 |avg tokens 4793.500 |tokens/s 35116.222 |walltime 952.993 | +Transformer | epoch 0 | step 6880 |avg loss 5.755 |avg tokens 4641.100 |tokens/s 34452.320 |walltime 954.341 | +Transformer | epoch 0 | step 6890 |avg loss 5.671 |avg tokens 4334.000 |tokens/s 32097.703 |walltime 955.691 | +Transformer | epoch 0 | step 6900 |avg loss 5.983 |avg tokens 4516.400 |tokens/s 32466.273 |walltime 957.082 | +Transformer | epoch 0 | step 6910 |avg loss 6.337 |avg tokens 4318.300 |tokens/s 32685.827 |walltime 958.403 | +Transformer | epoch 0 | step 6920 |avg loss 6.212 |avg tokens 4509.500 |tokens/s 33608.468 |walltime 959.745 | +Transformer | epoch 0 | step 6930 |avg loss 6.303 |avg tokens 4521.000 |tokens/s 33081.477 |walltime 961.111 | +Transformer | epoch 0 | step 6940 |avg loss 5.967 |avg tokens 4183.500 |tokens/s 29976.639 |walltime 962.507 | +Transformer | epoch 0 | step 6950 |avg loss 5.835 |avg tokens 4538.400 |tokens/s 32151.986 |walltime 963.919 | +Transformer | epoch 0 | step 6960 |avg loss 6.395 |avg tokens 4481.400 |tokens/s 33709.604 |walltime 965.248 | +Transformer | epoch 0 | step 6970 |avg loss 6.356 |avg tokens 4647.500 |tokens/s 34782.978 |walltime 966.584 | +Transformer | epoch 0 | step 6980 |avg loss 6.002 |avg tokens 4731.300 |tokens/s 34143.572 |walltime 967.970 | +Transformer | epoch 0 | step 6990 |avg loss 6.144 |avg 
tokens 4698.600 |tokens/s 34021.685 |walltime 969.351 | +Transformer | epoch 0 | step 7000 |avg loss 5.847 |avg tokens 4260.100 |tokens/s 31644.184 |walltime 970.697 | +Transformer | epoch 0 | step 7010 |avg loss 5.495 |avg tokens 4759.000 |tokens/s 33573.452 |walltime 972.115 | +Transformer | epoch 0 | step 7020 |avg loss 5.574 |avg tokens 4563.900 |tokens/s 32019.168 |walltime 973.540 | +Transformer | epoch 0 | step 7030 |avg loss 5.799 |avg tokens 4581.400 |tokens/s 32750.921 |walltime 974.939 | +Transformer | epoch 0 | step 7040 |avg loss 6.885 |avg tokens 4292.000 |tokens/s 32777.665 |walltime 976.248 | +Transformer | epoch 0 | step 7050 |avg loss 6.190 |avg tokens 4325.900 |tokens/s 31858.876 |walltime 977.606 | +Transformer | epoch 0 | step 7060 |avg loss 5.898 |avg tokens 4818.200 |tokens/s 34906.795 |walltime 978.986 | +Transformer | epoch 0 | step 7070 |avg loss 6.393 |avg tokens 4407.900 |tokens/s 33301.174 |walltime 980.310 | +Transformer | epoch 0 | step 7080 |avg loss 6.857 |avg tokens 4781.900 |tokens/s 36023.150 |walltime 981.638 | +Transformer | epoch 0 | step 7090 |avg loss 5.932 |avg tokens 4548.500 |tokens/s 32518.002 |walltime 983.036 | +Transformer | epoch 0 | step 7100 |avg loss 6.123 |avg tokens 4652.300 |tokens/s 33677.197 |walltime 984.418 | +Transformer | epoch 0 | step 7110 |avg loss 6.973 |avg tokens 3993.300 |tokens/s 31860.091 |walltime 985.671 | +Transformer | epoch 0 | step 7120 |avg loss 5.777 |avg tokens 4723.800 |tokens/s 33077.066 |walltime 987.099 | +Transformer | epoch 0 | step 7130 |avg loss 6.580 |avg tokens 4403.300 |tokens/s 32257.599 |walltime 988.464 | +Transformer | epoch 0 | step 7140 |avg loss 5.927 |avg tokens 4635.200 |tokens/s 33245.525 |walltime 989.859 | +Transformer | epoch 0 | step 7150 |avg loss 6.005 |avg tokens 4669.100 |tokens/s 32482.026 |walltime 991.296 | +Transformer | epoch 0 | step 7160 |avg loss 6.000 |avg tokens 4276.500 |tokens/s 32023.207 |walltime 992.631 | +Transformer | epoch 0 | step 7170 |avg 
loss 6.454 |avg tokens 4189.200 |tokens/s 31265.236 |walltime 993.971 | +Transformer | epoch 0 | step 7180 |avg loss 6.597 |avg tokens 3872.000 |tokens/s 30997.290 |walltime 995.221 | +Transformer | epoch 0 | step 7190 |avg loss 5.836 |avg tokens 4417.200 |tokens/s 31869.677 |walltime 996.607 | +Transformer | epoch 0 | step 7200 |avg loss 5.950 |avg tokens 4595.700 |tokens/s 33042.847 |walltime 997.997 | +Transformer | epoch 0 | step 7210 |avg loss 6.414 |avg tokens 4239.300 |tokens/s 32253.694 |walltime 999.312 | +Transformer | epoch 0 | step 7220 |avg loss 6.222 |avg tokens 4537.500 |tokens/s 32877.285 |walltime 1000.692 | +Transformer | epoch 0 | step 7230 |avg loss 5.949 |avg tokens 4641.100 |tokens/s 34398.307 |walltime 1002.041 | +Transformer | epoch 0 | step 7240 |avg loss 6.010 |avg tokens 4740.400 |tokens/s 33615.263 |walltime 1003.451 | +Transformer | epoch 0 | step 7250 |avg loss 6.049 |avg tokens 4377.700 |tokens/s 32131.461 |walltime 1004.814 | +Transformer | epoch 0 | step 7260 |avg loss 5.998 |avg tokens 4420.000 |tokens/s 31812.413 |walltime 1006.203 | +Transformer | epoch 0 | step 7270 |avg loss 5.488 |avg tokens 4913.500 |tokens/s 34649.216 |walltime 1007.621 | +Transformer | epoch 0 | step 7280 |avg loss 5.794 |avg tokens 4689.600 |tokens/s 34400.971 |walltime 1008.984 | +Transformer | epoch 0 | step 7290 |avg loss 5.957 |avg tokens 4702.200 |tokens/s 33959.987 |walltime 1010.369 | +Transformer | epoch 0 | step 7300 |avg loss 5.899 |avg tokens 4470.300 |tokens/s 33406.890 |walltime 1011.707 | +Transformer | epoch 0 | step 7310 |avg loss 5.858 |avg tokens 4326.200 |tokens/s 32501.731 |walltime 1013.038 | +Transformer | epoch 0 | step 7320 |avg loss 5.916 |avg tokens 4582.800 |tokens/s 33198.764 |walltime 1014.419 | +Transformer | epoch 0 | step 7330 |avg loss 6.040 |avg tokens 4382.600 |tokens/s 33216.798 |walltime 1015.738 | +Transformer | epoch 0 | step 7340 |avg loss 5.717 |avg tokens 4686.300 |tokens/s 33516.858 |walltime 1017.136 | 
+Transformer | epoch 0 | step 7350 |avg loss 6.204 |avg tokens 4648.500 |tokens/s 34270.093 |walltime 1018.493 | +Transformer | epoch 0 | step 7360 |avg loss 5.434 |avg tokens 4927.200 |tokens/s 34432.738 |walltime 1019.924 | +Transformer | epoch 0 | step 7370 |avg loss 6.505 |avg tokens 4401.400 |tokens/s 33510.955 |walltime 1021.237 | +Transformer | epoch 0 | step 7380 |avg loss 5.431 |avg tokens 4727.200 |tokens/s 32505.871 |walltime 1022.691 | +Transformer | epoch 0 | step 7390 |avg loss 5.748 |avg tokens 4283.700 |tokens/s 30684.929 |walltime 1024.087 | +Transformer | epoch 0 | step 7400 |avg loss 5.391 |avg tokens 4539.300 |tokens/s 32417.171 |walltime 1025.488 | +Transformer | epoch 0 | step 7410 |avg loss 5.781 |avg tokens 4495.000 |tokens/s 31776.599 |walltime 1026.902 | +Transformer | epoch 0 | step 7420 |avg loss 5.820 |avg tokens 4688.500 |tokens/s 34226.103 |walltime 1028.272 | +Transformer | epoch 0 | step 7430 |avg loss 5.576 |avg tokens 4866.200 |tokens/s 34090.669 |walltime 1029.700 | +Transformer | epoch 0 | step 7440 |avg loss 5.919 |avg tokens 4888.200 |tokens/s 35257.574 |walltime 1031.086 | +Transformer | epoch 0 | step 7450 |avg loss 5.880 |avg tokens 4605.200 |tokens/s 33166.032 |walltime 1032.475 | +Transformer | epoch 0 | step 7460 |avg loss 5.763 |avg tokens 4847.200 |tokens/s 33992.690 |walltime 1033.900 | +Transformer | epoch 0 | step 7470 |avg loss 5.752 |avg tokens 4662.600 |tokens/s 33755.835 |walltime 1035.282 | +Transformer | epoch 0 | step 7480 |avg loss 6.008 |avg tokens 4209.700 |tokens/s 30862.211 |walltime 1036.646 | +Transformer | epoch 0 | step 7490 |avg loss 5.737 |avg tokens 4464.500 |tokens/s 32833.396 |walltime 1038.006 | +Transformer | epoch 0 | step 7500 |avg loss 5.723 |avg tokens 4699.700 |tokens/s 32199.674 |walltime 1039.465 | +Transformer | epoch 0 | step 7510 |avg loss 6.067 |avg tokens 4515.600 |tokens/s 32442.334 |walltime 1040.857 | +Transformer | epoch 0 | step 7520 |avg loss 5.838 |avg tokens 4696.500 
|tokens/s 34500.394 |walltime 1042.218 | +Transformer | epoch 0 | step 7530 |avg loss 6.088 |avg tokens 4454.000 |tokens/s 32620.960 |walltime 1043.584 | +Transformer | epoch 0 | step 7540 |avg loss 5.740 |avg tokens 4552.300 |tokens/s 32366.917 |walltime 1044.990 | +Transformer | epoch 0 | step 7550 |avg loss 6.288 |avg tokens 4198.500 |tokens/s 30517.972 |walltime 1046.366 | +Transformer | epoch 0 | step 7560 |avg loss 5.844 |avg tokens 4670.900 |tokens/s 33594.024 |walltime 1047.756 | +Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 32.0 +Transformer | epoch 0 | step 7570 |avg loss 5.377 |avg tokens 4459.900 |tokens/s 31821.218 |walltime 1049.158 | +Transformer | epoch 0 | step 7580 |avg loss 5.715 |avg tokens 4442.400 |tokens/s 32082.419 |walltime 1050.543 | +Transformer | epoch 0 | step 7590 |avg loss 5.479 |avg tokens 4687.200 |tokens/s 33583.866 |walltime 1051.938 | +Transformer | epoch 0 | step 7600 |avg loss 5.636 |avg tokens 4768.900 |tokens/s 33777.046 |walltime 1053.350 | +Transformer | epoch 0 | step 7610 |avg loss 6.217 |avg tokens 4828.000 |tokens/s 34200.807 |walltime 1054.762 | +Transformer | epoch 0 | step 7620 |avg loss 5.782 |avg tokens 4369.000 |tokens/s 31609.313 |walltime 1056.144 | +Transformer | epoch 0 | step 7630 |avg loss 5.795 |avg tokens 4227.700 |tokens/s 31064.483 |walltime 1057.505 | +Transformer | epoch 0 | step 7640 |avg loss 6.207 |avg tokens 4225.600 |tokens/s 31254.278 |walltime 1058.857 | +Transformer | epoch 0 | step 7650 |avg loss 5.622 |avg tokens 4722.400 |tokens/s 34186.568 |walltime 1060.238 | +Transformer | epoch 0 | step 7660 |avg loss 6.016 |avg tokens 4489.500 |tokens/s 33492.715 |walltime 1061.579 | +Transformer | epoch 0 | step 7670 |avg loss 5.892 |avg tokens 4387.600 |tokens/s 32270.361 |walltime 1062.938 | +Transformer | epoch 0 | step 7680 |avg loss 6.296 |avg tokens 4654.100 |tokens/s 33680.885 |walltime 1064.320 | +Transformer | epoch 0 | step 7690 |avg loss 5.360 |avg tokens 4739.400 
|tokens/s 32381.452 |walltime 1065.784 | +Transformer | epoch 0 | step 7700 |avg loss 5.843 |avg tokens 4348.300 |tokens/s 31382.568 |walltime 1067.169 | +Transformer | epoch 0 | step 7710 |avg loss 6.299 |avg tokens 4419.700 |tokens/s 32913.829 |walltime 1068.512 | +Transformer | epoch 0 | step 7720 |avg loss 6.926 |avg tokens 3701.600 |tokens/s 28105.495 |walltime 1069.829 | +Transformer | epoch 0 | step 7730 |avg loss 5.596 |avg tokens 4857.200 |tokens/s 33429.321 |walltime 1071.282 | +Transformer | epoch 0 | step 7740 |avg loss 5.711 |avg tokens 4926.300 |tokens/s 35413.027 |walltime 1072.673 | +Transformer | epoch 0 | step 7750 |avg loss 6.135 |avg tokens 4336.200 |tokens/s 30470.233 |walltime 1074.096 | +Transformer | epoch 0 | step 7760 |avg loss 6.755 |avg tokens 4675.800 |tokens/s 36307.739 |walltime 1075.384 | +Transformer | epoch 0 | step 7770 |avg loss 5.287 |avg tokens 4824.400 |tokens/s 33357.105 |walltime 1076.831 | +Transformer | epoch 0 | step 7780 |avg loss 5.684 |avg tokens 4780.400 |tokens/s 34586.022 |walltime 1078.213 | +Transformer | epoch 0 | step 7790 |avg loss 5.904 |avg tokens 4546.100 |tokens/s 32530.998 |walltime 1079.610 | +Transformer | epoch 0 | step 7800 |avg loss 5.964 |avg tokens 4678.500 |tokens/s 34884.159 |walltime 1080.951 | +Transformer | epoch 0 | step 7810 |avg loss 5.884 |avg tokens 4768.400 |tokens/s 34502.732 |walltime 1082.333 | +Transformer | epoch 0 | step 7820 |avg loss 6.137 |avg tokens 4447.400 |tokens/s 32053.922 |walltime 1083.721 | +Transformer | epoch 0 | step 7830 |avg loss 5.289 |avg tokens 4864.000 |tokens/s 34787.233 |walltime 1085.119 | +Transformer | epoch 0 | step 7840 |avg loss 5.960 |avg tokens 4629.300 |tokens/s 33953.344 |walltime 1086.483 | +Transformer | epoch 0 | step 7850 |avg loss 5.868 |avg tokens 4351.200 |tokens/s 31592.371 |walltime 1087.860 | +Transformer | epoch 0 | step 7860 |avg loss 6.126 |avg tokens 4784.400 |tokens/s 34775.201 |walltime 1089.236 | +Transformer | epoch 0 | step 7870 
|avg loss 5.792 |avg tokens 4261.800 |tokens/s 31337.723 |walltime 1090.596 | +Transformer | epoch 0 | step 7880 |avg loss 6.735 |avg tokens 3737.100 |tokens/s 28948.148 |walltime 1091.887 | +Transformer | epoch 0 | step 7890 |avg loss 5.875 |avg tokens 4448.800 |tokens/s 31943.232 |walltime 1093.279 | +Transformer | epoch 0 | step 7900 |avg loss 6.474 |avg tokens 4413.300 |tokens/s 32680.932 |walltime 1094.630 | +Transformer | epoch 0 | step 7910 |avg loss 5.864 |avg tokens 4544.700 |tokens/s 33644.435 |walltime 1095.981 | +Transformer | epoch 0 | step 7920 |avg loss 5.680 |avg tokens 4587.500 |tokens/s 33185.915 |walltime 1097.363 | +Transformer | epoch 0 | step 7930 |avg loss 5.878 |avg tokens 4697.700 |tokens/s 34425.155 |walltime 1098.727 | +Transformer | epoch 0 | step 7940 |avg loss 5.822 |avg tokens 4615.300 |tokens/s 33253.211 |walltime 1100.115 | +Transformer | epoch 0 | step 7950 |avg loss 5.325 |avg tokens 4754.700 |tokens/s 33319.762 |walltime 1101.542 | +Transformer | epoch 0 | step 7960 |avg loss 5.750 |avg tokens 4309.300 |tokens/s 31718.288 |walltime 1102.901 | +Transformer | epoch 0 | step 7970 |avg loss 5.868 |avg tokens 4799.800 |tokens/s 35236.900 |walltime 1104.263 | +Transformer | epoch 0 | step 7980 |avg loss 5.402 |avg tokens 4689.600 |tokens/s 33385.477 |walltime 1105.668 | +Transformer | epoch 0 | step 7990 |avg loss 6.111 |avg tokens 4243.200 |tokens/s 31121.373 |walltime 1107.031 | +Transformer | epoch 0 | step 8000 |avg loss 5.802 |avg tokens 4434.600 |tokens/s 32814.577 |walltime 1108.383 | +Transformer | epoch 0 | step 8010 |avg loss 5.514 |avg tokens 4590.400 |tokens/s 32441.597 |walltime 1109.798 | +Transformer | epoch 0 | step 8020 |avg loss 5.538 |avg tokens 4910.500 |tokens/s 34433.201 |walltime 1111.224 | +Transformer | epoch 0 | step 8030 |avg loss 6.518 |avg tokens 4260.400 |tokens/s 30513.311 |walltime 1112.620 | +Transformer | epoch 0 | step 8040 |avg loss 5.443 |avg tokens 4849.900 |tokens/s 34075.185 |walltime 1114.043 | 
+Transformer | epoch 0 | step 8050 |avg loss 5.562 |avg tokens 4614.700 |tokens/s 32380.339 |walltime 1115.469 | +Transformer | epoch 0 | step 8060 |avg loss 5.762 |avg tokens 4746.900 |tokens/s 33621.674 |walltime 1116.880 | +Transformer | epoch 0 | step 8070 |avg loss 5.321 |avg tokens 4722.300 |tokens/s 32973.416 |walltime 1118.313 | +Transformer | epoch 0 | step 8080 |avg loss 5.428 |avg tokens 4718.100 |tokens/s 33064.471 |walltime 1119.739 | +Transformer | epoch 0 | step 8090 |avg loss 5.433 |avg tokens 4712.600 |tokens/s 33868.936 |walltime 1121.131 | +Transformer | epoch 0 | step 8100 |avg loss 5.981 |avg tokens 4954.200 |tokens/s 35944.852 |walltime 1122.509 | +Transformer | epoch 0 | step 8110 |avg loss 6.627 |avg tokens 3927.200 |tokens/s 30515.388 |walltime 1123.796 | +Transformer | epoch 0 | step 8120 |avg loss 6.438 |avg tokens 4561.900 |tokens/s 33271.104 |walltime 1125.167 | +Transformer | epoch 0 | step 8130 |avg loss 5.439 |avg tokens 4786.800 |tokens/s 34110.953 |walltime 1126.571 | +Transformer | epoch 0 | step 8140 |avg loss 5.438 |avg tokens 4452.000 |tokens/s 31347.253 |walltime 1127.991 | +Transformer | epoch 0 | step 8150 |avg loss 5.731 |avg tokens 4906.400 |tokens/s 35755.671 |walltime 1129.363 | +Transformer | epoch 0 | step 8160 |avg loss 5.689 |avg tokens 4530.900 |tokens/s 32853.460 |walltime 1130.742 | +Transformer | epoch 0 | step 8170 |avg loss 5.245 |avg tokens 4694.500 |tokens/s 33156.740 |walltime 1132.158 | +Transformer | epoch 0 | step 8180 |avg loss 5.735 |avg tokens 4698.200 |tokens/s 32968.000 |walltime 1133.583 | +Transformer | epoch 0 | step 8190 |avg loss 6.118 |avg tokens 4699.900 |tokens/s 34459.544 |walltime 1134.947 | +Transformer | epoch 0 | step 8200 |avg loss 6.082 |avg tokens 4298.400 |tokens/s 32034.211 |walltime 1136.289 | +Transformer | epoch 0 | step 8210 |avg loss 6.312 |avg tokens 4809.600 |tokens/s 36641.530 |walltime 1137.601 | +Transformer | epoch 0 | step 8220 |avg loss 6.084 |avg tokens 4400.100 
|tokens/s 32433.263 |walltime 1138.958 | +Transformer | epoch 0 | step 8230 |avg loss 6.094 |avg tokens 4618.500 |tokens/s 34554.948 |walltime 1140.295 | +Transformer | epoch 0 | step 8240 |avg loss 5.561 |avg tokens 4556.000 |tokens/s 32509.045 |walltime 1141.696 | +Transformer | epoch 0 | step 8250 |avg loss 6.536 |avg tokens 4226.500 |tokens/s 32780.622 |walltime 1142.985 | +Transformer | epoch 0 | step 8260 |avg loss 5.809 |avg tokens 4690.200 |tokens/s 34450.941 |walltime 1144.347 | +Transformer | epoch 0 | step 8270 |avg loss 6.564 |avg tokens 4401.600 |tokens/s 33049.635 |walltime 1145.679 | +Transformer | epoch 0 | step 8280 |avg loss 5.535 |avg tokens 4590.900 |tokens/s 32563.383 |walltime 1147.089 | +Transformer | epoch 0 | step 8290 |avg loss 6.587 |avg tokens 4393.700 |tokens/s 34284.898 |walltime 1148.370 | +Transformer | epoch 0 | step 8300 |avg loss 6.373 |avg tokens 4829.600 |tokens/s 35271.715 |walltime 1149.739 | +Transformer | epoch 0 | step 8310 |avg loss 5.507 |avg tokens 4804.800 |tokens/s 33935.120 |walltime 1151.155 | +Transformer | epoch 0 | step 8320 |avg loss 5.713 |avg tokens 4570.700 |tokens/s 33057.451 |walltime 1152.538 | +Transformer | epoch 0 | step 8330 |avg loss 5.423 |avg tokens 4570.100 |tokens/s 32076.207 |walltime 1153.963 | +Transformer | epoch 0 | step 8340 |avg loss 6.497 |avg tokens 4416.000 |tokens/s 32652.231 |walltime 1155.315 | +Transformer | epoch 0 | step 8350 |avg loss 5.466 |avg tokens 4934.100 |tokens/s 36164.055 |walltime 1156.679 | +Transformer | epoch 0 | step 8360 |avg loss 5.582 |avg tokens 4322.300 |tokens/s 31706.931 |walltime 1158.043 | +Transformer | epoch 0 | step 8370 |avg loss 5.957 |avg tokens 4354.700 |tokens/s 32277.726 |walltime 1159.392 | +Transformer | epoch 0 | step 8380 |avg loss 5.874 |avg tokens 4810.200 |tokens/s 35345.686 |walltime 1160.753 | +Transformer | epoch 0 | step 8390 |avg loss 6.182 |avg tokens 4110.500 |tokens/s 31368.370 |walltime 1162.063 | +Transformer | epoch 0 | step 8400 
|avg loss 6.075 |avg tokens 4469.300 |tokens/s 31980.390 |walltime 1163.461 | +Transformer | epoch 0 | step 8410 |avg loss 6.411 |avg tokens 4157.700 |tokens/s 31440.441 |walltime 1164.783 | +Transformer | epoch 0 | step 8420 |avg loss 5.445 |avg tokens 4892.400 |tokens/s 34929.871 |walltime 1166.184 | +Transformer | epoch 0 | step 8430 |avg loss 5.722 |avg tokens 4794.800 |tokens/s 33894.879 |walltime 1167.598 | +Transformer | epoch 0 | step 8440 |avg loss 6.419 |avg tokens 4210.800 |tokens/s 32492.851 |walltime 1168.894 | +Transformer | epoch 0 | step 8450 |avg loss 5.609 |avg tokens 4747.300 |tokens/s 33345.435 |walltime 1170.318 | +Transformer | epoch 0 | step 8460 |avg loss 5.930 |avg tokens 4334.500 |tokens/s 31835.736 |walltime 1171.679 | +Transformer | epoch 0 | step 8470 |avg loss 6.008 |avg tokens 4366.800 |tokens/s 31879.471 |walltime 1173.049 | +Transformer | epoch 0 | step 8480 |avg loss 5.874 |avg tokens 4595.700 |tokens/s 34051.450 |walltime 1174.399 | +Transformer | epoch 0 | step 8490 |avg loss 6.008 |avg tokens 4456.500 |tokens/s 32451.541 |walltime 1175.772 | +Transformer | epoch 0 | step 8500 |avg loss 6.037 |avg tokens 4643.200 |tokens/s 34919.844 |walltime 1177.102 | +Transformer | epoch 0 | step 8510 |avg loss 6.444 |avg tokens 4325.900 |tokens/s 32396.252 |walltime 1178.437 | +Transformer | epoch 0 | step 8520 |avg loss 5.727 |avg tokens 4443.200 |tokens/s 31932.290 |walltime 1179.829 | +Transformer | epoch 0 | step 8530 |avg loss 5.771 |avg tokens 4306.300 |tokens/s 32033.879 |walltime 1181.173 | +Transformer | epoch 0 | step 8540 |avg loss 6.284 |avg tokens 4428.700 |tokens/s 32501.554 |walltime 1182.535 | +Transformer | epoch 0 | step 8550 |avg loss 5.565 |avg tokens 4715.800 |tokens/s 33783.078 |walltime 1183.931 | +Transformer | epoch 0 | step 8560 |avg loss 5.975 |avg tokens 4739.000 |tokens/s 33635.771 |walltime 1185.340 | +Transformer | epoch 0 | step 8570 |avg loss 6.031 |avg tokens 4096.700 |tokens/s 30875.610 |walltime 1186.667 | 
+Transformer | epoch 0 | step 8580 |avg loss 5.926 |avg tokens 4534.700 |tokens/s 33583.303 |walltime 1188.017 | +Transformer | epoch 0 | step 8590 |avg loss 6.190 |avg tokens 4554.300 |tokens/s 33128.983 |walltime 1189.392 | +Transformer | epoch 0 | step 8600 |avg loss 6.250 |avg tokens 4676.400 |tokens/s 35665.970 |walltime 1190.703 | +Transformer | epoch 0 | step 8610 |avg loss 6.277 |avg tokens 4130.700 |tokens/s 31391.620 |walltime 1192.019 | +Transformer | epoch 0 | step 8620 |avg loss 5.769 |avg tokens 4657.100 |tokens/s 33599.215 |walltime 1193.405 | +Transformer | epoch 0 | step 8630 |avg loss 5.519 |avg tokens 4786.000 |tokens/s 33623.979 |walltime 1194.829 | +Transformer | epoch 0 | step 8640 |avg loss 6.101 |avg tokens 4842.300 |tokens/s 34710.933 |walltime 1196.224 | +Transformer | epoch 0 | step 8650 |avg loss 6.199 |avg tokens 4255.400 |tokens/s 33256.339 |walltime 1197.503 | +Transformer | epoch 0 | step 8660 |avg loss 5.939 |avg tokens 4533.900 |tokens/s 33708.839 |walltime 1198.848 | +Transformer | epoch 0 | step 8670 |avg loss 6.549 |avg tokens 4484.300 |tokens/s 33499.909 |walltime 1200.187 | +Transformer | epoch 0 | step 8680 |avg loss 5.444 |avg tokens 4660.400 |tokens/s 32719.486 |walltime 1201.611 | +Transformer | epoch 0 | step 8690 |avg loss 6.191 |avg tokens 4186.600 |tokens/s 30504.623 |walltime 1202.984 | +Transformer | epoch 0 | step 8700 |avg loss 6.081 |avg tokens 4248.600 |tokens/s 31269.579 |walltime 1204.342 | +Transformer | epoch 0 | step 8710 |avg loss 5.675 |avg tokens 4432.800 |tokens/s 32381.763 |walltime 1205.711 | +Transformer | epoch 0 | step 8720 |avg loss 6.630 |avg tokens 4410.000 |tokens/s 34111.629 |walltime 1207.004 | +Transformer | epoch 0 | step 8730 |avg loss 5.946 |avg tokens 4730.600 |tokens/s 34597.557 |walltime 1208.371 | +Transformer | epoch 0 | step 8740 |avg loss 5.881 |avg tokens 4615.300 |tokens/s 33970.823 |walltime 1209.730 | +Transformer | epoch 0 | step 8750 |avg loss 5.800 |avg tokens 4415.800 
|tokens/s 31799.051 |walltime 1211.119 | +Transformer | epoch 0 | step 8760 |avg loss 5.933 |avg tokens 4257.000 |tokens/s 31930.425 |walltime 1212.452 | +Transformer | epoch 0 | step 8770 |avg loss 6.377 |avg tokens 4271.900 |tokens/s 32429.186 |walltime 1213.769 | +Transformer | epoch 0 | step 8780 |avg loss 5.973 |avg tokens 4470.200 |tokens/s 33797.186 |walltime 1215.092 | +Transformer | epoch 0 | step 8790 |avg loss 5.932 |avg tokens 4782.300 |tokens/s 33928.252 |walltime 1216.501 | +Transformer | epoch 0 | step 8800 |avg loss 6.007 |avg tokens 4785.900 |tokens/s 34558.620 |walltime 1217.886 | +Transformer | epoch 0 | step 8810 |avg loss 5.122 |avg tokens 4657.600 |tokens/s 32871.842 |walltime 1219.303 | +Transformer | epoch 0 | step 8820 |avg loss 5.625 |avg tokens 4842.400 |tokens/s 33687.205 |walltime 1220.741 | +Transformer | epoch 0 | step 8830 |avg loss 5.741 |avg tokens 4771.800 |tokens/s 34343.361 |walltime 1222.130 | +Transformer | epoch 0 | step 8840 |avg loss 6.106 |avg tokens 4463.800 |tokens/s 34158.177 |walltime 1223.437 | +Transformer | epoch 0 | step 8850 |avg loss 5.861 |avg tokens 4784.900 |tokens/s 34627.396 |walltime 1224.819 | +Transformer | epoch 0 | step 8860 |avg loss 5.734 |avg tokens 4425.600 |tokens/s 32719.297 |walltime 1226.171 | +Transformer | epoch 0 | step 8870 |avg loss 5.795 |avg tokens 4124.200 |tokens/s 29980.858 |walltime 1227.547 | +Transformer | epoch 0 | step 8880 |avg loss 5.399 |avg tokens 4390.900 |tokens/s 32126.370 |walltime 1228.914 | +Transformer | epoch 0 | step 8890 |avg loss 5.333 |avg tokens 4633.200 |tokens/s 32274.958 |walltime 1230.349 | +Transformer | epoch 0 | step 8900 |avg loss 6.157 |avg tokens 4403.900 |tokens/s 33353.886 |walltime 1231.670 | +Transformer | epoch 0 | step 8910 |avg loss 5.633 |avg tokens 4525.400 |tokens/s 31926.385 |walltime 1233.087 | +Transformer | epoch 0 | step 8920 |avg loss 6.216 |avg tokens 4452.700 |tokens/s 32368.012 |walltime 1234.463 | +Transformer | epoch 0 | step 8930 
|avg loss 5.894 |avg tokens 4467.700 |tokens/s 32594.797 |walltime 1235.833 | +Transformer | epoch 0 | step 8940 |avg loss 5.765 |avg tokens 4813.500 |tokens/s 34499.062 |walltime 1237.229 | +Transformer | epoch 0 | step 8950 |avg loss 5.687 |avg tokens 4714.300 |tokens/s 34829.639 |walltime 1238.582 | +Transformer | epoch 0 | step 8960 |avg loss 6.357 |avg tokens 3953.700 |tokens/s 29847.999 |walltime 1239.907 | +Transformer | epoch 0 | step 8970 |avg loss 5.504 |avg tokens 4663.900 |tokens/s 34140.281 |walltime 1241.273 | +Transformer | epoch 0 | step 8980 |avg loss 5.894 |avg tokens 4226.900 |tokens/s 32069.956 |walltime 1242.591 | +Transformer | epoch 0 | step 8990 |avg loss 5.820 |avg tokens 4653.400 |tokens/s 32634.064 |walltime 1244.017 | +Transformer | epoch 0 | step 9000 |avg loss 6.337 |avg tokens 4026.000 |tokens/s 29829.631 |walltime 1245.367 | +Transformer | epoch 0 | step 9010 |avg loss 5.921 |avg tokens 4249.100 |tokens/s 31151.366 |walltime 1246.731 | +Transformer | epoch 0 | step 9020 |avg loss 5.835 |avg tokens 4639.800 |tokens/s 33450.081 |walltime 1248.118 | +Transformer | epoch 0 | step 9030 |avg loss 6.251 |avg tokens 4369.400 |tokens/s 32931.840 |walltime 1249.444 | +Transformer | epoch 0 | step 9040 |avg loss 6.039 |avg tokens 4680.900 |tokens/s 34049.197 |walltime 1250.819 | +Transformer | epoch 0 | step 9050 |avg loss 5.964 |avg tokens 4273.300 |tokens/s 31677.847 |walltime 1252.168 | +Transformer | epoch 0 | step 9060 |avg loss 5.337 |avg tokens 4755.400 |tokens/s 32629.397 |walltime 1253.626 | +Transformer | epoch 0 | step 9070 |avg loss 6.256 |avg tokens 4526.500 |tokens/s 33513.586 |walltime 1254.976 | +Transformer | epoch 0 | step 9080 |avg loss 5.985 |avg tokens 4270.400 |tokens/s 31258.321 |walltime 1256.342 | +Transformer | epoch 0 | step 9090 |avg loss 5.699 |avg tokens 4503.400 |tokens/s 31823.334 |walltime 1257.758 | +Transformer | epoch 0 | step 9100 |avg loss 5.195 |avg tokens 4754.200 |tokens/s 33768.267 |walltime 1259.165 | 
+Transformer | epoch 0 | step 9110 |avg loss 5.703 |avg tokens 4656.300 |tokens/s 32761.975 |walltime 1260.587 | +Transformer | epoch 0 | step 9120 |avg loss 5.814 |avg tokens 4552.900 |tokens/s 32980.009 |walltime 1261.967 | +Transformer | epoch 0 | step 9130 |avg loss 5.671 |avg tokens 4669.500 |tokens/s 33911.806 |walltime 1263.344 | +Transformer | epoch 0 | step 9140 |avg loss 6.017 |avg tokens 4510.900 |tokens/s 33224.988 |walltime 1264.702 | +Transformer | epoch 0 | step 9150 |avg loss 5.723 |avg tokens 4637.500 |tokens/s 34222.376 |walltime 1266.057 | +Transformer | epoch 0 | step 9160 |avg loss 5.710 |avg tokens 4687.000 |tokens/s 33596.437 |walltime 1267.452 | +Transformer | epoch 0 | step 9170 |avg loss 6.275 |avg tokens 4101.100 |tokens/s 30652.521 |walltime 1268.790 | +Transformer | epoch 0 | step 9180 |avg loss 5.707 |avg tokens 4736.000 |tokens/s 33785.118 |walltime 1270.192 | +Transformer | epoch 0 | step 9190 |avg loss 6.049 |avg tokens 4327.400 |tokens/s 31593.968 |walltime 1271.561 | +Transformer | epoch 0 | step 9200 |avg loss 6.338 |avg tokens 4582.300 |tokens/s 34159.810 |walltime 1272.903 | +Transformer | epoch 0 | step 9210 |avg loss 5.905 |avg tokens 4454.700 |tokens/s 31265.233 |walltime 1274.328 | +Transformer | epoch 0 | step 9220 |avg loss 6.286 |avg tokens 4247.600 |tokens/s 31766.447 |walltime 1275.665 | +Transformer | epoch 0 | step 9230 |avg loss 5.757 |avg tokens 4915.300 |tokens/s 35086.798 |walltime 1277.066 | +Transformer | epoch 0 | step 9240 |avg loss 5.839 |avg tokens 4790.000 |tokens/s 34070.766 |walltime 1278.472 | +Transformer | epoch 0 | step 9250 |avg loss 5.707 |avg tokens 4300.200 |tokens/s 31088.599 |walltime 1279.855 | +Transformer | epoch 0 | step 9260 |avg loss 5.668 |avg tokens 4790.800 |tokens/s 33683.575 |walltime 1281.277 | +Transformer | epoch 0 | step 9270 |avg loss 5.621 |avg tokens 4639.600 |tokens/s 33336.553 |walltime 1282.669 | +Transformer | epoch 0 | step 9280 |avg loss 5.730 |avg tokens 4674.900 
|tokens/s 34205.812 |walltime 1284.036 | +Transformer | epoch 0 | step 9290 |avg loss 6.112 |avg tokens 3874.700 |tokens/s 29097.985 |walltime 1285.367 | +Transformer | epoch 0 | step 9300 |avg loss 5.490 |avg tokens 4565.700 |tokens/s 33259.699 |walltime 1286.740 | +Transformer | epoch 0 | step 9310 |avg loss 5.596 |avg tokens 4422.800 |tokens/s 30963.945 |walltime 1288.168 | +Transformer | epoch 0 | step 9320 |avg loss 5.693 |avg tokens 4645.400 |tokens/s 33405.880 |walltime 1289.559 | +Transformer | epoch 0 | step 9330 |avg loss 5.918 |avg tokens 4216.800 |tokens/s 30281.590 |walltime 1290.952 | +Transformer | epoch 0 | step 9340 |avg loss 6.099 |avg tokens 3889.600 |tokens/s 29502.328 |walltime 1292.270 | +Transformer | epoch 0 | step 9350 |avg loss 5.802 |avg tokens 4536.400 |tokens/s 34081.409 |walltime 1293.601 | +Transformer | epoch 0 | step 9360 |avg loss 5.547 |avg tokens 4628.000 |tokens/s 33365.744 |walltime 1294.988 | +Transformer | epoch 0 | step 9370 |avg loss 5.951 |avg tokens 4883.300 |tokens/s 35926.756 |walltime 1296.347 | +Transformer | epoch 0 | step 9380 |avg loss 6.285 |avg tokens 3769.600 |tokens/s 27749.245 |walltime 1297.706 | +Transformer | epoch 0 | step 9390 |avg loss 5.735 |avg tokens 4657.000 |tokens/s 33578.636 |walltime 1299.093 | +Transformer | epoch 0 | step 9400 |avg loss 5.566 |avg tokens 4698.400 |tokens/s 34158.893 |walltime 1300.468 | +Transformer | epoch 0 | step 9410 |avg loss 5.827 |avg tokens 4346.000 |tokens/s 32103.145 |walltime 1301.822 | +Transformer | epoch 0 | step 9420 |avg loss 5.631 |avg tokens 4664.800 |tokens/s 34209.930 |walltime 1303.185 | +Transformer | epoch 0 | step 9430 |avg loss 5.366 |avg tokens 4725.800 |tokens/s 33135.834 |walltime 1304.612 | +Transformer | epoch 0 | step 9440 |avg loss 5.686 |avg tokens 4245.000 |tokens/s 30498.558 |walltime 1306.003 | +Transformer | epoch 0 | step 9450 |avg loss 5.254 |avg tokens 4689.500 |tokens/s 32642.332 |walltime 1307.440 | +Transformer | epoch 0 | step 9460 
|avg loss 5.896 |avg tokens 4163.300 |tokens/s 30492.958 |walltime 1308.805 | +Transformer | epoch 0 | step 9470 |avg loss 5.890 |avg tokens 4450.500 |tokens/s 33563.204 |walltime 1310.131 | +Transformer | epoch 0 | step 9480 |avg loss 6.069 |avg tokens 4099.600 |tokens/s 31302.218 |walltime 1311.441 | +Transformer | epoch 0 | step 9490 |avg loss 5.738 |avg tokens 4171.700 |tokens/s 30358.840 |walltime 1312.815 | +Transformer | epoch 0 | step 9500 |avg loss 5.388 |avg tokens 4590.600 |tokens/s 32295.568 |walltime 1314.237 | +Transformer | epoch 0 | step 9510 |avg loss 5.405 |avg tokens 4910.900 |tokens/s 35003.907 |walltime 1315.640 | +Transformer | epoch 0 | step 9520 |avg loss 5.477 |avg tokens 4896.900 |tokens/s 34358.308 |walltime 1317.065 | +Transformer | epoch 0 | step 9530 |avg loss 5.557 |avg tokens 4264.400 |tokens/s 30209.730 |walltime 1318.477 | +Transformer | epoch 0 | step 9540 |avg loss 5.428 |avg tokens 4464.500 |tokens/s 32501.563 |walltime 1319.850 | +Transformer | epoch 0 | step 9550 |avg loss 5.250 |avg tokens 4574.100 |tokens/s 33399.167 |walltime 1321.220 | +Transformer | epoch 0 | step 9560 |avg loss 6.205 |avg tokens 4440.800 |tokens/s 31908.399 |walltime 1322.611 | +Transformer | epoch 0 | step 9570 |avg loss 5.669 |avg tokens 4733.700 |tokens/s 32996.536 |walltime 1324.046 | +Transformer | epoch 0 | step 9580 |avg loss 5.342 |avg tokens 4775.300 |tokens/s 34134.710 |walltime 1325.445 | +Transformer | epoch 0 | step 9590 |avg loss 5.684 |avg tokens 4394.300 |tokens/s 32482.297 |walltime 1326.798 | +Transformer | epoch 0 | step 9600 |avg loss 5.831 |avg tokens 4241.000 |tokens/s 31659.626 |walltime 1328.137 | +Transformer | epoch 0 | step 9610 |avg loss 5.788 |avg tokens 4386.700 |tokens/s 31989.025 |walltime 1329.509 | +Transformer | epoch 0 | step 9620 |avg loss 5.501 |avg tokens 4877.700 |tokens/s 34572.581 |walltime 1330.920 | +Transformer | epoch 0 | step 9630 |avg loss 5.179 |avg tokens 4847.200 |tokens/s 34127.388 |walltime 1332.340 | 
+Transformer | epoch 0 | step 9640 |avg loss 5.462 |avg tokens 4846.900 |tokens/s 35250.695 |walltime 1333.715 | +Transformer | epoch 0 | step 9650 |avg loss 5.932 |avg tokens 4474.100 |tokens/s 32599.816 |walltime 1335.087 | +Transformer | epoch 0 | step 9660 |avg loss 5.642 |avg tokens 4238.100 |tokens/s 31415.306 |walltime 1336.436 | +Transformer | epoch 0 | step 9670 |avg loss 5.519 |avg tokens 4339.600 |tokens/s 32039.517 |walltime 1337.791 | +Transformer | epoch 0 | step 9680 |avg loss 5.656 |avg tokens 4600.600 |tokens/s 32082.340 |walltime 1339.225 | +Transformer | epoch 0 | step 9690 |avg loss 6.084 |avg tokens 4322.200 |tokens/s 33674.020 |walltime 1340.508 | +Transformer | epoch 0 | step 9700 |avg loss 5.680 |avg tokens 4614.400 |tokens/s 32824.959 |walltime 1341.914 | +Transformer | epoch 0 | step 9710 |avg loss 5.848 |avg tokens 4753.000 |tokens/s 34422.686 |walltime 1343.295 | +Transformer | epoch 0 | step 9720 |avg loss 5.680 |avg tokens 4421.600 |tokens/s 32585.974 |walltime 1344.652 | +Transformer | epoch 0 | step 9730 |avg loss 6.212 |avg tokens 4339.500 |tokens/s 33197.982 |walltime 1345.959 | +Transformer | epoch 0 | step 9740 |avg loss 5.597 |avg tokens 4454.100 |tokens/s 32687.778 |walltime 1347.322 | +Transformer | epoch 0 | step 9750 |avg loss 5.609 |avg tokens 4522.800 |tokens/s 32834.273 |walltime 1348.699 | +Transformer | epoch 0 | step 9760 |avg loss 6.055 |avg tokens 4627.900 |tokens/s 34322.647 |walltime 1350.047 | +Transformer | epoch 0 | step 9770 |avg loss 5.238 |avg tokens 4665.300 |tokens/s 33397.200 |walltime 1351.444 | +Transformer | epoch 0 | step 9780 |avg loss 6.024 |avg tokens 4509.900 |tokens/s 32656.899 |walltime 1352.825 | +Transformer | epoch 0 | step 9790 |avg loss 5.752 |avg tokens 4380.200 |tokens/s 31682.534 |walltime 1354.208 | +Transformer | epoch 0 | step 9800 |avg loss 6.175 |avg tokens 4310.900 |tokens/s 32698.537 |walltime 1355.526 | +Transformer | epoch 0 | step 9810 |avg loss 5.527 |avg tokens 4766.300 
|tokens/s 33836.408 |walltime 1356.935 | +Transformer | epoch 0 | step 9820 |avg loss 5.452 |avg tokens 4710.000 |tokens/s 33647.611 |walltime 1358.335 | +Transformer | epoch 0 | step 9830 |avg loss 6.335 |avg tokens 4271.000 |tokens/s 32856.704 |walltime 1359.635 | +Transformer | epoch 0 | step 9840 |avg loss 5.802 |avg tokens 4867.100 |tokens/s 35067.893 |walltime 1361.023 | +Transformer | epoch 0 | step 9850 |avg loss 6.652 |avg tokens 4001.700 |tokens/s 31209.655 |walltime 1362.305 | +Transformer | epoch 0 | step 9860 |avg loss 5.931 |avg tokens 4672.600 |tokens/s 34532.967 |walltime 1363.658 | +Transformer | epoch 0 | step 9870 |avg loss 5.299 |avg tokens 4797.200 |tokens/s 33929.361 |walltime 1365.072 | +Transformer | epoch 0 | step 9880 |avg loss 6.211 |avg tokens 4411.900 |tokens/s 32511.128 |walltime 1366.429 | +Transformer | epoch 0 | step 9890 |avg loss 5.676 |avg tokens 4673.700 |tokens/s 33674.411 |walltime 1367.817 | +Transformer | epoch 0 | step 9900 |avg loss 5.671 |avg tokens 4716.700 |tokens/s 34409.027 |walltime 1369.187 | +Transformer | epoch 0 | step 9910 |avg loss 5.430 |avg tokens 4828.800 |tokens/s 34623.860 |walltime 1370.582 | +Transformer | epoch 0 | step 9920 |avg loss 5.650 |avg tokens 4502.400 |tokens/s 33023.232 |walltime 1371.946 | +Transformer | epoch 0 | step 9930 |avg loss 5.975 |avg tokens 4589.900 |tokens/s 33613.141 |walltime 1373.311 | +Transformer | epoch 0 | step 9940 |avg loss 5.831 |avg tokens 4376.900 |tokens/s 32223.493 |walltime 1374.669 | +Transformer | epoch 0 | step 9950 |avg loss 5.710 |avg tokens 4781.500 |tokens/s 34959.662 |walltime 1376.037 | +Transformer | epoch 0 | step 9960 |avg loss 5.492 |avg tokens 4720.100 |tokens/s 34257.141 |walltime 1377.415 | +Transformer | epoch 0 | step 9970 |avg loss 5.870 |avg tokens 4628.900 |tokens/s 33995.024 |walltime 1378.777 | +Transformer | epoch 0 | step 9980 |avg loss 5.371 |avg tokens 4792.400 |tokens/s 33716.754 |walltime 1380.198 | +Transformer | epoch 0 | step 9990 
|avg loss 5.226 |avg tokens 4552.700 |tokens/s 31685.619 |walltime 1381.635 | +Transformer | epoch 0 | step 10000 |avg loss 5.646 |avg tokens 4465.500 |tokens/s 32711.811 |walltime 1383.000 | +Transformer | epoch 0 | step 10010 |avg loss 5.216 |avg tokens 4976.700 |tokens/s 34304.368 |walltime 1384.451 | +Transformer | epoch 0 | step 10020 |avg loss 5.472 |avg tokens 4674.100 |tokens/s 33289.738 |walltime 1385.855 | +Transformer | epoch 0 | step 10030 |avg loss 6.002 |avg tokens 4162.700 |tokens/s 31927.791 |walltime 1387.158 | +Transformer | epoch 0 | step 10040 |avg loss 5.193 |avg tokens 4833.700 |tokens/s 33228.909 |walltime 1388.613 | +Transformer | epoch 0 | step 10050 |avg loss 6.217 |avg tokens 4436.700 |tokens/s 34463.101 |walltime 1389.901 | +Transformer | epoch 0 | step 10060 |avg loss 5.655 |avg tokens 4812.300 |tokens/s 34440.009 |walltime 1391.298 | +Transformer | epoch 0 | step 10070 |avg loss 5.081 |avg tokens 4848.000 |tokens/s 33477.307 |walltime 1392.746 | +Transformer | epoch 0 | step 10080 |avg loss 5.905 |avg tokens 4390.100 |tokens/s 32627.574 |walltime 1394.091 | +Transformer | epoch 0 | step 10090 |avg loss 6.848 |avg tokens 3619.300 |tokens/s 28742.720 |walltime 1395.351 | +Transformer | epoch 0 | step 10100 |avg loss 6.056 |avg tokens 4332.600 |tokens/s 32047.175 |walltime 1396.703 | +Transformer | epoch 0 | step 10110 |avg loss 6.262 |avg tokens 4043.200 |tokens/s 31406.801 |walltime 1397.990 | +Transformer | epoch 0 | step 10120 |avg loss 6.020 |avg tokens 4457.500 |tokens/s 32797.512 |walltime 1399.349 | +Transformer | epoch 0 | step 10130 |avg loss 5.553 |avg tokens 4709.000 |tokens/s 33227.086 |walltime 1400.766 | +Transformer | epoch 0 | step 10140 |avg loss 5.265 |avg tokens 4418.400 |tokens/s 32547.561 |walltime 1402.124 | +Transformer | epoch 0 | step 10150 |avg loss 5.744 |avg tokens 4724.200 |tokens/s 34635.604 |walltime 1403.488 | +Transformer | epoch 0 | step 10160 |avg loss 5.709 |avg tokens 4321.800 |tokens/s 31023.317 
|walltime 1404.881 | +Transformer | epoch 0 | step 10170 |avg loss 6.245 |avg tokens 4105.300 |tokens/s 31627.759 |walltime 1406.179 | +Transformer | epoch 0 | step 10180 |avg loss 5.337 |avg tokens 4577.600 |tokens/s 32179.622 |walltime 1407.601 | +Transformer | epoch 0 | step 10190 |avg loss 5.312 |avg tokens 4745.600 |tokens/s 34021.947 |walltime 1408.996 | +Transformer | epoch 0 | step 10200 |avg loss 5.726 |avg tokens 4619.900 |tokens/s 33441.955 |walltime 1410.378 | +Transformer | epoch 0 | step 10210 |avg loss 5.880 |avg tokens 4082.500 |tokens/s 31391.613 |walltime 1411.678 | +Transformer | epoch 0 | step 10220 |avg loss 5.134 |avg tokens 4845.600 |tokens/s 33856.215 |walltime 1413.110 | +Transformer | epoch 0 | step 10230 |avg loss 6.329 |avg tokens 4455.900 |tokens/s 32762.371 |walltime 1414.470 | +Transformer | epoch 0 | step 10240 |avg loss 6.188 |avg tokens 4325.300 |tokens/s 32389.827 |walltime 1415.805 | +Transformer | epoch 0 | step 10250 |avg loss 5.658 |avg tokens 4347.500 |tokens/s 31347.973 |walltime 1417.192 | +Transformer | epoch 0 | step 10260 |avg loss 6.473 |avg tokens 4220.500 |tokens/s 31779.833 |walltime 1418.520 | +Transformer | epoch 0 | step 10270 |avg loss 5.077 |avg tokens 4637.800 |tokens/s 33724.981 |walltime 1419.895 | +Transformer | epoch 0 | step 10280 |avg loss 5.839 |avg tokens 4380.200 |tokens/s 32681.049 |walltime 1421.235 | +Transformer | epoch 0 | step 10290 |avg loss 6.063 |avg tokens 4564.100 |tokens/s 33920.744 |walltime 1422.581 | +Transformer | epoch 0 | step 10300 |avg loss 5.122 |avg tokens 4845.600 |tokens/s 33976.300 |walltime 1424.007 | +Transformer | epoch 0 | step 10310 |avg loss 5.225 |avg tokens 4655.600 |tokens/s 32397.449 |walltime 1425.444 | +Transformer | epoch 0 | step 10320 |avg loss 5.852 |avg tokens 4871.800 |tokens/s 35464.781 |walltime 1426.818 | +Transformer | epoch 0 | step 10330 |avg loss 5.525 |avg tokens 4669.700 |tokens/s 33714.824 |walltime 1428.203 | +Transformer | epoch 0 | step 10340 |avg 
loss 5.723 |avg tokens 4009.200 |tokens/s 30191.617 |walltime 1429.531 | +Transformer | epoch 0 | step 10350 |avg loss 5.450 |avg tokens 4841.500 |tokens/s 34485.403 |walltime 1430.935 | +Transformer | epoch 0 | step 10360 |avg loss 5.156 |avg tokens 4895.900 |tokens/s 34903.313 |walltime 1432.337 | +Transformer | epoch 0 | step 10370 |avg loss 5.889 |avg tokens 4269.700 |tokens/s 31998.509 |walltime 1433.672 | +Transformer | epoch 0 | step 10380 |avg loss 6.312 |avg tokens 4456.000 |tokens/s 34373.743 |walltime 1434.968 | +Transformer | epoch 0 | step 10390 |avg loss 5.950 |avg tokens 4165.800 |tokens/s 31399.381 |walltime 1436.295 | +Transformer | epoch 0 | step 10400 |avg loss 5.545 |avg tokens 4443.100 |tokens/s 32585.493 |walltime 1437.658 | +Transformer | epoch 0 | step 10410 |avg loss 5.679 |avg tokens 4763.200 |tokens/s 34707.950 |walltime 1439.031 | +Transformer | epoch 0 | step 10420 |avg loss 5.359 |avg tokens 4777.900 |tokens/s 34606.886 |walltime 1440.411 | +Transformer | epoch 0 | step 10430 |avg loss 5.586 |avg tokens 4361.000 |tokens/s 31398.560 |walltime 1441.800 | +Transformer | epoch 0 | step 10440 |avg loss 6.298 |avg tokens 4274.900 |tokens/s 33821.810 |walltime 1443.064 | +Transformer | epoch 0 | step 10450 |avg loss 5.699 |avg tokens 4390.200 |tokens/s 33308.714 |walltime 1444.382 | +Transformer | epoch 0 | step 10460 |avg loss 5.415 |avg tokens 4791.600 |tokens/s 34858.027 |walltime 1445.757 | +Transformer | epoch 0 | step 10470 |avg loss 5.267 |avg tokens 4975.700 |tokens/s 34204.976 |walltime 1447.212 | +Transformer | epoch 0 | step 10480 |avg loss 5.850 |avg tokens 4391.000 |tokens/s 32044.801 |walltime 1448.582 | +Transformer | epoch 0 | step 10490 |avg loss 6.359 |avg tokens 4587.900 |tokens/s 34174.739 |walltime 1449.924 | +Transformer | epoch 0 | step 10500 |avg loss 5.512 |avg tokens 4443.400 |tokens/s 32724.307 |walltime 1451.282 | +Transformer | epoch 0 | step 10510 |avg loss 5.792 |avg tokens 4271.000 |tokens/s 31410.570 |walltime 
1452.642 | +Transformer | epoch 0 | step 10520 |avg loss 5.550 |avg tokens 4381.500 |tokens/s 32030.127 |walltime 1454.010 | +Transformer | epoch 0 | step 10530 |avg loss 4.960 |avg tokens 4713.600 |tokens/s 32601.378 |walltime 1455.456 | +Transformer | epoch 0 | step 10540 |avg loss 5.798 |avg tokens 4410.900 |tokens/s 32171.019 |walltime 1456.827 | +Transformer | epoch 0 | step 10550 |avg loss 5.706 |avg tokens 4564.500 |tokens/s 33023.173 |walltime 1458.209 | +Transformer | epoch 0 | step 10560 |avg loss 6.094 |avg tokens 4595.600 |tokens/s 33883.595 |walltime 1459.565 | +Transformer | epoch 0 | step 10570 |avg loss 5.632 |avg tokens 4567.100 |tokens/s 32974.401 |walltime 1460.950 | +Transformer | epoch 0 | step 10580 |avg loss 5.424 |avg tokens 5002.300 |tokens/s 35677.900 |walltime 1462.352 | +Transformer | epoch 0 | step 10590 |avg loss 5.745 |avg tokens 4560.000 |tokens/s 33848.546 |walltime 1463.700 | +Transformer | epoch 0 | step 10600 |avg loss 6.220 |avg tokens 3996.600 |tokens/s 30165.802 |walltime 1465.024 | +Transformer | epoch 0 | step 10610 |avg loss 5.680 |avg tokens 4505.500 |tokens/s 31723.726 |walltime 1466.445 | +Transformer | epoch 0 | step 10620 |avg loss 5.725 |avg tokens 4790.300 |tokens/s 34656.991 |walltime 1467.827 | +Transformer | epoch 0 | step 10630 |avg loss 5.962 |avg tokens 4474.200 |tokens/s 32904.975 |walltime 1469.187 | +Transformer | epoch 0 | step 10640 |avg loss 6.063 |avg tokens 4486.000 |tokens/s 33490.539 |walltime 1470.526 | +Transformer | epoch 0 | step 10650 |avg loss 5.612 |avg tokens 4628.200 |tokens/s 34031.577 |walltime 1471.886 | +Transformer | epoch 0 | step 10660 |avg loss 5.536 |avg tokens 4440.900 |tokens/s 32269.153 |walltime 1473.262 | +Transformer | epoch 0 | step 10670 |avg loss 5.058 |avg tokens 4821.300 |tokens/s 34335.381 |walltime 1474.666 | +Transformer | epoch 0 | step 10680 |avg loss 5.352 |avg tokens 4553.900 |tokens/s 32462.339 |walltime 1476.069 | +Transformer | epoch 0 | step 10690 |avg loss 
5.389 |avg tokens 4671.100 |tokens/s 33321.026 |walltime 1477.471 | +Transformer | epoch 0 | step 10700 |avg loss 6.035 |avg tokens 4130.900 |tokens/s 30071.468 |walltime 1478.845 | +Transformer | epoch 0 | step 10710 |avg loss 6.356 |avg tokens 4113.100 |tokens/s 32159.838 |walltime 1480.124 | +Transformer | epoch 0 | step 10720 |avg loss 6.089 |avg tokens 4595.700 |tokens/s 34415.511 |walltime 1481.459 | +Transformer | epoch 0 | step 10730 |avg loss 5.463 |avg tokens 4647.600 |tokens/s 33154.467 |walltime 1482.861 | +Transformer | epoch 0 | step 10740 |avg loss 5.300 |avg tokens 4548.000 |tokens/s 33233.803 |walltime 1484.229 | +Transformer | epoch 0 | step 10750 |avg loss 5.434 |avg tokens 4500.300 |tokens/s 32813.349 |walltime 1485.601 | +Transformer | epoch 0 | step 10760 |avg loss 4.901 |avg tokens 4619.300 |tokens/s 33071.148 |walltime 1486.998 | +Transformer | epoch 0 | step 10770 |avg loss 5.810 |avg tokens 4531.200 |tokens/s 33559.249 |walltime 1488.348 | +Transformer | epoch 0 | step 10780 |avg loss 5.693 |avg tokens 4396.600 |tokens/s 32768.233 |walltime 1489.690 | +Transformer | epoch 0 | step 10790 |avg loss 5.336 |avg tokens 4309.300 |tokens/s 31632.125 |walltime 1491.052 | +Transformer | epoch 0 | step 10800 |avg loss 5.426 |avg tokens 4734.800 |tokens/s 33565.502 |walltime 1492.463 | +Transformer | epoch 0 | step 10810 |avg loss 5.897 |avg tokens 4194.200 |tokens/s 32086.937 |walltime 1493.770 | +Transformer | epoch 0 | step 10820 |avg loss 5.438 |avg tokens 4426.800 |tokens/s 32416.121 |walltime 1495.135 | +Transformer | epoch 0 | step 10830 |avg loss 5.803 |avg tokens 4547.900 |tokens/s 32301.810 |walltime 1496.543 | +Transformer | epoch 0 | step 10840 |avg loss 5.189 |avg tokens 4768.000 |tokens/s 33459.393 |walltime 1497.968 | +Transformer | epoch 0 | step 10850 |avg loss 5.826 |avg tokens 4733.600 |tokens/s 34819.430 |walltime 1499.328 | +Transformer | epoch 0 | step 10860 |avg loss 5.883 |avg tokens 3946.100 |tokens/s 29972.726 |walltime 
1500.644 | +Transformer | epoch 0 | step 10870 |avg loss 5.190 |avg tokens 4883.900 |tokens/s 33769.688 |walltime 1502.091 | +Transformer | epoch 0 | step 10880 |avg loss 6.150 |avg tokens 4399.100 |tokens/s 33876.653 |walltime 1503.389 | +Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 32.0 +Transformer | epoch 0 | step 10890 |avg loss 5.704 |avg tokens 4352.900 |tokens/s 31929.202 |walltime 1504.752 | +Transformer | epoch 0 | step 10900 |avg loss 5.969 |avg tokens 4621.200 |tokens/s 33065.617 |walltime 1506.150 | +Transformer | epoch 0 | step 10910 |avg loss 6.389 |avg tokens 4323.800 |tokens/s 33923.148 |walltime 1507.425 | +Transformer | epoch 0 | step 10920 |avg loss 5.500 |avg tokens 4428.200 |tokens/s 31210.041 |walltime 1508.844 | +Transformer | epoch 0 | step 10930 |avg loss 5.463 |avg tokens 4717.700 |tokens/s 34338.506 |walltime 1510.217 | +Transformer | epoch 0 | step 10940 |avg loss 5.760 |avg tokens 4358.300 |tokens/s 32681.601 |walltime 1511.551 | +Transformer | epoch 0 | step 10950 |avg loss 5.579 |avg tokens 4354.600 |tokens/s 31265.156 |walltime 1512.944 | +Transformer | epoch 0 | step 10960 |avg loss 5.682 |avg tokens 4630.300 |tokens/s 31732.224 |walltime 1514.403 | +Transformer | epoch 0 | step 10970 |avg loss 5.723 |avg tokens 4439.500 |tokens/s 32362.517 |walltime 1515.775 | +Transformer | epoch 0 | step 10980 |avg loss 5.788 |avg tokens 4342.300 |tokens/s 31664.665 |walltime 1517.146 | +Transformer | epoch 0 | step 10990 |avg loss 6.103 |avg tokens 4330.500 |tokens/s 32203.807 |walltime 1518.491 | +Transformer | epoch 0 | step 11000 |avg loss 5.641 |avg tokens 4611.400 |tokens/s 32684.392 |walltime 1519.902 | +Transformer | epoch 0 | step 11010 |avg loss 5.327 |avg tokens 4720.800 |tokens/s 33700.255 |walltime 1521.303 | +Transformer | epoch 0 | step 11020 |avg loss 5.621 |avg tokens 4350.200 |tokens/s 29958.665 |walltime 1522.755 | +Transformer | epoch 0 | step 11030 |avg loss 5.445 |avg tokens 4580.500 |tokens/s 
32746.162 |walltime 1524.153 | +Transformer | epoch 0 | step 11040 |avg loss 5.568 |avg tokens 4142.600 |tokens/s 29657.171 |walltime 1525.550 | +Transformer | epoch 0 | step 11050 |avg loss 5.862 |avg tokens 4422.800 |tokens/s 33035.802 |walltime 1526.889 | +Transformer | epoch 0 | step 11060 |avg loss 6.342 |avg tokens 4627.000 |tokens/s 35135.971 |walltime 1528.206 | +Transformer | epoch 0 | step 11070 |avg loss 5.946 |avg tokens 4770.700 |tokens/s 35058.324 |walltime 1529.567 | +Transformer | epoch 0 | step 11080 |avg loss 5.570 |avg tokens 4659.700 |tokens/s 33830.800 |walltime 1530.944 | +Transformer | epoch 0 | step 11090 |avg loss 5.824 |avg tokens 3906.700 |tokens/s 28820.172 |walltime 1532.300 | +Transformer | epoch 0 | step 11100 |avg loss 5.964 |avg tokens 3946.700 |tokens/s 30164.627 |walltime 1533.608 | +Transformer | epoch 0 | step 11110 |avg loss 5.674 |avg tokens 4772.000 |tokens/s 33950.748 |walltime 1535.014 | +Transformer | epoch 0 | step 11120 |avg loss 5.490 |avg tokens 4643.700 |tokens/s 31838.451 |walltime 1536.472 | +Transformer | epoch 0 | step 11130 |avg loss 5.665 |avg tokens 4243.000 |tokens/s 31479.356 |walltime 1537.820 | +Transformer | epoch 0 | step 11140 |avg loss 5.473 |avg tokens 4572.500 |tokens/s 33093.758 |walltime 1539.202 | +Transformer | epoch 0 | step 11150 |avg loss 5.481 |avg tokens 4653.600 |tokens/s 33935.186 |walltime 1540.573 | +Transformer | epoch 0 | step 11160 |avg loss 5.092 |avg tokens 4906.800 |tokens/s 33359.010 |walltime 1542.044 | +Transformer | epoch 0 | step 11170 |avg loss 5.589 |avg tokens 4819.600 |tokens/s 35338.404 |walltime 1543.408 | +Transformer | epoch 0 | step 11180 |avg loss 5.218 |avg tokens 4745.100 |tokens/s 33665.827 |walltime 1544.817 | +Transformer | epoch 0 | step 11190 |avg loss 5.801 |avg tokens 4503.300 |tokens/s 33529.403 |walltime 1546.160 | +Transformer | epoch 0 | step 11200 |avg loss 5.834 |avg tokens 4298.700 |tokens/s 32212.655 |walltime 1547.495 | +Transformer | epoch 0 | step 
11210 |avg loss 6.368 |avg tokens 4026.900 |tokens/s 30838.759 |walltime 1548.801 | +Transformer | epoch 0 | step 11220 |avg loss 5.683 |avg tokens 4750.800 |tokens/s 34677.665 |walltime 1550.171 | +Transformer | epoch 0 | step 11230 |avg loss 5.405 |avg tokens 4504.700 |tokens/s 32636.808 |walltime 1551.551 | +Transformer | epoch 0 | step 11240 |avg loss 5.117 |avg tokens 4768.300 |tokens/s 33444.705 |walltime 1552.977 | +Transformer | epoch 0 | step 11250 |avg loss 5.413 |avg tokens 4361.400 |tokens/s 31085.327 |walltime 1554.380 | +Transformer | epoch 0 | step 11260 |avg loss 5.182 |avg tokens 4532.600 |tokens/s 32284.999 |walltime 1555.784 | +Transformer | epoch 0 | step 11270 |avg loss 5.935 |avg tokens 4884.900 |tokens/s 35740.617 |walltime 1557.150 | +Transformer | epoch 0 | step 11280 |avg loss 5.302 |avg tokens 4552.300 |tokens/s 32700.869 |walltime 1558.542 | +Transformer | epoch 0 | step 11290 |avg loss 5.333 |avg tokens 4699.000 |tokens/s 33329.108 |walltime 1559.952 | +Transformer | epoch 0 | step 11300 |avg loss 5.546 |avg tokens 4578.400 |tokens/s 32584.686 |walltime 1561.357 | +Transformer | epoch 0 | step 11310 |avg loss 5.719 |avg tokens 4690.800 |tokens/s 33962.034 |walltime 1562.739 | +Transformer | epoch 0 | step 11320 |avg loss 5.147 |avg tokens 4518.400 |tokens/s 32100.846 |walltime 1564.146 | +Transformer | epoch 0 | step 11330 |avg loss 5.399 |avg tokens 4674.100 |tokens/s 33101.765 |walltime 1565.558 | +Transformer | epoch 0 | step 11340 |avg loss 5.285 |avg tokens 4529.800 |tokens/s 31976.844 |walltime 1566.975 | +Transformer | epoch 0 | step 11350 |avg loss 5.429 |avg tokens 4502.800 |tokens/s 32074.819 |walltime 1568.379 | +Transformer | epoch 0 | step 11360 |avg loss 6.015 |avg tokens 4589.200 |tokens/s 34201.448 |walltime 1569.720 | +Transformer | epoch 0 | step 11370 |avg loss 5.011 |avg tokens 4768.300 |tokens/s 33753.554 |walltime 1571.133 | +Transformer | epoch 0 | step 11380 |avg loss 5.179 |avg tokens 4748.200 |tokens/s 
32638.722 |walltime 1572.588 | +Transformer | epoch 0 | step 11390 |avg loss 5.938 |avg tokens 4629.800 |tokens/s 33425.886 |walltime 1573.973 | +Transformer | epoch 0 | step 11400 |avg loss 5.742 |avg tokens 4573.500 |tokens/s 35014.387 |walltime 1575.279 | +Transformer | epoch 0 | step 11410 |avg loss 5.825 |avg tokens 4407.800 |tokens/s 33187.134 |walltime 1576.607 | +Transformer | epoch 0 | step 11420 |avg loss 5.612 |avg tokens 4619.400 |tokens/s 32866.982 |walltime 1578.013 | +Transformer | epoch 0 | step 11430 |avg loss 5.532 |avg tokens 4600.900 |tokens/s 32734.677 |walltime 1579.418 | +Transformer | epoch 0 | step 11440 |avg loss 5.808 |avg tokens 4122.500 |tokens/s 31123.326 |walltime 1580.743 | +Transformer | epoch 0 | step 11450 |avg loss 6.009 |avg tokens 4795.900 |tokens/s 34563.452 |walltime 1582.131 | +Transformer | epoch 0 | step 11460 |avg loss 6.131 |avg tokens 4248.200 |tokens/s 32247.983 |walltime 1583.448 | +Transformer | epoch 0 | step 11470 |avg loss 5.303 |avg tokens 4820.000 |tokens/s 34583.623 |walltime 1584.842 | +Transformer | epoch 0 | step 11480 |avg loss 5.288 |avg tokens 4606.400 |tokens/s 32793.495 |walltime 1586.246 | +Transformer | epoch 0 | step 11490 |avg loss 5.730 |avg tokens 4629.000 |tokens/s 34097.613 |walltime 1587.604 | +Transformer | epoch 0 | step 11500 |avg loss 5.544 |avg tokens 4640.400 |tokens/s 34516.576 |walltime 1588.948 | +Transformer | epoch 0 | step 11510 |avg loss 5.754 |avg tokens 4579.100 |tokens/s 33650.777 |walltime 1590.309 | +Transformer | epoch 0 | step 11520 |avg loss 6.206 |avg tokens 3906.200 |tokens/s 31532.108 |walltime 1591.548 | +Transformer | epoch 0 | step 11530 |avg loss 5.781 |avg tokens 4505.500 |tokens/s 34765.262 |walltime 1592.844 | +Transformer | epoch 0 | step 11540 |avg loss 6.299 |avg tokens 4330.800 |tokens/s 32315.834 |walltime 1594.184 | +Transformer | epoch 0 | step 11550 |avg loss 6.110 |avg tokens 4557.800 |tokens/s 34657.203 |walltime 1595.499 | +Transformer | epoch 0 | step 
11560 |avg loss 5.411 |avg tokens 4662.800 |tokens/s 33050.344 |walltime 1596.910 | +Transformer | epoch 0 | step 11570 |avg loss 5.917 |avg tokens 4328.000 |tokens/s 33207.363 |walltime 1598.213 | +Transformer | epoch 0 | step 11580 |avg loss 6.236 |avg tokens 4221.900 |tokens/s 31922.030 |walltime 1599.536 | +Transformer | epoch 0 | step 11590 |avg loss 6.682 |avg tokens 3810.800 |tokens/s 30791.052 |walltime 1600.773 | +Transformer | epoch 0 | step 11600 |avg loss 5.523 |avg tokens 4540.200 |tokens/s 32845.668 |walltime 1602.156 | +Transformer | epoch 0 | step 11610 |avg loss 5.907 |avg tokens 4219.600 |tokens/s 31581.156 |walltime 1603.492 | +Transformer | epoch 0 | step 11620 |avg loss 5.270 |avg tokens 4497.300 |tokens/s 33422.789 |walltime 1604.837 | +Transformer | epoch 0 | step 11630 |avg loss 5.799 |avg tokens 4602.300 |tokens/s 34407.761 |walltime 1606.175 | +Transformer | epoch 0 | step 11640 |avg loss 5.612 |avg tokens 4527.800 |tokens/s 33412.758 |walltime 1607.530 | +Transformer | epoch 0 | step 11650 |avg loss 5.123 |avg tokens 4918.500 |tokens/s 33830.551 |walltime 1608.984 | +Transformer | epoch 0 | step 11660 |avg loss 5.404 |avg tokens 4834.500 |tokens/s 35339.166 |walltime 1610.352 | +Transformer | epoch 0 | step 11670 |avg loss 4.989 |avg tokens 4726.500 |tokens/s 33266.967 |walltime 1611.773 | +Transformer | epoch 0 | step 11680 |avg loss 6.334 |avg tokens 4541.000 |tokens/s 34069.449 |walltime 1613.106 | +Transformer | epoch 0 | step 11690 |avg loss 5.915 |avg tokens 3855.100 |tokens/s 28823.900 |walltime 1614.443 | +Transformer | epoch 0 | step 11700 |avg loss 5.570 |avg tokens 4750.300 |tokens/s 33415.270 |walltime 1615.865 | +Transformer | epoch 0 | step 11710 |avg loss 5.686 |avg tokens 4828.100 |tokens/s 35066.696 |walltime 1617.242 | +Transformer | epoch 0 | step 11720 |avg loss 5.109 |avg tokens 4664.200 |tokens/s 33479.643 |walltime 1618.635 | +Transformer | epoch 0 | step 11730 |avg loss 6.052 |avg tokens 4155.700 |tokens/s 
30916.842 |walltime 1619.979 | +Transformer | epoch 0 | step 11740 |avg loss 5.420 |avg tokens 4346.500 |tokens/s 32114.157 |walltime 1621.332 | +Transformer | epoch 0 | step 11750 |avg loss 5.971 |avg tokens 4072.200 |tokens/s 31059.709 |walltime 1622.643 | +Transformer | epoch 0 | step 11760 |avg loss 5.697 |avg tokens 4446.600 |tokens/s 33997.381 |walltime 1623.951 | +Transformer | epoch 0 | step 11770 |avg loss 5.625 |avg tokens 4724.500 |tokens/s 33911.061 |walltime 1625.345 | +Transformer | epoch 0 | step 11780 |avg loss 5.841 |avg tokens 4499.900 |tokens/s 32424.802 |walltime 1626.732 | +Transformer | epoch 0 | step 11790 |avg loss 5.241 |avg tokens 4474.300 |tokens/s 31805.737 |walltime 1628.139 | +Transformer | epoch 0 | step 11800 |avg loss 5.137 |avg tokens 4977.600 |tokens/s 36211.659 |walltime 1629.514 | +Transformer | epoch 0 | step 11810 |avg loss 5.546 |avg tokens 4438.200 |tokens/s 32785.071 |walltime 1630.867 | +Transformer | epoch 0 | step 11820 |avg loss 5.314 |avg tokens 4310.200 |tokens/s 30994.372 |walltime 1632.258 | +Transformer | epoch 0 | step 11830 |avg loss 4.886 |avg tokens 4787.000 |tokens/s 32500.837 |walltime 1633.731 | +Transformer | epoch 0 | step 11840 |avg loss 5.588 |avg tokens 4412.900 |tokens/s 33635.109 |walltime 1635.043 | +Transformer | epoch 0 | step 11850 |avg loss 6.155 |avg tokens 4379.100 |tokens/s 33795.233 |walltime 1636.339 | +Transformer | epoch 0 | step 11860 |avg loss 5.488 |avg tokens 4243.400 |tokens/s 32209.191 |walltime 1637.656 | +Transformer | epoch 0 | step 11870 |avg loss 6.383 |avg tokens 4417.900 |tokens/s 33367.690 |walltime 1638.980 | +Transformer | epoch 0 | step 11880 |avg loss 5.634 |avg tokens 4563.400 |tokens/s 33443.506 |walltime 1640.345 | +Transformer | epoch 0 | step 11890 |avg loss 5.695 |avg tokens 4452.000 |tokens/s 33908.235 |walltime 1641.658 | +Transformer | epoch 0 | step 11900 |avg loss 5.535 |avg tokens 4612.500 |tokens/s 33356.732 |walltime 1643.040 | +Transformer | epoch 0 | step 
11910 |avg loss 5.843 |avg tokens 4666.400 |tokens/s 34852.172 |walltime 1644.379 | +Transformer | epoch 0 | step 11920 |avg loss 5.608 |avg tokens 4679.300 |tokens/s 34382.625 |walltime 1645.740 | +Transformer | epoch 0 | step 11930 |avg loss 5.953 |avg tokens 4244.200 |tokens/s 32277.291 |walltime 1647.055 | +Transformer | epoch 0 | step 11940 |avg loss 5.619 |avg tokens 4541.000 |tokens/s 34098.030 |walltime 1648.387 | +Transformer | epoch 0 | step 11950 |avg loss 5.289 |avg tokens 4639.900 |tokens/s 32656.083 |walltime 1649.808 | +Transformer | epoch 0 | step 11960 |avg loss 5.604 |avg tokens 4543.700 |tokens/s 33195.609 |walltime 1651.177 | +Transformer | epoch 0 | step 11970 |avg loss 5.925 |avg tokens 3989.500 |tokens/s 31842.802 |walltime 1652.430 | +Transformer | epoch 0 | step 11980 |avg loss 5.641 |avg tokens 4768.600 |tokens/s 35227.051 |walltime 1653.783 | +Transformer | epoch 0 | step 11990 |avg loss 5.534 |avg tokens 4542.900 |tokens/s 33462.288 |walltime 1655.141 | +Transformer | epoch 0 | step 12000 |avg loss 5.961 |avg tokens 4267.200 |tokens/s 32547.918 |walltime 1656.452 | +Transformer | epoch 0 | step 12010 |avg loss 5.264 |avg tokens 4359.500 |tokens/s 31145.585 |walltime 1657.852 | +Transformer | epoch 0 | step 12020 |avg loss 5.197 |avg tokens 4481.400 |tokens/s 32787.308 |walltime 1659.218 | +Transformer | epoch 0 | step 12030 |avg loss 6.378 |avg tokens 3904.000 |tokens/s 30792.989 |walltime 1660.486 | +Transformer | epoch 0 | step 12040 |avg loss 5.861 |avg tokens 4195.200 |tokens/s 31845.419 |walltime 1661.804 | +Transformer | epoch 0 | step 12050 |avg loss 5.582 |avg tokens 4613.400 |tokens/s 33224.671 |walltime 1663.192 | +Transformer | epoch 0 | step 12060 |avg loss 5.463 |avg tokens 4670.800 |tokens/s 33745.048 |walltime 1664.576 | +Transformer | epoch 0 | step 12070 |avg loss 6.145 |avg tokens 4308.300 |tokens/s 31579.229 |walltime 1665.941 | +Transformer | epoch 0 | step 12080 |avg loss 6.128 |avg tokens 4009.200 |tokens/s 
30856.395 |walltime 1667.240 | +Transformer | epoch 0 | step 12090 |avg loss 6.518 |avg tokens 4187.000 |tokens/s 32142.143 |walltime 1668.543 | +Transformer | epoch 0 | step 12100 |avg loss 5.396 |avg tokens 4662.500 |tokens/s 33726.952 |walltime 1669.925 | +Transformer | epoch 0 | step 12110 |avg loss 5.512 |avg tokens 4335.800 |tokens/s 31868.672 |walltime 1671.285 | +Transformer | epoch 0 | step 12120 |avg loss 5.517 |avg tokens 4848.500 |tokens/s 34346.223 |walltime 1672.697 | +Transformer | epoch 0 | step 12130 |avg loss 5.604 |avg tokens 4807.900 |tokens/s 35065.946 |walltime 1674.068 | +Transformer | epoch 0 | step 12140 |avg loss 5.418 |avg tokens 4347.400 |tokens/s 29872.372 |walltime 1675.524 | +Transformer | epoch 0 | step 12150 |avg loss 6.050 |avg tokens 4512.800 |tokens/s 32592.859 |walltime 1676.908 | Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 16.0 -Transformer | epoch 0 | step 5500 |avg loss 7.283 |avg tokens 4549.098 |tokens/s 30656.508 |walltime 833.238 | -Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 8.0 -Transformer | epoch 0 | step 6000 |avg loss 7.275 |avg tokens 4529.492 |tokens/s 30358.172 |walltime 907.839 | -Transformer | epoch 0 | step 6500 |avg loss 7.408 |avg tokens 4514.662 |tokens/s 30332.363 |walltime 982.259 | -Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 4.0 -Transformer | epoch 0 | step 7000 |avg loss 7.559 |avg tokens 4523.974 |tokens/s 30411.125 |walltime 1056.639 | -Transformer | epoch 0 | step 7500 |avg loss 7.527 |avg tokens 4543.398 |tokens/s 30288.618 |walltime 1131.641 | -Gradient overflow. 
Skipping step, loss scaler 0 reducing loss scale to 2.0 -Transformer | epoch 0 | step 8000 |avg loss 7.543 |avg tokens 4531.322 |tokens/s 30204.047 |walltime 1206.653 | -Transformer | epoch 0 | step 8500 |avg loss 7.681 |avg tokens 4574.306 |tokens/s 30782.811 |walltime 1280.953 | -Transformer | epoch 0 | step 9000 |avg loss 7.736 |avg tokens 4495.478 |tokens/s 30609.895 |walltime 1354.384 | -Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.0 -Transformer | epoch 0 | step 9500 |avg loss 7.786 |avg tokens 4484.618 |tokens/s 30028.078 |walltime 1429.058 | -Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 0.5 -Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 0.25 -Transformer | epoch 0 | step 10000 |avg loss 7.691 |avg tokens 4567.118 |tokens/s 30780.733 |walltime 1503.246 | -Transformer | epoch 0 | step 10500 |avg loss 7.790 |avg tokens 4510.976 |tokens/s 30647.884 |walltime 1576.840 | -Transformer | epoch 0 | step 11000 |avg loss 7.752 |avg tokens 4499.432 |tokens/s 30318.893 |walltime 1651.042 | -Transformer | epoch 0 | step 11500 |avg loss 7.772 |avg tokens 4553.214 |tokens/s 30843.717 |walltime 1724.853 | -Transformer | epoch 0 | step 12000 |avg loss 7.826 |avg tokens 4472.098 |tokens/s 30739.117 |walltime 1797.595 | -Transformer | epoch 0 | step 12500 |avg loss 7.794 |avg tokens 4445.792 |tokens/s 30228.351 |walltime 1871.132 | -Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 0.25 -Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 0.125 -Transformer | epoch 0 | step 13000 |avg loss 7.757 |avg tokens 4550.220 |tokens/s 30678.936 |walltime 1945.291 | -Transformer | epoch 0 | step 13500 |avg loss 7.807 |avg tokens 4484.394 |tokens/s 30476.049 |walltime 2018.863 | -Transformer | epoch 0 | step 14000 |avg loss 7.827 |avg tokens 4520.988 |tokens/s 30552.921 |walltime 2092.850 | -Gradient overflow. 
Skipping step, loss scaler 0 reducing loss scale to 0.0625 -Transformer | epoch 0 | step 14500 |avg loss 7.762 |avg tokens 4521.436 |tokens/s 30523.482 |walltime 2166.914 | -Transformer | epoch 0 | step 15000 |avg loss 7.879 |avg tokens 4516.702 |tokens/s 30947.123 |walltime 2239.889 | -Transformer | epoch 0 | step 15500 |avg loss 7.848 |avg tokens 4499.284 |tokens/s 30559.256 |walltime 2313.505 | -Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 0.03125 -Transformer | epoch 0 | step 16000 |avg loss 7.874 |avg tokens 4557.068 |tokens/s 30914.484 |walltime 2387.209 | -Transformer | epoch 0 | step 16500 |avg loss 7.862 |avg tokens 4477.750 |tokens/s 30376.611 |walltime 2460.913 | -Transformer | epoch 0 | step 17000 |avg loss 7.814 |avg tokens 4606.024 |tokens/s 30842.483 |walltime 2535.583 | -Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 0.015625 -Transformer | epoch 0 | step 17500 |avg loss 7.869 |avg tokens 4479.544 |tokens/s 30338.165 |walltime 2609.410 | -Transformer | epoch 0 | step 18000 |avg loss 7.907 |avg tokens 4480.724 |tokens/s 30495.077 |walltime 2682.876 | -Transformer | epoch 0 | step 18500 |avg loss 7.845 |avg tokens 4512.074 |tokens/s 30558.811 |walltime 2756.702 | -Transformer | epoch 0 | step 19000 |avg loss 7.825 |avg tokens 4545.856 |tokens/s 30906.872 |walltime 2830.244 | -Transformer | epoch 0 | step 19500 |avg loss 7.840 |avg tokens 4546.442 |tokens/s 30527.025 |walltime 2904.710 | -Transformer | epoch 0 | step 20000 |avg loss 7.923 |avg tokens 4496.134 |tokens/s 30482.550 |walltime 2978.459 | -Transformer | epoch 0 | step 20500 |avg loss 7.905 |avg tokens 4519.676 |tokens/s 30679.300 |walltime 3052.119 | -Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 0.015625 -Transformer | epoch 0 | step 21000 |avg loss 7.958 |avg tokens 4509.232 |tokens/s 30261.188 |walltime 3126.624 | -Gradient overflow. 
Skipping step, loss scaler 0 reducing loss scale to 0.0078125 -Transformer | epoch 0 | step 21500 |avg loss 7.983 |avg tokens 4519.686 |tokens/s 30247.938 |walltime 3201.335 | -Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 0.00390625 -Transformer | epoch 0 | step 22000 |avg loss 8.078 |avg tokens 4499.402 |tokens/s 30601.066 |walltime 3274.852 | -Transformer | epoch 0 | step 22500 |avg loss 8.005 |avg tokens 4523.794 |tokens/s 30520.011 |walltime 3348.964 | -Transformer | epoch 0 | step 23000 |avg loss 8.006 |avg tokens 4512.090 |tokens/s 30523.122 |walltime 3422.876 | -Transformer | epoch 0 | step 23500 |avg loss 7.993 |avg tokens 4501.332 |tokens/s 30366.430 |walltime 3496.993 | -Transformer | epoch 0 | step 24000 |avg loss 8.012 |avg tokens 4482.898 |tokens/s 30488.550 |walltime 3570.511 | -Transformer | epoch 0 | step 24500 |avg loss 7.954 |avg tokens 4511.830 |tokens/s 30711.236 |walltime 3643.967 | -Transformer | epoch 0 | step 25000 |avg loss 7.939 |avg tokens 4555.644 |tokens/s 30817.959 |walltime 3717.879 | -Transformer | epoch 0 | step 25500 |avg loss 8.016 |avg tokens 4471.746 |tokens/s 30626.510 |walltime 3790.883 | -Transformer | epoch 0 | step 26000 |avg loss 7.950 |avg tokens 4516.412 |tokens/s 30559.760 |walltime 3864.778 | -Transformer | epoch 0 | step 26500 |avg loss 8.003 |avg tokens 4477.858 |tokens/s 30523.033 |walltime 3938.130 | -Transformer | epoch 0 | step 27000 |avg loss 7.933 |avg tokens 4532.400 |tokens/s 30811.621 |walltime 4011.680 | -Gradient overflow. 
Skipping step, loss scaler 0 reducing loss scale to 0.0078125 -Transformer | epoch 0 | step 27500 |avg loss 7.985 |avg tokens 4518.778 |tokens/s 30663.218 |walltime 4085.365 | -Transformer | epoch 0 | step 28000 |avg loss 7.990 |avg tokens 4587.856 |tokens/s 31275.274 |walltime 4158.711 | -Transformer | epoch 0 | step 28500 |avg loss 8.050 |avg tokens 4421.904 |tokens/s 30080.992 |walltime 4232.211 | -Transformer | epoch 0 | step 29000 |avg loss 8.012 |avg tokens 4549.126 |tokens/s 31214.659 |walltime 4305.079 | -Transformer | epoch 0 | step 29500 |avg loss 7.988 |avg tokens 4546.422 |tokens/s 31030.572 |walltime 4378.336 | -Transformer | epoch 0 | step 30000 |avg loss 8.006 |avg tokens 4524.482 |tokens/s 30744.507 |walltime 4451.918 | -Transformer | epoch 0 | step 30500 |avg loss 8.011 |avg tokens 4540.014 |tokens/s 30637.047 |walltime 4526.012 | -Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 0.0078125 -Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 0.00390625 -Gradient overflow. 
Skipping step, loss scaler 0 reducing loss scale to 0.001953125 -Transformer | epoch 0 | step 31000 |avg loss 8.004 |avg tokens 4498.210 |tokens/s 30299.802 |walltime 4600.240 | -Epoch time: 4661.986679553986 -Transformer | epoch 0 | step 31487 |avg loss 8.005 |avg tokens 4529.889 |tokens/s 30691.527 |walltime 4672.119 | -Validation loss on subset valid: 8.048188442107273 +Transformer | epoch 0 | step 12160 |avg loss 5.985 |avg tokens 4698.600 |tokens/s 35546.167 |walltime 1678.230 | +Transformer | epoch 0 | step 12170 |avg loss 5.977 |avg tokens 4112.000 |tokens/s 30856.367 |walltime 1679.563 | +Transformer | epoch 0 | step 12180 |avg loss 6.103 |avg tokens 3654.800 |tokens/s 28424.080 |walltime 1680.848 | +Transformer | epoch 0 | step 12190 |avg loss 6.001 |avg tokens 4510.200 |tokens/s 33738.668 |walltime 1682.185 | +Transformer | epoch 0 | step 12200 |avg loss 6.147 |avg tokens 3772.300 |tokens/s 28448.954 |walltime 1683.511 | +Transformer | epoch 0 | step 12210 |avg loss 5.956 |avg tokens 4375.200 |tokens/s 32741.603 |walltime 1684.848 | +Transformer | epoch 0 | step 12220 |avg loss 5.264 |avg tokens 4536.800 |tokens/s 32994.795 |walltime 1686.223 | +Transformer | epoch 0 | step 12230 |avg loss 5.912 |avg tokens 4701.200 |tokens/s 35137.262 |walltime 1687.561 | +Transformer | epoch 0 | step 12240 |avg loss 5.416 |avg tokens 4663.200 |tokens/s 34186.367 |walltime 1688.925 | +Transformer | epoch 0 | step 12250 |avg loss 5.950 |avg tokens 4209.000 |tokens/s 33365.650 |walltime 1690.186 | +Transformer | epoch 0 | step 12260 |avg loss 5.222 |avg tokens 4991.700 |tokens/s 35853.227 |walltime 1691.578 | +Transformer | epoch 0 | step 12270 |avg loss 4.745 |avg tokens 4914.400 |tokens/s 33594.042 |walltime 1693.041 | +Transformer | epoch 0 | step 12280 |avg loss 5.422 |avg tokens 4726.300 |tokens/s 34792.086 |walltime 1694.400 | +Transformer | epoch 0 | step 12290 |avg loss 6.070 |avg tokens 4410.000 |tokens/s 33578.042 |walltime 1695.713 | +Transformer | epoch 0 | 
step 12300 |avg loss 5.870 |avg tokens 4080.900 |tokens/s 31026.028 |walltime 1697.028 | +Transformer | epoch 0 | step 12310 |avg loss 5.332 |avg tokens 4775.400 |tokens/s 34908.934 |walltime 1698.396 | +Transformer | epoch 0 | step 12320 |avg loss 6.030 |avg tokens 4331.800 |tokens/s 33319.975 |walltime 1699.696 | +Transformer | epoch 0 | step 12330 |avg loss 5.339 |avg tokens 4599.400 |tokens/s 33574.439 |walltime 1701.066 | +Transformer | epoch 0 | step 12340 |avg loss 5.802 |avg tokens 4558.900 |tokens/s 33159.052 |walltime 1702.441 | +Transformer | epoch 0 | step 12350 |avg loss 5.307 |avg tokens 4620.300 |tokens/s 34339.078 |walltime 1703.787 | +Transformer | epoch 0 | step 12360 |avg loss 5.367 |avg tokens 4742.500 |tokens/s 33867.408 |walltime 1705.187 | +Transformer | epoch 0 | step 12370 |avg loss 5.649 |avg tokens 4302.300 |tokens/s 32513.287 |walltime 1706.510 | +Transformer | epoch 0 | step 12380 |avg loss 5.130 |avg tokens 4400.800 |tokens/s 31777.848 |walltime 1707.895 | +Transformer | epoch 0 | step 12390 |avg loss 5.460 |avg tokens 4336.200 |tokens/s 32876.694 |walltime 1709.214 | +Transformer | epoch 0 | step 12400 |avg loss 5.695 |avg tokens 4409.700 |tokens/s 32378.948 |walltime 1710.576 | +Transformer | epoch 0 | step 12410 |avg loss 5.408 |avg tokens 4550.500 |tokens/s 32700.977 |walltime 1711.967 | +Transformer | epoch 0 | step 12420 |avg loss 5.179 |avg tokens 4883.500 |tokens/s 35556.117 |walltime 1713.341 | +Transformer | epoch 0 | step 12430 |avg loss 5.420 |avg tokens 4664.500 |tokens/s 34377.306 |walltime 1714.698 | +Transformer | epoch 0 | step 12440 |avg loss 6.034 |avg tokens 4328.300 |tokens/s 33200.360 |walltime 1716.001 | +Transformer | epoch 0 | step 12450 |avg loss 5.400 |avg tokens 4719.100 |tokens/s 32638.455 |walltime 1717.447 | +Transformer | epoch 0 | step 12460 |avg loss 5.479 |avg tokens 4097.900 |tokens/s 29892.262 |walltime 1718.818 | +Transformer | epoch 0 | step 12470 |avg loss 5.416 |avg tokens 4403.900 |tokens/s 
32502.376 |walltime 1720.173 | +Transformer | epoch 0 | step 12480 |avg loss 5.112 |avg tokens 4509.700 |tokens/s 32128.362 |walltime 1721.577 | +Transformer | epoch 0 | step 12490 |avg loss 5.455 |avg tokens 4199.300 |tokens/s 30842.156 |walltime 1722.938 | +Transformer | epoch 0 | step 12500 |avg loss 5.286 |avg tokens 4255.300 |tokens/s 31758.255 |walltime 1724.278 | +Transformer | epoch 0 | step 12510 |avg loss 5.662 |avg tokens 3838.800 |tokens/s 29754.115 |walltime 1725.568 | +Transformer | epoch 0 | step 12520 |avg loss 5.552 |avg tokens 4270.800 |tokens/s 31661.509 |walltime 1726.917 | +Transformer | epoch 0 | step 12530 |avg loss 5.367 |avg tokens 4399.300 |tokens/s 32427.737 |walltime 1728.274 | +Transformer | epoch 0 | step 12540 |avg loss 6.015 |avg tokens 4179.600 |tokens/s 32722.383 |walltime 1729.551 | +Transformer | epoch 0 | step 12550 |avg loss 5.095 |avg tokens 4896.800 |tokens/s 34517.087 |walltime 1730.970 | +Transformer | epoch 0 | step 12560 |avg loss 5.200 |avg tokens 4295.500 |tokens/s 30984.673 |walltime 1732.356 | +Transformer | epoch 0 | step 12570 |avg loss 6.296 |avg tokens 4402.200 |tokens/s 33825.788 |walltime 1733.658 | +Transformer | epoch 0 | step 12580 |avg loss 5.032 |avg tokens 4762.800 |tokens/s 32208.139 |walltime 1735.137 | +Transformer | epoch 0 | step 12590 |avg loss 5.579 |avg tokens 4682.800 |tokens/s 34696.822 |walltime 1736.486 | +Transformer | epoch 0 | step 12600 |avg loss 5.077 |avg tokens 4764.800 |tokens/s 33618.989 |walltime 1737.903 | +Transformer | epoch 0 | step 12610 |avg loss 5.467 |avg tokens 4572.500 |tokens/s 33495.963 |walltime 1739.269 | +Transformer | epoch 0 | step 12620 |avg loss 5.570 |avg tokens 4518.400 |tokens/s 33559.298 |walltime 1740.615 | +Transformer | epoch 0 | step 12630 |avg loss 5.242 |avg tokens 4612.800 |tokens/s 33291.157 |walltime 1742.001 | +Transformer | epoch 0 | step 12640 |avg loss 5.073 |avg tokens 4755.700 |tokens/s 33250.887 |walltime 1743.431 | +Transformer | epoch 0 | step 
12650 |avg loss 5.232 |avg tokens 4681.700 |tokens/s 33324.785 |walltime 1744.836 | +Transformer | epoch 0 | step 12660 |avg loss 5.887 |avg tokens 4348.200 |tokens/s 33096.930 |walltime 1746.149 | +Transformer | epoch 0 | step 12670 |avg loss 5.533 |avg tokens 4448.600 |tokens/s 33270.332 |walltime 1747.487 | +Transformer | epoch 0 | step 12680 |avg loss 5.322 |avg tokens 4784.300 |tokens/s 33933.895 |walltime 1748.896 | +Transformer | epoch 0 | step 12690 |avg loss 6.076 |avg tokens 3652.900 |tokens/s 27942.220 |walltime 1750.204 | +Transformer | epoch 0 | step 12700 |avg loss 5.762 |avg tokens 4738.200 |tokens/s 35556.765 |walltime 1751.536 | +Transformer | epoch 0 | step 12710 |avg loss 5.392 |avg tokens 4789.100 |tokens/s 35031.564 |walltime 1752.903 | +Transformer | epoch 0 | step 12720 |avg loss 5.581 |avg tokens 4432.500 |tokens/s 32007.792 |walltime 1754.288 | +Transformer | epoch 0 | step 12730 |avg loss 5.354 |avg tokens 4804.700 |tokens/s 34927.641 |walltime 1755.664 | +Transformer | epoch 0 | step 12740 |avg loss 6.363 |avg tokens 4467.000 |tokens/s 33194.396 |walltime 1757.010 | +Transformer | epoch 0 | step 12750 |avg loss 5.535 |avg tokens 4709.300 |tokens/s 33989.633 |walltime 1758.395 | +Transformer | epoch 0 | step 12760 |avg loss 6.067 |avg tokens 4511.100 |tokens/s 33707.371 |walltime 1759.733 | +Transformer | epoch 0 | step 12770 |avg loss 4.928 |avg tokens 4810.300 |tokens/s 33663.790 |walltime 1761.162 | +Transformer | epoch 0 | step 12780 |avg loss 5.347 |avg tokens 4607.700 |tokens/s 33180.869 |walltime 1762.551 | +Transformer | epoch 0 | step 12790 |avg loss 5.111 |avg tokens 4683.700 |tokens/s 34008.473 |walltime 1763.928 | +Transformer | epoch 0 | step 12800 |avg loss 5.129 |avg tokens 4866.400 |tokens/s 34936.919 |walltime 1765.321 | +Transformer | epoch 0 | step 12810 |avg loss 5.375 |avg tokens 4761.800 |tokens/s 33803.627 |walltime 1766.730 | +Transformer | epoch 0 | step 12820 |avg loss 5.379 |avg tokens 4244.200 |tokens/s 
31561.779 |walltime 1768.075 | +Transformer | epoch 0 | step 12830 |avg loss 5.639 |avg tokens 4498.600 |tokens/s 33234.753 |walltime 1769.428 | +Transformer | epoch 0 | step 12840 |avg loss 5.507 |avg tokens 4525.400 |tokens/s 33285.092 |walltime 1770.788 | +Transformer | epoch 0 | step 12850 |avg loss 5.187 |avg tokens 4659.600 |tokens/s 32943.364 |walltime 1772.202 | +Transformer | epoch 0 | step 12860 |avg loss 5.278 |avg tokens 4546.800 |tokens/s 32868.193 |walltime 1773.586 | +Transformer | epoch 0 | step 12870 |avg loss 5.821 |avg tokens 4216.200 |tokens/s 32232.339 |walltime 1774.894 | +Transformer | epoch 0 | step 12880 |avg loss 5.312 |avg tokens 4532.800 |tokens/s 32531.849 |walltime 1776.287 | +Transformer | epoch 0 | step 12890 |avg loss 5.290 |avg tokens 4514.300 |tokens/s 32402.008 |walltime 1777.680 | +Transformer | epoch 0 | step 12900 |avg loss 4.795 |avg tokens 4762.000 |tokens/s 34331.102 |walltime 1779.067 | +Transformer | epoch 0 | step 12910 |avg loss 5.184 |avg tokens 4760.800 |tokens/s 34136.368 |walltime 1780.462 | +Transformer | epoch 0 | step 12920 |avg loss 6.204 |avg tokens 4062.300 |tokens/s 30809.420 |walltime 1781.780 | +Transformer | epoch 0 | step 12930 |avg loss 5.687 |avg tokens 4657.400 |tokens/s 34508.867 |walltime 1783.130 | +Transformer | epoch 0 | step 12940 |avg loss 5.283 |avg tokens 4633.600 |tokens/s 33147.991 |walltime 1784.528 | +Transformer | epoch 0 | step 12950 |avg loss 5.385 |avg tokens 4871.200 |tokens/s 35109.001 |walltime 1785.915 | +Transformer | epoch 0 | step 12960 |avg loss 5.243 |avg tokens 4586.400 |tokens/s 32362.180 |walltime 1787.333 | +Transformer | epoch 0 | step 12970 |avg loss 5.244 |avg tokens 4715.800 |tokens/s 34016.809 |walltime 1788.719 | +Transformer | epoch 0 | step 12980 |avg loss 5.477 |avg tokens 4449.400 |tokens/s 32100.572 |walltime 1790.105 | +Transformer | epoch 0 | step 12990 |avg loss 5.356 |avg tokens 4396.300 |tokens/s 32535.251 |walltime 1791.456 | +Transformer | epoch 0 | step 
13000 |avg loss 5.614 |avg tokens 4827.600 |tokens/s 36472.846 |walltime 1792.780 | +Transformer | epoch 0 | step 13010 |avg loss 5.139 |avg tokens 4485.800 |tokens/s 33326.618 |walltime 1794.126 | +Transformer | epoch 0 | step 13020 |avg loss 5.184 |avg tokens 4319.900 |tokens/s 30635.116 |walltime 1795.536 | +Transformer | epoch 0 | step 13030 |avg loss 5.696 |avg tokens 4058.500 |tokens/s 30093.120 |walltime 1796.885 | +Transformer | epoch 0 | step 13040 |avg loss 5.537 |avg tokens 4791.500 |tokens/s 35280.630 |walltime 1798.243 | +Transformer | epoch 0 | step 13050 |avg loss 5.390 |avg tokens 4294.100 |tokens/s 31510.211 |walltime 1799.605 | +Transformer | epoch 0 | step 13060 |avg loss 5.702 |avg tokens 4597.800 |tokens/s 33448.167 |walltime 1800.980 | +Transformer | epoch 0 | step 13070 |avg loss 5.527 |avg tokens 4670.500 |tokens/s 33900.684 |walltime 1802.358 | +Transformer | epoch 0 | step 13080 |avg loss 5.749 |avg tokens 4119.900 |tokens/s 30543.057 |walltime 1803.707 | +Transformer | epoch 0 | step 13090 |avg loss 5.518 |avg tokens 4483.200 |tokens/s 32263.550 |walltime 1805.096 | +Transformer | epoch 0 | step 13100 |avg loss 5.306 |avg tokens 4271.900 |tokens/s 32041.796 |walltime 1806.429 | +Transformer | epoch 0 | step 13110 |avg loss 5.318 |avg tokens 4812.300 |tokens/s 34507.172 |walltime 1807.824 | +Transformer | epoch 0 | step 13120 |avg loss 6.584 |avg tokens 3831.600 |tokens/s 31542.087 |walltime 1809.039 | +Transformer | epoch 0 | step 13130 |avg loss 5.597 |avg tokens 4729.600 |tokens/s 34883.357 |walltime 1810.395 | +Transformer | epoch 0 | step 13140 |avg loss 5.647 |avg tokens 4187.400 |tokens/s 30851.164 |walltime 1811.752 | +Transformer | epoch 0 | step 13150 |avg loss 5.333 |avg tokens 4208.400 |tokens/s 31012.198 |walltime 1813.109 | +Transformer | epoch 0 | step 13160 |avg loss 5.076 |avg tokens 4578.100 |tokens/s 33053.724 |walltime 1814.494 | +Transformer | epoch 0 | step 13170 |avg loss 5.589 |avg tokens 4310.800 |tokens/s 
31202.472 |walltime 1815.876 | +Transformer | epoch 0 | step 13180 |avg loss 5.498 |avg tokens 4549.600 |tokens/s 32739.879 |walltime 1817.265 | +Transformer | epoch 0 | step 13190 |avg loss 5.516 |avg tokens 4436.400 |tokens/s 31904.478 |walltime 1818.656 | +Transformer | epoch 0 | step 13200 |avg loss 6.216 |avg tokens 3979.200 |tokens/s 29138.929 |walltime 1820.021 | +Transformer | epoch 0 | step 13210 |avg loss 5.748 |avg tokens 4787.700 |tokens/s 34941.173 |walltime 1821.392 | +Transformer | epoch 0 | step 13220 |avg loss 5.203 |avg tokens 4473.300 |tokens/s 30976.348 |walltime 1822.836 | +Transformer | epoch 0 | step 13230 |avg loss 5.356 |avg tokens 4832.500 |tokens/s 33712.277 |walltime 1824.269 | +Transformer | epoch 0 | step 13240 |avg loss 5.753 |avg tokens 4653.200 |tokens/s 33098.912 |walltime 1825.675 | +Transformer | epoch 0 | step 13250 |avg loss 5.108 |avg tokens 4381.500 |tokens/s 31293.948 |walltime 1827.075 | +Transformer | epoch 0 | step 13260 |avg loss 4.922 |avg tokens 4992.000 |tokens/s 35543.193 |walltime 1828.480 | +Transformer | epoch 0 | step 13270 |avg loss 5.572 |avg tokens 4532.100 |tokens/s 33478.901 |walltime 1829.833 | +Transformer | epoch 0 | step 13280 |avg loss 5.257 |avg tokens 4657.300 |tokens/s 34094.575 |walltime 1831.199 | +Transformer | epoch 0 | step 13290 |avg loss 5.192 |avg tokens 4556.800 |tokens/s 32632.350 |walltime 1832.596 | +Transformer | epoch 0 | step 13300 |avg loss 5.053 |avg tokens 4834.700 |tokens/s 33901.470 |walltime 1834.022 | +Transformer | epoch 0 | step 13310 |avg loss 5.897 |avg tokens 4072.200 |tokens/s 30829.321 |walltime 1835.343 | +Transformer | epoch 0 | step 13320 |avg loss 5.318 |avg tokens 4315.600 |tokens/s 31793.236 |walltime 1836.700 | +Transformer | epoch 0 | step 13330 |avg loss 5.655 |avg tokens 4642.300 |tokens/s 33721.258 |walltime 1838.077 | +Transformer | epoch 0 | step 13340 |avg loss 5.464 |avg tokens 4308.000 |tokens/s 31452.598 |walltime 1839.446 | +Transformer | epoch 0 | step 
13350 |avg loss 5.636 |avg tokens 4546.700 |tokens/s 33688.656 |walltime 1840.796 | +Transformer | epoch 0 | step 13360 |avg loss 5.505 |avg tokens 4709.400 |tokens/s 34160.119 |walltime 1842.175 | +Transformer | epoch 0 | step 13370 |avg loss 5.201 |avg tokens 4776.300 |tokens/s 34488.450 |walltime 1843.560 | +Transformer | epoch 0 | step 13380 |avg loss 5.656 |avg tokens 4482.500 |tokens/s 32635.662 |walltime 1844.933 | +Transformer | epoch 0 | step 13390 |avg loss 5.607 |avg tokens 4300.500 |tokens/s 31776.396 |walltime 1846.287 | +Transformer | epoch 0 | step 13400 |avg loss 5.751 |avg tokens 4778.200 |tokens/s 34944.013 |walltime 1847.654 | +Transformer | epoch 0 | step 13410 |avg loss 5.675 |avg tokens 4417.800 |tokens/s 31949.981 |walltime 1849.037 | +Transformer | epoch 0 | step 13420 |avg loss 5.151 |avg tokens 4591.200 |tokens/s 33376.702 |walltime 1850.412 | +Transformer | epoch 0 | step 13430 |avg loss 5.814 |avg tokens 3976.100 |tokens/s 27880.025 |walltime 1851.838 | +Transformer | epoch 0 | step 13440 |avg loss 5.254 |avg tokens 4646.800 |tokens/s 33780.564 |walltime 1853.214 | +Transformer | epoch 0 | step 13450 |avg loss 6.072 |avg tokens 4342.100 |tokens/s 32711.246 |walltime 1854.541 | +Transformer | epoch 0 | step 13460 |avg loss 5.169 |avg tokens 4710.900 |tokens/s 33402.311 |walltime 1855.952 | +Transformer | epoch 0 | step 13470 |avg loss 5.317 |avg tokens 4849.700 |tokens/s 34198.998 |walltime 1857.370 | +Transformer | epoch 0 | step 13480 |avg loss 5.567 |avg tokens 4687.300 |tokens/s 34614.527 |walltime 1858.724 | +Transformer | epoch 0 | step 13490 |avg loss 5.212 |avg tokens 4480.000 |tokens/s 31995.911 |walltime 1860.124 | +Transformer | epoch 0 | step 13500 |avg loss 5.713 |avg tokens 4168.500 |tokens/s 30839.693 |walltime 1861.476 | +Transformer | epoch 0 | step 13510 |avg loss 5.343 |avg tokens 4632.800 |tokens/s 33961.649 |walltime 1862.840 | +Transformer | epoch 0 | step 13520 |avg loss 5.596 |avg tokens 4336.300 |tokens/s 
32332.595 |walltime 1864.181 | +Transformer | epoch 0 | step 13530 |avg loss 5.480 |avg tokens 4387.500 |tokens/s 32620.569 |walltime 1865.526 | +Transformer | epoch 0 | step 13540 |avg loss 6.221 |avg tokens 4262.500 |tokens/s 32609.893 |walltime 1866.833 | +Transformer | epoch 0 | step 13550 |avg loss 6.084 |avg tokens 3607.700 |tokens/s 28598.394 |walltime 1868.095 | +Transformer | epoch 0 | step 13560 |avg loss 5.241 |avg tokens 4685.100 |tokens/s 33199.078 |walltime 1869.506 | +Transformer | epoch 0 | step 13570 |avg loss 5.230 |avg tokens 4581.200 |tokens/s 33113.134 |walltime 1870.889 | +Transformer | epoch 0 | step 13580 |avg loss 5.346 |avg tokens 4563.100 |tokens/s 32712.061 |walltime 1872.284 | +Transformer | epoch 0 | step 13590 |avg loss 5.631 |avg tokens 4397.600 |tokens/s 32749.516 |walltime 1873.627 | +Transformer | epoch 0 | step 13600 |avg loss 6.093 |avg tokens 3968.700 |tokens/s 31476.559 |walltime 1874.888 | +Transformer | epoch 0 | step 13610 |avg loss 5.461 |avg tokens 4350.000 |tokens/s 32438.488 |walltime 1876.229 | +Transformer | epoch 0 | step 13620 |avg loss 5.173 |avg tokens 4590.300 |tokens/s 33126.275 |walltime 1877.615 | +Transformer | epoch 0 | step 13630 |avg loss 5.572 |avg tokens 4419.400 |tokens/s 32783.706 |walltime 1878.963 | +Transformer | epoch 0 | step 13640 |avg loss 5.731 |avg tokens 4622.800 |tokens/s 35034.509 |walltime 1880.282 | +Transformer | epoch 0 | step 13650 |avg loss 4.974 |avg tokens 4666.200 |tokens/s 34281.714 |walltime 1881.643 | +Transformer | epoch 0 | step 13660 |avg loss 5.525 |avg tokens 4416.900 |tokens/s 32137.225 |walltime 1883.018 | +Transformer | epoch 0 | step 13670 |avg loss 5.156 |avg tokens 4668.000 |tokens/s 32882.690 |walltime 1884.437 | +Transformer | epoch 0 | step 13680 |avg loss 5.871 |avg tokens 4594.700 |tokens/s 34953.586 |walltime 1885.752 | +Transformer | epoch 0 | step 13690 |avg loss 6.167 |avg tokens 4357.600 |tokens/s 32519.307 |walltime 1887.092 | +Transformer | epoch 0 | step 
13700 |avg loss 4.978 |avg tokens 4835.800 |tokens/s 34001.823 |walltime 1888.514 | +Transformer | epoch 0 | step 13710 |avg loss 4.933 |avg tokens 4884.200 |tokens/s 34170.967 |walltime 1889.944 | +Transformer | epoch 0 | step 13720 |avg loss 5.169 |avg tokens 4631.200 |tokens/s 33240.505 |walltime 1891.337 | +Transformer | epoch 0 | step 13730 |avg loss 6.031 |avg tokens 4620.100 |tokens/s 34546.829 |walltime 1892.674 | +Transformer | epoch 0 | step 13740 |avg loss 5.576 |avg tokens 4798.000 |tokens/s 34576.507 |walltime 1894.062 | +Transformer | epoch 0 | step 13750 |avg loss 6.323 |avg tokens 4258.300 |tokens/s 32850.376 |walltime 1895.358 | +Transformer | epoch 0 | step 13760 |avg loss 5.361 |avg tokens 4431.500 |tokens/s 32894.485 |walltime 1896.705 | +Transformer | epoch 0 | step 13770 |avg loss 5.296 |avg tokens 4483.700 |tokens/s 32161.458 |walltime 1898.099 | +Transformer | epoch 0 | step 13780 |avg loss 4.982 |avg tokens 4733.800 |tokens/s 33686.800 |walltime 1899.505 | +Transformer | epoch 0 | step 13790 |avg loss 4.940 |avg tokens 4773.900 |tokens/s 33993.275 |walltime 1900.909 | +Transformer | epoch 0 | step 13800 |avg loss 4.950 |avg tokens 4649.800 |tokens/s 32697.915 |walltime 1902.331 | +Transformer | epoch 0 | step 13810 |avg loss 6.192 |avg tokens 4250.900 |tokens/s 32546.058 |walltime 1903.637 | +Transformer | epoch 0 | step 13820 |avg loss 6.034 |avg tokens 4518.900 |tokens/s 33910.070 |walltime 1904.970 | +Transformer | epoch 0 | step 13830 |avg loss 5.447 |avg tokens 4763.700 |tokens/s 34689.939 |walltime 1906.343 | +Transformer | epoch 0 | step 13840 |avg loss 5.290 |avg tokens 4453.500 |tokens/s 32521.955 |walltime 1907.712 | +Transformer | epoch 0 | step 13850 |avg loss 4.862 |avg tokens 4929.000 |tokens/s 34229.630 |walltime 1909.152 | +Transformer | epoch 0 | step 13860 |avg loss 4.873 |avg tokens 4743.500 |tokens/s 33713.054 |walltime 1910.559 | +Transformer | epoch 0 | step 13870 |avg loss 4.879 |avg tokens 4862.100 |tokens/s 
33823.512 |walltime 1911.997 | +Transformer | epoch 0 | step 13880 |avg loss 5.526 |avg tokens 4073.200 |tokens/s 31051.029 |walltime 1913.309 | +Transformer | epoch 0 | step 13890 |avg loss 5.309 |avg tokens 4256.200 |tokens/s 30578.502 |walltime 1914.701 | +Transformer | epoch 0 | step 13900 |avg loss 5.371 |avg tokens 4504.700 |tokens/s 33456.455 |walltime 1916.047 | +Transformer | epoch 0 | step 13910 |avg loss 5.702 |avg tokens 4862.900 |tokens/s 34978.902 |walltime 1917.437 | +Transformer | epoch 0 | step 13920 |avg loss 5.263 |avg tokens 4477.000 |tokens/s 32481.437 |walltime 1918.816 | +Transformer | epoch 0 | step 13930 |avg loss 5.405 |avg tokens 4585.500 |tokens/s 33842.799 |walltime 1920.171 | +Transformer | epoch 0 | step 13940 |avg loss 5.080 |avg tokens 4721.200 |tokens/s 33942.666 |walltime 1921.561 | +Transformer | epoch 0 | step 13950 |avg loss 5.189 |avg tokens 4321.200 |tokens/s 31558.233 |walltime 1922.931 | +Transformer | epoch 0 | step 13960 |avg loss 5.576 |avg tokens 4461.500 |tokens/s 32437.008 |walltime 1924.306 | +Transformer | epoch 0 | step 13970 |avg loss 5.077 |avg tokens 4744.000 |tokens/s 32758.759 |walltime 1925.754 | +Transformer | epoch 0 | step 13980 |avg loss 4.861 |avg tokens 4888.100 |tokens/s 34558.229 |walltime 1927.169 | +Transformer | epoch 0 | step 13990 |avg loss 5.481 |avg tokens 4233.400 |tokens/s 30602.140 |walltime 1928.552 | +Transformer | epoch 0 | step 14000 |avg loss 5.502 |avg tokens 4190.200 |tokens/s 31518.107 |walltime 1929.882 | +Transformer | epoch 0 | step 14010 |avg loss 5.142 |avg tokens 4863.800 |tokens/s 34353.726 |walltime 1931.297 | +Transformer | epoch 0 | step 14020 |avg loss 5.720 |avg tokens 4451.500 |tokens/s 33422.110 |walltime 1932.629 | +Transformer | epoch 0 | step 14030 |avg loss 5.002 |avg tokens 4933.600 |tokens/s 34751.940 |walltime 1934.049 | +Transformer | epoch 0 | step 14040 |avg loss 5.375 |avg tokens 4477.900 |tokens/s 33276.237 |walltime 1935.395 | +Transformer | epoch 0 | step 
14050 |avg loss 4.800 |avg tokens 4781.100 |tokens/s 34014.879 |walltime 1936.800 | +Transformer | epoch 0 | step 14060 |avg loss 5.355 |avg tokens 4725.600 |tokens/s 33826.833 |walltime 1938.197 | +Transformer | epoch 0 | step 14070 |avg loss 5.697 |avg tokens 4691.800 |tokens/s 34336.554 |walltime 1939.564 | +Transformer | epoch 0 | step 14080 |avg loss 5.083 |avg tokens 4675.900 |tokens/s 33515.027 |walltime 1940.959 | +Transformer | epoch 0 | step 14090 |avg loss 5.697 |avg tokens 4049.700 |tokens/s 30431.810 |walltime 1942.290 | +Transformer | epoch 0 | step 14100 |avg loss 6.394 |avg tokens 4112.900 |tokens/s 31901.089 |walltime 1943.579 | +Transformer | epoch 0 | step 14110 |avg loss 5.623 |avg tokens 4593.700 |tokens/s 33557.109 |walltime 1944.948 | +Transformer | epoch 0 | step 14120 |avg loss 5.880 |avg tokens 4485.800 |tokens/s 32648.235 |walltime 1946.322 | +Transformer | epoch 0 | step 14130 |avg loss 5.969 |avg tokens 4559.500 |tokens/s 33824.237 |walltime 1947.670 | +Transformer | epoch 0 | step 14140 |avg loss 4.836 |avg tokens 4659.200 |tokens/s 32708.920 |walltime 1949.094 | +Transformer | epoch 0 | step 14150 |avg loss 4.953 |avg tokens 4964.900 |tokens/s 35836.087 |walltime 1950.480 | +Transformer | epoch 0 | step 14160 |avg loss 5.148 |avg tokens 4171.900 |tokens/s 30795.004 |walltime 1951.834 | +Transformer | epoch 0 | step 14170 |avg loss 5.938 |avg tokens 4305.800 |tokens/s 31896.501 |walltime 1953.184 | +Transformer | epoch 0 | step 14180 |avg loss 5.326 |avg tokens 4644.100 |tokens/s 32560.154 |walltime 1954.611 | +Transformer | epoch 0 | step 14190 |avg loss 5.143 |avg tokens 4566.400 |tokens/s 33048.782 |walltime 1955.992 | +Transformer | epoch 0 | step 14200 |avg loss 5.548 |avg tokens 4142.600 |tokens/s 30914.355 |walltime 1957.332 | +Transformer | epoch 0 | step 14210 |avg loss 5.826 |avg tokens 4362.300 |tokens/s 33091.959 |walltime 1958.651 | +Transformer | epoch 0 | step 14220 |avg loss 5.924 |avg tokens 4100.100 |tokens/s 
30520.208 |walltime 1959.994 | +Transformer | epoch 0 | step 14230 |avg loss 5.514 |avg tokens 4697.600 |tokens/s 33759.877 |walltime 1961.386 | +Transformer | epoch 0 | step 14240 |avg loss 5.124 |avg tokens 4424.600 |tokens/s 33132.206 |walltime 1962.721 | +Transformer | epoch 0 | step 14250 |avg loss 4.975 |avg tokens 4733.900 |tokens/s 33609.885 |walltime 1964.130 | +Transformer | epoch 0 | step 14260 |avg loss 5.190 |avg tokens 4628.800 |tokens/s 33204.465 |walltime 1965.524 | +Transformer | epoch 0 | step 14270 |avg loss 4.953 |avg tokens 4790.400 |tokens/s 33337.289 |walltime 1966.960 | +Transformer | epoch 0 | step 14280 |avg loss 5.396 |avg tokens 4243.500 |tokens/s 29380.507 |walltime 1968.405 | +Transformer | epoch 0 | step 14290 |avg loss 4.825 |avg tokens 4936.000 |tokens/s 34815.387 |walltime 1969.823 | +Transformer | epoch 0 | step 14300 |avg loss 5.322 |avg tokens 4504.500 |tokens/s 31799.118 |walltime 1971.239 | +Transformer | epoch 0 | step 14310 |avg loss 5.260 |avg tokens 4435.500 |tokens/s 32513.683 |walltime 1972.603 | +Transformer | epoch 0 | step 14320 |avg loss 4.853 |avg tokens 4907.500 |tokens/s 34258.515 |walltime 1974.036 | +Transformer | epoch 0 | step 14330 |avg loss 4.848 |avg tokens 4786.400 |tokens/s 33411.277 |walltime 1975.468 | +Transformer | epoch 0 | step 14340 |avg loss 5.441 |avg tokens 4509.600 |tokens/s 31890.035 |walltime 1976.883 | +Transformer | epoch 0 | step 14350 |avg loss 5.732 |avg tokens 4397.800 |tokens/s 32967.721 |walltime 1978.216 | +Transformer | epoch 0 | step 14360 |avg loss 4.985 |avg tokens 4636.700 |tokens/s 32549.416 |walltime 1979.641 | +Transformer | epoch 0 | step 14370 |avg loss 6.110 |avg tokens 4789.400 |tokens/s 35872.893 |walltime 1980.976 | +Transformer | epoch 0 | step 14380 |avg loss 5.290 |avg tokens 4741.000 |tokens/s 33943.396 |walltime 1982.373 | +Transformer | epoch 0 | step 14390 |avg loss 5.575 |avg tokens 4344.500 |tokens/s 32625.289 |walltime 1983.704 | +Transformer | epoch 0 | step 
14400 |avg loss 5.293 |avg tokens 4273.300 |tokens/s 30750.152 |walltime 1985.094 | +Transformer | epoch 0 | step 14410 |avg loss 5.795 |avg tokens 3978.700 |tokens/s 28960.847 |walltime 1986.468 | +Transformer | epoch 0 | step 14420 |avg loss 5.195 |avg tokens 4690.700 |tokens/s 34031.634 |walltime 1987.846 | +Transformer | epoch 0 | step 14430 |avg loss 5.896 |avg tokens 4695.000 |tokens/s 36189.443 |walltime 1989.144 | +Transformer | epoch 0 | step 14440 |avg loss 6.073 |avg tokens 3559.200 |tokens/s 27555.894 |walltime 1990.435 | +Transformer | epoch 0 | step 14450 |avg loss 4.931 |avg tokens 4805.600 |tokens/s 33617.358 |walltime 1991.865 | +Transformer | epoch 0 | step 14460 |avg loss 5.539 |avg tokens 4652.300 |tokens/s 34598.293 |walltime 1993.209 | +Transformer | epoch 0 | step 14470 |avg loss 5.619 |avg tokens 4518.400 |tokens/s 33469.508 |walltime 1994.560 | +Transformer | epoch 0 | step 14480 |avg loss 5.509 |avg tokens 4549.600 |tokens/s 33129.239 |walltime 1995.933 | +Transformer | epoch 0 | step 14490 |avg loss 5.918 |avg tokens 4031.300 |tokens/s 29925.999 |walltime 1997.280 | +Transformer | epoch 0 | step 14500 |avg loss 5.948 |avg tokens 4489.900 |tokens/s 33027.927 |walltime 1998.639 | +Transformer | epoch 0 | step 14510 |avg loss 5.355 |avg tokens 4768.100 |tokens/s 33617.661 |walltime 2000.058 | +Transformer | epoch 0 | step 14520 |avg loss 6.212 |avg tokens 4473.900 |tokens/s 34897.723 |walltime 2001.340 | +Transformer | epoch 0 | step 14530 |avg loss 5.284 |avg tokens 4470.500 |tokens/s 31613.125 |walltime 2002.754 | +Transformer | epoch 0 | step 14540 |avg loss 5.349 |avg tokens 4442.000 |tokens/s 31957.797 |walltime 2004.144 | +Transformer | epoch 0 | step 14550 |avg loss 5.546 |avg tokens 4304.600 |tokens/s 31662.752 |walltime 2005.503 | +Transformer | epoch 0 | step 14560 |avg loss 5.478 |avg tokens 4463.500 |tokens/s 33127.209 |walltime 2006.851 | +Transformer | epoch 0 | step 14570 |avg loss 5.501 |avg tokens 4620.600 |tokens/s 
33009.332 |walltime 2008.250 | +Transformer | epoch 0 | step 14580 |avg loss 5.293 |avg tokens 4748.400 |tokens/s 34147.343 |walltime 2009.641 | +Transformer | epoch 0 | step 14590 |avg loss 5.820 |avg tokens 4153.600 |tokens/s 32199.891 |walltime 2010.931 | +Transformer | epoch 0 | step 14600 |avg loss 5.270 |avg tokens 4630.700 |tokens/s 33246.998 |walltime 2012.324 | +Transformer | epoch 0 | step 14610 |avg loss 5.992 |avg tokens 4141.100 |tokens/s 31681.278 |walltime 2013.631 | +Transformer | epoch 0 | step 14620 |avg loss 5.422 |avg tokens 4907.400 |tokens/s 36257.140 |walltime 2014.984 | +Transformer | epoch 0 | step 14630 |avg loss 5.383 |avg tokens 4391.200 |tokens/s 31785.159 |walltime 2016.366 | +Transformer | epoch 0 | step 14640 |avg loss 5.221 |avg tokens 4649.500 |tokens/s 33044.360 |walltime 2017.773 | +Transformer | epoch 0 | step 14650 |avg loss 5.996 |avg tokens 4281.100 |tokens/s 32940.124 |walltime 2019.073 | +Transformer | epoch 0 | step 14660 |avg loss 5.889 |avg tokens 4459.600 |tokens/s 33070.591 |walltime 2020.421 | +Transformer | epoch 0 | step 14670 |avg loss 5.910 |avg tokens 4367.500 |tokens/s 32411.373 |walltime 2021.769 | +Transformer | epoch 0 | step 14680 |avg loss 5.514 |avg tokens 4331.600 |tokens/s 31784.709 |walltime 2023.131 | +Transformer | epoch 0 | step 14690 |avg loss 5.374 |avg tokens 4393.500 |tokens/s 32366.094 |walltime 2024.489 | +Transformer | epoch 0 | step 14700 |avg loss 5.600 |avg tokens 4359.100 |tokens/s 32101.163 |walltime 2025.847 | +Transformer | epoch 0 | step 14710 |avg loss 5.475 |avg tokens 4589.700 |tokens/s 34132.991 |walltime 2027.192 | +Transformer | epoch 0 | step 14720 |avg loss 5.394 |avg tokens 4428.000 |tokens/s 32188.316 |walltime 2028.567 | +Transformer | epoch 0 | step 14730 |avg loss 5.582 |avg tokens 4517.700 |tokens/s 33348.804 |walltime 2029.922 | +Transformer | epoch 0 | step 14740 |avg loss 5.419 |avg tokens 4786.500 |tokens/s 35134.986 |walltime 2031.284 | +Transformer | epoch 0 | step 
14750 |avg loss 5.711 |avg tokens 4495.200 |tokens/s 32783.178 |walltime 2032.655 | +Transformer | epoch 0 | step 14760 |avg loss 5.419 |avg tokens 4681.700 |tokens/s 35116.582 |walltime 2033.989 | +Transformer | epoch 0 | step 14770 |avg loss 5.449 |avg tokens 4681.000 |tokens/s 33884.670 |walltime 2035.370 | +Transformer | epoch 0 | step 14780 |avg loss 6.058 |avg tokens 4117.200 |tokens/s 30849.027 |walltime 2036.705 | +Transformer | epoch 0 | step 14790 |avg loss 5.721 |avg tokens 3985.400 |tokens/s 31138.490 |walltime 2037.985 | +Transformer | epoch 0 | step 14800 |avg loss 5.418 |avg tokens 4890.900 |tokens/s 35276.958 |walltime 2039.371 | +Transformer | epoch 0 | step 14810 |avg loss 5.532 |avg tokens 4523.900 |tokens/s 33648.698 |walltime 2040.715 | +Transformer | epoch 0 | step 14820 |avg loss 5.153 |avg tokens 4404.000 |tokens/s 32180.587 |walltime 2042.084 | +Transformer | epoch 0 | step 14830 |avg loss 5.455 |avg tokens 4694.800 |tokens/s 33900.286 |walltime 2043.469 | +Transformer | epoch 0 | step 14840 |avg loss 5.223 |avg tokens 4570.000 |tokens/s 32583.293 |walltime 2044.871 | +Transformer | epoch 0 | step 14850 |avg loss 5.031 |avg tokens 4628.800 |tokens/s 33820.112 |walltime 2046.240 | +Transformer | epoch 0 | step 14860 |avg loss 5.192 |avg tokens 4560.000 |tokens/s 33064.795 |walltime 2047.619 | +Transformer | epoch 0 | step 14870 |avg loss 5.110 |avg tokens 4656.000 |tokens/s 34003.175 |walltime 2048.988 | +Transformer | epoch 0 | step 14880 |avg loss 5.002 |avg tokens 4490.600 |tokens/s 32320.165 |walltime 2050.378 | +Transformer | epoch 0 | step 14890 |avg loss 6.047 |avg tokens 4137.700 |tokens/s 32203.135 |walltime 2051.663 | +Transformer | epoch 0 | step 14900 |avg loss 5.486 |avg tokens 4939.200 |tokens/s 35985.005 |walltime 2053.035 | +Transformer | epoch 0 | step 14910 |avg loss 4.974 |avg tokens 4665.100 |tokens/s 32384.966 |walltime 2054.476 | +Transformer | epoch 0 | step 14920 |avg loss 5.574 |avg tokens 4317.100 |tokens/s 
31759.091 |walltime 2055.835 | +Transformer | epoch 0 | step 14930 |avg loss 5.181 |avg tokens 4750.600 |tokens/s 34384.548 |walltime 2057.217 | +Transformer | epoch 0 | step 14940 |avg loss 5.139 |avg tokens 4550.100 |tokens/s 32022.450 |walltime 2058.638 | +Transformer | epoch 0 | step 14950 |avg loss 5.684 |avg tokens 4455.500 |tokens/s 33064.892 |walltime 2059.985 | +Transformer | epoch 0 | step 14960 |avg loss 6.188 |avg tokens 4373.200 |tokens/s 33752.713 |walltime 2061.281 | +Transformer | epoch 0 | step 14970 |avg loss 4.961 |avg tokens 4863.000 |tokens/s 35259.541 |walltime 2062.660 | +Transformer | epoch 0 | step 14980 |avg loss 5.191 |avg tokens 4631.300 |tokens/s 33551.401 |walltime 2064.040 | +Transformer | epoch 0 | step 14990 |avg loss 4.747 |avg tokens 4847.200 |tokens/s 33555.650 |walltime 2065.485 | +Transformer | epoch 0 | step 15000 |avg loss 5.501 |avg tokens 4246.200 |tokens/s 31357.198 |walltime 2066.839 | +Transformer | epoch 0 | step 15010 |avg loss 5.741 |avg tokens 4037.700 |tokens/s 30930.639 |walltime 2068.145 | +Transformer | epoch 0 | step 15020 |avg loss 4.781 |avg tokens 4879.100 |tokens/s 33808.698 |walltime 2069.588 | +Transformer | epoch 0 | step 15030 |avg loss 5.708 |avg tokens 4334.900 |tokens/s 32924.759 |walltime 2070.904 | +Transformer | epoch 0 | step 15040 |avg loss 5.203 |avg tokens 4773.600 |tokens/s 34305.073 |walltime 2072.296 | +Transformer | epoch 0 | step 15050 |avg loss 5.645 |avg tokens 4724.100 |tokens/s 34083.233 |walltime 2073.682 | +Transformer | epoch 0 | step 15060 |avg loss 5.191 |avg tokens 4574.700 |tokens/s 33547.967 |walltime 2075.046 | +Transformer | epoch 0 | step 15070 |avg loss 5.647 |avg tokens 3989.400 |tokens/s 30366.033 |walltime 2076.359 | +Transformer | epoch 0 | step 15080 |avg loss 5.687 |avg tokens 4440.500 |tokens/s 32893.617 |walltime 2077.709 | +Transformer | epoch 0 | step 15090 |avg loss 5.210 |avg tokens 4797.300 |tokens/s 34112.005 |walltime 2079.116 | +Transformer | epoch 0 | step 
15100 |avg loss 5.156 |avg tokens 4801.600 |tokens/s 34130.893 |walltime 2080.522 | +Transformer | epoch 0 | step 15110 |avg loss 5.906 |avg tokens 4407.300 |tokens/s 33009.297 |walltime 2081.858 | +Transformer | epoch 0 | step 15120 |avg loss 5.293 |avg tokens 4652.000 |tokens/s 34882.829 |walltime 2083.191 | +Transformer | epoch 0 | step 15130 |avg loss 4.913 |avg tokens 4776.000 |tokens/s 33452.114 |walltime 2084.619 | +Transformer | epoch 0 | step 15140 |avg loss 5.872 |avg tokens 4246.700 |tokens/s 31299.737 |walltime 2085.976 | +Transformer | epoch 0 | step 15150 |avg loss 5.197 |avg tokens 4407.600 |tokens/s 31745.849 |walltime 2087.364 | +Transformer | epoch 0 | step 15160 |avg loss 5.109 |avg tokens 4630.400 |tokens/s 33293.114 |walltime 2088.755 | +Transformer | epoch 0 | step 15170 |avg loss 5.372 |avg tokens 4393.200 |tokens/s 33469.124 |walltime 2090.068 | +Transformer | epoch 0 | step 15180 |avg loss 5.414 |avg tokens 4612.300 |tokens/s 33531.424 |walltime 2091.443 | +Transformer | epoch 0 | step 15190 |avg loss 5.735 |avg tokens 4737.500 |tokens/s 35011.946 |walltime 2092.796 | +Transformer | epoch 0 | step 15200 |avg loss 5.067 |avg tokens 4569.800 |tokens/s 33403.340 |walltime 2094.164 | +Transformer | epoch 0 | step 15210 |avg loss 5.254 |avg tokens 4984.000 |tokens/s 35605.400 |walltime 2095.564 | +Transformer | epoch 0 | step 15220 |avg loss 5.446 |avg tokens 4530.600 |tokens/s 33762.237 |walltime 2096.906 | +Transformer | epoch 0 | step 15230 |avg loss 6.650 |avg tokens 2894.900 |tokens/s 23561.892 |walltime 2098.135 | +Transformer | epoch 0 | step 15240 |avg loss 5.355 |avg tokens 4064.900 |tokens/s 30275.988 |walltime 2099.477 | +Transformer | epoch 0 | step 15250 |avg loss 5.437 |avg tokens 4310.700 |tokens/s 32717.374 |walltime 2100.795 | +Transformer | epoch 0 | step 15260 |avg loss 5.070 |avg tokens 4663.200 |tokens/s 33726.804 |walltime 2102.177 | +Transformer | epoch 0 | step 15270 |avg loss 5.695 |avg tokens 4582.700 |tokens/s 
35079.596 |walltime 2103.484 | +Transformer | epoch 0 | step 15280 |avg loss 5.295 |avg tokens 4307.400 |tokens/s 31402.296 |walltime 2104.855 | +Transformer | epoch 0 | step 15290 |avg loss 5.400 |avg tokens 4550.000 |tokens/s 33768.389 |walltime 2106.203 | +Transformer | epoch 0 | step 15300 |avg loss 5.754 |avg tokens 4632.100 |tokens/s 35607.547 |walltime 2107.504 | +Transformer | epoch 0 | step 15310 |avg loss 5.001 |avg tokens 4852.000 |tokens/s 34304.632 |walltime 2108.918 | +Transformer | epoch 0 | step 15320 |avg loss 5.073 |avg tokens 4638.600 |tokens/s 33965.794 |walltime 2110.284 | +Transformer | epoch 0 | step 15330 |avg loss 5.299 |avg tokens 4807.000 |tokens/s 34965.378 |walltime 2111.659 | +Transformer | epoch 0 | step 15340 |avg loss 5.202 |avg tokens 4563.700 |tokens/s 33911.438 |walltime 2113.004 | +Transformer | epoch 0 | step 15350 |avg loss 5.242 |avg tokens 4398.600 |tokens/s 32108.874 |walltime 2114.374 | +Transformer | epoch 0 | step 15360 |avg loss 5.423 |avg tokens 4867.500 |tokens/s 35188.479 |walltime 2115.758 | +Transformer | epoch 0 | step 15370 |avg loss 5.337 |avg tokens 4246.800 |tokens/s 31410.739 |walltime 2117.110 | +Transformer | epoch 0 | step 15380 |avg loss 5.575 |avg tokens 4377.300 |tokens/s 31934.442 |walltime 2118.480 | +Transformer | epoch 0 | step 15390 |avg loss 5.399 |avg tokens 4378.600 |tokens/s 31792.440 |walltime 2119.858 | +Transformer | epoch 0 | step 15400 |avg loss 5.738 |avg tokens 3998.200 |tokens/s 30540.945 |walltime 2121.167 | +Transformer | epoch 0 | step 15410 |avg loss 5.650 |avg tokens 4322.300 |tokens/s 32051.092 |walltime 2122.515 | +Transformer | epoch 0 | step 15420 |avg loss 5.268 |avg tokens 4712.000 |tokens/s 32856.983 |walltime 2123.949 | +Transformer | epoch 0 | step 15430 |avg loss 5.485 |avg tokens 4665.900 |tokens/s 33662.159 |walltime 2125.335 | +Transformer | epoch 0 | step 15440 |avg loss 5.589 |avg tokens 4556.600 |tokens/s 34038.492 |walltime 2126.674 | +Transformer | epoch 0 | step 
15450 |avg loss 5.358 |avg tokens 4639.000 |tokens/s 32723.202 |walltime 2128.092 | +Transformer | epoch 0 | step 15460 |avg loss 5.047 |avg tokens 4563.700 |tokens/s 33004.508 |walltime 2129.475 | +Transformer | epoch 0 | step 15470 |avg loss 5.157 |avg tokens 4744.000 |tokens/s 33707.337 |walltime 2130.882 | +Transformer | epoch 0 | step 15480 |avg loss 5.326 |avg tokens 4801.300 |tokens/s 35252.194 |walltime 2132.244 | +Transformer | epoch 0 | step 15490 |avg loss 5.904 |avg tokens 4111.100 |tokens/s 31664.754 |walltime 2133.542 | +Transformer | epoch 0 | step 15500 |avg loss 5.418 |avg tokens 4413.800 |tokens/s 32364.089 |walltime 2134.906 | +Transformer | epoch 0 | step 15510 |avg loss 5.937 |avg tokens 3881.100 |tokens/s 29799.467 |walltime 2136.208 | +Transformer | epoch 0 | step 15520 |avg loss 5.552 |avg tokens 4735.100 |tokens/s 34340.420 |walltime 2137.587 | +Transformer | epoch 0 | step 15530 |avg loss 5.622 |avg tokens 4067.900 |tokens/s 30488.960 |walltime 2138.922 | +Transformer | epoch 0 | step 15540 |avg loss 5.037 |avg tokens 4897.600 |tokens/s 35759.954 |walltime 2140.291 | +Transformer | epoch 0 | step 15550 |avg loss 5.299 |avg tokens 4574.800 |tokens/s 32412.860 |walltime 2141.703 | +Transformer | epoch 0 | step 15560 |avg loss 5.304 |avg tokens 4689.800 |tokens/s 33615.858 |walltime 2143.098 | +Transformer | epoch 0 | step 15570 |avg loss 5.557 |avg tokens 4498.700 |tokens/s 33139.424 |walltime 2144.455 | +Transformer | epoch 0 | step 15580 |avg loss 5.710 |avg tokens 3921.700 |tokens/s 29373.154 |walltime 2145.790 | +Transformer | epoch 0 | step 15590 |avg loss 6.279 |avg tokens 4612.500 |tokens/s 34789.854 |walltime 2147.116 | +Transformer | epoch 0 | step 15600 |avg loss 5.323 |avg tokens 4582.800 |tokens/s 33452.745 |walltime 2148.486 | +Transformer | epoch 0 | step 15610 |avg loss 4.901 |avg tokens 4835.800 |tokens/s 32751.867 |walltime 2149.963 | +Transformer | epoch 0 | step 15620 |avg loss 4.993 |avg tokens 4486.600 |tokens/s 
32315.381 |walltime 2151.351 | +Transformer | epoch 0 | step 15630 |avg loss 5.141 |avg tokens 4562.400 |tokens/s 33166.461 |walltime 2152.727 | +Transformer | epoch 0 | step 15640 |avg loss 5.186 |avg tokens 4657.400 |tokens/s 33734.894 |walltime 2154.107 | +Transformer | epoch 0 | step 15650 |avg loss 5.117 |avg tokens 4610.000 |tokens/s 33047.437 |walltime 2155.502 | +Transformer | epoch 0 | step 15660 |avg loss 5.296 |avg tokens 4366.600 |tokens/s 31496.438 |walltime 2156.889 | +Transformer | epoch 0 | step 15670 |avg loss 5.513 |avg tokens 4660.600 |tokens/s 33830.898 |walltime 2158.266 | +Transformer | epoch 0 | step 15680 |avg loss 4.918 |avg tokens 4645.300 |tokens/s 33382.856 |walltime 2159.658 | +Transformer | epoch 0 | step 15690 |avg loss 5.228 |avg tokens 4223.300 |tokens/s 30935.107 |walltime 2161.023 | +Transformer | epoch 0 | step 15700 |avg loss 5.414 |avg tokens 4584.600 |tokens/s 34391.894 |walltime 2162.356 | +Transformer | epoch 0 | step 15710 |avg loss 5.152 |avg tokens 4802.200 |tokens/s 34631.914 |walltime 2163.743 | +Transformer | epoch 0 | step 15720 |avg loss 4.985 |avg tokens 4883.300 |tokens/s 33539.382 |walltime 2165.199 | +Transformer | epoch 0 | step 15730 |avg loss 5.774 |avg tokens 4395.300 |tokens/s 33152.013 |walltime 2166.524 | +Transformer | epoch 0 | step 15740 |avg loss 4.703 |avg tokens 4818.600 |tokens/s 34011.665 |walltime 2167.941 | +Transformer | epoch 0 | step 15750 |avg loss 5.430 |avg tokens 4760.500 |tokens/s 35474.964 |walltime 2169.283 | +Transformer | epoch 0 | step 15760 |avg loss 5.478 |avg tokens 4121.400 |tokens/s 30776.136 |walltime 2170.622 | +Transformer | epoch 0 | step 15770 |avg loss 5.347 |avg tokens 4797.100 |tokens/s 33862.730 |walltime 2172.039 | +Transformer | epoch 0 | step 15780 |avg loss 4.821 |avg tokens 4738.600 |tokens/s 32872.448 |walltime 2173.480 | +Transformer | epoch 0 | step 15790 |avg loss 5.764 |avg tokens 4289.300 |tokens/s 33362.624 |walltime 2174.766 | +Transformer | epoch 0 | step 
15800 |avg loss 6.055 |avg tokens 4395.200 |tokens/s 33360.354 |walltime 2176.084 | +Transformer | epoch 0 | step 15810 |avg loss 5.406 |avg tokens 4749.900 |tokens/s 34165.957 |walltime 2177.474 | +Transformer | epoch 0 | step 15820 |avg loss 5.678 |avg tokens 4663.400 |tokens/s 34782.570 |walltime 2178.815 | +Transformer | epoch 0 | step 15830 |avg loss 4.891 |avg tokens 4840.900 |tokens/s 34311.724 |walltime 2180.225 | +Transformer | epoch 0 | step 15840 |avg loss 5.278 |avg tokens 4474.100 |tokens/s 33616.548 |walltime 2181.556 | +Transformer | epoch 0 | step 15850 |avg loss 5.412 |avg tokens 4702.900 |tokens/s 33908.676 |walltime 2182.943 | +Transformer | epoch 0 | step 15860 |avg loss 6.292 |avg tokens 4082.700 |tokens/s 32185.564 |walltime 2184.212 | +Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 16.0 +Transformer | epoch 0 | step 15870 |avg loss 4.752 |avg tokens 4550.800 |tokens/s 33052.649 |walltime 2185.589 | +Transformer | epoch 0 | step 15880 |avg loss 5.100 |avg tokens 4373.000 |tokens/s 32580.263 |walltime 2186.931 | +Transformer | epoch 0 | step 15890 |avg loss 5.194 |avg tokens 4464.200 |tokens/s 32369.249 |walltime 2188.310 | +Transformer | epoch 0 | step 15900 |avg loss 5.050 |avg tokens 4450.100 |tokens/s 32103.717 |walltime 2189.696 | +Transformer | epoch 0 | step 15910 |avg loss 5.885 |avg tokens 4624.800 |tokens/s 34883.599 |walltime 2191.022 | +Transformer | epoch 0 | step 15920 |avg loss 5.432 |avg tokens 4876.000 |tokens/s 34816.236 |walltime 2192.422 | +Transformer | epoch 0 | step 15930 |avg loss 5.661 |avg tokens 4642.300 |tokens/s 34504.213 |walltime 2193.768 | +Transformer | epoch 0 | step 15940 |avg loss 5.968 |avg tokens 4797.600 |tokens/s 35906.778 |walltime 2195.104 | +Transformer | epoch 0 | step 15950 |avg loss 5.329 |avg tokens 4310.000 |tokens/s 32171.193 |walltime 2196.444 | +Transformer | epoch 0 | step 15960 |avg loss 5.118 |avg tokens 4732.800 |tokens/s 34629.045 |walltime 2197.810 | +Transformer | 
epoch 0 | step 15970 |avg loss 5.157 |avg tokens 4423.900 |tokens/s 31742.378 |walltime 2199.204 | +Transformer | epoch 0 | step 15980 |avg loss 5.776 |avg tokens 4613.000 |tokens/s 35024.638 |walltime 2200.521 | +Transformer | epoch 0 | step 15990 |avg loss 5.126 |avg tokens 4711.000 |tokens/s 33268.035 |walltime 2201.937 | +Transformer | epoch 0 | step 16000 |avg loss 5.235 |avg tokens 4673.900 |tokens/s 33563.900 |walltime 2203.330 | +Transformer | epoch 0 | step 16010 |avg loss 4.862 |avg tokens 4759.500 |tokens/s 33469.113 |walltime 2204.752 | +Transformer | epoch 0 | step 16020 |avg loss 5.113 |avg tokens 4666.200 |tokens/s 33127.033 |walltime 2206.160 | +Transformer | epoch 0 | step 16030 |avg loss 5.605 |avg tokens 4540.300 |tokens/s 33781.830 |walltime 2207.504 | +Transformer | epoch 0 | step 16040 |avg loss 5.233 |avg tokens 4872.700 |tokens/s 34844.025 |walltime 2208.903 | +Transformer | epoch 0 | step 16050 |avg loss 5.477 |avg tokens 4461.900 |tokens/s 32959.603 |walltime 2210.257 | +Transformer | epoch 0 | step 16060 |avg loss 5.712 |avg tokens 4407.700 |tokens/s 33317.109 |walltime 2211.580 | +Transformer | epoch 0 | step 16070 |avg loss 5.977 |avg tokens 3808.900 |tokens/s 29491.980 |walltime 2212.871 | +Transformer | epoch 0 | step 16080 |avg loss 6.517 |avg tokens 4129.400 |tokens/s 32721.582 |walltime 2214.133 | +Transformer | epoch 0 | step 16090 |avg loss 5.456 |avg tokens 4354.900 |tokens/s 32144.025 |walltime 2215.488 | +Transformer | epoch 0 | step 16100 |avg loss 5.000 |avg tokens 4537.700 |tokens/s 32306.913 |walltime 2216.892 | +Transformer | epoch 0 | step 16110 |avg loss 5.346 |avg tokens 4651.600 |tokens/s 33559.719 |walltime 2218.279 | +Transformer | epoch 0 | step 16120 |avg loss 5.135 |avg tokens 4422.300 |tokens/s 32177.352 |walltime 2219.653 | +Transformer | epoch 0 | step 16130 |avg loss 5.282 |avg tokens 4170.100 |tokens/s 29979.691 |walltime 2221.044 | +Transformer | epoch 0 | step 16140 |avg loss 5.332 |avg tokens 4750.900 
|tokens/s 34336.396 |walltime 2222.428 | +Transformer | epoch 0 | step 16150 |avg loss 5.542 |avg tokens 4890.200 |tokens/s 36064.603 |walltime 2223.783 | +Transformer | epoch 0 | step 16160 |avg loss 5.245 |avg tokens 4200.400 |tokens/s 31298.246 |walltime 2225.126 | +Transformer | epoch 0 | step 16170 |avg loss 4.914 |avg tokens 4660.000 |tokens/s 34128.774 |walltime 2226.491 | +Transformer | epoch 0 | step 16180 |avg loss 5.411 |avg tokens 4450.900 |tokens/s 32317.485 |walltime 2227.868 | +Transformer | epoch 0 | step 16190 |avg loss 5.454 |avg tokens 4723.500 |tokens/s 33485.690 |walltime 2229.279 | +Transformer | epoch 0 | step 16200 |avg loss 5.280 |avg tokens 4472.200 |tokens/s 32741.935 |walltime 2230.645 | +Transformer | epoch 0 | step 16210 |avg loss 5.606 |avg tokens 4315.600 |tokens/s 31425.115 |walltime 2232.018 | +Transformer | epoch 0 | step 16220 |avg loss 5.798 |avg tokens 4254.800 |tokens/s 32116.563 |walltime 2233.343 | +Transformer | epoch 0 | step 16230 |avg loss 5.965 |avg tokens 3805.300 |tokens/s 30462.079 |walltime 2234.592 | +Transformer | epoch 0 | step 16240 |avg loss 5.564 |avg tokens 4480.900 |tokens/s 33508.200 |walltime 2235.929 | +Transformer | epoch 0 | step 16250 |avg loss 5.573 |avg tokens 4242.400 |tokens/s 31496.889 |walltime 2237.276 | +Transformer | epoch 0 | step 16260 |avg loss 4.972 |avg tokens 4310.800 |tokens/s 31113.668 |walltime 2238.662 | +Transformer | epoch 0 | step 16270 |avg loss 5.396 |avg tokens 4273.600 |tokens/s 31713.488 |walltime 2240.009 | +Transformer | epoch 0 | step 16280 |avg loss 5.488 |avg tokens 4316.000 |tokens/s 32952.144 |walltime 2241.319 | +Transformer | epoch 0 | step 16290 |avg loss 6.545 |avg tokens 4269.100 |tokens/s 34065.130 |walltime 2242.572 | +Transformer | epoch 0 | step 16300 |avg loss 5.495 |avg tokens 4833.700 |tokens/s 34856.127 |walltime 2243.959 | +Transformer | epoch 0 | step 16310 |avg loss 5.069 |avg tokens 4650.900 |tokens/s 33351.927 |walltime 2245.354 | +Transformer | epoch 
0 | step 16320 |avg loss 6.139 |avg tokens 3866.000 |tokens/s 30231.760 |walltime 2246.632 | +Transformer | epoch 0 | step 16330 |avg loss 5.533 |avg tokens 4466.600 |tokens/s 32259.743 |walltime 2248.017 | +Transformer | epoch 0 | step 16340 |avg loss 4.913 |avg tokens 4556.400 |tokens/s 32725.232 |walltime 2249.409 | +Transformer | epoch 0 | step 16350 |avg loss 5.142 |avg tokens 4370.500 |tokens/s 30918.961 |walltime 2250.823 | +Transformer | epoch 0 | step 16360 |avg loss 5.513 |avg tokens 4304.600 |tokens/s 31300.375 |walltime 2252.198 | +Transformer | epoch 0 | step 16370 |avg loss 5.418 |avg tokens 4518.600 |tokens/s 34121.238 |walltime 2253.522 | +Transformer | epoch 0 | step 16380 |avg loss 5.083 |avg tokens 4908.800 |tokens/s 33786.638 |walltime 2254.975 | +Transformer | epoch 0 | step 16390 |avg loss 5.194 |avg tokens 4445.600 |tokens/s 31625.177 |walltime 2256.381 | +Transformer | epoch 0 | step 16400 |avg loss 5.039 |avg tokens 4639.300 |tokens/s 33605.770 |walltime 2257.761 | +Transformer | epoch 0 | step 16410 |avg loss 4.751 |avg tokens 4775.200 |tokens/s 33506.101 |walltime 2259.187 | +Transformer | epoch 0 | step 16420 |avg loss 4.964 |avg tokens 4597.000 |tokens/s 33184.470 |walltime 2260.572 | +Transformer | epoch 0 | step 16430 |avg loss 5.387 |avg tokens 4864.800 |tokens/s 35782.039 |walltime 2261.931 | +Transformer | epoch 0 | step 16440 |avg loss 5.290 |avg tokens 4195.900 |tokens/s 31290.124 |walltime 2263.272 | +Transformer | epoch 0 | step 16450 |avg loss 5.693 |avg tokens 4718.200 |tokens/s 34714.284 |walltime 2264.632 | +Transformer | epoch 0 | step 16460 |avg loss 5.560 |avg tokens 4366.400 |tokens/s 33618.663 |walltime 2265.930 | +Transformer | epoch 0 | step 16470 |avg loss 4.837 |avg tokens 4718.100 |tokens/s 33022.522 |walltime 2267.359 | +Transformer | epoch 0 | step 16480 |avg loss 4.894 |avg tokens 4673.100 |tokens/s 32162.424 |walltime 2268.812 | +Transformer | epoch 0 | step 16490 |avg loss 4.868 |avg tokens 4537.200 |tokens/s 
31495.885 |walltime 2270.253 | +Transformer | epoch 0 | step 16500 |avg loss 5.450 |avg tokens 4680.800 |tokens/s 32681.630 |walltime 2271.685 | +Transformer | epoch 0 | step 16510 |avg loss 5.431 |avg tokens 4696.800 |tokens/s 33975.184 |walltime 2273.067 | +Transformer | epoch 0 | step 16520 |avg loss 4.916 |avg tokens 4785.600 |tokens/s 33314.824 |walltime 2274.504 | +Transformer | epoch 0 | step 16530 |avg loss 5.257 |avg tokens 4488.900 |tokens/s 32267.692 |walltime 2275.895 | +Transformer | epoch 0 | step 16540 |avg loss 5.100 |avg tokens 4619.300 |tokens/s 33310.986 |walltime 2277.282 | +Transformer | epoch 0 | step 16550 |avg loss 5.044 |avg tokens 4830.800 |tokens/s 34698.687 |walltime 2278.674 | +Transformer | epoch 0 | step 16560 |avg loss 5.423 |avg tokens 4612.500 |tokens/s 34397.433 |walltime 2280.015 | +Transformer | epoch 0 | step 16570 |avg loss 5.042 |avg tokens 4845.400 |tokens/s 34082.688 |walltime 2281.437 | +Transformer | epoch 0 | step 16580 |avg loss 4.844 |avg tokens 4567.000 |tokens/s 32438.365 |walltime 2282.844 | +Transformer | epoch 0 | step 16590 |avg loss 5.293 |avg tokens 4651.600 |tokens/s 34296.962 |walltime 2284.201 | +Transformer | epoch 0 | step 16600 |avg loss 5.241 |avg tokens 4526.400 |tokens/s 33889.651 |walltime 2285.536 | +Transformer | epoch 0 | step 16610 |avg loss 4.923 |avg tokens 4751.600 |tokens/s 33904.402 |walltime 2286.938 | +Transformer | epoch 0 | step 16620 |avg loss 5.440 |avg tokens 4612.100 |tokens/s 34322.264 |walltime 2288.282 | +Transformer | epoch 0 | step 16630 |avg loss 4.768 |avg tokens 4970.400 |tokens/s 35086.568 |walltime 2289.698 | +Transformer | epoch 0 | step 16640 |avg loss 5.623 |avg tokens 4002.100 |tokens/s 30711.876 |walltime 2291.001 | +Transformer | epoch 0 | step 16650 |avg loss 5.427 |avg tokens 4492.500 |tokens/s 32952.642 |walltime 2292.365 | +Transformer | epoch 0 | step 16660 |avg loss 5.004 |avg tokens 4399.900 |tokens/s 32139.620 |walltime 2293.734 | +Transformer | epoch 0 | step 
16670 |avg loss 5.208 |avg tokens 4737.400 |tokens/s 33919.526 |walltime 2295.130 | +Transformer | epoch 0 | step 16680 |avg loss 4.862 |avg tokens 4963.300 |tokens/s 34841.118 |walltime 2296.555 | +Transformer | epoch 0 | step 16690 |avg loss 4.995 |avg tokens 4555.200 |tokens/s 33454.410 |walltime 2297.916 | +Transformer | epoch 0 | step 16700 |avg loss 5.359 |avg tokens 4387.500 |tokens/s 32575.869 |walltime 2299.263 | +Transformer | epoch 0 | step 16710 |avg loss 5.551 |avg tokens 4340.400 |tokens/s 32691.545 |walltime 2300.591 | +Transformer | epoch 0 | step 16720 |avg loss 4.959 |avg tokens 4762.000 |tokens/s 33975.217 |walltime 2301.993 | +Transformer | epoch 0 | step 16730 |avg loss 5.104 |avg tokens 4488.700 |tokens/s 33482.769 |walltime 2303.333 | +Transformer | epoch 0 | step 16740 |avg loss 4.771 |avg tokens 4752.800 |tokens/s 32848.047 |walltime 2304.780 | +Transformer | epoch 0 | step 16750 |avg loss 4.979 |avg tokens 4768.000 |tokens/s 34187.550 |walltime 2306.175 | +Transformer | epoch 0 | step 16760 |avg loss 5.189 |avg tokens 4500.000 |tokens/s 32352.395 |walltime 2307.566 | +Transformer | epoch 0 | step 16770 |avg loss 5.241 |avg tokens 4295.300 |tokens/s 32222.611 |walltime 2308.899 | +Transformer | epoch 0 | step 16780 |avg loss 5.833 |avg tokens 4659.200 |tokens/s 35325.741 |walltime 2310.218 | +Transformer | epoch 0 | step 16790 |avg loss 4.669 |avg tokens 4591.300 |tokens/s 34296.704 |walltime 2311.556 | +Transformer | epoch 0 | step 16800 |avg loss 5.279 |avg tokens 4405.900 |tokens/s 33165.036 |walltime 2312.885 | +Transformer | epoch 0 | step 16810 |avg loss 5.400 |avg tokens 4789.300 |tokens/s 33631.801 |walltime 2314.309 | +Transformer | epoch 0 | step 16820 |avg loss 5.476 |avg tokens 4324.100 |tokens/s 32616.821 |walltime 2315.635 | +Transformer | epoch 0 | step 16830 |avg loss 5.264 |avg tokens 4842.500 |tokens/s 36135.902 |walltime 2316.975 | +Transformer | epoch 0 | step 16840 |avg loss 5.080 |avg tokens 4958.500 |tokens/s 
34785.900 |walltime 2318.400 | +Transformer | epoch 0 | step 16850 |avg loss 5.476 |avg tokens 4447.700 |tokens/s 32597.245 |walltime 2319.765 | +Transformer | epoch 0 | step 16860 |avg loss 5.936 |avg tokens 4481.300 |tokens/s 34020.641 |walltime 2321.082 | +Transformer | epoch 0 | step 16870 |avg loss 5.719 |avg tokens 4015.500 |tokens/s 30803.204 |walltime 2322.385 | +Transformer | epoch 0 | step 16880 |avg loss 5.566 |avg tokens 4505.700 |tokens/s 33123.600 |walltime 2323.746 | +Transformer | epoch 0 | step 16890 |avg loss 5.448 |avg tokens 4629.400 |tokens/s 33465.118 |walltime 2325.129 | +Transformer | epoch 0 | step 16900 |avg loss 4.734 |avg tokens 4918.400 |tokens/s 34618.425 |walltime 2326.550 | +Transformer | epoch 0 | step 16910 |avg loss 5.104 |avg tokens 4943.900 |tokens/s 35071.118 |walltime 2327.960 | +Transformer | epoch 0 | step 16920 |avg loss 5.236 |avg tokens 4720.800 |tokens/s 34172.826 |walltime 2329.341 | +Transformer | epoch 0 | step 16930 |avg loss 5.622 |avg tokens 4456.500 |tokens/s 33065.041 |walltime 2330.689 | +Transformer | epoch 0 | step 16940 |avg loss 5.428 |avg tokens 4448.600 |tokens/s 32059.816 |walltime 2332.076 | +Transformer | epoch 0 | step 16950 |avg loss 4.793 |avg tokens 4624.000 |tokens/s 33692.542 |walltime 2333.449 | +Transformer | epoch 0 | step 16960 |avg loss 5.405 |avg tokens 4623.900 |tokens/s 33845.061 |walltime 2334.815 | +Transformer | epoch 0 | step 16970 |avg loss 4.761 |avg tokens 4905.200 |tokens/s 35325.130 |walltime 2336.204 | +Transformer | epoch 0 | step 16980 |avg loss 5.075 |avg tokens 4673.200 |tokens/s 34509.017 |walltime 2337.558 | +Transformer | epoch 0 | step 16990 |avg loss 5.608 |avg tokens 4270.500 |tokens/s 32028.729 |walltime 2338.891 | +Transformer | epoch 0 | step 17000 |avg loss 5.225 |avg tokens 4662.300 |tokens/s 33979.771 |walltime 2340.263 | +Transformer | epoch 0 | step 17010 |avg loss 6.051 |avg tokens 4017.600 |tokens/s 31831.897 |walltime 2341.525 | +Transformer | epoch 0 | step 
17020 |avg loss 5.645 |avg tokens 3739.400 |tokens/s 29124.201 |walltime 2342.809 | +Transformer | epoch 0 | step 17030 |avg loss 4.978 |avg tokens 4865.600 |tokens/s 35021.230 |walltime 2344.199 | +Transformer | epoch 0 | step 17040 |avg loss 5.235 |avg tokens 4206.800 |tokens/s 31695.577 |walltime 2345.526 | +Transformer | epoch 0 | step 17050 |avg loss 5.240 |avg tokens 4438.000 |tokens/s 31766.246 |walltime 2346.923 | +Transformer | epoch 0 | step 17060 |avg loss 5.176 |avg tokens 4522.200 |tokens/s 33243.466 |walltime 2348.283 | +Transformer | epoch 0 | step 17070 |avg loss 4.767 |avg tokens 4797.600 |tokens/s 33850.786 |walltime 2349.701 | +Transformer | epoch 0 | step 17080 |avg loss 5.480 |avg tokens 4540.600 |tokens/s 33676.321 |walltime 2351.049 | +Transformer | epoch 0 | step 17090 |avg loss 5.696 |avg tokens 4537.800 |tokens/s 33009.336 |walltime 2352.424 | +Transformer | epoch 0 | step 17100 |avg loss 5.394 |avg tokens 4453.000 |tokens/s 34116.445 |walltime 2353.729 | +Transformer | epoch 0 | step 17110 |avg loss 4.757 |avg tokens 4908.800 |tokens/s 34707.933 |walltime 2355.143 | +Transformer | epoch 0 | step 17120 |avg loss 4.949 |avg tokens 4516.600 |tokens/s 32132.345 |walltime 2356.549 | +Transformer | epoch 0 | step 17130 |avg loss 4.982 |avg tokens 4962.900 |tokens/s 35243.006 |walltime 2357.957 | +Transformer | epoch 0 | step 17140 |avg loss 5.514 |avg tokens 4374.200 |tokens/s 32247.239 |walltime 2359.313 | +Transformer | epoch 0 | step 17150 |avg loss 5.048 |avg tokens 4632.300 |tokens/s 34201.249 |walltime 2360.668 | +Transformer | epoch 0 | step 17160 |avg loss 5.067 |avg tokens 4482.400 |tokens/s 32656.957 |walltime 2362.040 | +Transformer | epoch 0 | step 17170 |avg loss 5.315 |avg tokens 4348.100 |tokens/s 32889.854 |walltime 2363.362 | +Transformer | epoch 0 | step 17180 |avg loss 5.292 |avg tokens 4226.900 |tokens/s 31487.185 |walltime 2364.705 | +Transformer | epoch 0 | step 17190 |avg loss 5.519 |avg tokens 4324.300 |tokens/s 
31665.291 |walltime 2366.071 | +Transformer | epoch 0 | step 17200 |avg loss 5.377 |avg tokens 4332.500 |tokens/s 32761.750 |walltime 2367.393 | +Transformer | epoch 0 | step 17210 |avg loss 5.146 |avg tokens 4211.900 |tokens/s 30500.219 |walltime 2368.774 | +Transformer | epoch 0 | step 17220 |avg loss 5.265 |avg tokens 4671.400 |tokens/s 34668.613 |walltime 2370.121 | +Transformer | epoch 0 | step 17230 |avg loss 4.958 |avg tokens 4521.800 |tokens/s 32808.189 |walltime 2371.500 | +Transformer | epoch 0 | step 17240 |avg loss 5.007 |avg tokens 4597.200 |tokens/s 33229.252 |walltime 2372.883 | +Transformer | epoch 0 | step 17250 |avg loss 5.511 |avg tokens 4402.300 |tokens/s 32907.187 |walltime 2374.221 | +Transformer | epoch 0 | step 17260 |avg loss 4.980 |avg tokens 4385.800 |tokens/s 31867.198 |walltime 2375.597 | +Transformer | epoch 0 | step 17270 |avg loss 5.937 |avg tokens 3697.900 |tokens/s 28737.643 |walltime 2376.884 | +Transformer | epoch 0 | step 17280 |avg loss 5.069 |avg tokens 4600.800 |tokens/s 33361.831 |walltime 2378.263 | +Transformer | epoch 0 | step 17290 |avg loss 5.258 |avg tokens 4552.900 |tokens/s 33523.990 |walltime 2379.621 | +Transformer | epoch 0 | step 17300 |avg loss 4.943 |avg tokens 4789.600 |tokens/s 34665.727 |walltime 2381.003 | +Transformer | epoch 0 | step 17310 |avg loss 5.424 |avg tokens 4754.400 |tokens/s 34470.228 |walltime 2382.382 | +Transformer | epoch 0 | step 17320 |avg loss 5.048 |avg tokens 4726.800 |tokens/s 33468.462 |walltime 2383.794 | +Transformer | epoch 0 | step 17330 |avg loss 5.449 |avg tokens 4271.600 |tokens/s 32831.969 |walltime 2385.095 | +Transformer | epoch 0 | step 17340 |avg loss 5.009 |avg tokens 4947.200 |tokens/s 35601.409 |walltime 2386.485 | +Transformer | epoch 0 | step 17350 |avg loss 5.133 |avg tokens 4815.700 |tokens/s 35399.682 |walltime 2387.845 | +Transformer | epoch 0 | step 17360 |avg loss 5.063 |avg tokens 4241.300 |tokens/s 31820.523 |walltime 2389.178 | +Transformer | epoch 0 | step 
17370 |avg loss 5.349 |avg tokens 4282.900 |tokens/s 32095.536 |walltime 2390.513 | +Transformer | epoch 0 | step 17380 |avg loss 4.948 |avg tokens 4740.100 |tokens/s 34157.276 |walltime 2391.900 | +Transformer | epoch 0 | step 17390 |avg loss 5.579 |avg tokens 4176.400 |tokens/s 30932.840 |walltime 2393.251 | +Transformer | epoch 0 | step 17400 |avg loss 5.141 |avg tokens 4649.900 |tokens/s 33645.417 |walltime 2394.633 | +Transformer | epoch 0 | step 17410 |avg loss 4.690 |avg tokens 4653.800 |tokens/s 33500.426 |walltime 2396.022 | +Transformer | epoch 0 | step 17420 |avg loss 5.692 |avg tokens 4075.300 |tokens/s 31235.547 |walltime 2397.327 | +Transformer | epoch 0 | step 17430 |avg loss 5.028 |avg tokens 4454.900 |tokens/s 32331.601 |walltime 2398.704 | +Transformer | epoch 0 | step 17440 |avg loss 5.615 |avg tokens 4315.300 |tokens/s 31982.788 |walltime 2400.054 | +Transformer | epoch 0 | step 17450 |avg loss 5.338 |avg tokens 4706.900 |tokens/s 34042.760 |walltime 2401.436 | +Transformer | epoch 0 | step 17460 |avg loss 5.476 |avg tokens 4658.800 |tokens/s 34845.340 |walltime 2402.773 | +Transformer | epoch 0 | step 17470 |avg loss 5.425 |avg tokens 4249.900 |tokens/s 30908.583 |walltime 2404.148 | +Transformer | epoch 0 | step 17480 |avg loss 4.826 |avg tokens 4737.600 |tokens/s 34760.939 |walltime 2405.511 | +Transformer | epoch 0 | step 17490 |avg loss 5.304 |avg tokens 4232.200 |tokens/s 32140.726 |walltime 2406.828 | +Transformer | epoch 0 | step 17500 |avg loss 5.604 |avg tokens 4627.000 |tokens/s 34007.121 |walltime 2408.189 | +Transformer | epoch 0 | step 17510 |avg loss 5.448 |avg tokens 4358.200 |tokens/s 32392.735 |walltime 2409.534 | +Transformer | epoch 0 | step 17520 |avg loss 5.766 |avg tokens 3584.100 |tokens/s 27409.698 |walltime 2410.842 | +Transformer | epoch 0 | step 17530 |avg loss 5.459 |avg tokens 4105.900 |tokens/s 29639.615 |walltime 2412.227 | +Transformer | epoch 0 | step 17540 |avg loss 5.603 |avg tokens 4188.100 |tokens/s 
30537.541 |walltime 2413.598 | +Transformer | epoch 0 | step 17550 |avg loss 4.789 |avg tokens 4782.500 |tokens/s 34504.522 |walltime 2414.984 | +Transformer | epoch 0 | step 17560 |avg loss 4.936 |avg tokens 4695.000 |tokens/s 33685.910 |walltime 2416.378 | +Transformer | epoch 0 | step 17570 |avg loss 4.897 |avg tokens 4802.200 |tokens/s 34360.638 |walltime 2417.776 | +Transformer | epoch 0 | step 17580 |avg loss 5.140 |avg tokens 4208.800 |tokens/s 31204.040 |walltime 2419.125 | +Transformer | epoch 0 | step 17590 |avg loss 5.190 |avg tokens 4755.800 |tokens/s 34065.080 |walltime 2420.521 | +Transformer | epoch 0 | step 17600 |avg loss 5.410 |avg tokens 4314.900 |tokens/s 31139.884 |walltime 2421.906 | +Transformer | epoch 0 | step 17610 |avg loss 5.848 |avg tokens 4435.600 |tokens/s 33460.950 |walltime 2423.232 | +Transformer | epoch 0 | step 17620 |avg loss 5.043 |avg tokens 4471.800 |tokens/s 33046.562 |walltime 2424.585 | +Transformer | epoch 0 | step 17630 |avg loss 5.737 |avg tokens 4402.600 |tokens/s 32588.355 |walltime 2425.936 | +Transformer | epoch 0 | step 17640 |avg loss 5.364 |avg tokens 4146.500 |tokens/s 31210.783 |walltime 2427.265 | +Transformer | epoch 0 | step 17650 |avg loss 5.254 |avg tokens 4537.300 |tokens/s 33772.821 |walltime 2428.608 | +Transformer | epoch 0 | step 17660 |avg loss 5.668 |avg tokens 4470.900 |tokens/s 33356.316 |walltime 2429.949 | +Transformer | epoch 0 | step 17670 |avg loss 5.641 |avg tokens 4317.600 |tokens/s 32155.983 |walltime 2431.291 | +Transformer | epoch 0 | step 17680 |avg loss 5.660 |avg tokens 3821.300 |tokens/s 29151.290 |walltime 2432.602 | +Transformer | epoch 0 | step 17690 |avg loss 4.950 |avg tokens 4867.300 |tokens/s 33724.949 |walltime 2434.045 | +Transformer | epoch 0 | step 17700 |avg loss 5.578 |avg tokens 4578.400 |tokens/s 34102.200 |walltime 2435.388 | +Transformer | epoch 0 | step 17710 |avg loss 4.696 |avg tokens 4819.400 |tokens/s 33415.987 |walltime 2436.830 | +Transformer | epoch 0 | step 
17720 |avg loss 5.605 |avg tokens 3828.800 |tokens/s 28802.961 |walltime 2438.159 | +Transformer | epoch 0 | step 17730 |avg loss 5.338 |avg tokens 4695.800 |tokens/s 34850.807 |walltime 2439.507 | +Transformer | epoch 0 | step 17740 |avg loss 5.520 |avg tokens 4631.000 |tokens/s 33568.239 |walltime 2440.886 | +Transformer | epoch 0 | step 17750 |avg loss 5.058 |avg tokens 4426.900 |tokens/s 32215.221 |walltime 2442.261 | +Transformer | epoch 0 | step 17760 |avg loss 5.127 |avg tokens 4472.500 |tokens/s 33319.730 |walltime 2443.603 | +Transformer | epoch 0 | step 17770 |avg loss 5.042 |avg tokens 4606.400 |tokens/s 34207.045 |walltime 2444.950 | +Transformer | epoch 0 | step 17780 |avg loss 5.268 |avg tokens 4904.900 |tokens/s 34957.997 |walltime 2446.353 | +Transformer | epoch 0 | step 17790 |avg loss 5.525 |avg tokens 4678.200 |tokens/s 34492.275 |walltime 2447.709 | +Transformer | epoch 0 | step 17800 |avg loss 5.469 |avg tokens 4240.800 |tokens/s 32175.731 |walltime 2449.027 | +Transformer | epoch 0 | step 17810 |avg loss 5.676 |avg tokens 4783.800 |tokens/s 35420.182 |walltime 2450.378 | +Transformer | epoch 0 | step 17820 |avg loss 5.070 |avg tokens 4173.500 |tokens/s 30028.814 |walltime 2451.767 | +Transformer | epoch 0 | step 17830 |avg loss 5.198 |avg tokens 4695.500 |tokens/s 33742.542 |walltime 2453.159 | +Transformer | epoch 0 | step 17840 |avg loss 5.252 |avg tokens 4649.600 |tokens/s 33469.015 |walltime 2454.548 | +Transformer | epoch 0 | step 17850 |avg loss 5.375 |avg tokens 4152.700 |tokens/s 30866.628 |walltime 2455.894 | +Transformer | epoch 0 | step 17860 |avg loss 4.749 |avg tokens 4795.000 |tokens/s 34580.799 |walltime 2457.280 | +Transformer | epoch 0 | step 17870 |avg loss 5.313 |avg tokens 4628.300 |tokens/s 34560.043 |walltime 2458.619 | +Transformer | epoch 0 | step 17880 |avg loss 5.066 |avg tokens 4716.500 |tokens/s 34090.243 |walltime 2460.003 | +Transformer | epoch 0 | step 17890 |avg loss 5.259 |avg tokens 4494.400 |tokens/s 
32968.309 |walltime 2461.366 | +Transformer | epoch 0 | step 17900 |avg loss 5.497 |avg tokens 4261.700 |tokens/s 33362.728 |walltime 2462.644 | +Transformer | epoch 0 | step 17910 |avg loss 5.520 |avg tokens 4337.400 |tokens/s 31082.295 |walltime 2464.039 | +Transformer | epoch 0 | step 17920 |avg loss 4.994 |avg tokens 4700.000 |tokens/s 34369.183 |walltime 2465.406 | +Transformer | epoch 0 | step 17930 |avg loss 5.497 |avg tokens 4130.400 |tokens/s 30707.938 |walltime 2466.752 | +Transformer | epoch 0 | step 17940 |avg loss 5.231 |avg tokens 4872.200 |tokens/s 35273.173 |walltime 2468.133 | +Transformer | epoch 0 | step 17950 |avg loss 5.360 |avg tokens 4528.300 |tokens/s 33203.536 |walltime 2469.497 | +Transformer | epoch 0 | step 17960 |avg loss 5.220 |avg tokens 4629.900 |tokens/s 33994.422 |walltime 2470.859 | +Transformer | epoch 0 | step 17970 |avg loss 5.027 |avg tokens 4907.200 |tokens/s 34419.665 |walltime 2472.284 | +Transformer | epoch 0 | step 17980 |avg loss 5.262 |avg tokens 4356.000 |tokens/s 32560.711 |walltime 2473.622 | +Transformer | epoch 0 | step 17990 |avg loss 4.731 |avg tokens 4868.500 |tokens/s 34871.416 |walltime 2475.018 | +Transformer | epoch 0 | step 18000 |avg loss 5.326 |avg tokens 4205.800 |tokens/s 31558.292 |walltime 2476.351 | +Transformer | epoch 0 | step 18010 |avg loss 4.802 |avg tokens 4778.400 |tokens/s 34186.076 |walltime 2477.749 | +Transformer | epoch 0 | step 18020 |avg loss 4.697 |avg tokens 4692.200 |tokens/s 33777.162 |walltime 2479.138 | +Transformer | epoch 0 | step 18030 |avg loss 5.307 |avg tokens 4200.500 |tokens/s 30574.819 |walltime 2480.512 | +Transformer | epoch 0 | step 18040 |avg loss 5.524 |avg tokens 4128.000 |tokens/s 30909.832 |walltime 2481.847 | +Transformer | epoch 0 | step 18050 |avg loss 5.163 |avg tokens 4628.700 |tokens/s 32692.784 |walltime 2483.263 | +Transformer | epoch 0 | step 18060 |avg loss 5.323 |avg tokens 4324.800 |tokens/s 32610.606 |walltime 2484.589 | +Transformer | epoch 0 | step 
18070 |avg loss 5.201 |avg tokens 4626.800 |tokens/s 34288.461 |walltime 2485.939 | +Transformer | epoch 0 | step 18080 |avg loss 5.245 |avg tokens 4370.400 |tokens/s 31941.150 |walltime 2487.307 | +Transformer | epoch 0 | step 18090 |avg loss 5.224 |avg tokens 4665.300 |tokens/s 33028.028 |walltime 2488.719 | +Transformer | epoch 0 | step 18100 |avg loss 5.148 |avg tokens 4238.600 |tokens/s 32448.178 |walltime 2490.026 | +Transformer | epoch 0 | step 18110 |avg loss 4.644 |avg tokens 4897.800 |tokens/s 34376.544 |walltime 2491.450 | +Transformer | epoch 0 | step 18120 |avg loss 5.246 |avg tokens 4537.900 |tokens/s 34094.382 |walltime 2492.781 | +Transformer | epoch 0 | step 18130 |avg loss 5.058 |avg tokens 4733.800 |tokens/s 33899.537 |walltime 2494.178 | +Transformer | epoch 0 | step 18140 |avg loss 5.665 |avg tokens 4680.800 |tokens/s 34168.347 |walltime 2495.548 | +Transformer | epoch 0 | step 18150 |avg loss 5.077 |avg tokens 4535.700 |tokens/s 32585.359 |walltime 2496.940 | +Transformer | epoch 0 | step 18160 |avg loss 6.169 |avg tokens 4442.000 |tokens/s 32897.472 |walltime 2498.290 | +Transformer | epoch 0 | step 18170 |avg loss 4.820 |avg tokens 4773.300 |tokens/s 33858.783 |walltime 2499.700 | +Transformer | epoch 0 | step 18180 |avg loss 5.423 |avg tokens 4330.200 |tokens/s 32160.043 |walltime 2501.046 | +Transformer | epoch 0 | step 18190 |avg loss 5.559 |avg tokens 3922.200 |tokens/s 30582.683 |walltime 2502.329 | +Transformer | epoch 0 | step 18200 |avg loss 4.715 |avg tokens 4903.900 |tokens/s 34899.028 |walltime 2503.734 | +Transformer | epoch 0 | step 18210 |avg loss 5.301 |avg tokens 4641.100 |tokens/s 33101.730 |walltime 2505.136 | +Transformer | epoch 0 | step 18220 |avg loss 4.918 |avg tokens 4680.700 |tokens/s 33449.274 |walltime 2506.535 | +Transformer | epoch 0 | step 18230 |avg loss 5.172 |avg tokens 4558.800 |tokens/s 33149.058 |walltime 2507.911 | +Transformer | epoch 0 | step 18240 |avg loss 4.998 |avg tokens 4491.800 |tokens/s 
32539.710 |walltime 2509.291 | +Transformer | epoch 0 | step 18250 |avg loss 5.552 |avg tokens 4609.300 |tokens/s 34407.828 |walltime 2510.631 | +Transformer | epoch 0 | step 18260 |avg loss 5.009 |avg tokens 4566.000 |tokens/s 32786.451 |walltime 2512.023 | +Transformer | epoch 0 | step 18270 |avg loss 5.637 |avg tokens 4728.200 |tokens/s 35174.610 |walltime 2513.367 | +Transformer | epoch 0 | step 18280 |avg loss 5.665 |avg tokens 3785.400 |tokens/s 30271.133 |walltime 2514.618 | +Transformer | epoch 0 | step 18290 |avg loss 5.698 |avg tokens 3787.700 |tokens/s 29027.304 |walltime 2515.923 | +Transformer | epoch 0 | step 18300 |avg loss 4.944 |avg tokens 4831.600 |tokens/s 33604.331 |walltime 2517.361 | +Transformer | epoch 0 | step 18310 |avg loss 5.143 |avg tokens 4188.200 |tokens/s 30048.422 |walltime 2518.754 | +Transformer | epoch 0 | step 18320 |avg loss 5.527 |avg tokens 4474.400 |tokens/s 34452.254 |walltime 2520.053 | +Transformer | epoch 0 | step 18330 |avg loss 5.252 |avg tokens 4974.800 |tokens/s 36752.363 |walltime 2521.407 | +Transformer | epoch 0 | step 18340 |avg loss 5.358 |avg tokens 4773.700 |tokens/s 34766.950 |walltime 2522.780 | +Transformer | epoch 0 | step 18350 |avg loss 5.843 |avg tokens 3895.000 |tokens/s 30046.152 |walltime 2524.076 | +Transformer | epoch 0 | step 18360 |avg loss 4.960 |avg tokens 4609.300 |tokens/s 33273.329 |walltime 2525.461 | +Transformer | epoch 0 | step 18370 |avg loss 4.959 |avg tokens 4611.800 |tokens/s 32505.914 |walltime 2526.880 | +Transformer | epoch 0 | step 18380 |avg loss 5.139 |avg tokens 4741.400 |tokens/s 34012.359 |walltime 2528.274 | +Transformer | epoch 0 | step 18390 |avg loss 5.012 |avg tokens 4591.500 |tokens/s 33064.971 |walltime 2529.663 | +Transformer | epoch 0 | step 18400 |avg loss 5.237 |avg tokens 4426.500 |tokens/s 32705.313 |walltime 2531.016 | +Transformer | epoch 0 | step 18410 |avg loss 5.589 |avg tokens 3799.400 |tokens/s 27398.754 |walltime 2532.403 | +Transformer | epoch 0 | step 
18420 |avg loss 4.774 |avg tokens 4884.800 |tokens/s 34910.153 |walltime 2533.802 | +Transformer | epoch 0 | step 18430 |avg loss 5.156 |avg tokens 4818.600 |tokens/s 34851.985 |walltime 2535.185 | +Transformer | epoch 0 | step 18440 |avg loss 5.151 |avg tokens 4834.400 |tokens/s 35552.521 |walltime 2536.545 | +Transformer | epoch 0 | step 18450 |avg loss 5.884 |avg tokens 4567.200 |tokens/s 34128.775 |walltime 2537.883 | +Transformer | epoch 0 | step 18460 |avg loss 4.845 |avg tokens 4593.400 |tokens/s 33050.366 |walltime 2539.273 | +Transformer | epoch 0 | step 18470 |avg loss 4.989 |avg tokens 4574.000 |tokens/s 33611.497 |walltime 2540.634 | +Transformer | epoch 0 | step 18480 |avg loss 5.433 |avg tokens 3988.300 |tokens/s 30282.388 |walltime 2541.951 | +Transformer | epoch 0 | step 18490 |avg loss 5.826 |avg tokens 4681.800 |tokens/s 34953.460 |walltime 2543.290 | +Transformer | epoch 0 | step 18500 |avg loss 5.488 |avg tokens 4483.300 |tokens/s 32469.957 |walltime 2544.671 | +Transformer | epoch 0 | step 18510 |avg loss 5.539 |avg tokens 4153.100 |tokens/s 31272.915 |walltime 2545.999 | +Transformer | epoch 0 | step 18520 |avg loss 4.880 |avg tokens 4564.800 |tokens/s 32964.202 |walltime 2547.384 | +Transformer | epoch 0 | step 18530 |avg loss 5.160 |avg tokens 4684.700 |tokens/s 34229.920 |walltime 2548.752 | +Transformer | epoch 0 | step 18540 |avg loss 4.966 |avg tokens 4632.000 |tokens/s 32728.553 |walltime 2550.168 | +Transformer | epoch 0 | step 18550 |avg loss 5.473 |avg tokens 4396.200 |tokens/s 32543.681 |walltime 2551.518 | +Transformer | epoch 0 | step 18560 |avg loss 5.312 |avg tokens 4510.600 |tokens/s 34110.730 |walltime 2552.841 | +Transformer | epoch 0 | step 18570 |avg loss 4.685 |avg tokens 4786.400 |tokens/s 33508.306 |walltime 2554.269 | +Transformer | epoch 0 | step 18580 |avg loss 5.517 |avg tokens 4183.700 |tokens/s 31959.275 |walltime 2555.578 | +Transformer | epoch 0 | step 18590 |avg loss 5.282 |avg tokens 4548.700 |tokens/s 
33817.076 |walltime 2556.923 | +Transformer | epoch 0 | step 18600 |avg loss 5.423 |avg tokens 4326.400 |tokens/s 32306.406 |walltime 2558.263 | +Transformer | epoch 0 | step 18610 |avg loss 5.893 |avg tokens 4652.200 |tokens/s 35382.423 |walltime 2559.577 | +Transformer | epoch 0 | step 18620 |avg loss 5.088 |avg tokens 4497.200 |tokens/s 32304.700 |walltime 2560.969 | +Transformer | epoch 0 | step 18630 |avg loss 5.242 |avg tokens 4698.400 |tokens/s 34766.384 |walltime 2562.321 | +Transformer | epoch 0 | step 18640 |avg loss 5.353 |avg tokens 4806.400 |tokens/s 34292.514 |walltime 2563.722 | +Transformer | epoch 0 | step 18650 |avg loss 4.913 |avg tokens 4897.900 |tokens/s 34776.806 |walltime 2565.131 | +Transformer | epoch 0 | step 18660 |avg loss 5.386 |avg tokens 4409.100 |tokens/s 32945.986 |walltime 2566.469 | +Transformer | epoch 0 | step 18670 |avg loss 5.285 |avg tokens 4409.600 |tokens/s 32784.130 |walltime 2567.814 | +Transformer | epoch 0 | step 18680 |avg loss 5.484 |avg tokens 4461.300 |tokens/s 32773.039 |walltime 2569.175 | +Transformer | epoch 0 | step 18690 |avg loss 4.886 |avg tokens 4477.200 |tokens/s 31838.142 |walltime 2570.582 | +Transformer | epoch 0 | step 18700 |avg loss 5.234 |avg tokens 4871.900 |tokens/s 34929.829 |walltime 2571.977 | +Transformer | epoch 0 | step 18710 |avg loss 4.926 |avg tokens 4661.700 |tokens/s 33266.396 |walltime 2573.378 | +Transformer | epoch 0 | step 18720 |avg loss 4.752 |avg tokens 4843.200 |tokens/s 34387.246 |walltime 2574.786 | +Transformer | epoch 0 | step 18730 |avg loss 4.988 |avg tokens 4498.100 |tokens/s 31805.451 |walltime 2576.201 | +Transformer | epoch 0 | step 18740 |avg loss 5.377 |avg tokens 4632.400 |tokens/s 34243.440 |walltime 2577.553 | +Transformer | epoch 0 | step 18750 |avg loss 5.071 |avg tokens 4181.300 |tokens/s 30130.115 |walltime 2578.941 | +Transformer | epoch 0 | step 18760 |avg loss 5.267 |avg tokens 4286.500 |tokens/s 31882.485 |walltime 2580.286 | +Transformer | epoch 0 | step 
18770 |avg loss 5.680 |avg tokens 4445.600 |tokens/s 33097.488 |walltime 2581.629 | +Transformer | epoch 0 | step 18780 |avg loss 5.539 |avg tokens 4451.800 |tokens/s 33514.736 |walltime 2582.957 | +Transformer | epoch 0 | step 18790 |avg loss 5.117 |avg tokens 4902.100 |tokens/s 35881.816 |walltime 2584.323 | +Transformer | epoch 0 | step 18800 |avg loss 4.712 |avg tokens 4692.100 |tokens/s 32791.942 |walltime 2585.754 | +Transformer | epoch 0 | step 18810 |avg loss 5.672 |avg tokens 4530.800 |tokens/s 33489.511 |walltime 2587.107 | +Transformer | epoch 0 | step 18820 |avg loss 5.019 |avg tokens 4553.600 |tokens/s 32777.127 |walltime 2588.496 | +Transformer | epoch 0 | step 18830 |avg loss 5.170 |avg tokens 4672.100 |tokens/s 34246.011 |walltime 2589.861 | +Transformer | epoch 0 | step 18840 |avg loss 5.595 |avg tokens 4250.700 |tokens/s 32044.197 |walltime 2591.187 | +Transformer | epoch 0 | step 18850 |avg loss 5.649 |avg tokens 3953.800 |tokens/s 30712.962 |walltime 2592.474 | +Transformer | epoch 0 | step 18860 |avg loss 5.027 |avg tokens 4858.700 |tokens/s 34559.117 |walltime 2593.880 | +Transformer | epoch 0 | step 18870 |avg loss 4.533 |avg tokens 4634.400 |tokens/s 33595.172 |walltime 2595.260 | +Transformer | epoch 0 | step 18880 |avg loss 5.045 |avg tokens 4387.600 |tokens/s 31668.988 |walltime 2596.645 | +Transformer | epoch 0 | step 18890 |avg loss 4.763 |avg tokens 4632.800 |tokens/s 32771.924 |walltime 2598.059 | +Transformer | epoch 0 | step 18900 |avg loss 4.642 |avg tokens 4865.000 |tokens/s 34415.381 |walltime 2599.473 | +Transformer | epoch 0 | step 18910 |avg loss 5.246 |avg tokens 4914.700 |tokens/s 35611.932 |walltime 2600.853 | +Transformer | epoch 0 | step 18920 |avg loss 4.662 |avg tokens 4825.400 |tokens/s 34528.965 |walltime 2602.250 | +Transformer | epoch 0 | step 18930 |avg loss 5.032 |avg tokens 4554.700 |tokens/s 33497.849 |walltime 2603.610 | +Transformer | epoch 0 | step 18940 |avg loss 5.061 |avg tokens 4226.800 |tokens/s 
31163.339 |walltime 2604.966 | +Transformer | epoch 0 | step 18950 |avg loss 5.054 |avg tokens 4410.300 |tokens/s 33002.783 |walltime 2606.303 | +Transformer | epoch 0 | step 18960 |avg loss 5.189 |avg tokens 4243.000 |tokens/s 31884.996 |walltime 2607.633 | +Transformer | epoch 0 | step 18970 |avg loss 5.435 |avg tokens 4696.800 |tokens/s 35010.582 |walltime 2608.975 | +Transformer | epoch 0 | step 18980 |avg loss 5.688 |avg tokens 4437.700 |tokens/s 32606.507 |walltime 2610.336 | +Transformer | epoch 0 | step 18990 |avg loss 5.195 |avg tokens 4535.800 |tokens/s 33838.677 |walltime 2611.676 | +Transformer | epoch 0 | step 19000 |avg loss 5.183 |avg tokens 4545.500 |tokens/s 33548.687 |walltime 2613.031 | +Transformer | epoch 0 | step 19010 |avg loss 5.043 |avg tokens 4507.400 |tokens/s 32363.343 |walltime 2614.424 | +Transformer | epoch 0 | step 19020 |avg loss 5.614 |avg tokens 3558.900 |tokens/s 27292.694 |walltime 2615.728 | +Transformer | epoch 0 | step 19030 |avg loss 4.897 |avg tokens 4865.300 |tokens/s 33902.723 |walltime 2617.163 | +Transformer | epoch 0 | step 19040 |avg loss 5.113 |avg tokens 4590.200 |tokens/s 32752.581 |walltime 2618.564 | +Transformer | epoch 0 | step 19050 |avg loss 5.630 |avg tokens 4203.000 |tokens/s 32119.460 |walltime 2619.873 | +Transformer | epoch 0 | step 19060 |avg loss 5.060 |avg tokens 4656.500 |tokens/s 33463.745 |walltime 2621.264 | +Transformer | epoch 0 | step 19070 |avg loss 5.300 |avg tokens 4572.000 |tokens/s 33551.385 |walltime 2622.627 | +Transformer | epoch 0 | step 19080 |avg loss 5.522 |avg tokens 4668.600 |tokens/s 33749.861 |walltime 2624.010 | +Transformer | epoch 0 | step 19090 |avg loss 4.850 |avg tokens 4398.500 |tokens/s 31263.270 |walltime 2625.417 | +Transformer | epoch 0 | step 19100 |avg loss 5.236 |avg tokens 4492.800 |tokens/s 33442.955 |walltime 2626.761 | +Transformer | epoch 0 | step 19110 |avg loss 5.376 |avg tokens 4103.300 |tokens/s 30863.566 |walltime 2628.090 | +Transformer | epoch 0 | step 
19120 |avg loss 5.499 |avg tokens 4166.600 |tokens/s 31320.556 |walltime 2629.421 | +Transformer | epoch 0 | step 19130 |avg loss 6.032 |avg tokens 3990.200 |tokens/s 31405.515 |walltime 2630.691 | +Transformer | epoch 0 | step 19140 |avg loss 5.278 |avg tokens 4597.600 |tokens/s 33649.208 |walltime 2632.057 | +Transformer | epoch 0 | step 19150 |avg loss 4.911 |avg tokens 4672.800 |tokens/s 32943.397 |walltime 2633.476 | +Transformer | epoch 0 | step 19160 |avg loss 5.447 |avg tokens 4503.300 |tokens/s 33850.244 |walltime 2634.806 | +Transformer | epoch 0 | step 19170 |avg loss 4.759 |avg tokens 4641.400 |tokens/s 32789.646 |walltime 2636.222 | +Transformer | epoch 0 | step 19180 |avg loss 4.633 |avg tokens 4832.800 |tokens/s 33142.399 |walltime 2637.680 | +Transformer | epoch 0 | step 19190 |avg loss 5.133 |avg tokens 4386.800 |tokens/s 31474.132 |walltime 2639.074 | +Transformer | epoch 0 | step 19200 |avg loss 5.414 |avg tokens 4477.000 |tokens/s 31928.589 |walltime 2640.476 | +Transformer | epoch 0 | step 19210 |avg loss 4.675 |avg tokens 4795.600 |tokens/s 33625.717 |walltime 2641.902 | +Transformer | epoch 0 | step 19220 |avg loss 5.295 |avg tokens 4088.200 |tokens/s 29979.737 |walltime 2643.266 | +Transformer | epoch 0 | step 19230 |avg loss 6.479 |avg tokens 4675.400 |tokens/s 35630.224 |walltime 2644.578 | +Transformer | epoch 0 | step 19240 |avg loss 4.994 |avg tokens 4552.300 |tokens/s 33870.472 |walltime 2645.922 | +Transformer | epoch 0 | step 19250 |avg loss 5.146 |avg tokens 4555.600 |tokens/s 32914.766 |walltime 2647.306 | +Transformer | epoch 0 | step 19260 |avg loss 5.985 |avg tokens 4266.300 |tokens/s 32626.747 |walltime 2648.614 | +Transformer | epoch 0 | step 19270 |avg loss 5.159 |avg tokens 4483.300 |tokens/s 33317.086 |walltime 2649.959 | +Transformer | epoch 0 | step 19280 |avg loss 5.094 |avg tokens 4503.700 |tokens/s 32709.200 |walltime 2651.336 | +Transformer | epoch 0 | step 19290 |avg loss 5.384 |avg tokens 4260.400 |tokens/s 
32288.304 |walltime 2652.656 | +Transformer | epoch 0 | step 19300 |avg loss 5.357 |avg tokens 4457.800 |tokens/s 32303.884 |walltime 2654.036 | +Transformer | epoch 0 | step 19310 |avg loss 5.041 |avg tokens 4616.600 |tokens/s 33078.637 |walltime 2655.431 | +Transformer | epoch 0 | step 19320 |avg loss 5.015 |avg tokens 4627.800 |tokens/s 33271.802 |walltime 2656.822 | +Transformer | epoch 0 | step 19330 |avg loss 5.287 |avg tokens 4296.500 |tokens/s 30560.640 |walltime 2658.228 | +Transformer | epoch 0 | step 19340 |avg loss 5.161 |avg tokens 4593.200 |tokens/s 33763.813 |walltime 2659.589 | +Transformer | epoch 0 | step 19350 |avg loss 5.154 |avg tokens 4914.000 |tokens/s 34026.358 |walltime 2661.033 | +Transformer | epoch 0 | step 19360 |avg loss 4.994 |avg tokens 4553.600 |tokens/s 32627.751 |walltime 2662.428 | +Transformer | epoch 0 | step 19370 |avg loss 4.868 |avg tokens 4742.700 |tokens/s 34246.081 |walltime 2663.813 | +Transformer | epoch 0 | step 19380 |avg loss 5.214 |avg tokens 4660.100 |tokens/s 34272.203 |walltime 2665.173 | +Transformer | epoch 0 | step 19390 |avg loss 5.018 |avg tokens 4478.400 |tokens/s 31861.655 |walltime 2666.579 | +Transformer | epoch 0 | step 19400 |avg loss 4.386 |avg tokens 4757.600 |tokens/s 31826.601 |walltime 2668.073 | +Transformer | epoch 0 | step 19410 |avg loss 4.930 |avg tokens 4899.400 |tokens/s 34525.417 |walltime 2669.493 | +Transformer | epoch 0 | step 19420 |avg loss 5.572 |avg tokens 4472.700 |tokens/s 33821.599 |walltime 2670.815 | +Transformer | epoch 0 | step 19430 |avg loss 4.797 |avg tokens 4820.000 |tokens/s 34199.309 |walltime 2672.224 | +Transformer | epoch 0 | step 19440 |avg loss 5.285 |avg tokens 4786.300 |tokens/s 34817.200 |walltime 2673.599 | +Transformer | epoch 0 | step 19450 |avg loss 4.850 |avg tokens 4783.500 |tokens/s 33528.582 |walltime 2675.026 | +Transformer | epoch 0 | step 19460 |avg loss 4.819 |avg tokens 4928.300 |tokens/s 35897.492 |walltime 2676.399 | +Transformer | epoch 0 | step 
19470 |avg loss 4.959 |avg tokens 4497.600 |tokens/s 33180.499 |walltime 2677.754 | +Transformer | epoch 0 | step 19480 |avg loss 4.691 |avg tokens 4877.600 |tokens/s 33419.592 |walltime 2679.214 | +Transformer | epoch 0 | step 19490 |avg loss 4.867 |avg tokens 4918.300 |tokens/s 33298.143 |walltime 2680.691 | +Transformer | epoch 0 | step 19500 |avg loss 5.272 |avg tokens 4574.300 |tokens/s 33127.599 |walltime 2682.072 | +Transformer | epoch 0 | step 19510 |avg loss 4.979 |avg tokens 4421.200 |tokens/s 32397.994 |walltime 2683.436 | +Transformer | epoch 0 | step 19520 |avg loss 4.774 |avg tokens 4646.200 |tokens/s 33502.595 |walltime 2684.823 | +Transformer | epoch 0 | step 19530 |avg loss 5.284 |avg tokens 4532.700 |tokens/s 32954.942 |walltime 2686.198 | +Transformer | epoch 0 | step 19540 |avg loss 5.447 |avg tokens 4278.000 |tokens/s 32060.374 |walltime 2687.533 | +Transformer | epoch 0 | step 19550 |avg loss 6.077 |avg tokens 4645.500 |tokens/s 35681.395 |walltime 2688.835 | +Transformer | epoch 0 | step 19560 |avg loss 4.899 |avg tokens 4561.700 |tokens/s 32430.667 |walltime 2690.241 | +Transformer | epoch 0 | step 19570 |avg loss 5.453 |avg tokens 4383.500 |tokens/s 33924.186 |walltime 2691.533 | +Transformer | epoch 0 | step 19580 |avg loss 5.464 |avg tokens 4475.500 |tokens/s 33293.600 |walltime 2692.878 | +Transformer | epoch 0 | step 19590 |avg loss 4.863 |avg tokens 4529.500 |tokens/s 32258.629 |walltime 2694.282 | +Transformer | epoch 0 | step 19600 |avg loss 5.540 |avg tokens 3882.500 |tokens/s 30272.791 |walltime 2695.564 | +Transformer | epoch 0 | step 19610 |avg loss 5.602 |avg tokens 4780.400 |tokens/s 36136.171 |walltime 2696.887 | +Transformer | epoch 0 | step 19620 |avg loss 4.594 |avg tokens 4797.600 |tokens/s 34745.171 |walltime 2698.268 | +Transformer | epoch 0 | step 19630 |avg loss 5.011 |avg tokens 4640.400 |tokens/s 33349.968 |walltime 2699.659 | +Transformer | epoch 0 | step 19640 |avg loss 4.760 |avg tokens 4805.100 |tokens/s 
33327.017 |walltime 2701.101 | +Transformer | epoch 0 | step 19650 |avg loss 5.353 |avg tokens 4404.100 |tokens/s 32500.667 |walltime 2702.456 | +Transformer | epoch 0 | step 19660 |avg loss 5.430 |avg tokens 4115.100 |tokens/s 30411.169 |walltime 2703.810 | +Transformer | epoch 0 | step 19670 |avg loss 5.398 |avg tokens 4244.700 |tokens/s 30352.877 |walltime 2705.208 | +Transformer | epoch 0 | step 19680 |avg loss 5.594 |avg tokens 4279.200 |tokens/s 32502.721 |walltime 2706.525 | +Transformer | epoch 0 | step 19690 |avg loss 5.508 |avg tokens 4299.800 |tokens/s 31519.337 |walltime 2707.889 | +Transformer | epoch 0 | step 19700 |avg loss 5.454 |avg tokens 4546.700 |tokens/s 33023.591 |walltime 2709.266 | +Transformer | epoch 0 | step 19710 |avg loss 4.851 |avg tokens 4880.000 |tokens/s 34956.813 |walltime 2710.662 | +Transformer | epoch 0 | step 19720 |avg loss 5.338 |avg tokens 4632.700 |tokens/s 34053.974 |walltime 2712.022 | +Transformer | epoch 0 | step 19730 |avg loss 5.166 |avg tokens 4875.700 |tokens/s 34960.534 |walltime 2713.417 | +Transformer | epoch 0 | step 19740 |avg loss 4.896 |avg tokens 4664.700 |tokens/s 32978.820 |walltime 2714.831 | +Transformer | epoch 0 | step 19750 |avg loss 5.157 |avg tokens 4276.400 |tokens/s 31130.504 |walltime 2716.205 | +Transformer | epoch 0 | step 19760 |avg loss 4.611 |avg tokens 4673.600 |tokens/s 32679.241 |walltime 2717.635 | +Transformer | epoch 0 | step 19770 |avg loss 4.874 |avg tokens 4580.200 |tokens/s 32225.737 |walltime 2719.056 | +Transformer | epoch 0 | step 19780 |avg loss 5.255 |avg tokens 4587.100 |tokens/s 33190.555 |walltime 2720.438 | +Transformer | epoch 0 | step 19790 |avg loss 5.446 |avg tokens 4355.300 |tokens/s 32778.260 |walltime 2721.767 | +Transformer | epoch 0 | step 19800 |avg loss 6.103 |avg tokens 3935.200 |tokens/s 29706.854 |walltime 2723.092 | +Transformer | epoch 0 | step 19810 |avg loss 5.145 |avg tokens 4760.700 |tokens/s 34405.007 |walltime 2724.475 | +Transformer | epoch 0 | step 
19820 |avg loss 4.836 |avg tokens 4439.300 |tokens/s 32541.078 |walltime 2725.840 | +Transformer | epoch 0 | step 19830 |avg loss 5.603 |avg tokens 4884.700 |tokens/s 35035.639 |walltime 2727.234 | +Transformer | epoch 0 | step 19840 |avg loss 4.992 |avg tokens 4767.900 |tokens/s 33433.603 |walltime 2728.660 | +Transformer | epoch 0 | step 19850 |avg loss 4.800 |avg tokens 4743.300 |tokens/s 34317.410 |walltime 2730.042 | +Transformer | epoch 0 | step 19860 |avg loss 5.541 |avg tokens 4619.300 |tokens/s 34406.404 |walltime 2731.385 | +Transformer | epoch 0 | step 19870 |avg loss 5.024 |avg tokens 4681.900 |tokens/s 33669.160 |walltime 2732.775 | +Transformer | epoch 0 | step 19880 |avg loss 5.168 |avg tokens 4430.700 |tokens/s 31322.691 |walltime 2734.190 | +Transformer | epoch 0 | step 19890 |avg loss 5.241 |avg tokens 4707.700 |tokens/s 35038.038 |walltime 2735.533 | +Transformer | epoch 0 | step 19900 |avg loss 5.365 |avg tokens 3806.900 |tokens/s 28068.824 |walltime 2736.890 | +Transformer | epoch 0 | step 19910 |avg loss 5.553 |avg tokens 4544.700 |tokens/s 33477.463 |walltime 2738.247 | +Transformer | epoch 0 | step 19920 |avg loss 4.770 |avg tokens 4624.100 |tokens/s 33462.366 |walltime 2739.629 | +Transformer | epoch 0 | step 19930 |avg loss 5.317 |avg tokens 3942.100 |tokens/s 29874.103 |walltime 2740.949 | +Transformer | epoch 0 | step 19940 |avg loss 5.749 |avg tokens 3854.300 |tokens/s 29521.623 |walltime 2742.254 | +Transformer | epoch 0 | step 19950 |avg loss 5.126 |avg tokens 4841.600 |tokens/s 34832.483 |walltime 2743.644 | +Transformer | epoch 0 | step 19960 |avg loss 5.442 |avg tokens 4405.800 |tokens/s 33016.321 |walltime 2744.979 | +Transformer | epoch 0 | step 19970 |avg loss 4.847 |avg tokens 4744.500 |tokens/s 34054.836 |walltime 2746.372 | +Transformer | epoch 0 | step 19980 |avg loss 5.300 |avg tokens 4144.000 |tokens/s 30969.832 |walltime 2747.710 | +Transformer | epoch 0 | step 19990 |avg loss 5.902 |avg tokens 4400.300 |tokens/s 
33591.539 |walltime 2749.020 | +Transformer | epoch 0 | step 20000 |avg loss 5.043 |avg tokens 4732.600 |tokens/s 34593.117 |walltime 2750.388 | +Transformer | epoch 0 | step 20010 |avg loss 5.688 |avg tokens 4268.400 |tokens/s 30807.010 |walltime 2751.773 | +Transformer | epoch 0 | step 20020 |avg loss 4.990 |avg tokens 4445.400 |tokens/s 32651.164 |walltime 2753.135 | +Transformer | epoch 0 | step 20030 |avg loss 5.811 |avg tokens 4124.300 |tokens/s 31809.030 |walltime 2754.432 | +Transformer | epoch 0 | step 20040 |avg loss 5.141 |avg tokens 4746.400 |tokens/s 34594.795 |walltime 2755.804 | +Transformer | epoch 0 | step 20050 |avg loss 4.669 |avg tokens 4840.500 |tokens/s 33675.462 |walltime 2757.241 | +Transformer | epoch 0 | step 20060 |avg loss 5.370 |avg tokens 4450.400 |tokens/s 32621.531 |walltime 2758.605 | +Transformer | epoch 0 | step 20070 |avg loss 4.828 |avg tokens 4483.100 |tokens/s 32847.320 |walltime 2759.970 | +Transformer | epoch 0 | step 20080 |avg loss 5.079 |avg tokens 4191.000 |tokens/s 30441.257 |walltime 2761.347 | +Transformer | epoch 0 | step 20090 |avg loss 5.204 |avg tokens 4596.000 |tokens/s 33422.366 |walltime 2762.722 | +Transformer | epoch 0 | step 20100 |avg loss 5.799 |avg tokens 4166.100 |tokens/s 31245.595 |walltime 2764.055 | +Transformer | epoch 0 | step 20110 |avg loss 5.356 |avg tokens 4171.300 |tokens/s 32493.242 |walltime 2765.339 | +Transformer | epoch 0 | step 20120 |avg loss 5.406 |avg tokens 4435.400 |tokens/s 34070.156 |walltime 2766.641 | +Transformer | epoch 0 | step 20130 |avg loss 5.244 |avg tokens 4636.400 |tokens/s 35014.858 |walltime 2767.965 | +Transformer | epoch 0 | step 20140 |avg loss 4.849 |avg tokens 4628.000 |tokens/s 32643.177 |walltime 2769.383 | +Transformer | epoch 0 | step 20150 |avg loss 4.922 |avg tokens 4967.200 |tokens/s 35314.360 |walltime 2770.789 | +Transformer | epoch 0 | step 20160 |avg loss 5.728 |avg tokens 4617.500 |tokens/s 34344.595 |walltime 2772.134 | +Transformer | epoch 0 | step 
20170 |avg loss 4.854 |avg tokens 4509.300 |tokens/s 33167.621 |walltime 2773.493 | +Transformer | epoch 0 | step 20180 |avg loss 5.335 |avg tokens 4684.200 |tokens/s 34199.957 |walltime 2774.863 | +Transformer | epoch 0 | step 20190 |avg loss 4.871 |avg tokens 4696.200 |tokens/s 33495.535 |walltime 2776.265 | +Transformer | epoch 0 | step 20200 |avg loss 5.141 |avg tokens 4803.400 |tokens/s 34639.419 |walltime 2777.652 | +Transformer | epoch 0 | step 20210 |avg loss 5.202 |avg tokens 4791.500 |tokens/s 36026.331 |walltime 2778.982 | +Transformer | epoch 0 | step 20220 |avg loss 5.268 |avg tokens 4479.900 |tokens/s 33667.944 |walltime 2780.312 | +Transformer | epoch 0 | step 20230 |avg loss 5.064 |avg tokens 4240.900 |tokens/s 31094.347 |walltime 2781.676 | +Transformer | epoch 0 | step 20240 |avg loss 5.346 |avg tokens 4201.500 |tokens/s 31658.147 |walltime 2783.003 | +Transformer | epoch 0 | step 20250 |avg loss 5.102 |avg tokens 4363.000 |tokens/s 31787.068 |walltime 2784.376 | +Transformer | epoch 0 | step 20260 |avg loss 5.327 |avg tokens 4420.300 |tokens/s 32831.813 |walltime 2785.722 | +Transformer | epoch 0 | step 20270 |avg loss 5.446 |avg tokens 4332.700 |tokens/s 33377.538 |walltime 2787.020 | +Transformer | epoch 0 | step 20280 |avg loss 4.964 |avg tokens 4762.700 |tokens/s 33774.180 |walltime 2788.431 | +Transformer | epoch 0 | step 20290 |avg loss 5.837 |avg tokens 4332.500 |tokens/s 33444.599 |walltime 2789.726 | +Transformer | epoch 0 | step 20300 |avg loss 4.864 |avg tokens 4903.000 |tokens/s 34733.967 |walltime 2791.138 | +Transformer | epoch 0 | step 20310 |avg loss 5.123 |avg tokens 4441.000 |tokens/s 32891.756 |walltime 2792.488 | +Transformer | epoch 0 | step 20320 |avg loss 5.703 |avg tokens 4347.400 |tokens/s 32451.369 |walltime 2793.827 | +Transformer | epoch 0 | step 20330 |avg loss 5.161 |avg tokens 4632.300 |tokens/s 33675.067 |walltime 2795.203 | +Transformer | epoch 0 | step 20340 |avg loss 4.990 |avg tokens 4748.000 |tokens/s 
34672.964 |walltime 2796.572 | +Transformer | epoch 0 | step 20350 |avg loss 4.737 |avg tokens 4544.200 |tokens/s 31721.760 |walltime 2798.005 | +Transformer | epoch 0 | step 20360 |avg loss 4.977 |avg tokens 4694.300 |tokens/s 34512.964 |walltime 2799.365 | +Transformer | epoch 0 | step 20370 |avg loss 4.911 |avg tokens 4444.800 |tokens/s 32716.625 |walltime 2800.724 | +Transformer | epoch 0 | step 20380 |avg loss 5.233 |avg tokens 4437.400 |tokens/s 31608.294 |walltime 2802.128 | +Transformer | epoch 0 | step 20390 |avg loss 5.278 |avg tokens 4624.800 |tokens/s 34042.982 |walltime 2803.486 | +Transformer | epoch 0 | step 20400 |avg loss 4.961 |avg tokens 4379.900 |tokens/s 32647.410 |walltime 2804.828 | +Transformer | epoch 0 | step 20410 |avg loss 5.358 |avg tokens 4646.100 |tokens/s 34209.935 |walltime 2806.186 | +Transformer | epoch 0 | step 20420 |avg loss 5.616 |avg tokens 4567.200 |tokens/s 33980.651 |walltime 2807.530 | +Transformer | epoch 0 | step 20430 |avg loss 5.686 |avg tokens 4530.700 |tokens/s 34169.214 |walltime 2808.856 | +Transformer | epoch 0 | step 20440 |avg loss 5.558 |avg tokens 4503.300 |tokens/s 34139.626 |walltime 2810.175 | +Transformer | epoch 0 | step 20450 |avg loss 4.883 |avg tokens 4594.700 |tokens/s 33317.662 |walltime 2811.554 | +Transformer | epoch 0 | step 20460 |avg loss 5.122 |avg tokens 4638.200 |tokens/s 32991.112 |walltime 2812.960 | +Transformer | epoch 0 | step 20470 |avg loss 5.013 |avg tokens 4563.200 |tokens/s 32653.178 |walltime 2814.357 | +Transformer | epoch 0 | step 20480 |avg loss 5.041 |avg tokens 4642.200 |tokens/s 31770.815 |walltime 2815.818 | +Transformer | epoch 0 | step 20490 |avg loss 5.689 |avg tokens 4536.300 |tokens/s 33737.196 |walltime 2817.163 | +Transformer | epoch 0 | step 20500 |avg loss 5.633 |avg tokens 4179.300 |tokens/s 32085.417 |walltime 2818.466 | +Transformer | epoch 0 | step 20510 |avg loss 5.329 |avg tokens 4402.700 |tokens/s 32327.430 |walltime 2819.828 | +Transformer | epoch 0 | step 
20520 |avg loss 5.084 |avg tokens 4864.800 |tokens/s 33910.681 |walltime 2821.262 | +Transformer | epoch 0 | step 20530 |avg loss 4.773 |avg tokens 4728.300 |tokens/s 34020.045 |walltime 2822.652 | +Transformer | epoch 0 | step 20540 |avg loss 5.331 |avg tokens 4690.700 |tokens/s 34517.323 |walltime 2824.011 | +Transformer | epoch 0 | step 20550 |avg loss 5.196 |avg tokens 4183.400 |tokens/s 31476.662 |walltime 2825.340 | +Transformer | epoch 0 | step 20560 |avg loss 5.143 |avg tokens 4394.500 |tokens/s 31836.395 |walltime 2826.720 | +Transformer | epoch 0 | step 20570 |avg loss 5.574 |avg tokens 4246.300 |tokens/s 31501.159 |walltime 2828.068 | +Transformer | epoch 0 | step 20580 |avg loss 5.415 |avg tokens 4237.600 |tokens/s 31820.637 |walltime 2829.400 | +Transformer | epoch 0 | step 20590 |avg loss 5.455 |avg tokens 4765.700 |tokens/s 34882.163 |walltime 2830.766 | +Transformer | epoch 0 | step 20600 |avg loss 4.876 |avg tokens 4916.800 |tokens/s 34932.510 |walltime 2832.174 | +Transformer | epoch 0 | step 20610 |avg loss 4.888 |avg tokens 4380.200 |tokens/s 31610.300 |walltime 2833.559 | +Transformer | epoch 0 | step 20620 |avg loss 4.932 |avg tokens 4725.300 |tokens/s 33580.203 |walltime 2834.967 | +Transformer | epoch 0 | step 20630 |avg loss 4.333 |avg tokens 4795.200 |tokens/s 33091.058 |walltime 2836.416 | +Transformer | epoch 0 | step 20640 |avg loss 5.092 |avg tokens 4243.300 |tokens/s 30918.016 |walltime 2837.788 | +Transformer | epoch 0 | step 20650 |avg loss 5.729 |avg tokens 4546.900 |tokens/s 34482.169 |walltime 2839.107 | +Transformer | epoch 0 | step 20660 |avg loss 4.851 |avg tokens 4740.000 |tokens/s 34349.782 |walltime 2840.487 | +Transformer | epoch 0 | step 20670 |avg loss 5.389 |avg tokens 4238.500 |tokens/s 31610.110 |walltime 2841.828 | +Transformer | epoch 0 | step 20680 |avg loss 4.897 |avg tokens 4612.300 |tokens/s 33720.855 |walltime 2843.195 | +Transformer | epoch 0 | step 20690 |avg loss 5.335 |avg tokens 4386.200 |tokens/s 
31851.890 |walltime 2844.572 | +Transformer | epoch 0 | step 20700 |avg loss 5.367 |avg tokens 4446.300 |tokens/s 32441.739 |walltime 2845.943 | +Transformer | epoch 0 | step 20710 |avg loss 4.868 |avg tokens 4753.600 |tokens/s 35048.956 |walltime 2847.299 | +Transformer | epoch 0 | step 20720 |avg loss 5.127 |avg tokens 4921.700 |tokens/s 35561.660 |walltime 2848.683 | +Transformer | epoch 0 | step 20730 |avg loss 5.764 |avg tokens 3966.000 |tokens/s 29294.805 |walltime 2850.037 | +Transformer | epoch 0 | step 20740 |avg loss 4.891 |avg tokens 4825.900 |tokens/s 33993.731 |walltime 2851.457 | +Transformer | epoch 0 | step 20750 |avg loss 5.296 |avg tokens 4424.200 |tokens/s 33231.667 |walltime 2852.788 | +Transformer | epoch 0 | step 20760 |avg loss 6.214 |avg tokens 4586.400 |tokens/s 35682.293 |walltime 2854.073 | +Transformer | epoch 0 | step 20770 |avg loss 5.137 |avg tokens 4596.000 |tokens/s 32484.228 |walltime 2855.488 | +Transformer | epoch 0 | step 20780 |avg loss 4.923 |avg tokens 4193.900 |tokens/s 31769.446 |walltime 2856.808 | +Transformer | epoch 0 | step 20790 |avg loss 5.242 |avg tokens 4250.900 |tokens/s 31366.889 |walltime 2858.164 | +Transformer | epoch 0 | step 20800 |avg loss 5.176 |avg tokens 4716.300 |tokens/s 33645.203 |walltime 2859.565 | +Transformer | epoch 0 | step 20810 |avg loss 5.365 |avg tokens 4496.500 |tokens/s 30666.668 |walltime 2861.032 | +Transformer | epoch 0 | step 20820 |avg loss 5.455 |avg tokens 4701.200 |tokens/s 34759.685 |walltime 2862.384 | +Transformer | epoch 0 | step 20830 |avg loss 4.783 |avg tokens 4507.300 |tokens/s 32566.999 |walltime 2863.768 | +Transformer | epoch 0 | step 20840 |avg loss 4.930 |avg tokens 4590.400 |tokens/s 33342.569 |walltime 2865.145 | +Transformer | epoch 0 | step 20850 |avg loss 5.401 |avg tokens 4351.100 |tokens/s 33190.252 |walltime 2866.456 | +Transformer | epoch 0 | step 20860 |avg loss 4.827 |avg tokens 4841.700 |tokens/s 34280.076 |walltime 2867.868 | +Transformer | epoch 0 | step 
20870 |avg loss 5.885 |avg tokens 4400.100 |tokens/s 33825.311 |walltime 2869.169 | +Transformer | epoch 0 | step 20880 |avg loss 5.586 |avg tokens 4445.600 |tokens/s 33688.483 |walltime 2870.489 | +Transformer | epoch 0 | step 20890 |avg loss 5.968 |avg tokens 4491.000 |tokens/s 33800.948 |walltime 2871.817 | +Transformer | epoch 0 | step 20900 |avg loss 4.839 |avg tokens 4585.800 |tokens/s 33620.156 |walltime 2873.181 | +Transformer | epoch 0 | step 20910 |avg loss 5.163 |avg tokens 4326.100 |tokens/s 32824.147 |walltime 2874.499 | +Transformer | epoch 0 | step 20920 |avg loss 5.144 |avg tokens 4793.800 |tokens/s 34896.402 |walltime 2875.873 | +Transformer | epoch 0 | step 20930 |avg loss 4.876 |avg tokens 4540.300 |tokens/s 32321.742 |walltime 2877.278 | +Transformer | epoch 0 | step 20940 |avg loss 5.441 |avg tokens 3943.100 |tokens/s 30290.392 |walltime 2878.580 | +Transformer | epoch 0 | step 20950 |avg loss 5.174 |avg tokens 4313.600 |tokens/s 31860.833 |walltime 2879.933 | +Transformer | epoch 0 | step 20960 |avg loss 5.005 |avg tokens 4535.100 |tokens/s 32366.142 |walltime 2881.335 | +Transformer | epoch 0 | step 20970 |avg loss 4.998 |avg tokens 4158.600 |tokens/s 29681.881 |walltime 2882.736 | +Transformer | epoch 0 | step 20980 |avg loss 5.138 |avg tokens 4660.800 |tokens/s 33301.765 |walltime 2884.135 | +Transformer | epoch 0 | step 20990 |avg loss 4.842 |avg tokens 4502.500 |tokens/s 33316.492 |walltime 2885.487 | +Transformer | epoch 0 | step 21000 |avg loss 5.374 |avg tokens 4497.100 |tokens/s 33684.686 |walltime 2886.822 | +Transformer | epoch 0 | step 21010 |avg loss 4.739 |avg tokens 4875.300 |tokens/s 35017.818 |walltime 2888.214 | +Transformer | epoch 0 | step 21020 |avg loss 5.115 |avg tokens 4562.700 |tokens/s 33819.306 |walltime 2889.563 | +Transformer | epoch 0 | step 21030 |avg loss 5.318 |avg tokens 3955.400 |tokens/s 29886.513 |walltime 2890.887 | +Transformer | epoch 0 | step 21040 |avg loss 5.021 |avg tokens 4574.700 |tokens/s 
33331.942 |walltime 2892.259 | +Transformer | epoch 0 | step 21050 |avg loss 5.757 |avg tokens 3944.700 |tokens/s 30041.029 |walltime 2893.572 | +Transformer | epoch 0 | step 21060 |avg loss 5.341 |avg tokens 4425.500 |tokens/s 32688.135 |walltime 2894.926 | +Transformer | epoch 0 | step 21070 |avg loss 5.058 |avg tokens 4588.700 |tokens/s 33126.880 |walltime 2896.311 | +Transformer | epoch 0 | step 21080 |avg loss 5.444 |avg tokens 4097.600 |tokens/s 30539.961 |walltime 2897.653 | +Transformer | epoch 0 | step 21090 |avg loss 4.767 |avg tokens 4569.200 |tokens/s 32069.370 |walltime 2899.078 | +Transformer | epoch 0 | step 21100 |avg loss 5.208 |avg tokens 4377.500 |tokens/s 31626.010 |walltime 2900.462 | +Transformer | epoch 0 | step 21110 |avg loss 5.283 |avg tokens 3891.200 |tokens/s 29155.625 |walltime 2901.797 | +Transformer | epoch 0 | step 21120 |avg loss 5.211 |avg tokens 4835.200 |tokens/s 34855.828 |walltime 2903.184 | +Transformer | epoch 0 | step 21130 |avg loss 4.997 |avg tokens 4618.300 |tokens/s 33840.585 |walltime 2904.549 | +Transformer | epoch 0 | step 21140 |avg loss 4.947 |avg tokens 4538.100 |tokens/s 32287.427 |walltime 2905.954 | +Transformer | epoch 0 | step 21150 |avg loss 5.522 |avg tokens 4049.900 |tokens/s 30798.119 |walltime 2907.269 | +Transformer | epoch 0 | step 21160 |avg loss 4.961 |avg tokens 4792.800 |tokens/s 33827.377 |walltime 2908.686 | +Transformer | epoch 0 | step 21170 |avg loss 5.867 |avg tokens 3905.200 |tokens/s 30627.339 |walltime 2909.961 | +Transformer | epoch 0 | step 21180 |avg loss 5.063 |avg tokens 4676.000 |tokens/s 35302.723 |walltime 2911.285 | +Transformer | epoch 0 | step 21190 |avg loss 5.162 |avg tokens 4634.900 |tokens/s 34474.504 |walltime 2912.630 | +Transformer | epoch 0 | step 21200 |avg loss 4.753 |avg tokens 4813.900 |tokens/s 34533.812 |walltime 2914.024 | +Transformer | epoch 0 | step 21210 |avg loss 5.454 |avg tokens 4646.400 |tokens/s 35313.792 |walltime 2915.340 | +Transformer | epoch 0 | step 
21220 |avg loss 5.054 |avg tokens 4425.500 |tokens/s 32274.626 |walltime 2916.711 | +Transformer | epoch 0 | step 21230 |avg loss 5.168 |avg tokens 4552.000 |tokens/s 33090.066 |walltime 2918.087 | +Transformer | epoch 0 | step 21240 |avg loss 4.804 |avg tokens 4652.800 |tokens/s 33187.583 |walltime 2919.488 | +Transformer | epoch 0 | step 21250 |avg loss 5.138 |avg tokens 4675.100 |tokens/s 33984.063 |walltime 2920.864 | +Transformer | epoch 0 | step 21260 |avg loss 4.971 |avg tokens 4725.000 |tokens/s 33785.279 |walltime 2922.263 | +Transformer | epoch 0 | step 21270 |avg loss 5.392 |avg tokens 4605.200 |tokens/s 33151.307 |walltime 2923.652 | +Transformer | epoch 0 | step 21280 |avg loss 4.669 |avg tokens 4828.000 |tokens/s 33996.351 |walltime 2925.072 | +Transformer | epoch 0 | step 21290 |avg loss 5.646 |avg tokens 4481.100 |tokens/s 33659.020 |walltime 2926.403 | +Transformer | epoch 0 | step 21300 |avg loss 4.966 |avg tokens 4566.700 |tokens/s 33542.557 |walltime 2927.765 | +Transformer | epoch 0 | step 21310 |avg loss 5.032 |avg tokens 4321.900 |tokens/s 30591.643 |walltime 2929.178 | +Transformer | epoch 0 | step 21320 |avg loss 4.832 |avg tokens 4749.400 |tokens/s 34113.925 |walltime 2930.570 | +Transformer | epoch 0 | step 21330 |avg loss 5.648 |avg tokens 4284.700 |tokens/s 32220.256 |walltime 2931.900 | +Transformer | epoch 0 | step 21340 |avg loss 5.425 |avg tokens 4709.900 |tokens/s 34982.274 |walltime 2933.246 | +Transformer | epoch 0 | step 21350 |avg loss 4.629 |avg tokens 4501.200 |tokens/s 32715.102 |walltime 2934.622 | +Transformer | epoch 0 | step 21360 |avg loss 5.229 |avg tokens 4930.600 |tokens/s 35680.844 |walltime 2936.004 | +Transformer | epoch 0 | step 21370 |avg loss 4.780 |avg tokens 4820.800 |tokens/s 34423.069 |walltime 2937.404 | +Transformer | epoch 0 | step 21380 |avg loss 4.890 |avg tokens 4653.300 |tokens/s 34062.242 |walltime 2938.770 | +Transformer | epoch 0 | step 21390 |avg loss 4.988 |avg tokens 4453.700 |tokens/s 
32845.118 |walltime 2940.126 | +Transformer | epoch 0 | step 21400 |avg loss 5.070 |avg tokens 4450.000 |tokens/s 32046.173 |walltime 2941.515 | +Transformer | epoch 0 | step 21410 |avg loss 5.457 |avg tokens 4061.600 |tokens/s 29856.721 |walltime 2942.875 | +Transformer | epoch 0 | step 21420 |avg loss 5.173 |avg tokens 4492.600 |tokens/s 32993.945 |walltime 2944.237 | +Transformer | epoch 0 | step 21430 |avg loss 5.206 |avg tokens 4995.500 |tokens/s 35917.961 |walltime 2945.628 | +Transformer | epoch 0 | step 21440 |avg loss 5.087 |avg tokens 4536.900 |tokens/s 33340.069 |walltime 2946.989 | +Transformer | epoch 0 | step 21450 |avg loss 5.199 |avg tokens 4615.500 |tokens/s 33192.901 |walltime 2948.379 | +Transformer | epoch 0 | step 21460 |avg loss 4.628 |avg tokens 4802.400 |tokens/s 33911.631 |walltime 2949.795 | +Transformer | epoch 0 | step 21470 |avg loss 5.598 |avg tokens 3805.600 |tokens/s 28315.348 |walltime 2951.139 | +Transformer | epoch 0 | step 21480 |avg loss 5.239 |avg tokens 4630.500 |tokens/s 34137.901 |walltime 2952.496 | +Transformer | epoch 0 | step 21490 |avg loss 4.779 |avg tokens 4516.800 |tokens/s 32426.516 |walltime 2953.889 | +Transformer | epoch 0 | step 21500 |avg loss 5.319 |avg tokens 4766.800 |tokens/s 34718.041 |walltime 2955.262 | +Transformer | epoch 0 | step 21510 |avg loss 5.143 |avg tokens 4280.200 |tokens/s 32433.398 |walltime 2956.581 | +Transformer | epoch 0 | step 21520 |avg loss 5.632 |avg tokens 4332.800 |tokens/s 32525.099 |walltime 2957.913 | +Transformer | epoch 0 | step 21530 |avg loss 5.083 |avg tokens 4390.400 |tokens/s 32668.183 |walltime 2959.257 | +Transformer | epoch 0 | step 21540 |avg loss 5.221 |avg tokens 4544.700 |tokens/s 33264.635 |walltime 2960.624 | +Transformer | epoch 0 | step 21550 |avg loss 5.010 |avg tokens 4544.400 |tokens/s 31909.993 |walltime 2962.048 | +Transformer | epoch 0 | step 21560 |avg loss 5.830 |avg tokens 3895.000 |tokens/s 30610.849 |walltime 2963.320 | +Transformer | epoch 0 | step 
21570 |avg loss 5.653 |avg tokens 4636.300 |tokens/s 34529.317 |walltime 2964.663 | +Transformer | epoch 0 | step 21580 |avg loss 4.618 |avg tokens 4835.300 |tokens/s 34305.754 |walltime 2966.072 | +Transformer | epoch 0 | step 21590 |avg loss 5.361 |avg tokens 4506.100 |tokens/s 32241.097 |walltime 2967.470 | +Transformer | epoch 0 | step 21600 |avg loss 5.823 |avg tokens 4368.000 |tokens/s 33142.266 |walltime 2968.788 | +Transformer | epoch 0 | step 21610 |avg loss 4.765 |avg tokens 4716.100 |tokens/s 34390.964 |walltime 2970.159 | +Transformer | epoch 0 | step 21620 |avg loss 5.763 |avg tokens 4131.800 |tokens/s 31313.270 |walltime 2971.479 | +Transformer | epoch 0 | step 21630 |avg loss 5.512 |avg tokens 4795.100 |tokens/s 34357.370 |walltime 2972.874 | +Transformer | epoch 0 | step 21640 |avg loss 5.016 |avg tokens 4504.000 |tokens/s 32063.698 |walltime 2974.279 | +Transformer | epoch 0 | step 21650 |avg loss 5.704 |avg tokens 4479.400 |tokens/s 34290.133 |walltime 2975.585 | +Transformer | epoch 0 | step 21660 |avg loss 5.315 |avg tokens 4490.100 |tokens/s 32398.346 |walltime 2976.971 | +Transformer | epoch 0 | step 21670 |avg loss 5.290 |avg tokens 4657.600 |tokens/s 33165.228 |walltime 2978.376 | +Transformer | epoch 0 | step 21680 |avg loss 5.209 |avg tokens 4551.900 |tokens/s 31417.896 |walltime 2979.825 | +Transformer | epoch 0 | step 21690 |avg loss 5.757 |avg tokens 4443.900 |tokens/s 34235.050 |walltime 2981.123 | +Transformer | epoch 0 | step 21700 |avg loss 5.323 |avg tokens 4140.200 |tokens/s 32091.799 |walltime 2982.413 | +Transformer | epoch 0 | step 21710 |avg loss 4.625 |avg tokens 4733.200 |tokens/s 33916.835 |walltime 2983.808 | +Transformer | epoch 0 | step 21720 |avg loss 5.079 |avg tokens 4259.500 |tokens/s 31110.691 |walltime 2985.177 | +Transformer | epoch 0 | step 21730 |avg loss 5.495 |avg tokens 4598.400 |tokens/s 34905.066 |walltime 2986.495 | +Transformer | epoch 0 | step 21740 |avg loss 4.528 |avg tokens 4717.600 |tokens/s 
33771.781 |walltime 2987.892 | +Transformer | epoch 0 | step 21750 |avg loss 4.946 |avg tokens 4493.700 |tokens/s 33404.783 |walltime 2989.237 | +Transformer | epoch 0 | step 21760 |avg loss 5.357 |avg tokens 4733.300 |tokens/s 34821.309 |walltime 2990.596 | +Transformer | epoch 0 | step 21770 |avg loss 4.950 |avg tokens 4822.600 |tokens/s 35230.309 |walltime 2991.965 | +Transformer | epoch 0 | step 21780 |avg loss 5.055 |avg tokens 4611.400 |tokens/s 33665.352 |walltime 2993.335 | +Transformer | epoch 0 | step 21790 |avg loss 5.680 |avg tokens 3999.700 |tokens/s 30438.733 |walltime 2994.649 | +Transformer | epoch 0 | step 21800 |avg loss 5.134 |avg tokens 4585.600 |tokens/s 33763.593 |walltime 2996.007 | +Transformer | epoch 0 | step 21810 |avg loss 5.124 |avg tokens 4285.400 |tokens/s 31740.791 |walltime 2997.357 | +Transformer | epoch 0 | step 21820 |avg loss 4.837 |avg tokens 4799.900 |tokens/s 34422.199 |walltime 2998.752 | +Transformer | epoch 0 | step 21830 |avg loss 4.825 |avg tokens 4430.300 |tokens/s 31763.239 |walltime 3000.146 | +Transformer | epoch 0 | step 21840 |avg loss 5.492 |avg tokens 4307.900 |tokens/s 31837.323 |walltime 3001.500 | +Transformer | epoch 0 | step 21850 |avg loss 4.914 |avg tokens 4685.200 |tokens/s 33084.392 |walltime 3002.916 | +Transformer | epoch 0 | step 21860 |avg loss 5.637 |avg tokens 4621.300 |tokens/s 34187.791 |walltime 3004.267 | +Transformer | epoch 0 | step 21870 |avg loss 5.353 |avg tokens 4523.900 |tokens/s 33597.995 |walltime 3005.614 | +Transformer | epoch 0 | step 21880 |avg loss 4.975 |avg tokens 4644.700 |tokens/s 34016.132 |walltime 3006.979 | +Transformer | epoch 0 | step 21890 |avg loss 5.062 |avg tokens 4639.400 |tokens/s 33510.949 |walltime 3008.364 | +Transformer | epoch 0 | step 21900 |avg loss 5.024 |avg tokens 4838.800 |tokens/s 33709.484 |walltime 3009.799 | +Transformer | epoch 0 | step 21910 |avg loss 4.893 |avg tokens 4922.600 |tokens/s 33946.933 |walltime 3011.249 | +Transformer | epoch 0 | step 
21920 |avg loss 5.054 |avg tokens 4565.000 |tokens/s 32369.683 |walltime 3012.660 | +Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 64.0 +Transformer | epoch 0 | step 21930 |avg loss 5.029 |avg tokens 4458.000 |tokens/s 31121.623 |walltime 3014.092 | +Transformer | epoch 0 | step 21940 |avg loss 5.496 |avg tokens 4566.000 |tokens/s 33380.689 |walltime 3015.460 | +Transformer | epoch 0 | step 21950 |avg loss 5.420 |avg tokens 4620.300 |tokens/s 33753.162 |walltime 3016.829 | +Transformer | epoch 0 | step 21960 |avg loss 4.949 |avg tokens 4629.900 |tokens/s 32947.736 |walltime 3018.234 | +Transformer | epoch 0 | step 21970 |avg loss 5.295 |avg tokens 4217.300 |tokens/s 32361.685 |walltime 3019.537 | +Transformer | epoch 0 | step 21980 |avg loss 5.278 |avg tokens 4556.900 |tokens/s 32674.612 |walltime 3020.932 | +Transformer | epoch 0 | step 21990 |avg loss 5.721 |avg tokens 4088.100 |tokens/s 30389.909 |walltime 3022.277 | +Transformer | epoch 0 | step 22000 |avg loss 5.807 |avg tokens 3770.900 |tokens/s 29630.182 |walltime 3023.550 | +Transformer | epoch 0 | step 22010 |avg loss 4.941 |avg tokens 4520.400 |tokens/s 32884.452 |walltime 3024.924 | +Transformer | epoch 0 | step 22020 |avg loss 5.312 |avg tokens 4509.800 |tokens/s 33296.436 |walltime 3026.279 | +Transformer | epoch 0 | step 22030 |avg loss 5.289 |avg tokens 4565.300 |tokens/s 33516.655 |walltime 3027.641 | +Transformer | epoch 0 | step 22040 |avg loss 5.079 |avg tokens 4396.300 |tokens/s 32557.167 |walltime 3028.991 | +Transformer | epoch 0 | step 22050 |avg loss 5.532 |avg tokens 4388.300 |tokens/s 34039.793 |walltime 3030.280 | +Transformer | epoch 0 | step 22060 |avg loss 4.871 |avg tokens 4648.100 |tokens/s 33531.578 |walltime 3031.667 | +Transformer | epoch 0 | step 22070 |avg loss 5.611 |avg tokens 4181.800 |tokens/s 32754.189 |walltime 3032.943 | +Transformer | epoch 0 | step 22080 |avg loss 5.065 |avg tokens 4613.000 |tokens/s 34048.623 |walltime 3034.298 | +Transformer | 
epoch 0 | step 22090 |avg loss 4.973 |avg tokens 4862.400 |tokens/s 35402.731 |walltime 3035.672 | +Transformer | epoch 0 | step 22100 |avg loss 5.641 |avg tokens 3975.800 |tokens/s 30381.403 |walltime 3036.980 | +Transformer | epoch 0 | step 22110 |avg loss 5.385 |avg tokens 4836.700 |tokens/s 36565.124 |walltime 3038.303 | +Transformer | epoch 0 | step 22120 |avg loss 4.872 |avg tokens 4487.100 |tokens/s 32615.928 |walltime 3039.679 | +Transformer | epoch 0 | step 22130 |avg loss 5.087 |avg tokens 4658.800 |tokens/s 33643.244 |walltime 3041.064 | +Transformer | epoch 0 | step 22140 |avg loss 6.125 |avg tokens 3337.000 |tokens/s 26616.347 |walltime 3042.317 | +Transformer | epoch 0 | step 22150 |avg loss 5.225 |avg tokens 4616.800 |tokens/s 34486.721 |walltime 3043.656 | +Transformer | epoch 0 | step 22160 |avg loss 4.988 |avg tokens 4635.600 |tokens/s 33923.868 |walltime 3045.022 | +Transformer | epoch 0 | step 22170 |avg loss 4.801 |avg tokens 4508.800 |tokens/s 32170.518 |walltime 3046.424 | +Transformer | epoch 0 | step 22180 |avg loss 5.069 |avg tokens 4708.900 |tokens/s 34649.403 |walltime 3047.783 | +Transformer | epoch 0 | step 22190 |avg loss 5.601 |avg tokens 4555.600 |tokens/s 34551.764 |walltime 3049.102 | +Transformer | epoch 0 | step 22200 |avg loss 4.987 |avg tokens 4426.300 |tokens/s 32379.723 |walltime 3050.469 | +Transformer | epoch 0 | step 22210 |avg loss 5.444 |avg tokens 4405.800 |tokens/s 32891.355 |walltime 3051.808 | +Transformer | epoch 0 | step 22220 |avg loss 4.795 |avg tokens 4657.000 |tokens/s 34240.639 |walltime 3053.168 | +Transformer | epoch 0 | step 22230 |avg loss 5.508 |avg tokens 4663.500 |tokens/s 34581.641 |walltime 3054.517 | +Transformer | epoch 0 | step 22240 |avg loss 4.740 |avg tokens 4646.100 |tokens/s 33044.894 |walltime 3055.923 | +Transformer | epoch 0 | step 22250 |avg loss 4.881 |avg tokens 4763.100 |tokens/s 34327.568 |walltime 3057.310 | +Transformer | epoch 0 | step 22260 |avg loss 4.900 |avg tokens 4217.100 
|tokens/s 30663.927 |walltime 3058.685 | +Transformer | epoch 0 | step 22270 |avg loss 4.923 |avg tokens 4687.200 |tokens/s 34325.008 |walltime 3060.051 | +Transformer | epoch 0 | step 22280 |avg loss 4.943 |avg tokens 4373.600 |tokens/s 31998.649 |walltime 3061.418 | +Transformer | epoch 0 | step 22290 |avg loss 5.541 |avg tokens 4772.300 |tokens/s 35922.964 |walltime 3062.746 | +Transformer | epoch 0 | step 22300 |avg loss 5.630 |avg tokens 4768.300 |tokens/s 34817.873 |walltime 3064.116 | +Transformer | epoch 0 | step 22310 |avg loss 4.621 |avg tokens 4574.000 |tokens/s 31943.895 |walltime 3065.548 | +Transformer | epoch 0 | step 22320 |avg loss 4.723 |avg tokens 4695.200 |tokens/s 33019.243 |walltime 3066.970 | +Transformer | epoch 0 | step 22330 |avg loss 4.771 |avg tokens 4865.600 |tokens/s 33789.721 |walltime 3068.410 | +Transformer | epoch 0 | step 22340 |avg loss 5.381 |avg tokens 4933.400 |tokens/s 36858.648 |walltime 3069.748 | +Transformer | epoch 0 | step 22350 |avg loss 5.345 |avg tokens 4590.000 |tokens/s 34082.936 |walltime 3071.095 | +Transformer | epoch 0 | step 22360 |avg loss 5.242 |avg tokens 4579.400 |tokens/s 33493.060 |walltime 3072.462 | +Transformer | epoch 0 | step 22370 |avg loss 5.006 |avg tokens 4234.400 |tokens/s 31230.764 |walltime 3073.818 | +Transformer | epoch 0 | step 22380 |avg loss 4.678 |avg tokens 4890.400 |tokens/s 34964.437 |walltime 3075.217 | +Transformer | epoch 0 | step 22390 |avg loss 5.162 |avg tokens 4750.800 |tokens/s 34611.702 |walltime 3076.589 | +Transformer | epoch 0 | step 22400 |avg loss 5.075 |avg tokens 4497.400 |tokens/s 32304.975 |walltime 3077.981 | +Transformer | epoch 0 | step 22410 |avg loss 5.236 |avg tokens 4437.100 |tokens/s 33502.424 |walltime 3079.306 | +Transformer | epoch 0 | step 22420 |avg loss 5.194 |avg tokens 4370.800 |tokens/s 31835.706 |walltime 3080.679 | +Transformer | epoch 0 | step 22430 |avg loss 4.760 |avg tokens 4532.300 |tokens/s 32679.072 |walltime 3082.066 | +Transformer | epoch 
0 | step 22440 |avg loss 5.585 |avg tokens 3830.400 |tokens/s 30336.432 |walltime 3083.328 | +Transformer | epoch 0 | step 22450 |avg loss 5.415 |avg tokens 4079.700 |tokens/s 30129.019 |walltime 3084.682 | +Transformer | epoch 0 | step 22460 |avg loss 5.131 |avg tokens 4654.400 |tokens/s 34422.211 |walltime 3086.035 | +Transformer | epoch 0 | step 22470 |avg loss 5.102 |avg tokens 4244.800 |tokens/s 31478.072 |walltime 3087.383 | +Transformer | epoch 0 | step 22480 |avg loss 5.507 |avg tokens 4592.400 |tokens/s 34473.991 |walltime 3088.715 | +Transformer | epoch 0 | step 22490 |avg loss 4.915 |avg tokens 4653.600 |tokens/s 33797.540 |walltime 3090.092 | +Transformer | epoch 0 | step 22500 |avg loss 4.621 |avg tokens 4796.800 |tokens/s 33809.799 |walltime 3091.511 | +Transformer | epoch 0 | step 22510 |avg loss 4.885 |avg tokens 4654.300 |tokens/s 33899.812 |walltime 3092.884 | +Transformer | epoch 0 | step 22520 |avg loss 4.745 |avg tokens 4915.500 |tokens/s 35787.782 |walltime 3094.257 | +Transformer | epoch 0 | step 22530 |avg loss 5.256 |avg tokens 4495.900 |tokens/s 33764.987 |walltime 3095.589 | +Transformer | epoch 0 | step 22540 |avg loss 5.385 |avg tokens 4323.500 |tokens/s 33002.461 |walltime 3096.899 | +Transformer | epoch 0 | step 22550 |avg loss 4.881 |avg tokens 4557.100 |tokens/s 32840.639 |walltime 3098.287 | +Transformer | epoch 0 | step 22560 |avg loss 6.027 |avg tokens 3786.300 |tokens/s 29339.494 |walltime 3099.577 | +Transformer | epoch 0 | step 22570 |avg loss 5.931 |avg tokens 4057.000 |tokens/s 31401.611 |walltime 3100.869 | +Transformer | epoch 0 | step 22580 |avg loss 4.718 |avg tokens 4655.300 |tokens/s 33073.112 |walltime 3102.277 | +Transformer | epoch 0 | step 22590 |avg loss 5.664 |avg tokens 4337.200 |tokens/s 32032.545 |walltime 3103.631 | +Transformer | epoch 0 | step 22600 |avg loss 4.954 |avg tokens 4697.900 |tokens/s 33635.089 |walltime 3105.027 | +Transformer | epoch 0 | step 22610 |avg loss 5.453 |avg tokens 4759.400 |tokens/s 
34758.017 |walltime 3106.397 | +Transformer | epoch 0 | step 22620 |avg loss 5.363 |avg tokens 4244.500 |tokens/s 31775.688 |walltime 3107.732 | +Transformer | epoch 0 | step 22630 |avg loss 5.407 |avg tokens 4580.700 |tokens/s 33407.270 |walltime 3109.104 | +Transformer | epoch 0 | step 22640 |avg loss 4.942 |avg tokens 4712.000 |tokens/s 33430.173 |walltime 3110.513 | +Transformer | epoch 0 | step 22650 |avg loss 4.824 |avg tokens 4740.600 |tokens/s 33935.232 |walltime 3111.910 | +Transformer | epoch 0 | step 22660 |avg loss 5.580 |avg tokens 3819.600 |tokens/s 28294.035 |walltime 3113.260 | +Transformer | epoch 0 | step 22670 |avg loss 5.317 |avg tokens 4516.400 |tokens/s 33280.858 |walltime 3114.617 | +Transformer | epoch 0 | step 22680 |avg loss 4.990 |avg tokens 4925.900 |tokens/s 35063.518 |walltime 3116.022 | +Transformer | epoch 0 | step 22690 |avg loss 5.216 |avg tokens 4271.500 |tokens/s 32214.114 |walltime 3117.348 | +Transformer | epoch 0 | step 22700 |avg loss 5.242 |avg tokens 4671.100 |tokens/s 34954.286 |walltime 3118.684 | +Transformer | epoch 0 | step 22710 |avg loss 4.515 |avg tokens 4901.600 |tokens/s 34356.491 |walltime 3120.111 | +Transformer | epoch 0 | step 22720 |avg loss 5.232 |avg tokens 4153.300 |tokens/s 30537.838 |walltime 3121.471 | +Transformer | epoch 0 | step 22730 |avg loss 4.726 |avg tokens 4589.300 |tokens/s 32546.431 |walltime 3122.881 | +Transformer | epoch 0 | step 22740 |avg loss 5.306 |avg tokens 4317.500 |tokens/s 32528.597 |walltime 3124.208 | +Transformer | epoch 0 | step 22750 |avg loss 5.197 |avg tokens 4283.200 |tokens/s 31047.108 |walltime 3125.588 | +Transformer | epoch 0 | step 22760 |avg loss 4.596 |avg tokens 4851.200 |tokens/s 33252.766 |walltime 3127.047 | +Transformer | epoch 0 | step 22770 |avg loss 4.869 |avg tokens 4566.300 |tokens/s 33671.305 |walltime 3128.403 | +Transformer | epoch 0 | step 22780 |avg loss 4.820 |avg tokens 4556.000 |tokens/s 32036.058 |walltime 3129.825 | +Transformer | epoch 0 | step 
22790 |avg loss 5.686 |avg tokens 4336.300 |tokens/s 33038.032 |walltime 3131.138 | +Transformer | epoch 0 | step 22800 |avg loss 5.617 |avg tokens 4287.500 |tokens/s 32838.872 |walltime 3132.443 | +Transformer | epoch 0 | step 22810 |avg loss 5.061 |avg tokens 4431.700 |tokens/s 31972.533 |walltime 3133.829 | +Transformer | epoch 0 | step 22820 |avg loss 4.759 |avg tokens 4925.600 |tokens/s 35203.808 |walltime 3135.229 | +Transformer | epoch 0 | step 22830 |avg loss 5.452 |avg tokens 4109.300 |tokens/s 30663.348 |walltime 3136.569 | +Transformer | epoch 0 | step 22840 |avg loss 5.069 |avg tokens 4541.900 |tokens/s 32567.930 |walltime 3137.963 | +Transformer | epoch 0 | step 22850 |avg loss 4.737 |avg tokens 4847.200 |tokens/s 33626.541 |walltime 3139.405 | +Transformer | epoch 0 | step 22860 |avg loss 5.224 |avg tokens 4887.900 |tokens/s 35035.686 |walltime 3140.800 | +Transformer | epoch 0 | step 22870 |avg loss 4.724 |avg tokens 4887.300 |tokens/s 33876.477 |walltime 3142.243 | +Transformer | epoch 0 | step 22880 |avg loss 4.916 |avg tokens 4400.100 |tokens/s 32077.194 |walltime 3143.614 | +Transformer | epoch 0 | step 22890 |avg loss 5.463 |avg tokens 4180.800 |tokens/s 31492.867 |walltime 3144.942 | +Transformer | epoch 0 | step 22900 |avg loss 5.022 |avg tokens 4490.500 |tokens/s 31667.906 |walltime 3146.360 | +Transformer | epoch 0 | step 22910 |avg loss 5.096 |avg tokens 4359.000 |tokens/s 31118.748 |walltime 3147.761 | +Transformer | epoch 0 | step 22920 |avg loss 4.902 |avg tokens 4698.100 |tokens/s 34514.886 |walltime 3149.122 | +Transformer | epoch 0 | step 22930 |avg loss 4.814 |avg tokens 4818.900 |tokens/s 34484.142 |walltime 3150.519 | +Transformer | epoch 0 | step 22940 |avg loss 5.055 |avg tokens 4847.700 |tokens/s 35431.450 |walltime 3151.887 | +Transformer | epoch 0 | step 22950 |avg loss 4.617 |avg tokens 4820.300 |tokens/s 32929.065 |walltime 3153.351 | +Transformer | epoch 0 | step 22960 |avg loss 5.644 |avg tokens 4381.700 |tokens/s 
32683.369 |walltime 3154.692 | +Transformer | epoch 0 | step 22970 |avg loss 5.264 |avg tokens 4347.400 |tokens/s 31875.563 |walltime 3156.056 | +Transformer | epoch 0 | step 22980 |avg loss 5.159 |avg tokens 4248.400 |tokens/s 30997.383 |walltime 3157.426 | +Transformer | epoch 0 | step 22990 |avg loss 5.214 |avg tokens 4505.500 |tokens/s 31940.310 |walltime 3158.837 | +Transformer | epoch 0 | step 23000 |avg loss 5.503 |avg tokens 4311.300 |tokens/s 33292.497 |walltime 3160.132 | +Transformer | epoch 0 | step 23010 |avg loss 5.159 |avg tokens 4285.600 |tokens/s 32304.919 |walltime 3161.459 | +Transformer | epoch 0 | step 23020 |avg loss 5.478 |avg tokens 4184.000 |tokens/s 31124.142 |walltime 3162.803 | +Transformer | epoch 0 | step 23030 |avg loss 5.432 |avg tokens 4025.800 |tokens/s 29769.204 |walltime 3164.155 | +Transformer | epoch 0 | step 23040 |avg loss 5.446 |avg tokens 4336.300 |tokens/s 31914.595 |walltime 3165.514 | +Transformer | epoch 0 | step 23050 |avg loss 4.991 |avg tokens 4623.500 |tokens/s 33474.762 |walltime 3166.895 | +Transformer | epoch 0 | step 23060 |avg loss 4.682 |avg tokens 4705.300 |tokens/s 33132.181 |walltime 3168.315 | +Transformer | epoch 0 | step 23070 |avg loss 4.868 |avg tokens 4551.700 |tokens/s 31938.261 |walltime 3169.741 | +Transformer | epoch 0 | step 23080 |avg loss 4.810 |avg tokens 4768.700 |tokens/s 33786.995 |walltime 3171.152 | +Transformer | epoch 0 | step 23090 |avg loss 4.970 |avg tokens 4312.900 |tokens/s 31880.235 |walltime 3172.505 | +Transformer | epoch 0 | step 23100 |avg loss 4.497 |avg tokens 4707.300 |tokens/s 32939.395 |walltime 3173.934 | +Transformer | epoch 0 | step 23110 |avg loss 4.822 |avg tokens 4498.300 |tokens/s 32578.942 |walltime 3175.315 | +Transformer | epoch 0 | step 23120 |avg loss 4.956 |avg tokens 4648.400 |tokens/s 33556.240 |walltime 3176.700 | +Transformer | epoch 0 | step 23130 |avg loss 4.747 |avg tokens 4616.500 |tokens/s 32788.147 |walltime 3178.108 | +Transformer | epoch 0 | step 
23140 |avg loss 5.459 |avg tokens 4856.200 |tokens/s 34918.157 |walltime 3179.499 | +Transformer | epoch 0 | step 23150 |avg loss 5.329 |avg tokens 4152.800 |tokens/s 31119.592 |walltime 3180.833 | +Transformer | epoch 0 | step 23160 |avg loss 5.437 |avg tokens 4484.100 |tokens/s 33118.646 |walltime 3182.187 | +Transformer | epoch 0 | step 23170 |avg loss 5.030 |avg tokens 4418.100 |tokens/s 30851.453 |walltime 3183.619 | +Transformer | epoch 0 | step 23180 |avg loss 4.923 |avg tokens 4790.400 |tokens/s 34071.600 |walltime 3185.025 | +Transformer | epoch 0 | step 23190 |avg loss 4.569 |avg tokens 4638.500 |tokens/s 32818.090 |walltime 3186.438 | +Transformer | epoch 0 | step 23200 |avg loss 5.390 |avg tokens 4336.400 |tokens/s 32543.713 |walltime 3187.771 | +Transformer | epoch 0 | step 23210 |avg loss 5.109 |avg tokens 4776.300 |tokens/s 35138.256 |walltime 3189.130 | +Transformer | epoch 0 | step 23220 |avg loss 5.558 |avg tokens 4165.200 |tokens/s 31468.856 |walltime 3190.454 | +Transformer | epoch 0 | step 23230 |avg loss 5.297 |avg tokens 4203.700 |tokens/s 32020.390 |walltime 3191.767 | +Transformer | epoch 0 | step 23240 |avg loss 5.074 |avg tokens 4403.200 |tokens/s 32800.497 |walltime 3193.109 | +Transformer | epoch 0 | step 23250 |avg loss 5.335 |avg tokens 3910.800 |tokens/s 30348.942 |walltime 3194.398 | +Transformer | epoch 0 | step 23260 |avg loss 5.580 |avg tokens 4121.300 |tokens/s 32023.151 |walltime 3195.685 | +Transformer | epoch 0 | step 23270 |avg loss 4.670 |avg tokens 4755.200 |tokens/s 33384.555 |walltime 3197.109 | +Transformer | epoch 0 | step 23280 |avg loss 5.062 |avg tokens 4435.300 |tokens/s 32669.729 |walltime 3198.467 | +Transformer | epoch 0 | step 23290 |avg loss 4.712 |avg tokens 4555.100 |tokens/s 32745.176 |walltime 3199.858 | +Transformer | epoch 0 | step 23300 |avg loss 4.886 |avg tokens 4508.800 |tokens/s 31638.469 |walltime 3201.283 | +Transformer | epoch 0 | step 23310 |avg loss 5.445 |avg tokens 4574.600 |tokens/s 
35032.432 |walltime 3202.589 | +Transformer | epoch 0 | step 23320 |avg loss 4.747 |avg tokens 4694.700 |tokens/s 33955.039 |walltime 3203.971 | +Transformer | epoch 0 | step 23330 |avg loss 5.110 |avg tokens 4394.800 |tokens/s 32519.353 |walltime 3205.323 | +Transformer | epoch 0 | step 23340 |avg loss 4.899 |avg tokens 4970.200 |tokens/s 35279.823 |walltime 3206.732 | +Transformer | epoch 0 | step 23350 |avg loss 4.824 |avg tokens 4766.400 |tokens/s 33811.474 |walltime 3208.141 | +Transformer | epoch 0 | step 23360 |avg loss 4.719 |avg tokens 4831.200 |tokens/s 34287.947 |walltime 3209.550 | +Transformer | epoch 0 | step 23370 |avg loss 5.345 |avg tokens 4948.300 |tokens/s 35825.244 |walltime 3210.931 | +Transformer | epoch 0 | step 23380 |avg loss 5.429 |avg tokens 4474.600 |tokens/s 32540.142 |walltime 3212.307 | +Transformer | epoch 0 | step 23390 |avg loss 5.591 |avg tokens 4126.800 |tokens/s 31755.944 |walltime 3213.606 | +Transformer | epoch 0 | step 23400 |avg loss 4.813 |avg tokens 4533.400 |tokens/s 32637.590 |walltime 3214.995 | +Transformer | epoch 0 | step 23410 |avg loss 4.946 |avg tokens 4467.900 |tokens/s 31350.234 |walltime 3216.420 | +Transformer | epoch 0 | step 23420 |avg loss 4.605 |avg tokens 4707.000 |tokens/s 32326.029 |walltime 3217.876 | +Transformer | epoch 0 | step 23430 |avg loss 5.103 |avg tokens 4928.900 |tokens/s 35244.576 |walltime 3219.275 | +Transformer | epoch 0 | step 23440 |avg loss 5.549 |avg tokens 3999.500 |tokens/s 31660.606 |walltime 3220.538 | +Transformer | epoch 0 | step 23450 |avg loss 4.991 |avg tokens 4054.900 |tokens/s 29802.267 |walltime 3221.899 | +Transformer | epoch 0 | step 23460 |avg loss 4.890 |avg tokens 4422.000 |tokens/s 32830.457 |walltime 3223.246 | +Transformer | epoch 0 | step 23470 |avg loss 4.928 |avg tokens 4404.800 |tokens/s 32332.610 |walltime 3224.608 | +Transformer | epoch 0 | step 23480 |avg loss 5.026 |avg tokens 4660.400 |tokens/s 32033.145 |walltime 3226.063 | +Transformer | epoch 0 | step 
23490 |avg loss 4.938 |avg tokens 4931.600 |tokens/s 34726.380 |walltime 3227.483 | +Transformer | epoch 0 | step 23500 |avg loss 4.985 |avg tokens 4398.900 |tokens/s 31599.941 |walltime 3228.875 | +Transformer | epoch 0 | step 23510 |avg loss 4.632 |avg tokens 4567.900 |tokens/s 32551.261 |walltime 3230.278 | +Transformer | epoch 0 | step 23520 |avg loss 5.382 |avg tokens 4264.100 |tokens/s 32533.490 |walltime 3231.589 | +Transformer | epoch 0 | step 23530 |avg loss 5.283 |avg tokens 4446.000 |tokens/s 32098.260 |walltime 3232.974 | +Transformer | epoch 0 | step 23540 |avg loss 5.500 |avg tokens 4702.000 |tokens/s 33793.919 |walltime 3234.366 | +Transformer | epoch 0 | step 23550 |avg loss 4.809 |avg tokens 4753.900 |tokens/s 34638.083 |walltime 3235.738 | +Transformer | epoch 0 | step 23560 |avg loss 5.116 |avg tokens 4466.600 |tokens/s 31754.478 |walltime 3237.145 | +Transformer | epoch 0 | step 23570 |avg loss 5.378 |avg tokens 4142.600 |tokens/s 30359.827 |walltime 3238.509 | +Transformer | epoch 0 | step 23580 |avg loss 5.411 |avg tokens 4578.800 |tokens/s 33196.322 |walltime 3239.888 | +Transformer | epoch 0 | step 23590 |avg loss 5.093 |avg tokens 4593.700 |tokens/s 33139.020 |walltime 3241.275 | +Transformer | epoch 0 | step 23600 |avg loss 4.881 |avg tokens 4675.700 |tokens/s 32854.064 |walltime 3242.698 | +Transformer | epoch 0 | step 23610 |avg loss 5.168 |avg tokens 4590.000 |tokens/s 32495.764 |walltime 3244.110 | +Transformer | epoch 0 | step 23620 |avg loss 5.079 |avg tokens 4106.900 |tokens/s 29652.917 |walltime 3245.495 | +Transformer | epoch 0 | step 23630 |avg loss 4.639 |avg tokens 4730.800 |tokens/s 32475.356 |walltime 3246.952 | +Transformer | epoch 0 | step 23640 |avg loss 5.294 |avg tokens 4034.800 |tokens/s 29123.993 |walltime 3248.337 | +Transformer | epoch 0 | step 23650 |avg loss 4.821 |avg tokens 4812.900 |tokens/s 34422.642 |walltime 3249.736 | +Transformer | epoch 0 | step 23660 |avg loss 4.864 |avg tokens 4796.000 |tokens/s 
34415.768 |walltime 3251.129 | +Transformer | epoch 0 | step 23670 |avg loss 4.863 |avg tokens 4988.000 |tokens/s 34753.365 |walltime 3252.564 | +Transformer | epoch 0 | step 23680 |avg loss 5.264 |avg tokens 4420.100 |tokens/s 32878.734 |walltime 3253.909 | +Transformer | epoch 0 | step 23690 |avg loss 4.904 |avg tokens 4718.100 |tokens/s 33410.753 |walltime 3255.321 | +Transformer | epoch 0 | step 23700 |avg loss 5.358 |avg tokens 4220.200 |tokens/s 31589.508 |walltime 3256.657 | +Transformer | epoch 0 | step 23710 |avg loss 4.843 |avg tokens 4405.500 |tokens/s 31427.142 |walltime 3258.059 | +Transformer | epoch 0 | step 23720 |avg loss 4.884 |avg tokens 4832.600 |tokens/s 34757.723 |walltime 3259.449 | +Transformer | epoch 0 | step 23730 |avg loss 4.761 |avg tokens 4923.900 |tokens/s 34950.185 |walltime 3260.858 | +Transformer | epoch 0 | step 23740 |avg loss 4.872 |avg tokens 4630.800 |tokens/s 34037.227 |walltime 3262.218 | +Transformer | epoch 0 | step 23750 |avg loss 5.088 |avg tokens 4435.700 |tokens/s 32333.333 |walltime 3263.590 | +Transformer | epoch 0 | step 23760 |avg loss 5.503 |avg tokens 4303.400 |tokens/s 33373.001 |walltime 3264.880 | +Transformer | epoch 0 | step 23770 |avg loss 5.005 |avg tokens 4915.600 |tokens/s 35116.252 |walltime 3266.280 | +Transformer | epoch 0 | step 23780 |avg loss 4.712 |avg tokens 4739.300 |tokens/s 33690.967 |walltime 3267.686 | +Transformer | epoch 0 | step 23790 |avg loss 5.168 |avg tokens 4237.200 |tokens/s 31433.233 |walltime 3269.034 | +Transformer | epoch 0 | step 23800 |avg loss 4.963 |avg tokens 4592.500 |tokens/s 34714.840 |walltime 3270.357 | +Transformer | epoch 0 | step 23810 |avg loss 5.127 |avg tokens 4113.400 |tokens/s 30286.192 |walltime 3271.715 | +Transformer | epoch 0 | step 23820 |avg loss 5.348 |avg tokens 4189.700 |tokens/s 30951.692 |walltime 3273.069 | +Transformer | epoch 0 | step 23830 |avg loss 6.134 |avg tokens 3828.200 |tokens/s 30878.999 |walltime 3274.309 | +Transformer | epoch 0 | step 
23840 |avg loss 5.467 |avg tokens 4560.800 |tokens/s 34082.523 |walltime 3275.647 | +Transformer | epoch 0 | step 23850 |avg loss 5.205 |avg tokens 4630.500 |tokens/s 33814.544 |walltime 3277.016 | +Transformer | epoch 0 | step 23860 |avg loss 4.622 |avg tokens 4922.600 |tokens/s 34616.275 |walltime 3278.438 | +Transformer | epoch 0 | step 23870 |avg loss 4.867 |avg tokens 4789.600 |tokens/s 33550.801 |walltime 3279.866 | +Transformer | epoch 0 | step 23880 |avg loss 4.913 |avg tokens 4608.400 |tokens/s 34908.453 |walltime 3281.186 | +Transformer | epoch 0 | step 23890 |avg loss 4.888 |avg tokens 4340.500 |tokens/s 32510.896 |walltime 3282.521 | +Transformer | epoch 0 | step 23900 |avg loss 5.401 |avg tokens 4097.100 |tokens/s 31044.627 |walltime 3283.841 | +Transformer | epoch 0 | step 23910 |avg loss 5.253 |avg tokens 4427.100 |tokens/s 32874.664 |walltime 3285.188 | +Transformer | epoch 0 | step 23920 |avg loss 5.595 |avg tokens 4467.100 |tokens/s 33800.886 |walltime 3286.509 | +Transformer | epoch 0 | step 23930 |avg loss 4.541 |avg tokens 4654.100 |tokens/s 32892.503 |walltime 3287.924 | +Transformer | epoch 0 | step 23940 |avg loss 6.391 |avg tokens 3670.900 |tokens/s 30155.110 |walltime 3289.142 | +Transformer | epoch 0 | step 23950 |avg loss 4.794 |avg tokens 4466.500 |tokens/s 31661.296 |walltime 3290.552 | +Transformer | epoch 0 | step 23960 |avg loss 5.150 |avg tokens 4608.400 |tokens/s 33996.693 |walltime 3291.908 | +Transformer | epoch 0 | step 23970 |avg loss 5.217 |avg tokens 3823.500 |tokens/s 29130.873 |walltime 3293.220 | +Transformer | epoch 0 | step 23980 |avg loss 5.467 |avg tokens 4299.400 |tokens/s 30683.839 |walltime 3294.622 | +Transformer | epoch 0 | step 23990 |avg loss 5.398 |avg tokens 4550.800 |tokens/s 34461.568 |walltime 3295.942 | +Transformer | epoch 0 | step 24000 |avg loss 5.464 |avg tokens 4470.700 |tokens/s 33344.610 |walltime 3297.283 | +Transformer | epoch 0 | step 24010 |avg loss 4.882 |avg tokens 4544.400 |tokens/s 
33744.519 |walltime 3298.630 | +Transformer | epoch 0 | step 24020 |avg loss 4.802 |avg tokens 4230.400 |tokens/s 32170.084 |walltime 3299.945 | +Transformer | epoch 0 | step 24030 |avg loss 4.481 |avg tokens 4884.800 |tokens/s 34222.808 |walltime 3301.372 | +Transformer | epoch 0 | step 24040 |avg loss 5.429 |avg tokens 4625.700 |tokens/s 35035.141 |walltime 3302.692 | +Transformer | epoch 0 | step 24050 |avg loss 5.131 |avg tokens 4369.500 |tokens/s 32302.233 |walltime 3304.045 | +Transformer | epoch 0 | step 24060 |avg loss 4.845 |avg tokens 4814.100 |tokens/s 33334.162 |walltime 3305.489 | +Transformer | epoch 0 | step 24070 |avg loss 4.998 |avg tokens 4556.900 |tokens/s 33416.025 |walltime 3306.853 | +Transformer | epoch 0 | step 24080 |avg loss 5.251 |avg tokens 4312.400 |tokens/s 31292.989 |walltime 3308.231 | +Transformer | epoch 0 | step 24090 |avg loss 4.848 |avg tokens 4645.300 |tokens/s 33468.809 |walltime 3309.619 | +Transformer | epoch 0 | step 24100 |avg loss 5.389 |avg tokens 4602.600 |tokens/s 33408.054 |walltime 3310.997 | +Gradient overflow. 
Skipping step, loss scaler 0 reducing loss scale to 64.0 +Transformer | epoch 0 | step 24110 |avg loss 4.890 |avg tokens 4426.700 |tokens/s 30106.632 |walltime 3312.467 | +Transformer | epoch 0 | step 24120 |avg loss 4.683 |avg tokens 4714.900 |tokens/s 34332.675 |walltime 3313.840 | +Transformer | epoch 0 | step 24130 |avg loss 5.541 |avg tokens 4632.400 |tokens/s 34847.137 |walltime 3315.170 | +Transformer | epoch 0 | step 24140 |avg loss 5.297 |avg tokens 4149.700 |tokens/s 31066.417 |walltime 3316.505 | +Transformer | epoch 0 | step 24150 |avg loss 4.652 |avg tokens 4505.500 |tokens/s 33142.443 |walltime 3317.865 | +Transformer | epoch 0 | step 24160 |avg loss 5.002 |avg tokens 4662.400 |tokens/s 34113.476 |walltime 3319.231 | +Transformer | epoch 0 | step 24170 |avg loss 5.375 |avg tokens 4424.100 |tokens/s 33386.465 |walltime 3320.557 | +Transformer | epoch 0 | step 24180 |avg loss 5.571 |avg tokens 4028.900 |tokens/s 30002.506 |walltime 3321.899 | +Transformer | epoch 0 | step 24190 |avg loss 5.537 |avg tokens 4380.600 |tokens/s 33359.214 |walltime 3323.213 | +Transformer | epoch 0 | step 24200 |avg loss 4.797 |avg tokens 4539.200 |tokens/s 32926.490 |walltime 3324.591 | +Transformer | epoch 0 | step 24210 |avg loss 4.599 |avg tokens 4745.400 |tokens/s 33425.323 |walltime 3326.011 | +Transformer | epoch 0 | step 24220 |avg loss 5.304 |avg tokens 4635.500 |tokens/s 34366.233 |walltime 3327.360 | +Transformer | epoch 0 | step 24230 |avg loss 4.515 |avg tokens 4801.000 |tokens/s 34096.193 |walltime 3328.768 | +Transformer | epoch 0 | step 24240 |avg loss 4.668 |avg tokens 4771.800 |tokens/s 33876.864 |walltime 3330.176 | +Transformer | epoch 0 | step 24250 |avg loss 5.318 |avg tokens 4426.400 |tokens/s 32191.263 |walltime 3331.551 | +Transformer | epoch 0 | step 24260 |avg loss 5.156 |avg tokens 4613.500 |tokens/s 34186.852 |walltime 3332.901 | +Transformer | epoch 0 | step 24270 |avg loss 5.085 |avg tokens 4594.000 |tokens/s 33485.973 |walltime 3334.273 | 
+Transformer | epoch 0 | step 24280 |avg loss 5.698 |avg tokens 4240.500 |tokens/s 32006.842 |walltime 3335.598 | +Transformer | epoch 0 | step 24290 |avg loss 4.759 |avg tokens 4393.600 |tokens/s 32406.566 |walltime 3336.954 | +Transformer | epoch 0 | step 24300 |avg loss 5.210 |avg tokens 4059.700 |tokens/s 30184.429 |walltime 3338.299 | +Transformer | epoch 0 | step 24310 |avg loss 5.615 |avg tokens 4268.100 |tokens/s 32515.786 |walltime 3339.611 | +Transformer | epoch 0 | step 24320 |avg loss 4.948 |avg tokens 4854.400 |tokens/s 34614.063 |walltime 3341.014 | +Transformer | epoch 0 | step 24330 |avg loss 4.841 |avg tokens 4623.000 |tokens/s 34419.501 |walltime 3342.357 | +Transformer | epoch 0 | step 24340 |avg loss 5.121 |avg tokens 4057.600 |tokens/s 29874.219 |walltime 3343.715 | +Transformer | epoch 0 | step 24350 |avg loss 5.555 |avg tokens 4735.500 |tokens/s 35917.066 |walltime 3345.033 | +Transformer | epoch 0 | step 24360 |avg loss 4.542 |avg tokens 4776.900 |tokens/s 32544.008 |walltime 3346.501 | +Transformer | epoch 0 | step 24370 |avg loss 5.518 |avg tokens 4128.000 |tokens/s 31700.917 |walltime 3347.803 | +Transformer | epoch 0 | step 24380 |avg loss 4.921 |avg tokens 4683.100 |tokens/s 33080.468 |walltime 3349.219 | +Transformer | epoch 0 | step 24390 |avg loss 4.752 |avg tokens 4669.900 |tokens/s 33858.390 |walltime 3350.598 | +Transformer | epoch 0 | step 24400 |avg loss 4.985 |avg tokens 4408.900 |tokens/s 31910.976 |walltime 3351.980 | +Transformer | epoch 0 | step 24410 |avg loss 4.714 |avg tokens 4542.000 |tokens/s 32278.856 |walltime 3353.387 | +Transformer | epoch 0 | step 24420 |avg loss 4.726 |avg tokens 4522.400 |tokens/s 32524.995 |walltime 3354.778 | +Transformer | epoch 0 | step 24430 |avg loss 5.380 |avg tokens 4861.500 |tokens/s 36135.091 |walltime 3356.123 | +Transformer | epoch 0 | step 24440 |avg loss 5.694 |avg tokens 4205.400 |tokens/s 33060.612 |walltime 3357.395 | +Transformer | epoch 0 | step 24450 |avg loss 5.794 |avg 
tokens 3792.000 |tokens/s 29126.802 |walltime 3358.697 | +Transformer | epoch 0 | step 24460 |avg loss 4.774 |avg tokens 4863.400 |tokens/s 34986.596 |walltime 3360.087 | +Transformer | epoch 0 | step 24470 |avg loss 4.904 |avg tokens 4356.000 |tokens/s 30971.752 |walltime 3361.493 | +Transformer | epoch 0 | step 24480 |avg loss 4.702 |avg tokens 4812.800 |tokens/s 34811.320 |walltime 3362.876 | +Transformer | epoch 0 | step 24490 |avg loss 5.432 |avg tokens 4370.800 |tokens/s 32860.044 |walltime 3364.206 | +Transformer | epoch 0 | step 24500 |avg loss 5.074 |avg tokens 4727.900 |tokens/s 33477.037 |walltime 3365.618 | +Transformer | epoch 0 | step 24510 |avg loss 4.959 |avg tokens 4805.300 |tokens/s 35048.086 |walltime 3366.989 | +Transformer | epoch 0 | step 24520 |avg loss 5.213 |avg tokens 4404.000 |tokens/s 31976.617 |walltime 3368.367 | +Transformer | epoch 0 | step 24530 |avg loss 5.479 |avg tokens 4546.100 |tokens/s 34302.848 |walltime 3369.692 | +Transformer | epoch 0 | step 24540 |avg loss 5.173 |avg tokens 4323.300 |tokens/s 31667.947 |walltime 3371.057 | +Transformer | epoch 0 | step 24550 |avg loss 4.618 |avg tokens 4640.800 |tokens/s 32591.491 |walltime 3372.481 | +Transformer | epoch 0 | step 24560 |avg loss 5.232 |avg tokens 4436.300 |tokens/s 33072.413 |walltime 3373.822 | +Transformer | epoch 0 | step 24570 |avg loss 4.735 |avg tokens 4861.500 |tokens/s 34724.294 |walltime 3375.222 | +Transformer | epoch 0 | step 24580 |avg loss 5.108 |avg tokens 4500.300 |tokens/s 32953.672 |walltime 3376.588 | +Transformer | epoch 0 | step 24590 |avg loss 5.237 |avg tokens 4549.400 |tokens/s 33815.765 |walltime 3377.933 | +Transformer | epoch 0 | step 24600 |avg loss 5.633 |avg tokens 4211.700 |tokens/s 32127.673 |walltime 3379.244 | +Transformer | epoch 0 | step 24610 |avg loss 5.208 |avg tokens 4508.100 |tokens/s 34338.284 |walltime 3380.557 | +Transformer | epoch 0 | step 24620 |avg loss 4.645 |avg tokens 4876.700 |tokens/s 34209.562 |walltime 3381.983 | 
+Transformer | epoch 0 | step 24630 |avg loss 4.967 |avg tokens 4522.600 |tokens/s 33689.614 |walltime 3383.325 | +Transformer | epoch 0 | step 24640 |avg loss 4.786 |avg tokens 4410.200 |tokens/s 32261.384 |walltime 3384.692 | +Transformer | epoch 0 | step 24650 |avg loss 5.399 |avg tokens 3819.700 |tokens/s 29757.092 |walltime 3385.976 | +Transformer | epoch 0 | step 24660 |avg loss 4.738 |avg tokens 4449.000 |tokens/s 32781.764 |walltime 3387.333 | +Transformer | epoch 0 | step 24670 |avg loss 5.126 |avg tokens 4304.300 |tokens/s 30106.007 |walltime 3388.763 | +Transformer | epoch 0 | step 24680 |avg loss 5.163 |avg tokens 4591.400 |tokens/s 34003.095 |walltime 3390.113 | +Transformer | epoch 0 | step 24690 |avg loss 5.068 |avg tokens 4438.100 |tokens/s 32126.185 |walltime 3391.495 | +Transformer | epoch 0 | step 24700 |avg loss 5.249 |avg tokens 4458.900 |tokens/s 34012.930 |walltime 3392.805 | +Transformer | epoch 0 | step 24710 |avg loss 4.923 |avg tokens 4735.700 |tokens/s 33347.876 |walltime 3394.226 | +Transformer | epoch 0 | step 24720 |avg loss 5.182 |avg tokens 4403.300 |tokens/s 32503.576 |walltime 3395.580 | +Transformer | epoch 0 | step 24730 |avg loss 5.005 |avg tokens 4461.600 |tokens/s 32331.371 |walltime 3396.960 | +Transformer | epoch 0 | step 24740 |avg loss 4.940 |avg tokens 4570.600 |tokens/s 32568.100 |walltime 3398.364 | +Transformer | epoch 0 | step 24750 |avg loss 4.862 |avg tokens 4620.600 |tokens/s 34001.224 |walltime 3399.723 | +Transformer | epoch 0 | step 24760 |avg loss 4.881 |avg tokens 4796.700 |tokens/s 34571.677 |walltime 3401.110 | +Transformer | epoch 0 | step 24770 |avg loss 4.552 |avg tokens 4979.600 |tokens/s 34779.825 |walltime 3402.542 | +Transformer | epoch 0 | step 24780 |avg loss 4.536 |avg tokens 4803.700 |tokens/s 34005.404 |walltime 3403.954 | +Transformer | epoch 0 | step 24790 |avg loss 5.548 |avg tokens 4295.700 |tokens/s 31662.359 |walltime 3405.311 | +Transformer | epoch 0 | step 24800 |avg loss 5.132 |avg 
tokens 4229.000 |tokens/s 31939.152 |walltime 3406.635 | +Transformer | epoch 0 | step 24810 |avg loss 5.002 |avg tokens 4481.600 |tokens/s 32282.423 |walltime 3408.024 | +Transformer | epoch 0 | step 24820 |avg loss 4.988 |avg tokens 4530.400 |tokens/s 33638.507 |walltime 3409.370 | +Transformer | epoch 0 | step 24830 |avg loss 5.107 |avg tokens 4421.900 |tokens/s 33909.993 |walltime 3410.674 | +Transformer | epoch 0 | step 24840 |avg loss 5.202 |avg tokens 4439.200 |tokens/s 32674.322 |walltime 3412.033 | +Transformer | epoch 0 | step 24850 |avg loss 5.377 |avg tokens 4680.900 |tokens/s 34538.775 |walltime 3413.388 | +Transformer | epoch 0 | step 24860 |avg loss 4.817 |avg tokens 4464.500 |tokens/s 32221.517 |walltime 3414.774 | +Transformer | epoch 0 | step 24870 |avg loss 5.216 |avg tokens 4709.700 |tokens/s 34121.355 |walltime 3416.154 | +Transformer | epoch 0 | step 24880 |avg loss 4.815 |avg tokens 4634.200 |tokens/s 34460.308 |walltime 3417.499 | +Transformer | epoch 0 | step 24890 |avg loss 4.414 |avg tokens 4775.900 |tokens/s 33528.597 |walltime 3418.923 | +Transformer | epoch 0 | step 24900 |avg loss 4.645 |avg tokens 4684.500 |tokens/s 33395.236 |walltime 3420.326 | +Transformer | epoch 0 | step 24910 |avg loss 4.546 |avg tokens 4804.900 |tokens/s 34451.589 |walltime 3421.721 | +Transformer | epoch 0 | step 24920 |avg loss 4.801 |avg tokens 4672.200 |tokens/s 33697.647 |walltime 3423.107 | +Transformer | epoch 0 | step 24930 |avg loss 4.973 |avg tokens 4401.100 |tokens/s 30534.412 |walltime 3424.549 | +Transformer | epoch 0 | step 24940 |avg loss 5.157 |avg tokens 4753.800 |tokens/s 34729.704 |walltime 3425.917 | +Transformer | epoch 0 | step 24950 |avg loss 5.164 |avg tokens 4439.200 |tokens/s 31121.831 |walltime 3427.344 | +Transformer | epoch 0 | step 24960 |avg loss 4.931 |avg tokens 4265.200 |tokens/s 31373.334 |walltime 3428.703 | +Transformer | epoch 0 | step 24970 |avg loss 4.922 |avg tokens 4837.100 |tokens/s 35113.804 |walltime 3430.081 | 
+Transformer | epoch 0 | step 24980 |avg loss 5.667 |avg tokens 4843.100 |tokens/s 36658.285 |walltime 3431.402 | +Transformer | epoch 0 | step 24990 |avg loss 4.849 |avg tokens 4830.900 |tokens/s 33006.336 |walltime 3432.866 | +Transformer | epoch 0 | step 25000 |avg loss 5.004 |avg tokens 4557.700 |tokens/s 34067.363 |walltime 3434.204 | +Transformer | epoch 0 | step 25010 |avg loss 4.982 |avg tokens 4517.600 |tokens/s 32308.162 |walltime 3435.602 | +Transformer | epoch 0 | step 25020 |avg loss 4.549 |avg tokens 4844.400 |tokens/s 34817.677 |walltime 3436.993 | +Transformer | epoch 0 | step 25030 |avg loss 5.014 |avg tokens 4524.400 |tokens/s 33133.646 |walltime 3438.359 | +Transformer | epoch 0 | step 25040 |avg loss 5.422 |avg tokens 4007.300 |tokens/s 30173.484 |walltime 3439.687 | +Transformer | epoch 0 | step 25050 |avg loss 5.040 |avg tokens 4662.400 |tokens/s 34247.491 |walltime 3441.048 | +Transformer | epoch 0 | step 25060 |avg loss 5.445 |avg tokens 4164.000 |tokens/s 31892.027 |walltime 3442.354 | +Transformer | epoch 0 | step 25070 |avg loss 4.935 |avg tokens 4739.300 |tokens/s 33042.496 |walltime 3443.788 | +Transformer | epoch 0 | step 25080 |avg loss 5.837 |avg tokens 4430.200 |tokens/s 34892.308 |walltime 3445.058 | +Transformer | epoch 0 | step 25090 |avg loss 5.805 |avg tokens 3667.600 |tokens/s 28853.537 |walltime 3446.329 | +Transformer | epoch 0 | step 25100 |avg loss 5.031 |avg tokens 4125.400 |tokens/s 30791.511 |walltime 3447.669 | +Transformer | epoch 0 | step 25110 |avg loss 4.984 |avg tokens 4558.200 |tokens/s 33343.001 |walltime 3449.036 | +Transformer | epoch 0 | step 25120 |avg loss 5.136 |avg tokens 4436.000 |tokens/s 33835.489 |walltime 3450.347 | +Transformer | epoch 0 | step 25130 |avg loss 6.082 |avg tokens 4930.400 |tokens/s 36393.586 |walltime 3451.702 | +Transformer | epoch 0 | step 25140 |avg loss 5.106 |avg tokens 4526.900 |tokens/s 33065.292 |walltime 3453.071 | +Transformer | epoch 0 | step 25150 |avg loss 5.508 |avg 
tokens 3991.300 |tokens/s 30408.242 |walltime 3454.383 | +Transformer | epoch 0 | step 25160 |avg loss 5.239 |avg tokens 4279.500 |tokens/s 32236.492 |walltime 3455.711 | +Transformer | epoch 0 | step 25170 |avg loss 5.340 |avg tokens 4781.500 |tokens/s 35452.488 |walltime 3457.059 | +Transformer | epoch 0 | step 25180 |avg loss 4.898 |avg tokens 4757.600 |tokens/s 34566.168 |walltime 3458.436 | +Transformer | epoch 0 | step 25190 |avg loss 5.970 |avg tokens 4627.800 |tokens/s 35633.740 |walltime 3459.735 | +Transformer | epoch 0 | step 25200 |avg loss 4.774 |avg tokens 4258.400 |tokens/s 30735.296 |walltime 3461.120 | +Transformer | epoch 0 | step 25210 |avg loss 4.775 |avg tokens 4871.200 |tokens/s 35306.197 |walltime 3462.500 | +Transformer | epoch 0 | step 25220 |avg loss 4.961 |avg tokens 4300.300 |tokens/s 31435.604 |walltime 3463.868 | +Transformer | epoch 0 | step 25230 |avg loss 5.649 |avg tokens 4319.100 |tokens/s 32956.291 |walltime 3465.178 | +Transformer | epoch 0 | step 25240 |avg loss 4.641 |avg tokens 4603.600 |tokens/s 32632.196 |walltime 3466.589 | +Transformer | epoch 0 | step 25250 |avg loss 4.653 |avg tokens 4915.200 |tokens/s 34034.718 |walltime 3468.033 | +Transformer | epoch 0 | step 25260 |avg loss 4.692 |avg tokens 4599.200 |tokens/s 32271.574 |walltime 3469.458 | +Transformer | epoch 0 | step 25270 |avg loss 5.528 |avg tokens 4558.900 |tokens/s 33155.665 |walltime 3470.833 | +Transformer | epoch 0 | step 25280 |avg loss 5.242 |avg tokens 4131.000 |tokens/s 30407.311 |walltime 3472.192 | +Transformer | epoch 0 | step 25290 |avg loss 5.034 |avg tokens 4359.200 |tokens/s 32301.822 |walltime 3473.541 | +Transformer | epoch 0 | step 25300 |avg loss 5.035 |avg tokens 4624.100 |tokens/s 33487.834 |walltime 3474.922 | +Transformer | epoch 0 | step 25310 |avg loss 5.228 |avg tokens 4722.100 |tokens/s 34666.982 |walltime 3476.284 | +Transformer | epoch 0 | step 25320 |avg loss 4.622 |avg tokens 4533.400 |tokens/s 32361.509 |walltime 3477.685 | 
+Transformer | epoch 0 | step 25330 |avg loss 4.950 |avg tokens 4585.700 |tokens/s 33112.655 |walltime 3479.070 | +Transformer | epoch 0 | step 25340 |avg loss 5.710 |avg tokens 3884.900 |tokens/s 30698.831 |walltime 3480.336 | +Transformer | epoch 0 | step 25350 |avg loss 5.403 |avg tokens 4475.800 |tokens/s 32932.137 |walltime 3481.695 | +Transformer | epoch 0 | step 25360 |avg loss 4.415 |avg tokens 4821.500 |tokens/s 33013.685 |walltime 3483.155 | +Transformer | epoch 0 | step 25370 |avg loss 5.057 |avg tokens 4203.800 |tokens/s 30772.264 |walltime 3484.521 | +Transformer | epoch 0 | step 25380 |avg loss 4.772 |avg tokens 4555.700 |tokens/s 33048.933 |walltime 3485.900 | +Transformer | epoch 0 | step 25390 |avg loss 5.629 |avg tokens 4758.400 |tokens/s 35227.396 |walltime 3487.251 | +Transformer | epoch 0 | step 25400 |avg loss 4.978 |avg tokens 4794.600 |tokens/s 34063.684 |walltime 3488.658 | +Transformer | epoch 0 | step 25410 |avg loss 5.873 |avg tokens 4508.600 |tokens/s 33484.602 |walltime 3490.005 | +Transformer | epoch 0 | step 25420 |avg loss 4.785 |avg tokens 4711.300 |tokens/s 33458.762 |walltime 3491.413 | +Transformer | epoch 0 | step 25430 |avg loss 5.452 |avg tokens 4128.700 |tokens/s 31878.670 |walltime 3492.708 | +Transformer | epoch 0 | step 25440 |avg loss 5.292 |avg tokens 4371.500 |tokens/s 32901.493 |walltime 3494.037 | +Transformer | epoch 0 | step 25450 |avg loss 5.054 |avg tokens 4219.100 |tokens/s 31524.283 |walltime 3495.375 | +Transformer | epoch 0 | step 25460 |avg loss 5.280 |avg tokens 4242.300 |tokens/s 31665.386 |walltime 3496.715 | +Transformer | epoch 0 | step 25470 |avg loss 4.862 |avg tokens 4502.000 |tokens/s 32874.824 |walltime 3498.084 | +Transformer | epoch 0 | step 25480 |avg loss 4.930 |avg tokens 4385.300 |tokens/s 32049.516 |walltime 3499.452 | +Transformer | epoch 0 | step 25490 |avg loss 5.215 |avg tokens 4399.300 |tokens/s 31958.956 |walltime 3500.829 | +Transformer | epoch 0 | step 25500 |avg loss 5.116 |avg 
tokens 4671.300 |tokens/s 34928.562 |walltime 3502.166 | +Transformer | epoch 0 | step 25510 |avg loss 5.724 |avg tokens 4085.100 |tokens/s 30031.490 |walltime 3503.527 | +Transformer | epoch 0 | step 25520 |avg loss 4.842 |avg tokens 4458.400 |tokens/s 32513.159 |walltime 3504.898 | +Transformer | epoch 0 | step 25530 |avg loss 4.750 |avg tokens 4484.200 |tokens/s 31444.002 |walltime 3506.324 | +Transformer | epoch 0 | step 25540 |avg loss 5.116 |avg tokens 4480.200 |tokens/s 32291.521 |walltime 3507.711 | +Transformer | epoch 0 | step 25550 |avg loss 4.580 |avg tokens 4742.000 |tokens/s 31903.261 |walltime 3509.198 | +Transformer | epoch 0 | step 25560 |avg loss 5.225 |avg tokens 4231.800 |tokens/s 31960.569 |walltime 3510.522 | +Transformer | epoch 0 | step 25570 |avg loss 4.974 |avg tokens 4499.400 |tokens/s 33038.458 |walltime 3511.884 | +Transformer | epoch 0 | step 25580 |avg loss 5.140 |avg tokens 4590.500 |tokens/s 34497.168 |walltime 3513.214 | +Transformer | epoch 0 | step 25590 |avg loss 5.042 |avg tokens 4464.500 |tokens/s 33506.495 |walltime 3514.547 | +Transformer | epoch 0 | step 25600 |avg loss 5.516 |avg tokens 4132.000 |tokens/s 30639.053 |walltime 3515.895 | +Transformer | epoch 0 | step 25610 |avg loss 4.858 |avg tokens 4424.700 |tokens/s 33193.995 |walltime 3517.228 | +Transformer | epoch 0 | step 25620 |avg loss 4.436 |avg tokens 4524.000 |tokens/s 31064.209 |walltime 3518.685 | +Transformer | epoch 0 | step 25630 |avg loss 5.387 |avg tokens 4175.200 |tokens/s 31207.907 |walltime 3520.023 | +Transformer | epoch 0 | step 25640 |avg loss 4.826 |avg tokens 4570.500 |tokens/s 33443.835 |walltime 3521.389 | +Transformer | epoch 0 | step 25650 |avg loss 5.631 |avg tokens 4093.700 |tokens/s 30791.359 |walltime 3522.719 | +Transformer | epoch 0 | step 25660 |avg loss 4.738 |avg tokens 4639.200 |tokens/s 34274.655 |walltime 3524.072 | +Transformer | epoch 0 | step 25670 |avg loss 5.049 |avg tokens 4726.600 |tokens/s 34526.287 |walltime 3525.441 | 
+Transformer | epoch 0 | step 25680 |avg loss 5.324 |avg tokens 4250.500 |tokens/s 32711.075 |walltime 3526.741 | +Transformer | epoch 0 | step 25690 |avg loss 5.078 |avg tokens 4597.300 |tokens/s 34151.671 |walltime 3528.087 | +Transformer | epoch 0 | step 25700 |avg loss 5.039 |avg tokens 4627.900 |tokens/s 34357.014 |walltime 3529.434 | +Transformer | epoch 0 | step 25710 |avg loss 4.744 |avg tokens 4888.200 |tokens/s 35269.729 |walltime 3530.820 | +Transformer | epoch 0 | step 25720 |avg loss 5.400 |avg tokens 4208.100 |tokens/s 31324.613 |walltime 3532.163 | +Transformer | epoch 0 | step 25730 |avg loss 4.919 |avg tokens 4768.500 |tokens/s 33616.746 |walltime 3533.582 | +Transformer | epoch 0 | step 25740 |avg loss 4.950 |avg tokens 4719.300 |tokens/s 33900.562 |walltime 3534.974 | +Transformer | epoch 0 | step 25750 |avg loss 5.146 |avg tokens 4229.000 |tokens/s 30379.915 |walltime 3536.366 | +Transformer | epoch 0 | step 25760 |avg loss 4.964 |avg tokens 4810.200 |tokens/s 34626.415 |walltime 3537.755 | +Transformer | epoch 0 | step 25770 |avg loss 5.199 |avg tokens 4566.000 |tokens/s 34406.230 |walltime 3539.082 | +Transformer | epoch 0 | step 25780 |avg loss 4.986 |avg tokens 4337.500 |tokens/s 32316.322 |walltime 3540.424 | +Transformer | epoch 0 | step 25790 |avg loss 4.749 |avg tokens 4782.500 |tokens/s 33875.409 |walltime 3541.836 | +Transformer | epoch 0 | step 25800 |avg loss 4.877 |avg tokens 4582.100 |tokens/s 33089.160 |walltime 3543.221 | +Transformer | epoch 0 | step 25810 |avg loss 5.118 |avg tokens 4549.400 |tokens/s 32168.273 |walltime 3544.635 | +Transformer | epoch 0 | step 25820 |avg loss 5.284 |avg tokens 4246.200 |tokens/s 30858.345 |walltime 3546.011 | +Transformer | epoch 0 | step 25830 |avg loss 4.623 |avg tokens 4847.800 |tokens/s 34127.287 |walltime 3547.432 | +Transformer | epoch 0 | step 25840 |avg loss 4.848 |avg tokens 4341.200 |tokens/s 31766.903 |walltime 3548.798 | +Transformer | epoch 0 | step 25850 |avg loss 5.107 |avg 
tokens 4386.300 |tokens/s 32082.450 |walltime 3550.165 | +Transformer | epoch 0 | step 25860 |avg loss 4.690 |avg tokens 4413.100 |tokens/s 31671.160 |walltime 3551.559 | +Transformer | epoch 0 | step 25870 |avg loss 4.377 |avg tokens 4831.600 |tokens/s 33005.157 |walltime 3553.023 | +Transformer | epoch 0 | step 25880 |avg loss 4.541 |avg tokens 4746.400 |tokens/s 32232.101 |walltime 3554.495 | +Transformer | epoch 0 | step 25890 |avg loss 5.436 |avg tokens 4425.600 |tokens/s 32140.177 |walltime 3555.872 | +Transformer | epoch 0 | step 25900 |avg loss 4.498 |avg tokens 4755.200 |tokens/s 33677.647 |walltime 3557.284 | +Transformer | epoch 0 | step 25910 |avg loss 5.245 |avg tokens 4807.900 |tokens/s 35401.106 |walltime 3558.642 | +Transformer | epoch 0 | step 25920 |avg loss 5.812 |avg tokens 4087.900 |tokens/s 30659.762 |walltime 3559.976 | +Transformer | epoch 0 | step 25930 |avg loss 4.774 |avg tokens 4707.900 |tokens/s 33461.437 |walltime 3561.383 | +Transformer | epoch 0 | step 25940 |avg loss 4.887 |avg tokens 4730.400 |tokens/s 33436.018 |walltime 3562.797 | +Transformer | epoch 0 | step 25950 |avg loss 4.542 |avg tokens 4736.800 |tokens/s 32631.481 |walltime 3564.249 | +Transformer | epoch 0 | step 25960 |avg loss 4.952 |avg tokens 4511.400 |tokens/s 31979.450 |walltime 3565.660 | +Transformer | epoch 0 | step 25970 |avg loss 4.327 |avg tokens 4806.900 |tokens/s 32696.420 |walltime 3567.130 | +Transformer | epoch 0 | step 25980 |avg loss 5.247 |avg tokens 4140.600 |tokens/s 30625.470 |walltime 3568.482 | +Transformer | epoch 0 | step 25990 |avg loss 5.048 |avg tokens 4459.300 |tokens/s 32719.625 |walltime 3569.845 | +Transformer | epoch 0 | step 26000 |avg loss 4.932 |avg tokens 4599.600 |tokens/s 33474.383 |walltime 3571.219 | +Transformer | epoch 0 | step 26010 |avg loss 4.591 |avg tokens 4850.500 |tokens/s 34938.727 |walltime 3572.607 | +Transformer | epoch 0 | step 26020 |avg loss 5.202 |avg tokens 4189.800 |tokens/s 32243.945 |walltime 3573.907 | 
+Transformer | epoch 0 | step 26030 |avg loss 5.954 |avg tokens 4051.600 |tokens/s 31722.666 |walltime 3575.184 | +Transformer | epoch 0 | step 26040 |avg loss 4.639 |avg tokens 4989.000 |tokens/s 35184.931 |walltime 3576.602 | +Transformer | epoch 0 | step 26050 |avg loss 4.415 |avg tokens 4871.900 |tokens/s 34416.420 |walltime 3578.017 | +Transformer | epoch 0 | step 26060 |avg loss 5.673 |avg tokens 4266.500 |tokens/s 33086.792 |walltime 3579.307 | +Transformer | epoch 0 | step 26070 |avg loss 4.698 |avg tokens 4708.000 |tokens/s 33810.073 |walltime 3580.699 | +Transformer | epoch 0 | step 26080 |avg loss 4.824 |avg tokens 4741.600 |tokens/s 35433.457 |walltime 3582.038 | +Transformer | epoch 0 | step 26090 |avg loss 4.956 |avg tokens 4528.000 |tokens/s 32130.402 |walltime 3583.447 | +Transformer | epoch 0 | step 26100 |avg loss 4.857 |avg tokens 4451.400 |tokens/s 32411.641 |walltime 3584.820 | +Transformer | epoch 0 | step 26110 |avg loss 4.644 |avg tokens 4534.500 |tokens/s 32439.511 |walltime 3586.218 | +Transformer | epoch 0 | step 26120 |avg loss 5.059 |avg tokens 4276.800 |tokens/s 32642.779 |walltime 3587.528 | +Transformer | epoch 0 | step 26130 |avg loss 5.248 |avg tokens 4219.600 |tokens/s 30678.670 |walltime 3588.904 | +Transformer | epoch 0 | step 26140 |avg loss 5.343 |avg tokens 4341.000 |tokens/s 32441.517 |walltime 3590.242 | +Transformer | epoch 0 | step 26150 |avg loss 4.596 |avg tokens 4816.800 |tokens/s 33960.566 |walltime 3591.660 | +Transformer | epoch 0 | step 26160 |avg loss 5.821 |avg tokens 3671.700 |tokens/s 28790.398 |walltime 3592.935 | +Transformer | epoch 0 | step 26170 |avg loss 5.008 |avg tokens 4586.900 |tokens/s 33770.788 |walltime 3594.294 | +Transformer | epoch 0 | step 26180 |avg loss 4.858 |avg tokens 4664.800 |tokens/s 32978.860 |walltime 3595.708 | +Transformer | epoch 0 | step 26190 |avg loss 4.954 |avg tokens 4907.000 |tokens/s 35447.446 |walltime 3597.092 | +Transformer | epoch 0 | step 26200 |avg loss 4.704 |avg 
tokens 4731.400 |tokens/s 34395.057 |walltime 3598.468 | +Transformer | epoch 0 | step 26210 |avg loss 5.718 |avg tokens 4360.400 |tokens/s 33215.941 |walltime 3599.781 | +Transformer | epoch 0 | step 26220 |avg loss 5.002 |avg tokens 4918.000 |tokens/s 35296.237 |walltime 3601.174 | +Transformer | epoch 0 | step 26230 |avg loss 5.062 |avg tokens 4410.800 |tokens/s 32187.031 |walltime 3602.545 | +Transformer | epoch 0 | step 26240 |avg loss 5.326 |avg tokens 4329.100 |tokens/s 32481.593 |walltime 3603.877 | +Transformer | epoch 0 | step 26250 |avg loss 5.175 |avg tokens 4522.500 |tokens/s 33540.341 |walltime 3605.226 | +Transformer | epoch 0 | step 26260 |avg loss 5.052 |avg tokens 4718.000 |tokens/s 33353.362 |walltime 3606.640 | +Transformer | epoch 0 | step 26270 |avg loss 5.243 |avg tokens 4531.900 |tokens/s 33863.870 |walltime 3607.979 | +Transformer | epoch 0 | step 26280 |avg loss 5.347 |avg tokens 4402.400 |tokens/s 32896.279 |walltime 3609.317 | +Transformer | epoch 0 | step 26290 |avg loss 5.072 |avg tokens 4158.200 |tokens/s 31297.900 |walltime 3610.645 | +Transformer | epoch 0 | step 26300 |avg loss 5.036 |avg tokens 4351.600 |tokens/s 31219.760 |walltime 3612.039 | +Transformer | epoch 0 | step 26310 |avg loss 5.071 |avg tokens 3990.600 |tokens/s 29012.806 |walltime 3613.415 | +Transformer | epoch 0 | step 26320 |avg loss 4.548 |avg tokens 4457.800 |tokens/s 32210.158 |walltime 3614.799 | +Transformer | epoch 0 | step 26330 |avg loss 5.714 |avg tokens 4446.200 |tokens/s 35197.225 |walltime 3616.062 | +Transformer | epoch 0 | step 26340 |avg loss 5.031 |avg tokens 4076.900 |tokens/s 30920.060 |walltime 3617.380 | +Gradient overflow. 
Skipping step, loss scaler 0 reducing loss scale to 64.0 +Transformer | epoch 0 | step 26350 |avg loss 5.402 |avg tokens 4033.100 |tokens/s 31533.387 |walltime 3618.659 | +Transformer | epoch 0 | step 26360 |avg loss 5.162 |avg tokens 4530.200 |tokens/s 33087.407 |walltime 3620.029 | +Transformer | epoch 0 | step 26370 |avg loss 4.826 |avg tokens 4505.700 |tokens/s 32362.383 |walltime 3621.421 | +Transformer | epoch 0 | step 26380 |avg loss 4.887 |avg tokens 4653.200 |tokens/s 33392.319 |walltime 3622.814 | +Transformer | epoch 0 | step 26390 |avg loss 4.950 |avg tokens 4769.600 |tokens/s 34473.656 |walltime 3624.198 | +Transformer | epoch 0 | step 26400 |avg loss 4.892 |avg tokens 4913.800 |tokens/s 35128.098 |walltime 3625.597 | +Transformer | epoch 0 | step 26410 |avg loss 4.977 |avg tokens 4739.900 |tokens/s 34165.838 |walltime 3626.984 | +Transformer | epoch 0 | step 26420 |avg loss 4.806 |avg tokens 4647.200 |tokens/s 32549.874 |walltime 3628.412 | +Transformer | epoch 0 | step 26430 |avg loss 4.917 |avg tokens 4738.100 |tokens/s 34123.500 |walltime 3629.800 | +Transformer | epoch 0 | step 26440 |avg loss 5.483 |avg tokens 3938.900 |tokens/s 29424.583 |walltime 3631.139 | +Transformer | epoch 0 | step 26450 |avg loss 5.072 |avg tokens 4513.800 |tokens/s 33394.545 |walltime 3632.491 | +Transformer | epoch 0 | step 26460 |avg loss 5.144 |avg tokens 4540.800 |tokens/s 32095.944 |walltime 3633.905 | +Transformer | epoch 0 | step 26470 |avg loss 5.156 |avg tokens 4359.700 |tokens/s 32504.847 |walltime 3635.247 | +Transformer | epoch 0 | step 26480 |avg loss 5.256 |avg tokens 4282.400 |tokens/s 32659.953 |walltime 3636.558 | +Transformer | epoch 0 | step 26490 |avg loss 5.485 |avg tokens 4437.200 |tokens/s 32272.655 |walltime 3637.933 | +Transformer | epoch 0 | step 26500 |avg loss 5.085 |avg tokens 4216.100 |tokens/s 31483.864 |walltime 3639.272 | +Transformer | epoch 0 | step 26510 |avg loss 5.190 |avg tokens 4856.600 |tokens/s 35280.480 |walltime 3640.648 | 
+Transformer | epoch 0 | step 26520 |avg loss 4.481 |avg tokens 4762.100 |tokens/s 33510.522 |walltime 3642.070 | +Transformer | epoch 0 | step 26530 |avg loss 4.483 |avg tokens 4611.500 |tokens/s 32231.602 |walltime 3643.500 | +Transformer | epoch 0 | step 26540 |avg loss 5.242 |avg tokens 4225.100 |tokens/s 30077.915 |walltime 3644.905 | +Transformer | epoch 0 | step 26550 |avg loss 4.729 |avg tokens 4588.100 |tokens/s 33325.390 |walltime 3646.282 | +Transformer | epoch 0 | step 26560 |avg loss 4.540 |avg tokens 4867.600 |tokens/s 34424.635 |walltime 3647.696 | +Transformer | epoch 0 | step 26570 |avg loss 4.751 |avg tokens 4787.300 |tokens/s 32777.832 |walltime 3649.156 | +Transformer | epoch 0 | step 26580 |avg loss 5.295 |avg tokens 4223.900 |tokens/s 31746.123 |walltime 3650.487 | +Transformer | epoch 0 | step 26590 |avg loss 4.937 |avg tokens 4395.100 |tokens/s 31801.336 |walltime 3651.869 | +Transformer | epoch 0 | step 26600 |avg loss 5.281 |avg tokens 4143.900 |tokens/s 30419.940 |walltime 3653.231 | +Transformer | epoch 0 | step 26610 |avg loss 5.310 |avg tokens 4473.100 |tokens/s 32587.231 |walltime 3654.604 | +Transformer | epoch 0 | step 26620 |avg loss 5.272 |avg tokens 4394.200 |tokens/s 32867.838 |walltime 3655.941 | +Transformer | epoch 0 | step 26630 |avg loss 5.144 |avg tokens 4652.800 |tokens/s 34499.771 |walltime 3657.289 | +Transformer | epoch 0 | step 26640 |avg loss 4.592 |avg tokens 4757.500 |tokens/s 34478.209 |walltime 3658.669 | +Transformer | epoch 0 | step 26650 |avg loss 4.518 |avg tokens 4658.000 |tokens/s 32694.963 |walltime 3660.094 | +Transformer | epoch 0 | step 26660 |avg loss 4.614 |avg tokens 4617.600 |tokens/s 32722.591 |walltime 3661.505 | +Transformer | epoch 0 | step 26670 |avg loss 5.391 |avg tokens 4303.200 |tokens/s 32642.777 |walltime 3662.823 | +Transformer | epoch 0 | step 26680 |avg loss 4.916 |avg tokens 4713.900 |tokens/s 34081.652 |walltime 3664.206 | +Transformer | epoch 0 | step 26690 |avg loss 4.897 |avg 
tokens 4397.300 |tokens/s 32696.624 |walltime 3665.551 | +Transformer | epoch 0 | step 26700 |avg loss 4.351 |avg tokens 4907.000 |tokens/s 34790.222 |walltime 3666.962 | +Transformer | epoch 0 | step 26710 |avg loss 4.995 |avg tokens 4505.800 |tokens/s 33003.897 |walltime 3668.327 | +Transformer | epoch 0 | step 26720 |avg loss 4.967 |avg tokens 4441.900 |tokens/s 33075.801 |walltime 3669.670 | +Transformer | epoch 0 | step 26730 |avg loss 4.773 |avg tokens 4504.800 |tokens/s 32951.819 |walltime 3671.037 | +Transformer | epoch 0 | step 26740 |avg loss 5.534 |avg tokens 3977.500 |tokens/s 30564.820 |walltime 3672.338 | +Transformer | epoch 0 | step 26750 |avg loss 4.538 |avg tokens 4642.100 |tokens/s 32961.987 |walltime 3673.747 | +Transformer | epoch 0 | step 26760 |avg loss 4.764 |avg tokens 4687.900 |tokens/s 33763.059 |walltime 3675.135 | +Transformer | epoch 0 | step 26770 |avg loss 5.086 |avg tokens 4620.000 |tokens/s 33874.287 |walltime 3676.499 | +Transformer | epoch 0 | step 26780 |avg loss 5.155 |avg tokens 4715.700 |tokens/s 32492.294 |walltime 3677.950 | +Transformer | epoch 0 | step 26790 |avg loss 5.061 |avg tokens 4011.000 |tokens/s 31909.236 |walltime 3679.207 | +Transformer | epoch 0 | step 26800 |avg loss 5.324 |avg tokens 4593.700 |tokens/s 34600.617 |walltime 3680.535 | +Transformer | epoch 0 | step 26810 |avg loss 5.178 |avg tokens 4484.500 |tokens/s 33135.523 |walltime 3681.888 | +Transformer | epoch 0 | step 26820 |avg loss 5.068 |avg tokens 4719.000 |tokens/s 33802.145 |walltime 3683.285 | +Transformer | epoch 0 | step 26830 |avg loss 4.699 |avg tokens 4729.800 |tokens/s 34024.569 |walltime 3684.675 | +Transformer | epoch 0 | step 26840 |avg loss 4.870 |avg tokens 4717.800 |tokens/s 34021.991 |walltime 3686.061 | +Transformer | epoch 0 | step 26850 |avg loss 4.737 |avg tokens 4516.800 |tokens/s 32286.867 |walltime 3687.460 | +Transformer | epoch 0 | step 26860 |avg loss 5.456 |avg tokens 4374.900 |tokens/s 32944.901 |walltime 3688.788 | 
+Transformer | epoch 0 | step 26870 |avg loss 5.076 |avg tokens 4313.000 |tokens/s 32843.800 |walltime 3690.101 | +Transformer | epoch 0 | step 26880 |avg loss 5.062 |avg tokens 4460.200 |tokens/s 31225.579 |walltime 3691.530 | +Transformer | epoch 0 | step 26890 |avg loss 5.199 |avg tokens 3869.400 |tokens/s 28388.885 |walltime 3692.893 | +Transformer | epoch 0 | step 26900 |avg loss 4.801 |avg tokens 4551.200 |tokens/s 33677.836 |walltime 3694.244 | +Transformer | epoch 0 | step 26910 |avg loss 5.167 |avg tokens 4485.800 |tokens/s 33778.819 |walltime 3695.572 | +Transformer | epoch 0 | step 26920 |avg loss 4.537 |avg tokens 4706.400 |tokens/s 33687.475 |walltime 3696.969 | +Transformer | epoch 0 | step 26930 |avg loss 5.505 |avg tokens 4433.600 |tokens/s 33678.908 |walltime 3698.286 | +Transformer | epoch 0 | step 26940 |avg loss 4.996 |avg tokens 4423.300 |tokens/s 32113.615 |walltime 3699.663 | +Transformer | epoch 0 | step 26950 |avg loss 5.601 |avg tokens 4724.500 |tokens/s 35418.748 |walltime 3700.997 | +Transformer | epoch 0 | step 26960 |avg loss 5.111 |avg tokens 4587.900 |tokens/s 33402.194 |walltime 3702.371 | +Transformer | epoch 0 | step 26970 |avg loss 5.311 |avg tokens 4494.600 |tokens/s 33951.639 |walltime 3703.694 | +Transformer | epoch 0 | step 26980 |avg loss 5.757 |avg tokens 4429.500 |tokens/s 33755.807 |walltime 3705.007 | +Transformer | epoch 0 | step 26990 |avg loss 4.509 |avg tokens 4724.800 |tokens/s 33610.083 |walltime 3706.412 | +Transformer | epoch 0 | step 27000 |avg loss 4.610 |avg tokens 4836.800 |tokens/s 34752.133 |walltime 3707.804 | +Transformer | epoch 0 | step 27010 |avg loss 4.787 |avg tokens 4327.900 |tokens/s 32041.381 |walltime 3709.155 | +Transformer | epoch 0 | step 27020 |avg loss 4.616 |avg tokens 4744.800 |tokens/s 34086.231 |walltime 3710.547 | +Transformer | epoch 0 | step 27030 |avg loss 5.301 |avg tokens 4543.600 |tokens/s 33336.639 |walltime 3711.910 | +Transformer | epoch 0 | step 27040 |avg loss 4.826 |avg 
tokens 4656.900 |tokens/s 33923.539 |walltime 3713.283 | +Transformer | epoch 0 | step 27050 |avg loss 5.504 |avg tokens 3857.300 |tokens/s 30366.302 |walltime 3714.553 | +Transformer | epoch 0 | step 27060 |avg loss 4.802 |avg tokens 4541.800 |tokens/s 32871.015 |walltime 3715.935 | +Transformer | epoch 0 | step 27070 |avg loss 4.848 |avg tokens 4758.700 |tokens/s 33810.461 |walltime 3717.342 | +Transformer | epoch 0 | step 27080 |avg loss 4.459 |avg tokens 4898.400 |tokens/s 34813.117 |walltime 3718.749 | +Transformer | epoch 0 | step 27090 |avg loss 5.168 |avg tokens 4250.800 |tokens/s 31749.036 |walltime 3720.088 | +Transformer | epoch 0 | step 27100 |avg loss 5.388 |avg tokens 4029.600 |tokens/s 29522.225 |walltime 3721.453 | +Transformer | epoch 0 | step 27110 |avg loss 4.766 |avg tokens 4701.700 |tokens/s 34240.289 |walltime 3722.826 | +Transformer | epoch 0 | step 27120 |avg loss 4.790 |avg tokens 4356.500 |tokens/s 31876.031 |walltime 3724.193 | +Transformer | epoch 0 | step 27130 |avg loss 4.633 |avg tokens 4620.200 |tokens/s 33118.418 |walltime 3725.588 | +Transformer | epoch 0 | step 27140 |avg loss 4.785 |avg tokens 4341.600 |tokens/s 31042.691 |walltime 3726.986 | +Transformer | epoch 0 | step 27150 |avg loss 5.017 |avg tokens 4321.100 |tokens/s 31449.974 |walltime 3728.360 | +Transformer | epoch 0 | step 27160 |avg loss 5.368 |avg tokens 4508.600 |tokens/s 33017.049 |walltime 3729.726 | +Transformer | epoch 0 | step 27170 |avg loss 5.492 |avg tokens 4182.200 |tokens/s 31577.472 |walltime 3731.050 | +Transformer | epoch 0 | step 27180 |avg loss 5.798 |avg tokens 4239.800 |tokens/s 32915.262 |walltime 3732.339 | +Transformer | epoch 0 | step 27190 |avg loss 5.684 |avg tokens 3651.800 |tokens/s 28268.607 |walltime 3733.630 | +Transformer | epoch 0 | step 27200 |avg loss 4.558 |avg tokens 4903.200 |tokens/s 34862.477 |walltime 3735.037 | +Transformer | epoch 0 | step 27210 |avg loss 4.970 |avg tokens 4642.400 |tokens/s 33850.553 |walltime 3736.408 | 
+Transformer | epoch 0 | step 27220 |avg loss 4.717 |avg tokens 4737.000 |tokens/s 34190.301 |walltime 3737.794 | +Transformer | epoch 0 | step 27230 |avg loss 5.177 |avg tokens 4271.000 |tokens/s 31602.821 |walltime 3739.145 | +Transformer | epoch 0 | step 27240 |avg loss 5.446 |avg tokens 4224.600 |tokens/s 31842.782 |walltime 3740.472 | +Transformer | epoch 0 | step 27250 |avg loss 4.988 |avg tokens 4577.600 |tokens/s 33472.832 |walltime 3741.839 | +Transformer | epoch 0 | step 27260 |avg loss 5.306 |avg tokens 4402.800 |tokens/s 32729.472 |walltime 3743.185 | +Transformer | epoch 0 | step 27270 |avg loss 5.254 |avg tokens 4498.500 |tokens/s 33136.612 |walltime 3744.542 | +Transformer | epoch 0 | step 27280 |avg loss 4.697 |avg tokens 4734.700 |tokens/s 32828.403 |walltime 3745.984 | +Transformer | epoch 0 | step 27290 |avg loss 4.849 |avg tokens 4658.700 |tokens/s 34245.766 |walltime 3747.345 | +Transformer | epoch 0 | step 27300 |avg loss 4.908 |avg tokens 4834.600 |tokens/s 35115.378 |walltime 3748.722 | +Transformer | epoch 0 | step 27310 |avg loss 5.939 |avg tokens 3953.300 |tokens/s 31417.903 |walltime 3749.980 | +Transformer | epoch 0 | step 27320 |avg loss 4.921 |avg tokens 4355.500 |tokens/s 31572.828 |walltime 3751.359 | +Transformer | epoch 0 | step 27330 |avg loss 4.877 |avg tokens 4838.800 |tokens/s 34382.465 |walltime 3752.767 | +Transformer | epoch 0 | step 27340 |avg loss 5.040 |avg tokens 4852.600 |tokens/s 34833.499 |walltime 3754.160 | +Transformer | epoch 0 | step 27350 |avg loss 5.005 |avg tokens 4809.600 |tokens/s 34754.371 |walltime 3755.544 | +Transformer | epoch 0 | step 27360 |avg loss 5.029 |avg tokens 4688.500 |tokens/s 34671.608 |walltime 3756.896 | +Transformer | epoch 0 | step 27370 |avg loss 5.229 |avg tokens 4259.800 |tokens/s 32149.618 |walltime 3758.221 | +Transformer | epoch 0 | step 27380 |avg loss 4.786 |avg tokens 4494.000 |tokens/s 33166.676 |walltime 3759.576 | +Transformer | epoch 0 | step 27390 |avg loss 4.693 |avg 
tokens 4917.600 |tokens/s 34832.314 |walltime 3760.988 | +Transformer | epoch 0 | step 27400 |avg loss 4.912 |avg tokens 4746.000 |tokens/s 34588.537 |walltime 3762.360 | +Transformer | epoch 0 | step 27410 |avg loss 5.138 |avg tokens 4829.700 |tokens/s 36648.200 |walltime 3763.678 | +Transformer | epoch 0 | step 27420 |avg loss 4.913 |avg tokens 4732.600 |tokens/s 34126.854 |walltime 3765.065 | +Transformer | epoch 0 | step 27430 |avg loss 5.377 |avg tokens 4595.100 |tokens/s 33506.965 |walltime 3766.436 | +Transformer | epoch 0 | step 27440 |avg loss 4.538 |avg tokens 4804.800 |tokens/s 33085.191 |walltime 3767.888 | +Transformer | epoch 0 | step 27450 |avg loss 5.097 |avg tokens 4592.100 |tokens/s 33128.264 |walltime 3769.274 | +Transformer | epoch 0 | step 27460 |avg loss 5.037 |avg tokens 4504.700 |tokens/s 33294.585 |walltime 3770.627 | +Transformer | epoch 0 | step 27470 |avg loss 5.096 |avg tokens 4706.800 |tokens/s 34810.608 |walltime 3771.979 | +Transformer | epoch 0 | step 27480 |avg loss 4.558 |avg tokens 4749.600 |tokens/s 34442.083 |walltime 3773.358 | +Transformer | epoch 0 | step 27490 |avg loss 4.742 |avg tokens 4524.800 |tokens/s 31961.386 |walltime 3774.774 | +Transformer | epoch 0 | step 27500 |avg loss 5.365 |avg tokens 3964.600 |tokens/s 30134.067 |walltime 3776.090 | +Transformer | epoch 0 | step 27510 |avg loss 4.706 |avg tokens 4723.400 |tokens/s 34162.721 |walltime 3777.472 | +Transformer | epoch 0 | step 27520 |avg loss 4.823 |avg tokens 4515.800 |tokens/s 32637.641 |walltime 3778.856 | +Transformer | epoch 0 | step 27530 |avg loss 4.931 |avg tokens 4725.300 |tokens/s 34280.515 |walltime 3780.235 | +Transformer | epoch 0 | step 27540 |avg loss 5.505 |avg tokens 4221.400 |tokens/s 31070.898 |walltime 3781.593 | +Transformer | epoch 0 | step 27550 |avg loss 5.489 |avg tokens 3991.200 |tokens/s 30627.288 |walltime 3782.896 | +Transformer | epoch 0 | step 27560 |avg loss 4.812 |avg tokens 4912.700 |tokens/s 34505.225 |walltime 3784.320 | 
+Transformer | epoch 0 | step 27570 |avg loss 4.852 |avg tokens 4263.600 |tokens/s 31362.967 |walltime 3785.680 | +Transformer | epoch 0 | step 27580 |avg loss 5.044 |avg tokens 4426.500 |tokens/s 30947.078 |walltime 3787.110 | +Transformer | epoch 0 | step 27590 |avg loss 4.679 |avg tokens 4603.200 |tokens/s 33331.366 |walltime 3788.491 | +Transformer | epoch 0 | step 27600 |avg loss 4.941 |avg tokens 4546.100 |tokens/s 32877.713 |walltime 3789.874 | +Transformer | epoch 0 | step 27610 |avg loss 4.556 |avg tokens 4965.700 |tokens/s 35358.307 |walltime 3791.278 | +Transformer | epoch 0 | step 27620 |avg loss 4.466 |avg tokens 4818.200 |tokens/s 34355.952 |walltime 3792.680 | +Transformer | epoch 0 | step 27630 |avg loss 5.131 |avg tokens 4487.900 |tokens/s 33263.227 |walltime 3794.030 | +Transformer | epoch 0 | step 27640 |avg loss 4.645 |avg tokens 4806.100 |tokens/s 34454.268 |walltime 3795.425 | +Transformer | epoch 0 | step 27650 |avg loss 4.542 |avg tokens 4782.000 |tokens/s 34297.460 |walltime 3796.819 | +Transformer | epoch 0 | step 27660 |avg loss 5.214 |avg tokens 4627.100 |tokens/s 34107.242 |walltime 3798.176 | +Transformer | epoch 0 | step 27670 |avg loss 5.172 |avg tokens 4930.300 |tokens/s 35884.702 |walltime 3799.549 | +Transformer | epoch 0 | step 27680 |avg loss 5.033 |avg tokens 4390.100 |tokens/s 32021.858 |walltime 3800.920 | +Transformer | epoch 0 | step 27690 |avg loss 4.455 |avg tokens 4929.700 |tokens/s 34987.073 |walltime 3802.329 | +Transformer | epoch 0 | step 27700 |avg loss 5.228 |avg tokens 4740.000 |tokens/s 34805.083 |walltime 3803.691 | +Transformer | epoch 0 | step 27710 |avg loss 4.371 |avg tokens 4806.900 |tokens/s 34084.692 |walltime 3805.102 | +Transformer | epoch 0 | step 27720 |avg loss 5.879 |avg tokens 4865.700 |tokens/s 37299.400 |walltime 3806.406 | +Transformer | epoch 0 | step 27730 |avg loss 5.577 |avg tokens 3989.800 |tokens/s 31156.217 |walltime 3807.687 | +Transformer | epoch 0 | step 27740 |avg loss 4.733 |avg 
tokens 4530.400 |tokens/s 33428.610 |walltime 3809.042 | +Transformer | epoch 0 | step 27750 |avg loss 4.915 |avg tokens 4472.700 |tokens/s 33667.178 |walltime 3810.370 | +Transformer | epoch 0 | step 27760 |avg loss 5.245 |avg tokens 4635.400 |tokens/s 34236.696 |walltime 3811.724 | +Transformer | epoch 0 | step 27770 |avg loss 5.030 |avg tokens 4674.700 |tokens/s 34245.582 |walltime 3813.089 | +Transformer | epoch 0 | step 27780 |avg loss 4.904 |avg tokens 4436.800 |tokens/s 32080.416 |walltime 3814.472 | +Transformer | epoch 0 | step 27790 |avg loss 5.243 |avg tokens 4200.300 |tokens/s 30564.468 |walltime 3815.847 | +Transformer | epoch 0 | step 27800 |avg loss 4.983 |avg tokens 4765.700 |tokens/s 34413.089 |walltime 3817.232 | +Transformer | epoch 0 | step 27810 |avg loss 5.180 |avg tokens 4422.100 |tokens/s 33290.015 |walltime 3818.560 | +Transformer | epoch 0 | step 27820 |avg loss 5.389 |avg tokens 3981.500 |tokens/s 31081.270 |walltime 3819.841 | +Transformer | epoch 0 | step 27830 |avg loss 4.906 |avg tokens 4938.000 |tokens/s 36161.530 |walltime 3821.206 | +Transformer | epoch 0 | step 27840 |avg loss 4.768 |avg tokens 4490.200 |tokens/s 33262.588 |walltime 3822.556 | +Transformer | epoch 0 | step 27850 |avg loss 4.738 |avg tokens 4706.700 |tokens/s 33866.272 |walltime 3823.946 | +Transformer | epoch 0 | step 27860 |avg loss 5.291 |avg tokens 4793.500 |tokens/s 35334.168 |walltime 3825.303 | +Transformer | epoch 0 | step 27870 |avg loss 4.569 |avg tokens 4711.300 |tokens/s 33951.137 |walltime 3826.691 | +Transformer | epoch 0 | step 27880 |avg loss 5.215 |avg tokens 4883.000 |tokens/s 37276.507 |walltime 3828.000 | +Transformer | epoch 0 | step 27890 |avg loss 5.249 |avg tokens 4503.800 |tokens/s 33271.101 |walltime 3829.354 | +Transformer | epoch 0 | step 27900 |avg loss 5.149 |avg tokens 4442.300 |tokens/s 33097.024 |walltime 3830.696 | +Transformer | epoch 0 | step 27910 |avg loss 5.788 |avg tokens 4427.300 |tokens/s 33089.132 |walltime 3832.034 | 
+Transformer | epoch 0 | step 27920 |avg loss 4.767 |avg tokens 4730.900 |tokens/s 33839.626 |walltime 3833.432 | +Transformer | epoch 0 | step 27930 |avg loss 5.219 |avg tokens 4514.800 |tokens/s 33578.869 |walltime 3834.777 | +Transformer | epoch 0 | step 27940 |avg loss 4.548 |avg tokens 4760.600 |tokens/s 34531.068 |walltime 3836.156 | +Transformer | epoch 0 | step 27950 |avg loss 4.953 |avg tokens 4689.300 |tokens/s 33817.367 |walltime 3837.542 | +Transformer | epoch 0 | step 27960 |avg loss 5.058 |avg tokens 4443.000 |tokens/s 34110.206 |walltime 3838.845 | +Transformer | epoch 0 | step 27970 |avg loss 5.327 |avg tokens 4552.400 |tokens/s 34374.342 |walltime 3840.169 | +Transformer | epoch 0 | step 27980 |avg loss 4.976 |avg tokens 4290.600 |tokens/s 31189.108 |walltime 3841.545 | +Transformer | epoch 0 | step 27990 |avg loss 4.992 |avg tokens 4457.900 |tokens/s 32488.161 |walltime 3842.917 | +Transformer | epoch 0 | step 28000 |avg loss 4.884 |avg tokens 4868.900 |tokens/s 35407.953 |walltime 3844.292 | +Transformer | epoch 0 | step 28010 |avg loss 5.239 |avg tokens 4567.700 |tokens/s 32411.294 |walltime 3845.701 | +Transformer | epoch 0 | step 28020 |avg loss 5.060 |avg tokens 4543.900 |tokens/s 33962.336 |walltime 3847.039 | +Transformer | epoch 0 | step 28030 |avg loss 4.777 |avg tokens 4574.200 |tokens/s 32576.889 |walltime 3848.443 | +Transformer | epoch 0 | step 28040 |avg loss 5.200 |avg tokens 4263.900 |tokens/s 32445.959 |walltime 3849.758 | +Transformer | epoch 0 | step 28050 |avg loss 5.513 |avg tokens 4128.200 |tokens/s 31954.210 |walltime 3851.049 | +Transformer | epoch 0 | step 28060 |avg loss 5.374 |avg tokens 4059.200 |tokens/s 30822.866 |walltime 3852.366 | +Transformer | epoch 0 | step 28070 |avg loss 5.018 |avg tokens 4195.400 |tokens/s 30668.411 |walltime 3853.734 | +Transformer | epoch 0 | step 28080 |avg loss 5.564 |avg tokens 4345.300 |tokens/s 32729.592 |walltime 3855.062 | +Transformer | epoch 0 | step 28090 |avg loss 4.899 |avg 
tokens 4719.400 |tokens/s 34303.643 |walltime 3856.438 | +Transformer | epoch 0 | step 28100 |avg loss 4.938 |avg tokens 4711.000 |tokens/s 34307.022 |walltime 3857.811 | +Transformer | epoch 0 | step 28110 |avg loss 5.466 |avg tokens 4224.100 |tokens/s 32793.814 |walltime 3859.099 | +Transformer | epoch 0 | step 28120 |avg loss 5.481 |avg tokens 4530.100 |tokens/s 35033.303 |walltime 3860.392 | +Transformer | epoch 0 | step 28130 |avg loss 5.106 |avg tokens 4391.900 |tokens/s 31180.539 |walltime 3861.801 | +Transformer | epoch 0 | step 28140 |avg loss 5.086 |avg tokens 4812.600 |tokens/s 35153.014 |walltime 3863.170 | +Transformer | epoch 0 | step 28150 |avg loss 4.813 |avg tokens 4538.300 |tokens/s 32963.419 |walltime 3864.547 | +Transformer | epoch 0 | step 28160 |avg loss 5.024 |avg tokens 4474.600 |tokens/s 32710.865 |walltime 3865.914 | +Transformer | epoch 0 | step 28170 |avg loss 5.351 |avg tokens 4367.400 |tokens/s 31508.982 |walltime 3867.301 | +Transformer | epoch 0 | step 28180 |avg loss 5.563 |avg tokens 4644.000 |tokens/s 34663.170 |walltime 3868.640 | +Transformer | epoch 0 | step 28190 |avg loss 5.338 |avg tokens 4658.200 |tokens/s 34016.167 |walltime 3870.010 | +Transformer | epoch 0 | step 28200 |avg loss 4.756 |avg tokens 4601.700 |tokens/s 33761.646 |walltime 3871.373 | +Transformer | epoch 0 | step 28210 |avg loss 5.210 |avg tokens 4873.200 |tokens/s 35486.330 |walltime 3872.746 | +Transformer | epoch 0 | step 28220 |avg loss 4.912 |avg tokens 4781.100 |tokens/s 33341.314 |walltime 3874.180 | +Transformer | epoch 0 | step 28230 |avg loss 5.598 |avg tokens 3927.600 |tokens/s 30298.064 |walltime 3875.476 | +Transformer | epoch 0 | step 28240 |avg loss 5.634 |avg tokens 3893.900 |tokens/s 28847.532 |walltime 3876.826 | +Transformer | epoch 0 | step 28250 |avg loss 5.425 |avg tokens 3799.300 |tokens/s 28329.891 |walltime 3878.167 | +Transformer | epoch 0 | step 28260 |avg loss 5.371 |avg tokens 3983.000 |tokens/s 31077.010 |walltime 3879.449 | 
+Transformer | epoch 0 | step 28270 |avg loss 4.948 |avg tokens 4657.800 |tokens/s 34283.890 |walltime 3880.808 | +Transformer | epoch 0 | step 28280 |avg loss 5.053 |avg tokens 4419.800 |tokens/s 31945.723 |walltime 3882.191 | +Transformer | epoch 0 | step 28290 |avg loss 5.176 |avg tokens 4235.800 |tokens/s 31043.919 |walltime 3883.555 | +Transformer | epoch 0 | step 28300 |avg loss 4.628 |avg tokens 4557.800 |tokens/s 33289.994 |walltime 3884.925 | +Transformer | epoch 0 | step 28310 |avg loss 4.721 |avg tokens 4821.900 |tokens/s 35352.759 |walltime 3886.289 | +Transformer | epoch 0 | step 28320 |avg loss 4.791 |avg tokens 4688.500 |tokens/s 33298.848 |walltime 3887.697 | +Transformer | epoch 0 | step 28330 |avg loss 5.043 |avg tokens 3996.900 |tokens/s 30688.473 |walltime 3888.999 | +Transformer | epoch 0 | step 28340 |avg loss 5.120 |avg tokens 4372.400 |tokens/s 31830.401 |walltime 3890.373 | +Transformer | epoch 0 | step 28350 |avg loss 5.049 |avg tokens 4047.600 |tokens/s 29991.870 |walltime 3891.722 | +Transformer | epoch 0 | step 28360 |avg loss 5.093 |avg tokens 4573.800 |tokens/s 34506.261 |walltime 3893.048 | +Transformer | epoch 0 | step 28370 |avg loss 4.713 |avg tokens 4646.700 |tokens/s 34254.438 |walltime 3894.404 | +Transformer | epoch 0 | step 28380 |avg loss 5.177 |avg tokens 4399.900 |tokens/s 32404.972 |walltime 3895.762 | +Transformer | epoch 0 | step 28390 |avg loss 5.333 |avg tokens 4363.200 |tokens/s 33482.623 |walltime 3897.065 | +Transformer | epoch 0 | step 28400 |avg loss 5.315 |avg tokens 4263.800 |tokens/s 30647.703 |walltime 3898.456 | +Transformer | epoch 0 | step 28410 |avg loss 4.956 |avg tokens 4719.400 |tokens/s 33522.496 |walltime 3899.864 | +Transformer | epoch 0 | step 28420 |avg loss 5.327 |avg tokens 4183.600 |tokens/s 29044.477 |walltime 3901.305 | +Transformer | epoch 0 | step 28430 |avg loss 5.541 |avg tokens 4271.100 |tokens/s 33516.062 |walltime 3902.579 | +Transformer | epoch 0 | step 28440 |avg loss 4.647 |avg 
tokens 4674.800 |tokens/s 33657.695 |walltime 3903.968 | +Transformer | epoch 0 | step 28450 |avg loss 5.340 |avg tokens 4264.800 |tokens/s 32552.314 |walltime 3905.278 | +Transformer | epoch 0 | step 28460 |avg loss 5.214 |avg tokens 4585.100 |tokens/s 34668.881 |walltime 3906.601 | +Transformer | epoch 0 | step 28470 |avg loss 5.559 |avg tokens 3901.300 |tokens/s 30459.674 |walltime 3907.881 | +Transformer | epoch 0 | step 28480 |avg loss 5.168 |avg tokens 4642.600 |tokens/s 33976.868 |walltime 3909.248 | +Transformer | epoch 0 | step 28490 |avg loss 4.728 |avg tokens 4587.600 |tokens/s 33218.493 |walltime 3910.629 | +Transformer | epoch 0 | step 28500 |avg loss 4.690 |avg tokens 4609.800 |tokens/s 33116.244 |walltime 3912.021 | +Transformer | epoch 0 | step 28510 |avg loss 5.264 |avg tokens 4620.000 |tokens/s 33872.689 |walltime 3913.385 | +Transformer | epoch 0 | step 28520 |avg loss 4.760 |avg tokens 4538.800 |tokens/s 34224.634 |walltime 3914.711 | +Transformer | epoch 0 | step 28530 |avg loss 5.709 |avg tokens 4673.600 |tokens/s 34973.156 |walltime 3916.047 | +Transformer | epoch 0 | step 28540 |avg loss 4.895 |avg tokens 4483.500 |tokens/s 33264.120 |walltime 3917.395 | +Transformer | epoch 0 | step 28550 |avg loss 4.937 |avg tokens 4532.600 |tokens/s 33231.690 |walltime 3918.759 | +Transformer | epoch 0 | step 28560 |avg loss 4.366 |avg tokens 4655.700 |tokens/s 32853.650 |walltime 3920.176 | +Transformer | epoch 0 | step 28570 |avg loss 5.272 |avg tokens 4018.200 |tokens/s 29594.229 |walltime 3921.534 | +Transformer | epoch 0 | step 28580 |avg loss 5.631 |avg tokens 4177.700 |tokens/s 32424.501 |walltime 3922.822 | +Transformer | epoch 0 | step 28590 |avg loss 5.791 |avg tokens 3916.900 |tokens/s 30120.936 |walltime 3924.123 | +Transformer | epoch 0 | step 28600 |avg loss 4.964 |avg tokens 4889.200 |tokens/s 35147.319 |walltime 3925.514 | +Transformer | epoch 0 | step 28610 |avg loss 5.068 |avg tokens 4666.700 |tokens/s 34254.931 |walltime 3926.876 | 
+Transformer | epoch 0 | step 28620 |avg loss 5.439 |avg tokens 4046.500 |tokens/s 31709.082 |walltime 3928.152 | +Transformer | epoch 0 | step 28630 |avg loss 5.371 |avg tokens 4551.200 |tokens/s 32810.912 |walltime 3929.540 | +Transformer | epoch 0 | step 28640 |avg loss 4.305 |avg tokens 4863.200 |tokens/s 33293.827 |walltime 3931.000 | +Transformer | epoch 0 | step 28650 |avg loss 5.342 |avg tokens 4513.900 |tokens/s 34421.983 |walltime 3932.312 | +Transformer | epoch 0 | step 28660 |avg loss 4.505 |avg tokens 4805.000 |tokens/s 33091.414 |walltime 3933.764 | +Transformer | epoch 0 | step 28670 |avg loss 5.653 |avg tokens 3893.600 |tokens/s 29492.081 |walltime 3935.084 | +Transformer | epoch 0 | step 28680 |avg loss 5.039 |avg tokens 4460.400 |tokens/s 33230.690 |walltime 3936.426 | +Transformer | epoch 0 | step 28690 |avg loss 4.833 |avg tokens 4760.600 |tokens/s 34095.514 |walltime 3937.822 | +Transformer | epoch 0 | step 28700 |avg loss 4.957 |avg tokens 4822.100 |tokens/s 35015.108 |walltime 3939.199 | +Transformer | epoch 0 | step 28710 |avg loss 4.912 |avg tokens 4520.200 |tokens/s 33103.529 |walltime 3940.565 | +Transformer | epoch 0 | step 28720 |avg loss 4.993 |avg tokens 4722.800 |tokens/s 35556.114 |walltime 3941.893 | +Transformer | epoch 0 | step 28730 |avg loss 5.322 |avg tokens 4709.400 |tokens/s 35141.671 |walltime 3943.233 | +Transformer | epoch 0 | step 28740 |avg loss 5.126 |avg tokens 4149.700 |tokens/s 31553.432 |walltime 3944.548 | +Transformer | epoch 0 | step 28750 |avg loss 5.827 |avg tokens 4351.600 |tokens/s 33424.085 |walltime 3945.850 | +Transformer | epoch 0 | step 28760 |avg loss 4.770 |avg tokens 4407.600 |tokens/s 31756.609 |walltime 3947.238 | +Transformer | epoch 0 | step 28770 |avg loss 4.795 |avg tokens 4573.600 |tokens/s 32857.133 |walltime 3948.630 | +Transformer | epoch 0 | step 28780 |avg loss 4.999 |avg tokens 4540.500 |tokens/s 33460.817 |walltime 3949.987 | +Transformer | epoch 0 | step 28790 |avg loss 4.819 |avg 
tokens 4637.400 |tokens/s 33912.485 |walltime 3951.355 | +Transformer | epoch 0 | step 28800 |avg loss 4.752 |avg tokens 4650.400 |tokens/s 33932.275 |walltime 3952.725 | +Transformer | epoch 0 | step 28810 |avg loss 4.461 |avg tokens 4799.400 |tokens/s 34614.209 |walltime 3954.112 | +Transformer | epoch 0 | step 28820 |avg loss 5.069 |avg tokens 4959.300 |tokens/s 36119.218 |walltime 3955.485 | +Transformer | epoch 0 | step 28830 |avg loss 4.997 |avg tokens 4317.800 |tokens/s 31880.390 |walltime 3956.839 | +Transformer | epoch 0 | step 28840 |avg loss 4.716 |avg tokens 4807.800 |tokens/s 33543.749 |walltime 3958.273 | +Transformer | epoch 0 | step 28850 |avg loss 5.282 |avg tokens 4532.600 |tokens/s 32966.513 |walltime 3959.647 | +Transformer | epoch 0 | step 28860 |avg loss 5.792 |avg tokens 4268.100 |tokens/s 32883.894 |walltime 3960.945 | +Transformer | epoch 0 | step 28870 |avg loss 4.791 |avg tokens 4583.000 |tokens/s 32994.486 |walltime 3962.334 | +Transformer | epoch 0 | step 28880 |avg loss 5.308 |avg tokens 4143.200 |tokens/s 31076.530 |walltime 3963.668 | +Transformer | epoch 0 | step 28890 |avg loss 4.632 |avg tokens 4740.900 |tokens/s 33861.286 |walltime 3965.068 | +Transformer | epoch 0 | step 28900 |avg loss 4.851 |avg tokens 4752.100 |tokens/s 34550.118 |walltime 3966.443 | +Transformer | epoch 0 | step 28910 |avg loss 5.053 |avg tokens 4328.800 |tokens/s 30660.072 |walltime 3967.855 | +Transformer | epoch 0 | step 28920 |avg loss 5.124 |avg tokens 4631.800 |tokens/s 33359.658 |walltime 3969.243 | +Transformer | epoch 0 | step 28930 |avg loss 5.043 |avg tokens 4771.700 |tokens/s 33908.007 |walltime 3970.651 | +Transformer | epoch 0 | step 28940 |avg loss 4.483 |avg tokens 4566.500 |tokens/s 32865.633 |walltime 3972.040 | +Transformer | epoch 0 | step 28950 |avg loss 5.068 |avg tokens 4882.600 |tokens/s 35056.192 |walltime 3973.433 | +Transformer | epoch 0 | step 28960 |avg loss 5.142 |avg tokens 4677.100 |tokens/s 35023.697 |walltime 3974.768 | 
+Transformer | epoch 0 | step 28970 |avg loss 5.096 |avg tokens 4523.200 |tokens/s 33483.267 |walltime 3976.119 | +Transformer | epoch 0 | step 28980 |avg loss 4.919 |avg tokens 4747.900 |tokens/s 35471.259 |walltime 3977.458 | +Transformer | epoch 0 | step 28990 |avg loss 5.364 |avg tokens 4712.800 |tokens/s 35970.304 |walltime 3978.768 | +Transformer | epoch 0 | step 29000 |avg loss 4.615 |avg tokens 4557.100 |tokens/s 33939.105 |walltime 3980.111 | +Transformer | epoch 0 | step 29010 |avg loss 5.316 |avg tokens 4377.900 |tokens/s 31823.072 |walltime 3981.486 | +Transformer | epoch 0 | step 29020 |avg loss 4.834 |avg tokens 4427.300 |tokens/s 31123.074 |walltime 3982.909 | +Transformer | epoch 0 | step 29030 |avg loss 4.983 |avg tokens 4923.200 |tokens/s 35580.754 |walltime 3984.293 | +Transformer | epoch 0 | step 29040 |avg loss 4.839 |avg tokens 4773.200 |tokens/s 34677.293 |walltime 3985.669 | +Transformer | epoch 0 | step 29050 |avg loss 4.845 |avg tokens 4723.600 |tokens/s 34475.337 |walltime 3987.039 | +Transformer | epoch 0 | step 29060 |avg loss 4.888 |avg tokens 4456.300 |tokens/s 32420.417 |walltime 3988.414 | +Transformer | epoch 0 | step 29070 |avg loss 5.028 |avg tokens 4331.000 |tokens/s 31944.455 |walltime 3989.770 | +Transformer | epoch 0 | step 29080 |avg loss 5.103 |avg tokens 4221.600 |tokens/s 31551.636 |walltime 3991.108 | +Transformer | epoch 0 | step 29090 |avg loss 4.513 |avg tokens 4717.600 |tokens/s 32966.057 |walltime 3992.539 | +Transformer | epoch 0 | step 29100 |avg loss 4.974 |avg tokens 4376.700 |tokens/s 32468.789 |walltime 3993.887 | +Transformer | epoch 0 | step 29110 |avg loss 4.967 |avg tokens 4649.700 |tokens/s 33060.150 |walltime 3995.293 | +Transformer | epoch 0 | step 29120 |avg loss 5.147 |avg tokens 4610.600 |tokens/s 34201.672 |walltime 3996.641 | +Transformer | epoch 0 | step 29130 |avg loss 5.609 |avg tokens 4714.700 |tokens/s 35662.681 |walltime 3997.963 | +Transformer | epoch 0 | step 29140 |avg loss 5.122 |avg 
tokens 4613.500 |tokens/s 33649.571 |walltime 3999.334 | +Transformer | epoch 0 | step 29150 |avg loss 5.848 |avg tokens 4192.600 |tokens/s 31644.450 |walltime 4000.659 | +Transformer | epoch 0 | step 29160 |avg loss 5.029 |avg tokens 4444.500 |tokens/s 32245.878 |walltime 4002.037 | +Transformer | epoch 0 | step 29170 |avg loss 4.607 |avg tokens 4629.000 |tokens/s 32302.655 |walltime 4003.470 | +Transformer | epoch 0 | step 29180 |avg loss 5.130 |avg tokens 4206.900 |tokens/s 31556.836 |walltime 4004.804 | +Transformer | epoch 0 | step 29190 |avg loss 5.016 |avg tokens 4455.200 |tokens/s 33187.734 |walltime 4006.146 | +Transformer | epoch 0 | step 29200 |avg loss 5.149 |avg tokens 4730.100 |tokens/s 33601.647 |walltime 4007.554 | +Transformer | epoch 0 | step 29210 |avg loss 4.977 |avg tokens 4720.000 |tokens/s 34253.179 |walltime 4008.932 | +Transformer | epoch 0 | step 29220 |avg loss 5.010 |avg tokens 4709.300 |tokens/s 35029.993 |walltime 4010.276 | +Transformer | epoch 0 | step 29230 |avg loss 5.034 |avg tokens 4490.800 |tokens/s 33757.676 |walltime 4011.606 | +Transformer | epoch 0 | step 29240 |avg loss 5.514 |avg tokens 4389.900 |tokens/s 33283.157 |walltime 4012.925 | +Transformer | epoch 0 | step 29250 |avg loss 4.611 |avg tokens 4619.500 |tokens/s 33210.273 |walltime 4014.316 | +Transformer | epoch 0 | step 29260 |avg loss 5.603 |avg tokens 4257.100 |tokens/s 32109.608 |walltime 4015.642 | +Transformer | epoch 0 | step 29270 |avg loss 4.706 |avg tokens 4556.800 |tokens/s 32806.061 |walltime 4017.031 | +Transformer | epoch 0 | step 29280 |avg loss 5.379 |avg tokens 4375.900 |tokens/s 33591.021 |walltime 4018.334 | +Transformer | epoch 0 | step 29290 |avg loss 5.240 |avg tokens 4246.900 |tokens/s 31534.228 |walltime 4019.681 | +Transformer | epoch 0 | step 29300 |avg loss 4.972 |avg tokens 4630.300 |tokens/s 34171.342 |walltime 4021.036 | +Transformer | epoch 0 | step 29310 |avg loss 5.057 |avg tokens 4346.000 |tokens/s 32254.594 |walltime 4022.383 | 
+Transformer | epoch 0 | step 29320 |avg loss 4.745 |avg tokens 4540.000 |tokens/s 33028.049 |walltime 4023.758 | +Transformer | epoch 0 | step 29330 |avg loss 4.741 |avg tokens 4435.800 |tokens/s 32394.285 |walltime 4025.127 | +Transformer | epoch 0 | step 29340 |avg loss 5.387 |avg tokens 4648.200 |tokens/s 35249.553 |walltime 4026.446 | +Transformer | epoch 0 | step 29350 |avg loss 4.663 |avg tokens 4699.200 |tokens/s 33423.968 |walltime 4027.852 | +Transformer | epoch 0 | step 29360 |avg loss 4.851 |avg tokens 4632.200 |tokens/s 32974.275 |walltime 4029.256 | +Transformer | epoch 0 | step 29370 |avg loss 5.049 |avg tokens 4415.800 |tokens/s 32654.200 |walltime 4030.609 | +Transformer | epoch 0 | step 29380 |avg loss 5.253 |avg tokens 4591.600 |tokens/s 34072.510 |walltime 4031.956 | +Transformer | epoch 0 | step 29390 |avg loss 4.691 |avg tokens 4142.800 |tokens/s 32112.860 |walltime 4033.246 | +Transformer | epoch 0 | step 29400 |avg loss 5.295 |avg tokens 4325.500 |tokens/s 31892.986 |walltime 4034.603 | +Transformer | epoch 0 | step 29410 |avg loss 4.648 |avg tokens 4635.200 |tokens/s 33300.352 |walltime 4035.995 | +Transformer | epoch 0 | step 29420 |avg loss 4.843 |avg tokens 4735.700 |tokens/s 34700.979 |walltime 4037.359 | +Transformer | epoch 0 | step 29430 |avg loss 4.364 |avg tokens 4867.200 |tokens/s 34416.183 |walltime 4038.773 | +Transformer | epoch 0 | step 29440 |avg loss 4.974 |avg tokens 4705.500 |tokens/s 34829.219 |walltime 4040.124 | +Transformer | epoch 0 | step 29450 |avg loss 4.784 |avg tokens 4452.600 |tokens/s 32657.130 |walltime 4041.488 | +Transformer | epoch 0 | step 29460 |avg loss 4.985 |avg tokens 4603.600 |tokens/s 34372.754 |walltime 4042.827 | +Transformer | epoch 0 | step 29470 |avg loss 4.951 |avg tokens 4662.400 |tokens/s 34422.217 |walltime 4044.182 | +Transformer | epoch 0 | step 29480 |avg loss 5.311 |avg tokens 4810.200 |tokens/s 35026.899 |walltime 4045.555 | +Transformer | epoch 0 | step 29490 |avg loss 4.421 |avg 
tokens 4814.900 |tokens/s 34640.617 |walltime 4046.945 | +Transformer | epoch 0 | step 29500 |avg loss 5.058 |avg tokens 4685.000 |tokens/s 34145.540 |walltime 4048.317 | +Transformer | epoch 0 | step 29510 |avg loss 4.750 |avg tokens 4580.700 |tokens/s 32658.694 |walltime 4049.720 | +Transformer | epoch 0 | step 29520 |avg loss 4.633 |avg tokens 4792.600 |tokens/s 34641.782 |walltime 4051.103 | +Transformer | epoch 0 | step 29530 |avg loss 5.588 |avg tokens 4459.500 |tokens/s 33874.370 |walltime 4052.420 | +Transformer | epoch 0 | step 29540 |avg loss 5.184 |avg tokens 3982.400 |tokens/s 30337.868 |walltime 4053.732 | +Transformer | epoch 0 | step 29550 |avg loss 5.204 |avg tokens 4573.000 |tokens/s 33707.205 |walltime 4055.089 | +Transformer | epoch 0 | step 29560 |avg loss 5.186 |avg tokens 4593.200 |tokens/s 33826.879 |walltime 4056.447 | +Transformer | epoch 0 | step 29570 |avg loss 4.694 |avg tokens 4752.700 |tokens/s 34442.328 |walltime 4057.827 | +Transformer | epoch 0 | step 29580 |avg loss 5.175 |avg tokens 4618.100 |tokens/s 32841.303 |walltime 4059.233 | +Transformer | epoch 0 | step 29590 |avg loss 4.539 |avg tokens 4711.800 |tokens/s 34074.657 |walltime 4060.616 | +Transformer | epoch 0 | step 29600 |avg loss 4.718 |avg tokens 4452.900 |tokens/s 32210.765 |walltime 4061.998 | +Transformer | epoch 0 | step 29610 |avg loss 5.431 |avg tokens 4524.600 |tokens/s 33506.557 |walltime 4063.349 | +Transformer | epoch 0 | step 29620 |avg loss 4.910 |avg tokens 4541.100 |tokens/s 33539.861 |walltime 4064.702 | +Transformer | epoch 0 | step 29630 |avg loss 5.166 |avg tokens 4064.000 |tokens/s 29997.512 |walltime 4066.057 | +Transformer | epoch 0 | step 29640 |avg loss 5.374 |avg tokens 4230.000 |tokens/s 32284.502 |walltime 4067.367 | +Transformer | epoch 0 | step 29650 |avg loss 5.144 |avg tokens 4724.000 |tokens/s 34950.252 |walltime 4068.719 | +Transformer | epoch 0 | step 29660 |avg loss 4.844 |avg tokens 4374.400 |tokens/s 33315.097 |walltime 4070.032 | 
+Transformer | epoch 0 | step 29670 |avg loss 5.474 |avg tokens 4153.400 |tokens/s 31564.216 |walltime 4071.348 | +Transformer | epoch 0 | step 29680 |avg loss 5.647 |avg tokens 4234.600 |tokens/s 32469.334 |walltime 4072.652 | +Transformer | epoch 0 | step 29690 |avg loss 5.179 |avg tokens 4113.900 |tokens/s 30486.688 |walltime 4074.002 | +Transformer | epoch 0 | step 29700 |avg loss 4.526 |avg tokens 4836.700 |tokens/s 34791.238 |walltime 4075.392 | +Transformer | epoch 0 | step 29710 |avg loss 5.101 |avg tokens 4756.200 |tokens/s 34986.733 |walltime 4076.751 | +Transformer | epoch 0 | step 29720 |avg loss 5.350 |avg tokens 3896.300 |tokens/s 29703.194 |walltime 4078.063 | +Transformer | epoch 0 | step 29730 |avg loss 5.478 |avg tokens 4776.700 |tokens/s 34517.520 |walltime 4079.447 | +Transformer | epoch 0 | step 29740 |avg loss 4.686 |avg tokens 4856.300 |tokens/s 34760.495 |walltime 4080.844 | +Transformer | epoch 0 | step 29750 |avg loss 4.583 |avg tokens 4848.800 |tokens/s 33987.210 |walltime 4082.271 | +Transformer | epoch 0 | step 29760 |avg loss 5.186 |avg tokens 4346.600 |tokens/s 32261.416 |walltime 4083.618 | +Transformer | epoch 0 | step 29770 |avg loss 5.372 |avg tokens 4587.700 |tokens/s 34425.154 |walltime 4084.951 | +Transformer | epoch 0 | step 29780 |avg loss 4.891 |avg tokens 4500.200 |tokens/s 32607.782 |walltime 4086.331 | +Transformer | epoch 0 | step 29790 |avg loss 5.784 |avg tokens 4390.100 |tokens/s 32346.922 |walltime 4087.688 | +Transformer | epoch 0 | step 29800 |avg loss 4.938 |avg tokens 4533.600 |tokens/s 34039.539 |walltime 4089.020 | +Transformer | epoch 0 | step 29810 |avg loss 5.062 |avg tokens 4423.000 |tokens/s 32353.743 |walltime 4090.387 | +Transformer | epoch 0 | step 29820 |avg loss 4.514 |avg tokens 4803.200 |tokens/s 34135.636 |walltime 4091.794 | +Transformer | epoch 0 | step 29830 |avg loss 4.856 |avg tokens 4735.800 |tokens/s 33762.066 |walltime 4093.197 | +Transformer | epoch 0 | step 29840 |avg loss 4.494 |avg 
tokens 4856.200 |tokens/s 33901.718 |walltime 4094.629 | +Transformer | epoch 0 | step 29850 |avg loss 4.460 |avg tokens 4551.800 |tokens/s 31763.696 |walltime 4096.062 | +Transformer | epoch 0 | step 29860 |avg loss 5.399 |avg tokens 4172.400 |tokens/s 30666.673 |walltime 4097.423 | +Transformer | epoch 0 | step 29870 |avg loss 4.738 |avg tokens 4603.100 |tokens/s 32592.667 |walltime 4098.835 | +Transformer | epoch 0 | step 29880 |avg loss 4.915 |avg tokens 4733.600 |tokens/s 30358.350 |walltime 4100.394 | +Transformer | epoch 0 | step 29890 |avg loss 5.315 |avg tokens 4883.900 |tokens/s 35191.260 |walltime 4101.782 | +Transformer | epoch 0 | step 29900 |avg loss 4.969 |avg tokens 4628.000 |tokens/s 33742.891 |walltime 4103.154 | +Transformer | epoch 0 | step 29910 |avg loss 4.753 |avg tokens 4698.400 |tokens/s 33773.630 |walltime 4104.545 | +Transformer | epoch 0 | step 29920 |avg loss 5.414 |avg tokens 4349.800 |tokens/s 32205.852 |walltime 4105.895 | +Transformer | epoch 0 | step 29930 |avg loss 4.504 |avg tokens 4812.700 |tokens/s 33473.862 |walltime 4107.333 | +Transformer | epoch 0 | step 29940 |avg loss 4.560 |avg tokens 4369.500 |tokens/s 30904.547 |walltime 4108.747 | +Transformer | epoch 0 | step 29950 |avg loss 4.854 |avg tokens 4512.600 |tokens/s 32908.599 |walltime 4110.118 | +Transformer | epoch 0 | step 29960 |avg loss 5.267 |avg tokens 4456.600 |tokens/s 33154.818 |walltime 4111.462 | +Transformer | epoch 0 | step 29970 |avg loss 4.498 |avg tokens 4649.800 |tokens/s 33533.754 |walltime 4112.849 | +Transformer | epoch 0 | step 29980 |avg loss 5.479 |avg tokens 4283.100 |tokens/s 32898.186 |walltime 4114.151 | +Transformer | epoch 0 | step 29990 |avg loss 4.703 |avg tokens 4596.800 |tokens/s 33247.692 |walltime 4115.534 | +Transformer | epoch 0 | step 30000 |avg loss 6.085 |avg tokens 4297.700 |tokens/s 32970.635 |walltime 4116.837 | +Transformer | epoch 0 | step 30010 |avg loss 4.825 |avg tokens 4403.600 |tokens/s 32275.846 |walltime 4118.201 | 
+Transformer | epoch 0 | step 30020 |avg loss 5.378 |avg tokens 4488.100 |tokens/s 32782.397 |walltime 4119.570 | +Transformer | epoch 0 | step 30030 |avg loss 4.626 |avg tokens 4661.300 |tokens/s 34413.460 |walltime 4120.925 | +Transformer | epoch 0 | step 30040 |avg loss 5.836 |avg tokens 4725.100 |tokens/s 35408.753 |walltime 4122.259 | +Transformer | epoch 0 | step 30050 |avg loss 5.245 |avg tokens 4201.100 |tokens/s 30846.372 |walltime 4123.621 | +Transformer | epoch 0 | step 30060 |avg loss 4.576 |avg tokens 4724.100 |tokens/s 32570.038 |walltime 4125.072 | +Transformer | epoch 0 | step 30070 |avg loss 5.302 |avg tokens 3884.800 |tokens/s 29515.169 |walltime 4126.388 | +Transformer | epoch 0 | step 30080 |avg loss 5.493 |avg tokens 4686.700 |tokens/s 34986.101 |walltime 4127.728 | +Transformer | epoch 0 | step 30090 |avg loss 5.308 |avg tokens 4622.200 |tokens/s 35133.711 |walltime 4129.043 | +Transformer | epoch 0 | step 30100 |avg loss 4.803 |avg tokens 4597.200 |tokens/s 33587.689 |walltime 4130.412 | +Transformer | epoch 0 | step 30110 |avg loss 5.374 |avg tokens 4070.800 |tokens/s 31966.139 |walltime 4131.685 | +Transformer | epoch 0 | step 30120 |avg loss 5.122 |avg tokens 4649.800 |tokens/s 33734.216 |walltime 4133.064 | +Transformer | epoch 0 | step 30130 |avg loss 4.763 |avg tokens 4657.800 |tokens/s 33100.028 |walltime 4134.471 | +Transformer | epoch 0 | step 30140 |avg loss 4.606 |avg tokens 4446.900 |tokens/s 32238.893 |walltime 4135.850 | +Transformer | epoch 0 | step 30150 |avg loss 4.727 |avg tokens 4903.200 |tokens/s 35678.276 |walltime 4137.225 | +Transformer | epoch 0 | step 30160 |avg loss 4.842 |avg tokens 4763.900 |tokens/s 34588.023 |walltime 4138.602 | +Transformer | epoch 0 | step 30170 |avg loss 5.112 |avg tokens 4563.300 |tokens/s 33748.321 |walltime 4139.954 | +Transformer | epoch 0 | step 30180 |avg loss 4.628 |avg tokens 4733.100 |tokens/s 33431.845 |walltime 4141.370 | +Transformer | epoch 0 | step 30190 |avg loss 4.780 |avg 
tokens 4346.200 |tokens/s 30756.026 |walltime 4142.783 | +Transformer | epoch 0 | step 30200 |avg loss 5.293 |avg tokens 4313.600 |tokens/s 32213.985 |walltime 4144.122 | +Transformer | epoch 0 | step 30210 |avg loss 5.244 |avg tokens 4048.700 |tokens/s 30581.946 |walltime 4145.446 | +Transformer | epoch 0 | step 30220 |avg loss 4.577 |avg tokens 4744.800 |tokens/s 33786.580 |walltime 4146.850 | +Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 64.0 +Transformer | epoch 0 | step 30230 |avg loss 4.625 |avg tokens 4746.200 |tokens/s 34209.861 |walltime 4148.238 | +Transformer | epoch 0 | step 30240 |avg loss 4.695 |avg tokens 4622.400 |tokens/s 32628.638 |walltime 4149.654 | +Transformer | epoch 0 | step 30250 |avg loss 5.039 |avg tokens 4152.400 |tokens/s 30807.831 |walltime 4151.002 | +Transformer | epoch 0 | step 30260 |avg loss 4.251 |avg tokens 4775.200 |tokens/s 33145.875 |walltime 4152.443 | +Transformer | epoch 0 | step 30270 |avg loss 5.145 |avg tokens 4298.800 |tokens/s 31437.060 |walltime 4153.810 | +Transformer | epoch 0 | step 30280 |avg loss 4.872 |avg tokens 4340.900 |tokens/s 32443.926 |walltime 4155.148 | +Transformer | epoch 0 | step 30290 |avg loss 4.769 |avg tokens 4432.600 |tokens/s 33219.263 |walltime 4156.483 | +Transformer | epoch 0 | step 30300 |avg loss 5.105 |avg tokens 4814.800 |tokens/s 34058.453 |walltime 4157.896 | +Transformer | epoch 0 | step 30310 |avg loss 4.855 |avg tokens 4672.400 |tokens/s 32618.115 |walltime 4159.329 | +Transformer | epoch 0 | step 30320 |avg loss 5.540 |avg tokens 4628.300 |tokens/s 34824.227 |walltime 4160.658 | +Transformer | epoch 0 | step 30330 |avg loss 5.521 |avg tokens 4736.500 |tokens/s 34416.778 |walltime 4162.034 | +Transformer | epoch 0 | step 30340 |avg loss 4.959 |avg tokens 4770.400 |tokens/s 34132.888 |walltime 4163.432 | +Transformer | epoch 0 | step 30350 |avg loss 4.515 |avg tokens 4849.800 |tokens/s 35007.073 |walltime 4164.817 | +Transformer | epoch 0 | step 30360 |avg 
loss 5.371 |avg tokens 4239.700 |tokens/s 32233.028 |walltime 4166.132 | +Transformer | epoch 0 | step 30370 |avg loss 4.907 |avg tokens 4460.300 |tokens/s 32148.763 |walltime 4167.520 | +Transformer | epoch 0 | step 30380 |avg loss 5.041 |avg tokens 4665.100 |tokens/s 33650.003 |walltime 4168.906 | +Transformer | epoch 0 | step 30390 |avg loss 5.263 |avg tokens 4567.000 |tokens/s 34173.174 |walltime 4170.243 | +Transformer | epoch 0 | step 30400 |avg loss 4.952 |avg tokens 4778.000 |tokens/s 35153.025 |walltime 4171.602 | +Transformer | epoch 0 | step 30410 |avg loss 5.336 |avg tokens 4675.400 |tokens/s 33961.870 |walltime 4172.978 | +Transformer | epoch 0 | step 30420 |avg loss 5.165 |avg tokens 4400.200 |tokens/s 32525.278 |walltime 4174.331 | +Transformer | epoch 0 | step 30430 |avg loss 4.910 |avg tokens 4388.100 |tokens/s 31464.449 |walltime 4175.726 | +Transformer | epoch 0 | step 30440 |avg loss 4.890 |avg tokens 4530.400 |tokens/s 32337.149 |walltime 4177.127 | +Transformer | epoch 0 | step 30450 |avg loss 5.298 |avg tokens 4820.100 |tokens/s 35286.294 |walltime 4178.493 | +Transformer | epoch 0 | step 30460 |avg loss 5.151 |avg tokens 4328.100 |tokens/s 31477.222 |walltime 4179.868 | +Transformer | epoch 0 | step 30470 |avg loss 4.719 |avg tokens 4811.700 |tokens/s 34366.498 |walltime 4181.268 | +Transformer | epoch 0 | step 30480 |avg loss 5.371 |avg tokens 4138.700 |tokens/s 31508.547 |walltime 4182.582 | +Transformer | epoch 0 | step 30490 |avg loss 5.199 |avg tokens 4374.500 |tokens/s 33132.332 |walltime 4183.902 | +Transformer | epoch 0 | step 30500 |avg loss 4.971 |avg tokens 4746.400 |tokens/s 33706.415 |walltime 4185.310 | +Transformer | epoch 0 | step 30510 |avg loss 4.380 |avg tokens 4970.400 |tokens/s 34987.847 |walltime 4186.731 | +Transformer | epoch 0 | step 30520 |avg loss 4.821 |avg tokens 4940.700 |tokens/s 35777.205 |walltime 4188.112 | +Transformer | epoch 0 | step 30530 |avg loss 5.507 |avg tokens 4075.200 |tokens/s 30760.919 |walltime 
4189.436 | +Transformer | epoch 0 | step 30540 |avg loss 4.665 |avg tokens 4301.300 |tokens/s 30964.487 |walltime 4190.826 | +Transformer | epoch 0 | step 30550 |avg loss 4.989 |avg tokens 4645.500 |tokens/s 34491.014 |walltime 4192.172 | +Transformer | epoch 0 | step 30560 |avg loss 5.259 |avg tokens 4559.100 |tokens/s 33985.543 |walltime 4193.514 | +Transformer | epoch 0 | step 30570 |avg loss 5.290 |avg tokens 4352.100 |tokens/s 32366.731 |walltime 4194.859 | +Transformer | epoch 0 | step 30580 |avg loss 4.684 |avg tokens 4706.400 |tokens/s 34730.872 |walltime 4196.214 | +Transformer | epoch 0 | step 30590 |avg loss 5.335 |avg tokens 3970.000 |tokens/s 29094.041 |walltime 4197.578 | +Transformer | epoch 0 | step 30600 |avg loss 5.159 |avg tokens 4052.500 |tokens/s 29926.790 |walltime 4198.932 | +Transformer | epoch 0 | step 30610 |avg loss 5.557 |avg tokens 4094.200 |tokens/s 31661.527 |walltime 4200.225 | +Transformer | epoch 0 | step 30620 |avg loss 5.103 |avg tokens 4345.800 |tokens/s 32139.745 |walltime 4201.578 | +Transformer | epoch 0 | step 30630 |avg loss 5.473 |avg tokens 4714.500 |tokens/s 34808.651 |walltime 4202.932 | +Transformer | epoch 0 | step 30640 |avg loss 4.763 |avg tokens 4635.200 |tokens/s 32780.083 |walltime 4204.346 | +Transformer | epoch 0 | step 30650 |avg loss 4.661 |avg tokens 4840.500 |tokens/s 34258.088 |walltime 4205.759 | +Transformer | epoch 0 | step 30660 |avg loss 5.263 |avg tokens 4244.400 |tokens/s 31130.648 |walltime 4207.122 | +Transformer | epoch 0 | step 30670 |avg loss 5.309 |avg tokens 4469.400 |tokens/s 33633.091 |walltime 4208.451 | +Transformer | epoch 0 | step 30680 |avg loss 4.937 |avg tokens 4526.200 |tokens/s 33129.798 |walltime 4209.817 | +Transformer | epoch 0 | step 30690 |avg loss 4.665 |avg tokens 4316.200 |tokens/s 30630.312 |walltime 4211.227 | +Transformer | epoch 0 | step 30700 |avg loss 4.996 |avg tokens 4419.200 |tokens/s 31811.011 |walltime 4212.616 | +Transformer | epoch 0 | step 30710 |avg loss 
5.739 |avg tokens 4346.700 |tokens/s 33304.746 |walltime 4213.921 | +Transformer | epoch 0 | step 30720 |avg loss 5.109 |avg tokens 4170.500 |tokens/s 31019.718 |walltime 4215.265 | +Transformer | epoch 0 | step 30730 |avg loss 5.082 |avg tokens 4090.600 |tokens/s 30423.532 |walltime 4216.610 | +Transformer | epoch 0 | step 30740 |avg loss 5.186 |avg tokens 4807.700 |tokens/s 35658.687 |walltime 4217.958 | +Transformer | epoch 0 | step 30750 |avg loss 5.220 |avg tokens 4152.000 |tokens/s 30580.660 |walltime 4219.316 | +Transformer | epoch 0 | step 30760 |avg loss 5.242 |avg tokens 4089.600 |tokens/s 30518.004 |walltime 4220.656 | +Transformer | epoch 0 | step 30770 |avg loss 4.340 |avg tokens 4983.200 |tokens/s 33643.068 |walltime 4222.137 | +Transformer | epoch 0 | step 30780 |avg loss 4.611 |avg tokens 4831.400 |tokens/s 33103.251 |walltime 4223.597 | +Transformer | epoch 0 | step 30790 |avg loss 5.159 |avg tokens 4396.200 |tokens/s 32250.666 |walltime 4224.960 | +Transformer | epoch 0 | step 30800 |avg loss 4.753 |avg tokens 4478.200 |tokens/s 31784.795 |walltime 4226.369 | +Transformer | epoch 0 | step 30810 |avg loss 5.034 |avg tokens 4150.600 |tokens/s 29327.837 |walltime 4227.784 | +Transformer | epoch 0 | step 30820 |avg loss 4.480 |avg tokens 4687.600 |tokens/s 32038.139 |walltime 4229.247 | +Transformer | epoch 0 | step 30830 |avg loss 4.915 |avg tokens 4306.700 |tokens/s 31098.573 |walltime 4230.632 | +Transformer | epoch 0 | step 30840 |avg loss 4.644 |avg tokens 4649.800 |tokens/s 32116.845 |walltime 4232.080 | +Transformer | epoch 0 | step 30850 |avg loss 4.720 |avg tokens 4397.000 |tokens/s 31189.759 |walltime 4233.490 | +Transformer | epoch 0 | step 30860 |avg loss 4.811 |avg tokens 4922.900 |tokens/s 33834.998 |walltime 4234.945 | +Transformer | epoch 0 | step 30870 |avg loss 4.472 |avg tokens 4726.500 |tokens/s 31575.309 |walltime 4236.441 | +Transformer | epoch 0 | step 30880 |avg loss 4.442 |avg tokens 4740.400 |tokens/s 33104.395 |walltime 
4237.873 | +Transformer | epoch 0 | step 30890 |avg loss 5.046 |avg tokens 4180.900 |tokens/s 30986.443 |walltime 4239.223 | +Transformer | epoch 0 | step 30900 |avg loss 5.233 |avg tokens 4683.500 |tokens/s 33550.432 |walltime 4240.619 | +Transformer | epoch 0 | step 30910 |avg loss 4.998 |avg tokens 4518.200 |tokens/s 32141.422 |walltime 4242.024 | +Transformer | epoch 0 | step 30920 |avg loss 4.787 |avg tokens 4710.200 |tokens/s 31839.580 |walltime 4243.504 | +Transformer | epoch 0 | step 30930 |avg loss 4.543 |avg tokens 4741.800 |tokens/s 32540.802 |walltime 4244.961 | +Transformer | epoch 0 | step 30940 |avg loss 4.762 |avg tokens 4518.800 |tokens/s 32313.807 |walltime 4246.359 | +Transformer | epoch 0 | step 30950 |avg loss 4.583 |avg tokens 4801.600 |tokens/s 33199.257 |walltime 4247.806 | +Transformer | epoch 0 | step 30960 |avg loss 4.809 |avg tokens 4640.500 |tokens/s 33121.799 |walltime 4249.207 | +Transformer | epoch 0 | step 30970 |avg loss 4.709 |avg tokens 4759.100 |tokens/s 34237.575 |walltime 4250.597 | +Transformer | epoch 0 | step 30980 |avg loss 5.860 |avg tokens 3754.900 |tokens/s 29913.615 |walltime 4251.852 | +Transformer | epoch 0 | step 30990 |avg loss 5.022 |avg tokens 4757.400 |tokens/s 34202.290 |walltime 4253.243 | +Transformer | epoch 0 | step 31000 |avg loss 5.204 |avg tokens 4737.200 |tokens/s 33888.602 |walltime 4254.641 | +Transformer | epoch 0 | step 31010 |avg loss 4.663 |avg tokens 4667.500 |tokens/s 33127.874 |walltime 4256.050 | +Transformer | epoch 0 | step 31020 |avg loss 5.052 |avg tokens 4582.100 |tokens/s 33456.661 |walltime 4257.419 | +Transformer | epoch 0 | step 31030 |avg loss 5.057 |avg tokens 4656.100 |tokens/s 34408.956 |walltime 4258.772 | +Transformer | epoch 0 | step 31040 |avg loss 4.626 |avg tokens 4684.800 |tokens/s 33444.384 |walltime 4260.173 | +Transformer | epoch 0 | step 31050 |avg loss 4.906 |avg tokens 4470.100 |tokens/s 32786.056 |walltime 4261.537 | +Transformer | epoch 0 | step 31060 |avg loss 
5.275 |avg tokens 4774.600 |tokens/s 35874.427 |walltime 4262.868 | +Transformer | epoch 0 | step 31070 |avg loss 4.909 |avg tokens 4450.600 |tokens/s 33311.805 |walltime 4264.204 | +Transformer | epoch 0 | step 31080 |avg loss 5.660 |avg tokens 4411.300 |tokens/s 33072.700 |walltime 4265.537 | +Transformer | epoch 0 | step 31090 |avg loss 4.541 |avg tokens 4702.400 |tokens/s 33394.962 |walltime 4266.946 | +Transformer | epoch 0 | step 31100 |avg loss 4.854 |avg tokens 4411.500 |tokens/s 32169.656 |walltime 4268.317 | +Transformer | epoch 0 | step 31110 |avg loss 5.347 |avg tokens 4470.500 |tokens/s 33239.320 |walltime 4269.662 | +Transformer | epoch 0 | step 31120 |avg loss 4.678 |avg tokens 4876.200 |tokens/s 35035.154 |walltime 4271.054 | +Transformer | epoch 0 | step 31130 |avg loss 5.387 |avg tokens 4145.400 |tokens/s 31626.266 |walltime 4272.364 | +Transformer | epoch 0 | step 31140 |avg loss 4.720 |avg tokens 4435.400 |tokens/s 32433.536 |walltime 4273.732 | +Transformer | epoch 0 | step 31150 |avg loss 4.443 |avg tokens 4749.600 |tokens/s 33312.647 |walltime 4275.158 | +Transformer | epoch 0 | step 31160 |avg loss 4.601 |avg tokens 4374.900 |tokens/s 31828.746 |walltime 4276.532 | +Transformer | epoch 0 | step 31170 |avg loss 4.740 |avg tokens 4348.800 |tokens/s 31374.515 |walltime 4277.918 | +Transformer | epoch 0 | step 31180 |avg loss 4.813 |avg tokens 4757.700 |tokens/s 34627.519 |walltime 4279.292 | +Transformer | epoch 0 | step 31190 |avg loss 5.224 |avg tokens 4637.800 |tokens/s 35527.856 |walltime 4280.598 | +Transformer | epoch 0 | step 31200 |avg loss 5.088 |avg tokens 4266.600 |tokens/s 30180.264 |walltime 4282.011 | +Transformer | epoch 0 | step 31210 |avg loss 4.914 |avg tokens 4462.400 |tokens/s 33316.397 |walltime 4283.351 | +Transformer | epoch 0 | step 31220 |avg loss 4.733 |avg tokens 4759.500 |tokens/s 34146.596 |walltime 4284.745 | +Transformer | epoch 0 | step 31230 |avg loss 4.809 |avg tokens 4484.700 |tokens/s 32668.397 |walltime 
4286.117 | +Transformer | epoch 0 | step 31240 |avg loss 4.738 |avg tokens 4776.000 |tokens/s 33722.928 |walltime 4287.534 | +Transformer | epoch 0 | step 31250 |avg loss 4.674 |avg tokens 4780.800 |tokens/s 33709.424 |walltime 4288.952 | +Transformer | epoch 0 | step 31260 |avg loss 4.927 |avg tokens 4218.800 |tokens/s 30090.204 |walltime 4290.354 | +Transformer | epoch 0 | step 31270 |avg loss 5.284 |avg tokens 4297.300 |tokens/s 30994.602 |walltime 4291.740 | +Transformer | epoch 0 | step 31280 |avg loss 4.640 |avg tokens 4646.700 |tokens/s 32729.052 |walltime 4293.160 | +Transformer | epoch 0 | step 31290 |avg loss 4.784 |avg tokens 4589.600 |tokens/s 32747.475 |walltime 4294.562 | +Transformer | epoch 0 | step 31300 |avg loss 5.270 |avg tokens 4459.000 |tokens/s 32583.188 |walltime 4295.930 | +Transformer | epoch 0 | step 31310 |avg loss 4.665 |avg tokens 4796.500 |tokens/s 34653.280 |walltime 4297.314 | +Transformer | epoch 0 | step 31320 |avg loss 4.488 |avg tokens 4823.300 |tokens/s 33547.510 |walltime 4298.752 | +Transformer | epoch 0 | step 31330 |avg loss 5.436 |avg tokens 4161.700 |tokens/s 31687.883 |walltime 4300.065 | +Transformer | epoch 0 | step 31340 |avg loss 4.568 |avg tokens 4619.400 |tokens/s 33421.637 |walltime 4301.448 | +Transformer | epoch 0 | step 31350 |avg loss 5.184 |avg tokens 4284.100 |tokens/s 31776.362 |walltime 4302.796 | +Transformer | epoch 0 | step 31360 |avg loss 5.026 |avg tokens 4272.200 |tokens/s 31338.702 |walltime 4304.159 | +Transformer | epoch 0 | step 31370 |avg loss 5.341 |avg tokens 4303.000 |tokens/s 32396.325 |walltime 4305.487 | +Transformer | epoch 0 | step 31380 |avg loss 4.680 |avg tokens 4485.600 |tokens/s 31796.790 |walltime 4306.898 | +Transformer | epoch 0 | step 31390 |avg loss 4.923 |avg tokens 4600.800 |tokens/s 33861.333 |walltime 4308.257 | +Transformer | epoch 0 | step 31400 |avg loss 4.945 |avg tokens 4280.100 |tokens/s 31543.645 |walltime 4309.614 | +Transformer | epoch 0 | step 31410 |avg loss 
4.262 |avg tokens 4780.500 |tokens/s 33451.329 |walltime 4311.043 | +Transformer | epoch 0 | step 31420 |avg loss 4.992 |avg tokens 4155.000 |tokens/s 30687.959 |walltime 4312.397 | +Transformer | epoch 0 | step 31430 |avg loss 5.285 |avg tokens 3986.700 |tokens/s 30536.715 |walltime 4313.702 | +Transformer | epoch 0 | step 31440 |avg loss 4.992 |avg tokens 4487.100 |tokens/s 32455.290 |walltime 4315.085 | +Transformer | epoch 0 | step 31450 |avg loss 4.440 |avg tokens 4595.100 |tokens/s 33169.350 |walltime 4316.470 | +Transformer | epoch 0 | step 31460 |avg loss 4.615 |avg tokens 4754.400 |tokens/s 34588.826 |walltime 4317.845 | +Transformer | epoch 0 | step 31470 |avg loss 4.862 |avg tokens 4802.100 |tokens/s 34766.451 |walltime 4319.226 | +Transformer | epoch 0 | step 31480 |avg loss 4.605 |avg tokens 4997.600 |tokens/s 36468.275 |walltime 4320.596 | +Epoch time: 4311.053228378296 +Transformer | epoch 0 | step 31487 |avg loss 4.651 |avg tokens 4388.143 |tokens/s 26045.321 |walltime 4321.776 | +Validation loss on subset valid: 4.47371638012596 /workspace/translation/fairseq/sequence_generator.py:376: UserWarning: Integer division of tensors using div or / is deprecated, and in a future release div will perform true division as in Python 3. Use true_divide or floor_divide (// in Python) instead. (Triggered internally at ../aten/src/ATen/native/BinaryOps.cpp:66.) 
torch.div(cand_indices, self.vocab_size, out=cand_beams) -| Translated 3000 sentences (98034 tokens) in 66.5s (45.11 sentences/s, 1474.17 tokens/s) -| Eval completed in: 87.92s | UNCASED BLEU 0.70 -| done training in 4765.1 seconds -Transformer | epoch 0 | step RUN |avg loss 8.048 |walltime 4775.538 | +| Translated 3000 sentences (80276 tokens) in 18.6s (161.26 sentences/s, 4315.16 tokens/s) +| Eval completed in: 40.31s | UNCASED BLEU 10.28 +| done training in 4365.6 seconds +Transformer | epoch 0 | step RUN |avg loss 4.474 |walltime 4376.398 | diff --git a/Transformer/OtherReports/PyTorch/logs/transformer.pyt_transformer_fp32_bs2560_gpu1.log b/Transformer/OtherReports/PyTorch/logs/transformer.pyt_transformer_fp32_bs2560_gpu1.log index 2fbb731..39d150c 100644 --- a/Transformer/OtherReports/PyTorch/logs/transformer.pyt_transformer_fp32_bs2560_gpu1.log +++ b/Transformer/OtherReports/PyTorch/logs/transformer.pyt_transformer_fp32_bs2560_gpu1.log @@ -1,5 +1,5 @@ nohup: ignoring input -Namespace(adam_betas='(0.9, 0.997)', adam_eps=1e-09, adaptive_softmax_cutoff=None, amp=False, amp_level='O1', arch='transformer_wmt_en_de_big_t2t', attention_dropout=0.1, beam=4, bpe_codes=None, buffer_size=64, clip_norm=0.0, cpu=False, criterion='label_smoothed_cross_entropy', data='/data/wmt14_en_de_joined_dict', decoder_attention_heads=16, decoder_embed_dim=1024, decoder_embed_path=None, decoder_ffn_embed_dim=4096, decoder_layers=6, decoder_learned_pos=False, decoder_normalize_before=True, device_id=0, distributed_backend='nccl', distributed_init_method=None, distributed_port=-1, distributed_rank=0, distributed_world_size=1, do_sanity_check=False, dropout=0.1, enable_parallel_backward_allred_opt=False, enable_parallel_backward_allred_opt_correctness_check=False, encoder_attention_heads=16, encoder_embed_dim=1024, encoder_embed_path=None, encoder_ffn_embed_dim=4096, encoder_layers=6, encoder_learned_pos=False, encoder_normalize_before=True, fp16=False, fuse_dropout_add=False, 
fuse_layer_norm=True, fuse_relu_dropout=False, gen_subset='test', keep_interval_updates=-1, label_smoothing=0.1, left_pad_source=True, left_pad_target=False, lenpen=1, local_rank=0, log_interval=500, lr=[0.0006], lr_scheduler='inverse_sqrt', lr_shrink=0.1, max_epoch=1, max_len_a=0, max_len_b=200, max_positions=(1024, 1024), max_sentences=None, max_sentences_valid=None, max_source_positions=1024, max_target_positions=1024, max_tokens=2560, max_update=0, min_len=1, min_loss_scale=0.0001, min_lr=0.0, model_overrides='{}', momentum=0.99, nbest=1, no_beamable_mm=False, no_early_stop=False, no_epoch_checkpoints=False, no_save=False, no_token_positional_embeddings=False, num_shards=1, online_eval=False, optimizer='adam', pad_sequence=1, parallel_backward_allred_opt_threshold=0, path=None, prefix_size=0, print_alignment=False, profile=False, profiler_file=None, profiler_steps=100, quiet=False, raw_text=False, relu_dropout=0.1, remove_bpe=None, replace_unk=None, restore_file='checkpoint_last.pt', sampling=False, sampling_temperature=1, sampling_topk=-1, save_dir='/results/checkpoints', save_interval=1, save_interval_updates=0, save_predictions=False, score_reference=False, seed=1, sentence_avg=False, sentencepiece=False, shard_id=0, share_all_embeddings=True, share_decoder_input_output_embed=False, skip_invalid_size_inputs_valid_test=False, source_lang=None, stat_file='/results/run_log.json', target_bleu=0.0, target_lang=None, test_cased_bleu=False, train_subset='train', unkpen=0, unnormalized=False, update_freq=[1], valid_subset='valid', validate_interval=1, warmup_init_lr=0.0, warmup_updates=4000, weight_decay=0.0) +Namespace(adam_betas='(0.9, 0.997)', adam_eps=1e-09, adaptive_softmax_cutoff=None, amp=False, amp_level='O1', arch='transformer_wmt_en_de_big_t2t', attention_dropout=0.1, beam=4, bpe_codes=None, buffer_size=64, clip_norm=0.0, cpu=False, criterion='label_smoothed_cross_entropy', data='/data/wmt14_en_de_joined_dict', decoder_attention_heads=16, 
decoder_embed_dim=1024, decoder_embed_path=None, decoder_ffn_embed_dim=4096, decoder_layers=6, decoder_learned_pos=False, decoder_normalize_before=True, device_id=0, distributed_backend='nccl', distributed_init_method=None, distributed_port=-1, distributed_rank=0, distributed_world_size=1, do_sanity_check=False, dropout=0.1, enable_parallel_backward_allred_opt=False, enable_parallel_backward_allred_opt_correctness_check=False, encoder_attention_heads=16, encoder_embed_dim=1024, encoder_embed_path=None, encoder_ffn_embed_dim=4096, encoder_layers=6, encoder_learned_pos=False, encoder_normalize_before=True, fp16=False, fuse_dropout_add=False, fuse_layer_norm=True, fuse_relu_dropout=False, gen_subset='test', keep_interval_updates=-1, label_smoothing=0.1, left_pad_source=True, left_pad_target=False, lenpen=1, local_rank=0, log_interval=10, lr=[0.0006], lr_scheduler='inverse_sqrt', lr_shrink=0.1, max_epoch=1, max_len_a=0, max_len_b=200, max_positions=(1024, 1024), max_sentences=None, max_sentences_valid=None, max_source_positions=1024, max_target_positions=1024, max_tokens=2560, max_update=0, min_len=1, min_loss_scale=0.0001, min_lr=0.0, model_overrides='{}', momentum=0.99, nbest=1, no_beamable_mm=False, no_early_stop=False, no_epoch_checkpoints=False, no_save=False, no_token_positional_embeddings=False, num_shards=1, online_eval=False, optimizer='adam', pad_sequence=1, parallel_backward_allred_opt_threshold=0, path=None, prefix_size=0, print_alignment=False, profile=False, profiler_file=None, profiler_steps=100, quiet=False, raw_text=False, relu_dropout=0.1, remove_bpe=None, replace_unk=None, restore_file='checkpoint_last.pt', sampling=False, sampling_temperature=1, sampling_topk=-1, save_dir='/results/checkpoints', save_interval=1, save_interval_updates=0, save_predictions=False, score_reference=False, seed=1, sentence_avg=False, sentencepiece=False, shard_id=0, share_all_embeddings=True, share_decoder_input_output_embed=False, 
skip_invalid_size_inputs_valid_test=False, source_lang=None, stat_file='/results/run_log.json', target_bleu=0.0, target_lang=None, test_cased_bleu=False, train_subset='train', unkpen=0, unnormalized=False, update_freq=[1], valid_subset='valid', validate_interval=1, warmup_init_lr=0.0, warmup_updates=4000, weight_decay=0.0) | [en] dictionary: 33712 types | [de] dictionary: 33712 types | /data/wmt14_en_de_joined_dict train 4575637 examples @@ -13,142 +13,6531 @@ Namespace(adam_betas='(0.9, 0.997)', adam_eps=1e-09, adaptive_softmax_cutoff=Non | Sentences are being padded to multiples of: 1 | Sentences are being padded to multiples of: 1 | Sentences are being padded to multiples of: 1 -Transformer | epoch 0 | step 500 |avg loss 12.152 |avg tokens 2195.818 |tokens/s 8289.358 |walltime 142.634 | -Transformer | epoch 0 | step 1000 |avg loss 10.630 |avg tokens 2202.080 |tokens/s 8228.213 |walltime 276.447 | -Transformer | epoch 0 | step 1500 |avg loss 9.917 |avg tokens 2206.962 |tokens/s 8312.206 |walltime 409.201 | -Transformer | epoch 0 | step 2000 |avg loss 9.383 |avg tokens 2171.184 |tokens/s 8177.711 |walltime 541.951 | -Transformer | epoch 0 | step 2500 |avg loss 8.994 |avg tokens 2185.496 |tokens/s 8263.584 |walltime 674.188 | -Transformer | epoch 0 | step 3000 |avg loss 8.660 |avg tokens 2204.280 |tokens/s 8301.811 |walltime 806.947 | -Transformer | epoch 0 | step 3500 |avg loss 8.411 |avg tokens 2195.928 |tokens/s 8263.810 |walltime 939.811 | -Transformer | epoch 0 | step 4000 |avg loss 8.222 |avg tokens 2206.462 |tokens/s 8299.936 |walltime 1072.731 | -Transformer | epoch 0 | step 4500 |avg loss 8.082 |avg tokens 2182.408 |tokens/s 8276.575 |walltime 1204.574 | -Transformer | epoch 0 | step 5000 |avg loss 7.891 |avg tokens 2192.006 |tokens/s 8297.123 |walltime 1336.668 | -Transformer | epoch 0 | step 5500 |avg loss 7.858 |avg tokens 2150.818 |tokens/s 8207.926 |walltime 1467.689 | -Transformer | epoch 0 | step 6000 |avg loss 7.723 |avg tokens 2184.456 |tokens/s 
8273.980 |walltime 1599.697 | -Transformer | epoch 0 | step 6500 |avg loss 7.624 |avg tokens 2188.844 |tokens/s 8280.595 |walltime 1731.864 | -Transformer | epoch 0 | step 7000 |avg loss 7.616 |avg tokens 2169.096 |tokens/s 8213.447 |walltime 1863.909 | -Transformer | epoch 0 | step 7500 |avg loss 7.600 |avg tokens 2200.412 |tokens/s 8328.036 |walltime 1996.018 | -Transformer | epoch 0 | step 8000 |avg loss 7.586 |avg tokens 2179.380 |tokens/s 8275.324 |walltime 2127.697 | -Transformer | epoch 0 | step 8500 |avg loss 7.550 |avg tokens 2201.336 |tokens/s 8306.888 |walltime 2260.198 | -Transformer | epoch 0 | step 9000 |avg loss 7.437 |avg tokens 2186.126 |tokens/s 8228.255 |walltime 2393.040 | -Transformer | epoch 0 | step 9500 |avg loss 7.460 |avg tokens 2194.148 |tokens/s 8258.480 |walltime 2525.883 | -Transformer | epoch 0 | step 10000 |avg loss 7.474 |avg tokens 2192.180 |tokens/s 8247.883 |walltime 2658.776 | -Transformer | epoch 0 | step 10500 |avg loss 7.507 |avg tokens 2149.200 |tokens/s 8153.027 |walltime 2790.580 | -Transformer | epoch 0 | step 11000 |avg loss 7.589 |avg tokens 2169.804 |tokens/s 8250.802 |walltime 2922.070 | -Transformer | epoch 0 | step 11500 |avg loss 7.571 |avg tokens 2169.048 |tokens/s 8224.166 |walltime 3053.941 | -Transformer | epoch 0 | step 12000 |avg loss 7.559 |avg tokens 2196.918 |tokens/s 8321.068 |walltime 3185.950 | -Transformer | epoch 0 | step 12500 |avg loss 7.508 |avg tokens 2182.824 |tokens/s 8205.070 |walltime 3318.967 | -Transformer | epoch 0 | step 13000 |avg loss 7.531 |avg tokens 2203.356 |tokens/s 8294.488 |walltime 3451.788 | -Transformer | epoch 0 | step 13500 |avg loss 7.568 |avg tokens 2217.090 |tokens/s 8380.963 |walltime 3584.057 | -Transformer | epoch 0 | step 14000 |avg loss 7.592 |avg tokens 2166.636 |tokens/s 8187.261 |walltime 3716.374 | -Transformer | epoch 0 | step 14500 |avg loss 7.608 |avg tokens 2170.448 |tokens/s 8227.936 |walltime 3848.270 | -Transformer | epoch 0 | step 15000 |avg loss 7.622 
|avg tokens 2201.498 |tokens/s 8309.194 |walltime 3980.743 | -Transformer | epoch 0 | step 15500 |avg loss 7.629 |avg tokens 2192.570 |tokens/s 8152.767 |walltime 4115.211 | -Transformer | epoch 0 | step 16000 |avg loss 7.591 |avg tokens 2207.126 |tokens/s 8229.432 |walltime 4249.311 | -Transformer | epoch 0 | step 16500 |avg loss 7.664 |avg tokens 2186.202 |tokens/s 8209.315 |walltime 4382.464 | -Transformer | epoch 0 | step 17000 |avg loss 7.657 |avg tokens 2189.744 |tokens/s 8197.156 |walltime 4516.032 | -Transformer | epoch 0 | step 17500 |avg loss 7.635 |avg tokens 2169.092 |tokens/s 8114.278 |walltime 4649.691 | -Transformer | epoch 0 | step 18000 |avg loss 7.679 |avg tokens 2165.366 |tokens/s 8174.168 |walltime 4782.142 | -Transformer | epoch 0 | step 18500 |avg loss 7.607 |avg tokens 2196.778 |tokens/s 8187.382 |walltime 4916.299 | -Transformer | epoch 0 | step 19000 |avg loss 7.680 |avg tokens 2184.738 |tokens/s 8244.761 |walltime 5048.791 | -Transformer | epoch 0 | step 19500 |avg loss 7.651 |avg tokens 2166.342 |tokens/s 8219.323 |walltime 5180.575 | -Transformer | epoch 0 | step 20000 |avg loss 7.670 |avg tokens 2161.914 |tokens/s 8225.962 |walltime 5311.983 | -Transformer | epoch 0 | step 20500 |avg loss 7.680 |avg tokens 2166.076 |tokens/s 8209.319 |walltime 5443.911 | -Transformer | epoch 0 | step 21000 |avg loss 7.748 |avg tokens 2191.680 |tokens/s 8306.427 |walltime 5575.837 | -Transformer | epoch 0 | step 21500 |avg loss 7.697 |avg tokens 2194.442 |tokens/s 8309.224 |walltime 5707.886 | -Transformer | epoch 0 | step 22000 |avg loss 7.689 |avg tokens 2204.234 |tokens/s 8307.733 |walltime 5840.547 | -Transformer | epoch 0 | step 22500 |avg loss 7.699 |avg tokens 2172.204 |tokens/s 8269.636 |walltime 5971.884 | -Transformer | epoch 0 | step 23000 |avg loss 7.635 |avg tokens 2172.254 |tokens/s 8226.100 |walltime 6103.918 | -Transformer | epoch 0 | step 23500 |avg loss 7.683 |avg tokens 2178.170 |tokens/s 8304.169 |walltime 6235.067 | -Transformer | 
epoch 0 | step 24000 |avg loss 7.701 |avg tokens 2163.650 |tokens/s 8237.664 |walltime 6366.394 | -Transformer | epoch 0 | step 24500 |avg loss 7.637 |avg tokens 2169.594 |tokens/s 8213.130 |walltime 6498.475 | -Transformer | epoch 0 | step 25000 |avg loss 7.607 |avg tokens 2197.396 |tokens/s 8290.153 |walltime 6631.005 | -Transformer | epoch 0 | step 25500 |avg loss 7.616 |avg tokens 2205.076 |tokens/s 8286.256 |walltime 6764.061 | -Transformer | epoch 0 | step 26000 |avg loss 7.589 |avg tokens 2215.762 |tokens/s 8329.483 |walltime 6897.069 | -Transformer | epoch 0 | step 26500 |avg loss 7.615 |avg tokens 2203.484 |tokens/s 8313.073 |walltime 7029.600 | -Transformer | epoch 0 | step 27000 |avg loss 7.633 |avg tokens 2177.088 |tokens/s 8257.757 |walltime 7161.421 | -Transformer | epoch 0 | step 27500 |avg loss 7.626 |avg tokens 2186.434 |tokens/s 8254.484 |walltime 7293.860 | -Transformer | epoch 0 | step 28000 |avg loss 7.655 |avg tokens 2194.886 |tokens/s 8279.447 |walltime 7426.410 | -Transformer | epoch 0 | step 28500 |avg loss 7.636 |avg tokens 2194.806 |tokens/s 8327.890 |walltime 7558.184 | -Transformer | epoch 0 | step 29000 |avg loss 7.669 |avg tokens 2164.240 |tokens/s 8250.329 |walltime 7689.345 | -Transformer | epoch 0 | step 29500 |avg loss 7.639 |avg tokens 2199.542 |tokens/s 8324.032 |walltime 7821.465 | -Transformer | epoch 0 | step 30000 |avg loss 7.660 |avg tokens 2167.926 |tokens/s 8250.513 |walltime 7952.847 | -Transformer | epoch 0 | step 30500 |avg loss 7.661 |avg tokens 2195.226 |tokens/s 8314.417 |walltime 8084.860 | -Transformer | epoch 0 | step 31000 |avg loss 7.687 |avg tokens 2180.980 |tokens/s 8291.038 |walltime 8216.386 | -Transformer | epoch 0 | step 31500 |avg loss 7.632 |avg tokens 2180.762 |tokens/s 8259.813 |walltime 8348.397 | -Transformer | epoch 0 | step 32000 |avg loss 7.606 |avg tokens 2193.666 |tokens/s 8318.803 |walltime 8480.246 | -Transformer | epoch 0 | step 32500 |avg loss 7.658 |avg tokens 2165.796 |tokens/s 8270.194 
|walltime 8611.186 | -Transformer | epoch 0 | step 33000 |avg loss 7.666 |avg tokens 2182.462 |tokens/s 8276.988 |walltime 8743.025 | -Transformer | epoch 0 | step 33500 |avg loss 7.631 |avg tokens 2200.074 |tokens/s 8322.422 |walltime 8875.203 | -Transformer | epoch 0 | step 34000 |avg loss 7.577 |avg tokens 2211.412 |tokens/s 8341.737 |walltime 9007.754 | -Transformer | epoch 0 | step 34500 |avg loss 7.618 |avg tokens 2174.824 |tokens/s 8299.537 |walltime 9138.775 | -Transformer | epoch 0 | step 35000 |avg loss 7.602 |avg tokens 2174.564 |tokens/s 8271.678 |walltime 9270.221 | -Transformer | epoch 0 | step 35500 |avg loss 7.679 |avg tokens 2162.148 |tokens/s 8243.202 |walltime 9401.369 | -Transformer | epoch 0 | step 36000 |avg loss 7.601 |avg tokens 2165.980 |tokens/s 8197.896 |walltime 9533.474 | -Transformer | epoch 0 | step 36500 |avg loss 7.654 |avg tokens 2203.624 |tokens/s 8294.220 |walltime 9666.315 | -Transformer | epoch 0 | step 37000 |avg loss 7.662 |avg tokens 2163.496 |tokens/s 8162.756 |walltime 9798.838 | -Transformer | epoch 0 | step 37500 |avg loss 7.597 |avg tokens 2172.708 |tokens/s 8160.810 |walltime 9931.956 | -Transformer | epoch 0 | step 38000 |avg loss 7.569 |avg tokens 2200.082 |tokens/s 8258.275 |walltime 10065.161 | -Transformer | epoch 0 | step 38500 |avg loss 7.595 |avg tokens 2195.128 |tokens/s 8272.491 |walltime 10197.837 | -Transformer | epoch 0 | step 39000 |avg loss 7.565 |avg tokens 2222.478 |tokens/s 8259.310 |walltime 10332.381 | -Transformer | epoch 0 | step 39500 |avg loss 7.607 |avg tokens 2195.140 |tokens/s 8306.001 |walltime 10464.523 | -Transformer | epoch 0 | step 40000 |avg loss 7.575 |avg tokens 2185.690 |tokens/s 8245.382 |walltime 10597.063 | -Transformer | epoch 0 | step 40500 |avg loss 7.563 |avg tokens 2207.220 |tokens/s 8236.402 |walltime 10731.055 | -Transformer | epoch 0 | step 41000 |avg loss 7.560 |avg tokens 2187.070 |tokens/s 8160.796 |walltime 10865.054 | -Transformer | epoch 0 | step 41500 |avg loss 
7.597 |avg tokens 2163.030 |tokens/s 8155.508 |walltime 10997.665 | -Transformer | epoch 0 | step 42000 |avg loss 7.563 |avg tokens 2152.882 |tokens/s 8121.644 |walltime 11130.205 | -Transformer | epoch 0 | step 42500 |avg loss 7.549 |avg tokens 2216.850 |tokens/s 8258.616 |walltime 11264.419 | -Transformer | epoch 0 | step 43000 |avg loss 7.590 |avg tokens 2175.198 |tokens/s 8150.641 |walltime 11397.857 | -Transformer | epoch 0 | step 43500 |avg loss 7.576 |avg tokens 2187.446 |tokens/s 8150.600 |walltime 11532.046 | -Transformer | epoch 0 | step 44000 |avg loss 7.541 |avg tokens 2193.696 |tokens/s 8226.812 |walltime 11665.372 | -Transformer | epoch 0 | step 44500 |avg loss 7.548 |avg tokens 2167.230 |tokens/s 8153.264 |walltime 11798.278 | -Transformer | epoch 0 | step 45000 |avg loss 7.520 |avg tokens 2162.548 |tokens/s 8161.142 |walltime 11930.768 | -Transformer | epoch 0 | step 45500 |avg loss 7.479 |avg tokens 2178.632 |tokens/s 8151.418 |walltime 12064.403 | -Transformer | epoch 0 | step 46000 |avg loss 7.515 |avg tokens 2166.516 |tokens/s 8150.569 |walltime 12197.309 | -Transformer | epoch 0 | step 46500 |avg loss 7.544 |avg tokens 2177.210 |tokens/s 8173.620 |walltime 12330.494 | -Transformer | epoch 0 | step 47000 |avg loss 7.516 |avg tokens 2159.966 |tokens/s 8112.725 |walltime 12463.616 | -Transformer | epoch 0 | step 47500 |avg loss 7.445 |avg tokens 2183.310 |tokens/s 8201.379 |walltime 12596.723 | -Transformer | epoch 0 | step 48000 |avg loss 7.485 |avg tokens 2207.598 |tokens/s 8241.243 |walltime 12730.659 | -Transformer | epoch 0 | step 48500 |avg loss 7.527 |avg tokens 2174.254 |tokens/s 8205.941 |walltime 12863.139 | -Transformer | epoch 0 | step 49000 |avg loss 7.517 |avg tokens 2180.624 |tokens/s 8209.757 |walltime 12995.946 | -Transformer | epoch 0 | step 49500 |avg loss 7.535 |avg tokens 2156.342 |tokens/s 8178.281 |walltime 13127.779 | -Transformer | epoch 0 | step 50000 |avg loss 7.476 |avg tokens 2199.442 |tokens/s 8208.834 |walltime 
13261.747 | -Transformer | epoch 0 | step 50500 |avg loss 7.512 |avg tokens 2164.196 |tokens/s 8200.566 |walltime 13393.702 | -Transformer | epoch 0 | step 51000 |avg loss 7.519 |avg tokens 2210.832 |tokens/s 8300.968 |walltime 13526.869 | -Transformer | epoch 0 | step 51500 |avg loss 7.526 |avg tokens 2170.272 |tokens/s 8154.514 |walltime 13659.940 | -Transformer | epoch 0 | step 52000 |avg loss 7.566 |avg tokens 2144.520 |tokens/s 8236.081 |walltime 13790.131 | -Transformer | epoch 0 | step 52500 |avg loss 7.477 |avg tokens 2173.838 |tokens/s 8202.748 |walltime 13922.638 | -Transformer | epoch 0 | step 53000 |avg loss 7.471 |avg tokens 2186.586 |tokens/s 8320.763 |walltime 14054.031 | -Transformer | epoch 0 | step 53500 |avg loss 7.493 |avg tokens 2162.470 |tokens/s 8276.680 |walltime 14184.667 | -Transformer | epoch 0 | step 54000 |avg loss 7.511 |avg tokens 2185.144 |tokens/s 8339.763 |walltime 14315.675 | -Transformer | epoch 0 | step 54500 |avg loss 7.515 |avg tokens 2181.010 |tokens/s 8323.035 |walltime 14446.698 | -Transformer | epoch 0 | step 55000 |avg loss 7.471 |avg tokens 2164.734 |tokens/s 8169.513 |walltime 14579.186 | -Transformer | epoch 0 | step 55500 |avg loss 7.462 |avg tokens 2175.078 |tokens/s 8194.102 |walltime 14711.908 | -Transformer | epoch 0 | step 56000 |avg loss 7.465 |avg tokens 2165.984 |tokens/s 8260.911 |walltime 14843.007 | -Transformer | epoch 0 | step 56500 |avg loss 7.467 |avg tokens 2200.316 |tokens/s 8346.269 |walltime 14974.821 | -Transformer | epoch 0 | step 57000 |avg loss 7.519 |avg tokens 2158.848 |tokens/s 8250.587 |walltime 15105.651 | -Transformer | epoch 0 | step 57500 |avg loss 7.450 |avg tokens 2168.044 |tokens/s 8260.896 |walltime 15236.874 | -Transformer | epoch 0 | step 58000 |avg loss 7.454 |avg tokens 2158.620 |tokens/s 8225.355 |walltime 15368.092 | -Transformer | epoch 0 | step 58500 |avg loss 7.475 |avg tokens 2188.858 |tokens/s 8303.732 |walltime 15499.891 | -Transformer | epoch 0 | step 59000 |avg loss 
7.450 |avg tokens 2168.490 |tokens/s 8143.818 |walltime 15633.029 | -Transformer | epoch 0 | step 59500 |avg loss 7.468 |avg tokens 2155.120 |tokens/s 8118.629 |walltime 15765.755 | -Transformer | epoch 0 | step 60000 |avg loss 7.389 |avg tokens 2186.216 |tokens/s 8323.939 |walltime 15897.077 | -Transformer | epoch 0 | step 60500 |avg loss 7.435 |avg tokens 2178.198 |tokens/s 8281.456 |walltime 16028.587 | -Transformer | epoch 0 | step 61000 |avg loss 7.452 |avg tokens 2154.616 |tokens/s 8227.945 |walltime 16159.520 | -Transformer | epoch 0 | step 61500 |avg loss 7.475 |avg tokens 2174.858 |tokens/s 8334.523 |walltime 16289.993 | -Transformer | epoch 0 | step 62000 |avg loss 7.472 |avg tokens 2162.480 |tokens/s 8133.308 |walltime 16422.933 | -Transformer | epoch 0 | step 62500 |avg loss 7.455 |avg tokens 2164.270 |tokens/s 8110.761 |walltime 16556.352 | -Transformer | epoch 0 | step 63000 |avg loss 7.449 |avg tokens 2176.958 |tokens/s 8196.144 |walltime 16689.156 | -Transformer | epoch 0 | step 63500 |avg loss 7.462 |avg tokens 2174.884 |tokens/s 8277.476 |walltime 16820.530 | -Transformer | epoch 0 | step 64000 |avg loss 7.412 |avg tokens 2208.194 |tokens/s 8305.859 |walltime 16953.460 | -Transformer | epoch 0 | step 64500 |avg loss 7.463 |avg tokens 2184.822 |tokens/s 8153.564 |walltime 17087.439 | -Transformer | epoch 0 | step 65000 |avg loss 7.402 |avg tokens 2204.512 |tokens/s 8204.052 |walltime 17221.794 | -Epoch time: 17264.9287276268 -Transformer | epoch 0 | step 65198 |avg loss 7.538 |avg tokens 2144.318 |tokens/s 7983.551 |walltime 17274.975 | -Validation loss on subset valid: 7.380476935892614 +Transformer | epoch 0 | step 10 |avg loss 16.047 |avg tokens 2293.100 |tokens/s 8597.717 |walltime 13.871 | +Transformer | epoch 0 | step 20 |avg loss 15.546 |avg tokens 2087.600 |tokens/s 8226.985 |walltime 16.408 | +Transformer | epoch 0 | step 30 |avg loss 14.958 |avg tokens 2203.500 |tokens/s 8398.852 |walltime 19.032 | +Transformer | epoch 0 | step 40 |avg 
loss 14.463 |avg tokens 2363.400 |tokens/s 9164.268 |walltime 21.611 | +Transformer | epoch 0 | step 50 |avg loss 14.010 |avg tokens 2095.200 |tokens/s 8157.421 |walltime 24.179 | +Transformer | epoch 0 | step 60 |avg loss 13.654 |avg tokens 2176.000 |tokens/s 8465.320 |walltime 26.750 | +Transformer | epoch 0 | step 70 |avg loss 13.464 |avg tokens 2333.200 |tokens/s 8987.934 |walltime 29.345 | +Transformer | epoch 0 | step 80 |avg loss 13.312 |avg tokens 2123.400 |tokens/s 8441.486 |walltime 31.861 | +Transformer | epoch 0 | step 90 |avg loss 13.058 |avg tokens 2038.500 |tokens/s 7959.712 |walltime 34.422 | +Transformer | epoch 0 | step 100 |avg loss 12.876 |avg tokens 1782.500 |tokens/s 7329.394 |walltime 36.854 | +Transformer | epoch 0 | step 110 |avg loss 12.742 |avg tokens 2300.600 |tokens/s 8670.944 |walltime 39.507 | +Transformer | epoch 0 | step 120 |avg loss 12.691 |avg tokens 2043.900 |tokens/s 7925.793 |walltime 42.086 | +Transformer | epoch 0 | step 130 |avg loss 12.659 |avg tokens 2177.500 |tokens/s 8510.082 |walltime 44.645 | +Transformer | epoch 0 | step 140 |avg loss 12.516 |avg tokens 2155.300 |tokens/s 8579.654 |walltime 47.157 | +Transformer | epoch 0 | step 150 |avg loss 12.357 |avg tokens 2255.300 |tokens/s 8714.033 |walltime 49.745 | +Transformer | epoch 0 | step 160 |avg loss 12.215 |avg tokens 2338.600 |tokens/s 8916.462 |walltime 52.368 | +Transformer | epoch 0 | step 170 |avg loss 12.110 |avg tokens 2129.200 |tokens/s 8298.654 |walltime 54.933 | +Transformer | epoch 0 | step 180 |avg loss 11.931 |avg tokens 2258.900 |tokens/s 8277.203 |walltime 57.662 | +Transformer | epoch 0 | step 190 |avg loss 12.059 |avg tokens 2171.000 |tokens/s 8376.609 |walltime 60.254 | +Transformer | epoch 0 | step 200 |avg loss 11.962 |avg tokens 2249.100 |tokens/s 8506.045 |walltime 62.898 | +Transformer | epoch 0 | step 210 |avg loss 11.673 |avg tokens 2080.000 |tokens/s 8161.059 |walltime 65.447 | +Transformer | epoch 0 | step 220 |avg loss 11.729 |avg tokens 
2200.000 |tokens/s 8188.724 |walltime 68.134 | +Transformer | epoch 0 | step 230 |avg loss 11.697 |avg tokens 2375.400 |tokens/s 8781.097 |walltime 70.839 | +Transformer | epoch 0 | step 240 |avg loss 11.630 |avg tokens 2310.400 |tokens/s 8576.915 |walltime 73.533 | +Transformer | epoch 0 | step 250 |avg loss 11.796 |avg tokens 1968.500 |tokens/s 8130.878 |walltime 75.954 | +Transformer | epoch 0 | step 260 |avg loss 11.584 |avg tokens 2365.800 |tokens/s 8974.729 |walltime 78.590 | +Transformer | epoch 0 | step 270 |avg loss 11.530 |avg tokens 2168.900 |tokens/s 8150.039 |walltime 81.251 | +Transformer | epoch 0 | step 280 |avg loss 11.469 |avg tokens 2147.300 |tokens/s 8206.241 |walltime 83.868 | +Transformer | epoch 0 | step 290 |avg loss 11.462 |avg tokens 2226.200 |tokens/s 8421.159 |walltime 86.511 | +Transformer | epoch 0 | step 300 |avg loss 11.407 |avg tokens 2374.800 |tokens/s 8602.509 |walltime 89.272 | +Transformer | epoch 0 | step 310 |avg loss 11.646 |avg tokens 2178.600 |tokens/s 8372.618 |walltime 91.874 | +Transformer | epoch 0 | step 320 |avg loss 11.573 |avg tokens 2360.900 |tokens/s 8871.899 |walltime 94.535 | +Transformer | epoch 0 | step 330 |avg loss 11.396 |avg tokens 2297.600 |tokens/s 8318.021 |walltime 97.297 | +Transformer | epoch 0 | step 340 |avg loss 11.308 |avg tokens 2264.400 |tokens/s 8446.999 |walltime 99.978 | +Transformer | epoch 0 | step 350 |avg loss 11.417 |avg tokens 2261.700 |tokens/s 8397.763 |walltime 102.671 | +Transformer | epoch 0 | step 360 |avg loss 11.209 |avg tokens 1943.600 |tokens/s 7665.145 |walltime 105.207 | +Transformer | epoch 0 | step 370 |avg loss 11.366 |avg tokens 2276.000 |tokens/s 8609.758 |walltime 107.850 | +Transformer | epoch 0 | step 380 |avg loss 11.441 |avg tokens 2079.200 |tokens/s 7832.078 |walltime 110.505 | +Transformer | epoch 0 | step 390 |avg loss 11.367 |avg tokens 2271.500 |tokens/s 8533.221 |walltime 113.167 | +Transformer | epoch 0 | step 400 |avg loss 11.564 |avg tokens 2016.300 
|tokens/s 7919.232 |walltime 115.713 | +Transformer | epoch 0 | step 410 |avg loss 11.158 |avg tokens 2400.400 |tokens/s 8550.936 |walltime 118.520 | +Transformer | epoch 0 | step 420 |avg loss 11.295 |avg tokens 2371.700 |tokens/s 8733.250 |walltime 121.236 | +Transformer | epoch 0 | step 430 |avg loss 11.459 |avg tokens 1983.100 |tokens/s 8088.462 |walltime 123.688 | +Transformer | epoch 0 | step 440 |avg loss 11.366 |avg tokens 2270.700 |tokens/s 8702.904 |walltime 126.297 | +Transformer | epoch 0 | step 450 |avg loss 11.281 |avg tokens 2318.100 |tokens/s 8657.439 |walltime 128.974 | +Transformer | epoch 0 | step 460 |avg loss 11.169 |avg tokens 2047.200 |tokens/s 7770.958 |walltime 131.609 | +Transformer | epoch 0 | step 470 |avg loss 11.291 |avg tokens 2126.000 |tokens/s 8299.937 |walltime 134.170 | +Transformer | epoch 0 | step 480 |avg loss 11.101 |avg tokens 2156.400 |tokens/s 8348.168 |walltime 136.753 | +Transformer | epoch 0 | step 490 |avg loss 11.436 |avg tokens 1971.200 |tokens/s 7961.200 |walltime 139.229 | +Transformer | epoch 0 | step 500 |avg loss 11.072 |avg tokens 2403.200 |tokens/s 8812.772 |walltime 141.956 | +Transformer | epoch 0 | step 510 |avg loss 10.953 |avg tokens 2344.000 |tokens/s 8609.795 |walltime 144.679 | +Transformer | epoch 0 | step 520 |avg loss 11.043 |avg tokens 2238.700 |tokens/s 8396.198 |walltime 147.345 | +Transformer | epoch 0 | step 530 |avg loss 11.065 |avg tokens 2128.000 |tokens/s 8042.625 |walltime 149.991 | +Transformer | epoch 0 | step 540 |avg loss 11.350 |avg tokens 2318.200 |tokens/s 9013.190 |walltime 152.563 | +Transformer | epoch 0 | step 550 |avg loss 10.999 |avg tokens 2201.600 |tokens/s 8201.333 |walltime 155.247 | +Transformer | epoch 0 | step 560 |avg loss 10.971 |avg tokens 2337.600 |tokens/s 8548.321 |walltime 157.982 | +Transformer | epoch 0 | step 570 |avg loss 10.929 |avg tokens 2236.300 |tokens/s 8382.157 |walltime 160.650 | +Transformer | epoch 0 | step 580 |avg loss 11.151 |avg tokens 2239.300 
|tokens/s 8641.560 |walltime 163.241 | +Transformer | epoch 0 | step 590 |avg loss 11.123 |avg tokens 2044.000 |tokens/s 7927.494 |walltime 165.820 | +Transformer | epoch 0 | step 600 |avg loss 10.668 |avg tokens 2152.700 |tokens/s 8374.953 |walltime 168.390 | +Transformer | epoch 0 | step 610 |avg loss 10.742 |avg tokens 2351.200 |tokens/s 8472.037 |walltime 171.165 | +Transformer | epoch 0 | step 620 |avg loss 10.866 |avg tokens 2264.800 |tokens/s 8445.780 |walltime 173.847 | +Transformer | epoch 0 | step 630 |avg loss 10.963 |avg tokens 2082.800 |tokens/s 7954.548 |walltime 176.465 | +Transformer | epoch 0 | step 640 |avg loss 10.812 |avg tokens 2230.400 |tokens/s 8199.669 |walltime 179.185 | +Transformer | epoch 0 | step 650 |avg loss 10.853 |avg tokens 1975.600 |tokens/s 7700.377 |walltime 181.751 | +Transformer | epoch 0 | step 660 |avg loss 10.728 |avg tokens 2162.400 |tokens/s 7913.353 |walltime 184.484 | +Transformer | epoch 0 | step 670 |avg loss 10.970 |avg tokens 1908.400 |tokens/s 7492.528 |walltime 187.031 | +Transformer | epoch 0 | step 680 |avg loss 10.765 |avg tokens 2251.800 |tokens/s 8118.611 |walltime 189.804 | +Transformer | epoch 0 | step 690 |avg loss 10.915 |avg tokens 2184.700 |tokens/s 8459.321 |walltime 192.387 | +Transformer | epoch 0 | step 700 |avg loss 10.819 |avg tokens 2283.200 |tokens/s 8569.392 |walltime 195.051 | +Transformer | epoch 0 | step 710 |avg loss 10.791 |avg tokens 2340.500 |tokens/s 8510.092 |walltime 197.802 | +Transformer | epoch 0 | step 720 |avg loss 10.590 |avg tokens 2274.400 |tokens/s 8331.858 |walltime 200.531 | +Transformer | epoch 0 | step 730 |avg loss 10.473 |avg tokens 2388.800 |tokens/s 8474.237 |walltime 203.350 | +Transformer | epoch 0 | step 740 |avg loss 10.493 |avg tokens 2326.400 |tokens/s 8468.517 |walltime 206.097 | +Transformer | epoch 0 | step 750 |avg loss 10.406 |avg tokens 2272.000 |tokens/s 8339.592 |walltime 208.822 | +Transformer | epoch 0 | step 760 |avg loss 10.501 |avg tokens 2067.900 
|tokens/s 7951.091 |walltime 211.422 | +Transformer | epoch 0 | step 770 |avg loss 10.511 |avg tokens 2209.400 |tokens/s 8164.252 |walltime 214.129 | +Transformer | epoch 0 | step 780 |avg loss 10.512 |avg tokens 2332.000 |tokens/s 8428.650 |walltime 216.895 | +Transformer | epoch 0 | step 790 |avg loss 10.696 |avg tokens 2062.000 |tokens/s 8049.556 |walltime 219.457 | +Transformer | epoch 0 | step 800 |avg loss 10.482 |avg tokens 2229.300 |tokens/s 8270.083 |walltime 222.153 | +Transformer | epoch 0 | step 810 |avg loss 10.410 |avg tokens 2149.800 |tokens/s 8134.432 |walltime 224.795 | +Transformer | epoch 0 | step 820 |avg loss 10.483 |avg tokens 2299.200 |tokens/s 8270.671 |walltime 227.575 | +Transformer | epoch 0 | step 830 |avg loss 10.627 |avg tokens 2312.300 |tokens/s 8625.584 |walltime 230.256 | +Transformer | epoch 0 | step 840 |avg loss 10.342 |avg tokens 2215.400 |tokens/s 8220.974 |walltime 232.951 | +Transformer | epoch 0 | step 850 |avg loss 10.362 |avg tokens 2299.200 |tokens/s 8554.010 |walltime 235.639 | +Transformer | epoch 0 | step 860 |avg loss 10.418 |avg tokens 2176.400 |tokens/s 8381.823 |walltime 238.235 | +Transformer | epoch 0 | step 870 |avg loss 10.407 |avg tokens 2230.400 |tokens/s 8252.910 |walltime 240.938 | +Transformer | epoch 0 | step 880 |avg loss 10.290 |avg tokens 2253.400 |tokens/s 8296.841 |walltime 243.654 | +Transformer | epoch 0 | step 890 |avg loss 10.284 |avg tokens 2315.900 |tokens/s 8398.119 |walltime 246.412 | +Transformer | epoch 0 | step 900 |avg loss 10.432 |avg tokens 1911.800 |tokens/s 7598.174 |walltime 248.928 | +Transformer | epoch 0 | step 910 |avg loss 10.453 |avg tokens 2127.600 |tokens/s 8179.034 |walltime 251.529 | +Transformer | epoch 0 | step 920 |avg loss 10.189 |avg tokens 2416.200 |tokens/s 8668.195 |walltime 254.317 | +Transformer | epoch 0 | step 930 |avg loss 10.244 |avg tokens 2197.100 |tokens/s 8201.240 |walltime 256.995 | +Transformer | epoch 0 | step 940 |avg loss 10.347 |avg tokens 2252.000 
|tokens/s 8485.135 |walltime 259.650 | +Transformer | epoch 0 | step 950 |avg loss 10.207 |avg tokens 2212.100 |tokens/s 8037.183 |walltime 262.402 | +Transformer | epoch 0 | step 960 |avg loss 10.405 |avg tokens 2058.500 |tokens/s 8170.757 |walltime 264.921 | +Transformer | epoch 0 | step 970 |avg loss 10.441 |avg tokens 2147.000 |tokens/s 8175.267 |walltime 267.547 | +Transformer | epoch 0 | step 980 |avg loss 10.323 |avg tokens 1942.800 |tokens/s 7552.757 |walltime 270.120 | +Transformer | epoch 0 | step 990 |avg loss 10.335 |avg tokens 1990.800 |tokens/s 7721.877 |walltime 272.698 | +Transformer | epoch 0 | step 1000 |avg loss 10.364 |avg tokens 2157.700 |tokens/s 8381.420 |walltime 275.272 | +Transformer | epoch 0 | step 1010 |avg loss 10.168 |avg tokens 2308.000 |tokens/s 8565.403 |walltime 277.967 | +Transformer | epoch 0 | step 1020 |avg loss 10.346 |avg tokens 2203.300 |tokens/s 8128.410 |walltime 280.677 | +Transformer | epoch 0 | step 1030 |avg loss 9.978 |avg tokens 2250.800 |tokens/s 8257.465 |walltime 283.403 | +Transformer | epoch 0 | step 1040 |avg loss 10.180 |avg tokens 2228.000 |tokens/s 8466.333 |walltime 286.035 | +Transformer | epoch 0 | step 1050 |avg loss 9.992 |avg tokens 2246.000 |tokens/s 8135.881 |walltime 288.795 | +Transformer | epoch 0 | step 1060 |avg loss 9.904 |avg tokens 2325.300 |tokens/s 8364.721 |walltime 291.575 | +Transformer | epoch 0 | step 1070 |avg loss 10.028 |avg tokens 2223.700 |tokens/s 8213.877 |walltime 294.283 | +Transformer | epoch 0 | step 1080 |avg loss 10.230 |avg tokens 2176.000 |tokens/s 8358.621 |walltime 296.886 | +Transformer | epoch 0 | step 1090 |avg loss 10.187 |avg tokens 2299.700 |tokens/s 8690.247 |walltime 299.532 | +Transformer | epoch 0 | step 1100 |avg loss 10.361 |avg tokens 2221.300 |tokens/s 8500.848 |walltime 302.145 | +Transformer | epoch 0 | step 1110 |avg loss 10.076 |avg tokens 2171.700 |tokens/s 8486.596 |walltime 304.704 | +Transformer | epoch 0 | step 1120 |avg loss 10.057 |avg tokens 
2226.400 |tokens/s 8318.075 |walltime 307.381 | +Transformer | epoch 0 | step 1130 |avg loss 10.247 |avg tokens 2088.500 |tokens/s 8395.140 |walltime 309.869 | +Transformer | epoch 0 | step 1140 |avg loss 10.117 |avg tokens 2151.200 |tokens/s 8107.943 |walltime 312.522 | +Transformer | epoch 0 | step 1150 |avg loss 9.934 |avg tokens 2345.100 |tokens/s 8454.956 |walltime 315.295 | +Transformer | epoch 0 | step 1160 |avg loss 9.939 |avg tokens 2356.000 |tokens/s 8514.398 |walltime 318.063 | +Transformer | epoch 0 | step 1170 |avg loss 9.748 |avg tokens 2254.800 |tokens/s 8496.966 |walltime 320.716 | +Transformer | epoch 0 | step 1180 |avg loss 10.040 |avg tokens 2204.600 |tokens/s 8293.787 |walltime 323.374 | +Transformer | epoch 0 | step 1190 |avg loss 9.786 |avg tokens 2208.000 |tokens/s 8145.123 |walltime 326.085 | +Transformer | epoch 0 | step 1200 |avg loss 9.895 |avg tokens 2218.200 |tokens/s 8241.822 |walltime 328.777 | +Transformer | epoch 0 | step 1210 |avg loss 10.062 |avg tokens 2170.800 |tokens/s 8421.785 |walltime 331.354 | +Transformer | epoch 0 | step 1220 |avg loss 9.858 |avg tokens 2061.800 |tokens/s 7895.007 |walltime 333.966 | +Transformer | epoch 0 | step 1230 |avg loss 9.974 |avg tokens 2360.300 |tokens/s 8541.107 |walltime 336.729 | +Transformer | epoch 0 | step 1240 |avg loss 9.873 |avg tokens 2387.100 |tokens/s 8774.042 |walltime 339.450 | +Transformer | epoch 0 | step 1250 |avg loss 9.838 |avg tokens 2019.900 |tokens/s 8034.129 |walltime 341.964 | +Transformer | epoch 0 | step 1260 |avg loss 10.160 |avg tokens 2251.100 |tokens/s 8762.423 |walltime 344.533 | +Transformer | epoch 0 | step 1270 |avg loss 9.774 |avg tokens 2243.200 |tokens/s 8252.669 |walltime 347.251 | +Transformer | epoch 0 | step 1280 |avg loss 9.868 |avg tokens 2184.500 |tokens/s 8311.816 |walltime 349.879 | +Transformer | epoch 0 | step 1290 |avg loss 10.053 |avg tokens 1912.500 |tokens/s 7578.576 |walltime 352.403 | +Transformer | epoch 0 | step 1300 |avg loss 9.935 |avg 
tokens 2282.400 |tokens/s 8431.351 |walltime 355.110 |
+Transformer | epoch 0 | step 1310 |avg loss 9.760 |avg tokens 2181.700 |tokens/s 8199.269 |walltime 357.771 |
+Transformer | epoch 0 | step 1320 |avg loss 9.905 |avg tokens 2221.200 |tokens/s 8676.327 |walltime 360.331 |
+Transformer | epoch 0 | step 1330 |avg loss 9.911 |avg tokens 2276.300 |tokens/s 8630.609 |walltime 362.968 |
+Transformer | epoch 0 | step 1340 |avg loss 9.652 |avg tokens 2121.900 |tokens/s 8019.582 |walltime 365.614 |
+Transformer | epoch 0 | step 1350 |avg loss 9.805 |avg tokens 2182.400 |tokens/s 7969.411 |walltime 368.353 |
+Transformer | epoch 0 | step 1360 |avg loss 10.132 |avg tokens 2138.500 |tokens/s 8421.166 |walltime 370.892 |
+Transformer | epoch 0 | step 1370 |avg loss 9.682 |avg tokens 2195.200 |tokens/s 8345.225 |walltime 373.523 |
+Transformer | epoch 0 | step 1380 |avg loss 9.842 |avg tokens 1964.100 |tokens/s 7845.199 |walltime 376.026 |
+Transformer | epoch 0 | step 1390 |avg loss 9.605 |avg tokens 2294.000 |tokens/s 8399.020 |walltime 378.757 |
+Transformer | epoch 0 | step 1400 |avg loss 9.597 |avg tokens 2226.500 |tokens/s 8276.306 |walltime 381.448 |
+Transformer | epoch 0 | step 1410 |avg loss 9.754 |avg tokens 2230.600 |tokens/s 8290.602 |walltime 384.138 |
+Transformer | epoch 0 | step 1420 |avg loss 9.826 |avg tokens 2271.200 |tokens/s 8807.832 |walltime 386.717 |
+Transformer | epoch 0 | step 1430 |avg loss 9.473 |avg tokens 2157.300 |tokens/s 8009.237 |walltime 389.410 |
+Transformer | epoch 0 | step 1440 |avg loss 9.846 |avg tokens 1991.000 |tokens/s 7623.467 |walltime 392.022 |
+Transformer | epoch 0 | step 1450 |avg loss 9.781 |avg tokens 2281.400 |tokens/s 8727.716 |walltime 394.636 |
+Transformer | epoch 0 | step 1460 |avg loss 9.848 |avg tokens 2104.100 |tokens/s 8289.702 |walltime 397.174 |
+Transformer | epoch 0 | step 1470 |avg loss 9.602 |avg tokens 2359.200 |tokens/s 8535.288 |walltime 399.938 |
+Transformer | epoch 0 | step 1480 |avg loss 9.905 |avg tokens 2179.100 |tokens/s 8645.460 |walltime 402.459 |
+Transformer | epoch 0 | step 1490 |avg loss 9.976 |avg tokens 2127.800 |tokens/s 8535.806 |walltime 404.952 |
+Transformer | epoch 0 | step 1500 |avg loss 9.659 |avg tokens 2264.400 |tokens/s 8482.350 |walltime 407.621 |
+Transformer | epoch 0 | step 1510 |avg loss 9.599 |avg tokens 2272.200 |tokens/s 8445.762 |walltime 410.311 |
+Transformer | epoch 0 | step 1520 |avg loss 9.543 |avg tokens 2323.000 |tokens/s 8508.251 |walltime 413.042 |
+Transformer | epoch 0 | step 1530 |avg loss 9.715 |avg tokens 2231.200 |tokens/s 8475.862 |walltime 415.674 |
+Transformer | epoch 0 | step 1540 |avg loss 9.645 |avg tokens 2076.800 |tokens/s 8183.773 |walltime 418.212 |
+Transformer | epoch 0 | step 1550 |avg loss 9.647 |avg tokens 2341.200 |tokens/s 8800.001 |walltime 420.872 |
+Transformer | epoch 0 | step 1560 |avg loss 9.536 |avg tokens 2291.600 |tokens/s 8370.975 |walltime 423.610 |
+Transformer | epoch 0 | step 1570 |avg loss 9.364 |avg tokens 1984.800 |tokens/s 7622.064 |walltime 426.214 |
+Transformer | epoch 0 | step 1580 |avg loss 9.280 |avg tokens 2344.000 |tokens/s 8343.895 |walltime 429.023 |
+Transformer | epoch 0 | step 1590 |avg loss 9.541 |avg tokens 2099.300 |tokens/s 7878.583 |walltime 431.688 |
+Transformer | epoch 0 | step 1600 |avg loss 9.584 |avg tokens 1976.100 |tokens/s 7657.193 |walltime 434.268 |
+Transformer | epoch 0 | step 1610 |avg loss 9.597 |avg tokens 2085.600 |tokens/s 7975.951 |walltime 436.883 |
+Transformer | epoch 0 | step 1620 |avg loss 9.353 |avg tokens 2055.800 |tokens/s 7916.315 |walltime 439.480 |
+Transformer | epoch 0 | step 1630 |avg loss 9.365 |avg tokens 2353.600 |tokens/s 8275.114 |walltime 442.324 |
+Transformer | epoch 0 | step 1640 |avg loss 9.537 |avg tokens 2129.500 |tokens/s 8300.518 |walltime 444.890 |
+Transformer | epoch 0 | step 1650 |avg loss 9.964 |avg tokens 2079.100 |tokens/s 8090.858 |walltime 447.460 |
+Transformer | epoch 0 | step 1660 |avg loss 9.490 |avg tokens 1991.400 |tokens/s 7512.622 |walltime 450.110 |
+Transformer | epoch 0 | step 1670 |avg loss 9.602 |avg tokens 1955.600 |tokens/s 7397.654 |walltime 452.754 |
+Transformer | epoch 0 | step 1680 |avg loss 9.459 |avg tokens 1878.300 |tokens/s 7490.227 |walltime 455.262 |
+Transformer | epoch 0 | step 1690 |avg loss 9.504 |avg tokens 2058.600 |tokens/s 7913.875 |walltime 457.863 |
+Transformer | epoch 0 | step 1700 |avg loss 9.355 |avg tokens 2066.100 |tokens/s 7662.433 |walltime 460.559 |
+Transformer | epoch 0 | step 1710 |avg loss 9.584 |avg tokens 2272.900 |tokens/s 8497.453 |walltime 463.234 |
+Transformer | epoch 0 | step 1720 |avg loss 9.393 |avg tokens 2304.800 |tokens/s 8399.514 |walltime 465.978 |
+Transformer | epoch 0 | step 1730 |avg loss 9.134 |avg tokens 2304.000 |tokens/s 8294.560 |walltime 468.756 |
+Transformer | epoch 0 | step 1740 |avg loss 9.519 |avg tokens 2136.900 |tokens/s 8048.866 |walltime 471.411 |
+Transformer | epoch 0 | step 1750 |avg loss 9.208 |avg tokens 2156.800 |tokens/s 7935.779 |walltime 474.129 |
+Transformer | epoch 0 | step 1760 |avg loss 9.338 |avg tokens 2226.900 |tokens/s 8106.863 |walltime 476.875 |
+Transformer | epoch 0 | step 1770 |avg loss 9.469 |avg tokens 2093.600 |tokens/s 7881.256 |walltime 479.532 |
+Transformer | epoch 0 | step 1780 |avg loss 9.265 |avg tokens 2183.100 |tokens/s 8156.598 |walltime 482.208 |
+Transformer | epoch 0 | step 1790 |avg loss 9.085 |avg tokens 2293.600 |tokens/s 8290.621 |walltime 484.975 |
+Transformer | epoch 0 | step 1800 |avg loss 9.107 |avg tokens 2264.000 |tokens/s 8215.175 |walltime 487.731 |
+Transformer | epoch 0 | step 1810 |avg loss 9.356 |avg tokens 2221.200 |tokens/s 8038.829 |walltime 490.494 |
+Transformer | epoch 0 | step 1820 |avg loss 9.617 |avg tokens 1944.300 |tokens/s 7863.443 |walltime 492.966 |
+Transformer | epoch 0 | step 1830 |avg loss 9.412 |avg tokens 2195.200 |tokens/s 8208.146 |walltime 495.641 |
+Transformer | epoch 0 | step 1840 |avg loss 9.102 |avg tokens 2233.600 |tokens/s 8105.597 |walltime 498.396 |
+Transformer | epoch 0 | step 1850 |avg loss 9.632 |avg tokens 1956.800 |tokens/s 7819.077 |walltime 500.899 |
+Transformer | epoch 0 | step 1860 |avg loss 9.442 |avg tokens 2218.600 |tokens/s 8449.550 |walltime 503.525 |
+Transformer | epoch 0 | step 1870 |avg loss 9.299 |avg tokens 2180.300 |tokens/s 8242.237 |walltime 506.170 |
+Transformer | epoch 0 | step 1880 |avg loss 9.466 |avg tokens 2128.900 |tokens/s 8262.678 |walltime 508.747 |
+Transformer | epoch 0 | step 1890 |avg loss 9.413 |avg tokens 2025.700 |tokens/s 7912.339 |walltime 511.307 |
+Transformer | epoch 0 | step 1900 |avg loss 8.965 |avg tokens 2355.200 |tokens/s 8469.146 |walltime 514.088 |
+Transformer | epoch 0 | step 1910 |avg loss 9.288 |avg tokens 2279.900 |tokens/s 8593.861 |walltime 516.741 |
+Transformer | epoch 0 | step 1920 |avg loss 9.308 |avg tokens 2152.000 |tokens/s 8460.269 |walltime 519.284 |
+Transformer | epoch 0 | step 1930 |avg loss 9.312 |avg tokens 2198.100 |tokens/s 8274.553 |walltime 521.941 |
+Transformer | epoch 0 | step 1940 |avg loss 9.513 |avg tokens 2161.200 |tokens/s 8399.163 |walltime 524.514 |
+Transformer | epoch 0 | step 1950 |avg loss 8.969 |avg tokens 2396.100 |tokens/s 8504.667 |walltime 527.331 |
+Transformer | epoch 0 | step 1960 |avg loss 9.618 |avg tokens 1844.300 |tokens/s 7771.568 |walltime 529.704 |
+Transformer | epoch 0 | step 1970 |avg loss 8.994 |avg tokens 2330.400 |tokens/s 8308.622 |walltime 532.509 |
+Transformer | epoch 0 | step 1980 |avg loss 9.277 |avg tokens 2235.000 |tokens/s 8450.229 |walltime 535.154 |
+Transformer | epoch 0 | step 1990 |avg loss 8.916 |avg tokens 2372.800 |tokens/s 8450.080 |walltime 537.962 |
+Transformer | epoch 0 | step 2000 |avg loss 8.911 |avg tokens 2299.200 |tokens/s 8434.318 |walltime 540.688 |
+Transformer | epoch 0 | step 2010 |avg loss 9.031 |avg tokens 2328.100 |tokens/s 8795.522 |walltime 543.335 |
+Transformer | epoch 0 | step 2020 |avg loss 9.394 |avg tokens 2163.700 |tokens/s 8440.583 |walltime 545.899 |
+Transformer | epoch 0 | step 2030 |avg loss 9.539 |avg tokens 2007.800 |tokens/s 7903.672 |walltime 548.439 |
+Transformer | epoch 0 | step 2040 |avg loss 9.249 |avg tokens 2283.800 |tokens/s 8578.448 |walltime 551.101 |
+Transformer | epoch 0 | step 2050 |avg loss 9.069 |avg tokens 2138.900 |tokens/s 7997.715 |walltime 553.776 |
+Transformer | epoch 0 | step 2060 |avg loss 8.879 |avg tokens 2215.200 |tokens/s 8207.219 |walltime 556.475 |
+Transformer | epoch 0 | step 2070 |avg loss 8.967 |avg tokens 2314.700 |tokens/s 8393.446 |walltime 559.232 |
+Transformer | epoch 0 | step 2080 |avg loss 8.988 |avg tokens 2319.200 |tokens/s 8402.708 |walltime 561.992 |
+Transformer | epoch 0 | step 2090 |avg loss 9.191 |avg tokens 2062.400 |tokens/s 8147.201 |walltime 564.524 |
+Transformer | epoch 0 | step 2100 |avg loss 9.176 |avg tokens 2131.900 |tokens/s 8356.514 |walltime 567.075 |
+Transformer | epoch 0 | step 2110 |avg loss 8.482 |avg tokens 2243.700 |tokens/s 8127.222 |walltime 569.836 |
+Transformer | epoch 0 | step 2120 |avg loss 8.766 |avg tokens 2288.000 |tokens/s 8195.773 |walltime 572.627 |
+Transformer | epoch 0 | step 2130 |avg loss 9.324 |avg tokens 2140.500 |tokens/s 8283.759 |walltime 575.211 |
+Transformer | epoch 0 | step 2140 |avg loss 9.206 |avg tokens 2041.900 |tokens/s 8015.107 |walltime 577.759 |
+Transformer | epoch 0 | step 2150 |avg loss 9.297 |avg tokens 2248.900 |tokens/s 8388.063 |walltime 580.440 |
+Transformer | epoch 0 | step 2160 |avg loss 8.911 |avg tokens 2363.800 |tokens/s 8449.686 |walltime 583.238 |
+Transformer | epoch 0 | step 2170 |avg loss 8.944 |avg tokens 2217.900 |tokens/s 8096.346 |walltime 585.977 |
+Transformer | epoch 0 | step 2180 |avg loss 8.954 |avg tokens 2327.300 |tokens/s 8660.067 |walltime 588.664 |
+Transformer | epoch 0 | step 2190 |avg loss 8.821 |avg tokens 2348.800 |tokens/s 8459.634 |walltime 591.441 |
+Transformer | epoch 0 | step 2200 |avg loss 9.281 |avg tokens 1868.900 |tokens/s 7650.697 |walltime 593.884 |
+Transformer | epoch 0 | step 2210 |avg loss 9.386 |avg tokens 2121.500 |tokens/s 7893.171 |walltime 596.571 |
+Transformer | epoch 0 | step 2220 |avg loss 9.289 |avg tokens 1684.000 |tokens/s 7021.237 |walltime 598.970 |
+Transformer | epoch 0 | step 2230 |avg loss 9.117 |avg tokens 2086.400 |tokens/s 8169.468 |walltime 601.524 |
+Transformer | epoch 0 | step 2240 |avg loss 8.915 |avg tokens 2254.400 |tokens/s 8301.256 |walltime 604.240 |
+Transformer | epoch 0 | step 2250 |avg loss 9.107 |avg tokens 2339.100 |tokens/s 8719.104 |walltime 606.922 |
+Transformer | epoch 0 | step 2260 |avg loss 9.549 |avg tokens 2077.300 |tokens/s 8308.486 |walltime 609.422 |
+Transformer | epoch 0 | step 2270 |avg loss 8.811 |avg tokens 2364.000 |tokens/s 8516.706 |walltime 612.198 |
+Transformer | epoch 0 | step 2280 |avg loss 8.941 |avg tokens 2291.200 |tokens/s 8349.690 |walltime 614.942 |
+Transformer | epoch 0 | step 2290 |avg loss 8.785 |avg tokens 2401.600 |tokens/s 8718.350 |walltime 617.697 |
+Transformer | epoch 0 | step 2300 |avg loss 8.425 |avg tokens 2264.300 |tokens/s 8152.372 |walltime 620.474 |
+Transformer | epoch 0 | step 2310 |avg loss 8.709 |avg tokens 2227.200 |tokens/s 8281.500 |walltime 623.164 |
+Transformer | epoch 0 | step 2320 |avg loss 8.652 |avg tokens 2385.600 |tokens/s 8671.089 |walltime 625.915 |
+Transformer | epoch 0 | step 2330 |avg loss 8.883 |avg tokens 2223.200 |tokens/s 8345.392 |walltime 628.579 |
+Transformer | epoch 0 | step 2340 |avg loss 8.723 |avg tokens 2064.600 |tokens/s 7690.620 |walltime 631.264 |
+Transformer | epoch 0 | step 2350 |avg loss 8.957 |avg tokens 2287.300 |tokens/s 8457.707 |walltime 633.968 |
+Transformer | epoch 0 | step 2360 |avg loss 8.670 |avg tokens 2363.200 |tokens/s 8608.350 |walltime 636.713 |
+Transformer | epoch 0 | step 2370 |avg loss 8.835 |avg tokens 2306.100 |tokens/s 8633.087 |walltime 639.384 |
+Transformer | epoch 0 | step 2380 |avg loss 9.263 |avg tokens 2090.400 |tokens/s 8388.307 |walltime 641.876 |
+Transformer | epoch 0 | step 2390 |avg loss 9.029 |avg tokens 2089.800 |tokens/s 7969.857 |walltime 644.499 |
+Transformer | epoch 0 | step 2400 |avg loss 8.662 |avg tokens 2272.900 |tokens/s 8389.489 |walltime 647.208 |
+Transformer | epoch 0 | step 2410 |avg loss 9.108 |avg tokens 2153.600 |tokens/s 8576.594 |walltime 649.719 |
+Transformer | epoch 0 | step 2420 |avg loss 8.591 |avg tokens 2339.000 |tokens/s 8479.941 |walltime 652.477 |
+Transformer | epoch 0 | step 2430 |avg loss 9.370 |avg tokens 2068.100 |tokens/s 8449.620 |walltime 654.925 |
+Transformer | epoch 0 | step 2440 |avg loss 9.104 |avg tokens 2355.200 |tokens/s 8777.342 |walltime 657.608 |
+Transformer | epoch 0 | step 2450 |avg loss 8.752 |avg tokens 1883.100 |tokens/s 7514.894 |walltime 660.114 |
+Transformer | epoch 0 | step 2460 |avg loss 8.725 |avg tokens 2241.200 |tokens/s 8158.327 |walltime 662.861 |
+Transformer | epoch 0 | step 2470 |avg loss 9.565 |avg tokens 1503.500 |tokens/s 6512.393 |walltime 665.170 |
+Transformer | epoch 0 | step 2480 |avg loss 8.671 |avg tokens 2252.800 |tokens/s 8263.723 |walltime 667.896 |
+Transformer | epoch 0 | step 2490 |avg loss 8.950 |avg tokens 1844.500 |tokens/s 7475.071 |walltime 670.363 |
+Transformer | epoch 0 | step 2500 |avg loss 8.952 |avg tokens 2384.300 |tokens/s 8718.781 |walltime 673.098 |
+Transformer | epoch 0 | step 2510 |avg loss 8.541 |avg tokens 2380.100 |tokens/s 8751.042 |walltime 675.818 |
+Transformer | epoch 0 | step 2520 |avg loss 8.573 |avg tokens 2177.100 |tokens/s 8221.381 |walltime 678.466 |
+Transformer | epoch 0 | step 2530 |avg loss 8.616 |avg tokens 2193.600 |tokens/s 8261.218 |walltime 681.121 |
+Transformer | epoch 0 | step 2540 |avg loss 8.483 |avg tokens 2188.600 |tokens/s 8097.756 |walltime 683.824 |
+Transformer | epoch 0 | step 2550 |avg loss 8.455 |avg tokens 2183.900 |tokens/s 8010.054 |walltime 686.550 |
+Transformer | epoch 0 | step 2560 |avg loss 8.730 |avg tokens 2180.900 |tokens/s 8083.104 |walltime 689.248 |
+Transformer | epoch 0 | step 2570 |avg loss 8.972 |avg tokens 2420.800 |tokens/s 9111.468 |walltime 691.905 |
+Transformer | epoch 0 | step 2580 |avg loss 9.208 |avg tokens 1965.200 |tokens/s 8106.871 |walltime 694.329 |
+Transformer | epoch 0 | step 2590 |avg loss 8.564 |avg tokens 2201.500 |tokens/s 8243.967 |walltime 697.000 |
+Transformer | epoch 0 | step 2600 |avg loss 9.113 |avg tokens 1977.400 |tokens/s 7886.620 |walltime 699.507 |
+Transformer | epoch 0 | step 2610 |avg loss 8.844 |avg tokens 2175.200 |tokens/s 8400.265 |walltime 702.097 |
+Transformer | epoch 0 | step 2620 |avg loss 9.167 |avg tokens 2015.100 |tokens/s 8185.440 |walltime 704.558 |
+Transformer | epoch 0 | step 2630 |avg loss 8.999 |avg tokens 1815.900 |tokens/s 7340.729 |walltime 707.032 |
+Transformer | epoch 0 | step 2640 |avg loss 8.605 |avg tokens 2297.100 |tokens/s 8397.788 |walltime 709.768 |
+Transformer | epoch 0 | step 2650 |avg loss 8.542 |avg tokens 2302.400 |tokens/s 8459.737 |walltime 712.489 |
+Transformer | epoch 0 | step 2660 |avg loss 8.581 |avg tokens 2333.600 |tokens/s 8369.868 |walltime 715.277 |
+Transformer | epoch 0 | step 2670 |avg loss 8.632 |avg tokens 2269.000 |tokens/s 8290.356 |walltime 718.014 |
+Transformer | epoch 0 | step 2680 |avg loss 8.613 |avg tokens 2282.000 |tokens/s 8457.037 |walltime 720.712 |
+Transformer | epoch 0 | step 2690 |avg loss 8.960 |avg tokens 2219.000 |tokens/s 8566.904 |walltime 723.303 |
+Transformer | epoch 0 | step 2700 |avg loss 8.613 |avg tokens 2170.200 |tokens/s 8144.702 |walltime 725.967 |
+Transformer | epoch 0 | step 2710 |avg loss 8.398 |avg tokens 2321.100 |tokens/s 8400.525 |walltime 728.730 |
+Transformer | epoch 0 | step 2720 |avg loss 8.586 |avg tokens 2233.400 |tokens/s 8232.753 |walltime 731.443 |
+Transformer | epoch 0 | step 2730 |avg loss 8.724 |avg tokens 2217.200 |tokens/s 8643.545 |walltime 734.008 |
+Transformer | epoch 0 | step 2740 |avg loss 8.393 |avg tokens 2276.000 |tokens/s 8167.979 |walltime 736.795 |
+Transformer | epoch 0 | step 2750 |avg loss 8.500 |avg tokens 2220.800 |tokens/s 8302.877 |walltime 739.470 |
+Transformer | epoch 0 | step 2760 |avg loss 8.514 |avg tokens 2137.000 |tokens/s 8040.483 |walltime 742.127 |
+Transformer | epoch 0 | step 2770 |avg loss 8.478 |avg tokens 2205.000 |tokens/s 8166.689 |walltime 744.827 |
+Transformer | epoch 0 | step 2780 |avg loss 8.846 |avg tokens 2241.400 |tokens/s 8819.673 |walltime 747.369 |
+Transformer | epoch 0 | step 2790 |avg loss 8.736 |avg tokens 2246.600 |tokens/s 7948.854 |walltime 750.195 |
+Transformer | epoch 0 | step 2800 |avg loss 8.874 |avg tokens 2215.000 |tokens/s 8436.117 |walltime 752.821 |
+Transformer | epoch 0 | step 2810 |avg loss 8.649 |avg tokens 2200.600 |tokens/s 8388.093 |walltime 755.444 |
+Transformer | epoch 0 | step 2820 |avg loss 8.667 |avg tokens 2322.400 |tokens/s 8725.068 |walltime 758.106 |
+Transformer | epoch 0 | step 2830 |avg loss 8.982 |avg tokens 2078.100 |tokens/s 7628.299 |walltime 760.830 |
+Transformer | epoch 0 | step 2840 |avg loss 8.651 |avg tokens 2245.900 |tokens/s 8356.347 |walltime 763.518 |
+Transformer | epoch 0 | step 2850 |avg loss 8.650 |avg tokens 2157.700 |tokens/s 8437.270 |walltime 766.075 |
+Transformer | epoch 0 | step 2860 |avg loss 8.264 |avg tokens 2184.800 |tokens/s 8161.809 |walltime 768.752 |
+Transformer | epoch 0 | step 2870 |avg loss 8.652 |avg tokens 2201.700 |tokens/s 8540.080 |walltime 771.330 |
+Transformer | epoch 0 | step 2880 |avg loss 8.640 |avg tokens 2336.100 |tokens/s 8853.674 |walltime 773.969 |
+Transformer | epoch 0 | step 2890 |avg loss 8.655 |avg tokens 2317.600 |tokens/s 8820.594 |walltime 776.596 |
+Transformer | epoch 0 | step 2900 |avg loss 8.416 |avg tokens 2251.400 |tokens/s 8399.413 |walltime 779.277 |
+Transformer | epoch 0 | step 2910 |avg loss 8.479 |avg tokens 2195.000 |tokens/s 8236.695 |walltime 781.941 |
+Transformer | epoch 0 | step 2920 |avg loss 8.230 |avg tokens 2478.400 |tokens/s 8757.239 |walltime 784.772 |
+Transformer | epoch 0 | step 2930 |avg loss 8.242 |avg tokens 2272.200 |tokens/s 8208.046 |walltime 787.540 |
+Transformer | epoch 0 | step 2940 |avg loss 8.270 |avg tokens 2192.100 |tokens/s 8049.924 |walltime 790.263 |
+Transformer | epoch 0 | step 2950 |avg loss 9.003 |avg tokens 1897.800 |tokens/s 7986.113 |walltime 792.639 |
+Transformer | epoch 0 | step 2960 |avg loss 8.824 |avg tokens 2143.600 |tokens/s 8453.193 |walltime 795.175 |
+Transformer | epoch 0 | step 2970 |avg loss 8.593 |avg tokens 2103.500 |tokens/s 7735.785 |walltime 797.894 |
+Transformer | epoch 0 | step 2980 |avg loss 9.043 |avg tokens 2056.100 |tokens/s 8201.383 |walltime 800.401 |
+Transformer | epoch 0 | step 2990 |avg loss 8.610 |avg tokens 2297.800 |tokens/s 8647.940 |walltime 803.058 |
+Transformer | epoch 0 | step 3000 |avg loss 8.762 |avg tokens 2219.100 |tokens/s 8473.989 |walltime 805.677 |
+Transformer | epoch 0 | step 3010 |avg loss 8.671 |avg tokens 2095.000 |tokens/s 8415.443 |walltime 808.167 |
+Transformer | epoch 0 | step 3020 |avg loss 8.844 |avg tokens 2079.900 |tokens/s 8086.762 |walltime 810.739 |
+Transformer | epoch 0 | step 3030 |avg loss 8.116 |avg tokens 2332.600 |tokens/s 8233.349 |walltime 813.572 |
+Transformer | epoch 0 | step 3040 |avg loss 8.408 |avg tokens 2381.600 |tokens/s 8675.996 |walltime 816.317 |
+Transformer | epoch 0 | step 3050 |avg loss 8.448 |avg tokens 2211.000 |tokens/s 8264.332 |walltime 818.992 |
+Transformer | epoch 0 | step 3060 |avg loss 8.351 |avg tokens 2295.200 |tokens/s 8389.498 |walltime 821.728 |
+Transformer | epoch 0 | step 3070 |avg loss 8.382 |avg tokens 2272.800 |tokens/s 8236.740 |walltime 824.487 |
+Transformer | epoch 0 | step 3080 |avg loss 8.418 |avg tokens 2069.700 |tokens/s 7843.365 |walltime 827.126 |
+Transformer | epoch 0 | step 3090 |avg loss 8.996 |avg tokens 1715.200 |tokens/s 7104.233 |walltime 829.540 |
+Transformer | epoch 0 | step 3100 |avg loss 8.778 |avg tokens 2356.800 |tokens/s 8753.260 |walltime 832.233 |
+Transformer | epoch 0 | step 3110 |avg loss 8.655 |avg tokens 1863.900 |tokens/s 7593.374 |walltime 834.688 |
+Transformer | epoch 0 | step 3120 |avg loss 8.845 |avg tokens 2242.600 |tokens/s 8562.628 |walltime 837.307 |
+Transformer | epoch 0 | step 3130 |avg loss 8.650 |avg tokens 2333.600 |tokens/s 8494.127 |walltime 840.054 |
+Transformer | epoch 0 | step 3140 |avg loss 8.033 |avg tokens 2317.200 |tokens/s 8320.662 |walltime 842.839 |
+Transformer | epoch 0 | step 3150 |avg loss 8.171 |avg tokens 2177.400 |tokens/s 7959.766 |walltime 845.574 |
+Transformer | epoch 0 | step 3160 |avg loss 8.298 |avg tokens 2329.600 |tokens/s 8295.442 |walltime 848.383 |
+Transformer | epoch 0 | step 3170 |avg loss 8.202 |avg tokens 2306.000 |tokens/s 8429.538 |walltime 851.118 |
+Transformer | epoch 0 | step 3180 |avg loss 8.598 |avg tokens 2156.400 |tokens/s 8151.208 |walltime 853.764 |
+Transformer | epoch 0 | step 3190 |avg loss 8.816 |avg tokens 1924.800 |tokens/s 7806.872 |walltime 856.229 |
+Transformer | epoch 0 | step 3200 |avg loss 8.370 |avg tokens 2281.500 |tokens/s 8341.713 |walltime 858.964 |
+Transformer | epoch 0 | step 3210 |avg loss 8.369 |avg tokens 2226.900 |tokens/s 8116.313 |walltime 861.708 |
+Transformer | epoch 0 | step 3220 |avg loss 8.816 |avg tokens 2136.500 |tokens/s 8595.796 |walltime 864.194 |
+Transformer | epoch 0 | step 3230 |avg loss 8.320 |avg tokens 2344.900 |tokens/s 8705.277 |walltime 866.887 |
+Transformer | epoch 0 | step 3240 |avg loss 8.665 |avg tokens 2007.100 |tokens/s 7939.483 |walltime 869.415 |
+Transformer | epoch 0 | step 3250 |avg loss 8.356 |avg tokens 2146.600 |tokens/s 8078.115 |walltime 872.073 |
+Transformer | epoch 0 | step 3260 |avg loss 8.332 |avg tokens 2212.800 |tokens/s 8194.905 |walltime 874.773 |
+Transformer | epoch 0 | step 3270 |avg loss 8.921 |avg tokens 2332.200 |tokens/s 9066.189 |walltime 877.345 |
+Transformer | epoch 0 | step 3280 |avg loss 8.124 |avg tokens 2396.800 |tokens/s 8679.057 |walltime 880.107 |
+Transformer | epoch 0 | step 3290 |avg loss 8.540 |avg tokens 2110.000 |tokens/s 8209.520 |walltime 882.677 |
+Transformer | epoch 0 | step 3300 |avg loss 7.898 |avg tokens 2227.900 |tokens/s 8123.418 |walltime 885.420 |
+Transformer | epoch 0 | step 3310 |avg loss 7.947 |avg tokens 2267.200 |tokens/s 8149.799 |walltime 888.201 |
+Transformer | epoch 0 | step 3320 |avg loss 8.845 |avg tokens 1811.300 |tokens/s 7716.531 |walltime 890.549 |
+Transformer | epoch 0 | step 3330 |avg loss 7.993 |avg tokens 2337.600 |tokens/s 8478.887 |walltime 893.306 |
+Transformer | epoch 0 | step 3340 |avg loss 8.459 |avg tokens 2103.300 |tokens/s 8047.577 |walltime 895.919 |
+Transformer | epoch 0 | step 3350 |avg loss 8.219 |avg tokens 2372.800 |tokens/s 8791.648 |walltime 898.618 |
+Transformer | epoch 0 | step 3360 |avg loss 8.516 |avg tokens 2287.000 |tokens/s 8643.133 |walltime 901.264 |
+Transformer | epoch 0 | step 3370 |avg loss 8.288 |avg tokens 2259.000 |tokens/s 8495.564 |walltime 903.923 |
+Transformer | epoch 0 | step 3380 |avg loss 8.638 |avg tokens 1783.200 |tokens/s 7234.289 |walltime 906.388 |
+Transformer | epoch 0 | step 3390 |avg loss 8.291 |avg tokens 2128.600 |tokens/s 8071.651 |walltime 909.025 |
+Transformer | epoch 0 | step 3400 |avg loss 8.575 |avg tokens 2216.000 |tokens/s 8686.214 |walltime 911.577 |
+Transformer | epoch 0 | step 3410 |avg loss 8.182 |avg tokens 2404.000 |tokens/s 8696.142 |walltime 914.341 |
+Transformer | epoch 0 | step 3420 |avg loss 8.297 |avg tokens 2309.600 |tokens/s 8536.115 |walltime 917.047 |
+Transformer | epoch 0 | step 3430 |avg loss 8.479 |avg tokens 2160.000 |tokens/s 8148.155 |walltime 919.698 |
+Transformer | epoch 0 | step 3440 |avg loss 8.040 |avg tokens 2296.800 |tokens/s 8370.801 |walltime 922.441 |
+Transformer | epoch 0 | step 3450 |avg loss 8.200 |avg tokens 2112.400 |tokens/s 7974.131 |walltime 925.090 |
+Transformer | epoch 0 | step 3460 |avg loss 8.316 |avg tokens 2292.500 |tokens/s 8674.445 |walltime 927.733 |
+Transformer | epoch 0 | step 3470 |avg loss 7.998 |avg tokens 2149.500 |tokens/s 7926.846 |walltime 930.445 |
+Transformer | epoch 0 | step 3480 |avg loss 8.311 |avg tokens 2308.000 |tokens/s 8538.200 |walltime 933.148 |
+Transformer | epoch 0 | step 3490 |avg loss 8.083 |avg tokens 2199.300 |tokens/s 8134.653 |walltime 935.852 |
+Transformer | epoch 0 | step 3500 |avg loss 8.666 |avg tokens 2113.800 |tokens/s 8152.093 |walltime 938.445 |
+Transformer | epoch 0 | step 3510 |avg loss 8.416 |avg tokens 2327.200 |tokens/s 8582.381 |walltime 941.156 |
+Transformer | epoch 0 | step 3520 |avg loss 8.067 |avg tokens 1971.500 |tokens/s 7597.095 |walltime 943.751 |
+Transformer | epoch 0 | step 3530 |avg loss 8.418 |avg tokens 2209.300 |tokens/s 8161.693 |walltime 946.458 |
+Transformer | epoch 0 | step 3540 |avg loss 8.279 |avg tokens 2362.200 |tokens/s 8806.322 |walltime 949.141 |
+Transformer | epoch 0 | step 3550 |avg loss 7.959 |avg tokens 2321.600 |tokens/s 8334.821 |walltime 951.926 |
+Transformer | epoch 0 | step 3560 |avg loss 8.333 |avg tokens 2170.300 |tokens/s 8293.068 |walltime 954.543 |
+Transformer | epoch 0 | step 3570 |avg loss 8.316 |avg tokens 2038.700 |tokens/s 7918.789 |walltime 957.118 |
+Transformer | epoch 0 | step 3580 |avg loss 8.358 |avg tokens 2050.400 |tokens/s 7850.161 |walltime 959.730 |
+Transformer | epoch 0 | step 3590 |avg loss 8.304 |avg tokens 2140.700 |tokens/s 8228.119 |walltime 962.331 |
+Transformer | epoch 0 | step 3600 |avg loss 7.948 |avg tokens 2277.600 |tokens/s 8510.654 |walltime 965.007 |
+Transformer | epoch 0 | step 3610 |avg loss 8.356 |avg tokens 2305.600 |tokens/s 8398.919 |walltime 967.753 |
+Transformer | epoch 0 | step 3620 |avg loss 8.402 |avg tokens 2323.600 |tokens/s 8758.975 |walltime 970.405 |
+Transformer | epoch 0 | step 3630 |avg loss 8.171 |avg tokens 2256.200 |tokens/s 8386.633 |walltime 973.096 |
+Transformer | epoch 0 | step 3640 |avg loss 7.986 |avg tokens 2146.700 |tokens/s 7901.678 |walltime 975.812 |
+Transformer | epoch 0 | step 3650 |avg loss 8.147 |avg tokens 2286.400 |tokens/s 8389.872 |walltime 978.538 |
+Transformer | epoch 0 | step 3660 |avg loss 8.492 |avg tokens 2059.200 |tokens/s 7736.376 |walltime 981.199 |
+Transformer | epoch 0 | step 3670 |avg loss 8.010 |avg tokens 2135.600 |tokens/s 8017.903 |walltime 983.863 |
+Transformer | epoch 0 | step 3680 |avg loss 8.271 |avg tokens 2176.000 |tokens/s 8398.864 |walltime 986.454 |
+Transformer | epoch 0 | step 3690 |avg loss 8.555 |avg tokens 2190.400 |tokens/s 8558.382 |walltime 989.013 |
+Transformer | epoch 0 | step 3700 |avg loss 8.584 |avg tokens 2034.800 |tokens/s 8249.625 |walltime 991.480 |
+Transformer | epoch 0 | step 3710 |avg loss 8.291 |avg tokens 2024.900 |tokens/s 7777.865 |walltime 994.083 |
+Transformer | epoch 0 | step 3720 |avg loss 8.551 |avg tokens 2163.500 |tokens/s 8549.467 |walltime 996.614 |
+Transformer | epoch 0 | step 3730 |avg loss 8.434 |avg tokens 2310.900 |tokens/s 8756.557 |walltime 999.253 |
+Transformer | epoch 0 | step 3740 |avg loss 7.910 |avg tokens 2248.000 |tokens/s 8231.659 |walltime 1001.984 |
+Transformer | epoch 0 | step 3750 |avg loss 8.363 |avg tokens 2405.600 |tokens/s 9127.244 |walltime 1004.619 |
+Transformer | epoch 0 | step 3760 |avg loss 8.254 |avg tokens 2149.400 |tokens/s 8020.079 |walltime 1007.299 |
+Transformer | epoch 0 | step 3770 |avg loss 8.343 |avg tokens 2084.200 |tokens/s 8052.743 |walltime 1009.887 |
+Transformer | epoch 0 | step 3780 |avg loss 7.788 |avg tokens 2255.200 |tokens/s 8141.332 |walltime 1012.657 |
+Transformer | epoch 0 | step 3790 |avg loss 8.097 |avg tokens 2314.200 |tokens/s 8728.227 |walltime 1015.309 |
+Transformer | epoch 0 | step 3800 |avg loss 8.088 |avg tokens 2133.100 |tokens/s 8375.028 |walltime 1017.856 |
+Transformer | epoch 0 | step 3810 |avg loss 8.120 |avg tokens 2275.800 |tokens/s 8290.899 |walltime 1020.601 |
+Transformer | epoch 0 | step 3820 |avg loss 7.954 |avg tokens 2240.100 |tokens/s 8284.066 |walltime 1023.305 |
+Transformer | epoch 0 | step 3830 |avg loss 8.341 |avg tokens 1999.500 |tokens/s 7942.492 |walltime 1025.822 |
+Transformer | epoch 0 | step 3840 |avg loss 8.410 |avg tokens 2183.600 |tokens/s 8539.688 |walltime 1028.379 |
+Transformer | epoch 0 | step 3850 |avg loss 7.862 |avg tokens 2342.700 |tokens/s 8495.414 |walltime 1031.137 |
+Transformer | epoch 0 | step 3860 |avg loss 8.201 |avg tokens 2190.300 |tokens/s 8179.450 |walltime 1033.815 |
+Transformer | epoch 0 | step 3870 |avg loss 8.352 |avg tokens 2236.500 |tokens/s 8241.994 |walltime 1036.528 |
+Transformer | epoch 0 | step 3880 |avg loss 8.165 |avg tokens 2237.600 |tokens/s 8377.533 |walltime 1039.199 |
+Transformer | epoch 0 | step 3890 |avg loss 8.326 |avg tokens 2094.000 |tokens/s 8324.299 |walltime 1041.715 |
+Transformer | epoch 0 | step 3900 |avg loss 8.283 |avg tokens 2231.800 |tokens/s 8407.545 |walltime 1044.369 |
+Transformer | epoch 0 | step 3910 |avg loss 8.039 |avg tokens 2238.500 |tokens/s 8285.258 |walltime 1047.071 |
+Transformer | epoch 0 | step 3920 |avg loss 8.118 |avg tokens 2387.000 |tokens/s 8701.612 |walltime 1049.814 |
+Transformer | epoch 0 | step 3930 |avg loss 8.065 |avg tokens 2187.200 |tokens/s 8016.994 |walltime 1052.543 |
+Transformer | epoch 0 | step 3940 |avg loss 8.435 |avg tokens 2208.800 |tokens/s 8673.542 |walltime 1055.089 |
+Transformer | epoch 0 | step 3950 |avg loss 8.120 |avg tokens 2323.700 |tokens/s 8555.077 |walltime 1057.805 |
+Transformer | epoch 0 | step 3960 |avg loss 8.148 |avg tokens 2364.800 |tokens/s 8418.062 |walltime 1060.615 |
+Transformer | epoch 0 | step 3970 |avg loss 8.214 |avg tokens 2302.800 |tokens/s 8585.862 |walltime 1063.297 |
+Transformer | epoch 0 | step 3980 |avg loss 8.421 |avg tokens 1907.300 |tokens/s 7443.508 |walltime 1065.859 |
+Transformer | epoch 0 | step 3990 |avg loss 8.261 |avg tokens 2248.300 |tokens/s 8463.142 |walltime 1068.516 |
+Transformer | epoch 0 | step 4000 |avg loss 8.384 |avg tokens 2249.800 |tokens/s 8450.911 |walltime 1071.178 |
+Transformer | epoch 0 | step 4010 |avg loss 8.304 |avg tokens 1944.800 |tokens/s 7541.683 |walltime 1073.756 |
+Transformer | epoch 0 | step 4020 |avg loss 7.797 |avg tokens 2333.000 |tokens/s 8327.620 |walltime 1076.558 |
+Transformer | epoch 0 | step 4030 |avg loss 7.828 |avg tokens 2286.400 |tokens/s 8278.878 |walltime 1079.320 |
+Transformer | epoch 0 | step 4040 |avg loss 8.152 |avg tokens 2240.500 |tokens/s 8752.110 |walltime 1081.880 |
+Transformer | epoch 0 | step 4050 |avg loss 8.048 |avg tokens 2204.000 |tokens/s 8154.060 |walltime 1084.583 |
+Transformer | epoch 0 | step 4060 |avg loss 7.932 |avg tokens 2095.700 |tokens/s 8140.058 |walltime 1087.157 |
+Transformer | epoch 0 | step 4070 |avg loss 9.015 |avg tokens 1527.500 |tokens/s 6930.243 |walltime 1089.361 |
+Transformer | epoch 0 | step 4080 |avg loss 8.055 |avg tokens 2110.900 |tokens/s 7915.079 |walltime 1092.028 |
+Transformer | epoch 0 | step 4090 |avg loss 8.069 |avg tokens 2064.400 |tokens/s 7901.452 |walltime 1094.641 |
+Transformer | epoch 0 | step 4100 |avg loss 8.358 |avg tokens 2224.000 |tokens/s 8396.607 |walltime 1097.290 |
+Transformer | epoch 0 | step 4110 |avg loss 8.125 |avg tokens 1918.100 |tokens/s 7622.925 |walltime 1099.806 |
+Transformer | epoch 0 | step 4120 |avg loss 8.494 |avg tokens 2121.000 |tokens/s 8596.826 |walltime 1102.273 |
+Transformer | epoch 0 | step 4130 |avg loss 8.306 |avg tokens 2134.100 |tokens/s 8423.046 |walltime 1104.807 |
+Transformer | epoch 0 | step 4140 |avg loss 8.100 |avg tokens 2285.700 |tokens/s 8524.004 |walltime 1107.488 |
+Transformer | epoch 0 | step 4150 |avg loss 8.359 |avg tokens 2184.200 |tokens/s 8511.048 |walltime 1110.055 |
+Transformer | epoch 0 | step 4160 |avg loss 8.100 |avg tokens 2125.000 |tokens/s 7991.357 |walltime 1112.714 |
+Transformer | epoch 0 | step 4170 |avg loss 8.303 |avg tokens 2207.600 |tokens/s 8279.957 |walltime 1115.380 |
+Transformer | epoch 0 | step 4180 |avg loss 7.751 |avg tokens 2307.200 |tokens/s 8290.827 |walltime 1118.163 |
+Transformer | epoch 0 | step 4190 |avg loss 8.130 |avg tokens 2062.100 |tokens/s 7980.051 |walltime 1120.747 |
+Transformer | epoch 0 | step 4200 |avg loss 8.285 |avg tokens 2324.600 |tokens/s 8610.225 |walltime 1123.447 |
+Transformer | epoch 0 | step 4210 |avg loss 8.617 |avg tokens 2102.900 |tokens/s 8304.081 |walltime 1125.979 |
+Transformer | epoch 0 | step 4220 |avg loss 7.956 |avg tokens 2343.200 |tokens/s 8713.897 |walltime 1128.668 |
+Transformer | epoch 0 | step 4230 |avg loss 7.814 |avg tokens 2205.400 |tokens/s 8045.912 |walltime 1131.409 |
+Transformer | epoch 0 | step 4240 |avg loss 8.532 |avg tokens 2159.600 |tokens/s 8296.913 |walltime 1134.012 |
+Transformer | epoch 0 | step 4250 |avg loss 8.376 |avg tokens 2219.300 |tokens/s 8468.294 |walltime 1136.633 |
+Transformer | epoch 0 | step 4260 |avg loss 8.440 |avg tokens 2264.300 |tokens/s 8757.872 |walltime 1139.218 |
+Transformer | epoch 0 | step 4270 |avg loss 8.226 |avg tokens 2173.500 |tokens/s 8110.237 |walltime 1141.898 |
+Transformer | epoch 0 | step 4280 |avg loss 8.082 |avg tokens 2237.400 |tokens/s 8303.980 |walltime 1144.592 |
+Transformer | epoch 0 | step 4290 |avg loss 8.891 |avg tokens 1815.100 |tokens/s 7632.380 |walltime 1146.971 |
+Transformer | epoch 0 | step 4300 |avg loss 7.701 |avg tokens 2290.400 |tokens/s 8255.039 |walltime 1149.745 |
+Transformer | epoch 0 | step 4310 |avg loss 7.886 |avg tokens 2419.000 |tokens/s 8673.582 |walltime 1152.534 |
+Transformer | epoch 0 | step 4320 |avg loss 8.073 |avg tokens 2228.800 |tokens/s 8296.316 |walltime 1155.221 |
+Transformer | epoch 0 | step 4330 |avg loss 7.767 |avg tokens 2361.900 |tokens/s 8686.188 |walltime 1157.940 |
+Transformer | epoch 0 | step 4340 |avg loss 7.997 |avg tokens 2283.100 |tokens/s 8606.944 |walltime 1160.592 |
+Transformer | epoch 0 | step 4350 |avg loss 7.756 |avg tokens 2256.900 |tokens/s 8541.871 |walltime 1163.235 |
+Transformer | epoch 0 | step 4360 |avg loss 7.926 |avg tokens 2055.300 |tokens/s 7699.927 |walltime 1165.904 |
+Transformer | epoch 0 | step 4370 |avg loss 7.883 |avg tokens 2198.400 |tokens/s 8147.948 |walltime 1168.602 |
+Transformer | epoch 0 | step 4380 |avg loss 7.745 |avg tokens 2368.000 |tokens/s 8710.576 |walltime 1171.320 |
+Transformer | epoch 0 | step 4390 |avg loss 8.123 |avg tokens 2127.800 |tokens/s 8183.077 |walltime 1173.921 |
+Transformer | epoch 0 | step 4400 |avg loss 7.776 |avg tokens 2185.000 |tokens/s 8080.594 |walltime 1176.625 |
+Transformer | epoch 0 | step 4410 |avg loss 7.676 |avg tokens 2385.200 |tokens/s 8598.369 |walltime 1179.399 |
+Transformer | epoch 0 | step 4420 |avg loss 8.073 |avg tokens 2280.200 |tokens/s 8602.951 |walltime 1182.049 |
+Transformer | epoch 0 | step 4430 |avg loss 8.012 |avg tokens 2167.700 |tokens/s 8080.058 |walltime 1184.732 |
+Transformer | epoch 0 | step 4440 |avg loss 8.172 |avg tokens 2052.200 |tokens/s 8091.843 |walltime 1187.268 |
+Transformer | epoch 0 | step 4450 |avg loss 7.895 |avg tokens 2212.000 |tokens/s 8230.721 |walltime 1189.956 |
+Transformer | epoch 0 | step 4460 |avg loss 7.492 |avg tokens 2312.000 |tokens/s 8551.830 |walltime 1192.659 |
+Transformer | epoch 0 | step 4470 |avg loss 8.458 |avg tokens 1876.800 |tokens/s 7527.589 |walltime 1195.152 |
+Transformer | epoch 0 | step 4480 |avg loss 8.150 |avg tokens 2393.600 |tokens/s 8756.874 |walltime 1197.886 |
+Transformer | epoch 0 | step 4490 |avg loss 7.946 |avg tokens 2295.200 |tokens/s 8421.885 |walltime 1200.611 |
+Transformer | epoch 0 | step 4500 |avg loss 8.251 |avg tokens 2149.400 |tokens/s 8343.957 |walltime 1203.187 |
+Transformer | epoch 0 | step 4510 |avg loss 7.979 |avg tokens 2174.400 |tokens/s 8367.895 |walltime 1205.786 |
+Transformer | epoch 0 | step 4520 |avg loss 7.677 |avg tokens 2284.800 |tokens/s 8331.346 |walltime 1208.528 |
+Transformer | epoch 0 | step 4530 |avg loss 7.840 |avg tokens 2140.500 |tokens/s 8135.756 |walltime 1211.159 |
+Transformer | epoch 0 | step 4540 |avg loss 8.238 |avg tokens 2103.300 |tokens/s 8111.712 |walltime 1213.752 |
+Transformer | epoch 0 | step 4550 |avg loss 7.635 |avg tokens 2232.800 |tokens/s 8255.308 |walltime 1216.457 |
+Transformer | epoch 0 | step 4560 |avg loss 8.366 |avg tokens 1971.900 |tokens/s 7876.475 |walltime 1218.960 |
+Transformer | epoch 0 | step 4570 |avg loss 7.572 |avg tokens 2315.200 |tokens/s 8352.118 |walltime 1221.732 |
+Transformer | epoch 0 | step 4580 |avg loss 7.852 |avg tokens 2176.400 |tokens/s 8072.244 |walltime 1224.428 |
+Transformer | epoch 0 | step 4590 |avg loss 7.937 |avg tokens 2243.700 |tokens/s 8320.012 |walltime 1227.125 |
+Transformer | epoch 0 | step 4600 |avg loss 7.849 |avg tokens 2410.400 |tokens/s 8866.750 |walltime 1229.843 |
+Transformer | epoch 0 | step 4610 |avg loss 7.835 |avg tokens 2021.500 |tokens/s 7959.390 |walltime 1232.383 |
+Transformer | epoch 0 | step 4620 |avg loss 7.585 |avg tokens 2250.400 |tokens/s 8259.275 |walltime 1235.108 |
+Transformer | epoch 0 | step 4630 |avg loss 7.841 |avg tokens 2266.400 |tokens/s 8418.697 |walltime 1237.800 |
+Transformer | epoch 0 | step 4640 |avg loss 8.072 |avg tokens 1976.000 |tokens/s 7751.308 |walltime 1240.349 |
+Transformer | epoch 0 | step 4650 |avg loss 7.726 |avg tokens 2209.000 |tokens/s 8179.161 |walltime 1243.050 |
+Transformer | epoch 0 | step 4660 |avg loss 7.861 |avg tokens 2117.300 |tokens/s 8079.091 |walltime 1245.671 |
+Transformer | epoch 0 | step 4670 |avg loss 8.111 |avg tokens 2106.100 |tokens/s 8429.979 |walltime 1248.169 |
+Transformer | epoch 0 | step 4680 |avg loss 7.893 |avg tokens 1874.000 |tokens/s 7558.774 |walltime 1250.648 |
+Transformer | epoch 0 | step 4690 |avg loss 8.125 |avg tokens 2247.500 |tokens/s 8435.568 |walltime 1253.313 |
+Transformer | epoch 0 | step 4700 |avg loss 8.557 |avg tokens 2146.700 |tokens/s 8549.400 |walltime 1255.824 |
+Transformer | epoch 0 | step 4710 |avg loss 8.104 |avg tokens 2252.300 |tokens/s 8476.166 |walltime 1258.481 |
+Transformer | epoch 0 | step 4720 |avg loss 8.058 |avg tokens 2285.100 |tokens/s 8367.445 |walltime 1261.212 |
+Transformer | epoch 0 | step 4730 |avg loss 7.717 |avg tokens 2218.300 |tokens/s 8167.989 |walltime 1263.928 |
+Transformer | epoch 0 | step 4740 |avg loss 7.837 |avg tokens 2102.100 |tokens/s 7898.750 |walltime 1266.589 |
+Transformer | epoch 0 | step 4750 |avg loss 8.051 |avg tokens 2172.700 |tokens/s 8281.541 |walltime 1269.213 |
+Transformer | epoch 0 | step 4760 |avg loss 7.840 |avg tokens 2327.100 |tokens/s 8453.337 |walltime 1271.965 |
+Transformer | epoch 0 | step 4770 |avg loss 7.826 |avg tokens 2278.500 |tokens/s 8550.287 |walltime 1274.630 |
+Transformer | epoch 0 | step 4780 |avg loss 7.995 |avg tokens 2360.500 |tokens/s 8711.263 |walltime 1277.340 |
+Transformer | epoch 0 | step 4790 |avg loss 7.927 |avg tokens 2128.800 |tokens/s 8287.968 |walltime 1279.908 |
+Transformer | epoch 0 | step 4800 |avg loss 8.099 |avg tokens 2087.500 |tokens/s 8172.276 |walltime 1282.463 |
+Transformer | epoch 0 | step 4810 |avg loss 8.392 |avg tokens 2193.700 |tokens/s 8320.725 |walltime 1285.099 |
+Transformer | epoch 0 | step 4820 |avg loss 7.880 |avg tokens 2248.800 |tokens/s 8242.419 |walltime 1287.828 |
+Transformer | epoch 0 | step 4830 |avg loss 8.081 |avg tokens 2138.200 |tokens/s 8185.581 |walltime 1290.440 |
+Transformer | epoch 0 | step 4840 |avg loss 7.591 |avg tokens 2346.700 |tokens/s 8532.911 |walltime 1293.190 |
+Transformer | epoch 0 | step 4850 |avg loss 7.613 |avg tokens 2257.600 |tokens/s 8394.659 |walltime 1295.879 |
+Transformer | epoch 0 | step 4860 |avg loss 7.421 |avg tokens 2418.600 |tokens/s 8657.660 |walltime 1298.673 |
+Transformer | epoch 0 | step 4870 |avg loss 8.022 |avg tokens 2308.500 |tokens/s 8648.548 |walltime 1301.342 |
+Transformer | epoch 0 | step 4880 |avg loss 8.208 |avg tokens 2247.800 |tokens/s 8427.489 |walltime 1304.009 |
+Transformer | epoch 0 | step 4890 |avg
loss 7.635 |avg tokens 2248.800 |tokens/s 8220.684 |walltime 1306.745 | +Transformer | epoch 0 | step 4900 |avg loss 7.995 |avg tokens 1965.600 |tokens/s 7738.170 |walltime 1309.285 | +Transformer | epoch 0 | step 4910 |avg loss 7.956 |avg tokens 2338.500 |tokens/s 8543.426 |walltime 1312.022 | +Transformer | epoch 0 | step 4920 |avg loss 8.191 |avg tokens 1995.200 |tokens/s 7945.839 |walltime 1314.533 | +Transformer | epoch 0 | step 4930 |avg loss 7.851 |avg tokens 2248.800 |tokens/s 8324.236 |walltime 1317.235 | +Transformer | epoch 0 | step 4940 |avg loss 8.122 |avg tokens 2035.700 |tokens/s 7904.297 |walltime 1319.810 | +Transformer | epoch 0 | step 4950 |avg loss 7.990 |avg tokens 2241.200 |tokens/s 8437.903 |walltime 1322.466 | +Transformer | epoch 0 | step 4960 |avg loss 7.970 |avg tokens 2193.600 |tokens/s 8234.644 |walltime 1325.130 | +Transformer | epoch 0 | step 4970 |avg loss 7.617 |avg tokens 2069.700 |tokens/s 7840.023 |walltime 1327.770 | +Transformer | epoch 0 | step 4980 |avg loss 7.665 |avg tokens 2325.000 |tokens/s 8753.681 |walltime 1330.426 | +Transformer | epoch 0 | step 4990 |avg loss 8.006 |avg tokens 2308.800 |tokens/s 8714.595 |walltime 1333.075 | +Transformer | epoch 0 | step 5000 |avg loss 8.324 |avg tokens 1988.300 |tokens/s 8230.384 |walltime 1335.491 | +Transformer | epoch 0 | step 5010 |avg loss 7.569 |avg tokens 2202.800 |tokens/s 8007.755 |walltime 1338.242 | +Transformer | epoch 0 | step 5020 |avg loss 8.222 |avg tokens 1940.800 |tokens/s 7809.314 |walltime 1340.727 | +Transformer | epoch 0 | step 5030 |avg loss 7.985 |avg tokens 2004.200 |tokens/s 7778.673 |walltime 1343.304 | +Transformer | epoch 0 | step 5040 |avg loss 8.238 |avg tokens 1878.100 |tokens/s 7395.975 |walltime 1345.843 | +Transformer | epoch 0 | step 5050 |avg loss 8.325 |avg tokens 1854.900 |tokens/s 7838.557 |walltime 1348.210 | +Transformer | epoch 0 | step 5060 |avg loss 8.162 |avg tokens 1916.300 |tokens/s 7605.126 |walltime 1350.729 | +Transformer | epoch 0 
| step 5070 |avg loss 7.699 |avg tokens 2144.400 |tokens/s 8081.387 |walltime 1353.383 | +Transformer | epoch 0 | step 5080 |avg loss 8.116 |avg tokens 1953.300 |tokens/s 7783.972 |walltime 1355.892 | +Transformer | epoch 0 | step 5090 |avg loss 7.735 |avg tokens 2146.400 |tokens/s 8006.047 |walltime 1358.573 | +Transformer | epoch 0 | step 5100 |avg loss 7.538 |avg tokens 2238.200 |tokens/s 8257.735 |walltime 1361.284 | +Transformer | epoch 0 | step 5110 |avg loss 7.980 |avg tokens 2327.200 |tokens/s 8643.195 |walltime 1363.976 | +Transformer | epoch 0 | step 5120 |avg loss 8.212 |avg tokens 2110.700 |tokens/s 8427.599 |walltime 1366.481 | +Transformer | epoch 0 | step 5130 |avg loss 7.995 |avg tokens 2092.300 |tokens/s 8073.565 |walltime 1369.072 | +Transformer | epoch 0 | step 5140 |avg loss 8.222 |avg tokens 2161.500 |tokens/s 8078.464 |walltime 1371.748 | +Transformer | epoch 0 | step 5150 |avg loss 8.089 |avg tokens 2344.600 |tokens/s 8819.402 |walltime 1374.406 | +Transformer | epoch 0 | step 5160 |avg loss 7.624 |avg tokens 2317.600 |tokens/s 8548.127 |walltime 1377.118 | +Transformer | epoch 0 | step 5170 |avg loss 7.988 |avg tokens 2200.500 |tokens/s 8255.799 |walltime 1379.783 | +Transformer | epoch 0 | step 5180 |avg loss 7.763 |avg tokens 2356.700 |tokens/s 8789.326 |walltime 1382.464 | +Transformer | epoch 0 | step 5190 |avg loss 8.259 |avg tokens 1732.000 |tokens/s 7141.724 |walltime 1384.890 | +Transformer | epoch 0 | step 5200 |avg loss 7.464 |avg tokens 2279.000 |tokens/s 8410.107 |walltime 1387.599 | +Transformer | epoch 0 | step 5210 |avg loss 8.043 |avg tokens 2165.600 |tokens/s 8276.342 |walltime 1390.216 | +Transformer | epoch 0 | step 5220 |avg loss 7.585 |avg tokens 2198.400 |tokens/s 8197.634 |walltime 1392.898 | +Transformer | epoch 0 | step 5230 |avg loss 7.922 |avg tokens 1879.000 |tokens/s 7740.103 |walltime 1395.325 | +Transformer | epoch 0 | step 5240 |avg loss 7.705 |avg tokens 2024.600 |tokens/s 7791.994 |walltime 1397.924 | 
+Transformer | epoch 0 | step 5250 |avg loss 8.358 |avg tokens 2187.200 |tokens/s 8636.141 |walltime 1400.456 | +Transformer | epoch 0 | step 5260 |avg loss 7.463 |avg tokens 2231.400 |tokens/s 8187.373 |walltime 1403.182 | +Transformer | epoch 0 | step 5270 |avg loss 7.790 |avg tokens 2199.400 |tokens/s 8071.446 |walltime 1405.907 | +Transformer | epoch 0 | step 5280 |avg loss 7.604 |avg tokens 2151.500 |tokens/s 8060.739 |walltime 1408.576 | +Transformer | epoch 0 | step 5290 |avg loss 8.276 |avg tokens 2040.300 |tokens/s 8442.487 |walltime 1410.992 | +Transformer | epoch 0 | step 5300 |avg loss 8.105 |avg tokens 2093.300 |tokens/s 8012.551 |walltime 1413.605 | +Transformer | epoch 0 | step 5310 |avg loss 7.711 |avg tokens 2331.200 |tokens/s 8523.922 |walltime 1416.340 | +Transformer | epoch 0 | step 5320 |avg loss 7.848 |avg tokens 2347.500 |tokens/s 8585.570 |walltime 1419.074 | +Transformer | epoch 0 | step 5330 |avg loss 7.222 |avg tokens 2308.000 |tokens/s 8445.828 |walltime 1421.807 | +Transformer | epoch 0 | step 5340 |avg loss 8.290 |avg tokens 2243.200 |tokens/s 8662.242 |walltime 1424.396 | +Transformer | epoch 0 | step 5350 |avg loss 7.764 |avg tokens 1996.100 |tokens/s 7919.037 |walltime 1426.917 | +Transformer | epoch 0 | step 5360 |avg loss 7.597 |avg tokens 2276.800 |tokens/s 8328.208 |walltime 1429.651 | +Transformer | epoch 0 | step 5370 |avg loss 8.546 |avg tokens 2123.300 |tokens/s 8824.281 |walltime 1432.057 | +Transformer | epoch 0 | step 5380 |avg loss 7.891 |avg tokens 2051.900 |tokens/s 7626.058 |walltime 1434.748 | +Transformer | epoch 0 | step 5390 |avg loss 7.750 |avg tokens 2005.800 |tokens/s 7914.961 |walltime 1437.282 | +Transformer | epoch 0 | step 5400 |avg loss 7.459 |avg tokens 2317.100 |tokens/s 8438.872 |walltime 1440.028 | +Transformer | epoch 0 | step 5410 |avg loss 7.916 |avg tokens 2187.400 |tokens/s 8206.892 |walltime 1442.693 | +Transformer | epoch 0 | step 5420 |avg loss 7.654 |avg tokens 2138.500 |tokens/s 7993.459 
|walltime 1445.368 | +Transformer | epoch 0 | step 5430 |avg loss 7.886 |avg tokens 2175.800 |tokens/s 8259.193 |walltime 1448.003 | +Transformer | epoch 0 | step 5440 |avg loss 8.041 |avg tokens 2028.800 |tokens/s 8062.020 |walltime 1450.519 | +Transformer | epoch 0 | step 5450 |avg loss 7.841 |avg tokens 2361.000 |tokens/s 8727.523 |walltime 1453.225 | +Transformer | epoch 0 | step 5460 |avg loss 7.834 |avg tokens 2120.600 |tokens/s 7983.197 |walltime 1455.881 | +Transformer | epoch 0 | step 5470 |avg loss 7.927 |avg tokens 2353.900 |tokens/s 8750.955 |walltime 1458.571 | +Transformer | epoch 0 | step 5480 |avg loss 7.637 |avg tokens 2398.400 |tokens/s 8810.970 |walltime 1461.293 | +Transformer | epoch 0 | step 5490 |avg loss 8.131 |avg tokens 2161.000 |tokens/s 8334.172 |walltime 1463.886 | +Transformer | epoch 0 | step 5500 |avg loss 7.967 |avg tokens 2242.400 |tokens/s 8496.211 |walltime 1466.525 | +Transformer | epoch 0 | step 5510 |avg loss 7.677 |avg tokens 2366.400 |tokens/s 8681.171 |walltime 1469.251 | +Transformer | epoch 0 | step 5520 |avg loss 7.722 |avg tokens 2302.400 |tokens/s 8382.074 |walltime 1471.998 | +Transformer | epoch 0 | step 5530 |avg loss 7.850 |avg tokens 2221.800 |tokens/s 8175.640 |walltime 1474.715 | +Transformer | epoch 0 | step 5540 |avg loss 8.146 |avg tokens 1996.700 |tokens/s 7887.100 |walltime 1477.247 | +Transformer | epoch 0 | step 5550 |avg loss 7.564 |avg tokens 2022.800 |tokens/s 7668.325 |walltime 1479.885 | +Transformer | epoch 0 | step 5560 |avg loss 7.967 |avg tokens 1919.200 |tokens/s 7660.348 |walltime 1482.390 | +Transformer | epoch 0 | step 5570 |avg loss 7.885 |avg tokens 2116.700 |tokens/s 8091.530 |walltime 1485.006 | +Transformer | epoch 0 | step 5580 |avg loss 7.689 |avg tokens 2229.100 |tokens/s 8146.676 |walltime 1487.742 | +Transformer | epoch 0 | step 5590 |avg loss 7.787 |avg tokens 2420.100 |tokens/s 8987.744 |walltime 1490.435 | +Transformer | epoch 0 | step 5600 |avg loss 7.928 |avg tokens 2174.500 
|tokens/s 8177.561 |walltime 1493.094 | +Transformer | epoch 0 | step 5610 |avg loss 7.508 |avg tokens 2254.600 |tokens/s 8069.031 |walltime 1495.888 | +Transformer | epoch 0 | step 5620 |avg loss 8.248 |avg tokens 2103.500 |tokens/s 8306.620 |walltime 1498.421 | +Transformer | epoch 0 | step 5630 |avg loss 7.701 |avg tokens 2180.300 |tokens/s 8182.501 |walltime 1501.085 | +Transformer | epoch 0 | step 5640 |avg loss 7.612 |avg tokens 2370.500 |tokens/s 8588.975 |walltime 1503.845 | +Transformer | epoch 0 | step 5650 |avg loss 7.936 |avg tokens 2147.700 |tokens/s 8152.682 |walltime 1506.480 | +Transformer | epoch 0 | step 5660 |avg loss 7.859 |avg tokens 2152.800 |tokens/s 8321.300 |walltime 1509.067 | +Transformer | epoch 0 | step 5670 |avg loss 7.503 |avg tokens 2191.300 |tokens/s 8288.601 |walltime 1511.710 | +Transformer | epoch 0 | step 5680 |avg loss 8.099 |avg tokens 2046.800 |tokens/s 8031.781 |walltime 1514.259 | +Transformer | epoch 0 | step 5690 |avg loss 7.862 |avg tokens 2262.400 |tokens/s 8570.018 |walltime 1516.899 | +Transformer | epoch 0 | step 5700 |avg loss 7.839 |avg tokens 2139.400 |tokens/s 8143.170 |walltime 1519.526 | +Transformer | epoch 0 | step 5710 |avg loss 7.710 |avg tokens 2217.200 |tokens/s 8248.244 |walltime 1522.214 | +Transformer | epoch 0 | step 5720 |avg loss 7.753 |avg tokens 2253.100 |tokens/s 8384.541 |walltime 1524.901 | +Transformer | epoch 0 | step 5730 |avg loss 7.661 |avg tokens 2221.600 |tokens/s 8283.733 |walltime 1527.583 | +Transformer | epoch 0 | step 5740 |avg loss 7.513 |avg tokens 2278.400 |tokens/s 8397.538 |walltime 1530.296 | +Transformer | epoch 0 | step 5750 |avg loss 7.835 |avg tokens 2327.500 |tokens/s 8764.409 |walltime 1532.952 | +Transformer | epoch 0 | step 5760 |avg loss 7.772 |avg tokens 2233.100 |tokens/s 8280.310 |walltime 1535.649 | +Transformer | epoch 0 | step 5770 |avg loss 7.699 |avg tokens 2356.800 |tokens/s 8754.677 |walltime 1538.341 | +Transformer | epoch 0 | step 5780 |avg loss 7.489 |avg 
tokens 2339.200 |tokens/s 8371.265 |walltime 1541.135 | +Transformer | epoch 0 | step 5790 |avg loss 7.868 |avg tokens 2011.300 |tokens/s 7873.400 |walltime 1543.690 | +Transformer | epoch 0 | step 5800 |avg loss 7.521 |avg tokens 2212.800 |tokens/s 8292.456 |walltime 1546.358 | +Transformer | epoch 0 | step 5810 |avg loss 7.884 |avg tokens 2338.900 |tokens/s 8869.698 |walltime 1548.995 | +Transformer | epoch 0 | step 5820 |avg loss 7.881 |avg tokens 2210.700 |tokens/s 8197.403 |walltime 1551.692 | +Transformer | epoch 0 | step 5830 |avg loss 7.916 |avg tokens 2082.300 |tokens/s 8051.889 |walltime 1554.278 | +Transformer | epoch 0 | step 5840 |avg loss 7.328 |avg tokens 2207.500 |tokens/s 8121.139 |walltime 1556.996 | +Transformer | epoch 0 | step 5850 |avg loss 7.560 |avg tokens 2174.600 |tokens/s 8285.403 |walltime 1559.621 | +Transformer | epoch 0 | step 5860 |avg loss 7.734 |avg tokens 2070.700 |tokens/s 7926.600 |walltime 1562.233 | +Transformer | epoch 0 | step 5870 |avg loss 8.038 |avg tokens 1824.000 |tokens/s 7534.253 |walltime 1564.654 | +Transformer | epoch 0 | step 5880 |avg loss 8.051 |avg tokens 2115.400 |tokens/s 7818.419 |walltime 1567.360 | +Transformer | epoch 0 | step 5890 |avg loss 8.034 |avg tokens 2277.700 |tokens/s 8718.316 |walltime 1569.972 | +Transformer | epoch 0 | step 5900 |avg loss 7.435 |avg tokens 2214.000 |tokens/s 8144.960 |walltime 1572.691 | +Transformer | epoch 0 | step 5910 |avg loss 7.900 |avg tokens 1964.300 |tokens/s 7590.435 |walltime 1575.279 | +Transformer | epoch 0 | step 5920 |avg loss 7.852 |avg tokens 2140.600 |tokens/s 8544.327 |walltime 1577.784 | +Transformer | epoch 0 | step 5930 |avg loss 7.536 |avg tokens 2321.600 |tokens/s 8410.235 |walltime 1580.544 | +Transformer | epoch 0 | step 5940 |avg loss 7.583 |avg tokens 2120.400 |tokens/s 8407.894 |walltime 1583.066 | +Transformer | epoch 0 | step 5950 |avg loss 7.192 |avg tokens 2186.700 |tokens/s 8165.554 |walltime 1585.744 | +Transformer | epoch 0 | step 5960 |avg 
loss 8.475 |avg tokens 1970.100 |tokens/s 8155.904 |walltime 1588.160 | +Transformer | epoch 0 | step 5970 |avg loss 7.974 |avg tokens 2384.800 |tokens/s 8816.559 |walltime 1590.865 | +Transformer | epoch 0 | step 5980 |avg loss 7.546 |avg tokens 2341.400 |tokens/s 9142.682 |walltime 1593.426 | +Transformer | epoch 0 | step 5990 |avg loss 7.916 |avg tokens 2180.400 |tokens/s 8122.677 |walltime 1596.110 | +Transformer | epoch 0 | step 6000 |avg loss 7.743 |avg tokens 2026.700 |tokens/s 7947.004 |walltime 1598.660 | +Transformer | epoch 0 | step 6010 |avg loss 8.029 |avg tokens 1908.200 |tokens/s 7396.489 |walltime 1601.240 | +Transformer | epoch 0 | step 6020 |avg loss 7.397 |avg tokens 2289.300 |tokens/s 8350.106 |walltime 1603.982 | +Transformer | epoch 0 | step 6030 |avg loss 8.022 |avg tokens 2098.700 |tokens/s 8102.213 |walltime 1606.572 | +Transformer | epoch 0 | step 6040 |avg loss 7.525 |avg tokens 2161.000 |tokens/s 7977.744 |walltime 1609.281 | +Transformer | epoch 0 | step 6050 |avg loss 7.540 |avg tokens 2233.800 |tokens/s 8108.922 |walltime 1612.036 | +Transformer | epoch 0 | step 6060 |avg loss 8.094 |avg tokens 2167.700 |tokens/s 8459.548 |walltime 1614.598 | +Transformer | epoch 0 | step 6070 |avg loss 7.946 |avg tokens 2147.900 |tokens/s 8060.037 |walltime 1617.263 | +Transformer | epoch 0 | step 6080 |avg loss 8.021 |avg tokens 2085.000 |tokens/s 8131.187 |walltime 1619.827 | +Transformer | epoch 0 | step 6090 |avg loss 7.789 |avg tokens 2365.500 |tokens/s 8696.742 |walltime 1622.547 | +Transformer | epoch 0 | step 6100 |avg loss 7.234 |avg tokens 2272.800 |tokens/s 8262.375 |walltime 1625.298 | +Transformer | epoch 0 | step 6110 |avg loss 7.329 |avg tokens 2203.100 |tokens/s 8168.941 |walltime 1627.995 | +Transformer | epoch 0 | step 6120 |avg loss 7.425 |avg tokens 2171.200 |tokens/s 8155.421 |walltime 1630.657 | +Transformer | epoch 0 | step 6130 |avg loss 7.392 |avg tokens 2095.100 |tokens/s 7920.667 |walltime 1633.302 | +Transformer | epoch 0 
| step 6140 |avg loss 7.863 |avg tokens 2076.000 |tokens/s 8063.059 |walltime 1635.877 | +Transformer | epoch 0 | step 6150 |avg loss 7.584 |avg tokens 2178.200 |tokens/s 8308.881 |walltime 1638.498 | +Transformer | epoch 0 | step 6160 |avg loss 7.863 |avg tokens 2321.300 |tokens/s 8660.669 |walltime 1641.179 | +Transformer | epoch 0 | step 6170 |avg loss 7.613 |avg tokens 2189.500 |tokens/s 8071.304 |walltime 1643.891 | +Transformer | epoch 0 | step 6180 |avg loss 8.232 |avg tokens 2065.800 |tokens/s 8324.652 |walltime 1646.373 | +Transformer | epoch 0 | step 6190 |avg loss 7.574 |avg tokens 2140.100 |tokens/s 8100.733 |walltime 1649.015 | +Transformer | epoch 0 | step 6200 |avg loss 7.571 |avg tokens 2380.800 |tokens/s 8646.264 |walltime 1651.768 | +Transformer | epoch 0 | step 6210 |avg loss 7.971 |avg tokens 1907.000 |tokens/s 7529.814 |walltime 1654.301 | +Transformer | epoch 0 | step 6220 |avg loss 7.984 |avg tokens 1770.900 |tokens/s 7146.000 |walltime 1656.779 | +Transformer | epoch 0 | step 6230 |avg loss 7.244 |avg tokens 2373.600 |tokens/s 8394.016 |walltime 1659.607 | +Transformer | epoch 0 | step 6240 |avg loss 7.397 |avg tokens 2150.400 |tokens/s 8231.659 |walltime 1662.219 | +Transformer | epoch 0 | step 6250 |avg loss 7.489 |avg tokens 2303.200 |tokens/s 8655.028 |walltime 1664.880 | +Transformer | epoch 0 | step 6260 |avg loss 7.505 |avg tokens 2169.800 |tokens/s 8241.477 |walltime 1667.513 | +Transformer | epoch 0 | step 6270 |avg loss 8.189 |avg tokens 2103.000 |tokens/s 8275.696 |walltime 1670.054 | +Transformer | epoch 0 | step 6280 |avg loss 7.931 |avg tokens 2337.400 |tokens/s 8948.489 |walltime 1672.666 | +Transformer | epoch 0 | step 6290 |avg loss 7.129 |avg tokens 2332.000 |tokens/s 8474.936 |walltime 1675.418 | +Transformer | epoch 0 | step 6300 |avg loss 7.356 |avg tokens 2225.600 |tokens/s 8153.745 |walltime 1678.148 | +Transformer | epoch 0 | step 6310 |avg loss 7.452 |avg tokens 2308.800 |tokens/s 8480.443 |walltime 1680.870 | 
+Transformer | epoch 0 | step 6320 |avg loss 7.843 |avg tokens 2316.800 |tokens/s 8753.603 |walltime 1683.517 | +Transformer | epoch 0 | step 6330 |avg loss 7.752 |avg tokens 1987.600 |tokens/s 7584.453 |walltime 1686.137 | +Transformer | epoch 0 | step 6340 |avg loss 7.796 |avg tokens 2052.600 |tokens/s 8070.369 |walltime 1688.681 | +Transformer | epoch 0 | step 6350 |avg loss 7.666 |avg tokens 2077.800 |tokens/s 7899.388 |walltime 1691.311 | +Transformer | epoch 0 | step 6360 |avg loss 7.492 |avg tokens 2237.700 |tokens/s 8357.933 |walltime 1693.988 | +Transformer | epoch 0 | step 6370 |avg loss 7.876 |avg tokens 2208.900 |tokens/s 8663.144 |walltime 1696.538 | +Transformer | epoch 0 | step 6380 |avg loss 7.762 |avg tokens 2349.200 |tokens/s 8949.063 |walltime 1699.163 | +Transformer | epoch 0 | step 6390 |avg loss 7.881 |avg tokens 2059.500 |tokens/s 7876.733 |walltime 1701.778 | +Transformer | epoch 0 | step 6400 |avg loss 7.517 |avg tokens 2384.600 |tokens/s 8658.676 |walltime 1704.532 | +Transformer | epoch 0 | step 6410 |avg loss 7.783 |avg tokens 1916.500 |tokens/s 7616.225 |walltime 1707.048 | +Transformer | epoch 0 | step 6420 |avg loss 7.825 |avg tokens 2226.500 |tokens/s 8335.570 |walltime 1709.719 | +Transformer | epoch 0 | step 6430 |avg loss 7.879 |avg tokens 2290.200 |tokens/s 8735.627 |walltime 1712.341 | +Transformer | epoch 0 | step 6440 |avg loss 7.630 |avg tokens 2171.700 |tokens/s 8185.622 |walltime 1714.994 | +Transformer | epoch 0 | step 6450 |avg loss 7.913 |avg tokens 2299.600 |tokens/s 8851.427 |walltime 1717.592 | +Transformer | epoch 0 | step 6460 |avg loss 7.851 |avg tokens 2233.300 |tokens/s 8660.017 |walltime 1720.171 | +Transformer | epoch 0 | step 6470 |avg loss 7.835 |avg tokens 2171.200 |tokens/s 8338.488 |walltime 1722.775 | +Transformer | epoch 0 | step 6480 |avg loss 7.427 |avg tokens 2341.600 |tokens/s 8619.457 |walltime 1725.491 | +Transformer | epoch 0 | step 6490 |avg loss 7.673 |avg tokens 2216.700 |tokens/s 8259.826 
|walltime 1728.175 | +Transformer | epoch 0 | step 6500 |avg loss 7.538 |avg tokens 2367.500 |tokens/s 8470.822 |walltime 1730.970 | +Transformer | epoch 0 | step 6510 |avg loss 7.363 |avg tokens 2201.800 |tokens/s 7987.098 |walltime 1733.727 | +Transformer | epoch 0 | step 6520 |avg loss 7.871 |avg tokens 2110.300 |tokens/s 8191.998 |walltime 1736.303 | +Transformer | epoch 0 | step 6530 |avg loss 7.773 |avg tokens 2078.100 |tokens/s 8267.342 |walltime 1738.816 | +Transformer | epoch 0 | step 6540 |avg loss 7.331 |avg tokens 2208.800 |tokens/s 8310.749 |walltime 1741.474 | +Transformer | epoch 0 | step 6550 |avg loss 7.395 |avg tokens 2252.400 |tokens/s 8323.158 |walltime 1744.180 | +Transformer | epoch 0 | step 6560 |avg loss 7.639 |avg tokens 2192.100 |tokens/s 8391.345 |walltime 1746.793 | +Transformer | epoch 0 | step 6570 |avg loss 7.535 |avg tokens 2127.600 |tokens/s 7997.122 |walltime 1749.453 | +Transformer | epoch 0 | step 6580 |avg loss 7.730 |avg tokens 2079.900 |tokens/s 7827.909 |walltime 1752.110 | +Transformer | epoch 0 | step 6590 |avg loss 7.428 |avg tokens 2396.000 |tokens/s 8769.583 |walltime 1754.842 | +Transformer | epoch 0 | step 6600 |avg loss 7.810 |avg tokens 1960.300 |tokens/s 8060.023 |walltime 1757.275 | +Transformer | epoch 0 | step 6610 |avg loss 7.972 |avg tokens 2110.600 |tokens/s 8052.383 |walltime 1759.896 | +Transformer | epoch 0 | step 6620 |avg loss 7.680 |avg tokens 2020.800 |tokens/s 7772.733 |walltime 1762.496 | +Transformer | epoch 0 | step 6630 |avg loss 7.268 |avg tokens 2261.400 |tokens/s 8270.816 |walltime 1765.230 | +Transformer | epoch 0 | step 6640 |avg loss 7.579 |avg tokens 2287.400 |tokens/s 8430.922 |walltime 1767.943 | +Transformer | epoch 0 | step 6650 |avg loss 8.168 |avg tokens 2094.800 |tokens/s 8261.406 |walltime 1770.478 | +Transformer | epoch 0 | step 6660 |avg loss 7.622 |avg tokens 2383.200 |tokens/s 8654.859 |walltime 1773.232 | +Transformer | epoch 0 | step 6670 |avg loss 7.804 |avg tokens 2204.800 
|tokens/s 8284.219 |walltime 1775.894 | +Transformer | epoch 0 | step 6680 |avg loss 8.144 |avg tokens 1891.000 |tokens/s 7411.761 |walltime 1778.445 | +Transformer | epoch 0 | step 6690 |avg loss 7.820 |avg tokens 1972.400 |tokens/s 7794.432 |walltime 1780.975 | +Transformer | epoch 0 | step 6700 |avg loss 7.326 |avg tokens 2297.200 |tokens/s 8385.994 |walltime 1783.715 | +Transformer | epoch 0 | step 6710 |avg loss 7.778 |avg tokens 2231.700 |tokens/s 8573.909 |walltime 1786.318 | +Transformer | epoch 0 | step 6720 |avg loss 7.659 |avg tokens 2177.600 |tokens/s 8212.442 |walltime 1788.969 | +Transformer | epoch 0 | step 6730 |avg loss 7.377 |avg tokens 2177.800 |tokens/s 8080.355 |walltime 1791.664 | +Transformer | epoch 0 | step 6740 |avg loss 8.107 |avg tokens 2182.500 |tokens/s 8458.001 |walltime 1794.245 | +Transformer | epoch 0 | step 6750 |avg loss 7.734 |avg tokens 2312.800 |tokens/s 8664.568 |walltime 1796.914 | +Transformer | epoch 0 | step 6760 |avg loss 7.667 |avg tokens 2242.400 |tokens/s 8525.766 |walltime 1799.544 | +Transformer | epoch 0 | step 6770 |avg loss 7.469 |avg tokens 2047.600 |tokens/s 7836.858 |walltime 1802.157 | +Transformer | epoch 0 | step 6780 |avg loss 7.932 |avg tokens 2154.600 |tokens/s 8488.555 |walltime 1804.695 | +Transformer | epoch 0 | step 6790 |avg loss 7.942 |avg tokens 2069.800 |tokens/s 8075.256 |walltime 1807.258 | +Transformer | epoch 0 | step 6800 |avg loss 7.507 |avg tokens 2375.900 |tokens/s 8614.353 |walltime 1810.016 | +Transformer | epoch 0 | step 6810 |avg loss 7.555 |avg tokens 2184.000 |tokens/s 7990.126 |walltime 1812.750 | +Transformer | epoch 0 | step 6820 |avg loss 7.684 |avg tokens 2205.600 |tokens/s 8155.211 |walltime 1815.454 | +Transformer | epoch 0 | step 6830 |avg loss 8.157 |avg tokens 2063.500 |tokens/s 8145.723 |walltime 1817.988 | +Transformer | epoch 0 | step 6840 |avg loss 7.534 |avg tokens 2358.800 |tokens/s 8623.720 |walltime 1820.723 | +Transformer | epoch 0 | step 6850 |avg loss 7.648 |avg 
tokens 2152.000 |tokens/s 7947.104 |walltime 1823.431 | +Transformer | epoch 0 | step 6860 |avg loss 7.925 |avg tokens 1914.900 |tokens/s 7543.629 |walltime 1825.969 | +Transformer | epoch 0 | step 6870 |avg loss 8.376 |avg tokens 2173.600 |tokens/s 8783.958 |walltime 1828.444 | +Transformer | epoch 0 | step 6880 |avg loss 7.380 |avg tokens 2183.100 |tokens/s 8007.176 |walltime 1831.170 | +Transformer | epoch 0 | step 6890 |avg loss 8.171 |avg tokens 2083.600 |tokens/s 8241.355 |walltime 1833.698 | +Transformer | epoch 0 | step 6900 |avg loss 8.089 |avg tokens 2069.300 |tokens/s 8152.964 |walltime 1836.236 | +Transformer | epoch 0 | step 6910 |avg loss 7.433 |avg tokens 2361.700 |tokens/s 8644.066 |walltime 1838.969 | +Transformer | epoch 0 | step 6920 |avg loss 7.517 |avg tokens 2231.200 |tokens/s 8265.657 |walltime 1841.668 | +Transformer | epoch 0 | step 6930 |avg loss 7.554 |avg tokens 2289.600 |tokens/s 8305.639 |walltime 1844.425 | +Transformer | epoch 0 | step 6940 |avg loss 8.218 |avg tokens 2106.000 |tokens/s 8408.664 |walltime 1846.929 | +Transformer | epoch 0 | step 6950 |avg loss 7.522 |avg tokens 2276.300 |tokens/s 8404.170 |walltime 1849.638 | +Transformer | epoch 0 | step 6960 |avg loss 7.513 |avg tokens 2249.600 |tokens/s 8404.583 |walltime 1852.314 | +Transformer | epoch 0 | step 6970 |avg loss 7.346 |avg tokens 2146.800 |tokens/s 7932.591 |walltime 1855.021 | +Transformer | epoch 0 | step 6980 |avg loss 7.655 |avg tokens 2152.300 |tokens/s 8126.524 |walltime 1857.669 | +Transformer | epoch 0 | step 6990 |avg loss 7.766 |avg tokens 2168.800 |tokens/s 8269.481 |walltime 1860.292 | +Transformer | epoch 0 | step 7000 |avg loss 8.046 |avg tokens 1980.500 |tokens/s 7685.970 |walltime 1862.869 | +Transformer | epoch 0 | step 7010 |avg loss 7.462 |avg tokens 2104.100 |tokens/s 8007.644 |walltime 1865.496 | +Transformer | epoch 0 | step 7020 |avg loss 7.574 |avg tokens 2202.400 |tokens/s 8171.533 |walltime 1868.192 | +Transformer | epoch 0 | step 7030 |avg 
loss 7.638 |avg tokens 2338.100 |tokens/s 8600.476 |walltime 1870.910 | +Transformer | epoch 0 | step 7040 |avg loss 7.522 |avg tokens 2276.400 |tokens/s 8436.840 |walltime 1873.608 | +Transformer | epoch 0 | step 7050 |avg loss 7.639 |avg tokens 2150.200 |tokens/s 8184.765 |walltime 1876.235 | +Transformer | epoch 0 | step 7060 |avg loss 7.708 |avg tokens 2291.900 |tokens/s 8619.244 |walltime 1878.894 | +Transformer | epoch 0 | step 7070 |avg loss 7.632 |avg tokens 2322.600 |tokens/s 8718.865 |walltime 1881.558 | +Transformer | epoch 0 | step 7080 |avg loss 7.544 |avg tokens 2171.300 |tokens/s 8106.342 |walltime 1884.237 | +Transformer | epoch 0 | step 7090 |avg loss 7.995 |avg tokens 2121.200 |tokens/s 8635.010 |walltime 1886.693 | +Transformer | epoch 0 | step 7100 |avg loss 8.201 |avg tokens 2065.800 |tokens/s 8086.959 |walltime 1889.248 | +Transformer | epoch 0 | step 7110 |avg loss 7.721 |avg tokens 2234.900 |tokens/s 8511.250 |walltime 1891.874 | +Transformer | epoch 0 | step 7120 |avg loss 7.241 |avg tokens 2299.200 |tokens/s 8353.296 |walltime 1894.626 | +Transformer | epoch 0 | step 7130 |avg loss 7.867 |avg tokens 2093.000 |tokens/s 7897.987 |walltime 1897.276 | +Transformer | epoch 0 | step 7140 |avg loss 7.336 |avg tokens 2207.000 |tokens/s 8089.051 |walltime 1900.005 | +Transformer | epoch 0 | step 7150 |avg loss 7.460 |avg tokens 2309.800 |tokens/s 8405.354 |walltime 1902.753 | +Transformer | epoch 0 | step 7160 |avg loss 7.972 |avg tokens 2231.300 |tokens/s 8776.079 |walltime 1905.295 | +Transformer | epoch 0 | step 7170 |avg loss 8.011 |avg tokens 2413.200 |tokens/s 9272.078 |walltime 1907.898 | +Transformer | epoch 0 | step 7180 |avg loss 7.617 |avg tokens 2228.800 |tokens/s 8103.336 |walltime 1910.648 | +Transformer | epoch 0 | step 7190 |avg loss 8.125 |avg tokens 2086.900 |tokens/s 7928.197 |walltime 1913.280 | +Transformer | epoch 0 | step 7200 |avg loss 7.659 |avg tokens 2215.600 |tokens/s 8413.482 |walltime 1915.914 | +Transformer | epoch 0 
| step 7210 |avg loss 7.400 |avg tokens 2152.100 |tokens/s 7981.158 |walltime 1918.610 | +Transformer | epoch 0 | step 7220 |avg loss 8.029 |avg tokens 2313.200 |tokens/s 8749.142 |walltime 1921.254 | +Transformer | epoch 0 | step 7230 |avg loss 7.517 |avg tokens 2156.000 |tokens/s 8072.557 |walltime 1923.925 | +Transformer | epoch 0 | step 7240 |avg loss 7.588 |avg tokens 2205.200 |tokens/s 8124.106 |walltime 1926.639 | +Transformer | epoch 0 | step 7250 |avg loss 8.012 |avg tokens 2104.600 |tokens/s 8314.790 |walltime 1929.171 | +Transformer | epoch 0 | step 7260 |avg loss 7.748 |avg tokens 2102.100 |tokens/s 7849.854 |walltime 1931.848 | +Transformer | epoch 0 | step 7270 |avg loss 8.212 |avg tokens 1801.500 |tokens/s 7217.489 |walltime 1934.344 | +Transformer | epoch 0 | step 7280 |avg loss 7.639 |avg tokens 2246.300 |tokens/s 8387.729 |walltime 1937.023 | +Transformer | epoch 0 | step 7290 |avg loss 7.686 |avg tokens 2270.400 |tokens/s 8475.307 |walltime 1939.701 | +Transformer | epoch 0 | step 7300 |avg loss 7.693 |avg tokens 2230.800 |tokens/s 8216.029 |walltime 1942.417 | +Transformer | epoch 0 | step 7310 |avg loss 7.729 |avg tokens 2079.400 |tokens/s 8018.857 |walltime 1945.010 | +Transformer | epoch 0 | step 7320 |avg loss 7.469 |avg tokens 2265.600 |tokens/s 8298.819 |walltime 1947.740 | +Transformer | epoch 0 | step 7330 |avg loss 7.745 |avg tokens 2143.800 |tokens/s 8279.129 |walltime 1950.329 | +Transformer | epoch 0 | step 7340 |avg loss 7.897 |avg tokens 2245.600 |tokens/s 8402.714 |walltime 1953.002 | +Transformer | epoch 0 | step 7350 |avg loss 7.748 |avg tokens 2311.400 |tokens/s 8508.946 |walltime 1955.718 | +Transformer | epoch 0 | step 7360 |avg loss 7.431 |avg tokens 2196.700 |tokens/s 8181.670 |walltime 1958.403 | +Transformer | epoch 0 | step 7370 |avg loss 7.887 |avg tokens 2024.100 |tokens/s 7893.036 |walltime 1960.967 | +Transformer | epoch 0 | step 7380 |avg loss 8.101 |avg tokens 1925.700 |tokens/s 7782.212 |walltime 1963.442 | 
+Transformer | epoch 0 | step 7390 |avg loss 7.830 |avg tokens 2202.400 |tokens/s 8276.758 |walltime 1966.103 | +Transformer | epoch 0 | step 7400 |avg loss 7.691 |avg tokens 2287.800 |tokens/s 8547.798 |walltime 1968.779 | +Transformer | epoch 0 | step 7410 |avg loss 7.750 |avg tokens 2046.700 |tokens/s 7964.105 |walltime 1971.349 | +Transformer | epoch 0 | step 7420 |avg loss 7.447 |avg tokens 2405.200 |tokens/s 8606.813 |walltime 1974.144 | +Transformer | epoch 0 | step 7430 |avg loss 7.979 |avg tokens 2210.500 |tokens/s 8618.320 |walltime 1976.709 | +Transformer | epoch 0 | step 7440 |avg loss 7.755 |avg tokens 2039.300 |tokens/s 7949.830 |walltime 1979.274 | +Transformer | epoch 0 | step 7450 |avg loss 7.666 |avg tokens 2395.300 |tokens/s 9176.428 |walltime 1981.884 | +Transformer | epoch 0 | step 7460 |avg loss 8.182 |avg tokens 1915.200 |tokens/s 7632.932 |walltime 1984.393 | +Transformer | epoch 0 | step 7470 |avg loss 7.724 |avg tokens 2324.300 |tokens/s 8546.469 |walltime 1987.113 | +Transformer | epoch 0 | step 7480 |avg loss 7.283 |avg tokens 2355.700 |tokens/s 8420.720 |walltime 1989.910 | +Transformer | epoch 0 | step 7490 |avg loss 7.594 |avg tokens 2305.600 |tokens/s 8673.483 |walltime 1992.569 | +Transformer | epoch 0 | step 7500 |avg loss 7.582 |avg tokens 2394.400 |tokens/s 8751.889 |walltime 1995.304 | +Transformer | epoch 0 | step 7510 |avg loss 7.656 |avg tokens 2258.000 |tokens/s 8559.065 |walltime 1997.943 | +Transformer | epoch 0 | step 7520 |avg loss 7.681 |avg tokens 2319.600 |tokens/s 8587.374 |walltime 2000.644 | +Transformer | epoch 0 | step 7530 |avg loss 7.510 |avg tokens 2366.400 |tokens/s 8649.355 |walltime 2003.380 | +Transformer | epoch 0 | step 7540 |avg loss 7.447 |avg tokens 2255.200 |tokens/s 8504.759 |walltime 2006.031 | +Transformer | epoch 0 | step 7550 |avg loss 7.708 |avg tokens 2152.800 |tokens/s 8111.161 |walltime 2008.686 | +Transformer | epoch 0 | step 7560 |avg loss 8.183 |avg tokens 2246.400 |tokens/s 8638.962 
|walltime 2011.286 | +Transformer | epoch 0 | step 7570 |avg loss 7.413 |avg tokens 2276.700 |tokens/s 8301.251 |walltime 2014.028 | +Transformer | epoch 0 | step 7580 |avg loss 7.192 |avg tokens 2335.200 |tokens/s 8210.372 |walltime 2016.873 | +Transformer | epoch 0 | step 7590 |avg loss 7.800 |avg tokens 2259.800 |tokens/s 8552.580 |walltime 2019.515 | +Transformer | epoch 0 | step 7600 |avg loss 7.590 |avg tokens 2068.900 |tokens/s 7782.148 |walltime 2022.173 | +Transformer | epoch 0 | step 7610 |avg loss 7.691 |avg tokens 2059.500 |tokens/s 8051.558 |walltime 2024.731 | +Transformer | epoch 0 | step 7620 |avg loss 8.049 |avg tokens 2059.000 |tokens/s 8179.349 |walltime 2027.249 | +Transformer | epoch 0 | step 7630 |avg loss 7.382 |avg tokens 2228.500 |tokens/s 8301.093 |walltime 2029.933 | +Transformer | epoch 0 | step 7640 |avg loss 7.539 |avg tokens 2196.400 |tokens/s 8169.630 |walltime 2032.622 | +Transformer | epoch 0 | step 7650 |avg loss 8.536 |avg tokens 2287.300 |tokens/s 9509.394 |walltime 2035.027 | +Transformer | epoch 0 | step 7660 |avg loss 7.766 |avg tokens 2218.500 |tokens/s 8664.289 |walltime 2037.588 | +Transformer | epoch 0 | step 7670 |avg loss 7.386 |avg tokens 1973.400 |tokens/s 7662.474 |walltime 2040.163 | +Transformer | epoch 0 | step 7680 |avg loss 7.888 |avg tokens 1954.500 |tokens/s 7756.624 |walltime 2042.683 | +Transformer | epoch 0 | step 7690 |avg loss 7.882 |avg tokens 2272.800 |tokens/s 8670.183 |walltime 2045.304 | +Transformer | epoch 0 | step 7700 |avg loss 7.657 |avg tokens 1978.000 |tokens/s 7774.724 |walltime 2047.848 | +Transformer | epoch 0 | step 7710 |avg loss 7.718 |avg tokens 2258.600 |tokens/s 8459.754 |walltime 2050.518 | +Transformer | epoch 0 | step 7720 |avg loss 7.615 |avg tokens 2178.700 |tokens/s 8205.471 |walltime 2053.173 | +Transformer | epoch 0 | step 7730 |avg loss 7.677 |avg tokens 2027.800 |tokens/s 7719.332 |walltime 2055.800 | +Transformer | epoch 0 | step 7740 |avg loss 7.543 |avg tokens 2055.100 
|tokens/s 7728.098 |walltime 2058.460 | +Transformer | epoch 0 | step 7750 |avg loss 7.723 |avg tokens 2071.700 |tokens/s 7845.666 |walltime 2061.100 | +Transformer | epoch 0 | step 7760 |avg loss 7.966 |avg tokens 2210.000 |tokens/s 8576.131 |walltime 2063.677 | +Transformer | epoch 0 | step 7770 |avg loss 8.034 |avg tokens 1977.600 |tokens/s 8023.168 |walltime 2066.142 | +Transformer | epoch 0 | step 7780 |avg loss 7.832 |avg tokens 2175.700 |tokens/s 8315.350 |walltime 2068.758 | +Transformer | epoch 0 | step 7790 |avg loss 7.718 |avg tokens 2338.400 |tokens/s 8635.101 |walltime 2071.466 | +Transformer | epoch 0 | step 7800 |avg loss 7.560 |avg tokens 1969.800 |tokens/s 7863.839 |walltime 2073.971 | +Transformer | epoch 0 | step 7810 |avg loss 7.741 |avg tokens 2203.900 |tokens/s 8269.004 |walltime 2076.637 | +Transformer | epoch 0 | step 7820 |avg loss 7.686 |avg tokens 2231.500 |tokens/s 8331.719 |walltime 2079.315 | +Transformer | epoch 0 | step 7830 |avg loss 8.770 |avg tokens 2108.700 |tokens/s 9003.788 |walltime 2081.657 | +Transformer | epoch 0 | step 7840 |avg loss 8.028 |avg tokens 2050.300 |tokens/s 8115.571 |walltime 2084.183 | +Transformer | epoch 0 | step 7850 |avg loss 8.045 |avg tokens 2174.400 |tokens/s 8157.268 |walltime 2086.849 | +Transformer | epoch 0 | step 7860 |avg loss 7.589 |avg tokens 2245.600 |tokens/s 8265.372 |walltime 2089.566 | +Transformer | epoch 0 | step 7870 |avg loss 7.335 |avg tokens 2216.000 |tokens/s 8116.821 |walltime 2092.296 | +Transformer | epoch 0 | step 7880 |avg loss 7.566 |avg tokens 2200.800 |tokens/s 8270.033 |walltime 2094.957 | +Transformer | epoch 0 | step 7890 |avg loss 7.416 |avg tokens 2253.300 |tokens/s 8380.340 |walltime 2097.646 | +Transformer | epoch 0 | step 7900 |avg loss 7.668 |avg tokens 2285.000 |tokens/s 8535.305 |walltime 2100.323 | +Transformer | epoch 0 | step 7910 |avg loss 7.808 |avg tokens 1983.500 |tokens/s 7829.337 |walltime 2102.856 | +Transformer | epoch 0 | step 7920 |avg loss 7.559 |avg 
tokens 2275.700 |tokens/s 8454.027 |walltime 2105.548 | +Transformer | epoch 0 | step 7930 |avg loss 7.941 |avg tokens 2089.700 |tokens/s 8182.836 |walltime 2108.102 | +Transformer | epoch 0 | step 7940 |avg loss 7.443 |avg tokens 2043.200 |tokens/s 7751.597 |walltime 2110.738 | +Transformer | epoch 0 | step 7950 |avg loss 7.773 |avg tokens 2326.900 |tokens/s 8783.507 |walltime 2113.387 | +Transformer | epoch 0 | step 7960 |avg loss 7.516 |avg tokens 2348.200 |tokens/s 8585.965 |walltime 2116.122 | +Transformer | epoch 0 | step 7970 |avg loss 7.552 |avg tokens 2170.100 |tokens/s 8247.502 |walltime 2118.753 | +Transformer | epoch 0 | step 7980 |avg loss 7.489 |avg tokens 2176.700 |tokens/s 8014.436 |walltime 2121.469 | +Transformer | epoch 0 | step 7990 |avg loss 7.469 |avg tokens 2276.800 |tokens/s 8396.590 |walltime 2124.181 | +Transformer | epoch 0 | step 8000 |avg loss 7.191 |avg tokens 2278.400 |tokens/s 8237.674 |walltime 2126.947 | +Transformer | epoch 0 | step 8010 |avg loss 7.623 |avg tokens 2292.600 |tokens/s 8465.306 |walltime 2129.655 | +Transformer | epoch 0 | step 8020 |avg loss 7.952 |avg tokens 2118.100 |tokens/s 8336.016 |walltime 2132.196 | +Transformer | epoch 0 | step 8030 |avg loss 7.541 |avg tokens 2242.800 |tokens/s 8197.871 |walltime 2134.932 | +Transformer | epoch 0 | step 8040 |avg loss 7.554 |avg tokens 2052.300 |tokens/s 7919.715 |walltime 2137.523 | +Transformer | epoch 0 | step 8050 |avg loss 7.433 |avg tokens 2305.000 |tokens/s 8430.542 |walltime 2140.257 | +Transformer | epoch 0 | step 8060 |avg loss 8.160 |avg tokens 2199.200 |tokens/s 9027.049 |walltime 2142.693 | +Transformer | epoch 0 | step 8070 |avg loss 7.383 |avg tokens 2248.600 |tokens/s 8410.628 |walltime 2145.367 | +Transformer | epoch 0 | step 8080 |avg loss 8.213 |avg tokens 1853.300 |tokens/s 7571.837 |walltime 2147.814 | +Transformer | epoch 0 | step 8090 |avg loss 7.664 |avg tokens 2072.400 |tokens/s 7818.235 |walltime 2150.465 | +Transformer | epoch 0 | step 8100 |avg 
loss 7.624 |avg tokens 2322.800 |tokens/s 8673.032 |walltime 2153.143 | +Transformer | epoch 0 | step 8110 |avg loss 7.674 |avg tokens 2261.100 |tokens/s 8597.644 |walltime 2155.773 | +Transformer | epoch 0 | step 8120 |avg loss 7.093 |avg tokens 2324.000 |tokens/s 8404.063 |walltime 2158.539 | +Transformer | epoch 0 | step 8130 |avg loss 7.528 |avg tokens 2173.600 |tokens/s 8011.068 |walltime 2161.252 | +Transformer | epoch 0 | step 8140 |avg loss 7.962 |avg tokens 2220.200 |tokens/s 8703.706 |walltime 2163.803 | +Transformer | epoch 0 | step 8150 |avg loss 7.661 |avg tokens 2252.200 |tokens/s 8717.634 |walltime 2166.386 | +Transformer | epoch 0 | step 8160 |avg loss 7.715 |avg tokens 2064.000 |tokens/s 8070.384 |walltime 2168.944 | +Transformer | epoch 0 | step 8170 |avg loss 7.604 |avg tokens 2255.600 |tokens/s 8124.697 |walltime 2171.720 | +Transformer | epoch 0 | step 8180 |avg loss 7.563 |avg tokens 2107.400 |tokens/s 8034.665 |walltime 2174.343 | +Transformer | epoch 0 | step 8190 |avg loss 8.291 |avg tokens 2048.900 |tokens/s 8256.636 |walltime 2176.824 | +Transformer | epoch 0 | step 8200 |avg loss 7.972 |avg tokens 1854.000 |tokens/s 7501.536 |walltime 2179.296 | +Transformer | epoch 0 | step 8210 |avg loss 7.613 |avg tokens 2309.300 |tokens/s 8447.580 |walltime 2182.030 | +Transformer | epoch 0 | step 8220 |avg loss 8.074 |avg tokens 2280.500 |tokens/s 8701.241 |walltime 2184.650 | +Transformer | epoch 0 | step 8230 |avg loss 7.685 |avg tokens 2361.900 |tokens/s 8750.270 |walltime 2187.350 | +Transformer | epoch 0 | step 8240 |avg loss 7.388 |avg tokens 2248.800 |tokens/s 8183.655 |walltime 2190.098 | +Transformer | epoch 0 | step 8250 |avg loss 7.700 |avg tokens 2223.800 |tokens/s 8332.068 |walltime 2192.767 | +Transformer | epoch 0 | step 8260 |avg loss 7.783 |avg tokens 2079.600 |tokens/s 8124.285 |walltime 2195.326 | +Transformer | epoch 0 | step 8270 |avg loss 7.412 |avg tokens 2103.700 |tokens/s 8079.959 |walltime 2197.930 | +Transformer | epoch 0 
| step 8280 |avg loss 8.244 |avg tokens 2029.000 |tokens/s 8475.116 |walltime 2200.324 | +Transformer | epoch 0 | step 8290 |avg loss 7.707 |avg tokens 2151.700 |tokens/s 8288.364 |walltime 2202.920 | +Transformer | epoch 0 | step 8300 |avg loss 7.288 |avg tokens 2388.800 |tokens/s 8433.248 |walltime 2205.753 | +Transformer | epoch 0 | step 8310 |avg loss 7.354 |avg tokens 2419.000 |tokens/s 8764.478 |walltime 2208.513 | +Transformer | epoch 0 | step 8320 |avg loss 7.355 |avg tokens 2264.000 |tokens/s 8313.947 |walltime 2211.236 | +Transformer | epoch 0 | step 8330 |avg loss 7.662 |avg tokens 2395.200 |tokens/s 8800.311 |walltime 2213.958 | +Transformer | epoch 0 | step 8340 |avg loss 7.805 |avg tokens 2285.600 |tokens/s 8508.981 |walltime 2216.644 | +Transformer | epoch 0 | step 8350 |avg loss 7.845 |avg tokens 2106.100 |tokens/s 8114.577 |walltime 2219.239 | +Transformer | epoch 0 | step 8360 |avg loss 7.646 |avg tokens 2239.200 |tokens/s 8436.468 |walltime 2221.893 | +Transformer | epoch 0 | step 8370 |avg loss 7.678 |avg tokens 2257.800 |tokens/s 8427.782 |walltime 2224.572 | +Transformer | epoch 0 | step 8380 |avg loss 7.744 |avg tokens 2126.300 |tokens/s 8021.436 |walltime 2227.223 | +Transformer | epoch 0 | step 8390 |avg loss 7.535 |avg tokens 2272.100 |tokens/s 8473.783 |walltime 2229.904 | +Transformer | epoch 0 | step 8400 |avg loss 7.534 |avg tokens 2187.900 |tokens/s 8352.960 |walltime 2232.524 | +Transformer | epoch 0 | step 8410 |avg loss 7.570 |avg tokens 2355.000 |tokens/s 8563.103 |walltime 2235.274 | +Transformer | epoch 0 | step 8420 |avg loss 7.530 |avg tokens 2131.200 |tokens/s 8120.339 |walltime 2237.898 | +Transformer | epoch 0 | step 8430 |avg loss 7.823 |avg tokens 2124.700 |tokens/s 8292.040 |walltime 2240.461 | +Transformer | epoch 0 | step 8440 |avg loss 7.247 |avg tokens 2177.500 |tokens/s 8086.903 |walltime 2243.153 | +Transformer | epoch 0 | step 8450 |avg loss 7.485 |avg tokens 2317.600 |tokens/s 8434.485 |walltime 2245.901 | 
+Transformer | epoch 0 | step 8460 |avg loss 6.839 |avg tokens 2303.000 |tokens/s 8322.576 |walltime 2248.668 | +Transformer | epoch 0 | step 8470 |avg loss 8.263 |avg tokens 2058.900 |tokens/s 8278.213 |walltime 2251.155 | +Transformer | epoch 0 | step 8480 |avg loss 7.738 |avg tokens 2083.300 |tokens/s 8006.967 |walltime 2253.757 | +Transformer | epoch 0 | step 8490 |avg loss 8.214 |avg tokens 2183.600 |tokens/s 8471.892 |walltime 2256.335 | +Transformer | epoch 0 | step 8500 |avg loss 7.753 |avg tokens 2333.600 |tokens/s 8634.862 |walltime 2259.037 | +Transformer | epoch 0 | step 8510 |avg loss 7.353 |avg tokens 2269.600 |tokens/s 8369.751 |walltime 2261.749 | +Transformer | epoch 0 | step 8520 |avg loss 7.857 |avg tokens 2078.400 |tokens/s 8103.056 |walltime 2264.314 | +Transformer | epoch 0 | step 8530 |avg loss 7.814 |avg tokens 2116.800 |tokens/s 8041.101 |walltime 2266.946 | +Transformer | epoch 0 | step 8540 |avg loss 7.617 |avg tokens 2094.100 |tokens/s 7950.923 |walltime 2269.580 | +Transformer | epoch 0 | step 8550 |avg loss 7.112 |avg tokens 2214.400 |tokens/s 8216.445 |walltime 2272.275 | +Transformer | epoch 0 | step 8560 |avg loss 7.433 |avg tokens 2183.800 |tokens/s 8069.762 |walltime 2274.981 | +Transformer | epoch 0 | step 8570 |avg loss 7.452 |avg tokens 2200.900 |tokens/s 8168.991 |walltime 2277.676 | +Transformer | epoch 0 | step 8580 |avg loss 8.090 |avg tokens 1844.400 |tokens/s 7391.997 |walltime 2280.171 | +Transformer | epoch 0 | step 8590 |avg loss 7.813 |avg tokens 2283.700 |tokens/s 8569.567 |walltime 2282.836 | +Transformer | epoch 0 | step 8600 |avg loss 7.627 |avg tokens 2149.300 |tokens/s 7996.737 |walltime 2285.523 | +Transformer | epoch 0 | step 8610 |avg loss 8.212 |avg tokens 1938.500 |tokens/s 7921.254 |walltime 2287.971 | +Transformer | epoch 0 | step 8620 |avg loss 7.756 |avg tokens 2131.300 |tokens/s 7919.931 |walltime 2290.662 | +Transformer | epoch 0 | step 8630 |avg loss 7.598 |avg tokens 2371.500 |tokens/s 8594.489 
|walltime 2293.421 | +Transformer | epoch 0 | step 8640 |avg loss 7.409 |avg tokens 2340.700 |tokens/s 8716.246 |walltime 2296.106 | +Transformer | epoch 0 | step 8650 |avg loss 7.195 |avg tokens 2415.200 |tokens/s 8549.319 |walltime 2298.932 | +Transformer | epoch 0 | step 8660 |avg loss 6.948 |avg tokens 2259.200 |tokens/s 8206.566 |walltime 2301.684 | +Transformer | epoch 0 | step 8670 |avg loss 6.938 |avg tokens 2279.100 |tokens/s 8422.744 |walltime 2304.390 | +Transformer | epoch 0 | step 8680 |avg loss 7.878 |avg tokens 2297.500 |tokens/s 8665.719 |walltime 2307.042 | +Transformer | epoch 0 | step 8690 |avg loss 7.443 |avg tokens 2054.700 |tokens/s 7736.786 |walltime 2309.697 | +Transformer | epoch 0 | step 8700 |avg loss 7.649 |avg tokens 2374.600 |tokens/s 8812.613 |walltime 2312.392 | +Transformer | epoch 0 | step 8710 |avg loss 7.738 |avg tokens 2372.000 |tokens/s 8828.214 |walltime 2315.079 | +Transformer | epoch 0 | step 8720 |avg loss 7.452 |avg tokens 1962.700 |tokens/s 7630.137 |walltime 2317.651 | +Transformer | epoch 0 | step 8730 |avg loss 7.669 |avg tokens 2250.400 |tokens/s 8366.781 |walltime 2320.341 | +Transformer | epoch 0 | step 8740 |avg loss 7.563 |avg tokens 2330.100 |tokens/s 8502.081 |walltime 2323.081 | +Transformer | epoch 0 | step 8750 |avg loss 7.496 |avg tokens 2024.200 |tokens/s 7724.436 |walltime 2325.702 | +Transformer | epoch 0 | step 8760 |avg loss 7.774 |avg tokens 2145.500 |tokens/s 8299.507 |walltime 2328.287 | +Transformer | epoch 0 | step 8770 |avg loss 7.370 |avg tokens 2196.000 |tokens/s 8056.176 |walltime 2331.013 | +Transformer | epoch 0 | step 8780 |avg loss 7.776 |avg tokens 2159.200 |tokens/s 8203.766 |walltime 2333.645 | +Transformer | epoch 0 | step 8790 |avg loss 7.509 |avg tokens 2177.300 |tokens/s 8264.666 |walltime 2336.279 | +Transformer | epoch 0 | step 8800 |avg loss 7.290 |avg tokens 2143.100 |tokens/s 8010.652 |walltime 2338.955 | +Transformer | epoch 0 | step 8810 |avg loss 7.706 |avg tokens 2296.800 
|tokens/s 8525.090 |walltime 2341.649 | +Transformer | epoch 0 | step 8820 |avg loss 7.580 |avg tokens 2108.000 |tokens/s 7829.803 |walltime 2344.341 | +Transformer | epoch 0 | step 8830 |avg loss 7.663 |avg tokens 2203.100 |tokens/s 8218.587 |walltime 2347.022 | +Transformer | epoch 0 | step 8840 |avg loss 7.709 |avg tokens 2067.400 |tokens/s 7974.821 |walltime 2349.614 | +Transformer | epoch 0 | step 8850 |avg loss 7.827 |avg tokens 2067.200 |tokens/s 7949.374 |walltime 2352.215 | +Transformer | epoch 0 | step 8860 |avg loss 7.483 |avg tokens 1968.500 |tokens/s 7774.011 |walltime 2354.747 | +Transformer | epoch 0 | step 8870 |avg loss 7.612 |avg tokens 2047.300 |tokens/s 7887.714 |walltime 2357.342 | +Transformer | epoch 0 | step 8880 |avg loss 7.549 |avg tokens 1954.200 |tokens/s 7541.971 |walltime 2359.933 | +Transformer | epoch 0 | step 8890 |avg loss 7.760 |avg tokens 1966.400 |tokens/s 7550.428 |walltime 2362.538 | +Transformer | epoch 0 | step 8900 |avg loss 7.565 |avg tokens 2260.000 |tokens/s 8464.955 |walltime 2365.208 | +Transformer | epoch 0 | step 8910 |avg loss 7.617 |avg tokens 2229.600 |tokens/s 8228.091 |walltime 2367.917 | +Transformer | epoch 0 | step 8920 |avg loss 7.227 |avg tokens 2231.900 |tokens/s 8244.336 |walltime 2370.625 | +Transformer | epoch 0 | step 8930 |avg loss 8.068 |avg tokens 2325.700 |tokens/s 8963.739 |walltime 2373.219 | +Transformer | epoch 0 | step 8940 |avg loss 7.464 |avg tokens 2385.500 |tokens/s 8633.634 |walltime 2375.982 | +Transformer | epoch 0 | step 8950 |avg loss 7.551 |avg tokens 2151.300 |tokens/s 8049.335 |walltime 2378.655 | +Transformer | epoch 0 | step 8960 |avg loss 7.426 |avg tokens 2158.400 |tokens/s 8200.021 |walltime 2381.287 | +Transformer | epoch 0 | step 8970 |avg loss 7.628 |avg tokens 2365.700 |tokens/s 8589.620 |walltime 2384.041 | +Transformer | epoch 0 | step 8980 |avg loss 7.880 |avg tokens 2183.900 |tokens/s 8225.750 |walltime 2386.696 | +Transformer | epoch 0 | step 8990 |avg loss 7.642 |avg 
tokens 2306.800 |tokens/s 8580.618 |walltime 2389.384 | +Transformer | epoch 0 | step 9000 |avg loss 7.459 |avg tokens 2370.400 |tokens/s 8811.968 |walltime 2392.074 | +Transformer | epoch 0 | step 9010 |avg loss 7.872 |avg tokens 2307.200 |tokens/s 8850.468 |walltime 2394.681 | +Transformer | epoch 0 | step 9020 |avg loss 7.733 |avg tokens 2141.200 |tokens/s 8049.769 |walltime 2397.341 | +Transformer | epoch 0 | step 9030 |avg loss 7.627 |avg tokens 2186.400 |tokens/s 8198.585 |walltime 2400.008 | +Transformer | epoch 0 | step 9040 |avg loss 7.715 |avg tokens 2258.900 |tokens/s 8412.066 |walltime 2402.693 | +Transformer | epoch 0 | step 9050 |avg loss 7.476 |avg tokens 2302.400 |tokens/s 8479.874 |walltime 2405.409 | +Transformer | epoch 0 | step 9060 |avg loss 8.015 |avg tokens 2190.600 |tokens/s 8760.533 |walltime 2407.909 | +Transformer | epoch 0 | step 9070 |avg loss 7.763 |avg tokens 2068.500 |tokens/s 7860.674 |walltime 2410.541 | +Transformer | epoch 0 | step 9080 |avg loss 7.619 |avg tokens 2189.600 |tokens/s 8091.837 |walltime 2413.246 | +Transformer | epoch 0 | step 9090 |avg loss 7.983 |avg tokens 2279.700 |tokens/s 8973.923 |walltime 2415.787 | +Transformer | epoch 0 | step 9100 |avg loss 7.841 |avg tokens 2104.100 |tokens/s 7986.520 |walltime 2418.421 | +Transformer | epoch 0 | step 9110 |avg loss 7.405 |avg tokens 2241.500 |tokens/s 8216.633 |walltime 2421.149 | +Transformer | epoch 0 | step 9120 |avg loss 7.324 |avg tokens 2212.800 |tokens/s 8126.140 |walltime 2423.872 | +Transformer | epoch 0 | step 9130 |avg loss 7.714 |avg tokens 2035.000 |tokens/s 7804.646 |walltime 2426.480 | +Transformer | epoch 0 | step 9140 |avg loss 7.500 |avg tokens 2331.300 |tokens/s 8670.950 |walltime 2429.169 | +Transformer | epoch 0 | step 9150 |avg loss 7.586 |avg tokens 2090.000 |tokens/s 7816.520 |walltime 2431.842 | +Transformer | epoch 0 | step 9160 |avg loss 7.508 |avg tokens 2240.000 |tokens/s 8249.880 |walltime 2434.558 | +Transformer | epoch 0 | step 9170 |avg 
loss 8.065 |avg tokens 1904.000 |tokens/s 7728.718 |walltime 2437.021 | +Transformer | epoch 0 | step 9180 |avg loss 7.958 |avg tokens 2201.100 |tokens/s 8423.541 |walltime 2439.634 | +Transformer | epoch 0 | step 9190 |avg loss 7.766 |avg tokens 2267.000 |tokens/s 8459.951 |walltime 2442.314 | +Transformer | epoch 0 | step 9200 |avg loss 8.315 |avg tokens 2054.000 |tokens/s 8395.135 |walltime 2444.760 | +Transformer | epoch 0 | step 9210 |avg loss 7.692 |avg tokens 2160.900 |tokens/s 8214.494 |walltime 2447.391 | +Transformer | epoch 0 | step 9220 |avg loss 8.212 |avg tokens 1970.200 |tokens/s 8087.985 |walltime 2449.827 | +Transformer | epoch 0 | step 9230 |avg loss 7.529 |avg tokens 2389.800 |tokens/s 8700.312 |walltime 2452.574 | +Transformer | epoch 0 | step 9240 |avg loss 7.139 |avg tokens 2347.200 |tokens/s 8304.616 |walltime 2455.400 | +Transformer | epoch 0 | step 9250 |avg loss 7.713 |avg tokens 2382.700 |tokens/s 9177.024 |walltime 2457.997 | +Transformer | epoch 0 | step 9260 |avg loss 7.684 |avg tokens 2140.000 |tokens/s 7961.073 |walltime 2460.685 | +Transformer | epoch 0 | step 9270 |avg loss 7.616 |avg tokens 2228.000 |tokens/s 8267.305 |walltime 2463.380 | +Transformer | epoch 0 | step 9280 |avg loss 7.639 |avg tokens 2206.200 |tokens/s 8256.802 |walltime 2466.052 | +Transformer | epoch 0 | step 9290 |avg loss 7.924 |avg tokens 2025.800 |tokens/s 7823.955 |walltime 2468.641 | +Transformer | epoch 0 | step 9300 |avg loss 7.270 |avg tokens 2339.800 |tokens/s 8434.318 |walltime 2471.415 | +Transformer | epoch 0 | step 9310 |avg loss 7.791 |avg tokens 2215.100 |tokens/s 8257.617 |walltime 2474.098 | +Transformer | epoch 0 | step 9320 |avg loss 7.634 |avg tokens 2267.300 |tokens/s 8349.883 |walltime 2476.813 | +Transformer | epoch 0 | step 9330 |avg loss 7.703 |avg tokens 2248.400 |tokens/s 8285.720 |walltime 2479.526 | +Transformer | epoch 0 | step 9340 |avg loss 7.376 |avg tokens 2328.800 |tokens/s 8512.374 |walltime 2482.262 | +Transformer | epoch 0 
| step 9350 |avg loss 7.402 |avg tokens 2152.000 |tokens/s 8105.561 |walltime 2484.917 | +Transformer | epoch 0 | step 9360 |avg loss 7.468 |avg tokens 2259.200 |tokens/s 8232.442 |walltime 2487.661 | +Transformer | epoch 0 | step 9370 |avg loss 7.530 |avg tokens 2177.000 |tokens/s 8113.792 |walltime 2490.345 | +Transformer | epoch 0 | step 9380 |avg loss 7.612 |avg tokens 2385.400 |tokens/s 8705.307 |walltime 2493.085 | +Transformer | epoch 0 | step 9390 |avg loss 7.476 |avg tokens 2077.000 |tokens/s 8154.368 |walltime 2495.632 | +Transformer | epoch 0 | step 9400 |avg loss 7.336 |avg tokens 2198.500 |tokens/s 8086.645 |walltime 2498.351 | +Transformer | epoch 0 | step 9410 |avg loss 7.820 |avg tokens 2214.400 |tokens/s 8627.088 |walltime 2500.917 | +Transformer | epoch 0 | step 9420 |avg loss 7.482 |avg tokens 2326.500 |tokens/s 8449.574 |walltime 2503.671 | +Transformer | epoch 0 | step 9430 |avg loss 7.102 |avg tokens 2332.800 |tokens/s 8359.969 |walltime 2506.461 | +Transformer | epoch 0 | step 9440 |avg loss 8.020 |avg tokens 2065.800 |tokens/s 8028.285 |walltime 2509.034 | +Transformer | epoch 0 | step 9450 |avg loss 8.129 |avg tokens 1833.000 |tokens/s 7507.219 |walltime 2511.476 | +Transformer | epoch 0 | step 9460 |avg loss 7.070 |avg tokens 2337.800 |tokens/s 8372.969 |walltime 2514.268 | +Transformer | epoch 0 | step 9470 |avg loss 7.739 |avg tokens 2098.400 |tokens/s 7958.900 |walltime 2516.905 | +Transformer | epoch 0 | step 9480 |avg loss 7.922 |avg tokens 2031.500 |tokens/s 7953.512 |walltime 2519.459 | +Transformer | epoch 0 | step 9490 |avg loss 7.610 |avg tokens 2213.800 |tokens/s 8234.599 |walltime 2522.147 | +Transformer | epoch 0 | step 9500 |avg loss 7.558 |avg tokens 2148.800 |tokens/s 8161.355 |walltime 2524.780 | +Transformer | epoch 0 | step 9510 |avg loss 7.921 |avg tokens 2146.300 |tokens/s 8064.914 |walltime 2527.441 | +Transformer | epoch 0 | step 9520 |avg loss 7.664 |avg tokens 2168.600 |tokens/s 8156.186 |walltime 2530.100 | 
+Transformer | epoch 0 | step 9530 |avg loss 7.697 |avg tokens 1982.400 |tokens/s 7781.670 |walltime 2532.648 | +Transformer | epoch 0 | step 9540 |avg loss 7.368 |avg tokens 2300.800 |tokens/s 8358.941 |walltime 2535.400 | +Transformer | epoch 0 | step 9550 |avg loss 7.097 |avg tokens 2304.000 |tokens/s 8311.913 |walltime 2538.172 | +Transformer | epoch 0 | step 9560 |avg loss 8.054 |avg tokens 2283.000 |tokens/s 9078.038 |walltime 2540.687 | +Transformer | epoch 0 | step 9570 |avg loss 7.781 |avg tokens 2263.200 |tokens/s 8461.334 |walltime 2543.362 | +Transformer | epoch 0 | step 9580 |avg loss 7.746 |avg tokens 2121.300 |tokens/s 8256.054 |walltime 2545.931 | +Transformer | epoch 0 | step 9590 |avg loss 7.623 |avg tokens 2340.800 |tokens/s 8585.600 |walltime 2548.658 | +Transformer | epoch 0 | step 9600 |avg loss 7.984 |avg tokens 2268.400 |tokens/s 8719.012 |walltime 2551.259 | +Transformer | epoch 0 | step 9610 |avg loss 7.246 |avg tokens 2285.600 |tokens/s 8216.841 |walltime 2554.041 | +Transformer | epoch 0 | step 9620 |avg loss 7.569 |avg tokens 2350.300 |tokens/s 8520.840 |walltime 2556.799 | +Transformer | epoch 0 | step 9630 |avg loss 7.367 |avg tokens 2158.700 |tokens/s 8198.099 |walltime 2559.432 | +Transformer | epoch 0 | step 9640 |avg loss 7.629 |avg tokens 2120.000 |tokens/s 8029.747 |walltime 2562.073 | +Transformer | epoch 0 | step 9650 |avg loss 7.714 |avg tokens 2000.000 |tokens/s 7917.315 |walltime 2564.599 | +Transformer | epoch 0 | step 9660 |avg loss 7.502 |avg tokens 2330.600 |tokens/s 8478.016 |walltime 2567.348 | +Transformer | epoch 0 | step 9670 |avg loss 8.114 |avg tokens 2140.300 |tokens/s 8164.658 |walltime 2569.969 | +Transformer | epoch 0 | step 9680 |avg loss 7.282 |avg tokens 2270.100 |tokens/s 8384.582 |walltime 2572.677 | +Transformer | epoch 0 | step 9690 |avg loss 7.842 |avg tokens 1942.100 |tokens/s 7815.653 |walltime 2575.162 | +Transformer | epoch 0 | step 9700 |avg loss 7.876 |avg tokens 2219.100 |tokens/s 8624.491 
|walltime 2577.735 | +Transformer | epoch 0 | step 9710 |avg loss 8.027 |avg tokens 1995.200 |tokens/s 7967.962 |walltime 2580.239 | +Transformer | epoch 0 | step 9720 |avg loss 7.717 |avg tokens 2200.300 |tokens/s 8254.127 |walltime 2582.904 | +Transformer | epoch 0 | step 9730 |avg loss 7.347 |avg tokens 2334.300 |tokens/s 8467.083 |walltime 2585.661 | +Transformer | epoch 0 | step 9740 |avg loss 7.643 |avg tokens 2227.000 |tokens/s 8140.542 |walltime 2588.397 | +Transformer | epoch 0 | step 9750 |avg loss 7.961 |avg tokens 2008.800 |tokens/s 8016.113 |walltime 2590.903 | +Transformer | epoch 0 | step 9760 |avg loss 7.555 |avg tokens 2198.500 |tokens/s 7982.022 |walltime 2593.657 | +Transformer | epoch 0 | step 9770 |avg loss 7.924 |avg tokens 2336.400 |tokens/s 8824.127 |walltime 2596.305 | +Transformer | epoch 0 | step 9780 |avg loss 7.701 |avg tokens 2285.000 |tokens/s 8357.070 |walltime 2599.039 | +Transformer | epoch 0 | step 9790 |avg loss 7.911 |avg tokens 2022.800 |tokens/s 7868.168 |walltime 2601.610 | +Transformer | epoch 0 | step 9800 |avg loss 7.804 |avg tokens 2122.100 |tokens/s 8215.256 |walltime 2604.193 | +Transformer | epoch 0 | step 9810 |avg loss 7.554 |avg tokens 2158.300 |tokens/s 8019.827 |walltime 2606.884 | +Transformer | epoch 0 | step 9820 |avg loss 7.568 |avg tokens 2154.200 |tokens/s 8185.355 |walltime 2609.516 | +Transformer | epoch 0 | step 9830 |avg loss 7.429 |avg tokens 2199.800 |tokens/s 8030.444 |walltime 2612.255 | +Transformer | epoch 0 | step 9840 |avg loss 7.804 |avg tokens 2346.300 |tokens/s 8824.566 |walltime 2614.914 | +Transformer | epoch 0 | step 9850 |avg loss 7.709 |avg tokens 2162.400 |tokens/s 8169.629 |walltime 2617.561 | +Transformer | epoch 0 | step 9860 |avg loss 7.659 |avg tokens 2154.800 |tokens/s 7879.546 |walltime 2620.296 | +Transformer | epoch 0 | step 9870 |avg loss 7.796 |avg tokens 2286.600 |tokens/s 8343.897 |walltime 2623.036 | +Transformer | epoch 0 | step 9880 |avg loss 7.474 |avg tokens 2367.800 
|tokens/s 8651.919 |walltime 2625.773 | +Transformer | epoch 0 | step 9890 |avg loss 7.774 |avg tokens 2202.300 |tokens/s 8287.927 |walltime 2628.430 | +Transformer | epoch 0 | step 9900 |avg loss 7.595 |avg tokens 2255.200 |tokens/s 8336.907 |walltime 2631.135 | +Transformer | epoch 0 | step 9910 |avg loss 7.387 |avg tokens 2231.100 |tokens/s 8373.803 |walltime 2633.800 | +Transformer | epoch 0 | step 9920 |avg loss 7.526 |avg tokens 2089.400 |tokens/s 7895.827 |walltime 2636.446 | +Transformer | epoch 0 | step 9930 |avg loss 7.866 |avg tokens 2134.400 |tokens/s 8315.664 |walltime 2639.013 | +Transformer | epoch 0 | step 9940 |avg loss 7.532 |avg tokens 2142.200 |tokens/s 8011.741 |walltime 2641.687 | +Transformer | epoch 0 | step 9950 |avg loss 7.528 |avg tokens 2374.200 |tokens/s 8724.681 |walltime 2644.408 | +Transformer | epoch 0 | step 9960 |avg loss 7.283 |avg tokens 2210.600 |tokens/s 8215.301 |walltime 2647.099 | +Transformer | epoch 0 | step 9970 |avg loss 7.733 |avg tokens 2189.200 |tokens/s 8168.850 |walltime 2649.779 | +Transformer | epoch 0 | step 9980 |avg loss 7.842 |avg tokens 1918.200 |tokens/s 7497.648 |walltime 2652.337 | +Transformer | epoch 0 | step 9990 |avg loss 7.665 |avg tokens 2116.800 |tokens/s 7886.021 |walltime 2655.021 | +Transformer | epoch 0 | step 10000 |avg loss 7.319 |avg tokens 2189.200 |tokens/s 8140.318 |walltime 2657.711 | +Transformer | epoch 0 | step 10010 |avg loss 7.710 |avg tokens 2045.200 |tokens/s 7584.191 |walltime 2660.407 | +Transformer | epoch 0 | step 10020 |avg loss 7.809 |avg tokens 1941.300 |tokens/s 7681.087 |walltime 2662.935 | +Transformer | epoch 0 | step 10030 |avg loss 7.785 |avg tokens 2076.200 |tokens/s 7892.690 |walltime 2665.565 | +Transformer | epoch 0 | step 10040 |avg loss 7.477 |avg tokens 2220.300 |tokens/s 8146.742 |walltime 2668.291 | +Transformer | epoch 0 | step 10050 |avg loss 7.698 |avg tokens 2264.800 |tokens/s 8725.017 |walltime 2670.886 | +Transformer | epoch 0 | step 10060 |avg loss 
7.834 |avg tokens 2131.500 |tokens/s 8184.872 |walltime 2673.490 | +Transformer | epoch 0 | step 10070 |avg loss 7.385 |avg tokens 2011.100 |tokens/s 7601.625 |walltime 2676.136 | +Transformer | epoch 0 | step 10080 |avg loss 7.397 |avg tokens 2246.400 |tokens/s 8364.747 |walltime 2678.822 | +Transformer | epoch 0 | step 10090 |avg loss 7.925 |avg tokens 2011.700 |tokens/s 7929.115 |walltime 2681.359 | +Transformer | epoch 0 | step 10100 |avg loss 7.720 |avg tokens 2189.600 |tokens/s 8315.022 |walltime 2683.992 | +Transformer | epoch 0 | step 10110 |avg loss 7.760 |avg tokens 2349.700 |tokens/s 8852.774 |walltime 2686.646 | +Transformer | epoch 0 | step 10120 |avg loss 7.689 |avg tokens 2144.700 |tokens/s 8137.707 |walltime 2689.282 | +Transformer | epoch 0 | step 10130 |avg loss 7.597 |avg tokens 2208.800 |tokens/s 8313.134 |walltime 2691.939 | +Transformer | epoch 0 | step 10140 |avg loss 7.396 |avg tokens 2194.900 |tokens/s 8111.401 |walltime 2694.645 | +Transformer | epoch 0 | step 10150 |avg loss 7.612 |avg tokens 2216.800 |tokens/s 8230.978 |walltime 2697.338 | +Transformer | epoch 0 | step 10160 |avg loss 7.696 |avg tokens 2081.900 |tokens/s 7812.379 |walltime 2700.003 | +Transformer | epoch 0 | step 10170 |avg loss 7.184 |avg tokens 2437.600 |tokens/s 8711.961 |walltime 2702.801 | +Transformer | epoch 0 | step 10180 |avg loss 7.769 |avg tokens 1869.300 |tokens/s 7324.111 |walltime 2705.353 | +Transformer | epoch 0 | step 10190 |avg loss 8.041 |avg tokens 2048.900 |tokens/s 7986.916 |walltime 2707.918 | +Transformer | epoch 0 | step 10200 |avg loss 7.157 |avg tokens 2291.200 |tokens/s 8395.040 |walltime 2710.648 | +Transformer | epoch 0 | step 10210 |avg loss 7.759 |avg tokens 1951.700 |tokens/s 7623.331 |walltime 2713.208 | +Transformer | epoch 0 | step 10220 |avg loss 7.734 |avg tokens 2132.100 |tokens/s 7918.073 |walltime 2715.901 | +Transformer | epoch 0 | step 10230 |avg loss 7.449 |avg tokens 2208.700 |tokens/s 8137.785 |walltime 2718.615 | 
+Transformer | epoch 0 | step 10240 |avg loss 7.938 |avg tokens 2156.800 |tokens/s 8467.068 |walltime 2721.162 | +Transformer | epoch 0 | step 10250 |avg loss 8.101 |avg tokens 2113.400 |tokens/s 8313.722 |walltime 2723.704 | +Transformer | epoch 0 | step 10260 |avg loss 7.553 |avg tokens 2239.600 |tokens/s 8352.543 |walltime 2726.385 | +Transformer | epoch 0 | step 10270 |avg loss 8.232 |avg tokens 2185.800 |tokens/s 8592.807 |walltime 2728.929 | +Transformer | epoch 0 | step 10280 |avg loss 7.766 |avg tokens 2080.000 |tokens/s 8134.821 |walltime 2731.486 | +Transformer | epoch 0 | step 10290 |avg loss 7.394 |avg tokens 2387.200 |tokens/s 8512.448 |walltime 2734.290 | +Transformer | epoch 0 | step 10300 |avg loss 7.620 |avg tokens 2229.600 |tokens/s 8343.913 |walltime 2736.963 | +Transformer | epoch 0 | step 10310 |avg loss 8.011 |avg tokens 1980.500 |tokens/s 7661.258 |walltime 2739.548 | +Transformer | epoch 0 | step 10320 |avg loss 7.502 |avg tokens 2185.700 |tokens/s 8118.223 |walltime 2742.240 | +Transformer | epoch 0 | step 10330 |avg loss 7.660 |avg tokens 2056.100 |tokens/s 7912.523 |walltime 2744.839 | +Transformer | epoch 0 | step 10340 |avg loss 7.570 |avg tokens 2142.800 |tokens/s 7990.424 |walltime 2747.520 | +Transformer | epoch 0 | step 10350 |avg loss 8.438 |avg tokens 2103.400 |tokens/s 8513.321 |walltime 2749.991 | +Transformer | epoch 0 | step 10360 |avg loss 7.854 |avg tokens 2293.200 |tokens/s 8816.075 |walltime 2752.592 | +Transformer | epoch 0 | step 10370 |avg loss 7.483 |avg tokens 2273.800 |tokens/s 8312.573 |walltime 2755.327 | +Transformer | epoch 0 | step 10380 |avg loss 7.778 |avg tokens 2254.400 |tokens/s 8534.186 |walltime 2757.969 | +Transformer | epoch 0 | step 10390 |avg loss 7.883 |avg tokens 2036.200 |tokens/s 7877.186 |walltime 2760.554 | +Transformer | epoch 0 | step 10400 |avg loss 7.894 |avg tokens 2128.200 |tokens/s 8212.551 |walltime 2763.145 | +Transformer | epoch 0 | step 10410 |avg loss 8.003 |avg tokens 2279.800 
|tokens/s 8596.693 |walltime 2765.797 |
+Transformer | epoch 0 | step 10420 |avg loss 7.595 |avg tokens 2191.200 |tokens/s 8231.319 |walltime 2768.459 |
+Transformer | epoch 0 | step 10430 |avg loss 7.399 |avg tokens 2141.600 |tokens/s 7913.744 |walltime 2771.166 |
+Transformer | epoch 0 | step 10440 |avg loss 7.739 |avg tokens 2277.800 |tokens/s 8463.660 |walltime 2773.857 |
+Transformer | epoch 0 | step 10450 |avg loss 7.929 |avg tokens 2191.500 |tokens/s 8406.079 |walltime 2776.464 |
+Transformer | epoch 0 | step 10460 |avg loss 7.437 |avg tokens 2077.600 |tokens/s 7770.667 |walltime 2779.138 |
+Transformer | epoch 0 | step 10470 |avg loss 7.634 |avg tokens 2014.500 |tokens/s 7936.430 |walltime 2781.676 |
+Transformer | epoch 0 | step 10480 |avg loss 7.548 |avg tokens 2106.100 |tokens/s 7964.965 |walltime 2784.320 |
+Transformer | epoch 0 | step 10490 |avg loss 7.676 |avg tokens 1911.300 |tokens/s 7584.606 |walltime 2786.840 |
+Transformer | epoch 0 | step 10500 |avg loss 7.414 |avg tokens 2147.500 |tokens/s 8014.325 |walltime 2789.520 |
+Transformer | epoch 0 | step 10510 |avg loss 8.202 |avg tokens 2152.000 |tokens/s 8472.708 |walltime 2792.060 |
+Transformer | epoch 0 | step 10520 |avg loss 7.682 |avg tokens 2035.000 |tokens/s 7690.644 |walltime 2794.706 |
+Transformer | epoch 0 | step 10530 |avg loss 7.677 |avg tokens 2039.700 |tokens/s 7877.545 |walltime 2797.295 |
+Transformer | epoch 0 | step 10540 |avg loss 8.092 |avg tokens 1987.600 |tokens/s 8321.266 |walltime 2799.683 |
+Transformer | epoch 0 | step 10550 |avg loss 7.996 |avg tokens 2126.600 |tokens/s 8369.756 |walltime 2802.224 |
+Transformer | epoch 0 | step 10560 |avg loss 7.473 |avg tokens 2238.400 |tokens/s 8447.578 |walltime 2804.874 |
+Transformer | epoch 0 | step 10570 |avg loss 7.732 |avg tokens 2200.300 |tokens/s 8066.545 |walltime 2807.602 |
+Transformer | epoch 0 | step 10580 |avg loss 7.335 |avg tokens 2264.800 |tokens/s 8015.209 |walltime 2810.427 |
+Transformer | epoch 0 | step 10590 |avg loss 8.251 |avg tokens 2140.200 |tokens/s 8513.579 |walltime 2812.941 |
+Transformer | epoch 0 | step 10600 |avg loss 7.638 |avg tokens 2212.800 |tokens/s 8216.500 |walltime 2815.634 |
+Transformer | epoch 0 | step 10610 |avg loss 8.343 |avg tokens 1853.600 |tokens/s 7789.328 |walltime 2818.014 |
+Transformer | epoch 0 | step 10620 |avg loss 7.718 |avg tokens 2255.100 |tokens/s 8544.497 |walltime 2820.653 |
+Transformer | epoch 0 | step 10630 |avg loss 7.683 |avg tokens 2205.000 |tokens/s 8363.465 |walltime 2823.290 |
+Transformer | epoch 0 | step 10640 |avg loss 7.807 |avg tokens 2071.600 |tokens/s 8111.501 |walltime 2825.844 |
+Transformer | epoch 0 | step 10650 |avg loss 7.651 |avg tokens 2110.100 |tokens/s 8200.289 |walltime 2828.417 |
+Transformer | epoch 0 | step 10660 |avg loss 7.391 |avg tokens 2111.900 |tokens/s 8095.446 |walltime 2831.026 |
+Transformer | epoch 0 | step 10670 |avg loss 7.457 |avg tokens 2245.700 |tokens/s 8129.819 |walltime 2833.788 |
+Transformer | epoch 0 | step 10680 |avg loss 7.657 |avg tokens 2261.900 |tokens/s 8372.658 |walltime 2836.489 |
+Transformer | epoch 0 | step 10690 |avg loss 7.793 |avg tokens 2178.100 |tokens/s 8089.548 |walltime 2839.182 |
+Transformer | epoch 0 | step 10700 |avg loss 7.754 |avg tokens 1982.600 |tokens/s 7623.882 |walltime 2841.782 |
+Transformer | epoch 0 | step 10710 |avg loss 8.220 |avg tokens 1984.400 |tokens/s 7796.106 |walltime 2844.328 |
+Transformer | epoch 0 | step 10720 |avg loss 7.913 |avg tokens 1808.200 |tokens/s 7360.700 |walltime 2846.784 |
+Transformer | epoch 0 | step 10730 |avg loss 7.800 |avg tokens 2267.800 |tokens/s 8442.900 |walltime 2849.470 |
+Transformer | epoch 0 | step 10740 |avg loss 8.164 |avg tokens 2173.100 |tokens/s 8648.475 |walltime 2851.983 |
+Transformer | epoch 0 | step 10750 |avg loss 7.878 |avg tokens 2269.800 |tokens/s 8685.189 |walltime 2854.597 |
+Transformer | epoch 0 | step 10760 |avg loss 7.771 |avg tokens 2233.800 |tokens/s 8203.166 |walltime 2857.320 |
+Transformer | epoch 0 | step 10770 |avg loss 8.123 |avg tokens 1857.500 |tokens/s 7495.502 |walltime 2859.798 |
+Transformer | epoch 0 | step 10780 |avg loss 7.621 |avg tokens 2300.800 |tokens/s 8430.985 |walltime 2862.527 |
+Transformer | epoch 0 | step 10790 |avg loss 7.605 |avg tokens 2170.000 |tokens/s 8246.506 |walltime 2865.158 |
+Transformer | epoch 0 | step 10800 |avg loss 7.643 |avg tokens 2252.900 |tokens/s 8305.031 |walltime 2867.871 |
+Transformer | epoch 0 | step 10810 |avg loss 7.788 |avg tokens 2067.300 |tokens/s 7918.382 |walltime 2870.482 |
+Transformer | epoch 0 | step 10820 |avg loss 7.681 |avg tokens 2097.600 |tokens/s 8084.529 |walltime 2873.076 |
+Transformer | epoch 0 | step 10830 |avg loss 7.486 |avg tokens 2166.500 |tokens/s 8206.104 |walltime 2875.716 |
+Transformer | epoch 0 | step 10840 |avg loss 7.815 |avg tokens 2307.200 |tokens/s 8728.608 |walltime 2878.360 |
+Transformer | epoch 0 | step 10850 |avg loss 7.495 |avg tokens 2252.000 |tokens/s 8386.785 |walltime 2881.045 |
+Transformer | epoch 0 | step 10860 |avg loss 7.435 |avg tokens 2412.000 |tokens/s 8628.629 |walltime 2883.840 |
+Transformer | epoch 0 | step 10870 |avg loss 7.472 |avg tokens 2300.800 |tokens/s 8563.694 |walltime 2886.527 |
+Transformer | epoch 0 | step 10880 |avg loss 7.476 |avg tokens 2281.200 |tokens/s 8423.868 |walltime 2889.235 |
+Transformer | epoch 0 | step 10890 |avg loss 7.461 |avg tokens 2232.000 |tokens/s 8388.651 |walltime 2891.896 |
+Transformer | epoch 0 | step 10900 |avg loss 7.558 |avg tokens 2189.600 |tokens/s 8271.798 |walltime 2894.543 |
+Transformer | epoch 0 | step 10910 |avg loss 7.566 |avg tokens 2063.900 |tokens/s 7891.876 |walltime 2897.158 |
+Transformer | epoch 0 | step 10920 |avg loss 8.088 |avg tokens 2156.400 |tokens/s 8261.822 |walltime 2899.768 |
+Transformer | epoch 0 | step 10930 |avg loss 7.425 |avg tokens 2397.600 |tokens/s 8658.496 |walltime 2902.537 |
+Transformer | epoch 0 | step 10940 |avg loss 7.763 |avg tokens 2196.100 |tokens/s 8357.289 |walltime 2905.165 |
+Transformer | epoch 0 | step 10950 |avg loss 7.952 |avg tokens 2085.600 |tokens/s 8051.813 |walltime 2907.755 |
+Transformer | epoch 0 | step 10960 |avg loss 8.040 |avg tokens 2266.000 |tokens/s 8696.262 |walltime 2910.361 |
+Transformer | epoch 0 | step 10970 |avg loss 7.545 |avg tokens 2245.100 |tokens/s 8232.806 |walltime 2913.088 |
+Transformer | epoch 0 | step 10980 |avg loss 7.427 |avg tokens 2456.800 |tokens/s 8793.177 |walltime 2915.882 |
+Transformer | epoch 0 | step 10990 |avg loss 7.511 |avg tokens 2096.800 |tokens/s 7919.428 |walltime 2918.530 |
+Transformer | epoch 0 | step 11000 |avg loss 7.988 |avg tokens 2228.400 |tokens/s 8493.286 |walltime 2921.153 |
+Transformer | epoch 0 | step 11010 |avg loss 7.955 |avg tokens 2107.600 |tokens/s 8131.776 |walltime 2923.745 |
+Transformer | epoch 0 | step 11020 |avg loss 7.529 |avg tokens 2336.400 |tokens/s 8648.153 |walltime 2926.447 |
+Transformer | epoch 0 | step 11030 |avg loss 7.508 |avg tokens 2078.400 |tokens/s 7859.169 |walltime 2929.091 |
+Transformer | epoch 0 | step 11040 |avg loss 8.341 |avg tokens 1868.000 |tokens/s 7780.489 |walltime 2931.492 |
+Transformer | epoch 0 | step 11050 |avg loss 8.049 |avg tokens 2019.100 |tokens/s 8353.873 |walltime 2933.909 |
+Transformer | epoch 0 | step 11060 |avg loss 7.593 |avg tokens 2120.100 |tokens/s 8139.306 |walltime 2936.514 |
+Transformer | epoch 0 | step 11070 |avg loss 7.958 |avg tokens 2063.900 |tokens/s 8035.601 |walltime 2939.082 |
+Transformer | epoch 0 | step 11080 |avg loss 7.458 |avg tokens 2013.500 |tokens/s 7678.894 |walltime 2941.704 |
+Transformer | epoch 0 | step 11090 |avg loss 7.252 |avg tokens 2385.600 |tokens/s 8415.047 |walltime 2944.539 |
+Transformer | epoch 0 | step 11100 |avg loss 7.846 |avg tokens 2270.100 |tokens/s 8540.229 |walltime 2947.197 |
+Transformer | epoch 0 | step 11110 |avg loss 7.426 |avg tokens 2237.600 |tokens/s 8046.156 |walltime 2949.978 |
+Transformer | epoch 0 | step 11120 |avg loss 7.710 |avg tokens 2095.800 |tokens/s 7867.119 |walltime 2952.642 |
+Transformer | epoch 0 | step 11130 |avg loss 7.685 |avg tokens 2232.300 |tokens/s 8403.366 |walltime 2955.299 |
+Transformer | epoch 0 | step 11140 |avg loss 7.841 |avg tokens 2076.100 |tokens/s 7913.007 |walltime 2957.923 |
+Transformer | epoch 0 | step 11150 |avg loss 7.458 |avg tokens 2205.900 |tokens/s 8326.268 |walltime 2960.572 |
+Transformer | epoch 0 | step 11160 |avg loss 7.520 |avg tokens 2356.000 |tokens/s 8613.012 |walltime 2963.307 |
+Transformer | epoch 0 | step 11170 |avg loss 8.199 |avg tokens 2298.800 |tokens/s 8661.023 |walltime 2965.961 |
+Transformer | epoch 0 | step 11180 |avg loss 8.130 |avg tokens 1803.700 |tokens/s 7410.059 |walltime 2968.396 |
+Transformer | epoch 0 | step 11190 |avg loss 7.792 |avg tokens 2324.500 |tokens/s 8481.763 |walltime 2971.136 |
+Transformer | epoch 0 | step 11200 |avg loss 7.622 |avg tokens 2231.500 |tokens/s 8265.729 |walltime 2973.836 |
+Transformer | epoch 0 | step 11210 |avg loss 7.538 |avg tokens 2242.400 |tokens/s 8255.147 |walltime 2976.552 |
+Transformer | epoch 0 | step 11220 |avg loss 8.395 |avg tokens 1919.300 |tokens/s 8041.363 |walltime 2978.939 |
+Transformer | epoch 0 | step 11230 |avg loss 7.871 |avg tokens 2258.600 |tokens/s 8593.933 |walltime 2981.567 |
+Transformer | epoch 0 | step 11240 |avg loss 8.157 |avg tokens 2144.600 |tokens/s 8322.890 |walltime 2984.144 |
+Transformer | epoch 0 | step 11250 |avg loss 7.866 |avg tokens 2011.900 |tokens/s 7892.030 |walltime 2986.693 |
+Transformer | epoch 0 | step 11260 |avg loss 7.870 |avg tokens 2286.500 |tokens/s 8420.033 |walltime 2989.409 |
+Transformer | epoch 0 | step 11270 |avg loss 8.034 |avg tokens 2206.000 |tokens/s 8353.458 |walltime 2992.050 |
+Transformer | epoch 0 | step 11280 |avg loss 7.660 |avg tokens 2463.500 |tokens/s 8824.785 |walltime 2994.841 |
+Transformer | epoch 0 | step 11290 |avg loss 7.661 |avg tokens 2241.100 |tokens/s 8243.273 |walltime 2997.560 |
+Transformer | epoch 0 | step 11300 |avg loss 8.160 |avg tokens 1787.800 |tokens/s 7040.101 |walltime 3000.099 |
+Transformer | epoch 0 | step 11310 |avg loss 7.785 |avg tokens 2121.900 |tokens/s 8381.595 |walltime 3002.631 |
+Transformer | epoch 0 | step 11320 |avg loss 8.236 |avg tokens 1880.100 |tokens/s 7781.862 |walltime 3005.047 |
+Transformer | epoch 0 | step 11330 |avg loss 8.067 |avg tokens 2077.700 |tokens/s 8157.201 |walltime 3007.594 |
+Transformer | epoch 0 | step 11340 |avg loss 7.516 |avg tokens 2274.400 |tokens/s 8197.156 |walltime 3010.369 |
+Transformer | epoch 0 | step 11350 |avg loss 7.843 |avg tokens 2256.900 |tokens/s 8443.770 |walltime 3013.042 |
+Transformer | epoch 0 | step 11360 |avg loss 7.687 |avg tokens 2147.400 |tokens/s 8070.717 |walltime 3015.702 |
+Transformer | epoch 0 | step 11370 |avg loss 7.569 |avg tokens 2036.000 |tokens/s 7726.072 |walltime 3018.337 |
+Transformer | epoch 0 | step 11380 |avg loss 7.466 |avg tokens 2326.500 |tokens/s 8563.320 |walltime 3021.054 |
+Transformer | epoch 0 | step 11390 |avg loss 7.677 |avg tokens 2302.500 |tokens/s 8520.988 |walltime 3023.756 |
+Transformer | epoch 0 | step 11400 |avg loss 7.963 |avg tokens 2110.300 |tokens/s 8333.887 |walltime 3026.289 |
+Transformer | epoch 0 | step 11410 |avg loss 7.490 |avg tokens 2262.400 |tokens/s 8379.377 |walltime 3028.989 |
+Transformer | epoch 0 | step 11420 |avg loss 7.790 |avg tokens 2129.000 |tokens/s 8128.549 |walltime 3031.608 |
+Transformer | epoch 0 | step 11430 |avg loss 7.967 |avg tokens 2142.900 |tokens/s 8335.407 |walltime 3034.179 |
+Transformer | epoch 0 | step 11440 |avg loss 7.563 |avg tokens 2249.200 |tokens/s 8321.739 |walltime 3036.881 |
+Transformer | epoch 0 | step 11450 |avg loss 7.635 |avg tokens 2331.200 |tokens/s 8296.885 |walltime 3039.691 |
+Transformer | epoch 0 | step 11460 |avg loss 7.702 |avg tokens 2326.300 |tokens/s 8616.315 |walltime 3042.391 |
+Transformer | epoch 0 | step 11470 |avg loss 7.969 |avg tokens 2108.700 |tokens/s 8137.030 |walltime 3044.983 |
+Transformer | epoch 0 | step 11480 |avg loss 7.729 |avg tokens 2180.700 |tokens/s 8236.409 |walltime 3047.630 |
+Transformer | epoch 0 | step 11490 |avg loss 7.392 |avg tokens 2188.000 |tokens/s 8123.104 |walltime 3050.324 |
+Transformer | epoch 0 | step 11500 |avg loss 7.591 |avg tokens 2313.700 |tokens/s 8435.682 |walltime 3053.067 |
+Transformer | epoch 0 | step 11510 |avg loss 7.490 |avg tokens 2176.000 |tokens/s 8151.101 |walltime 3055.736 |
+Transformer | epoch 0 | step 11520 |avg loss 7.944 |avg tokens 2236.700 |tokens/s 8716.894 |walltime 3058.302 |
+Transformer | epoch 0 | step 11530 |avg loss 7.718 |avg tokens 2178.400 |tokens/s 8363.793 |walltime 3060.907 |
+Transformer | epoch 0 | step 11540 |avg loss 7.695 |avg tokens 2185.100 |tokens/s 8351.392 |walltime 3063.523 |
+Transformer | epoch 0 | step 11550 |avg loss 7.377 |avg tokens 2380.000 |tokens/s 8527.686 |walltime 3066.314 |
+Transformer | epoch 0 | step 11560 |avg loss 7.246 |avg tokens 2260.800 |tokens/s 8476.579 |walltime 3068.981 |
+Transformer | epoch 0 | step 11570 |avg loss 7.755 |avg tokens 2153.700 |tokens/s 8268.100 |walltime 3071.586 |
+Transformer | epoch 0 | step 11580 |avg loss 7.759 |avg tokens 2271.700 |tokens/s 8739.358 |walltime 3074.185 |
+Transformer | epoch 0 | step 11590 |avg loss 7.958 |avg tokens 2167.400 |tokens/s 8392.842 |walltime 3076.768 |
+Transformer | epoch 0 | step 11600 |avg loss 7.869 |avg tokens 2274.400 |tokens/s 8516.792 |walltime 3079.438 |
+Transformer | epoch 0 | step 11610 |avg loss 7.910 |avg tokens 1989.600 |tokens/s 7660.052 |walltime 3082.036 |
+Transformer | epoch 0 | step 11620 |avg loss 8.011 |avg tokens 2338.100 |tokens/s 8856.368 |walltime 3084.676 |
+Transformer | epoch 0 | step 11630 |avg loss 7.881 |avg tokens 1998.200 |tokens/s 8165.721 |walltime 3087.123 |
+Transformer | epoch 0 | step 11640 |avg loss 7.878 |avg tokens 2278.700 |tokens/s 8608.637 |walltime 3089.770 |
+Transformer | epoch 0 | step 11650 |avg loss 7.383 |avg tokens 2296.800 |tokens/s 8535.048 |walltime 3092.461 |
+Transformer | epoch 0 | step 11660 |avg loss 7.737 |avg tokens 2227.200 |tokens/s 8498.399 |walltime 3095.081 |
+Transformer | epoch 0 | step 11670 |avg loss 7.430 |avg tokens 2264.800 |tokens/s 8270.578 |walltime 3097.820 |
+Transformer | epoch 0 | step 11680 |avg loss 7.718 |avg tokens 2234.800 |tokens/s 8223.577 |walltime 3100.537 |
+Transformer | epoch 0 | step 11690 |avg loss 7.448 |avg tokens 2416.700 |tokens/s 8997.854 |walltime 3103.223 |
+Transformer | epoch 0 | step 11700 |avg loss 7.922 |avg tokens 2177.200 |tokens/s 8735.987 |walltime 3105.715 |
+Transformer | epoch 0 | step 11710 |avg loss 7.575 |avg tokens 2064.600 |tokens/s 7787.381 |walltime 3108.367 |
+Transformer | epoch 0 | step 11720 |avg loss 7.693 |avg tokens 2068.800 |tokens/s 7862.108 |walltime 3110.998 |
+Transformer | epoch 0 | step 11730 |avg loss 7.796 |avg tokens 2151.700 |tokens/s 8031.832 |walltime 3113.677 |
+Transformer | epoch 0 | step 11740 |avg loss 7.914 |avg tokens 1857.800 |tokens/s 7272.010 |walltime 3116.232 |
+Transformer | epoch 0 | step 11750 |avg loss 7.657 |avg tokens 2286.400 |tokens/s 8588.441 |walltime 3118.894 |
+Transformer | epoch 0 | step 11760 |avg loss 8.469 |avg tokens 1863.100 |tokens/s 7440.333 |walltime 3121.398 |
+Transformer | epoch 0 | step 11770 |avg loss 8.124 |avg tokens 2277.400 |tokens/s 8591.575 |walltime 3124.049 |
+Transformer | epoch 0 | step 11780 |avg loss 7.908 |avg tokens 2135.500 |tokens/s 8319.186 |walltime 3126.616 |
+Transformer | epoch 0 | step 11790 |avg loss 7.694 |avg tokens 2150.100 |tokens/s 8128.544 |walltime 3129.261 |
+Transformer | epoch 0 | step 11800 |avg loss 7.974 |avg tokens 2092.600 |tokens/s 8051.722 |walltime 3131.860 |
+Transformer | epoch 0 | step 11810 |avg loss 7.692 |avg tokens 2234.400 |tokens/s 8167.622 |walltime 3134.595 |
+Transformer | epoch 0 | step 11820 |avg loss 8.218 |avg tokens 1848.500 |tokens/s 7881.301 |walltime 3136.941 |
+Transformer | epoch 0 | step 11830 |avg loss 8.014 |avg tokens 2288.400 |tokens/s 8651.503 |walltime 3139.586 |
+Transformer | epoch 0 | step 11840 |avg loss 7.912 |avg tokens 2053.600 |tokens/s 8084.857 |walltime 3142.126 |
+Transformer | epoch 0 | step 11850 |avg loss 7.768 |avg tokens 2181.500 |tokens/s 8040.147 |walltime 3144.839 |
+Transformer | epoch 0 | step 11860 |avg loss 8.074 |avg tokens 1826.500 |tokens/s 7322.407 |walltime 3147.334 |
+Transformer | epoch 0 | step 11870 |avg loss 7.835 |avg tokens 2203.400 |tokens/s 8255.732 |walltime 3150.003 |
+Transformer | epoch 0 | step 11880 |avg loss 8.107 |avg tokens 2261.900 |tokens/s 8745.063 |walltime 3152.589 |
+Transformer | epoch 0 | step 11890 |avg loss 7.471 |avg tokens 2376.000 |tokens/s 8486.316 |walltime 3155.389 |
+Transformer | epoch 0 | step 11900 |avg loss 7.603 |avg tokens 2363.800 |tokens/s 8613.076 |walltime 3158.133 |
+Transformer | epoch 0 | step 11910 |avg loss 7.973 |avg tokens 2247.000 |tokens/s 8563.763 |walltime 3160.757 |
+Transformer | epoch 0 | step 11920 |avg loss 8.189 |avg tokens 1990.300 |tokens/s 7670.362 |walltime 3163.352 |
+Transformer | epoch 0 | step 11930 |avg loss 7.207 |avg tokens 2314.800 |tokens/s 8355.491 |walltime 3166.122 |
+Transformer | epoch 0 | step 11940 |avg loss 7.356 |avg tokens 2227.100 |tokens/s 8176.105 |walltime 3168.846 |
+Transformer | epoch 0 | step 11950 |avg loss 7.549 |avg tokens 2326.200 |tokens/s 8564.660 |walltime 3171.562 |
+Transformer | epoch 0 | step 11960 |avg loss 8.141 |avg tokens 2360.800 |tokens/s 9203.501 |walltime 3174.128 |
+Transformer | epoch 0 | step 11970 |avg loss 7.630 |avg tokens 2350.100 |tokens/s 8364.959 |walltime 3176.937 |
+Transformer | epoch 0 | step 11980 |avg loss 7.807 |avg tokens 2293.400 |tokens/s 8506.742 |walltime 3179.633 |
+Transformer | epoch 0 | step 11990 |avg loss 7.684 |avg tokens 2438.400 |tokens/s 8922.075 |walltime 3182.366 |
+Transformer | epoch 0 | step 12000 |avg loss 7.969 |avg tokens 2205.500 |tokens/s 8551.941 |walltime 3184.945 |
+Transformer | epoch 0 | step 12010 |avg loss 7.908 |avg tokens 2280.000 |tokens/s 8659.647 |walltime 3187.578 |
+Transformer | epoch 0 | step 12020 |avg loss 7.649 |avg tokens 2334.400 |tokens/s 8458.130 |walltime 3190.338 |
+Transformer | epoch 0 | step 12030 |avg loss 7.800 |avg tokens 2107.100 |tokens/s 8086.845 |walltime 3192.943 |
+Transformer | epoch 0 | step 12040 |avg loss 7.498 |avg tokens 2237.200 |tokens/s 8296.040 |walltime 3195.640 |
+Transformer | epoch 0 | step 12050 |avg loss 7.699 |avg tokens 2247.000 |tokens/s 8202.013 |walltime 3198.380 |
+Transformer | epoch 0 | step 12060 |avg loss 7.812 |avg tokens 2322.800 |tokens/s 8623.982 |walltime 3201.073 |
+Transformer | epoch 0 | step 12070 |avg loss 7.849 |avg tokens 2132.000 |tokens/s 8050.267 |walltime 3203.721 |
+Transformer | epoch 0 | step 12080 |avg loss 8.138 |avg tokens 2028.900 |tokens/s 8359.158 |walltime 3206.149 |
+Transformer | epoch 0 | step 12090 |avg loss 8.182 |avg tokens 2060.400 |tokens/s 8267.188 |walltime 3208.641 |
+Transformer | epoch 0 | step 12100 |avg loss 7.949 |avg tokens 1818.900 |tokens/s 7278.999 |walltime 3211.140 |
+Transformer | epoch 0 | step 12110 |avg loss 7.417 |avg tokens 2160.200 |tokens/s 8283.460 |walltime 3213.748 |
+Transformer | epoch 0 | step 12120 |avg loss 8.010 |avg tokens 2189.000 |tokens/s 8389.752 |walltime 3216.357 |
+Transformer | epoch 0 | step 12130 |avg loss 7.667 |avg tokens 2390.400 |tokens/s 8633.599 |walltime 3219.125 |
+Transformer | epoch 0 | step 12140 |avg loss 7.846 |avg tokens 2289.900 |tokens/s 8690.265 |walltime 3221.760 |
+Transformer | epoch 0 | step 12150 |avg loss 7.769 |avg tokens 2240.900 |tokens/s 8336.327 |walltime 3224.449 |
+Transformer | epoch 0 | step 12160 |avg loss 7.917 |avg tokens 2259.900 |tokens/s 8625.793 |walltime 3227.069 |
+Transformer | epoch 0 | step 12170 |avg loss 7.683 |avg tokens 2227.200 |tokens/s 8336.519 |walltime 3229.740 |
+Transformer | epoch 0 | step 12180 |avg loss 7.593 |avg tokens 2163.400 |tokens/s 8155.113 |walltime 3232.393 |
+Transformer | epoch 0 | step 12190 |avg loss 7.122 |avg tokens 2400.000 |tokens/s 8613.318 |walltime 3235.179 |
+Transformer | epoch 0 | step 12200 |avg loss 7.972 |avg tokens 2039.800 |tokens/s 7801.160 |walltime 3237.794 |
+Transformer | epoch 0 | step 12210 |avg loss 7.677 |avg tokens 2323.700 |tokens/s 8413.084 |walltime 3240.556 |
+Transformer | epoch 0 | step 12220 |avg loss 7.744 |avg tokens 1998.400 |tokens/s 7730.786 |walltime 3243.141 |
+Transformer | epoch 0 | step 12230 |avg loss 7.515 |avg tokens 2279.000 |tokens/s 8384.779 |walltime 3245.859 |
+Transformer | epoch 0 | step 12240 |avg loss 7.823 |avg tokens 2016.500 |tokens/s 7769.243 |walltime 3248.455 |
+Transformer | epoch 0 | step 12250 |avg loss 7.576 |avg tokens 2020.900 |tokens/s 8033.481 |walltime 3250.970 |
+Transformer | epoch 0 | step 12260 |avg loss 7.532 |avg tokens 2103.300 |tokens/s 7807.019 |walltime 3253.664 |
+Transformer | epoch 0 | step 12270 |avg loss 7.584 |avg tokens 2087.700 |tokens/s 8001.479 |walltime 3256.274 |
+Transformer | epoch 0 | step 12280 |avg loss 7.331 |avg tokens 2319.800 |tokens/s 8440.935 |walltime 3259.022 |
+Transformer | epoch 0 | step 12290 |avg loss 7.200 |avg tokens 2368.400 |tokens/s 8357.610 |walltime 3261.856 |
+Transformer | epoch 0 | step 12300 |avg loss 7.857 |avg tokens 2226.400 |tokens/s 8225.303 |walltime 3264.562 |
+Transformer | epoch 0 | step 12310 |avg loss 7.777 |avg tokens 2377.000 |tokens/s 8879.967 |walltime 3267.239 |
+Transformer | epoch 0 | step 12320 |avg loss 7.869 |avg tokens 2179.200 |tokens/s 8097.496 |walltime 3269.930 |
+Transformer | epoch 0 | step 12330 |avg loss 7.398 |avg tokens 2277.600 |tokens/s 8239.620 |walltime 3272.695 |
+Transformer | epoch 0 | step 12340 |avg loss 7.827 |avg tokens 2352.000 |tokens/s 8709.553 |walltime 3275.395 |
+Transformer | epoch 0 | step 12350 |avg loss 7.838 |avg tokens 2158.100 |tokens/s 8230.084 |walltime 3278.017 |
+Transformer | epoch 0 | step 12360 |avg loss 7.648 |avg tokens 2083.900 |tokens/s 8023.401 |walltime 3280.615 |
+Transformer | epoch 0 | step 12370 |avg loss 7.483 |avg tokens 2119.700 |tokens/s 7859.195 |walltime 3283.312 |
+Transformer | epoch 0 | step 12380 |avg loss 7.345 |avg tokens 2184.000 |tokens/s 7952.189 |walltime 3286.058 |
+Transformer | epoch 0 | step 12390 |avg loss 7.627 |avg tokens 2107.900 |tokens/s 7936.838 |walltime 3288.714 |
+Transformer | epoch 0 | step 12400 |avg loss 7.787 |avg tokens 1919.300 |tokens/s 7770.280 |walltime 3291.184 |
+Transformer | epoch 0 | step 12410 |avg loss 7.507 |avg tokens 2267.900 |tokens/s 8361.790 |walltime 3293.896 |
+Transformer | epoch 0 | step 12420 |avg loss 7.867 |avg tokens 1950.700 |tokens/s 7633.105 |walltime 3296.452 |
+Transformer | epoch 0 | step 12430 |avg loss 7.596 |avg tokens 2205.600 |tokens/s 8232.354 |walltime 3299.131 |
+Transformer | epoch 0 | step 12440 |avg loss 7.978 |avg tokens 2127.800 |tokens/s 8376.654 |walltime 3301.671 |
+Transformer | epoch 0 | step 12450 |avg loss 7.414 |avg tokens 2122.900 |tokens/s 8233.291 |walltime 3304.250 |
+Transformer | epoch 0 | step 12460 |avg loss 7.558 |avg tokens 2342.400 |tokens/s 8478.822 |walltime 3307.012 |
+Transformer | epoch 0 | step 12470 |avg loss 7.793 |avg tokens 2266.000 |tokens/s 8531.414 |walltime 3309.668 |
+Transformer | epoch 0 | step 12480 |avg loss 7.391 |avg tokens 2358.400 |tokens/s 8634.369 |walltime 3312.400 |
+Transformer | epoch 0 | step 12490 |avg loss 8.077 |avg tokens 1975.800 |tokens/s 7759.010 |walltime 3314.946 |
+Transformer | epoch 0 | step 12500 |avg loss 7.957 |avg tokens 2091.500 |tokens/s 7878.494 |walltime 3317.601 |
+Transformer | epoch 0 | step 12510 |avg loss 7.548 |avg tokens 2250.200 |tokens/s 8399.331 |walltime 3320.280 |
+Transformer | epoch 0 | step 12520 |avg loss 7.984 |avg tokens 2161.600 |tokens/s 8648.851 |walltime 3322.779 |
+Transformer | epoch 0 | step 12530 |avg loss 7.768 |avg tokens 2209.200 |tokens/s 8208.328 |walltime 3325.471 |
+Transformer | epoch 0 | step 12540 |avg loss 7.543 |avg tokens 2089.100 |tokens/s 7767.951 |walltime 3328.160 |
+Transformer | epoch 0 | step 12550 |avg loss 7.636 |avg tokens 2220.800 |tokens/s 8238.808 |walltime 3330.856 |
+Transformer | epoch 0 | step 12560 |avg loss 7.558 |avg tokens 2287.200 |tokens/s 8314.960 |walltime 3333.606 |
+Transformer | epoch 0 | step 12570 |avg loss 8.124 |avg tokens 2247.600 |tokens/s 8780.243 |walltime 3336.166 |
+Transformer | epoch 0 | step 12580 |avg loss 7.557 |avg tokens 2313.600 |tokens/s 8472.141 |walltime 3338.897 |
+Transformer | epoch 0 | step 12590 |avg loss 7.832 |avg tokens 2151.000 |tokens/s 8239.664 |walltime 3341.507 |
+Transformer | epoch 0 | step 12600 |avg loss 7.372 |avg tokens 2209.000 |tokens/s 8111.995 |walltime 3344.231 |
+Transformer | epoch 0 | step 12610 |avg loss 7.442 |avg tokens 2292.000 |tokens/s 8354.902 |walltime 3346.974 |
+Transformer | epoch 0 | step 12620 |avg loss 7.859 |avg tokens 2162.300 |tokens/s 8425.892 |walltime 3349.540 |
+Transformer | epoch 0 | step 12630 |avg loss 7.481 |avg tokens 2272.800 |tokens/s 8488.027 |walltime 3352.218 |
+Transformer | epoch 0 | step 12640 |avg loss 7.651 |avg tokens 2250.400 |tokens/s 8175.910 |walltime 3354.970 |
+Transformer | epoch 0 | step 12650 |avg loss 7.271 |avg tokens 2334.400 |tokens/s 8505.402 |walltime 3357.715 |
+Transformer | epoch 0 | step 12660 |avg loss 8.201 |avg tokens 2117.900 |tokens/s 7997.608 |walltime 3360.363 |
+Transformer | epoch 0 | step 12670 |avg loss 7.878 |avg tokens 2344.900 |tokens/s 8778.963 |walltime 3363.034 |
+Transformer | epoch 0 | step 12680 |avg loss 7.950 |avg tokens 2179.500 |tokens/s 8278.007 |walltime 3365.667 |
+Transformer | epoch 0 | step 12690 |avg loss 7.815 |avg tokens 2120.200 |tokens/s 7972.820 |walltime 3368.326 |
+Transformer | epoch 0 | step 12700 |avg loss 7.490 |avg tokens 2197.600 |tokens/s 7980.507 |walltime 3371.080 |
+Transformer | epoch 0 | step 12710 |avg loss 7.685 |avg tokens 2021.400 |tokens/s 7910.724 |walltime 3373.635 |
+Transformer | epoch 0 | step 12720 |avg loss 7.808 |avg tokens 2199.300 |tokens/s 8312.917 |walltime 3376.281 |
+Transformer | epoch 0 | step 12730 |avg loss 7.710 |avg tokens 2252.800 |tokens/s 8275.109 |walltime 3379.003 |
+Transformer | epoch 0 | step 12740 |avg loss 7.989 |avg tokens 1945.600 |tokens/s 7937.364 |walltime 3381.455 |
+Transformer | epoch 0 | step 12750 |avg loss 7.775 |avg tokens 2162.500 |tokens/s 8256.285 |walltime 3384.074 |
+Transformer | epoch 0 | step 12760 |avg loss 8.012 |avg tokens 2172.400 |tokens/s 8347.024 |walltime 3386.676 |
+Transformer | epoch 0 | step 12770 |avg loss 8.010 |avg tokens 2054.300 |tokens/s 7911.269 |walltime 3389.273 |
+Transformer | epoch 0 | step 12780 |avg loss 8.006 |avg tokens 2053.300 |tokens/s 8145.720 |walltime 3391.794 |
+Transformer | epoch 0 | step 12790 |avg loss 7.808 |avg tokens 2149.600 |tokens/s 8123.443 |walltime 3394.440 |
+Transformer | epoch 0 | step 12800 |avg loss 7.515 |avg tokens 2311.200 |tokens/s 8292.597 |walltime 3397.227 |
+Transformer | epoch 0 | step 12810 |avg loss 7.616 |avg tokens 2188.500 |tokens/s 8239.091 |walltime 3399.883 |
+Transformer | epoch 0 | step 12820 |avg loss 7.290 |avg tokens 2316.100 |tokens/s 8278.978 |walltime 3402.681 |
+Transformer | epoch 0 | step 12830 |avg loss 7.439 |avg tokens 2188.800 |tokens/s 8057.522 |walltime 3405.397 |
+Transformer | epoch 0 | step 12840 |avg loss 8.110 |avg tokens 2246.000 |tokens/s 8602.235 |walltime 3408.008 |
+Transformer | epoch 0 | step 12850 |avg loss 8.097 |avg tokens 1953.200 |tokens/s 7666.070 |walltime 3410.556 |
+Transformer | epoch 0 | step 12860 |avg loss 7.585 |avg tokens 2107.200 |tokens/s 7946.776 |walltime 3413.208 |
+Transformer | epoch 0 | step 12870 |avg loss 7.781 |avg tokens 2272.200 |tokens/s 8488.434 |walltime 3415.885 |
+Transformer | epoch 0 | step 12880 |avg loss 7.406 |avg tokens 2293.400 |tokens/s 8607.883 |walltime 3418.549 |
+Transformer | epoch 0 | step 12890 |avg loss 8.026 |avg tokens 2210.000 |tokens/s 8461.967 |walltime 3421.161 |
+Transformer | epoch 0 | step 12900 |avg loss 7.446 |avg tokens 2308.000 |tokens/s 8307.768 |walltime 3423.939 |
+Transformer | epoch 0 | step 12910 |avg loss 7.791 |avg tokens 2139.200 |tokens/s 8245.199 |walltime 3426.533 |
+Transformer | epoch 0 | step 12920 |avg loss 7.719 |avg tokens 2061.300 |tokens/s 8138.521 |walltime 3429.066 |
+Transformer | epoch 0 | step 12930 |avg loss 7.664 |avg tokens 2322.200 |tokens/s 8475.126 |walltime 3431.806 |
+Transformer | epoch 0 | step 12940 |avg loss 7.999 |avg tokens 2187.600 |tokens/s 8358.786 |walltime 3434.423 |
+Transformer | epoch 0 | step 12950 |avg loss 7.942 |avg tokens 2134.100 |tokens/s 8169.608 |walltime 3437.035 |
+Transformer | epoch 0 | step 12960 |avg loss 7.808 |avg tokens 2321.000 |tokens/s 8436.316 |walltime 3439.787 |
+Transformer | epoch 0 | step 12970 |avg loss 7.483 |avg tokens 2336.800 |tokens/s 8557.138 |walltime 3442.517 |
+Transformer | epoch 0 | step 12980 |avg loss 7.555 |avg tokens 2461.100 |tokens/s 8914.418 |walltime 3445.278 |
+Transformer | epoch 0 | step 12990 |avg loss 7.833 |avg tokens 2129.800 |tokens/s 8104.635 |walltime 3447.906 |
+Transformer | epoch 0 | step 13000 |avg loss 7.883 |avg tokens 2257.600 |tokens/s 8468.194 |walltime 3450.572 |
+Transformer | epoch 0 | step 13010 |avg loss 7.539 |avg tokens 2263.400 |tokens/s 8287.536 |walltime 3453.303 |
+Transformer | epoch 0 | step 13020 |avg loss 7.682 |avg tokens 2164.100 |tokens/s 8175.928 |walltime 3455.950 |
+Transformer | epoch 0 | step 13030 |avg loss 7.573 |avg tokens 2219.200 |tokens/s 8304.648 |walltime 3458.622 |
+Transformer | epoch 0 | step 13040 |avg loss 7.886 |avg tokens 1981.500 |tokens/s 7624.708 |walltime 3461.221 |
+Transformer | epoch 0 | step 13050 |avg loss 8.047 |avg tokens 2250.200 |tokens/s 8570.336 |walltime 3463.847 |
+Transformer | epoch 0 | step 13060 |avg loss 7.902 |avg tokens 2197.400 |tokens/s 8464.506 |walltime 3466.443 |
+Transformer | epoch 0 | step 13070 |avg loss 7.595 |avg tokens 2272.000 |tokens/s 8465.560 |walltime 3469.127 |
+Transformer | epoch 0 | step 13080 |avg loss 7.967 |avg tokens 2340.700 |tokens/s 8692.252 |walltime 3471.819 |
+Transformer | epoch 0 | step 13090 |avg loss 7.709 |avg tokens 2184.700 |tokens/s 8236.172 |walltime 3474.472 |
+Transformer | epoch 0 | step 13100 |avg loss 7.632 |avg tokens 2209.500 |tokens/s 8326.691 |walltime 3477.125 |
+Transformer | epoch 0 | step 13110 |avg loss 8.197 |avg tokens 2105.400 |tokens/s 8257.725 |walltime 3479.675 |
+Transformer | epoch 0 | step 13120 |avg loss 7.190 |avg tokens 2340.000 |tokens/s 8452.418 |walltime 3482.444 |
+Transformer | epoch 0 | step 13130 |avg loss 8.008 |avg tokens 2032.500 |tokens/s 7804.406 |walltime 3485.048 |
+Transformer | epoch 0 | step 13140 |avg loss 7.615 |avg tokens 2240.200 |tokens/s 8271.553 |walltime 3487.756 |
+Transformer | epoch 0 | step 13150 |avg loss 7.886 |avg tokens 2239.600 |tokens/s 8295.703 |walltime 3490.456 |
+Transformer | epoch 0 | step 13160 |avg loss 8.069 |avg tokens 2162.300 |tokens/s 8635.606 |walltime 3492.960 |
+Transformer | epoch 0 | step 13170 |avg loss 7.652 |avg tokens 2347.200 |tokens/s 8389.216 |walltime 3495.758 |
+Transformer | epoch 0 | step 13180 |avg loss 7.404 |avg tokens 2304.000 |tokens/s 8369.300 |walltime 3498.511 |
+Transformer | epoch 0 | step 13190 |avg loss 7.634 |avg tokens 2310.700 |tokens/s 8475.147 |walltime 3501.237 |
+Transformer | epoch 0 | step 13200 |avg loss 7.864 |avg tokens 2251.200 |tokens/s 8314.965 |walltime 3503.944 |
+Transformer | epoch 0 | step 13210 |avg loss 8.001 |avg tokens 2196.400 |tokens/s 8719.395 |walltime 3506.463 |
+Transformer | epoch 0 | step 13220 |avg loss 7.675 |avg tokens 2152.100 |tokens/s 8030.517 |walltime 3509.143 |
+Transformer | epoch 0 | step 13230 |avg loss 8.117 |avg tokens 2176.800 |tokens/s 8545.490 |walltime 3511.691 |
+Transformer | epoch 0 | step 13240 |avg loss 7.708 |avg tokens 2250.100 |tokens/s 8364.993 |walltime 3514.381 |
+Transformer | epoch 0 | step 13250 |avg loss 7.689 |avg tokens 2023.300 |tokens/s 7889.705 |walltime 3516.945 |
+Transformer | epoch 0 | step 13260 |avg loss 7.954 |avg tokens 2201.400 |tokens/s 8414.320 |walltime 3519.561 |
+Transformer | epoch 0 | step 13270 |avg loss 7.699 |avg tokens 1896.200 |tokens/s 7685.842 |walltime 3522.028 |
+Transformer | epoch 0 | step 13280 |avg loss 7.325 |avg tokens 2154.900 |tokens/s 7963.752 |walltime 3524.734 |
+Transformer | epoch 0 | step 13290 |avg loss 7.739 |avg tokens 2306.300 |tokens/s 8408.504 |walltime 3527.477 |
+Transformer | epoch 0 | step 13300 |avg loss 7.644 |avg tokens 2244.000 |tokens/s 8562.588 |walltime 3530.098 |
+Transformer | epoch 0 | step 13310 |avg loss 7.543 |avg tokens 2268.800 |tokens/s 8624.014 |walltime 3532.729 |
+Transformer | epoch 0 | step 13320 |avg loss 7.861 |avg tokens 2372.800 |tokens/s 8693.385 |walltime 3535.458 |
+Transformer | epoch 0 | step 13330 |avg loss 7.873 |avg tokens 1986.900 |tokens/s 8133.059 |walltime 3537.901 |
+Transformer | epoch 0 | step 13340 |avg loss 8.032 |avg tokens 2296.300 |tokens/s 8848.830 |walltime 3540.496 |
+Transformer | epoch 0 | step 13350 |avg loss 7.724 |avg tokens 2250.400 |tokens/s 8376.601 |walltime 3543.183 |
+Transformer | epoch 0 | step 13360 |avg loss 8.272 |avg tokens 2385.300 |tokens/s 9284.254 |walltime 3545.752 |
+Transformer | epoch 0 | step 13370 |avg loss 7.849 |avg tokens 2164.800 |tokens/s 8270.963 |walltime 3548.369 |
+Transformer | epoch 0 | step 13380 |avg loss 7.635 |avg tokens 2265.000 |tokens/s 8423.313 |walltime 3551.058 |
+Transformer | epoch 0 | step 13390 |avg loss 7.964 |avg tokens 2038.400 |tokens/s 7862.549 |walltime 3553.651 |
+Transformer | epoch 0 | step 13400 |avg loss 7.780 |avg tokens 2301.600 |tokens/s 8497.498 |walltime 3556.359 |
+Transformer | epoch 0 | step 13410 |avg loss 7.600 |avg tokens 2096.200 |tokens/s 8112.828 |walltime 3558.943 |
+Transformer | epoch 0 | step 13420 |avg loss 7.535 |avg tokens 2316.800 |tokens/s 8552.420 |walltime 3561.652 |
+Transformer | epoch 0 | step 13430 |avg loss 7.831 |avg tokens 2277.400 |tokens/s 8483.651 |walltime 3564.337 |
+Transformer | epoch 0 | step 13440 |avg loss 7.712 |avg tokens 2305.500 |tokens/s 8578.002 |walltime 3567.024 |
+Transformer | epoch 0 | step 13450 |avg loss 7.445 |avg tokens 2372.000 |tokens/s 8440.124 |walltime 3569.835 |
+Transformer | epoch 0 | step 13460 |avg loss 7.961 |avg tokens 2254.400 |tokens/s 8493.122 |walltime 3572.489 |
+Transformer | epoch 0 | step 13470 |avg loss 8.213 |avg tokens 2047.600 |tokens/s 8444.363 |walltime 3574.914 |
+Transformer | epoch 0 | step 13480 |avg loss 7.628 |avg tokens 2302.400 |tokens/s 8669.464 |walltime 3577.570 |
+Transformer | epoch 0 | step 13490 |avg loss 7.580 |avg tokens 2336.000 |tokens/s 8740.674 |walltime 3580.242 |
+Transformer | epoch 0 | step 13500 |avg loss 7.836 |avg tokens 2198.600 |tokens/s 8875.560 |walltime 3582.719 |
+Transformer | epoch 0 | step 13510 |avg loss 8.361 |avg tokens 1988.300 |tokens/s 7849.506 |walltime 3585.252 |
+Transformer | epoch 0 | step 13520 |avg loss 7.497 |avg tokens 2335.300 |tokens/s 8567.389 |walltime 3587.978 |
+Transformer | epoch 0 | step 13530 |avg loss 7.706 |avg tokens 1903.300 |tokens/s 7523.567 |walltime 3590.508 |
+Transformer | epoch 0 | step 13540 |avg loss 7.689 |avg tokens 2252.000 |tokens/s 8453.536 |walltime 3593.172 |
+Transformer | epoch 0 | step 13550 |avg loss 7.665 |avg tokens 1998.600 |tokens/s 7723.563 |walltime 3595.760 |
+Transformer | epoch 0 | step 13560 |avg loss 8.061 |avg tokens 2243.200 |tokens/s 8559.352 |walltime 3598.380 |
+Transformer | epoch 0 | step 13570 |avg loss 7.831 |avg tokens 2216.500 |tokens/s 8446.206 |walltime 3601.005 |
+Transformer | epoch 0 | step 13580 |avg loss 7.755 |avg tokens 2337.900 |tokens/s 8937.422 |walltime 3603.620 |
+Transformer | epoch 0 | step 13590 |avg loss 7.823 |avg tokens 2139.900 |tokens/s 7948.079 |walltime 3606.313 |
+Transformer | epoch 0 | step 13600 |avg loss 7.901 |avg tokens 1960.900 |tokens/s 7892.346 |walltime 3608.797 |
+Transformer | epoch 0 | step 13610 |avg loss 7.952 |avg tokens 2175.100 |tokens/s 8201.208 |walltime 3611.450 |
+Transformer | epoch 0 | step 13620 |avg loss 7.863 |avg tokens 1987.200 |tokens/s 7698.653 |walltime 3614.031 |
+Transformer | epoch 0 | step 13630 |avg loss 7.506 |avg tokens 2276.000 |tokens/s 8203.508 |walltime 3616.805 |
+Transformer | epoch 0 | step 13640 |avg loss 7.631 |avg tokens 2064.800 |tokens/s 7786.371 |walltime 3619.457 |
+Transformer | epoch 0 | step 13650 |avg loss 7.704 |avg tokens 2212.900 |tokens/s 8218.340 |walltime 3622.150 |
+Transformer | epoch 0 | step 13660 |avg loss 7.649 |avg tokens 2305.600 |tokens/s 8478.570 |walltime 3624.869 |
+Transformer | epoch 0 | step 13670 |avg loss 7.570 |avg tokens 2236.700 |tokens/s 8211.639 |walltime 3627.593 |
+Transformer | epoch 0 | step 13680 |avg loss 7.832 |avg tokens 2164.000 |tokens/s 8275.212 |walltime 3630.208 |
+Transformer | epoch 0 | step 13690 |avg loss 7.540 |avg tokens 2348.400 |tokens/s 8540.915 |walltime 3632.957 |
+Transformer | epoch 0 | step 13700 |avg loss 8.264 |avg tokens 2044.100 |tokens/s 8228.353 |walltime 3635.442 |
+Transformer | epoch 0 | step 13710 |avg loss 7.430 |avg tokens 2324.500 |tokens/s 8464.134 |walltime 3638.188 |
+Transformer | epoch 0 | step 13720 |avg loss 8.045 |avg tokens 2055.800 |tokens/s 7950.092 |walltime 3640.774 |
+Transformer | epoch 0 | step 13730 |avg loss 8.076 |avg tokens 1996.000 |tokens/s 7804.770 |walltime 3643.331 |
+Transformer | epoch 0 | step 13740 |avg loss 7.503 |avg tokens 2293.000 |tokens/s 8450.093 |walltime 3646.045 |
+Transformer | epoch 0 | step 13750 |avg loss 7.893 |avg tokens 2130.800 |tokens/s 8368.111 |walltime 3648.591 |
+Transformer | epoch 0 | step 13760 |avg loss 7.688 |avg tokens 2274.400 |tokens/s 8589.353 |walltime 3651.239 |
+Transformer | epoch 0 | step 13770 |avg loss 8.216 |avg tokens 1992.100 |tokens/s 7983.594 |walltime 3653.734 |
+Transformer | epoch 0 | step 13780 |avg loss 7.725 |avg tokens 2052.700 |tokens/s 7812.382 |walltime 3656.362 |
+Transformer | epoch 0 | step 13790 |avg loss 7.626 |avg tokens 2242.400 |tokens/s 8613.603 |walltime 3658.965 |
+Transformer | epoch 0 | step 13800 |avg loss 7.709 |avg tokens 2171.200 |tokens/s 8255.430 |walltime 3661.595 |
+Transformer | epoch 0 | step 13810 |avg loss 7.963 |avg tokens 2099.900 |tokens/s 8129.113 |walltime 3664.178 |
+Transformer | epoch 0 | step 13820 |avg loss 7.985 |avg tokens 2066.600 |tokens/s 7858.535 |walltime 3666.808 |
+Transformer | epoch 0 | step 13830 |avg loss 7.700 |avg tokens 2208.000 |tokens/s 8210.147 |walltime 3669.498 |
+Transformer | epoch 0 | step 13840 |avg loss 8.269 |avg tokens 2014.300 |tokens/s 8180.685 |walltime 3671.960 |
+Transformer | epoch 0 | step 13850 |avg loss 7.619 |avg tokens 2312.800 |tokens/s 8440.683 |walltime 3674.700 |
+Transformer | epoch 0 | step 13860 |avg loss 7.651 |avg tokens 2342.300 |tokens/s 8372.824 |walltime 3677.497 |
+Transformer | epoch 0 | step 13870 |avg loss 7.277 |avg tokens 2406.300 |tokens/s 8665.172 |walltime 3680.274 |
+Transformer | epoch 0 | step 13880 |avg loss 7.700 |avg tokens 2261.000 |tokens/s 8495.449 |walltime 3682.936 |
+Transformer | epoch 0 | step 13890 |avg loss 8.173 |avg tokens 1892.000 |tokens/s 7216.800 |walltime 3685.557 |
+Transformer | epoch 0 | step 13900 |avg loss 7.643 |avg tokens 2154.400 |tokens/s 8015.566 |walltime 3688.245 |
+Transformer | epoch 0 | step 13910 |avg loss 7.873 |avg tokens 2226.600 |tokens/s 8217.959 |walltime 3690.955 |
+Transformer | epoch 0 | step 13920 |avg loss 7.730 |avg tokens 2302.400 |tokens/s 8446.716 |walltime 3693.680 |
+Transformer | epoch 0 | step 13930 |avg loss 7.685 |avg tokens 2319.900 |tokens/s 8258.636 |walltime 3696.490 |
+Transformer | epoch 0 | step 13940 |avg loss 7.722 |avg tokens 2193.800 |tokens/s 8185.575 |walltime 3699.170 |
+Transformer | epoch 0 | step 13950 |avg loss 7.701 |avg tokens 1966.600 |tokens/s 7814.886 |walltime 3701.686 |
+Transformer | epoch 0 | step 13960 |avg loss 7.770 |avg tokens 2337.900 |tokens/s 8802.326 |walltime 3704.342 |
+Transformer | epoch 0 | step 13970 |avg loss 8.187 |avg tokens 1981.500 |tokens/s 7930.487 |walltime 3706.841 |
+Transformer | epoch 0 | step 13980 |avg loss 7.611 |avg tokens 2057.600 |tokens/s 7938.930 |walltime 3709.432 |
+Transformer | epoch 0 | step 13990 |avg loss 8.010 |avg tokens 2189.700 |tokens/s 8313.046 |walltime 3712.067 |
+Transformer | epoch 0 | step 14000 |avg loss 7.661 |avg tokens 2274.600 |tokens/s 8254.588 |walltime 3714.822 |
+Transformer | epoch 0 | step 14010 |avg loss 8.038 |avg tokens 2285.600 |tokens/s 8645.374 |walltime 3717.466 |
+Transformer | epoch 0 | step 14020 |avg loss 7.731 |avg tokens 2260.500 |tokens/s 8417.854 |walltime 3720.151 |
+Transformer | epoch 0 | step 14030 |avg loss 7.899 |avg tokens 2225.400 |tokens/s 8504.286 |walltime 3722.768 |
+Transformer | epoch 0 | step 14040 |avg loss 8.099 |avg tokens 2199.000 |tokens/s 8458.029 |walltime 3725.368 |
+Transformer | epoch 0 | step 14050 |avg loss 7.846 |avg tokens 2208.100 |tokens/s 8220.214 |walltime 3728.054 |
+Transformer | epoch 0 | step 14060 |avg loss 8.052 |avg tokens 2115.700 |tokens/s 8120.993 |walltime 3730.659 |
+Transformer | epoch 0 | step 14070 |avg loss 7.534 |avg tokens 2139.100 |tokens/s 8094.078 |walltime 3733.302 |
+Transformer | epoch 0 | step 14080 |avg loss 7.924 |avg tokens 2245.000 |tokens/s 8630.587 |walltime 3735.903 |
+Transformer | epoch 0 | step 14090 |avg loss 7.430 |avg tokens 2298.400 |tokens/s 8216.344 |walltime 3738.701 |
+Transformer | epoch 0 | step 14100 |avg loss 7.419 |avg tokens 2204.700 |tokens/s 8223.139 |walltime 3741.382 |
+Transformer | epoch 0 | step 14110 |avg loss 7.888 |avg tokens 2182.800 |tokens/s 8391.454 |walltime 3743.983 |
+Transformer | epoch 0 | step 14120 |avg loss 7.479 |avg tokens 2094.600
|tokens/s 7751.974 |walltime 3746.685 | +Transformer | epoch 0 | step 14130 |avg loss 7.800 |avg tokens 2017.900 |tokens/s 8087.494 |walltime 3749.180 | +Transformer | epoch 0 | step 14140 |avg loss 8.027 |avg tokens 2064.100 |tokens/s 8201.198 |walltime 3751.697 | +Transformer | epoch 0 | step 14150 |avg loss 7.755 |avg tokens 2262.200 |tokens/s 8371.498 |walltime 3754.399 | +Transformer | epoch 0 | step 14160 |avg loss 7.889 |avg tokens 2128.900 |tokens/s 8115.435 |walltime 3757.022 | +Transformer | epoch 0 | step 14170 |avg loss 7.410 |avg tokens 2124.800 |tokens/s 8006.502 |walltime 3759.676 | +Transformer | epoch 0 | step 14180 |avg loss 8.042 |avg tokens 2166.300 |tokens/s 8215.666 |walltime 3762.313 | +Transformer | epoch 0 | step 14190 |avg loss 8.153 |avg tokens 1938.500 |tokens/s 7933.847 |walltime 3764.756 | +Transformer | epoch 0 | step 14200 |avg loss 7.783 |avg tokens 2168.100 |tokens/s 8363.951 |walltime 3767.349 | +Transformer | epoch 0 | step 14210 |avg loss 7.705 |avg tokens 2266.400 |tokens/s 8379.424 |walltime 3770.053 | +Transformer | epoch 0 | step 14220 |avg loss 7.872 |avg tokens 2053.800 |tokens/s 7804.021 |walltime 3772.685 | +Transformer | epoch 0 | step 14230 |avg loss 7.603 |avg tokens 2114.800 |tokens/s 7861.077 |walltime 3775.375 | +Transformer | epoch 0 | step 14240 |avg loss 7.880 |avg tokens 2350.800 |tokens/s 8885.408 |walltime 3778.021 | +Transformer | epoch 0 | step 14250 |avg loss 8.294 |avg tokens 2201.100 |tokens/s 8908.936 |walltime 3780.492 | +Transformer | epoch 0 | step 14260 |avg loss 7.595 |avg tokens 2112.100 |tokens/s 8182.953 |walltime 3783.073 | +Transformer | epoch 0 | step 14270 |avg loss 7.693 |avg tokens 2248.800 |tokens/s 8426.607 |walltime 3785.741 | +Transformer | epoch 0 | step 14280 |avg loss 7.597 |avg tokens 2138.800 |tokens/s 8233.314 |walltime 3788.339 | +Transformer | epoch 0 | step 14290 |avg loss 7.896 |avg tokens 2024.900 |tokens/s 7846.715 |walltime 3790.920 | +Transformer | epoch 0 | step 14300 
|avg loss 7.647 |avg tokens 2179.200 |tokens/s 8309.483 |walltime 3793.542 | +Transformer | epoch 0 | step 14310 |avg loss 7.691 |avg tokens 2272.800 |tokens/s 8401.410 |walltime 3796.248 | +Transformer | epoch 0 | step 14320 |avg loss 7.894 |avg tokens 2061.600 |tokens/s 7888.366 |walltime 3798.861 | +Transformer | epoch 0 | step 14330 |avg loss 7.713 |avg tokens 2088.000 |tokens/s 7795.423 |walltime 3801.540 | +Transformer | epoch 0 | step 14340 |avg loss 7.789 |avg tokens 2184.800 |tokens/s 8146.431 |walltime 3804.221 | +Transformer | epoch 0 | step 14350 |avg loss 7.961 |avg tokens 2163.900 |tokens/s 8479.843 |walltime 3806.773 | +Transformer | epoch 0 | step 14360 |avg loss 7.583 |avg tokens 2258.500 |tokens/s 8279.009 |walltime 3809.501 | +Transformer | epoch 0 | step 14370 |avg loss 7.762 |avg tokens 2141.600 |tokens/s 8339.065 |walltime 3812.069 | +Transformer | epoch 0 | step 14380 |avg loss 8.064 |avg tokens 2217.400 |tokens/s 8233.105 |walltime 3814.763 | +Transformer | epoch 0 | step 14390 |avg loss 8.018 |avg tokens 2184.000 |tokens/s 8504.154 |walltime 3817.331 | +Transformer | epoch 0 | step 14400 |avg loss 7.959 |avg tokens 2195.200 |tokens/s 8220.438 |walltime 3820.001 | +Transformer | epoch 0 | step 14410 |avg loss 7.680 |avg tokens 2227.200 |tokens/s 8252.985 |walltime 3822.700 | +Transformer | epoch 0 | step 14420 |avg loss 7.928 |avg tokens 2198.400 |tokens/s 8563.602 |walltime 3825.267 | +Transformer | epoch 0 | step 14430 |avg loss 7.823 |avg tokens 2141.400 |tokens/s 8247.157 |walltime 3827.864 | +Transformer | epoch 0 | step 14440 |avg loss 7.806 |avg tokens 2325.400 |tokens/s 8491.533 |walltime 3830.602 | +Transformer | epoch 0 | step 14450 |avg loss 8.113 |avg tokens 2209.000 |tokens/s 8434.363 |walltime 3833.221 | +Transformer | epoch 0 | step 14460 |avg loss 7.733 |avg tokens 2331.500 |tokens/s 8372.912 |walltime 3836.006 | +Transformer | epoch 0 | step 14470 |avg loss 7.843 |avg tokens 2118.000 |tokens/s 8022.359 |walltime 3838.646 | 
+Transformer | epoch 0 | step 14480 |avg loss 7.767 |avg tokens 2182.400 |tokens/s 8105.786 |walltime 3841.338 | +Transformer | epoch 0 | step 14490 |avg loss 7.843 |avg tokens 2125.500 |tokens/s 7846.251 |walltime 3844.047 | +Transformer | epoch 0 | step 14500 |avg loss 8.023 |avg tokens 1875.400 |tokens/s 7770.032 |walltime 3846.461 | +Transformer | epoch 0 | step 14510 |avg loss 7.419 |avg tokens 2245.800 |tokens/s 8244.719 |walltime 3849.185 | +Transformer | epoch 0 | step 14520 |avg loss 8.263 |avg tokens 2189.200 |tokens/s 8688.483 |walltime 3851.704 | +Transformer | epoch 0 | step 14530 |avg loss 7.930 |avg tokens 2096.500 |tokens/s 8106.210 |walltime 3854.291 | +Transformer | epoch 0 | step 14540 |avg loss 7.708 |avg tokens 2286.400 |tokens/s 8626.619 |walltime 3856.941 | +Transformer | epoch 0 | step 14550 |avg loss 7.870 |avg tokens 2277.900 |tokens/s 8287.583 |walltime 3859.690 | +Transformer | epoch 0 | step 14560 |avg loss 7.911 |avg tokens 2179.600 |tokens/s 8082.051 |walltime 3862.387 | +Transformer | epoch 0 | step 14570 |avg loss 8.005 |avg tokens 2281.400 |tokens/s 8679.138 |walltime 3865.015 | +Transformer | epoch 0 | step 14580 |avg loss 7.827 |avg tokens 2002.700 |tokens/s 7700.387 |walltime 3867.616 | +Transformer | epoch 0 | step 14590 |avg loss 7.952 |avg tokens 2355.600 |tokens/s 8778.225 |walltime 3870.299 | +Transformer | epoch 0 | step 14600 |avg loss 7.480 |avg tokens 2243.500 |tokens/s 8218.961 |walltime 3873.029 | +Transformer | epoch 0 | step 14610 |avg loss 7.918 |avg tokens 2111.800 |tokens/s 8135.350 |walltime 3875.625 | +Transformer | epoch 0 | step 14620 |avg loss 7.535 |avg tokens 2254.400 |tokens/s 8331.170 |walltime 3878.331 | +Transformer | epoch 0 | step 14630 |avg loss 8.260 |avg tokens 2216.400 |tokens/s 8596.349 |walltime 3880.909 | +Transformer | epoch 0 | step 14640 |avg loss 7.551 |avg tokens 2419.300 |tokens/s 8833.676 |walltime 3883.648 | +Transformer | epoch 0 | step 14650 |avg loss 8.168 |avg tokens 2085.000 
|tokens/s 8160.141 |walltime 3886.203 | +Transformer | epoch 0 | step 14660 |avg loss 8.010 |avg tokens 2168.800 |tokens/s 8489.422 |walltime 3888.758 | +Transformer | epoch 0 | step 14670 |avg loss 7.839 |avg tokens 2311.600 |tokens/s 8497.849 |walltime 3891.478 | +Transformer | epoch 0 | step 14680 |avg loss 7.714 |avg tokens 2282.400 |tokens/s 8364.907 |walltime 3894.207 | +Transformer | epoch 0 | step 14690 |avg loss 7.921 |avg tokens 2209.700 |tokens/s 8556.679 |walltime 3896.789 | +Transformer | epoch 0 | step 14700 |avg loss 7.674 |avg tokens 2336.600 |tokens/s 8847.032 |walltime 3899.430 | +Transformer | epoch 0 | step 14710 |avg loss 7.789 |avg tokens 2253.400 |tokens/s 8243.666 |walltime 3902.164 | +Transformer | epoch 0 | step 14720 |avg loss 8.148 |avg tokens 2218.900 |tokens/s 8858.540 |walltime 3904.668 | +Transformer | epoch 0 | step 14730 |avg loss 7.540 |avg tokens 2245.600 |tokens/s 8404.968 |walltime 3907.340 | +Transformer | epoch 0 | step 14740 |avg loss 7.990 |avg tokens 2164.300 |tokens/s 8231.071 |walltime 3909.970 | +Transformer | epoch 0 | step 14750 |avg loss 7.554 |avg tokens 2248.800 |tokens/s 8251.350 |walltime 3912.695 | +Transformer | epoch 0 | step 14760 |avg loss 7.765 |avg tokens 2201.600 |tokens/s 8353.370 |walltime 3915.331 | +Transformer | epoch 0 | step 14770 |avg loss 7.354 |avg tokens 2332.000 |tokens/s 8479.486 |walltime 3918.081 | +Transformer | epoch 0 | step 14780 |avg loss 7.807 |avg tokens 2238.400 |tokens/s 8368.290 |walltime 3920.756 | +Transformer | epoch 0 | step 14790 |avg loss 7.786 |avg tokens 2118.400 |tokens/s 8122.124 |walltime 3923.364 | +Transformer | epoch 0 | step 14800 |avg loss 7.667 |avg tokens 2419.100 |tokens/s 8892.139 |walltime 3926.084 | +Transformer | epoch 0 | step 14810 |avg loss 8.001 |avg tokens 2235.700 |tokens/s 8490.186 |walltime 3928.718 | +Transformer | epoch 0 | step 14820 |avg loss 8.195 |avg tokens 1968.200 |tokens/s 7898.979 |walltime 3931.209 | +Transformer | epoch 0 | step 14830 
|avg loss 7.942 |avg tokens 2157.900 |tokens/s 7976.166 |walltime 3933.915 | +Transformer | epoch 0 | step 14840 |avg loss 7.942 |avg tokens 2261.100 |tokens/s 8701.395 |walltime 3936.513 | +Transformer | epoch 0 | step 14850 |avg loss 7.926 |avg tokens 2203.900 |tokens/s 8435.928 |walltime 3939.126 | +Transformer | epoch 0 | step 14860 |avg loss 7.715 |avg tokens 2204.200 |tokens/s 8336.019 |walltime 3941.770 | +Transformer | epoch 0 | step 14870 |avg loss 7.979 |avg tokens 2112.200 |tokens/s 8135.356 |walltime 3944.366 | +Transformer | epoch 0 | step 14880 |avg loss 8.087 |avg tokens 2268.500 |tokens/s 8795.563 |walltime 3946.945 | +Transformer | epoch 0 | step 14890 |avg loss 8.137 |avg tokens 1987.400 |tokens/s 7950.375 |walltime 3949.445 | +Transformer | epoch 0 | step 14900 |avg loss 7.767 |avg tokens 2323.100 |tokens/s 8401.115 |walltime 3952.210 | +Transformer | epoch 0 | step 14910 |avg loss 7.719 |avg tokens 2265.800 |tokens/s 8335.091 |walltime 3954.929 | +Transformer | epoch 0 | step 14920 |avg loss 7.645 |avg tokens 2192.900 |tokens/s 8267.410 |walltime 3957.581 | +Transformer | epoch 0 | step 14930 |avg loss 7.657 |avg tokens 2218.400 |tokens/s 8206.614 |walltime 3960.284 | +Transformer | epoch 0 | step 14940 |avg loss 8.007 |avg tokens 2169.400 |tokens/s 8529.980 |walltime 3962.828 | +Transformer | epoch 0 | step 14950 |avg loss 8.020 |avg tokens 2110.600 |tokens/s 8002.592 |walltime 3965.465 | +Transformer | epoch 0 | step 14960 |avg loss 7.980 |avg tokens 2107.100 |tokens/s 8147.073 |walltime 3968.051 | +Transformer | epoch 0 | step 14970 |avg loss 7.903 |avg tokens 2165.600 |tokens/s 8277.107 |walltime 3970.668 | +Transformer | epoch 0 | step 14980 |avg loss 7.775 |avg tokens 2160.600 |tokens/s 8132.385 |walltime 3973.325 | +Transformer | epoch 0 | step 14990 |avg loss 7.977 |avg tokens 1854.700 |tokens/s 7421.749 |walltime 3975.824 | +Transformer | epoch 0 | step 15000 |avg loss 7.742 |avg tokens 2112.500 |tokens/s 8151.471 |walltime 3978.415 | 
+Transformer | epoch 0 | step 15010 |avg loss 7.927 |avg tokens 2149.100 |tokens/s 8110.737 |walltime 3981.065 | +Transformer | epoch 0 | step 15020 |avg loss 7.821 |avg tokens 2035.600 |tokens/s 8150.087 |walltime 3983.563 | +Transformer | epoch 0 | step 15030 |avg loss 7.331 |avg tokens 2295.100 |tokens/s 8467.913 |walltime 3986.273 | +Transformer | epoch 0 | step 15040 |avg loss 7.503 |avg tokens 2238.800 |tokens/s 8125.633 |walltime 3989.028 | +Transformer | epoch 0 | step 15050 |avg loss 8.068 |avg tokens 2002.100 |tokens/s 7708.753 |walltime 3991.625 | +Transformer | epoch 0 | step 15060 |avg loss 7.917 |avg tokens 2311.100 |tokens/s 8546.167 |walltime 3994.330 | +Transformer | epoch 0 | step 15070 |avg loss 7.548 |avg tokens 2227.200 |tokens/s 8279.546 |walltime 3997.020 | +Transformer | epoch 0 | step 15080 |avg loss 7.808 |avg tokens 2067.300 |tokens/s 7900.564 |walltime 3999.636 | +Transformer | epoch 0 | step 15090 |avg loss 7.497 |avg tokens 2109.500 |tokens/s 7943.303 |walltime 4002.292 | +Transformer | epoch 0 | step 15100 |avg loss 7.797 |avg tokens 2012.100 |tokens/s 7798.556 |walltime 4004.872 | +Transformer | epoch 0 | step 15110 |avg loss 8.144 |avg tokens 2091.400 |tokens/s 7986.463 |walltime 4007.491 | +Transformer | epoch 0 | step 15120 |avg loss 8.016 |avg tokens 2127.200 |tokens/s 8113.681 |walltime 4010.113 | +Transformer | epoch 0 | step 15130 |avg loss 7.815 |avg tokens 2210.300 |tokens/s 8348.344 |walltime 4012.760 | +Transformer | epoch 0 | step 15140 |avg loss 7.748 |avg tokens 2257.900 |tokens/s 8299.559 |walltime 4015.481 | +Transformer | epoch 0 | step 15150 |avg loss 7.920 |avg tokens 1951.100 |tokens/s 8033.652 |walltime 4017.909 | +Transformer | epoch 0 | step 15160 |avg loss 7.677 |avg tokens 2351.200 |tokens/s 8433.189 |walltime 4020.697 | +Transformer | epoch 0 | step 15170 |avg loss 7.483 |avg tokens 2348.400 |tokens/s 8456.938 |walltime 4023.474 | +Transformer | epoch 0 | step 15180 |avg loss 7.655 |avg tokens 2185.100 
|tokens/s 8375.518 |walltime 4026.083 | +Transformer | epoch 0 | step 15190 |avg loss 7.684 |avg tokens 2254.200 |tokens/s 8335.341 |walltime 4028.788 | +Transformer | epoch 0 | step 15200 |avg loss 8.281 |avg tokens 2114.800 |tokens/s 8352.002 |walltime 4031.320 | +Transformer | epoch 0 | step 15210 |avg loss 8.012 |avg tokens 2213.900 |tokens/s 8440.384 |walltime 4033.943 | +Transformer | epoch 0 | step 15220 |avg loss 7.770 |avg tokens 2182.600 |tokens/s 8201.477 |walltime 4036.604 | +Transformer | epoch 0 | step 15230 |avg loss 7.891 |avg tokens 2297.800 |tokens/s 8617.975 |walltime 4039.270 | +Transformer | epoch 0 | step 15240 |avg loss 8.075 |avg tokens 2117.400 |tokens/s 7964.985 |walltime 4041.929 | +Transformer | epoch 0 | step 15250 |avg loss 7.892 |avg tokens 2275.700 |tokens/s 8498.552 |walltime 4044.606 | +Transformer | epoch 0 | step 15260 |avg loss 7.626 |avg tokens 2205.300 |tokens/s 8209.914 |walltime 4047.292 | +Transformer | epoch 0 | step 15270 |avg loss 7.836 |avg tokens 2150.000 |tokens/s 8124.350 |walltime 4049.939 | +Transformer | epoch 0 | step 15280 |avg loss 7.809 |avg tokens 2310.200 |tokens/s 8560.107 |walltime 4052.638 | +Transformer | epoch 0 | step 15290 |avg loss 7.622 |avg tokens 2374.800 |tokens/s 8621.579 |walltime 4055.392 | +Transformer | epoch 0 | step 15300 |avg loss 7.971 |avg tokens 2216.300 |tokens/s 8141.998 |walltime 4058.114 | +Transformer | epoch 0 | step 15310 |avg loss 7.877 |avg tokens 2169.200 |tokens/s 8382.810 |walltime 4060.702 | +Transformer | epoch 0 | step 15320 |avg loss 7.754 |avg tokens 2284.700 |tokens/s 8507.279 |walltime 4063.387 | +Transformer | epoch 0 | step 15330 |avg loss 7.791 |avg tokens 2154.100 |tokens/s 8219.167 |walltime 4066.008 | +Transformer | epoch 0 | step 15340 |avg loss 8.094 |avg tokens 2258.700 |tokens/s 8605.340 |walltime 4068.633 | +Transformer | epoch 0 | step 15350 |avg loss 8.098 |avg tokens 2039.800 |tokens/s 7735.305 |walltime 4071.270 | +Transformer | epoch 0 | step 15360 
|avg loss 8.348 |avg tokens 2017.400 |tokens/s 8440.285 |walltime 4073.660 | +Transformer | epoch 0 | step 15370 |avg loss 7.826 |avg tokens 2275.600 |tokens/s 8283.073 |walltime 4076.407 | +Transformer | epoch 0 | step 15380 |avg loss 7.874 |avg tokens 2112.200 |tokens/s 8091.612 |walltime 4079.018 | +Transformer | epoch 0 | step 15390 |avg loss 7.752 |avg tokens 2350.700 |tokens/s 8639.069 |walltime 4081.739 | +Transformer | epoch 0 | step 15400 |avg loss 7.794 |avg tokens 2265.600 |tokens/s 8180.019 |walltime 4084.509 | +Transformer | epoch 0 | step 15410 |avg loss 8.102 |avg tokens 1879.100 |tokens/s 7805.950 |walltime 4086.916 | +Transformer | epoch 0 | step 15420 |avg loss 7.787 |avg tokens 2300.800 |tokens/s 8394.179 |walltime 4089.657 | +Transformer | epoch 0 | step 15430 |avg loss 7.886 |avg tokens 2061.500 |tokens/s 8157.249 |walltime 4092.184 | +Transformer | epoch 0 | step 15440 |avg loss 8.024 |avg tokens 2183.700 |tokens/s 8376.859 |walltime 4094.791 | +Transformer | epoch 0 | step 15450 |avg loss 8.076 |avg tokens 2243.700 |tokens/s 8553.045 |walltime 4097.414 | +Transformer | epoch 0 | step 15460 |avg loss 7.996 |avg tokens 2108.500 |tokens/s 8123.241 |walltime 4100.010 | +Transformer | epoch 0 | step 15470 |avg loss 7.644 |avg tokens 2278.400 |tokens/s 8353.948 |walltime 4102.737 | +Transformer | epoch 0 | step 15480 |avg loss 7.677 |avg tokens 2256.800 |tokens/s 8265.336 |walltime 4105.468 | +Transformer | epoch 0 | step 15490 |avg loss 7.746 |avg tokens 2291.400 |tokens/s 8255.683 |walltime 4108.243 | +Transformer | epoch 0 | step 15500 |avg loss 7.639 |avg tokens 2413.100 |tokens/s 8807.967 |walltime 4110.983 | +Transformer | epoch 0 | step 15510 |avg loss 7.636 |avg tokens 2396.200 |tokens/s 8621.537 |walltime 4113.762 | +Transformer | epoch 0 | step 15520 |avg loss 7.450 |avg tokens 2154.300 |tokens/s 8111.896 |walltime 4116.418 | +Transformer | epoch 0 | step 15530 |avg loss 7.851 |avg tokens 2376.000 |tokens/s 8913.964 |walltime 4119.083 | 
+Transformer | epoch 0 | step 15540 |avg loss 7.729 |avg tokens 2415.700 |tokens/s 8814.523 |walltime 4121.824 | +Transformer | epoch 0 | step 15550 |avg loss 7.648 |avg tokens 2332.100 |tokens/s 8553.954 |walltime 4124.550 | +Transformer | epoch 0 | step 15560 |avg loss 7.620 |avg tokens 2187.400 |tokens/s 8120.788 |walltime 4127.244 | +Transformer | epoch 0 | step 15570 |avg loss 7.898 |avg tokens 2112.900 |tokens/s 8030.187 |walltime 4129.875 | +Transformer | epoch 0 | step 15580 |avg loss 8.069 |avg tokens 2005.100 |tokens/s 7853.761 |walltime 4132.428 | +Transformer | epoch 0 | step 15590 |avg loss 7.639 |avg tokens 2194.200 |tokens/s 8154.969 |walltime 4135.119 | +Transformer | epoch 0 | step 15600 |avg loss 7.600 |avg tokens 2284.900 |tokens/s 8472.337 |walltime 4137.816 | +Transformer | epoch 0 | step 15610 |avg loss 7.745 |avg tokens 2355.200 |tokens/s 8742.206 |walltime 4140.510 | +Transformer | epoch 0 | step 15620 |avg loss 7.728 |avg tokens 2233.000 |tokens/s 8451.438 |walltime 4143.152 | +Transformer | epoch 0 | step 15630 |avg loss 7.603 |avg tokens 2210.000 |tokens/s 8213.621 |walltime 4145.842 | +Transformer | epoch 0 | step 15640 |avg loss 7.727 |avg tokens 2258.200 |tokens/s 8470.278 |walltime 4148.508 | +Transformer | epoch 0 | step 15650 |avg loss 8.251 |avg tokens 1963.500 |tokens/s 8123.700 |walltime 4150.925 | +Transformer | epoch 0 | step 15660 |avg loss 8.016 |avg tokens 1909.300 |tokens/s 7551.783 |walltime 4153.454 | +Transformer | epoch 0 | step 15670 |avg loss 8.166 |avg tokens 2150.200 |tokens/s 8465.602 |walltime 4155.994 | +Transformer | epoch 0 | step 15680 |avg loss 7.809 |avg tokens 2331.200 |tokens/s 8639.757 |walltime 4158.692 | +Transformer | epoch 0 | step 15690 |avg loss 7.798 |avg tokens 2035.600 |tokens/s 8043.761 |walltime 4161.223 | +Transformer | epoch 0 | step 15700 |avg loss 7.817 |avg tokens 2139.200 |tokens/s 8094.254 |walltime 4163.865 | +Transformer | epoch 0 | step 15710 |avg loss 7.645 |avg tokens 2247.800 
|tokens/s 8504.450 |walltime 4166.509 | +Transformer | epoch 0 | step 15720 |avg loss 7.925 |avg tokens 2347.200 |tokens/s 8278.336 |walltime 4169.344 | +Transformer | epoch 0 | step 15730 |avg loss 7.800 |avg tokens 2148.100 |tokens/s 7880.635 |walltime 4172.070 | +Transformer | epoch 0 | step 15740 |avg loss 7.806 |avg tokens 2226.200 |tokens/s 8206.877 |walltime 4174.782 | +Transformer | epoch 0 | step 15750 |avg loss 7.968 |avg tokens 2124.400 |tokens/s 8051.477 |walltime 4177.421 | +Transformer | epoch 0 | step 15760 |avg loss 7.687 |avg tokens 2274.400 |tokens/s 8341.583 |walltime 4180.147 | +Transformer | epoch 0 | step 15770 |avg loss 7.375 |avg tokens 2407.300 |tokens/s 8524.907 |walltime 4182.971 | +Transformer | epoch 0 | step 15780 |avg loss 8.161 |avg tokens 2255.200 |tokens/s 8880.189 |walltime 4185.511 | +Transformer | epoch 0 | step 15790 |avg loss 7.769 |avg tokens 2393.300 |tokens/s 8813.987 |walltime 4188.226 | +Transformer | epoch 0 | step 15800 |avg loss 8.215 |avg tokens 2187.200 |tokens/s 8477.654 |walltime 4190.806 | +Transformer | epoch 0 | step 15810 |avg loss 7.650 |avg tokens 2378.400 |tokens/s 8534.029 |walltime 4193.593 | +Transformer | epoch 0 | step 15820 |avg loss 8.054 |avg tokens 2043.900 |tokens/s 7883.359 |walltime 4196.186 | +Transformer | epoch 0 | step 15830 |avg loss 8.091 |avg tokens 2145.000 |tokens/s 8093.356 |walltime 4198.836 | +Transformer | epoch 0 | step 15840 |avg loss 7.697 |avg tokens 2166.800 |tokens/s 8029.636 |walltime 4201.535 | +Transformer | epoch 0 | step 15850 |avg loss 8.375 |avg tokens 2289.000 |tokens/s 8913.514 |walltime 4204.103 | +Transformer | epoch 0 | step 15860 |avg loss 7.721 |avg tokens 2232.000 |tokens/s 8235.460 |walltime 4206.813 | +Transformer | epoch 0 | step 15870 |avg loss 7.535 |avg tokens 2242.400 |tokens/s 8128.957 |walltime 4209.571 | +Transformer | epoch 0 | step 15880 |avg loss 7.773 |avg tokens 2188.600 |tokens/s 8296.098 |walltime 4212.210 | +Transformer | epoch 0 | step 15890 
|avg loss 8.029 |avg tokens 2138.000 |tokens/s 8264.424 |walltime 4214.797 | +Transformer | epoch 0 | step 15900 |avg loss 8.188 |avg tokens 1984.900 |tokens/s 7951.035 |walltime 4217.293 | +Transformer | epoch 0 | step 15910 |avg loss 7.878 |avg tokens 2387.200 |tokens/s 8686.827 |walltime 4220.041 | +Transformer | epoch 0 | step 15920 |avg loss 7.721 |avg tokens 1968.600 |tokens/s 7524.856 |walltime 4222.657 | +Transformer | epoch 0 | step 15930 |avg loss 7.801 |avg tokens 2404.000 |tokens/s 8692.542 |walltime 4225.423 | +Transformer | epoch 0 | step 15940 |avg loss 8.015 |avg tokens 2264.200 |tokens/s 8379.540 |walltime 4228.125 | +Transformer | epoch 0 | step 15950 |avg loss 7.715 |avg tokens 2364.000 |tokens/s 8735.184 |walltime 4230.831 | +Transformer | epoch 0 | step 15960 |avg loss 7.964 |avg tokens 2208.200 |tokens/s 8403.342 |walltime 4233.459 | +Transformer | epoch 0 | step 15970 |avg loss 7.983 |avg tokens 2257.200 |tokens/s 8341.398 |walltime 4236.165 | +Transformer | epoch 0 | step 15980 |avg loss 8.136 |avg tokens 1974.700 |tokens/s 7890.046 |walltime 4238.668 | +Transformer | epoch 0 | step 15990 |avg loss 8.052 |avg tokens 1987.000 |tokens/s 7839.267 |walltime 4241.202 | +Transformer | epoch 0 | step 16000 |avg loss 7.954 |avg tokens 2076.900 |tokens/s 8099.264 |walltime 4243.767 | +Transformer | epoch 0 | step 16010 |avg loss 8.259 |avg tokens 2276.400 |tokens/s 8969.447 |walltime 4246.305 | +Transformer | epoch 0 | step 16020 |avg loss 7.874 |avg tokens 1945.200 |tokens/s 7707.788 |walltime 4248.828 | +Transformer | epoch 0 | step 16030 |avg loss 7.851 |avg tokens 2374.400 |tokens/s 8785.377 |walltime 4251.531 | +Transformer | epoch 0 | step 16040 |avg loss 7.987 |avg tokens 2242.700 |tokens/s 8482.822 |walltime 4254.175 | +Transformer | epoch 0 | step 16050 |avg loss 8.075 |avg tokens 2128.400 |tokens/s 8580.893 |walltime 4256.655 | +Transformer | epoch 0 | step 16060 |avg loss 7.763 |avg tokens 2302.400 |tokens/s 8684.270 |walltime 4259.306 | 
+Transformer | epoch 0 | step 16070 |avg loss 7.690 |avg tokens 2324.000 |tokens/s 8526.985 |walltime 4262.032 | +Transformer | epoch 0 | step 16080 |avg loss 8.014 |avg tokens 2347.600 |tokens/s 8846.079 |walltime 4264.686 | +Transformer | epoch 0 | step 16090 |avg loss 8.157 |avg tokens 1898.900 |tokens/s 7575.604 |walltime 4267.192 | +Transformer | epoch 0 | step 16100 |avg loss 7.654 |avg tokens 2152.400 |tokens/s 7990.792 |walltime 4269.886 | +Transformer | epoch 0 | step 16110 |avg loss 7.739 |avg tokens 2277.700 |tokens/s 8370.727 |walltime 4272.607 | +Transformer | epoch 0 | step 16120 |avg loss 8.190 |avg tokens 1955.700 |tokens/s 7833.671 |walltime 4275.104 | +Transformer | epoch 0 | step 16130 |avg loss 7.364 |avg tokens 2403.200 |tokens/s 8471.776 |walltime 4277.940 | +Transformer | epoch 0 | step 16140 |avg loss 7.993 |avg tokens 2230.200 |tokens/s 8380.548 |walltime 4280.601 | +Transformer | epoch 0 | step 16150 |avg loss 7.349 |avg tokens 2352.000 |tokens/s 8379.444 |walltime 4283.408 | +Transformer | epoch 0 | step 16160 |avg loss 7.873 |avg tokens 1934.300 |tokens/s 7580.828 |walltime 4285.960 | +Transformer | epoch 0 | step 16170 |avg loss 7.883 |avg tokens 2223.200 |tokens/s 8206.699 |walltime 4288.669 | +Transformer | epoch 0 | step 16180 |avg loss 7.257 |avg tokens 2376.000 |tokens/s 8621.022 |walltime 4291.425 | +Transformer | epoch 0 | step 16190 |avg loss 7.925 |avg tokens 1882.100 |tokens/s 7594.942 |walltime 4293.903 | +Transformer | epoch 0 | step 16200 |avg loss 7.774 |avg tokens 2266.900 |tokens/s 8561.962 |walltime 4296.551 | +Transformer | epoch 0 | step 16210 |avg loss 8.039 |avg tokens 2243.600 |tokens/s 8816.526 |walltime 4299.095 | +Transformer | epoch 0 | step 16220 |avg loss 7.906 |avg tokens 2286.500 |tokens/s 8350.380 |walltime 4301.834 | +Transformer | epoch 0 | step 16230 |avg loss 7.644 |avg tokens 2233.700 |tokens/s 8234.356 |walltime 4304.546 | +Transformer | epoch 0 | step 16240 |avg loss 7.892 |avg tokens 2008.500 
|tokens/s 7969.977 |walltime 4307.066 | +Transformer | epoch 0 | step 16250 |avg loss 7.655 |avg tokens 2202.300 |tokens/s 8023.871 |walltime 4309.811 | +Transformer | epoch 0 | step 16260 |avg loss 8.096 |avg tokens 2137.400 |tokens/s 8352.302 |walltime 4312.370 | +Transformer | epoch 0 | step 16270 |avg loss 7.908 |avg tokens 2131.000 |tokens/s 8107.690 |walltime 4314.999 | +Transformer | epoch 0 | step 16280 |avg loss 7.430 |avg tokens 2371.200 |tokens/s 8505.187 |walltime 4317.786 | +Transformer | epoch 0 | step 16290 |avg loss 8.311 |avg tokens 1988.800 |tokens/s 7897.600 |walltime 4320.305 | +Transformer | epoch 0 | step 16300 |avg loss 8.009 |avg tokens 2194.800 |tokens/s 8326.339 |walltime 4322.941 | +Transformer | epoch 0 | step 16310 |avg loss 7.838 |avg tokens 2148.800 |tokens/s 8220.832 |walltime 4325.555 | +Transformer | epoch 0 | step 16320 |avg loss 7.661 |avg tokens 2332.700 |tokens/s 8483.965 |walltime 4328.304 | +Transformer | epoch 0 | step 16330 |avg loss 8.091 |avg tokens 2321.600 |tokens/s 8652.688 |walltime 4330.987 | +Transformer | epoch 0 | step 16340 |avg loss 8.131 |avg tokens 2222.600 |tokens/s 8524.439 |walltime 4333.594 | +Transformer | epoch 0 | step 16350 |avg loss 7.877 |avg tokens 2030.000 |tokens/s 7885.580 |walltime 4336.169 | +Transformer | epoch 0 | step 16360 |avg loss 7.902 |avg tokens 2062.900 |tokens/s 7966.074 |walltime 4338.758 | +Transformer | epoch 0 | step 16370 |avg loss 7.962 |avg tokens 2063.500 |tokens/s 8039.932 |walltime 4341.325 | +Transformer | epoch 0 | step 16380 |avg loss 7.957 |avg tokens 2321.600 |tokens/s 8512.431 |walltime 4344.052 | +Transformer | epoch 0 | step 16390 |avg loss 7.556 |avg tokens 2234.900 |tokens/s 8338.501 |walltime 4346.733 | +Transformer | epoch 0 | step 16400 |avg loss 7.899 |avg tokens 1915.600 |tokens/s 7497.486 |walltime 4349.288 | +Transformer | epoch 0 | step 16410 |avg loss 7.908 |avg tokens 2156.600 |tokens/s 8240.510 |walltime 4351.905 | +Transformer | epoch 0 | step 16420 
|avg loss 8.062 |avg tokens 2325.200 |tokens/s 8723.095 |walltime 4354.570 | +Transformer | epoch 0 | step 16430 |avg loss 7.684 |avg tokens 2216.600 |tokens/s 8307.816 |walltime 4357.238 | +Transformer | epoch 0 | step 16440 |avg loss 8.148 |avg tokens 2124.100 |tokens/s 8138.310 |walltime 4359.848 | +Transformer | epoch 0 | step 16450 |avg loss 8.122 |avg tokens 2226.200 |tokens/s 8671.240 |walltime 4362.416 | +Transformer | epoch 0 | step 16460 |avg loss 8.126 |avg tokens 2246.200 |tokens/s 8485.124 |walltime 4365.063 | +Transformer | epoch 0 | step 16470 |avg loss 8.125 |avg tokens 2152.500 |tokens/s 8602.340 |walltime 4367.565 | +Transformer | epoch 0 | step 16480 |avg loss 7.701 |avg tokens 2127.300 |tokens/s 7991.569 |walltime 4370.227 | +Transformer | epoch 0 | step 16490 |avg loss 7.936 |avg tokens 2051.400 |tokens/s 8226.057 |walltime 4372.721 | +Transformer | epoch 0 | step 16500 |avg loss 8.111 |avg tokens 2338.300 |tokens/s 8648.070 |walltime 4375.425 | +Transformer | epoch 0 | step 16510 |avg loss 7.675 |avg tokens 2378.400 |tokens/s 8508.738 |walltime 4378.220 | +Transformer | epoch 0 | step 16520 |avg loss 7.844 |avg tokens 2227.500 |tokens/s 8423.167 |walltime 4380.864 | +Transformer | epoch 0 | step 16530 |avg loss 8.030 |avg tokens 2242.800 |tokens/s 8808.778 |walltime 4383.410 | +Transformer | epoch 0 | step 16540 |avg loss 8.217 |avg tokens 2017.600 |tokens/s 8020.669 |walltime 4385.926 | +Transformer | epoch 0 | step 16550 |avg loss 8.236 |avg tokens 2049.400 |tokens/s 8335.193 |walltime 4388.385 | +Transformer | epoch 0 | step 16560 |avg loss 8.242 |avg tokens 2039.700 |tokens/s 8334.209 |walltime 4390.832 | +Transformer | epoch 0 | step 16570 |avg loss 7.929 |avg tokens 2022.400 |tokens/s 8058.254 |walltime 4393.342 | +Transformer | epoch 0 | step 16580 |avg loss 7.880 |avg tokens 2426.900 |tokens/s 8837.456 |walltime 4396.088 | +Transformer | epoch 0 | step 16590 |avg loss 7.960 |avg tokens 2162.400 |tokens/s 8226.825 |walltime 4398.716 | 
+Transformer | epoch 0 | step 16600 |avg loss 8.000 |avg tokens 2126.100 |tokens/s 7970.668 |walltime 4401.384 |
+Transformer | epoch 0 | step 16610 |avg loss 7.695 |avg tokens 2052.500 |tokens/s 7759.353 |walltime 4404.029 |
+Transformer | epoch 0 | step 16620 |avg loss 7.784 |avg tokens 2267.900 |tokens/s 8380.845 |walltime 4406.735 |
+Transformer | epoch 0 | step 16630 |avg loss 7.807 |avg tokens 2280.800 |tokens/s 8292.395 |walltime 4409.486 |
+Transformer | epoch 0 | step 16640 |avg loss 7.911 |avg tokens 2350.800 |tokens/s 8744.814 |walltime 4412.174 |
+Transformer | epoch 0 | step 16650 |avg loss 7.876 |avg tokens 2362.400 |tokens/s 8797.052 |walltime 4414.859 |
+Transformer | epoch 0 | step 16660 |avg loss 7.902 |avg tokens 2216.200 |tokens/s 8499.671 |walltime 4417.467 |
+Transformer | epoch 0 | step 16670 |avg loss 7.765 |avg tokens 2164.800 |tokens/s 8081.072 |walltime 4420.146 |
+Transformer | epoch 0 | step 16680 |avg loss 7.910 |avg tokens 2189.100 |tokens/s 8159.684 |walltime 4422.828 |
+Transformer | epoch 0 | step 16690 |avg loss 7.756 |avg tokens 2268.500 |tokens/s 8734.823 |walltime 4425.425 |
+Transformer | epoch 0 | step 16700 |avg loss 7.728 |avg tokens 2148.800 |tokens/s 7977.101 |walltime 4428.119 |
+Transformer | epoch 0 | step 16710 |avg loss 7.794 |avg tokens 2003.500 |tokens/s 7777.084 |walltime 4430.695 |
+Transformer | epoch 0 | step 16720 |avg loss 7.799 |avg tokens 2055.300 |tokens/s 7694.756 |walltime 4433.366 |
+Transformer | epoch 0 | step 16730 |avg loss 7.623 |avg tokens 2275.200 |tokens/s 8472.494 |walltime 4436.052 |
+Transformer | epoch 0 | step 16740 |avg loss 8.077 |avg tokens 2327.000 |tokens/s 8764.433 |walltime 4438.707 |
+Transformer | epoch 0 | step 16750 |avg loss 7.761 |avg tokens 2089.700 |tokens/s 7781.413 |walltime 4441.392 |
+Transformer | epoch 0 | step 16760 |avg loss 8.171 |avg tokens 2226.500 |tokens/s 8555.616 |walltime 4443.995 |
+Transformer | epoch 0 | step 16770 |avg loss 8.484 |avg tokens 2226.200 |tokens/s 8731.226 |walltime 4446.544 |
+Transformer | epoch 0 | step 16780 |avg loss 8.209 |avg tokens 2237.800 |tokens/s 8460.052 |walltime 4449.190 |
+Transformer | epoch 0 | step 16790 |avg loss 7.575 |avg tokens 2280.800 |tokens/s 8273.511 |walltime 4451.946 |
+Transformer | epoch 0 | step 16800 |avg loss 7.983 |avg tokens 2010.300 |tokens/s 7904.969 |walltime 4454.489 |
+Transformer | epoch 0 | step 16810 |avg loss 7.636 |avg tokens 2247.200 |tokens/s 8042.938 |walltime 4457.283 |
+Transformer | epoch 0 | step 16820 |avg loss 7.439 |avg tokens 2305.900 |tokens/s 8587.626 |walltime 4459.969 |
+Transformer | epoch 0 | step 16830 |avg loss 7.430 |avg tokens 2296.000 |tokens/s 8339.307 |walltime 4462.722 |
+Transformer | epoch 0 | step 16840 |avg loss 8.140 |avg tokens 1940.400 |tokens/s 7845.511 |walltime 4465.195 |
+Transformer | epoch 0 | step 16850 |avg loss 7.867 |avg tokens 2175.600 |tokens/s 8141.555 |walltime 4467.867 |
+Transformer | epoch 0 | step 16860 |avg loss 7.952 |avg tokens 2208.800 |tokens/s 8256.872 |walltime 4470.542 |
+Transformer | epoch 0 | step 16870 |avg loss 7.888 |avg tokens 2234.100 |tokens/s 8366.665 |walltime 4473.213 |
+Transformer | epoch 0 | step 16880 |avg loss 7.523 |avg tokens 2376.000 |tokens/s 8491.242 |walltime 4476.011 |
+Transformer | epoch 0 | step 16890 |avg loss 7.540 |avg tokens 2274.300 |tokens/s 8383.648 |walltime 4478.724 |
+Transformer | epoch 0 | step 16900 |avg loss 8.080 |avg tokens 2344.900 |tokens/s 8693.126 |walltime 4481.421 |
+Transformer | epoch 0 | step 16910 |avg loss 7.760 |avg tokens 2347.200 |tokens/s 8638.723 |walltime 4484.138 |
+Transformer | epoch 0 | step 16920 |avg loss 7.815 |avg tokens 2151.000 |tokens/s 8067.159 |walltime 4486.804 |
+Transformer | epoch 0 | step 16930 |avg loss 8.065 |avg tokens 2077.600 |tokens/s 8259.394 |walltime 4489.320 |
+Transformer | epoch 0 | step 16940 |avg loss 7.506 |avg tokens 2275.200 |tokens/s 8489.554 |walltime 4492.000 |
+Transformer | epoch 0 | step 16950 |avg loss 7.798 |avg tokens 2064.800 |tokens/s 7952.109 |walltime 4494.596 |
+Transformer | epoch 0 | step 16960 |avg loss 8.570 |avg tokens 2163.300 |tokens/s 8793.434 |walltime 4497.057 |
+Transformer | epoch 0 | step 16970 |avg loss 7.614 |avg tokens 2029.700 |tokens/s 7666.128 |walltime 4499.704 |
+Transformer | epoch 0 | step 16980 |avg loss 7.854 |avg tokens 2079.300 |tokens/s 7870.406 |walltime 4502.346 |
+Transformer | epoch 0 | step 16990 |avg loss 7.632 |avg tokens 2253.600 |tokens/s 8171.431 |walltime 4505.104 |
+Transformer | epoch 0 | step 17000 |avg loss 7.838 |avg tokens 1915.000 |tokens/s 7482.627 |walltime 4507.663 |
+Transformer | epoch 0 | step 17010 |avg loss 7.928 |avg tokens 2414.800 |tokens/s 8981.276 |walltime 4510.352 |
+Transformer | epoch 0 | step 17020 |avg loss 8.069 |avg tokens 2056.100 |tokens/s 7980.207 |walltime 4512.929 |
+Transformer | epoch 0 | step 17030 |avg loss 7.938 |avg tokens 2008.400 |tokens/s 7791.792 |walltime 4515.506 |
+Transformer | epoch 0 | step 17040 |avg loss 7.905 |avg tokens 2275.600 |tokens/s 8613.809 |walltime 4518.148 |
+Transformer | epoch 0 | step 17050 |avg loss 7.959 |avg tokens 2167.300 |tokens/s 8167.446 |walltime 4520.802 |
+Transformer | epoch 0 | step 17060 |avg loss 7.950 |avg tokens 2004.800 |tokens/s 7836.655 |walltime 4523.360 |
+Transformer | epoch 0 | step 17070 |avg loss 7.757 |avg tokens 2291.100 |tokens/s 8280.431 |walltime 4526.127 |
+Transformer | epoch 0 | step 17080 |avg loss 7.684 |avg tokens 2192.400 |tokens/s 8302.526 |walltime 4528.767 |
+Transformer | epoch 0 | step 17090 |avg loss 7.945 |avg tokens 2161.800 |tokens/s 8151.432 |walltime 4531.419 |
+Transformer | epoch 0 | step 17100 |avg loss 7.813 |avg tokens 2171.000 |tokens/s 8207.109 |walltime 4534.065 |
+Transformer | epoch 0 | step 17110 |avg loss 7.445 |avg tokens 2241.000 |tokens/s 8176.535 |walltime 4536.805 |
+Transformer | epoch 0 | step 17120 |avg loss 7.902 |avg tokens 2101.400 |tokens/s 7861.438 |walltime 4539.478 |
+Transformer | epoch 0 | step 17130 |avg loss 7.821 |avg tokens 2210.300 |tokens/s 8441.152 |walltime 4542.097 |
+Transformer | epoch 0 | step 17140 |avg loss 8.006 |avg tokens 2198.700 |tokens/s 8053.592 |walltime 4544.827 |
+Transformer | epoch 0 | step 17150 |avg loss 8.103 |avg tokens 2089.600 |tokens/s 8072.635 |walltime 4547.416 |
+Transformer | epoch 0 | step 17160 |avg loss 8.175 |avg tokens 2202.700 |tokens/s 8448.090 |walltime 4550.023 |
+Transformer | epoch 0 | step 17170 |avg loss 7.534 |avg tokens 2105.500 |tokens/s 8012.146 |walltime 4552.651 |
+Transformer | epoch 0 | step 17180 |avg loss 7.875 |avg tokens 2189.900 |tokens/s 8072.824 |walltime 4555.363 |
+Transformer | epoch 0 | step 17190 |avg loss 7.787 |avg tokens 1976.000 |tokens/s 7718.071 |walltime 4557.924 |
+Transformer | epoch 0 | step 17200 |avg loss 8.409 |avg tokens 1876.900 |tokens/s 7865.630 |walltime 4560.310 |
+Transformer | epoch 0 | step 17210 |avg loss 7.603 |avg tokens 2382.400 |tokens/s 8520.879 |walltime 4563.106 |
+Transformer | epoch 0 | step 17220 |avg loss 7.787 |avg tokens 2177.400 |tokens/s 8456.358 |walltime 4565.681 |
+Transformer | epoch 0 | step 17230 |avg loss 7.620 |avg tokens 2316.800 |tokens/s 8460.023 |walltime 4568.419 |
+Transformer | epoch 0 | step 17240 |avg loss 7.934 |avg tokens 2076.000 |tokens/s 8137.473 |walltime 4570.970 |
+Transformer | epoch 0 | step 17250 |avg loss 7.703 |avg tokens 2305.400 |tokens/s 8443.124 |walltime 4573.701 |
+Transformer | epoch 0 | step 17260 |avg loss 7.923 |avg tokens 1979.500 |tokens/s 7980.513 |walltime 4576.181 |
+Transformer | epoch 0 | step 17270 |avg loss 7.630 |avg tokens 2341.600 |tokens/s 8613.727 |walltime 4578.900 |
+Transformer | epoch 0 | step 17280 |avg loss 7.960 |avg tokens 1943.700 |tokens/s 7748.882 |walltime 4581.408 |
+Transformer | epoch 0 | step 17290 |avg loss 8.041 |avg tokens 2039.100 |tokens/s 7853.106 |walltime 4584.005 |
+Transformer | epoch 0 | step 17300 |avg loss 7.436 |avg tokens 2436.800 |tokens/s 8666.873 |walltime 4586.816 |
+Transformer | epoch 0 | step 17310 |avg loss 7.876 |avg tokens 2212.100 |tokens/s 8437.632 |walltime 4589.438 |
+Transformer | epoch 0 | step 17320 |avg loss 7.853 |avg tokens 2082.900 |tokens/s 7899.383 |walltime 4592.075 |
+Transformer | epoch 0 | step 17330 |avg loss 7.724 |avg tokens 2235.100 |tokens/s 8195.147 |walltime 4594.802 |
+Transformer | epoch 0 | step 17340 |avg loss 7.934 |avg tokens 2063.900 |tokens/s 8137.868 |walltime 4597.338 |
+Transformer | epoch 0 | step 17350 |avg loss 7.995 |avg tokens 2130.700 |tokens/s 8097.455 |walltime 4599.970 |
+Transformer | epoch 0 | step 17360 |avg loss 7.628 |avg tokens 2391.200 |tokens/s 8674.043 |walltime 4602.726 |
+Transformer | epoch 0 | step 17370 |avg loss 7.940 |avg tokens 2290.400 |tokens/s 8430.405 |walltime 4605.443 |
+Transformer | epoch 0 | step 17380 |avg loss 8.059 |avg tokens 2146.500 |tokens/s 8528.463 |walltime 4607.960 |
+Transformer | epoch 0 | step 17390 |avg loss 7.714 |avg tokens 2201.500 |tokens/s 8209.218 |walltime 4610.642 |
+Transformer | epoch 0 | step 17400 |avg loss 7.561 |avg tokens 2242.400 |tokens/s 8200.556 |walltime 4613.376 |
+Transformer | epoch 0 | step 17410 |avg loss 8.276 |avg tokens 2057.000 |tokens/s 7915.866 |walltime 4615.975 |
+Transformer | epoch 0 | step 17420 |avg loss 7.925 |avg tokens 2355.200 |tokens/s 8748.229 |walltime 4618.667 |
+Transformer | epoch 0 | step 17430 |avg loss 8.061 |avg tokens 2155.100 |tokens/s 8056.432 |walltime 4621.342 |
+Transformer | epoch 0 | step 17440 |avg loss 8.043 |avg tokens 1924.000 |tokens/s 7629.666 |walltime 4623.864 |
+Transformer | epoch 0 | step 17450 |avg loss 7.402 |avg tokens 2248.800 |tokens/s 8112.322 |walltime 4626.636 |
+Transformer | epoch 0 | step 17460 |avg loss 8.077 |avg tokens 2073.600 |tokens/s 8477.192 |walltime 4629.082 |
+Transformer | epoch 0 | step 17470 |avg loss 7.651 |avg tokens 2140.400 |tokens/s 8014.133 |walltime 4631.753 |
+Transformer | epoch 0 | step 17480 |avg loss 8.116 |avg tokens 2191.200 |tokens/s 8215.171 |walltime 4634.420 |
+Transformer | epoch 0 | step 17490 |avg loss 8.059 |avg tokens 2141.200 |tokens/s 8347.729 |walltime 4636.985 |
+Transformer | epoch 0 | step 17500 |avg loss 8.012 |avg tokens 2237.400 |tokens/s 8581.701 |walltime 4639.592 |
+Transformer | epoch 0 | step 17510 |avg loss 8.123 |avg tokens 2206.600 |tokens/s 8523.141 |walltime 4642.181 |
+Transformer | epoch 0 | step 17520 |avg loss 7.835 |avg tokens 2355.100 |tokens/s 8655.784 |walltime 4644.902 |
+Transformer | epoch 0 | step 17530 |avg loss 7.994 |avg tokens 2069.000 |tokens/s 8256.615 |walltime 4647.408 |
+Transformer | epoch 0 | step 17540 |avg loss 8.189 |avg tokens 2023.200 |tokens/s 8364.522 |walltime 4649.827 |
+Transformer | epoch 0 | step 17550 |avg loss 7.637 |avg tokens 2234.400 |tokens/s 8384.867 |walltime 4652.492 |
+Transformer | epoch 0 | step 17560 |avg loss 7.742 |avg tokens 2185.800 |tokens/s 8360.862 |walltime 4655.106 |
+Transformer | epoch 0 | step 17570 |avg loss 7.998 |avg tokens 2297.000 |tokens/s 8607.485 |walltime 4657.774 |
+Transformer | epoch 0 | step 17580 |avg loss 7.944 |avg tokens 2202.400 |tokens/s 8498.284 |walltime 4660.366 |
+Transformer | epoch 0 | step 17590 |avg loss 7.550 |avg tokens 2188.800 |tokens/s 8059.234 |walltime 4663.082 |
+Transformer | epoch 0 | step 17600 |avg loss 7.612 |avg tokens 1844.700 |tokens/s 7425.961 |walltime 4665.566 |
+Transformer | epoch 0 | step 17610 |avg loss 7.727 |avg tokens 2238.400 |tokens/s 8301.294 |walltime 4668.263 |
+Transformer | epoch 0 | step 17620 |avg loss 7.804 |avg tokens 2234.400 |tokens/s 8416.972 |walltime 4670.917 |
+Transformer | epoch 0 | step 17630 |avg loss 8.062 |avg tokens 2324.500 |tokens/s 8706.650 |walltime 4673.587 |
+Transformer | epoch 0 | step 17640 |avg loss 7.949 |avg tokens 2324.400 |tokens/s 8326.006 |walltime 4676.379 |
+Transformer | epoch 0 | step 17650 |avg loss 7.882 |avg tokens 2316.000 |tokens/s 8763.381 |walltime 4679.022 |
+Transformer | epoch 0 | step 17660 |avg loss 7.979 |avg tokens 2080.400 |tokens/s 8020.091 |walltime 4681.616 |
+Transformer | epoch 0 | step 17670 |avg loss 7.532 |avg tokens 2297.600 |tokens/s 8376.020 |walltime 4684.359 |
+Transformer | epoch 0 | step 17680 |avg loss 8.024 |avg tokens 2080.600 |tokens/s 8098.734 |walltime 4686.928 |
+Transformer | epoch 0 | step 17690 |avg loss 7.497 |avg tokens 2304.800 |tokens/s 8639.128 |walltime 4689.596 |
+Transformer | epoch 0 | step 17700 |avg loss 8.016 |avg tokens 2328.600 |tokens/s 8674.887 |walltime 4692.280 |
+Transformer | epoch 0 | step 17710 |avg loss 8.299 |avg tokens 2164.900 |tokens/s 8463.375 |walltime 4694.838 |
+Transformer | epoch 0 | step 17720 |avg loss 8.289 |avg tokens 1961.000 |tokens/s 7853.883 |walltime 4697.335 |
+Transformer | epoch 0 | step 17730 |avg loss 7.721 |avg tokens 2278.400 |tokens/s 8604.708 |walltime 4699.982 |
+Transformer | epoch 0 | step 17740 |avg loss 7.674 |avg tokens 2061.600 |tokens/s 7765.614 |walltime 4702.637 |
+Transformer | epoch 0 | step 17750 |avg loss 8.187 |avg tokens 1828.900 |tokens/s 7798.497 |walltime 4704.982 |
+Transformer | epoch 0 | step 17760 |avg loss 7.898 |avg tokens 2188.000 |tokens/s 8354.293 |walltime 4707.601 |
+Transformer | epoch 0 | step 17770 |avg loss 8.301 |avg tokens 2118.000 |tokens/s 8420.455 |walltime 4710.117 |
+Transformer | epoch 0 | step 17780 |avg loss 7.904 |avg tokens 1950.300 |tokens/s 7577.109 |walltime 4712.691 |
+Transformer | epoch 0 | step 17790 |avg loss 7.807 |avg tokens 2342.400 |tokens/s 8659.363 |walltime 4715.396 |
+Transformer | epoch 0 | step 17800 |avg loss 8.021 |avg tokens 2137.800 |tokens/s 8338.887 |walltime 4717.959 |
+Transformer | epoch 0 | step 17810 |avg loss 8.450 |avg tokens 2015.900 |tokens/s 7693.029 |walltime 4720.580 |
+Transformer | epoch 0 | step 17820 |avg loss 8.003 |avg tokens 2372.300 |tokens/s 9215.771 |walltime 4723.154 |
+Transformer | epoch 0 | step 17830 |avg loss 7.877 |avg tokens 2143.200 |tokens/s 8074.681 |walltime 4725.808 |
+Transformer | epoch 0 | step 17840 |avg loss 8.187 |avg tokens 2124.100 |tokens/s 8360.934 |walltime 4728.349 |
+Transformer | epoch 0 | step 17850 |avg loss 7.626 |avg tokens 2075.700 |tokens/s 7927.143 |walltime 4730.967 |
+Transformer | epoch 0 | step 17860 |avg loss 7.617 |avg tokens 2279.400 |tokens/s 8259.393 |walltime 4733.727 |
+Transformer | epoch 0 | step 17870 |avg loss 7.974 |avg tokens 2151.300 |tokens/s 8268.453 |walltime 4736.329 |
+Transformer | epoch 0 | step 17880 |avg loss 8.337 |avg tokens 2087.700 |tokens/s 8284.577 |walltime 4738.849 |
+Transformer | epoch 0 | step 17890 |avg loss 7.887 |avg tokens 2085.000 |tokens/s 8188.686 |walltime 4741.395 |
+Transformer | epoch 0 | step 17900 |avg loss 7.768 |avg tokens 2114.700 |tokens/s 8179.043 |walltime 4743.981 |
+Transformer | epoch 0 | step 17910 |avg loss 7.866 |avg tokens 2174.200 |tokens/s 8298.860 |walltime 4746.600 |
+Transformer | epoch 0 | step 17920 |avg loss 7.642 |avg tokens 2241.600 |tokens/s 8127.059 |walltime 4749.359 |
+Transformer | epoch 0 | step 17930 |avg loss 7.916 |avg tokens 2056.800 |tokens/s 7991.766 |walltime 4751.932 |
+Transformer | epoch 0 | step 17940 |avg loss 7.736 |avg tokens 2170.700 |tokens/s 8003.622 |walltime 4754.644 |
+Transformer | epoch 0 | step 17950 |avg loss 7.847 |avg tokens 2030.700 |tokens/s 8032.089 |walltime 4757.173 |
+Transformer | epoch 0 | step 17960 |avg loss 8.017 |avg tokens 1998.300 |tokens/s 7860.149 |walltime 4759.715 |
+Transformer | epoch 0 | step 17970 |avg loss 7.666 |avg tokens 2327.000 |tokens/s 8660.751 |walltime 4762.402 |
+Transformer | epoch 0 | step 17980 |avg loss 7.632 |avg tokens 2292.600 |tokens/s 8599.894 |walltime 4765.068 |
+Transformer | epoch 0 | step 17990 |avg loss 8.145 |avg tokens 2212.400 |tokens/s 8579.497 |walltime 4767.646 |
+Transformer | epoch 0 | step 18000 |avg loss 7.697 |avg tokens 2146.700 |tokens/s 8144.575 |walltime 4770.282 |
+Transformer | epoch 0 | step 18010 |avg loss 7.903 |avg tokens 2257.700 |tokens/s 8776.511 |walltime 4772.855 |
+Transformer | epoch 0 | step 18020 |avg loss 7.825 |avg tokens 2233.300 |tokens/s 8598.709 |walltime 4775.452 |
+Transformer | epoch 0 | step 18030 |avg loss 7.661 |avg tokens 2124.800 |tokens/s 7900.878 |walltime 4778.141 |
+Transformer | epoch 0 | step 18040 |avg loss 7.986 |avg tokens 1805.200 |tokens/s 7478.921 |walltime 4780.555 |
+Transformer | epoch 0 | step 18050 |avg loss 8.142 |avg tokens 2214.900 |tokens/s 8448.046 |walltime 4783.177 |
+Transformer | epoch 0 | step 18060 |avg loss 7.617 |avg tokens 2319.000 |tokens/s 8406.629 |walltime 4785.935 |
+Transformer | epoch 0 | step 18070 |avg loss 7.496 |avg tokens 2379.200 |tokens/s 8575.806 |walltime 4788.710 |
+Transformer | epoch 0 | step 18080 |avg loss 7.999 |avg tokens 2170.000 |tokens/s 8279.252 |walltime 4791.331 |
+Transformer | epoch 0 | step 18090 |avg loss 8.183 |avg tokens 2121.400 |tokens/s 8458.795 |walltime 4793.838 |
+Transformer | epoch 0 | step 18100 |avg loss 7.994 |avg tokens 2307.800 |tokens/s 8734.792 |walltime 4796.481 |
+Transformer | epoch 0 | step 18110 |avg loss 7.309 |avg tokens 2157.500 |tokens/s 8036.543 |walltime 4799.165 |
+Transformer | epoch 0 | step 18120 |avg loss 8.013 |avg tokens 2203.500 |tokens/s 8311.564 |walltime 4801.816 |
+Transformer | epoch 0 | step 18130 |avg loss 8.301 |avg tokens 2141.200 |tokens/s 8768.308 |walltime 4804.258 |
+Transformer | epoch 0 | step 18140 |avg loss 7.756 |avg tokens 2030.900 |tokens/s 7866.476 |walltime 4806.840 |
+Transformer | epoch 0 | step 18150 |avg loss 7.923 |avg tokens 2326.000 |tokens/s 8331.725 |walltime 4809.632 |
+Transformer | epoch 0 | step 18160 |avg loss 7.526 |avg tokens 2442.400 |tokens/s 8870.232 |walltime 4812.385 |
+Transformer | epoch 0 | step 18170 |avg loss 7.822 |avg tokens 2369.400 |tokens/s 8534.781 |walltime 4815.161 |
+Transformer | epoch 0 | step 18180 |avg loss 7.851 |avg tokens 2348.800 |tokens/s 8694.412 |walltime 4817.863 |
+Transformer | epoch 0 | step 18190 |avg loss 7.946 |avg tokens 1978.700 |tokens/s 7535.212 |walltime 4820.489 |
+Transformer | epoch 0 | step 18200 |avg loss 7.724 |avg tokens 2344.400 |tokens/s 8625.623 |walltime 4823.207 |
+Transformer | epoch 0 | step 18210 |avg loss 7.897 |avg tokens 1971.800 |tokens/s 7724.374 |walltime 4825.760 |
+Transformer | epoch 0 | step 18220 |avg loss 8.012 |avg tokens 2197.500 |tokens/s 8486.845 |walltime 4828.349 |
+Transformer | epoch 0 | step 18230 |avg loss 7.940 |avg tokens 1982.300 |tokens/s 7844.663 |walltime 4830.876 |
+Transformer | epoch 0 | step 18240 |avg loss 7.758 |avg tokens 1964.800 |tokens/s 7831.936 |walltime 4833.384 |
+Transformer | epoch 0 | step 18250 |avg loss 7.498 |avg tokens 2280.300 |tokens/s 8202.498 |walltime 4836.164 |
+Transformer | epoch 0 | step 18260 |avg loss 7.343 |avg tokens 2311.500 |tokens/s 8393.812 |walltime 4838.918 |
+Transformer | epoch 0 | step 18270 |avg loss 7.944 |avg tokens 2192.600 |tokens/s 8346.474 |walltime 4841.545 |
+Transformer | epoch 0 | step 18280 |avg loss 7.816 |avg tokens 2294.100 |tokens/s 8538.608 |walltime 4844.232 |
+Transformer | epoch 0 | step 18290 |avg loss 8.016 |avg tokens 2025.000 |tokens/s 7936.350 |walltime 4846.784 |
+Transformer | epoch 0 | step 18300 |avg loss 7.083 |avg tokens 2337.600 |tokens/s 8354.722 |walltime 4849.582 |
+Transformer | epoch 0 | step 18310 |avg loss 8.118 |avg tokens 2240.400 |tokens/s 8338.072 |walltime 4852.268 |
+Transformer | epoch 0 | step 18320 |avg loss 7.775 |avg tokens 2214.200 |tokens/s 8244.620 |walltime 4854.954 |
+Transformer | epoch 0 | step 18330 |avg loss 8.106 |avg tokens 1860.800 |tokens/s 7604.928 |walltime 4857.401 |
+Transformer | epoch 0 | step 18340 |avg loss 7.799 |avg tokens 2186.400 |tokens/s 8494.187 |walltime 4859.975 |
+Transformer | epoch 0 | step 18350 |avg loss 7.895 |avg tokens 2220.500 |tokens/s 8432.055 |walltime 4862.608 |
+Transformer | epoch 0 | step 18360 |avg loss 7.390 |avg tokens 2336.800 |tokens/s 8428.261 |walltime 4865.381 |
+Transformer | epoch 0 | step 18370 |avg loss 7.603 |avg tokens 2260.000 |tokens/s 8201.717 |walltime 4868.136 |
+Transformer | epoch 0 | step 18380 |avg loss 7.740 |avg tokens 2275.200 |tokens/s 8484.929 |walltime 4870.818 |
+Transformer | epoch 0 | step 18390 |avg loss 7.703 |avg tokens 2332.500 |tokens/s 8727.694 |walltime 4873.490 |
+Transformer | epoch 0 | step 18400 |avg loss 7.675 |avg tokens 2315.200 |tokens/s 8404.787 |walltime 4876.245 |
+Transformer | epoch 0 | step 18410 |avg loss 8.051 |avg tokens 2098.100 |tokens/s 8405.552 |walltime 4878.741 |
+Transformer | epoch 0 | step 18420 |avg loss 7.691 |avg tokens 2224.800 |tokens/s 8373.050 |walltime 4881.398 |
+Transformer | epoch 0 | step 18430 |avg loss 7.925 |avg tokens 2183.200 |tokens/s 8226.305 |walltime 4884.052 |
+Transformer | epoch 0 | step 18440 |avg loss 7.651 |avg tokens 2315.500 |tokens/s 8727.020 |walltime 4886.705 |
+Transformer | epoch 0 | step 18450 |avg loss 7.806 |avg tokens 2206.500 |tokens/s 8063.414 |walltime 4889.442 |
+Transformer | epoch 0 | step 18460 |avg loss 7.904 |avg tokens 2343.500 |tokens/s 8934.129 |walltime 4892.065 |
+Transformer | epoch 0 | step 18470 |avg loss 7.927 |avg tokens 2116.700 |tokens/s 8344.165 |walltime 4894.602 |
+Transformer | epoch 0 | step 18480 |avg loss 7.861 |avg tokens 2194.700 |tokens/s 8412.557 |walltime 4897.211 |
+Transformer | epoch 0 | step 18490 |avg loss 8.075 |avg tokens 1772.800 |tokens/s 7138.854 |walltime 4899.694 |
+Transformer | epoch 0 | step 18500 |avg loss 7.952 |avg tokens 2307.500 |tokens/s 8820.548 |walltime 4902.310 |
+Transformer | epoch 0 | step 18510 |avg loss 7.908 |avg tokens 2206.000 |tokens/s 8147.552 |walltime 4905.017 |
+Transformer | epoch 0 | step 18520 |avg loss 7.518 |avg tokens 2186.400 |tokens/s 7988.657 |walltime 4907.754 |
+Transformer | epoch 0 | step 18530 |avg loss 7.776 |avg tokens 2054.500 |tokens/s 7808.164 |walltime 4910.386 |
+Transformer | epoch 0 | step 18540 |avg loss 7.846 |avg tokens 2356.600 |tokens/s 8579.300 |walltime 4913.132 |
+Transformer | epoch 0 | step 18550 |avg loss 7.540 |avg tokens 2241.100 |tokens/s 8286.937 |walltime 4915.837 |
+Transformer | epoch 0 | step 18560 |avg loss 7.793 |avg tokens 2010.100 |tokens/s 7794.601 |walltime 4918.416 |
+Transformer | epoch 0 | step 18570 |avg loss 7.762 |avg tokens 2109.300 |tokens/s 8116.278 |walltime 4921.015 |
+Transformer | epoch 0 | step 18580 |avg loss 8.165 |avg tokens 2355.400 |tokens/s 8805.129 |walltime 4923.690 |
+Transformer | epoch 0 | step 18590 |avg loss 8.198 |avg tokens 2095.300 |tokens/s 8396.214 |walltime 4926.185 |
+Transformer | epoch 0 | step 18600 |avg loss 7.760 |avg tokens 2405.800 |tokens/s 9097.589 |walltime 4928.830 |
+Transformer | epoch 0 | step 18610 |avg loss 7.679 |avg tokens 2091.000 |tokens/s 8013.555 |walltime 4931.439 |
+Transformer | epoch 0 | step 18620 |avg loss 8.071 |avg tokens 2130.700 |tokens/s 8248.510 |walltime 4934.022 |
+Transformer | epoch 0 | step 18630 |avg loss 7.962 |avg tokens 2119.100 |tokens/s 8309.379 |walltime 4936.572 |
+Transformer | epoch 0 | step 18640 |avg loss 7.779 |avg tokens 2239.500 |tokens/s 8410.933 |walltime 4939.235 |
+Transformer | epoch 0 | step 18650 |avg loss 7.612 |avg tokens 2200.200 |tokens/s 8240.880 |walltime 4941.905 |
+Transformer | epoch 0 | step 18660 |avg loss 7.615 |avg tokens 2144.300 |tokens/s 8122.340 |walltime 4944.545 |
+Transformer | epoch 0 | step 18670 |avg loss 7.770 |avg tokens 2138.300 |tokens/s 8200.792 |walltime 4947.152 |
+Transformer | epoch 0 | step 18680 |avg loss 7.870 |avg tokens 2253.700 |tokens/s 8684.688 |walltime 4949.747 |
+Transformer | epoch 0 | step 18690 |avg loss 8.089 |avg tokens 2264.500 |tokens/s 8970.046 |walltime 4952.272 |
+Transformer | epoch 0 | step 18700 |avg loss 7.718 |avg tokens 1863.200 |tokens/s 7523.682 |walltime 4954.748 |
+Transformer | epoch 0 | step 18710 |avg loss 7.699 |avg tokens 2300.200 |tokens/s 8477.068 |walltime 4957.462 |
+Transformer | epoch 0 | step 18720 |avg loss 7.896 |avg tokens 2132.800 |tokens/s 8310.515 |walltime 4960.028 |
+Transformer | epoch 0 | step 18730 |avg loss 8.155 |avg tokens 1757.900 |tokens/s 7282.629 |walltime 4962.442 |
+Transformer | epoch 0 | step 18740 |avg loss 8.111 |avg tokens 2015.500 |tokens/s 7843.069 |walltime 4965.012 |
+Transformer | epoch 0 | step 18750 |avg loss 7.870 |avg tokens 2199.900 |tokens/s 8331.170 |walltime 4967.652 |
+Transformer | epoch 0 | step 18760 |avg loss 7.859 |avg tokens 2351.000 |tokens/s 8665.518 |walltime 4970.365 |
+Transformer | epoch 0 | step 18770 |avg loss 7.483 |avg tokens 2216.900 |tokens/s 8388.091 |walltime 4973.008 |
+Transformer | epoch 0 | step 18780 |avg loss 7.804 |avg tokens 2110.300 |tokens/s 8155.598 |walltime 4975.596 |
+Transformer | epoch 0 | step 18790 |avg loss 7.953 |avg tokens 2164.000 |tokens/s 8318.919 |walltime 4978.197 |
+Transformer | epoch 0 | step 18800 |avg loss 8.158 |avg tokens 2131.500 |tokens/s 8487.354 |walltime 4980.708 |
+Transformer | epoch 0 | step 18810 |avg loss 7.807 |avg tokens 2215.000 |tokens/s 8073.229 |walltime 4983.452 |
+Transformer | epoch 0 | step 18820 |avg loss 7.287 |avg tokens 2315.200 |tokens/s 8353.710 |walltime 4986.224 |
+Transformer | epoch 0 | step 18830 |avg loss 7.597 |avg tokens 2345.300 |tokens/s 8615.664 |walltime 4988.946 |
+Transformer | epoch 0 | step 18840 |avg loss 7.781 |avg tokens 2125.300 |tokens/s 8104.594 |walltime 4991.568 |
+Transformer | epoch 0 | step 18850 |avg loss 7.901 |avg tokens 2445.500 |tokens/s 8951.760 |walltime 4994.300 |
+Transformer | epoch 0 | step 18860 |avg loss 7.911 |avg tokens 2175.200 |tokens/s 8226.703 |walltime 4996.944 |
+Transformer | epoch 0 | step 18870 |avg loss 7.766 |avg tokens 2218.000 |tokens/s 8487.223 |walltime 4999.557 |
+Transformer | epoch 0 | step 18880 |avg loss 7.809 |avg tokens 2061.600 |tokens/s 7836.516 |walltime 5002.188 |
+Transformer | epoch 0 | step 18890 |avg loss 7.841 |avg tokens 2317.300 |tokens/s 8451.924 |walltime 5004.930 |
+Transformer | epoch 0 | step 18900 |avg loss 8.124 |avg tokens 1983.800 |tokens/s 7869.665 |walltime 5007.451 |
+Transformer | epoch 0 | step 18910 |avg loss 7.687 |avg tokens 2162.300 |tokens/s 8047.848 |walltime 5010.137 |
+Transformer | epoch 0 | step 18920 |avg loss 8.283 |avg tokens 2224.800 |tokens/s 8480.292 |walltime 5012.761 |
+Transformer | epoch 0 | step 18930 |avg loss 7.880 |avg tokens 2103.200 |tokens/s 7914.759 |walltime 5015.418 |
+Transformer | epoch 0 | step 18940 |avg loss 8.450 |avg tokens 2325.400 |tokens/s 9349.315 |walltime 5017.906 |
+Transformer | epoch 0 | step 18950 |avg loss 8.049 |avg tokens 2265.900 |tokens/s 8644.081 |walltime 5020.527 |
+Transformer | epoch 0 | step 18960 |avg loss 7.904 |avg tokens 2126.600 |tokens/s 8515.179 |walltime 5023.024 |
+Transformer | epoch 0 | step 18970 |avg loss 7.786 |avg tokens 2214.400 |tokens/s 8300.175 |walltime 5025.692 |
+Transformer | epoch 0 | step 18980 |avg loss 8.195 |avg tokens 2184.700 |tokens/s 8503.636 |walltime 5028.261 |
+Transformer | epoch 0 | step 18990 |avg loss 7.922 |avg tokens 2338.000 |tokens/s 8620.608 |walltime 5030.973 |
+Transformer | epoch 0 | step 19000 |avg loss 7.689 |avg tokens 2244.400 |tokens/s 8121.492 |walltime 5033.737 |
+Transformer | epoch 0 | step 19010 |avg loss 7.521 |avg tokens 2168.500 |tokens/s 8147.126 |walltime 5036.399 |
+Transformer | epoch 0 | step 19020 |avg loss 7.833 |avg tokens 2255.000 |tokens/s 8389.666 |walltime 5039.086 |
+Transformer | epoch 0 | step 19030 |avg loss 8.108 |avg tokens 2038.700 |tokens/s 7990.656 |walltime 5041.638 |
+Transformer | epoch 0 | step 19040 |avg loss 7.981 |avg tokens 1983.800 |tokens/s 7721.985 |walltime 5044.207 |
+Transformer | epoch 0 | step 19050 |avg loss 7.885 |avg tokens 2110.300 |tokens/s 7990.973 |walltime 5046.848 |
+Transformer | epoch 0 | step 19060 |avg loss 7.797 |avg tokens 2214.400 |tokens/s 8360.275 |walltime 5049.496 |
+Transformer | epoch 0 | step 19070 |avg loss 8.129 |avg tokens 1935.600 |tokens/s 7722.584 |walltime 5052.003 |
+Transformer | epoch 0 | step 19080 |avg loss 7.672 |avg tokens 2097.600 |tokens/s 7854.715 |walltime 5054.673 |
+Transformer | epoch 0 | step 19090 |avg loss 8.146 |avg tokens 2122.100 |tokens/s 8359.953 |walltime 5057.212 |
+Transformer | epoch 0 | step 19100 |avg loss 7.502 |avg tokens 2340.000 |tokens/s 8360.847 |walltime 5060.011 |
+Transformer | epoch 0 | step 19110 |avg loss 7.763 |avg tokens 2232.000 |tokens/s 8321.186 |walltime 5062.693 |
+Transformer | epoch 0 | step 19120 |avg loss 7.826 |avg tokens 2198.300 |tokens/s 8453.442 |walltime 5065.293 |
+Transformer | epoch 0 | step 19130 |avg loss 7.775 |avg tokens 2097.600 |tokens/s 8102.086 |walltime 5067.882 |
+Transformer | epoch 0 | step 19140 |avg loss 7.700 |avg tokens 2265.300 |tokens/s 8359.241 |walltime 5070.592 |
+Transformer | epoch 0 | step 19150 |avg loss 7.575 |avg tokens 2304.000 |tokens/s 8550.363 |walltime 5073.287 |
+Transformer | epoch 0 | step 19160 |avg loss 7.361 |avg tokens 2286.400 |tokens/s 8350.117 |walltime 5076.025 |
+Transformer | epoch 0 | step 19170 |avg loss 8.165 |avg tokens 2069.200 |tokens/s 7919.039 |walltime 5078.638 |
+Transformer | epoch 0 | step 19180 |avg loss 7.971 |avg tokens 2306.900 |tokens/s 8790.607 |walltime 5081.262 |
+Transformer | epoch 0 | step 19190 |avg loss 7.957 |avg tokens 1958.100 |tokens/s 7989.014 |walltime 5083.713 |
+Transformer | epoch 0 | step 19200 |avg loss 8.004 |avg tokens 1984.200 |tokens/s 7825.780 |walltime 5086.249 |
+Transformer | epoch 0 | step 19210 |avg loss 8.163 |avg tokens 1851.900 |tokens/s 7412.618 |walltime 5088.747 |
+Transformer | epoch 0 | step 19220 |avg loss 8.054 |avg tokens 1975.400 |tokens/s 8115.626 |walltime 5091.181 |
+Transformer | epoch 0 | step 19230 |avg loss 7.599 |avg tokens 2149.700 |tokens/s 8022.767 |walltime 5093.861 |
+Transformer | epoch 0 | step 19240 |avg loss 7.799 |avg tokens 2325.400 |tokens/s 8334.517 |walltime 5096.651 |
+Transformer | epoch 0 | step 19250 |avg loss 7.602 |avg tokens 2273.600 |tokens/s 8289.622 |walltime 5099.393 |
+Transformer | epoch 0 | step 19260 |avg loss 7.720 |avg tokens 2216.000 |tokens/s 8210.565 |walltime 5102.092 |
+Transformer | epoch 0 | step 19270 |avg loss 7.884 |avg tokens 2333.700 |tokens/s 8532.532 |walltime 5104.827 |
+Transformer | epoch 0 | step 19280 |avg loss 7.733 |avg tokens 2262.600 |tokens/s 8320.402 |walltime 5107.547 |
+Transformer | epoch 0 | step 19290 |avg loss 7.577 |avg tokens 2174.800 |tokens/s 8159.327 |walltime 5110.212 |
+Transformer | epoch 0 | step 19300 |avg loss 7.776 |avg tokens 2032.800 |tokens/s 7832.923 |walltime 5112.807 |
+Transformer | epoch 0 | step 19310 |avg loss 8.230 |avg tokens 2209.300 |tokens/s 8759.115 |walltime 5115.330 |
+Transformer | epoch 0 | step 19320 |avg loss 7.666 |avg tokens 2186.400 |tokens/s 8178.540 |walltime 5118.003 |
+Transformer | epoch 0 | step 19330 |avg loss 7.894 |avg tokens 2412.700 |tokens/s 8701.228 |walltime 5120.776 |
+Transformer | epoch 0 | step 19340 |avg loss 8.168 |avg tokens 2043.200 |tokens/s 8143.003 |walltime 5123.285 |
+Transformer | epoch 0 | step 19350 |avg loss 7.850 |avg tokens 2003.900 |tokens/s 7896.507 |walltime 5125.823 |
+Transformer | epoch 0 | step 19360 |avg loss 7.602 |avg tokens 2171.900 |tokens/s 8106.804 |walltime 5128.502 |
+Transformer | epoch 0 | step 19370 |avg loss 7.897 |avg tokens 2212.400 |tokens/s 8537.505 |walltime 5131.093 |
+Transformer | epoch 0 | step 19380 |avg loss 7.724 |avg tokens 2168.000 |tokens/s 7982.262 |walltime 5133.809 |
+Transformer | epoch 0 | step 19390 |avg loss 8.036 |avg tokens 2061.100 |tokens/s 8143.975 |walltime 5136.340 |
+Transformer | epoch 0 | step 19400 |avg loss 7.840 |avg tokens 2383.400 |tokens/s 8637.862 |walltime 5139.099 |
+Transformer | epoch 0 | step 19410 |avg loss 7.895 |avg tokens 2203.200 |tokens/s 8267.944 |walltime 5141.764 |
+Transformer | epoch 0 | step 19420 |avg loss 7.836 |avg tokens 2311.200 |tokens/s 8763.315 |walltime 5144.401 |
+Transformer | epoch 0 | step 19430 |avg loss 8.370 |avg tokens 2005.800 |tokens/s 8369.666 |walltime 5146.798 |
+Transformer | epoch 0 | step 19440 |avg loss 7.707 |avg tokens 2300.000 |tokens/s 8399.130 |walltime 5149.536 |
+Transformer | epoch 0 | step 19450 |avg loss 7.796 |avg tokens 1876.300 |tokens/s 7684.904 |walltime 5151.978 |
+Transformer | epoch 0 | step 19460 |avg loss 8.128 |avg tokens 2057.000 |tokens/s 8142.015 |walltime 5154.504 |
+Transformer | epoch 0 | step 19470 |avg loss 7.849 |avg tokens 2311.200 |tokens/s 8563.757 |walltime 5157.203 |
+Transformer | epoch 0 | step 19480 |avg loss 7.672 |avg tokens 2238.700 |tokens/s 8337.312 |walltime 5159.888 |
+Transformer | epoch 0 | step 19490 |avg loss 7.504 |avg tokens 2396.000 |tokens/s 8804.016 |walltime 5162.610 |
+Transformer | epoch 0 | step 19500 |avg loss 8.111 |avg tokens 2201.500 |tokens/s 8500.220 |walltime 5165.200 |
+Transformer | epoch 0 | step 19510 |avg loss 8.089 |avg tokens 2003.800 |tokens/s 8089.462 |walltime 5167.677 |
+Transformer | epoch 0 | step 19520 |avg loss 7.906 |avg tokens 2276.000 |tokens/s 8348.131 |walltime 5170.403 |
+Transformer | epoch 0 | step 19530 |avg loss 7.508 |avg tokens 2245.600 |tokens/s 8148.636 |walltime 5173.159 |
+Transformer | epoch 0 | step 19540 |avg loss 7.829 |avg tokens 2253.600 |tokens/s 8443.922 |walltime 5175.828 |
+Transformer | epoch 0 | step 19550 |avg loss 7.705 |avg tokens 2172.800 |tokens/s 8230.637 |walltime 5178.468 |
+Transformer | epoch 0 | step 19560 |avg loss 7.831 |avg tokens 2238.400 |tokens/s 8482.798 |walltime 5181.106 |
+Transformer | epoch 0 | step 19570 |avg loss 7.585 |avg tokens 2372.000 |tokens/s 8628.913 |walltime 5183.855 |
+Transformer | epoch 0 | step 19580 |avg loss 7.905 |avg tokens 2192.800 |tokens/s 7998.761 |walltime 5186.597 |
+Transformer | epoch 0 | step 19590 |avg loss 7.773 |avg tokens 2308.000 |tokens/s 8383.553 |walltime 5189.350 |
+Transformer | epoch 0 | step 19600 |avg loss 7.576 |avg tokens 2216.300 |tokens/s 8134.749 |walltime 5192.074 |
+Transformer | epoch 0 | step 19610 |avg loss 7.670 |avg tokens 2327.200 |tokens/s 8827.312 |walltime 5194.711 |
+Transformer | epoch 0 | step 19620 |avg loss 7.516 |avg tokens 2173.600 |tokens/s 8049.087 |walltime 5197.411 |
+Transformer | epoch 0 | step 19630 |avg loss 7.984 |avg tokens 1958.400 |tokens/s 7589.259 |walltime 5199.992 |
+Transformer | epoch 0 | step 19640 |avg loss 7.675 |avg tokens 2354.400 |tokens/s 8617.861 |walltime 5202.724 |
+Transformer | epoch 0 | step 19650 |avg loss 8.098 |avg tokens 1848.900 |tokens/s 7492.031 |walltime 5205.191 |
+Transformer | epoch 0 | step 19660 |avg loss 7.600 |avg tokens 2048.100 |tokens/s 7835.995 |walltime 5207.805 |
+Transformer | epoch 0 | step 19670 |avg loss 8.069 |avg tokens 2051.500 |tokens/s 7924.278 |walltime 5210.394 |
+Transformer | epoch 0 | step 19680 |avg loss 7.834 |avg tokens 2202.900 |tokens/s 8332.120 |walltime 5213.038 |
+Transformer | epoch 0 | step 19690 |avg loss 7.931 |avg tokens 1992.700 |tokens/s 7756.854 |walltime 5215.607 |
+Transformer | epoch 0 | step 19700 |avg loss 7.711 |avg tokens 2292.700 |tokens/s 8571.274 |walltime 5218.282 |
+Transformer | epoch 0 | step 19710 |avg loss 7.855 |avg tokens 2177.600 |tokens/s 8194.325 |walltime 5220.939 |
+Transformer | epoch 0 | step 19720 |avg loss 8.080 |avg tokens 2186.100 |tokens/s 8268.905 |walltime 5223.583 |
+Transformer | epoch 0 | step 19730 |avg loss 8.091 |avg tokens 2064.100 |tokens/s 8167.073 |walltime 5226.110 |
+Transformer | epoch 0 | step 19740 |avg loss 7.631 |avg tokens 2184.600 |tokens/s 8077.833 |walltime 5228.815 |
+Transformer | epoch 0 | step 19750 |avg loss 7.786 |avg tokens 2054.400 |tokens/s 8074.680 |walltime 5231.359 |
+Transformer | epoch 0 | step 19760 |avg loss 7.601 |avg tokens 2303.200 |tokens/s 8285.624 |walltime 5234.139 |
+Transformer | epoch 0 | step 19770 |avg loss 8.326 |avg tokens 2110.700 |tokens/s 8495.812 |walltime 5236.623 |
+Transformer | epoch 0 | step 19780 |avg loss 7.554 |avg tokens 2130.000 |tokens/s 8052.331 |walltime 5239.268 | +Transformer | epoch 0 | step 19790 |avg loss 8.096 |avg tokens 2100.900 |tokens/s 8161.127 |walltime 5241.843 | +Transformer | epoch 0 | step 19800 |avg loss 7.439 |avg tokens 2246.400 |tokens/s 8389.274 |walltime 5244.520 | +Transformer | epoch 0 | step 19810 |avg loss 8.084 |avg tokens 2102.900 |tokens/s 8303.717 |walltime 5247.053 | +Transformer | epoch 0 | step 19820 |avg loss 7.897 |avg tokens 2208.800 |tokens/s 8284.129 |walltime 5249.719 | +Transformer | epoch 0 | step 19830 |avg loss 7.888 |avg tokens 2067.000 |tokens/s 8167.744 |walltime 5252.250 | +Transformer | epoch 0 | step 19840 |avg loss 8.132 |avg tokens 2020.000 |tokens/s 8059.342 |walltime 5254.756 | +Transformer | epoch 0 | step 19850 |avg loss 8.053 |avg tokens 2246.200 |tokens/s 8671.832 |walltime 5257.346 | +Transformer | epoch 0 | step 19860 |avg loss 7.796 |avg tokens 2324.000 |tokens/s 8753.793 |walltime 5260.001 | +Transformer | epoch 0 | step 19870 |avg loss 8.161 |avg tokens 2090.200 |tokens/s 8693.503 |walltime 5262.406 | +Transformer | epoch 0 | step 19880 |avg loss 8.024 |avg tokens 2054.000 |tokens/s 8066.115 |walltime 5264.952 | +Transformer | epoch 0 | step 19890 |avg loss 7.924 |avg tokens 2028.000 |tokens/s 7982.810 |walltime 5267.493 | +Transformer | epoch 0 | step 19900 |avg loss 7.673 |avg tokens 2087.000 |tokens/s 8110.670 |walltime 5270.066 | +Transformer | epoch 0 | step 19910 |avg loss 7.847 |avg tokens 2208.800 |tokens/s 8301.719 |walltime 5272.726 | +Transformer | epoch 0 | step 19920 |avg loss 8.046 |avg tokens 2128.800 |tokens/s 8317.046 |walltime 5275.286 | +Transformer | epoch 0 | step 19930 |avg loss 7.507 |avg tokens 2389.600 |tokens/s 8401.883 |walltime 5278.130 | +Transformer | epoch 0 | step 19940 |avg loss 8.072 |avg tokens 2074.600 |tokens/s 7984.172 |walltime 5280.728 | +Transformer | epoch 0 | step 19950 |avg loss 7.600 |avg tokens 2096.200 
|tokens/s 7837.939 |walltime 5283.403 | +Transformer | epoch 0 | step 19960 |avg loss 7.944 |avg tokens 2060.400 |tokens/s 8019.697 |walltime 5285.972 | +Transformer | epoch 0 | step 19970 |avg loss 7.688 |avg tokens 2189.300 |tokens/s 8261.370 |walltime 5288.622 | +Transformer | epoch 0 | step 19980 |avg loss 8.235 |avg tokens 1950.300 |tokens/s 7595.531 |walltime 5291.190 | +Transformer | epoch 0 | step 19990 |avg loss 7.629 |avg tokens 2389.900 |tokens/s 8833.316 |walltime 5293.895 | +Transformer | epoch 0 | step 20000 |avg loss 7.795 |avg tokens 2292.000 |tokens/s 8629.114 |walltime 5296.552 | +Transformer | epoch 0 | step 20010 |avg loss 8.042 |avg tokens 2209.500 |tokens/s 8431.420 |walltime 5299.172 | +Transformer | epoch 0 | step 20020 |avg loss 7.746 |avg tokens 2021.100 |tokens/s 7812.776 |walltime 5301.759 | +Transformer | epoch 0 | step 20030 |avg loss 7.563 |avg tokens 2359.400 |tokens/s 8404.238 |walltime 5304.566 | +Transformer | epoch 0 | step 20040 |avg loss 7.859 |avg tokens 1972.100 |tokens/s 7816.510 |walltime 5307.089 | +Transformer | epoch 0 | step 20050 |avg loss 7.819 |avg tokens 2090.800 |tokens/s 7876.145 |walltime 5309.744 | +Transformer | epoch 0 | step 20060 |avg loss 8.077 |avg tokens 2234.000 |tokens/s 8743.343 |walltime 5312.299 | +Transformer | epoch 0 | step 20070 |avg loss 8.379 |avg tokens 1933.200 |tokens/s 8135.024 |walltime 5314.675 | +Transformer | epoch 0 | step 20080 |avg loss 7.729 |avg tokens 2383.200 |tokens/s 8863.919 |walltime 5317.364 | +Transformer | epoch 0 | step 20090 |avg loss 7.632 |avg tokens 2290.100 |tokens/s 8243.958 |walltime 5320.142 | +Transformer | epoch 0 | step 20100 |avg loss 7.792 |avg tokens 2000.200 |tokens/s 7770.727 |walltime 5322.716 | +Transformer | epoch 0 | step 20110 |avg loss 7.830 |avg tokens 2173.700 |tokens/s 8377.842 |walltime 5325.311 | +Transformer | epoch 0 | step 20120 |avg loss 7.667 |avg tokens 2175.900 |tokens/s 7994.920 |walltime 5328.032 | +Transformer | epoch 0 | step 20130 
|avg loss 7.953 |avg tokens 2278.100 |tokens/s 8509.878 |walltime 5330.709 | +Transformer | epoch 0 | step 20140 |avg loss 7.702 |avg tokens 2282.800 |tokens/s 8271.751 |walltime 5333.469 | +Transformer | epoch 0 | step 20150 |avg loss 7.933 |avg tokens 2337.800 |tokens/s 8734.550 |walltime 5336.146 | +Transformer | epoch 0 | step 20160 |avg loss 8.047 |avg tokens 2001.200 |tokens/s 8181.292 |walltime 5338.592 | +Transformer | epoch 0 | step 20170 |avg loss 8.004 |avg tokens 2117.600 |tokens/s 8318.061 |walltime 5341.137 | +Transformer | epoch 0 | step 20180 |avg loss 7.921 |avg tokens 2080.500 |tokens/s 8120.330 |walltime 5343.699 | +Transformer | epoch 0 | step 20190 |avg loss 8.066 |avg tokens 2122.400 |tokens/s 8301.469 |walltime 5346.256 | +Transformer | epoch 0 | step 20200 |avg loss 7.144 |avg tokens 2376.000 |tokens/s 8456.563 |walltime 5349.066 | +Transformer | epoch 0 | step 20210 |avg loss 8.064 |avg tokens 1824.600 |tokens/s 7453.005 |walltime 5351.514 | +Transformer | epoch 0 | step 20220 |avg loss 7.597 |avg tokens 2258.400 |tokens/s 8468.907 |walltime 5354.181 | +Transformer | epoch 0 | step 20230 |avg loss 8.111 |avg tokens 2112.700 |tokens/s 8222.518 |walltime 5356.750 | +Transformer | epoch 0 | step 20240 |avg loss 8.125 |avg tokens 2241.400 |tokens/s 8947.645 |walltime 5359.255 | +Transformer | epoch 0 | step 20250 |avg loss 7.880 |avg tokens 2187.800 |tokens/s 8298.836 |walltime 5361.891 | +Transformer | epoch 0 | step 20260 |avg loss 7.917 |avg tokens 2224.100 |tokens/s 8398.533 |walltime 5364.540 | +Transformer | epoch 0 | step 20270 |avg loss 8.037 |avg tokens 2104.800 |tokens/s 8180.780 |walltime 5367.112 | +Transformer | epoch 0 | step 20280 |avg loss 7.756 |avg tokens 2379.200 |tokens/s 8858.198 |walltime 5369.798 | +Transformer | epoch 0 | step 20290 |avg loss 7.595 |avg tokens 2255.400 |tokens/s 8279.041 |walltime 5372.523 | +Transformer | epoch 0 | step 20300 |avg loss 7.742 |avg tokens 2117.700 |tokens/s 8012.391 |walltime 5375.166 | 
+Transformer | epoch 0 | step 20310 |avg loss 7.983 |avg tokens 1898.900 |tokens/s 7755.068 |walltime 5377.614 | +Transformer | epoch 0 | step 20320 |avg loss 7.811 |avg tokens 2409.700 |tokens/s 8728.966 |walltime 5380.375 | +Transformer | epoch 0 | step 20330 |avg loss 7.700 |avg tokens 2181.900 |tokens/s 8239.427 |walltime 5383.023 | +Transformer | epoch 0 | step 20340 |avg loss 7.906 |avg tokens 2043.000 |tokens/s 7954.996 |walltime 5385.591 | +Transformer | epoch 0 | step 20350 |avg loss 7.504 |avg tokens 2280.800 |tokens/s 8351.669 |walltime 5388.322 | +Transformer | epoch 0 | step 20360 |avg loss 7.520 |avg tokens 1946.000 |tokens/s 7641.874 |walltime 5390.869 | +Transformer | epoch 0 | step 20370 |avg loss 7.355 |avg tokens 2321.600 |tokens/s 8355.212 |walltime 5393.647 | +Transformer | epoch 0 | step 20380 |avg loss 7.371 |avg tokens 2179.900 |tokens/s 8176.430 |walltime 5396.313 | +Transformer | epoch 0 | step 20390 |avg loss 8.122 |avg tokens 2165.700 |tokens/s 8136.950 |walltime 5398.975 | +Transformer | epoch 0 | step 20400 |avg loss 7.998 |avg tokens 1925.400 |tokens/s 7589.790 |walltime 5401.512 | +Transformer | epoch 0 | step 20410 |avg loss 8.115 |avg tokens 2071.200 |tokens/s 8297.027 |walltime 5404.008 | +Transformer | epoch 0 | step 20420 |avg loss 7.596 |avg tokens 2139.500 |tokens/s 8036.726 |walltime 5406.670 | +Transformer | epoch 0 | step 20430 |avg loss 7.749 |avg tokens 2367.400 |tokens/s 8616.341 |walltime 5409.418 | +Transformer | epoch 0 | step 20440 |avg loss 7.822 |avg tokens 2137.100 |tokens/s 8075.371 |walltime 5412.064 | +Transformer | epoch 0 | step 20450 |avg loss 7.755 |avg tokens 2163.600 |tokens/s 8014.347 |walltime 5414.764 | +Transformer | epoch 0 | step 20460 |avg loss 8.033 |avg tokens 2099.800 |tokens/s 7877.667 |walltime 5417.429 | +Transformer | epoch 0 | step 20470 |avg loss 7.795 |avg tokens 2255.200 |tokens/s 8297.993 |walltime 5420.147 | +Transformer | epoch 0 | step 20480 |avg loss 7.545 |avg tokens 2128.000 
|tokens/s 8060.550 |walltime 5422.787 | +Transformer | epoch 0 | step 20490 |avg loss 7.957 |avg tokens 2078.100 |tokens/s 8269.828 |walltime 5425.300 | +Transformer | epoch 0 | step 20500 |avg loss 7.819 |avg tokens 2365.300 |tokens/s 8893.022 |walltime 5427.960 | +Transformer | epoch 0 | step 20510 |avg loss 7.714 |avg tokens 2339.200 |tokens/s 8982.037 |walltime 5430.564 | +Transformer | epoch 0 | step 20520 |avg loss 8.056 |avg tokens 2283.500 |tokens/s 8560.137 |walltime 5433.232 | +Transformer | epoch 0 | step 20530 |avg loss 7.794 |avg tokens 2255.500 |tokens/s 8424.169 |walltime 5435.909 | +Transformer | epoch 0 | step 20540 |avg loss 8.186 |avg tokens 1986.500 |tokens/s 8046.717 |walltime 5438.378 | +Transformer | epoch 0 | step 20550 |avg loss 7.831 |avg tokens 2188.800 |tokens/s 8110.828 |walltime 5441.076 | +Transformer | epoch 0 | step 20560 |avg loss 7.945 |avg tokens 2247.700 |tokens/s 8641.471 |walltime 5443.677 | +Transformer | epoch 0 | step 20570 |avg loss 7.836 |avg tokens 2151.500 |tokens/s 8172.211 |walltime 5446.310 | +Transformer | epoch 0 | step 20580 |avg loss 7.646 |avg tokens 2339.400 |tokens/s 8680.868 |walltime 5449.005 | +Transformer | epoch 0 | step 20590 |avg loss 8.163 |avg tokens 2314.800 |tokens/s 8543.237 |walltime 5451.715 | +Transformer | epoch 0 | step 20600 |avg loss 7.974 |avg tokens 2235.900 |tokens/s 8522.449 |walltime 5454.338 | +Transformer | epoch 0 | step 20610 |avg loss 7.913 |avg tokens 2173.600 |tokens/s 8356.912 |walltime 5456.939 | +Transformer | epoch 0 | step 20620 |avg loss 7.911 |avg tokens 2322.800 |tokens/s 8613.119 |walltime 5459.636 | +Transformer | epoch 0 | step 20630 |avg loss 8.427 |avg tokens 2006.700 |tokens/s 8317.215 |walltime 5462.049 | +Transformer | epoch 0 | step 20640 |avg loss 8.201 |avg tokens 2172.300 |tokens/s 8503.203 |walltime 5464.603 | +Transformer | epoch 0 | step 20650 |avg loss 7.660 |avg tokens 2171.500 |tokens/s 8234.109 |walltime 5467.241 | +Transformer | epoch 0 | step 20660 
|avg loss 8.075 |avg tokens 2229.000 |tokens/s 8633.716 |walltime 5469.822 | +Transformer | epoch 0 | step 20670 |avg loss 7.713 |avg tokens 2355.200 |tokens/s 8878.116 |walltime 5472.475 | +Transformer | epoch 0 | step 20680 |avg loss 7.985 |avg tokens 2034.600 |tokens/s 7842.535 |walltime 5475.069 | +Transformer | epoch 0 | step 20690 |avg loss 8.004 |avg tokens 1688.600 |tokens/s 7118.540 |walltime 5477.442 | +Transformer | epoch 0 | step 20700 |avg loss 7.833 |avg tokens 2202.100 |tokens/s 8313.044 |walltime 5480.090 | +Transformer | epoch 0 | step 20710 |avg loss 8.010 |avg tokens 2321.500 |tokens/s 8789.246 |walltime 5482.732 | +Transformer | epoch 0 | step 20720 |avg loss 7.957 |avg tokens 2023.600 |tokens/s 7978.985 |walltime 5485.268 | +Transformer | epoch 0 | step 20730 |avg loss 7.796 |avg tokens 2248.000 |tokens/s 8338.146 |walltime 5487.964 | +Transformer | epoch 0 | step 20740 |avg loss 7.797 |avg tokens 2340.800 |tokens/s 8711.669 |walltime 5490.651 | +Transformer | epoch 0 | step 20750 |avg loss 7.781 |avg tokens 2253.900 |tokens/s 8597.964 |walltime 5493.272 | +Transformer | epoch 0 | step 20760 |avg loss 7.838 |avg tokens 2283.800 |tokens/s 8587.622 |walltime 5495.932 | +Transformer | epoch 0 | step 20770 |avg loss 8.117 |avg tokens 2209.000 |tokens/s 8751.869 |walltime 5498.456 | +Transformer | epoch 0 | step 20780 |avg loss 7.798 |avg tokens 2281.600 |tokens/s 8318.379 |walltime 5501.199 | +Transformer | epoch 0 | step 20790 |avg loss 8.082 |avg tokens 2114.800 |tokens/s 7837.821 |walltime 5503.897 | +Transformer | epoch 0 | step 20800 |avg loss 7.873 |avg tokens 2274.600 |tokens/s 8642.931 |walltime 5506.529 | +Transformer | epoch 0 | step 20810 |avg loss 7.949 |avg tokens 2048.200 |tokens/s 8140.298 |walltime 5509.045 | +Transformer | epoch 0 | step 20820 |avg loss 7.553 |avg tokens 2356.800 |tokens/s 8520.134 |walltime 5511.811 | +Transformer | epoch 0 | step 20830 |avg loss 7.831 |avg tokens 2017.800 |tokens/s 7718.806 |walltime 5514.425 | 
+Transformer | epoch 0 | step 20840 |avg loss 7.706 |avg tokens 2258.400 |tokens/s 8652.145 |walltime 5517.035 | +Transformer | epoch 0 | step 20850 |avg loss 8.141 |avg tokens 2111.800 |tokens/s 8198.198 |walltime 5519.611 | +Transformer | epoch 0 | step 20860 |avg loss 8.164 |avg tokens 1882.800 |tokens/s 7602.013 |walltime 5522.088 | +Transformer | epoch 0 | step 20870 |avg loss 7.770 |avg tokens 2281.600 |tokens/s 8284.756 |walltime 5524.842 | +Transformer | epoch 0 | step 20880 |avg loss 7.800 |avg tokens 2251.600 |tokens/s 8490.244 |walltime 5527.494 | +Transformer | epoch 0 | step 20890 |avg loss 8.147 |avg tokens 2204.900 |tokens/s 8393.753 |walltime 5530.121 | +Transformer | epoch 0 | step 20900 |avg loss 7.675 |avg tokens 2316.800 |tokens/s 8375.627 |walltime 5532.887 | +Transformer | epoch 0 | step 20910 |avg loss 8.039 |avg tokens 2299.300 |tokens/s 9087.190 |walltime 5535.417 | +Transformer | epoch 0 | step 20920 |avg loss 7.756 |avg tokens 2176.900 |tokens/s 8216.640 |walltime 5538.067 | +Transformer | epoch 0 | step 20930 |avg loss 7.903 |avg tokens 2175.300 |tokens/s 8546.292 |walltime 5540.612 | +Transformer | epoch 0 | step 20940 |avg loss 7.841 |avg tokens 2058.400 |tokens/s 7907.542 |walltime 5543.215 | +Transformer | epoch 0 | step 20950 |avg loss 7.914 |avg tokens 2283.200 |tokens/s 8554.460 |walltime 5545.884 | +Transformer | epoch 0 | step 20960 |avg loss 7.667 |avg tokens 2290.200 |tokens/s 8275.993 |walltime 5548.651 | +Transformer | epoch 0 | step 20970 |avg loss 7.726 |avg tokens 2086.500 |tokens/s 7868.671 |walltime 5551.303 | +Transformer | epoch 0 | step 20980 |avg loss 7.694 |avg tokens 2175.200 |tokens/s 7763.462 |walltime 5554.105 | +Transformer | epoch 0 | step 20990 |avg loss 7.887 |avg tokens 2178.100 |tokens/s 8325.649 |walltime 5556.721 | +Transformer | epoch 0 | step 21000 |avg loss 8.073 |avg tokens 2079.400 |tokens/s 8456.035 |walltime 5559.180 | +Transformer | epoch 0 | step 21010 |avg loss 7.507 |avg tokens 2152.900 
|tokens/s 7964.728 |walltime 5561.883 | +Transformer | epoch 0 | step 21020 |avg loss 8.015 |avg tokens 2336.400 |tokens/s 8622.246 |walltime 5564.593 | +Transformer | epoch 0 | step 21030 |avg loss 7.973 |avg tokens 2083.900 |tokens/s 8045.706 |walltime 5567.183 | +Transformer | epoch 0 | step 21040 |avg loss 8.181 |avg tokens 2116.200 |tokens/s 8225.971 |walltime 5569.755 | +Transformer | epoch 0 | step 21050 |avg loss 8.122 |avg tokens 2197.600 |tokens/s 8644.199 |walltime 5572.298 | +Transformer | epoch 0 | step 21060 |avg loss 7.850 |avg tokens 2155.300 |tokens/s 8242.962 |walltime 5574.912 | +Transformer | epoch 0 | step 21070 |avg loss 7.809 |avg tokens 2200.800 |tokens/s 8425.404 |walltime 5577.525 | +Transformer | epoch 0 | step 21080 |avg loss 7.924 |avg tokens 2340.000 |tokens/s 8711.833 |walltime 5580.211 | +Transformer | epoch 0 | step 21090 |avg loss 7.930 |avg tokens 2277.000 |tokens/s 8399.418 |walltime 5582.921 | +Transformer | epoch 0 | step 21100 |avg loss 7.748 |avg tokens 2143.200 |tokens/s 8127.695 |walltime 5585.558 | +Transformer | epoch 0 | step 21110 |avg loss 7.714 |avg tokens 2166.400 |tokens/s 8156.658 |walltime 5588.214 | +Transformer | epoch 0 | step 21120 |avg loss 7.725 |avg tokens 2104.700 |tokens/s 8040.049 |walltime 5590.832 | +Transformer | epoch 0 | step 21130 |avg loss 7.685 |avg tokens 2091.200 |tokens/s 8239.855 |walltime 5593.370 | +Transformer | epoch 0 | step 21140 |avg loss 8.119 |avg tokens 2323.000 |tokens/s 8669.158 |walltime 5596.050 | +Transformer | epoch 0 | step 21150 |avg loss 7.947 |avg tokens 2253.600 |tokens/s 8344.457 |walltime 5598.750 | +Transformer | epoch 0 | step 21160 |avg loss 8.186 |avg tokens 2092.400 |tokens/s 8376.991 |walltime 5601.248 | +Transformer | epoch 0 | step 21170 |avg loss 7.822 |avg tokens 2223.000 |tokens/s 8175.861 |walltime 5603.967 | +Transformer | epoch 0 | step 21180 |avg loss 7.662 |avg tokens 2240.800 |tokens/s 8232.364 |walltime 5606.689 | +Transformer | epoch 0 | step 21190 
|avg loss 8.103 |avg tokens 2075.700 |tokens/s 8578.519 |walltime 5609.109 | +Transformer | epoch 0 | step 21200 |avg loss 7.663 |avg tokens 2252.100 |tokens/s 8236.242 |walltime 5611.843 | +Transformer | epoch 0 | step 21210 |avg loss 7.564 |avg tokens 2120.400 |tokens/s 8059.707 |walltime 5614.474 | +Transformer | epoch 0 | step 21220 |avg loss 7.773 |avg tokens 2193.100 |tokens/s 8281.347 |walltime 5617.122 | +Transformer | epoch 0 | step 21230 |avg loss 7.615 |avg tokens 2429.600 |tokens/s 8918.900 |walltime 5619.846 | +Transformer | epoch 0 | step 21240 |avg loss 7.645 |avg tokens 2413.600 |tokens/s 8670.689 |walltime 5622.630 | +Transformer | epoch 0 | step 21250 |avg loss 7.875 |avg tokens 2288.000 |tokens/s 8382.633 |walltime 5625.359 | +Transformer | epoch 0 | step 21260 |avg loss 7.154 |avg tokens 2273.600 |tokens/s 8186.921 |walltime 5628.137 | +Transformer | epoch 0 | step 21270 |avg loss 7.880 |avg tokens 2128.300 |tokens/s 8174.100 |walltime 5630.740 | +Transformer | epoch 0 | step 21280 |avg loss 7.665 |avg tokens 2344.200 |tokens/s 8542.967 |walltime 5633.484 | +Transformer | epoch 0 | step 21290 |avg loss 7.884 |avg tokens 2255.000 |tokens/s 8453.646 |walltime 5636.152 | +Transformer | epoch 0 | step 21300 |avg loss 8.051 |avg tokens 2006.600 |tokens/s 8041.249 |walltime 5638.647 | +Transformer | epoch 0 | step 21310 |avg loss 7.687 |avg tokens 2302.400 |tokens/s 8245.016 |walltime 5641.440 | +Transformer | epoch 0 | step 21320 |avg loss 8.007 |avg tokens 2128.000 |tokens/s 8225.290 |walltime 5644.027 | +Transformer | epoch 0 | step 21330 |avg loss 7.792 |avg tokens 2358.900 |tokens/s 8649.074 |walltime 5646.754 | +Transformer | epoch 0 | step 21340 |avg loss 8.319 |avg tokens 2074.200 |tokens/s 8451.048 |walltime 5649.209 | +Transformer | epoch 0 | step 21350 |avg loss 8.173 |avg tokens 1998.600 |tokens/s 8155.601 |walltime 5651.659 | +Transformer | epoch 0 | step 21360 |avg loss 8.075 |avg tokens 2109.800 |tokens/s 8410.150 |walltime 5654.168 | 
+Transformer | epoch 0 | step 21370 |avg loss 7.680 |avg tokens 2183.200 |tokens/s 8124.783 |walltime 5656.855 | +Transformer | epoch 0 | step 21380 |avg loss 7.907 |avg tokens 2149.100 |tokens/s 8268.049 |walltime 5659.454 | +Transformer | epoch 0 | step 21390 |avg loss 7.972 |avg tokens 2049.600 |tokens/s 7872.153 |walltime 5662.058 | +Transformer | epoch 0 | step 21400 |avg loss 7.708 |avg tokens 2192.200 |tokens/s 8332.549 |walltime 5664.689 | +Transformer | epoch 0 | step 21410 |avg loss 7.948 |avg tokens 2109.900 |tokens/s 8327.216 |walltime 5667.222 | +Transformer | epoch 0 | step 21420 |avg loss 8.128 |avg tokens 2149.500 |tokens/s 8576.206 |walltime 5669.729 | +Transformer | epoch 0 | step 21430 |avg loss 7.600 |avg tokens 2128.300 |tokens/s 7996.485 |walltime 5672.390 | +Transformer | epoch 0 | step 21440 |avg loss 7.731 |avg tokens 2264.000 |tokens/s 8322.613 |walltime 5675.111 | +Transformer | epoch 0 | step 21450 |avg loss 7.606 |avg tokens 2299.200 |tokens/s 8474.336 |walltime 5677.824 | +Transformer | epoch 0 | step 21460 |avg loss 7.912 |avg tokens 2341.800 |tokens/s 8524.418 |walltime 5680.571 | +Transformer | epoch 0 | step 21470 |avg loss 7.679 |avg tokens 2164.900 |tokens/s 8164.889 |walltime 5683.222 | +Transformer | epoch 0 | step 21480 |avg loss 7.823 |avg tokens 2185.600 |tokens/s 8352.364 |walltime 5685.839 | +Transformer | epoch 0 | step 21490 |avg loss 8.162 |avg tokens 2090.500 |tokens/s 8282.023 |walltime 5688.363 | +Transformer | epoch 0 | step 21500 |avg loss 8.399 |avg tokens 2163.800 |tokens/s 8561.599 |walltime 5690.891 | +Transformer | epoch 0 | step 21510 |avg loss 7.786 |avg tokens 2182.800 |tokens/s 8354.826 |walltime 5693.503 | +Transformer | epoch 0 | step 21520 |avg loss 8.126 |avg tokens 2101.600 |tokens/s 8547.778 |walltime 5695.962 | +Transformer | epoch 0 | step 21530 |avg loss 8.071 |avg tokens 2150.800 |tokens/s 8169.421 |walltime 5698.595 | +Transformer | epoch 0 | step 21540 |avg loss 7.427 |avg tokens 2053.400 
|tokens/s 7918.247 |walltime 5701.188 | +Transformer | epoch 0 | step 21550 |avg loss 8.072 |avg tokens 2037.200 |tokens/s 8314.997 |walltime 5703.638 | +Transformer | epoch 0 | step 21560 |avg loss 8.035 |avg tokens 2241.300 |tokens/s 8538.041 |walltime 5706.263 | +Transformer | epoch 0 | step 21570 |avg loss 7.732 |avg tokens 2319.200 |tokens/s 8424.465 |walltime 5709.016 | +Transformer | epoch 0 | step 21580 |avg loss 7.742 |avg tokens 2367.400 |tokens/s 8668.636 |walltime 5711.747 | +Transformer | epoch 0 | step 21590 |avg loss 7.694 |avg tokens 2263.800 |tokens/s 8215.815 |walltime 5714.502 | +Transformer | epoch 0 | step 21600 |avg loss 7.613 |avg tokens 2246.300 |tokens/s 8309.214 |walltime 5717.206 | +Transformer | epoch 0 | step 21610 |avg loss 8.068 |avg tokens 2110.700 |tokens/s 8329.067 |walltime 5719.740 | +Transformer | epoch 0 | step 21620 |avg loss 7.796 |avg tokens 2307.200 |tokens/s 8423.392 |walltime 5722.479 | +Transformer | epoch 0 | step 21630 |avg loss 8.243 |avg tokens 2054.700 |tokens/s 8232.741 |walltime 5724.975 | +Transformer | epoch 0 | step 21640 |avg loss 7.815 |avg tokens 2337.200 |tokens/s 8530.372 |walltime 5727.715 | +Transformer | epoch 0 | step 21650 |avg loss 7.661 |avg tokens 2331.900 |tokens/s 8630.902 |walltime 5730.416 | +Transformer | epoch 0 | step 21660 |avg loss 7.620 |avg tokens 2241.300 |tokens/s 8262.538 |walltime 5733.129 | +Transformer | epoch 0 | step 21670 |avg loss 7.972 |avg tokens 1997.100 |tokens/s 7657.442 |walltime 5735.737 | +Transformer | epoch 0 | step 21680 |avg loss 7.817 |avg tokens 2225.900 |tokens/s 8317.524 |walltime 5738.413 | +Transformer | epoch 0 | step 21690 |avg loss 7.497 |avg tokens 2113.800 |tokens/s 7920.668 |walltime 5741.082 | +Transformer | epoch 0 | step 21700 |avg loss 7.479 |avg tokens 2192.000 |tokens/s 7950.807 |walltime 5743.839 | +Transformer | epoch 0 | step 21710 |avg loss 7.758 |avg tokens 2118.900 |tokens/s 8074.042 |walltime 5746.463 | +Transformer | epoch 0 | step 21720 
|avg loss 7.651 |avg tokens 2245.200 |tokens/s 8307.521 |walltime 5749.166 | +Transformer | epoch 0 | step 21730 |avg loss 7.975 |avg tokens 2215.000 |tokens/s 8358.669 |walltime 5751.816 | +Transformer | epoch 0 | step 21740 |avg loss 8.278 |avg tokens 2049.200 |tokens/s 8102.865 |walltime 5754.345 | +Transformer | epoch 0 | step 21750 |avg loss 8.143 |avg tokens 2130.400 |tokens/s 8343.496 |walltime 5756.898 | +Transformer | epoch 0 | step 21760 |avg loss 7.389 |avg tokens 2390.400 |tokens/s 8586.289 |walltime 5759.682 | +Transformer | epoch 0 | step 21770 |avg loss 8.117 |avg tokens 2177.200 |tokens/s 8571.277 |walltime 5762.222 | +Transformer | epoch 0 | step 21780 |avg loss 7.593 |avg tokens 2242.900 |tokens/s 8253.157 |walltime 5764.940 | +Transformer | epoch 0 | step 21790 |avg loss 7.507 |avg tokens 2391.200 |tokens/s 8818.862 |walltime 5767.651 | +Transformer | epoch 0 | step 21800 |avg loss 8.083 |avg tokens 2257.000 |tokens/s 8412.250 |walltime 5770.334 | +Transformer | epoch 0 | step 21810 |avg loss 8.062 |avg tokens 2073.800 |tokens/s 8111.909 |walltime 5772.891 | +Transformer | epoch 0 | step 21820 |avg loss 7.761 |avg tokens 2240.600 |tokens/s 8344.525 |walltime 5775.576 | +Transformer | epoch 0 | step 21830 |avg loss 7.808 |avg tokens 2218.400 |tokens/s 8226.595 |walltime 5778.273 | +Transformer | epoch 0 | step 21840 |avg loss 8.115 |avg tokens 2064.300 |tokens/s 8391.184 |walltime 5780.733 | +Transformer | epoch 0 | step 21850 |avg loss 7.829 |avg tokens 2213.300 |tokens/s 8339.768 |walltime 5783.387 | +Transformer | epoch 0 | step 21860 |avg loss 8.049 |avg tokens 1950.900 |tokens/s 7831.343 |walltime 5785.878 | +Transformer | epoch 0 | step 21870 |avg loss 7.758 |avg tokens 2212.500 |tokens/s 8373.229 |walltime 5788.520 | +Transformer | epoch 0 | step 21880 |avg loss 8.242 |avg tokens 2059.900 |tokens/s 8185.115 |walltime 5791.037 | +Transformer | epoch 0 | step 21890 |avg loss 7.699 |avg tokens 2344.000 |tokens/s 8584.616 |walltime 5793.767 | 
+Transformer | epoch 0 | step 21900 |avg loss 7.926 |avg tokens 2226.300 |tokens/s 8395.302 |walltime 5796.419 | +Transformer | epoch 0 | step 21910 |avg loss 8.171 |avg tokens 2391.700 |tokens/s 8988.280 |walltime 5799.080 | +Transformer | epoch 0 | step 21920 |avg loss 7.796 |avg tokens 2349.200 |tokens/s 8703.769 |walltime 5801.779 | +Transformer | epoch 0 | step 21930 |avg loss 7.904 |avg tokens 2321.300 |tokens/s 8662.788 |walltime 5804.459 | +Transformer | epoch 0 | step 21940 |avg loss 7.731 |avg tokens 2240.000 |tokens/s 8303.745 |walltime 5807.156 | +Transformer | epoch 0 | step 21950 |avg loss 8.257 |avg tokens 2066.400 |tokens/s 8478.656 |walltime 5809.593 | +Transformer | epoch 0 | step 21960 |avg loss 7.634 |avg tokens 2216.900 |tokens/s 8219.846 |walltime 5812.290 | +Transformer | epoch 0 | step 21970 |avg loss 7.487 |avg tokens 2260.800 |tokens/s 8302.950 |walltime 5815.013 | +Transformer | epoch 0 | step 21980 |avg loss 7.480 |avg tokens 2344.800 |tokens/s 8568.589 |walltime 5817.750 | +Transformer | epoch 0 | step 21990 |avg loss 7.613 |avg tokens 2149.800 |tokens/s 8137.475 |walltime 5820.392 | +Transformer | epoch 0 | step 22000 |avg loss 8.049 |avg tokens 2173.800 |tokens/s 8337.164 |walltime 5822.999 | +Transformer | epoch 0 | step 22010 |avg loss 7.907 |avg tokens 2087.900 |tokens/s 8031.107 |walltime 5825.599 | +Transformer | epoch 0 | step 22020 |avg loss 8.088 |avg tokens 2074.700 |tokens/s 8237.588 |walltime 5828.117 | +Transformer | epoch 0 | step 22030 |avg loss 8.168 |avg tokens 1987.800 |tokens/s 8071.874 |walltime 5830.580 | +Transformer | epoch 0 | step 22040 |avg loss 7.590 |avg tokens 2232.800 |tokens/s 8345.478 |walltime 5833.255 | +Transformer | epoch 0 | step 22050 |avg loss 7.757 |avg tokens 2238.500 |tokens/s 8421.318 |walltime 5835.914 | +Transformer | epoch 0 | step 22060 |avg loss 7.375 |avg tokens 2238.300 |tokens/s 8232.966 |walltime 5838.632 | +Transformer | epoch 0 | step 22070 |avg loss 8.193 |avg tokens 1934.500 
|tokens/s 7873.657 |walltime 5841.089 | +Transformer | epoch 0 | step 22080 |avg loss 7.984 |avg tokens 2271.800 |tokens/s 8761.163 |walltime 5843.682 | +Transformer | epoch 0 | step 22090 |avg loss 7.935 |avg tokens 2349.000 |tokens/s 8762.920 |walltime 5846.363 | +Transformer | epoch 0 | step 22100 |avg loss 8.057 |avg tokens 1958.900 |tokens/s 7902.327 |walltime 5848.842 | +Transformer | epoch 0 | step 22110 |avg loss 8.040 |avg tokens 2177.900 |tokens/s 8467.878 |walltime 5851.414 | +Transformer | epoch 0 | step 22120 |avg loss 7.707 |avg tokens 2310.000 |tokens/s 8392.529 |walltime 5854.166 | +Transformer | epoch 0 | step 22130 |avg loss 7.676 |avg tokens 2320.600 |tokens/s 8830.332 |walltime 5856.794 | +Transformer | epoch 0 | step 22140 |avg loss 8.021 |avg tokens 2160.300 |tokens/s 8324.552 |walltime 5859.389 | +Transformer | epoch 0 | step 22150 |avg loss 7.656 |avg tokens 2067.200 |tokens/s 7771.541 |walltime 5862.049 | +Transformer | epoch 0 | step 22160 |avg loss 7.840 |avg tokens 1937.400 |tokens/s 7755.487 |walltime 5864.547 | +Transformer | epoch 0 | step 22170 |avg loss 7.603 |avg tokens 2219.200 |tokens/s 8413.011 |walltime 5867.185 | +Transformer | epoch 0 | step 22180 |avg loss 7.824 |avg tokens 2127.100 |tokens/s 8067.406 |walltime 5869.822 | +Transformer | epoch 0 | step 22190 |avg loss 8.289 |avg tokens 1715.600 |tokens/s 7454.658 |walltime 5872.123 | +Transformer | epoch 0 | step 22200 |avg loss 7.539 |avg tokens 2344.000 |tokens/s 8573.818 |walltime 5874.857 | +Transformer | epoch 0 | step 22210 |avg loss 7.872 |avg tokens 2336.300 |tokens/s 8991.177 |walltime 5877.456 | +Transformer | epoch 0 | step 22220 |avg loss 7.976 |avg tokens 2238.100 |tokens/s 8981.436 |walltime 5879.948 | +Transformer | epoch 0 | step 22230 |avg loss 7.920 |avg tokens 2197.600 |tokens/s 8351.354 |walltime 5882.579 | +Transformer | epoch 0 | step 22240 |avg loss 7.737 |avg tokens 2139.700 |tokens/s 7997.060 |walltime 5885.255 | +Transformer | epoch 0 | step 22250 
|avg loss 7.854 |avg tokens 2235.200 |tokens/s 8399.355 |walltime 5887.916 | +Transformer | epoch 0 | step 22260 |avg loss 7.967 |avg tokens 2149.500 |tokens/s 8368.528 |walltime 5890.484 | +Transformer | epoch 0 | step 22270 |avg loss 7.649 |avg tokens 2334.500 |tokens/s 8405.824 |walltime 5893.262 | +Transformer | epoch 0 | step 22280 |avg loss 7.855 |avg tokens 2077.000 |tokens/s 8070.483 |walltime 5895.835 | +Transformer | epoch 0 | step 22290 |avg loss 7.821 |avg tokens 2280.700 |tokens/s 8456.249 |walltime 5898.532 | +Transformer | epoch 0 | step 22300 |avg loss 7.613 |avg tokens 2207.100 |tokens/s 8325.891 |walltime 5901.183 | +Transformer | epoch 0 | step 22310 |avg loss 7.666 |avg tokens 2083.700 |tokens/s 8023.686 |walltime 5903.780 | +Transformer | epoch 0 | step 22320 |avg loss 7.642 |avg tokens 2131.700 |tokens/s 7870.510 |walltime 5906.488 | +Transformer | epoch 0 | step 22330 |avg loss 7.926 |avg tokens 2344.800 |tokens/s 8600.080 |walltime 5909.215 | +Transformer | epoch 0 | step 22340 |avg loss 7.951 |avg tokens 2265.000 |tokens/s 8505.546 |walltime 5911.878 | +Transformer | epoch 0 | step 22350 |avg loss 7.559 |avg tokens 2242.400 |tokens/s 8268.726 |walltime 5914.590 | +Transformer | epoch 0 | step 22360 |avg loss 7.813 |avg tokens 2199.600 |tokens/s 8259.510 |walltime 5917.253 | +Transformer | epoch 0 | step 22370 |avg loss 7.648 |avg tokens 2224.800 |tokens/s 8200.408 |walltime 5919.966 | +Transformer | epoch 0 | step 22380 |avg loss 7.967 |avg tokens 2243.600 |tokens/s 8605.798 |walltime 5922.573 | +Transformer | epoch 0 | step 22390 |avg loss 7.620 |avg tokens 2210.400 |tokens/s 8191.814 |walltime 5925.271 | +Transformer | epoch 0 | step 22400 |avg loss 8.073 |avg tokens 2219.200 |tokens/s 8650.101 |walltime 5927.837 | +Transformer | epoch 0 | step 22410 |avg loss 8.122 |avg tokens 1887.000 |tokens/s 7643.975 |walltime 5930.306 | +Transformer | epoch 0 | step 22420 |avg loss 7.867 |avg tokens 2213.500 |tokens/s 8119.090 |walltime 5933.032 | 
+Transformer | epoch 0 | step 22430 |avg loss 7.677 |avg tokens 2441.900 |tokens/s 8776.413 |walltime 5935.814 | +Transformer | epoch 0 | step 22440 |avg loss 7.824 |avg tokens 2278.400 |tokens/s 8471.743 |walltime 5938.504 | +Transformer | epoch 0 | step 22450 |avg loss 7.978 |avg tokens 2162.100 |tokens/s 8324.307 |walltime 5941.101 | +Transformer | epoch 0 | step 22460 |avg loss 8.027 |avg tokens 2162.200 |tokens/s 8641.847 |walltime 5943.603 | +Transformer | epoch 0 | step 22470 |avg loss 8.031 |avg tokens 1918.600 |tokens/s 7413.550 |walltime 5946.191 | +Transformer | epoch 0 | step 22480 |avg loss 7.921 |avg tokens 2034.500 |tokens/s 7944.293 |walltime 5948.752 | +Transformer | epoch 0 | step 22490 |avg loss 7.602 |avg tokens 2251.000 |tokens/s 8589.374 |walltime 5951.373 | +Transformer | epoch 0 | step 22500 |avg loss 8.113 |avg tokens 2146.600 |tokens/s 8306.807 |walltime 5953.957 | +Transformer | epoch 0 | step 22510 |avg loss 7.627 |avg tokens 2248.900 |tokens/s 8745.536 |walltime 5956.528 | +Transformer | epoch 0 | step 22520 |avg loss 7.944 |avg tokens 1943.900 |tokens/s 8054.316 |walltime 5958.942 | +Transformer | epoch 0 | step 22530 |avg loss 7.173 |avg tokens 2183.100 |tokens/s 7998.032 |walltime 5961.671 | +Transformer | epoch 0 | step 22540 |avg loss 7.875 |avg tokens 2360.300 |tokens/s 8815.880 |walltime 5964.349 | +Transformer | epoch 0 | step 22550 |avg loss 8.092 |avg tokens 2009.600 |tokens/s 8076.456 |walltime 5966.837 | +Transformer | epoch 0 | step 22560 |avg loss 7.770 |avg tokens 2076.100 |tokens/s 8096.348 |walltime 5969.401 | +Transformer | epoch 0 | step 22570 |avg loss 7.521 |avg tokens 2220.100 |tokens/s 8196.487 |walltime 5972.110 | +Transformer | epoch 0 | step 22580 |avg loss 8.026 |avg tokens 2166.000 |tokens/s 8252.719 |walltime 5974.734 | +Transformer | epoch 0 | step 22590 |avg loss 7.876 |avg tokens 2043.000 |tokens/s 7986.373 |walltime 5977.292 | +Transformer | epoch 0 | step 22600 |avg loss 7.790 |avg tokens 2077.000 
|tokens/s 7868.113 |walltime 5979.932 | +Transformer | epoch 0 | step 22610 |avg loss 7.654 |avg tokens 2190.400 |tokens/s 8211.251 |walltime 5982.600 | +Transformer | epoch 0 | step 22620 |avg loss 8.428 |avg tokens 2221.800 |tokens/s 8894.311 |walltime 5985.098 | +Transformer | epoch 0 | step 22630 |avg loss 7.729 |avg tokens 2243.900 |tokens/s 8455.733 |walltime 5987.751 | +Transformer | epoch 0 | step 22640 |avg loss 7.957 |avg tokens 2224.500 |tokens/s 8489.083 |walltime 5990.372 | +Transformer | epoch 0 | step 22650 |avg loss 7.598 |avg tokens 2340.000 |tokens/s 8552.688 |walltime 5993.108 | +Transformer | epoch 0 | step 22660 |avg loss 7.494 |avg tokens 2165.600 |tokens/s 7954.996 |walltime 5995.830 | +Transformer | epoch 0 | step 22670 |avg loss 7.758 |avg tokens 2211.700 |tokens/s 8280.336 |walltime 5998.501 | +Transformer | epoch 0 | step 22680 |avg loss 7.885 |avg tokens 2232.000 |tokens/s 8395.599 |walltime 6001.160 | +Transformer | epoch 0 | step 22690 |avg loss 7.684 |avg tokens 2198.500 |tokens/s 8352.005 |walltime 6003.792 | +Transformer | epoch 0 | step 22700 |avg loss 7.418 |avg tokens 2266.400 |tokens/s 8670.764 |walltime 6006.406 | +Transformer | epoch 0 | step 22710 |avg loss 7.634 |avg tokens 2239.400 |tokens/s 8278.367 |walltime 6009.111 | +Transformer | epoch 0 | step 22720 |avg loss 7.749 |avg tokens 2163.700 |tokens/s 8146.650 |walltime 6011.767 | +Transformer | epoch 0 | step 22730 |avg loss 7.845 |avg tokens 1990.000 |tokens/s 7680.200 |walltime 6014.358 | +Transformer | epoch 0 | step 22740 |avg loss 7.943 |avg tokens 2102.600 |tokens/s 8049.282 |walltime 6016.970 | +Transformer | epoch 0 | step 22750 |avg loss 7.691 |avg tokens 2154.400 |tokens/s 8270.661 |walltime 6019.575 | +Transformer | epoch 0 | step 22760 |avg loss 7.577 |avg tokens 2260.000 |tokens/s 8467.967 |walltime 6022.244 | +Transformer | epoch 0 | step 22770 |avg loss 7.857 |avg tokens 1946.600 |tokens/s 7736.253 |walltime 6024.760 | +Transformer | epoch 0 | step 22780 
|avg loss 7.908 |avg tokens 2294.400 |tokens/s 8561.979 |walltime 6027.440 | +Transformer | epoch 0 | step 22790 |avg loss 7.919 |avg tokens 1940.500 |tokens/s 7746.054 |walltime 6029.945 | +Transformer | epoch 0 | step 22800 |avg loss 7.865 |avg tokens 2081.300 |tokens/s 8085.237 |walltime 6032.519 | +Transformer | epoch 0 | step 22810 |avg loss 8.111 |avg tokens 2012.200 |tokens/s 7927.166 |walltime 6035.058 | +Transformer | epoch 0 | step 22820 |avg loss 7.862 |avg tokens 2235.900 |tokens/s 8411.829 |walltime 6037.716 | +Transformer | epoch 0 | step 22830 |avg loss 7.922 |avg tokens 2028.100 |tokens/s 7787.445 |walltime 6040.320 | +Transformer | epoch 0 | step 22840 |avg loss 7.683 |avg tokens 2354.400 |tokens/s 8625.580 |walltime 6043.049 | +Transformer | epoch 0 | step 22850 |avg loss 7.581 |avg tokens 2291.100 |tokens/s 8206.791 |walltime 6045.841 | +Transformer | epoch 0 | step 22860 |avg loss 7.804 |avg tokens 2273.800 |tokens/s 8281.443 |walltime 6048.587 | +Transformer | epoch 0 | step 22870 |avg loss 7.900 |avg tokens 2264.000 |tokens/s 8609.434 |walltime 6051.217 | +Transformer | epoch 0 | step 22880 |avg loss 7.881 |avg tokens 2205.600 |tokens/s 8290.292 |walltime 6053.877 | +Transformer | epoch 0 | step 22890 |avg loss 7.790 |avg tokens 2222.800 |tokens/s 8255.543 |walltime 6056.570 | +Transformer | epoch 0 | step 22900 |avg loss 7.907 |avg tokens 1942.700 |tokens/s 7752.304 |walltime 6059.075 | +Transformer | epoch 0 | step 22910 |avg loss 7.845 |avg tokens 2418.900 |tokens/s 8823.653 |walltime 6061.817 | +Transformer | epoch 0 | step 22920 |avg loss 7.755 |avg tokens 2176.600 |tokens/s 8245.669 |walltime 6064.457 | +Transformer | epoch 0 | step 22930 |avg loss 8.094 |avg tokens 1872.400 |tokens/s 7614.699 |walltime 6066.915 | +Transformer | epoch 0 | step 22940 |avg loss 7.554 |avg tokens 2366.400 |tokens/s 8560.804 |walltime 6069.680 | +Transformer | epoch 0 | step 22950 |avg loss 7.583 |avg tokens 2335.500 |tokens/s 8480.734 |walltime 6072.434 | 
+Transformer | epoch 0 | step 22960 |avg loss 7.783 |avg tokens 2098.200 |tokens/s 7777.554 |walltime 6075.131 | +Transformer | epoch 0 | step 22970 |avg loss 7.886 |avg tokens 2154.000 |tokens/s 8165.489 |walltime 6077.769 | +Transformer | epoch 0 | step 22980 |avg loss 7.587 |avg tokens 2368.800 |tokens/s 8600.964 |walltime 6080.523 | +Transformer | epoch 0 | step 22990 |avg loss 7.732 |avg tokens 2100.700 |tokens/s 8129.571 |walltime 6083.107 | +Transformer | epoch 0 | step 23000 |avg loss 7.951 |avg tokens 2094.900 |tokens/s 7986.593 |walltime 6085.730 | +Transformer | epoch 0 | step 23010 |avg loss 7.845 |avg tokens 2117.300 |tokens/s 8035.026 |walltime 6088.366 | +Transformer | epoch 0 | step 23020 |avg loss 7.858 |avg tokens 2169.700 |tokens/s 8451.573 |walltime 6090.933 | +Transformer | epoch 0 | step 23030 |avg loss 7.585 |avg tokens 2208.200 |tokens/s 8194.927 |walltime 6093.627 | +Transformer | epoch 0 | step 23040 |avg loss 7.677 |avg tokens 2259.100 |tokens/s 8561.453 |walltime 6096.266 | +Transformer | epoch 0 | step 23050 |avg loss 7.641 |avg tokens 2256.700 |tokens/s 8377.563 |walltime 6098.960 | +Transformer | epoch 0 | step 23060 |avg loss 7.824 |avg tokens 2065.300 |tokens/s 8172.175 |walltime 6101.487 | +Transformer | epoch 0 | step 23070 |avg loss 7.871 |avg tokens 2211.300 |tokens/s 8314.143 |walltime 6104.147 | +Transformer | epoch 0 | step 23080 |avg loss 7.793 |avg tokens 2171.000 |tokens/s 8653.922 |walltime 6106.655 | +Transformer | epoch 0 | step 23090 |avg loss 7.939 |avg tokens 2288.400 |tokens/s 8728.355 |walltime 6109.277 | +Transformer | epoch 0 | step 23100 |avg loss 7.640 |avg tokens 2358.300 |tokens/s 8477.415 |walltime 6112.059 | +Transformer | epoch 0 | step 23110 |avg loss 8.019 |avg tokens 2121.700 |tokens/s 8287.039 |walltime 6114.619 | +Transformer | epoch 0 | step 23120 |avg loss 7.886 |avg tokens 2126.100 |tokens/s 8141.511 |walltime 6117.231 | +Transformer | epoch 0 | step 23130 |avg loss 8.226 |avg tokens 1970.500 
|tokens/s 8233.452 |walltime 6119.624 | +Transformer | epoch 0 | step 23140 |avg loss 8.054 |avg tokens 2074.500 |tokens/s 8215.697 |walltime 6122.149 | +Transformer | epoch 0 | step 23150 |avg loss 8.468 |avg tokens 1918.800 |tokens/s 8473.529 |walltime 6124.414 | +Transformer | epoch 0 | step 23160 |avg loss 7.975 |avg tokens 2194.500 |tokens/s 8509.092 |walltime 6126.993 | +Transformer | epoch 0 | step 23170 |avg loss 7.422 |avg tokens 2344.800 |tokens/s 8397.192 |walltime 6129.785 | +Transformer | epoch 0 | step 23180 |avg loss 7.791 |avg tokens 2324.800 |tokens/s 8735.937 |walltime 6132.446 | +Transformer | epoch 0 | step 23190 |avg loss 7.440 |avg tokens 2318.500 |tokens/s 8415.142 |walltime 6135.201 | +Transformer | epoch 0 | step 23200 |avg loss 8.071 |avg tokens 2000.500 |tokens/s 8002.212 |walltime 6137.701 | +Transformer | epoch 0 | step 23210 |avg loss 7.640 |avg tokens 2396.800 |tokens/s 8661.112 |walltime 6140.469 | +Transformer | epoch 0 | step 23220 |avg loss 7.698 |avg tokens 2150.800 |tokens/s 8389.337 |walltime 6143.032 | +Transformer | epoch 0 | step 23230 |avg loss 7.758 |avg tokens 2180.600 |tokens/s 8221.552 |walltime 6145.685 | +Transformer | epoch 0 | step 23240 |avg loss 7.840 |avg tokens 2288.400 |tokens/s 8570.399 |walltime 6148.355 | +Transformer | epoch 0 | step 23250 |avg loss 7.941 |avg tokens 2285.800 |tokens/s 8681.912 |walltime 6150.988 | +Transformer | epoch 0 | step 23260 |avg loss 7.559 |avg tokens 2267.500 |tokens/s 8287.369 |walltime 6153.724 | +Transformer | epoch 0 | step 23270 |avg loss 8.135 |avg tokens 2239.100 |tokens/s 8804.509 |walltime 6156.267 | +Transformer | epoch 0 | step 23280 |avg loss 7.896 |avg tokens 1926.700 |tokens/s 7572.072 |walltime 6158.811 | +Transformer | epoch 0 | step 23290 |avg loss 7.498 |avg tokens 2227.200 |tokens/s 8148.046 |walltime 6161.545 | +Transformer | epoch 0 | step 23300 |avg loss 7.737 |avg tokens 2124.500 |tokens/s 8032.788 |walltime 6164.190 | +Transformer | epoch 0 | step 23310 
|avg loss 7.738 |avg tokens 2260.000 |tokens/s 8567.699 |walltime 6166.827 | +Transformer | epoch 0 | step 23320 |avg loss 8.292 |avg tokens 1943.300 |tokens/s 7984.774 |walltime 6169.261 | +Transformer | epoch 0 | step 23330 |avg loss 8.075 |avg tokens 2293.600 |tokens/s 8786.710 |walltime 6171.871 | +Transformer | epoch 0 | step 23340 |avg loss 7.802 |avg tokens 2273.600 |tokens/s 8325.136 |walltime 6174.602 | +Transformer | epoch 0 | step 23350 |avg loss 7.942 |avg tokens 2129.600 |tokens/s 8200.597 |walltime 6177.199 | +Transformer | epoch 0 | step 23360 |avg loss 7.878 |avg tokens 2198.300 |tokens/s 8265.996 |walltime 6179.859 | +Transformer | epoch 0 | step 23370 |avg loss 8.150 |avg tokens 2084.600 |tokens/s 8439.766 |walltime 6182.329 | +Transformer | epoch 0 | step 23380 |avg loss 7.432 |avg tokens 2168.800 |tokens/s 8119.767 |walltime 6185.000 | +Transformer | epoch 0 | step 23390 |avg loss 7.648 |avg tokens 2280.300 |tokens/s 8333.142 |walltime 6187.736 | +Transformer | epoch 0 | step 23400 |avg loss 7.760 |avg tokens 2133.600 |tokens/s 8443.484 |walltime 6190.263 | +Transformer | epoch 0 | step 23410 |avg loss 7.684 |avg tokens 2149.000 |tokens/s 8467.307 |walltime 6192.801 | +Transformer | epoch 0 | step 23420 |avg loss 7.725 |avg tokens 2050.600 |tokens/s 7915.324 |walltime 6195.392 | +Transformer | epoch 0 | step 23430 |avg loss 7.444 |avg tokens 2282.400 |tokens/s 8272.042 |walltime 6198.151 | +Transformer | epoch 0 | step 23440 |avg loss 8.019 |avg tokens 2292.500 |tokens/s 8569.463 |walltime 6200.826 | +Transformer | epoch 0 | step 23450 |avg loss 7.589 |avg tokens 2026.000 |tokens/s 7868.463 |walltime 6203.401 | +Transformer | epoch 0 | step 23460 |avg loss 7.906 |avg tokens 1909.200 |tokens/s 7687.338 |walltime 6205.885 | +Transformer | epoch 0 | step 23470 |avg loss 7.672 |avg tokens 2101.100 |tokens/s 7988.237 |walltime 6208.515 | +Transformer | epoch 0 | step 23480 |avg loss 7.682 |avg tokens 2284.500 |tokens/s 8254.539 |walltime 6211.282 | 
+Transformer | epoch 0 | step 23490 |avg loss 8.013 |avg tokens 2245.700 |tokens/s 8258.369 |walltime 6214.002 | +Transformer | epoch 0 | step 23500 |avg loss 7.786 |avg tokens 2214.400 |tokens/s 8118.731 |walltime 6216.729 | +Transformer | epoch 0 | step 23510 |avg loss 7.701 |avg tokens 2146.400 |tokens/s 7952.778 |walltime 6219.428 | +Transformer | epoch 0 | step 23520 |avg loss 7.899 |avg tokens 1962.800 |tokens/s 7681.343 |walltime 6221.983 | +Transformer | epoch 0 | step 23530 |avg loss 7.966 |avg tokens 2380.000 |tokens/s 8874.500 |walltime 6224.665 | +Transformer | epoch 0 | step 23540 |avg loss 7.897 |avg tokens 2198.400 |tokens/s 8365.508 |walltime 6227.293 | +Transformer | epoch 0 | step 23550 |avg loss 7.716 |avg tokens 2301.500 |tokens/s 8496.262 |walltime 6230.002 | +Transformer | epoch 0 | step 23560 |avg loss 7.724 |avg tokens 2222.400 |tokens/s 8255.594 |walltime 6232.694 | +Transformer | epoch 0 | step 23570 |avg loss 7.955 |avg tokens 2090.000 |tokens/s 8040.648 |walltime 6235.293 | +Transformer | epoch 0 | step 23580 |avg loss 7.501 |avg tokens 2322.000 |tokens/s 8590.605 |walltime 6237.996 | +Transformer | epoch 0 | step 23590 |avg loss 7.900 |avg tokens 2203.300 |tokens/s 8469.694 |walltime 6240.598 | +Transformer | epoch 0 | step 23600 |avg loss 7.932 |avg tokens 2240.900 |tokens/s 8349.991 |walltime 6243.281 | +Transformer | epoch 0 | step 23610 |avg loss 8.162 |avg tokens 2122.600 |tokens/s 7980.580 |walltime 6245.941 | +Transformer | epoch 0 | step 23620 |avg loss 8.029 |avg tokens 1800.200 |tokens/s 7575.096 |walltime 6248.318 | +Transformer | epoch 0 | step 23630 |avg loss 7.987 |avg tokens 2206.500 |tokens/s 8377.192 |walltime 6250.952 | +Transformer | epoch 0 | step 23640 |avg loss 8.068 |avg tokens 2329.500 |tokens/s 8953.156 |walltime 6253.553 | +Transformer | epoch 0 | step 23650 |avg loss 7.672 |avg tokens 2236.200 |tokens/s 8400.802 |walltime 6256.215 | +Transformer | epoch 0 | step 23660 |avg loss 7.849 |avg tokens 1913.000 
|tokens/s 7769.843 |walltime 6258.677 | +Transformer | epoch 0 | step 23670 |avg loss 7.593 |avg tokens 2241.600 |tokens/s 8267.157 |walltime 6261.389 | +Transformer | epoch 0 | step 23680 |avg loss 7.657 |avg tokens 1943.700 |tokens/s 7830.669 |walltime 6263.871 | +Transformer | epoch 0 | step 23690 |avg loss 7.607 |avg tokens 2234.600 |tokens/s 8253.901 |walltime 6266.578 | +Transformer | epoch 0 | step 23700 |avg loss 7.636 |avg tokens 2082.400 |tokens/s 8125.729 |walltime 6269.141 | +Transformer | epoch 0 | step 23710 |avg loss 7.913 |avg tokens 2243.500 |tokens/s 8524.491 |walltime 6271.773 | +Transformer | epoch 0 | step 23720 |avg loss 7.872 |avg tokens 2190.300 |tokens/s 8289.756 |walltime 6274.415 | +Transformer | epoch 0 | step 23730 |avg loss 8.071 |avg tokens 2191.000 |tokens/s 8636.735 |walltime 6276.952 | +Transformer | epoch 0 | step 23740 |avg loss 7.694 |avg tokens 2324.800 |tokens/s 8628.267 |walltime 6279.646 | +Transformer | epoch 0 | step 23750 |avg loss 7.590 |avg tokens 2239.800 |tokens/s 8255.924 |walltime 6282.359 | +Transformer | epoch 0 | step 23760 |avg loss 7.584 |avg tokens 2339.600 |tokens/s 8481.284 |walltime 6285.118 | +Transformer | epoch 0 | step 23770 |avg loss 7.855 |avg tokens 2154.200 |tokens/s 8176.737 |walltime 6287.752 | +Transformer | epoch 0 | step 23780 |avg loss 7.906 |avg tokens 2280.500 |tokens/s 8659.650 |walltime 6290.386 | +Transformer | epoch 0 | step 23790 |avg loss 7.835 |avg tokens 2076.300 |tokens/s 8058.831 |walltime 6292.962 | +Transformer | epoch 0 | step 23800 |avg loss 7.881 |avg tokens 2193.700 |tokens/s 8291.475 |walltime 6295.608 | +Transformer | epoch 0 | step 23810 |avg loss 7.985 |avg tokens 2163.000 |tokens/s 8457.133 |walltime 6298.166 | +Transformer | epoch 0 | step 23820 |avg loss 7.596 |avg tokens 2327.500 |tokens/s 8512.348 |walltime 6300.900 | +Transformer | epoch 0 | step 23830 |avg loss 7.679 |avg tokens 2294.400 |tokens/s 8558.660 |walltime 6303.581 | +Transformer | epoch 0 | step 23840 
|avg loss 8.223 |avg tokens 2208.100 |tokens/s 8757.441 |walltime 6306.102 | +Transformer | epoch 0 | step 23850 |avg loss 7.689 |avg tokens 2026.500 |tokens/s 7772.282 |walltime 6308.710 | +Transformer | epoch 0 | step 23860 |avg loss 7.865 |avg tokens 2170.500 |tokens/s 8371.365 |walltime 6311.302 | +Transformer | epoch 0 | step 23870 |avg loss 7.654 |avg tokens 2186.400 |tokens/s 8035.061 |walltime 6314.023 | +Transformer | epoch 0 | step 23880 |avg loss 7.839 |avg tokens 1859.500 |tokens/s 7413.902 |walltime 6316.531 | +Transformer | epoch 0 | step 23890 |avg loss 7.783 |avg tokens 2335.600 |tokens/s 8649.348 |walltime 6319.232 | +Transformer | epoch 0 | step 23900 |avg loss 7.817 |avg tokens 2245.100 |tokens/s 8329.808 |walltime 6321.927 | +Transformer | epoch 0 | step 23910 |avg loss 7.938 |avg tokens 2107.200 |tokens/s 8088.988 |walltime 6324.532 | +Transformer | epoch 0 | step 23920 |avg loss 7.750 |avg tokens 2258.400 |tokens/s 8390.032 |walltime 6327.224 | +Transformer | epoch 0 | step 23930 |avg loss 7.796 |avg tokens 2185.300 |tokens/s 8410.810 |walltime 6329.822 | +Transformer | epoch 0 | step 23940 |avg loss 7.577 |avg tokens 2215.000 |tokens/s 8366.873 |walltime 6332.469 | +Transformer | epoch 0 | step 23950 |avg loss 8.057 |avg tokens 2213.600 |tokens/s 8657.686 |walltime 6335.026 | +Transformer | epoch 0 | step 23960 |avg loss 7.942 |avg tokens 1841.600 |tokens/s 7632.887 |walltime 6337.439 | +Transformer | epoch 0 | step 23970 |avg loss 8.041 |avg tokens 1776.600 |tokens/s 7122.416 |walltime 6339.933 | +Transformer | epoch 0 | step 23980 |avg loss 7.852 |avg tokens 2261.700 |tokens/s 8610.710 |walltime 6342.560 | +Transformer | epoch 0 | step 23990 |avg loss 7.915 |avg tokens 1943.800 |tokens/s 7642.301 |walltime 6345.103 | +Transformer | epoch 0 | step 24000 |avg loss 7.406 |avg tokens 2151.000 |tokens/s 8104.619 |walltime 6347.757 | +Transformer | epoch 0 | step 24010 |avg loss 7.708 |avg tokens 2320.000 |tokens/s 8447.507 |walltime 6350.504 | 
+Transformer | epoch 0 | step 24020 |avg loss 7.543 |avg tokens 2133.000 |tokens/s 7954.114 |walltime 6353.185 | +Transformer | epoch 0 | step 24030 |avg loss 7.768 |avg tokens 2064.600 |tokens/s 7863.664 |walltime 6355.811 | +Transformer | epoch 0 | step 24040 |avg loss 7.654 |avg tokens 2397.800 |tokens/s 8679.727 |walltime 6358.574 | +Transformer | epoch 0 | step 24050 |avg loss 7.835 |avg tokens 2311.900 |tokens/s 8503.235 |walltime 6361.292 | +Transformer | epoch 0 | step 24060 |avg loss 7.494 |avg tokens 2161.900 |tokens/s 8057.825 |walltime 6363.975 | +Transformer | epoch 0 | step 24070 |avg loss 7.647 |avg tokens 2147.500 |tokens/s 8100.325 |walltime 6366.626 | +Transformer | epoch 0 | step 24080 |avg loss 8.150 |avg tokens 1926.600 |tokens/s 8081.865 |walltime 6369.010 | +Transformer | epoch 0 | step 24090 |avg loss 7.721 |avg tokens 2108.800 |tokens/s 8067.043 |walltime 6371.624 | +Transformer | epoch 0 | step 24100 |avg loss 7.587 |avg tokens 2064.300 |tokens/s 7931.576 |walltime 6374.227 | +Transformer | epoch 0 | step 24110 |avg loss 7.668 |avg tokens 2272.600 |tokens/s 8378.708 |walltime 6376.939 | +Transformer | epoch 0 | step 24120 |avg loss 7.882 |avg tokens 2019.300 |tokens/s 7882.788 |walltime 6379.501 | +Transformer | epoch 0 | step 24130 |avg loss 7.616 |avg tokens 2233.600 |tokens/s 8162.159 |walltime 6382.238 | +Transformer | epoch 0 | step 24140 |avg loss 7.830 |avg tokens 2141.500 |tokens/s 8002.298 |walltime 6384.914 | +Transformer | epoch 0 | step 24150 |avg loss 8.031 |avg tokens 2223.000 |tokens/s 8471.787 |walltime 6387.538 | +Transformer | epoch 0 | step 24160 |avg loss 7.610 |avg tokens 2235.500 |tokens/s 8228.384 |walltime 6390.255 | +Transformer | epoch 0 | step 24170 |avg loss 7.851 |avg tokens 2132.500 |tokens/s 8110.452 |walltime 6392.884 | +Transformer | epoch 0 | step 24180 |avg loss 7.802 |avg tokens 2181.800 |tokens/s 8321.719 |walltime 6395.506 | +Transformer | epoch 0 | step 24190 |avg loss 7.882 |avg tokens 2208.100 
|tokens/s 8580.276 |walltime 6398.079 | +Transformer | epoch 0 | step 24200 |avg loss 7.678 |avg tokens 2210.400 |tokens/s 8048.976 |walltime 6400.825 | +Transformer | epoch 0 | step 24210 |avg loss 7.625 |avg tokens 2198.300 |tokens/s 8192.397 |walltime 6403.509 | +Transformer | epoch 0 | step 24220 |avg loss 7.751 |avg tokens 2263.900 |tokens/s 8355.528 |walltime 6406.218 | +Transformer | epoch 0 | step 24230 |avg loss 7.641 |avg tokens 2187.300 |tokens/s 8222.563 |walltime 6408.878 | +Transformer | epoch 0 | step 24240 |avg loss 7.890 |avg tokens 2179.100 |tokens/s 8151.114 |walltime 6411.552 | +Transformer | epoch 0 | step 24250 |avg loss 7.900 |avg tokens 2085.500 |tokens/s 8311.995 |walltime 6414.061 | +Transformer | epoch 0 | step 24260 |avg loss 7.790 |avg tokens 2169.200 |tokens/s 8017.134 |walltime 6416.766 | +Transformer | epoch 0 | step 24270 |avg loss 7.666 |avg tokens 2235.200 |tokens/s 8339.788 |walltime 6419.447 | +Transformer | epoch 0 | step 24280 |avg loss 7.413 |avg tokens 2314.400 |tokens/s 8419.545 |walltime 6422.195 | +Transformer | epoch 0 | step 24290 |avg loss 7.419 |avg tokens 1864.600 |tokens/s 7306.294 |walltime 6424.748 | +Transformer | epoch 0 | step 24300 |avg loss 7.766 |avg tokens 2258.400 |tokens/s 8498.255 |walltime 6427.405 | +Transformer | epoch 0 | step 24310 |avg loss 8.121 |avg tokens 1907.100 |tokens/s 7941.358 |walltime 6429.806 | +Transformer | epoch 0 | step 24320 |avg loss 8.028 |avg tokens 2216.800 |tokens/s 8433.353 |walltime 6432.435 | +Transformer | epoch 0 | step 24330 |avg loss 7.856 |avg tokens 2185.000 |tokens/s 8330.031 |walltime 6435.058 | +Transformer | epoch 0 | step 24340 |avg loss 7.885 |avg tokens 2025.300 |tokens/s 7783.192 |walltime 6437.660 | +Transformer | epoch 0 | step 24350 |avg loss 8.107 |avg tokens 2217.600 |tokens/s 8840.495 |walltime 6440.169 | +Transformer | epoch 0 | step 24360 |avg loss 7.876 |avg tokens 2290.800 |tokens/s 8852.703 |walltime 6442.756 | +Transformer | epoch 0 | step 24370 
|avg loss 7.874 |avg tokens 1962.100 |tokens/s 8000.141 |walltime 6445.209 | +Transformer | epoch 0 | step 24380 |avg loss 7.695 |avg tokens 2236.800 |tokens/s 8099.795 |walltime 6447.971 | +Transformer | epoch 0 | step 24390 |avg loss 7.315 |avg tokens 2201.600 |tokens/s 8218.599 |walltime 6450.649 | +Transformer | epoch 0 | step 24400 |avg loss 7.820 |avg tokens 2269.600 |tokens/s 8467.342 |walltime 6453.330 | +Transformer | epoch 0 | step 24410 |avg loss 7.673 |avg tokens 2267.800 |tokens/s 8416.312 |walltime 6456.024 | +Transformer | epoch 0 | step 24420 |avg loss 7.885 |avg tokens 2067.100 |tokens/s 8067.548 |walltime 6458.587 | +Transformer | epoch 0 | step 24430 |avg loss 7.346 |avg tokens 2342.400 |tokens/s 8270.821 |walltime 6461.419 | +Transformer | epoch 0 | step 24440 |avg loss 7.694 |avg tokens 2285.000 |tokens/s 8526.127 |walltime 6464.099 | +Transformer | epoch 0 | step 24450 |avg loss 7.708 |avg tokens 2117.100 |tokens/s 8172.740 |walltime 6466.689 | +Transformer | epoch 0 | step 24460 |avg loss 7.764 |avg tokens 2150.900 |tokens/s 8191.377 |walltime 6469.315 | +Transformer | epoch 0 | step 24470 |avg loss 7.687 |avg tokens 2356.000 |tokens/s 8618.137 |walltime 6472.049 | +Transformer | epoch 0 | step 24480 |avg loss 8.128 |avg tokens 1899.700 |tokens/s 7766.644 |walltime 6474.495 | +Transformer | epoch 0 | step 24490 |avg loss 7.849 |avg tokens 2271.000 |tokens/s 8401.747 |walltime 6477.198 | +Transformer | epoch 0 | step 24500 |avg loss 7.992 |avg tokens 1959.400 |tokens/s 8256.681 |walltime 6479.571 | +Transformer | epoch 0 | step 24510 |avg loss 7.422 |avg tokens 2245.200 |tokens/s 8327.076 |walltime 6482.267 | +Transformer | epoch 0 | step 24520 |avg loss 7.744 |avg tokens 2211.700 |tokens/s 8290.067 |walltime 6484.935 | +Transformer | epoch 0 | step 24530 |avg loss 7.953 |avg tokens 2185.900 |tokens/s 8533.803 |walltime 6487.496 | +Transformer | epoch 0 | step 24540 |avg loss 7.741 |avg tokens 2193.200 |tokens/s 8378.203 |walltime 6490.114 | 
+Transformer | epoch 0 | step 24550 |avg loss 7.777 |avg tokens 2296.400 |tokens/s 8562.155 |walltime 6492.796 | +Transformer | epoch 0 | step 24560 |avg loss 7.606 |avg tokens 2417.700 |tokens/s 8873.027 |walltime 6495.521 | +Transformer | epoch 0 | step 24570 |avg loss 7.668 |avg tokens 2326.400 |tokens/s 8560.858 |walltime 6498.239 | +Transformer | epoch 0 | step 24580 |avg loss 7.708 |avg tokens 2241.000 |tokens/s 8283.233 |walltime 6500.944 | +Transformer | epoch 0 | step 24590 |avg loss 7.739 |avg tokens 2303.200 |tokens/s 8566.817 |walltime 6503.633 | +Transformer | epoch 0 | step 24600 |avg loss 7.940 |avg tokens 2255.800 |tokens/s 8674.097 |walltime 6506.233 | +Transformer | epoch 0 | step 24610 |avg loss 7.773 |avg tokens 2068.200 |tokens/s 8030.151 |walltime 6508.809 | +Transformer | epoch 0 | step 24620 |avg loss 7.509 |avg tokens 2276.800 |tokens/s 8290.121 |walltime 6511.555 | +Transformer | epoch 0 | step 24630 |avg loss 8.006 |avg tokens 2135.600 |tokens/s 8196.834 |walltime 6514.160 | +Transformer | epoch 0 | step 24640 |avg loss 7.859 |avg tokens 1980.400 |tokens/s 7825.669 |walltime 6516.691 | +Transformer | epoch 0 | step 24650 |avg loss 7.968 |avg tokens 2179.200 |tokens/s 8437.167 |walltime 6519.274 | +Transformer | epoch 0 | step 24660 |avg loss 7.553 |avg tokens 2368.000 |tokens/s 8410.106 |walltime 6522.090 | +Transformer | epoch 0 | step 24670 |avg loss 7.803 |avg tokens 2108.800 |tokens/s 7884.407 |walltime 6524.764 | +Transformer | epoch 0 | step 24680 |avg loss 7.390 |avg tokens 2364.800 |tokens/s 8436.790 |walltime 6527.567 | +Transformer | epoch 0 | step 24690 |avg loss 8.103 |avg tokens 1830.300 |tokens/s 7403.249 |walltime 6530.040 | +Transformer | epoch 0 | step 24700 |avg loss 7.850 |avg tokens 2089.200 |tokens/s 7906.239 |walltime 6532.682 | +Transformer | epoch 0 | step 24710 |avg loss 7.559 |avg tokens 2132.400 |tokens/s 8107.494 |walltime 6535.312 | +Transformer | epoch 0 | step 24720 |avg loss 7.180 |avg tokens 2379.600 
|tokens/s 8489.868 |walltime 6538.115 | +Transformer | epoch 0 | step 24730 |avg loss 7.892 |avg tokens 2061.100 |tokens/s 7934.648 |walltime 6540.713 | +Transformer | epoch 0 | step 24740 |avg loss 7.843 |avg tokens 2033.600 |tokens/s 7943.665 |walltime 6543.273 | +Transformer | epoch 0 | step 24750 |avg loss 7.656 |avg tokens 2012.400 |tokens/s 7751.955 |walltime 6545.869 | +Transformer | epoch 0 | step 24760 |avg loss 7.397 |avg tokens 2379.700 |tokens/s 8380.265 |walltime 6548.708 | +Transformer | epoch 0 | step 24770 |avg loss 7.506 |avg tokens 2368.800 |tokens/s 8581.001 |walltime 6551.469 | +Transformer | epoch 0 | step 24780 |avg loss 7.764 |avg tokens 2202.500 |tokens/s 8090.202 |walltime 6554.191 | +Transformer | epoch 0 | step 24790 |avg loss 7.750 |avg tokens 2458.400 |tokens/s 8942.500 |walltime 6556.940 | +Transformer | epoch 0 | step 24800 |avg loss 7.707 |avg tokens 2180.000 |tokens/s 8303.467 |walltime 6559.566 | +Transformer | epoch 0 | step 24810 |avg loss 8.300 |avg tokens 2264.800 |tokens/s 8828.613 |walltime 6562.131 | +Transformer | epoch 0 | step 24820 |avg loss 7.692 |avg tokens 2292.800 |tokens/s 8316.284 |walltime 6564.888 | +Transformer | epoch 0 | step 24830 |avg loss 8.077 |avg tokens 2142.400 |tokens/s 8076.765 |walltime 6567.541 | +Transformer | epoch 0 | step 24840 |avg loss 7.695 |avg tokens 2096.200 |tokens/s 7758.686 |walltime 6570.242 | +Transformer | epoch 0 | step 24850 |avg loss 7.836 |avg tokens 2181.900 |tokens/s 8187.110 |walltime 6572.908 | +Transformer | epoch 0 | step 24860 |avg loss 7.685 |avg tokens 2086.400 |tokens/s 7884.259 |walltime 6575.554 | +Transformer | epoch 0 | step 24870 |avg loss 7.847 |avg tokens 2056.500 |tokens/s 7922.088 |walltime 6578.150 | +Transformer | epoch 0 | step 24880 |avg loss 7.455 |avg tokens 2274.800 |tokens/s 8198.028 |walltime 6580.925 | +Transformer | epoch 0 | step 24890 |avg loss 7.746 |avg tokens 2209.900 |tokens/s 8299.425 |walltime 6583.587 | +Transformer | epoch 0 | step 24900 
|avg loss 7.757 |avg tokens 2282.400 |tokens/s 8475.771 |walltime 6586.280 | +Transformer | epoch 0 | step 24910 |avg loss 7.636 |avg tokens 2294.700 |tokens/s 8294.522 |walltime 6589.047 | +Transformer | epoch 0 | step 24920 |avg loss 8.134 |avg tokens 2049.100 |tokens/s 7803.180 |walltime 6591.673 | +Transformer | epoch 0 | step 24930 |avg loss 7.522 |avg tokens 2407.200 |tokens/s 8792.785 |walltime 6594.410 | +Transformer | epoch 0 | step 24940 |avg loss 7.605 |avg tokens 2189.300 |tokens/s 8137.390 |walltime 6597.101 | +Transformer | epoch 0 | step 24950 |avg loss 7.616 |avg tokens 2064.000 |tokens/s 7815.999 |walltime 6599.742 | +Transformer | epoch 0 | step 24960 |avg loss 7.946 |avg tokens 1964.000 |tokens/s 7666.430 |walltime 6602.303 | +Transformer | epoch 0 | step 24970 |avg loss 7.628 |avg tokens 2327.000 |tokens/s 9006.033 |walltime 6604.887 | +Transformer | epoch 0 | step 24980 |avg loss 7.971 |avg tokens 2178.300 |tokens/s 8511.635 |walltime 6607.446 | +Transformer | epoch 0 | step 24990 |avg loss 7.924 |avg tokens 1960.200 |tokens/s 7952.094 |walltime 6609.911 | +Transformer | epoch 0 | step 25000 |avg loss 7.587 |avg tokens 2261.600 |tokens/s 8328.569 |walltime 6612.627 | +Transformer | epoch 0 | step 25010 |avg loss 7.779 |avg tokens 1982.200 |tokens/s 7728.871 |walltime 6615.192 | +Transformer | epoch 0 | step 25020 |avg loss 7.843 |avg tokens 2220.400 |tokens/s 8095.343 |walltime 6617.934 | +Transformer | epoch 0 | step 25030 |avg loss 7.611 |avg tokens 2244.800 |tokens/s 8313.021 |walltime 6620.635 | +Transformer | epoch 0 | step 25040 |avg loss 7.602 |avg tokens 2188.000 |tokens/s 8185.755 |walltime 6623.308 | +Transformer | epoch 0 | step 25050 |avg loss 7.795 |avg tokens 2246.900 |tokens/s 8460.503 |walltime 6625.963 | +Transformer | epoch 0 | step 25060 |avg loss 7.687 |avg tokens 2350.400 |tokens/s 8292.871 |walltime 6628.798 | +Transformer | epoch 0 | step 25070 |avg loss 7.717 |avg tokens 2099.000 |tokens/s 7987.941 |walltime 6631.425 | 
+Transformer | epoch 0 | step 25080 |avg loss 7.344 |avg tokens 2226.400 |tokens/s 8037.711 |walltime 6634.195 | +Transformer | epoch 0 | step 25090 |avg loss 7.448 |avg tokens 2362.100 |tokens/s 8470.327 |walltime 6636.984 | +Transformer | epoch 0 | step 25100 |avg loss 8.058 |avg tokens 2209.000 |tokens/s 8695.182 |walltime 6639.524 | +Transformer | epoch 0 | step 25110 |avg loss 8.299 |avg tokens 2099.100 |tokens/s 8958.700 |walltime 6641.868 | +Transformer | epoch 0 | step 25120 |avg loss 7.718 |avg tokens 2280.000 |tokens/s 8364.449 |walltime 6644.593 | +Transformer | epoch 0 | step 25130 |avg loss 7.704 |avg tokens 2209.400 |tokens/s 8223.527 |walltime 6647.280 | +Transformer | epoch 0 | step 25140 |avg loss 7.529 |avg tokens 2258.400 |tokens/s 8197.115 |walltime 6650.035 | +Transformer | epoch 0 | step 25150 |avg loss 7.648 |avg tokens 2139.700 |tokens/s 8019.836 |walltime 6652.703 | +Transformer | epoch 0 | step 25160 |avg loss 7.660 |avg tokens 2322.600 |tokens/s 8323.339 |walltime 6655.494 | +Transformer | epoch 0 | step 25170 |avg loss 7.590 |avg tokens 2249.600 |tokens/s 8118.051 |walltime 6658.265 | +Transformer | epoch 0 | step 25180 |avg loss 7.959 |avg tokens 2185.100 |tokens/s 8265.818 |walltime 6660.908 | +Transformer | epoch 0 | step 25190 |avg loss 7.626 |avg tokens 2254.300 |tokens/s 8165.306 |walltime 6663.669 | +Transformer | epoch 0 | step 25200 |avg loss 7.973 |avg tokens 2360.500 |tokens/s 8696.751 |walltime 6666.383 | +Transformer | epoch 0 | step 25210 |avg loss 7.506 |avg tokens 2219.400 |tokens/s 8108.224 |walltime 6669.121 | +Transformer | epoch 0 | step 25220 |avg loss 7.897 |avg tokens 2235.200 |tokens/s 8566.238 |walltime 6671.730 | +Transformer | epoch 0 | step 25230 |avg loss 7.614 |avg tokens 2235.100 |tokens/s 8176.387 |walltime 6674.464 | +Transformer | epoch 0 | step 25240 |avg loss 7.976 |avg tokens 2302.000 |tokens/s 8879.880 |walltime 6677.056 | +Transformer | epoch 0 | step 25250 |avg loss 7.612 |avg tokens 2172.800 
|tokens/s 7851.959 |walltime 6679.823 | +Transformer | epoch 0 | step 25260 |avg loss 7.752 |avg tokens 2158.900 |tokens/s 8079.835 |walltime 6682.495 | +Transformer | epoch 0 | step 25270 |avg loss 7.474 |avg tokens 2320.800 |tokens/s 8192.037 |walltime 6685.328 | +Transformer | epoch 0 | step 25280 |avg loss 7.773 |avg tokens 2142.300 |tokens/s 7938.624 |walltime 6688.027 | +Transformer | epoch 0 | step 25290 |avg loss 7.255 |avg tokens 2231.400 |tokens/s 8178.106 |walltime 6690.755 | +Transformer | epoch 0 | step 25300 |avg loss 8.134 |avg tokens 2056.800 |tokens/s 8410.814 |walltime 6693.201 | +Transformer | epoch 0 | step 25310 |avg loss 7.280 |avg tokens 2300.800 |tokens/s 8113.570 |walltime 6696.036 | +Transformer | epoch 0 | step 25320 |avg loss 7.858 |avg tokens 2220.000 |tokens/s 8447.764 |walltime 6698.664 | +Transformer | epoch 0 | step 25330 |avg loss 7.798 |avg tokens 2293.200 |tokens/s 8627.279 |walltime 6701.322 | +Transformer | epoch 0 | step 25340 |avg loss 7.876 |avg tokens 2125.300 |tokens/s 7627.321 |walltime 6704.109 | +Transformer | epoch 0 | step 25350 |avg loss 8.098 |avg tokens 2064.300 |tokens/s 8082.395 |walltime 6706.663 | +Transformer | epoch 0 | step 25360 |avg loss 7.650 |avg tokens 2399.000 |tokens/s 8525.145 |walltime 6709.477 | +Transformer | epoch 0 | step 25370 |avg loss 7.851 |avg tokens 2129.400 |tokens/s 8200.198 |walltime 6712.074 | +Transformer | epoch 0 | step 25380 |avg loss 8.135 |avg tokens 2134.700 |tokens/s 8281.845 |walltime 6714.651 | +Transformer | epoch 0 | step 25390 |avg loss 7.725 |avg tokens 2148.000 |tokens/s 7971.536 |walltime 6717.346 | +Transformer | epoch 0 | step 25400 |avg loss 7.932 |avg tokens 2175.200 |tokens/s 8378.510 |walltime 6719.942 | +Transformer | epoch 0 | step 25410 |avg loss 8.015 |avg tokens 2226.800 |tokens/s 8551.469 |walltime 6722.546 | +Transformer | epoch 0 | step 25420 |avg loss 7.667 |avg tokens 2312.800 |tokens/s 8591.632 |walltime 6725.238 | +Transformer | epoch 0 | step 25430 
|avg loss 8.021 |avg tokens 2024.200 |tokens/s 7903.803 |walltime 6727.799 | +Transformer | epoch 0 | step 25440 |avg loss 8.025 |avg tokens 2125.400 |tokens/s 8099.256 |walltime 6730.423 | +Transformer | epoch 0 | step 25450 |avg loss 7.764 |avg tokens 2212.700 |tokens/s 8371.759 |walltime 6733.066 | +Transformer | epoch 0 | step 25460 |avg loss 7.581 |avg tokens 2061.000 |tokens/s 7733.550 |walltime 6735.731 | +Transformer | epoch 0 | step 25470 |avg loss 7.640 |avg tokens 2315.200 |tokens/s 8463.204 |walltime 6738.467 | +Transformer | epoch 0 | step 25480 |avg loss 7.687 |avg tokens 2070.400 |tokens/s 7754.831 |walltime 6741.137 | +Transformer | epoch 0 | step 25490 |avg loss 7.722 |avg tokens 2234.900 |tokens/s 8323.079 |walltime 6743.822 | +Transformer | epoch 0 | step 25500 |avg loss 7.934 |avg tokens 2117.900 |tokens/s 8279.778 |walltime 6746.380 | +Transformer | epoch 0 | step 25510 |avg loss 7.625 |avg tokens 2213.600 |tokens/s 8429.349 |walltime 6749.006 | +Transformer | epoch 0 | step 25520 |avg loss 8.295 |avg tokens 1895.300 |tokens/s 7883.050 |walltime 6751.410 | +Transformer | epoch 0 | step 25530 |avg loss 7.964 |avg tokens 2111.300 |tokens/s 8221.760 |walltime 6753.978 | +Transformer | epoch 0 | step 25540 |avg loss 7.691 |avg tokens 2295.500 |tokens/s 8438.419 |walltime 6756.698 | +Transformer | epoch 0 | step 25550 |avg loss 7.698 |avg tokens 2227.100 |tokens/s 8196.871 |walltime 6759.415 | +Transformer | epoch 0 | step 25560 |avg loss 7.976 |avg tokens 2119.500 |tokens/s 8004.032 |walltime 6762.063 | +Transformer | epoch 0 | step 25570 |avg loss 7.579 |avg tokens 2273.100 |tokens/s 8442.716 |walltime 6764.756 | +Transformer | epoch 0 | step 25580 |avg loss 7.974 |avg tokens 2285.600 |tokens/s 8469.601 |walltime 6767.454 | +Transformer | epoch 0 | step 25590 |avg loss 7.308 |avg tokens 2340.000 |tokens/s 8376.021 |walltime 6770.248 | +Transformer | epoch 0 | step 25600 |avg loss 7.431 |avg tokens 2280.800 |tokens/s 8419.995 |walltime 6772.957 | 
+Transformer | epoch 0 | step 25610 |avg loss 7.557 |avg tokens 2362.400 |tokens/s 8778.091 |walltime 6775.648 | +Transformer | epoch 0 | step 25620 |avg loss 8.189 |avg tokens 1887.600 |tokens/s 7840.019 |walltime 6778.056 | +Transformer | epoch 0 | step 25630 |avg loss 7.728 |avg tokens 2281.800 |tokens/s 8533.620 |walltime 6780.730 | +Transformer | epoch 0 | step 25640 |avg loss 7.498 |avg tokens 2248.000 |tokens/s 8148.686 |walltime 6783.488 | +Transformer | epoch 0 | step 25650 |avg loss 8.034 |avg tokens 2366.800 |tokens/s 8830.997 |walltime 6786.169 | +Transformer | epoch 0 | step 25660 |avg loss 8.153 |avg tokens 1982.200 |tokens/s 7968.943 |walltime 6788.656 | +Transformer | epoch 0 | step 25670 |avg loss 7.447 |avg tokens 1943.400 |tokens/s 7529.712 |walltime 6791.237 | +Transformer | epoch 0 | step 25680 |avg loss 7.268 |avg tokens 2266.400 |tokens/s 8292.471 |walltime 6793.970 | +Transformer | epoch 0 | step 25690 |avg loss 8.017 |avg tokens 2108.400 |tokens/s 8341.905 |walltime 6796.498 | +Transformer | epoch 0 | step 25700 |avg loss 7.594 |avg tokens 2257.400 |tokens/s 8241.765 |walltime 6799.237 | +Transformer | epoch 0 | step 25710 |avg loss 7.777 |avg tokens 2016.400 |tokens/s 7932.669 |walltime 6801.778 | +Transformer | epoch 0 | step 25720 |avg loss 7.764 |avg tokens 2056.900 |tokens/s 7965.004 |walltime 6804.361 | +Transformer | epoch 0 | step 25730 |avg loss 7.437 |avg tokens 2383.200 |tokens/s 8834.443 |walltime 6807.058 | +Transformer | epoch 0 | step 25740 |avg loss 8.149 |avg tokens 2244.400 |tokens/s 8831.189 |walltime 6809.600 | +Transformer | epoch 0 | step 25750 |avg loss 7.864 |avg tokens 2355.200 |tokens/s 8804.159 |walltime 6812.275 | +Transformer | epoch 0 | step 25760 |avg loss 7.829 |avg tokens 2355.700 |tokens/s 8924.028 |walltime 6814.915 | +Transformer | epoch 0 | step 25770 |avg loss 7.725 |avg tokens 2409.200 |tokens/s 8681.060 |walltime 6817.690 | +Transformer | epoch 0 | step 25780 |avg loss 7.711 |avg tokens 2255.200 
|tokens/s 8323.505 |walltime 6820.399 | +Transformer | epoch 0 | step 25790 |avg loss 7.588 |avg tokens 2242.000 |tokens/s 8299.703 |walltime 6823.101 | +Transformer | epoch 0 | step 25800 |avg loss 7.850 |avg tokens 2318.400 |tokens/s 8660.447 |walltime 6825.778 | +Transformer | epoch 0 | step 25810 |avg loss 7.306 |avg tokens 2309.600 |tokens/s 8422.088 |walltime 6828.520 | +Transformer | epoch 0 | step 25820 |avg loss 7.950 |avg tokens 2365.100 |tokens/s 8487.563 |walltime 6831.307 | +Transformer | epoch 0 | step 25830 |avg loss 7.561 |avg tokens 2305.200 |tokens/s 8267.751 |walltime 6834.095 | +Transformer | epoch 0 | step 25840 |avg loss 7.659 |avg tokens 2286.200 |tokens/s 8404.828 |walltime 6836.815 | +Transformer | epoch 0 | step 25850 |avg loss 7.766 |avg tokens 2176.000 |tokens/s 8020.933 |walltime 6839.528 | +Transformer | epoch 0 | step 25860 |avg loss 7.555 |avg tokens 2398.400 |tokens/s 8581.878 |walltime 6842.323 | +Transformer | epoch 0 | step 25870 |avg loss 7.412 |avg tokens 2165.400 |tokens/s 8320.874 |walltime 6844.925 | +Transformer | epoch 0 | step 25880 |avg loss 7.630 |avg tokens 2247.900 |tokens/s 8420.457 |walltime 6847.594 | +Transformer | epoch 0 | step 25890 |avg loss 7.408 |avg tokens 2306.400 |tokens/s 8385.899 |walltime 6850.345 | +Transformer | epoch 0 | step 25900 |avg loss 7.532 |avg tokens 2285.300 |tokens/s 8086.153 |walltime 6853.171 | +Transformer | epoch 0 | step 25910 |avg loss 7.740 |avg tokens 2276.000 |tokens/s 8422.970 |walltime 6855.873 | +Transformer | epoch 0 | step 25920 |avg loss 7.727 |avg tokens 2239.200 |tokens/s 8223.782 |walltime 6858.596 | +Transformer | epoch 0 | step 25930 |avg loss 7.781 |avg tokens 2016.100 |tokens/s 7962.089 |walltime 6861.128 | +Transformer | epoch 0 | step 25940 |avg loss 7.698 |avg tokens 2296.200 |tokens/s 8710.549 |walltime 6863.764 | +Transformer | epoch 0 | step 25950 |avg loss 7.645 |avg tokens 2339.200 |tokens/s 8782.263 |walltime 6866.428 | +Transformer | epoch 0 | step 25960 
|avg loss 8.055 |avg tokens 1831.000 |tokens/s 7814.789 |walltime 6868.771 | +Transformer | epoch 0 | step 25970 |avg loss 8.006 |avg tokens 2013.400 |tokens/s 8157.432 |walltime 6871.239 | +Transformer | epoch 0 | step 25980 |avg loss 8.049 |avg tokens 2099.100 |tokens/s 8003.709 |walltime 6873.862 | +Transformer | epoch 0 | step 25990 |avg loss 8.095 |avg tokens 2175.100 |tokens/s 8279.695 |walltime 6876.489 | +Transformer | epoch 0 | step 26000 |avg loss 7.707 |avg tokens 2274.500 |tokens/s 8267.188 |walltime 6879.240 | +Transformer | epoch 0 | step 26010 |avg loss 7.551 |avg tokens 2326.400 |tokens/s 8597.852 |walltime 6881.946 | +Transformer | epoch 0 | step 26020 |avg loss 8.226 |avg tokens 2331.300 |tokens/s 9051.910 |walltime 6884.521 | +Transformer | epoch 0 | step 26030 |avg loss 8.015 |avg tokens 2127.800 |tokens/s 8233.617 |walltime 6887.105 | +Transformer | epoch 0 | step 26040 |avg loss 7.708 |avg tokens 2302.400 |tokens/s 8513.205 |walltime 6889.810 | +Transformer | epoch 0 | step 26050 |avg loss 7.905 |avg tokens 2100.900 |tokens/s 8152.966 |walltime 6892.387 | +Transformer | epoch 0 | step 26060 |avg loss 7.320 |avg tokens 2262.400 |tokens/s 8259.793 |walltime 6895.126 | +Transformer | epoch 0 | step 26070 |avg loss 7.608 |avg tokens 2320.800 |tokens/s 8539.843 |walltime 6897.844 | +Transformer | epoch 0 | step 26080 |avg loss 7.716 |avg tokens 2218.500 |tokens/s 8448.401 |walltime 6900.469 | +Transformer | epoch 0 | step 26090 |avg loss 7.788 |avg tokens 2288.000 |tokens/s 8549.245 |walltime 6903.146 | +Transformer | epoch 0 | step 26100 |avg loss 7.703 |avg tokens 2178.300 |tokens/s 8084.530 |walltime 6905.840 | +Transformer | epoch 0 | step 26110 |avg loss 7.562 |avg tokens 2305.100 |tokens/s 8338.072 |walltime 6908.605 | +Transformer | epoch 0 | step 26120 |avg loss 7.499 |avg tokens 2191.200 |tokens/s 8117.604 |walltime 6911.304 | +Transformer | epoch 0 | step 26130 |avg loss 7.807 |avg tokens 2369.000 |tokens/s 8788.923 |walltime 6913.999 | 
+Transformer | epoch 0 | step 26140 |avg loss 7.351 |avg tokens 2191.000 |tokens/s 8061.972 |walltime 6916.717 | +Transformer | epoch 0 | step 26150 |avg loss 7.417 |avg tokens 2005.300 |tokens/s 7655.047 |walltime 6919.337 | +Transformer | epoch 0 | step 26160 |avg loss 7.799 |avg tokens 2093.600 |tokens/s 8134.140 |walltime 6921.911 | +Transformer | epoch 0 | step 26170 |avg loss 7.582 |avg tokens 2192.400 |tokens/s 8284.764 |walltime 6924.557 | +Transformer | epoch 0 | step 26180 |avg loss 8.151 |avg tokens 2073.600 |tokens/s 8036.578 |walltime 6927.137 | +Transformer | epoch 0 | step 26190 |avg loss 7.554 |avg tokens 2152.000 |tokens/s 8046.157 |walltime 6929.812 | +Transformer | epoch 0 | step 26200 |avg loss 7.510 |avg tokens 2254.400 |tokens/s 8330.002 |walltime 6932.518 | +Transformer | epoch 0 | step 26210 |avg loss 7.770 |avg tokens 2331.700 |tokens/s 8610.699 |walltime 6935.226 | +Transformer | epoch 0 | step 26220 |avg loss 7.858 |avg tokens 2252.300 |tokens/s 8494.988 |walltime 6937.877 | +Transformer | epoch 0 | step 26230 |avg loss 7.290 |avg tokens 2213.100 |tokens/s 8310.957 |walltime 6940.540 | +Transformer | epoch 0 | step 26240 |avg loss 7.721 |avg tokens 2081.700 |tokens/s 8082.829 |walltime 6943.116 | +Transformer | epoch 0 | step 26250 |avg loss 8.352 |avg tokens 2004.900 |tokens/s 8419.694 |walltime 6945.497 | +Transformer | epoch 0 | step 26260 |avg loss 8.126 |avg tokens 2031.600 |tokens/s 8230.212 |walltime 6947.965 | +Transformer | epoch 0 | step 26270 |avg loss 7.668 |avg tokens 2217.600 |tokens/s 8356.025 |walltime 6950.619 | +Transformer | epoch 0 | step 26280 |avg loss 7.926 |avg tokens 2233.200 |tokens/s 8433.321 |walltime 6953.267 | +Transformer | epoch 0 | step 26290 |avg loss 7.885 |avg tokens 2094.400 |tokens/s 8225.951 |walltime 6955.813 | +Transformer | epoch 0 | step 26300 |avg loss 7.474 |avg tokens 2350.900 |tokens/s 8518.570 |walltime 6958.573 | +Transformer | epoch 0 | step 26310 |avg loss 7.387 |avg tokens 2272.800 
|tokens/s 8510.640 |walltime 6961.244 | +Transformer | epoch 0 | step 26320 |avg loss 7.621 |avg tokens 2257.900 |tokens/s 8291.738 |walltime 6963.967 | +Transformer | epoch 0 | step 26330 |avg loss 7.482 |avg tokens 2297.400 |tokens/s 8362.915 |walltime 6966.714 | +Transformer | epoch 0 | step 26340 |avg loss 7.840 |avg tokens 2077.200 |tokens/s 8149.756 |walltime 6969.263 | +Transformer | epoch 0 | step 26350 |avg loss 7.909 |avg tokens 2168.400 |tokens/s 8468.569 |walltime 6971.823 | +Transformer | epoch 0 | step 26360 |avg loss 7.854 |avg tokens 2307.200 |tokens/s 8658.592 |walltime 6974.488 | +Transformer | epoch 0 | step 26370 |avg loss 7.727 |avg tokens 2153.300 |tokens/s 8376.751 |walltime 6977.058 | +Transformer | epoch 0 | step 26380 |avg loss 7.961 |avg tokens 2369.700 |tokens/s 9000.607 |walltime 6979.691 | +Transformer | epoch 0 | step 26390 |avg loss 7.806 |avg tokens 2083.200 |tokens/s 8045.995 |walltime 6982.280 | +Transformer | epoch 0 | step 26400 |avg loss 7.618 |avg tokens 2276.700 |tokens/s 8644.314 |walltime 6984.914 | +Transformer | epoch 0 | step 26410 |avg loss 7.685 |avg tokens 2176.400 |tokens/s 8238.847 |walltime 6987.556 | +Transformer | epoch 0 | step 26420 |avg loss 7.614 |avg tokens 2000.900 |tokens/s 7758.497 |walltime 6990.135 | +Transformer | epoch 0 | step 26430 |avg loss 8.296 |avg tokens 1863.100 |tokens/s 7590.196 |walltime 6992.589 | +Transformer | epoch 0 | step 26440 |avg loss 7.793 |avg tokens 2326.400 |tokens/s 8496.064 |walltime 6995.328 | +Transformer | epoch 0 | step 26450 |avg loss 7.907 |avg tokens 2157.700 |tokens/s 8234.966 |walltime 6997.948 | +Transformer | epoch 0 | step 26460 |avg loss 7.814 |avg tokens 2304.800 |tokens/s 8444.012 |walltime 7000.677 | +Transformer | epoch 0 | step 26470 |avg loss 7.674 |avg tokens 2282.900 |tokens/s 8624.902 |walltime 7003.324 | +Transformer | epoch 0 | step 26480 |avg loss 7.613 |avg tokens 2204.200 |tokens/s 8123.884 |walltime 7006.037 | +Transformer | epoch 0 | step 26490 
|avg loss 7.654 |avg tokens 2147.400 |tokens/s 8309.509 |walltime 7008.622 | +Transformer | epoch 0 | step 26500 |avg loss 7.965 |avg tokens 2350.800 |tokens/s 8771.539 |walltime 7011.302 | +Transformer | epoch 0 | step 26510 |avg loss 7.493 |avg tokens 2132.800 |tokens/s 8023.235 |walltime 7013.960 | +Transformer | epoch 0 | step 26520 |avg loss 8.115 |avg tokens 2300.900 |tokens/s 9059.692 |walltime 7016.500 | +Transformer | epoch 0 | step 26530 |avg loss 8.011 |avg tokens 2116.300 |tokens/s 8278.335 |walltime 7019.056 | +Transformer | epoch 0 | step 26540 |avg loss 7.823 |avg tokens 2308.800 |tokens/s 8429.321 |walltime 7021.795 | +Transformer | epoch 0 | step 26550 |avg loss 7.910 |avg tokens 2184.700 |tokens/s 8425.839 |walltime 7024.388 | +Transformer | epoch 0 | step 26560 |avg loss 7.648 |avg tokens 2356.800 |tokens/s 8649.805 |walltime 7027.113 | +Transformer | epoch 0 | step 26570 |avg loss 7.486 |avg tokens 2176.200 |tokens/s 8303.130 |walltime 7029.734 | +Transformer | epoch 0 | step 26580 |avg loss 7.831 |avg tokens 2179.100 |tokens/s 8266.884 |walltime 7032.370 | +Transformer | epoch 0 | step 26590 |avg loss 7.749 |avg tokens 2097.200 |tokens/s 8140.806 |walltime 7034.946 | +Transformer | epoch 0 | step 26600 |avg loss 7.622 |avg tokens 2310.300 |tokens/s 8572.244 |walltime 7037.641 | +Transformer | epoch 0 | step 26610 |avg loss 7.855 |avg tokens 2250.200 |tokens/s 8381.710 |walltime 7040.325 | +Transformer | epoch 0 | step 26620 |avg loss 7.712 |avg tokens 2134.700 |tokens/s 8094.264 |walltime 7042.963 | +Transformer | epoch 0 | step 26630 |avg loss 7.646 |avg tokens 2213.600 |tokens/s 8107.254 |walltime 7045.693 | +Transformer | epoch 0 | step 26640 |avg loss 8.188 |avg tokens 2166.400 |tokens/s 8483.707 |walltime 7048.247 | +Transformer | epoch 0 | step 26650 |avg loss 7.885 |avg tokens 1995.900 |tokens/s 8033.977 |walltime 7050.731 | +Transformer | epoch 0 | step 26660 |avg loss 7.774 |avg tokens 2078.200 |tokens/s 8159.630 |walltime 7053.278 | 
+Transformer | epoch 0 | step 26670 |avg loss 7.680 |avg tokens 2169.800 |tokens/s 8286.875 |walltime 7055.896 | +Transformer | epoch 0 | step 26680 |avg loss 7.646 |avg tokens 2055.700 |tokens/s 7665.725 |walltime 7058.578 | +Transformer | epoch 0 | step 26690 |avg loss 7.633 |avg tokens 2244.000 |tokens/s 8134.054 |walltime 7061.337 | +Transformer | epoch 0 | step 26700 |avg loss 7.744 |avg tokens 2263.800 |tokens/s 8766.121 |walltime 7063.919 | +Transformer | epoch 0 | step 26710 |avg loss 7.817 |avg tokens 2298.400 |tokens/s 8436.186 |walltime 7066.644 | +Transformer | epoch 0 | step 26720 |avg loss 7.396 |avg tokens 2162.800 |tokens/s 8299.763 |walltime 7069.250 | +Transformer | epoch 0 | step 26730 |avg loss 8.159 |avg tokens 2008.200 |tokens/s 8197.238 |walltime 7071.699 | +Transformer | epoch 0 | step 26740 |avg loss 8.089 |avg tokens 2054.500 |tokens/s 7997.647 |walltime 7074.268 | +Transformer | epoch 0 | step 26750 |avg loss 7.498 |avg tokens 2402.400 |tokens/s 8613.554 |walltime 7077.057 | +Transformer | epoch 0 | step 26760 |avg loss 8.067 |avg tokens 2211.200 |tokens/s 8524.872 |walltime 7079.651 | +Transformer | epoch 0 | step 26770 |avg loss 7.796 |avg tokens 2147.000 |tokens/s 8182.132 |walltime 7082.275 | +Transformer | epoch 0 | step 26780 |avg loss 7.781 |avg tokens 2046.300 |tokens/s 7862.902 |walltime 7084.878 | +Transformer | epoch 0 | step 26790 |avg loss 7.682 |avg tokens 2066.700 |tokens/s 7724.394 |walltime 7087.553 | +Transformer | epoch 0 | step 26800 |avg loss 7.854 |avg tokens 2177.800 |tokens/s 8304.167 |walltime 7090.176 | +Transformer | epoch 0 | step 26810 |avg loss 7.535 |avg tokens 2349.400 |tokens/s 8684.408 |walltime 7092.881 | +Transformer | epoch 0 | step 26820 |avg loss 7.906 |avg tokens 2376.000 |tokens/s 9189.604 |walltime 7095.467 | +Transformer | epoch 0 | step 26830 |avg loss 7.648 |avg tokens 2336.800 |tokens/s 8504.640 |walltime 7098.214 | +Transformer | epoch 0 | step 26840 |avg loss 7.655 |avg tokens 2077.000 
|tokens/s 7876.362 |walltime 7100.851 | +Transformer | epoch 0 | step 26850 |avg loss 7.497 |avg tokens 2227.900 |tokens/s 8254.331 |walltime 7103.550 | +Transformer | epoch 0 | step 26860 |avg loss 7.454 |avg tokens 2160.600 |tokens/s 8335.138 |walltime 7106.143 | +Transformer | epoch 0 | step 26870 |avg loss 7.803 |avg tokens 2386.100 |tokens/s 8628.912 |walltime 7108.908 | +Transformer | epoch 0 | step 26880 |avg loss 7.721 |avg tokens 2183.600 |tokens/s 8114.566 |walltime 7111.599 | +Transformer | epoch 0 | step 26890 |avg loss 8.056 |avg tokens 1954.700 |tokens/s 7848.249 |walltime 7114.089 | +Transformer | epoch 0 | step 26900 |avg loss 8.061 |avg tokens 1934.100 |tokens/s 7567.447 |walltime 7116.645 | +Transformer | epoch 0 | step 26910 |avg loss 7.821 |avg tokens 2261.200 |tokens/s 8329.642 |walltime 7119.360 | +Transformer | epoch 0 | step 26920 |avg loss 7.993 |avg tokens 1986.500 |tokens/s 7936.593 |walltime 7121.863 | +Transformer | epoch 0 | step 26930 |avg loss 7.824 |avg tokens 2088.600 |tokens/s 8087.995 |walltime 7124.445 | +Transformer | epoch 0 | step 26940 |avg loss 7.331 |avg tokens 2313.900 |tokens/s 8230.872 |walltime 7127.256 | +Transformer | epoch 0 | step 26950 |avg loss 7.645 |avg tokens 2226.000 |tokens/s 8214.763 |walltime 7129.966 | +Transformer | epoch 0 | step 26960 |avg loss 7.621 |avg tokens 2372.800 |tokens/s 8463.563 |walltime 7132.770 | +Transformer | epoch 0 | step 26970 |avg loss 7.519 |avg tokens 2304.700 |tokens/s 8297.949 |walltime 7135.547 | +Transformer | epoch 0 | step 26980 |avg loss 7.775 |avg tokens 1958.800 |tokens/s 7624.696 |walltime 7138.116 | +Transformer | epoch 0 | step 26990 |avg loss 7.758 |avg tokens 2088.000 |tokens/s 7931.154 |walltime 7140.749 | +Transformer | epoch 0 | step 27000 |avg loss 7.903 |avg tokens 1957.000 |tokens/s 7813.582 |walltime 7143.254 | +Transformer | epoch 0 | step 27010 |avg loss 7.675 |avg tokens 2022.400 |tokens/s 7742.162 |walltime 7145.866 | +Transformer | epoch 0 | step 27020 
|avg loss 7.536 |avg tokens 2317.500 |tokens/s 8542.468 |walltime 7148.579 | +Transformer | epoch 0 | step 27030 |avg loss 7.639 |avg tokens 2192.000 |tokens/s 8116.512 |walltime 7151.279 | +Transformer | epoch 0 | step 27040 |avg loss 7.359 |avg tokens 2344.800 |tokens/s 8326.600 |walltime 7154.095 | +Transformer | epoch 0 | step 27050 |avg loss 7.850 |avg tokens 2294.200 |tokens/s 8824.071 |walltime 7156.695 | +Transformer | epoch 0 | step 27060 |avg loss 7.615 |avg tokens 2210.400 |tokens/s 8102.654 |walltime 7159.423 | +Transformer | epoch 0 | step 27070 |avg loss 7.850 |avg tokens 2252.700 |tokens/s 8269.188 |walltime 7162.148 | +Transformer | epoch 0 | step 27080 |avg loss 7.469 |avg tokens 2369.500 |tokens/s 8601.594 |walltime 7164.902 | +Transformer | epoch 0 | step 27090 |avg loss 8.024 |avg tokens 2185.200 |tokens/s 8377.894 |walltime 7167.511 | +Transformer | epoch 0 | step 27100 |avg loss 8.054 |avg tokens 2129.400 |tokens/s 8015.930 |walltime 7170.167 | +Transformer | epoch 0 | step 27110 |avg loss 8.374 |avg tokens 2010.300 |tokens/s 8286.372 |walltime 7172.593 | +Transformer | epoch 0 | step 27120 |avg loss 8.199 |avg tokens 1851.400 |tokens/s 7632.943 |walltime 7175.019 | +Transformer | epoch 0 | step 27130 |avg loss 7.635 |avg tokens 2215.200 |tokens/s 8246.311 |walltime 7177.705 | +Transformer | epoch 0 | step 27140 |avg loss 7.650 |avg tokens 2393.100 |tokens/s 8798.371 |walltime 7180.425 | +Transformer | epoch 0 | step 27150 |avg loss 7.478 |avg tokens 2253.100 |tokens/s 8370.699 |walltime 7183.116 | +Transformer | epoch 0 | step 27160 |avg loss 7.908 |avg tokens 2076.300 |tokens/s 8147.769 |walltime 7185.665 | +Transformer | epoch 0 | step 27170 |avg loss 7.703 |avg tokens 2370.200 |tokens/s 8652.228 |walltime 7188.404 | +Transformer | epoch 0 | step 27180 |avg loss 7.965 |avg tokens 2175.400 |tokens/s 8252.792 |walltime 7191.040 | +Transformer | epoch 0 | step 27190 |avg loss 7.414 |avg tokens 2170.400 |tokens/s 8103.542 |walltime 7193.718 | 
+Transformer | epoch 0 | step 27200 |avg loss 7.694 |avg tokens 2223.300 |tokens/s 8359.415 |walltime 7196.378 | +Transformer | epoch 0 | step 27210 |avg loss 7.932 |avg tokens 1919.100 |tokens/s 7636.762 |walltime 7198.891 | +Transformer | epoch 0 | step 27220 |avg loss 7.845 |avg tokens 2028.900 |tokens/s 7762.847 |walltime 7201.505 | +Transformer | epoch 0 | step 27230 |avg loss 7.496 |avg tokens 2442.800 |tokens/s 8824.616 |walltime 7204.273 | +Transformer | epoch 0 | step 27240 |avg loss 7.665 |avg tokens 2173.000 |tokens/s 8242.578 |walltime 7206.909 | +Transformer | epoch 0 | step 27250 |avg loss 8.150 |avg tokens 2302.800 |tokens/s 8689.630 |walltime 7209.559 | +Transformer | epoch 0 | step 27260 |avg loss 7.874 |avg tokens 2090.200 |tokens/s 7969.251 |walltime 7212.182 | +Transformer | epoch 0 | step 27270 |avg loss 8.123 |avg tokens 2289.600 |tokens/s 9062.690 |walltime 7214.709 | +Transformer | epoch 0 | step 27280 |avg loss 7.226 |avg tokens 2345.600 |tokens/s 8484.875 |walltime 7217.473 | +Transformer | epoch 0 | step 27290 |avg loss 7.841 |avg tokens 2279.100 |tokens/s 8425.751 |walltime 7220.178 | +Transformer | epoch 0 | step 27300 |avg loss 7.849 |avg tokens 2250.600 |tokens/s 8523.570 |walltime 7222.818 | +Transformer | epoch 0 | step 27310 |avg loss 7.744 |avg tokens 2292.600 |tokens/s 8431.212 |walltime 7225.538 | +Transformer | epoch 0 | step 27320 |avg loss 7.802 |avg tokens 2010.400 |tokens/s 7948.838 |walltime 7228.067 | +Transformer | epoch 0 | step 27330 |avg loss 7.748 |avg tokens 1978.400 |tokens/s 7680.690 |walltime 7230.643 | +Transformer | epoch 0 | step 27340 |avg loss 7.680 |avg tokens 2299.200 |tokens/s 8478.380 |walltime 7233.354 | +Transformer | epoch 0 | step 27350 |avg loss 7.727 |avg tokens 2100.200 |tokens/s 7805.018 |walltime 7236.045 | +Transformer | epoch 0 | step 27360 |avg loss 7.383 |avg tokens 2181.800 |tokens/s 8170.744 |walltime 7238.715 | +Transformer | epoch 0 | step 27370 |avg loss 7.617 |avg tokens 2169.300 
|tokens/s 8177.349 |walltime 7241.368 | +Transformer | epoch 0 | step 27380 |avg loss 8.022 |avg tokens 2309.300 |tokens/s 8956.643 |walltime 7243.947 | +Transformer | epoch 0 | step 27390 |avg loss 7.764 |avg tokens 2150.800 |tokens/s 8015.663 |walltime 7246.630 | +Transformer | epoch 0 | step 27400 |avg loss 7.707 |avg tokens 2184.700 |tokens/s 8294.029 |walltime 7249.264 | +Transformer | epoch 0 | step 27410 |avg loss 7.659 |avg tokens 2103.200 |tokens/s 8021.347 |walltime 7251.886 | +Transformer | epoch 0 | step 27420 |avg loss 7.940 |avg tokens 2226.500 |tokens/s 8543.292 |walltime 7254.492 | +Transformer | epoch 0 | step 27430 |avg loss 7.314 |avg tokens 2324.800 |tokens/s 8341.027 |walltime 7257.279 | +Transformer | epoch 0 | step 27440 |avg loss 7.905 |avg tokens 2188.900 |tokens/s 8240.551 |walltime 7259.936 | +Transformer | epoch 0 | step 27450 |avg loss 7.752 |avg tokens 2097.800 |tokens/s 7894.434 |walltime 7262.593 | +Transformer | epoch 0 | step 27460 |avg loss 7.808 |avg tokens 2075.200 |tokens/s 8050.486 |walltime 7265.171 | +Transformer | epoch 0 | step 27470 |avg loss 8.088 |avg tokens 1857.500 |tokens/s 7765.063 |walltime 7267.563 | +Transformer | epoch 0 | step 27480 |avg loss 7.522 |avg tokens 2387.200 |tokens/s 8539.442 |walltime 7270.358 | +Transformer | epoch 0 | step 27490 |avg loss 7.806 |avg tokens 2223.200 |tokens/s 8615.974 |walltime 7272.939 | +Transformer | epoch 0 | step 27500 |avg loss 7.765 |avg tokens 1982.200 |tokens/s 7749.836 |walltime 7275.496 | +Transformer | epoch 0 | step 27510 |avg loss 7.825 |avg tokens 2120.700 |tokens/s 7930.252 |walltime 7278.170 | +Transformer | epoch 0 | step 27520 |avg loss 7.675 |avg tokens 2204.300 |tokens/s 8171.551 |walltime 7280.868 | +Transformer | epoch 0 | step 27530 |avg loss 7.989 |avg tokens 2257.400 |tokens/s 8574.780 |walltime 7283.501 | +Transformer | epoch 0 | step 27540 |avg loss 7.530 |avg tokens 2300.000 |tokens/s 8261.065 |walltime 7286.285 | +Transformer | epoch 0 | step 27550 
|avg loss 7.694 |avg tokens 2259.600 |tokens/s 8292.065 |walltime 7289.010 | +Transformer | epoch 0 | step 27560 |avg loss 7.224 |avg tokens 2229.500 |tokens/s 8111.760 |walltime 7291.758 | +Transformer | epoch 0 | step 27570 |avg loss 7.890 |avg tokens 2227.200 |tokens/s 8447.566 |walltime 7294.395 | +Transformer | epoch 0 | step 27580 |avg loss 7.523 |avg tokens 2233.600 |tokens/s 8225.421 |walltime 7297.110 | +Transformer | epoch 0 | step 27590 |avg loss 7.806 |avg tokens 2187.800 |tokens/s 8347.836 |walltime 7299.731 | +Transformer | epoch 0 | step 27600 |avg loss 8.046 |avg tokens 2295.200 |tokens/s 8891.975 |walltime 7302.312 | +Transformer | epoch 0 | step 27610 |avg loss 7.769 |avg tokens 2167.600 |tokens/s 8198.887 |walltime 7304.956 | +Transformer | epoch 0 | step 27620 |avg loss 7.887 |avg tokens 2177.800 |tokens/s 8311.909 |walltime 7307.576 | +Transformer | epoch 0 | step 27630 |avg loss 8.033 |avg tokens 2307.100 |tokens/s 8639.699 |walltime 7310.246 | +Transformer | epoch 0 | step 27640 |avg loss 7.511 |avg tokens 2075.000 |tokens/s 7908.995 |walltime 7312.870 | +Transformer | epoch 0 | step 27650 |avg loss 7.760 |avg tokens 2150.200 |tokens/s 8239.438 |walltime 7315.480 | +Transformer | epoch 0 | step 27660 |avg loss 7.818 |avg tokens 2031.400 |tokens/s 7870.934 |walltime 7318.061 | +Transformer | epoch 0 | step 27670 |avg loss 7.969 |avg tokens 2208.300 |tokens/s 8363.161 |walltime 7320.701 | +Transformer | epoch 0 | step 27680 |avg loss 8.163 |avg tokens 2032.200 |tokens/s 8202.807 |walltime 7323.179 | +Transformer | epoch 0 | step 27690 |avg loss 7.843 |avg tokens 2203.400 |tokens/s 8331.530 |walltime 7325.823 | +Transformer | epoch 0 | step 27700 |avg loss 7.526 |avg tokens 2268.800 |tokens/s 8390.108 |walltime 7328.527 | +Transformer | epoch 0 | step 27710 |avg loss 7.964 |avg tokens 2111.400 |tokens/s 8035.847 |walltime 7331.155 | +Transformer | epoch 0 | step 27720 |avg loss 8.142 |avg tokens 2029.600 |tokens/s 8397.165 |walltime 7333.572 | 
+Transformer | epoch 0 | step 27730 |avg loss 6.933 |avg tokens 2434.100 |tokens/s 8728.760 |walltime 7336.360 | +Transformer | epoch 0 | step 27740 |avg loss 7.891 |avg tokens 2244.300 |tokens/s 8339.626 |walltime 7339.052 | +Transformer | epoch 0 | step 27750 |avg loss 7.971 |avg tokens 2244.700 |tokens/s 8590.610 |walltime 7341.665 | +Transformer | epoch 0 | step 27760 |avg loss 7.767 |avg tokens 2171.200 |tokens/s 8173.336 |walltime 7344.321 | +Transformer | epoch 0 | step 27770 |avg loss 7.618 |avg tokens 2092.800 |tokens/s 7924.202 |walltime 7346.962 | +Transformer | epoch 0 | step 27780 |avg loss 7.796 |avg tokens 2244.800 |tokens/s 8347.731 |walltime 7349.651 | +Transformer | epoch 0 | step 27790 |avg loss 7.531 |avg tokens 2125.600 |tokens/s 8026.445 |walltime 7352.299 | +Transformer | epoch 0 | step 27800 |avg loss 8.337 |avg tokens 1949.000 |tokens/s 7962.674 |walltime 7354.747 | +Transformer | epoch 0 | step 27810 |avg loss 7.399 |avg tokens 2194.400 |tokens/s 8089.742 |walltime 7357.460 | +Transformer | epoch 0 | step 27820 |avg loss 7.578 |avg tokens 2200.200 |tokens/s 8028.741 |walltime 7360.200 | +Transformer | epoch 0 | step 27830 |avg loss 7.645 |avg tokens 2306.500 |tokens/s 8683.688 |walltime 7362.856 | +Transformer | epoch 0 | step 27840 |avg loss 7.684 |avg tokens 2128.000 |tokens/s 7977.783 |walltime 7365.524 | +Transformer | epoch 0 | step 27850 |avg loss 8.122 |avg tokens 2095.200 |tokens/s 8166.990 |walltime 7368.089 | +Transformer | epoch 0 | step 27860 |avg loss 8.023 |avg tokens 2159.300 |tokens/s 8187.041 |walltime 7370.727 | +Transformer | epoch 0 | step 27870 |avg loss 7.955 |avg tokens 2194.400 |tokens/s 8585.029 |walltime 7373.283 | +Transformer | epoch 0 | step 27880 |avg loss 7.327 |avg tokens 2212.000 |tokens/s 8180.562 |walltime 7375.987 | +Transformer | epoch 0 | step 27890 |avg loss 7.821 |avg tokens 2131.700 |tokens/s 8251.022 |walltime 7378.570 | +Transformer | epoch 0 | step 27900 |avg loss 8.170 |avg tokens 2124.300 
|tokens/s 8261.556 |walltime 7381.141 | +Transformer | epoch 0 | step 27910 |avg loss 7.947 |avg tokens 2323.000 |tokens/s 8696.183 |walltime 7383.813 | +Transformer | epoch 0 | step 27920 |avg loss 8.114 |avg tokens 2277.600 |tokens/s 8520.429 |walltime 7386.486 | +Transformer | epoch 0 | step 27930 |avg loss 7.944 |avg tokens 2313.800 |tokens/s 8826.681 |walltime 7389.107 | +Transformer | epoch 0 | step 27940 |avg loss 7.912 |avg tokens 2146.100 |tokens/s 8270.665 |walltime 7391.702 | +Transformer | epoch 0 | step 27950 |avg loss 7.513 |avg tokens 2337.000 |tokens/s 8352.734 |walltime 7394.500 | +Transformer | epoch 0 | step 27960 |avg loss 7.865 |avg tokens 2131.200 |tokens/s 8107.635 |walltime 7397.129 | +Transformer | epoch 0 | step 27970 |avg loss 7.878 |avg tokens 2081.200 |tokens/s 8203.909 |walltime 7399.665 | +Transformer | epoch 0 | step 27980 |avg loss 7.613 |avg tokens 2221.000 |tokens/s 8061.615 |walltime 7402.420 | +Transformer | epoch 0 | step 27990 |avg loss 7.793 |avg tokens 2152.700 |tokens/s 8033.597 |walltime 7405.100 | +Transformer | epoch 0 | step 28000 |avg loss 7.754 |avg tokens 2430.100 |tokens/s 8788.336 |walltime 7407.865 | +Transformer | epoch 0 | step 28010 |avg loss 7.792 |avg tokens 2121.600 |tokens/s 8076.046 |walltime 7410.492 | +Transformer | epoch 0 | step 28020 |avg loss 7.827 |avg tokens 2283.500 |tokens/s 8454.349 |walltime 7413.193 | +Transformer | epoch 0 | step 28030 |avg loss 7.444 |avg tokens 2282.400 |tokens/s 8507.978 |walltime 7415.876 | +Transformer | epoch 0 | step 28040 |avg loss 7.814 |avg tokens 2167.900 |tokens/s 8253.975 |walltime 7418.502 | +Transformer | epoch 0 | step 28050 |avg loss 7.650 |avg tokens 2023.900 |tokens/s 7941.113 |walltime 7421.051 | +Transformer | epoch 0 | step 28060 |avg loss 7.961 |avg tokens 2143.300 |tokens/s 8331.591 |walltime 7423.624 | +Transformer | epoch 0 | step 28070 |avg loss 7.950 |avg tokens 2140.400 |tokens/s 8416.944 |walltime 7426.167 | +Transformer | epoch 0 | step 28080 
|avg loss 7.527 |avg tokens 2231.500 |tokens/s 8119.425 |walltime 7428.915 | +Transformer | epoch 0 | step 28090 |avg loss 7.707 |avg tokens 2256.000 |tokens/s 8738.882 |walltime 7431.496 | +Transformer | epoch 0 | step 28100 |avg loss 7.902 |avg tokens 1951.800 |tokens/s 7795.562 |walltime 7434.000 | +Transformer | epoch 0 | step 28110 |avg loss 7.745 |avg tokens 2421.500 |tokens/s 8844.277 |walltime 7436.738 | +Transformer | epoch 0 | step 28120 |avg loss 7.689 |avg tokens 2396.000 |tokens/s 8872.541 |walltime 7439.439 | +Transformer | epoch 0 | step 28130 |avg loss 7.907 |avg tokens 2177.000 |tokens/s 8307.093 |walltime 7442.059 | +Transformer | epoch 0 | step 28140 |avg loss 7.825 |avg tokens 2285.400 |tokens/s 8898.111 |walltime 7444.628 | +Transformer | epoch 0 | step 28150 |avg loss 7.622 |avg tokens 2212.800 |tokens/s 8152.646 |walltime 7447.342 | +Transformer | epoch 0 | step 28160 |avg loss 8.053 |avg tokens 2343.000 |tokens/s 8977.256 |walltime 7449.952 | +Transformer | epoch 0 | step 28170 |avg loss 7.449 |avg tokens 2112.100 |tokens/s 8268.503 |walltime 7452.506 | +Transformer | epoch 0 | step 28180 |avg loss 7.791 |avg tokens 2122.700 |tokens/s 8093.077 |walltime 7455.129 | +Transformer | epoch 0 | step 28190 |avg loss 7.476 |avg tokens 2362.400 |tokens/s 8520.236 |walltime 7457.902 | +Transformer | epoch 0 | step 28200 |avg loss 7.926 |avg tokens 2133.900 |tokens/s 8156.856 |walltime 7460.518 | +Transformer | epoch 0 | step 28210 |avg loss 7.652 |avg tokens 2299.100 |tokens/s 8549.744 |walltime 7463.207 | +Transformer | epoch 0 | step 28220 |avg loss 7.610 |avg tokens 2207.500 |tokens/s 8127.985 |walltime 7465.923 | +Transformer | epoch 0 | step 28230 |avg loss 7.871 |avg tokens 1957.700 |tokens/s 7770.609 |walltime 7468.442 | +Transformer | epoch 0 | step 28240 |avg loss 8.072 |avg tokens 2204.400 |tokens/s 8473.583 |walltime 7471.044 | +Transformer | epoch 0 | step 28250 |avg loss 7.912 |avg tokens 2209.300 |tokens/s 8442.217 |walltime 7473.661 | 
+Transformer | epoch 0 | step 28260 |avg loss 7.879 |avg tokens 2150.700 |tokens/s 8335.837 |walltime 7476.241 | +Transformer | epoch 0 | step 28270 |avg loss 7.856 |avg tokens 2181.600 |tokens/s 8284.265 |walltime 7478.874 | +Transformer | epoch 0 | step 28280 |avg loss 7.733 |avg tokens 2224.300 |tokens/s 8209.569 |walltime 7481.584 | +Transformer | epoch 0 | step 28290 |avg loss 7.986 |avg tokens 2326.000 |tokens/s 9077.161 |walltime 7484.146 | +Transformer | epoch 0 | step 28300 |avg loss 7.821 |avg tokens 2250.000 |tokens/s 8373.307 |walltime 7486.833 | +Transformer | epoch 0 | step 28310 |avg loss 7.771 |avg tokens 2221.900 |tokens/s 8220.474 |walltime 7489.536 | +Transformer | epoch 0 | step 28320 |avg loss 7.921 |avg tokens 2208.800 |tokens/s 8265.475 |walltime 7492.208 | +Transformer | epoch 0 | step 28330 |avg loss 7.843 |avg tokens 2265.000 |tokens/s 8470.680 |walltime 7494.882 | +Transformer | epoch 0 | step 28340 |avg loss 7.366 |avg tokens 2213.600 |tokens/s 8120.147 |walltime 7497.608 | +Transformer | epoch 0 | step 28350 |avg loss 7.748 |avg tokens 2195.200 |tokens/s 8194.873 |walltime 7500.287 | +Transformer | epoch 0 | step 28360 |avg loss 7.871 |avg tokens 2146.300 |tokens/s 8090.011 |walltime 7502.940 | +Transformer | epoch 0 | step 28370 |avg loss 7.318 |avg tokens 2196.800 |tokens/s 8123.072 |walltime 7505.645 | +Transformer | epoch 0 | step 28380 |avg loss 7.617 |avg tokens 2226.900 |tokens/s 8284.280 |walltime 7508.333 | +Transformer | epoch 0 | step 28390 |avg loss 7.914 |avg tokens 2048.100 |tokens/s 8100.533 |walltime 7510.861 | +Transformer | epoch 0 | step 28400 |avg loss 7.734 |avg tokens 2351.200 |tokens/s 8571.267 |walltime 7513.604 | +Transformer | epoch 0 | step 28410 |avg loss 7.805 |avg tokens 2208.500 |tokens/s 8541.132 |walltime 7516.190 | +Transformer | epoch 0 | step 28420 |avg loss 7.663 |avg tokens 2086.900 |tokens/s 7976.816 |walltime 7518.806 | +Transformer | epoch 0 | step 28430 |avg loss 7.680 |avg tokens 2255.500 
|tokens/s 8626.174 |walltime 7521.421 | +Transformer | epoch 0 | step 28440 |avg loss 7.888 |avg tokens 2209.600 |tokens/s 8415.079 |walltime 7524.047 | +Transformer | epoch 0 | step 28450 |avg loss 7.790 |avg tokens 2233.600 |tokens/s 8786.774 |walltime 7526.589 | +Transformer | epoch 0 | step 28460 |avg loss 7.673 |avg tokens 2112.000 |tokens/s 7948.206 |walltime 7529.246 | +Transformer | epoch 0 | step 28470 |avg loss 7.573 |avg tokens 2175.000 |tokens/s 8059.075 |walltime 7531.945 | +Transformer | epoch 0 | step 28480 |avg loss 7.841 |avg tokens 2014.100 |tokens/s 7748.437 |walltime 7534.544 | +Transformer | epoch 0 | step 28490 |avg loss 8.003 |avg tokens 2086.300 |tokens/s 8179.328 |walltime 7537.095 | +Transformer | epoch 0 | step 28500 |avg loss 7.638 |avg tokens 2105.300 |tokens/s 8054.124 |walltime 7539.709 | +Transformer | epoch 0 | step 28510 |avg loss 7.486 |avg tokens 2094.400 |tokens/s 7871.883 |walltime 7542.369 | +Transformer | epoch 0 | step 28520 |avg loss 7.810 |avg tokens 1917.900 |tokens/s 7691.678 |walltime 7544.863 | +Transformer | epoch 0 | step 28530 |avg loss 7.682 |avg tokens 2271.900 |tokens/s 8655.194 |walltime 7547.488 | +Transformer | epoch 0 | step 28540 |avg loss 7.699 |avg tokens 2320.900 |tokens/s 8666.286 |walltime 7550.166 | +Transformer | epoch 0 | step 28550 |avg loss 7.753 |avg tokens 2084.600 |tokens/s 7882.518 |walltime 7552.810 | +Transformer | epoch 0 | step 28560 |avg loss 7.854 |avg tokens 2044.000 |tokens/s 7859.451 |walltime 7555.411 | +Transformer | epoch 0 | step 28570 |avg loss 8.071 |avg tokens 2071.700 |tokens/s 8371.299 |walltime 7557.886 | +Transformer | epoch 0 | step 28580 |avg loss 7.741 |avg tokens 2113.200 |tokens/s 7965.706 |walltime 7560.539 | +Transformer | epoch 0 | step 28590 |avg loss 7.929 |avg tokens 2114.300 |tokens/s 8049.211 |walltime 7563.165 | +Transformer | epoch 0 | step 28600 |avg loss 7.782 |avg tokens 2184.500 |tokens/s 8269.223 |walltime 7565.807 | +Transformer | epoch 0 | step 28610 
|avg loss 7.987 |avg tokens 2094.300 |tokens/s 8127.712 |walltime 7568.384 | +Transformer | epoch 0 | step 28620 |avg loss 7.816 |avg tokens 2238.100 |tokens/s 8325.846 |walltime 7571.072 | +Transformer | epoch 0 | step 28630 |avg loss 8.113 |avg tokens 2264.900 |tokens/s 8699.794 |walltime 7573.675 | +Transformer | epoch 0 | step 28640 |avg loss 7.792 |avg tokens 2280.400 |tokens/s 8697.281 |walltime 7576.297 | +Transformer | epoch 0 | step 28650 |avg loss 7.728 |avg tokens 2199.500 |tokens/s 8250.548 |walltime 7578.963 | +Transformer | epoch 0 | step 28660 |avg loss 7.892 |avg tokens 1973.300 |tokens/s 7844.778 |walltime 7581.479 | +Transformer | epoch 0 | step 28670 |avg loss 7.641 |avg tokens 2312.000 |tokens/s 8563.438 |walltime 7584.179 | +Transformer | epoch 0 | step 28680 |avg loss 7.658 |avg tokens 2037.400 |tokens/s 7941.079 |walltime 7586.744 | +Transformer | epoch 0 | step 28690 |avg loss 7.668 |avg tokens 2346.900 |tokens/s 8517.304 |walltime 7589.500 | +Transformer | epoch 0 | step 28700 |avg loss 7.652 |avg tokens 2236.000 |tokens/s 8644.406 |walltime 7592.086 | +Transformer | epoch 0 | step 28710 |avg loss 7.786 |avg tokens 2326.900 |tokens/s 8578.485 |walltime 7594.799 | +Transformer | epoch 0 | step 28720 |avg loss 8.042 |avg tokens 2024.100 |tokens/s 8230.581 |walltime 7597.258 | +Transformer | epoch 0 | step 28730 |avg loss 7.675 |avg tokens 1956.000 |tokens/s 7463.939 |walltime 7599.879 | +Transformer | epoch 0 | step 28740 |avg loss 7.778 |avg tokens 2251.200 |tokens/s 8586.258 |walltime 7602.501 | +Transformer | epoch 0 | step 28750 |avg loss 7.645 |avg tokens 2324.000 |tokens/s 8413.258 |walltime 7605.263 | +Transformer | epoch 0 | step 28760 |avg loss 7.987 |avg tokens 2212.100 |tokens/s 8640.111 |walltime 7607.823 | +Transformer | epoch 0 | step 28770 |avg loss 7.785 |avg tokens 2219.200 |tokens/s 8293.896 |walltime 7610.499 | +Transformer | epoch 0 | step 28780 |avg loss 7.779 |avg tokens 2408.700 |tokens/s 8770.735 |walltime 7613.245 | 
+Transformer | epoch 0 | step 28790 |avg loss 7.420 |avg tokens 2217.600 |tokens/s 8320.579 |walltime 7615.910 | +Transformer | epoch 0 | step 28800 |avg loss 7.432 |avg tokens 2273.600 |tokens/s 8494.017 |walltime 7618.587 | +Transformer | epoch 0 | step 28810 |avg loss 7.869 |avg tokens 1838.300 |tokens/s 7376.802 |walltime 7621.079 | +Transformer | epoch 0 | step 28820 |avg loss 8.110 |avg tokens 2134.400 |tokens/s 8724.921 |walltime 7623.525 | +Transformer | epoch 0 | step 28830 |avg loss 7.506 |avg tokens 1942.100 |tokens/s 7848.039 |walltime 7626.000 | +Transformer | epoch 0 | step 28840 |avg loss 7.854 |avg tokens 2206.100 |tokens/s 8217.708 |walltime 7628.685 | +Transformer | epoch 0 | step 28850 |avg loss 8.380 |avg tokens 1973.500 |tokens/s 8317.637 |walltime 7631.057 | +Transformer | epoch 0 | step 28860 |avg loss 7.863 |avg tokens 2159.300 |tokens/s 8086.451 |walltime 7633.728 | +Transformer | epoch 0 | step 28870 |avg loss 7.447 |avg tokens 2181.600 |tokens/s 7951.232 |walltime 7636.471 | +Transformer | epoch 0 | step 28880 |avg loss 7.486 |avg tokens 2190.000 |tokens/s 8240.352 |walltime 7639.129 | +Transformer | epoch 0 | step 28890 |avg loss 8.138 |avg tokens 2159.600 |tokens/s 8461.008 |walltime 7641.681 | +Transformer | epoch 0 | step 28900 |avg loss 8.091 |avg tokens 2025.700 |tokens/s 8149.442 |walltime 7644.167 | +Transformer | epoch 0 | step 28910 |avg loss 7.654 |avg tokens 2137.500 |tokens/s 7955.455 |walltime 7646.854 | +Transformer | epoch 0 | step 28920 |avg loss 7.859 |avg tokens 2082.600 |tokens/s 7982.774 |walltime 7649.463 | +Transformer | epoch 0 | step 28930 |avg loss 7.499 |avg tokens 2172.000 |tokens/s 8010.698 |walltime 7652.174 | +Transformer | epoch 0 | step 28940 |avg loss 7.872 |avg tokens 2337.900 |tokens/s 8683.332 |walltime 7654.867 | +Transformer | epoch 0 | step 28950 |avg loss 7.716 |avg tokens 2172.000 |tokens/s 8191.814 |walltime 7657.518 | +Transformer | epoch 0 | step 28960 |avg loss 7.632 |avg tokens 2125.900 
|tokens/s 8139.022 |walltime 7660.130 | +Transformer | epoch 0 | step 28970 |avg loss 7.910 |avg tokens 2189.900 |tokens/s 8555.297 |walltime 7662.690 | +Transformer | epoch 0 | step 28980 |avg loss 8.037 |avg tokens 2282.500 |tokens/s 8516.350 |walltime 7665.370 | +Transformer | epoch 0 | step 28990 |avg loss 7.925 |avg tokens 2234.700 |tokens/s 8557.666 |walltime 7667.981 | +Transformer | epoch 0 | step 29000 |avg loss 7.448 |avg tokens 2252.800 |tokens/s 8481.897 |walltime 7670.637 | +Transformer | epoch 0 | step 29010 |avg loss 7.678 |avg tokens 2250.400 |tokens/s 8647.656 |walltime 7673.239 | +Transformer | epoch 0 | step 29020 |avg loss 7.546 |avg tokens 2277.400 |tokens/s 8392.793 |walltime 7675.953 | +Transformer | epoch 0 | step 29030 |avg loss 7.439 |avg tokens 2397.600 |tokens/s 8621.049 |walltime 7678.734 | +Transformer | epoch 0 | step 29040 |avg loss 7.890 |avg tokens 2258.300 |tokens/s 8790.921 |walltime 7681.303 | +Transformer | epoch 0 | step 29050 |avg loss 7.983 |avg tokens 2316.800 |tokens/s 8886.683 |walltime 7683.910 | +Transformer | epoch 0 | step 29060 |avg loss 8.119 |avg tokens 1961.400 |tokens/s 7696.917 |walltime 7686.458 | +Transformer | epoch 0 | step 29070 |avg loss 7.791 |avg tokens 2117.200 |tokens/s 8035.472 |walltime 7689.093 | +Transformer | epoch 0 | step 29080 |avg loss 7.901 |avg tokens 2327.100 |tokens/s 8901.668 |walltime 7691.707 | +Transformer | epoch 0 | step 29090 |avg loss 7.694 |avg tokens 2011.700 |tokens/s 7690.550 |walltime 7694.323 | +Transformer | epoch 0 | step 29100 |avg loss 7.588 |avg tokens 2228.300 |tokens/s 8507.033 |walltime 7696.943 | +Transformer | epoch 0 | step 29110 |avg loss 7.841 |avg tokens 2043.900 |tokens/s 8048.211 |walltime 7699.482 | +Transformer | epoch 0 | step 29120 |avg loss 7.945 |avg tokens 2289.500 |tokens/s 8592.962 |walltime 7702.147 | +Transformer | epoch 0 | step 29130 |avg loss 7.640 |avg tokens 2252.200 |tokens/s 8060.333 |walltime 7704.941 | +Transformer | epoch 0 | step 29140 
|avg loss 7.721 |avg tokens 2193.100 |tokens/s 8323.291 |walltime 7707.576 | +Transformer | epoch 0 | step 29150 |avg loss 7.783 |avg tokens 2215.400 |tokens/s 8303.426 |walltime 7710.244 | +Transformer | epoch 0 | step 29160 |avg loss 8.066 |avg tokens 1889.700 |tokens/s 7418.596 |walltime 7712.791 | +Transformer | epoch 0 | step 29170 |avg loss 7.931 |avg tokens 2140.800 |tokens/s 8266.554 |walltime 7715.381 | +Transformer | epoch 0 | step 29180 |avg loss 8.177 |avg tokens 2013.700 |tokens/s 8046.878 |walltime 7717.883 | +Transformer | epoch 0 | step 29190 |avg loss 7.474 |avg tokens 2316.300 |tokens/s 8176.177 |walltime 7720.716 | +Transformer | epoch 0 | step 29200 |avg loss 7.839 |avg tokens 2294.100 |tokens/s 8609.816 |walltime 7723.381 | +Transformer | epoch 0 | step 29210 |avg loss 7.478 |avg tokens 2328.800 |tokens/s 8362.076 |walltime 7726.166 | +Transformer | epoch 0 | step 29220 |avg loss 7.940 |avg tokens 2199.100 |tokens/s 8357.929 |walltime 7728.797 | +Transformer | epoch 0 | step 29230 |avg loss 7.599 |avg tokens 2409.700 |tokens/s 8578.704 |walltime 7731.606 | +Transformer | epoch 0 | step 29240 |avg loss 7.608 |avg tokens 2244.900 |tokens/s 8165.797 |walltime 7734.355 | +Transformer | epoch 0 | step 29250 |avg loss 7.750 |avg tokens 2235.500 |tokens/s 8222.382 |walltime 7737.074 | +Transformer | epoch 0 | step 29260 |avg loss 7.449 |avg tokens 2393.600 |tokens/s 8701.722 |walltime 7739.824 | +Transformer | epoch 0 | step 29270 |avg loss 7.975 |avg tokens 2017.600 |tokens/s 7731.807 |walltime 7742.434 | +Transformer | epoch 0 | step 29280 |avg loss 7.340 |avg tokens 2052.100 |tokens/s 7706.774 |walltime 7745.097 | +Transformer | epoch 0 | step 29290 |avg loss 7.615 |avg tokens 2200.700 |tokens/s 8089.588 |walltime 7747.817 | +Transformer | epoch 0 | step 29300 |avg loss 8.066 |avg tokens 2323.200 |tokens/s 8612.597 |walltime 7750.514 | +Transformer | epoch 0 | step 29310 |avg loss 7.920 |avg tokens 2106.000 |tokens/s 7845.436 |walltime 7753.199 | 
+Transformer | epoch 0 | step 29320 |avg loss 7.848 |avg tokens 2126.500 |tokens/s 7915.654 |walltime 7755.885 | +Transformer | epoch 0 | step 29330 |avg loss 7.744 |avg tokens 2293.500 |tokens/s 8516.951 |walltime 7758.578 | +Transformer | epoch 0 | step 29340 |avg loss 7.780 |avg tokens 2416.900 |tokens/s 8649.066 |walltime 7761.373 | +Transformer | epoch 0 | step 29350 |avg loss 7.678 |avg tokens 2281.300 |tokens/s 8271.006 |walltime 7764.131 | +Transformer | epoch 0 | step 29360 |avg loss 7.199 |avg tokens 2256.000 |tokens/s 8052.143 |walltime 7766.933 | +Transformer | epoch 0 | step 29370 |avg loss 7.622 |avg tokens 2331.100 |tokens/s 8673.767 |walltime 7769.620 | +Transformer | epoch 0 | step 29380 |avg loss 7.468 |avg tokens 2404.800 |tokens/s 8525.069 |walltime 7772.441 | +Transformer | epoch 0 | step 29390 |avg loss 7.910 |avg tokens 1804.500 |tokens/s 7497.035 |walltime 7774.848 | +Transformer | epoch 0 | step 29400 |avg loss 7.446 |avg tokens 2252.000 |tokens/s 8291.024 |walltime 7777.564 | +Transformer | epoch 0 | step 29410 |avg loss 7.871 |avg tokens 2241.300 |tokens/s 8544.438 |walltime 7780.187 | +Transformer | epoch 0 | step 29420 |avg loss 7.926 |avg tokens 1886.500 |tokens/s 7667.647 |walltime 7782.648 | +Transformer | epoch 0 | step 29430 |avg loss 8.033 |avg tokens 2235.700 |tokens/s 8682.360 |walltime 7785.223 | +Transformer | epoch 0 | step 29440 |avg loss 7.373 |avg tokens 2133.300 |tokens/s 8067.669 |walltime 7787.867 | +Transformer | epoch 0 | step 29450 |avg loss 7.909 |avg tokens 2239.200 |tokens/s 8487.282 |walltime 7790.505 | +Transformer | epoch 0 | step 29460 |avg loss 7.710 |avg tokens 1842.000 |tokens/s 7230.068 |walltime 7793.053 | +Transformer | epoch 0 | step 29470 |avg loss 8.054 |avg tokens 2125.100 |tokens/s 8117.512 |walltime 7795.671 | +Transformer | epoch 0 | step 29480 |avg loss 7.843 |avg tokens 2263.400 |tokens/s 8451.127 |walltime 7798.349 | +Transformer | epoch 0 | step 29490 |avg loss 7.627 |avg tokens 2289.900 
|tokens/s 8292.904 |walltime 7801.110 | +Transformer | epoch 0 | step 29500 |avg loss 7.791 |avg tokens 2248.000 |tokens/s 8358.061 |walltime 7803.800 | +Transformer | epoch 0 | step 29510 |avg loss 7.768 |avg tokens 2366.700 |tokens/s 8621.254 |walltime 7806.545 | +Transformer | epoch 0 | step 29520 |avg loss 7.519 |avg tokens 2137.500 |tokens/s 8047.207 |walltime 7809.201 | +Transformer | epoch 0 | step 29530 |avg loss 7.802 |avg tokens 2256.200 |tokens/s 8393.353 |walltime 7811.889 | +Transformer | epoch 0 | step 29540 |avg loss 8.095 |avg tokens 1840.900 |tokens/s 7497.878 |walltime 7814.345 | +Transformer | epoch 0 | step 29550 |avg loss 7.684 |avg tokens 2196.800 |tokens/s 8013.160 |walltime 7817.086 | +Transformer | epoch 0 | step 29560 |avg loss 8.076 |avg tokens 2110.200 |tokens/s 7760.274 |walltime 7819.805 | +Transformer | epoch 0 | step 29570 |avg loss 7.795 |avg tokens 2284.200 |tokens/s 8467.942 |walltime 7822.503 | +Transformer | epoch 0 | step 29580 |avg loss 7.954 |avg tokens 2100.500 |tokens/s 8151.212 |walltime 7825.080 | +Transformer | epoch 0 | step 29590 |avg loss 7.443 |avg tokens 2226.400 |tokens/s 8276.252 |walltime 7827.770 | +Transformer | epoch 0 | step 29600 |avg loss 7.350 |avg tokens 2169.200 |tokens/s 8054.203 |walltime 7830.463 | +Transformer | epoch 0 | step 29610 |avg loss 7.789 |avg tokens 2249.400 |tokens/s 8326.006 |walltime 7833.165 | +Transformer | epoch 0 | step 29620 |avg loss 7.733 |avg tokens 1889.200 |tokens/s 7375.433 |walltime 7835.726 | +Transformer | epoch 0 | step 29630 |avg loss 8.057 |avg tokens 2230.500 |tokens/s 8577.198 |walltime 7838.327 | +Transformer | epoch 0 | step 29640 |avg loss 8.147 |avg tokens 2195.900 |tokens/s 8827.968 |walltime 7840.814 | +Transformer | epoch 0 | step 29650 |avg loss 7.590 |avg tokens 2270.100 |tokens/s 8526.491 |walltime 7843.477 | +Transformer | epoch 0 | step 29660 |avg loss 7.851 |avg tokens 2103.100 |tokens/s 7898.218 |walltime 7846.139 | +Transformer | epoch 0 | step 29670 
|avg loss 8.318 |avg tokens 2166.300 |tokens/s 8635.601 |walltime 7848.648 | +Transformer | epoch 0 | step 29680 |avg loss 7.785 |avg tokens 2112.000 |tokens/s 7856.725 |walltime 7851.336 | +Transformer | epoch 0 | step 29690 |avg loss 7.525 |avg tokens 2153.200 |tokens/s 8065.599 |walltime 7854.006 | +Transformer | epoch 0 | step 29700 |avg loss 7.600 |avg tokens 2183.200 |tokens/s 8208.006 |walltime 7856.665 | +Transformer | epoch 0 | step 29710 |avg loss 7.572 |avg tokens 2154.600 |tokens/s 8231.193 |walltime 7859.283 | +Transformer | epoch 0 | step 29720 |avg loss 8.191 |avg tokens 2024.000 |tokens/s 8031.192 |walltime 7861.803 | +Transformer | epoch 0 | step 29730 |avg loss 8.069 |avg tokens 2044.500 |tokens/s 7997.737 |walltime 7864.360 | +Transformer | epoch 0 | step 29740 |avg loss 7.601 |avg tokens 2316.800 |tokens/s 8338.754 |walltime 7867.138 | +Transformer | epoch 0 | step 29750 |avg loss 7.657 |avg tokens 2186.300 |tokens/s 8242.694 |walltime 7869.790 | +Transformer | epoch 0 | step 29760 |avg loss 7.777 |avg tokens 2104.600 |tokens/s 8044.652 |walltime 7872.407 | +Transformer | epoch 0 | step 29770 |avg loss 8.042 |avg tokens 2011.100 |tokens/s 7899.856 |walltime 7874.952 | +Transformer | epoch 0 | step 29780 |avg loss 7.718 |avg tokens 2110.200 |tokens/s 8048.736 |walltime 7877.574 | +Transformer | epoch 0 | step 29790 |avg loss 7.769 |avg tokens 2431.400 |tokens/s 8833.812 |walltime 7880.326 | +Transformer | epoch 0 | step 29800 |avg loss 7.328 |avg tokens 2276.800 |tokens/s 8244.307 |walltime 7883.088 | +Transformer | epoch 0 | step 29810 |avg loss 7.766 |avg tokens 2158.600 |tokens/s 8235.544 |walltime 7885.709 | +Transformer | epoch 0 | step 29820 |avg loss 7.807 |avg tokens 2091.800 |tokens/s 8005.521 |walltime 7888.322 | +Transformer | epoch 0 | step 29830 |avg loss 8.166 |avg tokens 1888.200 |tokens/s 7605.388 |walltime 7890.805 | +Transformer | epoch 0 | step 29840 |avg loss 7.716 |avg tokens 2305.400 |tokens/s 8671.023 |walltime 7893.464 | 
+Transformer | epoch 0 | step 29850 |avg loss 7.713 |avg tokens 2311.200 |tokens/s 8509.254 |walltime 7896.180 | +Transformer | epoch 0 | step 29860 |avg loss 7.786 |avg tokens 2061.500 |tokens/s 8028.708 |walltime 7898.747 | +Transformer | epoch 0 | step 29870 |avg loss 7.867 |avg tokens 2021.700 |tokens/s 7753.685 |walltime 7901.355 | +Transformer | epoch 0 | step 29880 |avg loss 8.307 |avg tokens 2134.000 |tokens/s 8736.321 |walltime 7903.797 | +Transformer | epoch 0 | step 29890 |avg loss 7.703 |avg tokens 2201.000 |tokens/s 8458.621 |walltime 7906.400 | +Transformer | epoch 0 | step 29900 |avg loss 7.296 |avg tokens 2298.000 |tokens/s 8295.558 |walltime 7909.170 | +Transformer | epoch 0 | step 29910 |avg loss 7.276 |avg tokens 2194.400 |tokens/s 8194.731 |walltime 7911.848 | +Transformer | epoch 0 | step 29920 |avg loss 7.656 |avg tokens 2270.500 |tokens/s 8268.000 |walltime 7914.594 | +Transformer | epoch 0 | step 29930 |avg loss 7.834 |avg tokens 2248.800 |tokens/s 8288.405 |walltime 7917.307 | +Transformer | epoch 0 | step 29940 |avg loss 7.583 |avg tokens 2087.300 |tokens/s 7947.638 |walltime 7919.933 | +Transformer | epoch 0 | step 29950 |avg loss 7.994 |avg tokens 2312.100 |tokens/s 8676.772 |walltime 7922.598 | +Transformer | epoch 0 | step 29960 |avg loss 7.664 |avg tokens 2096.000 |tokens/s 7871.919 |walltime 7925.261 | +Transformer | epoch 0 | step 29970 |avg loss 7.783 |avg tokens 2246.400 |tokens/s 8531.142 |walltime 7927.894 | +Transformer | epoch 0 | step 29980 |avg loss 7.702 |avg tokens 2336.400 |tokens/s 8602.276 |walltime 7930.610 | +Transformer | epoch 0 | step 29990 |avg loss 8.095 |avg tokens 2081.800 |tokens/s 8092.722 |walltime 7933.182 | +Transformer | epoch 0 | step 30000 |avg loss 7.607 |avg tokens 2149.400 |tokens/s 7994.261 |walltime 7935.871 | +Transformer | epoch 0 | step 30010 |avg loss 7.939 |avg tokens 2194.200 |tokens/s 8495.348 |walltime 7938.454 | +Transformer | epoch 0 | step 30020 |avg loss 7.834 |avg tokens 2042.200 
|tokens/s 8111.372 |walltime 7940.971 | +Transformer | epoch 0 | step 30030 |avg loss 8.022 |avg tokens 2229.100 |tokens/s 8475.615 |walltime 7943.601 | +Transformer | epoch 0 | step 30040 |avg loss 7.827 |avg tokens 2330.000 |tokens/s 8693.677 |walltime 7946.282 | +Transformer | epoch 0 | step 30050 |avg loss 8.367 |avg tokens 1888.500 |tokens/s 7880.801 |walltime 7948.678 | +Transformer | epoch 0 | step 30060 |avg loss 7.779 |avg tokens 2149.500 |tokens/s 8006.738 |walltime 7951.362 | +Transformer | epoch 0 | step 30070 |avg loss 7.748 |avg tokens 2254.500 |tokens/s 8122.089 |walltime 7954.138 | +Transformer | epoch 0 | step 30080 |avg loss 7.592 |avg tokens 2171.200 |tokens/s 8034.436 |walltime 7956.841 | +Transformer | epoch 0 | step 30090 |avg loss 7.624 |avg tokens 2253.600 |tokens/s 8421.225 |walltime 7959.517 | +Transformer | epoch 0 | step 30100 |avg loss 7.842 |avg tokens 2071.600 |tokens/s 8058.257 |walltime 7962.087 | +Transformer | epoch 0 | step 30110 |avg loss 7.508 |avg tokens 2190.900 |tokens/s 8105.552 |walltime 7964.790 | +Transformer | epoch 0 | step 30120 |avg loss 7.550 |avg tokens 2406.400 |tokens/s 8733.679 |walltime 7967.546 | +Transformer | epoch 0 | step 30130 |avg loss 7.448 |avg tokens 2297.300 |tokens/s 8399.780 |walltime 7970.281 | +Transformer | epoch 0 | step 30140 |avg loss 7.567 |avg tokens 2344.800 |tokens/s 8551.574 |walltime 7973.023 | +Transformer | epoch 0 | step 30150 |avg loss 7.690 |avg tokens 2101.700 |tokens/s 8212.681 |walltime 7975.582 | +Transformer | epoch 0 | step 30160 |avg loss 7.746 |avg tokens 2312.000 |tokens/s 8727.986 |walltime 7978.231 | +Transformer | epoch 0 | step 30170 |avg loss 7.475 |avg tokens 2350.400 |tokens/s 8377.776 |walltime 7981.036 | +Transformer | epoch 0 | step 30180 |avg loss 7.723 |avg tokens 2134.600 |tokens/s 8123.780 |walltime 7983.664 | +Transformer | epoch 0 | step 30190 |avg loss 7.575 |avg tokens 2228.500 |tokens/s 8264.378 |walltime 7986.360 | +Transformer | epoch 0 | step 30200 
|avg loss 7.694 |avg tokens 2154.400 |tokens/s 8173.007 |walltime 7988.996 | +Transformer | epoch 0 | step 30210 |avg loss 7.577 |avg tokens 2320.900 |tokens/s 8670.247 |walltime 7991.673 | +Transformer | epoch 0 | step 30220 |avg loss 7.802 |avg tokens 2375.800 |tokens/s 8824.962 |walltime 7994.365 | +Transformer | epoch 0 | step 30230 |avg loss 7.955 |avg tokens 2142.500 |tokens/s 8287.302 |walltime 7996.951 | +Transformer | epoch 0 | step 30240 |avg loss 7.903 |avg tokens 1951.400 |tokens/s 7646.806 |walltime 7999.503 | +Transformer | epoch 0 | step 30250 |avg loss 7.770 |avg tokens 2194.200 |tokens/s 8257.962 |walltime 8002.160 | +Transformer | epoch 0 | step 30260 |avg loss 7.906 |avg tokens 2052.300 |tokens/s 8087.153 |walltime 8004.697 | +Transformer | epoch 0 | step 30270 |avg loss 7.756 |avg tokens 2224.200 |tokens/s 8359.421 |walltime 8007.358 | +Transformer | epoch 0 | step 30280 |avg loss 7.964 |avg tokens 2162.900 |tokens/s 8424.891 |walltime 8009.925 | +Transformer | epoch 0 | step 30290 |avg loss 7.885 |avg tokens 2286.400 |tokens/s 8775.406 |walltime 8012.531 | +Transformer | epoch 0 | step 30300 |avg loss 7.714 |avg tokens 2084.000 |tokens/s 7827.558 |walltime 8015.193 | +Transformer | epoch 0 | step 30310 |avg loss 7.669 |avg tokens 2197.600 |tokens/s 8332.432 |walltime 8017.831 | +Transformer | epoch 0 | step 30320 |avg loss 7.464 |avg tokens 2338.400 |tokens/s 8541.859 |walltime 8020.568 | +Transformer | epoch 0 | step 30330 |avg loss 7.847 |avg tokens 2280.000 |tokens/s 8589.115 |walltime 8023.223 | +Transformer | epoch 0 | step 30340 |avg loss 8.000 |avg tokens 1851.700 |tokens/s 7550.028 |walltime 8025.675 | +Transformer | epoch 0 | step 30350 |avg loss 7.641 |avg tokens 2248.100 |tokens/s 8247.136 |walltime 8028.401 | +Transformer | epoch 0 | step 30360 |avg loss 7.931 |avg tokens 2201.600 |tokens/s 8709.108 |walltime 8030.929 | +Transformer | epoch 0 | step 30370 |avg loss 7.804 |avg tokens 2235.500 |tokens/s 8548.425 |walltime 8033.544 | 
+Transformer | epoch 0 | step 30380 |avg loss 7.887 |avg tokens 2056.000 |tokens/s 7989.978 |walltime 8036.118 | +Transformer | epoch 0 | step 30390 |avg loss 7.952 |avg tokens 2392.300 |tokens/s 8977.600 |walltime 8038.782 | +Transformer | epoch 0 | step 30400 |avg loss 8.084 |avg tokens 1999.500 |tokens/s 7867.419 |walltime 8041.324 | +Transformer | epoch 0 | step 30410 |avg loss 7.828 |avg tokens 2281.000 |tokens/s 8753.717 |walltime 8043.930 | +Transformer | epoch 0 | step 30420 |avg loss 7.855 |avg tokens 2069.700 |tokens/s 7874.684 |walltime 8046.558 | +Transformer | epoch 0 | step 30430 |avg loss 7.801 |avg tokens 2207.900 |tokens/s 8303.528 |walltime 8049.217 | +Transformer | epoch 0 | step 30440 |avg loss 7.796 |avg tokens 2229.500 |tokens/s 8338.770 |walltime 8051.890 | +Transformer | epoch 0 | step 30450 |avg loss 7.491 |avg tokens 2265.600 |tokens/s 8195.333 |walltime 8054.655 | +Transformer | epoch 0 | step 30460 |avg loss 7.590 |avg tokens 2352.800 |tokens/s 8570.039 |walltime 8057.400 | +Transformer | epoch 0 | step 30470 |avg loss 7.887 |avg tokens 2154.000 |tokens/s 8253.215 |walltime 8060.010 | +Transformer | epoch 0 | step 30480 |avg loss 8.020 |avg tokens 2149.400 |tokens/s 8400.128 |walltime 8062.569 | +Transformer | epoch 0 | step 30490 |avg loss 7.666 |avg tokens 1959.300 |tokens/s 7808.827 |walltime 8065.078 | +Transformer | epoch 0 | step 30500 |avg loss 7.872 |avg tokens 2391.400 |tokens/s 8916.045 |walltime 8067.760 | +Transformer | epoch 0 | step 30510 |avg loss 7.857 |avg tokens 2181.100 |tokens/s 8184.859 |walltime 8070.425 | +Transformer | epoch 0 | step 30520 |avg loss 7.622 |avg tokens 2391.000 |tokens/s 8574.823 |walltime 8073.213 | +Transformer | epoch 0 | step 30530 |avg loss 7.659 |avg tokens 2339.200 |tokens/s 8808.748 |walltime 8075.869 | +Transformer | epoch 0 | step 30540 |avg loss 7.935 |avg tokens 1917.200 |tokens/s 7657.229 |walltime 8078.373 | +Transformer | epoch 0 | step 30550 |avg loss 7.571 |avg tokens 2279.200 
|tokens/s 8306.538 |walltime 8081.117 | +Transformer | epoch 0 | step 30560 |avg loss 8.121 |avg tokens 2171.700 |tokens/s 8520.589 |walltime 8083.665 | +Transformer | epoch 0 | step 30570 |avg loss 7.495 |avg tokens 2079.900 |tokens/s 8116.905 |walltime 8086.228 | +Transformer | epoch 0 | step 30580 |avg loss 7.395 |avg tokens 2287.800 |tokens/s 8265.898 |walltime 8088.996 | +Transformer | epoch 0 | step 30590 |avg loss 7.413 |avg tokens 2360.000 |tokens/s 8705.869 |walltime 8091.706 | +Transformer | epoch 0 | step 30600 |avg loss 7.859 |avg tokens 2126.200 |tokens/s 8103.407 |walltime 8094.330 | +Transformer | epoch 0 | step 30610 |avg loss 7.336 |avg tokens 2293.600 |tokens/s 8308.145 |walltime 8097.091 | +Transformer | epoch 0 | step 30620 |avg loss 7.976 |avg tokens 2037.600 |tokens/s 8072.129 |walltime 8099.615 | +Transformer | epoch 0 | step 30630 |avg loss 7.792 |avg tokens 2373.200 |tokens/s 8905.784 |walltime 8102.280 | +Transformer | epoch 0 | step 30640 |avg loss 7.914 |avg tokens 1882.800 |tokens/s 7524.447 |walltime 8104.782 | +Transformer | epoch 0 | step 30650 |avg loss 7.623 |avg tokens 2292.600 |tokens/s 8592.937 |walltime 8107.450 | +Transformer | epoch 0 | step 30660 |avg loss 7.567 |avg tokens 2283.000 |tokens/s 8521.668 |walltime 8110.129 | +Transformer | epoch 0 | step 30670 |avg loss 8.242 |avg tokens 2127.900 |tokens/s 8837.377 |walltime 8112.537 | +Transformer | epoch 0 | step 30680 |avg loss 7.610 |avg tokens 2330.700 |tokens/s 8677.285 |walltime 8115.223 | +Transformer | epoch 0 | step 30690 |avg loss 7.738 |avg tokens 2270.400 |tokens/s 8289.277 |walltime 8117.962 | +Transformer | epoch 0 | step 30700 |avg loss 8.089 |avg tokens 2148.200 |tokens/s 8363.095 |walltime 8120.531 | +Transformer | epoch 0 | step 30710 |avg loss 7.441 |avg tokens 2253.000 |tokens/s 8302.415 |walltime 8123.244 | +Transformer | epoch 0 | step 30720 |avg loss 7.595 |avg tokens 2367.000 |tokens/s 8744.756 |walltime 8125.951 | +Transformer | epoch 0 | step 30730 
|avg loss 7.748 |avg tokens 2057.400 |tokens/s 8028.646 |walltime 8128.514 | +Transformer | epoch 0 | step 30740 |avg loss 8.059 |avg tokens 2052.900 |tokens/s 8048.092 |walltime 8131.065 | +Transformer | epoch 0 | step 30750 |avg loss 7.555 |avg tokens 2402.300 |tokens/s 8859.618 |walltime 8133.776 | +Transformer | epoch 0 | step 30760 |avg loss 7.791 |avg tokens 2117.100 |tokens/s 8075.631 |walltime 8136.398 | +Transformer | epoch 0 | step 30770 |avg loss 8.077 |avg tokens 2048.700 |tokens/s 8154.179 |walltime 8138.910 | +Transformer | epoch 0 | step 30780 |avg loss 7.940 |avg tokens 2283.600 |tokens/s 8551.982 |walltime 8141.580 | +Transformer | epoch 0 | step 30790 |avg loss 8.122 |avg tokens 2214.400 |tokens/s 8791.695 |walltime 8144.099 | +Transformer | epoch 0 | step 30800 |avg loss 8.034 |avg tokens 2040.400 |tokens/s 8054.130 |walltime 8146.633 | +Transformer | epoch 0 | step 30810 |avg loss 7.459 |avg tokens 2335.200 |tokens/s 8340.761 |walltime 8149.432 | +Transformer | epoch 0 | step 30820 |avg loss 7.447 |avg tokens 2273.900 |tokens/s 8460.057 |walltime 8152.120 | +Transformer | epoch 0 | step 30830 |avg loss 7.766 |avg tokens 2156.400 |tokens/s 8200.496 |walltime 8154.750 | +Transformer | epoch 0 | step 30840 |avg loss 7.833 |avg tokens 2042.400 |tokens/s 7959.921 |walltime 8157.316 | +Transformer | epoch 0 | step 30850 |avg loss 7.930 |avg tokens 2145.900 |tokens/s 8396.817 |walltime 8159.871 | +Transformer | epoch 0 | step 30860 |avg loss 8.000 |avg tokens 2115.000 |tokens/s 7918.112 |walltime 8162.542 | +Transformer | epoch 0 | step 30870 |avg loss 7.593 |avg tokens 2180.400 |tokens/s 8069.346 |walltime 8165.244 | +Transformer | epoch 0 | step 30880 |avg loss 8.167 |avg tokens 1994.500 |tokens/s 8200.226 |walltime 8167.677 | +Transformer | epoch 0 | step 30890 |avg loss 7.810 |avg tokens 2254.500 |tokens/s 8787.324 |walltime 8170.242 | +Transformer | epoch 0 | step 30900 |avg loss 7.831 |avg tokens 1925.600 |tokens/s 7952.182 |walltime 8172.664 | 
+Transformer | epoch 0 | step 30910 |avg loss 7.728 |avg tokens 2167.600 |tokens/s 8228.685 |walltime 8175.298 | +Transformer | epoch 0 | step 30920 |avg loss 8.178 |avg tokens 2180.000 |tokens/s 8872.438 |walltime 8177.755 | +Transformer | epoch 0 | step 30930 |avg loss 7.648 |avg tokens 2230.600 |tokens/s 8185.192 |walltime 8180.480 | +Transformer | epoch 0 | step 30940 |avg loss 7.898 |avg tokens 2074.200 |tokens/s 8137.087 |walltime 8183.029 | +Transformer | epoch 0 | step 30950 |avg loss 7.833 |avg tokens 2178.800 |tokens/s 8037.213 |walltime 8185.740 | +Transformer | epoch 0 | step 30960 |avg loss 7.910 |avg tokens 2234.500 |tokens/s 8317.121 |walltime 8188.427 | +Transformer | epoch 0 | step 30970 |avg loss 7.845 |avg tokens 1966.800 |tokens/s 7824.760 |walltime 8190.940 | +Transformer | epoch 0 | step 30980 |avg loss 7.613 |avg tokens 2244.600 |tokens/s 8272.608 |walltime 8193.654 | +Transformer | epoch 0 | step 30990 |avg loss 7.848 |avg tokens 2220.900 |tokens/s 8542.828 |walltime 8196.253 | +Transformer | epoch 0 | step 31000 |avg loss 7.767 |avg tokens 2117.100 |tokens/s 8001.382 |walltime 8198.899 | +Transformer | epoch 0 | step 31010 |avg loss 7.668 |avg tokens 2169.700 |tokens/s 8237.279 |walltime 8201.533 | +Transformer | epoch 0 | step 31020 |avg loss 7.639 |avg tokens 2088.300 |tokens/s 8053.563 |walltime 8204.126 | +Transformer | epoch 0 | step 31030 |avg loss 7.544 |avg tokens 2208.000 |tokens/s 8224.369 |walltime 8206.811 | +Transformer | epoch 0 | step 31040 |avg loss 7.758 |avg tokens 2060.300 |tokens/s 7937.521 |walltime 8209.407 | +Transformer | epoch 0 | step 31050 |avg loss 7.695 |avg tokens 1975.500 |tokens/s 7676.014 |walltime 8211.980 | +Transformer | epoch 0 | step 31060 |avg loss 8.090 |avg tokens 1851.300 |tokens/s 7447.650 |walltime 8214.466 | +Transformer | epoch 0 | step 31070 |avg loss 7.340 |avg tokens 2147.200 |tokens/s 7979.953 |walltime 8217.157 | +Transformer | epoch 0 | step 31080 |avg loss 7.716 |avg tokens 2158.900 
|tokens/s 8021.447 |walltime 8219.848 | +Transformer | epoch 0 | step 31090 |avg loss 7.621 |avg tokens 2177.800 |tokens/s 8120.589 |walltime 8222.530 | +Transformer | epoch 0 | step 31100 |avg loss 7.752 |avg tokens 2297.100 |tokens/s 8584.922 |walltime 8225.206 | +Transformer | epoch 0 | step 31110 |avg loss 7.956 |avg tokens 2275.400 |tokens/s 8633.810 |walltime 8227.841 | +Transformer | epoch 0 | step 31120 |avg loss 7.684 |avg tokens 2225.000 |tokens/s 8467.634 |walltime 8230.469 | +Transformer | epoch 0 | step 31130 |avg loss 7.476 |avg tokens 2195.200 |tokens/s 8119.893 |walltime 8233.172 | +Transformer | epoch 0 | step 31140 |avg loss 7.856 |avg tokens 2185.300 |tokens/s 8322.880 |walltime 8235.798 | +Transformer | epoch 0 | step 31150 |avg loss 7.537 |avg tokens 2340.600 |tokens/s 8593.205 |walltime 8238.522 | +Transformer | epoch 0 | step 31160 |avg loss 7.838 |avg tokens 2398.900 |tokens/s 8735.719 |walltime 8241.268 | +Transformer | epoch 0 | step 31170 |avg loss 8.032 |avg tokens 2181.000 |tokens/s 8441.869 |walltime 8243.851 | +Transformer | epoch 0 | step 31180 |avg loss 7.843 |avg tokens 2280.600 |tokens/s 8582.845 |walltime 8246.509 | +Transformer | epoch 0 | step 31190 |avg loss 7.593 |avg tokens 2180.200 |tokens/s 8107.997 |walltime 8249.197 | +Transformer | epoch 0 | step 31200 |avg loss 8.110 |avg tokens 2255.300 |tokens/s 8648.869 |walltime 8251.805 | +Transformer | epoch 0 | step 31210 |avg loss 7.426 |avg tokens 2102.900 |tokens/s 7883.741 |walltime 8254.473 | +Transformer | epoch 0 | step 31220 |avg loss 7.830 |avg tokens 2261.200 |tokens/s 8092.501 |walltime 8257.267 | +Transformer | epoch 0 | step 31230 |avg loss 8.166 |avg tokens 1880.500 |tokens/s 7565.893 |walltime 8259.752 | +Transformer | epoch 0 | step 31240 |avg loss 7.774 |avg tokens 2234.400 |tokens/s 8748.764 |walltime 8262.306 | +Transformer | epoch 0 | step 31250 |avg loss 7.446 |avg tokens 2386.500 |tokens/s 8522.987 |walltime 8265.106 | +Transformer | epoch 0 | step 31260 
|avg loss 8.136 |avg tokens 2050.100 |tokens/s 8229.074 |walltime 8267.598 | +Transformer | epoch 0 | step 31270 |avg loss 7.654 |avg tokens 2173.300 |tokens/s 8324.107 |walltime 8270.208 | +Transformer | epoch 0 | step 31280 |avg loss 7.687 |avg tokens 2120.100 |tokens/s 8113.692 |walltime 8272.821 | +Transformer | epoch 0 | step 31290 |avg loss 7.498 |avg tokens 2359.700 |tokens/s 8430.060 |walltime 8275.621 | +Transformer | epoch 0 | step 31300 |avg loss 7.756 |avg tokens 1992.600 |tokens/s 7813.766 |walltime 8278.171 | +Transformer | epoch 0 | step 31310 |avg loss 7.699 |avg tokens 2196.400 |tokens/s 8189.412 |walltime 8280.853 | +Transformer | epoch 0 | step 31320 |avg loss 7.576 |avg tokens 2093.000 |tokens/s 7961.464 |walltime 8283.482 | +Transformer | epoch 0 | step 31330 |avg loss 7.553 |avg tokens 2315.100 |tokens/s 8617.406 |walltime 8286.168 | +Transformer | epoch 0 | step 31340 |avg loss 7.712 |avg tokens 2098.100 |tokens/s 8176.593 |walltime 8288.734 | +Transformer | epoch 0 | step 31350 |avg loss 7.624 |avg tokens 2428.900 |tokens/s 8975.035 |walltime 8291.440 | +Transformer | epoch 0 | step 31360 |avg loss 8.436 |avg tokens 2084.700 |tokens/s 8539.471 |walltime 8293.882 | +Transformer | epoch 0 | step 31370 |avg loss 8.019 |avg tokens 2105.600 |tokens/s 8321.375 |walltime 8296.412 | +Transformer | epoch 0 | step 31380 |avg loss 7.503 |avg tokens 2245.500 |tokens/s 8322.984 |walltime 8299.110 | +Transformer | epoch 0 | step 31390 |avg loss 7.615 |avg tokens 2249.600 |tokens/s 8440.361 |walltime 8301.775 | +Transformer | epoch 0 | step 31400 |avg loss 7.738 |avg tokens 2241.400 |tokens/s 8394.911 |walltime 8304.445 | +Transformer | epoch 0 | step 31410 |avg loss 7.416 |avg tokens 2146.400 |tokens/s 8052.226 |walltime 8307.111 | +Transformer | epoch 0 | step 31420 |avg loss 7.570 |avg tokens 2218.100 |tokens/s 8320.987 |walltime 8309.777 | +Transformer | epoch 0 | step 31430 |avg loss 7.575 |avg tokens 2366.600 |tokens/s 8719.820 |walltime 8312.491 | 
+Transformer | epoch 0 | step 31440 |avg loss 8.033 |avg tokens 2160.800 |tokens/s 8738.271 |walltime 8314.963 | +Transformer | epoch 0 | step 31450 |avg loss 7.743 |avg tokens 2125.600 |tokens/s 8327.303 |walltime 8317.516 | +Transformer | epoch 0 | step 31460 |avg loss 7.752 |avg tokens 2025.100 |tokens/s 8211.882 |walltime 8319.982 | +Transformer | epoch 0 | step 31470 |avg loss 7.592 |avg tokens 2256.000 |tokens/s 8435.755 |walltime 8322.656 | +Transformer | epoch 0 | step 31480 |avg loss 7.795 |avg tokens 2155.800 |tokens/s 8182.873 |walltime 8325.291 | +Transformer | epoch 0 | step 31490 |avg loss 7.711 |avg tokens 2176.600 |tokens/s 8136.608 |walltime 8327.966 | +Transformer | epoch 0 | step 31500 |avg loss 7.902 |avg tokens 2135.900 |tokens/s 7849.906 |walltime 8330.687 | +Transformer | epoch 0 | step 31510 |avg loss 7.591 |avg tokens 2323.000 |tokens/s 8774.908 |walltime 8333.334 | +Transformer | epoch 0 | step 31520 |avg loss 8.161 |avg tokens 2020.900 |tokens/s 8197.004 |walltime 8335.800 | +Transformer | epoch 0 | step 31530 |avg loss 7.588 |avg tokens 2116.400 |tokens/s 8027.652 |walltime 8338.436 | +Transformer | epoch 0 | step 31540 |avg loss 7.812 |avg tokens 2290.100 |tokens/s 8675.050 |walltime 8341.076 | +Transformer | epoch 0 | step 31550 |avg loss 7.855 |avg tokens 2224.800 |tokens/s 8391.952 |walltime 8343.727 | +Transformer | epoch 0 | step 31560 |avg loss 7.275 |avg tokens 2147.800 |tokens/s 8139.270 |walltime 8346.366 | +Transformer | epoch 0 | step 31570 |avg loss 8.057 |avg tokens 2008.700 |tokens/s 8056.814 |walltime 8348.859 | +Transformer | epoch 0 | step 31580 |avg loss 7.758 |avg tokens 2183.200 |tokens/s 8153.636 |walltime 8351.537 | +Transformer | epoch 0 | step 31590 |avg loss 7.755 |avg tokens 2174.200 |tokens/s 8339.316 |walltime 8354.144 | +Transformer | epoch 0 | step 31600 |avg loss 7.585 |avg tokens 2388.300 |tokens/s 8592.828 |walltime 8356.923 | +Transformer | epoch 0 | step 31610 |avg loss 7.604 |avg tokens 2283.800 
|tokens/s 8553.930 |walltime 8359.593 | +Transformer | epoch 0 | step 31620 |avg loss 7.608 |avg tokens 2091.400 |tokens/s 7915.973 |walltime 8362.235 | +Transformer | epoch 0 | step 31630 |avg loss 7.545 |avg tokens 2216.200 |tokens/s 8351.014 |walltime 8364.889 | +Transformer | epoch 0 | step 31640 |avg loss 7.434 |avg tokens 2264.000 |tokens/s 8211.060 |walltime 8367.646 | +Transformer | epoch 0 | step 31650 |avg loss 7.845 |avg tokens 2122.000 |tokens/s 8255.482 |walltime 8370.217 | +Transformer | epoch 0 | step 31660 |avg loss 7.538 |avg tokens 2262.100 |tokens/s 8455.767 |walltime 8372.892 | +Transformer | epoch 0 | step 31670 |avg loss 7.523 |avg tokens 2181.300 |tokens/s 8269.311 |walltime 8375.530 | +Transformer | epoch 0 | step 31680 |avg loss 7.537 |avg tokens 2379.500 |tokens/s 8935.563 |walltime 8378.193 | +Transformer | epoch 0 | step 31690 |avg loss 8.343 |avg tokens 2250.200 |tokens/s 9054.804 |walltime 8380.678 | +Transformer | epoch 0 | step 31700 |avg loss 7.491 |avg tokens 2340.300 |tokens/s 8824.698 |walltime 8383.330 | +Transformer | epoch 0 | step 31710 |avg loss 7.470 |avg tokens 2239.000 |tokens/s 8344.683 |walltime 8386.013 | +Transformer | epoch 0 | step 31720 |avg loss 7.751 |avg tokens 2148.300 |tokens/s 8305.295 |walltime 8388.599 | +Transformer | epoch 0 | step 31730 |avg loss 7.623 |avg tokens 2119.100 |tokens/s 8254.896 |walltime 8391.167 | +Transformer | epoch 0 | step 31740 |avg loss 7.604 |avg tokens 2456.800 |tokens/s 8940.581 |walltime 8393.914 | +Transformer | epoch 0 | step 31750 |avg loss 7.638 |avg tokens 2220.800 |tokens/s 8286.272 |walltime 8396.595 | +Transformer | epoch 0 | step 31760 |avg loss 7.772 |avg tokens 2285.500 |tokens/s 8518.094 |walltime 8399.278 | +Transformer | epoch 0 | step 31770 |avg loss 7.712 |avg tokens 2141.500 |tokens/s 8291.534 |walltime 8401.860 | +Transformer | epoch 0 | step 31780 |avg loss 7.758 |avg tokens 2243.000 |tokens/s 8174.223 |walltime 8404.604 | +Transformer | epoch 0 | step 31790 
|avg loss 7.808 |avg tokens 2268.400 |tokens/s 8868.711 |walltime 8407.162 | +Transformer | epoch 0 | step 31800 |avg loss 7.641 |avg tokens 2056.700 |tokens/s 7952.332 |walltime 8409.748 | +Transformer | epoch 0 | step 31810 |avg loss 8.119 |avg tokens 2148.300 |tokens/s 8522.910 |walltime 8412.269 | +Transformer | epoch 0 | step 31820 |avg loss 7.612 |avg tokens 2310.700 |tokens/s 8574.180 |walltime 8414.964 | +Transformer | epoch 0 | step 31830 |avg loss 7.975 |avg tokens 2086.300 |tokens/s 8435.147 |walltime 8417.437 | +Transformer | epoch 0 | step 31840 |avg loss 7.717 |avg tokens 2163.200 |tokens/s 8282.867 |walltime 8420.049 | +Transformer | epoch 0 | step 31850 |avg loss 7.976 |avg tokens 1950.200 |tokens/s 7961.020 |walltime 8422.499 | +Transformer | epoch 0 | step 31860 |avg loss 7.694 |avg tokens 2192.800 |tokens/s 8456.816 |walltime 8425.092 | +Transformer | epoch 0 | step 31870 |avg loss 7.652 |avg tokens 2376.500 |tokens/s 8779.906 |walltime 8427.798 | +Transformer | epoch 0 | step 31880 |avg loss 7.737 |avg tokens 2050.200 |tokens/s 7960.008 |walltime 8430.374 | +Transformer | epoch 0 | step 31890 |avg loss 7.398 |avg tokens 2165.600 |tokens/s 7924.633 |walltime 8433.107 | +Transformer | epoch 0 | step 31900 |avg loss 7.787 |avg tokens 2226.300 |tokens/s 8190.497 |walltime 8435.825 | +Transformer | epoch 0 | step 31910 |avg loss 7.687 |avg tokens 2222.400 |tokens/s 8238.279 |walltime 8438.523 | +Transformer | epoch 0 | step 31920 |avg loss 7.451 |avg tokens 2321.400 |tokens/s 8297.998 |walltime 8441.320 | +Transformer | epoch 0 | step 31930 |avg loss 7.914 |avg tokens 2212.800 |tokens/s 8257.308 |walltime 8444.000 | +Transformer | epoch 0 | step 31940 |avg loss 7.449 |avg tokens 2160.800 |tokens/s 8106.189 |walltime 8446.666 | +Transformer | epoch 0 | step 31950 |avg loss 8.275 |avg tokens 1840.900 |tokens/s 7883.940 |walltime 8449.001 | +Transformer | epoch 0 | step 31960 |avg loss 7.691 |avg tokens 2150.200 |tokens/s 8174.900 |walltime 8451.631 | 
+Transformer | epoch 0 | step 31970 |avg loss 7.666 |avg tokens 2244.000 |tokens/s 8203.686 |walltime 8454.366 | +Transformer | epoch 0 | step 31980 |avg loss 7.512 |avg tokens 2181.700 |tokens/s 8217.050 |walltime 8457.021 | +Transformer | epoch 0 | step 31990 |avg loss 7.826 |avg tokens 2078.300 |tokens/s 7917.199 |walltime 8459.646 | +Transformer | epoch 0 | step 32000 |avg loss 7.967 |avg tokens 2153.400 |tokens/s 8064.432 |walltime 8462.317 | +Transformer | epoch 0 | step 32010 |avg loss 8.296 |avg tokens 2043.000 |tokens/s 8256.059 |walltime 8464.791 | +Transformer | epoch 0 | step 32020 |avg loss 7.671 |avg tokens 2196.900 |tokens/s 8199.601 |walltime 8467.470 | +Transformer | epoch 0 | step 32030 |avg loss 7.618 |avg tokens 2339.400 |tokens/s 8512.623 |walltime 8470.219 | +Transformer | epoch 0 | step 32040 |avg loss 7.870 |avg tokens 2043.500 |tokens/s 7882.977 |walltime 8472.811 | +Transformer | epoch 0 | step 32050 |avg loss 7.769 |avg tokens 2234.100 |tokens/s 8467.626 |walltime 8475.449 | +Transformer | epoch 0 | step 32060 |avg loss 8.134 |avg tokens 2014.400 |tokens/s 7944.597 |walltime 8477.985 | +Transformer | epoch 0 | step 32070 |avg loss 7.684 |avg tokens 2296.800 |tokens/s 8448.929 |walltime 8480.703 | +Transformer | epoch 0 | step 32080 |avg loss 7.791 |avg tokens 2199.200 |tokens/s 8197.516 |walltime 8483.386 | +Transformer | epoch 0 | step 32090 |avg loss 7.774 |avg tokens 2227.200 |tokens/s 8233.049 |walltime 8486.091 | +Transformer | epoch 0 | step 32100 |avg loss 7.775 |avg tokens 2240.000 |tokens/s 8462.817 |walltime 8488.738 | +Transformer | epoch 0 | step 32110 |avg loss 7.616 |avg tokens 2211.500 |tokens/s 8231.251 |walltime 8491.425 | +Transformer | epoch 0 | step 32120 |avg loss 7.992 |avg tokens 2142.200 |tokens/s 8679.277 |walltime 8493.893 | +Transformer | epoch 0 | step 32130 |avg loss 7.705 |avg tokens 2200.800 |tokens/s 8394.265 |walltime 8496.515 | +Transformer | epoch 0 | step 32140 |avg loss 8.024 |avg tokens 2228.100 
|tokens/s 8831.940 |walltime 8499.038 | +Transformer | epoch 0 | step 32150 |avg loss 7.545 |avg tokens 2240.800 |tokens/s 8094.505 |walltime 8501.806 | +Transformer | epoch 0 | step 32160 |avg loss 7.734 |avg tokens 2017.800 |tokens/s 7940.396 |walltime 8504.347 | +Transformer | epoch 0 | step 32170 |avg loss 7.361 |avg tokens 2309.700 |tokens/s 8376.828 |walltime 8507.104 | +Transformer | epoch 0 | step 32180 |avg loss 7.388 |avg tokens 2300.800 |tokens/s 8513.483 |walltime 8509.807 | +Transformer | epoch 0 | step 32190 |avg loss 7.870 |avg tokens 2076.100 |tokens/s 8165.081 |walltime 8512.350 | +Transformer | epoch 0 | step 32200 |avg loss 6.672 |avg tokens 2237.000 |tokens/s 8247.098 |walltime 8515.062 | +Transformer | epoch 0 | step 32210 |avg loss 7.603 |avg tokens 2157.600 |tokens/s 8179.618 |walltime 8517.700 | +Transformer | epoch 0 | step 32220 |avg loss 7.441 |avg tokens 2215.500 |tokens/s 8279.053 |walltime 8520.376 | +Transformer | epoch 0 | step 32230 |avg loss 7.737 |avg tokens 2072.500 |tokens/s 8092.680 |walltime 8522.937 | +Transformer | epoch 0 | step 32240 |avg loss 7.818 |avg tokens 2282.200 |tokens/s 8713.763 |walltime 8525.556 | +Transformer | epoch 0 | step 32250 |avg loss 8.058 |avg tokens 1957.900 |tokens/s 7963.667 |walltime 8528.014 | +Transformer | epoch 0 | step 32260 |avg loss 7.802 |avg tokens 2200.800 |tokens/s 8274.552 |walltime 8530.674 | +Transformer | epoch 0 | step 32270 |avg loss 7.821 |avg tokens 2290.900 |tokens/s 8682.080 |walltime 8533.313 | +Transformer | epoch 0 | step 32280 |avg loss 7.508 |avg tokens 2225.600 |tokens/s 8391.778 |walltime 8535.965 | +Transformer | epoch 0 | step 32290 |avg loss 7.738 |avg tokens 2099.400 |tokens/s 7948.161 |walltime 8538.606 | +Transformer | epoch 0 | step 32300 |avg loss 7.813 |avg tokens 1923.200 |tokens/s 8108.434 |walltime 8540.978 | +Transformer | epoch 0 | step 32310 |avg loss 7.512 |avg tokens 2255.100 |tokens/s 8194.546 |walltime 8543.730 | +Transformer | epoch 0 | step 32320 
|avg loss 7.819 |avg tokens 2183.900 |tokens/s 8476.431 |walltime 8546.307 | +Transformer | epoch 0 | step 32330 |avg loss 8.069 |avg tokens 2252.400 |tokens/s 8706.835 |walltime 8548.894 | +Transformer | epoch 0 | step 32340 |avg loss 7.827 |avg tokens 1962.000 |tokens/s 7664.367 |walltime 8551.453 | +Transformer | epoch 0 | step 32350 |avg loss 8.193 |avg tokens 1970.600 |tokens/s 7921.493 |walltime 8553.941 | +Transformer | epoch 0 | step 32360 |avg loss 7.588 |avg tokens 2079.500 |tokens/s 7947.812 |walltime 8556.558 | +Transformer | epoch 0 | step 32370 |avg loss 7.814 |avg tokens 2122.800 |tokens/s 8196.431 |walltime 8559.147 | +Transformer | epoch 0 | step 32380 |avg loss 7.690 |avg tokens 2325.800 |tokens/s 8726.529 |walltime 8561.813 | +Transformer | epoch 0 | step 32390 |avg loss 8.089 |avg tokens 1706.400 |tokens/s 7356.627 |walltime 8564.132 | +Transformer | epoch 0 | step 32400 |avg loss 7.626 |avg tokens 2171.000 |tokens/s 8224.866 |walltime 8566.772 | +Transformer | epoch 0 | step 32410 |avg loss 7.908 |avg tokens 2300.000 |tokens/s 8755.793 |walltime 8569.399 | +Transformer | epoch 0 | step 32420 |avg loss 7.205 |avg tokens 2334.400 |tokens/s 8434.021 |walltime 8572.166 | +Transformer | epoch 0 | step 32430 |avg loss 7.843 |avg tokens 1887.600 |tokens/s 7525.386 |walltime 8574.675 | +Transformer | epoch 0 | step 32440 |avg loss 8.155 |avg tokens 2153.400 |tokens/s 8699.025 |walltime 8577.150 | +Transformer | epoch 0 | step 32450 |avg loss 7.484 |avg tokens 2241.000 |tokens/s 8500.124 |walltime 8579.787 | +Transformer | epoch 0 | step 32460 |avg loss 7.876 |avg tokens 2022.400 |tokens/s 7992.622 |walltime 8582.317 | +Transformer | epoch 0 | step 32470 |avg loss 7.729 |avg tokens 2241.600 |tokens/s 8400.075 |walltime 8584.986 | +Transformer | epoch 0 | step 32480 |avg loss 7.802 |avg tokens 2190.500 |tokens/s 8392.301 |walltime 8587.596 | +Transformer | epoch 0 | step 32490 |avg loss 7.367 |avg tokens 2319.800 |tokens/s 8382.326 |walltime 8590.363 | 
+Transformer | epoch 0 | step 32500 |avg loss 7.826 |avg tokens 2344.700 |tokens/s 8582.961 |walltime 8593.095 | +Transformer | epoch 0 | step 32510 |avg loss 7.933 |avg tokens 2178.400 |tokens/s 8226.225 |walltime 8595.743 | +Transformer | epoch 0 | step 32520 |avg loss 7.944 |avg tokens 2113.000 |tokens/s 8110.343 |walltime 8598.348 | +Transformer | epoch 0 | step 32530 |avg loss 7.962 |avg tokens 2003.500 |tokens/s 7541.277 |walltime 8601.005 | +Transformer | epoch 0 | step 32540 |avg loss 7.644 |avg tokens 2240.300 |tokens/s 8395.875 |walltime 8603.673 | +Transformer | epoch 0 | step 32550 |avg loss 8.028 |avg tokens 1865.400 |tokens/s 7591.706 |walltime 8606.131 | +Transformer | epoch 0 | step 32560 |avg loss 8.029 |avg tokens 2108.600 |tokens/s 8372.626 |walltime 8608.649 | +Transformer | epoch 0 | step 32570 |avg loss 7.528 |avg tokens 2123.200 |tokens/s 8053.513 |walltime 8611.285 | +Transformer | epoch 0 | step 32580 |avg loss 7.758 |avg tokens 2238.000 |tokens/s 8477.925 |walltime 8613.925 | +Transformer | epoch 0 | step 32590 |avg loss 7.638 |avg tokens 2159.100 |tokens/s 8292.075 |walltime 8616.529 | +Transformer | epoch 0 | step 32600 |avg loss 8.078 |avg tokens 2079.000 |tokens/s 8498.808 |walltime 8618.975 | +Transformer | epoch 0 | step 32610 |avg loss 8.115 |avg tokens 2101.900 |tokens/s 8345.357 |walltime 8621.494 | +Transformer | epoch 0 | step 32620 |avg loss 7.290 |avg tokens 2273.600 |tokens/s 8147.858 |walltime 8624.284 | +Transformer | epoch 0 | step 32630 |avg loss 7.708 |avg tokens 2183.800 |tokens/s 7980.206 |walltime 8627.021 | +Transformer | epoch 0 | step 32640 |avg loss 7.743 |avg tokens 2276.800 |tokens/s 8433.548 |walltime 8629.721 | +Transformer | epoch 0 | step 32650 |avg loss 7.694 |avg tokens 2225.800 |tokens/s 8362.853 |walltime 8632.382 | +Transformer | epoch 0 | step 32660 |avg loss 7.554 |avg tokens 2147.200 |tokens/s 8128.828 |walltime 8635.024 | +Transformer | epoch 0 | step 32670 |avg loss 7.647 |avg tokens 2381.900 
|tokens/s 8662.591 |walltime 8637.773 | +Transformer | epoch 0 | step 32680 |avg loss 7.675 |avg tokens 2039.200 |tokens/s 7910.402 |walltime 8640.351 | +Transformer | epoch 0 | step 32690 |avg loss 7.721 |avg tokens 2419.800 |tokens/s 8990.289 |walltime 8643.043 | +Transformer | epoch 0 | step 32700 |avg loss 7.795 |avg tokens 2276.700 |tokens/s 8465.827 |walltime 8645.732 | +Transformer | epoch 0 | step 32710 |avg loss 7.964 |avg tokens 2119.000 |tokens/s 8405.038 |walltime 8648.253 | +Transformer | epoch 0 | step 32720 |avg loss 7.720 |avg tokens 2316.900 |tokens/s 9080.054 |walltime 8650.805 | +Transformer | epoch 0 | step 32730 |avg loss 8.172 |avg tokens 2214.300 |tokens/s 8629.672 |walltime 8653.371 | +Transformer | epoch 0 | step 32740 |avg loss 7.329 |avg tokens 2197.700 |tokens/s 8100.316 |walltime 8656.084 | +Transformer | epoch 0 | step 32750 |avg loss 7.636 |avg tokens 2104.500 |tokens/s 7954.414 |walltime 8658.729 | +Transformer | epoch 0 | step 32760 |avg loss 7.642 |avg tokens 2160.500 |tokens/s 8379.921 |walltime 8661.308 | +Transformer | epoch 0 | step 32770 |avg loss 7.730 |avg tokens 2189.300 |tokens/s 8193.551 |walltime 8663.980 | +Transformer | epoch 0 | step 32780 |avg loss 7.524 |avg tokens 2234.900 |tokens/s 8291.553 |walltime 8666.675 | +Transformer | epoch 0 | step 32790 |avg loss 7.588 |avg tokens 2093.500 |tokens/s 7912.355 |walltime 8669.321 | +Transformer | epoch 0 | step 32800 |avg loss 7.885 |avg tokens 2360.200 |tokens/s 8800.779 |walltime 8672.003 | +Transformer | epoch 0 | step 32810 |avg loss 7.693 |avg tokens 2017.000 |tokens/s 7960.061 |walltime 8674.537 | +Transformer | epoch 0 | step 32820 |avg loss 7.842 |avg tokens 2221.600 |tokens/s 8264.781 |walltime 8677.225 | +Transformer | epoch 0 | step 32830 |avg loss 7.577 |avg tokens 2149.700 |tokens/s 8290.400 |walltime 8679.818 | +Transformer | epoch 0 | step 32840 |avg loss 7.760 |avg tokens 2109.700 |tokens/s 8047.635 |walltime 8682.439 | +Transformer | epoch 0 | step 32850 
|avg loss 7.545 |avg tokens 2403.600 |tokens/s 8742.610 |walltime 8685.188 | +Transformer | epoch 0 | step 32860 |avg loss 7.980 |avg tokens 1869.900 |tokens/s 7483.513 |walltime 8687.687 | +Transformer | epoch 0 | step 32870 |avg loss 7.593 |avg tokens 2208.800 |tokens/s 8393.121 |walltime 8690.319 | +Transformer | epoch 0 | step 32880 |avg loss 7.331 |avg tokens 2230.400 |tokens/s 8124.335 |walltime 8693.064 | +Transformer | epoch 0 | step 32890 |avg loss 7.737 |avg tokens 2224.300 |tokens/s 8468.115 |walltime 8695.691 | +Transformer | epoch 0 | step 32900 |avg loss 7.512 |avg tokens 2270.000 |tokens/s 8348.763 |walltime 8698.410 | +Transformer | epoch 0 | step 32910 |avg loss 7.413 |avg tokens 2318.300 |tokens/s 8285.888 |walltime 8701.208 | +Transformer | epoch 0 | step 32920 |avg loss 7.734 |avg tokens 2057.300 |tokens/s 8126.820 |walltime 8703.739 | +Transformer | epoch 0 | step 32930 |avg loss 7.351 |avg tokens 2223.500 |tokens/s 8192.880 |walltime 8706.453 | +Transformer | epoch 0 | step 32940 |avg loss 8.092 |avg tokens 2131.100 |tokens/s 8324.327 |walltime 8709.013 | +Transformer | epoch 0 | step 32950 |avg loss 7.879 |avg tokens 2226.900 |tokens/s 8519.013 |walltime 8711.627 | +Transformer | epoch 0 | step 32960 |avg loss 7.558 |avg tokens 2206.500 |tokens/s 8160.154 |walltime 8714.331 | +Transformer | epoch 0 | step 32970 |avg loss 7.567 |avg tokens 2234.200 |tokens/s 8303.905 |walltime 8717.022 | +Transformer | epoch 0 | step 32980 |avg loss 8.025 |avg tokens 2298.100 |tokens/s 8840.785 |walltime 8719.621 | +Transformer | epoch 0 | step 32990 |avg loss 7.831 |avg tokens 2065.800 |tokens/s 8047.246 |walltime 8722.188 | +Transformer | epoch 0 | step 33000 |avg loss 8.038 |avg tokens 2230.400 |tokens/s 8842.073 |walltime 8724.711 | +Transformer | epoch 0 | step 33010 |avg loss 7.688 |avg tokens 2139.200 |tokens/s 8058.337 |walltime 8727.365 | +Transformer | epoch 0 | step 33020 |avg loss 7.374 |avg tokens 2327.900 |tokens/s 8371.408 |walltime 8730.146 | 
+Transformer | epoch 0 | step 33030 |avg loss 7.400 |avg tokens 2351.700 |tokens/s 8457.324 |walltime 8732.927 | +Transformer | epoch 0 | step 33040 |avg loss 7.681 |avg tokens 2228.400 |tokens/s 8205.688 |walltime 8735.643 | +Transformer | epoch 0 | step 33050 |avg loss 7.718 |avg tokens 2154.400 |tokens/s 8222.005 |walltime 8738.263 | +Transformer | epoch 0 | step 33060 |avg loss 7.907 |avg tokens 2230.800 |tokens/s 8759.049 |walltime 8740.810 | +Transformer | epoch 0 | step 33070 |avg loss 7.934 |avg tokens 1781.200 |tokens/s 7330.142 |walltime 8743.240 | +Transformer | epoch 0 | step 33080 |avg loss 7.588 |avg tokens 2308.800 |tokens/s 8373.067 |walltime 8745.997 | +Transformer | epoch 0 | step 33090 |avg loss 7.692 |avg tokens 2270.300 |tokens/s 8393.168 |walltime 8748.702 | +Transformer | epoch 0 | step 33100 |avg loss 7.815 |avg tokens 2124.800 |tokens/s 8307.756 |walltime 8751.260 | +Transformer | epoch 0 | step 33110 |avg loss 7.852 |avg tokens 2145.600 |tokens/s 8246.946 |walltime 8753.861 | +Transformer | epoch 0 | step 33120 |avg loss 7.707 |avg tokens 2145.900 |tokens/s 8276.795 |walltime 8756.454 | +Transformer | epoch 0 | step 33130 |avg loss 8.164 |avg tokens 2078.200 |tokens/s 8523.272 |walltime 8758.892 | +Transformer | epoch 0 | step 33140 |avg loss 7.687 |avg tokens 2102.700 |tokens/s 8034.206 |walltime 8761.510 | +Transformer | epoch 0 | step 33150 |avg loss 7.932 |avg tokens 2352.500 |tokens/s 9030.920 |walltime 8764.114 | +Transformer | epoch 0 | step 33160 |avg loss 7.366 |avg tokens 2321.500 |tokens/s 8531.371 |walltime 8766.836 | +Transformer | epoch 0 | step 33170 |avg loss 7.358 |avg tokens 2299.200 |tokens/s 8448.385 |walltime 8769.557 | +Transformer | epoch 0 | step 33180 |avg loss 8.122 |avg tokens 2189.700 |tokens/s 8501.281 |walltime 8772.133 | +Transformer | epoch 0 | step 33190 |avg loss 7.693 |avg tokens 2224.600 |tokens/s 8402.906 |walltime 8774.780 | +Transformer | epoch 0 | step 33200 |avg loss 7.834 |avg tokens 2254.400 
|tokens/s 8540.375 |walltime 8777.420 | +Transformer | epoch 0 | step 33210 |avg loss 7.944 |avg tokens 2223.200 |tokens/s 8367.675 |walltime 8780.077 | +Transformer | epoch 0 | step 33220 |avg loss 7.804 |avg tokens 2029.700 |tokens/s 7727.423 |walltime 8782.703 | +Transformer | epoch 0 | step 33230 |avg loss 7.845 |avg tokens 2323.400 |tokens/s 8673.788 |walltime 8785.382 | +Transformer | epoch 0 | step 33240 |avg loss 8.109 |avg tokens 1957.100 |tokens/s 8175.471 |walltime 8787.776 | +Transformer | epoch 0 | step 33250 |avg loss 8.067 |avg tokens 2358.700 |tokens/s 9113.388 |walltime 8790.364 | +Transformer | epoch 0 | step 33260 |avg loss 7.451 |avg tokens 2272.900 |tokens/s 8333.668 |walltime 8793.092 | +Transformer | epoch 0 | step 33270 |avg loss 7.931 |avg tokens 2346.500 |tokens/s 8599.420 |walltime 8795.820 | +Transformer | epoch 0 | step 33280 |avg loss 8.084 |avg tokens 2081.600 |tokens/s 8376.103 |walltime 8798.305 | +Transformer | epoch 0 | step 33290 |avg loss 7.983 |avg tokens 2323.400 |tokens/s 8765.333 |walltime 8800.956 | +Transformer | epoch 0 | step 33300 |avg loss 7.500 |avg tokens 2113.000 |tokens/s 8032.943 |walltime 8803.586 | +Transformer | epoch 0 | step 33310 |avg loss 7.705 |avg tokens 2132.400 |tokens/s 8200.975 |walltime 8806.187 | +Transformer | epoch 0 | step 33320 |avg loss 7.400 |avg tokens 2203.100 |tokens/s 8140.847 |walltime 8808.893 | +Transformer | epoch 0 | step 33330 |avg loss 7.491 |avg tokens 2087.700 |tokens/s 7941.168 |walltime 8811.522 | +Transformer | epoch 0 | step 33340 |avg loss 7.793 |avg tokens 2245.300 |tokens/s 8357.706 |walltime 8814.208 | +Transformer | epoch 0 | step 33350 |avg loss 7.548 |avg tokens 2212.800 |tokens/s 8152.430 |walltime 8816.923 | +Transformer | epoch 0 | step 33360 |avg loss 8.056 |avg tokens 2271.300 |tokens/s 8851.707 |walltime 8819.489 | +Transformer | epoch 0 | step 33370 |avg loss 7.609 |avg tokens 2290.300 |tokens/s 8270.308 |walltime 8822.258 | +Transformer | epoch 0 | step 33380 
|avg loss 7.568 |avg tokens 2328.800 |tokens/s 8426.282 |walltime 8825.022 | +Transformer | epoch 0 | step 33390 |avg loss 7.719 |avg tokens 2326.400 |tokens/s 8635.160 |walltime 8827.716 | +Transformer | epoch 0 | step 33400 |avg loss 7.866 |avg tokens 1968.800 |tokens/s 7694.157 |walltime 8830.275 | +Transformer | epoch 0 | step 33410 |avg loss 7.665 |avg tokens 2108.800 |tokens/s 7900.647 |walltime 8832.944 | +Transformer | epoch 0 | step 33420 |avg loss 7.744 |avg tokens 2106.600 |tokens/s 8129.419 |walltime 8835.535 | +Transformer | epoch 0 | step 33430 |avg loss 7.781 |avg tokens 2192.700 |tokens/s 8471.634 |walltime 8838.123 | +Transformer | epoch 0 | step 33440 |avg loss 7.210 |avg tokens 2232.600 |tokens/s 8047.190 |walltime 8840.898 | +Transformer | epoch 0 | step 33450 |avg loss 7.758 |avg tokens 2284.900 |tokens/s 8427.277 |walltime 8843.609 | +Transformer | epoch 0 | step 33460 |avg loss 7.498 |avg tokens 2004.900 |tokens/s 7976.179 |walltime 8846.123 | +Transformer | epoch 0 | step 33470 |avg loss 7.688 |avg tokens 2235.200 |tokens/s 8389.294 |walltime 8848.787 | +Transformer | epoch 0 | step 33480 |avg loss 7.703 |avg tokens 2350.500 |tokens/s 8575.331 |walltime 8851.528 | +Transformer | epoch 0 | step 33490 |avg loss 7.831 |avg tokens 2184.100 |tokens/s 8542.927 |walltime 8854.085 | +Transformer | epoch 0 | step 33500 |avg loss 7.661 |avg tokens 2175.200 |tokens/s 8085.510 |walltime 8856.775 | +Transformer | epoch 0 | step 33510 |avg loss 7.522 |avg tokens 2239.200 |tokens/s 8287.743 |walltime 8859.477 | +Transformer | epoch 0 | step 33520 |avg loss 7.721 |avg tokens 2342.400 |tokens/s 8493.953 |walltime 8862.234 | +Transformer | epoch 0 | step 33530 |avg loss 7.701 |avg tokens 2245.400 |tokens/s 8408.794 |walltime 8864.905 | +Transformer | epoch 0 | step 33540 |avg loss 7.397 |avg tokens 2282.400 |tokens/s 8229.643 |walltime 8867.678 | +Transformer | epoch 0 | step 33550 |avg loss 7.711 |avg tokens 2022.500 |tokens/s 7919.989 |walltime 8870.232 | 
+Transformer | epoch 0 | step 33560 |avg loss 7.573 |avg tokens 2391.000 |tokens/s 8624.710 |walltime 8873.004 | +Transformer | epoch 0 | step 33570 |avg loss 7.933 |avg tokens 2341.700 |tokens/s 9207.436 |walltime 8875.547 | +Transformer | epoch 0 | step 33580 |avg loss 7.688 |avg tokens 2245.000 |tokens/s 8440.069 |walltime 8878.207 | +Transformer | epoch 0 | step 33590 |avg loss 7.808 |avg tokens 2178.700 |tokens/s 8349.174 |walltime 8880.817 | +Transformer | epoch 0 | step 33600 |avg loss 7.947 |avg tokens 1964.800 |tokens/s 7777.459 |walltime 8883.343 | +Transformer | epoch 0 | step 33610 |avg loss 7.885 |avg tokens 2182.900 |tokens/s 8545.285 |walltime 8885.898 | +Transformer | epoch 0 | step 33620 |avg loss 8.024 |avg tokens 2157.600 |tokens/s 8330.297 |walltime 8888.488 | +Transformer | epoch 0 | step 33630 |avg loss 7.799 |avg tokens 2278.700 |tokens/s 8512.458 |walltime 8891.165 | +Transformer | epoch 0 | step 33640 |avg loss 7.883 |avg tokens 2127.900 |tokens/s 8231.945 |walltime 8893.749 | +Transformer | epoch 0 | step 33650 |avg loss 8.082 |avg tokens 2006.600 |tokens/s 8102.029 |walltime 8896.226 | +Transformer | epoch 0 | step 33660 |avg loss 7.813 |avg tokens 2259.600 |tokens/s 8596.052 |walltime 8898.855 | +Transformer | epoch 0 | step 33670 |avg loss 7.253 |avg tokens 2286.400 |tokens/s 8315.772 |walltime 8901.604 | +Transformer | epoch 0 | step 33680 |avg loss 8.188 |avg tokens 2221.900 |tokens/s 8598.388 |walltime 8904.188 | +Transformer | epoch 0 | step 33690 |avg loss 7.544 |avg tokens 2211.100 |tokens/s 8227.007 |walltime 8906.876 | +Transformer | epoch 0 | step 33700 |avg loss 7.320 |avg tokens 2235.100 |tokens/s 8151.470 |walltime 8909.618 | +Transformer | epoch 0 | step 33710 |avg loss 7.727 |avg tokens 2266.200 |tokens/s 8348.074 |walltime 8912.333 | +Transformer | epoch 0 | step 33720 |avg loss 7.653 |avg tokens 2249.600 |tokens/s 8469.294 |walltime 8914.989 | +Transformer | epoch 0 | step 33730 |avg loss 7.999 |avg tokens 1976.700 
|tokens/s 7877.556 |walltime 8917.498 | +Transformer | epoch 0 | step 33740 |avg loss 7.915 |avg tokens 2137.800 |tokens/s 8159.559 |walltime 8920.118 | +Transformer | epoch 0 | step 33750 |avg loss 7.804 |avg tokens 2018.400 |tokens/s 8034.557 |walltime 8922.630 | +Transformer | epoch 0 | step 33760 |avg loss 7.876 |avg tokens 1945.800 |tokens/s 7729.673 |walltime 8925.148 | +Transformer | epoch 0 | step 33770 |avg loss 8.166 |avg tokens 2214.400 |tokens/s 8510.401 |walltime 8927.750 | +Transformer | epoch 0 | step 33780 |avg loss 7.481 |avg tokens 2275.400 |tokens/s 8369.321 |walltime 8930.468 | +Transformer | epoch 0 | step 33790 |avg loss 7.704 |avg tokens 2158.000 |tokens/s 8281.328 |walltime 8933.074 | +Transformer | epoch 0 | step 33800 |avg loss 7.343 |avg tokens 2110.400 |tokens/s 8097.733 |walltime 8935.680 | +Transformer | epoch 0 | step 33810 |avg loss 7.722 |avg tokens 2130.800 |tokens/s 7965.446 |walltime 8938.355 | +Transformer | epoch 0 | step 33820 |avg loss 7.647 |avg tokens 2322.200 |tokens/s 8396.608 |walltime 8941.121 | +Transformer | epoch 0 | step 33830 |avg loss 7.566 |avg tokens 2260.100 |tokens/s 8452.689 |walltime 8943.795 | +Transformer | epoch 0 | step 33840 |avg loss 8.001 |avg tokens 2207.800 |tokens/s 8274.906 |walltime 8946.463 | +Transformer | epoch 0 | step 33850 |avg loss 7.513 |avg tokens 2093.300 |tokens/s 8024.654 |walltime 8949.072 | +Transformer | epoch 0 | step 33860 |avg loss 7.368 |avg tokens 2212.400 |tokens/s 8108.663 |walltime 8951.800 | +Transformer | epoch 0 | step 33870 |avg loss 7.939 |avg tokens 1961.600 |tokens/s 7706.922 |walltime 8954.345 | +Transformer | epoch 0 | step 33880 |avg loss 7.525 |avg tokens 2255.500 |tokens/s 8422.260 |walltime 8957.023 | +Transformer | epoch 0 | step 33890 |avg loss 7.587 |avg tokens 2332.000 |tokens/s 8689.361 |walltime 8959.707 | +Transformer | epoch 0 | step 33900 |avg loss 7.504 |avg tokens 2336.400 |tokens/s 8702.183 |walltime 8962.392 | +Transformer | epoch 0 | step 33910 
|avg loss 7.860 |avg tokens 2286.200 |tokens/s 8723.531 |walltime 8965.013 | +Transformer | epoch 0 | step 33920 |avg loss 7.326 |avg tokens 2256.800 |tokens/s 8245.033 |walltime 8967.750 | +Transformer | epoch 0 | step 33930 |avg loss 7.982 |avg tokens 2327.100 |tokens/s 8942.450 |walltime 8970.352 | +Transformer | epoch 0 | step 33940 |avg loss 7.297 |avg tokens 2352.800 |tokens/s 8670.711 |walltime 8973.066 | +Transformer | epoch 0 | step 33950 |avg loss 7.705 |avg tokens 2232.800 |tokens/s 8454.197 |walltime 8975.707 | +Transformer | epoch 0 | step 33960 |avg loss 7.594 |avg tokens 2255.900 |tokens/s 8454.855 |walltime 8978.375 | +Transformer | epoch 0 | step 33970 |avg loss 7.558 |avg tokens 2281.300 |tokens/s 8276.202 |walltime 8981.131 | +Transformer | epoch 0 | step 33980 |avg loss 7.881 |avg tokens 2204.000 |tokens/s 8211.139 |walltime 8983.815 | +Transformer | epoch 0 | step 33990 |avg loss 7.626 |avg tokens 2414.200 |tokens/s 8949.877 |walltime 8986.513 | +Transformer | epoch 0 | step 34000 |avg loss 7.574 |avg tokens 2273.800 |tokens/s 8328.985 |walltime 8989.243 | +Transformer | epoch 0 | step 34010 |avg loss 7.830 |avg tokens 2156.900 |tokens/s 8202.082 |walltime 8991.873 | +Transformer | epoch 0 | step 34020 |avg loss 7.522 |avg tokens 2230.900 |tokens/s 8159.307 |walltime 8994.607 | +Transformer | epoch 0 | step 34030 |avg loss 7.978 |avg tokens 2344.900 |tokens/s 8802.929 |walltime 8997.271 | +Transformer | epoch 0 | step 34040 |avg loss 7.250 |avg tokens 2362.400 |tokens/s 8436.203 |walltime 9000.071 | +Transformer | epoch 0 | step 34050 |avg loss 7.387 |avg tokens 2144.800 |tokens/s 8076.564 |walltime 9002.726 | +Transformer | epoch 0 | step 34060 |avg loss 7.895 |avg tokens 2214.600 |tokens/s 8458.125 |walltime 9005.345 | +Transformer | epoch 0 | step 34070 |avg loss 7.911 |avg tokens 1978.300 |tokens/s 7848.850 |walltime 9007.865 | +Transformer | epoch 0 | step 34080 |avg loss 7.656 |avg tokens 1813.800 |tokens/s 7247.004 |walltime 9010.368 | 
+Transformer | epoch 0 | step 34090 |avg loss 7.482 |avg tokens 2153.700 |tokens/s 7988.658 |walltime 9013.064 | +Transformer | epoch 0 | step 34100 |avg loss 7.758 |avg tokens 2197.500 |tokens/s 8254.551 |walltime 9015.726 | +Transformer | epoch 0 | step 34110 |avg loss 7.519 |avg tokens 2307.000 |tokens/s 8409.988 |walltime 9018.469 | +Transformer | epoch 0 | step 34120 |avg loss 7.829 |avg tokens 2008.300 |tokens/s 7787.438 |walltime 9021.048 | +Transformer | epoch 0 | step 34130 |avg loss 7.637 |avg tokens 2332.800 |tokens/s 8697.865 |walltime 9023.730 | +Transformer | epoch 0 | step 34140 |avg loss 7.643 |avg tokens 2152.000 |tokens/s 8142.889 |walltime 9026.373 | +Transformer | epoch 0 | step 34150 |avg loss 8.106 |avg tokens 2291.000 |tokens/s 8983.385 |walltime 9028.923 | +Transformer | epoch 0 | step 34160 |avg loss 8.042 |avg tokens 2221.300 |tokens/s 8691.157 |walltime 9031.479 | +Transformer | epoch 0 | step 34170 |avg loss 7.553 |avg tokens 2190.300 |tokens/s 8230.151 |walltime 9034.141 | +Transformer | epoch 0 | step 34180 |avg loss 7.720 |avg tokens 2329.600 |tokens/s 8735.679 |walltime 9036.807 | +Transformer | epoch 0 | step 34190 |avg loss 7.910 |avg tokens 2335.300 |tokens/s 8961.922 |walltime 9039.413 | +Transformer | epoch 0 | step 34200 |avg loss 7.723 |avg tokens 2262.400 |tokens/s 8582.786 |walltime 9042.049 | +Transformer | epoch 0 | step 34210 |avg loss 7.630 |avg tokens 2135.300 |tokens/s 8216.413 |walltime 9044.648 | +Transformer | epoch 0 | step 34220 |avg loss 7.828 |avg tokens 2208.900 |tokens/s 8418.459 |walltime 9047.272 | +Transformer | epoch 0 | step 34230 |avg loss 7.528 |avg tokens 2284.800 |tokens/s 8313.340 |walltime 9050.020 | +Transformer | epoch 0 | step 34240 |avg loss 7.771 |avg tokens 2008.700 |tokens/s 7731.900 |walltime 9052.618 | +Transformer | epoch 0 | step 34250 |avg loss 7.866 |avg tokens 2131.800 |tokens/s 8245.897 |walltime 9055.203 | +Transformer | epoch 0 | step 34260 |avg loss 7.686 |avg tokens 2090.700 
|tokens/s 8069.484 |walltime 9057.794 | +Transformer | epoch 0 | step 34270 |avg loss 7.758 |avg tokens 2104.200 |tokens/s 8159.078 |walltime 9060.373 | +Transformer | epoch 0 | step 34280 |avg loss 7.751 |avg tokens 2286.400 |tokens/s 8466.069 |walltime 9063.074 | +Transformer | epoch 0 | step 34290 |avg loss 7.331 |avg tokens 2026.000 |tokens/s 7805.669 |walltime 9065.669 | +Transformer | epoch 0 | step 34300 |avg loss 8.067 |avg tokens 1865.700 |tokens/s 7634.420 |walltime 9068.113 | +Transformer | epoch 0 | step 34310 |avg loss 7.886 |avg tokens 2100.700 |tokens/s 8282.298 |walltime 9070.650 | +Transformer | epoch 0 | step 34320 |avg loss 7.419 |avg tokens 2266.400 |tokens/s 8438.992 |walltime 9073.335 | +Transformer | epoch 0 | step 34330 |avg loss 7.933 |avg tokens 2030.400 |tokens/s 7963.599 |walltime 9075.885 | +Transformer | epoch 0 | step 34340 |avg loss 7.852 |avg tokens 2087.100 |tokens/s 8473.609 |walltime 9078.348 | +Transformer | epoch 0 | step 34350 |avg loss 7.580 |avg tokens 2309.200 |tokens/s 8726.135 |walltime 9080.994 | +Transformer | epoch 0 | step 34360 |avg loss 7.950 |avg tokens 2310.800 |tokens/s 8942.825 |walltime 9083.578 | +Transformer | epoch 0 | step 34370 |avg loss 7.508 |avg tokens 2192.100 |tokens/s 8399.912 |walltime 9086.188 | +Transformer | epoch 0 | step 34380 |avg loss 7.766 |avg tokens 2205.900 |tokens/s 8365.568 |walltime 9088.825 | +Transformer | epoch 0 | step 34390 |avg loss 7.764 |avg tokens 2211.500 |tokens/s 8422.310 |walltime 9091.451 | +Transformer | epoch 0 | step 34400 |avg loss 8.434 |avg tokens 2128.900 |tokens/s 8865.408 |walltime 9093.852 | +Transformer | epoch 0 | step 34410 |avg loss 8.133 |avg tokens 2117.800 |tokens/s 8408.752 |walltime 9096.370 | +Transformer | epoch 0 | step 34420 |avg loss 8.059 |avg tokens 1984.600 |tokens/s 8350.898 |walltime 9098.747 | +Transformer | epoch 0 | step 34430 |avg loss 7.711 |avg tokens 2011.400 |tokens/s 7760.269 |walltime 9101.339 | +Transformer | epoch 0 | step 34440 
|avg loss 7.795 |avg tokens 2069.400 |tokens/s 7890.551 |walltime 9103.962 | +Transformer | epoch 0 | step 34450 |avg loss 7.310 |avg tokens 2272.800 |tokens/s 8161.769 |walltime 9106.746 | +Transformer | epoch 0 | step 34460 |avg loss 7.369 |avg tokens 2352.000 |tokens/s 8390.201 |walltime 9109.549 | +Transformer | epoch 0 | step 34470 |avg loss 7.644 |avg tokens 2264.000 |tokens/s 8529.441 |walltime 9112.204 | +Transformer | epoch 0 | step 34480 |avg loss 7.981 |avg tokens 2305.100 |tokens/s 8750.605 |walltime 9114.838 | +Transformer | epoch 0 | step 34490 |avg loss 7.733 |avg tokens 2270.000 |tokens/s 8272.140 |walltime 9117.582 | +Transformer | epoch 0 | step 34500 |avg loss 7.567 |avg tokens 2080.800 |tokens/s 8016.559 |walltime 9120.178 | +Transformer | epoch 0 | step 34510 |avg loss 7.697 |avg tokens 2022.500 |tokens/s 7691.002 |walltime 9122.808 | +Transformer | epoch 0 | step 34520 |avg loss 7.722 |avg tokens 2061.300 |tokens/s 7947.713 |walltime 9125.401 | +Transformer | epoch 0 | step 34530 |avg loss 8.148 |avg tokens 1967.400 |tokens/s 8170.553 |walltime 9127.809 | +Transformer | epoch 0 | step 34540 |avg loss 7.805 |avg tokens 2114.300 |tokens/s 8291.684 |walltime 9130.359 | +Transformer | epoch 0 | step 34550 |avg loss 7.482 |avg tokens 2168.000 |tokens/s 7954.975 |walltime 9133.084 | +Transformer | epoch 0 | step 34560 |avg loss 7.840 |avg tokens 1958.300 |tokens/s 7556.047 |walltime 9135.676 | +Transformer | epoch 0 | step 34570 |avg loss 7.463 |avg tokens 2359.200 |tokens/s 8533.435 |walltime 9138.441 | +Transformer | epoch 0 | step 34580 |avg loss 8.043 |avg tokens 1969.800 |tokens/s 7865.967 |walltime 9140.945 | +Transformer | epoch 0 | step 34590 |avg loss 7.490 |avg tokens 2005.100 |tokens/s 7927.734 |walltime 9143.474 | +Transformer | epoch 0 | step 34600 |avg loss 7.790 |avg tokens 2096.800 |tokens/s 7937.928 |walltime 9146.116 | +Transformer | epoch 0 | step 34610 |avg loss 7.727 |avg tokens 2037.700 |tokens/s 7610.904 |walltime 9148.793 | 
+Transformer | epoch 0 | step 34620 |avg loss 7.896 |avg tokens 2091.800 |tokens/s 8349.764 |walltime 9151.298 | +Transformer | epoch 0 | step 34630 |avg loss 7.879 |avg tokens 2314.700 |tokens/s 8563.577 |walltime 9154.001 | +Transformer | epoch 0 | step 34640 |avg loss 7.719 |avg tokens 2215.900 |tokens/s 8738.041 |walltime 9156.537 | +Transformer | epoch 0 | step 34650 |avg loss 7.569 |avg tokens 2256.000 |tokens/s 8329.509 |walltime 9159.245 | +Transformer | epoch 0 | step 34660 |avg loss 7.688 |avg tokens 2157.300 |tokens/s 8095.476 |walltime 9161.910 | +Transformer | epoch 0 | step 34670 |avg loss 7.634 |avg tokens 2408.700 |tokens/s 8879.830 |walltime 9164.623 | +Transformer | epoch 0 | step 34680 |avg loss 7.799 |avg tokens 2330.500 |tokens/s 8694.006 |walltime 9167.303 | +Transformer | epoch 0 | step 34690 |avg loss 7.854 |avg tokens 2260.900 |tokens/s 8509.286 |walltime 9169.960 | +Transformer | epoch 0 | step 34700 |avg loss 7.619 |avg tokens 2159.500 |tokens/s 8098.937 |walltime 9172.627 | +Transformer | epoch 0 | step 34710 |avg loss 8.155 |avg tokens 2092.500 |tokens/s 8147.035 |walltime 9175.195 | +Transformer | epoch 0 | step 34720 |avg loss 7.558 |avg tokens 2446.200 |tokens/s 8979.265 |walltime 9177.920 | +Transformer | epoch 0 | step 34730 |avg loss 7.793 |avg tokens 2136.800 |tokens/s 8199.224 |walltime 9180.526 | +Transformer | epoch 0 | step 34740 |avg loss 7.429 |avg tokens 2321.000 |tokens/s 8761.488 |walltime 9183.175 | +Transformer | epoch 0 | step 34750 |avg loss 7.512 |avg tokens 2059.900 |tokens/s 7849.087 |walltime 9185.799 | +Transformer | epoch 0 | step 34760 |avg loss 7.221 |avg tokens 2316.800 |tokens/s 8559.453 |walltime 9188.506 | +Transformer | epoch 0 | step 34770 |avg loss 7.853 |avg tokens 2271.600 |tokens/s 8770.559 |walltime 9191.096 | +Transformer | epoch 0 | step 34780 |avg loss 7.342 |avg tokens 2232.800 |tokens/s 8325.667 |walltime 9193.778 | +Transformer | epoch 0 | step 34790 |avg loss 7.421 |avg tokens 2252.900 
|tokens/s 8327.372 |walltime 9196.483 | +Transformer | epoch 0 | step 34800 |avg loss 7.693 |avg tokens 2005.600 |tokens/s 7828.722 |walltime 9199.045 | +Transformer | epoch 0 | step 34810 |avg loss 7.549 |avg tokens 2324.000 |tokens/s 8509.403 |walltime 9201.776 | +Transformer | epoch 0 | step 34820 |avg loss 7.682 |avg tokens 2161.900 |tokens/s 7918.171 |walltime 9204.506 | +Transformer | epoch 0 | step 34830 |avg loss 7.893 |avg tokens 1943.700 |tokens/s 7640.109 |walltime 9207.050 | +Transformer | epoch 0 | step 34840 |avg loss 7.822 |avg tokens 2078.700 |tokens/s 8232.480 |walltime 9209.575 | +Transformer | epoch 0 | step 34850 |avg loss 8.015 |avg tokens 2136.200 |tokens/s 8321.468 |walltime 9212.143 | +Transformer | epoch 0 | step 34860 |avg loss 7.622 |avg tokens 2161.300 |tokens/s 8124.723 |walltime 9214.803 | +Transformer | epoch 0 | step 34870 |avg loss 7.450 |avg tokens 2189.600 |tokens/s 8121.542 |walltime 9217.499 | +Transformer | epoch 0 | step 34880 |avg loss 7.220 |avg tokens 2288.000 |tokens/s 8346.956 |walltime 9220.240 | +Transformer | epoch 0 | step 34890 |avg loss 7.888 |avg tokens 2315.500 |tokens/s 8622.046 |walltime 9222.925 | +Transformer | epoch 0 | step 34900 |avg loss 7.559 |avg tokens 2289.600 |tokens/s 8638.163 |walltime 9225.576 | +Transformer | epoch 0 | step 34910 |avg loss 8.026 |avg tokens 1805.400 |tokens/s 7508.513 |walltime 9227.980 | +Transformer | epoch 0 | step 34920 |avg loss 7.948 |avg tokens 2193.000 |tokens/s 8337.441 |walltime 9230.611 | +Transformer | epoch 0 | step 34930 |avg loss 7.689 |avg tokens 2057.400 |tokens/s 8152.116 |walltime 9233.135 | +Transformer | epoch 0 | step 34940 |avg loss 7.982 |avg tokens 2315.000 |tokens/s 8946.362 |walltime 9235.722 | +Transformer | epoch 0 | step 34950 |avg loss 7.512 |avg tokens 2442.400 |tokens/s 8874.172 |walltime 9238.474 | +Transformer | epoch 0 | step 34960 |avg loss 7.982 |avg tokens 2197.100 |tokens/s 8332.951 |walltime 9241.111 | +Transformer | epoch 0 | step 34970 
|avg loss 7.547 |avg tokens 2324.500 |tokens/s 8377.277 |walltime 9243.886 | +Transformer | epoch 0 | step 34980 |avg loss 7.713 |avg tokens 2206.000 |tokens/s 8409.223 |walltime 9246.509 | +Transformer | epoch 0 | step 34990 |avg loss 8.131 |avg tokens 2139.400 |tokens/s 8298.927 |walltime 9249.087 | +Transformer | epoch 0 | step 35000 |avg loss 7.603 |avg tokens 2067.700 |tokens/s 8062.356 |walltime 9251.652 | +Transformer | epoch 0 | step 35010 |avg loss 8.238 |avg tokens 2024.100 |tokens/s 8286.961 |walltime 9254.094 | +Transformer | epoch 0 | step 35020 |avg loss 7.740 |avg tokens 2254.500 |tokens/s 8438.706 |walltime 9256.766 | +Transformer | epoch 0 | step 35030 |avg loss 7.766 |avg tokens 2231.200 |tokens/s 8276.731 |walltime 9259.462 | +Transformer | epoch 0 | step 35040 |avg loss 7.933 |avg tokens 2143.700 |tokens/s 8289.011 |walltime 9262.048 | +Transformer | epoch 0 | step 35050 |avg loss 7.634 |avg tokens 2229.100 |tokens/s 8263.969 |walltime 9264.745 | +Transformer | epoch 0 | step 35060 |avg loss 7.700 |avg tokens 2188.100 |tokens/s 8029.596 |walltime 9267.470 | +Transformer | epoch 0 | step 35070 |avg loss 7.934 |avg tokens 2176.800 |tokens/s 8444.930 |walltime 9270.048 | +Transformer | epoch 0 | step 35080 |avg loss 7.897 |avg tokens 2177.500 |tokens/s 8463.198 |walltime 9272.621 | +Transformer | epoch 0 | step 35090 |avg loss 7.760 |avg tokens 2256.500 |tokens/s 8506.850 |walltime 9275.273 | +Transformer | epoch 0 | step 35100 |avg loss 7.762 |avg tokens 2060.800 |tokens/s 7765.440 |walltime 9277.927 | +Transformer | epoch 0 | step 35110 |avg loss 7.544 |avg tokens 2266.400 |tokens/s 8269.042 |walltime 9280.668 | +Transformer | epoch 0 | step 35120 |avg loss 8.078 |avg tokens 2088.800 |tokens/s 8276.413 |walltime 9283.192 | +Transformer | epoch 0 | step 35130 |avg loss 7.643 |avg tokens 2162.900 |tokens/s 8243.471 |walltime 9285.816 | +Transformer | epoch 0 | step 35140 |avg loss 7.897 |avg tokens 2161.000 |tokens/s 8219.005 |walltime 9288.445 | 
+Transformer | epoch 0 | step 35150 |avg loss 7.626 |avg tokens 2342.600 |tokens/s 8726.001 |walltime 9291.130 | +Transformer | epoch 0 | step 35160 |avg loss 7.776 |avg tokens 2317.700 |tokens/s 8713.664 |walltime 9293.789 | +Transformer | epoch 0 | step 35170 |avg loss 7.380 |avg tokens 1940.800 |tokens/s 7675.040 |walltime 9296.318 | +Transformer | epoch 0 | step 35180 |avg loss 7.898 |avg tokens 1896.000 |tokens/s 7612.833 |walltime 9298.809 | +Transformer | epoch 0 | step 35190 |avg loss 8.000 |avg tokens 1921.900 |tokens/s 7598.289 |walltime 9301.338 | +Transformer | epoch 0 | step 35200 |avg loss 7.510 |avg tokens 2197.600 |tokens/s 8240.601 |walltime 9304.005 | +Transformer | epoch 0 | step 35210 |avg loss 7.633 |avg tokens 2281.800 |tokens/s 8241.420 |walltime 9306.774 | +Transformer | epoch 0 | step 35220 |avg loss 7.674 |avg tokens 2199.000 |tokens/s 8542.396 |walltime 9309.348 | +Transformer | epoch 0 | step 35230 |avg loss 7.817 |avg tokens 2173.500 |tokens/s 8153.087 |walltime 9312.014 | +Transformer | epoch 0 | step 35240 |avg loss 8.095 |avg tokens 2025.000 |tokens/s 8194.337 |walltime 9314.485 | +Transformer | epoch 0 | step 35250 |avg loss 7.837 |avg tokens 2102.900 |tokens/s 8253.931 |walltime 9317.033 | +Transformer | epoch 0 | step 35260 |avg loss 7.891 |avg tokens 2202.700 |tokens/s 8608.973 |walltime 9319.591 | +Transformer | epoch 0 | step 35270 |avg loss 8.016 |avg tokens 2154.100 |tokens/s 8330.827 |walltime 9322.177 | +Transformer | epoch 0 | step 35280 |avg loss 7.618 |avg tokens 2241.600 |tokens/s 8237.498 |walltime 9324.898 | +Transformer | epoch 0 | step 35290 |avg loss 7.591 |avg tokens 2381.600 |tokens/s 8522.489 |walltime 9327.693 | +Transformer | epoch 0 | step 35300 |avg loss 8.109 |avg tokens 2158.200 |tokens/s 8371.693 |walltime 9330.271 | +Transformer | epoch 0 | step 35310 |avg loss 7.635 |avg tokens 2199.700 |tokens/s 8189.070 |walltime 9332.957 | +Transformer | epoch 0 | step 35320 |avg loss 8.158 |avg tokens 2433.800 
|tokens/s 9464.541 |walltime 9335.528 | +Transformer | epoch 0 | step 35330 |avg loss 7.836 |avg tokens 2103.400 |tokens/s 8113.367 |walltime 9338.121 | +Transformer | epoch 0 | step 35340 |avg loss 7.854 |avg tokens 2114.400 |tokens/s 7894.156 |walltime 9340.799 | +Transformer | epoch 0 | step 35350 |avg loss 7.611 |avg tokens 2194.200 |tokens/s 8458.961 |walltime 9343.393 | +Transformer | epoch 0 | step 35360 |avg loss 7.908 |avg tokens 2259.200 |tokens/s 8558.958 |walltime 9346.033 | +Transformer | epoch 0 | step 35370 |avg loss 7.881 |avg tokens 1781.300 |tokens/s 7296.434 |walltime 9348.474 | +Transformer | epoch 0 | step 35380 |avg loss 7.786 |avg tokens 2202.000 |tokens/s 8355.967 |walltime 9351.109 | +Transformer | epoch 0 | step 35390 |avg loss 7.778 |avg tokens 2130.600 |tokens/s 8002.652 |walltime 9353.772 | +Transformer | epoch 0 | step 35400 |avg loss 7.393 |avg tokens 2388.000 |tokens/s 8506.168 |walltime 9356.579 | +Transformer | epoch 0 | step 35410 |avg loss 8.031 |avg tokens 1849.200 |tokens/s 7223.578 |walltime 9359.139 | +Transformer | epoch 0 | step 35420 |avg loss 7.885 |avg tokens 2100.000 |tokens/s 8273.571 |walltime 9361.677 | +Transformer | epoch 0 | step 35430 |avg loss 7.566 |avg tokens 2382.700 |tokens/s 8806.722 |walltime 9364.383 | +Transformer | epoch 0 | step 35440 |avg loss 7.842 |avg tokens 1942.600 |tokens/s 8259.257 |walltime 9366.735 | +Transformer | epoch 0 | step 35450 |avg loss 7.900 |avg tokens 2117.000 |tokens/s 8036.743 |walltime 9369.369 | +Transformer | epoch 0 | step 35460 |avg loss 7.892 |avg tokens 2032.700 |tokens/s 7964.482 |walltime 9371.921 | +Transformer | epoch 0 | step 35470 |avg loss 7.671 |avg tokens 2399.200 |tokens/s 8800.740 |walltime 9374.647 | +Transformer | epoch 0 | step 35480 |avg loss 7.649 |avg tokens 2358.900 |tokens/s 8294.135 |walltime 9377.491 | +Transformer | epoch 0 | step 35490 |avg loss 7.726 |avg tokens 2109.000 |tokens/s 7938.932 |walltime 9380.148 | +Transformer | epoch 0 | step 35500 
|avg loss 7.709 |avg tokens 2051.100 |tokens/s 8024.974 |walltime 9382.704 | +Transformer | epoch 0 | step 35510 |avg loss 7.999 |avg tokens 1990.200 |tokens/s 7853.951 |walltime 9385.238 | +Transformer | epoch 0 | step 35520 |avg loss 8.130 |avg tokens 1854.600 |tokens/s 7439.141 |walltime 9387.731 | +Transformer | epoch 0 | step 35530 |avg loss 7.636 |avg tokens 2009.600 |tokens/s 7642.808 |walltime 9390.360 | +Transformer | epoch 0 | step 35540 |avg loss 7.509 |avg tokens 2145.000 |tokens/s 7965.568 |walltime 9393.053 | +Transformer | epoch 0 | step 35550 |avg loss 7.647 |avg tokens 2174.400 |tokens/s 8216.307 |walltime 9395.700 | +Transformer | epoch 0 | step 35560 |avg loss 7.645 |avg tokens 2369.600 |tokens/s 8696.142 |walltime 9398.424 | +Transformer | epoch 0 | step 35570 |avg loss 7.697 |avg tokens 2153.600 |tokens/s 8298.960 |walltime 9401.019 | +Transformer | epoch 0 | step 35580 |avg loss 7.541 |avg tokens 2080.900 |tokens/s 7905.113 |walltime 9403.652 | +Transformer | epoch 0 | step 35590 |avg loss 7.447 |avg tokens 2242.400 |tokens/s 8219.260 |walltime 9406.380 | +Transformer | epoch 0 | step 35600 |avg loss 7.464 |avg tokens 2254.900 |tokens/s 8114.252 |walltime 9409.159 | +Transformer | epoch 0 | step 35610 |avg loss 7.930 |avg tokens 2173.300 |tokens/s 8346.486 |walltime 9411.763 | +Transformer | epoch 0 | step 35620 |avg loss 7.789 |avg tokens 2130.900 |tokens/s 7985.168 |walltime 9414.431 | +Transformer | epoch 0 | step 35630 |avg loss 7.789 |avg tokens 2374.900 |tokens/s 8869.805 |walltime 9417.109 | +Transformer | epoch 0 | step 35640 |avg loss 7.674 |avg tokens 2354.300 |tokens/s 8758.829 |walltime 9419.797 | +Transformer | epoch 0 | step 35650 |avg loss 7.627 |avg tokens 2307.600 |tokens/s 8427.425 |walltime 9422.535 | +Transformer | epoch 0 | step 35660 |avg loss 7.702 |avg tokens 2172.300 |tokens/s 8279.393 |walltime 9425.159 | +Transformer | epoch 0 | step 35670 |avg loss 7.921 |avg tokens 1943.900 |tokens/s 7571.862 |walltime 9427.726 | 
+Transformer | epoch 0 | step 35680 |avg loss 7.541 |avg tokens 2338.000 |tokens/s 8591.437 |walltime 9430.447 | +Transformer | epoch 0 | step 35690 |avg loss 7.874 |avg tokens 2114.800 |tokens/s 8473.499 |walltime 9432.943 | +Transformer | epoch 0 | step 35700 |avg loss 8.344 |avg tokens 1980.900 |tokens/s 7960.625 |walltime 9435.432 | +Transformer | epoch 0 | step 35710 |avg loss 7.655 |avg tokens 2167.000 |tokens/s 8199.300 |walltime 9438.074 | +Transformer | epoch 0 | step 35720 |avg loss 7.469 |avg tokens 2313.700 |tokens/s 8370.632 |walltime 9440.839 | +Transformer | epoch 0 | step 35730 |avg loss 7.644 |avg tokens 2273.600 |tokens/s 8601.916 |walltime 9443.482 | +Transformer | epoch 0 | step 35740 |avg loss 7.726 |avg tokens 2124.100 |tokens/s 8221.408 |walltime 9446.065 | +Transformer | epoch 0 | step 35750 |avg loss 7.531 |avg tokens 2011.400 |tokens/s 7684.537 |walltime 9448.683 | +Transformer | epoch 0 | step 35760 |avg loss 8.032 |avg tokens 2078.700 |tokens/s 7892.695 |walltime 9451.316 | +Transformer | epoch 0 | step 35770 |avg loss 7.755 |avg tokens 2195.200 |tokens/s 8383.543 |walltime 9453.935 | +Transformer | epoch 0 | step 35780 |avg loss 7.375 |avg tokens 2228.300 |tokens/s 8177.460 |walltime 9456.660 | +Transformer | epoch 0 | step 35790 |avg loss 7.649 |avg tokens 2206.400 |tokens/s 8256.726 |walltime 9459.332 | +Transformer | epoch 0 | step 35800 |avg loss 7.653 |avg tokens 2228.200 |tokens/s 8239.583 |walltime 9462.036 | +Transformer | epoch 0 | step 35810 |avg loss 7.615 |avg tokens 2208.100 |tokens/s 8093.041 |walltime 9464.765 | +Transformer | epoch 0 | step 35820 |avg loss 7.756 |avg tokens 1868.400 |tokens/s 7724.225 |walltime 9467.184 | +Transformer | epoch 0 | step 35830 |avg loss 7.594 |avg tokens 2098.600 |tokens/s 8006.815 |walltime 9469.805 | +Transformer | epoch 0 | step 35840 |avg loss 7.504 |avg tokens 2323.200 |tokens/s 8613.201 |walltime 9472.502 | +Transformer | epoch 0 | step 35850 |avg loss 8.023 |avg tokens 1896.600 
|tokens/s 7472.832 |walltime 9475.040 | +Transformer | epoch 0 | step 35860 |avg loss 7.845 |avg tokens 2174.400 |tokens/s 8414.317 |walltime 9477.624 | +Transformer | epoch 0 | step 35870 |avg loss 7.824 |avg tokens 2323.100 |tokens/s 8871.615 |walltime 9480.243 | +Transformer | epoch 0 | step 35880 |avg loss 8.008 |avg tokens 2314.600 |tokens/s 8906.889 |walltime 9482.841 | +Transformer | epoch 0 | step 35890 |avg loss 7.950 |avg tokens 2061.600 |tokens/s 8221.743 |walltime 9485.349 | +Transformer | epoch 0 | step 35900 |avg loss 7.315 |avg tokens 2387.400 |tokens/s 8600.589 |walltime 9488.125 | +Transformer | epoch 0 | step 35910 |avg loss 7.521 |avg tokens 2324.000 |tokens/s 8562.583 |walltime 9490.839 | +Transformer | epoch 0 | step 35920 |avg loss 7.813 |avg tokens 2059.300 |tokens/s 7866.955 |walltime 9493.457 | +Transformer | epoch 0 | step 35930 |avg loss 7.556 |avg tokens 2257.400 |tokens/s 8442.131 |walltime 9496.130 | +Transformer | epoch 0 | step 35940 |avg loss 7.783 |avg tokens 2202.400 |tokens/s 8309.159 |walltime 9498.781 | +Transformer | epoch 0 | step 35950 |avg loss 7.569 |avg tokens 2373.600 |tokens/s 8714.867 |walltime 9501.505 | +Transformer | epoch 0 | step 35960 |avg loss 7.854 |avg tokens 2064.600 |tokens/s 8102.339 |walltime 9504.053 | +Transformer | epoch 0 | step 35970 |avg loss 7.945 |avg tokens 1953.400 |tokens/s 7709.865 |walltime 9506.586 | +Transformer | epoch 0 | step 35980 |avg loss 7.747 |avg tokens 2201.600 |tokens/s 8114.002 |walltime 9509.300 | +Transformer | epoch 0 | step 35990 |avg loss 7.525 |avg tokens 2218.800 |tokens/s 8167.274 |walltime 9512.017 | +Transformer | epoch 0 | step 36000 |avg loss 7.894 |avg tokens 1995.200 |tokens/s 7673.727 |walltime 9514.617 | +Transformer | epoch 0 | step 36010 |avg loss 7.838 |avg tokens 2427.400 |tokens/s 9078.384 |walltime 9517.290 | +Transformer | epoch 0 | step 36020 |avg loss 7.441 |avg tokens 2401.600 |tokens/s 8630.660 |walltime 9520.073 | +Transformer | epoch 0 | step 36030 
|avg loss 7.869 |avg tokens 2173.300 |tokens/s 8291.407 |walltime 9522.694 | +Transformer | epoch 0 | step 36040 |avg loss 7.585 |avg tokens 2272.900 |tokens/s 8307.476 |walltime 9525.430 | +Transformer | epoch 0 | step 36050 |avg loss 7.755 |avg tokens 2242.400 |tokens/s 8414.503 |walltime 9528.095 | +Transformer | epoch 0 | step 36060 |avg loss 7.708 |avg tokens 2269.000 |tokens/s 8527.353 |walltime 9530.756 | +Transformer | epoch 0 | step 36070 |avg loss 7.702 |avg tokens 2282.500 |tokens/s 8443.158 |walltime 9533.459 | +Transformer | epoch 0 | step 36080 |avg loss 7.871 |avg tokens 2086.200 |tokens/s 8037.502 |walltime 9536.055 | +Transformer | epoch 0 | step 36090 |avg loss 7.689 |avg tokens 2142.900 |tokens/s 8145.017 |walltime 9538.686 | +Transformer | epoch 0 | step 36100 |avg loss 7.968 |avg tokens 2173.300 |tokens/s 8509.332 |walltime 9541.240 | +Transformer | epoch 0 | step 36110 |avg loss 7.889 |avg tokens 2205.100 |tokens/s 8387.478 |walltime 9543.869 | +Transformer | epoch 0 | step 36120 |avg loss 7.516 |avg tokens 2281.600 |tokens/s 8342.928 |walltime 9546.604 | +Transformer | epoch 0 | step 36130 |avg loss 8.100 |avg tokens 2096.300 |tokens/s 8179.226 |walltime 9549.167 | +Transformer | epoch 0 | step 36140 |avg loss 7.671 |avg tokens 2005.100 |tokens/s 7799.452 |walltime 9551.737 | +Transformer | epoch 0 | step 36150 |avg loss 7.800 |avg tokens 2298.000 |tokens/s 8319.130 |walltime 9554.500 | +Transformer | epoch 0 | step 36160 |avg loss 7.808 |avg tokens 2219.400 |tokens/s 8654.270 |walltime 9557.064 | +Transformer | epoch 0 | step 36170 |avg loss 7.884 |avg tokens 2249.200 |tokens/s 8597.098 |walltime 9559.681 | +Transformer | epoch 0 | step 36180 |avg loss 7.716 |avg tokens 2145.300 |tokens/s 8045.490 |walltime 9562.347 | +Transformer | epoch 0 | step 36190 |avg loss 8.027 |avg tokens 2058.400 |tokens/s 8108.340 |walltime 9564.886 | +Transformer | epoch 0 | step 36200 |avg loss 7.415 |avg tokens 2312.000 |tokens/s 8464.785 |walltime 9567.617 | 
+Transformer | epoch 0 | step 36210 |avg loss 7.654 |avg tokens 2389.600 |tokens/s 8722.298 |walltime 9570.357 | +Transformer | epoch 0 | step 36220 |avg loss 7.671 |avg tokens 2268.000 |tokens/s 8356.040 |walltime 9573.071 | +Transformer | epoch 0 | step 36230 |avg loss 7.710 |avg tokens 2113.600 |tokens/s 8290.982 |walltime 9575.620 | +Transformer | epoch 0 | step 36240 |avg loss 7.456 |avg tokens 2168.700 |tokens/s 8243.652 |walltime 9578.251 | +Transformer | epoch 0 | step 36250 |avg loss 7.768 |avg tokens 1955.700 |tokens/s 7879.344 |walltime 9580.733 | +Transformer | epoch 0 | step 36260 |avg loss 7.470 |avg tokens 2361.600 |tokens/s 8553.338 |walltime 9583.494 | +Transformer | epoch 0 | step 36270 |avg loss 7.950 |avg tokens 1999.900 |tokens/s 7721.530 |walltime 9586.084 | +Transformer | epoch 0 | step 36280 |avg loss 8.003 |avg tokens 2279.600 |tokens/s 8503.671 |walltime 9588.765 | +Transformer | epoch 0 | step 36290 |avg loss 7.740 |avg tokens 2272.500 |tokens/s 8532.886 |walltime 9591.428 | +Transformer | epoch 0 | step 36300 |avg loss 7.913 |avg tokens 2313.900 |tokens/s 8795.012 |walltime 9594.059 | +Transformer | epoch 0 | step 36310 |avg loss 7.572 |avg tokens 2293.600 |tokens/s 8432.038 |walltime 9596.779 | +Transformer | epoch 0 | step 36320 |avg loss 8.060 |avg tokens 2260.100 |tokens/s 8931.293 |walltime 9599.310 | +Transformer | epoch 0 | step 36330 |avg loss 7.854 |avg tokens 2099.000 |tokens/s 8185.764 |walltime 9601.874 | +Transformer | epoch 0 | step 36340 |avg loss 7.956 |avg tokens 2027.400 |tokens/s 7943.977 |walltime 9604.426 | +Transformer | epoch 0 | step 36350 |avg loss 7.892 |avg tokens 2245.400 |tokens/s 8502.690 |walltime 9607.067 | +Transformer | epoch 0 | step 36360 |avg loss 7.689 |avg tokens 2280.000 |tokens/s 8540.279 |walltime 9609.736 | +Transformer | epoch 0 | step 36370 |avg loss 7.967 |avg tokens 1927.300 |tokens/s 7580.673 |walltime 9612.279 | +Transformer | epoch 0 | step 36380 |avg loss 8.005 |avg tokens 2149.900 
|tokens/s 8627.712 |walltime 9614.771 | +Transformer | epoch 0 | step 36390 |avg loss 7.778 |avg tokens 2314.000 |tokens/s 8659.552 |walltime 9617.443 | +Transformer | epoch 0 | step 36400 |avg loss 7.878 |avg tokens 2013.300 |tokens/s 7912.163 |walltime 9619.987 | +Transformer | epoch 0 | step 36410 |avg loss 7.871 |avg tokens 2262.000 |tokens/s 8651.994 |walltime 9622.602 | +Transformer | epoch 0 | step 36420 |avg loss 8.026 |avg tokens 1989.900 |tokens/s 7917.286 |walltime 9625.115 | +Transformer | epoch 0 | step 36430 |avg loss 7.503 |avg tokens 2156.800 |tokens/s 8051.849 |walltime 9627.794 | +Transformer | epoch 0 | step 36440 |avg loss 7.603 |avg tokens 2259.200 |tokens/s 8351.812 |walltime 9630.499 | +Transformer | epoch 0 | step 36450 |avg loss 7.707 |avg tokens 2275.200 |tokens/s 8675.225 |walltime 9633.122 | +Transformer | epoch 0 | step 36460 |avg loss 7.619 |avg tokens 2164.300 |tokens/s 8322.843 |walltime 9635.722 | +Transformer | epoch 0 | step 36470 |avg loss 7.826 |avg tokens 2193.100 |tokens/s 8124.558 |walltime 9638.421 | +Transformer | epoch 0 | step 36480 |avg loss 7.818 |avg tokens 2211.500 |tokens/s 8691.418 |walltime 9640.966 | +Transformer | epoch 0 | step 36490 |avg loss 7.586 |avg tokens 2288.500 |tokens/s 8343.036 |walltime 9643.709 | +Transformer | epoch 0 | step 36500 |avg loss 8.045 |avg tokens 2305.700 |tokens/s 8852.913 |walltime 9646.313 | +Transformer | epoch 0 | step 36510 |avg loss 7.656 |avg tokens 2285.200 |tokens/s 8340.298 |walltime 9649.053 | +Transformer | epoch 0 | step 36520 |avg loss 7.662 |avg tokens 2168.200 |tokens/s 8205.498 |walltime 9651.696 | +Transformer | epoch 0 | step 36530 |avg loss 7.968 |avg tokens 2058.700 |tokens/s 8155.837 |walltime 9654.220 | +Transformer | epoch 0 | step 36540 |avg loss 7.644 |avg tokens 2150.300 |tokens/s 8035.908 |walltime 9656.896 | +Transformer | epoch 0 | step 36550 |avg loss 8.025 |avg tokens 1996.400 |tokens/s 7756.771 |walltime 9659.469 | +Transformer | epoch 0 | step 36560 
|avg loss 7.585 |avg tokens 2179.700 |tokens/s 8473.542 |walltime 9662.042 | +Transformer | epoch 0 | step 36570 |avg loss 7.464 |avg tokens 2235.900 |tokens/s 8266.982 |walltime 9664.746 | +Transformer | epoch 0 | step 36580 |avg loss 7.805 |avg tokens 2027.700 |tokens/s 7769.784 |walltime 9667.356 | +Transformer | epoch 0 | step 36590 |avg loss 7.630 |avg tokens 2421.600 |tokens/s 8512.613 |walltime 9670.201 | +Transformer | epoch 0 | step 36600 |avg loss 8.117 |avg tokens 2003.800 |tokens/s 8000.755 |walltime 9672.705 | +Transformer | epoch 0 | step 36610 |avg loss 7.902 |avg tokens 2113.600 |tokens/s 8523.420 |walltime 9675.185 | +Transformer | epoch 0 | step 36620 |avg loss 8.025 |avg tokens 2234.300 |tokens/s 8190.296 |walltime 9677.913 | +Transformer | epoch 0 | step 36630 |avg loss 7.769 |avg tokens 2265.800 |tokens/s 8425.397 |walltime 9680.602 | +Transformer | epoch 0 | step 36640 |avg loss 7.593 |avg tokens 2189.800 |tokens/s 8174.723 |walltime 9683.281 | +Transformer | epoch 0 | step 36650 |avg loss 7.803 |avg tokens 2137.700 |tokens/s 8288.658 |walltime 9685.860 | +Transformer | epoch 0 | step 36660 |avg loss 7.486 |avg tokens 2078.000 |tokens/s 8009.579 |walltime 9688.455 | +Transformer | epoch 0 | step 36670 |avg loss 7.889 |avg tokens 1988.900 |tokens/s 8117.633 |walltime 9690.905 | +Transformer | epoch 0 | step 36680 |avg loss 7.866 |avg tokens 2136.200 |tokens/s 8113.929 |walltime 9693.537 | +Transformer | epoch 0 | step 36690 |avg loss 7.975 |avg tokens 2276.500 |tokens/s 8784.383 |walltime 9696.129 | +Transformer | epoch 0 | step 36700 |avg loss 7.900 |avg tokens 1866.300 |tokens/s 7589.306 |walltime 9698.588 | +Transformer | epoch 0 | step 36710 |avg loss 7.612 |avg tokens 2378.400 |tokens/s 8717.352 |walltime 9701.316 | +Transformer | epoch 0 | step 36720 |avg loss 7.550 |avg tokens 2237.800 |tokens/s 8355.963 |walltime 9703.995 | +Transformer | epoch 0 | step 36730 |avg loss 7.651 |avg tokens 2153.700 |tokens/s 8075.995 |walltime 9706.661 | 
+Transformer | epoch 0 | step 36740 |avg loss 7.815 |avg tokens 2110.000 |tokens/s 8122.464 |walltime 9709.259 | +Transformer | epoch 0 | step 36750 |avg loss 7.745 |avg tokens 2280.000 |tokens/s 8559.717 |walltime 9711.923 | +Transformer | epoch 0 | step 36760 |avg loss 7.911 |avg tokens 2004.000 |tokens/s 7853.122 |walltime 9714.475 | +Transformer | epoch 0 | step 36770 |avg loss 7.972 |avg tokens 2206.600 |tokens/s 8358.308 |walltime 9717.115 | +Transformer | epoch 0 | step 36780 |avg loss 7.808 |avg tokens 2183.700 |tokens/s 8693.039 |walltime 9719.627 | +Transformer | epoch 0 | step 36790 |avg loss 7.719 |avg tokens 2155.200 |tokens/s 8168.175 |walltime 9722.265 | +Transformer | epoch 0 | step 36800 |avg loss 7.846 |avg tokens 2186.800 |tokens/s 8178.473 |walltime 9724.939 | +Transformer | epoch 0 | step 36810 |avg loss 7.883 |avg tokens 2049.600 |tokens/s 8062.768 |walltime 9727.481 | +Transformer | epoch 0 | step 36820 |avg loss 7.743 |avg tokens 2314.400 |tokens/s 8935.457 |walltime 9730.071 | +Transformer | epoch 0 | step 36830 |avg loss 8.027 |avg tokens 2289.200 |tokens/s 8877.509 |walltime 9732.650 | +Transformer | epoch 0 | step 36840 |avg loss 7.732 |avg tokens 2085.400 |tokens/s 8123.118 |walltime 9735.217 | +Transformer | epoch 0 | step 36850 |avg loss 8.072 |avg tokens 1994.000 |tokens/s 7982.867 |walltime 9737.715 | +Transformer | epoch 0 | step 36860 |avg loss 8.123 |avg tokens 2112.000 |tokens/s 8354.781 |walltime 9740.243 | +Transformer | epoch 0 | step 36870 |avg loss 7.835 |avg tokens 2163.900 |tokens/s 8292.788 |walltime 9742.852 | +Transformer | epoch 0 | step 36880 |avg loss 7.666 |avg tokens 2369.600 |tokens/s 8582.691 |walltime 9745.613 | +Transformer | epoch 0 | step 36890 |avg loss 7.803 |avg tokens 2292.100 |tokens/s 8733.714 |walltime 9748.238 | +Transformer | epoch 0 | step 36900 |avg loss 7.428 |avg tokens 2271.700 |tokens/s 8285.049 |walltime 9750.979 | +Transformer | epoch 0 | step 36910 |avg loss 7.673 |avg tokens 2235.400 
|tokens/s 8160.272 |walltime 9753.719 | +Transformer | epoch 0 | step 36920 |avg loss 7.840 |avg tokens 2191.600 |tokens/s 8348.010 |walltime 9756.344 | +Transformer | epoch 0 | step 36930 |avg loss 7.463 |avg tokens 2068.700 |tokens/s 7988.833 |walltime 9758.934 | +Transformer | epoch 0 | step 36940 |avg loss 8.032 |avg tokens 1741.200 |tokens/s 7301.493 |walltime 9761.318 | +Transformer | epoch 0 | step 36950 |avg loss 7.769 |avg tokens 2394.500 |tokens/s 8967.484 |walltime 9763.989 | +Transformer | epoch 0 | step 36960 |avg loss 7.779 |avg tokens 2061.700 |tokens/s 8027.779 |walltime 9766.557 | +Transformer | epoch 0 | step 36970 |avg loss 7.695 |avg tokens 2167.200 |tokens/s 8032.028 |walltime 9769.255 | +Transformer | epoch 0 | step 36980 |avg loss 7.929 |avg tokens 2287.300 |tokens/s 8502.699 |walltime 9771.945 | +Transformer | epoch 0 | step 36990 |avg loss 7.636 |avg tokens 2304.900 |tokens/s 8469.624 |walltime 9774.666 | +Transformer | epoch 0 | step 37000 |avg loss 7.896 |avg tokens 2069.600 |tokens/s 8175.247 |walltime 9777.198 | +Transformer | epoch 0 | step 37010 |avg loss 7.536 |avg tokens 2348.000 |tokens/s 8517.994 |walltime 9779.955 | +Transformer | epoch 0 | step 37020 |avg loss 7.611 |avg tokens 2288.700 |tokens/s 8423.859 |walltime 9782.671 | +Transformer | epoch 0 | step 37030 |avg loss 7.513 |avg tokens 2200.000 |tokens/s 8147.549 |walltime 9785.372 | +Transformer | epoch 0 | step 37040 |avg loss 7.428 |avg tokens 2399.600 |tokens/s 8700.086 |walltime 9788.130 | +Transformer | epoch 0 | step 37050 |avg loss 7.434 |avg tokens 2072.100 |tokens/s 7725.360 |walltime 9790.812 | +Transformer | epoch 0 | step 37060 |avg loss 7.354 |avg tokens 2229.500 |tokens/s 8036.988 |walltime 9793.586 | +Transformer | epoch 0 | step 37070 |avg loss 7.661 |avg tokens 2052.000 |tokens/s 7756.515 |walltime 9796.232 | +Transformer | epoch 0 | step 37080 |avg loss 8.115 |avg tokens 2213.800 |tokens/s 9009.309 |walltime 9798.689 | +Transformer | epoch 0 | step 37090 
|avg loss 7.597 |avg tokens 2112.000 |tokens/s 8142.908 |walltime 9801.283 | +Transformer | epoch 0 | step 37100 |avg loss 7.395 |avg tokens 2228.800 |tokens/s 8119.731 |walltime 9804.027 | +Transformer | epoch 0 | step 37110 |avg loss 7.851 |avg tokens 1893.200 |tokens/s 7463.757 |walltime 9806.564 | +Transformer | epoch 0 | step 37120 |avg loss 8.048 |avg tokens 2167.300 |tokens/s 8307.491 |walltime 9809.173 | +Transformer | epoch 0 | step 37130 |avg loss 7.876 |avg tokens 2226.800 |tokens/s 8428.987 |walltime 9811.815 | +Transformer | epoch 0 | step 37140 |avg loss 7.712 |avg tokens 2078.000 |tokens/s 8085.378 |walltime 9814.385 | +Transformer | epoch 0 | step 37150 |avg loss 8.159 |avg tokens 1914.900 |tokens/s 7992.293 |walltime 9816.781 | +Transformer | epoch 0 | step 37160 |avg loss 8.176 |avg tokens 2119.200 |tokens/s 8449.263 |walltime 9819.289 | +Transformer | epoch 0 | step 37170 |avg loss 7.790 |avg tokens 2135.700 |tokens/s 8069.427 |walltime 9821.935 | +Transformer | epoch 0 | step 37180 |avg loss 7.765 |avg tokens 2033.800 |tokens/s 8137.183 |walltime 9824.435 | +Transformer | epoch 0 | step 37190 |avg loss 7.768 |avg tokens 2113.600 |tokens/s 7922.003 |walltime 9827.103 | +Transformer | epoch 0 | step 37200 |avg loss 7.835 |avg tokens 2138.000 |tokens/s 8319.394 |walltime 9829.673 | +Transformer | epoch 0 | step 37210 |avg loss 7.697 |avg tokens 2255.200 |tokens/s 8598.384 |walltime 9832.296 | +Transformer | epoch 0 | step 37220 |avg loss 7.834 |avg tokens 2206.400 |tokens/s 8451.519 |walltime 9834.906 | +Transformer | epoch 0 | step 37230 |avg loss 7.653 |avg tokens 2347.200 |tokens/s 8813.007 |walltime 9837.570 | +Transformer | epoch 0 | step 37240 |avg loss 7.308 |avg tokens 2252.800 |tokens/s 8300.603 |walltime 9840.284 | +Transformer | epoch 0 | step 37250 |avg loss 7.661 |avg tokens 2346.000 |tokens/s 8597.429 |walltime 9843.012 | +Transformer | epoch 0 | step 37260 |avg loss 7.469 |avg tokens 2279.200 |tokens/s 8298.786 |walltime 9845.759 | 
+Transformer | epoch 0 | step 37270 |avg loss 7.757 |avg tokens 2245.600 |tokens/s 8546.646 |walltime 9848.386 | +Transformer | epoch 0 | step 37280 |avg loss 7.852 |avg tokens 2158.300 |tokens/s 8297.269 |walltime 9850.987 | +Transformer | epoch 0 | step 37290 |avg loss 7.895 |avg tokens 2187.900 |tokens/s 8435.940 |walltime 9853.581 | +Transformer | epoch 0 | step 37300 |avg loss 7.909 |avg tokens 2197.200 |tokens/s 8665.292 |walltime 9856.117 | +Transformer | epoch 0 | step 37310 |avg loss 7.980 |avg tokens 2063.000 |tokens/s 8149.071 |walltime 9858.648 | +Transformer | epoch 0 | step 37320 |avg loss 7.797 |avg tokens 2291.900 |tokens/s 8473.454 |walltime 9861.353 | +Transformer | epoch 0 | step 37330 |avg loss 7.305 |avg tokens 2341.800 |tokens/s 8439.138 |walltime 9864.128 | +Transformer | epoch 0 | step 37340 |avg loss 7.857 |avg tokens 2219.100 |tokens/s 8474.484 |walltime 9866.747 | +Transformer | epoch 0 | step 37350 |avg loss 7.839 |avg tokens 2086.000 |tokens/s 8081.663 |walltime 9869.328 | +Transformer | epoch 0 | step 37360 |avg loss 7.739 |avg tokens 2215.200 |tokens/s 8167.850 |walltime 9872.040 | +Transformer | epoch 0 | step 37370 |avg loss 7.765 |avg tokens 2031.900 |tokens/s 7858.377 |walltime 9874.625 | +Transformer | epoch 0 | step 37380 |avg loss 7.849 |avg tokens 2167.400 |tokens/s 8337.255 |walltime 9877.225 | +Transformer | epoch 0 | step 37390 |avg loss 7.708 |avg tokens 2372.800 |tokens/s 8877.491 |walltime 9879.898 | +Transformer | epoch 0 | step 37400 |avg loss 8.084 |avg tokens 1992.700 |tokens/s 7979.842 |walltime 9882.395 | +Transformer | epoch 0 | step 37410 |avg loss 8.013 |avg tokens 1797.600 |tokens/s 7572.642 |walltime 9884.769 | +Transformer | epoch 0 | step 37420 |avg loss 7.269 |avg tokens 2270.400 |tokens/s 8319.981 |walltime 9887.498 | +Transformer | epoch 0 | step 37430 |avg loss 7.921 |avg tokens 2120.000 |tokens/s 8163.495 |walltime 9890.095 | +Transformer | epoch 0 | step 37440 |avg loss 7.476 |avg tokens 2258.900 
|tokens/s 8450.944 |walltime 9892.768 | +Transformer | epoch 0 | step 37450 |avg loss 7.504 |avg tokens 2237.100 |tokens/s 8283.361 |walltime 9895.468 | +Transformer | epoch 0 | step 37460 |avg loss 7.608 |avg tokens 2245.600 |tokens/s 8196.701 |walltime 9898.208 | +Transformer | epoch 0 | step 37470 |avg loss 7.510 |avg tokens 2108.300 |tokens/s 7967.381 |walltime 9900.854 | +Transformer | epoch 0 | step 37480 |avg loss 7.905 |avg tokens 2050.100 |tokens/s 8220.945 |walltime 9903.348 | +Transformer | epoch 0 | step 37490 |avg loss 7.650 |avg tokens 2204.000 |tokens/s 8378.658 |walltime 9905.978 | +Transformer | epoch 0 | step 37500 |avg loss 7.701 |avg tokens 2120.800 |tokens/s 8011.252 |walltime 9908.626 | +Transformer | epoch 0 | step 37510 |avg loss 7.952 |avg tokens 2003.800 |tokens/s 7754.906 |walltime 9911.210 | +Transformer | epoch 0 | step 37520 |avg loss 7.838 |avg tokens 2011.700 |tokens/s 7582.772 |walltime 9913.863 | +Transformer | epoch 0 | step 37530 |avg loss 8.032 |avg tokens 2054.200 |tokens/s 8246.453 |walltime 9916.354 | +Transformer | epoch 0 | step 37540 |avg loss 7.804 |avg tokens 2109.900 |tokens/s 8154.166 |walltime 9918.941 | +Transformer | epoch 0 | step 37550 |avg loss 7.820 |avg tokens 2303.200 |tokens/s 8514.631 |walltime 9921.646 | +Transformer | epoch 0 | step 37560 |avg loss 8.293 |avg tokens 2063.300 |tokens/s 8553.038 |walltime 9924.059 | +Transformer | epoch 0 | step 37570 |avg loss 7.968 |avg tokens 2216.400 |tokens/s 8674.397 |walltime 9926.614 | +Transformer | epoch 0 | step 37580 |avg loss 7.743 |avg tokens 2193.700 |tokens/s 8255.678 |walltime 9929.271 | +Transformer | epoch 0 | step 37590 |avg loss 7.532 |avg tokens 2347.200 |tokens/s 8616.386 |walltime 9931.995 | +Transformer | epoch 0 | step 37600 |avg loss 7.752 |avg tokens 2062.700 |tokens/s 8334.008 |walltime 9934.470 | +Transformer | epoch 0 | step 37610 |avg loss 7.511 |avg tokens 2325.200 |tokens/s 8395.243 |walltime 9937.240 | +Transformer | epoch 0 | step 37620 
|avg loss 7.371 |avg tokens 2354.400 |tokens/s 8512.132 |walltime 9940.006 | +Transformer | epoch 0 | step 37630 |avg loss 7.690 |avg tokens 2169.600 |tokens/s 7956.359 |walltime 9942.733 | +Transformer | epoch 0 | step 37640 |avg loss 7.597 |avg tokens 2072.400 |tokens/s 7967.450 |walltime 9945.334 | +Transformer | epoch 0 | step 37650 |avg loss 7.815 |avg tokens 2332.200 |tokens/s 8777.756 |walltime 9947.991 | +Transformer | epoch 0 | step 37660 |avg loss 7.926 |avg tokens 2182.600 |tokens/s 8454.452 |walltime 9950.572 | +Transformer | epoch 0 | step 37670 |avg loss 7.681 |avg tokens 2245.600 |tokens/s 8584.301 |walltime 9953.188 | +Transformer | epoch 0 | step 37680 |avg loss 7.572 |avg tokens 2162.100 |tokens/s 8188.868 |walltime 9955.828 | +Transformer | epoch 0 | step 37690 |avg loss 7.778 |avg tokens 2319.200 |tokens/s 8509.032 |walltime 9958.554 | +Transformer | epoch 0 | step 37700 |avg loss 7.598 |avg tokens 2298.000 |tokens/s 8489.117 |walltime 9961.261 | +Transformer | epoch 0 | step 37710 |avg loss 7.663 |avg tokens 2310.100 |tokens/s 8491.900 |walltime 9963.981 | +Transformer | epoch 0 | step 37720 |avg loss 7.622 |avg tokens 2127.100 |tokens/s 7964.125 |walltime 9966.652 | +Transformer | epoch 0 | step 37730 |avg loss 7.880 |avg tokens 1917.100 |tokens/s 7605.394 |walltime 9969.173 | +Transformer | epoch 0 | step 37740 |avg loss 7.622 |avg tokens 2316.600 |tokens/s 8367.634 |walltime 9971.941 | +Transformer | epoch 0 | step 37750 |avg loss 7.794 |avg tokens 2161.900 |tokens/s 8579.067 |walltime 9974.461 | +Transformer | epoch 0 | step 37760 |avg loss 7.805 |avg tokens 1970.400 |tokens/s 7922.392 |walltime 9976.949 | +Transformer | epoch 0 | step 37770 |avg loss 7.838 |avg tokens 2195.800 |tokens/s 8508.951 |walltime 9979.529 | +Transformer | epoch 0 | step 37780 |avg loss 7.649 |avg tokens 2441.400 |tokens/s 9000.773 |walltime 9982.242 | +Transformer | epoch 0 | step 37790 |avg loss 7.506 |avg tokens 2248.000 |tokens/s 8313.818 |walltime 9984.945 | 
+Transformer | epoch 0 | step 37800 |avg loss 7.668 |avg tokens 2161.400 |tokens/s 7984.878 |walltime 9987.652 |
+Transformer | epoch 0 | step 37810 |avg loss 7.711 |avg tokens 2139.200 |tokens/s 8104.437 |walltime 9990.292 |
+Transformer | epoch 0 | step 37820 |avg loss 7.465 |avg tokens 2342.100 |tokens/s 8406.619 |walltime 9993.078 |
+Transformer | epoch 0 | step 37830 |avg loss 7.605 |avg tokens 2062.200 |tokens/s 7789.286 |walltime 9995.725 |
+Transformer | epoch 0 | step 37840 |avg loss 7.522 |avg tokens 2158.300 |tokens/s 8039.664 |walltime 9998.410 |
+Transformer | epoch 0 | step 37850 |avg loss 7.900 |avg tokens 2066.800 |tokens/s 8086.900 |walltime 10000.966 |
+Transformer | epoch 0 | step 37860 |avg loss 7.583 |avg tokens 2055.500 |tokens/s 7935.329 |walltime 10003.556 |
+Transformer | epoch 0 | step 37870 |avg loss 7.858 |avg tokens 2361.800 |tokens/s 8984.438 |walltime 10006.185 |
+Transformer | epoch 0 | step 37880 |avg loss 7.580 |avg tokens 2115.200 |tokens/s 7883.779 |walltime 10008.868 |
+Transformer | epoch 0 | step 37890 |avg loss 7.767 |avg tokens 2355.400 |tokens/s 8741.496 |walltime 10011.562 |
+Transformer | epoch 0 | step 37900 |avg loss 7.682 |avg tokens 2375.200 |tokens/s 8855.654 |walltime 10014.244 |
+Transformer | epoch 0 | step 37910 |avg loss 7.612 |avg tokens 2238.400 |tokens/s 8400.909 |walltime 10016.909 |
+Transformer | epoch 0 | step 37920 |avg loss 7.414 |avg tokens 2351.800 |tokens/s 8643.193 |walltime 10019.630 |
+Transformer | epoch 0 | step 37930 |avg loss 7.251 |avg tokens 2322.300 |tokens/s 8428.223 |walltime 10022.385 |
+Transformer | epoch 0 | step 37940 |avg loss 7.668 |avg tokens 2096.200 |tokens/s 8044.786 |walltime 10024.991 |
+Transformer | epoch 0 | step 37950 |avg loss 7.747 |avg tokens 2277.300 |tokens/s 8645.609 |walltime 10027.625 |
+Transformer | epoch 0 | step 37960 |avg loss 7.678 |avg tokens 2225.300 |tokens/s 8281.512 |walltime 10030.312 |
+Transformer | epoch 0 | step 37970 |avg loss 7.557 |avg tokens 2320.800 |tokens/s 8408.227 |walltime 10033.072 |
+Transformer | epoch 0 | step 37980 |avg loss 7.649 |avg tokens 2248.100 |tokens/s 8360.989 |walltime 10035.761 |
+Transformer | epoch 0 | step 37990 |avg loss 7.647 |avg tokens 2197.900 |tokens/s 7853.086 |walltime 10038.560 |
+Transformer | epoch 0 | step 38000 |avg loss 8.022 |avg tokens 2017.100 |tokens/s 8118.615 |walltime 10041.044 |
+Transformer | epoch 0 | step 38010 |avg loss 7.411 |avg tokens 2280.000 |tokens/s 8279.452 |walltime 10043.798 |
+Transformer | epoch 0 | step 38020 |avg loss 8.000 |avg tokens 2038.100 |tokens/s 7960.683 |walltime 10046.358 |
+Transformer | epoch 0 | step 38030 |avg loss 7.636 |avg tokens 2341.000 |tokens/s 8691.097 |walltime 10049.052 |
+Transformer | epoch 0 | step 38040 |avg loss 7.471 |avg tokens 2027.900 |tokens/s 7729.713 |walltime 10051.675 |
+Transformer | epoch 0 | step 38050 |avg loss 7.586 |avg tokens 2340.000 |tokens/s 8562.985 |walltime 10054.408 |
+Transformer | epoch 0 | step 38060 |avg loss 7.678 |avg tokens 2199.000 |tokens/s 8263.687 |walltime 10057.069 |
+Transformer | epoch 0 | step 38070 |avg loss 8.045 |avg tokens 2087.700 |tokens/s 8188.149 |walltime 10059.619 |
+Transformer | epoch 0 | step 38080 |avg loss 7.378 |avg tokens 2319.600 |tokens/s 8484.016 |walltime 10062.353 |
+Transformer | epoch 0 | step 38090 |avg loss 7.791 |avg tokens 2204.800 |tokens/s 8448.013 |walltime 10064.963 |
+Transformer | epoch 0 | step 38100 |avg loss 7.685 |avg tokens 2240.800 |tokens/s 8207.866 |walltime 10067.693 |
+Transformer | epoch 0 | step 38110 |avg loss 7.765 |avg tokens 2188.000 |tokens/s 8253.265 |walltime 10070.344 |
+Transformer | epoch 0 | step 38120 |avg loss 7.984 |avg tokens 2080.900 |tokens/s 8576.629 |walltime 10072.770 |
+Transformer | epoch 0 | step 38130 |avg loss 7.738 |avg tokens 2274.300 |tokens/s 8461.085 |walltime 10075.458 |
+Transformer | epoch 0 | step 38140 |avg loss 7.638 |avg tokens 2173.100 |tokens/s 8194.464 |walltime 10078.110 |
+Transformer | epoch 0 | step 38150 |avg loss 8.193 |avg tokens 1918.400 |tokens/s 8119.398 |walltime 10080.473 |
+Transformer | epoch 0 | step 38160 |avg loss 7.479 |avg tokens 2259.300 |tokens/s 8277.148 |walltime 10083.202 |
+Transformer | epoch 0 | step 38170 |avg loss 8.042 |avg tokens 2221.500 |tokens/s 8618.832 |walltime 10085.780 |
+Transformer | epoch 0 | step 38180 |avg loss 7.819 |avg tokens 2030.800 |tokens/s 7924.282 |walltime 10088.343 |
+Transformer | epoch 0 | step 38190 |avg loss 7.779 |avg tokens 2323.400 |tokens/s 8569.397 |walltime 10091.054 |
+Transformer | epoch 0 | step 38200 |avg loss 7.825 |avg tokens 2187.500 |tokens/s 8459.692 |walltime 10093.640 |
+Transformer | epoch 0 | step 38210 |avg loss 7.766 |avg tokens 2212.600 |tokens/s 8339.100 |walltime 10096.293 |
+Transformer | epoch 0 | step 38220 |avg loss 7.790 |avg tokens 2310.100 |tokens/s 8568.640 |walltime 10098.989 |
+Transformer | epoch 0 | step 38230 |avg loss 7.830 |avg tokens 2024.900 |tokens/s 7932.500 |walltime 10101.542 |
+Transformer | epoch 0 | step 38240 |avg loss 7.877 |avg tokens 2306.600 |tokens/s 8672.774 |walltime 10104.201 |
+Transformer | epoch 0 | step 38250 |avg loss 7.761 |avg tokens 2268.000 |tokens/s 8483.413 |walltime 10106.875 |
+Transformer | epoch 0 | step 38260 |avg loss 7.617 |avg tokens 2198.200 |tokens/s 8460.299 |walltime 10109.473 |
+Transformer | epoch 0 | step 38270 |avg loss 7.633 |avg tokens 2305.500 |tokens/s 8452.046 |walltime 10112.201 |
+Transformer | epoch 0 | step 38280 |avg loss 7.582 |avg tokens 2327.200 |tokens/s 8464.958 |walltime 10114.950 |
+Transformer | epoch 0 | step 38290 |avg loss 7.397 |avg tokens 2276.000 |tokens/s 8311.039 |walltime 10117.688 |
+Transformer | epoch 0 | step 38300 |avg loss 7.897 |avg tokens 2062.600 |tokens/s 7965.360 |walltime 10120.278 |
+Transformer | epoch 0 | step 38310 |avg loss 7.921 |avg tokens 2057.200 |tokens/s 8139.798 |walltime 10122.805 |
+Transformer | epoch 0 | step 38320 |avg loss 7.367 |avg tokens 2369.700 |tokens/s 8362.310 |walltime 10125.639 |
+Transformer | epoch 0 | step 38330 |avg loss 8.260 |avg tokens 1725.300 |tokens/s 7332.663 |walltime 10127.992 |
+Transformer | epoch 0 | step 38340 |avg loss 7.862 |avg tokens 2129.400 |tokens/s 8103.365 |walltime 10130.620 |
+Transformer | epoch 0 | step 38350 |avg loss 7.148 |avg tokens 2337.000 |tokens/s 8662.635 |walltime 10133.318 |
+Transformer | epoch 0 | step 38360 |avg loss 7.572 |avg tokens 2214.100 |tokens/s 8287.360 |walltime 10135.989 |
+Transformer | epoch 0 | step 38370 |avg loss 8.027 |avg tokens 2001.400 |tokens/s 7736.543 |walltime 10138.576 |
+Transformer | epoch 0 | step 38380 |avg loss 7.405 |avg tokens 2272.800 |tokens/s 8415.021 |walltime 10141.277 |
+Transformer | epoch 0 | step 38390 |avg loss 7.680 |avg tokens 2165.900 |tokens/s 8066.700 |walltime 10143.962 |
+Transformer | epoch 0 | step 38400 |avg loss 7.941 |avg tokens 2142.400 |tokens/s 8284.954 |walltime 10146.548 |
+Transformer | epoch 0 | step 38410 |avg loss 7.824 |avg tokens 2237.600 |tokens/s 8546.760 |walltime 10149.166 |
+Transformer | epoch 0 | step 38420 |avg loss 7.774 |avg tokens 2311.200 |tokens/s 8731.342 |walltime 10151.813 |
+Transformer | epoch 0 | step 38430 |avg loss 7.486 |avg tokens 2201.900 |tokens/s 8164.160 |walltime 10154.510 |
+Transformer | epoch 0 | step 38440 |avg loss 7.975 |avg tokens 2171.400 |tokens/s 8671.840 |walltime 10157.014 |
+Transformer | epoch 0 | step 38450 |avg loss 7.994 |avg tokens 2363.900 |tokens/s 8908.681 |walltime 10159.668 |
+Transformer | epoch 0 | step 38460 |avg loss 7.687 |avg tokens 2264.100 |tokens/s 8108.515 |walltime 10162.460 |
+Transformer | epoch 0 | step 38470 |avg loss 7.708 |avg tokens 2123.300 |tokens/s 7992.901 |walltime 10165.116 |
+Transformer | epoch 0 | step 38480 |avg loss 7.779 |avg tokens 2377.000 |tokens/s 8629.752 |walltime 10167.871 |
+Transformer | epoch 0 | step 38490 |avg loss 7.749 |avg tokens 2009.900 |tokens/s 7687.881 |walltime 10170.485 |
+Transformer | epoch 0 | step 38500 |avg loss 7.395 |avg tokens 2253.100 |tokens/s 8332.556 |walltime 10173.189 |
+Transformer | epoch 0 | step 38510 |avg loss 8.065 |avg tokens 1953.100 |tokens/s 8133.784 |walltime 10175.590 |
+Transformer | epoch 0 | step 38520 |avg loss 7.739 |avg tokens 2172.800 |tokens/s 8214.752 |walltime 10178.235 |
+Transformer | epoch 0 | step 38530 |avg loss 7.973 |avg tokens 2153.100 |tokens/s 8238.139 |walltime 10180.849 |
+Transformer | epoch 0 | step 38540 |avg loss 7.681 |avg tokens 2287.100 |tokens/s 8581.823 |walltime 10183.514 |
+Transformer | epoch 0 | step 38550 |avg loss 7.714 |avg tokens 2312.800 |tokens/s 8660.638 |walltime 10186.184 |
+Transformer | epoch 0 | step 38560 |avg loss 7.590 |avg tokens 2235.200 |tokens/s 8179.304 |walltime 10188.917 |
+Transformer | epoch 0 | step 38570 |avg loss 7.685 |avg tokens 1980.000 |tokens/s 7762.725 |walltime 10191.468 |
+Transformer | epoch 0 | step 38580 |avg loss 7.719 |avg tokens 2292.500 |tokens/s 8407.828 |walltime 10194.194 |
+Transformer | epoch 0 | step 38590 |avg loss 7.595 |avg tokens 2195.200 |tokens/s 8193.390 |walltime 10196.874 |
+Transformer | epoch 0 | step 38600 |avg loss 7.649 |avg tokens 2098.300 |tokens/s 8128.532 |walltime 10199.455 |
+Transformer | epoch 0 | step 38610 |avg loss 7.869 |avg tokens 2207.600 |tokens/s 8576.463 |walltime 10202.029 |
+Transformer | epoch 0 | step 38620 |avg loss 7.564 |avg tokens 2268.100 |tokens/s 8469.798 |walltime 10204.707 |
+Transformer | epoch 0 | step 38630 |avg loss 7.778 |avg tokens 2310.900 |tokens/s 8490.501 |walltime 10207.429 |
+Transformer | epoch 0 | step 38640 |avg loss 7.743 |avg tokens 2060.200 |tokens/s 8054.533 |walltime 10209.987 |
+Transformer | epoch 0 | step 38650 |avg loss 7.341 |avg tokens 2378.400 |tokens/s 8732.889 |walltime 10212.710 |
+Transformer | epoch 0 | step 38660 |avg loss 7.728 |avg tokens 2170.100 |tokens/s 8166.934 |walltime 10215.367 |
+Transformer | epoch 0 | step 38670 |avg loss 8.281 |avg tokens 2014.500 |tokens/s 8298.281 |walltime 10217.795 |
+Transformer | epoch 0 | step 38680 |avg loss 7.669 |avg tokens 2049.100 |tokens/s 8150.357 |walltime 10220.309 |
+Transformer | epoch 0 | step 38690 |avg loss 7.610 |avg tokens 2260.200 |tokens/s 8318.225 |walltime 10223.026 |
+Transformer | epoch 0 | step 38700 |avg loss 7.983 |avg tokens 2275.600 |tokens/s 8463.253 |walltime 10225.715 |
+Transformer | epoch 0 | step 38710 |avg loss 7.733 |avg tokens 2420.000 |tokens/s 9058.684 |walltime 10228.386 |
+Transformer | epoch 0 | step 38720 |avg loss 7.253 |avg tokens 2189.600 |tokens/s 8036.137 |walltime 10231.111 |
+Transformer | epoch 0 | step 38730 |avg loss 7.617 |avg tokens 2288.800 |tokens/s 8676.029 |walltime 10233.749 |
+Transformer | epoch 0 | step 38740 |avg loss 7.531 |avg tokens 2305.600 |tokens/s 8285.190 |walltime 10236.532 |
+Transformer | epoch 0 | step 38750 |avg loss 7.646 |avg tokens 2295.600 |tokens/s 8330.791 |walltime 10239.288 |
+Transformer | epoch 0 | step 38760 |avg loss 7.941 |avg tokens 2305.300 |tokens/s 8992.084 |walltime 10241.851 |
+Transformer | epoch 0 | step 38770 |avg loss 7.617 |avg tokens 2144.000 |tokens/s 7936.906 |walltime 10244.553 |
+Transformer | epoch 0 | step 38780 |avg loss 7.682 |avg tokens 2397.100 |tokens/s 8772.384 |walltime 10247.285 |
+Transformer | epoch 0 | step 38790 |avg loss 7.769 |avg tokens 2270.900 |tokens/s 8644.253 |walltime 10249.912 |
+Transformer | epoch 0 | step 38800 |avg loss 7.692 |avg tokens 2198.400 |tokens/s 8354.842 |walltime 10252.543 |
+Transformer | epoch 0 | step 38810 |avg loss 7.941 |avg tokens 2184.200 |tokens/s 8600.917 |walltime 10255.083 |
+Transformer | epoch 0 | step 38820 |avg loss 7.459 |avg tokens 2278.000 |tokens/s 8321.455 |walltime 10257.820 |
+Transformer | epoch 0 | step 38830 |avg loss 7.783 |avg tokens 2305.000 |tokens/s 8810.472 |walltime 10260.437 |
+Transformer | epoch 0 | step 38840 |avg loss 7.555 |avg tokens 2387.400 |tokens/s 8753.522 |walltime 10263.164 |
+Transformer | epoch 0 | step 38850 |avg loss 7.916 |avg tokens 2219.200 |tokens/s 8563.856 |walltime 10265.755 |
+Transformer | epoch 0 | step 38860 |avg loss 7.600 |avg tokens 2282.100 |tokens/s 8413.703 |walltime 10268.468 |
+Transformer | epoch 0 | step 38870 |avg loss 7.969 |avg tokens 1809.300 |tokens/s 7861.396 |walltime 10270.769 |
+Transformer | epoch 0 | step 38880 |avg loss 7.689 |avg tokens 2139.700 |tokens/s 7958.379 |walltime 10273.458 |
+Transformer | epoch 0 | step 38890 |avg loss 7.668 |avg tokens 2360.800 |tokens/s 8639.309 |walltime 10276.191 |
+Transformer | epoch 0 | step 38900 |avg loss 7.833 |avg tokens 2153.400 |tokens/s 8336.655 |walltime 10278.774 |
+Transformer | epoch 0 | step 38910 |avg loss 7.742 |avg tokens 2220.000 |tokens/s 8365.918 |walltime 10281.427 |
+Transformer | epoch 0 | step 38920 |avg loss 7.184 |avg tokens 2452.800 |tokens/s 8762.315 |walltime 10284.226 |
+Transformer | epoch 0 | step 38930 |avg loss 7.596 |avg tokens 2263.300 |tokens/s 8264.141 |walltime 10286.965 |
+Transformer | epoch 0 | step 38940 |avg loss 7.730 |avg tokens 2195.400 |tokens/s 8288.906 |walltime 10289.614 |
+Transformer | epoch 0 | step 38950 |avg loss 7.773 |avg tokens 2225.300 |tokens/s 8432.773 |walltime 10292.253 |
+Transformer | epoch 0 | step 38960 |avg loss 7.882 |avg tokens 2138.800 |tokens/s 8350.985 |walltime 10294.814 |
+Transformer | epoch 0 | step 38970 |avg loss 7.658 |avg tokens 2325.900 |tokens/s 8644.934 |walltime 10297.504 |
+Transformer | epoch 0 | step 38980 |avg loss 7.419 |avg tokens 2196.000 |tokens/s 8048.274 |walltime 10300.233 |
+Transformer | epoch 0 | step 38990 |avg loss 7.591 |avg tokens 2193.200 |tokens/s 8327.500 |walltime 10302.866 |
+Transformer | epoch 0 | step 39000 |avg loss 7.638 |avg tokens 2304.000 |tokens/s 8380.759 |walltime 10305.616 |
+Transformer | epoch 0 | step 39010 |avg loss 7.812 |avg tokens 2129.000 |tokens/s 8037.403 |walltime 10308.265 |
+Transformer | epoch 0 | step 39020 |avg loss 7.320 |avg tokens 2341.900 |tokens/s 8405.024 |walltime 10311.051 |
+Transformer | epoch 0 | step 39030 |avg loss 7.323 |avg tokens 2102.800 |tokens/s 7966.741 |walltime 10313.690 |
+Transformer | epoch 0 | step 39040 |avg loss 7.652 |avg tokens 2257.500 |tokens/s 8372.484 |walltime 10316.387 |
+Transformer | epoch 0 | step 39050 |avg loss 7.762 |avg tokens 2064.500 |tokens/s 7921.978 |walltime 10318.993 |
+Transformer | epoch 0 | step 39060 |avg loss 7.484 |avg tokens 2347.300 |tokens/s 8693.165 |walltime 10321.693 |
+Transformer | epoch 0 | step 39070 |avg loss 7.543 |avg tokens 2241.400 |tokens/s 8354.829 |walltime 10324.376 |
+Transformer | epoch 0 | step 39080 |avg loss 7.654 |avg tokens 2186.000 |tokens/s 8233.140 |walltime 10327.031 |
+Transformer | epoch 0 | step 39090 |avg loss 7.987 |avg tokens 2182.900 |tokens/s 8597.627 |walltime 10329.570 |
+Transformer | epoch 0 | step 39100 |avg loss 8.059 |avg tokens 2130.500 |tokens/s 8382.745 |walltime 10332.111 |
+Transformer | epoch 0 | step 39110 |avg loss 8.104 |avg tokens 2077.500 |tokens/s 8607.291 |walltime 10334.525 |
+Transformer | epoch 0 | step 39120 |avg loss 7.724 |avg tokens 2134.800 |tokens/s 8603.379 |walltime 10337.006 |
+Transformer | epoch 0 | step 39130 |avg loss 7.580 |avg tokens 2295.900 |tokens/s 8347.841 |walltime 10339.757 |
+Transformer | epoch 0 | step 39140 |avg loss 7.542 |avg tokens 2266.500 |tokens/s 8389.052 |walltime 10342.458 |
+Transformer | epoch 0 | step 39150 |avg loss 7.876 |avg tokens 2010.500 |tokens/s 7925.787 |walltime 10344.995 |
+Transformer | epoch 0 | step 39160 |avg loss 7.803 |avg tokens 2168.800 |tokens/s 8302.966 |walltime 10347.607 |
+Transformer | epoch 0 | step 39170 |avg loss 8.029 |avg tokens 1789.600 |tokens/s 7424.308 |walltime 10350.018 |
+Transformer | epoch 0 | step 39180 |avg loss 7.868 |avg tokens 2261.500 |tokens/s 8545.423 |walltime 10352.664 |
+Transformer | epoch 0 | step 39190 |avg loss 7.306 |avg tokens 2111.400 |tokens/s 7944.780 |walltime 10355.322 |
+Transformer | epoch 0 | step 39200 |avg loss 8.040 |avg tokens 2261.400 |tokens/s 8587.489 |walltime 10357.955 |
+Transformer | epoch 0 | step 39210 |avg loss 7.386 |avg tokens 2258.400 |tokens/s 8457.230 |walltime 10360.625 |
+Transformer | epoch 0 | step 39220 |avg loss 7.865 |avg tokens 2217.500 |tokens/s 8288.293 |walltime 10363.301 |
+Transformer | epoch 0 | step 39230 |avg loss 7.534 |avg tokens 2305.600 |tokens/s 8485.821 |walltime 10366.018 |
+Transformer | epoch 0 | step 39240 |avg loss 8.094 |avg tokens 1921.600 |tokens/s 7939.224 |walltime 10368.438 |
+Transformer | epoch 0 | step 39250 |avg loss 7.671 |avg tokens 2328.800 |tokens/s 8747.876 |walltime 10371.100 |
+Transformer | epoch 0 | step 39260 |avg loss 7.797 |avg tokens 2216.700 |tokens/s 8516.900 |walltime 10373.703 |
+Transformer | epoch 0 | step 39270 |avg loss 7.601 |avg tokens 2359.500 |tokens/s 8532.518 |walltime 10376.468 |
+Transformer | epoch 0 | step 39280 |avg loss 7.805 |avg tokens 2121.300 |tokens/s 8014.132 |walltime 10379.115 |
+Transformer | epoch 0 | step 39290 |avg loss 7.568 |avg tokens 2398.400 |tokens/s 8646.077 |walltime 10381.889 |
+Transformer | epoch 0 | step 39300 |avg loss 7.910 |avg tokens 2082.500 |tokens/s 8410.036 |walltime 10384.365 |
+Transformer | epoch 0 | step 39310 |avg loss 7.804 |avg tokens 2202.600 |tokens/s 8528.051 |walltime 10386.948 |
+Transformer | epoch 0 | step 39320 |avg loss 7.478 |avg tokens 2192.800 |tokens/s 8161.927 |walltime 10389.635 |
+Transformer | epoch 0 | step 39330 |avg loss 7.916 |avg tokens 2271.800 |tokens/s 8679.489 |walltime 10392.252 |
+Transformer | epoch 0 | step 39340 |avg loss 7.233 |avg tokens 2405.600 |tokens/s 8602.165 |walltime 10395.049 |
+Transformer | epoch 0 | step 39350 |avg loss 7.379 |avg tokens 2279.700 |tokens/s 8297.633 |walltime 10397.796 |
+Transformer | epoch 0 | step 39360 |avg loss 7.683 |avg tokens 2216.800 |tokens/s 8196.061 |walltime 10400.501 |
+Transformer | epoch 0 | step 39370 |avg loss 7.837 |avg tokens 2176.600 |tokens/s 8254.399 |walltime 10403.138 |
+Transformer | epoch 0 | step 39380 |avg loss 7.719 |avg tokens 2137.700 |tokens/s 8011.947 |walltime 10405.806 |
+Transformer | epoch 0 | step 39390 |avg loss 8.286 |avg tokens 2094.700 |tokens/s 8609.474 |walltime 10408.239 |
+Transformer | epoch 0 | step 39400 |avg loss 8.015 |avg tokens 2216.400 |tokens/s 9070.812 |walltime 10410.682 |
+Transformer | epoch 0 | step 39410 |avg loss 7.557 |avg tokens 2364.800 |tokens/s 8538.348 |walltime 10413.452 |
+Transformer | epoch 0 | step 39420 |avg loss 7.899 |avg tokens 2103.600 |tokens/s 8275.448 |walltime 10415.994 |
+Transformer | epoch 0 | step 39430 |avg loss 7.958 |avg tokens 2381.400 |tokens/s 8944.641 |walltime 10418.656 |
+Transformer | epoch 0 | step 39440 |avg loss 7.768 |avg tokens 2156.000 |tokens/s 8245.391 |walltime 10421.271 |
+Transformer | epoch 0 | step 39450 |avg loss 7.732 |avg tokens 2404.800 |tokens/s 8528.658 |walltime 10424.091 |
+Transformer | epoch 0 | step 39460 |avg loss 7.771 |avg tokens 2034.900 |tokens/s 7970.026 |walltime 10426.644 |
+Transformer | epoch 0 | step 39470 |avg loss 7.763 |avg tokens 2139.400 |tokens/s 8163.738 |walltime 10429.265 |
+Transformer | epoch 0 | step 39480 |avg loss 7.576 |avg tokens 2212.400 |tokens/s 8562.765 |walltime 10431.848 |
+Transformer | epoch 0 | step 39490 |avg loss 7.819 |avg tokens 2136.000 |tokens/s 8229.989 |walltime 10434.444 |
+Transformer | epoch 0 | step 39500 |avg loss 7.830 |avg tokens 2057.000 |tokens/s 8048.447 |walltime 10437.000 |
+Transformer | epoch 0 | step 39510 |avg loss 7.896 |avg tokens 2011.500 |tokens/s 7829.895 |walltime 10439.569 |
+Transformer | epoch 0 | step 39520 |avg loss 7.492 |avg tokens 2182.400 |tokens/s 8317.363 |walltime 10442.193 |
+Transformer | epoch 0 | step 39530 |avg loss 8.030 |avg tokens 2135.400 |tokens/s 8189.903 |walltime 10444.800 |
+Transformer | epoch 0 | step 39540 |avg loss 7.896 |avg tokens 2116.400 |tokens/s 8396.602 |walltime 10447.320 |
+Transformer | epoch 0 | step 39550 |avg loss 7.765 |avg tokens 2111.500 |tokens/s 8062.529 |walltime 10449.939 |
+Transformer | epoch 0 | step 39560 |avg loss 7.739 |avg tokens 1938.200 |tokens/s 7673.349 |walltime 10452.465 |
+Transformer | epoch 0 | step 39570 |avg loss 7.538 |avg tokens 2223.100 |tokens/s 8179.373 |walltime 10455.183 |
+Transformer | epoch 0 | step 39580 |avg loss 7.847 |avg tokens 2130.400 |tokens/s 8486.223 |walltime 10457.694 |
+Transformer | epoch 0 | step 39590 |avg loss 7.624 |avg tokens 2147.200 |tokens/s 8142.165 |walltime 10460.331 |
+Transformer | epoch 0 | step 39600 |avg loss 7.493 |avg tokens 2318.400 |tokens/s 8392.047 |walltime 10463.093 |
+Transformer | epoch 0 | step 39610 |avg loss 7.588 |avg tokens 2200.800 |tokens/s 8192.798 |walltime 10465.780 |
+Transformer | epoch 0 | step 39620 |avg loss 7.831 |avg tokens 2038.800 |tokens/s 8134.886 |walltime 10468.286 |
+Transformer | epoch 0 | step 39630 |avg loss 7.695 |avg tokens 2143.800 |tokens/s 8166.619 |walltime 10470.911 |
+Transformer | epoch 0 | step 39640 |avg loss 7.631 |avg tokens 2204.400 |tokens/s 8243.918 |walltime 10473.585 |
+Transformer | epoch 0 | step 39650 |avg loss 7.501 |avg tokens 2216.700 |tokens/s 8406.718 |walltime 10476.222 |
+Transformer | epoch 0 | step 39660 |avg loss 7.378 |avg tokens 2374.900 |tokens/s 8597.999 |walltime 10478.984 |
+Transformer | epoch 0 | step 39670 |avg loss 7.785 |avg tokens 2003.000 |tokens/s 7636.252 |walltime 10481.607 |
+Transformer | epoch 0 | step 39680 |avg loss 7.472 |avg tokens 2187.200 |tokens/s 8106.457 |walltime 10484.305 |
+Transformer | epoch 0 | step 39690 |avg loss 7.777 |avg tokens 2314.700 |tokens/s 8335.996 |walltime 10487.082 |
+Transformer | epoch 0 | step 39700 |avg loss 7.806 |avg tokens 2341.700 |tokens/s 8785.081 |walltime 10489.747 |
+Transformer | epoch 0 | step 39710 |avg loss 7.762 |avg tokens 2284.900 |tokens/s 8516.879 |walltime 10492.430 |
+Transformer | epoch 0 | step 39720 |avg loss 7.632 |avg tokens 2168.900 |tokens/s 8207.688 |walltime 10495.073 |
+Transformer | epoch 0 | step 39730 |avg loss 7.710 |avg tokens 2247.600 |tokens/s 8556.529 |walltime 10497.699 |
+Transformer | epoch 0 | step 39740 |avg loss 8.063 |avg tokens 2084.600 |tokens/s 8621.059 |walltime 10500.117 |
+Transformer | epoch 0 | step 39750 |avg loss 7.813 |avg tokens 2317.600 |tokens/s 8659.046 |walltime 10502.794 |
+Transformer | epoch 0 | step 39760 |avg loss 7.717 |avg tokens 2246.400 |tokens/s 8272.103 |walltime 10505.510 |
+Transformer | epoch 0 | step 39770 |avg loss 7.368 |avg tokens 2303.200 |tokens/s 8365.529 |walltime 10508.263 |
+Transformer | epoch 0 | step 39780 |avg loss 7.496 |avg tokens 2262.400 |tokens/s 8295.509 |walltime 10510.990 |
+Transformer | epoch 0 | step 39790 |avg loss 7.501 |avg tokens 2200.800 |tokens/s 8277.137 |walltime 10513.649 |
+Transformer | epoch 0 | step 39800 |avg loss 7.445 |avg tokens 2230.700 |tokens/s 8277.097 |walltime 10516.344 |
+Transformer | epoch 0 | step 39810 |avg loss 7.549 |avg tokens 2162.400 |tokens/s 8052.440 |walltime 10519.029 |
+Transformer | epoch 0 | step 39820 |avg loss 7.962 |avg tokens 2213.800 |tokens/s 8688.826 |walltime 10521.577 |
+Transformer | epoch 0 | step 39830 |avg loss 7.985 |avg tokens 2063.100 |tokens/s 8242.858 |walltime 10524.080 |
+Transformer | epoch 0 | step 39840 |avg loss 7.784 |avg tokens 2147.700 |tokens/s 8318.489 |walltime 10526.662 |
+Transformer | epoch 0 | step 39850 |avg loss 7.720 |avg tokens 2209.000 |tokens/s 8148.508 |walltime 10529.373 |
+Transformer | epoch 0 | step 39860 |avg loss 7.841 |avg tokens 2088.800 |tokens/s 7991.200 |walltime 10531.987 |
+Transformer | epoch 0 | step 39870 |avg loss 7.583 |avg tokens 2236.000 |tokens/s 8175.219 |walltime 10534.722 |
+Transformer | epoch 0 | step 39880 |avg loss 7.388 |avg tokens 2160.800 |tokens/s 7868.809 |walltime 10537.468 |
+Transformer | epoch 0 | step 39890 |avg loss 7.805 |avg tokens 2131.800 |tokens/s 7887.853 |walltime 10540.171 |
+Transformer | epoch 0 | step 39900 |avg loss 7.553 |avg tokens 2110.400 |tokens/s 7983.328 |walltime 10542.814 |
+Transformer | epoch 0 | step 39910 |avg loss 7.358 |avg tokens 2292.800 |tokens/s 8330.085 |walltime 10545.567 |
+Transformer | epoch 0 | step 39920 |avg loss 8.104 |avg tokens 2253.700 |tokens/s 8830.468 |walltime 10548.119 |
+Transformer | epoch 0 | step 39930 |avg loss 7.753 |avg tokens 2324.000 |tokens/s 8709.133 |walltime 10550.787 |
+Transformer | epoch 0 | step 39940 |avg loss 7.671 |avg tokens 2277.100 |tokens/s 8564.339 |walltime 10553.446 |
+Transformer | epoch 0 | step 39950 |avg loss 7.927 |avg tokens 2148.000 |tokens/s 8103.732 |walltime 10556.097 |
+Transformer | epoch 0 | step 39960 |avg loss 7.804 |avg tokens 2077.800 |tokens/s 8072.943 |walltime 10558.670 |
+Transformer | epoch 0 | step 39970 |avg loss 7.672 |avg tokens 2254.400 |tokens/s 8186.633 |walltime 10561.424 |
+Transformer | epoch 0 | step 39980 |avg loss 7.556 |avg tokens 2213.600 |tokens/s 8176.226 |walltime 10564.132 |
+Transformer | epoch 0 | step 39990 |avg loss 7.815 |avg tokens 2104.800 |tokens/s 8081.486 |walltime 10566.736 |
+Transformer | epoch 0 | step 40000 |avg loss 8.169 |avg tokens 2126.900 |tokens/s 8394.187 |walltime 10569.270 |
+Transformer | epoch 0 | step 40010 |avg loss 7.858 |avg tokens 2105.000 |tokens/s 8226.230 |walltime 10571.829 |
+Transformer | epoch 0 | step 40020 |avg loss 7.847 |avg tokens 2312.500 |tokens/s 8768.598 |walltime 10574.466 |
+Transformer | epoch 0 | step 40030 |avg loss 7.714 |avg tokens 2131.400 |tokens/s 8267.386 |walltime 10577.044 |
+Transformer | epoch 0 | step 40040 |avg loss 7.558 |avg tokens 2224.100 |tokens/s 8207.155 |walltime 10579.754 |
+Transformer | epoch 0 | step 40050 |avg loss 7.869 |avg tokens 2242.200 |tokens/s 8570.293 |walltime 10582.370 |
+Transformer | epoch 0 | step 40060 |avg loss 7.793 |avg tokens 2270.400 |tokens/s 8622.942 |walltime 10585.003 |
+Transformer | epoch 0 | step 40070 |avg loss 7.501 |avg tokens 2321.600 |tokens/s 8505.366 |walltime 10587.733 |
+Transformer | epoch 0 | step 40080 |avg loss 7.838 |avg tokens 2266.100 |tokens/s 8519.693 |walltime 10590.393 |
+Transformer | epoch 0 | step 40090 |avg loss 7.669 |avg tokens 2028.200 |tokens/s 7725.339 |walltime 10593.018 |
+Transformer | epoch 0 | step 40100 |avg loss 7.685 |avg tokens 2431.200 |tokens/s 9165.370 |walltime 10595.671 |
+Transformer | epoch 0 | step 40110 |avg loss 7.710 |avg tokens 2079.200 |tokens/s 8326.318 |walltime 10598.168 |
+Transformer | epoch 0 | step 40120 |avg loss 7.812 |avg tokens 2258.900 |tokens/s 8424.811 |walltime 10600.849 |
+Transformer | epoch 0 | step 40130 |avg loss 8.036 |avg tokens 2061.800 |tokens/s 8223.662 |walltime 10603.356 |
+Transformer | epoch 0 | step 40140 |avg loss 7.480 |avg tokens 2298.600 |tokens/s 8371.647 |walltime 10606.102 |
+Transformer | epoch 0 | step 40150 |avg loss 7.011 |avg tokens 2314.400 |tokens/s 8420.656 |walltime 10608.850 |
+Transformer | epoch 0 | step 40160 |avg loss 7.571 |avg tokens 2185.400 |tokens/s 8071.868 |walltime 10611.558 |
+Transformer | epoch 0 | step 40170 |avg loss 8.122 |avg tokens 2140.300 |tokens/s 8551.782 |walltime 10614.061 |
+Transformer | epoch 0 | step 40180 |avg loss 7.408 |avg tokens 2174.700 |tokens/s 8091.293 |walltime 10616.748 |
+Transformer | epoch 0 | step 40190 |avg loss 7.744 |avg tokens 2295.900 |tokens/s 8398.707 |walltime 10619.482 |
+Transformer | epoch 0 | step 40200 |avg loss 7.755 |avg tokens 2024.600 |tokens/s 7770.188 |walltime 10622.088 |
+Transformer | epoch 0 | step 40210 |avg loss 7.975 |avg tokens 2370.100 |tokens/s 8757.369 |walltime 10624.794 |
+Transformer | epoch 0 | step 40220 |avg loss 7.826 |avg tokens 2352.200 |tokens/s 8987.763 |walltime 10627.411 |
+Transformer | epoch 0 | step 40230 |avg loss 7.605 |avg tokens 1966.400 |tokens/s 7640.691 |walltime 10629.985 |
+Transformer | epoch 0 | step 40240 |avg loss 7.475 |avg tokens 2211.200 |tokens/s 8147.584 |walltime 10632.699 |
+Transformer | epoch 0 | step 40250 |avg loss 7.936 |avg tokens 2322.800 |tokens/s 8724.804 |walltime 10635.361 |
+Transformer | epoch 0 | step 40260 |avg loss 7.818 |avg tokens 2243.000 |tokens/s 8355.299 |walltime 10638.045 |
+Transformer | epoch 0 | step 40270 |avg loss 7.939 |avg tokens 2111.100 |tokens/s 8452.691 |walltime 10640.543 |
+Transformer | epoch 0 | step 40280 |avg loss 7.792 |avg tokens 2249.900 |tokens/s 8512.480 |walltime 10643.186 |
+Transformer | epoch 0 | step 40290 |avg loss 7.680 |avg tokens 1880.700 |tokens/s 7378.606 |walltime 10645.735 |
+Transformer | epoch 0 | step 40300 |avg loss 7.890 |avg tokens 2297.100 |tokens/s 8673.781 |walltime 10648.383 |
+Transformer | epoch 0 | step 40310 |avg loss 7.535 |avg tokens 2246.000 |tokens/s 8576.347 |walltime 10651.002 |
+Transformer | epoch 0 | step 40320 |avg loss 7.785 |avg tokens 2156.000 |tokens/s 7988.740 |walltime 10653.701 |
+Transformer | epoch 0 | step 40330 |avg loss 7.753 |avg tokens 2318.200 |tokens/s 9040.690 |walltime 10656.265 |
+Transformer | epoch 0 | step 40340 |avg loss 7.798 |avg tokens 1885.600 |tokens/s 7804.370 |walltime 10658.681 |
+Transformer | epoch 0 | step 40350 |avg loss 7.939 |avg tokens 2226.000 |tokens/s 8129.286 |walltime 10661.419 |
+Transformer | epoch 0 | step 40360 |avg loss 7.541 |avg tokens 2336.800 |tokens/s 8813.696 |walltime 10664.071 |
+Transformer | epoch 0 | step 40370 |avg loss 7.494 |avg tokens 2348.900 |tokens/s 8494.685 |walltime 10666.836 |
+Transformer | epoch 0 | step 40380 |avg loss 8.132 |avg tokens 2249.500 |tokens/s 8599.731 |walltime 10669.452 |
+Transformer | epoch 0 | step 40390 |avg loss 7.885 |avg tokens 2335.200 |tokens/s 8769.341 |walltime 10672.115 |
+Transformer | epoch 0 | step 40400 |avg loss 7.621 |avg tokens 2241.400 |tokens/s 8518.977 |walltime 10674.746 |
+Transformer | epoch 0 | step 40410 |avg loss 7.543 |avg tokens 2067.700 |tokens/s 7960.632 |walltime 10677.343 |
+Transformer | epoch 0 | step 40420 |avg loss 7.731 |avg tokens 2192.400 |tokens/s 8301.598 |walltime 10679.984 |
+Transformer | epoch 0 | step 40430 |avg loss 7.486 |avg tokens 2326.700 |tokens/s 8572.471 |walltime 10682.698 |
+Transformer | epoch 0 | step 40440 |avg loss 7.418 |avg tokens 2257.600 |tokens/s 8257.111 |walltime 10685.432 |
+Transformer | epoch 0 | step 40450 |avg loss 7.431 |avg tokens 2344.000 |tokens/s 8521.440 |walltime 10688.183 |
+Transformer | epoch 0 | step 40460 |avg loss 7.414 |avg tokens 2278.400 |tokens/s 8164.922 |walltime 10690.973 |
+Transformer | epoch 0 | step 40470 |avg loss 7.471 |avg tokens 2070.500 |tokens/s 7844.635 |walltime 10693.613 |
+Transformer | epoch 0 | step 40480 |avg loss 8.127 |avg tokens 1819.900 |tokens/s 7734.307 |walltime 10695.966 |
+Transformer | epoch 0 | step 40490 |avg loss 7.879 |avg tokens 2271.700 |tokens/s 8502.959 |walltime 10698.638 |
+Transformer | epoch 0 | step 40500 |avg loss 7.652 |avg tokens 2217.500 |tokens/s 8231.129 |walltime 10701.332 |
+Transformer | epoch 0 | step 40510 |avg loss 7.871 |avg tokens 2289.400 |tokens/s 8719.766 |walltime 10703.957 |
+Transformer | epoch 0 | step 40520 |avg loss 7.618 |avg tokens 2190.300 |tokens/s 8461.545 |walltime 10706.546 |
+Transformer | epoch 0 | step 40530 |avg loss 7.714 |avg tokens 2116.600 |tokens/s 7931.531 |walltime 10709.214 |
+Transformer | epoch 0 | step 40540 |avg loss 7.443 |avg tokens 2317.600 |tokens/s 8531.284 |walltime 10711.931 |
+Transformer | epoch 0 | step 40550 |avg loss 7.742 |avg tokens 2206.400 |tokens/s 8389.268 |walltime 10714.561 |
+Transformer | epoch 0 | step 40560 |avg loss 7.564 |avg tokens 2067.100 |tokens/s 7776.803 |walltime 10717.219 |
+Transformer | epoch 0 | step 40570 |avg loss 7.759 |avg tokens 2143.200 |tokens/s 8176.377 |walltime 10719.840 |
+Transformer | epoch 0 | step 40580 |avg loss 7.637 |avg tokens 2282.100 |tokens/s 8321.887 |walltime 10722.582 |
+Transformer | epoch 0 | step 40590 |avg loss 7.851 |avg tokens 2237.800 |tokens/s 8472.700 |walltime 10725.224 |
+Transformer | epoch 0 | step 40600 |avg loss 7.668 |avg tokens 2297.900 |tokens/s 8479.742 |walltime 10727.934 |
+Transformer | epoch 0 | step 40610 |avg loss 7.910 |avg tokens 2105.400 |tokens/s 8308.915 |walltime 10730.467 |
+Transformer | epoch 0 | step 40620 |avg loss 7.303 |avg tokens 2199.400 |tokens/s 8057.294 |walltime 10733.197 |
+Transformer | epoch 0 | step 40630 |avg loss 7.761 |avg tokens 2026.300 |tokens/s 8384.741 |walltime 10735.614 |
+Transformer | epoch 0 | step 40640 |avg loss 7.509 |avg tokens 2028.500 |tokens/s 7840.546 |walltime 10738.201 |
+Transformer | epoch 0 | step 40650 |avg loss 7.739 |avg tokens 2326.100 |tokens/s 8543.087 |walltime 10740.924 |
+Transformer | epoch 0 | step 40660 |avg loss 7.793 |avg tokens 2108.800 |tokens/s 8066.878 |walltime 10743.538 |
+Transformer | epoch 0 | step 40670 |avg loss 7.546 |avg tokens 2233.900 |tokens/s 8138.646 |walltime 10746.283 |
+Transformer | epoch 0 | step 40680 |avg loss 7.961 |avg tokens 1918.900 |tokens/s 7985.336 |walltime 10748.686 |
+Transformer | epoch 0 | step 40690 |avg loss 7.688 |avg tokens 2219.300 |tokens/s 8250.956 |walltime 10751.376 |
+Transformer | epoch 0 | step 40700 |avg loss 7.624 |avg tokens 1986.700 |tokens/s 7636.660 |walltime 10753.977 |
+Transformer | epoch 0 | step 40710 |avg loss 7.608 |avg tokens 2251.800 |tokens/s 8465.138 |walltime 10756.637 |
+Transformer | epoch 0 | step 40720 |avg loss 7.686 |avg tokens 2173.500 |tokens/s 8102.612 |walltime 10759.320 |
+Transformer | epoch 0 | step 40730 |avg loss 7.834 |avg tokens 2300.200 |tokens/s 8675.231 |walltime 10761.971 |
+Transformer | epoch 0 | step 40740 |avg loss 7.717 |avg tokens 2224.700 |tokens/s 7987.794 |walltime 10764.756 |
+Transformer | epoch 0 | step 40750 |avg loss 7.585 |avg tokens 2309.700 |tokens/s 8841.574 |walltime 10767.369 |
+Transformer | epoch 0 | step 40760 |avg loss 7.976 |avg tokens 2142.700 |tokens/s 8381.451 |walltime 10769.925 |
+Transformer | epoch 0 | step 40770 |avg loss 7.536 |avg tokens 2141.100 |tokens/s 7994.154 |walltime 10772.603 |
+Transformer | epoch 0 | step 40780 |avg loss 7.667 |avg tokens 2049.100 |tokens/s 7847.282 |walltime 10775.215 |
+Transformer | epoch 0 | step 40790 |avg loss 7.544 |avg tokens 2131.200 |tokens/s 8183.332 |walltime 10777.819 |
+Transformer | epoch 0 | step 40800 |avg loss 7.649 |avg tokens 2297.300 |tokens/s 8491.659 |walltime 10780.524 |
+Transformer | epoch 0 | step 40810 |avg loss 7.898 |avg tokens 1991.100 |tokens/s 7928.343 |walltime 10783.036 |
+Transformer | epoch 0 | step 40820 |avg loss 7.903 |avg tokens 2282.500 |tokens/s 8576.961 |walltime 10785.697 |
+Transformer | epoch 0 | step 40830 |avg loss 7.643 |avg tokens 2322.200 |tokens/s 8510.110 |walltime 10788.426 |
+Transformer | epoch 0 | step 40840 |avg loss 7.743 |avg tokens 2288.000 |tokens/s 8316.435 |walltime 10791.177 |
+Transformer | epoch 0 | step 40850 |avg loss 7.890 |avg tokens 2033.300 |tokens/s 7684.434 |walltime 10793.823 |
+Transformer | epoch 0 | step 40860 |avg loss 7.806 |avg tokens 2215.500 |tokens/s 8309.393 |walltime 10796.489 |
+Transformer | epoch 0 | step 40870 |avg loss 7.925 |avg tokens 2308.800 |tokens/s 8443.308 |walltime 10799.224 |
+Transformer | epoch 0 | step 40880 |avg loss 7.599 |avg tokens 2204.800 |tokens/s 8371.291 |walltime 10801.857 |
+Transformer | epoch 0 | step 40890 |avg loss 8.010 |avg tokens 2102.600 |tokens/s 8446.460 |walltime 10804.347 |
+Transformer | epoch 0 | step 40900 |avg loss 7.872 |avg tokens 2304.800 |tokens/s 8631.342 |walltime 10807.017 |
+Transformer | epoch 0 | step 40910 |avg loss 7.820 |avg tokens 2260.500 |tokens/s 8580.915 |walltime 10809.651 |
+Transformer | epoch 0 | step 40920 |avg loss 7.655 |avg tokens 2104.100 |tokens/s 8070.772 |walltime 10812.258 |
+Transformer | epoch 0 | step 40930 |avg loss 7.658 |avg tokens 2266.400 |tokens/s 8580.159 |walltime 10814.900 |
+Transformer | epoch 0 | step 40940 |avg loss 7.605 |avg tokens 2346.400 |tokens/s 8590.209 |walltime 10817.631 |
+Transformer | epoch 0 | step 40950 |avg loss 7.498 |avg tokens 2153.100 |tokens/s 8048.818 |walltime 10820.306 | +Transformer | epoch 0 | step 40960 |avg loss 7.517 |avg tokens 2280.200 |tokens/s 8336.450 |walltime 10823.042 | +Transformer | epoch 0 | step 40970 |avg loss 7.697 |avg tokens 2227.800 |tokens/s 8245.908 |walltime 10825.743 | +Transformer | epoch 0 | step 40980 |avg loss 7.734 |avg tokens 2238.000 |tokens/s 8348.395 |walltime 10828.424 | +Transformer | epoch 0 | step 40990 |avg loss 7.591 |avg tokens 2025.100 |tokens/s 7605.545 |walltime 10831.087 | +Transformer | epoch 0 | step 41000 |avg loss 8.233 |avg tokens 2075.300 |tokens/s 8413.166 |walltime 10833.553 | +Transformer | epoch 0 | step 41010 |avg loss 7.537 |avg tokens 2397.100 |tokens/s 8655.613 |walltime 10836.323 | +Transformer | epoch 0 | step 41020 |avg loss 7.621 |avg tokens 2285.200 |tokens/s 8424.519 |walltime 10839.035 | +Transformer | epoch 0 | step 41030 |avg loss 7.493 |avg tokens 2179.700 |tokens/s 8177.706 |walltime 10841.701 | +Transformer | epoch 0 | step 41040 |avg loss 7.868 |avg tokens 1996.700 |tokens/s 7768.851 |walltime 10844.271 | +Transformer | epoch 0 | step 41050 |avg loss 7.399 |avg tokens 2316.800 |tokens/s 8375.027 |walltime 10847.037 | +Transformer | epoch 0 | step 41060 |avg loss 7.969 |avg tokens 2039.200 |tokens/s 8002.659 |walltime 10849.585 | +Transformer | epoch 0 | step 41070 |avg loss 7.748 |avg tokens 2192.700 |tokens/s 8141.370 |walltime 10852.279 | +Transformer | epoch 0 | step 41080 |avg loss 7.541 |avg tokens 2168.800 |tokens/s 8052.276 |walltime 10854.972 | +Transformer | epoch 0 | step 41090 |avg loss 8.022 |avg tokens 1994.200 |tokens/s 8219.957 |walltime 10857.398 | +Transformer | epoch 0 | step 41100 |avg loss 7.775 |avg tokens 2372.100 |tokens/s 8810.922 |walltime 10860.090 | +Transformer | epoch 0 | step 41110 |avg loss 7.735 |avg tokens 2168.800 |tokens/s 8143.113 |walltime 10862.754 | +Transformer | epoch 0 | step 41120 |avg loss 7.764 |avg 
tokens 2323.200 |tokens/s 8698.637 |walltime 10865.424 | +Transformer | epoch 0 | step 41130 |avg loss 7.610 |avg tokens 2118.100 |tokens/s 8230.422 |walltime 10867.998 | +Transformer | epoch 0 | step 41140 |avg loss 7.550 |avg tokens 2060.700 |tokens/s 7845.024 |walltime 10870.625 | +Transformer | epoch 0 | step 41150 |avg loss 7.788 |avg tokens 1996.700 |tokens/s 7648.206 |walltime 10873.235 | +Transformer | epoch 0 | step 41160 |avg loss 7.820 |avg tokens 2229.000 |tokens/s 8426.193 |walltime 10875.881 | +Transformer | epoch 0 | step 41170 |avg loss 7.746 |avg tokens 2018.800 |tokens/s 7865.776 |walltime 10878.447 | +Transformer | epoch 0 | step 41180 |avg loss 7.707 |avg tokens 1981.900 |tokens/s 7777.657 |walltime 10880.996 | +Transformer | epoch 0 | step 41190 |avg loss 7.675 |avg tokens 2189.000 |tokens/s 8375.463 |walltime 10883.609 | +Transformer | epoch 0 | step 41200 |avg loss 7.457 |avg tokens 2128.900 |tokens/s 8099.878 |walltime 10886.237 | +Transformer | epoch 0 | step 41210 |avg loss 8.104 |avg tokens 2386.000 |tokens/s 9092.921 |walltime 10888.861 | +Transformer | epoch 0 | step 41220 |avg loss 7.727 |avg tokens 2132.800 |tokens/s 8281.057 |walltime 10891.437 | +Transformer | epoch 0 | step 41230 |avg loss 7.752 |avg tokens 2214.300 |tokens/s 8235.227 |walltime 10894.126 | +Transformer | epoch 0 | step 41240 |avg loss 7.396 |avg tokens 2261.600 |tokens/s 8163.266 |walltime 10896.896 | +Transformer | epoch 0 | step 41250 |avg loss 7.705 |avg tokens 2197.400 |tokens/s 8327.292 |walltime 10899.535 | +Transformer | epoch 0 | step 41260 |avg loss 7.615 |avg tokens 1928.200 |tokens/s 7703.843 |walltime 10902.038 | +Transformer | epoch 0 | step 41270 |avg loss 7.841 |avg tokens 2196.100 |tokens/s 8620.786 |walltime 10904.585 | +Transformer | epoch 0 | step 41280 |avg loss 8.031 |avg tokens 2100.500 |tokens/s 8336.460 |walltime 10907.105 | +Transformer | epoch 0 | step 41290 |avg loss 7.776 |avg tokens 2082.700 |tokens/s 8149.397 |walltime 10909.661 | 
+Transformer | epoch 0 | step 41300 |avg loss 7.531 |avg tokens 2322.500 |tokens/s 8699.264 |walltime 10912.331 | +Transformer | epoch 0 | step 41310 |avg loss 7.929 |avg tokens 2076.000 |tokens/s 8234.086 |walltime 10914.852 | +Transformer | epoch 0 | step 41320 |avg loss 7.804 |avg tokens 2073.200 |tokens/s 7801.596 |walltime 10917.509 | +Transformer | epoch 0 | step 41330 |avg loss 7.521 |avg tokens 2230.400 |tokens/s 8455.436 |walltime 10920.147 | +Transformer | epoch 0 | step 41340 |avg loss 7.616 |avg tokens 2024.000 |tokens/s 7700.809 |walltime 10922.775 | +Transformer | epoch 0 | step 41350 |avg loss 7.918 |avg tokens 2274.900 |tokens/s 8730.986 |walltime 10925.381 | +Transformer | epoch 0 | step 41360 |avg loss 7.722 |avg tokens 2166.000 |tokens/s 8222.239 |walltime 10928.015 | +Transformer | epoch 0 | step 41370 |avg loss 7.873 |avg tokens 1956.400 |tokens/s 7879.038 |walltime 10930.498 | +Transformer | epoch 0 | step 41380 |avg loss 7.955 |avg tokens 1962.200 |tokens/s 7959.282 |walltime 10932.964 | +Transformer | epoch 0 | step 41390 |avg loss 8.119 |avg tokens 2214.000 |tokens/s 8513.066 |walltime 10935.564 | +Transformer | epoch 0 | step 41400 |avg loss 7.671 |avg tokens 2227.600 |tokens/s 8276.898 |walltime 10938.256 | +Transformer | epoch 0 | step 41410 |avg loss 7.744 |avg tokens 2271.600 |tokens/s 8665.934 |walltime 10940.877 | +Transformer | epoch 0 | step 41420 |avg loss 7.879 |avg tokens 2091.300 |tokens/s 8177.495 |walltime 10943.434 | +Transformer | epoch 0 | step 41430 |avg loss 7.828 |avg tokens 2397.000 |tokens/s 8908.059 |walltime 10946.125 | +Transformer | epoch 0 | step 41440 |avg loss 7.946 |avg tokens 1923.900 |tokens/s 7829.337 |walltime 10948.582 | +Transformer | epoch 0 | step 41450 |avg loss 8.013 |avg tokens 2125.100 |tokens/s 8452.062 |walltime 10951.097 | +Transformer | epoch 0 | step 41460 |avg loss 7.459 |avg tokens 2405.600 |tokens/s 8642.540 |walltime 10953.880 | +Transformer | epoch 0 | step 41470 |avg loss 7.557 |avg 
tokens 2177.500 |tokens/s 8294.181 |walltime 10956.505 | +Transformer | epoch 0 | step 41480 |avg loss 7.765 |avg tokens 2066.600 |tokens/s 8291.147 |walltime 10958.998 | +Transformer | epoch 0 | step 41490 |avg loss 7.586 |avg tokens 2313.100 |tokens/s 8516.442 |walltime 10961.714 | +Transformer | epoch 0 | step 41500 |avg loss 7.732 |avg tokens 2195.400 |tokens/s 8202.301 |walltime 10964.391 | +Transformer | epoch 0 | step 41510 |avg loss 7.484 |avg tokens 2258.800 |tokens/s 8615.587 |walltime 10967.012 | +Transformer | epoch 0 | step 41520 |avg loss 7.680 |avg tokens 2366.500 |tokens/s 9009.728 |walltime 10969.639 | +Transformer | epoch 0 | step 41530 |avg loss 7.540 |avg tokens 2265.700 |tokens/s 8285.447 |walltime 10972.374 | +Transformer | epoch 0 | step 41540 |avg loss 7.717 |avg tokens 2374.400 |tokens/s 8623.738 |walltime 10975.127 | +Transformer | epoch 0 | step 41550 |avg loss 7.464 |avg tokens 2222.500 |tokens/s 7983.468 |walltime 10977.911 | +Transformer | epoch 0 | step 41560 |avg loss 7.744 |avg tokens 1928.500 |tokens/s 7591.976 |walltime 10980.451 | +Transformer | epoch 0 | step 41570 |avg loss 7.421 |avg tokens 2200.800 |tokens/s 8295.444 |walltime 10983.104 | +Transformer | epoch 0 | step 41580 |avg loss 7.826 |avg tokens 1985.800 |tokens/s 7888.270 |walltime 10985.621 | +Transformer | epoch 0 | step 41590 |avg loss 7.912 |avg tokens 1792.200 |tokens/s 7613.872 |walltime 10987.975 | +Transformer | epoch 0 | step 41600 |avg loss 7.801 |avg tokens 2227.600 |tokens/s 8514.084 |walltime 10990.592 | +Transformer | epoch 0 | step 41610 |avg loss 7.281 |avg tokens 2259.400 |tokens/s 8189.917 |walltime 10993.350 | +Transformer | epoch 0 | step 41620 |avg loss 7.909 |avg tokens 2009.600 |tokens/s 7787.502 |walltime 10995.931 | +Transformer | epoch 0 | step 41630 |avg loss 7.819 |avg tokens 2220.100 |tokens/s 8612.946 |walltime 10998.509 | +Transformer | epoch 0 | step 41640 |avg loss 7.164 |avg tokens 2229.700 |tokens/s 8444.196 |walltime 11001.149 | 
+Transformer | epoch 0 | step 41650 |avg loss 7.679 |avg tokens 2168.800 |tokens/s 8312.091 |walltime 11003.758 | +Transformer | epoch 0 | step 41660 |avg loss 7.772 |avg tokens 1952.100 |tokens/s 7556.509 |walltime 11006.342 | +Transformer | epoch 0 | step 41670 |avg loss 7.840 |avg tokens 2199.800 |tokens/s 8312.863 |walltime 11008.988 | +Transformer | epoch 0 | step 41680 |avg loss 7.859 |avg tokens 1994.900 |tokens/s 7993.799 |walltime 11011.483 | +Transformer | epoch 0 | step 41690 |avg loss 7.745 |avg tokens 2167.000 |tokens/s 8102.887 |walltime 11014.158 | +Transformer | epoch 0 | step 41700 |avg loss 7.507 |avg tokens 1997.700 |tokens/s 7666.183 |walltime 11016.764 | +Transformer | epoch 0 | step 41710 |avg loss 7.665 |avg tokens 2163.600 |tokens/s 8451.862 |walltime 11019.324 | +Transformer | epoch 0 | step 41720 |avg loss 7.576 |avg tokens 2408.200 |tokens/s 8535.575 |walltime 11022.145 | +Transformer | epoch 0 | step 41730 |avg loss 7.516 |avg tokens 2130.100 |tokens/s 7881.375 |walltime 11024.848 | +Transformer | epoch 0 | step 41740 |avg loss 8.016 |avg tokens 2100.200 |tokens/s 8362.700 |walltime 11027.359 | +Transformer | epoch 0 | step 41750 |avg loss 7.778 |avg tokens 2240.200 |tokens/s 8445.702 |walltime 11030.012 | +Transformer | epoch 0 | step 41760 |avg loss 7.922 |avg tokens 1893.200 |tokens/s 7695.853 |walltime 11032.472 | +Transformer | epoch 0 | step 41770 |avg loss 7.609 |avg tokens 2217.700 |tokens/s 8451.606 |walltime 11035.096 | +Transformer | epoch 0 | step 41780 |avg loss 7.716 |avg tokens 2160.900 |tokens/s 8600.651 |walltime 11037.608 | +Transformer | epoch 0 | step 41790 |avg loss 7.607 |avg tokens 2252.800 |tokens/s 8500.649 |walltime 11040.258 | +Transformer | epoch 0 | step 41800 |avg loss 7.838 |avg tokens 1923.900 |tokens/s 7665.818 |walltime 11042.768 | +Transformer | epoch 0 | step 41810 |avg loss 7.727 |avg tokens 2396.200 |tokens/s 8882.437 |walltime 11045.466 | +Transformer | epoch 0 | step 41820 |avg loss 7.929 |avg 
tokens 2065.000 |tokens/s 8193.101 |walltime 11047.986 | +Transformer | epoch 0 | step 41830 |avg loss 7.438 |avg tokens 2320.600 |tokens/s 8687.464 |walltime 11050.657 | +Transformer | epoch 0 | step 41840 |avg loss 7.472 |avg tokens 2191.000 |tokens/s 8233.398 |walltime 11053.318 | +Transformer | epoch 0 | step 41850 |avg loss 8.023 |avg tokens 2187.500 |tokens/s 8316.518 |walltime 11055.949 | +Transformer | epoch 0 | step 41860 |avg loss 7.584 |avg tokens 2209.300 |tokens/s 8323.024 |walltime 11058.603 | +Transformer | epoch 0 | step 41870 |avg loss 7.807 |avg tokens 2178.600 |tokens/s 8517.523 |walltime 11061.161 | +Transformer | epoch 0 | step 41880 |avg loss 7.896 |avg tokens 1903.500 |tokens/s 7441.572 |walltime 11063.719 | +Transformer | epoch 0 | step 41890 |avg loss 7.413 |avg tokens 2283.300 |tokens/s 8494.946 |walltime 11066.407 | +Transformer | epoch 0 | step 41900 |avg loss 7.831 |avg tokens 2231.600 |tokens/s 8509.598 |walltime 11069.029 | +Transformer | epoch 0 | step 41910 |avg loss 7.809 |avg tokens 2050.800 |tokens/s 7995.947 |walltime 11071.594 | +Transformer | epoch 0 | step 41920 |avg loss 7.597 |avg tokens 2032.000 |tokens/s 7884.727 |walltime 11074.171 | +Transformer | epoch 0 | step 41930 |avg loss 7.791 |avg tokens 1922.100 |tokens/s 7803.658 |walltime 11076.634 | +Transformer | epoch 0 | step 41940 |avg loss 7.800 |avg tokens 2288.100 |tokens/s 8775.960 |walltime 11079.241 | +Transformer | epoch 0 | step 41950 |avg loss 7.930 |avg tokens 2031.000 |tokens/s 7996.740 |walltime 11081.781 | +Transformer | epoch 0 | step 41960 |avg loss 8.062 |avg tokens 1913.100 |tokens/s 7820.863 |walltime 11084.227 | +Transformer | epoch 0 | step 41970 |avg loss 7.483 |avg tokens 2388.900 |tokens/s 8667.346 |walltime 11086.984 | +Transformer | epoch 0 | step 41980 |avg loss 7.489 |avg tokens 2276.800 |tokens/s 8352.814 |walltime 11089.709 | +Transformer | epoch 0 | step 41990 |avg loss 8.099 |avg tokens 2278.200 |tokens/s 8841.859 |walltime 11092.286 | 
+Transformer | epoch 0 | step 42000 |avg loss 7.679 |avg tokens 2183.800 |tokens/s 8199.061 |walltime 11094.949 | +Transformer | epoch 0 | step 42010 |avg loss 7.753 |avg tokens 1950.700 |tokens/s 7784.395 |walltime 11097.455 | +Transformer | epoch 0 | step 42020 |avg loss 7.888 |avg tokens 2125.000 |tokens/s 8084.817 |walltime 11100.084 | +Transformer | epoch 0 | step 42030 |avg loss 7.822 |avg tokens 2178.700 |tokens/s 8531.901 |walltime 11102.637 | +Transformer | epoch 0 | step 42040 |avg loss 7.752 |avg tokens 2076.300 |tokens/s 8143.574 |walltime 11105.187 | +Transformer | epoch 0 | step 42050 |avg loss 7.822 |avg tokens 2320.000 |tokens/s 8756.353 |walltime 11107.836 | +Transformer | epoch 0 | step 42060 |avg loss 7.926 |avg tokens 2257.100 |tokens/s 8613.053 |walltime 11110.457 | +Transformer | epoch 0 | step 42070 |avg loss 7.674 |avg tokens 2310.400 |tokens/s 8434.492 |walltime 11113.196 | +Transformer | epoch 0 | step 42080 |avg loss 7.614 |avg tokens 2127.000 |tokens/s 8100.398 |walltime 11115.822 | +Transformer | epoch 0 | step 42090 |avg loss 7.252 |avg tokens 2274.800 |tokens/s 8410.324 |walltime 11118.527 | +Transformer | epoch 0 | step 42100 |avg loss 7.373 |avg tokens 2236.000 |tokens/s 8329.583 |walltime 11121.211 | +Transformer | epoch 0 | step 42110 |avg loss 7.843 |avg tokens 2134.300 |tokens/s 8051.897 |walltime 11123.862 | +Transformer | epoch 0 | step 42120 |avg loss 6.977 |avg tokens 2270.400 |tokens/s 8551.442 |walltime 11126.517 | +Transformer | epoch 0 | step 42130 |avg loss 8.241 |avg tokens 2053.600 |tokens/s 8457.670 |walltime 11128.945 | +Transformer | epoch 0 | step 42140 |avg loss 7.734 |avg tokens 2108.700 |tokens/s 7958.149 |walltime 11131.595 | +Transformer | epoch 0 | step 42150 |avg loss 7.494 |avg tokens 2049.800 |tokens/s 7814.752 |walltime 11134.218 | +Transformer | epoch 0 | step 42160 |avg loss 7.318 |avg tokens 2388.100 |tokens/s 8547.144 |walltime 11137.012 | +Transformer | epoch 0 | step 42170 |avg loss 7.953 |avg 
tokens 2063.700 |tokens/s 8154.682 |walltime 11139.543 | +Transformer | epoch 0 | step 42180 |avg loss 7.452 |avg tokens 2241.600 |tokens/s 8532.649 |walltime 11142.170 | +Transformer | epoch 0 | step 42190 |avg loss 7.601 |avg tokens 2350.400 |tokens/s 8878.870 |walltime 11144.817 | +Transformer | epoch 0 | step 42200 |avg loss 7.854 |avg tokens 2188.900 |tokens/s 8774.128 |walltime 11147.312 | +Transformer | epoch 0 | step 42210 |avg loss 7.708 |avg tokens 2362.400 |tokens/s 8930.131 |walltime 11149.957 | +Transformer | epoch 0 | step 42220 |avg loss 7.913 |avg tokens 2032.300 |tokens/s 7918.933 |walltime 11152.523 | +Transformer | epoch 0 | step 42230 |avg loss 7.576 |avg tokens 2279.600 |tokens/s 8607.524 |walltime 11155.172 | +Transformer | epoch 0 | step 42240 |avg loss 8.040 |avg tokens 2229.000 |tokens/s 8638.904 |walltime 11157.752 | +Transformer | epoch 0 | step 42250 |avg loss 7.598 |avg tokens 2277.600 |tokens/s 8469.449 |walltime 11160.441 | +Transformer | epoch 0 | step 42260 |avg loss 7.715 |avg tokens 2335.500 |tokens/s 8642.156 |walltime 11163.144 | +Transformer | epoch 0 | step 42270 |avg loss 7.753 |avg tokens 2317.600 |tokens/s 8635.433 |walltime 11165.827 | +Transformer | epoch 0 | step 42280 |avg loss 7.655 |avg tokens 2359.200 |tokens/s 8740.979 |walltime 11168.526 | +Transformer | epoch 0 | step 42290 |avg loss 7.549 |avg tokens 2138.800 |tokens/s 8167.233 |walltime 11171.145 | +Transformer | epoch 0 | step 42300 |avg loss 7.515 |avg tokens 2235.300 |tokens/s 8343.965 |walltime 11173.824 | +Transformer | epoch 0 | step 42310 |avg loss 7.967 |avg tokens 2103.900 |tokens/s 8498.888 |walltime 11176.300 | +Transformer | epoch 0 | step 42320 |avg loss 8.020 |avg tokens 2284.400 |tokens/s 8418.656 |walltime 11179.013 | +Transformer | epoch 0 | step 42330 |avg loss 7.809 |avg tokens 2166.600 |tokens/s 8342.064 |walltime 11181.610 | +Transformer | epoch 0 | step 42340 |avg loss 7.395 |avg tokens 2122.300 |tokens/s 8011.605 |walltime 11184.259 | 
+Transformer | epoch 0 | step 42350 |avg loss 7.750 |avg tokens 2322.400 |tokens/s 8657.473 |walltime 11186.942 | +Transformer | epoch 0 | step 42360 |avg loss 7.480 |avg tokens 2208.200 |tokens/s 8229.789 |walltime 11189.625 | +Transformer | epoch 0 | step 42370 |avg loss 7.631 |avg tokens 2136.500 |tokens/s 7912.108 |walltime 11192.325 | +Transformer | epoch 0 | step 42380 |avg loss 7.954 |avg tokens 2008.900 |tokens/s 7475.277 |walltime 11195.013 | +Transformer | epoch 0 | step 42390 |avg loss 7.337 |avg tokens 2408.800 |tokens/s 8578.854 |walltime 11197.821 | +Transformer | epoch 0 | step 42400 |avg loss 7.674 |avg tokens 2234.400 |tokens/s 8381.057 |walltime 11200.487 | +Transformer | epoch 0 | step 42410 |avg loss 7.520 |avg tokens 2188.900 |tokens/s 8104.611 |walltime 11203.187 | +Transformer | epoch 0 | step 42420 |avg loss 7.955 |avg tokens 2382.800 |tokens/s 8886.056 |walltime 11205.869 | +Transformer | epoch 0 | step 42430 |avg loss 7.777 |avg tokens 2402.600 |tokens/s 8950.781 |walltime 11208.553 | +Transformer | epoch 0 | step 42440 |avg loss 7.778 |avg tokens 2075.200 |tokens/s 8020.539 |walltime 11211.141 | +Transformer | epoch 0 | step 42450 |avg loss 7.291 |avg tokens 2311.100 |tokens/s 8618.766 |walltime 11213.822 | +Transformer | epoch 0 | step 42460 |avg loss 7.734 |avg tokens 2233.600 |tokens/s 8285.850 |walltime 11216.518 | +Transformer | epoch 0 | step 42470 |avg loss 7.591 |avg tokens 2245.600 |tokens/s 8492.170 |walltime 11219.162 | +Transformer | epoch 0 | step 42480 |avg loss 7.851 |avg tokens 2107.000 |tokens/s 8114.416 |walltime 11221.759 | +Transformer | epoch 0 | step 42490 |avg loss 7.546 |avg tokens 2266.400 |tokens/s 8276.145 |walltime 11224.497 | +Transformer | epoch 0 | step 42500 |avg loss 7.479 |avg tokens 2360.100 |tokens/s 8633.814 |walltime 11227.231 | +Transformer | epoch 0 | step 42510 |avg loss 7.882 |avg tokens 2080.200 |tokens/s 8180.094 |walltime 11229.774 | +Transformer | epoch 0 | step 42520 |avg loss 7.677 |avg 
tokens 2261.600 |tokens/s 8370.438 |walltime 11232.476 | +Transformer | epoch 0 | step 42530 |avg loss 7.998 |avg tokens 2016.600 |tokens/s 7979.759 |walltime 11235.003 | +Transformer | epoch 0 | step 42540 |avg loss 7.772 |avg tokens 2171.300 |tokens/s 8238.299 |walltime 11237.638 | +Transformer | epoch 0 | step 42550 |avg loss 7.699 |avg tokens 2117.900 |tokens/s 8112.637 |walltime 11240.249 | +Transformer | epoch 0 | step 42560 |avg loss 7.475 |avg tokens 2265.300 |tokens/s 8402.700 |walltime 11242.945 | +Transformer | epoch 0 | step 42570 |avg loss 7.663 |avg tokens 2201.200 |tokens/s 8215.232 |walltime 11245.624 | +Transformer | epoch 0 | step 42580 |avg loss 7.569 |avg tokens 2327.000 |tokens/s 8482.344 |walltime 11248.368 | +Transformer | epoch 0 | step 42590 |avg loss 7.649 |avg tokens 2098.900 |tokens/s 7819.590 |walltime 11251.052 | +Transformer | epoch 0 | step 42600 |avg loss 7.801 |avg tokens 1864.800 |tokens/s 7508.000 |walltime 11253.536 | +Transformer | epoch 0 | step 42610 |avg loss 7.805 |avg tokens 2190.900 |tokens/s 8390.138 |walltime 11256.147 | +Transformer | epoch 0 | step 42620 |avg loss 7.501 |avg tokens 2196.100 |tokens/s 8149.099 |walltime 11258.842 | +Transformer | epoch 0 | step 42630 |avg loss 7.369 |avg tokens 2356.800 |tokens/s 8477.550 |walltime 11261.622 | +Transformer | epoch 0 | step 42640 |avg loss 7.601 |avg tokens 2151.900 |tokens/s 7946.715 |walltime 11264.330 | +Transformer | epoch 0 | step 42650 |avg loss 8.004 |avg tokens 2246.000 |tokens/s 8681.930 |walltime 11266.917 | +Transformer | epoch 0 | step 42660 |avg loss 7.547 |avg tokens 2253.300 |tokens/s 8305.545 |walltime 11269.630 | +Transformer | epoch 0 | step 42670 |avg loss 7.649 |avg tokens 2298.000 |tokens/s 8798.217 |walltime 11272.242 | +Transformer | epoch 0 | step 42680 |avg loss 7.479 |avg tokens 2178.400 |tokens/s 8198.047 |walltime 11274.899 | +Transformer | epoch 0 | step 42690 |avg loss 7.529 |avg tokens 2148.600 |tokens/s 8177.865 |walltime 11277.526 | 
+Transformer | epoch 0 | step 42700 |avg loss 7.738 |avg tokens 2287.000 |tokens/s 8416.327 |walltime 11280.244 | +Transformer | epoch 0 | step 42710 |avg loss 7.920 |avg tokens 2189.000 |tokens/s 8455.366 |walltime 11282.832 | +Transformer | epoch 0 | step 42720 |avg loss 7.591 |avg tokens 2267.200 |tokens/s 8775.201 |walltime 11285.416 | +Transformer | epoch 0 | step 42730 |avg loss 7.904 |avg tokens 1981.100 |tokens/s 7814.946 |walltime 11287.951 | +Transformer | epoch 0 | step 42740 |avg loss 7.604 |avg tokens 2281.200 |tokens/s 8609.467 |walltime 11290.601 | +Transformer | epoch 0 | step 42750 |avg loss 7.806 |avg tokens 1998.000 |tokens/s 7755.476 |walltime 11293.177 | +Transformer | epoch 0 | step 42760 |avg loss 8.071 |avg tokens 1824.600 |tokens/s 7682.093 |walltime 11295.552 | +Transformer | epoch 0 | step 42770 |avg loss 7.593 |avg tokens 2187.000 |tokens/s 8472.356 |walltime 11298.133 | +Transformer | epoch 0 | step 42780 |avg loss 8.051 |avg tokens 2071.100 |tokens/s 8765.934 |walltime 11300.496 | +Transformer | epoch 0 | step 42790 |avg loss 7.640 |avg tokens 2197.200 |tokens/s 8351.545 |walltime 11303.127 | +Transformer | epoch 0 | step 42800 |avg loss 7.793 |avg tokens 2114.600 |tokens/s 8342.409 |walltime 11305.662 | +Transformer | epoch 0 | step 42810 |avg loss 8.147 |avg tokens 2338.400 |tokens/s 8956.207 |walltime 11308.273 | +Transformer | epoch 0 | step 42820 |avg loss 8.092 |avg tokens 2020.400 |tokens/s 8142.066 |walltime 11310.754 | +Transformer | epoch 0 | step 42830 |avg loss 7.840 |avg tokens 2144.500 |tokens/s 8266.974 |walltime 11313.348 | +Transformer | epoch 0 | step 42840 |avg loss 7.798 |avg tokens 2370.400 |tokens/s 9047.662 |walltime 11315.968 | +Transformer | epoch 0 | step 42850 |avg loss 7.684 |avg tokens 2247.700 |tokens/s 8446.579 |walltime 11318.629 | +Transformer | epoch 0 | step 42860 |avg loss 7.406 |avg tokens 2294.600 |tokens/s 8362.223 |walltime 11321.373 | +Transformer | epoch 0 | step 42870 |avg loss 7.994 |avg 
tokens 2149.200 |tokens/s 8391.541 |walltime 11323.934 | +Transformer | epoch 0 | step 42880 |avg loss 7.445 |avg tokens 2163.400 |tokens/s 8136.219 |walltime 11326.593 | +Transformer | epoch 0 | step 42890 |avg loss 7.559 |avg tokens 2278.400 |tokens/s 8217.284 |walltime 11329.366 | +Transformer | epoch 0 | step 42900 |avg loss 7.625 |avg tokens 2268.800 |tokens/s 8189.420 |walltime 11332.136 | +Transformer | epoch 0 | step 42910 |avg loss 7.859 |avg tokens 2289.400 |tokens/s 8709.114 |walltime 11334.765 | +Transformer | epoch 0 | step 42920 |avg loss 7.647 |avg tokens 2231.700 |tokens/s 8445.644 |walltime 11337.408 | +Transformer | epoch 0 | step 42930 |avg loss 7.963 |avg tokens 1990.800 |tokens/s 7785.275 |walltime 11339.965 | +Transformer | epoch 0 | step 42940 |avg loss 7.283 |avg tokens 2373.600 |tokens/s 8564.813 |walltime 11342.736 | +Transformer | epoch 0 | step 42950 |avg loss 7.605 |avg tokens 1968.000 |tokens/s 7606.850 |walltime 11345.323 | +Transformer | epoch 0 | step 42960 |avg loss 7.354 |avg tokens 2297.500 |tokens/s 8336.008 |walltime 11348.079 | +Transformer | epoch 0 | step 42970 |avg loss 7.639 |avg tokens 2084.500 |tokens/s 8216.484 |walltime 11350.616 | +Transformer | epoch 0 | step 42980 |avg loss 7.807 |avg tokens 2073.600 |tokens/s 8020.616 |walltime 11353.202 | +Transformer | epoch 0 | step 42990 |avg loss 7.538 |avg tokens 2183.100 |tokens/s 8272.858 |walltime 11355.841 | +Transformer | epoch 0 | step 43000 |avg loss 8.003 |avg tokens 2169.100 |tokens/s 8550.691 |walltime 11358.377 | +Transformer | epoch 0 | step 43010 |avg loss 7.852 |avg tokens 2265.400 |tokens/s 8373.555 |walltime 11361.083 | +Transformer | epoch 0 | step 43020 |avg loss 7.738 |avg tokens 2249.300 |tokens/s 8561.554 |walltime 11363.710 | +Transformer | epoch 0 | step 43030 |avg loss 7.324 |avg tokens 2159.900 |tokens/s 8156.027 |walltime 11366.358 | +Transformer | epoch 0 | step 43040 |avg loss 7.658 |avg tokens 2246.400 |tokens/s 8459.825 |walltime 11369.014 | 
+Transformer | epoch 0 | step 43050 |avg loss 7.445 |avg tokens 2314.600 |tokens/s 8475.342 |walltime 11371.745 | +Transformer | epoch 0 | step 43060 |avg loss 7.767 |avg tokens 2168.300 |tokens/s 8057.442 |walltime 11374.436 | +Transformer | epoch 0 | step 43070 |avg loss 7.647 |avg tokens 2177.200 |tokens/s 8173.964 |walltime 11377.099 | +Transformer | epoch 0 | step 43080 |avg loss 7.572 |avg tokens 2097.800 |tokens/s 8079.623 |walltime 11379.696 | +Transformer | epoch 0 | step 43090 |avg loss 7.341 |avg tokens 2251.400 |tokens/s 8214.803 |walltime 11382.436 | +Transformer | epoch 0 | step 43100 |avg loss 7.563 |avg tokens 2379.200 |tokens/s 8599.114 |walltime 11385.203 | +Transformer | epoch 0 | step 43110 |avg loss 7.410 |avg tokens 2347.400 |tokens/s 8465.851 |walltime 11387.976 | +Transformer | epoch 0 | step 43120 |avg loss 8.251 |avg tokens 1994.300 |tokens/s 8081.135 |walltime 11390.444 | +Transformer | epoch 0 | step 43130 |avg loss 7.796 |avg tokens 2163.200 |tokens/s 8312.175 |walltime 11393.046 | +Transformer | epoch 0 | step 43140 |avg loss 7.625 |avg tokens 2209.600 |tokens/s 8376.172 |walltime 11395.684 | +Transformer | epoch 0 | step 43150 |avg loss 7.809 |avg tokens 2221.500 |tokens/s 8294.013 |walltime 11398.363 | +Transformer | epoch 0 | step 43160 |avg loss 7.746 |avg tokens 2226.600 |tokens/s 8324.057 |walltime 11401.037 | +Transformer | epoch 0 | step 43170 |avg loss 7.465 |avg tokens 2332.300 |tokens/s 8548.950 |walltime 11403.766 | +Transformer | epoch 0 | step 43180 |avg loss 7.463 |avg tokens 2208.200 |tokens/s 8036.003 |walltime 11406.514 | +Transformer | epoch 0 | step 43190 |avg loss 8.027 |avg tokens 2033.900 |tokens/s 7908.970 |walltime 11409.085 | +Transformer | epoch 0 | step 43200 |avg loss 7.800 |avg tokens 2164.200 |tokens/s 8137.474 |walltime 11411.745 | +Transformer | epoch 0 | step 43210 |avg loss 7.721 |avg tokens 2210.100 |tokens/s 8174.403 |walltime 11414.448 | +Transformer | epoch 0 | step 43220 |avg loss 7.542 |avg 
tokens 2173.600 |tokens/s 8126.768 |walltime 11417.123 | +Transformer | epoch 0 | step 43230 |avg loss 7.885 |avg tokens 1963.400 |tokens/s 7581.538 |walltime 11419.713 | +Transformer | epoch 0 | step 43240 |avg loss 7.866 |avg tokens 1751.900 |tokens/s 7301.599 |walltime 11422.112 | +Transformer | epoch 0 | step 43250 |avg loss 7.497 |avg tokens 2096.600 |tokens/s 8165.256 |walltime 11424.680 | +Transformer | epoch 0 | step 43260 |avg loss 7.672 |avg tokens 2156.300 |tokens/s 8137.069 |walltime 11427.330 | +Transformer | epoch 0 | step 43270 |avg loss 7.177 |avg tokens 2216.000 |tokens/s 8101.097 |walltime 11430.065 | +Transformer | epoch 0 | step 43280 |avg loss 7.793 |avg tokens 1886.900 |tokens/s 7349.343 |walltime 11432.633 | +Transformer | epoch 0 | step 43290 |avg loss 8.021 |avg tokens 2174.000 |tokens/s 8576.991 |walltime 11435.167 | +Transformer | epoch 0 | step 43300 |avg loss 7.399 |avg tokens 2252.400 |tokens/s 8222.661 |walltime 11437.907 | +Transformer | epoch 0 | step 43310 |avg loss 7.274 |avg tokens 2322.500 |tokens/s 8407.874 |walltime 11440.669 | +Transformer | epoch 0 | step 43320 |avg loss 7.531 |avg tokens 2329.600 |tokens/s 8562.261 |walltime 11443.390 | +Transformer | epoch 0 | step 43330 |avg loss 8.067 |avg tokens 2174.500 |tokens/s 8497.013 |walltime 11445.949 | +Transformer | epoch 0 | step 43340 |avg loss 8.086 |avg tokens 2313.800 |tokens/s 8718.448 |walltime 11448.603 | +Transformer | epoch 0 | step 43350 |avg loss 7.793 |avg tokens 2121.100 |tokens/s 8106.591 |walltime 11451.219 | +Transformer | epoch 0 | step 43360 |avg loss 7.652 |avg tokens 2195.500 |tokens/s 8212.136 |walltime 11453.893 | +Transformer | epoch 0 | step 43370 |avg loss 8.105 |avg tokens 2218.300 |tokens/s 8921.471 |walltime 11456.379 | +Transformer | epoch 0 | step 43380 |avg loss 7.685 |avg tokens 2072.700 |tokens/s 8162.086 |walltime 11458.919 | +Transformer | epoch 0 | step 43390 |avg loss 7.683 |avg tokens 2324.000 |tokens/s 8627.459 |walltime 11461.612 | 
+Transformer | epoch 0 | step 43400 |avg loss 7.808 |avg tokens 2241.600 |tokens/s 8627.540 |walltime 11464.211 | +Transformer | epoch 0 | step 43410 |avg loss 7.792 |avg tokens 2201.600 |tokens/s 8351.923 |walltime 11466.847 | +Transformer | epoch 0 | step 43420 |avg loss 7.470 |avg tokens 2125.000 |tokens/s 8013.034 |walltime 11469.499 | +Transformer | epoch 0 | step 43430 |avg loss 7.977 |avg tokens 2038.000 |tokens/s 8333.347 |walltime 11471.944 | +Transformer | epoch 0 | step 43440 |avg loss 7.828 |avg tokens 2254.800 |tokens/s 8655.201 |walltime 11474.549 | +Transformer | epoch 0 | step 43450 |avg loss 7.421 |avg tokens 2334.100 |tokens/s 8543.597 |walltime 11477.281 | +Transformer | epoch 0 | step 43460 |avg loss 8.108 |avg tokens 1967.200 |tokens/s 8220.437 |walltime 11479.674 | +Transformer | epoch 0 | step 43470 |avg loss 7.746 |avg tokens 2274.500 |tokens/s 8687.901 |walltime 11482.292 | +Transformer | epoch 0 | step 43480 |avg loss 7.747 |avg tokens 2285.600 |tokens/s 8684.512 |walltime 11484.924 | +Transformer | epoch 0 | step 43490 |avg loss 7.647 |avg tokens 2262.000 |tokens/s 8237.306 |walltime 11487.670 | +Transformer | epoch 0 | step 43500 |avg loss 7.586 |avg tokens 2224.000 |tokens/s 8133.811 |walltime 11490.405 | +Transformer | epoch 0 | step 43510 |avg loss 7.957 |avg tokens 2282.800 |tokens/s 8961.692 |walltime 11492.952 | +Transformer | epoch 0 | step 43520 |avg loss 7.448 |avg tokens 2393.600 |tokens/s 8606.728 |walltime 11495.733 | +Transformer | epoch 0 | step 43530 |avg loss 7.800 |avg tokens 2306.300 |tokens/s 8742.007 |walltime 11498.371 | +Transformer | epoch 0 | step 43540 |avg loss 7.470 |avg tokens 2346.700 |tokens/s 8573.758 |walltime 11501.108 | +Transformer | epoch 0 | step 43550 |avg loss 7.846 |avg tokens 2290.600 |tokens/s 8708.914 |walltime 11503.738 | +Transformer | epoch 0 | step 43560 |avg loss 7.657 |avg tokens 2237.600 |tokens/s 8771.687 |walltime 11506.289 | +Transformer | epoch 0 | step 43570 |avg loss 7.894 |avg 
tokens 2158.400 |tokens/s 8348.325 |walltime 11508.875 | +Transformer | epoch 0 | step 43580 |avg loss 7.782 |avg tokens 2261.600 |tokens/s 8599.447 |walltime 11511.505 | +Transformer | epoch 0 | step 43590 |avg loss 7.565 |avg tokens 2198.800 |tokens/s 8216.136 |walltime 11514.181 | +Transformer | epoch 0 | step 43600 |avg loss 7.754 |avg tokens 2009.000 |tokens/s 7945.352 |walltime 11516.709 | +Transformer | epoch 0 | step 43610 |avg loss 7.604 |avg tokens 2346.600 |tokens/s 8564.193 |walltime 11519.449 | +Transformer | epoch 0 | step 43620 |avg loss 7.778 |avg tokens 2187.400 |tokens/s 8131.624 |walltime 11522.139 | +Transformer | epoch 0 | step 43630 |avg loss 7.920 |avg tokens 2009.100 |tokens/s 7982.705 |walltime 11524.656 | +Transformer | epoch 0 | step 43640 |avg loss 7.189 |avg tokens 2148.100 |tokens/s 8412.779 |walltime 11527.210 | +Transformer | epoch 0 | step 43650 |avg loss 7.715 |avg tokens 2287.300 |tokens/s 8692.368 |walltime 11529.841 | +Transformer | epoch 0 | step 43660 |avg loss 7.599 |avg tokens 2122.400 |tokens/s 8222.454 |walltime 11532.422 | +Transformer | epoch 0 | step 43670 |avg loss 7.825 |avg tokens 2286.900 |tokens/s 8801.593 |walltime 11535.020 | +Transformer | epoch 0 | step 43680 |avg loss 7.844 |avg tokens 2248.800 |tokens/s 8677.861 |walltime 11537.612 | +Transformer | epoch 0 | step 43690 |avg loss 7.168 |avg tokens 2206.400 |tokens/s 8038.500 |walltime 11540.357 | +Transformer | epoch 0 | step 43700 |avg loss 7.395 |avg tokens 2165.600 |tokens/s 7998.804 |walltime 11543.064 | +Transformer | epoch 0 | step 43710 |avg loss 7.761 |avg tokens 2145.300 |tokens/s 8206.693 |walltime 11545.678 | +Transformer | epoch 0 | step 43720 |avg loss 7.530 |avg tokens 2245.400 |tokens/s 8281.601 |walltime 11548.390 | +Transformer | epoch 0 | step 43730 |avg loss 7.580 |avg tokens 2206.600 |tokens/s 8348.326 |walltime 11551.033 | +Transformer | epoch 0 | step 43740 |avg loss 7.383 |avg tokens 2345.600 |tokens/s 8431.028 |walltime 11553.815 | 
+Transformer | epoch 0 | step 43750 |avg loss 7.547 |avg tokens 2224.500 |tokens/s 8345.697 |walltime 11556.480 | +Transformer | epoch 0 | step 43760 |avg loss 7.648 |avg tokens 2182.400 |tokens/s 8083.690 |walltime 11559.180 | +Transformer | epoch 0 | step 43770 |avg loss 7.815 |avg tokens 2180.300 |tokens/s 8335.051 |walltime 11561.796 | +Transformer | epoch 0 | step 43780 |avg loss 7.430 |avg tokens 2349.500 |tokens/s 8440.076 |walltime 11564.580 | +Transformer | epoch 0 | step 43790 |avg loss 7.755 |avg tokens 2190.400 |tokens/s 8357.827 |walltime 11567.200 | +Transformer | epoch 0 | step 43800 |avg loss 7.874 |avg tokens 2153.300 |tokens/s 8079.157 |walltime 11569.866 | +Transformer | epoch 0 | step 43810 |avg loss 7.224 |avg tokens 2049.600 |tokens/s 8086.698 |walltime 11572.400 | +Transformer | epoch 0 | step 43820 |avg loss 7.434 |avg tokens 2097.600 |tokens/s 7868.076 |walltime 11575.066 | +Transformer | epoch 0 | step 43830 |avg loss 7.646 |avg tokens 2205.700 |tokens/s 8253.019 |walltime 11577.739 | +Transformer | epoch 0 | step 43840 |avg loss 7.747 |avg tokens 2360.200 |tokens/s 8750.110 |walltime 11580.436 | +Transformer | epoch 0 | step 43850 |avg loss 7.833 |avg tokens 2122.600 |tokens/s 8409.931 |walltime 11582.960 | +Transformer | epoch 0 | step 43860 |avg loss 7.810 |avg tokens 2111.600 |tokens/s 8034.688 |walltime 11585.588 | +Transformer | epoch 0 | step 43870 |avg loss 7.716 |avg tokens 2267.200 |tokens/s 8524.358 |walltime 11588.248 | +Transformer | epoch 0 | step 43880 |avg loss 7.718 |avg tokens 2040.500 |tokens/s 7808.916 |walltime 11590.861 | +Transformer | epoch 0 | step 43890 |avg loss 7.988 |avg tokens 2282.100 |tokens/s 8756.581 |walltime 11593.467 | +Transformer | epoch 0 | step 43900 |avg loss 7.897 |avg tokens 2194.800 |tokens/s 8700.358 |walltime 11595.990 | +Transformer | epoch 0 | step 43910 |avg loss 7.552 |avg tokens 2183.600 |tokens/s 8076.991 |walltime 11598.693 | +Transformer | epoch 0 | step 43920 |avg loss 7.697 |avg 
tokens 2112.100 |tokens/s 8221.173 |walltime 11601.262 | +Transformer | epoch 0 | step 43930 |avg loss 7.832 |avg tokens 2080.200 |tokens/s 8382.605 |walltime 11603.744 | +Transformer | epoch 0 | step 43940 |avg loss 8.039 |avg tokens 2045.100 |tokens/s 8208.715 |walltime 11606.235 | +Transformer | epoch 0 | step 43950 |avg loss 7.792 |avg tokens 2067.600 |tokens/s 8158.011 |walltime 11608.770 | +Transformer | epoch 0 | step 43960 |avg loss 7.522 |avg tokens 2151.000 |tokens/s 8121.042 |walltime 11611.418 | +Transformer | epoch 0 | step 43970 |avg loss 7.452 |avg tokens 2240.000 |tokens/s 8252.266 |walltime 11614.133 | +Transformer | epoch 0 | step 43980 |avg loss 7.678 |avg tokens 2180.000 |tokens/s 8140.499 |walltime 11616.811 | +Transformer | epoch 0 | step 43990 |avg loss 7.437 |avg tokens 2107.900 |tokens/s 7861.977 |walltime 11619.492 | +Transformer | epoch 0 | step 44000 |avg loss 8.145 |avg tokens 2068.100 |tokens/s 8338.868 |walltime 11621.972 | +Transformer | epoch 0 | step 44010 |avg loss 7.679 |avg tokens 2267.200 |tokens/s 8598.927 |walltime 11624.608 | +Transformer | epoch 0 | step 44020 |avg loss 7.756 |avg tokens 2091.600 |tokens/s 8065.802 |walltime 11627.202 | +Transformer | epoch 0 | step 44030 |avg loss 7.642 |avg tokens 2270.600 |tokens/s 8479.215 |walltime 11629.880 | +Transformer | epoch 0 | step 44040 |avg loss 7.580 |avg tokens 2032.300 |tokens/s 7798.455 |walltime 11632.486 | +Transformer | epoch 0 | step 44050 |avg loss 7.719 |avg tokens 2246.300 |tokens/s 8718.780 |walltime 11635.062 | +Transformer | epoch 0 | step 44060 |avg loss 7.741 |avg tokens 2128.900 |tokens/s 8208.231 |walltime 11637.656 | +Transformer | epoch 0 | step 44070 |avg loss 7.608 |avg tokens 2194.400 |tokens/s 8247.363 |walltime 11640.316 | +Transformer | epoch 0 | step 44080 |avg loss 7.813 |avg tokens 2155.800 |tokens/s 8369.615 |walltime 11642.892 | +Transformer | epoch 0 | step 44090 |avg loss 7.461 |avg tokens 2295.200 |tokens/s 8328.189 |walltime 11645.648 | 
+Transformer | epoch 0 | step 44100 |avg loss 8.360 |avg tokens 2237.900 |tokens/s 9080.191 |walltime 11648.113 | +Transformer | epoch 0 | step 44110 |avg loss 7.639 |avg tokens 2037.100 |tokens/s 7949.426 |walltime 11650.675 | +Transformer | epoch 0 | step 44120 |avg loss 7.491 |avg tokens 2303.300 |tokens/s 8377.280 |walltime 11653.425 | +Transformer | epoch 0 | step 44130 |avg loss 7.881 |avg tokens 2102.000 |tokens/s 8175.154 |walltime 11655.996 | +Transformer | epoch 0 | step 44140 |avg loss 7.451 |avg tokens 2269.900 |tokens/s 8164.175 |walltime 11658.776 | +Transformer | epoch 0 | step 44150 |avg loss 7.164 |avg tokens 2271.700 |tokens/s 8290.823 |walltime 11661.516 | +Transformer | epoch 0 | step 44160 |avg loss 7.623 |avg tokens 2132.400 |tokens/s 8086.004 |walltime 11664.153 | +Transformer | epoch 0 | step 44170 |avg loss 7.910 |avg tokens 2079.900 |tokens/s 8179.429 |walltime 11666.696 | +Transformer | epoch 0 | step 44180 |avg loss 7.598 |avg tokens 2189.300 |tokens/s 8298.023 |walltime 11669.335 | +Transformer | epoch 0 | step 44190 |avg loss 7.674 |avg tokens 1925.300 |tokens/s 7426.900 |walltime 11671.927 | +Transformer | epoch 0 | step 44200 |avg loss 7.554 |avg tokens 2260.800 |tokens/s 8302.996 |walltime 11674.650 | +Transformer | epoch 0 | step 44210 |avg loss 7.853 |avg tokens 2090.300 |tokens/s 8116.105 |walltime 11677.225 | +Transformer | epoch 0 | step 44220 |avg loss 7.761 |avg tokens 2214.600 |tokens/s 8319.799 |walltime 11679.887 | +Transformer | epoch 0 | step 44230 |avg loss 7.833 |avg tokens 1903.300 |tokens/s 7696.733 |walltime 11682.360 | +Transformer | epoch 0 | step 44240 |avg loss 7.904 |avg tokens 2035.900 |tokens/s 7919.352 |walltime 11684.931 | +Transformer | epoch 0 | step 44250 |avg loss 7.947 |avg tokens 1937.600 |tokens/s 7625.827 |walltime 11687.472 | +Transformer | epoch 0 | step 44260 |avg loss 7.714 |avg tokens 2032.600 |tokens/s 7865.896 |walltime 11690.056 | +Transformer | epoch 0 | step 44270 |avg loss 7.353 |avg 
tokens 2235.200 |tokens/s 8227.392 |walltime 11692.772 | +Transformer | epoch 0 | step 44280 |avg loss 7.891 |avg tokens 1989.900 |tokens/s 8005.452 |walltime 11695.258 | +Transformer | epoch 0 | step 44290 |avg loss 7.613 |avg tokens 2155.200 |tokens/s 8130.565 |walltime 11697.909 | +Transformer | epoch 0 | step 44300 |avg loss 7.838 |avg tokens 2167.700 |tokens/s 8286.121 |walltime 11700.525 | +Transformer | epoch 0 | step 44310 |avg loss 7.388 |avg tokens 2258.200 |tokens/s 8361.054 |walltime 11703.226 | +Transformer | epoch 0 | step 44320 |avg loss 7.560 |avg tokens 2256.000 |tokens/s 8372.457 |walltime 11705.920 | +Transformer | epoch 0 | step 44330 |avg loss 7.831 |avg tokens 2325.000 |tokens/s 8989.149 |walltime 11708.507 | +Transformer | epoch 0 | step 44340 |avg loss 7.766 |avg tokens 2359.200 |tokens/s 8679.206 |walltime 11711.225 | +Transformer | epoch 0 | step 44350 |avg loss 7.656 |avg tokens 2136.800 |tokens/s 8083.260 |walltime 11713.869 | +Transformer | epoch 0 | step 44360 |avg loss 7.716 |avg tokens 2379.200 |tokens/s 8899.233 |walltime 11716.542 | +Transformer | epoch 0 | step 44370 |avg loss 7.335 |avg tokens 2213.700 |tokens/s 8123.219 |walltime 11719.267 | +Transformer | epoch 0 | step 44380 |avg loss 7.765 |avg tokens 2068.800 |tokens/s 8302.561 |walltime 11721.759 | +Transformer | epoch 0 | step 44390 |avg loss 7.442 |avg tokens 2145.600 |tokens/s 7931.905 |walltime 11724.464 | +Transformer | epoch 0 | step 44400 |avg loss 7.910 |avg tokens 1790.400 |tokens/s 7387.928 |walltime 11726.887 | +Transformer | epoch 0 | step 44410 |avg loss 8.020 |avg tokens 2128.600 |tokens/s 8725.154 |walltime 11729.327 | +Transformer | epoch 0 | step 44420 |avg loss 7.663 |avg tokens 2321.000 |tokens/s 8733.949 |walltime 11731.984 | +Transformer | epoch 0 | step 44430 |avg loss 8.073 |avg tokens 2096.500 |tokens/s 8678.985 |walltime 11734.400 | +Transformer | epoch 0 | step 44440 |avg loss 7.682 |avg tokens 2363.100 |tokens/s 8691.046 |walltime 11737.119 | 
+Transformer | epoch 0 | step 44450 |avg loss 7.732 |avg tokens 2291.700 |tokens/s 8667.351 |walltime 11739.763 | +Transformer | epoch 0 | step 44460 |avg loss 8.026 |avg tokens 2067.600 |tokens/s 8397.698 |walltime 11742.225 | +Transformer | epoch 0 | step 44470 |avg loss 7.492 |avg tokens 2175.900 |tokens/s 8163.270 |walltime 11744.891 | +Transformer | epoch 0 | step 44480 |avg loss 7.440 |avg tokens 2090.400 |tokens/s 7810.567 |walltime 11747.567 | +Transformer | epoch 0 | step 44490 |avg loss 7.667 |avg tokens 2329.600 |tokens/s 8520.922 |walltime 11750.301 | +Transformer | epoch 0 | step 44500 |avg loss 7.746 |avg tokens 2310.000 |tokens/s 8612.489 |walltime 11752.983 | +Transformer | epoch 0 | step 44510 |avg loss 7.835 |avg tokens 1975.800 |tokens/s 7845.241 |walltime 11755.502 | +Transformer | epoch 0 | step 44520 |avg loss 7.776 |avg tokens 2270.900 |tokens/s 8412.318 |walltime 11758.201 | +Transformer | epoch 0 | step 44530 |avg loss 7.387 |avg tokens 2362.300 |tokens/s 8753.880 |walltime 11760.900 | +Transformer | epoch 0 | step 44540 |avg loss 7.738 |avg tokens 2150.700 |tokens/s 8566.356 |walltime 11763.410 | +Transformer | epoch 0 | step 44550 |avg loss 7.066 |avg tokens 2371.200 |tokens/s 8376.009 |walltime 11766.241 | +Transformer | epoch 0 | step 44560 |avg loss 8.234 |avg tokens 2107.000 |tokens/s 8620.328 |walltime 11768.686 | +Transformer | epoch 0 | step 44570 |avg loss 7.485 |avg tokens 2261.300 |tokens/s 8318.406 |walltime 11771.404 | +Transformer | epoch 0 | step 44580 |avg loss 7.890 |avg tokens 1905.200 |tokens/s 7757.215 |walltime 11773.860 | +Transformer | epoch 0 | step 44590 |avg loss 8.068 |avg tokens 2219.700 |tokens/s 8561.409 |walltime 11776.453 | +Transformer | epoch 0 | step 44600 |avg loss 7.659 |avg tokens 2187.400 |tokens/s 8138.200 |walltime 11779.141 | +Transformer | epoch 0 | step 44610 |avg loss 8.091 |avg tokens 1909.100 |tokens/s 7774.437 |walltime 11781.596 | +Transformer | epoch 0 | step 44620 |avg loss 7.785 |avg 
tokens 2127.500 |tokens/s 8294.466 |walltime 11784.161 | +Transformer | epoch 0 | step 44630 |avg loss 7.434 |avg tokens 2204.500 |tokens/s 8349.471 |walltime 11786.801 | +Transformer | epoch 0 | step 44640 |avg loss 7.782 |avg tokens 2132.900 |tokens/s 8287.706 |walltime 11789.375 | +Transformer | epoch 0 | step 44650 |avg loss 7.850 |avg tokens 2172.600 |tokens/s 8535.882 |walltime 11791.920 | +Transformer | epoch 0 | step 44660 |avg loss 7.435 |avg tokens 2250.300 |tokens/s 8499.790 |walltime 11794.568 | +Transformer | epoch 0 | step 44670 |avg loss 7.643 |avg tokens 2036.900 |tokens/s 7866.759 |walltime 11797.157 | +Transformer | epoch 0 | step 44680 |avg loss 7.364 |avg tokens 1935.800 |tokens/s 7606.721 |walltime 11799.702 | +Transformer | epoch 0 | step 44690 |avg loss 7.272 |avg tokens 2100.800 |tokens/s 7931.066 |walltime 11802.351 | +Transformer | epoch 0 | step 44700 |avg loss 7.556 |avg tokens 2033.000 |tokens/s 7719.516 |walltime 11804.984 | +Transformer | epoch 0 | step 44710 |avg loss 7.747 |avg tokens 2227.200 |tokens/s 8336.808 |walltime 11807.656 | +Transformer | epoch 0 | step 44720 |avg loss 7.570 |avg tokens 2221.000 |tokens/s 8153.763 |walltime 11810.380 | +Transformer | epoch 0 | step 44730 |avg loss 7.471 |avg tokens 2375.000 |tokens/s 8548.123 |walltime 11813.158 | +Transformer | epoch 0 | step 44740 |avg loss 7.892 |avg tokens 2216.900 |tokens/s 8409.677 |walltime 11815.794 | +Transformer | epoch 0 | step 44750 |avg loss 7.601 |avg tokens 2154.100 |tokens/s 8039.464 |walltime 11818.474 | +Transformer | epoch 0 | step 44760 |avg loss 7.493 |avg tokens 1925.900 |tokens/s 7588.850 |walltime 11821.011 | +Transformer | epoch 0 | step 44770 |avg loss 7.761 |avg tokens 2238.500 |tokens/s 8292.822 |walltime 11823.711 | +Transformer | epoch 0 | step 44780 |avg loss 7.376 |avg tokens 2298.700 |tokens/s 8354.632 |walltime 11826.462 | +Transformer | epoch 0 | step 44790 |avg loss 7.840 |avg tokens 2200.000 |tokens/s 8609.388 |walltime 11829.018 | 
+Transformer | epoch 0 | step 44800 |avg loss 7.926 |avg tokens 2289.400 |tokens/s 9085.425 |walltime 11831.537 | +Transformer | epoch 0 | step 44810 |avg loss 7.419 |avg tokens 2368.600 |tokens/s 8578.507 |walltime 11834.299 | +Transformer | epoch 0 | step 44820 |avg loss 7.601 |avg tokens 2076.700 |tokens/s 8199.378 |walltime 11836.831 | +Transformer | epoch 0 | step 44830 |avg loss 7.692 |avg tokens 2330.600 |tokens/s 8687.772 |walltime 11839.514 | +Transformer | epoch 0 | step 44840 |avg loss 8.017 |avg tokens 1783.000 |tokens/s 7184.019 |walltime 11841.996 | +Transformer | epoch 0 | step 44850 |avg loss 7.449 |avg tokens 2131.800 |tokens/s 7968.256 |walltime 11844.671 | +Transformer | epoch 0 | step 44860 |avg loss 7.376 |avg tokens 2400.800 |tokens/s 8566.864 |walltime 11847.474 | +Transformer | epoch 0 | step 44870 |avg loss 7.852 |avg tokens 1966.900 |tokens/s 8093.950 |walltime 11849.904 | +Transformer | epoch 0 | step 44880 |avg loss 7.861 |avg tokens 2068.400 |tokens/s 8257.748 |walltime 11852.408 | +Transformer | epoch 0 | step 44890 |avg loss 7.885 |avg tokens 2009.300 |tokens/s 7963.482 |walltime 11854.932 | +Transformer | epoch 0 | step 44900 |avg loss 7.418 |avg tokens 2293.600 |tokens/s 8409.222 |walltime 11857.659 | +Transformer | epoch 0 | step 44910 |avg loss 7.790 |avg tokens 2246.900 |tokens/s 8632.355 |walltime 11860.262 | +Transformer | epoch 0 | step 44920 |avg loss 6.930 |avg tokens 2232.000 |tokens/s 8200.635 |walltime 11862.984 | +Transformer | epoch 0 | step 44930 |avg loss 7.765 |avg tokens 2182.000 |tokens/s 8466.824 |walltime 11865.561 | +Transformer | epoch 0 | step 44940 |avg loss 7.773 |avg tokens 2169.600 |tokens/s 8168.686 |walltime 11868.217 | +Transformer | epoch 0 | step 44950 |avg loss 7.973 |avg tokens 2039.100 |tokens/s 7866.146 |walltime 11870.809 | +Transformer | epoch 0 | step 44960 |avg loss 7.614 |avg tokens 2393.300 |tokens/s 8794.548 |walltime 11873.530 | +Transformer | epoch 0 | step 44970 |avg loss 7.915 |avg 
tokens 2069.700 |tokens/s 8268.910 |walltime 11876.033 | +Transformer | epoch 0 | step 44980 |avg loss 7.814 |avg tokens 2240.300 |tokens/s 8621.254 |walltime 11878.632 | +Transformer | epoch 0 | step 44990 |avg loss 7.697 |avg tokens 2070.200 |tokens/s 8079.252 |walltime 11881.194 | +Transformer | epoch 0 | step 45000 |avg loss 7.880 |avg tokens 2193.000 |tokens/s 8219.429 |walltime 11883.863 | +Transformer | epoch 0 | step 45010 |avg loss 7.171 |avg tokens 2346.400 |tokens/s 8433.908 |walltime 11886.645 | +Transformer | epoch 0 | step 45020 |avg loss 7.722 |avg tokens 2170.000 |tokens/s 7919.731 |walltime 11889.385 | +Transformer | epoch 0 | step 45030 |avg loss 8.117 |avg tokens 2019.000 |tokens/s 8062.952 |walltime 11891.889 | +Transformer | epoch 0 | step 45040 |avg loss 7.952 |avg tokens 2219.700 |tokens/s 8630.145 |walltime 11894.461 | +Transformer | epoch 0 | step 45050 |avg loss 7.584 |avg tokens 2203.100 |tokens/s 8108.931 |walltime 11897.178 | +Transformer | epoch 0 | step 45060 |avg loss 7.785 |avg tokens 2175.500 |tokens/s 8601.697 |walltime 11899.707 | +Transformer | epoch 0 | step 45070 |avg loss 7.686 |avg tokens 2216.300 |tokens/s 8521.009 |walltime 11902.308 | +Transformer | epoch 0 | step 45080 |avg loss 7.762 |avg tokens 2265.000 |tokens/s 8366.397 |walltime 11905.015 | +Transformer | epoch 0 | step 45090 |avg loss 7.708 |avg tokens 2333.000 |tokens/s 9003.954 |walltime 11907.606 | +Transformer | epoch 0 | step 45100 |avg loss 7.773 |avg tokens 2080.800 |tokens/s 7859.649 |walltime 11910.254 | +Transformer | epoch 0 | step 45110 |avg loss 7.669 |avg tokens 2194.700 |tokens/s 7980.221 |walltime 11913.004 | +Transformer | epoch 0 | step 45120 |avg loss 7.479 |avg tokens 2177.200 |tokens/s 8328.021 |walltime 11915.618 | +Transformer | epoch 0 | step 45130 |avg loss 7.428 |avg tokens 2049.300 |tokens/s 7909.239 |walltime 11918.209 | +Transformer | epoch 0 | step 45140 |avg loss 7.573 |avg tokens 2062.800 |tokens/s 7975.542 |walltime 11920.795 | 
+Transformer | epoch 0 | step 45150 |avg loss 7.627 |avg tokens 2077.300 |tokens/s 8629.266 |walltime 11923.203 | +Transformer | epoch 0 | step 45160 |avg loss 7.630 |avg tokens 2236.000 |tokens/s 8263.650 |walltime 11925.909 | +Transformer | epoch 0 | step 45170 |avg loss 7.568 |avg tokens 2205.700 |tokens/s 8217.750 |walltime 11928.593 | +Transformer | epoch 0 | step 45180 |avg loss 7.880 |avg tokens 2204.800 |tokens/s 8385.876 |walltime 11931.222 | +Transformer | epoch 0 | step 45190 |avg loss 7.685 |avg tokens 2117.200 |tokens/s 8044.360 |walltime 11933.854 | +Transformer | epoch 0 | step 45200 |avg loss 7.432 |avg tokens 2272.800 |tokens/s 8319.095 |walltime 11936.586 | +Transformer | epoch 0 | step 45210 |avg loss 7.995 |avg tokens 1979.100 |tokens/s 8012.462 |walltime 11939.056 | +Transformer | epoch 0 | step 45220 |avg loss 7.692 |avg tokens 2183.200 |tokens/s 8393.643 |walltime 11941.657 | +Transformer | epoch 0 | step 45230 |avg loss 7.704 |avg tokens 2035.300 |tokens/s 8062.221 |walltime 11944.181 | +Transformer | epoch 0 | step 45240 |avg loss 7.378 |avg tokens 2006.300 |tokens/s 7772.225 |walltime 11946.763 | +Transformer | epoch 0 | step 45250 |avg loss 7.383 |avg tokens 2246.800 |tokens/s 8373.272 |walltime 11949.446 | +Transformer | epoch 0 | step 45260 |avg loss 7.611 |avg tokens 2122.700 |tokens/s 7958.174 |walltime 11952.113 | +Transformer | epoch 0 | step 45270 |avg loss 7.824 |avg tokens 2249.600 |tokens/s 8763.601 |walltime 11954.680 | +Transformer | epoch 0 | step 45280 |avg loss 7.674 |avg tokens 2100.400 |tokens/s 8278.841 |walltime 11957.217 | +Transformer | epoch 0 | step 45290 |avg loss 7.815 |avg tokens 1977.100 |tokens/s 7868.951 |walltime 11959.730 | +Transformer | epoch 0 | step 45300 |avg loss 7.627 |avg tokens 2184.300 |tokens/s 8507.522 |walltime 11962.297 | +Transformer | epoch 0 | step 45310 |avg loss 7.752 |avg tokens 2167.300 |tokens/s 8028.643 |walltime 11964.997 | +Transformer | epoch 0 | step 45320 |avg loss 8.018 |avg 
tokens 2133.700 |tokens/s 8361.845 |walltime 11967.549 | +Transformer | epoch 0 | step 45330 |avg loss 7.606 |avg tokens 2309.800 |tokens/s 8347.457 |walltime 11970.316 | +Transformer | epoch 0 | step 45340 |avg loss 7.205 |avg tokens 2361.600 |tokens/s 8381.873 |walltime 11973.133 | +Transformer | epoch 0 | step 45350 |avg loss 7.371 |avg tokens 2229.600 |tokens/s 8257.945 |walltime 11975.833 | +Transformer | epoch 0 | step 45360 |avg loss 7.473 |avg tokens 2261.600 |tokens/s 8352.064 |walltime 11978.541 | +Transformer | epoch 0 | step 45370 |avg loss 7.723 |avg tokens 2194.000 |tokens/s 8449.498 |walltime 11981.138 | +Transformer | epoch 0 | step 45380 |avg loss 7.766 |avg tokens 2074.000 |tokens/s 8237.696 |walltime 11983.655 | +Transformer | epoch 0 | step 45390 |avg loss 7.798 |avg tokens 2204.800 |tokens/s 8319.475 |walltime 11986.305 | +Transformer | epoch 0 | step 45400 |avg loss 7.904 |avg tokens 1903.000 |tokens/s 7643.180 |walltime 11988.795 | +Transformer | epoch 0 | step 45410 |avg loss 7.681 |avg tokens 2209.700 |tokens/s 8332.732 |walltime 11991.447 | +Transformer | epoch 0 | step 45420 |avg loss 7.654 |avg tokens 2163.500 |tokens/s 7997.171 |walltime 11994.152 | +Transformer | epoch 0 | step 45430 |avg loss 7.797 |avg tokens 2169.000 |tokens/s 8271.148 |walltime 11996.775 | +Transformer | epoch 0 | step 45440 |avg loss 7.362 |avg tokens 2276.300 |tokens/s 8356.586 |walltime 11999.499 | +Transformer | epoch 0 | step 45450 |avg loss 7.852 |avg tokens 2379.400 |tokens/s 9334.847 |walltime 12002.048 | +Transformer | epoch 0 | step 45460 |avg loss 7.373 |avg tokens 2267.200 |tokens/s 8300.470 |walltime 12004.779 | +Transformer | epoch 0 | step 45470 |avg loss 7.526 |avg tokens 2178.400 |tokens/s 8222.088 |walltime 12007.429 | +Transformer | epoch 0 | step 45480 |avg loss 7.094 |avg tokens 2162.600 |tokens/s 7931.505 |walltime 12010.155 | +Transformer | epoch 0 | step 45490 |avg loss 7.558 |avg tokens 2229.600 |tokens/s 8268.123 |walltime 12012.852 | 
+Transformer | epoch 0 | step 45500 |avg loss 7.519 |avg tokens 2345.100 |tokens/s 8978.304 |walltime 12015.464 | +Transformer | epoch 0 | step 45510 |avg loss 7.489 |avg tokens 2271.200 |tokens/s 8287.429 |walltime 12018.204 | +Transformer | epoch 0 | step 45520 |avg loss 7.879 |avg tokens 2173.100 |tokens/s 8379.016 |walltime 12020.798 | +Transformer | epoch 0 | step 45530 |avg loss 7.602 |avg tokens 2283.300 |tokens/s 8657.707 |walltime 12023.435 | +Transformer | epoch 0 | step 45540 |avg loss 8.106 |avg tokens 1946.700 |tokens/s 7963.660 |walltime 12025.880 | +Transformer | epoch 0 | step 45550 |avg loss 7.571 |avg tokens 2105.700 |tokens/s 8012.228 |walltime 12028.508 | +Transformer | epoch 0 | step 45560 |avg loss 7.549 |avg tokens 2223.600 |tokens/s 8404.401 |walltime 12031.153 | +Transformer | epoch 0 | step 45570 |avg loss 8.010 |avg tokens 2100.500 |tokens/s 8531.980 |walltime 12033.615 | +Transformer | epoch 0 | step 45580 |avg loss 7.958 |avg tokens 2188.600 |tokens/s 8478.327 |walltime 12036.197 | +Transformer | epoch 0 | step 45590 |avg loss 7.606 |avg tokens 2307.200 |tokens/s 8538.255 |walltime 12038.899 | +Transformer | epoch 0 | step 45600 |avg loss 7.936 |avg tokens 2020.900 |tokens/s 7975.768 |walltime 12041.433 | +Transformer | epoch 0 | step 45610 |avg loss 7.554 |avg tokens 2182.100 |tokens/s 8018.426 |walltime 12044.154 | +Transformer | epoch 0 | step 45620 |avg loss 7.573 |avg tokens 2076.000 |tokens/s 8094.836 |walltime 12046.719 | +Transformer | epoch 0 | step 45630 |avg loss 7.694 |avg tokens 2143.800 |tokens/s 7997.868 |walltime 12049.399 | +Transformer | epoch 0 | step 45640 |avg loss 7.784 |avg tokens 2181.300 |tokens/s 8156.526 |walltime 12052.074 | +Transformer | epoch 0 | step 45650 |avg loss 8.066 |avg tokens 2037.600 |tokens/s 8002.463 |walltime 12054.620 | +Transformer | epoch 0 | step 45660 |avg loss 7.619 |avg tokens 2285.800 |tokens/s 8483.026 |walltime 12057.314 | +Transformer | epoch 0 | step 45670 |avg loss 7.896 |avg 
tokens 2382.200 |tokens/s 9141.441 |walltime 12059.920 | +Transformer | epoch 0 | step 45680 |avg loss 7.106 |avg tokens 2425.400 |tokens/s 8659.775 |walltime 12062.721 | +Transformer | epoch 0 | step 45690 |avg loss 7.507 |avg tokens 2222.600 |tokens/s 8212.885 |walltime 12065.427 | +Transformer | epoch 0 | step 45700 |avg loss 7.599 |avg tokens 2220.400 |tokens/s 8353.653 |walltime 12068.085 | +Transformer | epoch 0 | step 45710 |avg loss 7.454 |avg tokens 2288.800 |tokens/s 8417.429 |walltime 12070.804 | +Transformer | epoch 0 | step 45720 |avg loss 7.089 |avg tokens 2349.800 |tokens/s 8532.361 |walltime 12073.558 | +Transformer | epoch 0 | step 45730 |avg loss 7.673 |avg tokens 2318.400 |tokens/s 8590.014 |walltime 12076.257 | +Transformer | epoch 0 | step 45740 |avg loss 7.704 |avg tokens 2204.700 |tokens/s 8358.355 |walltime 12078.895 | +Transformer | epoch 0 | step 45750 |avg loss 7.803 |avg tokens 2327.700 |tokens/s 8672.202 |walltime 12081.579 | +Transformer | epoch 0 | step 45760 |avg loss 7.407 |avg tokens 2325.600 |tokens/s 8469.377 |walltime 12084.325 | +Transformer | epoch 0 | step 45770 |avg loss 7.617 |avg tokens 2217.900 |tokens/s 8476.951 |walltime 12086.941 | +Transformer | epoch 0 | step 45780 |avg loss 7.386 |avg tokens 2311.200 |tokens/s 8561.238 |walltime 12089.641 | +Transformer | epoch 0 | step 45790 |avg loss 7.525 |avg tokens 2062.300 |tokens/s 7844.975 |walltime 12092.270 | +Transformer | epoch 0 | step 45800 |avg loss 7.753 |avg tokens 2022.400 |tokens/s 7945.139 |walltime 12094.815 | +Transformer | epoch 0 | step 45810 |avg loss 7.659 |avg tokens 2356.000 |tokens/s 8812.327 |walltime 12097.489 | +Transformer | epoch 0 | step 45820 |avg loss 7.457 |avg tokens 2250.400 |tokens/s 8398.982 |walltime 12100.168 | +Transformer | epoch 0 | step 45830 |avg loss 7.825 |avg tokens 1677.100 |tokens/s 7061.299 |walltime 12102.543 | +Transformer | epoch 0 | step 45840 |avg loss 7.559 |avg tokens 2309.500 |tokens/s 8473.355 |walltime 12105.269 | 
+Transformer | epoch 0 | step 45850 |avg loss 7.858 |avg tokens 1936.200 |tokens/s 7735.133 |walltime 12107.772 | +Transformer | epoch 0 | step 45860 |avg loss 7.626 |avg tokens 2277.500 |tokens/s 8422.160 |walltime 12110.476 | +Transformer | epoch 0 | step 45870 |avg loss 7.461 |avg tokens 2188.800 |tokens/s 8231.984 |walltime 12113.135 | +Transformer | epoch 0 | step 45880 |avg loss 7.748 |avg tokens 2313.400 |tokens/s 8530.992 |walltime 12115.847 | +Transformer | epoch 0 | step 45890 |avg loss 7.701 |avg tokens 2143.000 |tokens/s 8287.983 |walltime 12118.433 | +Transformer | epoch 0 | step 45900 |avg loss 7.633 |avg tokens 2130.500 |tokens/s 8026.678 |walltime 12121.087 | +Transformer | epoch 0 | step 45910 |avg loss 8.168 |avg tokens 2019.900 |tokens/s 8365.506 |walltime 12123.501 | +Transformer | epoch 0 | step 45920 |avg loss 7.823 |avg tokens 1972.800 |tokens/s 7995.146 |walltime 12125.969 | +Transformer | epoch 0 | step 45930 |avg loss 7.831 |avg tokens 2102.700 |tokens/s 8008.788 |walltime 12128.594 | +Transformer | epoch 0 | step 45940 |avg loss 7.718 |avg tokens 2168.200 |tokens/s 8368.452 |walltime 12131.185 | +Transformer | epoch 0 | step 45950 |avg loss 7.496 |avg tokens 2117.400 |tokens/s 8083.741 |walltime 12133.805 | +Transformer | epoch 0 | step 45960 |avg loss 7.466 |avg tokens 1890.700 |tokens/s 7587.024 |walltime 12136.297 | +Transformer | epoch 0 | step 45970 |avg loss 7.868 |avg tokens 1914.400 |tokens/s 7815.267 |walltime 12138.746 | +Transformer | epoch 0 | step 45980 |avg loss 7.747 |avg tokens 2021.200 |tokens/s 7836.964 |walltime 12141.325 | +Transformer | epoch 0 | step 45990 |avg loss 7.658 |avg tokens 2210.100 |tokens/s 8425.851 |walltime 12143.948 | +Transformer | epoch 0 | step 46000 |avg loss 8.010 |avg tokens 2119.200 |tokens/s 8230.714 |walltime 12146.523 | +Transformer | epoch 0 | step 46010 |avg loss 7.505 |avg tokens 2248.900 |tokens/s 8230.531 |walltime 12149.255 | +Transformer | epoch 0 | step 46020 |avg loss 7.804 |avg 
tokens 2211.600 |tokens/s 8632.493 |walltime 12151.817 | +Transformer | epoch 0 | step 46030 |avg loss 7.408 |avg tokens 2366.400 |tokens/s 8779.231 |walltime 12154.513 | +Transformer | epoch 0 | step 46040 |avg loss 7.404 |avg tokens 2346.400 |tokens/s 8363.569 |walltime 12157.318 | +Transformer | epoch 0 | step 46050 |avg loss 7.435 |avg tokens 2287.200 |tokens/s 8401.316 |walltime 12160.041 | +Transformer | epoch 0 | step 46060 |avg loss 7.644 |avg tokens 2014.400 |tokens/s 7898.438 |walltime 12162.591 | +Transformer | epoch 0 | step 46070 |avg loss 7.524 |avg tokens 2247.400 |tokens/s 8266.927 |walltime 12165.310 | +Transformer | epoch 0 | step 46080 |avg loss 7.789 |avg tokens 2089.300 |tokens/s 8100.747 |walltime 12167.889 | +Transformer | epoch 0 | step 46090 |avg loss 8.206 |avg tokens 2408.800 |tokens/s 9448.759 |walltime 12170.438 | +Transformer | epoch 0 | step 46100 |avg loss 7.691 |avg tokens 2036.800 |tokens/s 8016.588 |walltime 12172.979 | +Transformer | epoch 0 | step 46110 |avg loss 7.485 |avg tokens 2195.300 |tokens/s 8113.683 |walltime 12175.685 | +Transformer | epoch 0 | step 46120 |avg loss 7.744 |avg tokens 2353.200 |tokens/s 9041.366 |walltime 12178.287 | +Transformer | epoch 0 | step 46130 |avg loss 7.776 |avg tokens 2316.200 |tokens/s 8776.049 |walltime 12180.927 | +Transformer | epoch 0 | step 46140 |avg loss 7.660 |avg tokens 2276.000 |tokens/s 8420.988 |walltime 12183.629 | +Transformer | epoch 0 | step 46150 |avg loss 7.917 |avg tokens 2098.900 |tokens/s 8410.755 |walltime 12186.125 | +Transformer | epoch 0 | step 46160 |avg loss 7.655 |avg tokens 2149.800 |tokens/s 8180.238 |walltime 12188.753 | +Transformer | epoch 0 | step 46170 |avg loss 7.827 |avg tokens 2224.700 |tokens/s 8523.007 |walltime 12191.363 | +Transformer | epoch 0 | step 46180 |avg loss 7.546 |avg tokens 2129.600 |tokens/s 7925.804 |walltime 12194.050 | +Transformer | epoch 0 | step 46190 |avg loss 7.816 |avg tokens 2074.400 |tokens/s 8241.949 |walltime 12196.567 | 
+Transformer | epoch 0 | step 46200 |avg loss 7.947 |avg tokens 2107.700 |tokens/s 8577.181 |walltime 12199.024 | +Transformer | epoch 0 | step 46210 |avg loss 7.587 |avg tokens 2229.900 |tokens/s 8189.080 |walltime 12201.747 | +Transformer | epoch 0 | step 46220 |avg loss 7.567 |avg tokens 2211.300 |tokens/s 8098.972 |walltime 12204.478 | +Transformer | epoch 0 | step 46230 |avg loss 7.901 |avg tokens 2171.800 |tokens/s 8437.696 |walltime 12207.052 | +Transformer | epoch 0 | step 46240 |avg loss 7.571 |avg tokens 2113.900 |tokens/s 8157.149 |walltime 12209.643 | +Transformer | epoch 0 | step 46250 |avg loss 7.830 |avg tokens 2246.800 |tokens/s 8429.386 |walltime 12212.308 | +Transformer | epoch 0 | step 46260 |avg loss 7.699 |avg tokens 2180.600 |tokens/s 8235.199 |walltime 12214.956 | +Transformer | epoch 0 | step 46270 |avg loss 7.270 |avg tokens 2302.400 |tokens/s 8349.695 |walltime 12217.714 | +Transformer | epoch 0 | step 46280 |avg loss 7.802 |avg tokens 2354.400 |tokens/s 8653.905 |walltime 12220.434 | +Transformer | epoch 0 | step 46290 |avg loss 7.981 |avg tokens 1863.000 |tokens/s 7433.216 |walltime 12222.941 | +Transformer | epoch 0 | step 46300 |avg loss 7.948 |avg tokens 1996.700 |tokens/s 7601.865 |walltime 12225.567 | +Transformer | epoch 0 | step 46310 |avg loss 7.477 |avg tokens 2305.600 |tokens/s 8569.535 |walltime 12228.258 | +Transformer | epoch 0 | step 46320 |avg loss 7.840 |avg tokens 1849.100 |tokens/s 7455.378 |walltime 12230.738 | +Transformer | epoch 0 | step 46330 |avg loss 7.367 |avg tokens 2149.500 |tokens/s 8237.136 |walltime 12233.348 | +Transformer | epoch 0 | step 46340 |avg loss 7.937 |avg tokens 1945.400 |tokens/s 7867.866 |walltime 12235.820 | +Transformer | epoch 0 | step 46350 |avg loss 7.559 |avg tokens 2287.400 |tokens/s 8558.524 |walltime 12238.493 | +Transformer | epoch 0 | step 46360 |avg loss 7.430 |avg tokens 2205.800 |tokens/s 7894.173 |walltime 12241.287 | +Transformer | epoch 0 | step 46370 |avg loss 7.752 |avg 
tokens 2141.200 |tokens/s 8301.452 |walltime 12243.866 | +Transformer | epoch 0 | step 46380 |avg loss 7.863 |avg tokens 2125.600 |tokens/s 8438.427 |walltime 12246.385 | +Transformer | epoch 0 | step 46390 |avg loss 8.152 |avg tokens 2176.000 |tokens/s 8788.266 |walltime 12248.861 | +Transformer | epoch 0 | step 46400 |avg loss 7.749 |avg tokens 2241.100 |tokens/s 8656.736 |walltime 12251.450 | +Transformer | epoch 0 | step 46410 |avg loss 7.650 |avg tokens 2208.400 |tokens/s 8425.837 |walltime 12254.071 | +Transformer | epoch 0 | step 46420 |avg loss 8.057 |avg tokens 2250.700 |tokens/s 8476.682 |walltime 12256.726 | +Transformer | epoch 0 | step 46430 |avg loss 7.765 |avg tokens 2076.700 |tokens/s 7915.799 |walltime 12259.350 | +Transformer | epoch 0 | step 46440 |avg loss 7.697 |avg tokens 2113.500 |tokens/s 8014.370 |walltime 12261.987 | +Transformer | epoch 0 | step 46450 |avg loss 7.893 |avg tokens 2072.400 |tokens/s 8164.727 |walltime 12264.525 | +Transformer | epoch 0 | step 46460 |avg loss 7.517 |avg tokens 2344.800 |tokens/s 8570.051 |walltime 12267.261 | +Transformer | epoch 0 | step 46470 |avg loss 7.754 |avg tokens 2041.900 |tokens/s 8136.716 |walltime 12269.771 | +Transformer | epoch 0 | step 46480 |avg loss 7.746 |avg tokens 2224.200 |tokens/s 8595.939 |walltime 12272.358 | +Transformer | epoch 0 | step 46490 |avg loss 8.068 |avg tokens 2105.300 |tokens/s 8038.829 |walltime 12274.977 | +Transformer | epoch 0 | step 46500 |avg loss 7.733 |avg tokens 2142.100 |tokens/s 8381.734 |walltime 12277.533 | +Transformer | epoch 0 | step 46510 |avg loss 7.444 |avg tokens 2321.400 |tokens/s 8331.190 |walltime 12280.319 | +Transformer | epoch 0 | step 46520 |avg loss 7.737 |avg tokens 2273.300 |tokens/s 8560.915 |walltime 12282.975 | +Transformer | epoch 0 | step 46530 |avg loss 7.647 |avg tokens 2063.300 |tokens/s 8012.203 |walltime 12285.550 | +Transformer | epoch 0 | step 46540 |avg loss 7.198 |avg tokens 2410.700 |tokens/s 8766.953 |walltime 12288.300 | 
+Transformer | epoch 0 | step 46550 |avg loss 8.126 |avg tokens 2008.500 |tokens/s 7711.006 |walltime 12290.904 |
+Transformer | epoch 0 | step 46560 |avg loss 7.953 |avg tokens 2107.200 |tokens/s 8453.459 |walltime 12293.397 |
+Transformer | epoch 0 | step 46570 |avg loss 7.669 |avg tokens 2175.100 |tokens/s 8258.642 |walltime 12296.031 |
+Transformer | epoch 0 | step 46580 |avg loss 7.704 |avg tokens 2215.100 |tokens/s 8504.048 |walltime 12298.636 |
+Transformer | epoch 0 | step 46590 |avg loss 7.462 |avg tokens 2247.400 |tokens/s 8473.779 |walltime 12301.288 |
+Transformer | epoch 0 | step 46600 |avg loss 7.656 |avg tokens 2076.100 |tokens/s 7908.592 |walltime 12303.913 |
+Transformer | epoch 0 | step 46610 |avg loss 7.699 |avg tokens 2053.500 |tokens/s 7974.151 |walltime 12306.488 |
+Transformer | epoch 0 | step 46620 |avg loss 7.898 |avg tokens 1954.100 |tokens/s 7678.937 |walltime 12309.033 |
+Transformer | epoch 0 | step 46630 |avg loss 8.106 |avg tokens 1933.600 |tokens/s 7863.098 |walltime 12311.492 |
+Transformer | epoch 0 | step 46640 |avg loss 7.485 |avg tokens 2272.800 |tokens/s 8208.969 |walltime 12314.261 |
+Transformer | epoch 0 | step 46650 |avg loss 7.436 |avg tokens 2160.600 |tokens/s 8128.728 |walltime 12316.919 |
+Transformer | epoch 0 | step 46660 |avg loss 7.891 |avg tokens 2056.600 |tokens/s 8126.870 |walltime 12319.449 |
+Transformer | epoch 0 | step 46670 |avg loss 7.291 |avg tokens 2269.200 |tokens/s 8525.795 |walltime 12322.111 |
+Transformer | epoch 0 | step 46680 |avg loss 7.814 |avg tokens 2136.900 |tokens/s 8287.841 |walltime 12324.689 |
+Transformer | epoch 0 | step 46690 |avg loss 7.448 |avg tokens 2376.000 |tokens/s 8645.103 |walltime 12327.438 |
+Transformer | epoch 0 | step 46700 |avg loss 7.543 |avg tokens 2231.600 |tokens/s 8210.586 |walltime 12330.156 |
+Transformer | epoch 0 | step 46710 |avg loss 7.803 |avg tokens 2029.700 |tokens/s 7984.184 |walltime 12332.698 |
+Transformer | epoch 0 | step 46720 |avg loss 7.654 |avg tokens 2140.800 |tokens/s 8227.219 |walltime 12335.300 |
+Transformer | epoch 0 | step 46730 |avg loss 7.560 |avg tokens 2120.900 |tokens/s 8208.746 |walltime 12337.884 |
+Transformer | epoch 0 | step 46740 |avg loss 7.631 |avg tokens 2170.000 |tokens/s 8359.867 |walltime 12340.479 |
+Transformer | epoch 0 | step 46750 |avg loss 7.941 |avg tokens 1862.700 |tokens/s 7585.253 |walltime 12342.935 |
+Transformer | epoch 0 | step 46760 |avg loss 7.620 |avg tokens 2340.200 |tokens/s 8562.586 |walltime 12345.668 |
+Transformer | epoch 0 | step 46770 |avg loss 7.979 |avg tokens 1853.400 |tokens/s 7273.364 |walltime 12348.216 |
+Transformer | epoch 0 | step 46780 |avg loss 7.587 |avg tokens 2341.000 |tokens/s 8851.768 |walltime 12350.861 |
+Transformer | epoch 0 | step 46790 |avg loss 7.315 |avg tokens 2178.100 |tokens/s 8207.204 |walltime 12353.515 |
+Transformer | epoch 0 | step 46800 |avg loss 7.770 |avg tokens 1901.700 |tokens/s 8092.703 |walltime 12355.865 |
+Transformer | epoch 0 | step 46810 |avg loss 7.700 |avg tokens 2267.500 |tokens/s 8497.109 |walltime 12358.533 |
+Transformer | epoch 0 | step 46820 |avg loss 7.725 |avg tokens 2237.000 |tokens/s 8284.521 |walltime 12361.233 |
+Transformer | epoch 0 | step 46830 |avg loss 7.657 |avg tokens 2357.300 |tokens/s 8793.666 |walltime 12363.914 |
+Transformer | epoch 0 | step 46840 |avg loss 7.465 |avg tokens 2161.500 |tokens/s 8068.533 |walltime 12366.593 |
+Transformer | epoch 0 | step 46850 |avg loss 8.025 |avg tokens 2045.000 |tokens/s 8142.378 |walltime 12369.105 |
+Transformer | epoch 0 | step 46860 |avg loss 7.799 |avg tokens 2304.900 |tokens/s 8727.563 |walltime 12371.746 |
+Transformer | epoch 0 | step 46870 |avg loss 7.658 |avg tokens 2277.600 |tokens/s 8493.136 |walltime 12374.427 |
+Transformer | epoch 0 | step 46880 |avg loss 7.968 |avg tokens 2299.300 |tokens/s 8964.506 |walltime 12376.992 |
+Transformer | epoch 0 | step 46890 |avg loss 8.016 |avg tokens 1720.000 |tokens/s 7128.748 |walltime 12379.405 |
+Transformer | epoch 0 | step 46900 |avg loss 7.547 |avg tokens 2019.600 |tokens/s 7755.812 |walltime 12382.009 |
+Transformer | epoch 0 | step 46910 |avg loss 7.439 |avg tokens 2302.400 |tokens/s 8530.876 |walltime 12384.708 |
+Transformer | epoch 0 | step 46920 |avg loss 7.796 |avg tokens 2288.800 |tokens/s 8800.609 |walltime 12387.309 |
+Transformer | epoch 0 | step 46930 |avg loss 7.589 |avg tokens 2252.500 |tokens/s 8139.190 |walltime 12390.076 |
+Transformer | epoch 0 | step 46940 |avg loss 7.770 |avg tokens 2227.400 |tokens/s 8137.136 |walltime 12392.813 |
+Transformer | epoch 0 | step 46950 |avg loss 7.700 |avg tokens 2369.600 |tokens/s 8575.907 |walltime 12395.576 |
+Transformer | epoch 0 | step 46960 |avg loss 7.834 |avg tokens 2098.800 |tokens/s 8184.824 |walltime 12398.141 |
+Transformer | epoch 0 | step 46970 |avg loss 7.762 |avg tokens 2150.100 |tokens/s 8202.084 |walltime 12400.762 |
+Transformer | epoch 0 | step 46980 |avg loss 7.848 |avg tokens 1825.400 |tokens/s 7672.286 |walltime 12403.141 |
+Transformer | epoch 0 | step 46990 |avg loss 7.501 |avg tokens 2217.100 |tokens/s 8218.079 |walltime 12405.839 |
+Transformer | epoch 0 | step 47000 |avg loss 8.030 |avg tokens 2295.000 |tokens/s 8923.964 |walltime 12408.411 |
+Transformer | epoch 0 | step 47010 |avg loss 7.905 |avg tokens 1747.100 |tokens/s 7542.087 |walltime 12410.727 |
+Transformer | epoch 0 | step 47020 |avg loss 7.677 |avg tokens 2206.600 |tokens/s 8475.349 |walltime 12413.331 |
+Transformer | epoch 0 | step 47030 |avg loss 7.425 |avg tokens 2222.100 |tokens/s 8302.846 |walltime 12416.007 |
+Transformer | epoch 0 | step 47040 |avg loss 7.421 |avg tokens 2186.200 |tokens/s 8375.500 |walltime 12418.617 |
+Transformer | epoch 0 | step 47050 |avg loss 6.980 |avg tokens 2413.300 |tokens/s 8622.465 |walltime 12421.416 |
+Transformer | epoch 0 | step 47060 |avg loss 7.941 |avg tokens 2282.400 |tokens/s 8440.094 |walltime 12424.121 |
+Transformer | epoch 0 | step 47070 |avg loss 7.204 |avg tokens 2316.000 |tokens/s 8600.412 |walltime 12426.813 |
+Transformer | epoch 0 | step 47080 |avg loss 7.587 |avg tokens 2174.200 |tokens/s 8150.038 |walltime 12429.481 |
+Transformer | epoch 0 | step 47090 |avg loss 7.485 |avg tokens 2020.800 |tokens/s 7678.435 |walltime 12432.113 |
+Transformer | epoch 0 | step 47100 |avg loss 7.454 |avg tokens 2140.000 |tokens/s 8085.352 |walltime 12434.760 |
+Transformer | epoch 0 | step 47110 |avg loss 7.466 |avg tokens 2369.200 |tokens/s 8427.657 |walltime 12437.571 |
+Transformer | epoch 0 | step 47120 |avg loss 7.611 |avg tokens 2329.900 |tokens/s 8454.760 |walltime 12440.327 |
+Transformer | epoch 0 | step 47130 |avg loss 7.931 |avg tokens 2259.400 |tokens/s 8599.865 |walltime 12442.954 |
+Transformer | epoch 0 | step 47140 |avg loss 7.737 |avg tokens 2214.300 |tokens/s 8708.436 |walltime 12445.497 |
+Transformer | epoch 0 | step 47150 |avg loss 7.539 |avg tokens 2271.300 |tokens/s 8260.066 |walltime 12448.246 |
+Transformer | epoch 0 | step 47160 |avg loss 7.526 |avg tokens 2290.600 |tokens/s 8623.030 |walltime 12450.903 |
+Transformer | epoch 0 | step 47170 |avg loss 7.373 |avg tokens 2215.800 |tokens/s 8366.214 |walltime 12453.551 |
+Transformer | epoch 0 | step 47180 |avg loss 7.906 |avg tokens 2174.700 |tokens/s 8257.601 |walltime 12456.185 |
+Transformer | epoch 0 | step 47190 |avg loss 7.727 |avg tokens 2161.400 |tokens/s 8307.104 |walltime 12458.787 |
+Transformer | epoch 0 | step 47200 |avg loss 7.756 |avg tokens 2061.500 |tokens/s 8078.201 |walltime 12461.339 |
+Transformer | epoch 0 | step 47210 |avg loss 7.619 |avg tokens 2169.600 |tokens/s 8196.194 |walltime 12463.986 |
+Transformer | epoch 0 | step 47220 |avg loss 7.657 |avg tokens 2370.200 |tokens/s 8808.181 |walltime 12466.677 |
+Transformer | epoch 0 | step 47230 |avg loss 7.837 |avg tokens 1845.400 |tokens/s 7791.841 |walltime 12469.045 |
+Transformer | epoch 0 | step 47240 |avg loss 7.825 |avg tokens 2109.400 |tokens/s 8062.775 |walltime 12471.661 |
+Transformer | epoch 0 | step 47250 |avg loss 7.806 |avg tokens 2126.600 |tokens/s 7996.134 |walltime 12474.321 |
+Transformer | epoch 0 | step 47260 |avg loss 7.440 |avg tokens 2417.600 |tokens/s 8788.174 |walltime 12477.072 |
+Transformer | epoch 0 | step 47270 |avg loss 7.760 |avg tokens 1838.500 |tokens/s 7621.247 |walltime 12479.484 |
+Transformer | epoch 0 | step 47280 |avg loss 7.935 |avg tokens 2244.900 |tokens/s 8452.042 |walltime 12482.140 |
+Transformer | epoch 0 | step 47290 |avg loss 7.820 |avg tokens 2211.000 |tokens/s 8303.585 |walltime 12484.803 |
+Transformer | epoch 0 | step 47300 |avg loss 7.720 |avg tokens 2139.800 |tokens/s 8110.523 |walltime 12487.441 |
+Transformer | epoch 0 | step 47310 |avg loss 7.644 |avg tokens 2034.700 |tokens/s 7735.534 |walltime 12490.072 |
+Transformer | epoch 0 | step 47320 |avg loss 8.014 |avg tokens 1919.500 |tokens/s 7786.507 |walltime 12492.537 |
+Transformer | epoch 0 | step 47330 |avg loss 7.821 |avg tokens 1980.500 |tokens/s 7631.471 |walltime 12495.132 |
+Transformer | epoch 0 | step 47340 |avg loss 7.716 |avg tokens 2130.400 |tokens/s 8015.951 |walltime 12497.790 |
+Transformer | epoch 0 | step 47350 |avg loss 7.829 |avg tokens 2318.600 |tokens/s 8826.062 |walltime 12500.417 |
+Transformer | epoch 0 | step 47360 |avg loss 7.402 |avg tokens 2181.000 |tokens/s 8075.430 |walltime 12503.117 |
+Transformer | epoch 0 | step 47370 |avg loss 7.405 |avg tokens 2372.500 |tokens/s 8587.910 |walltime 12505.880 |
+Transformer | epoch 0 | step 47380 |avg loss 7.624 |avg tokens 2134.600 |tokens/s 8217.705 |walltime 12508.478 |
+Transformer | epoch 0 | step 47390 |avg loss 7.565 |avg tokens 2249.600 |tokens/s 8639.834 |walltime 12511.081 |
+Transformer | epoch 0 | step 47400 |avg loss 7.588 |avg tokens 2383.200 |tokens/s 8600.843 |walltime 12513.852 |
+Transformer | epoch 0 | step 47410 |avg loss 7.556 |avg tokens 2150.900 |tokens/s 8285.561 |walltime 12516.448 |
+Transformer | epoch 0 | step 47420 |avg loss 7.385 |avg tokens 2262.400 |tokens/s 8259.192 |walltime 12519.187 |
+Transformer | epoch 0 | step 47430 |avg loss 7.421 |avg tokens 2366.100 |tokens/s 8600.265 |walltime 12521.939 |
+Transformer | epoch 0 | step 47440 |avg loss 8.013 |avg tokens 2166.200 |tokens/s 8742.653 |walltime 12524.416 |
+Transformer | epoch 0 | step 47450 |avg loss 7.710 |avg tokens 2138.800 |tokens/s 8123.995 |walltime 12527.049 |
+Transformer | epoch 0 | step 47460 |avg loss 7.661 |avg tokens 2190.400 |tokens/s 8217.320 |walltime 12529.715 |
+Transformer | epoch 0 | step 47470 |avg loss 7.506 |avg tokens 2200.000 |tokens/s 8215.092 |walltime 12532.393 |
+Transformer | epoch 0 | step 47480 |avg loss 7.482 |avg tokens 2395.700 |tokens/s 8603.309 |walltime 12535.177 |
+Transformer | epoch 0 | step 47490 |avg loss 7.729 |avg tokens 1849.500 |tokens/s 7316.244 |walltime 12537.705 |
+Transformer | epoch 0 | step 47500 |avg loss 7.773 |avg tokens 2211.100 |tokens/s 8392.618 |walltime 12540.340 |
+Transformer | epoch 0 | step 47510 |avg loss 8.088 |avg tokens 2290.600 |tokens/s 8935.798 |walltime 12542.903 |
+Transformer | epoch 0 | step 47520 |avg loss 7.779 |avg tokens 2306.400 |tokens/s 8671.642 |walltime 12545.563 |
+Transformer | epoch 0 | step 47530 |avg loss 7.934 |avg tokens 2121.400 |tokens/s 8524.200 |walltime 12548.052 |
+Transformer | epoch 0 | step 47540 |avg loss 7.230 |avg tokens 2373.900 |tokens/s 8662.537 |walltime 12550.792 |
+Transformer | epoch 0 | step 47550 |avg loss 8.120 |avg tokens 2181.200 |tokens/s 8373.411 |walltime 12553.397 |
+Transformer | epoch 0 | step 47560 |avg loss 7.786 |avg tokens 2173.400 |tokens/s 8378.352 |walltime 12555.991 |
+Transformer | epoch 0 | step 47570 |avg loss 7.516 |avg tokens 2281.200 |tokens/s 8207.491 |walltime 12558.770 |
+Transformer | epoch 0 | step 47580 |avg loss 7.789 |avg tokens 2143.900 |tokens/s 8193.906 |walltime 12561.387 |
+Transformer | epoch 0 | step 47590 |avg loss 7.479 |avg tokens 2287.000 |tokens/s 8497.387 |walltime 12564.078 |
+Transformer | epoch 0 | step 47600 |avg loss 7.469 |avg tokens 2135.700 |tokens/s 8067.343 |walltime 12566.726 |
+Transformer | epoch 0 | step 47610 |avg loss 7.559 |avg tokens 2300.800 |tokens/s 8731.679 |walltime 12569.361 |
+Transformer | epoch 0 | step 47620 |avg loss 7.447 |avg tokens 2310.000 |tokens/s 8616.797 |walltime 12572.041 |
+Transformer | epoch 0 | step 47630 |avg loss 7.516 |avg tokens 2306.400 |tokens/s 8416.366 |walltime 12574.782 |
+Transformer | epoch 0 | step 47640 |avg loss 7.531 |avg tokens 2354.700 |tokens/s 8788.807 |walltime 12577.461 |
+Transformer | epoch 0 | step 47650 |avg loss 7.494 |avg tokens 2235.900 |tokens/s 8287.367 |walltime 12580.159 |
+Transformer | epoch 0 | step 47660 |avg loss 7.532 |avg tokens 2370.400 |tokens/s 8634.341 |walltime 12582.904 |
+Transformer | epoch 0 | step 47670 |avg loss 7.539 |avg tokens 2319.200 |tokens/s 8479.051 |walltime 12585.640 |
+Transformer | epoch 0 | step 47680 |avg loss 7.648 |avg tokens 2292.300 |tokens/s 8331.949 |walltime 12588.391 |
+Transformer | epoch 0 | step 47690 |avg loss 7.746 |avg tokens 2277.300 |tokens/s 8644.035 |walltime 12591.025 |
+Transformer | epoch 0 | step 47700 |avg loss 7.712 |avg tokens 2084.000 |tokens/s 7834.416 |walltime 12593.685 |
+Transformer | epoch 0 | step 47710 |avg loss 7.735 |avg tokens 2267.200 |tokens/s 8262.484 |walltime 12596.429 |
+Transformer | epoch 0 | step 47720 |avg loss 7.457 |avg tokens 2120.900 |tokens/s 7845.754 |walltime 12599.133 |
+Transformer | epoch 0 | step 47730 |avg loss 7.812 |avg tokens 2278.400 |tokens/s 8560.739 |walltime 12601.794 |
+Transformer | epoch 0 | step 47740 |avg loss 7.602 |avg tokens 2224.800 |tokens/s 8246.229 |walltime 12604.492 |
+Transformer | epoch 0 | step 47750 |avg loss 7.857 |avg tokens 1801.400 |tokens/s 7534.338 |walltime 12606.883 |
+Transformer | epoch 0 | step 47760 |avg loss 7.432 |avg tokens 2312.800 |tokens/s 8361.011 |walltime 12609.649 |
+Transformer | epoch 0 | step 47770 |avg loss 7.868 |avg tokens 2139.900 |tokens/s 8137.597 |walltime 12612.279 |
+Transformer | epoch 0 | step 47780 |avg loss 7.294 |avg tokens 2187.500 |tokens/s 7894.327 |walltime 12615.050 |
+Transformer | epoch 0 | step 47790 |avg loss 8.001 |avg tokens 2122.000 |tokens/s 8326.579 |walltime 12617.598 |
+Transformer | epoch 0 | step 47800 |avg loss 7.231 |avg tokens 2312.800 |tokens/s 8292.003 |walltime 12620.387 |
+Transformer | epoch 0 | step 47810 |avg loss 7.834 |avg tokens 2096.200 |tokens/s 8312.063 |walltime 12622.909 |
+Transformer | epoch 0 | step 47820 |avg loss 7.513 |avg tokens 2328.000 |tokens/s 8653.401 |walltime 12625.600 |
+Transformer | epoch 0 | step 47830 |avg loss 7.873 |avg tokens 2204.400 |tokens/s 8498.324 |walltime 12628.194 |
+Transformer | epoch 0 | step 47840 |avg loss 7.782 |avg tokens 2110.400 |tokens/s 8119.205 |walltime 12630.793 |
+Transformer | epoch 0 | step 47850 |avg loss 7.881 |avg tokens 1983.800 |tokens/s 7836.834 |walltime 12633.324 |
+Transformer | epoch 0 | step 47860 |avg loss 7.576 |avg tokens 2243.200 |tokens/s 8285.360 |walltime 12636.032 |
+Transformer | epoch 0 | step 47870 |avg loss 7.803 |avg tokens 2112.300 |tokens/s 8509.783 |walltime 12638.514 |
+Transformer | epoch 0 | step 47880 |avg loss 7.601 |avg tokens 2256.800 |tokens/s 8411.241 |walltime 12641.197 |
+Transformer | epoch 0 | step 47890 |avg loss 7.998 |avg tokens 1940.700 |tokens/s 7970.423 |walltime 12643.632 |
+Transformer | epoch 0 | step 47900 |avg loss 7.814 |avg tokens 2239.300 |tokens/s 8493.086 |walltime 12646.268 |
+Transformer | epoch 0 | step 47910 |avg loss 7.794 |avg tokens 2126.200 |tokens/s 8233.794 |walltime 12648.851 |
+Transformer | epoch 0 | step 47920 |avg loss 7.669 |avg tokens 2182.500 |tokens/s 8267.486 |walltime 12651.491 |
+Transformer | epoch 0 | step 47930 |avg loss 7.663 |avg tokens 2304.800 |tokens/s 8637.994 |walltime 12654.159 |
+Transformer | epoch 0 | step 47940 |avg loss 7.543 |avg tokens 2135.100 |tokens/s 8262.096 |walltime 12656.743 |
+Transformer | epoch 0 | step 47950 |avg loss 7.690 |avg tokens 2034.800 |tokens/s 7808.387 |walltime 12659.349 |
+Transformer | epoch 0 | step 47960 |avg loss 7.494 |avg tokens 2306.200 |tokens/s 8458.663 |walltime 12662.075 |
+Transformer | epoch 0 | step 47970 |avg loss 7.946 |avg tokens 1919.700 |tokens/s 7574.947 |walltime 12664.610 |
+Transformer | epoch 0 | step 47980 |avg loss 7.608 |avg tokens 2376.600 |tokens/s 8824.664 |walltime 12667.303 |
+Transformer | epoch 0 | step 47990 |avg loss 7.828 |avg tokens 2352.500 |tokens/s 8908.938 |walltime 12669.943 |
+Transformer | epoch 0 | step 48000 |avg loss 7.946 |avg tokens 2213.000 |tokens/s 8380.068 |walltime 12672.584 |
+Transformer | epoch 0 | step 48010 |avg loss 7.837 |avg tokens 2246.300 |tokens/s 8265.230 |walltime 12675.302 |
+Transformer | epoch 0 | step 48020 |avg loss 7.728 |avg tokens 2175.400 |tokens/s 8273.387 |walltime 12677.931 |
+Transformer | epoch 0 | step 48030 |avg loss 7.729 |avg tokens 2301.100 |tokens/s 8480.575 |walltime 12680.645 |
+Transformer | epoch 0 | step 48040 |avg loss 7.778 |avg tokens 2135.100 |tokens/s 8304.696 |walltime 12683.216 |
+Transformer | epoch 0 | step 48050 |avg loss 7.722 |avg tokens 2060.800 |tokens/s 8048.200 |walltime 12685.776 |
+Transformer | epoch 0 | step 48060 |avg loss 7.364 |avg tokens 2335.100 |tokens/s 8387.431 |walltime 12688.560 |
+Transformer | epoch 0 | step 48070 |avg loss 7.920 |avg tokens 2368.500 |tokens/s 8765.189 |walltime 12691.262 |
+Transformer | epoch 0 | step 48080 |avg loss 7.796 |avg tokens 2247.900 |tokens/s 8557.601 |walltime 12693.889 |
+Transformer | epoch 0 | step 48090 |avg loss 7.829 |avg tokens 2153.000 |tokens/s 8458.833 |walltime 12696.435 |
+Transformer | epoch 0 | step 48100 |avg loss 8.142 |avg tokens 2338.800 |tokens/s 8859.107 |walltime 12699.075 |
+Transformer | epoch 0 | step 48110 |avg loss 8.182 |avg tokens 1725.500 |tokens/s 7084.195 |walltime 12701.510 |
+Transformer | epoch 0 | step 48120 |avg loss 7.610 |avg tokens 2336.500 |tokens/s 8721.556 |walltime 12704.189 |
+Transformer | epoch 0 | step 48130 |avg loss 7.605 |avg tokens 2069.900 |tokens/s 7846.057 |walltime 12706.827 |
+Transformer | epoch 0 | step 48140 |avg loss 7.314 |avg tokens 2084.900 |tokens/s 7856.499 |walltime 12709.481 |
+Transformer | epoch 0 | step 48150 |avg loss 7.582 |avg tokens 2285.300 |tokens/s 8557.640 |walltime 12712.152 |
+Transformer | epoch 0 | step 48160 |avg loss 7.617 |avg tokens 2238.400 |tokens/s 8299.345 |walltime 12714.849 |
+Transformer | epoch 0 | step 48170 |avg loss 7.330 |avg tokens 2335.400 |tokens/s 8430.903 |walltime 12717.619 |
+Transformer | epoch 0 | step 48180 |avg loss 7.649 |avg tokens 2194.400 |tokens/s 8262.509 |walltime 12720.275 |
+Transformer | epoch 0 | step 48190 |avg loss 7.616 |avg tokens 2125.000 |tokens/s 8122.574 |walltime 12722.891 |
+Transformer | epoch 0 | step 48200 |avg loss 8.068 |avg tokens 2089.000 |tokens/s 8048.906 |walltime 12725.486 |
+Transformer | epoch 0 | step 48210 |avg loss 8.116 |avg tokens 2149.500 |tokens/s 8729.295 |walltime 12727.949 |
+Transformer | epoch 0 | step 48220 |avg loss 7.957 |avg tokens 1621.500 |tokens/s 7239.326 |walltime 12730.188 |
+Transformer | epoch 0 | step 48230 |avg loss 7.635 |avg tokens 2286.900 |tokens/s 8476.659 |walltime 12732.886 |
+Transformer | epoch 0 | step 48240 |avg loss 7.671 |avg tokens 2296.000 |tokens/s 8432.592 |walltime 12735.609 |
+Transformer | epoch 0 | step 48250 |avg loss 8.142 |avg tokens 1706.000 |tokens/s 7279.527 |walltime 12737.953 |
+Transformer | epoch 0 | step 48260 |avg loss 7.962 |avg tokens 2132.400 |tokens/s 8423.301 |walltime 12740.484 |
+Transformer | epoch 0 | step 48270 |avg loss 7.792 |avg tokens 2158.300 |tokens/s 8600.955 |walltime 12742.994 |
+Transformer | epoch 0 | step 48280 |avg loss 7.546 |avg tokens 2244.000 |tokens/s 8575.439 |walltime 12745.610 |
+Transformer | epoch 0 | step 48290 |avg loss 7.912 |avg tokens 2080.900 |tokens/s 8297.996 |walltime 12748.118 |
+Transformer | epoch 0 | step 48300 |avg loss 7.377 |avg tokens 2324.400 |tokens/s 8518.449 |walltime 12750.847 |
+Transformer | epoch 0 | step 48310 |avg loss 8.102 |avg tokens 1971.400 |tokens/s 8073.263 |walltime 12753.289 |
+Transformer | epoch 0 | step 48320 |avg loss 7.640 |avg tokens 2429.200 |tokens/s 8894.754 |walltime 12756.020 |
+Transformer | epoch 0 | step 48330 |avg loss 7.615 |avg tokens 2204.600 |tokens/s 8381.251 |walltime 12758.650 |
+Transformer | epoch 0 | step 48340 |avg loss 7.841 |avg tokens 2171.700 |tokens/s 8385.615 |walltime 12761.240 |
+Transformer | epoch 0 | step 48350 |avg loss 7.739 |avg tokens 2266.700 |tokens/s 8541.921 |walltime 12763.893 |
+Transformer | epoch 0 | step 48360 |avg loss 7.511 |avg tokens 2329.600 |tokens/s 8397.381 |walltime 12766.668 |
+Transformer | epoch 0 | step 48370 |avg loss 7.692 |avg tokens 2268.000 |tokens/s 8371.647 |walltime 12769.377 |
+Transformer | epoch 0 | step 48380 |avg loss 7.866 |avg tokens 2184.500 |tokens/s 8348.998 |walltime 12771.993 |
+Transformer | epoch 0 | step 48390 |avg loss 7.397 |avg tokens 2365.600 |tokens/s 8591.925 |walltime 12774.747 |
+Transformer | epoch 0 | step 48400 |avg loss 7.718 |avg tokens 2240.600 |tokens/s 8623.711 |walltime 12777.345 |
+Transformer | epoch 0 | step 48410 |avg loss 7.551 |avg tokens 2205.200 |tokens/s 8216.127 |walltime 12780.029 |
+Transformer | epoch 0 | step 48420 |avg loss 7.392 |avg tokens 2284.800 |tokens/s 8592.708 |walltime 12782.688 |
+Transformer | epoch 0 | step 48430 |avg loss 7.964 |avg tokens 2138.100 |tokens/s 8397.603 |walltime 12785.234 |
+Transformer | epoch 0 | step 48440 |avg loss 7.700 |avg tokens 2122.600 |tokens/s 8330.004 |walltime 12787.782 |
+Transformer | epoch 0 | step 48450 |avg loss 7.709 |avg tokens 2164.200 |tokens/s 8065.422 |walltime 12790.465 |
+Transformer | epoch 0 | step 48460 |avg loss 7.993 |avg tokens 2067.900 |tokens/s 8369.772 |walltime 12792.936 |
+Transformer | epoch 0 | step 48470 |avg loss 7.742 |avg tokens 2165.400 |tokens/s 8396.933 |walltime 12795.515 |
+Transformer | epoch 0 | step 48480 |avg loss 7.359 |avg tokens 2392.000 |tokens/s 8350.148 |walltime 12798.379 |
+Transformer | epoch 0 | step 48490 |avg loss 7.820 |avg tokens 1810.200 |tokens/s 7357.073 |walltime 12800.840 |
+Transformer | epoch 0 | step 48500 |avg loss 7.415 |avg tokens 2084.200 |tokens/s 7837.136 |walltime 12803.499 |
+Transformer | epoch 0 | step 48510 |avg loss 7.802 |avg tokens 2136.000 |tokens/s 8300.051 |walltime 12806.073 |
+Transformer | epoch 0 | step 48520 |avg loss 7.805 |avg tokens 2229.800 |tokens/s 8257.829 |walltime 12808.773 |
+Transformer | epoch 0 | step 48530 |avg loss 7.468 |avg tokens 2261.600 |tokens/s 8287.224 |walltime 12811.502 |
+Transformer | epoch 0 | step 48540 |avg loss 7.431 |avg tokens 2193.600 |tokens/s 8034.369 |walltime 12814.232 |
+Transformer | epoch 0 | step 48550 |avg loss 7.682 |avg tokens 2194.400 |tokens/s 8124.241 |walltime 12816.933 |
+Transformer | epoch 0 | step 48560 |avg loss 8.135 |avg tokens 2035.900 |tokens/s 8169.672 |walltime 12819.425 |
+Transformer | epoch 0 | step 48570 |avg loss 7.576 |avg tokens 2094.100 |tokens/s 7902.562 |walltime 12822.075 |
+Transformer | epoch 0 | step 48580 |avg loss 7.881 |avg tokens 2095.800 |tokens/s 8582.049 |walltime 12824.517 |
+Transformer | epoch 0 | step 48590 |avg loss 7.759 |avg tokens 1999.600 |tokens/s 7693.639 |walltime 12827.116 |
+Transformer | epoch 0 | step 48600 |avg loss 7.756 |avg tokens 2258.400 |tokens/s 8270.130 |walltime 12829.847 |
+Transformer | epoch 0 | step 48610 |avg loss 7.915 |avg tokens 2126.500 |tokens/s 8230.509 |walltime 12832.431 |
+Transformer | epoch 0 | step 48620 |avg loss 7.721 |avg tokens 2368.300 |tokens/s 9034.905 |walltime 12835.052 |
+Transformer | epoch 0 | step 48630 |avg loss 7.700 |avg tokens 2433.400 |tokens/s 9135.628 |walltime 12837.716 |
+Transformer | epoch 0 | step 48640 |avg loss 8.101 |avg tokens 2029.700 |tokens/s 8302.355 |walltime 12840.161 |
+Transformer | epoch 0 | step 48650 |avg loss 7.985 |avg tokens 1844.400 |tokens/s 7691.108 |walltime 12842.559 |
+Transformer | epoch 0 | step 48660 |avg loss 7.699 |avg tokens 2006.200 |tokens/s 7896.354 |walltime 12845.099 |
+Transformer | epoch 0 | step 48670 |avg loss 7.867 |avg tokens 1936.900 |tokens/s 7738.873 |walltime 12847.602 |
+Transformer | epoch 0 | step 48680 |avg loss 7.495 |avg tokens 2335.200 |tokens/s 8641.778 |walltime 12850.304 |
+Transformer | epoch 0 | step 48690 |avg loss 7.813 |avg tokens 1923.300 |tokens/s 7574.920 |walltime 12852.843 |
+Transformer | epoch 0 | step 48700 |avg loss 7.660 |avg tokens 2213.300 |tokens/s 8412.875 |walltime 12855.474 |
+Transformer | epoch 0 | step 48710 |avg loss 7.746 |avg tokens 2263.200 |tokens/s 8431.596 |walltime 12858.158 |
+Transformer | epoch 0 | step 48720 |avg loss 7.934 |avg tokens 2223.500 |tokens/s 8208.201 |walltime 12860.867 |
+Transformer | epoch 0 | step 48730 |avg loss 7.411 |avg tokens 2393.100 |tokens/s 8856.458 |walltime 12863.569 |
+Transformer | epoch 0 | step 48740 |avg loss 7.349 |avg tokens 2298.400 |tokens/s 8375.506 |walltime 12866.314 |
+Transformer | epoch 0 | step 48750 |avg loss 7.645 |avg tokens 2271.800 |tokens/s 8405.924 |walltime 12869.016 |
+Transformer | epoch 0 | step 48760 |avg loss 7.699 |avg tokens 2203.200 |tokens/s 8078.576 |walltime 12871.743 |
+Transformer | epoch 0 | step 48770 |avg loss 7.699 |avg tokens 2239.200 |tokens/s 8578.394 |walltime 12874.354 |
+Transformer | epoch 0 | step 48780 |avg loss 7.548 |avg tokens 2333.900 |tokens/s 8654.590 |walltime 12877.050 |
+Transformer | epoch 0 | step 48790 |avg loss 7.669 |avg tokens 2165.000 |tokens/s 8170.264 |walltime 12879.700 |
+Transformer | epoch 0 | step 48800 |avg loss 7.831 |avg tokens 2244.900 |tokens/s 8198.136 |walltime 12882.439 |
+Transformer | epoch 0 | step 48810 |avg loss 7.655 |avg tokens 2018.300 |tokens/s 8126.037 |walltime 12884.922 |
+Transformer | epoch 0 | step 48820 |avg loss 7.721 |avg tokens 1963.100 |tokens/s 7759.962 |walltime 12887.452 |
+Transformer | epoch 0 | step 48830 |avg loss 7.805 |avg tokens 2179.300 |tokens/s 8050.077 |walltime 12890.159 |
+Transformer | epoch 0 | step 48840 |avg loss 7.792 |avg tokens 2166.500 |tokens/s 8247.869 |walltime 12892.786 |
+Transformer | epoch 0 | step 48850 |avg loss 7.438 |avg tokens 2221.200 |tokens/s 8252.979 |walltime 12895.477 |
+Transformer | epoch 0 | step 48860 |avg loss 8.041 |avg tokens 2264.300 |tokens/s 8768.472 |walltime 12898.060 |
+Transformer | epoch 0 | step 48870 |avg loss 7.642 |avg tokens 2122.100 |tokens/s 8065.593 |walltime 12900.691 |
+Transformer | epoch 0 | step 48880 |avg loss 7.666 |avg tokens 2242.500 |tokens/s 8281.693 |walltime 12903.399 |
+Transformer | epoch 0 | step 48890 |avg loss 7.831 |avg tokens 2056.900 |tokens/s 8176.918 |walltime 12905.914 |
+Transformer | epoch 0 | step 48900 |avg loss 7.735 |avg tokens 2261.900 |tokens/s 8325.341 |walltime 12908.631 |
+Transformer | epoch 0 | step 48910 |avg loss 7.878 |avg tokens 2273.100 |tokens/s 8280.284 |walltime 12911.376 |
+Transformer | epoch 0 | step 48920 |avg loss 7.814 |avg tokens 2278.200 |tokens/s 8807.303 |walltime 12913.963 |
+Transformer | epoch 0 | step 48930 |avg loss 7.416 |avg tokens 2387.200 |tokens/s 8514.868 |walltime 12916.767 |
+Transformer | epoch 0 | step 48940 |avg loss 7.463 |avg tokens 2346.100 |tokens/s 8612.910 |walltime 12919.490 |
+Transformer | epoch 0 | step 48950 |avg loss 7.789 |avg tokens 2365.100 |tokens/s 9086.952 |walltime 12922.093 |
+Transformer | epoch 0 | step 48960 |avg loss 7.852 |avg tokens 2140.400 |tokens/s 8302.450 |walltime 12924.671 |
+Transformer | epoch 0 | step 48970 |avg loss 7.855 |avg tokens 2202.000 |tokens/s 8476.273 |walltime 12927.269 |
+Transformer | epoch 0 | step 48980 |avg loss 7.563 |avg tokens 2164.100 |tokens/s 8458.346 |walltime 12929.828 |
+Transformer | epoch 0 | step 48990 |avg loss 8.166 |avg tokens 2000.200 |tokens/s 7892.199 |walltime 12932.362 |
+Transformer | epoch 0 | step 49000 |avg loss 7.496 |avg tokens 2026.100 |tokens/s 7568.175 |walltime 12935.039 |
+Transformer | epoch 0 | step 49010 |avg loss 7.701 |avg tokens 2096.800 |tokens/s 7936.508 |walltime 12937.681 |
+Transformer | epoch 0 | step 49020 |avg loss 7.730 |avg tokens 2028.600 |tokens/s 7999.784 |walltime 12940.217 |
+Transformer | epoch 0 | step 49030 |avg loss 7.927 |avg tokens 2217.900 |tokens/s 8480.797 |walltime 12942.832 |
+Transformer | epoch 0 | step 49040 |avg loss 7.823 |avg tokens 2222.100 |tokens/s 8414.667 |walltime 12945.473 |
+Transformer | epoch 0 | step 49050 |avg loss 7.688 |avg tokens 2312.000 |tokens/s 8567.287 |walltime 12948.172 |
+Transformer | epoch 0 | step 49060 |avg loss 7.880 |avg tokens 2105.700 |tokens/s 8004.871 |walltime 12950.802 |
+Transformer | epoch 0 | step 49070 |avg loss 7.929 |avg tokens 1917.100 |tokens/s 7866.461 |walltime 12953.239 |
+Transformer | epoch 0 | step 49080 |avg loss 7.614 |avg tokens 2144.600 |tokens/s 7916.553 |walltime 12955.948 |
+Transformer | epoch 0 | step 49090 |avg loss 7.803 |avg tokens 2325.000 |tokens/s 8885.607 |walltime 12958.565 |
+Transformer | epoch 0 | step 49100 |avg loss 7.346 |avg tokens 2127.500 |tokens/s 7977.996 |walltime 12961.231 |
+Transformer | epoch 0 | step 49110 |avg loss 7.686 |avg tokens 1995.100 |tokens/s 7809.609 |walltime 12963.786 |
+Transformer | epoch 0 | step 49120 |avg loss 7.550 |avg tokens 2220.400 |tokens/s 8262.552 |walltime 12966.473 |
+Transformer | epoch 0 | step 49130 |avg loss 7.823 |avg tokens 2157.700 |tokens/s 8690.015 |walltime 12968.956 |
+Transformer | epoch 0 | step 49140 |avg loss 7.650 |avg tokens 2126.100 |tokens/s 8021.619 |walltime 12971.607 |
+Transformer | epoch 0 | step 49150 |avg loss 7.885 |avg tokens 2207.200 |tokens/s 8434.580 |walltime 12974.224 |
+Transformer | epoch 0 | step 49160 |avg loss 8.274 |avg tokens 2014.600 |tokens/s 8376.968 |walltime 12976.629 |
+Transformer | epoch 0 | step 49170 |avg loss 7.757 |avg tokens 2189.800 |tokens/s 8413.461 |walltime 12979.231 |
+Transformer | epoch 0 | step 49180 |avg loss 8.011 |avg tokens 1888.600 |tokens/s 7804.005 |walltime 12981.651 |
+Transformer | epoch 0 | step 49190 |avg loss 7.791 |avg tokens 2277.600 |tokens/s 8438.094 |walltime 12984.351 |
+Transformer | epoch 0 | step 49200 |avg loss 7.869 |avg tokens 2348.900 |tokens/s 9045.218 |walltime 12986.948 |
+Transformer | epoch 0 | step 49210 |avg loss 7.889 |avg tokens 2091.800 |tokens/s 8092.793 |walltime 12989.532 |
+Transformer | epoch 0 | step 49220 |avg loss 7.981 |avg tokens 2107.100 |tokens/s 8233.758 |walltime 12992.091 |
+Transformer | epoch 0 | step 49230 |avg loss 7.636 |avg tokens 2426.400 |tokens/s 8885.873 |walltime 12994.822 |
+Transformer | epoch 0 | step 49240 |avg loss 7.979 |avg tokens 2111.200 |tokens/s 8128.844 |walltime 12997.419 |
+Transformer | epoch 0 | step 49250 |avg loss 7.938 |avg tokens 1996.500 |tokens/s 7856.820 |walltime 12999.960 |
+Transformer | epoch 0 | step 49260 |avg loss 7.990 |avg tokens 1995.100 |tokens/s 7915.981 |walltime 13002.481 |
+Transformer | epoch 0 | step 49270 |avg loss 8.044 |avg tokens 2017.400 |tokens/s 8017.317 |walltime 13004.997 |
+Transformer | epoch 0 | step 49280 |avg loss 7.769 |avg tokens 2212.300 |tokens/s 8295.546 |walltime 13007.664 |
+Transformer | epoch 0 | step 49290 |avg loss 7.524 |avg tokens 2063.500 |tokens/s 8122.092 |walltime 13010.204 |
+Transformer | epoch 0 | step 49300 |avg loss 7.581 |avg tokens 2176.000 |tokens/s 8284.052 |walltime 13012.831 |
+Transformer | epoch 0 | step 49310 |avg loss 7.587 |avg tokens 2316.000 |tokens/s 8478.912 |walltime 13015.563 |
+Transformer | epoch 0 | step 49320 |avg loss 7.776 |avg tokens 2193.300 |tokens/s 8344.460 |walltime 13018.191 |
+Transformer | epoch 0 | step 49330 |avg loss 7.930 |avg tokens 2317.800 |tokens/s 8823.603 |walltime 13020.818 |
+Transformer | epoch 0 | step 49340 |avg loss 7.622 |avg tokens 2213.400 |tokens/s 8364.573 |walltime 13023.464 |
+Transformer | epoch 0 | step 49350 |avg loss 8.104 |avg tokens 2086.200 |tokens/s 8344.880 |walltime 13025.964 |
+Transformer | epoch 0 | step 49360 |avg loss 7.808 |avg tokens 2262.500 |tokens/s 8330.209 |walltime 13028.680 |
+Transformer | epoch 0 | step 49370 |avg loss 7.604 |avg tokens 2230.500 |tokens/s 8155.048 |walltime 13031.415 |
+Transformer | epoch 0 | step 49380 |avg loss 7.562 |avg tokens 1964.000 |tokens/s 7857.767 |walltime 13033.915 |
+Transformer | epoch 0 | step 49390 |avg loss 7.701 |avg tokens 2321.600 |tokens/s 8795.676 |walltime 13036.554 |
+Transformer | epoch 0 | step 49400 |avg loss 7.253 |avg tokens 2105.400 |tokens/s 7997.270 |walltime 13039.187 |
+Transformer | epoch 0 | step 49410 |avg loss 7.587 |avg tokens 2155.800 |tokens/s 8140.774 |walltime 13041.835 |
+Transformer | epoch 0 | step 49420 |avg loss 7.706 |avg tokens 1796.600 |tokens/s 7297.639 |walltime 13044.297 |
+Transformer | epoch 0 | step 49430 |avg loss 7.713 |avg tokens 2273.900 |tokens/s 8262.513 |walltime 13047.049 |
+Transformer | epoch 0 | step 49440 |avg loss 7.401 |avg tokens 2278.200 |tokens/s 8438.541 |walltime 13049.749 |
+Transformer | epoch 0 | step 49450 |avg loss 7.514 |avg tokens 2277.600 |tokens/s 8346.032 |walltime 13052.478 |
+Transformer | epoch 0 | step 49460 |avg loss 7.663 |avg tokens 2166.400 |tokens/s 8024.421 |walltime 13055.177 |
+Transformer | epoch 0 | step 49470 |avg loss 7.640 |avg tokens 2045.900 |tokens/s 7776.398 |walltime 13057.808 |
+Transformer | epoch 0 | step 49480 |avg loss 7.769 |avg tokens 2148.600 |tokens/s 8251.474 |walltime 13060.412 |
+Transformer | epoch 0 | step 49490 |avg loss 7.709 |avg tokens 2182.800 |tokens/s 8109.635 |walltime 13063.104 |
+Transformer | epoch 0 | step 49500 |avg loss 7.706 |avg tokens 2360.000 |tokens/s 8707.211 |walltime 13065.814 |
+Transformer | epoch 0 | step 49510 |avg loss 7.564 |avg tokens 2283.100 |tokens/s 8615.395 |walltime 13068.464 |
+Transformer | epoch 0 | step 49520 |avg loss 7.823 |avg tokens 2088.800 |tokens/s 8072.431 |walltime 13071.052 |
+Transformer | epoch 0 | step 49530 |avg loss 7.618 |avg tokens 2304.800 |tokens/s 8444.157 |walltime 13073.781 |
+Transformer | epoch 0 | step 49540 |avg loss 7.577 |avg tokens 2155.600 |tokens/s 7986.128 |walltime 13076.480 |
+Transformer | epoch 0 | step 49550 |avg loss 7.779 |avg tokens 2227.200 |tokens/s 8296.914 |walltime 13079.165 |
+Transformer | epoch 0 | step 49560 |avg loss 7.436 |avg tokens 2212.000 |tokens/s 8210.218 |walltime 13081.859 |
+Transformer | epoch 0 | step 49570 |avg loss 8.037 |avg tokens 2099.900 |tokens/s 8283.351 |walltime 13084.394 |
+Transformer | epoch 0 | step 49580 |avg loss 7.244 |avg tokens 2080.500 |tokens/s 7933.341 |walltime 13087.017 |
+Transformer | epoch 0 | step 49590 |avg loss 7.673 |avg tokens 2354.500 |tokens/s 8626.031 |walltime 13089.746 |
+Transformer | epoch 0 | step 49600 |avg loss 7.936 |avg tokens 2074.700 |tokens/s 7986.161 |walltime 13092.344 |
+Transformer | epoch 0 | step 49610 |avg loss 7.726 |avg tokens 2310.300 |tokens/s 8478.289 |walltime 13095.069 |
+Transformer | epoch 0 | step 49620 |avg loss 7.396 |avg tokens 2131.200 |tokens/s 7966.235 |walltime 13097.744 |
+Transformer | epoch 0 | step 49630 |avg loss 7.879 |avg tokens 2104.800 |tokens/s 8068.170 |walltime 13100.353 |
+Transformer | epoch 0 | step 49640 |avg loss 7.747 |avg tokens 2366.400 |tokens/s 8659.419 |walltime 13103.086 |
+Transformer | epoch 0 | step 49650 |avg loss 7.236 |avg tokens 2088.800 |tokens/s 7813.065 |walltime 13105.759 |
+Transformer | epoch 0 | step 49660 |avg loss 7.478 |avg tokens 2165.300 |tokens/s 8182.460 |walltime 13108.406 |
+Transformer | epoch 0 | step 49670 |avg loss 7.601 |avg tokens 2289.600 |tokens/s 8366.091 |walltime 13111.142 |
+Transformer | epoch 0 | step 49680 |avg loss 7.742 |avg tokens 2322.600 |tokens/s 8479.151 |walltime 13113.882 |
+Transformer | epoch 0 | step 49690 |avg loss 7.815 |avg tokens 2259.600 |tokens/s 8487.397 |walltime 13116.544 |
+Transformer | epoch 0 | step 49700 |avg loss 7.818 |avg tokens 2214.300 |tokens/s 8497.584 |walltime 13119.150 | +Transformer | epoch 0 | step 49710 |avg loss 7.577 |avg tokens 2067.400 |tokens/s 8244.246 |walltime 13121.657 | +Transformer | epoch 0 | step 49720 |avg loss 7.751 |avg tokens 2249.400 |tokens/s 8286.079 |walltime 13124.372 | +Transformer | epoch 0 | step 49730 |avg loss 8.041 |avg tokens 2093.000 |tokens/s 8028.013 |walltime 13126.979 | +Transformer | epoch 0 | step 49740 |avg loss 7.847 |avg tokens 2313.800 |tokens/s 8751.776 |walltime 13129.623 | +Transformer | epoch 0 | step 49750 |avg loss 7.729 |avg tokens 2065.500 |tokens/s 8001.326 |walltime 13132.204 | +Transformer | epoch 0 | step 49760 |avg loss 7.659 |avg tokens 2287.100 |tokens/s 8612.914 |walltime 13134.860 | +Transformer | epoch 0 | step 49770 |avg loss 7.817 |avg tokens 2035.300 |tokens/s 7957.005 |walltime 13137.418 | +Transformer | epoch 0 | step 49780 |avg loss 7.589 |avg tokens 2262.800 |tokens/s 8620.568 |walltime 13140.043 | +Transformer | epoch 0 | step 49790 |avg loss 7.290 |avg tokens 2335.000 |tokens/s 8424.329 |walltime 13142.814 | +Transformer | epoch 0 | step 49800 |avg loss 7.846 |avg tokens 2176.100 |tokens/s 8046.164 |walltime 13145.519 | +Transformer | epoch 0 | step 49810 |avg loss 7.910 |avg tokens 1921.400 |tokens/s 7719.056 |walltime 13148.008 | +Transformer | epoch 0 | step 49820 |avg loss 7.500 |avg tokens 2343.900 |tokens/s 8474.930 |walltime 13150.774 | +Transformer | epoch 0 | step 49830 |avg loss 7.587 |avg tokens 2276.000 |tokens/s 8256.448 |walltime 13153.530 | +Transformer | epoch 0 | step 49840 |avg loss 7.832 |avg tokens 2117.000 |tokens/s 8227.033 |walltime 13156.104 | +Transformer | epoch 0 | step 49850 |avg loss 7.533 |avg tokens 2210.100 |tokens/s 8553.841 |walltime 13158.687 | +Transformer | epoch 0 | step 49860 |avg loss 7.479 |avg tokens 2392.000 |tokens/s 8495.263 |walltime 13161.503 | +Transformer | epoch 0 | step 49870 |avg loss 7.828 |avg 
tokens 2189.200 |tokens/s 8527.192 |walltime 13164.070 | +Transformer | epoch 0 | step 49880 |avg loss 8.041 |avg tokens 1856.700 |tokens/s 7656.643 |walltime 13166.495 | +Transformer | epoch 0 | step 49890 |avg loss 7.566 |avg tokens 2228.900 |tokens/s 8323.314 |walltime 13169.173 | +Transformer | epoch 0 | step 49900 |avg loss 7.883 |avg tokens 2298.500 |tokens/s 8793.181 |walltime 13171.787 | +Transformer | epoch 0 | step 49910 |avg loss 7.794 |avg tokens 2297.600 |tokens/s 8674.944 |walltime 13174.436 | +Transformer | epoch 0 | step 49920 |avg loss 8.066 |avg tokens 2165.500 |tokens/s 8742.913 |walltime 13176.913 | +Transformer | epoch 0 | step 49930 |avg loss 7.942 |avg tokens 1918.900 |tokens/s 7717.523 |walltime 13179.399 | +Transformer | epoch 0 | step 49940 |avg loss 7.648 |avg tokens 2328.000 |tokens/s 8796.892 |walltime 13182.045 | +Transformer | epoch 0 | step 49950 |avg loss 7.255 |avg tokens 2361.600 |tokens/s 8610.119 |walltime 13184.788 | +Transformer | epoch 0 | step 49960 |avg loss 7.657 |avg tokens 2076.000 |tokens/s 8102.687 |walltime 13187.350 | +Transformer | epoch 0 | step 49970 |avg loss 7.801 |avg tokens 2206.300 |tokens/s 8429.425 |walltime 13189.968 | +Transformer | epoch 0 | step 49980 |avg loss 7.481 |avg tokens 2302.300 |tokens/s 8334.968 |walltime 13192.730 | +Transformer | epoch 0 | step 49990 |avg loss 7.700 |avg tokens 2116.400 |tokens/s 8020.447 |walltime 13195.369 | +Transformer | epoch 0 | step 50000 |avg loss 7.733 |avg tokens 2346.400 |tokens/s 8674.737 |walltime 13198.074 | +Transformer | epoch 0 | step 50010 |avg loss 7.012 |avg tokens 2334.400 |tokens/s 8460.611 |walltime 13200.833 | +Transformer | epoch 0 | step 50020 |avg loss 8.206 |avg tokens 2084.800 |tokens/s 8598.989 |walltime 13203.257 | +Transformer | epoch 0 | step 50030 |avg loss 7.984 |avg tokens 2175.200 |tokens/s 8393.663 |walltime 13205.849 | +Transformer | epoch 0 | step 50040 |avg loss 7.762 |avg tokens 1930.200 |tokens/s 7488.289 |walltime 13208.426 | 
+Transformer | epoch 0 | step 50050 |avg loss 7.506 |avg tokens 2123.400 |tokens/s 8081.190 |walltime 13211.054 | +Transformer | epoch 0 | step 50060 |avg loss 7.975 |avg tokens 2328.100 |tokens/s 9143.338 |walltime 13213.600 | +Transformer | epoch 0 | step 50070 |avg loss 8.183 |avg tokens 1919.000 |tokens/s 8041.326 |walltime 13215.987 | +Transformer | epoch 0 | step 50080 |avg loss 7.958 |avg tokens 2039.500 |tokens/s 7993.852 |walltime 13218.538 | +Transformer | epoch 0 | step 50090 |avg loss 7.470 |avg tokens 2147.900 |tokens/s 8296.683 |walltime 13221.127 | +Transformer | epoch 0 | step 50100 |avg loss 7.985 |avg tokens 2242.900 |tokens/s 8528.883 |walltime 13223.757 | +Transformer | epoch 0 | step 50110 |avg loss 7.583 |avg tokens 2171.400 |tokens/s 8197.850 |walltime 13226.405 | +Transformer | epoch 0 | step 50120 |avg loss 7.780 |avg tokens 2112.500 |tokens/s 8048.262 |walltime 13229.030 | +Transformer | epoch 0 | step 50130 |avg loss 7.815 |avg tokens 2083.200 |tokens/s 8059.479 |walltime 13231.615 | +Transformer | epoch 0 | step 50140 |avg loss 7.533 |avg tokens 2324.000 |tokens/s 8369.890 |walltime 13234.391 | +Transformer | epoch 0 | step 50150 |avg loss 7.960 |avg tokens 1931.300 |tokens/s 7912.296 |walltime 13236.832 | +Transformer | epoch 0 | step 50160 |avg loss 7.545 |avg tokens 2335.100 |tokens/s 8582.767 |walltime 13239.553 | +Transformer | epoch 0 | step 50170 |avg loss 7.626 |avg tokens 2214.900 |tokens/s 8435.710 |walltime 13242.179 | +Transformer | epoch 0 | step 50180 |avg loss 7.449 |avg tokens 2205.200 |tokens/s 8379.814 |walltime 13244.810 | +Transformer | epoch 0 | step 50190 |avg loss 7.551 |avg tokens 2297.600 |tokens/s 8596.937 |walltime 13247.483 | +Transformer | epoch 0 | step 50200 |avg loss 8.050 |avg tokens 2068.800 |tokens/s 8500.362 |walltime 13249.917 | +Transformer | epoch 0 | step 50210 |avg loss 7.556 |avg tokens 2342.400 |tokens/s 8579.766 |walltime 13252.647 | +Transformer | epoch 0 | step 50220 |avg loss 7.672 |avg 
tokens 2275.300 |tokens/s 8416.398 |walltime 13255.350 | +Transformer | epoch 0 | step 50230 |avg loss 7.579 |avg tokens 2138.400 |tokens/s 8038.362 |walltime 13258.010 | +Transformer | epoch 0 | step 50240 |avg loss 7.580 |avg tokens 2088.300 |tokens/s 8104.893 |walltime 13260.587 | +Transformer | epoch 0 | step 50250 |avg loss 7.721 |avg tokens 2166.900 |tokens/s 8396.330 |walltime 13263.168 | +Transformer | epoch 0 | step 50260 |avg loss 8.003 |avg tokens 1980.900 |tokens/s 8086.130 |walltime 13265.618 | +Transformer | epoch 0 | step 50270 |avg loss 7.940 |avg tokens 2204.900 |tokens/s 8803.781 |walltime 13268.122 | +Transformer | epoch 0 | step 50280 |avg loss 7.684 |avg tokens 2202.000 |tokens/s 8133.016 |walltime 13270.830 | +Transformer | epoch 0 | step 50290 |avg loss 7.608 |avg tokens 2123.400 |tokens/s 8554.980 |walltime 13273.312 | +Transformer | epoch 0 | step 50300 |avg loss 7.669 |avg tokens 2224.400 |tokens/s 8483.775 |walltime 13275.934 | +Transformer | epoch 0 | step 50310 |avg loss 7.490 |avg tokens 2072.700 |tokens/s 7889.105 |walltime 13278.561 | +Transformer | epoch 0 | step 50320 |avg loss 7.635 |avg tokens 2379.700 |tokens/s 8682.862 |walltime 13281.302 | +Transformer | epoch 0 | step 50330 |avg loss 7.830 |avg tokens 2078.400 |tokens/s 7928.070 |walltime 13283.923 | +Transformer | epoch 0 | step 50340 |avg loss 7.557 |avg tokens 1995.800 |tokens/s 7806.924 |walltime 13286.480 | +Transformer | epoch 0 | step 50350 |avg loss 7.676 |avg tokens 2238.400 |tokens/s 8417.308 |walltime 13289.139 | +Transformer | epoch 0 | step 50360 |avg loss 7.855 |avg tokens 2147.900 |tokens/s 8339.985 |walltime 13291.714 | +Transformer | epoch 0 | step 50370 |avg loss 7.664 |avg tokens 2141.200 |tokens/s 7885.116 |walltime 13294.430 | +Transformer | epoch 0 | step 50380 |avg loss 7.710 |avg tokens 2161.900 |tokens/s 8486.894 |walltime 13296.977 | +Transformer | epoch 0 | step 50390 |avg loss 7.891 |avg tokens 2018.600 |tokens/s 7865.509 |walltime 13299.544 | 
+Transformer | epoch 0 | step 50400 |avg loss 7.750 |avg tokens 2240.700 |tokens/s 8510.979 |walltime 13302.176 | +Transformer | epoch 0 | step 50410 |avg loss 7.606 |avg tokens 2313.400 |tokens/s 8624.597 |walltime 13304.859 | +Transformer | epoch 0 | step 50420 |avg loss 7.392 |avg tokens 2408.700 |tokens/s 8821.334 |walltime 13307.589 | +Transformer | epoch 0 | step 50430 |avg loss 8.272 |avg tokens 2035.800 |tokens/s 8440.858 |walltime 13310.001 | +Transformer | epoch 0 | step 50440 |avg loss 7.637 |avg tokens 2153.500 |tokens/s 8242.918 |walltime 13312.614 | +Transformer | epoch 0 | step 50450 |avg loss 7.719 |avg tokens 2234.100 |tokens/s 8222.230 |walltime 13315.331 | +Transformer | epoch 0 | step 50460 |avg loss 7.599 |avg tokens 2238.600 |tokens/s 8554.701 |walltime 13317.947 | +Transformer | epoch 0 | step 50470 |avg loss 7.615 |avg tokens 2043.600 |tokens/s 7842.587 |walltime 13320.553 | +Transformer | epoch 0 | step 50480 |avg loss 8.026 |avg tokens 2017.700 |tokens/s 8123.256 |walltime 13323.037 | +Transformer | epoch 0 | step 50490 |avg loss 7.646 |avg tokens 2039.800 |tokens/s 7910.584 |walltime 13325.616 | +Transformer | epoch 0 | step 50500 |avg loss 7.707 |avg tokens 2433.000 |tokens/s 9175.817 |walltime 13328.267 | +Transformer | epoch 0 | step 50510 |avg loss 7.528 |avg tokens 2326.100 |tokens/s 8455.357 |walltime 13331.018 | +Transformer | epoch 0 | step 50520 |avg loss 7.987 |avg tokens 2192.700 |tokens/s 8276.606 |walltime 13333.668 | +Transformer | epoch 0 | step 50530 |avg loss 7.588 |avg tokens 2318.100 |tokens/s 8536.817 |walltime 13336.383 | +Transformer | epoch 0 | step 50540 |avg loss 8.029 |avg tokens 1903.400 |tokens/s 7876.372 |walltime 13338.800 | +Transformer | epoch 0 | step 50550 |avg loss 7.624 |avg tokens 2219.200 |tokens/s 8274.306 |walltime 13341.482 | +Transformer | epoch 0 | step 50560 |avg loss 7.653 |avg tokens 2349.700 |tokens/s 8698.315 |walltime 13344.183 | +Transformer | epoch 0 | step 50570 |avg loss 7.428 |avg 
tokens 2295.100 |tokens/s 8498.742 |walltime 13346.883 | +Transformer | epoch 0 | step 50580 |avg loss 8.058 |avg tokens 2050.300 |tokens/s 8266.342 |walltime 13349.364 | +Transformer | epoch 0 | step 50590 |avg loss 7.632 |avg tokens 2077.600 |tokens/s 8035.280 |walltime 13351.949 | +Transformer | epoch 0 | step 50600 |avg loss 7.791 |avg tokens 2200.800 |tokens/s 8079.910 |walltime 13354.673 | +Transformer | epoch 0 | step 50610 |avg loss 7.666 |avg tokens 2425.600 |tokens/s 8855.309 |walltime 13357.412 | +Transformer | epoch 0 | step 50620 |avg loss 7.715 |avg tokens 2202.300 |tokens/s 8282.570 |walltime 13360.071 | +Transformer | epoch 0 | step 50630 |avg loss 7.842 |avg tokens 2122.000 |tokens/s 8263.609 |walltime 13362.639 | +Transformer | epoch 0 | step 50640 |avg loss 7.784 |avg tokens 2207.400 |tokens/s 8512.375 |walltime 13365.232 | +Transformer | epoch 0 | step 50650 |avg loss 7.616 |avg tokens 2205.300 |tokens/s 8422.125 |walltime 13367.851 | +Transformer | epoch 0 | step 50660 |avg loss 8.080 |avg tokens 2240.300 |tokens/s 9033.561 |walltime 13370.331 | +Transformer | epoch 0 | step 50670 |avg loss 7.660 |avg tokens 2216.000 |tokens/s 8182.567 |walltime 13373.039 | +Transformer | epoch 0 | step 50680 |avg loss 7.511 |avg tokens 2298.400 |tokens/s 8600.802 |walltime 13375.711 | +Transformer | epoch 0 | step 50690 |avg loss 7.547 |avg tokens 2348.400 |tokens/s 8556.567 |walltime 13378.456 | +Transformer | epoch 0 | step 50700 |avg loss 7.779 |avg tokens 2345.700 |tokens/s 8871.461 |walltime 13381.100 | +Transformer | epoch 0 | step 50710 |avg loss 7.450 |avg tokens 2328.000 |tokens/s 8542.700 |walltime 13383.825 | +Transformer | epoch 0 | step 50720 |avg loss 7.754 |avg tokens 2276.000 |tokens/s 8483.074 |walltime 13386.508 | +Transformer | epoch 0 | step 50730 |avg loss 7.803 |avg tokens 2286.700 |tokens/s 8855.336 |walltime 13389.090 | +Transformer | epoch 0 | step 50740 |avg loss 7.900 |avg tokens 2202.900 |tokens/s 8262.702 |walltime 13391.756 | 
+Transformer | epoch 0 | step 50750 |avg loss 7.601 |avg tokens 2247.200 |tokens/s 8235.364 |walltime 13394.485 | +Transformer | epoch 0 | step 50760 |avg loss 7.581 |avg tokens 2329.400 |tokens/s 8496.744 |walltime 13397.227 | +Transformer | epoch 0 | step 50770 |avg loss 7.846 |avg tokens 2152.200 |tokens/s 8224.140 |walltime 13399.844 | +Transformer | epoch 0 | step 50780 |avg loss 7.694 |avg tokens 2046.700 |tokens/s 7995.560 |walltime 13402.403 | +Transformer | epoch 0 | step 50790 |avg loss 7.963 |avg tokens 2173.900 |tokens/s 8544.620 |walltime 13404.948 | +Transformer | epoch 0 | step 50800 |avg loss 7.670 |avg tokens 2107.400 |tokens/s 8299.004 |walltime 13407.487 | +Transformer | epoch 0 | step 50810 |avg loss 7.886 |avg tokens 2113.900 |tokens/s 8189.466 |walltime 13410.068 | +Transformer | epoch 0 | step 50820 |avg loss 8.087 |avg tokens 2011.300 |tokens/s 7985.569 |walltime 13412.587 | +Transformer | epoch 0 | step 50830 |avg loss 7.876 |avg tokens 2074.400 |tokens/s 7916.612 |walltime 13415.207 | +Transformer | epoch 0 | step 50840 |avg loss 7.785 |avg tokens 2088.500 |tokens/s 8123.570 |walltime 13417.778 | +Transformer | epoch 0 | step 50850 |avg loss 7.519 |avg tokens 2274.600 |tokens/s 8370.787 |walltime 13420.495 | +Transformer | epoch 0 | step 50860 |avg loss 7.495 |avg tokens 2380.800 |tokens/s 8485.485 |walltime 13423.301 | +Transformer | epoch 0 | step 50870 |avg loss 7.710 |avg tokens 2275.700 |tokens/s 8427.446 |walltime 13426.001 | +Transformer | epoch 0 | step 50880 |avg loss 7.986 |avg tokens 2159.200 |tokens/s 8635.962 |walltime 13428.502 | +Transformer | epoch 0 | step 50890 |avg loss 7.825 |avg tokens 2225.400 |tokens/s 8539.345 |walltime 13431.108 | +Transformer | epoch 0 | step 50900 |avg loss 7.434 |avg tokens 2325.500 |tokens/s 8671.118 |walltime 13433.790 | +Transformer | epoch 0 | step 50910 |avg loss 7.845 |avg tokens 2210.300 |tokens/s 8477.934 |walltime 13436.397 | +Transformer | epoch 0 | step 50920 |avg loss 7.532 |avg 
tokens 2100.800 |tokens/s 7846.901 |walltime 13439.074 | +Transformer | epoch 0 | step 50930 |avg loss 7.789 |avg tokens 2097.100 |tokens/s 8307.882 |walltime 13441.598 | +Transformer | epoch 0 | step 50940 |avg loss 7.599 |avg tokens 2388.200 |tokens/s 8641.154 |walltime 13444.362 | +Transformer | epoch 0 | step 50950 |avg loss 7.516 |avg tokens 2351.100 |tokens/s 8514.337 |walltime 13447.123 | +Transformer | epoch 0 | step 50960 |avg loss 7.465 |avg tokens 2298.600 |tokens/s 8541.505 |walltime 13449.814 | +Transformer | epoch 0 | step 50970 |avg loss 8.009 |avg tokens 2072.900 |tokens/s 8119.536 |walltime 13452.367 | +Transformer | epoch 0 | step 50980 |avg loss 7.779 |avg tokens 2231.500 |tokens/s 8365.158 |walltime 13455.035 | +Transformer | epoch 0 | step 50990 |avg loss 7.975 |avg tokens 2061.800 |tokens/s 8137.288 |walltime 13457.569 | +Transformer | epoch 0 | step 51000 |avg loss 7.541 |avg tokens 2105.200 |tokens/s 8039.667 |walltime 13460.187 | +Transformer | epoch 0 | step 51010 |avg loss 7.835 |avg tokens 2062.700 |tokens/s 7954.533 |walltime 13462.780 | +Transformer | epoch 0 | step 51020 |avg loss 7.492 |avg tokens 2312.000 |tokens/s 8321.151 |walltime 13465.559 | +Transformer | epoch 0 | step 51030 |avg loss 7.916 |avg tokens 2308.900 |tokens/s 8819.389 |walltime 13468.177 | +Transformer | epoch 0 | step 51040 |avg loss 7.784 |avg tokens 2136.200 |tokens/s 8332.408 |walltime 13470.741 | +Transformer | epoch 0 | step 51050 |avg loss 7.318 |avg tokens 2186.400 |tokens/s 8115.804 |walltime 13473.435 | +Transformer | epoch 0 | step 51060 |avg loss 7.671 |avg tokens 2165.400 |tokens/s 8197.589 |walltime 13476.076 | +Transformer | epoch 0 | step 51070 |avg loss 7.930 |avg tokens 2084.000 |tokens/s 8196.889 |walltime 13478.619 | +Transformer | epoch 0 | step 51080 |avg loss 7.920 |avg tokens 2059.200 |tokens/s 7962.790 |walltime 13481.205 | +Transformer | epoch 0 | step 51090 |avg loss 7.745 |avg tokens 1948.400 |tokens/s 7724.299 |walltime 13483.727 | 
+Transformer | epoch 0 | step 51100 |avg loss 7.903 |avg tokens 2236.800 |tokens/s 8556.858 |walltime 13486.341 | +Transformer | epoch 0 | step 51110 |avg loss 7.764 |avg tokens 2086.400 |tokens/s 7784.743 |walltime 13489.021 | +Transformer | epoch 0 | step 51120 |avg loss 7.831 |avg tokens 2162.400 |tokens/s 8173.727 |walltime 13491.667 | +Transformer | epoch 0 | step 51130 |avg loss 7.721 |avg tokens 2199.800 |tokens/s 8324.606 |walltime 13494.309 | +Transformer | epoch 0 | step 51140 |avg loss 7.537 |avg tokens 2038.600 |tokens/s 7744.469 |walltime 13496.942 | +Transformer | epoch 0 | step 51150 |avg loss 7.803 |avg tokens 2332.000 |tokens/s 8892.293 |walltime 13499.564 | +Transformer | epoch 0 | step 51160 |avg loss 7.998 |avg tokens 2067.100 |tokens/s 8224.108 |walltime 13502.078 | +Transformer | epoch 0 | step 51170 |avg loss 7.589 |avg tokens 2160.700 |tokens/s 8198.602 |walltime 13504.713 | +Transformer | epoch 0 | step 51180 |avg loss 7.795 |avg tokens 1780.900 |tokens/s 7419.032 |walltime 13507.114 | +Transformer | epoch 0 | step 51190 |avg loss 7.831 |avg tokens 2272.600 |tokens/s 8553.872 |walltime 13509.770 | +Transformer | epoch 0 | step 51200 |avg loss 7.686 |avg tokens 2233.600 |tokens/s 8504.008 |walltime 13512.397 | +Transformer | epoch 0 | step 51210 |avg loss 7.824 |avg tokens 2053.500 |tokens/s 8138.409 |walltime 13514.920 | +Transformer | epoch 0 | step 51220 |avg loss 7.657 |avg tokens 2322.300 |tokens/s 8505.833 |walltime 13517.650 | +Transformer | epoch 0 | step 51230 |avg loss 7.684 |avg tokens 2040.600 |tokens/s 7824.065 |walltime 13520.258 | +Transformer | epoch 0 | step 51240 |avg loss 7.798 |avg tokens 2191.200 |tokens/s 8219.065 |walltime 13522.924 | +Transformer | epoch 0 | step 51250 |avg loss 8.079 |avg tokens 2143.900 |tokens/s 8492.391 |walltime 13525.449 | +Transformer | epoch 0 | step 51260 |avg loss 7.848 |avg tokens 2104.600 |tokens/s 8231.624 |walltime 13528.006 | +Transformer | epoch 0 | step 51270 |avg loss 7.594 |avg 
tokens 2279.200 |tokens/s 8285.680 |walltime 13530.756 | +Transformer | epoch 0 | step 51280 |avg loss 7.745 |avg tokens 2364.200 |tokens/s 8596.969 |walltime 13533.507 | +Transformer | epoch 0 | step 51290 |avg loss 7.918 |avg tokens 2117.500 |tokens/s 8170.586 |walltime 13536.098 | +Transformer | epoch 0 | step 51300 |avg loss 7.515 |avg tokens 2336.800 |tokens/s 8569.037 |walltime 13538.825 | +Transformer | epoch 0 | step 51310 |avg loss 7.966 |avg tokens 2159.200 |tokens/s 8397.061 |walltime 13541.397 | +Transformer | epoch 0 | step 51320 |avg loss 7.412 |avg tokens 2374.600 |tokens/s 8444.445 |walltime 13544.209 | +Transformer | epoch 0 | step 51330 |avg loss 7.811 |avg tokens 2215.400 |tokens/s 8330.991 |walltime 13546.868 | +Transformer | epoch 0 | step 51340 |avg loss 7.483 |avg tokens 2268.800 |tokens/s 8396.072 |walltime 13549.570 | +Transformer | epoch 0 | step 51350 |avg loss 7.662 |avg tokens 2296.800 |tokens/s 8358.487 |walltime 13552.318 | +Transformer | epoch 0 | step 51360 |avg loss 7.676 |avg tokens 2268.000 |tokens/s 8461.161 |walltime 13554.998 | +Transformer | epoch 0 | step 51370 |avg loss 7.433 |avg tokens 2146.200 |tokens/s 8048.795 |walltime 13557.665 | +Transformer | epoch 0 | step 51380 |avg loss 7.802 |avg tokens 2228.300 |tokens/s 8410.534 |walltime 13560.314 | +Transformer | epoch 0 | step 51390 |avg loss 7.605 |avg tokens 2214.100 |tokens/s 8462.030 |walltime 13562.931 | +Transformer | epoch 0 | step 51400 |avg loss 7.544 |avg tokens 2241.100 |tokens/s 8319.391 |walltime 13565.625 | +Transformer | epoch 0 | step 51410 |avg loss 7.793 |avg tokens 2128.200 |tokens/s 7914.531 |walltime 13568.314 | +Transformer | epoch 0 | step 51420 |avg loss 7.909 |avg tokens 2335.500 |tokens/s 8808.683 |walltime 13570.965 | +Transformer | epoch 0 | step 51430 |avg loss 7.648 |avg tokens 2225.600 |tokens/s 8344.294 |walltime 13573.632 | +Transformer | epoch 0 | step 51440 |avg loss 7.901 |avg tokens 1816.000 |tokens/s 7461.024 |walltime 13576.066 | 
+Transformer | epoch 0 | step 51450 |avg loss 7.542 |avg tokens 2018.200 |tokens/s 7709.639 |walltime 13578.684 | +Transformer | epoch 0 | step 51460 |avg loss 8.001 |avg tokens 2008.300 |tokens/s 7944.520 |walltime 13581.212 | +Transformer | epoch 0 | step 51470 |avg loss 7.762 |avg tokens 2212.200 |tokens/s 8398.113 |walltime 13583.846 | +Transformer | epoch 0 | step 51480 |avg loss 8.077 |avg tokens 2023.000 |tokens/s 8265.390 |walltime 13586.294 | +Transformer | epoch 0 | step 51490 |avg loss 7.508 |avg tokens 2303.600 |tokens/s 8243.615 |walltime 13589.088 | +Transformer | epoch 0 | step 51500 |avg loss 7.754 |avg tokens 2242.600 |tokens/s 8268.695 |walltime 13591.800 | +Transformer | epoch 0 | step 51510 |avg loss 7.634 |avg tokens 2286.400 |tokens/s 8485.614 |walltime 13594.495 | +Transformer | epoch 0 | step 51520 |avg loss 7.478 |avg tokens 2372.300 |tokens/s 8919.228 |walltime 13597.154 | +Transformer | epoch 0 | step 51530 |avg loss 7.742 |avg tokens 2224.500 |tokens/s 8488.554 |walltime 13599.775 | +Transformer | epoch 0 | step 51540 |avg loss 7.687 |avg tokens 1982.300 |tokens/s 7626.586 |walltime 13602.374 | +Transformer | epoch 0 | step 51550 |avg loss 7.852 |avg tokens 2283.000 |tokens/s 8518.453 |walltime 13605.054 | +Transformer | epoch 0 | step 51560 |avg loss 7.451 |avg tokens 2260.800 |tokens/s 8315.502 |walltime 13607.773 | +Transformer | epoch 0 | step 51570 |avg loss 7.714 |avg tokens 2400.800 |tokens/s 8658.491 |walltime 13610.546 | +Transformer | epoch 0 | step 51580 |avg loss 7.878 |avg tokens 2140.200 |tokens/s 8521.300 |walltime 13613.057 | +Transformer | epoch 0 | step 51590 |avg loss 7.665 |avg tokens 2100.100 |tokens/s 7881.807 |walltime 13615.722 | +Transformer | epoch 0 | step 51600 |avg loss 7.846 |avg tokens 1875.000 |tokens/s 7743.919 |walltime 13618.143 | +Transformer | epoch 0 | step 51610 |avg loss 7.927 |avg tokens 2159.300 |tokens/s 8179.628 |walltime 13620.783 | +Transformer | epoch 0 | step 51620 |avg loss 7.938 |avg 
tokens 2045.500 |tokens/s 8117.391 |walltime 13623.303 | +Transformer | epoch 0 | step 51630 |avg loss 7.980 |avg tokens 2346.100 |tokens/s 8853.661 |walltime 13625.953 | +Transformer | epoch 0 | step 51640 |avg loss 7.845 |avg tokens 2035.900 |tokens/s 7936.346 |walltime 13628.518 | +Transformer | epoch 0 | step 51650 |avg loss 7.718 |avg tokens 2211.000 |tokens/s 8328.541 |walltime 13631.173 | +Transformer | epoch 0 | step 51660 |avg loss 7.734 |avg tokens 2038.600 |tokens/s 7946.704 |walltime 13633.738 | +Transformer | epoch 0 | step 51670 |avg loss 7.996 |avg tokens 2094.000 |tokens/s 8076.013 |walltime 13636.331 | +Transformer | epoch 0 | step 51680 |avg loss 7.385 |avg tokens 2266.800 |tokens/s 8245.668 |walltime 13639.080 | +Transformer | epoch 0 | step 51690 |avg loss 7.842 |avg tokens 2135.700 |tokens/s 8070.002 |walltime 13641.727 | +Transformer | epoch 0 | step 51700 |avg loss 7.680 |avg tokens 2422.500 |tokens/s 8957.839 |walltime 13644.431 | +Transformer | epoch 0 | step 51710 |avg loss 7.804 |avg tokens 2166.000 |tokens/s 8214.449 |walltime 13647.068 | +Transformer | epoch 0 | step 51720 |avg loss 7.823 |avg tokens 2193.400 |tokens/s 8626.556 |walltime 13649.610 | +Transformer | epoch 0 | step 51730 |avg loss 7.434 |avg tokens 2162.600 |tokens/s 8070.193 |walltime 13652.290 | +Transformer | epoch 0 | step 51740 |avg loss 7.949 |avg tokens 2079.100 |tokens/s 8405.114 |walltime 13654.764 | +Transformer | epoch 0 | step 51750 |avg loss 7.767 |avg tokens 2001.700 |tokens/s 7988.608 |walltime 13657.269 | +Transformer | epoch 0 | step 51760 |avg loss 7.943 |avg tokens 2055.500 |tokens/s 8412.961 |walltime 13659.713 | +Transformer | epoch 0 | step 51770 |avg loss 7.842 |avg tokens 2116.200 |tokens/s 8064.484 |walltime 13662.337 | +Transformer | epoch 0 | step 51780 |avg loss 7.735 |avg tokens 2238.800 |tokens/s 8308.688 |walltime 13665.031 | +Transformer | epoch 0 | step 51790 |avg loss 7.745 |avg tokens 2092.600 |tokens/s 7973.475 |walltime 13667.656 | 
+Transformer | epoch 0 | step 51800 |avg loss 7.485 |avg tokens 2260.000 |tokens/s 8375.347 |walltime 13670.354 | +Transformer | epoch 0 | step 51810 |avg loss 7.669 |avg tokens 2131.100 |tokens/s 8072.315 |walltime 13672.994 | +Transformer | epoch 0 | step 51820 |avg loss 7.935 |avg tokens 2205.500 |tokens/s 8540.701 |walltime 13675.576 | +Transformer | epoch 0 | step 51830 |avg loss 7.514 |avg tokens 1960.800 |tokens/s 7699.485 |walltime 13678.123 | +Transformer | epoch 0 | step 51840 |avg loss 7.554 |avg tokens 2424.000 |tokens/s 8853.417 |walltime 13680.861 | +Transformer | epoch 0 | step 51850 |avg loss 7.387 |avg tokens 2215.800 |tokens/s 8135.495 |walltime 13683.585 | +Transformer | epoch 0 | step 51860 |avg loss 7.781 |avg tokens 2109.500 |tokens/s 8152.487 |walltime 13686.172 | +Transformer | epoch 0 | step 51870 |avg loss 7.942 |avg tokens 1897.500 |tokens/s 7741.302 |walltime 13688.623 | +Transformer | epoch 0 | step 51880 |avg loss 7.915 |avg tokens 2069.800 |tokens/s 8384.364 |walltime 13691.092 | +Transformer | epoch 0 | step 51890 |avg loss 7.690 |avg tokens 2017.500 |tokens/s 8106.095 |walltime 13693.581 | +Transformer | epoch 0 | step 51900 |avg loss 7.837 |avg tokens 2238.800 |tokens/s 8743.827 |walltime 13696.141 | +Transformer | epoch 0 | step 51910 |avg loss 8.068 |avg tokens 1778.700 |tokens/s 7286.497 |walltime 13698.582 | +Transformer | epoch 0 | step 51920 |avg loss 7.946 |avg tokens 2128.800 |tokens/s 8088.633 |walltime 13701.214 | +Transformer | epoch 0 | step 51930 |avg loss 7.833 |avg tokens 2318.200 |tokens/s 8729.367 |walltime 13703.870 | +Transformer | epoch 0 | step 51940 |avg loss 8.353 |avg tokens 2013.500 |tokens/s 8399.792 |walltime 13706.267 | +Transformer | epoch 0 | step 51950 |avg loss 7.787 |avg tokens 2054.900 |tokens/s 7858.538 |walltime 13708.882 | +Transformer | epoch 0 | step 51960 |avg loss 7.889 |avg tokens 2307.700 |tokens/s 8616.058 |walltime 13711.560 | +Transformer | epoch 0 | step 51970 |avg loss 8.122 |avg 
tokens 1716.900 |tokens/s 7292.619 |walltime 13713.915 | +Transformer | epoch 0 | step 51980 |avg loss 8.066 |avg tokens 2042.400 |tokens/s 8213.669 |walltime 13716.401 | +Transformer | epoch 0 | step 51990 |avg loss 7.550 |avg tokens 2276.800 |tokens/s 8423.792 |walltime 13719.104 | +Transformer | epoch 0 | step 52000 |avg loss 7.718 |avg tokens 2271.100 |tokens/s 8545.349 |walltime 13721.762 | +Transformer | epoch 0 | step 52010 |avg loss 7.771 |avg tokens 2162.100 |tokens/s 8294.615 |walltime 13724.368 | +Transformer | epoch 0 | step 52020 |avg loss 7.819 |avg tokens 2331.200 |tokens/s 8785.265 |walltime 13727.022 | +Transformer | epoch 0 | step 52030 |avg loss 7.417 |avg tokens 2374.400 |tokens/s 8561.104 |walltime 13729.795 | +Transformer | epoch 0 | step 52040 |avg loss 7.509 |avg tokens 2234.500 |tokens/s 8106.840 |walltime 13732.552 | +Transformer | epoch 0 | step 52050 |avg loss 7.500 |avg tokens 2226.300 |tokens/s 8389.119 |walltime 13735.205 | +Transformer | epoch 0 | step 52060 |avg loss 7.580 |avg tokens 2106.400 |tokens/s 7979.322 |walltime 13737.845 | +Transformer | epoch 0 | step 52070 |avg loss 7.795 |avg tokens 2137.800 |tokens/s 8019.584 |walltime 13740.511 | +Transformer | epoch 0 | step 52080 |avg loss 7.383 |avg tokens 2218.400 |tokens/s 8147.662 |walltime 13743.234 | +Transformer | epoch 0 | step 52090 |avg loss 7.815 |avg tokens 2219.300 |tokens/s 8493.759 |walltime 13745.847 | +Transformer | epoch 0 | step 52100 |avg loss 7.467 |avg tokens 2396.000 |tokens/s 8719.991 |walltime 13748.594 | +Transformer | epoch 0 | step 52110 |avg loss 7.816 |avg tokens 2156.100 |tokens/s 7878.158 |walltime 13751.331 | +Transformer | epoch 0 | step 52120 |avg loss 7.545 |avg tokens 2105.000 |tokens/s 8059.583 |walltime 13753.943 | +Transformer | epoch 0 | step 52130 |avg loss 7.919 |avg tokens 2060.800 |tokens/s 7919.066 |walltime 13756.545 | +Transformer | epoch 0 | step 52140 |avg loss 7.895 |avg tokens 2271.900 |tokens/s 8617.145 |walltime 13759.182 | 
+Transformer | epoch 0 | step 52150 |avg loss 7.722 |avg tokens 2224.600 |tokens/s 8323.566 |walltime 13761.854 | +Transformer | epoch 0 | step 52160 |avg loss 7.508 |avg tokens 2324.600 |tokens/s 8414.932 |walltime 13764.617 | +Transformer | epoch 0 | step 52170 |avg loss 7.991 |avg tokens 2277.800 |tokens/s 8906.500 |walltime 13767.174 | +Transformer | epoch 0 | step 52180 |avg loss 7.614 |avg tokens 2255.600 |tokens/s 8138.525 |walltime 13769.946 | +Transformer | epoch 0 | step 52190 |avg loss 7.163 |avg tokens 2291.900 |tokens/s 8373.931 |walltime 13772.683 | +Transformer | epoch 0 | step 52200 |avg loss 7.487 |avg tokens 2273.600 |tokens/s 8499.034 |walltime 13775.358 | +Transformer | epoch 0 | step 52210 |avg loss 7.555 |avg tokens 2324.200 |tokens/s 8574.990 |walltime 13778.068 | +Transformer | epoch 0 | step 52220 |avg loss 7.416 |avg tokens 2150.400 |tokens/s 8060.909 |walltime 13780.736 | +Transformer | epoch 0 | step 52230 |avg loss 7.801 |avg tokens 1912.400 |tokens/s 7805.974 |walltime 13783.186 | +Transformer | epoch 0 | step 52240 |avg loss 7.612 |avg tokens 2143.400 |tokens/s 8087.305 |walltime 13785.836 | +Transformer | epoch 0 | step 52250 |avg loss 7.719 |avg tokens 2184.000 |tokens/s 8448.535 |walltime 13788.421 | +Transformer | epoch 0 | step 52260 |avg loss 7.574 |avg tokens 2096.700 |tokens/s 8227.616 |walltime 13790.970 | +Transformer | epoch 0 | step 52270 |avg loss 7.922 |avg tokens 1822.700 |tokens/s 7508.001 |walltime 13793.397 | +Transformer | epoch 0 | step 52280 |avg loss 7.790 |avg tokens 2036.100 |tokens/s 7790.681 |walltime 13796.011 | +Transformer | epoch 0 | step 52290 |avg loss 7.811 |avg tokens 2009.800 |tokens/s 7824.996 |walltime 13798.579 | +Transformer | epoch 0 | step 52300 |avg loss 7.876 |avg tokens 2166.900 |tokens/s 8299.989 |walltime 13801.190 | +Transformer | epoch 0 | step 52310 |avg loss 8.137 |avg tokens 2202.900 |tokens/s 8625.751 |walltime 13803.744 | +Transformer | epoch 0 | step 52320 |avg loss 7.963 |avg 
tokens 1968.800 |tokens/s 7814.207 |walltime 13806.264 | +Transformer | epoch 0 | step 52330 |avg loss 7.848 |avg tokens 2323.400 |tokens/s 8800.383 |walltime 13808.904 | +Transformer | epoch 0 | step 52340 |avg loss 8.052 |avg tokens 1855.300 |tokens/s 7597.519 |walltime 13811.346 | +Transformer | epoch 0 | step 52350 |avg loss 7.580 |avg tokens 2283.000 |tokens/s 8327.749 |walltime 13814.087 | +Transformer | epoch 0 | step 52360 |avg loss 7.856 |avg tokens 2177.800 |tokens/s 8323.532 |walltime 13816.704 | +Transformer | epoch 0 | step 52370 |avg loss 7.798 |avg tokens 2130.900 |tokens/s 8129.901 |walltime 13819.325 | +Transformer | epoch 0 | step 52380 |avg loss 7.831 |avg tokens 2030.600 |tokens/s 7737.900 |walltime 13821.949 | +Transformer | epoch 0 | step 52390 |avg loss 7.603 |avg tokens 2292.800 |tokens/s 8356.165 |walltime 13824.693 | +Transformer | epoch 0 | step 52400 |avg loss 7.776 |avg tokens 2136.800 |tokens/s 8073.259 |walltime 13827.339 | +Transformer | epoch 0 | step 52410 |avg loss 8.006 |avg tokens 2260.200 |tokens/s 8870.106 |walltime 13829.888 | +Transformer | epoch 0 | step 52420 |avg loss 7.982 |avg tokens 1995.000 |tokens/s 7996.904 |walltime 13832.382 | +Transformer | epoch 0 | step 52430 |avg loss 7.790 |avg tokens 2183.400 |tokens/s 8328.166 |walltime 13835.004 | +Transformer | epoch 0 | step 52440 |avg loss 7.482 |avg tokens 2144.000 |tokens/s 8104.581 |walltime 13837.649 | +Transformer | epoch 0 | step 52450 |avg loss 7.591 |avg tokens 2253.900 |tokens/s 8632.180 |walltime 13840.260 | +Transformer | epoch 0 | step 52460 |avg loss 7.643 |avg tokens 2267.100 |tokens/s 8269.854 |walltime 13843.002 | +Transformer | epoch 0 | step 52470 |avg loss 7.695 |avg tokens 2215.200 |tokens/s 8195.828 |walltime 13845.705 | +Transformer | epoch 0 | step 52480 |avg loss 7.797 |avg tokens 2133.000 |tokens/s 8145.126 |walltime 13848.323 | +Transformer | epoch 0 | step 52490 |avg loss 7.977 |avg tokens 2217.100 |tokens/s 8670.580 |walltime 13850.880 | 
+Transformer | epoch 0 | step 52500 |avg loss 7.915 |avg tokens 2125.800 |tokens/s 8020.676 |walltime 13853.531 | +Transformer | epoch 0 | step 52510 |avg loss 7.945 |avg tokens 2038.900 |tokens/s 7795.461 |walltime 13856.146 | +Transformer | epoch 0 | step 52520 |avg loss 7.730 |avg tokens 2251.500 |tokens/s 8123.147 |walltime 13858.918 | +Transformer | epoch 0 | step 52530 |avg loss 7.407 |avg tokens 2265.600 |tokens/s 8342.766 |walltime 13861.634 | +Transformer | epoch 0 | step 52540 |avg loss 7.644 |avg tokens 2260.000 |tokens/s 8494.343 |walltime 13864.294 | +Transformer | epoch 0 | step 52550 |avg loss 7.778 |avg tokens 2257.100 |tokens/s 8380.489 |walltime 13866.988 | +Transformer | epoch 0 | step 52560 |avg loss 7.680 |avg tokens 2302.400 |tokens/s 8687.998 |walltime 13869.638 | +Transformer | epoch 0 | step 52570 |avg loss 7.822 |avg tokens 2065.200 |tokens/s 7872.964 |walltime 13872.261 | +Transformer | epoch 0 | step 52580 |avg loss 7.724 |avg tokens 2031.700 |tokens/s 7946.639 |walltime 13874.818 | +Transformer | epoch 0 | step 52590 |avg loss 7.787 |avg tokens 2223.200 |tokens/s 8353.410 |walltime 13877.479 | +Transformer | epoch 0 | step 52600 |avg loss 7.825 |avg tokens 2094.400 |tokens/s 7899.265 |walltime 13880.130 | +Transformer | epoch 0 | step 52610 |avg loss 7.537 |avg tokens 2165.400 |tokens/s 7872.259 |walltime 13882.881 | +Transformer | epoch 0 | step 52620 |avg loss 7.542 |avg tokens 2209.600 |tokens/s 8229.524 |walltime 13885.566 | +Transformer | epoch 0 | step 52630 |avg loss 7.624 |avg tokens 2270.400 |tokens/s 8280.581 |walltime 13888.308 | +Transformer | epoch 0 | step 52640 |avg loss 7.708 |avg tokens 2205.100 |tokens/s 8369.354 |walltime 13890.943 | +Transformer | epoch 0 | step 52650 |avg loss 7.742 |avg tokens 2109.200 |tokens/s 8036.048 |walltime 13893.567 | +Transformer | epoch 0 | step 52660 |avg loss 7.556 |avg tokens 2354.600 |tokens/s 8727.479 |walltime 13896.265 | +Transformer | epoch 0 | step 52670 |avg loss 7.800 |avg 
tokens 2215.600 |tokens/s 8436.424 |walltime 13898.891 | +Transformer | epoch 0 | step 52680 |avg loss 7.621 |avg tokens 2242.400 |tokens/s 8166.974 |walltime 13901.637 | +Transformer | epoch 0 | step 52690 |avg loss 8.073 |avg tokens 2162.800 |tokens/s 8645.848 |walltime 13904.139 | +Transformer | epoch 0 | step 52700 |avg loss 7.706 |avg tokens 2220.600 |tokens/s 8428.088 |walltime 13906.773 | +Transformer | epoch 0 | step 52710 |avg loss 7.941 |avg tokens 2224.600 |tokens/s 8380.592 |walltime 13909.428 | +Transformer | epoch 0 | step 52720 |avg loss 7.885 |avg tokens 2037.900 |tokens/s 8101.124 |walltime 13911.943 | +Transformer | epoch 0 | step 52730 |avg loss 7.438 |avg tokens 2160.800 |tokens/s 8038.867 |walltime 13914.631 | +Transformer | epoch 0 | step 52740 |avg loss 7.552 |avg tokens 2209.600 |tokens/s 8217.278 |walltime 13917.320 | +Transformer | epoch 0 | step 52750 |avg loss 7.678 |avg tokens 2248.000 |tokens/s 8551.722 |walltime 13919.949 | +Transformer | epoch 0 | step 52760 |avg loss 7.826 |avg tokens 2259.300 |tokens/s 8629.410 |walltime 13922.567 | +Transformer | epoch 0 | step 52770 |avg loss 7.648 |avg tokens 2347.200 |tokens/s 8499.341 |walltime 13925.329 | +Transformer | epoch 0 | step 52780 |avg loss 7.650 |avg tokens 2331.500 |tokens/s 8432.281 |walltime 13928.094 | +Transformer | epoch 0 | step 52790 |avg loss 7.687 |avg tokens 2099.100 |tokens/s 8122.041 |walltime 13930.678 | +Transformer | epoch 0 | step 52800 |avg loss 7.501 |avg tokens 2239.000 |tokens/s 8170.119 |walltime 13933.419 | +Transformer | epoch 0 | step 52810 |avg loss 7.059 |avg tokens 2339.200 |tokens/s 8440.833 |walltime 13936.190 | +Transformer | epoch 0 | step 52820 |avg loss 7.923 |avg tokens 1849.600 |tokens/s 7501.435 |walltime 13938.656 | +Transformer | epoch 0 | step 52830 |avg loss 7.893 |avg tokens 2141.200 |tokens/s 8325.996 |walltime 13941.227 | +Transformer | epoch 0 | step 52840 |avg loss 7.534 |avg tokens 2326.400 |tokens/s 8371.623 |walltime 13944.006 | 
+Transformer | epoch 0 | step 52850 |avg loss 7.904 |avg tokens 2198.600 |tokens/s 8350.632 |walltime 13946.639 | +Transformer | epoch 0 | step 52860 |avg loss 8.229 |avg tokens 2017.300 |tokens/s 8238.920 |walltime 13949.088 | +Transformer | epoch 0 | step 52870 |avg loss 7.683 |avg tokens 2147.800 |tokens/s 8201.707 |walltime 13951.706 | +Transformer | epoch 0 | step 52880 |avg loss 7.996 |avg tokens 2006.700 |tokens/s 7968.526 |walltime 13954.225 | +Transformer | epoch 0 | step 52890 |avg loss 7.846 |avg tokens 2107.600 |tokens/s 8339.538 |walltime 13956.752 | +Transformer | epoch 0 | step 52900 |avg loss 7.826 |avg tokens 2423.300 |tokens/s 8820.010 |walltime 13959.500 | +Transformer | epoch 0 | step 52910 |avg loss 7.545 |avg tokens 2121.600 |tokens/s 7924.117 |walltime 13962.177 | +Transformer | epoch 0 | step 52920 |avg loss 7.708 |avg tokens 2303.200 |tokens/s 8654.313 |walltime 13964.838 | +Transformer | epoch 0 | step 52930 |avg loss 7.483 |avg tokens 2415.200 |tokens/s 8627.753 |walltime 13967.638 | +Transformer | epoch 0 | step 52940 |avg loss 7.822 |avg tokens 2109.100 |tokens/s 8114.732 |walltime 13970.237 | +Transformer | epoch 0 | step 52950 |avg loss 7.696 |avg tokens 2249.300 |tokens/s 8651.627 |walltime 13972.837 | +Transformer | epoch 0 | step 52960 |avg loss 7.902 |avg tokens 2051.800 |tokens/s 7989.688 |walltime 13975.405 | +Transformer | epoch 0 | step 52970 |avg loss 7.786 |avg tokens 2161.600 |tokens/s 8044.071 |walltime 13978.092 | +Transformer | epoch 0 | step 52980 |avg loss 7.718 |avg tokens 2057.900 |tokens/s 7875.142 |walltime 13980.705 | +Transformer | epoch 0 | step 52990 |avg loss 7.878 |avg tokens 1996.800 |tokens/s 7739.002 |walltime 13983.285 | +Transformer | epoch 0 | step 53000 |avg loss 7.810 |avg tokens 2202.400 |tokens/s 8656.893 |walltime 13985.829 | +Transformer | epoch 0 | step 53010 |avg loss 7.641 |avg tokens 2295.200 |tokens/s 8529.994 |walltime 13988.520 | +Transformer | epoch 0 | step 53020 |avg loss 7.517 |avg 
tokens 2294.800 |tokens/s 8344.829 |walltime 13991.270 | +Transformer | epoch 0 | step 53030 |avg loss 7.641 |avg tokens 2279.700 |tokens/s 8558.596 |walltime 13993.934 | +Transformer | epoch 0 | step 53040 |avg loss 7.800 |avg tokens 2224.800 |tokens/s 8368.349 |walltime 13996.592 | +Transformer | epoch 0 | step 53050 |avg loss 8.129 |avg tokens 1782.900 |tokens/s 7489.411 |walltime 13998.973 | +Transformer | epoch 0 | step 53060 |avg loss 7.599 |avg tokens 2220.000 |tokens/s 8179.543 |walltime 14001.687 | +Transformer | epoch 0 | step 53070 |avg loss 7.826 |avg tokens 2127.100 |tokens/s 8041.192 |walltime 14004.332 | +Transformer | epoch 0 | step 53080 |avg loss 7.892 |avg tokens 2095.300 |tokens/s 7989.537 |walltime 14006.955 | +Transformer | epoch 0 | step 53090 |avg loss 7.617 |avg tokens 2121.100 |tokens/s 7991.457 |walltime 14009.609 | +Transformer | epoch 0 | step 53100 |avg loss 7.873 |avg tokens 2143.800 |tokens/s 8398.586 |walltime 14012.161 | +Transformer | epoch 0 | step 53110 |avg loss 7.721 |avg tokens 2300.300 |tokens/s 8547.669 |walltime 14014.853 | +Transformer | epoch 0 | step 53120 |avg loss 7.981 |avg tokens 2042.100 |tokens/s 8228.772 |walltime 14017.334 | +Transformer | epoch 0 | step 53130 |avg loss 7.713 |avg tokens 1877.800 |tokens/s 7330.214 |walltime 14019.896 | +Transformer | epoch 0 | step 53140 |avg loss 7.586 |avg tokens 1950.200 |tokens/s 7716.725 |walltime 14022.423 | +Transformer | epoch 0 | step 53150 |avg loss 7.935 |avg tokens 2110.000 |tokens/s 8173.908 |walltime 14025.005 | +Transformer | epoch 0 | step 53160 |avg loss 7.662 |avg tokens 2147.800 |tokens/s 8009.931 |walltime 14027.686 | +Transformer | epoch 0 | step 53170 |avg loss 7.451 |avg tokens 2109.600 |tokens/s 8136.733 |walltime 14030.279 | +Transformer | epoch 0 | step 53180 |avg loss 7.728 |avg tokens 2078.300 |tokens/s 8052.047 |walltime 14032.860 | +Transformer | epoch 0 | step 53190 |avg loss 7.760 |avg tokens 2128.200 |tokens/s 7921.782 |walltime 14035.546 | 
+Transformer | epoch 0 | step 53200 |avg loss 7.685 |avg tokens 2284.600 |tokens/s 8655.865 |walltime 14038.186 | +Transformer | epoch 0 | step 53210 |avg loss 7.901 |avg tokens 2161.700 |tokens/s 8257.607 |walltime 14040.804 | +Transformer | epoch 0 | step 53220 |avg loss 7.835 |avg tokens 2084.800 |tokens/s 8127.405 |walltime 14043.369 | +Transformer | epoch 0 | step 53230 |avg loss 7.590 |avg tokens 2126.600 |tokens/s 8116.694 |walltime 14045.989 | +Transformer | epoch 0 | step 53240 |avg loss 7.894 |avg tokens 2023.900 |tokens/s 7946.606 |walltime 14048.536 | +Transformer | epoch 0 | step 53250 |avg loss 8.062 |avg tokens 2136.700 |tokens/s 8425.390 |walltime 14051.072 | +Transformer | epoch 0 | step 53260 |avg loss 7.660 |avg tokens 2346.400 |tokens/s 8715.107 |walltime 14053.764 | +Transformer | epoch 0 | step 53270 |avg loss 7.890 |avg tokens 2182.400 |tokens/s 8411.445 |walltime 14056.359 | +Transformer | epoch 0 | step 53280 |avg loss 7.536 |avg tokens 2303.700 |tokens/s 8597.955 |walltime 14059.038 | +Transformer | epoch 0 | step 53290 |avg loss 7.869 |avg tokens 2071.300 |tokens/s 8174.692 |walltime 14061.572 | +Transformer | epoch 0 | step 53300 |avg loss 7.918 |avg tokens 2112.500 |tokens/s 8221.045 |walltime 14064.141 | +Transformer | epoch 0 | step 53310 |avg loss 7.573 |avg tokens 2381.600 |tokens/s 8481.278 |walltime 14066.949 | +Transformer | epoch 0 | step 53320 |avg loss 7.623 |avg tokens 2297.600 |tokens/s 8507.342 |walltime 14069.650 | +Transformer | epoch 0 | step 53330 |avg loss 7.767 |avg tokens 2112.300 |tokens/s 7947.381 |walltime 14072.308 | +Transformer | epoch 0 | step 53340 |avg loss 8.242 |avg tokens 2166.100 |tokens/s 8561.937 |walltime 14074.838 | +Transformer | epoch 0 | step 53350 |avg loss 7.983 |avg tokens 2049.700 |tokens/s 8199.297 |walltime 14077.338 | +Transformer | epoch 0 | step 53360 |avg loss 7.660 |avg tokens 2375.900 |tokens/s 8549.028 |walltime 14080.117 | +Transformer | epoch 0 | step 53370 |avg loss 7.722 |avg 
tokens 2379.200 |tokens/s 8642.415 |walltime 14082.870 | +Transformer | epoch 0 | step 53380 |avg loss 7.782 |avg tokens 2088.900 |tokens/s 7802.732 |walltime 14085.547 | +Transformer | epoch 0 | step 53390 |avg loss 7.538 |avg tokens 2328.800 |tokens/s 8507.977 |walltime 14088.284 | +Transformer | epoch 0 | step 53400 |avg loss 7.607 |avg tokens 2260.600 |tokens/s 8292.024 |walltime 14091.010 | +Transformer | epoch 0 | step 53410 |avg loss 7.816 |avg tokens 1849.200 |tokens/s 7532.677 |walltime 14093.465 | +Transformer | epoch 0 | step 53420 |avg loss 7.519 |avg tokens 2070.500 |tokens/s 7958.604 |walltime 14096.067 | +Transformer | epoch 0 | step 53430 |avg loss 7.969 |avg tokens 2157.600 |tokens/s 8180.546 |walltime 14098.704 | +Transformer | epoch 0 | step 53440 |avg loss 7.661 |avg tokens 2193.500 |tokens/s 8453.823 |walltime 14101.299 | +Transformer | epoch 0 | step 53450 |avg loss 8.091 |avg tokens 2163.300 |tokens/s 8540.576 |walltime 14103.832 | +Transformer | epoch 0 | step 53460 |avg loss 7.523 |avg tokens 2275.000 |tokens/s 8684.980 |walltime 14106.452 | +Transformer | epoch 0 | step 53470 |avg loss 7.584 |avg tokens 2147.100 |tokens/s 8287.564 |walltime 14109.042 | +Transformer | epoch 0 | step 53480 |avg loss 7.965 |avg tokens 2322.100 |tokens/s 8486.712 |walltime 14111.779 | +Transformer | epoch 0 | step 53490 |avg loss 7.682 |avg tokens 2102.600 |tokens/s 7870.052 |walltime 14114.450 | +Transformer | epoch 0 | step 53500 |avg loss 7.600 |avg tokens 2238.300 |tokens/s 8438.144 |walltime 14117.103 | +Transformer | epoch 0 | step 53510 |avg loss 7.988 |avg tokens 2438.600 |tokens/s 9079.226 |walltime 14119.789 | +Transformer | epoch 0 | step 53520 |avg loss 7.947 |avg tokens 1944.500 |tokens/s 8093.259 |walltime 14122.191 | +Transformer | epoch 0 | step 53530 |avg loss 8.063 |avg tokens 1883.200 |tokens/s 7476.570 |walltime 14124.710 | +Transformer | epoch 0 | step 53540 |avg loss 7.469 |avg tokens 2391.200 |tokens/s 8496.054 |walltime 14127.525 | 
+Transformer | epoch 0 | step 53550 |avg loss 7.941 |avg tokens 2015.300 |tokens/s 7865.751 |walltime 14130.087 | +Transformer | epoch 0 | step 53560 |avg loss 7.767 |avg tokens 2231.400 |tokens/s 8150.257 |walltime 14132.825 | +Transformer | epoch 0 | step 53570 |avg loss 7.751 |avg tokens 2064.100 |tokens/s 8347.919 |walltime 14135.297 | +Transformer | epoch 0 | step 53580 |avg loss 7.965 |avg tokens 2195.200 |tokens/s 8365.133 |walltime 14137.921 | +Transformer | epoch 0 | step 53590 |avg loss 7.846 |avg tokens 2062.000 |tokens/s 7717.830 |walltime 14140.593 | +Transformer | epoch 0 | step 53600 |avg loss 7.984 |avg tokens 2357.900 |tokens/s 8618.735 |walltime 14143.329 | +Transformer | epoch 0 | step 53610 |avg loss 7.729 |avg tokens 2164.700 |tokens/s 8237.175 |walltime 14145.957 | +Transformer | epoch 0 | step 53620 |avg loss 7.548 |avg tokens 2016.300 |tokens/s 7836.665 |walltime 14148.530 | +Transformer | epoch 0 | step 53630 |avg loss 7.746 |avg tokens 2279.400 |tokens/s 8461.210 |walltime 14151.224 | +Transformer | epoch 0 | step 53640 |avg loss 7.886 |avg tokens 2120.300 |tokens/s 8145.463 |walltime 14153.827 | +Transformer | epoch 0 | step 53650 |avg loss 7.803 |avg tokens 2187.900 |tokens/s 8523.463 |walltime 14156.394 | +Transformer | epoch 0 | step 53660 |avg loss 7.908 |avg tokens 2175.400 |tokens/s 8659.978 |walltime 14158.906 | +Transformer | epoch 0 | step 53670 |avg loss 7.683 |avg tokens 2358.700 |tokens/s 8696.291 |walltime 14161.618 | +Transformer | epoch 0 | step 53680 |avg loss 7.771 |avg tokens 2282.300 |tokens/s 8419.562 |walltime 14164.329 | +Transformer | epoch 0 | step 53690 |avg loss 7.652 |avg tokens 2196.200 |tokens/s 8327.476 |walltime 14166.966 | +Transformer | epoch 0 | step 53700 |avg loss 7.594 |avg tokens 1908.500 |tokens/s 7640.500 |walltime 14169.464 | +Transformer | epoch 0 | step 53710 |avg loss 7.985 |avg tokens 2354.500 |tokens/s 8942.595 |walltime 14172.097 | +Transformer | epoch 0 | step 53720 |avg loss 7.805 |avg 
tokens 2252.800 |tokens/s 8649.423 |walltime 14174.701 | +Transformer | epoch 0 | step 53730 |avg loss 7.739 |avg tokens 2203.800 |tokens/s 8411.379 |walltime 14177.321 | +Transformer | epoch 0 | step 53740 |avg loss 7.808 |avg tokens 2315.500 |tokens/s 8951.448 |walltime 14179.908 | +Transformer | epoch 0 | step 53750 |avg loss 7.588 |avg tokens 2247.200 |tokens/s 8277.456 |walltime 14182.623 | +Transformer | epoch 0 | step 53760 |avg loss 7.557 |avg tokens 2156.000 |tokens/s 7992.938 |walltime 14185.320 | +Transformer | epoch 0 | step 53770 |avg loss 7.693 |avg tokens 2362.100 |tokens/s 8701.512 |walltime 14188.035 | +Transformer | epoch 0 | step 53780 |avg loss 7.521 |avg tokens 2279.200 |tokens/s 8242.453 |walltime 14190.800 | +Transformer | epoch 0 | step 53790 |avg loss 7.812 |avg tokens 2125.700 |tokens/s 8265.098 |walltime 14193.372 | +Transformer | epoch 0 | step 53800 |avg loss 7.750 |avg tokens 2215.200 |tokens/s 8284.475 |walltime 14196.046 | +Transformer | epoch 0 | step 53810 |avg loss 7.728 |avg tokens 2149.900 |tokens/s 7989.965 |walltime 14198.737 | +Transformer | epoch 0 | step 53820 |avg loss 7.717 |avg tokens 2116.700 |tokens/s 8328.397 |walltime 14201.278 | +Transformer | epoch 0 | step 53830 |avg loss 8.030 |avg tokens 2336.700 |tokens/s 8943.260 |walltime 14203.891 | +Transformer | epoch 0 | step 53840 |avg loss 7.778 |avg tokens 2184.000 |tokens/s 8518.401 |walltime 14206.455 | +Transformer | epoch 0 | step 53850 |avg loss 7.842 |avg tokens 2218.500 |tokens/s 8402.131 |walltime 14209.095 | +Transformer | epoch 0 | step 53860 |avg loss 7.474 |avg tokens 2244.100 |tokens/s 8273.253 |walltime 14211.808 | +Transformer | epoch 0 | step 53870 |avg loss 7.915 |avg tokens 1989.300 |tokens/s 7918.093 |walltime 14214.320 | +Transformer | epoch 0 | step 53880 |avg loss 7.675 |avg tokens 2314.500 |tokens/s 8385.856 |walltime 14217.080 | +Transformer | epoch 0 | step 53890 |avg loss 7.940 |avg tokens 2246.700 |tokens/s 8771.957 |walltime 14219.641 | 
+Transformer | epoch 0 | step 53900 |avg loss 7.245 |avg tokens 2149.200 |tokens/s 7996.162 |walltime 14222.329 | +Transformer | epoch 0 | step 53910 |avg loss 7.946 |avg tokens 1911.700 |tokens/s 7617.724 |walltime 14224.839 | +Transformer | epoch 0 | step 53920 |avg loss 7.933 |avg tokens 1983.100 |tokens/s 7919.770 |walltime 14227.343 | +Transformer | epoch 0 | step 53930 |avg loss 7.549 |avg tokens 2262.100 |tokens/s 8335.885 |walltime 14230.056 | +Transformer | epoch 0 | step 53940 |avg loss 7.761 |avg tokens 2224.200 |tokens/s 8509.703 |walltime 14232.670 | +Transformer | epoch 0 | step 53950 |avg loss 7.787 |avg tokens 2150.400 |tokens/s 8284.113 |walltime 14235.266 | +Transformer | epoch 0 | step 53960 |avg loss 7.716 |avg tokens 2293.600 |tokens/s 8429.817 |walltime 14237.987 | +Transformer | epoch 0 | step 53970 |avg loss 7.581 |avg tokens 2091.900 |tokens/s 7812.283 |walltime 14240.665 | +Transformer | epoch 0 | step 53980 |avg loss 7.420 |avg tokens 2212.800 |tokens/s 8090.739 |walltime 14243.400 | +Transformer | epoch 0 | step 53990 |avg loss 7.708 |avg tokens 2202.400 |tokens/s 8288.999 |walltime 14246.057 | +Transformer | epoch 0 | step 54000 |avg loss 7.814 |avg tokens 2190.300 |tokens/s 8208.237 |walltime 14248.725 | +Transformer | epoch 0 | step 54010 |avg loss 7.921 |avg tokens 2068.500 |tokens/s 8140.201 |walltime 14251.266 | +Transformer | epoch 0 | step 54020 |avg loss 7.718 |avg tokens 2266.200 |tokens/s 8296.225 |walltime 14253.998 | +Transformer | epoch 0 | step 54030 |avg loss 7.840 |avg tokens 2169.300 |tokens/s 8332.267 |walltime 14256.601 | +Transformer | epoch 0 | step 54040 |avg loss 7.856 |avg tokens 2224.200 |tokens/s 8581.987 |walltime 14259.193 | +Transformer | epoch 0 | step 54050 |avg loss 8.040 |avg tokens 2161.600 |tokens/s 8509.379 |walltime 14261.733 | +Transformer | epoch 0 | step 54060 |avg loss 7.838 |avg tokens 2213.200 |tokens/s 8603.877 |walltime 14264.305 | +Transformer | epoch 0 | step 54070 |avg loss 7.775 |avg 
tokens 2284.200 |tokens/s 8752.396 |walltime 14266.915 | +Transformer | epoch 0 | step 54080 |avg loss 7.793 |avg tokens 2160.300 |tokens/s 8148.469 |walltime 14269.566 | +Transformer | epoch 0 | step 54090 |avg loss 7.405 |avg tokens 2368.000 |tokens/s 8348.142 |walltime 14272.403 | +Transformer | epoch 0 | step 54100 |avg loss 7.411 |avg tokens 2325.300 |tokens/s 8299.681 |walltime 14275.205 | +Transformer | epoch 0 | step 54110 |avg loss 7.776 |avg tokens 1942.400 |tokens/s 7732.318 |walltime 14277.717 | +Transformer | epoch 0 | step 54120 |avg loss 8.133 |avg tokens 2223.700 |tokens/s 8712.734 |walltime 14280.269 | +Transformer | epoch 0 | step 54130 |avg loss 7.324 |avg tokens 2050.000 |tokens/s 7923.864 |walltime 14282.856 | +Transformer | epoch 0 | step 54140 |avg loss 7.521 |avg tokens 2231.200 |tokens/s 8102.146 |walltime 14285.610 | +Transformer | epoch 0 | step 54150 |avg loss 8.023 |avg tokens 1982.200 |tokens/s 7743.372 |walltime 14288.170 | +Transformer | epoch 0 | step 54160 |avg loss 8.093 |avg tokens 2076.400 |tokens/s 7954.189 |walltime 14290.780 | +Transformer | epoch 0 | step 54170 |avg loss 7.848 |avg tokens 1967.200 |tokens/s 7649.152 |walltime 14293.352 | +Transformer | epoch 0 | step 54180 |avg loss 7.878 |avg tokens 2240.400 |tokens/s 8870.275 |walltime 14295.878 | +Transformer | epoch 0 | step 54190 |avg loss 7.637 |avg tokens 2288.000 |tokens/s 8453.955 |walltime 14298.584 | +Transformer | epoch 0 | step 54200 |avg loss 7.793 |avg tokens 2301.600 |tokens/s 8750.900 |walltime 14301.214 | +Transformer | epoch 0 | step 54210 |avg loss 7.888 |avg tokens 2196.400 |tokens/s 8386.790 |walltime 14303.833 | +Transformer | epoch 0 | step 54220 |avg loss 7.775 |avg tokens 1920.800 |tokens/s 7628.649 |walltime 14306.351 | +Transformer | epoch 0 | step 54230 |avg loss 7.650 |avg tokens 2362.000 |tokens/s 8592.218 |walltime 14309.100 | +Transformer | epoch 0 | step 54240 |avg loss 7.701 |avg tokens 2266.200 |tokens/s 8469.417 |walltime 14311.776 | 
+Transformer | epoch 0 | step 54250 |avg loss 8.356 |avg tokens 1868.800 |tokens/s 7657.217 |walltime 14314.216 | +Transformer | epoch 0 | step 54260 |avg loss 7.959 |avg tokens 2123.000 |tokens/s 8342.214 |walltime 14316.761 | +Transformer | epoch 0 | step 54270 |avg loss 8.073 |avg tokens 2008.200 |tokens/s 8119.527 |walltime 14319.235 | +Transformer | epoch 0 | step 54280 |avg loss 7.851 |avg tokens 2032.500 |tokens/s 7663.829 |walltime 14321.887 | +Transformer | epoch 0 | step 54290 |avg loss 7.925 |avg tokens 2226.500 |tokens/s 8419.939 |walltime 14324.531 | +Transformer | epoch 0 | step 54300 |avg loss 7.841 |avg tokens 2117.100 |tokens/s 8265.195 |walltime 14327.093 | +Transformer | epoch 0 | step 54310 |avg loss 7.668 |avg tokens 2181.600 |tokens/s 8209.165 |walltime 14329.750 | +Transformer | epoch 0 | step 54320 |avg loss 7.539 |avg tokens 2099.800 |tokens/s 7912.117 |walltime 14332.404 | +Transformer | epoch 0 | step 54330 |avg loss 7.248 |avg tokens 2436.800 |tokens/s 8704.598 |walltime 14335.203 | +Transformer | epoch 0 | step 54340 |avg loss 7.448 |avg tokens 2324.000 |tokens/s 8512.912 |walltime 14337.933 | +Transformer | epoch 0 | step 54350 |avg loss 7.663 |avg tokens 2208.600 |tokens/s 8276.116 |walltime 14340.602 | +Transformer | epoch 0 | step 54360 |avg loss 7.527 |avg tokens 2098.300 |tokens/s 7893.672 |walltime 14343.260 | +Transformer | epoch 0 | step 54370 |avg loss 7.298 |avg tokens 2350.400 |tokens/s 8581.601 |walltime 14345.999 | +Transformer | epoch 0 | step 54380 |avg loss 7.746 |avg tokens 2182.500 |tokens/s 8233.361 |walltime 14348.650 | +Transformer | epoch 0 | step 54390 |avg loss 7.617 |avg tokens 2198.600 |tokens/s 8065.877 |walltime 14351.376 | +Transformer | epoch 0 | step 54400 |avg loss 7.797 |avg tokens 2336.500 |tokens/s 8821.240 |walltime 14354.024 | +Transformer | epoch 0 | step 54410 |avg loss 7.740 |avg tokens 2128.200 |tokens/s 8074.471 |walltime 14356.660 | +Transformer | epoch 0 | step 54420 |avg loss 7.799 |avg 
tokens 2290.600 |tokens/s 8813.826 |walltime 14359.259 | +Transformer | epoch 0 | step 54430 |avg loss 7.555 |avg tokens 2301.200 |tokens/s 8477.743 |walltime 14361.973 | +Transformer | epoch 0 | step 54440 |avg loss 7.747 |avg tokens 2281.400 |tokens/s 8590.353 |walltime 14364.629 | +Transformer | epoch 0 | step 54450 |avg loss 7.883 |avg tokens 2102.900 |tokens/s 8117.579 |walltime 14367.220 | +Transformer | epoch 0 | step 54460 |avg loss 7.969 |avg tokens 2022.500 |tokens/s 8049.443 |walltime 14369.732 | +Transformer | epoch 0 | step 54470 |avg loss 7.745 |avg tokens 2105.000 |tokens/s 8094.896 |walltime 14372.333 | +Transformer | epoch 0 | step 54480 |avg loss 7.886 |avg tokens 2283.600 |tokens/s 8695.458 |walltime 14374.959 | +Transformer | epoch 0 | step 54490 |avg loss 7.459 |avg tokens 2352.800 |tokens/s 8578.752 |walltime 14377.702 | +Transformer | epoch 0 | step 54500 |avg loss 7.831 |avg tokens 2099.800 |tokens/s 7795.116 |walltime 14380.395 | +Transformer | epoch 0 | step 54510 |avg loss 7.656 |avg tokens 2150.700 |tokens/s 8267.841 |walltime 14382.997 | +Transformer | epoch 0 | step 54520 |avg loss 7.860 |avg tokens 2197.300 |tokens/s 8287.832 |walltime 14385.648 | +Transformer | epoch 0 | step 54530 |avg loss 7.840 |avg tokens 2171.300 |tokens/s 8570.419 |walltime 14388.181 | +Transformer | epoch 0 | step 54540 |avg loss 8.120 |avg tokens 2173.100 |tokens/s 8944.595 |walltime 14390.611 | +Transformer | epoch 0 | step 54550 |avg loss 8.117 |avg tokens 2147.100 |tokens/s 8559.326 |walltime 14393.119 | +Transformer | epoch 0 | step 54560 |avg loss 7.745 |avg tokens 2269.900 |tokens/s 8389.362 |walltime 14395.825 | +Transformer | epoch 0 | step 54570 |avg loss 7.335 |avg tokens 2294.400 |tokens/s 8323.892 |walltime 14398.581 | +Transformer | epoch 0 | step 54580 |avg loss 7.718 |avg tokens 2247.200 |tokens/s 8380.533 |walltime 14401.263 | +Transformer | epoch 0 | step 54590 |avg loss 7.961 |avg tokens 2149.300 |tokens/s 8309.380 |walltime 14403.850 | 
+Transformer | epoch 0 | step 54600 |avg loss 7.789 |avg tokens 2010.600 |tokens/s 7765.933 |walltime 14406.439 | +Transformer | epoch 0 | step 54610 |avg loss 7.522 |avg tokens 2223.300 |tokens/s 8232.283 |walltime 14409.139 | +Transformer | epoch 0 | step 54620 |avg loss 8.033 |avg tokens 2301.900 |tokens/s 9154.182 |walltime 14411.654 | +Transformer | epoch 0 | step 54630 |avg loss 7.615 |avg tokens 2232.200 |tokens/s 8310.569 |walltime 14414.340 | +Transformer | epoch 0 | step 54640 |avg loss 7.369 |avg tokens 2243.700 |tokens/s 8361.049 |walltime 14417.023 | +Transformer | epoch 0 | step 54650 |avg loss 7.469 |avg tokens 2266.600 |tokens/s 8459.096 |walltime 14419.703 | +Transformer | epoch 0 | step 54660 |avg loss 7.804 |avg tokens 2260.300 |tokens/s 8674.321 |walltime 14422.309 | +Transformer | epoch 0 | step 54670 |avg loss 7.626 |avg tokens 2030.600 |tokens/s 7893.168 |walltime 14424.881 | +Transformer | epoch 0 | step 54680 |avg loss 7.799 |avg tokens 2129.700 |tokens/s 8209.963 |walltime 14427.475 | +Transformer | epoch 0 | step 54690 |avg loss 7.768 |avg tokens 1883.900 |tokens/s 7654.140 |walltime 14429.936 | +Transformer | epoch 0 | step 54700 |avg loss 7.771 |avg tokens 2114.600 |tokens/s 8024.164 |walltime 14432.572 | +Transformer | epoch 0 | step 54710 |avg loss 7.591 |avg tokens 2112.000 |tokens/s 7895.923 |walltime 14435.247 | +Transformer | epoch 0 | step 54720 |avg loss 7.775 |avg tokens 1875.300 |tokens/s 7383.291 |walltime 14437.787 | +Transformer | epoch 0 | step 54730 |avg loss 7.738 |avg tokens 2154.500 |tokens/s 8073.946 |walltime 14440.455 | +Transformer | epoch 0 | step 54740 |avg loss 8.136 |avg tokens 1789.700 |tokens/s 7941.100 |walltime 14442.709 | +Transformer | epoch 0 | step 54750 |avg loss 7.858 |avg tokens 2181.100 |tokens/s 8197.965 |walltime 14445.369 | +Transformer | epoch 0 | step 54760 |avg loss 7.621 |avg tokens 2217.800 |tokens/s 8297.667 |walltime 14448.042 | +Transformer | epoch 0 | step 54770 |avg loss 7.933 |avg 
tokens 1783.700 |tokens/s 7383.039 |walltime 14450.458 | +Transformer | epoch 0 | step 54780 |avg loss 7.733 |avg tokens 2010.000 |tokens/s 8001.748 |walltime 14452.970 | +Transformer | epoch 0 | step 54790 |avg loss 7.835 |avg tokens 2089.700 |tokens/s 7862.873 |walltime 14455.628 | +Transformer | epoch 0 | step 54800 |avg loss 7.654 |avg tokens 2238.000 |tokens/s 8419.095 |walltime 14458.286 | +Transformer | epoch 0 | step 54810 |avg loss 7.511 |avg tokens 2307.200 |tokens/s 8223.856 |walltime 14461.091 | +Transformer | epoch 0 | step 54820 |avg loss 7.737 |avg tokens 2176.600 |tokens/s 8184.303 |walltime 14463.751 | +Transformer | epoch 0 | step 54830 |avg loss 7.623 |avg tokens 2333.600 |tokens/s 8600.773 |walltime 14466.464 | +Transformer | epoch 0 | step 54840 |avg loss 7.553 |avg tokens 2072.400 |tokens/s 8071.900 |walltime 14469.032 | +Transformer | epoch 0 | step 54850 |avg loss 7.967 |avg tokens 2236.400 |tokens/s 8534.531 |walltime 14471.652 | +Transformer | epoch 0 | step 54860 |avg loss 7.739 |avg tokens 2214.800 |tokens/s 8099.402 |walltime 14474.386 | +Transformer | epoch 0 | step 54870 |avg loss 7.491 |avg tokens 2096.700 |tokens/s 7934.959 |walltime 14477.029 | +Transformer | epoch 0 | step 54880 |avg loss 7.690 |avg tokens 2274.000 |tokens/s 8695.415 |walltime 14479.644 | +Transformer | epoch 0 | step 54890 |avg loss 7.811 |avg tokens 2225.500 |tokens/s 8668.851 |walltime 14482.211 | +Transformer | epoch 0 | step 54900 |avg loss 7.791 |avg tokens 2262.600 |tokens/s 8626.485 |walltime 14484.834 | +Transformer | epoch 0 | step 54910 |avg loss 7.449 |avg tokens 2024.800 |tokens/s 7734.727 |walltime 14487.452 | +Transformer | epoch 0 | step 54920 |avg loss 7.385 |avg tokens 2283.200 |tokens/s 8229.751 |walltime 14490.226 | +Transformer | epoch 0 | step 54930 |avg loss 7.538 |avg tokens 2109.100 |tokens/s 7944.926 |walltime 14492.881 | +Transformer | epoch 0 | step 54940 |avg loss 7.604 |avg tokens 2326.300 |tokens/s 8849.251 |walltime 14495.510 | 
+Transformer | epoch 0 | step 54950 |avg loss 7.635 |avg tokens 2183.900 |tokens/s 8176.815 |walltime 14498.181 | +Transformer | epoch 0 | step 54960 |avg loss 7.490 |avg tokens 2258.600 |tokens/s 8174.719 |walltime 14500.943 | +Transformer | epoch 0 | step 54970 |avg loss 7.472 |avg tokens 2239.000 |tokens/s 8376.241 |walltime 14503.617 | +Transformer | epoch 0 | step 54980 |avg loss 7.951 |avg tokens 2216.400 |tokens/s 8552.402 |walltime 14506.208 | +Transformer | epoch 0 | step 54990 |avg loss 7.689 |avg tokens 2351.700 |tokens/s 8602.627 |walltime 14508.942 | +Transformer | epoch 0 | step 55000 |avg loss 7.854 |avg tokens 2094.400 |tokens/s 7953.747 |walltime 14511.575 | +Transformer | epoch 0 | step 55010 |avg loss 8.067 |avg tokens 2204.900 |tokens/s 8188.612 |walltime 14514.268 | +Transformer | epoch 0 | step 55020 |avg loss 7.729 |avg tokens 2323.100 |tokens/s 8476.956 |walltime 14517.008 | +Transformer | epoch 0 | step 55030 |avg loss 7.531 |avg tokens 2294.400 |tokens/s 8186.412 |walltime 14519.811 | +Transformer | epoch 0 | step 55040 |avg loss 7.685 |avg tokens 1943.100 |tokens/s 7742.419 |walltime 14522.321 | +Transformer | epoch 0 | step 55050 |avg loss 7.744 |avg tokens 2296.200 |tokens/s 8613.057 |walltime 14524.986 | +Transformer | epoch 0 | step 55060 |avg loss 7.801 |avg tokens 2034.600 |tokens/s 7886.992 |walltime 14527.566 | +Transformer | epoch 0 | step 55070 |avg loss 8.027 |avg tokens 2026.600 |tokens/s 7914.867 |walltime 14530.127 | +Transformer | epoch 0 | step 55080 |avg loss 7.735 |avg tokens 2219.200 |tokens/s 8307.520 |walltime 14532.798 | +Transformer | epoch 0 | step 55090 |avg loss 7.584 |avg tokens 1970.400 |tokens/s 7639.704 |walltime 14535.377 | +Transformer | epoch 0 | step 55100 |avg loss 7.681 |avg tokens 2301.500 |tokens/s 8562.735 |walltime 14538.065 | +Transformer | epoch 0 | step 55110 |avg loss 7.573 |avg tokens 2277.600 |tokens/s 8362.689 |walltime 14540.789 | +Transformer | epoch 0 | step 55120 |avg loss 7.628 |avg 
tokens 1909.100 |tokens/s 7447.188 |walltime 14543.352 | +Transformer | epoch 0 | step 55130 |avg loss 7.729 |avg tokens 2290.900 |tokens/s 8288.109 |walltime 14546.116 | +Transformer | epoch 0 | step 55140 |avg loss 7.815 |avg tokens 1969.100 |tokens/s 8081.547 |walltime 14548.553 | +Transformer | epoch 0 | step 55150 |avg loss 7.466 |avg tokens 2173.900 |tokens/s 8094.633 |walltime 14551.238 | +Transformer | epoch 0 | step 55160 |avg loss 7.721 |avg tokens 2130.900 |tokens/s 8319.829 |walltime 14553.799 | +Transformer | epoch 0 | step 55170 |avg loss 7.566 |avg tokens 2294.300 |tokens/s 8414.895 |walltime 14556.526 | +Transformer | epoch 0 | step 55180 |avg loss 7.824 |avg tokens 2239.200 |tokens/s 8314.886 |walltime 14559.219 | +Transformer | epoch 0 | step 55190 |avg loss 8.050 |avg tokens 2181.600 |tokens/s 8807.032 |walltime 14561.696 | +Transformer | epoch 0 | step 55200 |avg loss 7.873 |avg tokens 2052.300 |tokens/s 8228.446 |walltime 14564.190 | +Transformer | epoch 0 | step 55210 |avg loss 7.918 |avg tokens 2083.500 |tokens/s 8224.873 |walltime 14566.723 | +Transformer | epoch 0 | step 55220 |avg loss 7.784 |avg tokens 2319.700 |tokens/s 8608.247 |walltime 14569.418 | +Transformer | epoch 0 | step 55230 |avg loss 7.491 |avg tokens 2243.400 |tokens/s 8326.438 |walltime 14572.112 | +Transformer | epoch 0 | step 55240 |avg loss 7.631 |avg tokens 2335.200 |tokens/s 8556.116 |walltime 14574.842 | +Transformer | epoch 0 | step 55250 |avg loss 7.524 |avg tokens 2250.500 |tokens/s 8283.728 |walltime 14577.559 | +Transformer | epoch 0 | step 55260 |avg loss 7.681 |avg tokens 2139.300 |tokens/s 8400.526 |walltime 14580.105 | +Transformer | epoch 0 | step 55270 |avg loss 7.988 |avg tokens 1985.000 |tokens/s 8100.057 |walltime 14582.556 | +Transformer | epoch 0 | step 55280 |avg loss 7.814 |avg tokens 2246.500 |tokens/s 8499.652 |walltime 14585.199 | +Transformer | epoch 0 | step 55290 |avg loss 7.789 |avg tokens 2086.700 |tokens/s 8139.473 |walltime 14587.763 | 
+Transformer | epoch 0 | step 55300 |avg loss 7.532 |avg tokens 2259.200 |tokens/s 8281.522 |walltime 14590.491 | +Transformer | epoch 0 | step 55310 |avg loss 7.489 |avg tokens 2320.800 |tokens/s 8460.289 |walltime 14593.234 | +Transformer | epoch 0 | step 55320 |avg loss 7.729 |avg tokens 2277.300 |tokens/s 8539.608 |walltime 14595.900 | +Transformer | epoch 0 | step 55330 |avg loss 8.077 |avg tokens 2087.100 |tokens/s 8409.571 |walltime 14598.382 | +Transformer | epoch 0 | step 55340 |avg loss 7.461 |avg tokens 2093.500 |tokens/s 7989.833 |walltime 14601.002 | +Transformer | epoch 0 | step 55350 |avg loss 7.476 |avg tokens 2294.500 |tokens/s 8287.803 |walltime 14603.771 | +Transformer | epoch 0 | step 55360 |avg loss 7.884 |avg tokens 2146.100 |tokens/s 8462.392 |walltime 14606.307 | +Transformer | epoch 0 | step 55370 |avg loss 7.758 |avg tokens 2032.200 |tokens/s 7888.143 |walltime 14608.883 | +Transformer | epoch 0 | step 55380 |avg loss 7.509 |avg tokens 2309.700 |tokens/s 8359.471 |walltime 14611.646 | +Transformer | epoch 0 | step 55390 |avg loss 7.623 |avg tokens 2055.400 |tokens/s 8268.169 |walltime 14614.132 | +Transformer | epoch 0 | step 55400 |avg loss 7.534 |avg tokens 2365.600 |tokens/s 8612.881 |walltime 14616.879 | +Transformer | epoch 0 | step 55410 |avg loss 7.583 |avg tokens 2313.000 |tokens/s 8519.069 |walltime 14619.594 | +Transformer | epoch 0 | step 55420 |avg loss 8.109 |avg tokens 1895.400 |tokens/s 8002.549 |walltime 14621.962 | +Transformer | epoch 0 | step 55430 |avg loss 7.798 |avg tokens 2195.400 |tokens/s 8094.118 |walltime 14624.675 | +Transformer | epoch 0 | step 55440 |avg loss 7.643 |avg tokens 2322.700 |tokens/s 8559.545 |walltime 14627.388 | +Transformer | epoch 0 | step 55450 |avg loss 7.669 |avg tokens 2081.700 |tokens/s 7764.374 |walltime 14630.069 | +Transformer | epoch 0 | step 55460 |avg loss 7.823 |avg tokens 2114.700 |tokens/s 8024.384 |walltime 14632.705 | +Transformer | epoch 0 | step 55470 |avg loss 7.260 |avg 
tokens 2202.300 |tokens/s 7999.436 |walltime 14635.458 | +Transformer | epoch 0 | step 55480 |avg loss 7.432 |avg tokens 2088.800 |tokens/s 7761.243 |walltime 14638.149 | +Transformer | epoch 0 | step 55490 |avg loss 8.085 |avg tokens 2299.000 |tokens/s 9050.452 |walltime 14640.689 | +Transformer | epoch 0 | step 55500 |avg loss 7.608 |avg tokens 2176.800 |tokens/s 8191.806 |walltime 14643.347 | +Transformer | epoch 0 | step 55510 |avg loss 8.096 |avg tokens 2313.000 |tokens/s 9121.343 |walltime 14645.882 | +Transformer | epoch 0 | step 55520 |avg loss 7.662 |avg tokens 2204.800 |tokens/s 8278.239 |walltime 14648.546 | +Transformer | epoch 0 | step 55530 |avg loss 7.846 |avg tokens 2032.200 |tokens/s 8030.400 |walltime 14651.077 | +Transformer | epoch 0 | step 55540 |avg loss 7.833 |avg tokens 2173.100 |tokens/s 8194.783 |walltime 14653.728 | +Transformer | epoch 0 | step 55550 |avg loss 7.864 |avg tokens 2073.000 |tokens/s 8286.258 |walltime 14656.230 | +Transformer | epoch 0 | step 55560 |avg loss 7.707 |avg tokens 2115.100 |tokens/s 8328.292 |walltime 14658.770 | +Transformer | epoch 0 | step 55570 |avg loss 7.773 |avg tokens 2141.500 |tokens/s 8399.811 |walltime 14661.319 | +Transformer | epoch 0 | step 55580 |avg loss 7.707 |avg tokens 2066.600 |tokens/s 7794.989 |walltime 14663.970 | +Transformer | epoch 0 | step 55590 |avg loss 7.603 |avg tokens 2112.200 |tokens/s 8074.680 |walltime 14666.586 | +Transformer | epoch 0 | step 55600 |avg loss 7.606 |avg tokens 2018.400 |tokens/s 8013.302 |walltime 14669.105 | +Transformer | epoch 0 | step 55610 |avg loss 7.661 |avg tokens 1875.800 |tokens/s 7507.578 |walltime 14671.604 | +Transformer | epoch 0 | step 55620 |avg loss 7.333 |avg tokens 2320.800 |tokens/s 8346.433 |walltime 14674.384 | +Transformer | epoch 0 | step 55630 |avg loss 7.390 |avg tokens 2204.800 |tokens/s 8012.276 |walltime 14677.136 | +Transformer | epoch 0 | step 55640 |avg loss 7.755 |avg tokens 2165.600 |tokens/s 8535.369 |walltime 14679.673 | 
+Transformer | epoch 0 | step 55650 |avg loss 7.280 |avg tokens 2320.000 |tokens/s 8391.468 |walltime 14682.438 | +Transformer | epoch 0 | step 55660 |avg loss 7.571 |avg tokens 2301.700 |tokens/s 8330.707 |walltime 14685.201 | +Transformer | epoch 0 | step 55670 |avg loss 7.964 |avg tokens 2134.200 |tokens/s 8437.371 |walltime 14687.730 | +Transformer | epoch 0 | step 55680 |avg loss 7.643 |avg tokens 2253.600 |tokens/s 8379.529 |walltime 14690.420 | +Transformer | epoch 0 | step 55690 |avg loss 7.726 |avg tokens 2085.400 |tokens/s 7951.612 |walltime 14693.042 | +Transformer | epoch 0 | step 55700 |avg loss 7.357 |avg tokens 2351.100 |tokens/s 8666.858 |walltime 14695.755 | +Transformer | epoch 0 | step 55710 |avg loss 7.727 |avg tokens 1998.600 |tokens/s 7833.544 |walltime 14698.306 | +Transformer | epoch 0 | step 55720 |avg loss 7.591 |avg tokens 1888.800 |tokens/s 7307.153 |walltime 14700.891 | +Transformer | epoch 0 | step 55730 |avg loss 7.484 |avg tokens 2072.900 |tokens/s 7944.960 |walltime 14703.500 | +Transformer | epoch 0 | step 55740 |avg loss 7.627 |avg tokens 2225.600 |tokens/s 8294.326 |walltime 14706.184 | +Transformer | epoch 0 | step 55750 |avg loss 7.689 |avg tokens 2195.900 |tokens/s 8070.280 |walltime 14708.905 | +Transformer | epoch 0 | step 55760 |avg loss 8.116 |avg tokens 2021.100 |tokens/s 8087.596 |walltime 14711.404 | +Transformer | epoch 0 | step 55770 |avg loss 8.167 |avg tokens 1890.900 |tokens/s 7964.944 |walltime 14713.778 | +Transformer | epoch 0 | step 55780 |avg loss 7.741 |avg tokens 2184.000 |tokens/s 8116.881 |walltime 14716.468 | +Transformer | epoch 0 | step 55790 |avg loss 7.774 |avg tokens 2143.200 |tokens/s 8064.379 |walltime 14719.126 | +Transformer | epoch 0 | step 55800 |avg loss 7.721 |avg tokens 2195.500 |tokens/s 8539.474 |walltime 14721.697 | +Transformer | epoch 0 | step 55810 |avg loss 7.563 |avg tokens 2176.100 |tokens/s 8184.644 |walltime 14724.356 | +Transformer | epoch 0 | step 55820 |avg loss 7.637 |avg 
tokens 2371.000 |tokens/s 8709.445 |walltime 14727.078 | +Transformer | epoch 0 | step 55830 |avg loss 7.694 |avg tokens 2281.700 |tokens/s 8414.929 |walltime 14729.790 | +Transformer | epoch 0 | step 55840 |avg loss 7.406 |avg tokens 2210.400 |tokens/s 8133.000 |walltime 14732.507 | +Transformer | epoch 0 | step 55850 |avg loss 7.930 |avg tokens 2159.500 |tokens/s 8209.370 |walltime 14735.138 | +Transformer | epoch 0 | step 55860 |avg loss 8.210 |avg tokens 2042.200 |tokens/s 8500.684 |walltime 14737.540 | +Transformer | epoch 0 | step 55870 |avg loss 8.002 |avg tokens 2276.500 |tokens/s 8672.939 |walltime 14740.165 | +Transformer | epoch 0 | step 55880 |avg loss 8.042 |avg tokens 2106.100 |tokens/s 7953.506 |walltime 14742.813 | +Transformer | epoch 0 | step 55890 |avg loss 7.882 |avg tokens 2105.800 |tokens/s 8475.346 |walltime 14745.298 | +Transformer | epoch 0 | step 55900 |avg loss 7.682 |avg tokens 2253.900 |tokens/s 8365.213 |walltime 14747.992 | +Transformer | epoch 0 | step 55910 |avg loss 7.529 |avg tokens 2200.400 |tokens/s 8087.027 |walltime 14750.713 | +Transformer | epoch 0 | step 55920 |avg loss 7.554 |avg tokens 2357.600 |tokens/s 8505.942 |walltime 14753.485 | +Transformer | epoch 0 | step 55930 |avg loss 7.271 |avg tokens 2182.800 |tokens/s 8152.046 |walltime 14756.162 | +Transformer | epoch 0 | step 55940 |avg loss 7.668 |avg tokens 2284.100 |tokens/s 8342.563 |walltime 14758.900 | +Transformer | epoch 0 | step 55950 |avg loss 8.208 |avg tokens 2131.800 |tokens/s 8489.455 |walltime 14761.411 | +Transformer | epoch 0 | step 55960 |avg loss 7.949 |avg tokens 2149.300 |tokens/s 8613.512 |walltime 14763.907 | +Transformer | epoch 0 | step 55970 |avg loss 7.831 |avg tokens 2140.000 |tokens/s 8078.474 |walltime 14766.556 | +Transformer | epoch 0 | step 55980 |avg loss 7.492 |avg tokens 2365.600 |tokens/s 8639.664 |walltime 14769.294 | +Transformer | epoch 0 | step 55990 |avg loss 7.974 |avg tokens 2335.200 |tokens/s 8762.401 |walltime 14771.959 | 
+Transformer | epoch 0 | step 56000 |avg loss 7.836 |avg tokens 2089.800 |tokens/s 8499.605 |walltime 14774.417 | +Transformer | epoch 0 | step 56010 |avg loss 7.719 |avg tokens 2154.400 |tokens/s 8093.223 |walltime 14777.079 | +Transformer | epoch 0 | step 56020 |avg loss 7.710 |avg tokens 2183.700 |tokens/s 8336.057 |walltime 14779.699 | +Transformer | epoch 0 | step 56030 |avg loss 7.982 |avg tokens 2230.400 |tokens/s 8648.541 |walltime 14782.278 | +Transformer | epoch 0 | step 56040 |avg loss 7.704 |avg tokens 2371.500 |tokens/s 8615.369 |walltime 14785.031 | +Transformer | epoch 0 | step 56050 |avg loss 7.650 |avg tokens 2285.700 |tokens/s 8537.552 |walltime 14787.708 | +Transformer | epoch 0 | step 56060 |avg loss 7.854 |avg tokens 2418.200 |tokens/s 8566.957 |walltime 14790.531 | +Transformer | epoch 0 | step 56070 |avg loss 7.541 |avg tokens 2046.300 |tokens/s 7731.869 |walltime 14793.177 | +Transformer | epoch 0 | step 56080 |avg loss 7.640 |avg tokens 2391.800 |tokens/s 8858.823 |walltime 14795.877 | +Transformer | epoch 0 | step 56090 |avg loss 7.540 |avg tokens 2272.800 |tokens/s 8382.696 |walltime 14798.588 | +Transformer | epoch 0 | step 56100 |avg loss 7.714 |avg tokens 2123.700 |tokens/s 8160.936 |walltime 14801.191 | +Transformer | epoch 0 | step 56110 |avg loss 7.961 |avg tokens 2136.800 |tokens/s 8104.024 |walltime 14803.827 | +Transformer | epoch 0 | step 56120 |avg loss 7.689 |avg tokens 2198.300 |tokens/s 8556.652 |walltime 14806.396 | +Transformer | epoch 0 | step 56130 |avg loss 7.693 |avg tokens 2219.600 |tokens/s 8389.773 |walltime 14809.042 | +Transformer | epoch 0 | step 56140 |avg loss 7.879 |avg tokens 2097.100 |tokens/s 8168.983 |walltime 14811.609 | +Transformer | epoch 0 | step 56150 |avg loss 7.536 |avg tokens 2278.400 |tokens/s 8240.829 |walltime 14814.374 | +Transformer | epoch 0 | step 56160 |avg loss 7.502 |avg tokens 2359.200 |tokens/s 8433.152 |walltime 14817.172 | +Transformer | epoch 0 | step 56170 |avg loss 8.046 |avg 
tokens 2138.400 |tokens/s 8095.083 |walltime 14819.813 | +Transformer | epoch 0 | step 56180 |avg loss 7.458 |avg tokens 2234.400 |tokens/s 8266.675 |walltime 14822.516 | +Transformer | epoch 0 | step 56190 |avg loss 7.786 |avg tokens 2115.400 |tokens/s 8093.091 |walltime 14825.130 | +Transformer | epoch 0 | step 56200 |avg loss 7.505 |avg tokens 2332.100 |tokens/s 8474.520 |walltime 14827.882 | +Transformer | epoch 0 | step 56210 |avg loss 7.723 |avg tokens 2149.600 |tokens/s 8104.760 |walltime 14830.534 | +Transformer | epoch 0 | step 56220 |avg loss 8.099 |avg tokens 2101.000 |tokens/s 8611.036 |walltime 14832.974 | +Transformer | epoch 0 | step 56230 |avg loss 7.858 |avg tokens 1831.700 |tokens/s 7746.342 |walltime 14835.339 | +Transformer | epoch 0 | step 56240 |avg loss 7.797 |avg tokens 2069.700 |tokens/s 8172.794 |walltime 14837.871 | +Transformer | epoch 0 | step 56250 |avg loss 7.717 |avg tokens 2348.400 |tokens/s 8829.358 |walltime 14840.531 | +Transformer | epoch 0 | step 56260 |avg loss 7.693 |avg tokens 1908.000 |tokens/s 7684.801 |walltime 14843.014 | +Transformer | epoch 0 | step 56270 |avg loss 7.874 |avg tokens 2243.000 |tokens/s 8284.592 |walltime 14845.721 | +Transformer | epoch 0 | step 56280 |avg loss 7.644 |avg tokens 2314.600 |tokens/s 8598.587 |walltime 14848.413 | +Transformer | epoch 0 | step 56290 |avg loss 7.849 |avg tokens 2014.300 |tokens/s 7874.354 |walltime 14850.971 | +Transformer | epoch 0 | step 56300 |avg loss 7.949 |avg tokens 2273.200 |tokens/s 8439.473 |walltime 14853.664 | +Transformer | epoch 0 | step 56310 |avg loss 7.796 |avg tokens 2080.600 |tokens/s 8217.032 |walltime 14856.197 | +Transformer | epoch 0 | step 56320 |avg loss 7.893 |avg tokens 2387.500 |tokens/s 8936.454 |walltime 14858.868 | +Transformer | epoch 0 | step 56330 |avg loss 7.986 |avg tokens 2010.200 |tokens/s 8299.408 |walltime 14861.290 | +Transformer | epoch 0 | step 56340 |avg loss 7.648 |avg tokens 2222.000 |tokens/s 8528.224 |walltime 14863.896 | 
+Transformer | epoch 0 | step 56350 |avg loss 7.672 |avg tokens 2412.800 |tokens/s 8866.314 |walltime 14866.617 | +Transformer | epoch 0 | step 56360 |avg loss 7.949 |avg tokens 2043.100 |tokens/s 7894.976 |walltime 14869.205 | +Transformer | epoch 0 | step 56370 |avg loss 7.841 |avg tokens 2112.300 |tokens/s 8190.909 |walltime 14871.784 | +Transformer | epoch 0 | step 56380 |avg loss 7.953 |avg tokens 2376.800 |tokens/s 8858.472 |walltime 14874.467 | +Transformer | epoch 0 | step 56390 |avg loss 7.327 |avg tokens 2309.600 |tokens/s 8606.773 |walltime 14877.150 | +Transformer | epoch 0 | step 56400 |avg loss 7.575 |avg tokens 2223.700 |tokens/s 8122.101 |walltime 14879.888 | +Transformer | epoch 0 | step 56410 |avg loss 7.873 |avg tokens 2161.200 |tokens/s 8452.129 |walltime 14882.445 | +Transformer | epoch 0 | step 56420 |avg loss 7.926 |avg tokens 2188.300 |tokens/s 8396.051 |walltime 14885.052 | +Transformer | epoch 0 | step 56430 |avg loss 7.802 |avg tokens 2111.400 |tokens/s 8197.684 |walltime 14887.627 | +Transformer | epoch 0 | step 56440 |avg loss 7.835 |avg tokens 2317.700 |tokens/s 8496.130 |walltime 14890.355 | +Transformer | epoch 0 | step 56450 |avg loss 7.418 |avg tokens 2326.200 |tokens/s 8514.969 |walltime 14893.087 | +Transformer | epoch 0 | step 56460 |avg loss 7.480 |avg tokens 2099.000 |tokens/s 7862.376 |walltime 14895.757 | +Transformer | epoch 0 | step 56470 |avg loss 7.754 |avg tokens 2317.000 |tokens/s 8525.544 |walltime 14898.474 | +Transformer | epoch 0 | step 56480 |avg loss 7.353 |avg tokens 2223.200 |tokens/s 8233.241 |walltime 14901.175 | +Transformer | epoch 0 | step 56490 |avg loss 7.732 |avg tokens 1994.300 |tokens/s 7931.301 |walltime 14903.689 | +Transformer | epoch 0 | step 56500 |avg loss 7.741 |avg tokens 2257.200 |tokens/s 8320.672 |walltime 14906.402 | +Transformer | epoch 0 | step 56510 |avg loss 7.713 |avg tokens 2260.000 |tokens/s 8374.011 |walltime 14909.101 | +Transformer | epoch 0 | step 56520 |avg loss 7.618 |avg 
tokens 2285.600 |tokens/s 8301.595 |walltime 14911.854 | +Transformer | epoch 0 | step 56530 |avg loss 7.654 |avg tokens 2041.900 |tokens/s 7786.650 |walltime 14914.476 | +Transformer | epoch 0 | step 56540 |avg loss 7.899 |avg tokens 2221.300 |tokens/s 8125.094 |walltime 14917.210 | +Transformer | epoch 0 | step 56550 |avg loss 7.829 |avg tokens 2021.600 |tokens/s 7691.250 |walltime 14919.839 | +Transformer | epoch 0 | step 56560 |avg loss 7.886 |avg tokens 2236.800 |tokens/s 8330.371 |walltime 14922.524 | +Transformer | epoch 0 | step 56570 |avg loss 7.930 |avg tokens 2116.900 |tokens/s 8278.296 |walltime 14925.081 | +Transformer | epoch 0 | step 56580 |avg loss 7.709 |avg tokens 2231.700 |tokens/s 8271.501 |walltime 14927.779 | +Transformer | epoch 0 | step 56590 |avg loss 7.794 |avg tokens 2194.200 |tokens/s 8279.949 |walltime 14930.429 | +Transformer | epoch 0 | step 56600 |avg loss 8.101 |avg tokens 2059.100 |tokens/s 8110.306 |walltime 14932.968 | +Transformer | epoch 0 | step 56610 |avg loss 7.568 |avg tokens 2192.200 |tokens/s 8154.055 |walltime 14935.656 | +Transformer | epoch 0 | step 56620 |avg loss 7.872 |avg tokens 1885.900 |tokens/s 7691.929 |walltime 14938.108 | +Transformer | epoch 0 | step 56630 |avg loss 7.887 |avg tokens 2044.900 |tokens/s 8230.941 |walltime 14940.592 | +Transformer | epoch 0 | step 56640 |avg loss 7.969 |avg tokens 2038.400 |tokens/s 8010.481 |walltime 14943.137 | +Transformer | epoch 0 | step 56650 |avg loss 7.557 |avg tokens 2109.800 |tokens/s 7836.290 |walltime 14945.830 | +Transformer | epoch 0 | step 56660 |avg loss 7.435 |avg tokens 2153.600 |tokens/s 8045.573 |walltime 14948.506 | +Transformer | epoch 0 | step 56670 |avg loss 7.930 |avg tokens 2110.800 |tokens/s 8221.424 |walltime 14951.074 | +Transformer | epoch 0 | step 56680 |avg loss 7.661 |avg tokens 2191.200 |tokens/s 8167.475 |walltime 14953.757 | +Transformer | epoch 0 | step 56690 |avg loss 7.891 |avg tokens 2030.600 |tokens/s 8023.522 |walltime 14956.287 | 
+Transformer | epoch 0 | step 56700 |avg loss 7.448 |avg tokens 2377.600 |tokens/s 8438.302 |walltime 14959.105 | +Transformer | epoch 0 | step 56710 |avg loss 8.030 |avg tokens 1947.200 |tokens/s 7609.563 |walltime 14961.664 | +Transformer | epoch 0 | step 56720 |avg loss 7.817 |avg tokens 2117.200 |tokens/s 8042.968 |walltime 14964.296 | +Transformer | epoch 0 | step 56730 |avg loss 7.803 |avg tokens 2253.600 |tokens/s 8546.446 |walltime 14966.933 | +Transformer | epoch 0 | step 56740 |avg loss 7.690 |avg tokens 2379.000 |tokens/s 8587.893 |walltime 14969.703 | +Transformer | epoch 0 | step 56750 |avg loss 7.934 |avg tokens 1785.800 |tokens/s 7364.480 |walltime 14972.128 | +Transformer | epoch 0 | step 56760 |avg loss 7.559 |avg tokens 2151.800 |tokens/s 8162.507 |walltime 14974.764 | +Transformer | epoch 0 | step 56770 |avg loss 7.849 |avg tokens 2315.200 |tokens/s 8663.248 |walltime 14977.437 | +Transformer | epoch 0 | step 56780 |avg loss 7.802 |avg tokens 2331.000 |tokens/s 8668.516 |walltime 14980.126 | +Transformer | epoch 0 | step 56790 |avg loss 7.916 |avg tokens 2093.500 |tokens/s 8067.525 |walltime 14982.721 | +Transformer | epoch 0 | step 56800 |avg loss 7.952 |avg tokens 2063.800 |tokens/s 8142.986 |walltime 14985.255 | +Transformer | epoch 0 | step 56810 |avg loss 7.904 |avg tokens 1992.900 |tokens/s 7794.763 |walltime 14987.812 | +Transformer | epoch 0 | step 56820 |avg loss 7.750 |avg tokens 2217.500 |tokens/s 8101.633 |walltime 14990.549 | +Transformer | epoch 0 | step 56830 |avg loss 7.894 |avg tokens 2046.700 |tokens/s 8069.320 |walltime 14993.086 | +Transformer | epoch 0 | step 56840 |avg loss 8.004 |avg tokens 2253.700 |tokens/s 8629.882 |walltime 14995.697 | +Transformer | epoch 0 | step 56850 |avg loss 8.001 |avg tokens 1888.100 |tokens/s 7559.151 |walltime 14998.195 | +Transformer | epoch 0 | step 56860 |avg loss 8.016 |avg tokens 2024.000 |tokens/s 7926.302 |walltime 15000.748 | +Transformer | epoch 0 | step 56870 |avg loss 7.217 |avg 
tokens 2243.200 |tokens/s 8356.499 |walltime 15003.433 | +Transformer | epoch 0 | step 56880 |avg loss 7.686 |avg tokens 2206.900 |tokens/s 8209.509 |walltime 15006.121 | +Transformer | epoch 0 | step 56890 |avg loss 7.981 |avg tokens 2252.900 |tokens/s 8789.518 |walltime 15008.684 | +Transformer | epoch 0 | step 56900 |avg loss 7.317 |avg tokens 2205.700 |tokens/s 8113.245 |walltime 15011.403 | +Transformer | epoch 0 | step 56910 |avg loss 7.755 |avg tokens 2335.200 |tokens/s 8694.242 |walltime 15014.089 | +Transformer | epoch 0 | step 56920 |avg loss 8.024 |avg tokens 2160.600 |tokens/s 8473.315 |walltime 15016.639 | +Transformer | epoch 0 | step 56930 |avg loss 7.772 |avg tokens 2389.600 |tokens/s 8637.573 |walltime 15019.405 | +Transformer | epoch 0 | step 56940 |avg loss 7.783 |avg tokens 2225.900 |tokens/s 7927.883 |walltime 15022.213 | +Transformer | epoch 0 | step 56950 |avg loss 8.052 |avg tokens 2288.200 |tokens/s 8852.830 |walltime 15024.798 | +Transformer | epoch 0 | step 56960 |avg loss 7.760 |avg tokens 2037.400 |tokens/s 7941.209 |walltime 15027.363 | +Transformer | epoch 0 | step 56970 |avg loss 7.616 |avg tokens 2308.600 |tokens/s 8582.199 |walltime 15030.053 | +Transformer | epoch 0 | step 56980 |avg loss 7.510 |avg tokens 2252.000 |tokens/s 8409.860 |walltime 15032.731 | +Transformer | epoch 0 | step 56990 |avg loss 7.759 |avg tokens 2260.800 |tokens/s 8731.991 |walltime 15035.320 | +Transformer | epoch 0 | step 57000 |avg loss 8.269 |avg tokens 2110.300 |tokens/s 8195.182 |walltime 15037.895 | +Transformer | epoch 0 | step 57010 |avg loss 7.732 |avg tokens 2195.700 |tokens/s 8235.339 |walltime 15040.561 | +Transformer | epoch 0 | step 57020 |avg loss 7.915 |avg tokens 2248.400 |tokens/s 8333.036 |walltime 15043.259 | +Transformer | epoch 0 | step 57030 |avg loss 7.623 |avg tokens 2287.400 |tokens/s 8550.362 |walltime 15045.935 | +Transformer | epoch 0 | step 57040 |avg loss 7.886 |avg tokens 2254.600 |tokens/s 8141.602 |walltime 15048.704 | 
+Transformer | epoch 0 | step 57050 |avg loss 7.855 |avg tokens 2184.100 |tokens/s 8330.270 |walltime 15051.326 | +Transformer | epoch 0 | step 57060 |avg loss 7.729 |avg tokens 2102.400 |tokens/s 8219.915 |walltime 15053.883 | +Transformer | epoch 0 | step 57070 |avg loss 7.684 |avg tokens 2254.900 |tokens/s 8265.760 |walltime 15056.612 | +Transformer | epoch 0 | step 57080 |avg loss 7.892 |avg tokens 1951.200 |tokens/s 8056.603 |walltime 15059.033 | +Transformer | epoch 0 | step 57090 |avg loss 7.783 |avg tokens 2146.400 |tokens/s 8087.895 |walltime 15061.687 | +Transformer | epoch 0 | step 57100 |avg loss 7.719 |avg tokens 2146.100 |tokens/s 8409.618 |walltime 15064.239 | +Transformer | epoch 0 | step 57110 |avg loss 7.763 |avg tokens 2181.900 |tokens/s 8217.195 |walltime 15066.894 | +Transformer | epoch 0 | step 57120 |avg loss 7.496 |avg tokens 2224.900 |tokens/s 8371.407 |walltime 15069.552 | +Transformer | epoch 0 | step 57130 |avg loss 7.949 |avg tokens 2144.000 |tokens/s 8117.590 |walltime 15072.193 | +Transformer | epoch 0 | step 57140 |avg loss 8.002 |avg tokens 1909.400 |tokens/s 7501.321 |walltime 15074.739 | +Transformer | epoch 0 | step 57150 |avg loss 7.344 |avg tokens 2291.200 |tokens/s 8509.997 |walltime 15077.431 | +Transformer | epoch 0 | step 57160 |avg loss 7.597 |avg tokens 2287.200 |tokens/s 8455.671 |walltime 15080.136 | +Transformer | epoch 0 | step 57170 |avg loss 7.914 |avg tokens 2331.900 |tokens/s 9034.906 |walltime 15082.717 | +Transformer | epoch 0 | step 57180 |avg loss 7.811 |avg tokens 2206.400 |tokens/s 8592.253 |walltime 15085.285 | +Transformer | epoch 0 | step 57190 |avg loss 7.601 |avg tokens 2233.000 |tokens/s 8487.499 |walltime 15087.916 | +Transformer | epoch 0 | step 57200 |avg loss 7.863 |avg tokens 2318.100 |tokens/s 8577.217 |walltime 15090.619 | +Transformer | epoch 0 | step 57210 |avg loss 7.659 |avg tokens 2008.600 |tokens/s 7867.084 |walltime 15093.172 | +Transformer | epoch 0 | step 57220 |avg loss 7.714 |avg 
tokens 2138.400 |tokens/s 7936.714 |walltime 15095.866 | +Transformer | epoch 0 | step 57230 |avg loss 7.469 |avg tokens 2217.800 |tokens/s 8109.110 |walltime 15098.601 | +Transformer | epoch 0 | step 57240 |avg loss 7.745 |avg tokens 2280.800 |tokens/s 8417.899 |walltime 15101.310 | +Transformer | epoch 0 | step 57250 |avg loss 8.066 |avg tokens 2107.400 |tokens/s 8474.840 |walltime 15103.797 | +Transformer | epoch 0 | step 57260 |avg loss 7.911 |avg tokens 1917.000 |tokens/s 7609.880 |walltime 15106.316 | +Transformer | epoch 0 | step 57270 |avg loss 7.844 |avg tokens 2037.200 |tokens/s 8149.419 |walltime 15108.816 | +Transformer | epoch 0 | step 57280 |avg loss 7.678 |avg tokens 2310.300 |tokens/s 8335.380 |walltime 15111.588 | +Transformer | epoch 0 | step 57290 |avg loss 7.580 |avg tokens 2145.600 |tokens/s 8042.132 |walltime 15114.256 | +Transformer | epoch 0 | step 57300 |avg loss 7.742 |avg tokens 2251.900 |tokens/s 8435.855 |walltime 15116.925 | +Transformer | epoch 0 | step 57310 |avg loss 7.600 |avg tokens 2288.000 |tokens/s 8526.896 |walltime 15119.608 | +Transformer | epoch 0 | step 57320 |avg loss 7.896 |avg tokens 2098.700 |tokens/s 8133.091 |walltime 15122.189 | +Transformer | epoch 0 | step 57330 |avg loss 7.828 |avg tokens 2073.000 |tokens/s 8299.344 |walltime 15124.687 | +Transformer | epoch 0 | step 57340 |avg loss 8.016 |avg tokens 2206.200 |tokens/s 8431.261 |walltime 15127.303 | +Transformer | epoch 0 | step 57350 |avg loss 7.973 |avg tokens 1955.300 |tokens/s 7709.212 |walltime 15129.840 | +Transformer | epoch 0 | step 57360 |avg loss 7.446 |avg tokens 2324.800 |tokens/s 8305.357 |walltime 15132.639 | +Transformer | epoch 0 | step 57370 |avg loss 7.438 |avg tokens 2238.900 |tokens/s 8524.217 |walltime 15135.265 | +Transformer | epoch 0 | step 57380 |avg loss 7.900 |avg tokens 2003.100 |tokens/s 8016.211 |walltime 15137.764 | +Transformer | epoch 0 | step 57390 |avg loss 7.397 |avg tokens 2292.400 |tokens/s 8364.768 |walltime 15140.505 | 
+Transformer | epoch 0 | step 57400 |avg loss 7.836 |avg tokens 2140.600 |tokens/s 8168.957 |walltime 15143.125 | +Transformer | epoch 0 | step 57410 |avg loss 7.418 |avg tokens 2306.800 |tokens/s 8362.694 |walltime 15145.884 | +Transformer | epoch 0 | step 57420 |avg loss 7.224 |avg tokens 2208.900 |tokens/s 8178.905 |walltime 15148.584 | +Transformer | epoch 0 | step 57430 |avg loss 7.892 |avg tokens 2003.400 |tokens/s 8025.814 |walltime 15151.081 | +Transformer | epoch 0 | step 57440 |avg loss 7.553 |avg tokens 1988.700 |tokens/s 7667.870 |walltime 15153.674 | +Transformer | epoch 0 | step 57450 |avg loss 7.718 |avg tokens 2276.700 |tokens/s 8827.294 |walltime 15156.253 | +Transformer | epoch 0 | step 57460 |avg loss 7.516 |avg tokens 2204.300 |tokens/s 8157.143 |walltime 15158.956 | +Transformer | epoch 0 | step 57470 |avg loss 7.967 |avg tokens 2205.800 |tokens/s 8481.402 |walltime 15161.556 | +Transformer | epoch 0 | step 57480 |avg loss 7.797 |avg tokens 2071.600 |tokens/s 7789.883 |walltime 15164.216 | +Transformer | epoch 0 | step 57490 |avg loss 7.570 |avg tokens 2090.400 |tokens/s 8178.455 |walltime 15166.772 | +Transformer | epoch 0 | step 57500 |avg loss 7.893 |avg tokens 2110.400 |tokens/s 8362.465 |walltime 15169.295 | +Transformer | epoch 0 | step 57510 |avg loss 7.805 |avg tokens 2127.100 |tokens/s 8424.989 |walltime 15171.820 | +Transformer | epoch 0 | step 57520 |avg loss 7.713 |avg tokens 2162.500 |tokens/s 8060.537 |walltime 15174.503 | +Transformer | epoch 0 | step 57530 |avg loss 7.835 |avg tokens 2066.000 |tokens/s 8139.386 |walltime 15177.041 | +Transformer | epoch 0 | step 57540 |avg loss 7.752 |avg tokens 2038.400 |tokens/s 7962.499 |walltime 15179.601 | +Transformer | epoch 0 | step 57550 |avg loss 7.855 |avg tokens 2050.700 |tokens/s 8036.920 |walltime 15182.153 | +Transformer | epoch 0 | step 57560 |avg loss 7.965 |avg tokens 2176.100 |tokens/s 8419.837 |walltime 15184.737 | +Transformer | epoch 0 | step 57570 |avg loss 7.653 |avg 
tokens 2265.800 |tokens/s 8185.461 |walltime 15187.505 | +Transformer | epoch 0 | step 57580 |avg loss 7.646 |avg tokens 2352.800 |tokens/s 8706.772 |walltime 15190.208 | +Transformer | epoch 0 | step 57590 |avg loss 7.647 |avg tokens 2008.100 |tokens/s 7559.851 |walltime 15192.864 | +Transformer | epoch 0 | step 57600 |avg loss 7.564 |avg tokens 2334.400 |tokens/s 8495.369 |walltime 15195.612 | +Transformer | epoch 0 | step 57610 |avg loss 7.738 |avg tokens 2400.000 |tokens/s 8659.538 |walltime 15198.383 | +Transformer | epoch 0 | step 57620 |avg loss 7.777 |avg tokens 2317.600 |tokens/s 8570.977 |walltime 15201.087 | +Transformer | epoch 0 | step 57630 |avg loss 7.283 |avg tokens 2198.000 |tokens/s 8253.815 |walltime 15203.750 | +Transformer | epoch 0 | step 57640 |avg loss 7.260 |avg tokens 2373.400 |tokens/s 8474.980 |walltime 15206.551 | +Transformer | epoch 0 | step 57650 |avg loss 7.700 |avg tokens 2195.900 |tokens/s 8260.258 |walltime 15209.209 | +Transformer | epoch 0 | step 57660 |avg loss 7.666 |avg tokens 2022.400 |tokens/s 7832.741 |walltime 15211.791 | +Transformer | epoch 0 | step 57670 |avg loss 7.761 |avg tokens 2206.700 |tokens/s 8379.252 |walltime 15214.425 | +Transformer | epoch 0 | step 57680 |avg loss 7.849 |avg tokens 1884.500 |tokens/s 7555.777 |walltime 15216.919 | +Transformer | epoch 0 | step 57690 |avg loss 7.796 |avg tokens 2049.000 |tokens/s 7954.727 |walltime 15219.495 | +Transformer | epoch 0 | step 57700 |avg loss 7.434 |avg tokens 2350.800 |tokens/s 8456.284 |walltime 15222.275 | +Transformer | epoch 0 | step 57710 |avg loss 7.849 |avg tokens 2190.400 |tokens/s 8262.548 |walltime 15224.926 | +Transformer | epoch 0 | step 57720 |avg loss 7.967 |avg tokens 1999.800 |tokens/s 7776.948 |walltime 15227.497 | +Transformer | epoch 0 | step 57730 |avg loss 7.745 |avg tokens 2347.200 |tokens/s 8665.132 |walltime 15230.206 | +Transformer | epoch 0 | step 57740 |avg loss 8.027 |avg tokens 1955.200 |tokens/s 7868.533 |walltime 15232.691 | 
+Transformer | epoch 0 | step 57750 |avg loss 7.717 |avg tokens 2103.200 |tokens/s 7887.396 |walltime 15235.357 | +Transformer | epoch 0 | step 57760 |avg loss 7.810 |avg tokens 2342.900 |tokens/s 8564.498 |walltime 15238.093 | +Transformer | epoch 0 | step 57770 |avg loss 7.856 |avg tokens 2063.900 |tokens/s 7977.222 |walltime 15240.680 | +Transformer | epoch 0 | step 57780 |avg loss 7.679 |avg tokens 2257.600 |tokens/s 8411.965 |walltime 15243.364 | +Transformer | epoch 0 | step 57790 |avg loss 7.866 |avg tokens 2100.500 |tokens/s 8061.039 |walltime 15245.970 | +Transformer | epoch 0 | step 57800 |avg loss 8.002 |avg tokens 2020.700 |tokens/s 8023.275 |walltime 15248.488 | +Transformer | epoch 0 | step 57810 |avg loss 7.788 |avg tokens 2248.500 |tokens/s 8512.667 |walltime 15251.129 | +Transformer | epoch 0 | step 57820 |avg loss 7.409 |avg tokens 2385.300 |tokens/s 8443.499 |walltime 15253.955 | +Transformer | epoch 0 | step 57830 |avg loss 7.319 |avg tokens 2348.000 |tokens/s 8545.406 |walltime 15256.702 | +Transformer | epoch 0 | step 57840 |avg loss 7.866 |avg tokens 2323.500 |tokens/s 8486.865 |walltime 15259.440 | +Transformer | epoch 0 | step 57850 |avg loss 7.781 |avg tokens 2158.600 |tokens/s 8064.335 |walltime 15262.117 | +Transformer | epoch 0 | step 57860 |avg loss 7.679 |avg tokens 1851.700 |tokens/s 7318.624 |walltime 15264.647 | +Transformer | epoch 0 | step 57870 |avg loss 7.703 |avg tokens 1912.300 |tokens/s 7617.021 |walltime 15267.157 | +Transformer | epoch 0 | step 57880 |avg loss 7.736 |avg tokens 2157.900 |tokens/s 8209.256 |walltime 15269.786 | +Transformer | epoch 0 | step 57890 |avg loss 7.639 |avg tokens 2290.300 |tokens/s 8528.639 |walltime 15272.471 | +Transformer | epoch 0 | step 57900 |avg loss 7.504 |avg tokens 2294.400 |tokens/s 8529.021 |walltime 15275.162 | +Transformer | epoch 0 | step 57910 |avg loss 7.717 |avg tokens 2257.900 |tokens/s 8276.274 |walltime 15277.890 | +Transformer | epoch 0 | step 57920 |avg loss 8.433 |avg 
tokens 1787.800 |tokens/s 8009.046 |walltime 15280.122 | +Transformer | epoch 0 | step 57930 |avg loss 8.097 |avg tokens 2292.900 |tokens/s 8769.156 |walltime 15282.737 | +Transformer | epoch 0 | step 57940 |avg loss 8.121 |avg tokens 1953.900 |tokens/s 8003.760 |walltime 15285.178 | +Transformer | epoch 0 | step 57950 |avg loss 7.629 |avg tokens 2119.200 |tokens/s 7910.979 |walltime 15287.857 | +Transformer | epoch 0 | step 57960 |avg loss 7.788 |avg tokens 2085.600 |tokens/s 8002.974 |walltime 15290.463 | +Transformer | epoch 0 | step 57970 |avg loss 7.787 |avg tokens 1967.200 |tokens/s 7655.568 |walltime 15293.032 | +Transformer | epoch 0 | step 57980 |avg loss 8.006 |avg tokens 2094.400 |tokens/s 8192.625 |walltime 15295.589 | +Transformer | epoch 0 | step 57990 |avg loss 8.022 |avg tokens 2307.200 |tokens/s 8611.392 |walltime 15298.268 | +Transformer | epoch 0 | step 58000 |avg loss 7.691 |avg tokens 2132.800 |tokens/s 8039.204 |walltime 15300.921 | +Transformer | epoch 0 | step 58010 |avg loss 7.921 |avg tokens 2359.500 |tokens/s 8706.769 |walltime 15303.631 | +Transformer | epoch 0 | step 58020 |avg loss 8.070 |avg tokens 1920.600 |tokens/s 7743.024 |walltime 15306.111 | +Transformer | epoch 0 | step 58030 |avg loss 7.510 |avg tokens 2187.700 |tokens/s 8186.578 |walltime 15308.784 | +Transformer | epoch 0 | step 58040 |avg loss 7.796 |avg tokens 2382.600 |tokens/s 9146.683 |walltime 15311.389 | +Transformer | epoch 0 | step 58050 |avg loss 8.087 |avg tokens 1961.200 |tokens/s 7923.868 |walltime 15313.864 | +Transformer | epoch 0 | step 58060 |avg loss 7.979 |avg tokens 2169.900 |tokens/s 8327.485 |walltime 15316.469 | +Transformer | epoch 0 | step 58070 |avg loss 7.648 |avg tokens 2184.300 |tokens/s 8062.314 |walltime 15319.179 | +Transformer | epoch 0 | step 58080 |avg loss 7.648 |avg tokens 2267.600 |tokens/s 8462.026 |walltime 15321.858 | +Transformer | epoch 0 | step 58090 |avg loss 7.966 |avg tokens 2139.300 |tokens/s 8274.592 |walltime 15324.444 | 
+Transformer | epoch 0 | step 58100 |avg loss 7.560 |avg tokens 2254.400 |tokens/s 8261.020 |walltime 15327.173 | +Transformer | epoch 0 | step 58110 |avg loss 7.886 |avg tokens 2096.900 |tokens/s 8164.888 |walltime 15329.741 | +Transformer | epoch 0 | step 58120 |avg loss 7.480 |avg tokens 2097.500 |tokens/s 8096.803 |walltime 15332.332 | +Transformer | epoch 0 | step 58130 |avg loss 7.581 |avg tokens 2115.400 |tokens/s 7917.828 |walltime 15335.003 | +Transformer | epoch 0 | step 58140 |avg loss 7.614 |avg tokens 2099.300 |tokens/s 7931.039 |walltime 15337.650 | +Transformer | epoch 0 | step 58150 |avg loss 7.983 |avg tokens 2240.700 |tokens/s 8743.003 |walltime 15340.213 | +Transformer | epoch 0 | step 58160 |avg loss 7.676 |avg tokens 2373.000 |tokens/s 8459.249 |walltime 15343.018 | +Transformer | epoch 0 | step 58170 |avg loss 7.310 |avg tokens 2223.500 |tokens/s 8297.420 |walltime 15345.698 | +Transformer | epoch 0 | step 58180 |avg loss 7.658 |avg tokens 2208.400 |tokens/s 8300.609 |walltime 15348.359 | +Transformer | epoch 0 | step 58190 |avg loss 7.988 |avg tokens 2146.700 |tokens/s 8381.886 |walltime 15350.920 | +Transformer | epoch 0 | step 58200 |avg loss 7.901 |avg tokens 2239.900 |tokens/s 8662.212 |walltime 15353.505 | +Transformer | epoch 0 | step 58210 |avg loss 7.991 |avg tokens 2278.600 |tokens/s 8691.320 |walltime 15356.127 | +Transformer | epoch 0 | step 58220 |avg loss 7.670 |avg tokens 2347.500 |tokens/s 8635.918 |walltime 15358.845 | +Transformer | epoch 0 | step 58230 |avg loss 7.602 |avg tokens 2378.400 |tokens/s 8627.409 |walltime 15361.602 | +Transformer | epoch 0 | step 58240 |avg loss 7.654 |avg tokens 2082.200 |tokens/s 8077.089 |walltime 15364.180 | +Transformer | epoch 0 | step 58250 |avg loss 7.114 |avg tokens 2434.400 |tokens/s 8842.112 |walltime 15366.933 | +Transformer | epoch 0 | step 58260 |avg loss 7.700 |avg tokens 2199.200 |tokens/s 8413.716 |walltime 15369.547 | +Transformer | epoch 0 | step 58270 |avg loss 7.888 |avg 
tokens 2218.300 |tokens/s 8308.004 |walltime 15372.217 | +Transformer | epoch 0 | step 58280 |avg loss 8.020 |avg tokens 2105.500 |tokens/s 8205.233 |walltime 15374.783 | +Transformer | epoch 0 | step 58290 |avg loss 7.605 |avg tokens 2134.500 |tokens/s 8193.354 |walltime 15377.388 | +Transformer | epoch 0 | step 58300 |avg loss 7.826 |avg tokens 1938.100 |tokens/s 7603.973 |walltime 15379.937 | +Transformer | epoch 0 | step 58310 |avg loss 8.101 |avg tokens 2102.600 |tokens/s 8550.871 |walltime 15382.396 | +Transformer | epoch 0 | step 58320 |avg loss 8.072 |avg tokens 2166.600 |tokens/s 8241.947 |walltime 15385.025 | +Transformer | epoch 0 | step 58330 |avg loss 7.703 |avg tokens 2282.900 |tokens/s 8371.804 |walltime 15387.752 | +Transformer | epoch 0 | step 58340 |avg loss 7.948 |avg tokens 2016.600 |tokens/s 8239.631 |walltime 15390.199 | +Transformer | epoch 0 | step 58350 |avg loss 8.011 |avg tokens 2210.400 |tokens/s 8654.203 |walltime 15392.753 | +Transformer | epoch 0 | step 58360 |avg loss 7.507 |avg tokens 2326.400 |tokens/s 8391.434 |walltime 15395.526 | +Transformer | epoch 0 | step 58370 |avg loss 7.570 |avg tokens 2262.400 |tokens/s 8285.404 |walltime 15398.256 | +Transformer | epoch 0 | step 58380 |avg loss 8.114 |avg tokens 2004.700 |tokens/s 7828.860 |walltime 15400.817 | +Transformer | epoch 0 | step 58390 |avg loss 7.862 |avg tokens 2348.200 |tokens/s 8786.430 |walltime 15403.490 | +Transformer | epoch 0 | step 58400 |avg loss 7.753 |avg tokens 2374.800 |tokens/s 8719.468 |walltime 15406.213 | +Transformer | epoch 0 | step 58410 |avg loss 7.538 |avg tokens 2070.400 |tokens/s 7759.902 |walltime 15408.881 | +Transformer | epoch 0 | step 58420 |avg loss 7.574 |avg tokens 2305.200 |tokens/s 8498.679 |walltime 15411.594 | +Transformer | epoch 0 | step 58430 |avg loss 7.918 |avg tokens 2301.400 |tokens/s 8585.051 |walltime 15414.274 | +Transformer | epoch 0 | step 58440 |avg loss 7.809 |avg tokens 2247.600 |tokens/s 8467.791 |walltime 15416.929 | 
+Transformer | epoch 0 | step 58450 |avg loss 7.656 |avg tokens 2217.200 |tokens/s 8299.885 |walltime 15419.600 | +Transformer | epoch 0 | step 58460 |avg loss 7.649 |avg tokens 2185.600 |tokens/s 8228.898 |walltime 15422.256 | +Transformer | epoch 0 | step 58470 |avg loss 7.983 |avg tokens 2068.600 |tokens/s 8118.877 |walltime 15424.804 | +Transformer | epoch 0 | step 58480 |avg loss 7.661 |avg tokens 2281.700 |tokens/s 8353.496 |walltime 15427.535 | +Transformer | epoch 0 | step 58490 |avg loss 8.348 |avg tokens 1850.800 |tokens/s 7790.460 |walltime 15429.911 | +Transformer | epoch 0 | step 58500 |avg loss 8.026 |avg tokens 2033.700 |tokens/s 8180.305 |walltime 15432.397 | +Transformer | epoch 0 | step 58510 |avg loss 7.414 |avg tokens 2189.200 |tokens/s 8244.948 |walltime 15435.052 | +Transformer | epoch 0 | step 58520 |avg loss 7.667 |avg tokens 2087.300 |tokens/s 7963.613 |walltime 15437.673 | +Transformer | epoch 0 | step 58530 |avg loss 7.694 |avg tokens 2238.400 |tokens/s 8191.342 |walltime 15440.406 | +Transformer | epoch 0 | step 58540 |avg loss 7.792 |avg tokens 2162.300 |tokens/s 8322.983 |walltime 15443.004 | +Transformer | epoch 0 | step 58550 |avg loss 7.824 |avg tokens 2251.400 |tokens/s 8445.705 |walltime 15445.670 | +Transformer | epoch 0 | step 58560 |avg loss 7.418 |avg tokens 2288.000 |tokens/s 8428.224 |walltime 15448.385 | +Transformer | epoch 0 | step 58570 |avg loss 7.768 |avg tokens 2212.200 |tokens/s 8084.804 |walltime 15451.121 | +Transformer | epoch 0 | step 58580 |avg loss 7.767 |avg tokens 2163.200 |tokens/s 8227.908 |walltime 15453.750 | +Transformer | epoch 0 | step 58590 |avg loss 7.634 |avg tokens 2240.200 |tokens/s 8215.160 |walltime 15456.477 | +Transformer | epoch 0 | step 58600 |avg loss 7.621 |avg tokens 2357.200 |tokens/s 8795.209 |walltime 15459.157 | +Transformer | epoch 0 | step 58610 |avg loss 8.035 |avg tokens 1858.200 |tokens/s 7767.067 |walltime 15461.549 | +Transformer | epoch 0 | step 58620 |avg loss 8.156 |avg 
tokens 2022.400 |tokens/s 8140.876 |walltime 15464.034 | +Transformer | epoch 0 | step 58630 |avg loss 7.330 |avg tokens 2410.600 |tokens/s 8587.979 |walltime 15466.841 | +Transformer | epoch 0 | step 58640 |avg loss 7.821 |avg tokens 2329.800 |tokens/s 8816.132 |walltime 15469.483 | +Transformer | epoch 0 | step 58650 |avg loss 7.656 |avg tokens 2132.500 |tokens/s 8246.519 |walltime 15472.069 | +Transformer | epoch 0 | step 58660 |avg loss 7.696 |avg tokens 2151.300 |tokens/s 8289.424 |walltime 15474.664 | +Transformer | epoch 0 | step 58670 |avg loss 7.605 |avg tokens 2234.600 |tokens/s 8234.319 |walltime 15477.378 | +Transformer | epoch 0 | step 58680 |avg loss 8.168 |avg tokens 2097.700 |tokens/s 8143.961 |walltime 15479.954 | +Transformer | epoch 0 | step 58690 |avg loss 7.936 |avg tokens 1964.100 |tokens/s 8546.252 |walltime 15482.252 | +Transformer | epoch 0 | step 58700 |avg loss 8.026 |avg tokens 1964.400 |tokens/s 7780.727 |walltime 15484.777 | +Transformer | epoch 0 | step 58710 |avg loss 7.974 |avg tokens 1774.500 |tokens/s 7475.191 |walltime 15487.151 | +Transformer | epoch 0 | step 58720 |avg loss 7.951 |avg tokens 2193.800 |tokens/s 8510.516 |walltime 15489.728 | +Transformer | epoch 0 | step 58730 |avg loss 7.780 |avg tokens 2180.200 |tokens/s 8260.940 |walltime 15492.368 | +Transformer | epoch 0 | step 58740 |avg loss 8.115 |avg tokens 2194.800 |tokens/s 8817.379 |walltime 15494.857 | +Transformer | epoch 0 | step 58750 |avg loss 7.758 |avg tokens 2358.700 |tokens/s 9148.203 |walltime 15497.435 | +Transformer | epoch 0 | step 58760 |avg loss 7.530 |avg tokens 2340.200 |tokens/s 8400.935 |walltime 15500.221 | +Transformer | epoch 0 | step 58770 |avg loss 7.810 |avg tokens 2267.700 |tokens/s 8437.656 |walltime 15502.908 | +Transformer | epoch 0 | step 58780 |avg loss 7.612 |avg tokens 2292.800 |tokens/s 8324.297 |walltime 15505.663 | +Transformer | epoch 0 | step 58790 |avg loss 7.673 |avg tokens 2280.900 |tokens/s 8538.804 |walltime 15508.334 | 
+Transformer | epoch 0 | step 58800 |avg loss 7.913 |avg tokens 2064.600 |tokens/s 8048.033 |walltime 15510.899 | +Transformer | epoch 0 | step 58810 |avg loss 7.973 |avg tokens 2284.000 |tokens/s 8954.211 |walltime 15513.450 | +Transformer | epoch 0 | step 58820 |avg loss 7.698 |avg tokens 2298.900 |tokens/s 8565.411 |walltime 15516.134 | +Transformer | epoch 0 | step 58830 |avg loss 7.977 |avg tokens 1933.300 |tokens/s 7854.420 |walltime 15518.595 | +Transformer | epoch 0 | step 58840 |avg loss 8.041 |avg tokens 2110.000 |tokens/s 8283.720 |walltime 15521.143 | +Transformer | epoch 0 | step 58850 |avg loss 7.743 |avg tokens 2131.700 |tokens/s 8046.175 |walltime 15523.792 | +Transformer | epoch 0 | step 58860 |avg loss 7.913 |avg tokens 2121.000 |tokens/s 8161.427 |walltime 15526.391 | +Transformer | epoch 0 | step 58870 |avg loss 7.713 |avg tokens 2292.500 |tokens/s 8474.469 |walltime 15529.096 | +Transformer | epoch 0 | step 58880 |avg loss 7.407 |avg tokens 2374.400 |tokens/s 8563.196 |walltime 15531.869 | +Transformer | epoch 0 | step 58890 |avg loss 7.426 |avg tokens 2287.300 |tokens/s 8291.499 |walltime 15534.627 | +Transformer | epoch 0 | step 58900 |avg loss 7.980 |avg tokens 1972.400 |tokens/s 7917.389 |walltime 15537.119 | +Transformer | epoch 0 | step 58910 |avg loss 7.549 |avg tokens 2315.600 |tokens/s 8405.743 |walltime 15539.873 | +Transformer | epoch 0 | step 58920 |avg loss 7.951 |avg tokens 2063.400 |tokens/s 8134.949 |walltime 15542.410 | +Transformer | epoch 0 | step 58930 |avg loss 7.989 |avg tokens 1775.600 |tokens/s 7167.002 |walltime 15544.887 | +Transformer | epoch 0 | step 58940 |avg loss 7.688 |avg tokens 2163.900 |tokens/s 8403.709 |walltime 15547.462 | +Transformer | epoch 0 | step 58950 |avg loss 7.530 |avg tokens 2344.800 |tokens/s 8658.722 |walltime 15550.170 | +Transformer | epoch 0 | step 58960 |avg loss 7.699 |avg tokens 2176.000 |tokens/s 8161.145 |walltime 15552.837 | +Transformer | epoch 0 | step 58970 |avg loss 7.727 |avg 
tokens 2193.000 |tokens/s 8413.329 |walltime 15555.443 | +Transformer | epoch 0 | step 58980 |avg loss 7.680 |avg tokens 2180.000 |tokens/s 8083.247 |walltime 15558.140 | +Transformer | epoch 0 | step 58990 |avg loss 7.978 |avg tokens 2014.700 |tokens/s 7714.702 |walltime 15560.752 | +Transformer | epoch 0 | step 59000 |avg loss 7.773 |avg tokens 2088.800 |tokens/s 7994.049 |walltime 15563.365 | +Transformer | epoch 0 | step 59010 |avg loss 7.339 |avg tokens 2368.000 |tokens/s 8424.034 |walltime 15566.176 | +Transformer | epoch 0 | step 59020 |avg loss 7.728 |avg tokens 2087.900 |tokens/s 8066.734 |walltime 15568.764 | +Transformer | epoch 0 | step 59030 |avg loss 7.574 |avg tokens 2246.400 |tokens/s 8243.569 |walltime 15571.489 | +Transformer | epoch 0 | step 59040 |avg loss 8.260 |avg tokens 1996.400 |tokens/s 8225.046 |walltime 15573.916 | +Transformer | epoch 0 | step 59050 |avg loss 8.077 |avg tokens 2177.400 |tokens/s 8616.918 |walltime 15576.443 | +Transformer | epoch 0 | step 59060 |avg loss 7.559 |avg tokens 2317.600 |tokens/s 8825.351 |walltime 15579.069 | +Transformer | epoch 0 | step 59070 |avg loss 7.683 |avg tokens 2114.300 |tokens/s 8204.691 |walltime 15581.646 | +Transformer | epoch 0 | step 59080 |avg loss 7.696 |avg tokens 2299.800 |tokens/s 8456.983 |walltime 15584.365 | +Transformer | epoch 0 | step 59090 |avg loss 8.013 |avg tokens 2113.500 |tokens/s 8337.370 |walltime 15586.900 | +Transformer | epoch 0 | step 59100 |avg loss 7.503 |avg tokens 2162.400 |tokens/s 7981.385 |walltime 15589.610 | +Transformer | epoch 0 | step 59110 |avg loss 7.696 |avg tokens 2144.000 |tokens/s 8271.286 |walltime 15592.202 | +Transformer | epoch 0 | step 59120 |avg loss 7.984 |avg tokens 2132.300 |tokens/s 8197.518 |walltime 15594.803 | +Transformer | epoch 0 | step 59130 |avg loss 7.618 |avg tokens 2222.700 |tokens/s 8255.801 |walltime 15597.495 | +Transformer | epoch 0 | step 59140 |avg loss 7.762 |avg tokens 2167.800 |tokens/s 8087.690 |walltime 15600.176 | 
+Transformer | epoch 0 | step 59150 |avg loss 8.028 |avg tokens 1887.100 |tokens/s 7616.932 |walltime 15602.653 | +Transformer | epoch 0 | step 59160 |avg loss 7.780 |avg tokens 2134.900 |tokens/s 8161.901 |walltime 15605.269 | +Transformer | epoch 0 | step 59170 |avg loss 7.828 |avg tokens 2126.900 |tokens/s 8122.134 |walltime 15607.887 | +Transformer | epoch 0 | step 59180 |avg loss 7.904 |avg tokens 2264.000 |tokens/s 8536.617 |walltime 15610.540 | +Transformer | epoch 0 | step 59190 |avg loss 7.713 |avg tokens 2334.400 |tokens/s 8586.816 |walltime 15613.258 | +Transformer | epoch 0 | step 59200 |avg loss 7.566 |avg tokens 2091.400 |tokens/s 7889.878 |walltime 15615.909 | +Transformer | epoch 0 | step 59210 |avg loss 7.625 |avg tokens 2316.000 |tokens/s 8434.584 |walltime 15618.655 | +Transformer | epoch 0 | step 59220 |avg loss 7.928 |avg tokens 1836.200 |tokens/s 7488.076 |walltime 15621.107 | +Transformer | epoch 0 | step 59230 |avg loss 7.543 |avg tokens 2085.500 |tokens/s 7775.721 |walltime 15623.789 | +Transformer | epoch 0 | step 59240 |avg loss 7.484 |avg tokens 2353.600 |tokens/s 8355.240 |walltime 15626.606 | +Transformer | epoch 0 | step 59250 |avg loss 7.644 |avg tokens 2247.200 |tokens/s 8533.249 |walltime 15629.239 | +Transformer | epoch 0 | step 59260 |avg loss 7.727 |avg tokens 2198.800 |tokens/s 8349.698 |walltime 15631.873 | +Transformer | epoch 0 | step 59270 |avg loss 7.750 |avg tokens 2098.500 |tokens/s 7745.390 |walltime 15634.582 | +Transformer | epoch 0 | step 59280 |avg loss 7.900 |avg tokens 2221.700 |tokens/s 8505.074 |walltime 15637.194 | +Transformer | epoch 0 | step 59290 |avg loss 8.026 |avg tokens 1789.100 |tokens/s 7489.585 |walltime 15639.583 | +Transformer | epoch 0 | step 59300 |avg loss 7.923 |avg tokens 2004.000 |tokens/s 7954.649 |walltime 15642.102 | +Transformer | epoch 0 | step 59310 |avg loss 8.018 |avg tokens 2126.000 |tokens/s 8346.775 |walltime 15644.649 | +Transformer | epoch 0 | step 59320 |avg loss 8.161 |avg 
tokens 2054.600 |tokens/s 8286.294 |walltime 15647.129 | +Transformer | epoch 0 | step 59330 |avg loss 7.723 |avg tokens 2204.800 |tokens/s 8240.160 |walltime 15649.805 | +Transformer | epoch 0 | step 59340 |avg loss 7.628 |avg tokens 2045.200 |tokens/s 7899.277 |walltime 15652.394 | +Transformer | epoch 0 | step 59350 |avg loss 7.417 |avg tokens 2396.200 |tokens/s 8668.137 |walltime 15655.158 | +Transformer | epoch 0 | step 59360 |avg loss 7.860 |avg tokens 2262.700 |tokens/s 8462.196 |walltime 15657.832 | +Transformer | epoch 0 | step 59370 |avg loss 7.762 |avg tokens 2414.000 |tokens/s 8746.335 |walltime 15660.592 | +Transformer | epoch 0 | step 59380 |avg loss 7.947 |avg tokens 2086.500 |tokens/s 7955.563 |walltime 15663.215 | +Transformer | epoch 0 | step 59390 |avg loss 7.848 |avg tokens 2197.700 |tokens/s 8314.225 |walltime 15665.858 | +Transformer | epoch 0 | step 59400 |avg loss 7.886 |avg tokens 1933.200 |tokens/s 7729.975 |walltime 15668.359 | +Transformer | epoch 0 | step 59410 |avg loss 7.920 |avg tokens 2251.900 |tokens/s 8517.390 |walltime 15671.003 | +Transformer | epoch 0 | step 59420 |avg loss 7.846 |avg tokens 2135.700 |tokens/s 8159.033 |walltime 15673.620 | +Transformer | epoch 0 | step 59430 |avg loss 7.688 |avg tokens 2181.600 |tokens/s 8182.014 |walltime 15676.287 | +Transformer | epoch 0 | step 59440 |avg loss 7.823 |avg tokens 2189.300 |tokens/s 8372.906 |walltime 15678.902 | +Transformer | epoch 0 | step 59450 |avg loss 7.746 |avg tokens 1931.000 |tokens/s 7544.191 |walltime 15681.461 | +Transformer | epoch 0 | step 59460 |avg loss 7.542 |avg tokens 2113.900 |tokens/s 8071.155 |walltime 15684.080 | +Transformer | epoch 0 | step 59470 |avg loss 7.749 |avg tokens 2316.500 |tokens/s 8496.720 |walltime 15686.807 | +Transformer | epoch 0 | step 59480 |avg loss 7.986 |avg tokens 1988.700 |tokens/s 7645.558 |walltime 15689.408 | +Transformer | epoch 0 | step 59490 |avg loss 7.785 |avg tokens 2094.500 |tokens/s 7857.573 |walltime 15692.073 | 
+Transformer | epoch 0 | step 59500 |avg loss 7.790 |avg tokens 2292.200 |tokens/s 8164.888 |walltime 15694.881 | +Transformer | epoch 0 | step 59510 |avg loss 7.671 |avg tokens 2239.800 |tokens/s 8589.502 |walltime 15697.488 | +Transformer | epoch 0 | step 59520 |avg loss 7.742 |avg tokens 2241.400 |tokens/s 8602.425 |walltime 15700.094 | +Transformer | epoch 0 | step 59530 |avg loss 7.747 |avg tokens 2107.400 |tokens/s 8201.796 |walltime 15702.663 | +Transformer | epoch 0 | step 59540 |avg loss 7.538 |avg tokens 2133.400 |tokens/s 8006.780 |walltime 15705.328 | +Transformer | epoch 0 | step 59550 |avg loss 7.571 |avg tokens 2336.800 |tokens/s 8316.706 |walltime 15708.138 | +Transformer | epoch 0 | step 59560 |avg loss 7.611 |avg tokens 2332.700 |tokens/s 8550.581 |walltime 15710.866 | +Transformer | epoch 0 | step 59570 |avg loss 7.933 |avg tokens 2052.000 |tokens/s 8028.182 |walltime 15713.422 | +Transformer | epoch 0 | step 59580 |avg loss 7.651 |avg tokens 2410.000 |tokens/s 8529.654 |walltime 15716.247 | +Transformer | epoch 0 | step 59590 |avg loss 7.732 |avg tokens 2123.300 |tokens/s 8139.680 |walltime 15718.856 | +Transformer | epoch 0 | step 59600 |avg loss 7.712 |avg tokens 2127.900 |tokens/s 8142.775 |walltime 15721.469 | +Transformer | epoch 0 | step 59610 |avg loss 7.613 |avg tokens 2214.900 |tokens/s 8145.149 |walltime 15724.188 | +Transformer | epoch 0 | step 59620 |avg loss 7.719 |avg tokens 2277.800 |tokens/s 8547.012 |walltime 15726.853 | +Transformer | epoch 0 | step 59630 |avg loss 7.443 |avg tokens 2397.600 |tokens/s 8834.140 |walltime 15729.567 | +Transformer | epoch 0 | step 59640 |avg loss 7.868 |avg tokens 2149.100 |tokens/s 8409.843 |walltime 15732.123 | +Transformer | epoch 0 | step 59650 |avg loss 8.018 |avg tokens 2098.100 |tokens/s 8264.976 |walltime 15734.661 | +Transformer | epoch 0 | step 59660 |avg loss 7.930 |avg tokens 2224.600 |tokens/s 8637.796 |walltime 15737.237 | +Transformer | epoch 0 | step 59670 |avg loss 7.623 |avg 
tokens 2010.600 |tokens/s 7676.929 |walltime 15739.856 | +Transformer | epoch 0 | step 59680 |avg loss 8.019 |avg tokens 2147.600 |tokens/s 8273.702 |walltime 15742.451 | +Transformer | epoch 0 | step 59690 |avg loss 7.474 |avg tokens 2112.000 |tokens/s 8028.828 |walltime 15745.082 | +Transformer | epoch 0 | step 59700 |avg loss 7.683 |avg tokens 2291.800 |tokens/s 8161.816 |walltime 15747.890 | +Transformer | epoch 0 | step 59710 |avg loss 8.017 |avg tokens 2198.700 |tokens/s 8662.598 |walltime 15750.428 | +Transformer | epoch 0 | step 59720 |avg loss 7.582 |avg tokens 2307.200 |tokens/s 8423.066 |walltime 15753.167 | +Transformer | epoch 0 | step 59730 |avg loss 7.493 |avg tokens 1995.100 |tokens/s 7626.141 |walltime 15755.783 | +Transformer | epoch 0 | step 59740 |avg loss 7.792 |avg tokens 2253.900 |tokens/s 8492.653 |walltime 15758.437 | +Transformer | epoch 0 | step 59750 |avg loss 7.355 |avg tokens 2355.200 |tokens/s 8671.881 |walltime 15761.153 | +Transformer | epoch 0 | step 59760 |avg loss 7.812 |avg tokens 2249.800 |tokens/s 8574.336 |walltime 15763.777 | +Transformer | epoch 0 | step 59770 |avg loss 7.688 |avg tokens 2289.400 |tokens/s 8386.800 |walltime 15766.507 | +Transformer | epoch 0 | step 59780 |avg loss 7.669 |avg tokens 2126.400 |tokens/s 8034.393 |walltime 15769.154 | +Transformer | epoch 0 | step 59790 |avg loss 7.695 |avg tokens 2117.600 |tokens/s 8170.101 |walltime 15771.745 | +Transformer | epoch 0 | step 59800 |avg loss 7.697 |avg tokens 1972.100 |tokens/s 7666.456 |walltime 15774.318 | +Transformer | epoch 0 | step 59810 |avg loss 8.076 |avg tokens 2005.800 |tokens/s 8091.120 |walltime 15776.797 | +Transformer | epoch 0 | step 59820 |avg loss 7.214 |avg tokens 2320.000 |tokens/s 8248.051 |walltime 15779.610 | +Transformer | epoch 0 | step 59830 |avg loss 7.992 |avg tokens 2021.600 |tokens/s 8011.610 |walltime 15782.133 | +Transformer | epoch 0 | step 59840 |avg loss 7.472 |avg tokens 2289.300 |tokens/s 8336.535 |walltime 15784.879 | 
+Transformer | epoch 0 | step 59850 |avg loss 7.639 |avg tokens 2145.100 |tokens/s 8147.897 |walltime 15787.512 | +Transformer | epoch 0 | step 59860 |avg loss 7.912 |avg tokens 2177.300 |tokens/s 8319.118 |walltime 15790.129 | +Transformer | epoch 0 | step 59870 |avg loss 7.756 |avg tokens 2225.400 |tokens/s 8545.273 |walltime 15792.733 | +Transformer | epoch 0 | step 59880 |avg loss 8.077 |avg tokens 2185.700 |tokens/s 8552.881 |walltime 15795.289 | +Transformer | epoch 0 | step 59890 |avg loss 7.312 |avg tokens 2237.600 |tokens/s 8099.204 |walltime 15798.051 | +Transformer | epoch 0 | step 59900 |avg loss 7.388 |avg tokens 2277.600 |tokens/s 8396.934 |walltime 15800.764 | +Transformer | epoch 0 | step 59910 |avg loss 7.699 |avg tokens 2048.700 |tokens/s 7788.194 |walltime 15803.394 | +Transformer | epoch 0 | step 59920 |avg loss 7.978 |avg tokens 2263.400 |tokens/s 8523.479 |walltime 15806.050 | +Transformer | epoch 0 | step 59930 |avg loss 7.812 |avg tokens 2193.600 |tokens/s 7986.827 |walltime 15808.796 | +Transformer | epoch 0 | step 59940 |avg loss 8.360 |avg tokens 2115.800 |tokens/s 8676.317 |walltime 15811.235 | +Transformer | epoch 0 | step 59950 |avg loss 7.597 |avg tokens 2235.600 |tokens/s 8244.925 |walltime 15813.947 | +Transformer | epoch 0 | step 59960 |avg loss 7.615 |avg tokens 2150.500 |tokens/s 8059.737 |walltime 15816.615 | +Transformer | epoch 0 | step 59970 |avg loss 7.696 |avg tokens 2167.300 |tokens/s 8285.667 |walltime 15819.230 | +Transformer | epoch 0 | step 59980 |avg loss 7.508 |avg tokens 2134.200 |tokens/s 8013.281 |walltime 15821.894 | +Transformer | epoch 0 | step 59990 |avg loss 7.398 |avg tokens 2388.900 |tokens/s 8555.504 |walltime 15824.686 | +Transformer | epoch 0 | step 60000 |avg loss 7.888 |avg tokens 1833.200 |tokens/s 7351.163 |walltime 15827.180 | +Transformer | epoch 0 | step 60010 |avg loss 7.726 |avg tokens 2222.400 |tokens/s 8461.141 |walltime 15829.806 | +Transformer | epoch 0 | step 60020 |avg loss 7.685 |avg 
tokens 2188.500 |tokens/s 8032.184 |walltime 15832.531 | +Transformer | epoch 0 | step 60030 |avg loss 7.814 |avg tokens 2164.800 |tokens/s 8684.710 |walltime 15835.024 | +Transformer | epoch 0 | step 60040 |avg loss 7.895 |avg tokens 2215.000 |tokens/s 8174.445 |walltime 15837.733 | +Transformer | epoch 0 | step 60050 |avg loss 7.416 |avg tokens 2081.400 |tokens/s 7907.234 |walltime 15840.366 | +Transformer | epoch 0 | step 60060 |avg loss 7.642 |avg tokens 2234.500 |tokens/s 8600.278 |walltime 15842.964 | +Transformer | epoch 0 | step 60070 |avg loss 7.872 |avg tokens 2269.800 |tokens/s 8562.935 |walltime 15845.615 | +Transformer | epoch 0 | step 60080 |avg loss 7.557 |avg tokens 2338.400 |tokens/s 8325.049 |walltime 15848.423 | +Transformer | epoch 0 | step 60090 |avg loss 7.674 |avg tokens 2107.300 |tokens/s 8500.125 |walltime 15850.903 | +Transformer | epoch 0 | step 60100 |avg loss 7.577 |avg tokens 2176.500 |tokens/s 8368.902 |walltime 15853.503 | +Transformer | epoch 0 | step 60110 |avg loss 7.564 |avg tokens 2117.900 |tokens/s 8044.784 |walltime 15856.136 | +Transformer | epoch 0 | step 60120 |avg loss 7.569 |avg tokens 2365.600 |tokens/s 8493.465 |walltime 15858.921 | +Transformer | epoch 0 | step 60130 |avg loss 7.516 |avg tokens 2291.200 |tokens/s 8417.684 |walltime 15861.643 | +Transformer | epoch 0 | step 60140 |avg loss 7.570 |avg tokens 2152.900 |tokens/s 8340.483 |walltime 15864.224 | +Transformer | epoch 0 | step 60150 |avg loss 8.143 |avg tokens 2157.800 |tokens/s 8432.305 |walltime 15866.783 | +Transformer | epoch 0 | step 60160 |avg loss 7.744 |avg tokens 2234.400 |tokens/s 8334.688 |walltime 15869.464 | +Transformer | epoch 0 | step 60170 |avg loss 7.853 |avg tokens 2158.100 |tokens/s 8346.525 |walltime 15872.050 | +Transformer | epoch 0 | step 60180 |avg loss 8.157 |avg tokens 2151.800 |tokens/s 8606.259 |walltime 15874.550 | +Transformer | epoch 0 | step 60190 |avg loss 7.674 |avg tokens 2072.000 |tokens/s 8022.606 |walltime 15877.133 | 
+Transformer | epoch 0 | step 60200 |avg loss 7.652 |avg tokens 2300.800 |tokens/s 8416.391 |walltime 15879.866 | +Transformer | epoch 0 | step 60210 |avg loss 7.407 |avg tokens 2287.200 |tokens/s 8324.423 |walltime 15882.614 | +Transformer | epoch 0 | step 60220 |avg loss 7.679 |avg tokens 2140.100 |tokens/s 8163.104 |walltime 15885.236 | +Transformer | epoch 0 | step 60230 |avg loss 7.695 |avg tokens 2365.200 |tokens/s 8465.477 |walltime 15888.030 | +Transformer | epoch 0 | step 60240 |avg loss 8.333 |avg tokens 1995.500 |tokens/s 8403.509 |walltime 15890.404 | +Transformer | epoch 0 | step 60250 |avg loss 7.718 |avg tokens 2201.600 |tokens/s 8119.022 |walltime 15893.116 | +Transformer | epoch 0 | step 60260 |avg loss 7.736 |avg tokens 1979.000 |tokens/s 7612.136 |walltime 15895.716 | +Transformer | epoch 0 | step 60270 |avg loss 7.213 |avg tokens 2279.200 |tokens/s 8216.445 |walltime 15898.490 | +Transformer | epoch 0 | step 60280 |avg loss 7.475 |avg tokens 2189.400 |tokens/s 8035.914 |walltime 15901.214 | +Transformer | epoch 0 | step 60290 |avg loss 7.752 |avg tokens 2095.600 |tokens/s 8135.294 |walltime 15903.790 | +Transformer | epoch 0 | step 60300 |avg loss 7.745 |avg tokens 2257.300 |tokens/s 8345.682 |walltime 15906.495 | +Transformer | epoch 0 | step 60310 |avg loss 7.418 |avg tokens 2389.600 |tokens/s 8616.111 |walltime 15909.268 | +Transformer | epoch 0 | step 60320 |avg loss 7.624 |avg tokens 2212.700 |tokens/s 8094.870 |walltime 15912.002 | +Transformer | epoch 0 | step 60330 |avg loss 7.669 |avg tokens 2019.600 |tokens/s 7784.669 |walltime 15914.596 | +Transformer | epoch 0 | step 60340 |avg loss 7.716 |avg tokens 2116.800 |tokens/s 8236.165 |walltime 15917.166 | +Transformer | epoch 0 | step 60350 |avg loss 7.989 |avg tokens 2034.400 |tokens/s 7777.636 |walltime 15919.782 | +Transformer | epoch 0 | step 60360 |avg loss 7.940 |avg tokens 2059.800 |tokens/s 7874.747 |walltime 15922.398 | +Transformer | epoch 0 | step 60370 |avg loss 7.584 |avg 
tokens 2341.600 |tokens/s 8364.718 |walltime 15925.197 | +Transformer | epoch 0 | step 60380 |avg loss 7.848 |avg tokens 2227.800 |tokens/s 8607.858 |walltime 15927.785 | +Transformer | epoch 0 | step 60390 |avg loss 7.931 |avg tokens 1816.200 |tokens/s 7365.134 |walltime 15930.251 | +Transformer | epoch 0 | step 60400 |avg loss 7.898 |avg tokens 2138.700 |tokens/s 8189.166 |walltime 15932.863 | +Transformer | epoch 0 | step 60410 |avg loss 7.681 |avg tokens 2091.900 |tokens/s 7975.035 |walltime 15935.486 | +Transformer | epoch 0 | step 60420 |avg loss 7.781 |avg tokens 2167.200 |tokens/s 7945.978 |walltime 15938.213 | +Transformer | epoch 0 | step 60430 |avg loss 7.883 |avg tokens 2381.700 |tokens/s 8829.197 |walltime 15940.911 | +Transformer | epoch 0 | step 60440 |avg loss 7.706 |avg tokens 2312.700 |tokens/s 8910.099 |walltime 15943.506 | +Transformer | epoch 0 | step 60450 |avg loss 7.708 |avg tokens 2035.800 |tokens/s 7886.666 |walltime 15946.088 | +Transformer | epoch 0 | step 60460 |avg loss 8.029 |avg tokens 2218.700 |tokens/s 8406.140 |walltime 15948.727 | +Transformer | epoch 0 | step 60470 |avg loss 8.284 |avg tokens 1769.600 |tokens/s 7882.234 |walltime 15950.972 | +Transformer | epoch 0 | step 60480 |avg loss 7.668 |avg tokens 2303.200 |tokens/s 8254.555 |walltime 15953.762 | +Transformer | epoch 0 | step 60490 |avg loss 7.917 |avg tokens 2327.700 |tokens/s 8625.971 |walltime 15956.461 | +Transformer | epoch 0 | step 60500 |avg loss 7.917 |avg tokens 2153.000 |tokens/s 8488.772 |walltime 15958.997 | +Transformer | epoch 0 | step 60510 |avg loss 8.037 |avg tokens 2224.300 |tokens/s 8431.776 |walltime 15961.635 | +Transformer | epoch 0 | step 60520 |avg loss 7.672 |avg tokens 2321.600 |tokens/s 8451.653 |walltime 15964.382 | +Transformer | epoch 0 | step 60530 |avg loss 7.723 |avg tokens 2194.300 |tokens/s 8345.172 |walltime 15967.011 | +Transformer | epoch 0 | step 60540 |avg loss 7.993 |avg tokens 1894.300 |tokens/s 7489.860 |walltime 15969.541 | 
+Transformer | epoch 0 | step 60550 |avg loss 7.826 |avg tokens 1850.900 |tokens/s 7559.216 |walltime 15971.989 | +Transformer | epoch 0 | step 60560 |avg loss 7.736 |avg tokens 2175.100 |tokens/s 8137.932 |walltime 15974.662 | +Transformer | epoch 0 | step 60570 |avg loss 7.667 |avg tokens 2063.700 |tokens/s 8090.751 |walltime 15977.213 | +Transformer | epoch 0 | step 60580 |avg loss 7.792 |avg tokens 2000.100 |tokens/s 7733.540 |walltime 15979.799 | +Transformer | epoch 0 | step 60590 |avg loss 7.816 |avg tokens 2342.300 |tokens/s 8553.000 |walltime 15982.537 | +Transformer | epoch 0 | step 60600 |avg loss 7.919 |avg tokens 2164.300 |tokens/s 8255.562 |walltime 15985.159 | +Transformer | epoch 0 | step 60610 |avg loss 7.672 |avg tokens 2103.800 |tokens/s 8069.949 |walltime 15987.766 | +Transformer | epoch 0 | step 60620 |avg loss 7.517 |avg tokens 2341.600 |tokens/s 8625.179 |walltime 15990.481 | +Transformer | epoch 0 | step 60630 |avg loss 7.655 |avg tokens 2400.300 |tokens/s 8858.032 |walltime 15993.191 | +Transformer | epoch 0 | step 60640 |avg loss 7.836 |avg tokens 1863.200 |tokens/s 7460.629 |walltime 15995.688 | +Transformer | epoch 0 | step 60650 |avg loss 7.710 |avg tokens 2102.400 |tokens/s 8032.455 |walltime 15998.305 | +Transformer | epoch 0 | step 60660 |avg loss 8.117 |avg tokens 2178.500 |tokens/s 8709.535 |walltime 16000.807 | +Transformer | epoch 0 | step 60670 |avg loss 7.671 |avg tokens 2105.600 |tokens/s 7976.810 |walltime 16003.446 | +Transformer | epoch 0 | step 60680 |avg loss 8.067 |avg tokens 2098.900 |tokens/s 8309.347 |walltime 16005.972 | +Transformer | epoch 0 | step 60690 |avg loss 8.046 |avg tokens 2273.500 |tokens/s 8808.843 |walltime 16008.553 | +Transformer | epoch 0 | step 60700 |avg loss 7.623 |avg tokens 2347.900 |tokens/s 8495.174 |walltime 16011.317 | +Transformer | epoch 0 | step 60710 |avg loss 7.554 |avg tokens 2262.100 |tokens/s 8282.080 |walltime 16014.048 | +Transformer | epoch 0 | step 60720 |avg loss 7.917 |avg 
tokens 2281.100 |tokens/s 9060.853 |walltime 16016.566 | +Transformer | epoch 0 | step 60730 |avg loss 7.742 |avg tokens 2148.400 |tokens/s 8220.612 |walltime 16019.179 | +Transformer | epoch 0 | step 60740 |avg loss 8.021 |avg tokens 2132.300 |tokens/s 8571.885 |walltime 16021.667 | +Transformer | epoch 0 | step 60750 |avg loss 7.805 |avg tokens 2292.800 |tokens/s 8402.757 |walltime 16024.396 | +Transformer | epoch 0 | step 60760 |avg loss 7.674 |avg tokens 1982.000 |tokens/s 7602.242 |walltime 16027.003 | +Transformer | epoch 0 | step 60770 |avg loss 7.856 |avg tokens 2186.800 |tokens/s 8141.496 |walltime 16029.689 | +Transformer | epoch 0 | step 60780 |avg loss 7.744 |avg tokens 2002.400 |tokens/s 7757.540 |walltime 16032.270 | +Transformer | epoch 0 | step 60790 |avg loss 8.072 |avg tokens 2390.600 |tokens/s 9052.065 |walltime 16034.911 | +Transformer | epoch 0 | step 60800 |avg loss 8.115 |avg tokens 2064.500 |tokens/s 8306.633 |walltime 16037.396 | +Transformer | epoch 0 | step 60810 |avg loss 7.660 |avg tokens 2268.100 |tokens/s 8531.134 |walltime 16040.055 | +Transformer | epoch 0 | step 60820 |avg loss 7.947 |avg tokens 1981.500 |tokens/s 7916.879 |walltime 16042.558 | +Transformer | epoch 0 | step 60830 |avg loss 7.487 |avg tokens 2379.200 |tokens/s 8403.090 |walltime 16045.389 | +Transformer | epoch 0 | step 60840 |avg loss 8.019 |avg tokens 2252.600 |tokens/s 8929.792 |walltime 16047.912 | +Transformer | epoch 0 | step 60850 |avg loss 7.697 |avg tokens 2358.400 |tokens/s 8349.835 |walltime 16050.736 | +Transformer | epoch 0 | step 60860 |avg loss 7.441 |avg tokens 2243.000 |tokens/s 8231.078 |walltime 16053.461 | +Transformer | epoch 0 | step 60870 |avg loss 7.820 |avg tokens 2258.400 |tokens/s 8536.072 |walltime 16056.107 | +Transformer | epoch 0 | step 60880 |avg loss 7.911 |avg tokens 2006.800 |tokens/s 8073.810 |walltime 16058.592 | +Transformer | epoch 0 | step 60890 |avg loss 7.790 |avg tokens 1860.200 |tokens/s 7343.090 |walltime 16061.126 | 
+Transformer | epoch 0 | step 60900 |avg loss 7.518 |avg tokens 2231.000 |tokens/s 8297.837 |walltime 16063.814 | +Transformer | epoch 0 | step 60910 |avg loss 7.540 |avg tokens 2365.600 |tokens/s 8430.500 |walltime 16066.620 | +Transformer | epoch 0 | step 60920 |avg loss 7.367 |avg tokens 2210.300 |tokens/s 8171.724 |walltime 16069.325 | +Transformer | epoch 0 | step 60930 |avg loss 7.743 |avg tokens 2095.200 |tokens/s 8067.864 |walltime 16071.922 | +Transformer | epoch 0 | step 60940 |avg loss 7.571 |avg tokens 2113.500 |tokens/s 8016.959 |walltime 16074.558 | +Transformer | epoch 0 | step 60950 |avg loss 7.662 |avg tokens 2107.400 |tokens/s 8124.827 |walltime 16077.152 | +Transformer | epoch 0 | step 60960 |avg loss 7.893 |avg tokens 2033.300 |tokens/s 7915.811 |walltime 16079.721 | +Transformer | epoch 0 | step 60970 |avg loss 7.551 |avg tokens 2196.800 |tokens/s 8256.579 |walltime 16082.382 | +Transformer | epoch 0 | step 60980 |avg loss 7.855 |avg tokens 1759.300 |tokens/s 7141.070 |walltime 16084.845 | +Transformer | epoch 0 | step 60990 |avg loss 7.914 |avg tokens 2070.100 |tokens/s 8033.473 |walltime 16087.422 | +Transformer | epoch 0 | step 61000 |avg loss 7.855 |avg tokens 2150.500 |tokens/s 8124.645 |walltime 16090.069 | +Transformer | epoch 0 | step 61010 |avg loss 7.919 |avg tokens 2137.700 |tokens/s 8264.166 |walltime 16092.656 | +Transformer | epoch 0 | step 61020 |avg loss 7.410 |avg tokens 2398.400 |tokens/s 8851.388 |walltime 16095.365 | +Transformer | epoch 0 | step 61030 |avg loss 7.586 |avg tokens 2160.100 |tokens/s 8183.848 |walltime 16098.005 | +Transformer | epoch 0 | step 61040 |avg loss 7.964 |avg tokens 2053.000 |tokens/s 7941.793 |walltime 16100.590 | +Transformer | epoch 0 | step 61050 |avg loss 7.563 |avg tokens 1908.700 |tokens/s 7680.131 |walltime 16103.075 | +Transformer | epoch 0 | step 61060 |avg loss 7.873 |avg tokens 2052.200 |tokens/s 7949.707 |walltime 16105.657 | +Transformer | epoch 0 | step 61070 |avg loss 7.468 |avg 
tokens 1955.600 |tokens/s 7629.418 |walltime 16108.220 | +Transformer | epoch 0 | step 61080 |avg loss 8.143 |avg tokens 2109.200 |tokens/s 8071.623 |walltime 16110.833 | +Transformer | epoch 0 | step 61090 |avg loss 8.132 |avg tokens 1944.800 |tokens/s 7954.149 |walltime 16113.278 | +Transformer | epoch 0 | step 61100 |avg loss 7.433 |avg tokens 2225.600 |tokens/s 8348.791 |walltime 16115.944 | +Transformer | epoch 0 | step 61110 |avg loss 7.850 |avg tokens 2316.100 |tokens/s 8511.938 |walltime 16118.665 | +Transformer | epoch 0 | step 61120 |avg loss 7.971 |avg tokens 2122.900 |tokens/s 8621.551 |walltime 16121.127 | +Transformer | epoch 0 | step 61130 |avg loss 8.017 |avg tokens 1982.600 |tokens/s 7959.893 |walltime 16123.618 | +Transformer | epoch 0 | step 61140 |avg loss 7.528 |avg tokens 2326.400 |tokens/s 8510.828 |walltime 16126.351 | +Transformer | epoch 0 | step 61150 |avg loss 7.646 |avg tokens 2060.200 |tokens/s 8003.613 |walltime 16128.925 | +Transformer | epoch 0 | step 61160 |avg loss 7.951 |avg tokens 2214.500 |tokens/s 8483.470 |walltime 16131.536 | +Transformer | epoch 0 | step 61170 |avg loss 7.753 |avg tokens 2251.200 |tokens/s 8404.469 |walltime 16134.214 | +Transformer | epoch 0 | step 61180 |avg loss 7.745 |avg tokens 2277.500 |tokens/s 8611.427 |walltime 16136.859 | +Transformer | epoch 0 | step 61190 |avg loss 7.975 |avg tokens 2329.100 |tokens/s 8877.184 |walltime 16139.483 | +Transformer | epoch 0 | step 61200 |avg loss 8.065 |avg tokens 1980.200 |tokens/s 8311.713 |walltime 16141.865 | +Transformer | epoch 0 | step 61210 |avg loss 7.851 |avg tokens 2212.000 |tokens/s 8275.468 |walltime 16144.538 | +Transformer | epoch 0 | step 61220 |avg loss 7.962 |avg tokens 2289.300 |tokens/s 8683.086 |walltime 16147.175 | +Transformer | epoch 0 | step 61230 |avg loss 7.798 |avg tokens 2344.000 |tokens/s 9120.251 |walltime 16149.745 | +Transformer | epoch 0 | step 61240 |avg loss 7.906 |avg tokens 2104.300 |tokens/s 8104.052 |walltime 16152.341 | 
+Transformer | epoch 0 | step 61250 |avg loss 7.845 |avg tokens 2126.000 |tokens/s 8406.849 |walltime 16154.870 | +Transformer | epoch 0 | step 61260 |avg loss 7.758 |avg tokens 2098.400 |tokens/s 8207.075 |walltime 16157.427 | +Transformer | epoch 0 | step 61270 |avg loss 7.580 |avg tokens 2098.400 |tokens/s 7930.534 |walltime 16160.073 | +Transformer | epoch 0 | step 61280 |avg loss 7.799 |avg tokens 2159.200 |tokens/s 8293.199 |walltime 16162.677 | +Transformer | epoch 0 | step 61290 |avg loss 7.638 |avg tokens 2400.800 |tokens/s 8761.134 |walltime 16165.417 | +Transformer | epoch 0 | step 61300 |avg loss 7.983 |avg tokens 2031.900 |tokens/s 7925.677 |walltime 16167.981 | +Transformer | epoch 0 | step 61310 |avg loss 7.767 |avg tokens 2075.800 |tokens/s 7822.928 |walltime 16170.634 | +Transformer | epoch 0 | step 61320 |avg loss 7.859 |avg tokens 2164.700 |tokens/s 8448.890 |walltime 16173.196 | +Transformer | epoch 0 | step 61330 |avg loss 7.482 |avg tokens 2159.200 |tokens/s 8308.927 |walltime 16175.795 | +Transformer | epoch 0 | step 61340 |avg loss 7.623 |avg tokens 2383.900 |tokens/s 8509.385 |walltime 16178.596 | +Transformer | epoch 0 | step 61350 |avg loss 7.696 |avg tokens 2261.600 |tokens/s 8402.761 |walltime 16181.288 | +Transformer | epoch 0 | step 61360 |avg loss 7.141 |avg tokens 2385.000 |tokens/s 8637.649 |walltime 16184.049 | +Transformer | epoch 0 | step 61370 |avg loss 8.277 |avg tokens 2202.800 |tokens/s 9208.198 |walltime 16186.441 | +Transformer | epoch 0 | step 61380 |avg loss 7.790 |avg tokens 2271.000 |tokens/s 8613.933 |walltime 16189.078 | +Transformer | epoch 0 | step 61390 |avg loss 8.030 |avg tokens 2048.200 |tokens/s 8194.157 |walltime 16191.577 | +Transformer | epoch 0 | step 61400 |avg loss 7.687 |avg tokens 2328.200 |tokens/s 8653.713 |walltime 16194.268 | +Transformer | epoch 0 | step 61410 |avg loss 7.776 |avg tokens 2347.500 |tokens/s 8708.150 |walltime 16196.963 | +Transformer | epoch 0 | step 61420 |avg loss 7.961 |avg 
tokens 2190.600 |tokens/s 8187.667 |walltime 16199.639 | +Transformer | epoch 0 | step 61430 |avg loss 7.747 |avg tokens 2275.300 |tokens/s 8552.939 |walltime 16202.299 | +Transformer | epoch 0 | step 61440 |avg loss 8.129 |avg tokens 1973.100 |tokens/s 7922.827 |walltime 16204.790 | +Transformer | epoch 0 | step 61450 |avg loss 7.939 |avg tokens 2140.400 |tokens/s 7919.501 |walltime 16207.492 | +Transformer | epoch 0 | step 61460 |avg loss 7.856 |avg tokens 2146.900 |tokens/s 8285.952 |walltime 16210.083 | +Transformer | epoch 0 | step 61470 |avg loss 7.629 |avg tokens 2316.800 |tokens/s 8525.619 |walltime 16212.801 | +Transformer | epoch 0 | step 61480 |avg loss 8.094 |avg tokens 1986.100 |tokens/s 8076.278 |walltime 16215.260 | +Transformer | epoch 0 | step 61490 |avg loss 7.776 |avg tokens 2149.900 |tokens/s 8137.782 |walltime 16217.902 | +Transformer | epoch 0 | step 61500 |avg loss 7.812 |avg tokens 2265.600 |tokens/s 8453.730 |walltime 16220.582 | +Transformer | epoch 0 | step 61510 |avg loss 7.610 |avg tokens 2228.800 |tokens/s 8338.560 |walltime 16223.255 | +Transformer | epoch 0 | step 61520 |avg loss 7.719 |avg tokens 2298.900 |tokens/s 8508.400 |walltime 16225.957 | +Transformer | epoch 0 | step 61530 |avg loss 8.176 |avg tokens 1950.400 |tokens/s 8096.924 |walltime 16228.365 | +Transformer | epoch 0 | step 61540 |avg loss 8.046 |avg tokens 2144.900 |tokens/s 8386.521 |walltime 16230.923 | +Transformer | epoch 0 | step 61550 |avg loss 7.579 |avg tokens 2236.200 |tokens/s 8236.245 |walltime 16233.638 | +Transformer | epoch 0 | step 61560 |avg loss 7.922 |avg tokens 1935.400 |tokens/s 7728.266 |walltime 16236.142 | +Transformer | epoch 0 | step 61570 |avg loss 7.872 |avg tokens 2032.700 |tokens/s 8156.519 |walltime 16238.635 | +Transformer | epoch 0 | step 61580 |avg loss 7.707 |avg tokens 2410.400 |tokens/s 8609.257 |walltime 16241.434 | +Transformer | epoch 0 | step 61590 |avg loss 7.685 |avg tokens 2255.400 |tokens/s 8339.092 |walltime 16244.139 | 
+Transformer | epoch 0 | step 61600 |avg loss 7.905 |avg tokens 2066.000 |tokens/s 8273.437 |walltime 16246.636 | +Transformer | epoch 0 | step 61610 |avg loss 7.799 |avg tokens 2394.900 |tokens/s 8495.116 |walltime 16249.455 | +Transformer | epoch 0 | step 61620 |avg loss 7.692 |avg tokens 1940.800 |tokens/s 7843.580 |walltime 16251.930 | +Transformer | epoch 0 | step 61630 |avg loss 7.638 |avg tokens 2268.800 |tokens/s 8288.033 |walltime 16254.667 | +Transformer | epoch 0 | step 61640 |avg loss 7.805 |avg tokens 2267.700 |tokens/s 8338.286 |walltime 16257.387 | +Transformer | epoch 0 | step 61650 |avg loss 7.779 |avg tokens 2143.200 |tokens/s 7973.312 |walltime 16260.075 | +Transformer | epoch 0 | step 61660 |avg loss 7.724 |avg tokens 2209.600 |tokens/s 8249.322 |walltime 16262.753 | +Transformer | epoch 0 | step 61670 |avg loss 7.813 |avg tokens 2355.100 |tokens/s 9137.583 |walltime 16265.331 | +Transformer | epoch 0 | step 61680 |avg loss 7.621 |avg tokens 2223.300 |tokens/s 8109.595 |walltime 16268.072 | +Transformer | epoch 0 | step 61690 |avg loss 7.668 |avg tokens 2151.900 |tokens/s 8232.993 |walltime 16270.686 | +Transformer | epoch 0 | step 61700 |avg loss 7.637 |avg tokens 2064.100 |tokens/s 7967.907 |walltime 16273.276 | +Transformer | epoch 0 | step 61710 |avg loss 8.149 |avg tokens 2187.600 |tokens/s 8462.162 |walltime 16275.862 | +Transformer | epoch 0 | step 61720 |avg loss 7.712 |avg tokens 2005.500 |tokens/s 7726.340 |walltime 16278.457 | +Transformer | epoch 0 | step 61730 |avg loss 7.778 |avg tokens 2007.500 |tokens/s 7784.617 |walltime 16281.036 | +Transformer | epoch 0 | step 61740 |avg loss 7.603 |avg tokens 2171.900 |tokens/s 8045.005 |walltime 16283.736 | +Transformer | epoch 0 | step 61750 |avg loss 7.891 |avg tokens 2045.300 |tokens/s 7805.448 |walltime 16286.356 | +Transformer | epoch 0 | step 61760 |avg loss 7.887 |avg tokens 2095.500 |tokens/s 8120.679 |walltime 16288.937 | +Transformer | epoch 0 | step 61770 |avg loss 7.865 |avg 
tokens 2089.900 |tokens/s 8159.327 |walltime 16291.498 | +Transformer | epoch 0 | step 61780 |avg loss 7.869 |avg tokens 2155.100 |tokens/s 8151.987 |walltime 16294.142 | +Transformer | epoch 0 | step 61790 |avg loss 7.885 |avg tokens 2324.700 |tokens/s 8868.617 |walltime 16296.763 | +Transformer | epoch 0 | step 61800 |avg loss 7.786 |avg tokens 2063.600 |tokens/s 7964.390 |walltime 16299.354 | +Transformer | epoch 0 | step 61810 |avg loss 7.915 |avg tokens 2036.700 |tokens/s 8094.654 |walltime 16301.870 | +Transformer | epoch 0 | step 61820 |avg loss 7.638 |avg tokens 2319.500 |tokens/s 8336.745 |walltime 16304.652 | +Transformer | epoch 0 | step 61830 |avg loss 7.913 |avg tokens 2122.400 |tokens/s 8252.391 |walltime 16307.224 | +Transformer | epoch 0 | step 61840 |avg loss 7.692 |avg tokens 2320.000 |tokens/s 8638.772 |walltime 16309.910 | +Transformer | epoch 0 | step 61850 |avg loss 7.726 |avg tokens 2116.000 |tokens/s 8074.501 |walltime 16312.530 | +Transformer | epoch 0 | step 61860 |avg loss 7.745 |avg tokens 2329.600 |tokens/s 8707.529 |walltime 16315.206 | +Transformer | epoch 0 | step 61870 |avg loss 7.987 |avg tokens 2038.200 |tokens/s 7746.991 |walltime 16317.837 | +Transformer | epoch 0 | step 61880 |avg loss 7.566 |avg tokens 2333.000 |tokens/s 8366.105 |walltime 16320.625 | +Transformer | epoch 0 | step 61890 |avg loss 7.645 |avg tokens 1980.600 |tokens/s 7613.860 |walltime 16323.227 | +Transformer | epoch 0 | step 61900 |avg loss 7.848 |avg tokens 2214.200 |tokens/s 8363.380 |walltime 16325.874 | +Transformer | epoch 0 | step 61910 |avg loss 7.811 |avg tokens 2253.400 |tokens/s 8576.664 |walltime 16328.502 | +Transformer | epoch 0 | step 61920 |avg loss 7.558 |avg tokens 2223.600 |tokens/s 8174.922 |walltime 16331.222 | +Transformer | epoch 0 | step 61930 |avg loss 7.996 |avg tokens 1934.400 |tokens/s 7461.948 |walltime 16333.814 | +Transformer | epoch 0 | step 61940 |avg loss 7.814 |avg tokens 2182.300 |tokens/s 8298.036 |walltime 16336.444 | 
+Transformer | epoch 0 | step 61950 |avg loss 7.863 |avg tokens 2133.500 |tokens/s 8382.111 |walltime 16338.989 | +Transformer | epoch 0 | step 61960 |avg loss 7.663 |avg tokens 2228.000 |tokens/s 8265.736 |walltime 16341.685 | +Transformer | epoch 0 | step 61970 |avg loss 7.812 |avg tokens 2281.700 |tokens/s 8448.030 |walltime 16344.385 | +Transformer | epoch 0 | step 61980 |avg loss 7.946 |avg tokens 2363.300 |tokens/s 8776.603 |walltime 16347.078 | +Transformer | epoch 0 | step 61990 |avg loss 8.064 |avg tokens 1790.400 |tokens/s 7347.614 |walltime 16349.515 | +Transformer | epoch 0 | step 62000 |avg loss 7.699 |avg tokens 2198.100 |tokens/s 8073.374 |walltime 16352.238 | +Transformer | epoch 0 | step 62010 |avg loss 7.692 |avg tokens 2408.000 |tokens/s 8748.901 |walltime 16354.990 | +Transformer | epoch 0 | step 62020 |avg loss 7.935 |avg tokens 2171.500 |tokens/s 8461.677 |walltime 16357.556 | +Transformer | epoch 0 | step 62030 |avg loss 7.607 |avg tokens 2239.200 |tokens/s 8255.631 |walltime 16360.269 | +Transformer | epoch 0 | step 62040 |avg loss 7.932 |avg tokens 2129.600 |tokens/s 8499.056 |walltime 16362.774 | +Transformer | epoch 0 | step 62050 |avg loss 7.921 |avg tokens 1999.900 |tokens/s 8039.057 |walltime 16365.262 | +Transformer | epoch 0 | step 62060 |avg loss 7.737 |avg tokens 2088.900 |tokens/s 7862.241 |walltime 16367.919 | +Transformer | epoch 0 | step 62070 |avg loss 7.950 |avg tokens 2248.400 |tokens/s 8637.617 |walltime 16370.522 | +Transformer | epoch 0 | step 62080 |avg loss 7.603 |avg tokens 2292.800 |tokens/s 8372.856 |walltime 16373.260 | +Transformer | epoch 0 | step 62090 |avg loss 7.821 |avg tokens 2155.700 |tokens/s 8520.280 |walltime 16375.790 | +Transformer | epoch 0 | step 62100 |avg loss 7.936 |avg tokens 1902.800 |tokens/s 7375.074 |walltime 16378.370 | +Transformer | epoch 0 | step 62110 |avg loss 7.903 |avg tokens 2115.200 |tokens/s 8282.245 |walltime 16380.924 | +Transformer | epoch 0 | step 62120 |avg loss 7.576 |avg 
tokens 2238.800 |tokens/s 8334.486 |walltime 16383.610 | +Transformer | epoch 0 | step 62130 |avg loss 8.153 |avg tokens 1588.400 |tokens/s 6829.368 |walltime 16385.936 | +Transformer | epoch 0 | step 62140 |avg loss 7.765 |avg tokens 2218.400 |tokens/s 8161.728 |walltime 16388.654 | +Transformer | epoch 0 | step 62150 |avg loss 7.581 |avg tokens 2037.000 |tokens/s 7819.414 |walltime 16391.259 | +Transformer | epoch 0 | step 62160 |avg loss 7.570 |avg tokens 2309.600 |tokens/s 8359.360 |walltime 16394.022 | +Transformer | epoch 0 | step 62170 |avg loss 7.641 |avg tokens 2105.100 |tokens/s 8058.495 |walltime 16396.635 | +Transformer | epoch 0 | step 62180 |avg loss 7.705 |avg tokens 2034.900 |tokens/s 7901.365 |walltime 16399.210 | +Transformer | epoch 0 | step 62190 |avg loss 7.716 |avg tokens 2194.400 |tokens/s 8215.157 |walltime 16401.881 | +Transformer | epoch 0 | step 62200 |avg loss 8.133 |avg tokens 2263.500 |tokens/s 9001.999 |walltime 16404.396 | +Transformer | epoch 0 | step 62210 |avg loss 8.220 |avg tokens 1882.500 |tokens/s 8072.002 |walltime 16406.728 | +Transformer | epoch 0 | step 62220 |avg loss 7.744 |avg tokens 1929.800 |tokens/s 7555.484 |walltime 16409.282 | +Transformer | epoch 0 | step 62230 |avg loss 7.558 |avg tokens 2251.200 |tokens/s 8113.085 |walltime 16412.057 | +Transformer | epoch 0 | step 62240 |avg loss 7.591 |avg tokens 2310.400 |tokens/s 8381.615 |walltime 16414.813 | +Transformer | epoch 0 | step 62250 |avg loss 7.755 |avg tokens 2244.000 |tokens/s 8379.575 |walltime 16417.491 | +Transformer | epoch 0 | step 62260 |avg loss 7.761 |avg tokens 2154.800 |tokens/s 8432.326 |walltime 16420.047 | +Transformer | epoch 0 | step 62270 |avg loss 7.837 |avg tokens 2253.800 |tokens/s 8406.201 |walltime 16422.728 | +Transformer | epoch 0 | step 62280 |avg loss 7.789 |avg tokens 2271.000 |tokens/s 8296.932 |walltime 16425.465 | +Transformer | epoch 0 | step 62290 |avg loss 7.509 |avg tokens 2123.200 |tokens/s 8013.545 |walltime 16428.114 | 
+Transformer | epoch 0 | step 62300 |avg loss 7.460 |avg tokens 2260.000 |tokens/s 8495.193 |walltime 16430.775 | +Transformer | epoch 0 | step 62310 |avg loss 8.019 |avg tokens 2190.700 |tokens/s 8468.631 |walltime 16433.362 | +Transformer | epoch 0 | step 62320 |avg loss 7.653 |avg tokens 2293.600 |tokens/s 8489.023 |walltime 16436.063 | +Transformer | epoch 0 | step 62330 |avg loss 7.941 |avg tokens 1950.000 |tokens/s 7916.766 |walltime 16438.526 | +Transformer | epoch 0 | step 62340 |avg loss 7.777 |avg tokens 2243.500 |tokens/s 8401.572 |walltime 16441.197 | +Transformer | epoch 0 | step 62350 |avg loss 7.727 |avg tokens 2085.200 |tokens/s 7844.722 |walltime 16443.855 | +Transformer | epoch 0 | step 62360 |avg loss 8.203 |avg tokens 2222.900 |tokens/s 8447.470 |walltime 16446.486 | +Transformer | epoch 0 | step 62370 |avg loss 7.993 |avg tokens 2050.400 |tokens/s 8074.873 |walltime 16449.026 | +Transformer | epoch 0 | step 62380 |avg loss 7.817 |avg tokens 2024.200 |tokens/s 7833.170 |walltime 16451.610 | +Transformer | epoch 0 | step 62390 |avg loss 8.134 |avg tokens 2143.100 |tokens/s 8390.781 |walltime 16454.164 | +Transformer | epoch 0 | step 62400 |avg loss 7.471 |avg tokens 2321.400 |tokens/s 8525.255 |walltime 16456.887 | +Transformer | epoch 0 | step 62410 |avg loss 7.459 |avg tokens 2132.400 |tokens/s 7926.406 |walltime 16459.577 | +Transformer | epoch 0 | step 62420 |avg loss 7.722 |avg tokens 2309.100 |tokens/s 8551.742 |walltime 16462.277 | +Transformer | epoch 0 | step 62430 |avg loss 7.783 |avg tokens 2289.800 |tokens/s 8254.804 |walltime 16465.051 | +Transformer | epoch 0 | step 62440 |avg loss 7.570 |avg tokens 2329.600 |tokens/s 8325.074 |walltime 16467.849 | +Transformer | epoch 0 | step 62450 |avg loss 7.842 |avg tokens 2166.600 |tokens/s 8264.181 |walltime 16470.471 | +Transformer | epoch 0 | step 62460 |avg loss 7.691 |avg tokens 2254.800 |tokens/s 8391.488 |walltime 16473.158 | +Transformer | epoch 0 | step 62470 |avg loss 8.034 |avg 
tokens 2203.800 |tokens/s 8631.176 |walltime 16475.711 | +Transformer | epoch 0 | step 62480 |avg loss 7.571 |avg tokens 2239.500 |tokens/s 8291.088 |walltime 16478.413 | +Transformer | epoch 0 | step 62490 |avg loss 7.990 |avg tokens 2180.200 |tokens/s 8211.515 |walltime 16481.068 | +Transformer | epoch 0 | step 62500 |avg loss 7.710 |avg tokens 2159.900 |tokens/s 8112.698 |walltime 16483.730 | +Transformer | epoch 0 | step 62510 |avg loss 7.933 |avg tokens 2250.300 |tokens/s 8750.364 |walltime 16486.302 | +Transformer | epoch 0 | step 62520 |avg loss 7.765 |avg tokens 2012.000 |tokens/s 7812.315 |walltime 16488.877 | +Transformer | epoch 0 | step 62530 |avg loss 7.938 |avg tokens 2381.600 |tokens/s 9079.392 |walltime 16491.500 | +Transformer | epoch 0 | step 62540 |avg loss 7.769 |avg tokens 2174.500 |tokens/s 8249.321 |walltime 16494.136 | +Transformer | epoch 0 | step 62550 |avg loss 7.995 |avg tokens 2278.800 |tokens/s 8687.507 |walltime 16496.759 | +Transformer | epoch 0 | step 62560 |avg loss 7.646 |avg tokens 2239.200 |tokens/s 8227.962 |walltime 16499.481 | +Transformer | epoch 0 | step 62570 |avg loss 7.933 |avg tokens 2197.000 |tokens/s 8237.887 |walltime 16502.148 | +Transformer | epoch 0 | step 62580 |avg loss 7.523 |avg tokens 2187.200 |tokens/s 8062.223 |walltime 16504.861 | +Transformer | epoch 0 | step 62590 |avg loss 7.597 |avg tokens 2158.300 |tokens/s 8127.520 |walltime 16507.516 | +Transformer | epoch 0 | step 62600 |avg loss 7.789 |avg tokens 2045.900 |tokens/s 7985.833 |walltime 16510.078 | +Transformer | epoch 0 | step 62610 |avg loss 7.922 |avg tokens 1997.200 |tokens/s 8327.455 |walltime 16512.476 | +Transformer | epoch 0 | step 62620 |avg loss 7.919 |avg tokens 2260.200 |tokens/s 8680.149 |walltime 16515.080 | +Transformer | epoch 0 | step 62630 |avg loss 7.446 |avg tokens 2251.100 |tokens/s 8210.881 |walltime 16517.822 | +Transformer | epoch 0 | step 62640 |avg loss 7.766 |avg tokens 2280.900 |tokens/s 8564.896 |walltime 16520.485 | 
+Transformer | epoch 0 | step 62650 |avg loss 7.734 |avg tokens 2388.000 |tokens/s 8918.854 |walltime 16523.162 | +Transformer | epoch 0 | step 62660 |avg loss 7.798 |avg tokens 2156.300 |tokens/s 8221.663 |walltime 16525.785 | +Transformer | epoch 0 | step 62670 |avg loss 7.905 |avg tokens 2303.500 |tokens/s 8720.412 |walltime 16528.427 | +Transformer | epoch 0 | step 62680 |avg loss 7.749 |avg tokens 2249.000 |tokens/s 8314.321 |walltime 16531.132 | +Transformer | epoch 0 | step 62690 |avg loss 8.025 |avg tokens 2039.300 |tokens/s 8157.574 |walltime 16533.631 | +Transformer | epoch 0 | step 62700 |avg loss 7.281 |avg tokens 2283.600 |tokens/s 8232.307 |walltime 16536.405 | +Transformer | epoch 0 | step 62710 |avg loss 7.641 |avg tokens 2167.200 |tokens/s 8224.449 |walltime 16539.040 | +Transformer | epoch 0 | step 62720 |avg loss 7.946 |avg tokens 2139.800 |tokens/s 8091.109 |walltime 16541.685 | +Transformer | epoch 0 | step 62730 |avg loss 7.796 |avg tokens 2318.800 |tokens/s 8822.775 |walltime 16544.313 | +Transformer | epoch 0 | step 62740 |avg loss 7.809 |avg tokens 2273.700 |tokens/s 8426.901 |walltime 16547.011 | +Transformer | epoch 0 | step 62750 |avg loss 7.638 |avg tokens 2085.600 |tokens/s 7902.086 |walltime 16549.651 | +Transformer | epoch 0 | step 62760 |avg loss 7.776 |avg tokens 2396.300 |tokens/s 9016.320 |walltime 16552.308 | +Transformer | epoch 0 | step 62770 |avg loss 7.929 |avg tokens 1986.900 |tokens/s 7902.625 |walltime 16554.823 | +Transformer | epoch 0 | step 62780 |avg loss 7.511 |avg tokens 2161.600 |tokens/s 8075.378 |walltime 16557.500 | +Transformer | epoch 0 | step 62790 |avg loss 7.636 |avg tokens 2277.900 |tokens/s 8316.831 |walltime 16560.238 | +Transformer | epoch 0 | step 62800 |avg loss 7.799 |avg tokens 2035.200 |tokens/s 7869.487 |walltime 16562.825 | +Transformer | epoch 0 | step 62810 |avg loss 7.991 |avg tokens 2211.100 |tokens/s 8343.139 |walltime 16565.475 | +Transformer | epoch 0 | step 62820 |avg loss 7.696 |avg 
tokens 2201.600 |tokens/s 8146.383 |walltime 16568.177 | +Transformer | epoch 0 | step 62830 |avg loss 7.789 |avg tokens 2202.600 |tokens/s 8205.249 |walltime 16570.862 | +Transformer | epoch 0 | step 62840 |avg loss 7.591 |avg tokens 2072.400 |tokens/s 7758.395 |walltime 16573.533 | +Transformer | epoch 0 | step 62850 |avg loss 7.895 |avg tokens 2227.600 |tokens/s 8675.879 |walltime 16576.101 | +Transformer | epoch 0 | step 62860 |avg loss 7.904 |avg tokens 2221.200 |tokens/s 8407.649 |walltime 16578.742 | +Transformer | epoch 0 | step 62870 |avg loss 7.541 |avg tokens 2318.600 |tokens/s 8476.146 |walltime 16581.478 | +Transformer | epoch 0 | step 62880 |avg loss 8.079 |avg tokens 2318.600 |tokens/s 8960.911 |walltime 16584.065 | +Transformer | epoch 0 | step 62890 |avg loss 7.945 |avg tokens 2028.000 |tokens/s 8021.791 |walltime 16586.593 | +Transformer | epoch 0 | step 62900 |avg loss 7.610 |avg tokens 2282.400 |tokens/s 8491.553 |walltime 16589.281 | +Transformer | epoch 0 | step 62910 |avg loss 7.970 |avg tokens 2023.200 |tokens/s 7752.254 |walltime 16591.891 | +Transformer | epoch 0 | step 62920 |avg loss 7.792 |avg tokens 1998.100 |tokens/s 7859.196 |walltime 16594.433 | +Transformer | epoch 0 | step 62930 |avg loss 7.759 |avg tokens 2223.600 |tokens/s 8529.476 |walltime 16597.040 | +Transformer | epoch 0 | step 62940 |avg loss 7.733 |avg tokens 2318.500 |tokens/s 8560.079 |walltime 16599.749 | +Transformer | epoch 0 | step 62950 |avg loss 8.158 |avg tokens 1965.900 |tokens/s 7900.708 |walltime 16602.237 | +Transformer | epoch 0 | step 62960 |avg loss 8.251 |avg tokens 1721.900 |tokens/s 7457.679 |walltime 16604.546 | +Transformer | epoch 0 | step 62970 |avg loss 7.662 |avg tokens 2212.600 |tokens/s 8261.331 |walltime 16607.224 | +Transformer | epoch 0 | step 62980 |avg loss 7.363 |avg tokens 2175.200 |tokens/s 8106.363 |walltime 16609.908 | +Transformer | epoch 0 | step 62990 |avg loss 7.718 |avg tokens 2077.600 |tokens/s 8160.409 |walltime 16612.454 | 
+Transformer | epoch 0 | step 63000 |avg loss 7.600 |avg tokens 2070.300 |tokens/s 7875.043 |walltime 16615.083 | +Transformer | epoch 0 | step 63010 |avg loss 8.174 |avg tokens 2186.500 |tokens/s 8914.336 |walltime 16617.535 | +Transformer | epoch 0 | step 63020 |avg loss 7.613 |avg tokens 2327.500 |tokens/s 8757.447 |walltime 16620.193 | +Transformer | epoch 0 | step 63030 |avg loss 7.393 |avg tokens 2007.300 |tokens/s 7787.148 |walltime 16622.771 | +Transformer | epoch 0 | step 63040 |avg loss 7.911 |avg tokens 2038.200 |tokens/s 7736.962 |walltime 16625.405 | +Transformer | epoch 0 | step 63050 |avg loss 7.690 |avg tokens 2231.500 |tokens/s 8428.627 |walltime 16628.053 | +Transformer | epoch 0 | step 63060 |avg loss 7.516 |avg tokens 2228.800 |tokens/s 8398.742 |walltime 16630.706 | +Transformer | epoch 0 | step 63070 |avg loss 7.600 |avg tokens 2136.600 |tokens/s 8142.001 |walltime 16633.331 | +Transformer | epoch 0 | step 63080 |avg loss 7.841 |avg tokens 2344.000 |tokens/s 8625.528 |walltime 16636.048 | +Transformer | epoch 0 | step 63090 |avg loss 7.813 |avg tokens 2290.000 |tokens/s 8475.084 |walltime 16638.750 | +Transformer | epoch 0 | step 63100 |avg loss 8.172 |avg tokens 2222.000 |tokens/s 8615.312 |walltime 16641.329 | +Transformer | epoch 0 | step 63110 |avg loss 7.920 |avg tokens 2254.400 |tokens/s 8623.023 |walltime 16643.944 | +Transformer | epoch 0 | step 63120 |avg loss 7.607 |avg tokens 2389.600 |tokens/s 8747.344 |walltime 16646.676 | +Transformer | epoch 0 | step 63130 |avg loss 7.880 |avg tokens 1952.200 |tokens/s 7799.640 |walltime 16649.178 | +Transformer | epoch 0 | step 63140 |avg loss 7.685 |avg tokens 2217.600 |tokens/s 8267.445 |walltime 16651.861 | +Transformer | epoch 0 | step 63150 |avg loss 7.804 |avg tokens 2404.800 |tokens/s 8809.251 |walltime 16654.591 | +Transformer | epoch 0 | step 63160 |avg loss 7.853 |avg tokens 2225.700 |tokens/s 8737.327 |walltime 16657.138 | +Transformer | epoch 0 | step 63170 |avg loss 7.985 |avg 
tokens 2053.600 |tokens/s 8214.681 |walltime 16659.638 | +Transformer | epoch 0 | step 63180 |avg loss 8.054 |avg tokens 2265.300 |tokens/s 8887.017 |walltime 16662.187 | +Transformer | epoch 0 | step 63190 |avg loss 7.639 |avg tokens 2161.600 |tokens/s 7907.884 |walltime 16664.920 | +Transformer | epoch 0 | step 63200 |avg loss 7.781 |avg tokens 2318.700 |tokens/s 8649.252 |walltime 16667.601 | +Transformer | epoch 0 | step 63210 |avg loss 7.782 |avg tokens 2320.000 |tokens/s 8508.179 |walltime 16670.328 | +Transformer | epoch 0 | step 63220 |avg loss 7.943 |avg tokens 1984.000 |tokens/s 7761.432 |walltime 16672.884 | +Transformer | epoch 0 | step 63230 |avg loss 7.797 |avg tokens 2027.400 |tokens/s 8135.020 |walltime 16675.376 | +Transformer | epoch 0 | step 63240 |avg loss 7.512 |avg tokens 2149.600 |tokens/s 8198.940 |walltime 16677.998 | +Transformer | epoch 0 | step 63250 |avg loss 7.752 |avg tokens 2304.000 |tokens/s 8707.267 |walltime 16680.644 | +Transformer | epoch 0 | step 63260 |avg loss 7.773 |avg tokens 2146.800 |tokens/s 8058.806 |walltime 16683.308 | +Transformer | epoch 0 | step 63270 |avg loss 7.537 |avg tokens 2192.700 |tokens/s 8226.665 |walltime 16685.974 | +Transformer | epoch 0 | step 63280 |avg loss 7.934 |avg tokens 1955.300 |tokens/s 7889.917 |walltime 16688.452 | +Transformer | epoch 0 | step 63290 |avg loss 7.965 |avg tokens 1808.700 |tokens/s 7473.682 |walltime 16690.872 | +Transformer | epoch 0 | step 63300 |avg loss 7.929 |avg tokens 2268.600 |tokens/s 8463.535 |walltime 16693.552 | +Transformer | epoch 0 | step 63310 |avg loss 7.807 |avg tokens 2074.900 |tokens/s 8138.501 |walltime 16696.102 | +Transformer | epoch 0 | step 63320 |avg loss 7.600 |avg tokens 2314.400 |tokens/s 8382.177 |walltime 16698.863 | +Transformer | epoch 0 | step 63330 |avg loss 7.904 |avg tokens 2275.100 |tokens/s 8398.272 |walltime 16701.572 | +Transformer | epoch 0 | step 63340 |avg loss 7.662 |avg tokens 2192.800 |tokens/s 8201.436 |walltime 16704.246 | 
+Transformer | epoch 0 | step 63350 |avg loss 7.724 |avg tokens 1959.000 |tokens/s 7793.772 |walltime 16706.759 | +Transformer | epoch 0 | step 63360 |avg loss 7.623 |avg tokens 2023.200 |tokens/s 7760.658 |walltime 16709.366 | +Transformer | epoch 0 | step 63370 |avg loss 8.244 |avg tokens 2088.600 |tokens/s 8762.030 |walltime 16711.750 | +Transformer | epoch 0 | step 63380 |avg loss 7.665 |avg tokens 2215.400 |tokens/s 8446.785 |walltime 16714.373 | +Transformer | epoch 0 | step 63390 |avg loss 7.739 |avg tokens 2243.200 |tokens/s 8322.310 |walltime 16717.068 | +Transformer | epoch 0 | step 63400 |avg loss 7.967 |avg tokens 2202.900 |tokens/s 8421.643 |walltime 16719.684 | +Transformer | epoch 0 | step 63410 |avg loss 7.515 |avg tokens 1962.400 |tokens/s 7724.009 |walltime 16722.225 | +Transformer | epoch 0 | step 63420 |avg loss 7.807 |avg tokens 2128.300 |tokens/s 8248.630 |walltime 16724.805 | +Transformer | epoch 0 | step 63430 |avg loss 7.832 |avg tokens 1952.500 |tokens/s 7674.708 |walltime 16727.349 | +Transformer | epoch 0 | step 63440 |avg loss 7.596 |avg tokens 2211.900 |tokens/s 8118.852 |walltime 16730.073 | +Transformer | epoch 0 | step 63450 |avg loss 7.597 |avg tokens 2354.800 |tokens/s 8455.179 |walltime 16732.858 | +Transformer | epoch 0 | step 63460 |avg loss 7.922 |avg tokens 2033.900 |tokens/s 8103.161 |walltime 16735.368 | +Transformer | epoch 0 | step 63470 |avg loss 7.695 |avg tokens 2359.200 |tokens/s 8676.784 |walltime 16738.087 | +Transformer | epoch 0 | step 63480 |avg loss 7.837 |avg tokens 2247.700 |tokens/s 8477.813 |walltime 16740.739 | +Transformer | epoch 0 | step 63490 |avg loss 7.776 |avg tokens 2102.100 |tokens/s 8237.818 |walltime 16743.290 | +Transformer | epoch 0 | step 63500 |avg loss 7.551 |avg tokens 2388.900 |tokens/s 8821.929 |walltime 16745.998 | +Transformer | epoch 0 | step 63510 |avg loss 7.514 |avg tokens 2261.600 |tokens/s 8220.828 |walltime 16748.749 | +Transformer | epoch 0 | step 63520 |avg loss 7.898 |avg 
tokens 2271.100 |tokens/s 8441.122 |walltime 16751.440 | +Transformer | epoch 0 | step 63530 |avg loss 8.055 |avg tokens 2266.200 |tokens/s 8814.986 |walltime 16754.011 | +Transformer | epoch 0 | step 63540 |avg loss 7.760 |avg tokens 2083.800 |tokens/s 7868.763 |walltime 16756.659 | +Transformer | epoch 0 | step 63550 |avg loss 7.693 |avg tokens 2144.100 |tokens/s 8247.478 |walltime 16759.259 | +Transformer | epoch 0 | step 63560 |avg loss 7.841 |avg tokens 2161.700 |tokens/s 8144.841 |walltime 16761.913 | +Transformer | epoch 0 | step 63570 |avg loss 7.575 |avg tokens 2132.600 |tokens/s 8043.929 |walltime 16764.564 | +Transformer | epoch 0 | step 63580 |avg loss 7.628 |avg tokens 2348.900 |tokens/s 8607.421 |walltime 16767.293 | +Transformer | epoch 0 | step 63590 |avg loss 7.526 |avg tokens 2276.800 |tokens/s 8333.162 |walltime 16770.025 | +Transformer | epoch 0 | step 63600 |avg loss 7.610 |avg tokens 2186.300 |tokens/s 8545.320 |walltime 16772.583 | +Transformer | epoch 0 | step 63610 |avg loss 7.643 |avg tokens 2269.900 |tokens/s 8242.646 |walltime 16775.337 | +Transformer | epoch 0 | step 63620 |avg loss 7.645 |avg tokens 2168.100 |tokens/s 8221.678 |walltime 16777.974 | +Transformer | epoch 0 | step 63630 |avg loss 7.897 |avg tokens 2355.400 |tokens/s 8838.185 |walltime 16780.639 | +Transformer | epoch 0 | step 63640 |avg loss 8.016 |avg tokens 1870.000 |tokens/s 7486.220 |walltime 16783.137 | +Transformer | epoch 0 | step 63650 |avg loss 7.735 |avg tokens 2321.200 |tokens/s 8660.883 |walltime 16785.817 | +Transformer | epoch 0 | step 63660 |avg loss 7.755 |avg tokens 2367.200 |tokens/s 8598.346 |walltime 16788.571 | +Transformer | epoch 0 | step 63670 |avg loss 7.490 |avg tokens 2158.000 |tokens/s 7992.065 |walltime 16791.271 | +Transformer | epoch 0 | step 63680 |avg loss 7.552 |avg tokens 2324.800 |tokens/s 8586.562 |walltime 16793.978 | +Transformer | epoch 0 | step 63690 |avg loss 7.814 |avg tokens 2178.600 |tokens/s 8332.871 |walltime 16796.593 | 
+Transformer | epoch 0 | step 63700 |avg loss 7.974 |avg tokens 2160.300 |tokens/s 8202.030 |walltime 16799.227 | +Transformer | epoch 0 | step 63710 |avg loss 7.576 |avg tokens 2280.200 |tokens/s 8445.171 |walltime 16801.927 | +Transformer | epoch 0 | step 63720 |avg loss 7.824 |avg tokens 2122.100 |tokens/s 8295.586 |walltime 16804.485 | +Transformer | epoch 0 | step 63730 |avg loss 7.970 |avg tokens 2187.200 |tokens/s 8390.765 |walltime 16807.091 | +Transformer | epoch 0 | step 63740 |avg loss 7.685 |avg tokens 2172.500 |tokens/s 8405.613 |walltime 16809.676 | +Transformer | epoch 0 | step 63750 |avg loss 7.552 |avg tokens 2241.600 |tokens/s 8175.675 |walltime 16812.418 | +Transformer | epoch 0 | step 63760 |avg loss 7.925 |avg tokens 2025.000 |tokens/s 7962.470 |walltime 16814.961 | +Transformer | epoch 0 | step 63770 |avg loss 7.354 |avg tokens 2330.600 |tokens/s 8371.885 |walltime 16817.745 | +Transformer | epoch 0 | step 63780 |avg loss 7.695 |avg tokens 2257.500 |tokens/s 8819.279 |walltime 16820.304 | +Transformer | epoch 0 | step 63790 |avg loss 8.100 |avg tokens 2164.200 |tokens/s 8651.661 |walltime 16822.806 | +Transformer | epoch 0 | step 63800 |avg loss 8.012 |avg tokens 2160.400 |tokens/s 8481.496 |walltime 16825.353 | +Transformer | epoch 0 | step 63810 |avg loss 7.959 |avg tokens 2128.400 |tokens/s 8276.619 |walltime 16827.925 | +Transformer | epoch 0 | step 63820 |avg loss 7.816 |avg tokens 2227.200 |tokens/s 8356.233 |walltime 16830.590 | +Transformer | epoch 0 | step 63830 |avg loss 7.924 |avg tokens 2184.800 |tokens/s 8565.978 |walltime 16833.141 | +Transformer | epoch 0 | step 63840 |avg loss 7.706 |avg tokens 2142.600 |tokens/s 8087.534 |walltime 16835.790 | +Transformer | epoch 0 | step 63850 |avg loss 7.784 |avg tokens 2353.200 |tokens/s 8691.970 |walltime 16838.497 | +Transformer | epoch 0 | step 63860 |avg loss 7.940 |avg tokens 2177.400 |tokens/s 8740.344 |walltime 16840.988 | +Transformer | epoch 0 | step 63870 |avg loss 7.814 |avg 
tokens 2018.400 |tokens/s 7994.429 |walltime 16843.513 | +Transformer | epoch 0 | step 63880 |avg loss 7.689 |avg tokens 2267.200 |tokens/s 8366.342 |walltime 16846.223 | +Transformer | epoch 0 | step 63890 |avg loss 7.854 |avg tokens 2078.100 |tokens/s 7845.642 |walltime 16848.872 | +Transformer | epoch 0 | step 63900 |avg loss 7.655 |avg tokens 2039.100 |tokens/s 7835.301 |walltime 16851.474 | +Transformer | epoch 0 | step 63910 |avg loss 7.431 |avg tokens 2239.600 |tokens/s 8308.534 |walltime 16854.170 | +Transformer | epoch 0 | step 63920 |avg loss 7.883 |avg tokens 2152.000 |tokens/s 8351.017 |walltime 16856.747 | +Transformer | epoch 0 | step 63930 |avg loss 7.991 |avg tokens 2167.000 |tokens/s 8470.760 |walltime 16859.305 | +Transformer | epoch 0 | step 63940 |avg loss 7.801 |avg tokens 2225.900 |tokens/s 8509.704 |walltime 16861.921 | +Transformer | epoch 0 | step 63950 |avg loss 7.342 |avg tokens 2320.800 |tokens/s 8350.694 |walltime 16864.700 | +Transformer | epoch 0 | step 63960 |avg loss 7.568 |avg tokens 2338.400 |tokens/s 8567.518 |walltime 16867.429 | +Transformer | epoch 0 | step 63970 |avg loss 7.541 |avg tokens 2310.400 |tokens/s 8420.338 |walltime 16870.173 | +Transformer | epoch 0 | step 63980 |avg loss 7.741 |avg tokens 2138.500 |tokens/s 8283.080 |walltime 16872.755 | +Transformer | epoch 0 | step 63990 |avg loss 7.573 |avg tokens 2260.000 |tokens/s 8445.122 |walltime 16875.431 | +Transformer | epoch 0 | step 64000 |avg loss 7.720 |avg tokens 2392.800 |tokens/s 8764.293 |walltime 16878.161 | +Transformer | epoch 0 | step 64010 |avg loss 7.928 |avg tokens 2285.000 |tokens/s 8834.609 |walltime 16880.748 | +Transformer | epoch 0 | step 64020 |avg loss 8.082 |avg tokens 2207.200 |tokens/s 8717.253 |walltime 16883.280 | +Transformer | epoch 0 | step 64030 |avg loss 7.600 |avg tokens 2154.500 |tokens/s 8132.069 |walltime 16885.929 | +Transformer | epoch 0 | step 64040 |avg loss 8.099 |avg tokens 1859.900 |tokens/s 7546.862 |walltime 16888.393 | 
+Transformer | epoch 0 | step 64050 |avg loss 7.861 |avg tokens 2190.300 |tokens/s 8453.580 |walltime 16890.984 | +Transformer | epoch 0 | step 64060 |avg loss 7.896 |avg tokens 2235.400 |tokens/s 8669.638 |walltime 16893.563 | +Transformer | epoch 0 | step 64070 |avg loss 7.871 |avg tokens 2098.500 |tokens/s 8116.463 |walltime 16896.148 | +Transformer | epoch 0 | step 64080 |avg loss 7.947 |avg tokens 2113.100 |tokens/s 8092.716 |walltime 16898.759 | +Transformer | epoch 0 | step 64090 |avg loss 7.647 |avg tokens 2213.600 |tokens/s 8157.343 |walltime 16901.473 | +Transformer | epoch 0 | step 64100 |avg loss 7.436 |avg tokens 2241.800 |tokens/s 8229.134 |walltime 16904.197 | +Transformer | epoch 0 | step 64110 |avg loss 7.705 |avg tokens 2058.900 |tokens/s 7853.630 |walltime 16906.819 | +Transformer | epoch 0 | step 64120 |avg loss 7.810 |avg tokens 2303.000 |tokens/s 8587.078 |walltime 16909.501 | +Transformer | epoch 0 | step 64130 |avg loss 7.701 |avg tokens 2298.800 |tokens/s 8424.595 |walltime 16912.230 | +Transformer | epoch 0 | step 64140 |avg loss 7.770 |avg tokens 2227.200 |tokens/s 8623.058 |walltime 16914.812 | +Transformer | epoch 0 | step 64150 |avg loss 7.905 |avg tokens 2201.800 |tokens/s 8254.103 |walltime 16917.480 | +Transformer | epoch 0 | step 64160 |avg loss 7.769 |avg tokens 2146.000 |tokens/s 8233.665 |walltime 16920.086 | +Transformer | epoch 0 | step 64170 |avg loss 7.854 |avg tokens 2245.200 |tokens/s 8304.023 |walltime 16922.790 | +Transformer | epoch 0 | step 64180 |avg loss 7.859 |avg tokens 2157.900 |tokens/s 8057.398 |walltime 16925.468 | +Transformer | epoch 0 | step 64190 |avg loss 8.020 |avg tokens 1965.900 |tokens/s 7725.033 |walltime 16928.013 | +Transformer | epoch 0 | step 64200 |avg loss 7.782 |avg tokens 2219.500 |tokens/s 8452.668 |walltime 16930.639 | +Transformer | epoch 0 | step 64210 |avg loss 7.948 |avg tokens 2348.200 |tokens/s 8967.082 |walltime 16933.258 | +Transformer | epoch 0 | step 64220 |avg loss 7.700 |avg 
tokens 2385.600 |tokens/s 8928.361 |walltime 16935.929 | +Transformer | epoch 0 | step 64230 |avg loss 7.868 |avg tokens 1986.700 |tokens/s 7720.544 |walltime 16938.503 | +Transformer | epoch 0 | step 64240 |avg loss 7.741 |avg tokens 2196.000 |tokens/s 8216.775 |walltime 16941.175 | +Transformer | epoch 0 | step 64250 |avg loss 7.540 |avg tokens 2375.500 |tokens/s 8473.349 |walltime 16943.979 | +Transformer | epoch 0 | step 64260 |avg loss 7.736 |avg tokens 2294.400 |tokens/s 8317.082 |walltime 16946.737 | +Transformer | epoch 0 | step 64270 |avg loss 7.826 |avg tokens 1850.300 |tokens/s 7567.050 |walltime 16949.183 | +Transformer | epoch 0 | step 64280 |avg loss 8.181 |avg tokens 1742.500 |tokens/s 7448.521 |walltime 16951.522 | +Transformer | epoch 0 | step 64290 |avg loss 8.287 |avg tokens 2301.600 |tokens/s 8996.499 |walltime 16954.080 | +Transformer | epoch 0 | step 64300 |avg loss 7.809 |avg tokens 2091.500 |tokens/s 8052.379 |walltime 16956.678 | +Transformer | epoch 0 | step 64310 |avg loss 7.786 |avg tokens 2188.800 |tokens/s 8120.436 |walltime 16959.373 | +Transformer | epoch 0 | step 64320 |avg loss 7.990 |avg tokens 2120.800 |tokens/s 8147.004 |walltime 16961.976 | +Transformer | epoch 0 | step 64330 |avg loss 7.398 |avg tokens 2400.800 |tokens/s 8520.034 |walltime 16964.794 | +Transformer | epoch 0 | step 64340 |avg loss 7.735 |avg tokens 2274.400 |tokens/s 8502.492 |walltime 16967.469 | +Transformer | epoch 0 | step 64350 |avg loss 7.768 |avg tokens 2188.100 |tokens/s 7965.799 |walltime 16970.216 | +Transformer | epoch 0 | step 64360 |avg loss 7.741 |avg tokens 2330.700 |tokens/s 8693.268 |walltime 16972.897 | +Transformer | epoch 0 | step 64370 |avg loss 7.810 |avg tokens 2022.800 |tokens/s 8123.783 |walltime 16975.387 | +Transformer | epoch 0 | step 64380 |avg loss 7.729 |avg tokens 2156.400 |tokens/s 8118.067 |walltime 16978.043 | +Transformer | epoch 0 | step 64390 |avg loss 8.086 |avg tokens 2084.600 |tokens/s 8122.584 |walltime 16980.610 | 
+Transformer | epoch 0 | step 64400 |avg loss 7.514 |avg tokens 2058.900 |tokens/s 7934.515 |walltime 16983.205 | +Transformer | epoch 0 | step 64410 |avg loss 7.973 |avg tokens 2324.100 |tokens/s 8801.502 |walltime 16985.845 | +Transformer | epoch 0 | step 64420 |avg loss 7.625 |avg tokens 2392.000 |tokens/s 8560.303 |walltime 16988.640 | +Transformer | epoch 0 | step 64430 |avg loss 7.682 |avg tokens 2268.100 |tokens/s 8299.795 |walltime 16991.372 | +Transformer | epoch 0 | step 64440 |avg loss 7.676 |avg tokens 2184.000 |tokens/s 8121.835 |walltime 16994.061 | +Transformer | epoch 0 | step 64450 |avg loss 7.695 |avg tokens 2213.400 |tokens/s 8344.495 |walltime 16996.714 | +Transformer | epoch 0 | step 64460 |avg loss 7.298 |avg tokens 2276.600 |tokens/s 8475.529 |walltime 16999.400 | +Transformer | epoch 0 | step 64470 |avg loss 7.596 |avg tokens 2391.300 |tokens/s 8537.731 |walltime 17002.201 | +Transformer | epoch 0 | step 64480 |avg loss 7.610 |avg tokens 2295.700 |tokens/s 8334.994 |walltime 17004.955 | +Transformer | epoch 0 | step 64490 |avg loss 7.913 |avg tokens 1992.700 |tokens/s 8124.620 |walltime 17007.408 | +Transformer | epoch 0 | step 64500 |avg loss 7.832 |avg tokens 2081.100 |tokens/s 8081.716 |walltime 17009.983 | +Transformer | epoch 0 | step 64510 |avg loss 8.006 |avg tokens 2305.900 |tokens/s 8560.869 |walltime 17012.676 | +Transformer | epoch 0 | step 64520 |avg loss 7.832 |avg tokens 2260.200 |tokens/s 8445.248 |walltime 17015.353 | +Transformer | epoch 0 | step 64530 |avg loss 7.951 |avg tokens 1787.300 |tokens/s 7083.783 |walltime 17017.876 | +Transformer | epoch 0 | step 64540 |avg loss 7.976 |avg tokens 2061.200 |tokens/s 7951.608 |walltime 17020.468 | +Transformer | epoch 0 | step 64550 |avg loss 7.258 |avg tokens 2184.600 |tokens/s 8183.134 |walltime 17023.138 | +Transformer | epoch 0 | step 64560 |avg loss 7.835 |avg tokens 2252.500 |tokens/s 8270.842 |walltime 17025.861 | +Transformer | epoch 0 | step 64570 |avg loss 7.808 |avg 
tokens 2096.100 |tokens/s 8134.615 |walltime 17028.438 | +Transformer | epoch 0 | step 64580 |avg loss 7.705 |avg tokens 2239.200 |tokens/s 8210.349 |walltime 17031.165 | +Transformer | epoch 0 | step 64590 |avg loss 7.783 |avg tokens 2071.900 |tokens/s 7994.102 |walltime 17033.757 | +Transformer | epoch 0 | step 64600 |avg loss 7.864 |avg tokens 2024.700 |tokens/s 7587.017 |walltime 17036.426 | +Transformer | epoch 0 | step 64610 |avg loss 7.815 |avg tokens 2266.500 |tokens/s 8506.187 |walltime 17039.090 | +Transformer | epoch 0 | step 64620 |avg loss 8.330 |avg tokens 2124.300 |tokens/s 8518.735 |walltime 17041.584 | +Transformer | epoch 0 | step 64630 |avg loss 7.781 |avg tokens 2203.600 |tokens/s 8447.054 |walltime 17044.193 | +Transformer | epoch 0 | step 64640 |avg loss 7.657 |avg tokens 2235.500 |tokens/s 8177.321 |walltime 17046.926 | +Transformer | epoch 0 | step 64650 |avg loss 7.167 |avg tokens 2039.300 |tokens/s 8012.080 |walltime 17049.472 | +Transformer | epoch 0 | step 64660 |avg loss 7.863 |avg tokens 2149.400 |tokens/s 8077.132 |walltime 17052.133 | +Transformer | epoch 0 | step 64670 |avg loss 8.054 |avg tokens 2098.100 |tokens/s 8022.266 |walltime 17054.748 | +Transformer | epoch 0 | step 64680 |avg loss 7.395 |avg tokens 2404.000 |tokens/s 8573.619 |walltime 17057.552 | +Transformer | epoch 0 | step 64690 |avg loss 7.544 |avg tokens 2279.300 |tokens/s 8229.269 |walltime 17060.322 | +Transformer | epoch 0 | step 64700 |avg loss 7.664 |avg tokens 2355.600 |tokens/s 8542.594 |walltime 17063.079 | +Transformer | epoch 0 | step 64710 |avg loss 7.799 |avg tokens 2237.500 |tokens/s 8597.445 |walltime 17065.682 | +Transformer | epoch 0 | step 64720 |avg loss 7.660 |avg tokens 2093.300 |tokens/s 7842.109 |walltime 17068.351 | +Transformer | epoch 0 | step 64730 |avg loss 7.907 |avg tokens 2073.100 |tokens/s 8040.603 |walltime 17070.929 | +Transformer | epoch 0 | step 64740 |avg loss 7.513 |avg tokens 2269.100 |tokens/s 8223.507 |walltime 17073.689 | 
+Transformer | epoch 0 | step 64750 |avg loss 7.765 |avg tokens 2185.100 |tokens/s 8179.292 |walltime 17076.360 | +Transformer | epoch 0 | step 64760 |avg loss 7.616 |avg tokens 2255.800 |tokens/s 8602.978 |walltime 17078.982 | +Transformer | epoch 0 | step 64770 |avg loss 7.560 |avg tokens 2382.400 |tokens/s 8635.833 |walltime 17081.741 | +Transformer | epoch 0 | step 64780 |avg loss 7.999 |avg tokens 2190.300 |tokens/s 8286.897 |walltime 17084.384 | +Transformer | epoch 0 | step 64790 |avg loss 7.655 |avg tokens 2332.800 |tokens/s 8439.490 |walltime 17087.148 | +Transformer | epoch 0 | step 64800 |avg loss 8.059 |avg tokens 2188.500 |tokens/s 8208.186 |walltime 17089.814 | +Transformer | epoch 0 | step 64810 |avg loss 7.869 |avg tokens 2169.600 |tokens/s 8072.528 |walltime 17092.502 | +Transformer | epoch 0 | step 64820 |avg loss 7.892 |avg tokens 2388.800 |tokens/s 8902.179 |walltime 17095.186 | +Transformer | epoch 0 | step 64830 |avg loss 7.808 |avg tokens 2157.900 |tokens/s 7970.322 |walltime 17097.893 | +Transformer | epoch 0 | step 64840 |avg loss 7.548 |avg tokens 2407.200 |tokens/s 8962.427 |walltime 17100.579 | +Transformer | epoch 0 | step 64850 |avg loss 7.487 |avg tokens 2322.900 |tokens/s 8543.963 |walltime 17103.298 | +Transformer | epoch 0 | step 64860 |avg loss 7.662 |avg tokens 2108.200 |tokens/s 8302.714 |walltime 17105.837 | +Transformer | epoch 0 | step 64870 |avg loss 7.414 |avg tokens 2295.200 |tokens/s 8394.981 |walltime 17108.571 | +Transformer | epoch 0 | step 64880 |avg loss 7.467 |avg tokens 2123.600 |tokens/s 8081.596 |walltime 17111.198 | +Transformer | epoch 0 | step 64890 |avg loss 7.712 |avg tokens 2310.800 |tokens/s 8623.284 |walltime 17113.878 | +Transformer | epoch 0 | step 64900 |avg loss 7.791 |avg tokens 2128.500 |tokens/s 8047.017 |walltime 17116.523 | +Transformer | epoch 0 | step 64910 |avg loss 7.854 |avg tokens 2267.200 |tokens/s 8321.779 |walltime 17119.248 | +Transformer | epoch 0 | step 64920 |avg loss 7.839 |avg 
tokens 2098.400 |tokens/s 8157.489 |walltime 17121.820 | +Transformer | epoch 0 | step 64930 |avg loss 7.702 |avg tokens 2287.400 |tokens/s 8453.706 |walltime 17124.526 | +Transformer | epoch 0 | step 64940 |avg loss 7.640 |avg tokens 2330.000 |tokens/s 8631.204 |walltime 17127.225 | +Transformer | epoch 0 | step 64950 |avg loss 7.383 |avg tokens 2393.900 |tokens/s 8517.755 |walltime 17130.036 | +Transformer | epoch 0 | step 64960 |avg loss 8.254 |avg tokens 2058.400 |tokens/s 8605.380 |walltime 17132.428 | +Transformer | epoch 0 | step 64970 |avg loss 7.644 |avg tokens 2242.200 |tokens/s 8563.329 |walltime 17135.046 | +Transformer | epoch 0 | step 64980 |avg loss 7.660 |avg tokens 1958.000 |tokens/s 7983.527 |walltime 17137.499 | +Transformer | epoch 0 | step 64990 |avg loss 7.948 |avg tokens 2331.100 |tokens/s 8966.496 |walltime 17140.099 | +Transformer | epoch 0 | step 65000 |avg loss 7.789 |avg tokens 2199.200 |tokens/s 8241.442 |walltime 17142.767 | +Transformer | epoch 0 | step 65010 |avg loss 7.822 |avg tokens 2141.100 |tokens/s 8142.593 |walltime 17145.397 | +Transformer | epoch 0 | step 65020 |avg loss 7.982 |avg tokens 2148.200 |tokens/s 8325.814 |walltime 17147.977 | +Transformer | epoch 0 | step 65030 |avg loss 7.893 |avg tokens 2291.000 |tokens/s 8366.657 |walltime 17150.715 | +Transformer | epoch 0 | step 65040 |avg loss 7.973 |avg tokens 2115.900 |tokens/s 8588.738 |walltime 17153.179 | +Transformer | epoch 0 | step 65050 |avg loss 7.996 |avg tokens 2199.900 |tokens/s 8335.215 |walltime 17155.818 | +Transformer | epoch 0 | step 65060 |avg loss 7.701 |avg tokens 1898.200 |tokens/s 7357.570 |walltime 17158.398 | +Transformer | epoch 0 | step 65070 |avg loss 7.965 |avg tokens 2147.600 |tokens/s 8123.451 |walltime 17161.041 | +Transformer | epoch 0 | step 65080 |avg loss 7.496 |avg tokens 2303.400 |tokens/s 8379.095 |walltime 17163.790 | +Transformer | epoch 0 | step 65090 |avg loss 7.862 |avg tokens 2111.400 |tokens/s 8517.042 |walltime 17166.270 | 
+Transformer | epoch 0 | step 65100 |avg loss 7.525 |avg tokens 2145.600 |tokens/s 8094.157 |walltime 17168.920 | +Transformer | epoch 0 | step 65110 |avg loss 7.663 |avg tokens 2366.400 |tokens/s 8575.047 |walltime 17171.680 | +Transformer | epoch 0 | step 65120 |avg loss 7.845 |avg tokens 1762.800 |tokens/s 7055.866 |walltime 17174.178 | +Transformer | epoch 0 | step 65130 |avg loss 8.100 |avg tokens 2318.700 |tokens/s 8729.189 |walltime 17176.835 | +Transformer | epoch 0 | step 65140 |avg loss 7.920 |avg tokens 1618.300 |tokens/s 6985.717 |walltime 17179.151 | +Transformer | epoch 0 | step 65150 |avg loss 8.264 |avg tokens 1993.200 |tokens/s 7914.688 |walltime 17181.670 | +Transformer | epoch 0 | step 65160 |avg loss 8.014 |avg tokens 2243.400 |tokens/s 8411.930 |walltime 17184.336 | +Transformer | epoch 0 | step 65170 |avg loss 8.057 |avg tokens 2292.800 |tokens/s 8760.025 |walltime 17186.954 | +Transformer | epoch 0 | step 65180 |avg loss 8.019 |avg tokens 2274.200 |tokens/s 8549.564 |walltime 17189.614 | +Transformer | epoch 0 | step 65190 |avg loss 7.834 |avg tokens 2141.400 |tokens/s 8143.183 |walltime 17192.244 | +Epoch time: 17183.74609684944 +Transformer | epoch 0 | step 65198 |avg loss 7.655 |avg tokens 2430.000 |tokens/s 7480.188 |walltime 17194.842 | +Validation loss on subset valid: 7.79771701159763 /workspace/translation/fairseq/sequence_generator.py:376: UserWarning: Integer division of tensors using div or / is deprecated, and in a future release div will perform true division as in Python 3. Use true_divide or floor_divide (// in Python) instead. (Triggered internally at ../aten/src/ATen/native/BinaryOps.cpp:66.) 
 torch.div(cand_indices, self.vocab_size, out=cand_beams)
-| Translated 3000 sentences (124565 tokens) in 75.7s (39.64 sentences/s, 1645.87 tokens/s)
-| Eval completed in: 98.04s | UNCASED BLEU 1.45
-| done training in 17378.6 seconds
-Transformer | epoch 0 | step RUN |avg loss 7.380 |walltime 17388.917 |
+| Translated 3000 sentences (106731 tokens) in 79.1s (37.93 sentences/s, 1349.50 tokens/s)
+| Eval completed in: 101.88s | UNCASED BLEU 1.04
+| done training in 17302.3 seconds
+Transformer | epoch 0 | step RUN |avg loss 7.798 |walltime 17313.847 |

From 220a72a9b473b52f25c445f809cde58966fa764f Mon Sep 17 00:00:00 2001
From: zhangkeliang
Date: Sun, 10 Jan 2021 11:47:17 +0000
Subject: [PATCH 7/7] Update pytorch transformer perf data

---
 Transformer/OtherReports/PyTorch/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Transformer/OtherReports/PyTorch/README.md b/Transformer/OtherReports/PyTorch/README.md
index 74fb834..ec0a2fc 100644
--- a/Transformer/OtherReports/PyTorch/README.md
+++ b/Transformer/OtherReports/PyTorch/README.md
@@ -149,7 +149,7 @@ NGC PyTorch 的代码仓库提供了自动构建 Docker 镜像的的 [shell 脚

 |卡数 | FP32(BS=2560) | AMP O2(BS=5120) |
 |:-----:|:-----:|:-----:|
-|1 | 7893.1 | 30523.5 |
+|1 | 8277.2 | 32999.9 |

 ## 五、日志数据
 ### 1.单机(单卡、8卡)日志
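
The training records in the log above all share one shape (`step N |avg loss … |avg tokens … |tokens/s … |walltime …`), so a throughput figure can be summarized from them mechanically. The sketch below is one possible way to do that and is not taken from this patch series: the helper names `parse_records` and `mean_throughput`, the `skip` warm-up parameter, and the choice of a plain arithmetic mean are all assumptions, and the patch does not state how the table values were actually derived.

```python
import re
from statistics import mean

# Matches one training record, for example:
# Transformer | epoch 0 | step 65170 |avg loss 8.057 |avg tokens 2292.800 |tokens/s 8760.025 |walltime 17186.954 |
# The final summary record ("step RUN") has no numeric step and is skipped.
RECORD = re.compile(
    r"step\s+(?P<step>\d+)\s*\|avg loss\s+(?P<loss>[\d.]+)\s*"
    r"\|avg tokens\s+(?P<tokens>[\d.]+)\s*"
    r"\|tokens/s\s+(?P<tps>[\d.]+)"
)


def parse_records(text):
    """Return (step, tokens_per_sec) pairs for every training record in text."""
    return [(int(m["step"]), float(m["tps"])) for m in RECORD.finditer(text)]


def mean_throughput(text, skip=0):
    """Average tokens/s over all records, ignoring the first `skip` records.

    `skip` is a hypothetical knob for dropping warm-up steps.
    """
    values = [tps for _, tps in parse_records(text)][skip:]
    if not values:
        raise ValueError("no training records found")
    return mean(values)


# Three records copied from the log above.
sample = (
    "Transformer | epoch 0 | step 65170 |avg loss 8.057 |avg tokens 2292.800 |tokens/s 8760.025 |walltime 17186.954 |\n"
    "Transformer | epoch 0 | step 65180 |avg loss 8.019 |avg tokens 2274.200 |tokens/s 8549.564 |walltime 17189.614 |\n"
    "Transformer | epoch 0 | step 65190 |avg loss 7.834 |avg tokens 2141.400 |tokens/s 8143.183 |walltime 17192.244 |\n"
)
print(round(mean_throughput(sample), 3))
```

Because the pattern is applied with `finditer`, it does not depend on each record sitting on its own line, and a leading `+` or `-` diff marker in front of `Transformer` does not affect matching.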