diff --git a/bionemo-recipes/recipes/llama3_native_te/README.md b/bionemo-recipes/recipes/llama3_native_te/README.md
index 7f593157ab..204cc8dbaf 100644
--- a/bionemo-recipes/recipes/llama3_native_te/README.md
+++ b/bionemo-recipes/recipes/llama3_native_te/README.md
@@ -48,7 +48,7 @@ for the list of dependencies.
 ### Performance Benchmarks

-[image: Llama 3 Context Parallelism Benchmarks]
+[image: Llama 3 Context Parallelism Benchmarks]

 Scaling Llama 3 70B with Context Parallelism (CP) on 32x NVIDIA GB300 GPUs (NVL32) with synthetic data of increasing
@@ -64,6 +64,13 @@ def compute_model_pflops(seq_len, global_batch_size, step_time_s):
     return model_flops / 1e15
 ```
+Performing the same experiment with a fixed context length of 8192 (increasing micro batch size to hold the global batch
+size constant) more clearly shows the overhead introduced by context parallelism communication.
+
+

+[image: Llama 3 Context Parallelism Benchmarks]
+

+
 ### Convergence Benchmarks

diff --git a/docs/docs/assets/images/recipes/70b-cp-benchmarks-flat-ctx.png b/docs/docs/assets/images/recipes/70b-cp-benchmarks-flat-ctx.png
new file mode 100644
index 0000000000..31cf0967f5
Binary files /dev/null and b/docs/docs/assets/images/recipes/70b-cp-benchmarks-flat-ctx.png differ
diff --git a/docs/docs/assets/images/recipes/70b-cp-benchmarks-increasing-ctx.png b/docs/docs/assets/images/recipes/70b-cp-benchmarks-increasing-ctx.png
new file mode 100644
index 0000000000..cf62233ba9
Binary files /dev/null and b/docs/docs/assets/images/recipes/70b-cp-benchmarks-increasing-ctx.png differ
diff --git a/docs/docs/assets/images/recipes/70b-cp-benchmarks.png b/docs/docs/assets/images/recipes/70b-cp-benchmarks.png
deleted file mode 100644
index 464db25d83..0000000000
Binary files a/docs/docs/assets/images/recipes/70b-cp-benchmarks.png and /dev/null differ
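The README hunk above shows only the last line of `compute_model_pflops`. A minimal sketch of what such a helper typically looks like, assuming the common `6 * params * tokens` training-FLOPs approximation (forward plus backward) and a 70e9 parameter count; the constant and the per-second normalization are assumptions for illustration, not taken from the recipe:

```python
# Hypothetical sketch of a model-PFLOP/s estimator for a dense transformer.
# Assumption: 6 FLOPs per parameter per token (forward + backward pass),
# which ignores attention's quadratic term in sequence length.

N_PARAMS = 70e9  # Llama 3 70B (assumed parameter count)


def compute_model_pflops(seq_len, global_batch_size, step_time_s):
    """Estimate achieved model PFLOP/s for one training step."""
    tokens_per_step = seq_len * global_batch_size
    model_flops = 6 * N_PARAMS * tokens_per_step / step_time_s
    return model_flops / 1e15


# Example: 8192-token context, global batch size 32, 10 s per step
print(round(compute_model_pflops(8192, 32, 10.0), 2))  # → 11.01
```

Dividing by the number of GPUs (32 here) and the hardware peak would give the model FLOPs utilization (MFU) typically reported alongside benchmarks like these.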