
Commit 1a6acfa

docs(gpu): fix wording (#5933)
* docs(gpu): fix wording
* docs(gpu): fix typo
* docs(gpu): add nvlink
1 parent 3ed4994 commit 1a6acfa

File tree

1 file changed (+4, -2 lines)

pages/gpu/reference-content/blackwell-vs-hopper-choosing-the-right-architecture.mdx

Lines changed: 4 additions & 2 deletions
@@ -29,15 +29,17 @@ The B300 delivers exceptional memory and bandwidth:
 This massive capacity enables entire 1-trillion-parameter models, huge batch sizes, and ultra-long context windows (*up to 1 million+ tokens*) to reside on just a few GPUs, reducing the need for complex multi-node communication and drastically lowering inter-node overhead.
 Equipped with **fifth-generation Tensor Cores**, the B300 introduces native hardware support for FP4 and FP6 precision, a major advancement over Hopper. On H100, FP4 operations are emulated using INT8 arithmetic, which limits efficiency and real-world performance. In contrast, Blackwell’s Tensor Cores process FP4 natively, unlocking significantly higher throughput and energy efficiency for ultra-low-precision AI workloads.

-Combined with an enhanced **second-generation Transformer Engine**, these improvements enable the B300 to deliver up to **15 PFLOPS of dense FP4 performance**, achieving multiple times higher inference throughput than the H100-SXM on modern reasoning workloads such as DeepSeek-R1 and Llama 3.1 405B+.
+Scaleway's B300-SXM Instances are equipped with **fifth-generation NVLink** that vastly boosts scalability in large multi-GPU systems, allowing GPUs to seamlessly share memory and coordinate computations across training, inference, and reasoning workloads. Each NVIDIA Blackwell GPU includes up to [18 NVLink 100 GB/s](https://www.nvidia.com/en-us/data-center/nvlink/) connections, providing a total of 1.8 TB/s of bandwidth—twice that of the previous generation and more than 14× the bandwidth of PCIe Gen5.
+
+Combined with an enhanced **second-generation Transformer Engine**, these improvements enable the B300 to deliver up to **18 PFLOPS of dense FP4 performance**, achieving multiple times higher inference throughput than the H100-SXM on modern reasoning workloads such as DeepSeek-R1 and Llama 3.1 405B+.
 Compared to the B200, the B300 delivers higher FP4 performance at the expense of FP64 performance. As a result, the B300 delivers lower FP64 performance than the H100-SXM, making it less well-suited for traditional scientific computing and HPC simulations that rely on high-precision arithmetic.
 With its combination of vast memory and ultra-efficient low-precision compute, the B300 shines in “*big-AI*” scenarios, including:
 * Training and fine-tuning trillion-parameter dense or Mixture-of-Experts (MoE) models
 * Real-time, high-throughput inference at scale
 * Retrieval-Augmented Generation (RAG) with massive context
 * AI reasoning pipelines requiring extended token sequences

-The architecture's strong focus on frontier AI means the B300 is *not optimized for HPC workloads* such as computational physics, quantum chemistry, or climate modeling,domains where FP64 accuracy is critical. In these cases, performance may be inferior to H100, despite Blackwell’s generational leap in AI capabilities.
+The architecture's strong focus on frontier AI means the B300 is *not optimized for HPC workloads* such as computational physics, quantum chemistry, or climate modeling; domains where FP64 accuracy is critical.
 Moreover, the B300’s extreme capabilities make it over-provisioned for smaller or mid-sized models (≤70B parameters), prototyping, or general-purpose AI tasks. For these use cases, Scaleway’s H100-SXM GPU Instances remain a more economical and practical choice.

 ## NVIDIA H100-SXM: The reliable standard for AI and HPC
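
Several of the figures in this hunk can be sanity-checked with back-of-the-envelope arithmetic. First, the claim that trillion-parameter models fit on just a few GPUs. A minimal sketch, assuming 288 GB of HBM3e per B300 (a figure not stated in this hunk) and FP4 weights at half a byte per parameter, ignoring KV cache and activation memory:

```python
import math

# Rough sizing: why a 1-trillion-parameter model can sit on a few GPUs.
# Assumptions (not from the diff): 288 GB HBM3e per B300, FP4 weights
# at 0.5 bytes/parameter, and no allowance for KV cache or activations.

params = 1e12                      # 1 trillion parameters
weight_gb = params * 0.5 / 1e9     # FP4 packs two parameters per byte -> 500 GB

hbm_per_gpu_gb = 288               # assumed B300 memory capacity
gpus = math.ceil(weight_gb / hbm_per_gpu_gb)
print(f"{weight_gb:.0f} GB of FP4 weights -> ~{gpus} GPUs (weights only)")
```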

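Second, the FP4 precision the hunk contrasts with Hopper's INT8-emulated path. The sketch below enumerates the value set of the E2M1 encoding commonly used for FP4 (the specific encoding is an assumption; the diff does not name one), showing how coarse the format is and why native hardware support and the Transformer Engine's scaling matter:

```python
# Illustration of FP4 (assumed E2M1: 1 sign, 2 exponent, 1 mantissa bit).
# Only 16 encodings exist; +0.0 and -0.0 merge, leaving 15 distinct values.

FP4_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
FP4_VALUES = sorted({s * m for m in FP4_MAGNITUDES for s in (1.0, -1.0)})

def quantize_fp4(x: float) -> float:
    """Round x to the nearest representable FP4 value (ties go to the smaller)."""
    return min(FP4_VALUES, key=lambda v: abs(v - x))

print(FP4_VALUES)                  # 15 distinct values from -6.0 to 6.0
for x in (0.3, 1.2, 2.6, 7.5):     # values above 6.0 clamp to 6.0
    print(f"{x} -> {quantize_fp4(x)}")
```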
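Finally, the NVLink numbers in the added paragraph line up under the usual reference points. A quick check, assuming 18 × 50 GB/s links for the previous (Hopper) NVLink generation and roughly 128 GB/s of bidirectional bandwidth for a PCIe Gen5 x16 slot (neither reference figure appears in the diff itself):

```python
# Sanity check of the NVLink 5 bandwidth claims in the added paragraph.
# Assumed reference points (not in the diff): NVLink 4 = 18 links x 50 GB/s,
# PCIe Gen5 x16 ~ 128 GB/s bidirectional.

nvlink5_total = 18 * 100           # 18 NVLink 5 connections at 100 GB/s each
print(f"NVLink 5 total: {nvlink5_total} GB/s = {nvlink5_total / 1000} TB/s")  # 1.8 TB/s

nvlink4_total = 18 * 50            # previous-generation (Hopper) NVLink
print(f"vs previous generation: {nvlink5_total / nvlink4_total:.0f}x")        # 2x

pcie_gen5_x16 = 128                # assumed bidirectional bandwidth, GB/s
print(f"vs PCIe Gen5 x16: {nvlink5_total / pcie_gen5_x16:.1f}x")              # ~14.1x
```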