Author: Md Mahfuzur Rahman Shanto
Portfolio: mahfuz.cv · LinkedIn: linkedin.com/in/mahfuzswe
A practical field guide for students, researchers, and early-stage ML engineers. Every chapter helps you do something, fix something, or optimize something. No filler.
This guide is written for:
- Beginners who have never run code on a GPU
- ML practitioners moving from CPU-only workflows to GPU training
- Researchers working with deep learning models or LLMs who want to squeeze more out of their hardware
Read it front to back if you are starting from zero. Jump to specific chapters if you already have a foundation and need to fill in gaps. The chapters are ordered by dependency: each one builds on the concepts from the previous one.
| Chapter | Title | What You Learn |
|---|---|---|
| 01 | Foundations | GPU vs CPU, CUDA, VRAM, Tensor Cores, ecosystem overview |
| 02 | Environment Setup | CUDA install, PyTorch + TensorFlow GPU setup, verification, common failures |
| 03 | First Practical GPU Usage | Device placement, training loop, nvidia-smi, debugging |
| 04 | Performance Optimization | Mixed precision, batch tuning, DataLoader, gradient accumulation, profiling |
| 05 | Deep Learning Workloads | CNN workflow, Transformers, large datasets, multi-GPU (DDP) |
| 06 | GPU in the LLM Era | VRAM math, quantization, LoRA/PEFT fine-tuning, inference optimization |
| 07 | Cloud & Remote GPU | Colab, Kaggle, paid services, SSH + remote workflows |
| 08 | Real-World Engineering | Choosing hardware, cost trade-offs, CUDA errors decoded, common mistakes |
| 09 | Zero to Hero Roadmap | Week-by-week learning path, recommended stack, mini-projects |
| 10 | Appendix | Command cheat sheet, error reference table, curated resources |
If you are completely new, read Chapter 1 first to build the mental model, then Chapter 2 to get your environment working. From there, Chapter 3 gives you your first real GPU code to run and observe.
If you already have PyTorch working on GPU, jump directly to Chapter 4 for optimization techniques that apply to almost every training job.
If you are specifically working with LLMs (fine-tuning, inference, quantization), Chapter 6 is the most relevant.
Begin with Chapter 1: Foundations. Each chapter builds on the previous one, so starting from the beginning is recommended.
All code examples in this guide are:
- Written for PyTorch (primary) and occasionally Hugging Face Transformers
- Tested on CUDA 12.x with PyTorch 2.x
- Self-contained: each snippet runs independently with minimal setup
- Commented to explain the reasoning, not just the syntax
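
Because the examples target CUDA 12.x with PyTorch 2.x, it can help to check what your own machine reports before running them. The snippet below is a minimal sketch, not part of the guide's chapters: `describe_environment` is a hypothetical helper name introduced here for illustration, and it only uses standard PyTorch attributes (`torch.__version__`, `torch.version.cuda`, `torch.cuda.is_available()`, `torch.cuda.get_device_name()`). It returns placeholder values instead of failing when PyTorch is not installed.

```python
import importlib.util


def describe_environment():
    """Return a small dict describing the local PyTorch/CUDA setup.

    Hypothetical helper for illustration; not an API from the guide.
    """
    info = {
        "torch_installed": False,
        "torch_version": None,
        "cuda_version": None,     # CUDA version PyTorch was built against
        "cuda_available": False,  # True only if a usable GPU is detected
        "device_name": None,
    }

    # Look for PyTorch without raising if it is missing.
    if importlib.util.find_spec("torch") is None:
        return info

    import torch

    info["torch_installed"] = True
    info["torch_version"] = torch.__version__
    info["cuda_version"] = torch.version.cuda  # None on CPU-only builds
    info["cuda_available"] = torch.cuda.is_available()

    if info["cuda_available"]:
        # Name of the first visible GPU (index 0).
        info["device_name"] = torch.cuda.get_device_name(0)

    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If `cuda_available` is reported as `False` on a machine that has an NVIDIA GPU, the environment setup covered in Chapter 2 is the likely place to look.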
This guide is freely available for personal learning and educational use.
If this guide helps you, consider sharing it with someone else who is learning GPU programming.