Chapter 3 · Hardware

Scaffolded: not yet written in depth.

The planned sections are outlined below.

Planned sections

GPU architecture — compute (SMs, tensor cores) vs memory and caches (HBM, L2, SRAM)
GPU generations — Hopper, Ada Lovelace, Blackwell, Rubin; Grace/Vera CPUs
Instances — multi-GPU nodes, NVLink, Multi-Instance GPU (MIG)
Other accelerators — TPUs, Trainium, Inferentia, and when they make sense
Local inference — desktop and mobile, and how the constraints change