Chapter 3 · Hardware
Scaffolded: this chapter is not yet written in depth.
The planned sections are outlined below.
Planned sections
- GPU architecture — compute (SMs, tensor cores) vs memory and caches (HBM, L2, SRAM)
- GPU generations — Hopper, Ada Lovelace, Blackwell, Rubin; Grace/Vera CPUs
- Instances — multi-GPU nodes, NVLink, multi-instance GPUs (MIG)
- Other accelerators — TPUs, Trainium, Inferentia, and when they make sense
- Local inference — desktop and mobile, and how the constraints change