Chapter 3 · Hardware

Scaffolded — not yet written in depth

Outlined below.

Planned sections

  • GPU architecture — compute (SMs, tensor cores) vs memory and caches (HBM, L2, SRAM)
  • GPU generations — Hopper, Ada Lovelace, Blackwell, Rubin; Grace/Vera CPUs
  • Instances — multi-GPU nodes, NVLink, Multi-Instance GPU (MIG)
  • Other accelerators — TPUs, Trainium, Inferentia, and when they make sense
  • Local inference — desktop and mobile, and how the constraints change