Hardware

DGX A100 80G (640G): NVIDIA DGX A100 | The Universal System for AI Infrastructure

A100 40G by itself: NVIDIA A100 | Tensor Core GPU (it's not easy, or really logical, to buy a single 80G A100)

TPU: Cloud TPU documentation  |  Google Cloud

TPU links

PyTorch on TPU: Training PyTorch models on Cloud TPU Pods  |  Google Cloud

cloud.google.com/tpu/pricing

g.co/cloudtpu


Found a great example of why Mesh TensorFlow is so different from PyTorch FSDP.
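The core difference, sketched below with numpy standing in for devices (my own illustration, not code from either library): Mesh TensorFlow splits *named tensor dimensions* across a device mesh (model parallelism by layout), while FSDP shards each parameter's *flattened storage* evenly across ranks (data parallelism with sharded state).

```python
import numpy as np

n_devices = 4
W = np.arange(32, dtype=np.float32).reshape(8, 4)  # a toy weight matrix

# Mesh-TF style: assign one named dimension of W to a mesh axis,
# so each "device" holds a slab of the tensor along that dimension.
mesh_shards = np.split(W, n_devices, axis=0)       # 4 shards of shape (2, 4)

# FSDP style: flatten the parameter and give each rank an equal chunk,
# regardless of the tensor's logical shape.
fsdp_shards = np.split(W.reshape(-1), n_devices)   # 4 shards of shape (8,)

# Both schemes reconstruct the same parameter:
assert np.array_equal(np.concatenate(mesh_shards, axis=0), W)
assert np.array_equal(np.concatenate(fsdp_shards).reshape(8, 4), W)
```

The layout-based split keeps per-device slices that are directly usable in a sharded matmul; the flat split is shape-agnostic, which is why FSDP can wrap arbitrary modules.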

I was going to make the argument that TPU has less memory but more bandwidth than DGX, but I’m not sure that’s true anymore.

Comparison: Compute per Memory (…per watt)

<aside> 💡 I must not be comparing apples to apples, despite how closely I’m reading it. The pods are HUGE; it can’t be just 192 watts (is that per chip??)

</aside>

TPU v4: 1.1 exaflops @ bf16 or int8 (pod-level), with 32 GiB of HBM2 per chip.
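A back-of-envelope compute-per-memory (and per watt) comparison. The spec numbers below are my assumptions, not from these notes: A100 80GB at ~312 TFLOPS bf16 and ~400 W; TPU v4 chip at ~275 TFLOPS bf16 with 32 GiB HBM2 and the ~192 W figure questioned in the aside (it is plausibly per chip: a 4096-chip v4 pod at 1.1 exaflops works out to ~268 TFLOPS per chip, consistent with the per-chip spec).

```python
# Assumed per-device specs (bf16 peak TFLOPS, memory in GiB, power in W).
specs = {
    "A100-80G": {"tflops_bf16": 312, "mem_gib": 80, "watts": 400},
    "TPUv4":    {"tflops_bf16": 275, "mem_gib": 32, "watts": 192},
}

for name, s in specs.items():
    per_mem = s["tflops_bf16"] / s["mem_gib"]    # TFLOPS per GiB of HBM
    per_mem_watt = per_mem / s["watts"]          # ...and per watt
    print(f"{name}: {per_mem:.2f} TFLOPS/GiB, {per_mem_watt:.4f} TFLOPS/GiB/W")
```

Under these assumptions the TPU v4 chip comes out well ahead on both ratios, mostly because it carries far less memory per unit of compute, which is part of why apples-to-apples comparisons here are slippery.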