DGX A100 80G (640G): NVIDIA DGX A100 | The Universal System for AI Infrastructure
A100 40G by itself: NVIDIA A100 | Tensor Core GPU (it's not easy, or really logical, to buy a single 80G A100)
TPU: Cloud TPU documentation | Google Cloud
PyTorch on TPU: Training PyTorch models on Cloud TPU Pods | Google Cloud
Found a great example of why Mesh TensorFlow is so different from PyTorch FSDP.
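A conceptual sketch of that difference, in pure Python (these are not the real Mesh TensorFlow or FSDP APIs, just toy functions illustrating the two sharding philosophies): FSDP flattens every parameter and splits it evenly across all workers, while Mesh TensorFlow has you name tensor dimensions and map each named dimension onto a mesh axis.

```python
import numpy as np

def fsdp_style_shard(param: np.ndarray, world_size: int):
    """FSDP-style: flatten the parameter and split it evenly across all ranks."""
    flat = param.reshape(-1)
    return np.array_split(flat, world_size)

def mesh_style_shard(param, dim_names, layout, mesh_shape):
    """Mesh-style: split each named dimension along its assigned mesh axis, if any."""
    shards = [param]
    for axis, name in enumerate(dim_names):
        if name in layout:  # this logical dimension is sharded over a mesh axis
            n = mesh_shape[layout[name]]
            shards = [piece for s in shards
                      for piece in np.array_split(s, n, axis=axis)]
    return shards

w = np.arange(24, dtype=np.float32).reshape(4, 6)  # a toy (4, 6) weight

fsdp_shards = fsdp_style_shard(w, world_size=4)
mesh_shards = mesh_style_shard(
    w,
    dim_names=("rows", "cols"),
    layout={"cols": "model"},     # shard the "cols" dimension over the "model" axis
    mesh_shape={"model": 2},
)

print([s.shape for s in fsdp_shards])  # 4 flat shards of 6 elements each
print([s.shape for s in mesh_shards])  # 2 shards of shape (4, 3)
```

The point of the toy: FSDP's layout is the same for every tensor (flat, even, all-ranks), whereas a mesh layout is a per-tensor mapping decision, which is why the programming models feel so different.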
I was going to make the argument that TPU has less memory but more bandwidth than DGX, but I’m not sure that’s true anymore.
<aside> 💡 I must not be comparing apples to apples, despite how closely I’m reading it. The pods are HUGE; they can’t be just 192 watts (is that per chip??)
</aside>
TPU v4: 1.1 exaflops @ bf16 or int8 at full-pod scale; 32 GiB of HBM2 per chip.
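A rough back-of-envelope to sanity-check the memory-vs-bandwidth comparison above. The per-device bandwidth figures are assumptions from published spec sheets (roughly 2.0 TB/s per A100 80G and roughly 1.2 TB/s per TPU v4 chip), and a full v4 pod is taken as 4096 chips; treat these as approximations, not measurements.

```python
# Assumed spec-sheet numbers (approximate, not measured):
DGX_GPUS = 8            # DGX A100 640G = 8x A100 80G
A100_MEM_GB = 80
A100_BW_TBS = 2.0       # approx HBM2e bandwidth per A100 80G (assumption)

TPU_POD_CHIPS = 4096    # full TPU v4 pod
TPUV4_MEM_GIB = 32
TPUV4_BW_TBS = 1.2      # approx HBM2 bandwidth per TPU v4 chip (assumption)

dgx_mem_gb = DGX_GPUS * A100_MEM_GB          # 640 GB per DGX
dgx_bw_tbs = DGX_GPUS * A100_BW_TBS          # ~16 TB/s aggregate

pod_mem_gib = TPU_POD_CHIPS * TPUV4_MEM_GIB  # 131072 GiB (~128 TiB) per pod
pod_bw_tbs = TPU_POD_CHIPS * TPUV4_BW_TBS    # ~4915 TB/s aggregate

print(f"DGX A100:   {dgx_mem_gb} GB memory, ~{dgx_bw_tbs:.0f} TB/s aggregate")
print(f"TPU v4 pod: {pod_mem_gib} GiB memory, ~{pod_bw_tbs:.0f} TB/s aggregate")
```

Under these assumed numbers, a TPU v4 chip has less memory *and* less bandwidth than a single A100 80G; a pod only looks bigger in aggregate because there are thousands of chips, which is probably why the per-chip vs per-pod comparison kept feeling like apples to oranges.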