Compilation is an excellent technique to accelerate the training and inference of deep learning models, especially if it can be completely automated!
In this video, we discuss deep learning compilation, from the early days of TensorFlow to PyTorch 2. Along the way, you'll learn about key technologies such as XLA, PyTorch/XLA, OpenXLA, TorchScript, HLO, TorchDynamo, TorchInductor, and more. You'll see where they fit and how they help accelerate models on a wide range of devices, including custom chips like Google TPU and AWS Inferentia 2. Of course, we'll also share some simple examples, including how to easily accelerate Hugging Face models with PyTorch 2 and torch.compile().
⭐️⭐️⭐️ Don't forget to subscribe to be notified of future videos. Follow me on Medium at /julsimon or on Substack at julsimon.substack.com. ⭐️⭐️⭐️
02:10 TensorFlow 1.x and graph mode
05:52 TensorFlow XLA
09:35 PyTorch TorchScript
14:25 PyTorch/XLA and lazy tensors
17:28 PyTorch/XLA example with Google TPU
21:40 A quick look at HLO
24:05 OpenXLA
25:50 PyTorch/XLA example with AWS Inferentia 2
29:10 PyTorch 2: torch.compile()
34:37 Hugging Face models with PyTorch 2
36:10 BERT on CPU with TorchInductor and IPEX backends