Learn more: bit.ly/3WKa3fK
Introducing "How Transformer LLMs Work," created with Jay Alammar and Maarten Grootendorst, authors of the “Hands-On Large Language Models” book. This course offers a deep dive into the main components of the transformer architecture that powers large language models (LLMs).
The transformer architecture revolutionized generative AI. In fact, the "GPT" in ChatGPT stands for "Generative Pre-trained Transformer."
Originally introduced in the groundbreaking 2017 paper “Attention Is All You Need” by Ashish Vaswani and others, the transformer was a highly scalable architecture for machine translation tasks. Variants of this architecture now power today’s LLMs, such as those from OpenAI, Google, Meta, Cohere, and Anthropic.
In their book, Jay and Maarten beautifully illustrate the underlying architecture of LLMs through insightful and easy-to-understand explanations.
In this course, you'll learn how the transformer architecture that powers LLMs works. You'll build intuition for how LLMs process text and work with code examples that illustrate the key components of the transformer architecture.
Key topics covered in this course include:
The evolution of how language has been represented numerically, from the Bag-of-Words model through Word2Vec embeddings to the transformer architecture that captures word meanings in full context (see the bag-of-words sketch after this list).
How LLM inputs are broken down into tokens, which represent words or word pieces, before they are sent to the language model (see the tokenization example below).
The details of a transformer and its three main stages: tokenization and embedding, the stack of transformer blocks, and the language model head (see the three-stage sketch below).
The details of the transformer block, including attention, which calculates relevance scores between tokens, followed by the feedforward layer, which incorporates information stored during training (see the attention sketch below).
How cached calculations make transformers faster at generation time, how the transformer block has evolved in the years since the original paper, and how transformers continue to be widely used (see the caching sketch below).
An exploration of how recent models are implemented in the Hugging Face Transformers library (see the generation example below).
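A minimal sketch of the bag-of-words idea from the first topic above: each piece of text becomes a vector of word counts over a fixed vocabulary, with no notion of context. The toy sentences and vocabulary are made up purely for illustration.

```python
from collections import Counter

# Toy corpus; the vocabulary is just the sorted set of words seen in it.
docs = ["the cat sat on the mat", "the dog sat on the log"]
vocab = sorted({word for doc in docs for word in doc.split()})

# Each document becomes a vector of raw word counts over the vocabulary.
def bag_of_words(doc):
    counts = Counter(doc.split())
    return [counts[word] for word in vocab]

for doc in docs:
    print(doc, "->", bag_of_words(doc))
```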
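A small tokenization example: a Hugging Face AutoTokenizer splits text into tokens (whole words or word pieces) and maps them to the integer IDs the model actually sees. The GPT-2 tokenizer is just a small, common choice for illustration, not necessarily the one used in the course.

```python
from transformers import AutoTokenizer

# Any pretrained tokenizer works; GPT-2's is a small, freely available example.
tokenizer = AutoTokenizer.from_pretrained("gpt2")

text = "Transformers tokenize text into word pieces."
tokens = tokenizer.tokenize(text)    # human-readable token strings
token_ids = tokenizer.encode(text)   # integer IDs sent to the model

print(tokens)
print(token_ids)
```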
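A skeletal PyTorch sketch of the three stages: token embeddings, a stack of transformer blocks, and a language model head that produces a score for every vocabulary token. The sizes and the use of nn.TransformerEncoderLayer as a stand-in block are illustrative assumptions, not the course's exact implementation.

```python
import torch
import torch.nn as nn

class TinyLM(nn.Module):
    def __init__(self, vocab_size=1000, d_model=64, n_layers=2):
        super().__init__()
        # Stage 1: token IDs -> embedding vectors
        self.embed = nn.Embedding(vocab_size, d_model)
        # Stage 2: a stack of transformer blocks (standard encoder layers as stand-ins)
        self.blocks = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
            for _ in range(n_layers)
        )
        # Stage 3: language model head -> one logit per vocabulary token
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, token_ids):
        x = self.embed(token_ids)
        for block in self.blocks:
            x = block(x)
        return self.lm_head(x)

logits = TinyLM()(torch.randint(0, 1000, (1, 8)))
print(logits.shape)  # (batch=1, seq=8, vocab=1000)
```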
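A minimal NumPy sketch of the attention calculation inside the transformer block: scaled dot-product attention turns query/key similarity into relevance scores with a softmax, then uses those scores to mix the value vectors. Real blocks add learned projections, multiple heads, and the feedforward layer; this shows only the core calculation.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # relevance of every token to every other token
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V                  # weighted mix of value vectors

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 16))            # 5 tokens, 16-dimensional vectors
print(scaled_dot_product_attention(x, x, x).shape)  # (5, 16)
```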
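A rough sketch of the cached-calculation idea (the KV cache): during generation, the keys and values computed for earlier tokens are stored and reused, so each new token only needs its own key and value computed. This is a conceptual illustration, not how any particular library implements it.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16
cached_keys, cached_values = [], []  # grows by one entry per generated token

def attend_with_cache(new_token_vec):
    # Compute K and V only for the newest token; reuse everything cached so far.
    cached_keys.append(new_token_vec)    # stand-in for a learned key projection
    cached_values.append(new_token_vec)  # stand-in for a learned value projection
    K = np.stack(cached_keys)
    V = np.stack(cached_values)
    scores = K @ new_token_vec / np.sqrt(d)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V

for step in range(4):
    out = attend_with_cache(rng.normal(size=d))
    print(f"step {step}: cache holds {len(cached_keys)} keys/values")
```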
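A short example of loading a model with the Hugging Face Transformers library and generating text. GPT-2 is used here only because it is small and freely downloadable; the course may work with different models.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# GPT-2 is a small example checkpoint; swap in any causal LM you like.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The transformer architecture", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```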
By the end of this course, you’ll have a deep understanding of how LLMs process language, and you'll be able to read papers that introduce new models and understand the architectural details they describe. This intuition will help improve your approach to building LLM applications.
Enroll now: bit.ly/3WKa3fK