LLMs: A Journey Through Time and Architecture

4,627 views

Sebastian Raschka

1 day ago

Comments: 42
@SebastianRaschka 2 months ago
If anyone is interested in a code tutorial on converting the GPT model to Llama, I have a step-by-step guide here: github.com/rasbt/LLMs-from-scratch/blob/main/ch05/07_gpt_to_llama/converting-gpt-to-llama2.ipynb (will add it to the description)
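[Editor's note: one of the swaps that a GPT-to-Llama conversion involves is replacing GPT-2's GELU feed-forward block with Llama's SwiGLU-style gated feed-forward block. The sketch below is a minimal, illustrative version of that one change; the class and dimension names are made up here, not taken from the linked notebook.]

```python
import torch
import torch.nn as nn

class SwiGLUFeedForward(nn.Module):
    """Llama-style gated feed-forward block (contrast with GPT-2's Linear -> GELU -> Linear)."""
    def __init__(self, emb_dim, hidden_dim):
        super().__init__()
        self.w_gate = nn.Linear(emb_dim, hidden_dim, bias=False)  # gate projection
        self.w_up = nn.Linear(emb_dim, hidden_dim, bias=False)    # up projection
        self.w_down = nn.Linear(hidden_dim, emb_dim, bias=False)  # down projection

    def forward(self, x):
        # SiLU-gated linear unit: silu(gate(x)) * up(x), projected back to the model dimension
        return self.w_down(nn.functional.silu(self.w_gate(x)) * self.w_up(x))

x = torch.randn(2, 8, 512)                    # (batch, seq_len, emb_dim)
print(SwiGLUFeedForward(512, 1376)(x).shape)  # torch.Size([2, 8, 512])
```

The other swaps covered by the conversion (LayerNorm to RMSNorm, absolute positional embeddings to RoPE) follow the same pattern of replacing one module at a time.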
@SHAMIKII 2 months ago
Certainly, me, me, me. Thank you very much for all your content.
@oldmankatan7383 20 days ago
Nice roundup! Thank you for this.
@hiramcoriarodriguez1252 2 months ago
Your book is a masterpiece, congratulations!
@SebastianRaschka 2 months ago
Thanks for the kind words!
@SanjaySingh-gj2kq 2 months ago
Bought your book on Manning last year; one of the best books on LLM internals. Looking forward to getting the print book.
@SebastianRaschka 2 months ago
Thanks for the kind words, glad to hear that you've been enjoying it! The print copies started shipping, and I hope you get yours soon!
@abdulhamidmerii5538 2 months ago
Just received the print version of your book yesterday, I look forward to reading it!
@SebastianRaschka 2 months ago
Good timing! I hope you like it and have a fun weekend ahead!
@tee_iam78 2 months ago
Brilliant content. Thank you.
@SebastianRaschka 2 months ago
Thanks!!
@thefatcat-hd6ze 2 months ago
Enjoying your book a lot :))
@SebastianRaschka 2 months ago
Thanks! Glad to hear that it was worth all the long hours and weekends!
@thefatcat-hd6ze 2 months ago
@@SebastianRaschka 🙏
@vaioslaschos 2 months ago
I think grouped-query attention is more than a trick for reducing computation. It says something deep about the best way to share information in a multi-agent system to get the best performance, something along the lines that it is better to give a little essential info while at the same time requesting info from many sources.
@SebastianRaschka 2 months ago
That's a nice interpretation regarding multi- and grouped-query attention. Thanks for sharing! If you go by the original papers, though, the motivation was more about computational constraints and efficiency (e.g., see arxiv.org/abs/2305.13245), but yeah, perhaps it can actually help with modeling performance as well in certain scenarios (for instance, where there would otherwise be massive overfitting).
@vaioslaschos 2 months ago
@@SebastianRaschka I have no doubt that what you say is true, and in no way did I want to imply you missed something. Two years ago, I spent a couple of months training 100M models with different architectures. I did some weird stuff like putting all the attention layers first and then a big nonlinear layer. You would be surprised how many monstrosities can actually work without losing too much performance. The two things I took from all this are: a) there is some interesting intuition in grouped querying (that I can't fully articulate), and it would make sense for this to be explored further; b) the skip connection, where you pass the value from previous layers to the current one, is not a gimmick. If you remove it, the performance drops a lot, which to me implies that the attention mechanism is actually applied to get only the "new" info. I think it's a pity that intuitions about the architecture are not passed on from researchers to the community. It's also a pity that experimenting with architecture is a rich person's hobby. Anyway, I really like your channel. I subscribed :-).
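[Editor's note: for readers who want to see mechanically what the grouped-query attention discussed above does, here is a minimal sketch; it is illustrative only, not the implementation from the paper or the book. Several query heads share a single key/value head, which shrinks the K/V projections and the KV cache.]

```python
import torch
import torch.nn as nn

class GroupedQueryAttention(nn.Module):
    """Minimal grouped-query attention: groups of query heads share one K/V head."""
    def __init__(self, emb_dim, n_heads, n_kv_heads):
        super().__init__()
        assert n_heads % n_kv_heads == 0
        self.n_heads, self.n_kv_heads = n_heads, n_kv_heads
        self.head_dim = emb_dim // n_heads
        self.wq = nn.Linear(emb_dim, n_heads * self.head_dim, bias=False)
        self.wk = nn.Linear(emb_dim, n_kv_heads * self.head_dim, bias=False)  # fewer K heads
        self.wv = nn.Linear(emb_dim, n_kv_heads * self.head_dim, bias=False)  # fewer V heads
        self.wo = nn.Linear(n_heads * self.head_dim, emb_dim, bias=False)

    def forward(self, x):
        b, t, _ = x.shape
        q = self.wq(x).view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        k = self.wk(x).view(b, t, self.n_kv_heads, self.head_dim).transpose(1, 2)
        v = self.wv(x).view(b, t, self.n_kv_heads, self.head_dim).transpose(1, 2)
        # Each K/V head is shared by n_heads // n_kv_heads query heads
        repeat = self.n_heads // self.n_kv_heads
        k = k.repeat_interleave(repeat, dim=1)
        v = v.repeat_interleave(repeat, dim=1)
        out = nn.functional.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.wo(out.transpose(1, 2).reshape(b, t, -1))

x = torch.randn(1, 16, 512)
gqa = GroupedQueryAttention(emb_dim=512, n_heads=8, n_kv_heads=2)
print(gqa(x).shape)  # torch.Size([1, 16, 512])
```

With n_kv_heads=1 this reduces to multi-query attention; with n_kv_heads=n_heads it is ordinary multi-head attention.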
@dc33333 2 months ago
my favorite YT channel
@SebastianRaschka 2 months ago
Thanks :)
@Ken-de6tp 2 months ago
Reading your new book! 🎉🎉
@SebastianRaschka 2 months ago
Hope you'll like it! Happy coding and reading!
@mahdipourmirzaei1048 2 months ago
GPT-2 was not trained on 40 billion tokens; it was 40 GB of text, which is equivalent to roughly 8 billion tokens or fewer.
@SebastianRaschka 2 months ago
Ah yes, 40 GB, you are right. Good catch!
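[Editor's note: a rough back-of-the-envelope check of the figure above; the bytes-per-token value is a common rule of thumb for English BPE tokenization, not a number from the video.]

```python
# 40 GB of English web text at roughly 4-5 bytes per BPE token
# lands in the ballpark of 8-10 billion tokens.
corpus_bytes = 40e9
for bytes_per_token in (4, 5):
    tokens = corpus_bytes / bytes_per_token
    print(f"~{bytes_per_token} bytes/token -> ~{tokens / 1e9:.0f}B tokens")
```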
@maikerodrigo4249 2 months ago
Llama 3.2 just came out today
@SebastianRaschka 2 months ago
Ha yes, I wish I could insert additional slides! What's interesting is that the small model went back from RMSNorm to LayerNorm.
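[Editor's note: for context on the RMSNorm vs. LayerNorm distinction mentioned here, a minimal sketch, illustrative rather than code from the video or book. RMSNorm only rescales by the root mean square, with no mean subtraction and no bias term, whereas LayerNorm also centers and shifts.]

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Rescale by the root mean square only; no mean subtraction, no bias."""
    def __init__(self, dim, eps=1e-5):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))  # learnable scale

    def forward(self, x):
        rms = torch.sqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x / rms)

x = torch.randn(2, 4, 512)
print(RMSNorm(512)(x).shape)       # torch.Size([2, 4, 512])
print(nn.LayerNorm(512)(x).shape)  # LayerNorm additionally subtracts the mean and adds a learnable bias
```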
@cletadjos 2 months ago
Thanks for sharing 😊
@1msirius 28 days ago
Hey, thanks for your videos! Also, can you suggest your best book on Gen AI? (I want to learn about transformers in detail.)
@SebastianRaschka 28 days ago
Glad you found the videos useful! Since you asked for a book recommendation: Build a Large Language Model (From Scratch) (amzn.to/4fqvn0D), where you build a transformer-based LLM from the ground up, implementing every single component.
@Innovatead_Solutions-e4u 2 months ago
Dear Sebastian Raschka, your channel caught our attention and we would like to explore advertising possibilities with you. Looking forward to discussing potential opportunities!
@SaiKiran-he5vy 2 months ago
What prerequisite knowledge is required to explore your new book, `Build a Large Language Model (From Scratch)`?
@SebastianRaschka 2 months ago
Good question! It requires Python knowledge. PyTorch knowledge also helps you get started quicker, but it's not strictly necessary. If you are new to PyTorch, you can start with Appendix A, which is a ~50-page intro to PyTorch to get you up to speed.
@SettimiTommaso 2 months ago
Yes!
@subaruhassufferredenough7892 2 months ago
What do you mean by high quality annealing?
@SebastianRaschka 2 months ago
They would select a small subset of very high quality data for the final annealing stage.
@subaruhassufferredenough7892 2 months ago
What does annealing mean in the context of LLMs? Is it the same as what we mean by an annealing LR scheduler?
@SebastianRaschka 2 months ago
@@subaruhassufferredenough7892 Yes, it's basically the same.
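[Editor's note: since the thread equates the two, here is a minimal sketch of what an annealing learning-rate schedule looks like in PyTorch; the optimizer, step count, and data handling are placeholders, not details from any particular model's training setup. During this final stage, the learning rate decays toward zero while training on the hand-picked high-quality subset.]

```python
import torch

model = torch.nn.Linear(10, 10)  # stand-in for the LLM
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

annealing_steps = 1_000  # placeholder length of the final stage
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=annealing_steps, eta_min=0.0
)

for step in range(annealing_steps):
    # ... forward/backward pass on the high-quality data subset would go here ...
    optimizer.step()
    scheduler.step()  # learning rate follows a cosine curve down to eta_min
    if step % 250 == 0:
        print(f"step {step}: lr = {scheduler.get_last_lr()[0]:.2e}")
```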
@subaruhassufferredenough7892 2 months ago
Do you know how they determined which data was high quality?
@rafsanjaniLab 2 months ago
Hi Prof. Raschka, could you please attach the slides?
@parvesh-rana 2 months ago
Explain transformers in detail
@SebastianRaschka 2 months ago
That would be a very long video :D. But you might find my book useful in that respect.