HuggingFace Fundamentals with LLMs such as TinyLlama and Mistral 7B

5,637 views

Chris Hay

3 months ago

Chris looks under the hood of HuggingFace models such as TinyLlama and Mistral 7B. In the video, Chris presents a high-level reference model of large language models and uses it to show how tokenization and the AutoTokenizer class from the HuggingFace transformers library work, linking it back to the HuggingFace repository. In addition, we look at the tokenizer config, and Chris shows how Mistral and Llama-2 both use the same tokenizer and embeddings architecture (albeit with different vocabularies). Finally, Chris shows you how to look at the model configuration and model architecture of HuggingFace models.
As we start to build towards our own large language model, understanding these fundamentals is critical, whether you are a builder or a consumer of AI.
Google Colab:
colab.research.google.com/dri...
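The steps in the description (load a tokenizer with AutoTokenizer, tokenize some text, then look at the model configuration) can be sketched roughly as below. This is not the notebook from the video: the repository ids, the sample sentence, and the helper names `inspect_model` and `same_tokenizer_class` are assumptions chosen for illustration. It needs the `transformers` package and network access, and the Mistral repository may require a Hugging Face login.

```python
from typing import Dict, List

# Hub repository ids. These are assumptions; the video's exact checkpoints may differ.
MODEL_IDS: List[str] = [
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    "mistralai/Mistral-7B-v0.1",
]


def inspect_model(model_id: str, text: str = "Who is Ada Lovelace?") -> Dict[str, object]:
    """Download the tokenizer and config for model_id and summarise them.

    Hypothetical helper: it only collects a few fields worth comparing.
    """
    # Imported lazily so the file can be loaded without transformers installed.
    from transformers import AutoConfig, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)  # reads the tokenizer files
    config = AutoConfig.from_pretrained(model_id)        # reads config.json only, no weights

    token_ids = tokenizer.encode(text)
    return {
        "tokenizer_class": type(tokenizer).__name__,
        "vocab_size": tokenizer.vocab_size,
        "tokens": tokenizer.convert_ids_to_tokens(token_ids),
        "token_ids": token_ids,
        "round_trip": tokenizer.decode(token_ids, skip_special_tokens=True),
        "architectures": config.architectures,
        "hidden_size": config.hidden_size,
        "num_hidden_layers": config.num_hidden_layers,
    }


def same_tokenizer_class(a: Dict[str, object], b: Dict[str, object]) -> bool:
    """Return True when two summaries report the same tokenizer class."""
    return a["tokenizer_class"] == b["tokenizer_class"]
```

Calling `inspect_model(model_id)` for each entry in `MODEL_IDS` and passing the two results to `same_tokenizer_class` is one way to check the claim that these models share a tokenizer implementation. Only the config and tokenizer files are downloaded, so this stays light even for a 7B model.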

Comments: 46
@ukaszrozewicz7488 3 months ago
The best video I've watched on KZbin about LLM so far. You explain complex topics in an accessible language, clearly and understandably. You are doing a very good job. I'm eagerly waiting for the next videos :)
@ThomazMartinez 3 months ago
same here
@chrishayuk 3 months ago
Wow, thanks! This one actually took a long time to get right, glad you liked it
@mindurownbussines 1 month ago
Thank you so much Chris I truly believe if one has a great understanding of a subject he can teach it clearly and you simply did that ! God bless you
@chrishayuk 1 month ago
You are too kind, and thank you. Glad it was useful
@kenchang3456 3 months ago
Excellent explanation. Although I don't have a use case to fine-tune a model currently, I presume I will eventually, so it'll be great to have what you've shared in my back pocket. Thanks a bunch.
@chrishayuk 3 months ago
Awesome, glad it was useful
@janstrunk 3 months ago
Great video! Looking forward to your next videos…
@chrishayuk 3 months ago
Yeah, next ones in series will be fun, glad you’re enjoying it
@Jaypatel512 2 months ago
Amazing way to get people comfortable with the model architecture. Thank you so much for sharing your knowledge.
@chrishayuk 2 months ago
Glad it was useful
@BipinRimal314 3 months ago
Really looking forward to the next video.
@chrishayuk 3 months ago
next one dropped
@keithrule4240 3 months ago
Great video. Just the right amount of detail. Thanks.
@chrishayuk 3 months ago
Glad it was helpful!
@narenkrishnanGenius 3 months ago
very well explained and useful
@chrishayuk 3 months ago
So glad to hear that, thank you
@kennethmichaelreda 3 months ago
Insanely valuable video. Thank you!
@chrishayuk 3 months ago
Glad it’s useful
@atifsaeedkhan9207 3 months ago
Thanks for going into so much detail. That was really a refresher for me. Glad someone like you is doing such good work.
@chrishayuk 3 months ago
thank you, very much appreciate that
@prashantkowshik5637 1 month ago
Thanks a lot Chris.
@chrishayuk 10 days ago
glad it was useful
@wadejohnson4542 2 months ago
For the very first time, I finally get it, thanks to you. Thank you for your service to the community.
@john6268 3 months ago
How does the tokenizer decode sub-word embeddings? Specifically, how do you determine which sequence is concatenated into a word vs. standing on its own? As shown, the answer would be decoded with spaces between the embeddings, which wouldn't make "Lovelace" into a word.
@chrishayuk 3 months ago
Certain tokens will have spaces, others won't, so _lace would be a different token from lace. I have a deep dive on the tiktoken tokenizer where I spend a lot of time on this. I am planning to do a building-a-tokenizer vid soon as part of this series.
@chrishayuk 3 months ago
how the tokenizer for gpt-4 (tiktoken) works and why it can't reverse strings kzbin.info/www/bejne/hH7SeXuJjMtkg9E
@john6268 3 months ago
@chrishayuk Thanks, I'll check out the other video, and I'm looking forward to the next one.
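The point in the thread above, that `_lace` and `lace` are different tokens, can be illustrated with a small pure-Python sketch. It assumes the SentencePiece convention used by Llama-style tokenizers, where a leading "▁" (U+2581) on a piece means "a space comes before this piece". The sub-word splits below are made up for illustration and are not the real tokenizer output, and `decode_pieces` is a hypothetical helper rather than a library function.

```python
from typing import List

# SentencePiece-style marker: a piece starting with it begins a new word.
WORD_START = "\u2581"  # the "▁" character


def decode_pieces(pieces: List[str]) -> str:
    """Join sub-word pieces into text.

    Pieces are concatenated with nothing in between; only the marker
    becomes a space, so pieces without it glue onto the previous piece.
    """
    text = "".join(pieces).replace(WORD_START, " ")
    # Drop the single space introduced by the marker on the first piece.
    return text[1:] if text.startswith(" ") else text


# Hypothetical split of "Ada Lovelace" into sub-word pieces.
name_pieces = [WORD_START + "Ada", WORD_START + "Lo", "vel", "ace"]
print(decode_pieces(name_pieces))

# "lace" continues the previous word, while "▁lace" starts a new one.
print(decode_pieces([WORD_START + "shoe", "lace"]))
print(decode_pieces([WORD_START + "shoe", WORD_START + "lace"]))
```

Because the word boundary is stored inside the token itself, the decoder never has to guess which pieces belong together: it concatenates everything and turns the markers back into spaces.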
@ilyanemihin6029 3 months ago
Thank you! This video brings light into the black box of LLM magic)
@chrishayuk 3 months ago
more to come, the next set of videos reveal a bunch more
@nguyenhuuanhtuan5360 3 months ago
Always awesome content ❤
@chrishayuk 3 months ago
Super glad it’s useful, thank you
@BiranchiNarayanNayak 3 months ago
Excellent tutorial to get started with LLMs.
@chrishayuk 3 months ago
Glad you liked it!
@javaneze 2 months ago
great video - many thanks!
@chrishayuk 2 months ago
Glad you liked it!
@AncientSlugThrower 3 months ago
Great video.
@chrishayuk 3 months ago
thank you
@MannyBernabe 3 months ago
great video. Thx!
@chrishayuk 3 months ago
thank you
@tec-earning8672 3 months ago
Great job, sir. Could you do a video on how to build Llama APIs? I have trained my own model and now want to use it on my website.
@chrishayuk 3 months ago
That's where we are working up to, but you can check out my existing fine-tuning Llama-2 video
@huiwencheng4585 3 months ago
Bro, just turn on the Big Thank so I can donate to you
@chrishayuk 3 months ago
lol, not gonna happen but appreciate the gesture and glad you like the videos