How the Gemma/Gemini Tokenizer Works - Gemma/Gemini vs GPT-4 vs Mistral

  Views: 1,747

Chris Hay

Days ago

Comments: 10
@Aberger789 8 months ago
Well, it's 2am, and I can't wait to watch your other videos. I am building some RAG implementations with scientific journals from PDF, and feeling like I'm going in circles. Taking a step back and considering the bigger concepts is helping. Great format for learning, I really appreciate your time!
@chrishayuk 8 months ago
glad you're enjoying, you might wanna check out my RAG video, and listen to my stoopid poems
@reza2kn 9 months ago
This is wonderful! The dataset alone is super useful to have, and the video walkthrough was really awesome for someone who's just trying to understand what's what here :D Please keep on doing what you're doing! One thing I have been interested in is visualizing the entire vocabulary inside a tokenizer to actually see what's inside, but have it be done in an easy-to-explore way. I tried word clouds and they didn't work at all. Do you have any ideas? I'm also super interested in fine-tuning models to teach them another language and using agents, but not just looking at code for 30 mins. Specific, real-world use cases with applied examples. I think KZbin is really lacking that at the moment. P.S.: Cool glasses :)
@chrishayuk 9 months ago
thank you, glad it's useful. you might find my next video on embeddings useful for visualization (no spoilers :). As for fine-tuning, I recently downloaded a lot of English-Welsh translations, and was planning to do a video on that. i was going to use llama2-7b as i know it doesn't do Welsh. i might do it with Gemma but not sure if it does Welsh already. Regardless, i'll be doing a language fine-tune video soon
@smithnigelw 9 months ago
Thanks Chris. Very interesting how they have chosen the vocabulary. For representation of programs in Python, how do they tokenise the white-space? I’m looking forward to the video on embedding.
@chrishayuk 9 months ago
it's a similar approach to llama, because not every language separates using whitespace. i'll maybe cover that in a future video. i will update the programming languages in the dataset, i didn't have time to merge all the other versions back in (where python was covered)
@cybermanaudiobooks3231 9 months ago
Great video. Companion piece to Andrej Karpathy's most recent. Very insightful. Thanks!
@chrishayuk 9 months ago
Thank you, glad it’s useful. This one was a video I’ve been trying to get right for a while
@garyhamilton2104 9 months ago
Commenting cuz I know Chris will give me a heart :)
@chrishayuk 9 months ago
because i love you all