Well, it's 2am, and I can't wait to watch your other videos. I am building some RAG implementations with scientific journals from PDF, and feeling like I'm going in circles. Taking a step back and considering the bigger concepts is helping. Great format for learning, I really appreciate your time!
@chrishayuk 8 months ago
glad you're enjoying it, you might wanna check out my RAG video, and listen to my stoopid poems
@reza2kn 9 months ago
This is wonderful! The dataset alone is super useful to have, and the video walkthrough was really awesome for someone who's just trying to understand what's what here :D Please keep on doing what you're doing! One thing I have been interested in is visualizing the entire vocabulary inside a tokenizer to actually see what's inside, but in an easy-to-explore way. I tried word clouds and they didn't work at all. Do you have any ideas? I'm also super interested in fine-tuning models to teach them another language and in using agents, but not just looking at code for 30 mins. Specific, real-world use cases with applied examples. I think YouTube is really lacking that at the moment. P.S: Cool glasses :)
@chrishayuk 9 months ago
thank you, glad it's useful. you might find my next video on embeddings useful for visualization (no spoilers :). As for fine-tuning, I recently downloaded a lot of English-Welsh translations, and was planning to do a video on that. i was going to use llama2-7b as i know it doesn't do Welsh. i might do it with Gemma but not sure if it does Welsh already. Regardless, i'll be doing a language fine-tune video soon
@smithnigelw 9 months ago
Thanks Chris. Very interesting how they have chosen the vocabulary. For representation of programs in Python, how do they tokenise the white-space? I’m looking forward to the video on embedding.
@chrishayuk 9 months ago
it's a similar approach to llama, because not every language separates words using whitespace. i'll maybe cover that in a future video. i will update the programming languages in the dataset, i didn't have time to merge all the other versions back in (where python was covered)
@cybermanaudiobooks3231 9 months ago
Great video. Companion piece to Andrej Karpathy's most recent. Very insightful. Thanks!
@chrishayuk 9 months ago
Thank you, glad it’s useful. This one was a video I’ve been trying to get right for a while
@garyhamilton2104 9 months ago
Commenting cuz I know Chris will give me a heart :)