A Hackers' Guide to Language Models (1:31:13)
Comments
@davidz6828 12 hours ago
Thanks for sharing, Jeremy and team. Always learn something new from you.
@niazhimselfangels 1 day ago
Only fifteen minutes in and I'm already missing fastai classes so much! There's a ton I learn from you, Jeremy, whenever you do one of these streams. And the Hackers' Guide to LLMs video is a true gem! 🎉
@law_wu 1 day ago
Thanks for this Jeremy and team! So informative to learn how these libraries are built. Looking forward to future dev chats.
@souvikbhattacharyya2480 2 days ago
Thanks for making these public!
@pkn8707 2 days ago
As long as educators like Jeremy are around, no closed-source company can have a lock on knowledge. Thanks for doing what you do, so consistently. One question though: even with so much chaos in the education field, what motivates you to keep at it? Doing great work is one thing; doing it consistently in this distraction-prone world is really hard. Anyway, as always, thank you and your team for your contribution.
@EkShunya 2 days ago
great video as usual
@wolpumba4099 2 days ago
*Summary*

*Why Claudette exists:*
* [0:00] Jeremy feels Claude is underrated and wants to promote its use.
* [4:03] He aims to provide a simpler, more transparent alternative to large, complex LLM frameworks.
* [6:18] Claudette bridges the gap between writing everything from scratch and using bulky frameworks, appealing to both beginners and experienced Python programmers.

*What Claudette does:*
* Simplifies the Anthropic SDK:
  * [8:27] Offers convenient functions to access Claude's models (Opus, Sonnet, Haiku).
  * [15:57] Handles message formatting and content extraction, making interactions cleaner.
  * [19:15] Tracks token usage across sessions for cost monitoring.
  * [24:48] Provides easy ways to use `prefill` (forcing the start of a response) and streaming.
* Implements chat history:
  * [1:01:55] The `Chat` class maintains conversation history, mimicking stateful behavior.
  * [1:04:34] Integrates seamlessly with prefill, streaming, and tool use within a chat session.
* Facilitates tool use (ReAct pattern):
  * [36:57] Uses Python functions as tools, automatically generating the required JSON schema from docstrings and type hints.
  * [56:21] Handles tool execution based on Claude's requests, including passing inputs and retrieving results.
  * [1:28:45] Provides a `tool_loop` function to automate multi-step tool interactions.
* Supports images:
  * [1:09:01] Includes functions to easily incorporate images into messages and prompts.
* Example use cases:
  * [1:19:12] Demonstrates building a simple customer-service agent with tool use.
  * [1:31:20] Showcases building a code interpreter similar to ChatGPT's, combining Python execution with other tools.

*Key features:*
* [7:36] Literate programming: the source code is designed to be read like a tutorial, explaining itself as you go.
* [5:09] Minimal abstractions: leverages existing Python knowledge and avoids introducing unnecessary complexity.
* [1:24:51] Transparency: you can easily inspect requests, responses, history, and even debug the HTTP communication.

*Future plans:*
* [1:16:19] Create similar libraries ("friends") for other LLMs like GPT and Gemini.
* [1:16:19] Maintain focus on simplicity and ease of use.

*Overall:* Claudette is a user-friendly library that simplifies working with Claude while providing powerful features for building LLM applications. Its literate programming style and minimal abstractions make it easy to learn, use, and extend.

I used Gemini 1.5 Pro to summarize the transcript.
@user-go4sb6re4y 2 days ago
Thanks for the podcast!
@giorda77 5 days ago
Lessons 11 and 12 should be called "Broadcasting: The Missing Semester." Great content, @Jeremy Howard. Thanks again!
@adityagupta-hm2vs 6 days ago
Also, are we treating the latent like the weights here? We are subtracting gradients from the latent, which is what we typically do to the weights in a conventional NN.
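The analogy in this question can be made concrete with a small sketch (plain NumPy, with a hypothetical quadratic loss standing in for a real guidance/classifier loss): during guided generation the latent plays the role the weights play in ordinary training, in that it is the thing being optimized, so the update subtracts the gradient of a loss taken with respect to the latent itself.

```python
import numpy as np

# Toy "guidance loss": distance of the latent from some target direction.
# In real guidance this gradient would come from a classifier or similar model.
target = np.array([1.0, -2.0, 0.5])

def grad(latent):
    # Analytic gradient of the quadratic loss ((latent - target)**2).sum()
    return 2 * (latent - target)

latent = np.zeros(3)          # the thing being optimized, like weights in a NN
guidance_scale = 0.1
for _ in range(200):
    # Same update rule as SGD on weights, applied to the latent instead
    latent = latent - guidance_scale * grad(latent)

print(latent)  # close to target
```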
@adityagupta-hm2vs 6 days ago
How do we decide the scaling factor in the VAE part, i.e. 0.18215? Any hint on how to choose it? I did try changing it and could see different outputs, but what's a good way to choose?
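One plausible answer (an assumption here, not stated in the video): the factor is chosen so the scaled latents have roughly unit standard deviation, matching the range the diffusion model was trained to expect, and 0.18215 is approximately 1 over the std of the Stable Diffusion VAE's latents over its training data. A sketch of estimating such a factor empirically, with synthetic latents standing in for real VAE encodings:

```python
import numpy as np

rng = np.random.default_rng(0)
# Pretend these are VAE latents for a batch of images; their raw std of ~5.49
# mirrors Stable Diffusion's latents in spirit (1 / 0.18215 ≈ 5.49).
latents = rng.normal(0.0, 5.49, size=(64, 4, 32, 32))

scale = 1.0 / latents.std()   # empirical scaling factor
print(round(scale, 3))        # ≈ 0.182

scaled = latents * scale
print(round(scaled.std(), 3)) # ≈ 1.0, unit-variance latents for the diffuser
```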
@chrismarais1999 6 days ago
Q: why not just make bs=len(valid_ds), i.e. make the batch size for the validation set equal to its length? I can't see the point of batching the validation set, since we're just computing some metric on it.
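One common answer (not from the video) is memory: the whole validation set may not fit on the GPU at once, and batching gives an identical metric as long as each batch is weighted by its size. A small sketch with a plain accuracy metric:

```python
import numpy as np

rng = np.random.default_rng(1)
preds   = rng.integers(0, 10, size=1000)   # fake predicted classes
targets = rng.integers(0, 10, size=1000)   # fake true classes

# Metric on the whole validation set at once
full_acc = (preds == targets).mean()

# Same metric computed batch by batch, weighted by batch size
bs, total, n = 64, 0.0, 0
for i in range(0, len(preds), bs):
    pb, tb = preds[i:i+bs], targets[i:i+bs]
    total += (pb == tb).sum()
    n += len(pb)
batched_acc = total / n

# Identical result, but peak memory is bounded by the batch size
assert np.isclose(full_acc, batched_acc)
```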
@paulsawyer1288 8 days ago
Did you ever use your understanding to teach your peers in classes?
@keramoti_baharul 16 days ago
A hackers' guide to using language models, including open-source and OpenAI models, with a focus on a code-first approach. Covers language model pre-training, fine-tuning, and reinforcement learning from human feedback. Demonstrates creating a custom code interpreter and fine-tuning a model for SQL generation.

Key moments:

00:01 Language models predict the next word or fill in missing words in a sentence. They use tokens, which can be whole words or subword units, and are trained through pre-training on large datasets like Wikipedia.
- Language models predict the next word or fill in missing words, using tokens that can be whole words or subword units.
- Training involves pre-training on extensive datasets like Wikipedia, which helps the model learn language patterns and improve its predictive accuracy.

08:04 Deep neural networks are trained to predict the next word in a sentence by learning about the world and building abstractions. This involves compression, followed by language model fine-tuning and classifier fine-tuning.
- Learning about objects, time, movies, directors, and people is crucial for predicting words effectively across tasks.
- There is a relationship between compression and intelligence; fine-tuning (language model and classifier) enhances the model's capabilities.
- Approaches like instruction tuning and reinforcement learning from human feedback improve performance on answering questions and solving problems.

16:07 To use language models effectively, start by becoming a proficient user. GPT-4 is currently recommended, offering capabilities beyond common misconceptions about its limitations.
- GPT-4 can address reasoning challenges and solve problems effectively when primed with custom instructions.
- Understanding GPT-4's training process and initial purpose helps in getting accurate answers; custom instructions can guide it to offer high-quality information.
- Priming GPT-4 with specific guidelines improves its problem-solving and the accuracy of its responses.

24:12 Models like GPT-4 can provide concise answers but may struggle with self-awareness and complex logic puzzles, leading to hallucinations and errors. Encouraging multiple attempts and using Advanced Data Analysis can improve accuracy.
- Struggles with self-awareness and complex logic puzzles lead to errors and hallucinations, affecting response accuracy.
- Encouraging multiple attempts and using Advanced Data Analysis enhances accuracy on complex problems.
- Advanced Data Analysis allows requesting code generation and testing, improving efficiency in tasks like document formatting.

32:20 Language models excel at tasks requiring pattern familiarity and data processing, such as extracting text from images, creating tables, and responding quickly based on predefined instructions.
- These tasks suit language models because they recognize patterns and process data quickly.
- Comparing GPT-4 and GPT-3.5 on cost-effectiveness shows the affordability and versatility of the OpenAI API for various tasks.
- The OpenAI API can be used programmatically for data analysis, repetitive tasks, and creative programming.

40:26 Understanding the cost and usage of OpenAI's GPT models is crucial. Monitoring usage, managing rate limits, and creating custom functions improve the experience and efficiency of using the API.
- Monitor usage and cost to avoid overspending; test with lower-cost options before opting for expensive ones.
- Manage rate limits, especially early on, and implement functions to handle rate-limit errors to prevent disruptions.
- Custom functions and function calling enable personalized code interpreters and utilities.

48:30 Building a code interpreter with GPT-4 allows executing code and returning results, extending the model beyond standard usage.
- Docstrings are the key to programming GPT-4 with functions; accurate function descriptions matter for proper execution.
- Custom functions let GPT-4 perform specific computations, with the model deciding when to use the provided functions.
- A Python function for executing code and returning results should verify the code before execution for security.

56:34 Renting others' computers gives cheaper access and better availability. The RTX 3090 is recommended for language models because memory speed matters more than processor speed.
- GPU options include an RTX 3090 for around $700 or an A6000 for around $5000, with memory size a key consideration. A Mac with an M2 Ultra is an alternative.
- The Hugging Face Transformers library provides pre-trained models; model evaluation metrics face challenges, including potential data leakage in training sets.
- Models based on Meta's Llama 2 are recommended; fine-tuning pre-trained models matters for performance and memory use.

1:04:38 Jeremy Howard, an Australian AI researcher and entrepreneur, discusses optimizing language models for speed and efficiency with different precision formats, such as bfloat16 and GPTQ, yielding significant time reductions.
- bfloat16 and GPTQ lead to faster processing and reduced memory usage.
- Instruction-tuned models like Stable Beluga require the prompt format used during instruction tuning.
- Retrieval-augmented generation enhances responses by searching for relevant documents, such as Wikipedia, and incorporating the retrieved information.

1:12:42 Open-source models with context lengths of 2,000-4,000 tokens can answer questions when given context from web pages; a sentence-transformer model picks the most relevant document.
- Sentence-transformer models identify the most suitable document via similarity calculations.
- Documents and questions are encoded into embeddings, compared, and the most relevant document selected.
- Vector databases enable efficient document encoding and retrieval at scale.

1:20:46 Fine-tuning customizes model behavior based on available documents, demonstrated by a tool that generates SQL queries from English questions, trained in just a few hours.
- The Hugging Face Datasets library enables quick customization on specific datasets for specialized tasks.
- Axolotl, an open-source tool, makes fine-tuning efficient, with ready-to-use functionality.
- On Macs, the MLC and llama.cpp projects offer flexibility for running language models on various platforms.

1:28:50 Exploring language models like Llama can be exciting yet challenging for Python programmers due to rapid development, early-stage tooling, and installation complexities.
- An Nvidia graphics card and solid Python skills help when using the PyTorch and Hugging Face ecosystems.
- Language models are evolving quickly and offer an abundance of possibilities; community support through Discord channels is important.
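The retrieval step summarized at 1:12:42 reduces to embedding the question and the candidate documents, then picking the document with the highest cosine similarity. A sketch with made-up 3-dimensional embeddings standing in for a real sentence-transformer (the document names and vectors are illustrative only):

```python
import numpy as np

def cosine_sim(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Stand-in embeddings; a real system would get these from a sentence-transformer.
doc_embs = {
    "jeremy_howard_wiki": np.array([0.9, 0.1, 0.0]),
    "dinosaur_wiki":      np.array([0.0, 0.2, 0.9]),
}
question_emb = np.array([0.8, 0.2, 0.1])   # e.g. "Who is Jeremy Howard?"

# Pick the document whose embedding is most similar to the question's
best_doc = max(doc_embs, key=lambda d: cosine_sim(question_emb, doc_embs[d]))
print(best_doc)  # → jeremy_howard_wiki
```

A vector database does the same comparison, just with an index that avoids scanning every document.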
@marko.p.radojcic 17 days ago
I am getting YouTube Premium just to download this series. Thank you!
@adnanwahab4191 24 days ago
Amazing content thank you so much !
@shubh9207 26 days ago
Please roll out the next series! Although I'm still on the first part, I just can't wait to reach this point and learn from such amazing tutors.
@DeltaJes-co8yu 27 days ago
Kerala has something to be proud about!
@joxa6119 27 days ago
What does Jeremy throw to the audience? Is it a mic?
@KetanSingh 1 month ago
I try watching this once every year. Incredibly good course
@DinoFancellu 1 month ago
Don't like all this jumping around. Would be much easier to simply go through it, in a linear fashion, explaining as you go. Disappointing
@frankchieng 1 month ago
I thought that in the `WandBCB(MetricsCB)` class, the `if self.train:` in `def _log(self, d):` should be modified to `if d['train']=='train'`.
@bbalban 1 month ago
Great course! So weird that the videos have fewer than 100k views.
@RayhaanKhan-mu4qu 1 month ago
29:45 I can agree that anime people watch way too much anime!
@swimmingpolar 1 month ago
First comment on YouTube here. Among all the videos on YouTube, using custom instructions like you did is literally eye-opening. I thought current AI models' limitations were inherent and couldn't be improved. Of course you are a professional in AI, but things are so well organized and straightforward that I can understand and see the results right away. 😂 Gonna have to steal your instructions as well.
@bilalch83 1 month ago
Sensei Jeremy Howard
@bayesianmonk 1 month ago
Sometimes explaining the math helps more than avoiding it, and no heavy math is used here anyway. I found the explanation of DDIM not very clear. Thanks for the course and videos.
@thehigheststateofsalad 1 month ago
We need another session to explain this process.
@maxkirby8500 18 days ago
Yeah. I've been spending quite a bit of time trying to bridge the gap by reading through the papers and such, but maybe that's intended...
@antonioalvarado7594 1 month ago
Freedom for deep learning: Unlocked. Thank you sir.
@user-kl1dc8nh3l 1 month ago
really great content.
@ItzGanked 1 month ago
Jupyter also has documentation for Python and other libraries inside the notebook, under Help > Python Reference. It keeps you inside the notebook instead of searching the web, and provides docs for the version of Python you are using.
@alecmorgan3541 1 month ago
You wouldn't download 600 images of bears!
@pankajsinghrawat1056 1 month ago
Since we want to increase the probability of our image being a digit, shouldn't we "add" and not "subtract" the grad of the probability w.r.t. the image? Is this right, or am I missing something?
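Both signs are consistent; it depends on what is differentiated. Subtracting the gradient of a loss defined as minus the probability is exactly the same as adding the gradient of the probability. A one-dimensional check with a toy probability curve (purely illustrative, not the course's actual model):

```python
import numpy as np

# Toy "probability of being a digit" as a function of a 1-D image x,
# peaked at x = 3 (illustrative only).
p      = lambda x: np.exp(-(x - 3.0) ** 2)
grad_p = lambda x: -2 * (x - 3.0) * p(x)   # dp/dx

lr, x_up, x_down = 0.5, 2.0, 2.0
for _ in range(100):
    x_up   = x_up   + lr * grad_p(x_up)       # gradient *ascent* on p
    x_down = x_down - lr * (-grad_p(x_down))  # gradient *descent* on loss = -p

print(round(x_up, 3))  # ≈ 3.0, the probability peak; both updates agree
```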
@thomasdeniffel2122 1 month ago
Adding a dimension at kzbin.info/www/bejne/laO7q5iNppl2bNk is very important, as otherwise the minus in the loss function does incorrect broadcasting, leading to a model that achieves at most 0.55 accuracy. The error is silent, as the mean in the loss function hides it.
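The silent failure described above is easy to reproduce in a few lines (NumPy here for illustration; PyTorch tensors broadcast the same way):

```python
import numpy as np

preds   = np.array([[0.1], [0.7], [0.4]])  # shape (3, 1), e.g. model outputs
targets = np.array([0.0, 1.0, 0.0])        # shape (3,), missing a dimension

diff = preds - targets
print(diff.shape)   # (3, 3): broadcast to a full matrix, not elementwise!
# .mean() on this still returns a scalar, so the bug is invisible in the loss.

fixed = preds - targets[:, None]   # add the dimension back
print(fixed.shape)  # (3, 1): the intended elementwise difference
```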
@samanforoughi7898 1 month ago
Jeremy spitting gold as always.
@jkscout 1 month ago
That guy really doesn't like lawyers.
@thomasdeniffel2122 1 month ago
In `one_epoch` at 44:09, there is a `coeffs.grad.zero_()` missing :-)
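The effect of a missing `coeffs.grad.zero_()` can be shown with a hand-rolled accumulator (plain Python standing in for PyTorch autograd, which accumulates into `.grad` by design):

```python
# Minimize f(w) = (w - 5)^2 with manual gradients.
# Like PyTorch's .grad, `grad` here accumulates unless explicitly reset.
def run(zero_each_step):
    w, grad, lr = 0.0, 0.0, 0.1
    for _ in range(100):
        grad += 2 * (w - 5.0)   # accumulate, as loss.backward() does
        w -= lr * grad
        if zero_each_step:
            grad = 0.0          # the coeffs.grad.zero_() step
    return w

print(run(zero_each_step=True))    # ≈ 5.0, the minimum
print(run(zero_each_step=False))   # does not settle: stale gradients pile up
```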
@abhishekkalagurki959 1 month ago
How much Python does one need to learn before starting this course?
@ggawd 1 month ago
Thank you for doing this
@mattambrogi8004 1 month ago
Great lesson! I found myself a bit confused by the predictions and loss.backward() at ~37:00, and did some digging to clear it up, which might be helpful for others:
- At 37:00, when we're creating the predictions, Jeremy says we're going to add up (each independent variable * coef) over the columns. There's nothing wrong with how he said this, it just didn't click for my brain: we're creating a prediction for each row by adding up each of the indep_vars*coeffs. So at the end we have a predictions vector with as many predictions as we have rows of data.
- This is what we then calculate the loss on. Then, using the loss, we compute how much changing each coef would have changed the loss (backprop), apply those changes to update the coefs, and that's one epoch.
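The shape bookkeeping described above can be checked in a few lines; a sketch with made-up data (the 891-row, 12-column shapes are illustrative, not the actual dataset's):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.random((891, 12))      # 891 rows, 12 independent variables
coeffs = rng.random(12) - 0.5  # one coefficient per column

# Multiply each column by its coefficient, then sum across columns:
preds = (X * coeffs).sum(axis=1)
print(preds.shape)   # (891,): one prediction per row

# One gradient-descent step on MSE loss, i.e. the "epoch" described above:
targets = rng.integers(0, 2, size=891).astype(float)
grad = 2 * (X * (preds - targets)[:, None]).mean(axis=0)  # d(loss)/d(coeffs)
coeffs = coeffs - 0.1 * grad
```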
@bilalch83 1 month ago
I can safely say that Jeremy, and by extension, Fast AI have helped me power through some of the most difficult times in my life. The end result was my complete pivot towards a new field and I have never been happier or more driven. Thank you doesn't even cut it.
@howardjeremyp 1 month ago
Amazing to hear! :D
@marktomm1959 1 month ago
With gradio==4.28.3 the example code in the video won't work, giving the error "module 'gradio' has no attribute 'inputs'". Fix:
image = gr.Image(height=192, width=192)
label = gr.Label()
@NeuralNetWorth-pm4uf 2 months ago
With the release of Miniforge3-23.3.1-0, that incorporated the changes in #277, the packages and configuration of Mambaforge and Miniforge3 are now identical. The only difference between the two is the name of the installer and, subsequently, the default installation directory. Is it correct to assume I should be installing Miniforge3-23.3.1-0 as of Sept-23 per the comment above from miniforge?
@joeyc666 2 months ago
You rock, Jeremy :) Thanks for such an in-depth, yet elegant explanation. Have you posted the custom instructions anywhere? I couldn't seem to find them.
@balajicherukuri 2 months ago
Love it! First time I've ever understood what ML is, at a surface level of course. Thank you!
@Moiez101 2 months ago
Just a quick question: by "reproduce the code", does that mean one should be able to write out the code from memory/understanding, knowing all of the parameters within the arguments as well as the defined functions? That would be the best-case scenario, but I feel it would get in the way of moving through the course. One doesn't need to reproduce the code perfectly, just understand what the parameters are doing, right?
@sunr8152 2 months ago
building a neural net in spreadsheet. Heck yea!
@fernandoflores4656 2 months ago
For anyone watching this recently, as of 04/18/2024 or anytime in 2024 and beyond: Jeremy uses the latest version in the first line of code, and this is also in the book. However, if you are just starting out, you may get tripped up, because there were a few changes to the source code, such as 'search_images_ddg', which no longer exists in the latest version. At the top of the file for any of the lessons, replace the first line as follows:

# Instead of using this
# ! [ -e /content ] && pip install -Uqq fastbook

# Use this instead
! [ -e /content ] && pip install -Uqq fastbook==0.0.28 && pip install fastai==2.7.9

These are the specific versions of fastai and fastbook they are using. This will save you some trouble if you are just trying to follow along or are new to Python.
@user-xn1ly6xt8o 2 months ago
it's awesome! thanks a lot
@josephbloom7410 2 months ago
Finish the Damn Course
@VolodymyrBilyachat 2 months ago
Instead of splitting code into cells, I like to run the notebook in VS Code, where I can debug as normal.