LLMs for Everything and Everyone! - Sebastian Raschka - Lightning AI

6,799 views

Optimized AI Conference

A day ago

Comments: 6
@wisconsuper · a year ago
🎯 Key takeaways for quick navigation:

00:29 🤖 Sebastian Raschka is the lead AI educator at Lightning AI and a former professor of statistics.
00:43 🤔 LLMs (large language models) promise to make us more productive by speeding up tasks like writing emails and coding.
01:52 🦾 Sebastian discusses the motivation for caring about LLMs, including keeping up with the news and the potential for increased productivity.
04:09 🎛️ LLMs like ChatGPT go through three main stages: pre-training, supervised fine-tuning, and alignment.
07:37 💡 Both pre-training and fine-tuning are next-word-prediction tasks, but the format of the datasets and instructions differs.
10:09 🤖 LLMs go through a three-stage process: pre-training, fine-tuning, and alignment.
11:31 🦾 Reinforcement learning from human feedback (RLHF) involves sampling prompts, having humans rank the responses, training a reward model, and refining the model with proximal policy optimization (PPO).
13:21 💡 There are five main ways to use LLMs: everyday tasks, pre-training and prompting for specific domains, fine-tuning on specific data, fine-tuning and prompting for code-related tasks, and training custom models.
16:36 👥 Bloomberg pre-trained their own LLM on general data plus finance-specific data to generate a proprietary database format and finance news headlines.
18:13 🐍 Meta AI (formerly Facebook) built Code Llama by pre-training a base model, further training it on general code data, fine-tuning it on Python code, and iterating through additional fine-tuning stages to create the Code Llama model family.
19:07 👥 Open-source LLMs provide access to log probabilities, allowing you to analyze the model's confidence and rank responses.
21:24 🤖 Open-source LLMs provide privacy and control, since the data never has to leave your computer and the model can be modified as desired.
23:19 👍 Open-source LLMs offer full customizability and allow for experimentation and modification of the model.
24:30 💡 Open-source LLMs do not change unless you want them to, which can be an advantage or a disadvantage depending on the need for updates.
25:40 🔌 Running open-source LLMs requires access to hardware and some additional work, such as cloning repositories and downloading weights.
30:02 👥 Open-source models like Free Willy, Falcon, Vicuna, PhysioStable, and Code Llama offer freely available weights, and Lit-GPT provides a hackable, customizable codebase.
31:27 🏢 Fine-tuning smaller models can achieve better performance on specific tasks than using larger general models like GPT-3 or GPT-4.
32:49 💡 Fine-tuned models are task-specific and useful for solving specific business problems, while prompted models are more general-purpose.
34:40 ⏱️ Parameter-efficient fine-tuning techniques like adapters and low-rank adaptation (LoRA) can save a significant amount of time compared to full fine-tuning (see the sketch after this list).
37:12 🌟 Future trends for LLMs include the mixture-of-experts approach, multimodal models, and LLMs for specific domains such as protein-related tasks.
39:33 🎯 Non-transformer large language models such as RWKV, Hyena Hierarchy, and Retentive Network offer alternative approaches to language modeling that are worth keeping an eye on.
39:49 🔄 There are alternatives to reinforcement-learning-based fine-tuning, such as relabeling data in hindsight and direct preference optimization (DPO), which show promising results and may simplify the process.
41:09 📚 Sebastian Raschka stays up to date on new research papers and frequently discusses them on his blog. He is also involved in developing the open-source repository Lit-GPT for loading and customizing LLMs.
41:22 🌐 The performance of fine-tuned models can vary with the specificity of the domain; in smaller, more specific domains they can outperform larger pre-trained models.
41:40 🎙️ There is no size limit for a domain when it comes to outperforming pre-trained models with fine-tuned models; performance varies with the specific task and the domain covered.

Made with HARPA AI
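To make the low-rank adaptation (LoRA) idea mentioned at 34:40 concrete, here is a minimal sketch assuming PyTorch; the `LoRALinear` class and the layer sizes are illustrative inventions for this example, not code from the talk. The point is that the pretrained weights stay frozen and only a small low-rank update is trained, which is why it is so much cheaper than full fine-tuning.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wraps a frozen pretrained linear layer with a trainable low-rank update.

    The effective weight is W + (alpha / r) * B @ A, where A and B have rank r,
    so only r * (in_features + out_features) parameters are trained instead of
    in_features * out_features.
    """

    def __init__(self, pretrained: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.pretrained = pretrained
        for p in self.pretrained.parameters():
            p.requires_grad = False  # freeze the original weights

        in_features, out_features = pretrained.in_features, pretrained.out_features
        self.lora_A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen path plus the small trainable low-rank update.
        return self.pretrained(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling


# Example: adapt a 512 -> 512 projection with rank 8.
base = nn.Linear(512, 512)
adapted = LoRALinear(base, r=8, alpha=16)
trainable = sum(p.numel() for p in adapted.parameters() if p.requires_grad)
total = sum(p.numel() for p in adapted.parameters())
print(f"trainable params: {trainable} / {total}")  # 8,192 of ~270k
```

In this toy case only 8,192 of roughly 271,000 parameters are updated; in a real LLM the same trick is typically applied to the attention projection layers, which is where the time and memory savings come from.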
@zakiasalod891 · a year ago
Absolutely amazing! Thank you so much for this.
@Saitama-ur3lq · a year ago
What exactly is a parameter in an LLM? What do you mean when you say a model has 1 billion parameters?
@kuterv · a year ago
Parameters are the numerical values a model learns during training; together they determine the model's behavior and performance, and they are what let it capture the patterns and relationships in the data. More parameters generally means higher capacity and the ability to handle more complex tasks. There is no fixed ratio between dataset size and parameter count: the architecture determines how many parameters a model has, while the training data determines the values they take. A data point is one unit of training information; for example, to train a model on positive and negative reviews, one data point would be a review ("tasty pizza and great service") paired with its label ("positive"). The most common parameters are weights: an LLM is a neural network with many connections, and weights express how strongly one neuron influences another. Bias terms are another kind of learned parameter that shift a neuron's output; this is different from "bias" in the sense of offensive or harmful responses, which is reduced through the training data and the alignment stage rather than by adjusting individual parameters. The "temperature" Sebastian mentioned is not a learned parameter but a sampling setting that controls the randomness of generation; it is usually applied to the whole model, though in some cases multiple temperature values are used.
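As a hedged illustration of what "parameters" means in practice: the toy model below, its layer sizes, and the name `toy_model` are made up for this example and assume PyTorch. Every weight and bias entry in the network is one learned parameter, and a "1-billion-parameter" LLM simply has about 10^9 such trainable numbers.

```python
import torch.nn as nn

# A toy "language model": embedding -> hidden layer -> vocabulary logits.
vocab_size, embed_dim, hidden_dim = 1000, 64, 128

toy_model = nn.Sequential(
    nn.Embedding(vocab_size, embed_dim),  # 1000 * 64         = 64,000 params
    nn.Linear(embed_dim, hidden_dim),     # 64 * 128 + 128    = 8,320 params
    nn.ReLU(),                            # no parameters
    nn.Linear(hidden_dim, vocab_size),    # 128 * 1000 + 1000 = 129,000 params
)

total = sum(p.numel() for p in toy_model.parameters())
print(f"total parameters: {total:,}")  # 201,320
```

Counting entries this way across the embedding and the many transformer blocks of a GPT-style architecture is what produces the headline figures like 7B or 70B parameters.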
@Saitama-ur3lq · a year ago
@kuterv Thank you for the insanely detailed explanation! Let's say you wanted to train a model on reviews, since you mentioned it: how many parameters would you need, or does that somehow get set by the system during training?
@rodi4850 · a year ago
Nothing new, just recycled existing information on this topic. Very disappointing...