Fine-tuning Large Language Models (LLMs) | w/ Full Code

2,591 views

Farzad Roozitalab (AI RoundTable)


A day ago

Comments: 23
@dhrumil5977 9 months ago
What people never say is how to convert PDFs to Q&A pairs programmatically 😵‍💫
@airoundtable 9 months ago
That is a great point @dhrumil5977. This is a very challenging task and I tried to cover it in the video. Processing PDF files and converting them to Q&A format involves several key issues:
1. Diverse PDF structures: PDF files vary widely in their composition, which makes consistent data extraction difficult. For instance:
- Some PDFs contain large images, like AutoCAD designs, which many libraries struggle to handle.
- Scanned PDFs have no machine-readable text unless optical character recognition (OCR) is applied.
- Structural inconsistencies, such as incorrect spacing settings, can hinder content interpretation.
2. Structural complexity: The structure of PDFs further complicates processing, with differences between documents like product manuals and customer-support Q&As. Each type may require unique preprocessing steps. (I discussed this at 00:19:25.)
3. Tool limitations: Existing libraries and tools are limited in how accurately they can parse and understand PDF content. This calls for customized solutions and manual intervention, especially for:
- Loading PDFs programmatically.
- Converting content into question-and-answer (Q&A) pairs, often by relying on Large Language Models (LLMs) like ChatGPT. (I discussed this at 00:19:40 and showed the prompts I passed to the GPT model for these documents at 00:19:57.)
- Manually verifying and modifying the generated questions, since LLMs cannot match expert comprehension.
4. Human involvement: Despite advancements, human interaction remains crucial, particularly for:
- Identifying and removing irrelevant questions.
- Enhancing incomplete questions.
- Adding important questions missed by automated processes.
5. Scalability challenges: Companies with extensive PDF archives face scalability hurdles, as manually addressing each document is impractical. Establishing clear guidelines and long-term plans is essential to streamline processes and develop efficient document-handling pipelines.
In summary, complexities such as varied PDF structures, tool limitations, and scalability requirements call for a multifaceted approach that combines automation with human oversight and strategic planning. Hope this answer helps.
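The pipeline described above can be sketched roughly as follows. This is a minimal illustration, not the author's actual code: it assumes the raw text has already been extracted from the PDF (in practice that step would use a library such as pypdf, plus OCR for scanned files), and the function and parameter names here are hypothetical.

```python
# Hypothetical sketch of the PDF -> Q&A workflow described in the reply above.
# Assumes `raw_text` was already extracted from a PDF by some other tool.

def chunk_text(raw_text: str, max_chars: int = 1000) -> list:
    """Split extracted document text into chunks small enough for an LLM prompt."""
    paragraphs = [p.strip() for p in raw_text.split("\n\n") if p.strip()]
    chunks, current = [], ""
    for p in paragraphs:
        # Start a new chunk when adding this paragraph would exceed the limit.
        if current and len(current) + len(p) + 2 > max_chars:
            chunks.append(current)
            current = p
        else:
            current = f"{current}\n\n{p}" if current else p
    if current:
        chunks.append(current)
    return chunks


def build_qa_prompt(chunk: str, n_questions: int = 3) -> str:
    """Build the instruction that would be sent to a chat model to generate Q&A pairs."""
    return (
        f"Generate {n_questions} question-answer pairs that a user might ask "
        f"about the following document excerpt. Answer only from the excerpt.\n\n"
        f"Excerpt:\n{chunk}"
    )

# Each prompt would then be sent to an LLM API, and the generated pairs
# reviewed (and corrected) by a human before entering the fine-tuning dataset.
```

The human-review step at the end is the part the reply stresses cannot be automated away.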
@omidsa8323 11 months ago
Very well-structured! Keep going.
@MANASSINGHA-i2h 3 months ago
This is so good... sir, can we have more full end-to-end projects to show in our resumes? Your teaching is so dope.
@alexdossantosliberato4636 10 months ago
It opened my mind to many possibilities; I'm recommending it to friends. Keep going :)
@airoundtable 10 months ago
Thank you very much for the support! I'm glad to know the content was useful for you :)
@RAJARAM-cx5jb 9 months ago
The best and most holistic video on fine-tuning LLMs 😊. Looking forward to your PEFT fine-tuning video with LoRA/QLoRA.
@airoundtable 9 months ago
Thanks! I am glad to see you liked the content. PEFT is coming soon 💯
@lijunlang8575 9 months ago
Really appreciated your LLM series! Keep it going!
@hadi-yeg 11 months ago
Very exciting! Thanks for the informative session!
@airoundtable 11 months ago
Thanks Hadi! I am glad you enjoyed the video
@nathWSD 6 months ago
Very beautiful and sweet. Thanks a lot, man.
@airoundtable 6 months ago
Thanks! I am glad you liked the video!
@KinesitherapieImanesghuri 7 months ago
You choose the best model based on what? And how can we include quantization?
@airoundtable 7 months ago
Based on:
1. Leaderboards.
2. Use case: its performance on various tasks, especially how it performs on tasks similar to the one you have in mind for fine-tuning.
3. Number of parameters: size definitely makes a big difference. Performance gets better as the model gets bigger (as does the cost).
4. Hardware: the available hardware is very important in selecting the proper model.
5. Project specifications: if the model is supposed to interact with a large number of users, then the number of users, the acceptable system delay, and many other factors need to be considered.
6. Closed source vs. open source: this can make a huge difference in model selection.
7. Cost of fine-tuning, serving, and maintenance, plus a lot of other project specifications such as security measures, hallucination, etc.
I wanted to upload a video on quantization, but I saw a lot of content on the internet already, so I postponed it. Soon I will upload a video on how to fine-tune a multimodal LLM with quantization. Until then, search the web and you will find a ton of good material on how to add quantization to the fine-tuning process. It will be very simple for you if you understood the contents of this video.
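As a pointer for the quantization part of the question: with the Hugging Face transformers + bitsandbytes stack, 4-bit loading is typically configured like the sketch below. This is a generic illustration, not code from the video; the model name is a placeholder and would need a GPU and the relevant libraries installed.

```python
# Rough sketch of 4-bit (QLoRA-style) model loading with transformers + bitsandbytes.
# "model-name-here" is a placeholder for an actual Hugging Face model id.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize weights to 4 bits on load
    bnb_4bit_quant_type="nf4",              # NormalFloat4, common for QLoRA
    bnb_4bit_compute_dtype=torch.bfloat16,  # dtype used for matmuls
    bnb_4bit_use_double_quant=True,         # quantize the quantization constants too
)

model = AutoModelForCausalLM.from_pretrained(
    "model-name-here",
    quantization_config=bnb_config,
)
```

PEFT adapters (e.g. LoRA) would then be attached on top of the quantized base model, which is what makes fine-tuning larger models fit in limited GPU memory.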
@pavankumarelango7295 11 months ago
Can I run this in Colab?
@airoundtable 11 months ago
Yes. The whole pipeline can be executed on Colab. There are only three things you have to take into account: 1. Installing the libraries in Colab (e.g., transformers). 2. Managing the GPU RAM that Colab gives you. There you can select a free T4 GPU that gives you 14 GB of RAM, and with that you can fine-tune the 70M and 1.5B models that I explained in the video. However, if you load the models and fine-tune them using PEFT methods, you can fine-tune up to a 7B model on that 14 GB GPU. 3. Make sure to copy the datasets into your Colab session (or your Google Drive) and load them from there.
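The GPU-memory point above can be made concrete with a back-of-the-envelope calculation. This is a simplified illustration (my own, not from the video): it counts only the bytes needed to hold the model weights, ignoring gradients, optimizer states, and activations, which is why full fine-tuning needs several times more than this and why PEFT/quantization is needed to fit a 7B model on a 14 GB T4.

```python
# Rough estimate of GPU memory needed just to hold model weights.
# fp16 = 2 bytes/param; 4-bit quantization ~= 0.5 bytes/param.

def weight_memory_gb(n_params: float, bytes_per_param: float = 2.0) -> float:
    """Approximate memory (GB) to hold the weights of an n_params model."""
    return n_params * bytes_per_param / 1e9

# 1.5B model in fp16: 3 GB of weights -> fits easily on a 14 GB T4.
# 7B model in fp16: 14 GB of weights alone -> no room left for training state,
# which is why PEFT (and often 4-bit loading, ~3.5 GB) is required at that size.
```

For example, `weight_memory_gb(7e9)` gives 14.0 GB in fp16, while `weight_memory_gb(7e9, bytes_per_param=0.5)` gives 3.5 GB at 4-bit.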
@pavankumarelango7295 11 months ago
Thanks for your quick reply @airoundtable
@sureshm3435 2 months ago
Your content is amazing, but your voice is too low to hear...
@airoundtable 2 months ago
Thanks. You are right. I noticed the issue and bought a microphone to improve the sound in subsequent videos, so my voice in my recent videos is clearer.
@sureshm3435 2 months ago
@airoundtable 👍
@amraromoro 3 months ago
Great video, thank you man.
@airoundtable 3 months ago
I am glad you liked it