What people never say is how they convert PDFs to Q&A pairs programmatically 😵💫
@airoundtable 9 months ago
That is a great point @dhrumil5977. This is a very challenging task and I tried to cover it in the video. Addressing the challenge of processing PDF files and converting them to Q&A format involves several key issues:
1. Diverse PDF structures: PDF files vary widely in their composition, which makes consistent data extraction difficult. For instance:
- Some PDFs contain large images, like AutoCAD designs, which many libraries struggle to handle effectively.
- Scanned PDFs lack machine-readable text unless optical character recognition (OCR) is applied.
- Structural inconsistencies, such as incorrect spacing settings, can hinder content interpretation.
2. Structural complexity: The structure of PDFs further complicates processing, with differences between documents like product manuals and customer-support Q&As. Each type may require unique preprocessing steps. (I discussed this at 00:19:25.)
3. Tool limitations: Existing libraries and tools are limited in how accurately they parse and understand PDF content. This necessitates customized solutions and manual intervention, especially for:
- Loading PDFs programmatically.
- Converting content into question-and-answer (Q&A) pairs, often relying on Large Language Models (LLMs) like ChatGPT. (I discussed this at 00:19:40 and showed the prompts that I passed to the GPT model for these documents at 00:19:57; see the sketch after this list.)
- Manually verifying and modifying generated questions, since LLMs cannot match expert comprehension.
4. Human involvement: Despite advancements, human interaction remains crucial, particularly for:
- Identifying and removing irrelevant questions.
- Enhancing incomplete questions.
- Adding important questions missed by automated processes.
5. Scalability challenges: Companies with extensive PDF archives face scalability hurdles, as manually addressing each document is impractical. Establishing clear guidelines and long-term plans is essential to streamline processes and develop efficient pipelines for document handling.
In summary, handling varied PDF structures, tool limitations, and scalability requirements takes a multifaceted approach, combining automation with human oversight and strategic planning. Hope this answer helps.
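For anyone who wants a concrete starting point, here is a minimal sketch of such a pipeline. It assumes the pypdf library for text extraction and the OpenAI client for drafting the pairs; the model name, prompt, and chunk size are illustrative choices, not the exact ones from the video:

```python
# Minimal sketch: extract text from a PDF and ask an LLM to draft Q&A pairs.
# Assumes `pip install pypdf openai` and OPENAI_API_KEY set in the environment.
# Model, prompt, and chunk size are illustrative, not the video's exact setup.
from pypdf import PdfReader
from openai import OpenAI


def extract_text(pdf_path: str) -> str:
    """Concatenate the machine-readable text of every page (no OCR)."""
    reader = PdfReader(pdf_path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def generate_qa_pairs(text: str, client: OpenAI, chunk_size: int = 3000) -> list[str]:
    """Split the text into chunks and ask the model for Q&A pairs per chunk."""
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    pairs = []
    for chunk in chunks:
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # illustrative model choice
            messages=[
                {"role": "system",
                 "content": "You write question-answer pairs for fine-tuning datasets."},
                {"role": "user",
                 "content": f"Create 3 Q&A pairs from this text:\n{chunk}"},
            ],
        )
        pairs.append(response.choices[0].message.content)
    return pairs


if __name__ == "__main__":
    client = OpenAI()
    document = extract_text("manual.pdf")  # hypothetical input file
    for qa in generate_qa_pairs(document, client):
        print(qa)
```

Whatever a sketch like this produces still needs the human review steps from points 3 and 4 above before it goes into a fine-tuning dataset.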
@omidsa8323 11 months ago
Very well-structured! Keep going.
@MANASSINGHA-i2h 3 months ago
This is so good... Sir, can we have more full end-to-end projects to show in our resumes? Your teaching is so dope
@alexdossantosliberato4636 10 months ago
It opened my mind to many possibilities; I'm recommending it to friends. Keep going :)
@airoundtable 10 months ago
Thank you very much for the support! I'm happy to hear the content was useful for you :)
@RAJARAM-cx5jb 9 months ago
The best and most holistic video on fine-tuning LLMs 😊. Looking forward to your PEFT fine-tuning video with LoRA/QLoRA
@airoundtable 9 months ago
Thanks! I am glad to see you liked the content. PEFT is coming soon 💯
@lijunlang8575 9 months ago
Really appreciated your LLM series! Keep it going!
@hadi-yeg 11 months ago
Very exciting! Thanks for the informative session!
@airoundtable 11 months ago
Thanks Hadi! I am glad you enjoyed the video
@nathWSD 6 months ago
Very beautiful and sweet, thanks a lot man
@airoundtable 6 months ago
Thanks! I am glad you liked the video!
@KinesitherapieImanesghuri 7 months ago
You choose the best model based on what? And how can we include quantization?
@airoundtable 7 months ago
Based on:
1. Leaderboards.
2. Use case: its performance in various tasks, especially how it performs in tasks similar to the one you have in mind for fine-tuning.
3. Number of parameters: size definitely makes a big difference. Performance gets better as the model gets bigger (as does the cost).
4. Hardware: the available hardware is very important when selecting the proper model.
5. Project specifications: if the model is supposed to interact with a large number of users, then the number of users, the acceptable system delay, and many other factors need to be considered.
6. Closed source vs. open source: this can make a huge difference in model selection.
7. Cost of fine-tuning, serving, and maintenance, plus a lot of other project requirements such as security measures, hallucination, etc.
I wanted to upload a video on quantization, but I saw a lot of content on the internet already, so I postponed it. Soon I will upload a video on how to fine-tune a multimodal LLM with quantization. Until then, search the web and you will find a ton of good material on how to add quantization to the fine-tuning process (a minimal sketch is below). It will be very simple for you if you understood the contents of this video.
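For reference, here is a minimal sketch of adding 4-bit quantization when loading a model for QLoRA-style fine-tuning, assuming the Hugging Face transformers, bitsandbytes, and accelerate libraries; the model name is an illustrative placeholder:

```python
# Minimal sketch: load a causal LM in 4-bit for QLoRA-style fine-tuning.
# Assumes `pip install transformers bitsandbytes accelerate` and a CUDA GPU.
# The model name is an illustrative placeholder, not the video's choice.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize weights to 4-bit on load
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16,  # do the matmuls in bf16
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
)

model_name = "meta-llama/Llama-2-7b-hf"  # illustrative placeholder
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",  # let accelerate place layers on the available GPU
)
```

From there you would attach LoRA adapters (e.g., with the peft library) and train only those, which is what lets a 7B model fit on a single consumer GPU.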
@pavankumarelango7295 11 months ago
Can I run this in Colab?
@airoundtable 11 months ago
Yes. The whole pipeline can be executed on Colab. There are only three things you have to take into account:
1. Installing the libraries in Colab (e.g., transformers).
2. Managing the GPU RAM that Colab gives you. There you can select a free T4 GPU that gives you 14 GB of RAM, and with that you can fine-tune the 70M and 1.5B models that I explained in the video. However, if you load the models and fine-tune them using PEFT methods, you can fine-tune up to a 7B model on that 14 GB GPU.
3. Making sure to copy the datasets into your Colab session (or your Google Drive) and load them from there. (A rough setup sketch is below.)
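As an illustration, a first Colab cell might look like this sketch; the package list and Drive path are illustrative placeholders, not the exact ones from the video:

```python
# Minimal Colab setup sketch. Run the pip line in its own cell with the `!`
# prefix; it is shown here as a comment so the file stays valid Python.
# !pip install transformers datasets peft accelerate

import torch
from google.colab import drive

# Confirm the free T4 GPU is attached (Runtime -> Change runtime type -> T4).
print(torch.cuda.get_device_name(0))  # e.g. "Tesla T4"

# Mount Google Drive and point at a dataset copied there beforehand.
drive.mount("/content/drive")
data_path = "/content/drive/MyDrive/finetune_data/train.jsonl"  # hypothetical path
```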
@pavankumarelango7295 11 months ago
Thanks for your quick reply @airoundtable
@sureshm3435 2 months ago
Your content is amazing, but your voice is very low and hard to hear...
@airoundtable 2 months ago
Thanks. You are right. I noticed the issue and bought a microphone to improve the sound in later videos, so my voice in my recent videos is clearer.