I started my Data Science journey two years ago and now I'm building projects like AI Assistants or tkinter Desktop apps with ollama integration that can summarize the content of different files (pdf, docx, images). You are a big part of my development, since your passion and love for this rubbed off on me :))
@Thuvu5 1 month ago
Oh congratulations on your projects! So glad you found inspiration from my vids 🤗☺️
@luisalbertocodes 1 day ago
Just started my data science studies at university and this is awesome, I see my initial linear algebra classes paying off
@rayzorr 1 month ago
Wow, that was probably the best tutorial I have watched ... and I have watched a lot! Perfectly pitched and well thought out and delivered. Congrats on a great job!
@Thuvu5 1 month ago
Aw you’re so kind! I’m so glad to hear that 🙌
@aireescreates 1 month ago
Thanks for this Thu Vu. I have followed you from your first video. I was just starting in my DS journey. Your videos helped me a lot in my journey. I kinda missed you and I'm just glad that you posted again. This is super helpful and you explained all the concepts very clearly. I am currently building a web app extracting sales data from PDF files and using LLM to generate insights, analysis and recommendations and data viz. Your explanation of Docker is a treasure as I'm building my app! Thank you so much!
@Thuvu5 1 month ago
I'm so glad to hear! 🙌
@marktahu2932 1 month ago
Thank you Thu Vu, for a very straightforward step-by-step guide to creating a RAG project. I have needed something like this for a while to understand how to implement this. Many thanks!!
@sk3ffingtonai 1 month ago
Thank you so much for creating this comprehensive tutorial. I have been and am working hard on my AI Certification and this content is gold.
@oksanastrelnikova6970 1 month ago
Absolutely amazing content. I am only a beginner, I do not think I will be able to do it by myself (too frightened) but I could understand every single step you were doing!!! (also considering that English is not my first language). Thank you a lot!!! For all your work. Your tutorials are super professional and extremely useful!!!
@ZakinAbdul 28 days ago
Thank you for the video, Thu Vu. I recently completed a project using LLMs to interact with PDF data as a chatbot. Your code has been invaluable in helping me handle errors with ChromaDB and create a well-structured project directory. I was curious about potential improvements or alternative approaches that could enhance my project. Converting unstructured PDF data into a structured format with LLMs was a new concept for me, as my project focused solely on chatbot interactions with the data. Your approach has opened my eyes to new possibilities and I'm eager to explore similar techniques in my future work.
@cerealport2726 1 month ago
This is super interesting. Just like your other projects, you make it easy to see how the general process could be adapted for other purposes. Thanks very much!
@whatsbetter8457 1 month ago
Instead of only being able to use OpenAI, you could use the “instructor” or “ollama-instructor” library in Python to get structured and validated outputs from an LLM (Ollama, OpenAI, Gemini, Groq, etc.). It was already there before OpenAI came up with its feature :-)
@Thuvu5 1 month ago
Thanks for sharing this! Yeah indeed, instructor seems to be more flexible if we want to try different LLMs in the same project
@jemiranhunter 1 month ago
Great content. Very informative. Thanks for sharing.
@cybetica 1 month ago
You might want to renew your API key, as you showed it in plain text at 10:09 and scrolled. Nice vid!
@Thuvu5 1 month ago
Oh thanks, good eyes! 😄 Yep I've revoked the key :)
@perpl1618 1 month ago
This was an amazing video, thank you Thu San. Would you consider making a video for advanced users, with all of the small details and edge options?
@SumithRajagopalan 1 month ago
Amazing explanation and video 👍
@ahmadzaimhilmi 1 month ago
I prefer to use Cohere's command-r instead of OpenAI for RAG tasks. The API response can pinpoint the exact sentences from where the information is retrieved given the chunks that we feed in. Good for retrieving answers with citations.
@kenchang3456 1 month ago
Excellent tutorial. Thank you very much.
@georgejetson9801 1 month ago
this would have been amazing for my phd studies
@Thuvu5 1 month ago
Maybe for a second PhD? 🤣
@istifanusbulus1214 8 days ago
Wow, one of the best tutorials. I want to learn how to extract info from sales invoices and vendor invoices and convert them into a datagram to match against the general ledger. Could you please do a video about it? Thanks in advance.
@ravikumarsingh9766 18 days ago
Very nicely explained... Really love the content. Way to go!!! I wanted to ask: if I have multiple PDF files, say 10, how can I create embeddings for all of them and then run the rest of the query? Whenever you have time, please do suggest. Would wait for your reply!!!
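One possible way to approach the multi-file question above is to chunk each document separately and tag every chunk with its source file, so that all chunks can go into a single vector store and still be traced back. The sketch below is only an illustration in plain Python: it assumes the text has already been extracted from each PDF by whatever loader the project uses, and the file names, function names, and chunk sizes are hypothetical, not taken from the video.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into character chunks that overlap by `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
        start += chunk_size - overlap
    return chunks


def chunk_documents(documents, chunk_size=200, overlap=50):
    """Chunk every document and tag each chunk with the file it came from.

    `documents` maps a file name to text already extracted from that file.
    """
    records = []
    for source, text in documents.items():
        for index, chunk in enumerate(chunk_text(text, chunk_size, overlap)):
            records.append({"source": source, "chunk_index": index, "text": chunk})
    return records


# Hypothetical file names; the strings stand in for extracted PDF text.
documents = {
    "report_01.pdf": "First document text. " * 20,
    "report_02.pdf": "Second document text. " * 20,
}
records = chunk_documents(documents)
print(len(records), "chunks from", len(documents), "files")
```

Each record could then be embedded and stored together with its `source` field as metadata, which also makes it possible to filter retrieval by file later.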
@rickrandall3174 1 month ago
Thu Vu, you are wonderful. 🙂
@FrancoGeraci_vol 1 month ago
Absolutely fantastic! ❤
@robertbutscher6824 1 month ago
great video, thank you so much for that valuable inspiration
@gviacava 1 month ago
What a great tutorial!!! Thank you!
@MrGbruges 1 month ago
THANX THU VU, VERY INTERESTING!!!!
@eulerthegreatestofall147 1 month ago
Great video as always!!! Quick question: how did you create the requirements.txt file??
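The thread does not say how the requirements.txt in the video was produced. One common approach is to run `pip freeze > requirements.txt` inside the project's virtual environment. A rough Python equivalent, offered only as a sketch (the function name is made up for illustration), lists the installed distributions with the standard library:

```python
from importlib.metadata import distributions


def requirement_lines():
    """Return sorted 'name==version' pins for packages installed in this environment."""
    pins = set()
    for dist in distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions with broken or missing metadata
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)


if __name__ == "__main__":
    # Redirect this output to a file to get a requirements.txt-style listing.
    print("\n".join(requirement_lines()))
```

Note that this pins everything installed in the environment, not only the packages the project imports directly, so it is usually worth trimming the result by hand.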
@agape13 1 month ago
With that said, there are going to be big waves of layoffs. One can already see translator positions being significantly reduced. The need for analysts will change in the future as well.
@VinhNguyen-zg7lu 28 days ago
This is so good, sis ❤❤❤
@Aaron-it5il 1 month ago
Thanks for sharing!
@dannyrene 12 days ago
Ngl you’re one smart cookie
@dannyrene 12 days ago
I’m not finished watching but doesn’t each embedding vector need to have the same number of dimensions to perform a calculation of their Euclidean distance? Which would imply that all vectors have the same number of dimensions, right? If that’s the case, what is the limiting variable on the number of dimensions? Processing power? Wouldn’t more dimensions give you a smarter model?
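On the dimension question above: a distance between two embeddings is only defined when both vectors have the same length, and a given embedding model returns vectors of one fixed size for every input. A minimal sketch in plain Python, using made-up toy vectors rather than real embeddings (the function names are illustrative only):

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


# Toy 3-dimensional vectors (illustrative values, not real embeddings).
query = [1.0, 2.0, 2.0]
doc = [1.0, 0.0, 2.0]
print(euclidean_distance(query, doc))  # 2.0
print(cosine_similarity(query, doc))
```

Passing vectors of different lengths raises an error instead of silently comparing a truncated pair, which is the behaviour the question is getting at.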
@GoogleUser-tk3mb 1 month ago
You're really taking my interest in data to the next level! It popped up in my YouTube recommendations, and this is truly a hidden gem. Keep it up, sis. +1 Subscribe! I'm sure this channel will blow up soon 🎉 Anyway, I was wondering how you do that code thing in VSCode without having to type everything? It's amazing! And now I'm totally lost! FullStack? FrontEnd? BackEnd? Data Analytics?... FOMO is killing me! 🔥😭 But, the worldwide jobs market is stable for data roles, right? 🤔
@iantotan4229 1 month ago
New video! Finally!
@nguyenhai.truongan 24 days ago
Hi Thu. I started following you a few years ago, and your videos are very well made. Recently I've seen you posting data analysis videos that use AI. As an AI application developer, I'd like to understand what a data analyst's workflows, tasks, and needs look like, so that I can build a complete application for the data analysis field. I hope you'll have a few suggestions for me. Thank you, Thu.
@Rationalview4915 1 month ago
It was really helpful. Thank you for this video ❤
@petersheldrick1851 1 month ago
Great content, so well explained. I am doing an AI course at the moment and I am stuck on solving my project task, see if you can guide me! The requirement is to use AI or even deep learning to predict a person's shoe size based on a photo of the sole of their foot, without shoes and socks. Not allowed to use other items in the photo as a reference point, for example a centimetre ruler or something of a known size. Have to use learning from known images and their respective shoe size. I am struggling with where to start!
@quangvu20780 1 month ago
Wonderful, that's a great video..
@readas1 19 days ago
Hello, I found your video very informative since I have a similar project I am working on. Question for you: What would you do if the program was not returning good chunks? By that I mean, I uploaded a 90-page pricing document and asked for the title of the document, and none of the chunks included the first page of the document, so the LLM could not correctly answer the question.
@kamilherbik 22 days ago
Thanks
@coldbelowfroze 1 month ago
I missed you so much!
@Thuvu5 1 month ago
Aww, thank you 🥹
@SkySesshomaru 1 month ago
incredible
@freedman1405 1 month ago
Hi Thu Vu, what's your take on privacy issues with ChatGPT? Wouldn't companies risk their confidential data if they implement this system and use their APIs?
@Thuvu5 1 month ago
Good question! In my experience companies typically use an enterprise subscription to a cloud service like Microsoft Azure that integrates access to these LLMs. Here’s an example: learn.microsoft.com/en-us/azure/ai-services/openai/
@rodeondurotan6142 1 month ago
I hope you can make a video on unstructured pdf data.
@MichealAngeloArts 1 month ago
Thanks for the awesome project. What is the amount of code change required if I'll be using a Gemini LLM via Vertex AI on GCP instead of GPT4 / OpenAI (in particular, the LangChain-related code) to replicate this project?
@s3m3sta 1 month ago
thanks a bunch Thu Vu
@dushimiyimanathaulin7930 1 month ago
Very informative
@nnamdiodozi7713 1 month ago
Did you use a Linux environment for this video? I’m asking cos I keep seeing bin in the file paths.
@sayfasayfa3500 1 month ago
Pls can u tell which IDE u use? I am a complete beginner and I wanna do this for my main project
@nyanlynn-450 1 month ago
Cool👍💯
@FauziFayyad 1 month ago
Yay Thu vuu !
@jeffkidder5282 19 hours ago
Anything that even looks/feels too good to be true usually is. All this wonderful advancement screams of disaster just waiting to happen.
@junaidamin 28 days ago
For getting structured data in our answer, can we also use metadata?
@hongmeixie409 1 month ago
Can you show what it looks like in Docker?
@ahmadzaimhilmi 1 month ago
I have a question about the structured output. I've been trying to find a workaround for dynamic attributes. The ones that you showed as examples are hardcoded. I want to pass in a dictionary of field names and their explanations and get a resulting dictionary back in return. So far I couldn't think of a way.
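One possible direction for the dynamic-attributes question above is to generate the schema at runtime instead of hardcoding a class. The sketch below builds a JSON Schema object from a dictionary of field names and explanations. It is only an illustration: the function name and example fields are hypothetical, every field is assumed to be a string, and whether a given structured-output interface accepts a raw JSON Schema should be checked in that library's documentation.

```python
import json


def build_json_schema(fields, title="ExtractedFields"):
    """Build a JSON Schema object from a {field_name: explanation} dictionary.

    Every field is declared as a required string; adapt the types as needed.
    """
    return {
        "title": title,
        "type": "object",
        "properties": {
            name: {"type": "string", "description": explanation}
            for name, explanation in fields.items()
        },
        "required": list(fields),
        "additionalProperties": False,
    }


# Hypothetical field definitions supplied at runtime.
fields = {
    "paper_title": "Title of the research paper",
    "publication_year": "Year the paper was published",
}
schema = build_json_schema(fields)
print(json.dumps(schema, indent=2))
```

A model response that follows this schema parses into a plain dictionary keyed by the same field names, which matches the dictionary-in, dictionary-out shape described in the question.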
@trungvan2154 1 month ago
Does this code scenario work well for other languages such as Vietnamese, with a lang parameter vn for example? Thanks
@CyberHorrorHunter 1 month ago
I am new to this journey, but how did you get your VS Code to output so many lines? I have been tinkering with notebook settings and can't seem to get it to output the larger amount of data without it going far off the right of the screen.
@Thuvu5 1 month ago
Good question, I believe it’s a setting for notebook. Check this: stackoverflow.com/questions/67855498/how-to-display-all-output-in-jupyter-notebook-within-visual-studio-code
@_Around_The_Globe_ 1 month ago
I get a NotImplementedError when using with_structured_output with gpt-4o-mini, can someone help plz?
@datagus 1 month ago
Is the extracting outcome from the PDF good? Often the extraction process produces text that is all messed up, which can have negative consequences in the chunking process.
@heritage1834 1 month ago
I believe it depends on the formatting of the pdf files and also on the method used to carry out the extraction. A project article I read suggested that using image to text (OCR) usually produces better results than parsing pdf documents, especially when the pdf is badly formatted.
@d.d.z. 1 month ago
Very complete
@joseduarte1240 1 month ago
Can we create a local environment that can read all the files in one folder, whether they're Excel, PDFs, or anything else?
@CyberHorrorHunter 1 month ago
Additionally, I have found this does not output tables correctly (any idea how to remedy that?). Also, it seems to be affected by whether the PDF contains real text or PNG/JPG images of the original text embedded in the PDF.
@supertab365 1 month ago
Damn that's beginner level? I am f--d
@readas1 17 days ago
Have you refined this project at all? I built your version with 0 edits, and it gets everything wrong every time I test a paper of any length. The program works, but it does not actually interpret the documents well at all. Of the 10 or so I have tested I don't think it has gotten a title correct once, and it usually gets 0/4 correct.
@RipulKumar-g2d 29 days ago
Hi, not sure if you revert or not. I tried to follow your video but I am stuck at 22:22 and not able to move further. I'm getting an error when I execute the same code.
@DrB934 1 month ago
You may have just killed QSR NVivo...
@hoangsang2471 1 month ago
Are you Vietnamese? Your name seems like a Vietnamese name
@sifhatshams-s1j 1 month ago
If you were my sister I wouldn't have to worry about any problem :)) Why weren't you born as my sister :((