Improve RAG with This Simple API (code included)

7,933 views

Dave Ebbelaar

A day ago

Comments: 28
@farhanafridi8694 a month ago
I believe you are the hero every AI engineer needs. Unlike most YouTubers who copy and paste code from documentation, you address the real problems AI engineers face.
@daveebbelaar a month ago
Appreciate that!
@ClarkeBishopConsulting a month ago
Very helpful, Dave! Many companies try to use naive chunking because there are so many examples of it on the web, in YouTube videos, etc. You gave us a very good way to do smarter chunking and get more useful results. This is the future for RAG use cases.
@daveebbelaar a month ago
Thanks Clarke!
@BrockMesarich a month ago
So good at simplifying concepts in these tutorials. Loved this Dave!
@daveebbelaar a month ago
Thanks Brock 🙏🏻
@GeertBaeke a month ago
Good stuff! We use the exact same technique with markdown-based chunking and extra metadata for the chunks. Works really well!
@daveebbelaar a month ago
I think this is currently the best approach for RAG.
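For readers wondering what "markdown-based chunking with extra metadata" might look like in practice, here is a minimal sketch. It is not the code from the video or from either commenter's project: the `Chunk` class, the `chunk_markdown` function, and the `source` and `headings` metadata keys are all hypothetical names chosen for illustration. It assumes the PDF has already been converted to markdown and simply splits at headings, ignoring edge cases such as `#` lines inside fenced code.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    metadata: dict = field(default_factory=dict)


# A markdown ATX heading: one to six '#' characters, whitespace, then the title.
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def chunk_markdown(markdown: str, source: str) -> list[Chunk]:
    """Split markdown at headings and attach the heading path as metadata."""
    chunks: list[Chunk] = []
    path: list[str] = []    # current heading hierarchy, e.g. ["Report", "Q4"]
    buffer: list[str] = []  # body lines collected under the current heading

    def flush() -> None:
        text = "\n".join(buffer).strip()
        if text:
            chunks.append(Chunk(text, {"source": source, "headings": list(path)}))
        buffer.clear()

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            level = len(match.group(1))
            # Drop headings at the same or a deeper level, then push the new one.
            del path[level - 1:]
            path.append(match.group(2).strip())
        else:
            buffer.append(line)
    flush()
    return chunks


if __name__ == "__main__":
    doc = "# Report\nIntro text.\n## Q3\nRevenue up.\n## Q4\nRevenue down.\n"
    for chunk in chunk_markdown(doc, source="report.pdf"):
        print(chunk.metadata["headings"], "->", chunk.text)
```

The heading path stored with each chunk is the kind of extra metadata that can later be used to filter results or to give the LLM context about where a passage came from.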
@micbab-vg2mu a month ago
Great, thank you for sharing :) Please explore the topic more :)
@skyrayzor3693 a month ago
Your video is detailed and very helpful, thank you for these types of techniques.
@krlospatrick 27 days ago
Thanks a lot for sharing this knowledge, it's really useful!
@inflationking1271 a month ago
Could you do a GraphRAG tutorial?
@Divyv520 a month ago
Hey Dave, really nice video. I was wondering if I could help you with higher-quality, more engaging editing that keeps a consistent brand colour for your YouTube channel, which can help you get more engagement on your videos and build your unique personal brand. Pls lmk what you think?
@StephanieNguyen-om1ss a month ago
Super helpful. Can you please make a tutorial on how to use AWS Textract too?
@trendavira5128 a month ago
Hi Dave, thanks for the awesome content. A client came to me for a RAG solution. He has a library of hundreds of thousands of pages (about 60 GB). The simplest RAG techniques don't seem to work for this case, so I came up with a solution using a hybrid retriever and a reranker with llama-index. The results were good but not perfect. If you were me, how would you tackle this problem?
@awakenwithoutcoffee 23 days ago
We are working on a solution for this that can be white-labeled on release! Does your client have an API endpoint or some kind of bucket containing all the files? It really depends on what format the data comes in. If it is just text, then you can use a hybrid approach with semantic chunking, parent-document retrieval, or other metadata filtering techniques. The main point is to make sure the data is pre-processed and cleaned before being chunked and embedded. Entity extraction is expensive but can be very helpful. A second-best option is to extract metadata. One is used for semantic extraction (entity) and the other for additional filtering. GraphRAG is the best solution, using entities, but it costs a massive amount of resources and development time, making it only accessible to enterprise clients (10-50k+).
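As a rough illustration of the "hybrid" idea discussed in this thread, the sketch below merges a keyword ranking and a vector ranking into a single list. Neither commenter says how their results are combined, so reciprocal rank fusion is used here purely as an assumed example of one common fusion method. The function name `reciprocal_rank_fusion`, the document ids, and the constant `k = 60` (a conventional default) are illustrative choices, not taken from llama-index or from the video.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores 1 / (k + rank) per list it appears in, so items
    ranked highly by several retrievers rise to the top.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id to keep output stable.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    keyword_hits = ["a", "b", "c"]  # hypothetical keyword (e.g. BM25) results
    vector_hits = ["b", "c", "a"]   # hypothetical embedding-search results
    print(reciprocal_rank_fusion([keyword_hits, vector_hits]))
```

In a pipeline like the one described above, a reranker would typically then rescore only the top few fused results, which keeps the expensive model off the bulk of a very large library.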
@AaronGayah-dr8lu a month ago
Enjoyed this. Thank you.
@__m__e__ 18 days ago
Thanks, I'm a newbie and your videos helped get me started. Can you please also share pdf_ingester?
@awakenwithoutcoffee 23 days ago
Awesome video, but where can we find the module behind "from config.settings import get_settings"?
@LaHoraMaker a month ago
Have you tried passing the PDF to Jina Reader API? The Markdown output is quite clean too! (but it's only usable for public documents)
@chwaleedsial a month ago
Will try this with Textract. For my use case I am just sending a CSV (of an Excel file) and it's working, but I think that is not a systematic, luck-proof way. Do you think a RAG approach would be better and less prone to context- and structure-related hallucinations?
@testadrome a month ago
Does it work with scanned PDF docs?
@daveebbelaar a month ago
Yes!
@brandonvelasquez3530 a month ago
This seems similar to GraphRAG. What is the difference?
@sahiljain9376 a month ago
GraphRAG is a more powerful solution than this baseline RAG. In GraphRAG, the data is stored in a graph of entities and relationships, along with detailed community summaries, which helps it excel at retrieval. For example, a question like "Did the company underperform in Q4 vs Q3?" would be difficult to answer using baseline RAG but can be answered easily using GraphRAG.
@awakenwithoutcoffee 23 days ago
@sahiljain9376 You can enhance RAG with agentic frameworks to handle these questions, e.g. an SQL agent with metadata filtering. I love GraphRAG, but it a) is super expensive, since entity extraction requires a ton of LLM calls, b) takes a lot of time to set up the graph, and c) has additional challenges to overcome before it can really be used for non-enterprise.
@__m__e__ 18 days ago
@sahiljain9376 I was unaware of GraphRAG, and it looks really interesting, thanks. It looks like it's beyond my skill level now, but hopefully MS integrates it into Azure soon.