Developing and Serving RAG-Based LLM Applications in Production

  21,461 views

Anyscale

1 day ago

Comments: 14
@jzziesing · 1 year ago
I would love to see an hour long presentation on this!
@TymonVideos · 1 year ago
Really enjoyed this talk - found a lot of value in it. Both speakers are clearly so knowledgeable, and I love the extra little details the chap in the blue hoodie gave throughout. Would love to connect and share!
@TymonVideos · 1 year ago
...just realised the "chap in blue" is a co-founder! No disrespect meant :) awesome
@junaidiqbal4104 · 1 year ago
🎯 Key Takeaways for quick navigation:

00:05 🚀 Initial Motivation and Project Start
- Started building LLM applications to gain firsthand experience and improve the user experience.
- Developed a RAG application focused on making it easier for users to work with their products.
- Emphasized the importance of the underlying documents and user questions in building such applications.

01:31 🌐 Community Engagement and Insights
- Encouraged sharing insights and experiences on building RAG-based applications.
- Acknowledged the community's early stage and the value of diverse perspectives.
- Welcomed external input to enrich the collective understanding of RAG applications.

03:07 🧩 Experimentation with Data Chunking
- Explored strategies for efficient data chunking, moving beyond random chunking.
- Used HTML document sections for precise references and a better understanding of content.
- Aimed for a generalizable template, potentially open-sourcing a solution for arbitrary HTML documents.

05:14 🗃️ Vector Database and Technology Choices
- Chose Postgres as the vector database, emphasizing familiarity and compatibility.
- Highlighted the growing number of specialized vector databases for LLM applications.
- Advised selecting a database based on team familiarity, while exploring new options for specific features.

06:10 🔄 Retrieval Workflow and Database Query
- Described the retrieval process: embedding the query and computing distances against stored chunk embeddings.
- Discussed pros and cons of building a vector DB on Postgres versus using dedicated solutions.
- Addressed potential limitations based on document scale and the flexibility of different databases.

08:20 📏 Considerations for Context Size and Token Limits
- Acknowledged token limits in the LLM context window and model-specific variations.
- Encouraged experimenting with different chunk sizes, possibly using multiple embeddings for longer chunks.
- Highlighted the importance of adapting to the LLM's limitations and exploring diverse experimental setups.
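The retrieval workflow summarized above (embed the query, compute distances against stored chunk embeddings, return the nearest sections) can be sketched in a few lines. The talk stores embeddings in Postgres with pgvector; this in-memory version uses a toy bag-of-words "embedding" and hand-written chunks purely for illustration.

```python
import math
from collections import Counter

# Toy chunk store: in the talk each chunk is a section of an HTML page, keyed
# so answers can link back to the exact anchor. These strings are stand-ins.
CHUNKS = [
    "Ray Serve lets you deploy machine learning models as scalable services.",
    "pgvector adds vector similarity search operators to Postgres.",
    "Chunk your HTML documents by section so references point to exact anchors.",
]

def embed(text):
    """Toy bag-of-words 'embedding'; a real pipeline calls an embedding model."""
    return Counter(w.strip(".,?") for w in text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, k=1):
    """Return the k chunks nearest to the query, mimicking a vector-DB lookup."""
    q = embed(query)
    return sorted(CHUNKS, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

print(retrieve("how do I chunk html documents?")[0])
```

With pgvector the `retrieve` step becomes a single SQL `ORDER BY embedding <=> query_embedding LIMIT k` query; the ranking logic is the same.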
09:29 🔍 Evaluation Metrics and Component-wise Assessment
- Introduced the two major components for evaluation: the retrieval workflow and LLM response quality.
- Explained the evaluation process, including isolating each component for focused assessment.
- Shared insights into the challenges and considerations of scoring LLM responses.

11:32 📊 Evaluator Selection and Quality Assessment
- Used GPT-4 as an evaluator, based on empirical comparison and an understanding of the application.
- Discussed the limitations of available LLMs and potential biases in self-evaluation.
- Advocated for iterative improvement and potential collaboration with external LLM development communities.

15:13 📈 Iterative Evaluation and System Trust Building
- Illustrated the iterative evaluation process, starting by establishing trust in an evaluator.
- Demonstrated the evaluation flow, using different configurations and trusting the chosen LLM's outputs.
- Emphasized the importance of building trust in each component before assessing the overall system.

17:04 ❄️ Cold-Start Strategy and Bootstrapping
- Presented a cold-start strategy using chunked data to generate initial questions.
- Addressed noise reduction by refining generated questions and encouraging creativity.
- Described the bootstrapping cycle from a clean slate to using generated data for further annotations.

18:38 🔄 Continuous Learning and Evaluation Scaling
- Responded to questions about the number of examples needed for a cold start and overall evaluation.
- Advocated for a balance of quantity and diversity in examples for comprehensive evaluations.
- Stressed continuous learning, adaptation, and leveraging automated pipelines for scaling evaluations.

19:49 📈 Chunk Size Impact on Retrieval and Quality
- Retrieval score increases with chunk size but starts tapering off.
- Quality continues to improve even as chunk sizes increase.
- Code snippets benefit from a longer context or special chunking logic.
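The component-wise evaluation described above can be sketched as two independent scores: a retrieval score (did the gold source land in the retrieved chunks?) and a quality score averaged over an evaluator's ratings. The `stub_judge` and the sample records below are made-up stand-ins for the GPT-4 judge and real annotated data.

```python
def retrieval_score(records):
    """Fraction of questions whose gold source appears among retrieved sources."""
    hits = sum(1 for r in records if r["gold_source"] in r["retrieved_sources"])
    return hits / len(records)

def quality_score(records, judge):
    """Mean score the evaluator assigns to each generated answer.
    `judge` stands in for a GPT-4 call scoring (question, reference, answer)."""
    scores = [judge(r["question"], r["reference"], r["answer"]) for r in records]
    return sum(scores) / len(scores)

# Stub judge: full marks when the generated answer contains the reference.
stub_judge = lambda q, ref, ans: 5 if ref.lower() in ans.lower() else 2

records = [
    {"question": "What database did they use?", "gold_source": "docs/db.html",
     "retrieved_sources": ["docs/db.html", "docs/serve.html"],
     "reference": "Postgres", "answer": "They chose Postgres with pgvector."},
    {"question": "Which model evaluates answers?", "gold_source": "docs/eval.html",
     "retrieved_sources": ["docs/chunk.html"],
     "reference": "GPT-4", "answer": "An open-source judge."},
]

print(retrieval_score(records))            # 0.5: one of two gold sources retrieved
print(quality_score(records, stub_judge))  # 3.5: scores of 5 and 2 averaged
```

Keeping the two scores separate is what lets you debug the retriever and the generator independently, as the talk recommends.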
21:30 🧩 Number of Chunks and Context Size
- Increasing the number of chunks improves retrieval and quality scores.
- Larger context windows for LLMs show a positive trend.
- Experimentation with techniques like RoPE scaling for extending context.

22:30 🛠️ Fixing Hyperparameters During Tuning
- Fixed hyperparameters sequentially: context size, chunk size, then embedding models.
- Experimented with a spread of values, fixing each parameter once optimized.
- Illustrates a pragmatic approach to hyperparameter tuning.

23:12 🏆 Model Selection and Benchmarking
- The smaller GTE-base embedding model outperformed larger models on their use case.
- Emphasized evaluating models against the specific use case rather than leaderboard rank.
- Benchmarked against OpenAI's text-embedding model and chose the smaller, performant model.

23:56 💰 Cost Analysis and Hybrid LLM Routing
- Cost analysis comparing different LLMs.
- Introduced a hybrid LLM routing approach for cost-effectiveness.
- Weighed performance, cost, and hybrid routing for optimal results.

25:10 🤖 Classifier vs. Language Model for Routing
- A classifier was used for routing decisions due to speed considerations.
- Mentioned training the classifier on a labeled dataset for routing.
- A potential transition to LLM-based routing as LLM inference speed improves.

27:17 🔄 Future Developments and System Integration
- Integration of components into larger systems, citing Anyscale's Doctor application.
- Anticipation of more developments and applications in the future.
- Acknowledged the importance of iteration in building robust systems.

Made with HARPA AI
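The hybrid routing idea from the cost sections above might look like the sketch below: a fast classifier decides per query whether a cheaper open-source model is likely good enough, or whether the query should go to GPT-4. The keyword rule and model names are illustrative stand-ins for the trained classifier mentioned in the talk.

```python
# Illustrative routing sketch; a real system would use a classifier trained on
# labeled routing data, as described in the talk.
HARD_MARKERS = ("traceback", "why does", "debug", "compare", "trade-off")

def needs_big_model(query: str) -> bool:
    """Stand-in classifier: flag queries that look like they need deeper reasoning."""
    q = query.lower()
    return any(marker in q for marker in HARD_MARKERS)

def route(query: str) -> str:
    """Return the model tier for this query, trading cost against quality."""
    return "gpt-4" if needs_big_model(query) else "oss-llm"

print(route("What is Ray Serve?"))                         # oss-llm
print(route("Why does my deployment raise a Traceback?"))  # gpt-4
```

A classifier is used instead of an LLM for this decision because routing sits on the hot path of every request, where milliseconds matter.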
@ndamulelosbg8887 · 5 months ago
Great presentation. Just one question: What is relevance_score in this case? Is it an aggregation of grounding metrics for all reference examples?
@noosfera_It · 1 year ago
Amazing work! Thank you!
@rohvir2615 · 11 months ago
goated video no cap
@mumcarpet109 · 10 months ago
on god, we making out the hood with this one 💯
@JavierTorres-st7gt · 7 months ago
How do you protect a company's information with this technology?
@victorhenriquecollasanta4740 · 1 year ago
Top gs
@noosfera_It · 1 year ago
when accents swap.
@charlesthompson8938 · 10 months ago
😂😂 And awesome content, though.
@tunglee4349 · 6 months ago
Great content! Thanks a lot!