Developing and Serving RAG-Based LLM Applications in Production

  21,461 views

Anyscale

1 day ago

Comments: 14
@jzziesing · 1 year ago
I would love to see an hour long presentation on this!
@TymonVideos · 1 year ago
Really enjoyed this talk - found a lot of value in it. Both speakers are clearly so knowledgeable, and I love the extra little details the chap in the blue hoodie gave throughout. Would love to connect and share!
@TymonVideos · 1 year ago
...just realised the "chap in blue" is a co-founder! No disrespect meant :) awesome
@junaidiqbal4104 · 1 year ago
🎯 Key Takeaways for quick navigation:

00:05 🚀 Initial Motivation and Project Start
- Started building LLM applications to gain firsthand experience and improve the user experience.
- Developed a RAG application focused on making it easier for users to work with their products.
- Emphasized the importance of the underlying documents and user questions in building such applications.

01:31 🌐 Community Engagement and Insights
- Encouraged sharing insights and experiences on building RAG-based applications.
- Acknowledged the community's early stage and the value of diverse perspectives.
- Welcomed external input to enrich the collective understanding of RAG applications.

03:07 🧩 Experimentation with Data Chunking
- Explored strategies for efficient data chunking, moving beyond random chunking.
- Used HTML document sections for precise references and a better understanding of content.
- Aimed for a generalizable template, potentially open-sourcing a solution for arbitrary HTML documents.

05:14 🗃️ Vector Database and Technology Choices
- Chose Postgres as the vector database, emphasizing familiarity and compatibility.
- Highlighted the growing number of specialized vector databases for LLM applications.
- Advised selecting a database based on team familiarity, while exploring new options for specific features.

06:10 🔄 Retrieval Workflow and Database Query
- Described the retrieval process: embedding the query and computing distances against stored chunk embeddings.
- Discussed pros and cons of building a vector DB on Postgres versus using dedicated solutions.
- Addressed potential limitations based on document scale and the flexibility of different databases.

08:20 📏 Considerations for Context Size and Token Limits
- Acknowledged token limits in the LLM context window and model-specific variations.
- Encouraged experimenting with different chunk sizes, possibly using multiple embeddings for longer chunks.
- Highlighted the importance of adapting to the LLM's limitations and exploring diverse experimental setups.
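The retrieval workflow summarized above (embed the query, compute distances against stored chunk embeddings, return the nearest sections) can be sketched in a few lines. The talk stores embeddings in Postgres with pgvector; this in-memory version uses a toy bag-of-words "embedding" and hand-written chunks purely for illustration.

```python
import math
from collections import Counter

# Toy chunk store: in the talk each chunk is a section of an HTML page, keyed
# so answers can link back to the exact anchor. These strings are stand-ins.
CHUNKS = [
    "Ray Serve lets you deploy machine learning models as scalable services.",
    "pgvector adds vector similarity search operators to Postgres.",
    "Chunk your HTML documents by section so references point to exact anchors.",
]

def embed(text):
    """Toy bag-of-words 'embedding'; a real pipeline calls an embedding model."""
    return Counter(w.strip(".,?") for w in text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, k=1):
    """Return the k chunks nearest to the query, mimicking a vector-DB lookup."""
    q = embed(query)
    return sorted(CHUNKS, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

print(retrieve("how do I chunk html documents?")[0])
```

With pgvector the `retrieve` step becomes a single SQL `ORDER BY embedding <=> query_embedding LIMIT k` query; the ranking logic is the same.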
09:29 🔍 Evaluation Metrics and Component-wise Assessment
- Introduced the two major components for evaluation: the retrieval workflow and LLM response quality.
- Explained the evaluation process, including isolating each component for focused assessment.
- Shared insights into the challenges and considerations of scoring LLM responses.

11:32 📊 Evaluator Selection and Quality Assessment
- Used GPT-4 as an evaluator, based on empirical comparison and an understanding of the application.
- Discussed the limitations of available LLMs and potential biases in self-evaluation.
- Advocated for iterative improvement and potential collaboration with external LLM development communities.

15:13 📈 Iterative Evaluation and System Trust Building
- Illustrated the iterative evaluation process, starting by establishing trust in an evaluator.
- Demonstrated the evaluation flow, using different configurations and trusting the chosen LLM's outputs.
- Emphasized the importance of building trust in each component before assessing the overall system.

17:04 ❄️ Cold-Start Strategy and Bootstrapping
- Presented a cold-start strategy using chunked data to generate initial questions.
- Addressed noise reduction by refining generated questions and encouraging creativity.
- Described the bootstrapping cycle from a clean slate to using generated data for further annotations.

18:38 🔄 Continuous Learning and Evaluation Scaling
- Responded to questions about the number of examples needed for a cold start and overall evaluation.
- Advocated for a balance of quantity and diversity in examples for comprehensive evaluations.
- Stressed continuous learning, adaptation, and leveraging automated pipelines for scaling evaluations.

19:49 📈 Chunk Size Impact on Retrieval and Quality
- Retrieval score increases with chunk size but starts tapering off.
- Quality continues to improve even as chunk sizes increase.
- Code snippets benefit from a longer context or special chunking logic.
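The component-wise evaluation described above can be sketched as two independent scores: a retrieval score (did the gold source land in the retrieved chunks?) and a quality score averaged over an evaluator's ratings. The `stub_judge` and the sample records below are made-up stand-ins for the GPT-4 judge and real annotated data.

```python
def retrieval_score(records):
    """Fraction of questions whose gold source appears among retrieved sources."""
    hits = sum(1 for r in records if r["gold_source"] in r["retrieved_sources"])
    return hits / len(records)

def quality_score(records, judge):
    """Mean score the evaluator assigns to each generated answer.
    `judge` stands in for a GPT-4 call scoring (question, reference, answer)."""
    scores = [judge(r["question"], r["reference"], r["answer"]) for r in records]
    return sum(scores) / len(scores)

# Stub judge: full marks when the generated answer contains the reference.
stub_judge = lambda q, ref, ans: 5 if ref.lower() in ans.lower() else 2

records = [
    {"question": "What database did they use?", "gold_source": "docs/db.html",
     "retrieved_sources": ["docs/db.html", "docs/serve.html"],
     "reference": "Postgres", "answer": "They chose Postgres with pgvector."},
    {"question": "Which model evaluates answers?", "gold_source": "docs/eval.html",
     "retrieved_sources": ["docs/chunk.html"],
     "reference": "GPT-4", "answer": "An open-source judge."},
]

print(retrieval_score(records))            # 0.5: one of two gold sources retrieved
print(quality_score(records, stub_judge))  # 3.5: scores of 5 and 2 averaged
```

Keeping the two scores separate is what lets you debug the retriever and the generator independently, as the talk recommends.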
21:30 🧩 Number of Chunks and Context Size
- Increasing the number of chunks improves retrieval and quality scores.
- Larger context windows for LLMs show a positive trend.
- Experimentation with techniques like RoPE scaling for extending context.

22:30 🛠️ Fixing Hyperparameters During Tuning
- Fixed hyperparameters sequentially: context size, chunk size, then embedding models.
- Experimented with a spread of values, fixing each parameter once optimized.
- Illustrates a pragmatic approach to hyperparameter tuning.

23:12 🏆 Model Selection and Benchmarking
- The smaller GTE-base embedding model outperformed larger models on their use case.
- Emphasized evaluating models against the specific use case rather than leaderboard rank.
- Benchmarked against OpenAI's text-embedding model and chose the smaller, performant model.

23:56 💰 Cost Analysis and Hybrid LLM Routing
- Cost analysis comparing different LLMs.
- Introduced a hybrid LLM routing approach for cost-effectiveness.
- Weighed performance, cost, and hybrid routing for optimal results.

25:10 🤖 Classifier vs. Language Model for Routing
- A classifier was used for routing decisions due to speed considerations.
- Mentioned training the classifier on a labeled dataset for routing.
- A potential transition to LLM-based routing as LLM inference speed improves.

27:17 🔄 Future Developments and System Integration
- Integration of components into larger systems, citing Anyscale's Doctor application.
- Anticipation of more developments and applications in the future.
- Acknowledged the importance of iteration in building robust systems.

Made with HARPA AI
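The hybrid routing idea from the cost sections above might look like the sketch below: a fast classifier decides per query whether a cheaper open-source model is likely good enough, or whether the query should go to GPT-4. The keyword rule and model names are illustrative stand-ins for the trained classifier mentioned in the talk.

```python
# Illustrative routing sketch; a real system would use a classifier trained on
# labeled routing data, as described in the talk.
HARD_MARKERS = ("traceback", "why does", "debug", "compare", "trade-off")

def needs_big_model(query: str) -> bool:
    """Stand-in classifier: flag queries that look like they need deeper reasoning."""
    q = query.lower()
    return any(marker in q for marker in HARD_MARKERS)

def route(query: str) -> str:
    """Return the model tier for this query, trading cost against quality."""
    return "gpt-4" if needs_big_model(query) else "oss-llm"

print(route("What is Ray Serve?"))                         # oss-llm
print(route("Why does my deployment raise a Traceback?"))  # gpt-4
```

A classifier is used instead of an LLM for this decision because routing sits on the hot path of every request, where milliseconds matter.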
@ndamulelosbg8887 · 5 months ago
Great presentation. Just one question: What is relevance_score in this case? Is it an aggregation of grounding metrics for all reference examples?
@noosfera_It · 1 year ago
Amazing work! Thank you!
@rohvir2615 · 11 months ago
goated video no cap
@mumcarpet109 · 10 months ago
on god, we making out the hood with this one 💯
@JavierTorres-st7gt · 7 months ago
How do you protect a company's information with this technology?
@victorhenriquecollasanta4740 · 1 year ago
Top gs
@noosfera_It · 1 year ago
when accents swap.
@charlesthompson8938 · 10 months ago
😂😂 And awesome content, though.
@tunglee4349 · 6 months ago
Great content! Thanks a lot!