LLMOps (LLM Bootcamp)

89,599 views

The Full Stack

A day ago

New course announcement ✨
We're teaching an in-person LLM bootcamp in the SF Bay Area on November 14, 2023. Come join us if you want to see the most up-to-date materials on building LLM-powered products and learn in a hands-on environment.
www.scale.bythebay.io/llm-wor...
Hope to see some of you there!
---------------------------------------------------------------------------------------------
In this video, Josh gives a tour of the emerging discipline of LLMOps: principles and practices for continuous improvement of large language model-powered applications.
- Comparing and evaluating open source and proprietary models
- Workflows and tools for iteration and prompt management
- Principles for applying test-driven development to LLMs
Download slides and view lecture notes: fullstackdeeplearning.com/llm...
Intro and outro music made with Riffusion: github.com/riffusion/riffusion
Watch the rest of the LLM Bootcamp videos here: • LLM Bootcamp - Spring ...
00:00 Why LLMOps?
01:55 Choosing your base LLM
04:20 Proprietary LLMs
09:15 Open-source LLMs
14:45 Iteration and prompt management
22:35 Testing LLMs: Why and why is it hard?
28:15 Testing LLMs: What works?
35:05 Evaluation metrics for LLMs
37:40 Deployment and monitoring
44:35 Test-driven development for LLMs

Comments: 64
@ryaninghilterra109 · A year ago
I thought Dolly 2.0 has a fully permissive license for commercial use? It's listed in your talk as having a proprietary license. Also, the fully open-source Mosaic models look promising (MPT-7B).
@The_Full_Stack · A year ago
This is correct! We got a bit tripped up on Dolly 2.0, as the licensing is weirdly complicated. From what we can tell, it has an MIT license on the weights (huggingface.co/databricks/dolly-v2-12b), an Apache license on the training/inference code (github.com/databrickslabs/dolly/blob/master/LICENSE) and a CC-BY-SA license on the training data (github.com/databrickslabs/dolly#model-overview). MPT wasn't out yet when these were recorded (three weeks ago, late April 2023), but we agree it looks promising. Especially the long context window models!
@ryaninghilterra109 · A year ago
Awesome, appreciate the detailed response!
@jonassteinberg3779 · 22 hours ago
This is probably the best talk on LLMops from the dev perspective, as opposed to from the devops perspective, on the internet.
@winsontruong6264 · A year ago
If anyone else is exploring this space, it's good to note that because LLMs are moving so fast, there are even more Apache 2.0 models out there that have been released since this presentation. RedPajama and GPT4All-J variants have Apache 2.0 licenses, and from memory their performance is decent.
@astronemir · A year ago
👍
@blivas · A year ago
What is the latest on these?
@robertcormia7970 · A year ago
Fantastic "part 3" in a sequence of topics. The speaker (Josh) is very comfortable explaining application development for LLMs, which is our main focus in developing an AI certificate at our college. Josh is clearly experienced and enthusiastic about this field, and explains topics well!
@alessandrorossi1294 · A year ago
I started working in this space a few months after Glove and Word2Vec embeddings came out back in 2014. I have to say when I see the word "bootcamp" in a title I usually run for the hills, but this guy actually gave a great presentation with a coherence and fluency showing he actually has experience and didn't just learn this from index cards 5 minutes before the presentation (my usual experience with bootcamps). Bravo!
@elginbeloy9066 · A year ago
My boy Josh Tobin. Legend.
@datasciencetoday7127 · A year ago
Dear YouTube algo, please give me more recommendations like this.
@MrMaikeul · A year ago
Dude, this was awesome. Thanks for spilling the beans on what's to come in our space ;)
@AlexLeu · A year ago
Very useful session. I've learned a lot -- especially evaluation metrics for LLM. Thank you!
@siddharthbhargava4857 · A year ago
An amazing presentation. We definitely need more videos/content like this that can help navigate the quick-paced, dynamic tech world. Thank you.
@joelalexander7293 · A year ago
Goldmine of information. Love it!
@alireza29675 · A year ago
I found exactly what I was searching for! The explanation was amazing and the insights were great
@a_user_from_earth · 10 months ago
What a phenomenal talk! Amazing slides, kept simple, yet they really add something to your great explanations. Showing the difference between then and now, between DNN and LLM operations, was also great, and a very nice wrap-up in the first half.
@arvj123 · A year ago
This is exactly the thing I was looking for (having made a codebase analysis tool with LLM that I want to share with my team). Thank you for making this video for free. Much appreciation to whoever runs this channel.
@datasciencetoday7127 · A year ago
so good man, I really loved the model comparisons
@omarelmady · A year ago
Very impressive quality Lecture, excited to learn more. I’m looking to get started with making my own chatbot tutor.
@ShifraTech · A year ago
Great talk. There's a lot of work to be done in the LLM deployment/production scene for software eng/DevOps. I like how this was recorded maybe two weeks ago, and it already looks a bit aged: look at Anthropic announcing their 100k context window (not out yet), and the even more promising MPT-7B-StoryWriter-65k+ by Mosaic. Crazy how this field is progressing.
@MridulBanikcse · 8 months ago
Gem of a resource. Concise and clear.
@snehotoshbanerjee1786 · A year ago
Fantastic talk!
@fudanjx · A year ago
## Choosing a base language model
- Trade-offs to consider:
  - Out-of-the-box quality
  - Speed and latency
  - Cost
  - Fine-tunability
  - Data security
  - License permissibility
- Conclusion: start with GPT-4 for most use cases

## Managing prompts and chains
- Level 1: no tracking
- Level 2: manage in git
- Level 3: use a specialized tool (if needed)

## Evaluating performance
- Build the evaluation set incrementally:
  1. Start small
  2. Use an LLM to generate test cases
  3. Add more data as you discover failure modes
- Metrics:
  - Accuracy (if a correct answer exists)
  - Reference matching (if a reference answer exists)
  - "Which is better?" (if a previous answer exists)
  - "Incorporates feedback?" (if human feedback exists)
  - Static metrics (if no data exists)

## Deployment
- Call the API from the frontend
- Isolate LLM logic as a separate service (if needed)

## Monitoring
- Outcomes
- Model performance metrics
- Common issues: incorrect answers, toxicity, etc.

## Improving the model
- Use feedback to improve the prompt
- Optionally fine-tune the model
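The "evaluating performance" steps in this summary can be sketched as a tiny harness. This is a hedged illustration, not anything shown in the talk: `call_model` is a hypothetical stand-in for whatever LLM client you use, and the exact-match check is a deliberately crude placeholder for real reference matching (which would typically use semantic similarity or an LLM-as-judge).

```python
# Minimal sketch of an incrementally grown evaluation set with a
# reference-matching metric. All names here are illustrative.
from dataclasses import dataclass, field


@dataclass
class EvalCase:
    prompt: str
    reference: str  # known-good answer, when one exists


@dataclass
class EvalSet:
    cases: list = field(default_factory=list)

    def add_failure(self, prompt: str, expected: str) -> None:
        """Grow the set as you discover failure modes in production."""
        self.cases.append(EvalCase(prompt, expected))


def reference_match(answer: str, reference: str) -> bool:
    # Crude reference matching: exact match after normalization.
    # Real systems use semantic similarity or an LLM judge here.
    return answer.strip().lower() == reference.strip().lower()


def run_eval(eval_set: EvalSet, call_model) -> float:
    """Return the fraction of cases whose answer matches the reference."""
    if not eval_set.cases:
        return 0.0
    hits = sum(
        reference_match(call_model(c.prompt), c.reference)
        for c in eval_set.cases
    )
    return hits / len(eval_set.cases)
```

Starting small and appending a case per observed failure keeps the set cheap at first while letting it track the real failure distribution over time.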
@alchemication · A year ago
Wow. Great timing for this. Thanks! The only model I'm missing in the comparison is Open Assistant, which seems to be fully "open".
@jsfnnyc · 10 months ago
Great talk!!
@jonassteinberg3779 · A day ago
All one needs to do to track prompt accuracy, at least from a basic standpoint, is track prompts in git, as he mentions, but then have an automation pipeline that runs prompt changes against a ground-truth or fine-tuning dataset, probably in CI/CD. Have that pipe output statistical measurements and voilà: automated prompt comparison.
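The idea in this comment can be sketched as a small CI script. Everything here is an assumption for illustration: the `prompts.json` / `ground_truth.json` file names, their shapes, and the `generate` placeholder are all hypothetical, and exact-match scoring stands in for whatever statistical measurement you actually emit.

```python
# Hedged sketch: score each git-tracked prompt version against a
# ground-truth set and print a pass rate, suitable for a CI/CD step.
import json


def generate(prompt_template: str, question: str) -> str:
    # Placeholder for the real model call in your pipeline.
    raise NotImplementedError


def pass_rate(prompt_template: str, ground_truth: list, generate_fn) -> float:
    """Fraction of ground-truth answers reproduced by this prompt version."""
    hits = 0
    for case in ground_truth:
        answer = generate_fn(prompt_template, case["question"])
        hits += answer.strip() == case["answer"].strip()
    return hits / len(ground_truth)


if __name__ == "__main__":
    # In CI, both files would be checked into git alongside the code.
    with open("prompts.json") as f:
        prompts = json.load(f)          # e.g. {"v1": "...", "v2": "..."}
    with open("ground_truth.json") as f:
        ground_truth = json.load(f)     # e.g. [{"question": ..., "answer": ...}]
    for name, template in prompts.items():
        print(f"{name}: {pass_rate(template, ground_truth, generate):.1%}")
```

Because prompts and the ground-truth set live in the same repo, any prompt change in a pull request automatically gets a comparable score next to the previous version's.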
@richardadams1909 · 10 months ago
Really good talk! New to LLMs and learned a lot. At the end, when you were talking about the iteration cycle, you described how you would come up with an idea as an individual, experiment a bit, then share with your team. As a software developer I find that pair or mob programming is a really good approach at the start of a new piece of work. Do you have any thoughts on "pair-prompting" as a way to improve the initial stage of the project? After all, interacting with an LLM is a conversation, so having a few people working together on refining prompts could help reduce the biases/assumptions you are introducing as an individual.
@oguuzhansahin · A year ago
So, my question is: how is FLAN-T5's context length listed as 2K? As far as I know, it should be 512. Am I wrong?
@alessandrorossi1294 · A year ago
Why weren't GPT-J models included in the open source discussion?
@fintech1378 · 8 months ago
It feels quite ancient after the OpenAI Dev Day. Things can become obsolete in months.
@ayanghosh8226 · A year ago
Can we find the slides used anywhere? The Fine tuning related slides were skipped due to shortage of time, but it seemed there is a lot of useful information in it too. If there is a link available to the slides, kindly share.
@ayanghosh8226 · A year ago
Ignore the comment. I found the slide links in the description. Thank you! Excellent presentation ❤
@3169aaaa · A year ago
cool
@user-oj9iz4vb4q · A year ago
Well, your open-source slide was just wrong. OpenRAIL absolutely does allow commercial use for both the BLOOM and BLOOMZ models. Oddly enough, BLOOMZ, which is a lot like GPT-3.5, is conspicuously missing from your slides.
@Sean_neaS · A year ago
I'm doing initial coding on an open-source model; then I can switch to GPT-4 once I know I'm not doing anything stupid like infinite loops.
@The_Full_Stack · A year ago
Interesting approach! For intensive and open-ended applications like agents, the LM calls can definitely add up to a ton of tokens. When using model providers, follow best practices for all cloud services, like putting guardrails in place to limit the pain from surprise bills.
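The "guardrails against surprise bills" advice in this reply could look something like the sketch below. The `TokenBudget` class, the character-per-token heuristic, and all numbers are illustrative assumptions, not anything the talk or the reply prescribes:

```python
# One way to put a hard cap on runaway spend from an agent loop:
# refuse any model call that would push cumulative usage past a budget.

class TokenBudget:
    """Raise before a call would exceed a hard token budget."""

    def __init__(self, max_tokens: int):
        self.max_tokens = max_tokens
        self.used = 0

    def charge(self, tokens: int) -> None:
        # Check *before* spending, so the budget is never exceeded.
        if self.used + tokens > self.max_tokens:
            raise RuntimeError(
                f"Token budget exceeded: {self.used + tokens} > {self.max_tokens}"
            )
        self.used += tokens


def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    # A real implementation would use the provider's tokenizer.
    return max(1, len(text) // 4)
```

Wrapping every model call with `budget.charge(estimate_tokens(prompt))` turns an open-ended agent into one that fails loudly at a known cost ceiling instead of silently accumulating a surprise bill.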
@pietraderdetective8953 · A year ago
I'm surprised to see claude-instant only got 1 out of 4 stars in terms of quality. I've been using both ChatGPT 3.5 and claude-instant, and I much prefer claude-instant. In my opinion, if ChatGPT 3.5 receives 3 stars then claude-instant deserves at least the same. The issue with OpenAI's models is that they put too many filters/constraints on them; if I ask ChatGPT something considered "sensitive", it just outright refuses to answer the question.
@togo7022 · A year ago
The move from MLOps to LLMOps will be quite humbling for the MLOps world/hype. LLMs mean custom internal DS/ML functions are no longer that important when you have a commodity API to use. LLMOps then just becomes basic data engineering and management again.
@The_Full_Stack · A year ago
Definitely possible -- that's why we spent less time on deployment in the LLM Bootcamp than in our Deep Learning Course. But if FOSS models and finetuning take off, then MLOps concerns about experiment management and model versioning will come roaring back!
@mohamedelsayed8428 · A year ago
LLaMA quality after fine-tuning, as in Vicuna or WizardLM, is similar to GPT-3.5. Why did you rank them so low?!
@The_Full_Stack · A year ago
We definitely agree that LLaMA makes for a great base model for finetuning, especially for narrow tasks. In our experience, prompted 13B+30B Alpacas are not as good as prompted gpt3.5turbo for complex tasks. In light of the results from the LMSys Chatbot Arena (lmsys.org/blog/2023-05-03-arena/), which came out after this talk was recorded, we can see an argument for 2.5 stars for Vicuna, between gpt3.5turbo's 3 stars and Dolly 2.0's 2 stars. This field is developing rapidly! If you have any other benchmarks to point to, we'd love to hear about them.
@mohamedelsayed8428 · A year ago
@@The_Full_Stack Try 13B Wizard-Vicuna and you will be impressed with its quality.
@picklenickil · 11 months ago
I was expecting more on deployment. Come on... 😢😮😅
@picklenickil · A year ago
Claude was supposed to be 100k context...?
@StephenRayner · A year ago
That only just dropped. I doubt this is up to date? Only at 5:33 atm
@jerryyuan3958 · A year ago
@@StephenRayner It is supported in Poe now
@The_Full_Stack · A year ago
Correct! These videos are about three weeks old, and a lot happened in the FOSS model world in that time.
@Lolleka · A year ago
@@The_Full_Stack Does this mean that the time to obsolescence is getting drastically shorter?
@thomasr22272 · A year ago
You guys barely mentioned prompt injection attacks. Come on, this is a crucial aspect for the future of LLMs.
@The_Full_Stack · A year ago
We agree that mitigating prompt injection is critical for LLM-powered apps that use tools or access possibly sensitive information! Because prompt injection isn't solved yet, we covered it in our What's Next? lecture, where we discuss multiple safety+security concerns for LLM software: kzbin.info/www/bejne/l6nCg2evr5aKra8
@limitlesslife7536 · A year ago
Dolly is not proprietary.
@NeuroScientician · A year ago
Llama is OSS now
@sachinkun21 · A year ago
Source? Can't find anything regarding this
@sachinkun21 · A year ago
On the official model card it still says: "License: Non-commercial bespoke license"
@jordancardenas4953 · A year ago
@@sachinkun21 OpenLLaMA is an open reproduction of LLaMA with the original architecture but trained with the RedPajama Dataset
@NeuroScientician · A year ago
@@sachinkun21 Released under the name OpenLLaMA under Apache 2.0
@sachinkun21 · A year ago
Oh, that one. I thought you meant Meta's. I haven't experimented with OpenLLaMA, so I can't say anything about performance, but Meta's LLaMA, if open-sourced, will also open doors for its popular dialogue derivatives such as Vicuna and Koala.
@mrGapMan1 · A year ago
This presentation is horribly outdated after one week. There are now super-competent open-source LLMs, uncensored, that can be used as Auto-GPTs with LangChain and Pinecone and 100K tokens. C'mon, this bootcamp needs to chill or go streaming every second day to stay relevant.
@The_Full_Stack · A year ago
Some of the material we cover does change quickly, and the state of play for FOSS models happened to change a lot in the three weeks since we recorded this video! Here's hoping they keep improving. We really like HELM (crfm.stanford.edu/helm) and the LMSys leaderboard (chat.lmsys.org/?leaderboard) for keeping up with capabilities and benchmarking models against one another. What do you use?
@BodinhoDE · A year ago
The presentation is mainly about how to evaluate, test, and deploy LLMs. Can you elaborate on what is "horribly" outdated about those topics?
@mrGapMan1 · A year ago
@@The_Full_Stack I used hyperbole to point to the super progress AI is making, and that this kind of conference would probably be better off waiting until the progress reaches a steady state. I didn't mean to hurt people's feelings. Sorry if I did.
@OliNorwell · A year ago
@@BodinhoDE The Vicuna 13B model and comparable ones are far, far better than what is suggested here (where they are rated as basically useless); that's the only misleading part of this video. But also, they can't update the video every week, so it's hard to be annoyed!