RouteLLM Tutorial - GPT4o Quality but 80% CHEAPER (More Important Than Anyone Realizes)

30,355 views

Matthew Berman

A day ago

Comments: 139
@matthew_berman 3 months ago
I'm putting the final touches on my business idea presentation that I'm going to give away, which is partially inspired by RouteLLM. Can you guess what it is? 🤔 Subscribe to my newsletter for your chance to win the Asus Vivobook Copilot+ PC: gleam.io/H4TdG/asus-vivobook-copilot-pc (North America only)
@sugaith 3 months ago
I've actually tried this for a production use case and it did not work as expected. Simple questions such as "What is the name of our Galaxy?" were routed to GPT-4o, so the "only when absolutely necessary" claim is absolutely not true. There is also a latency problem: it takes a while for the RouteLLM model to run and decide the route.
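The routing decision in question comes down to a single calibrated cost threshold. A minimal toy sketch of the idea (the real RouteLLM router uses a learned win-rate predictor trained on preference data; the keyword scorer below is a stand-in invented purely for illustration):

```python
# Toy sketch of threshold-based routing. HARD_HINTS and the scoring
# formula are invented; only the threshold-vs-score comparison mirrors
# how RouteLLM-style routers trade cost for quality.
HARD_HINTS = ("prove", "refactor", "derive", "optimize", "debug")

def predicted_strong_win_rate(prompt: str) -> float:
    """Stand-in for the learned router score in [0, 1]."""
    hits = sum(word in prompt.lower() for word in HARD_HINTS)
    return min(1.0, 0.2 + 0.3 * hits)

def route(prompt: str, threshold: float = 0.5) -> str:
    # Send to the strong model only when the predicted benefit
    # exceeds the calibrated threshold; raising the threshold
    # pushes more traffic to the weak model.
    return "strong" if predicted_strong_win_rate(prompt) > threshold else "weak"
```

A mis-calibrated threshold (or a noisy score) is exactly what makes trivially easy questions leak through to the strong model.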
@louislouisius8261 3 months ago
Do the models have context of each other?
@sugaith 3 months ago
@@louislouisius8261 not by default but you can add that
@negozumbi 3 months ago
Could you modify your initial prompt before the actual query? Do you think you would get a better outcome?
@ernestuz 3 months ago
@@louislouisius8261 you can transfer the conversation pairs, or a subset if you wish, between models.
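Transferring the conversation pairs can be sketched as one shared message list handed to whichever model is selected, so context survives a weak-to-strong handoff. Model names and the reply stub below are placeholders, not RouteLLM's API:

```python
# Shared history across routed models: both models always receive the
# same message list. The assistant reply here is stubbed so the sketch
# runs without any API client.
history = []

def ask(route_to: str, user_msg: str) -> list:
    history.append({"role": "user", "content": user_msg})
    # A real client would call e.g.
    #   client.chat.completions.create(model=route_to, messages=history)
    # and append the actual reply; we just record the handoff.
    history.append({"role": "assistant", "content": f"[{route_to} reply]"})
    return history
```

A subset transfer is the same idea with `history[-k:]` instead of the full list.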
@negozumbi 3 months ago
WOW. This is crazy. I had a similar idea last week and mentioned it to my team this morning at 10 AM UK time. I've just seen this video and it blows my mind that our thoughts were so aligned. And yes, I'm not sure if people have really put it together, but an agentic on-device system plus routing-LLM technology is the future. I believe models will become smaller and smaller. Crazy how this part of tech is advancing so fast; it is as scary as it is exciting. I learn a lot from your videos, which force me to read papers and try some of the tech for myself. I really appreciate your content.
@DeeDoo-o8o 2 months ago
Has happened to me for almost a year, I think they collect ideas from our prompts and use them. None of the ones I kept locally have been created yet lol
@Corteum 24 days ago
@negozumbi How are you finding LLM routing so far? Are you still using it?
@DihelsonMendonca 3 months ago
💥 Matthew Berman is so clear and concise. He has that natural talent for explaining things in a way that everybody understands, with emphasis on every phrase, clear diction, and intonation that hooks the listener. People like David Shapiro, Matt Wolf, and Matthew Berman speak only what's necessary and make every phrase count. This is great. 🎉❤❤❤
@bigglyguy8429 3 months ago
Thanks GPT
@sinukus 3 months ago
Is the context preserved across models for subsequent queries? E.g., "Where is country X?" goes to the weak model; "What is good to eat there?" goes to the strong model. Does the strong model have the context of country X?
@rodrigogarcia9273 2 months ago
That is a very good point. I guess you can now use whatever model you want to help you answer this important question. Maybe let us know?
@jaysonp9426 3 months ago
Awesome video, with GPT4o mini though idk how much I'll be routing lol
@cluelesssoldier 3 months ago
Came here to say the same thing lol. These techniques are only barely outpacing foundation model improvements: costs are dropping like a rock and latency continues to decrease. It's wild!
@rodrigogarcia9273 2 months ago
Thank you sir for your videos, your clear explanations and the amount of tools you give us visibility to. Please don't stop. God bless!!
@JaredWoodruff 3 months ago
I can see this being useful in scenarios where you have local Phi-3 models, then escalating to GPT4-Mini. If there was some way of declaring what each SLM is best suited for, then RouteLLM could kick it to the expert that knows best. Great video, as always Matthew! Cheers from Australia!
@brianrowe1152 2 months ago
We were already working on the business idea you mention, and I assume more people will be now too; it just makes sense. Many clients can't afford to pay $20/user/month for everyone in their enterprise when they don't yet know whether there will be value (ROI). The challenge with RouteLLM is that it doesn't quite do everything you suggest, because you only get weak and strong. So local... or 4o... What if I want local for really private data, and 4o-mini for simple tasks that llama3.1 might struggle with? Or I want a poem, and Anthropic is better. The model limit is the challenge: a great solution to start saving $$, but right now it's like 4o and 4o-mini are the only logical choices for an enterprise, and I'd suggest a full solution needs slightly more subtle routing options. Great stuff.
@magnusbrzenk447 3 months ago
Matt: you need to discuss how the router logic works. I am nervous that it will not be smart enough for my niche business use cases
@longboardfella5306 2 months ago
I also would want a choice of models based on use case rather than just weak or strong which seems very limited. This is an area to be detailed for sure
@drogokhal3058 3 months ago
Hi, thanks for the great video. I love watching them. The idea behind RouteLLM is great, but I would say it lacks control. The control needs to be reliable and avoid hallucinations, which are a huge problem with LLMs. The criteria are not clear to me: on what basis does it route? For production, I think it is better to develop an agent whose task is to understand the requirements and, with some instructions or even by fine-tuning an LLM, actually perform the routing. LLMs are not ready for general-purpose use. If you need an AI agent to perform a specific job, make sure you have a very good specification, good prompt engineering, function calling, and a lot of testing. I am using the LlamaIndex framework, and I want to do the same LLM-routing thing, but using my own agent or agents, which should give me more control over verification and decision making. I have built some agents with LlamaIndex and it is working OK so far. I can mix models, and for output verification I always use a top-tier LLM.
@aiplaygrounds 3 months ago
That is a task, not a prompt 😮. It is useful for complex tasks: setting up a system prompt and classifying the task level. Great video 👏
@jaysonp9426 3 months ago
Great video! With GPT4o mini idk how much I'll be routing though lol
@TheRubi10 3 months ago
Sam Altman already mentioned that, in addition to trying to improve the reasoning ability of the models, another aspect they were working on was a mechanism capable of discerning the complexity of a task to allocate more or fewer resources accordingly. Currently, LLMs always use all resources regardless of the task, which is wasteful.
@jtjames79 3 months ago
Sounds like consciousness might be an economic problem. Interesting.
@sauxybanana2332 3 months ago
@@jtjames79 now you mentioned it, right on
@jtjames79 3 months ago
@@sauxybanana2332 Hypothesis: Consciousness (or a form of it) is a market of ideas. The currency is attention. Food for thought.
@modolief 3 months ago
Matthew, can you do a deep dive on how to choose hardware for home use? Can we use Linux? And also, what hardware choice might we make if we wait? Should we wait? Should we buy now to avoid a signed GPU? Etc.
@clint9344 3 months ago
I agree, it is getting confusing for the beginner; it's like learning a whole new language again... lol. Great vids, keep up the great work. Be in peace, Godspeed.
@punk3900 3 months ago
That's the way to go! Great and thanks for sharing!
@punk3900 3 months ago
@Matthew_bberman wow, did I win something?
@wardehaj 3 months ago
Great video and love your great businesses idea. Thanks a lot! Does routeLLM support visual input or only text input?
@matthew_berman 3 months ago
I believe only text right now
@clint9344 3 months ago
Good question...would this be where the agents come into play..say a visual agent?
@richardkuhne5054 3 months ago
I was actually thinking to use it the other way around. Route LLM to have a strong cheap model and a weak even cheaper model and then add it to MOA together with strong frontier models to aggregate the responses so you could potentially build something that is slightly better than gpt4-o while still reducing your cost a bit. Another step would be to also chain it together with memgpt to have a longterm memory. And then use the endpoint in aider as the ultimate coding assistant 😅
@j-rune 3 months ago
Thank you, I'll try this out soon! ❤
@toadlguy 3 months ago
Matt, do you know how this (or your "business idea") will deal with context? Most chat uses of LLMs involve a conversation. With this sort of simplistic routing, the only way to deal with context may be to keep using the model the conversation started with, although you could keep a separate running context that you pass to whichever model is being used; I just don't know how well that would work. BTW, your "business idea" is what many people are currently working on (or something similar), and some LLM providers may even be doing this behind the scenes. There is a distinct lag and additional token cost with more complex queries on GPT-4o mini that may also suggest a similar approach.
@dimii27 3 months ago
but you do it locally with minimal cost instead of paying the bill to openai
@jamesyoungerdds7901 3 months ago
Really great stuff, Matthew - thank you! I noticed the prompt was to keep the cost of the call to the MF router api down to 11.5c - does this mean the router llm uses tokens per cost or does that run locally?
@zhengzhou2076 3 months ago
Thank you, Matthew! I like your videos very much.
@cristian15154 3 months ago
🙌 How nice it would be if GPT-4 had the speed of Groq. Thanks for the video, Matthew.
@michaelslattery3050 3 months ago
Another excellent video. I wonder if this can judge multiple models, and if not, when they might add it. You might want to choose between a smart expensive model, a fast model, a cheap model, and several specialized fine-tuned models. I've tried to do my own LLM routing in my agents, but mine is naive compared to this thing. Mine just asks the weaker LLM which LLM should do a task prompt, and it's often wrong, even when I supply a bunch of multi-shot examples.
@laiscott7702 3 months ago
Another game-changing project; excited for the next video.
@jeffg4686 3 months ago
Matt, you need to get with Groq to improve their process. They MUST get rid of the "just make things up" behavior when it doesn't have data. That is REALLY annoying. I have to ask it now: did you make that up?
@jean-paulespinosa4994 3 months ago
Did you know that GPT-4o and mini can run Python code? I have tried it and it worked, but only with Python. Hopefully the other LLMs will soon follow with this ability, and maybe add more programming languages that can be run directly from the prompt.
@mohdjibly6184 3 months ago
Wow Amazing...thank you Matthew :)
@LuckyWabbitDesign 3 months ago
If Llama 3.1 is treating "9.11" and "9.9" like version numbers rather than floats, 9.11 comes after 9.9 and so reads as larger. (Plain string comparison wouldn't explain it: lexicographically "9.11" sorts before "9.9", since '1' < '9' at the first differing character, and length doesn't enter into it.)
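For reference, both interpretations can be checked directly in Python:

```python
# Neither the float nor the string reading makes 9.11 the larger value.
as_floats = 9.11 > 9.9        # False: 9.11 < 9.90 numerically
as_strings = "9.11" > "9.9"   # False: lexicographic, '1' < '9'
```

Only version-number semantics (9.11 as the release after 9.9) ranks 9.11 higher, which is the likeliest source of the LLM's confusion.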
@paul1979uk2000 3 months ago
This could work really well with locally run models, especially with specialised models for given tasks: multiple models, each good in its own area and much lighter on the hardware for not being a jack of all trades. It could be a game changer for running much stronger AI on local hardware, provided the router does a good job of picking the right model for the task. Storage is a lot cheaper than VRAM, so if models can be switched in and out of memory on the fly, keeping a lot of specialised models on your drive isn't a big deal, but it could boost the quality of local AI massively without needing a crazy amount of VRAM.

Mind you, this only works if models can be swapped in and out of memory quickly enough that, to the end user, it all seems like one model, and only if you have a master general model that delegates well to the specialised ones. Having a few hundred GBs of models isn't a big deal given how cheap storage is, and it would be a lot cheaper and faster to run if they can be switched on the fly.
@sauxybanana2332 3 months ago
You don't need to run llama3 with ollama in a terminal; the Ollama app already listens on port 11434. As long as you invoke local models with the matching model name, Ollama will do the work.
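A minimal sketch of what that looks like, assuming the Ollama app is listening on its default port 11434 and a model has been pulled as "llama3" (the helper below only builds the request, so it runs without Ollama installed):

```python
# Build the payload for Ollama's /api/chat endpoint; the actual POST
# is left commented so this sketch is self-contained.
def build_request(model: str, prompt: str) -> dict:
    return {
        "url": "http://localhost:11434/api/chat",
        "json": {
            "model": model,  # must match the name the model was pulled under
            "messages": [{"role": "user", "content": prompt}],
            "stream": False,
        },
    }

req = build_request("llama3", "Hello")
# With Ollama running:
#   import requests; requests.post(req["url"], json=req["json"])
```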
@ernestuz 3 months ago
I've been doing something similar with a 7B model as the front end: if it can't answer the question, it forwards it to a few bigger models and then cooks a response from all the answers. I call it TPMFMOEX because it's catchy (the poor man's fake mixture of experts). Thanks for your videos!
@hqcart1 3 months ago
How do you know if it can't answer?
@ernestuz 3 months ago
@@hqcart1 It does, not me: "If you are not sure you can answer the query...". There are also a few tens of query/answer example pairs injected into the context at the start, showing what it can and can't answer. A poor man's solution.
@hqcart1 3 months ago
@@ernestuz I am not sure such a prompt would work for every LLM; you will definitely get answers where the LLM has no clue. You should test it on 1000+ prompts to make sure all LLMs follow the instructions.
@ernestuz 3 months ago
@@hqcart1 Well, the model I am using tends to be correct, and the one before it was as well. At the moment I am giving it around 30-something pairs for context injection, plus the prompts (there is more to it, because it has to use tools), and some simple support from the tool that forwards the query if necessary (basically a list of what/where). Because I am using agents, if the task fails the model can be told and retry with some instructions. It's really a poor man's solution, nothing fancy. EDIT: It looks like I am keeping the models secret; not really. The old one was Mistral 7B v0.2 and v0.3, then Llama 3 8B, and now, since last week, it's actually Codestral (not for coding; it just turns out to be great for this particular project). Think of every pair you inject as a prompt of its own, and add a "cost" to forwarding queries (the tools can answer back). I also tend to ask ChatGPT and Claude for their thoughts on my prompts and for examples; AI produces excellent prompts for AI. You can also inject context midway through the task, in between completions, if the model doesn't go in the direction you want.
@ernestuz 3 months ago
@@hqcart1 I wrote a long answer that seems to have been dropped, grrrrr. In short, I am using 34 pairs to prime the context, plus the prompt. There is also a "cost" added to the query: the model sees the upstream models as tools with an associated "cost". I've been using Mistral 7B, though I tried Phi and Gemma too (that's another story), then Llama 3 8B, and Codestral is next; not a code-related task, but it is a very good model for this one.
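The priming setup described in this thread can be sketched roughly as follows; the example pairs and the exact system wording below are invented for illustration, not the author's actual prompts:

```python
# "Poor man's" forwarding: a handful of query/verdict example pairs are
# injected ahead of the real query so a small model learns when to
# reply FORWARD instead of answering. Pair contents are invented.
EXAMPLE_PAIRS = [
    ("What is 2 + 2?", "ANSWER: 4"),
    ("Write a formally verified compiler.", "FORWARD"),
]

SYSTEM = "If you are not sure you can answer the query, reply exactly FORWARD."

def build_messages(query: str) -> list:
    msgs = [{"role": "system", "content": SYSTEM}]
    for q, a in EXAMPLE_PAIRS:  # context-priming examples
        msgs += [{"role": "user", "content": q},
                 {"role": "assistant", "content": a}]
    msgs.append({"role": "user", "content": query})
    return msgs
```

The caller then checks the reply for `FORWARD` and escalates to a bigger model if present.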
@flavmusic 3 months ago
Great getting-started guide, thanks. But how do you tell RouteLLM the conditions for choosing the strong or weak model?
@asipbajrami 3 months ago
Hoping LangChain implements this, because if agent frameworks don't support it, it will be hard for developers to adopt.
@AhmedMagdy-ly3ng 3 months ago
Thanks a lot Matthew 😮❤
@leonwinkel6084 3 months ago
How does it decide which model to go for? Knowing the mechanism would be useful. Also, having more than two models would be great. For example, if I tailor a sales message, it's possible with a 7B model, but in my experience a 70B model is much better. I guess everyone needs to test it for their use cases. Overall, if the selection mechanism works correctly, it's great tech and highly relevant. (I don't need GPT-4o, for example, to tell me that rstrip("/") removes the trailing "/"; I'm sure Mixtral etc. can do that.) If it goes wrong even a few times in production, where people are interacting with the bot, it would not be worth it, since the quality of the product cannot be guaranteed. Anyway, it's all in development; thanks for the video!!
@vincentnestler1805 3 months ago
Thanks!
@matthew_berman 3 months ago
Thanks so much!
@togetherdotai 3 months ago
Great video!
@Omobilo 3 months ago
Mate, great content. Is there any LLM or platform I can point at a website to analyze its content, with some specifics mentioned in the prompt, instead of me compiling the whole website copy into a document first (tedious)?
@totempow 3 months ago
Suppose I was using something like Perplexica and OAI and a local LLM like Llama3, what API would it call to at that point? Hypothetically speaking, of course. Also cool delivery.
@mdubbau 3 months ago
How can I use this with an AI coding assistant in VS Code? Specifically that the ai assistant will use local llm for most tasks and cloud llm for higher difficulty.
@moontraveler2940 2 months ago
Is it possible to use LM studio instead of ollama? Would be nice to see a tutorial how to set it up with cursor.
@julienduchesneau9747 3 months ago
I don't have a high-end CPU or GPU, so I'm not sure going local is doable for me. I'm always confused about what I need to go local; it would be nice to have a video clarifying the minimum gear needed to run an acceptable AI. What should people aim for in a PC to enjoy local AI? I know things change fast. Can we hope for models small enough one day to run on old 2010 gear???
@sugaith 3 months ago
I've actually tried this for a production use case and it did not work as expected. Simple questions such as "What is the name of our Galaxy?" were routed to GPT-4o. There is also a latency problem: it takes a while for the RouteLLM model to run and decide the route.
@MingInspiration 3 months ago
That's a shame. I was hoping I could have 10 different models behind the scenes and let it decide which one to pick for the job. For example, if it knows which one writes the best SQL and which one writes the best Python, it would just pick the right one. I think the idea is absolutely phenomenal, especially now that the number of models is exploding. It's like googling for the best model to answer your queries.
@sugaith 3 months ago
@@MingInspiration yes it would be great if we could easily attach our own trained model to decide the route
@hqcart1 3 months ago
It won't work, of course! What the hell were you thinking??? It's a bad idea and will fail. The best way is that YOU are the one who should know how to route your SPECIFIC prompts to which LLM, based on intensive testing, not a random RouteLLM deciding for you...
@nasimobeid2945 3 months ago
Does this support declaring which LLMs are experts at what? If not, I can only see it being useful if I had a very expensive model and a very cheap model.
@crazyKurious 3 months ago
By the way, why do this at all? I simply did ICL with llama3:8b, and it gives me {key: value} in the response indicating whether the query requires a complex or lite workload, and then I case-switch. Simple. Now I have swapped llama3:8b for gpt-4o-mini. We need to get over this increased-security crap; all these APIs provided by large companies are safe, and you have written guarantees.
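A rough sketch of that classify-then-switch pattern (model names are placeholders; the parsing and dispatch are pure functions, so no API call is involved here):

```python
# A small classifier model is prompted to emit {"workload": "complex"|"lite"}
# and the caller dispatches on its reply.
import json

MODEL_FOR = {"complex": "gpt-4o", "lite": "gpt-4o-mini"}  # placeholder names

def dispatch(classifier_reply: str) -> str:
    workload = json.loads(classifier_reply)["workload"]
    return MODEL_FOR.get(workload, MODEL_FOR["lite"])  # default to the cheap model
```

The fragile part in practice is the classifier emitting valid JSON every time, hence the default fallback.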
@SvenReinck 3 months ago
There are times when I find the answer of 3.5 better than 4o. Because 4o sometimes tries too hard to be helpful and it gets annoying. Also it seems 4o can’t just give part of an answer… It has to repeat the complete answer.
@rafaeldelrey9239 3 months ago
Would it work for RAG use cases, where the prompt can be much larger?
@gaylenwoof 3 months ago
I tried using GPT-4o to solve a seemingly simple problem, but it failed. I'm not the greatest prompt engineer, so the failure might be on my part, but I spent several hours refining the prompt and never solved it. 👉Question: if GPT-4o can't solve it, is there much point in spending hours going from one AI to another trying to find one that can? Or is it more like "If GPT-4o can't, no one can!"?

The problem, in case you are interested: translate the shape of an object (e.g., a board-game miniature) into a string of numbers that represents the shape, then characterize other properties of the miniature (e.g., height, width, distribution of colors...). Procedure: I upload a photo of a miniature lying on graph paper, where each cell is numbered. The AI's job is to determine which cells are covered by the object and list those cell numbers; that list is the numerical representation of the shape. GPT-4o cannot consistently give correct answers for different miniatures. Perhaps this requires too much genuine understanding of visual data, and I may need to wait for something closer to actual AGI? Or is there an AI that handles the "meaning" of visual data better than GPT-4o?
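The encoding step itself is easy to state precisely, which can help when prompting a vision model. A toy version, assuming cells are numbered left to right, top to bottom starting at 1, and given a binary occupancy grid in place of the photo:

```python
# Given a binary occupancy grid (1 = the miniature covers that cell),
# list the numbered cells, counting left-to-right, top-to-bottom from 1.
def covered_cells(grid: list[list[int]]) -> list[int]:
    width = len(grid[0])
    return [r * width + c + 1
            for r, row in enumerate(grid)
            for c, val in enumerate(row) if val]
```

The hard part the vision model keeps failing at is producing the grid from the photo, not this enumeration.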
@rbc2511 3 months ago
Does this work within agent workflows, where the general instruction set is extensive even though the specific instance of the task may not be complex?
@limebulls 3 months ago
How is it compared to GPT4o-mini?
@smokewulf 3 months ago
My new LLM stack: AIOS, LLM Router, Mixture of Agents, Agency swarm, Tool Chest, Open Agents. I am working on improvements. Also, a reason and logic engine, a long-term memory engine, and a continuous learning engine
@knowit999 3 months ago
The best use case for me would be purely local: a small model like llama3 for most tasks plus a vision LLM, so it needs to know which to choose. Can this be used for that? Thanks.
@alessandrorossi1294 3 months ago
Any idea why they use Python 3.11 and not the latest version (Python 3.12)?
@actorjohanmatsfredkarlsson2293 3 months ago
GPT-4o-mini support for RouteLLM? API-called LLMs are still too expensive. What are the hardware requirements for running a serious weak model with local inference?
@natsirhasan2288 3 months ago
Assuming you want to run an 8B model, you would need a 6-8 GB GPU to run it smoothly.
@actorjohanmatsfredkarlsson2293 3 months ago
@@natsirhasan2288 My MacBook could handle that, but it won't handle the 70B. I would like a weak model that's actually on the level of GPT-4o mini (or at least the bigger Groq models). I'm guessing this would require a larger graphics card?
@natsirhasan2288 3 months ago
@@actorjohanmatsfredkarlsson2293 A 70B model even in Q4 needs something like 40-50 GB of GPU memory to run... Have you tried Gemma 2 27B? It beats Llama 3 70B; it's worth trying.
@eightrice 3 months ago
but what LLM is the router using?
@kbb8030 3 months ago
Have you played with Fabric by Daniel Miessler?
@Jeremy-Ai 3 months ago
Ok. Matt… I can sense your excitement, and recognize your potential. I am grateful for your work so I will return the favour. “Don’t let your hard work become a fundamental lesson in business foolishness” Take a breath before speaking openly. Seek wise and trustworthy leadership and counsel. I could go on. I don’t have to. Good luck Jeremy
@hqcart1 3 months ago
It's a bad idea and will fail. The best way is that YOU are the one who should know how to route your SPECIFIC prompts to which LLM, based on intensive testing, not a random RouteLLM deciding for you...
@OMGitsProvidence 3 months ago
It’s not a bad concept as a module or tool in a more comprehensive system but I’ve always thought a shortcoming of many models is token waste
@hqcart1 3 months ago
@@OMGitsProvidence You can't use it for a serious production product.
@jtjames79 3 months ago
How much is your time worth? That's a level of micromanaging that's generally not worth it to me.
@BunnyOfThunder 3 months ago
Yeah this isn't meant for 1 person typing prompts into their keyboard. This is for large-scale production systems that are sending thousands of prompts per second. You need something automatic, fast, and cheap. I.e. not a human.
@DihelsonMendonca 3 months ago
It's a bad idea for one person using a laptop to chat in their free time. Think about big businesses, corporations that deal with millions of tokens! 👍💥💥💥
@restrollar8548 3 months ago
Not sure how useful this really is. When you're writing production agents, you need to send appropriate prompts to the weak/strong models to make sure you get consistency.
@hqcart1 3 months ago
It's useless; you are the one who should route your specific prompts, based on testing, testing & testing.
@bm830810 3 months ago
okay, this is fine for single questions, but what about context?
@crazyKurious 3 months ago
No, you made a mistake: Apple won't route it to ChatGPT; they will route it to a bigger version of the same model on Apple silicon in the cloud. ChatGPT is only used if you choose to use it.
@modolief 3 months ago
Thanks!!!
@VastCNC 3 months ago
Can it work 3 tier? If I had a phi model locally, gpt4o mini, and Claude 3.5. I think it would make for an awesome dirt cheap setup.
@toadlguy 3 months ago
How would a router determine whether gpt4o mini or Claude 3.5 (I assume Sonnet) was more appropriate? They generally produce similar results.
@VastCNC 3 months ago
@@toadlguy Better context length, and code-specific tasks. Phi-3 128k could max out GPT-4o mini, and there could be a case for routing large outputs with additional context to Claude. In an agentic workflow, Claude could also be the "fixer" when there are errors in GPT-4o mini's outputs.
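RouteLLM itself only exposes one weak/strong pair, but the three-tier idea in this thread can be sketched as two stacked cutoffs on a difficulty score (tier names and cutoff values below are placeholders):

```python
# Three-tier routing: two thresholds split queries among a local model,
# a cheap API model, and a frontier model. In practice this could be
# built by chaining two binary weak/strong routers.
TIERS = [(0.3, "phi-3-local"), (0.7, "gpt-4o-mini")]
FALLBACK = "claude-3.5-sonnet"

def route3(difficulty: float) -> str:
    for cutoff, model in TIERS:
        if difficulty <= cutoff:
            return model
    return FALLBACK  # hardest queries go to the frontier model
```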
@JustinsOffGridAdventures 3 months ago
If the snake game didn't get verified, then you have no idea whether you got a correct answer, right? So let's say it failed the snake code; well then, the prompt's answer wasn't very good, was it? Also, during the apple-sentence test, always give the AI a chance to correct itself. If AI is working properly, it should learn from its faults.
3 months ago
Does it still make sense after 4o mini?
@donkeroo1 3 months ago
A model that feeds a model that feeds a model that feeds yet another model. What could go wrong.
@donkeroo1 3 months ago
@Matthew_bberman BTW, love the content.
@kasinadhsarma 3 months ago
Thank you!
@mirek190 3 months ago
Have you seen the leaked tests? It seems Llama 3.1 70B is better than GPT-4o, and the small Llama 3.1 8B is smarter than the "old" Llama 3 70B! Insane.
@firatguven6592 3 months ago
I want the best, not the cheapest, so it would be better to show how to integrate GPT-4o (mini) or Claude into Mixture of Agents as the main LLM. MoA is really a fantastic framework. I'm sorry, but I don't see any benefit in RouteLLM for me. If I can use Groq or good open-source models, I don't really want to save cost there; we need the best of the best. So improvements to MoA would be nice, like a custom system prompt or the ability to use top frontier models.
@OmerNesher 3 months ago
OK, this is massive! How can this be integrated into OUI? Make it a pseudo-LLM model?
@OmerNesher 3 months ago
@Matthew_bberman I am what? the victor? victor of what?
@OmerNesher 3 months ago
@Matthew_bberman it wasn't even in my mindset 😁 that's awesome. What's the next step? Thanks!
@OmerNesher 3 months ago
?
@LiquidAIWater 3 months ago
Say AI LLMs are like workers: why would a company hire a PhD in "everything" for some simple task? When you think about it, this is how human organizations work.
@mikestaub 3 months ago
This seems like a hack. The LLM itself should be able to do this internally eventually. Perhaps that is what project strawberry is about.
@dani43321 3 months ago
Do a video on Mistral NeMo
@Robert-z9x 2 months ago
Loudermilk
@pacobelso 3 months ago
Without a shared context window, this router LLM is useless 🤷🏼
@throwaway6380 3 months ago
Why do you dress like that...
@Legorreta.M.D 3 months ago
Because it makes him look like Skittles in the process of being digested. He likes it. 🍬🍭
@gani2an1 3 months ago
Is this not what aider does? My aider uses gpt4o-mini as the weak model
@garic4 3 months ago
I had high hopes for Gemini Advanced, but it was such a letdown. Unintuitive, glitchy, and the results were just awful. Don't waste your time. #disappointed
@8bitsadventures 3 months ago
Ok . 3rd here
@blackswann9555 3 months ago
A.I. should be almost free or free with Ads soon.
@khanra17 3 months ago
Openrouter Auto is faaaar better. In fact Openrouter is awesome!!!