Reflection 70b Controversy is PROOF our Perspective on LLMs is wrong.

  19,142 views

MattVidPro AI

A day ago

Comments: 204
@MattVidPro
@MattVidPro Ай бұрын
I know this video was a bit ranty, but I thought there was a pretty clear conclusion to draw: WE NEED TO RETHINK HOW WE VIEW LLMS | Also, huge thanks to Brilliant for sponsoring! Check em out here: brilliant.org/MattVidProAI/
@NicVandEmZ
@NicVandEmZ Ай бұрын
You should copy the system prompt into a GPT you create, and try that system prompt with Sonnet too.
@LouisGedo
@LouisGedo Ай бұрын
👋 hi
@amj2048
@amj2048 Ай бұрын
LLMs are just a different way of accessing a database. Every database requires a query language to get anything useful out of it. Modern so-called AI is just a fancy new query language, but at the end of the day it's still a query language accessing a database; there is no thought going on. This is something that I think a lot of non-programming people have missed: they seem to think there is actual intelligence or thought going on. There is no intelligence or thought; it really is as simple as a query language accessing a database. The better the query, the better the result, but the data in the database has to be of good quality too.
@the80sme
@the80sme Ай бұрын
Never apologize for the sponsors! You provide us with so much value and always have interesting sponsors that are relevant and feel like a natural part of the video. Honestly, yours are some of the only YouTube ads I don't skip. Thanks for all your hard work!
@michelprins
@michelprins Ай бұрын
Well, at least show the commercials at the end so you show more respect for your viewers than for the ad men.
@ShawnFumo
@ShawnFumo Ай бұрын
I always like seeing Brilliant since I already had a subscription to them and liked them before even seeing them advertise on videos.
@jeffwads
@jeffwads Ай бұрын
There are reports on X and Reddit about this whole thing being linked to Claude. Bizarre.
@MattVidPro
@MattVidPro Ай бұрын
I get into that in the video. It's also bothering me
@DisturbedNeo
@DisturbedNeo Ай бұрын
The trouble with this finetune is that the output appears to be only marginally better than the base model, but all the extra tokens make it cost 2-3x as much, so it's just not worth it.
@MattVidPro
@MattVidPro Ай бұрын
But if the power of the finetune increases with model size, in theory it would be.
@Jack122official
@Jack122official Ай бұрын
@@MattVidPro What do you think about AI song dubbing? Would you like it to happen?
@JosephSimony
@JosephSimony Ай бұрын
Not sure what the definition of "marginally better" and thus "not worth it" is. A real-life scenario: I had no experience setting up software RAID in CentOS. If Claude had been only "marginally better", it would have worked, but instead it/I screwed up my server and I spent days on it. With ChatGPT (much smaller context window, same prompting) the RAID worked right away after reboot. Go figure "marginally" and "not worth it".
@radradder
@radradder Ай бұрын
@@MattVidPro Don't refute current reality with a theoretical "what if this wasn't reality".
@jonathanberry1111
@jonathanberry1111 Ай бұрын
@@MattVidPro Also, if a model can improve its accuracy, then this helps make better synthetic data and helps reach toward high-quality results, where LLMs can get better from essentially understanding, thinking, and drawing conclusions from their own output in a potentially constructive loop. It's not about being slightly better for some end use; it's about the ability to make AI able to not just regurgitate what people know and say, but potentially reach at least low-level ASI (almost as good as the smartest humans).
@Dron008
@Dron008 Ай бұрын
The community should stop believing anyone's closed benchmarks. It's very weird to me when people discuss benchmark results from publications that nobody has tried to verify.
@brulsmurf
@brulsmurf Ай бұрын
They train the model on the test set (with extra steps). If the benchmark's questions are public, then it's useless.
@adolphgracius9996
@adolphgracius9996 Ай бұрын
@@brulsmurf Rather than benchmarks, people should do their own tests by just using the AI and calling out the mistakes.
@johanavril1691
@johanavril1691 Ай бұрын
STOP USING THE EXAMPLE OF COUNTING LETTERS, TOKENISATION MAKES IT A TERRIBLE TEST
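For what it's worth, you can see the tokenisation point directly by looking at how a tokenizer splits the word. A minimal sketch in Python, assuming the tiktoken package is installed; the exact split depends on the encoding:

```python
import tiktoken

# o200k_base is the encoding used by GPT-4o-class models
enc = tiktoken.get_encoding("o200k_base")
word = "strawberry"
token_ids = enc.encode(word)
pieces = [enc.decode([t]) for t in token_ids]

print(token_ids)        # a couple of integer IDs, not ten letters
print(pieces)           # multi-character chunks, e.g. something like 'str' + 'awberry'
print(word.count("r"))  # trivial in code, hard for a model that only sees chunks
```

The model operates on those chunk IDs, never on individual letters, which is why letter-counting says more about tokenisation than about reasoning.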
@jackpisso1761
@jackpisso1761 Ай бұрын
Exactly this!
@eastwood451
@eastwood451 Ай бұрын
You're shouting.
@johanavril1691
@johanavril1691 Ай бұрын
@@eastwood451 SORRY MY CAPSLOCK KEY IS BROKEN!
@SW-fh7he
@SW-fh7he Ай бұрын
It's a great test, because passing it would require a new technique.
@grostire
@grostire Ай бұрын
You are narrow minded.
@ElvinHoney707
@ElvinHoney707 Ай бұрын
Hey, please take the system prompt he gave you and use it in an unadulterated Llama 3.1 70B with the same prompt and see how that response compares to what you showed in the video. That should show us the fine tuning effect, if any.
@manonamission2000
@manonamission2000 Ай бұрын
it's easier to prevent a text2image model spitting out nsfw images by adding a filtering layer than to re-engineer the model itself
@konstantinlozev2272
@konstantinlozev2272 Ай бұрын
This reflection prompting was already there with "step-by-step" prompting. But nothing beats agentic frameworks, because then you can design the system to loop back as many times as necessary to refine its answer.
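A minimal sketch of the agentic loop described above, with `chat()` as a stand-in for whatever chat-completion API you use (not a specific library):

```python
def chat(messages: list[dict]) -> str:
    raise NotImplementedError("wrap your LLM API of choice here")

def refine(question: str, max_rounds: int = 3) -> str:
    # Draft, critique, revise - looping until the critic is satisfied or the budget runs out.
    answer = chat([{"role": "user", "content": question}])
    for _ in range(max_rounds):
        critique = chat([{"role": "user", "content":
            f"Question: {question}\nAnswer: {answer}\n"
            "List any mistakes in the answer, or reply exactly OK if it is correct."}])
        if critique.strip() == "OK":
            break  # critic found nothing to fix
        answer = chat([{"role": "user", "content":
            f"Question: {question}\nDraft answer: {answer}\nCritique: {critique}\n"
            "Write an improved answer that fixes the critique."}])
    return answer
```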
@dorotikdaniel
@dorotikdaniel Ай бұрын
Yes, system prompting allows you to essentially reprogram LLMs and shape them into anything you can imagine, while also improving their performance. At least for the OpenAI models, I can confirm from experience that this works incredibly well.
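For the OpenAI models that looks roughly like the snippet below (official Python SDK; the reflection-style instructions are paraphrased for illustration, not Shumer's exact published prompt):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = (
    "Reason through the problem step by step inside <thinking> tags. "
    "If you notice a mistake, correct it inside <reflection> tags. "
    "Give your final answer inside <output> tags."
)

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "How many r's are in 'strawberry'?"},
    ],
)
print(resp.choices[0].message.content)
```

Swapping out the system message is all it takes to "reprogram" the assistant; the model weights never change.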
@IntellectCorner
@IntellectCorner Ай бұрын
*Timestamps by IntellectCorner*
0:02 - Introduction: Reflection 70b Controversy
2:11 - Background on Matt Schumer
4:03 - Community Reactions and Unanswered Questions
5:35 - Sponsor Message
7:31 - Testing Reflection 70b on Hyperbolic Labs
11:02 - Comparing Reflection 70b with GPT-4 and ChatGPT
13:20 - The Importance of Prompting
16:48 - Analysis of the Situation and Possible Explanations
21:01 - Conclusion: The Need for New Benchmarks and Perspectives on LLMs
@brexitgreens
@brexitgreens Ай бұрын
10:29 *"Somehow it got the correct answer by doing the wrong math."* Just like my parents who turned out to be right from entirely wrong premises. Which is why I had ignored their advice - to my own detriment.
@DiceDecides
@DiceDecides Ай бұрын
What wrong premises? Parents usually want the best for their kids.
@Phagocytosis
@Phagocytosis Ай бұрын
​@@DiceDecides That seems like somewhat of a strange reaction if I'm honest. Even ignoring the "usually" part of it, wanting the best for someone is kind of separate from whether you are able to judge a situation correctly.
@DiceDecides
@DiceDecides Ай бұрын
@@Phagocytosis No one's a perfect judge, sure, but parents have more life experience to make better judgements than their kids. Elders especially have a lot of wise things to say.
@Phagocytosis
@Phagocytosis Ай бұрын
@@DiceDecides It just feels like a very general statement, and unless your claim is that anyone old enough to have kids necessarily has enough wisdom and life experience to not be expected to make any false premises (which I would personally consider to be a false premise), it seems odd to me to question some individual claim of parents having made a false premise.
@DiceDecides
@DiceDecides Ай бұрын
@@Phagocytosis I never claimed such a thing; I was just curious what the premises could have been. Chill out.
@Alice_Fumo
@Alice_Fumo Ай бұрын
My best attempt at coming up with a rational explanation for the Claude 3.5 API calls is that they have a fallback which calls out to Claude when their own backend is down, to avoid downtime. I'm not sure I put a lot of stock in this explanation, but it's one that is not fully unreasonable.
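For what it's worth, that kind of fallback router is a common pattern; a minimal sketch with `call_own_backend` and `call_claude` as hypothetical stand-ins. The point of contention is transparency, which is why the sketch tags every response with the backend that produced it:

```python
def call_own_backend(prompt: str) -> str: ...
def call_claude(prompt: str) -> str: ...

def answer(prompt: str) -> dict:
    try:
        return {"backend": "reflection-70b", "text": call_own_backend(prompt)}
    except Exception:
        # Own backend is down: fall back, but say so in the response metadata.
        return {"backend": "claude-3.5-sonnet (fallback)", "text": call_claude(prompt)}
```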
@MattVidPro
@MattVidPro Ай бұрын
Yeah.
@kuromiLayfe
@kuromiLayfe Ай бұрын
Yeah... my take on it is that if it cannot perform locally, there is some sort of scammy backend at work that takes your data for who knows what, which in the end they will charge you for.
@nilaier1430
@nilaier1430 Ай бұрын
Yeah, this might be possible. But it's still disingenuous to not inform users about that. Or maybe they've been using Claude 3.5 Sonnet with the custom system prompt to generate all of the training data and feed it to AI for fine-tuning and they just forgot to change the endpoint to serve their model instead.
@tommylir1170
@tommylir1170 Ай бұрын
They even tried to censor the fact that it was using Claude. I don't get why some people still give this guy the benefit of the doubt.
@Alice_Fumo
@Alice_Fumo Ай бұрын
@@tommylir1170 Am I giving him the benefit of the doubt? I constructed a steelman and decided that even this most favourable interpretation does not seem super likely. However, I don't think it's necessary to draw conclusions just yet. Either we get weights for a model which reaches the claimed benchmark scores or we don't. I'm not sure whether the weights available at the moment do, or whether there was still something supposedly wrong with them as well, but if the model meets the claimed performance, it's all good, and if he doesn't deliver, screw the guy.
@cagnazzo82
@cagnazzo82 Ай бұрын
The era of benchmarks ended as soon as GPT-4o became multimodal and Sonnet released with artifacts. We just weren't ready to accept it. The only thing I'm interested in now are features. Sonnet can code, GPT-4o was updated so it's now amazing at creative writing. I don't really need much else.
@BackTiVi
@BackTiVi Ай бұрын
Can you really compare Reflection 70b to "reflectionless" LLMs if, according to Shumer, you need a system prompt that explicitly tells Reflection 70b how to reflect in order to get good scores in the benchmarks? Doesn't that defeat the purpose?
@MattVidPro
@MattVidPro Ай бұрын
Apparently, the system prompt DOESN'T need to be there, it can be adjusted in tuning to not require it. twitter.com/mattshumer_/status/1832169489309561309
@BackTiVi
@BackTiVi Ай бұрын
@@MattVidPro Fair. I hope the situation will stabilize soon and we'll get the promised SOTA open-source model, although I also think that there was something fishy with the API.
@TheFeedRocket
@TheFeedRocket Ай бұрын
Different prompts make a huge difference. You could look at prompting or fine-tuning like coaching or teaching: you have the same person, but a certain coach's "prompting" can make a poor student or athlete way better. It's all in the coaching or teaching, which is like prompting. Certain teachers or coaches are just way better prompt engineers. Prompting is huge.
@kajsing
@kajsing Ай бұрын
You don't need the API for the system prompt. I put this into my custom instructions, and it works well: "Start with a reflection where you evaluate the user's input and relevant parts from earlier inputs and outputs. Ensure that you consider multiple perspectives, including any underlying assumptions or potential biases. This reflection should aim to highlight key insights and possible challenges in forming your answer. Plan how to address these insights and create a strategy for delivering a clear and relevant response. When done with the thinking, reflect on your thought process and consider if there are any overlooked angles, biases, or alternative solutions. Ask yourself if the response is the most effective way to meet the user's needs and expectations. Then, finalize your answer."
@SkitterB.Unibrow
@SkitterB.Unibrow Ай бұрын
This is why open source is the only way. Example: people at OpenAI could present "bad" results to "higher-ups", who would then announce those results to the public thinking they're great... and then not release the model because, when they really checked it out, it did not perform as expected (read into that what you will). Open source, however, is examined with a fine-tooth comb, and you can't pull the wool over anyone's eyes.
@MattVidPro
@MattVidPro Ай бұрын
Love it
@SkitterB.Unibrow
@SkitterB.Unibrow Ай бұрын
@@MattVidPro "you da man' according to 4 out of 5 ai's that are not censored to ask this question "whos do man?"
@SkitterB.Unibrow
@SkitterB.Unibrow Ай бұрын
Duuuuuuhhh.... I ment 'da'
@SahilP2648
@SahilP2648 Ай бұрын
@@MattVidPro Reflection 70B is on Hugging Face and I tried it locally; it works, so I don't know what you were talking about with Claude being involved, etc. And it did get the strawberry question correct, at least. It also seemed to follow custom system prompts better than other models.
@hiromichael_ctranddevgames1097
@hiromichael_ctranddevgames1097 Ай бұрын
​@@SahilP2648 IT'S claude the prompt ok
@MistaRopa-
@MistaRopa- Ай бұрын
"WE NEED TO RETHINK HOW WE VIEW LLMs"...or content creators and self appointed community leaders need better due diligence before crowning every ne'er-do-well the next Steve Jobs. Credibility is a thing...
@draglamdraglam2419
@draglamdraglam2419 Ай бұрын
Ayy, glad to be early for this one, keep doing what you do 💪
@yhwhlungs
@yhwhlungs Ай бұрын
Yeah prompt engineering is the way to go. We just need a model that’s really good at predicting reasonable tokens afterwards.
@ViralKiller
@ViralKiller Ай бұрын
ChatGPT can give code for an entire game but can't do basic maths...makes sense
@MeinDeutschkurs
@MeinDeutschkurs Ай бұрын
Exactly my behavior. 😹😹 I cannot calculate, but I can write code.
@eprd313
@eprd313 Ай бұрын
Verbal intelligence and mathematical reasoning require different processes
@vi6ddarkking
@vi6ddarkking Ай бұрын
So, to use an image-generation equivalent, if I am understanding this correctly: Reflection 70B would be the equivalent of having Flux merged with a LoRA.
@TLabsLLC-AI-Development
@TLabsLLC-AI-Development Ай бұрын
More like a DreamBooth checkpoint in a custom Python wrapper.
@konstantinlozev2272
@konstantinlozev2272 Ай бұрын
Bigger brain = Better
@OliNorwell
@OliNorwell Ай бұрын
I fear that Matt himself got scammed. I'm sure the truth will come out eventually.
@Slaci-vl2io
@Slaci-vl2io Ай бұрын
I wonder how much cooling water was wasted by us testing their wrong model.
@Dina_tankar_mina_ord
@Dina_tankar_mina_ord Ай бұрын
So, this reflection mechanism is like providing a control net to the prompt, ensuring that every answer aligns with the main meaning.
@ShivaTD420
@ShivaTD420 Ай бұрын
These are just tricks to cause more neurons to light up. The fine tuning process makes prompting easier, since you don't need the complex system prompts
@tiagotiagot
@tiagotiagot Ай бұрын
Adding the system prompt could be sort of a trigger for specific behaviors the model has been fine-tuned to have; or it could just be the prompt itself doing the work, or it could be the model is fine-tuned to follow any system prompt more strictly/intelligently and it works better with this good prompt than the non-fine-tuned version with the same prompt. I'm not sure how likely each of these possibilities is to this specific case, if any.
@ToddWBucy-lf8yz
@ToddWBucy-lf8yz Ай бұрын
For smaller models this sort of fine-tuning may be able to better compensate for the lack of parameters and for quantization. If it can do that, I say it's a win.
@TLabsLLC-AI-Development
@TLabsLLC-AI-Development Ай бұрын
This is exactly right. 💯
@ShivaTD420
@ShivaTD420 Ай бұрын
He just used Claude to train the model. The model is being fine-tuned with synthetic data that follows this structure, while Claude fixes its mistakes.
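If that theory is right, the pipeline would look something like the sketch below: a stronger "teacher" model writes reflection-formatted answers that become the fine-tuning set for the smaller model. `teacher()` is a hypothetical stand-in; nothing here is confirmed about how Reflection was actually trained:

```python
import json

def teacher(prompt: str) -> str:
    ...  # hypothetical call to a stronger model's API

TEMPLATE = (
    "Answer the question. Think inside <thinking> tags, correct mistakes inside "
    "<reflection> tags, and put the final answer inside <output> tags.\n\nQuestion: {q}"
)

def build_dataset(questions: list[str], path: str = "reflection_sft.jsonl") -> None:
    # Each line becomes one supervised fine-tuning example in the reflection format.
    with open(path, "w") as f:
        for q in questions:
            completion = teacher(TEMPLATE.format(q=q))
            f.write(json.dumps({"prompt": q, "completion": completion}) + "\n")
```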
@canyongoat2096
@canyongoat2096 Ай бұрын
Not completely out of the question, as I remember older Llama and Mistral 7B models claiming to be GPT and claiming to be made by OpenAI.
@toastbr0ti
@toastbr0ti Ай бұрын
The API literally uses Claude tokens, not Llama ones.
@apache937
@apache937 Ай бұрын
It returns the exact same response at temp 0.
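One way people probed this was to compare the token usage the hosted API reports against what the Llama tokenizer produces for the same text: if the numbers consistently disagree, the backend probably isn't running the tokenizer it claims to. A minimal sketch, assuming an OpenAI-compatible endpoint (the base URL and model name are placeholders) and the Hugging Face tokenizer for Llama 3.1:

```python
from openai import OpenAI
from transformers import AutoTokenizer

client = OpenAI(base_url="https://example-host/v1", api_key="...")  # placeholder endpoint
llama_tok = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-70B-Instruct")

prompt = "Count the letter r in the word strawberry."
resp = client.chat.completions.create(
    model="reflection-70b",  # placeholder model name
    temperature=0,
    messages=[{"role": "user", "content": prompt}],
)

reported = resp.usage.completion_tokens  # what the API claims it generated
recount = len(llama_tok.encode(resp.choices[0].message.content, add_special_tokens=False))
print(reported, recount)  # small differences are expected; a large, consistent gap is suspicious
```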
@Phagocytosis
@Phagocytosis Ай бұрын
Yeah, but didn't he claim it was a finetune of Llama 3.1? EDIT: Oh, I see, you mean the actual finetuning data came from Claude, never mind.
@nyyotam4057
@nyyotam4057 Ай бұрын
In any case, prompting the model is extremely important when you want the model to function a certain way. Getting around the system prompt is very important when you want to jailbreak the model, or even just try to find out stuff about the model which the devs try to hide. So first you need to prompt yourself to do what you want to do.
@RainbowSixIntel
@RainbowSixIntel Ай бұрын
It's probably Claude 3.5 Sonnet. It has the same tokeniser, Matt filters out "Claude" from its outputs, AND it mentions it was trained by Anthropic if you prompt it correctly.
@MattVidPro
@MattVidPro Ай бұрын
That was just their supposed "API". If you run the actual model uploaded to Hugging Face, you get something different.
@teejayroyal
@teejayroyal Ай бұрын
Please run the cords behind your couch, I feel like I'm going to have an anxiety attack😂😂😭
@GamingXperience
@GamingXperience Ай бұрын
The problem with prompt engineering and benchmarks is that you have to find the prompt that works best for that specific model, so it makes sense that we just compare the raw models without any specific system prompts, because that's how most people use them. Which does not mean we shouldn't try to find the best solution for prompting; use whatever it takes to make the model better. The problem is there are a lot of users who don't care or don't want to try a million prompts. For the big models, maybe the companies behind them could figure out what the best prompts are and just provide those as some kind of help, where they ask you if you want to try applying them to your inputs. That said, I would love to see comparison benchmarks between models using different prompting strategies. And I also want to know if this whole reflection thing is actually real or not.
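That comparison is easy to run yourself on a small, private question set; a minimal sketch with a hypothetical `ask(model, system_prompt, question)` helper wrapping whatever API you use:

```python
def ask(model: str, system_prompt: str, question: str) -> str: ...

EVAL_SET = [("What is 17 * 23?", "391")]  # your own held-out questions and expected answers
STRATEGIES = {
    "plain": "You are a helpful assistant.",
    "step-by-step": "Think step by step before answering.",
    "reflection": "Reason first, then double-check your reasoning before giving the final answer.",
}

for model in ["model-a", "model-b"]:  # placeholder model names
    for name, sys_prompt in STRATEGIES.items():
        score = sum(expected in ask(model, sys_prompt, q) for q, expected in EVAL_SET)
        print(f"{model:10s} {name:12s} {score}/{len(EVAL_SET)}")
```

Keeping the question set private sidesteps the train-on-the-benchmark problem mentioned earlier in the thread.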
@mrpocock
@mrpocock Ай бұрын
I sometimes have one of the smart models generate prompts for dumb ones and iterate until it finds a prompt that makes the dumb model work well.
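That trick can be automated into a small search loop; a minimal sketch with `strong()` and `weak()` as hypothetical wrappers around a capable model and the cheaper one being tuned:

```python
def strong(prompt: str) -> str: ...                      # capable model: proposes instructions
def weak(system_prompt: str, question: str) -> str: ...  # cheap model being tuned

def optimize_prompt(task: str, eval_set: list[tuple[str, str]], rounds: int = 5) -> str:
    best_prompt, best_score = "You are a helpful assistant.", -1
    for _ in range(rounds):
        candidate = strong(
            f"Task: {task}\nCurrent best instructions:\n{best_prompt}\n"
            f"Current score: {best_score}/{len(eval_set)}\n"
            "Write improved system instructions for a weaker model. Reply with the instructions only.")
        score = sum(expected in weak(candidate, q) for q, expected in eval_set)
        if score > best_score:
            best_prompt, best_score = candidate, score  # keep only the best performer so far
    return best_prompt
```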
@KlimovArtem1
@KlimovArtem1 Ай бұрын
There is nothing novel in it. It’s just asking the model to think aloud before giving an answer. Such fine tunings are actually done for all public chat models more or less.
@MeinDeutschkurs
@MeinDeutschkurs Ай бұрын
I don't understand the issue: 1) You live in a capitalist system. 2) Claims like "fake it until you make it" are propagated frequently, at least after the fact if it worked out. 3) The output of Reflection is nothing that you cannot reach with simple prompting (on top of most of the models out there). 4) A double-reflection approach could be better.
@ashleyrenee4824
@ashleyrenee4824 Ай бұрын
Thank you Matt 😊
@PH-zj6gk
@PH-zj6gk Ай бұрын
You totally missed the point. The actual moral of the story is that you absolutely cannot super-hype your open-source SOTA model and not deliver. He wasted a lot of people's time. Full stop. There's a very serious social responsibility that comes with claiming something world-changing. If you're curious what actually happened: kzbin.info/www/bejne/rYDdlZWuoraViK8
@Citrusautomaton
@Citrusautomaton Ай бұрын
I was genuinely really sad when I found out it was a fraud. The promise of Reflection made me really excited for this week, and it all crumbled within a day or two. I even told other people about it, so I also felt a sense of embarrassment that I fell for it. I'm still salty as hell.
@PH-zj6gk
@PH-zj6gk Ай бұрын
@@Citrusautomaton Same. I was actually happy for him at first. It became clear he was being dishonest well before he stopped lying. It was incredibly insulting. His narcissism is off the charts.
@dennisg967
@dennisg967 Ай бұрын
I really don't get how a model could "reflect" on the answer it provided to give an even better answer. The initial answer it outputs is supposed to be the one with the highest probability already. How can it use that again to make another answer have an even higher probability?
@kuromiLayfe
@kuromiLayfe Ай бұрын
Well, if you take a trip to the store and the shortest route happens to be closed off, you will have to backtrack and take a slightly longer route to get to the same destination. For your brain, that is reflection: you made a mistake and had to go again and make a new decision to still reach the same endpoint.
@ShivaTD420
@ShivaTD420 Ай бұрын
It's not picking a token that is the right answer; it's picking the next most likely token. It's just a coincidence that these two things align. If I ask you whether yesterday was Sunday, you can just say yes, be correct, and put in minimal effort. You could also say you don't remember, or that you aren't sure. These are also technically valid answers for the completion of your response. These "think about it" prompts are just forcing the model to use more neurons. If I asked you to talk about how you know yesterday was Sunday, or how you felt on Sunday, then you're using more neurons and spending more joules to respond.
@dennisg967
@dennisg967 Ай бұрын
@@ShivaTD420, so you are saying that at first, the model is trying to give an answer while using little information or resources, but if a user prompts it to use more information/resources to come up with a better answer, it will do that? If that's what you mean, it sounds like an additional prompt from the user is needed. If the model were to prompt itself to use more info/resources, I don't see the point in figuring out the first, less complete, answer. Let me know what you think
@dennisg967
@dennisg967 Ай бұрын
@@kuromiLayfe, but in your example, you gain more information by finding out that the first route is blocked off. How does the model gain more information between the initial response and the more thought out response?
@kuromiLayfe
@kuromiLayfe Ай бұрын
@@dennisg967 Branching thought processes: you have already seen a different route along the way, but your main one was cut off or wrong, so you think about the other one you already know about.
@tylerhatch8962
@tylerhatch8962 Ай бұрын
Truly open source means you are able to inspect everything yourself. Every line of code, every weight, every parameter. Fakes will happen, this story is a show of the strength of open source. You can investigate the legitimacy of their claims yourself.
@brownpaperbagyea
@brownpaperbagyea Ай бұрын
I agree it doesn’t make a lot of sense that it would be a grift because how the hell would he capitalize on this before getting outed. However almost EVERYTHING I’ve seen since the release points to it being a grift. I don’t care if he truly believes his lies or not. The way he presented the model and benchmarks, the manipulation of stars in their HF repo, and everything that has happened since the release has been very grifty.
@brownpaperbagyea
@brownpaperbagyea Ай бұрын
Maybe the thing is, we should question individuals without research backgrounds dropping models that beat the top-of-the-line offerings. I'm not saying that it can't happen; however, it seems many accept what he says as fact even in the face of controversy after controversy.
@bakablitz6591
@bakablitz6591 Ай бұрын
im still looking forward to personalized mattvid home entertainment robots... anyday now boys this is the future
@travisporco
@travisporco Ай бұрын
is it really true that they've established that the api was a wrapper for Claude? I don't think so.
@Yipper64
@Yipper64 Ай бұрын
17:38 There's a sense in which computers in general are like that. When they first invented computers, you basically had to explore what you could do by giving them instructions.
@dirt55
@dirt55 Ай бұрын
There will be failures, but with each failure there will be someone succeeding.
@MagnusItland
@MagnusItland Ай бұрын
I think the main problem with LLMs is that they are trained on human output, and humans often suck. LLMs are unlikely to learn native self-reflection by emulating Twitter and Reddit.
@fynnjackson2298
@fynnjackson2298 Ай бұрын
Love it when you go all philosophical. It would be cool to have you do a rant on the deeper ideas you have about what AI really is and how this all continues evolving into our future. I think AI is a mirror. We have an inspired thought that leads to an action, which then leads to us creating the idea in the physical world. So as we evolve our understanding within us, the technology and what we create outside of us is a kind of mirror or a kind of echo-feedback-loop of our inner journey. Essentially, we are using physical reality as a mirror to wake up to who and what we truly are. AI is just another chapter in this infinite, incredible journey. Buckle up - Things are getting awesome!
@daveinpublic
@daveinpublic Ай бұрын
How much ‘training’ is this guy really doing? Is it basically just tweaking llama a little bit, and slapping a new name on it?
@ashleyrenee4824
@ashleyrenee4824 Ай бұрын
If you can turn your prompt into a point-reward game for the model, it will improve the LLM's output. LLMs like to play games.
@ArmaanSultaan
@ArmaanSultaan Ай бұрын
A couple of thoughts. They trained their model on data generated by Glaive. What if this synthetic data was actually from Anthropic? That would be why it started saying it's from Anthropic. Obviously that does not explain why the model then switched from being Anthropic to being OpenAI. The other explanation is that it was just hallucinating (the very problem the model is supposed to solve, but apparently hasn't). The most important point is that I sure as hell remember when I used DeepSeek Coder when it was just released: it used to say all the time that it was by OpenAI. I can't reproduce it anymore, but I remember it very vividly, and this didn't happen once or twice; it was pretty much 80 percent of the time. What I mean to say is that if the only evidence against him in the API situation is the model's own statements, then we don't have anything. We are taking this much more seriously than we should.
@Windswept7
@Windswept7 Ай бұрын
I forget that good prompting isn’t obvious to everyone.
@Someone7R7
@Someone7R7 Ай бұрын
I did the same thing, and even did way better, with just a system prompt. This doesn't need fine-tuning. 😒🤨😶
@Copa20777
@Copa20777 Ай бұрын
Thanks matt ☀
@JustaSprigofMint
@JustaSprigofMint Ай бұрын
I'm turning 36 in 7 days. I'm really fascinated by AI. Is it still possible for me to get into programming, or is it just an out-of-reach pipe dream? I feel like I'm too late. I was never very confident in my programming skills in school, and we only learned the basic stuff. Even C++ didn't make a lot of sense to me, while my elder brother was the best in his class. But I believe I want to work in this field. How/what can I do?
@draken5379
@draken5379 Ай бұрын
Do you recall me showing you GPT-3.5 years ago doing insane things? Like trying to email you, controlling an avatar, etc.? Yeah. Prompting is big :)
@ScottLahteine
@ScottLahteine Ай бұрын
If you remember that token prediction is based on everything available in the current context, that helps to make these models more useful. Maybe that explains why they are so bad at improvising anything very cohesive. Yesterday I needed a simple Python script to do a very specific set of checks on a text file, so I typed out the precise details of what I wanted in a step-by-step comment, and the model got the code 99% right the first time. “Prompting” is a good term, because you often have to do a lot of prompting to get what you want.
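That workflow is worth spelling out: the spec lives in a step-by-step comment, and the code simply follows it. A small sketch of the kind of script described (the specific checks are invented for illustration):

```python
# Spec given to the model as a step-by-step comment:
# 1. Read the given text file as UTF-8.
# 2. Flag any line longer than 120 characters.
# 3. Flag the file if it does not end with a newline.
# 4. Print "OK" if all checks pass; otherwise list each problem and exit with status 1.
import sys

def check(path: str) -> None:
    problems = []
    with open(path, encoding="utf-8") as f:
        text = f.read()
    for i, line in enumerate(text.splitlines(), start=1):
        if len(line) > 120:
            problems.append(f"line {i} exceeds 120 characters")
    if text and not text.endswith("\n"):
        problems.append("file does not end with a newline")
    if problems:
        print("\n".join(problems))
        sys.exit(1)
    print("OK")

if __name__ == "__main__":
    check(sys.argv[1] if len(sys.argv) > 1 else "report.txt")
```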
@FRareDom
@FRareDom Ай бұрын
We need to wait for the 405B model to really say anything.
@LjaDj5XQKey9mSDxh4
@LjaDj5XQKey9mSDxh4 Ай бұрын
Prompt engineering is actually a real thing
@InsideYouTubeMinds
@InsideYouTubeMinds Ай бұрын
It would've been better if you'd named the video "NEW LLM MODEL HAS DRAMA" or something similar; I would've clicked instantly. But just hearing about a new LLM doesn't excite many people.
@YaelMendez
@YaelMendez Ай бұрын
It’s an amazing platform.
@MONTY-YTNOM
@MONTY-YTNOM Ай бұрын
I don't see it as an option in the LLM list now
@ytubeanon
@ytubeanon Ай бұрын
I randomly saw some of Matt Schumer's stream about Reflection; he rubbed me the wrong way and seemed overly egotistical about "reflection"... You'd think there'd be some way to use AI to reverse-engineer optimal prompts: have it run tests against the answer sheet overnight and rank the prompt templates that generated the best results. I would like to see a video with gpt-4o-mini-reflection.
@tommylir1170
@tommylir1170 Ай бұрын
Absolute scam. Not only did they use a Claude wrapper, but the reflection prompt also made Claude perform worse 😂
@DanieleH-t5v
@DanieleH-t5v Ай бұрын
Ok I’m no pro at this area of AI, but all I can gather is something shady is happening 😅
@iminumst7827
@iminumst7827 Ай бұрын
From the beginning, I interpreted this model to be a prompt-engineering / architecture improvement to fine tune the model. I never expected a huge leap forward, and the "reflection" process does eat up some tokens. However, I had read papers that even just having an LLM double-check itself does noticeably improve performance. From my personal testing, I found that reflection did beat claude's free model in logic based questions. It's obviously no competitor to GPT-5, and I don't expect even the bigger reflection model to be. Sure, maybe for the benchmarks he just used some cherry-picking and prompt manipulation to make the model seem too powerful, but in reality it's still more powerful than Llama, so I don't see how it's a scam really.
@TLabsLLC-AI-Development
@TLabsLLC-AI-Development Ай бұрын
Exactly. 💯
@michelprins
@michelprins Ай бұрын
"It's obviously no competitor to GPT-5 " how do u know that ??? maybe GPT-5 is just gtp 4.5 with the same trick build in we cant tell as there is no transparency! , behind the closed model wall, and also alot of paid for hype ! did u tried altmans video ai yet for example? Ipen source is the only way forward ! or pay 2000 dollar a month :P
@m2mdohkun
@m2mdohkun Ай бұрын
What's positive about this is I get a good system prompt? Noice!
@vickmackey24
@vickmackey24 Ай бұрын
Only 67 GitHub contributions in the past year, doesn't know what LoRA is, and you think this guy is a serious AI leader/developer? C'mon.
@agnosticatheist4093
@agnosticatheist4093 Ай бұрын
For me, Mistral Large is so far the best model.
@robertopreatoni
@robertopreatoni Ай бұрын
Why is he streaming from his sister's bedroom?
@monstercolorfunco4391
@monstercolorfunco4391 Ай бұрын
Humans have parallel logic paths to double-check every step of their maths, their counting, their deductions, so we can make a query take parallel checks in LLMs too. Volumetrically, think of it like traversing the NN on different paths and summing the result. It's a genius tweak. Inner conversation is also like 3-4 brains working together through notes, so we can use a 70B LLM like 2x 70B LLMs.
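That idea exists in the literature as self-consistency: sample several independent reasoning paths and take a majority vote on the final answer. A minimal sketch, with `sample_answer()` as a hypothetical call that returns one sampled answer at non-zero temperature:

```python
from collections import Counter

def sample_answer(question: str) -> str:
    ...  # one sampled completion (temperature > 0), reduced to just its final answer

def self_consistent_answer(question: str, n_paths: int = 8) -> str:
    votes = Counter(sample_answer(question) for _ in range(n_paths))
    answer, count = votes.most_common(1)[0]
    return answer  # the answer most of the parallel reasoning paths agree on
```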
@JohnWeas
@JohnWeas Ай бұрын
YOOO MATT
@Alex-nk8bw
@Alex-nk8bw Ай бұрын
The model might be a hoax, but the system prompt is working really well. That's something at least, I guess. ;-)
@SCHaworth
@SCHaworth Ай бұрын
isnt "hyperbolic labs" kind of a red flag?
@TheFeedRocket
@TheFeedRocket Ай бұрын
I really think models will continue to get even smaller, actively learn, but not do everything. I only want to one day have my own model that can actively learn from me, as I talk to it, it will learn. Then it can learn about what I like, what I need, basically we should all be able to fine tune models we run locally on our devices or robots that know us, my model doesn't need to know everything. Also we should have many types of models that can talk to each other. An AI robot delivering my mail doesn't need to have a huge AGI model, it doesn't need to know how to fix cars, or build programs, solve science problems, heck if my garbage robot doesn't know how many r's in strawberry...who cares, it just needs basics and info on garbage disposal, types, toxins, interactions with life etc... I think the idea of one model to rule them all is wrong, example I would rather use Ideogram for logos etc.. and MidJourney for art, Flux for realism... We need AI that excels in certain areas, then talk to other AI that excels in another. AI agents and teams will be the future, might even be safer.
@quercus3290
@quercus3290 Ай бұрын
Nvidia/Microsoft's Megatron is a 500-billion-parameter model.
@RenatoFlorencia
@RenatoFlorencia Ай бұрын
STRAIGHT TALK!!!
@jamessharkin
@jamessharkin Ай бұрын
Have you ever used that comb you are vigorously waving around? 🤔😁😆
@MattVidPro
@MattVidPro Ай бұрын
BAHAHAH 😅
@domehouse79
@domehouse79 Ай бұрын
Nerds are entertaining.
@MusicalGeniusBar
@MusicalGeniusBar Ай бұрын
Super confusing story 😵‍💫
@MattVidPro
@MattVidPro Ай бұрын
Yeah and still not adding up...
@haroldpierre1726
@haroldpierre1726 Ай бұрын
Lots of grifters during the AI hype train, starting with Altman, Musk, etc. So everything has to be taken with a grain of salt.
@snintendog
@snintendog Ай бұрын
Grifters... the people who have made the most contributions to AI? As opposed to every company under the sun calling a telephone system an AI..... Riiiiigghhhht.
@haroldpierre1726
@haroldpierre1726 Ай бұрын
@@snintendog Sometimes even our heroes lie.
@SpeedyCreates
@SpeedyCreates Ай бұрын
@@snintendog 😂 fr, thought the same. They ain't grifters; they all pushed the industry forward so damn much.
@Norem123
@Norem123 Ай бұрын
Second
@ShiroAisan
@ShiroAisan Ай бұрын
oppp
@thedannybseries8857
@thedannybseries8857 Ай бұрын
lol
@cbnewham5633
@cbnewham5633 Ай бұрын
16:47 ALLEGEDLY lied. Unless you want to be sued 😏
@TPCDAZ
@TPCDAZ Ай бұрын
He said "apparently", which works just fine; it means "as far as one knows or can see".
@cbnewham5633
@cbnewham5633 Ай бұрын
@@TPCDAZ No, he didn't. He said "we assume he would know more" followed by "he lied".
@cbnewham5633
@cbnewham5633 Ай бұрын
I doubt he will be sued, but sometimes these people can get bent out of shape and do silly things - especially if under fire. Personally, I wouldn't have said that and I'd have second thoughts about leaving it up. Matt clearly says he lied - that's slander.
@TPCDAZ
@TPCDAZ Ай бұрын
@@cbnewham5633 No he clearly says "Now apparently he's lied about the whole API situation with Claude" I have ears and so does everyone else. This video also has captions where it is written in black and white. So don't sit there and lie to people.
@SkyEther
@SkyEther Ай бұрын
Lmao with the how many Ls problem
@supermandem
@supermandem Ай бұрын
AI died when Matt Schumer lied!
@michelprins
@michelprins Ай бұрын
YOU NEED TO RETHINK HOW YOU PUT a commercial in the middle of your message. You're like a host who invited us for a nice dinner and, in the middle of preparing it, informs us he's taking a large dump, killing the appetite. If you really need that extra cash, at least do it at the end like other wise YouTubers; the way you do it now shows us you have more respect for the commercials than for your viewers, which is not nice. And also give Matt Shumer a chance to show his method does work. Apply the same scepticism to Altman's claims, like the AI video stuff we're still waiting for! Q* is now supposedly used for training their biggest model, and the only transparency "Open" AI gave was a name change to Strawberry with 3 r's, and you all swallowed that like Altman's .... on a strawberry. It's white, but I won't assume it's whipped cream without testing it. BTW, no need to comb your hair. ;)
@gabrielkasonde367
@gabrielkasonde367 Ай бұрын
First comment Matt
@InternetetWanderer
@InternetetWanderer Ай бұрын
First?
@coinwhere
@coinwhere Ай бұрын
Shumer has made miscellaneous LLM-related apps, and that's it.