Reflection 70b Controversy is PROOF our Perspective on LLMs is wrong.

19,170 views

MattVidPro AI

1 day ago

Comments: 204
@MattVidPro
@MattVidPro 2 ай бұрын
I know this video was a bit ranty but I thought there was a pretty clear conclusion to draw upon: WE NEED TO RETHINK HOW WE VIEW LLMS | Also Huge Thanks to Brilliant for Sponsoring! Check em out here: brilliant.org/MattVidProAI/
@NicVandEmZ
@NicVandEmZ 2 ай бұрын
You should copy the system prompt into a custom GPT you create, and try the system prompt with Sonnet too.
@LouisGedo
@LouisGedo 2 ай бұрын
👋 hi
@amj2048
@amj2048 2 ай бұрын
LLMs are just a different way of accessing a database. Every database requires a query language to get anything useful out of it. Modern so-called AI is just a fancy new query language, but at the end of the day it's still a query language accessing a database; there is no thought going on. This is something I think a lot of non-programming people have missed: they seem to think there is actual intelligence or thought involved. There isn't. It really is as simple as a query language accessing a database. The better the query, the better the result, but the data in the database has to be of good quality too.
@the80sme
@the80sme 2 ай бұрын
Never apologize for the sponsors! You provide us with so much value and always have interesting sponsors that are relevant and feel like a natural part of the video. Honestly, yours are some of the only YouTube ads I don't skip. Thanks for all your hard work!
@michelprins
@michelprins 2 ай бұрын
Well, at least show the commercials at the end, so you show more respect for your viewers than for the ad men.
@ShawnFumo
@ShawnFumo 2 ай бұрын
I always like seeing Brilliant since I already had a subscription to them and liked them before even seeing them advertise on videos.
@jeffwads
@jeffwads 2 ай бұрын
There are reports on X and Reddit about this whole thing being linked to Claude. Bizarre.
@MattVidPro
@MattVidPro 2 ай бұрын
I get into that in the video. It's also bothering me
@DisturbedNeo
@DisturbedNeo 2 ай бұрын
The trouble with this finetune is that the output appears to be only marginally better than the base model, but all the extra tokens make it cost 2-3x as much, so it's just not worth it.
@MattVidPro
@MattVidPro 2 ай бұрын
But if the power of the finetune increases with model size, in theory it would be.
@Jack122official
@Jack122official 2 ай бұрын
@@MattVidPro What do you think about AI song dubbing? Would you like it to happen?
@JosephSimony
@JosephSimony 2 ай бұрын
Not sure what the definition of "marginally better" (and thus "not worth it") is. A real-life scenario: I had no experience setting up software RAID in CentOS. If Claude had been only "marginally better" it would have worked, but it (and I) screwed up my server and I spent days on it instead. With ChatGPT (much smaller context window, same prompting) the RAID worked right away after reboot. Go figure "marginally" and "not worth it".
@radradder
@radradder 2 ай бұрын
@@MattVidPro Don't refute current reality with a theoretical "what if this wasn't reality".
@jonathanberry1111
@jonathanberry1111 2 ай бұрын
@@MattVidPro Also, if a model can improve its accuracy, then this helps make better synthetic data and helps reach toward high-quality results, where LLMs can get better from essentially understanding, thinking, and drawing conclusions from their own output in a potentially constructive loop. It's not about being slightly better for some end use; it's about making AI able to not just regurgitate what people know and say, but potentially reach at least low-level ASI (almost as good as the smartest humans).
@johanavril1691
@johanavril1691 2 ай бұрын
STOP USING THE EXAMPLE OF COUNTING LETTERS, TOKENISATION MAKES IT A TERRIBLE TEST
@jackpisso1761
@jackpisso1761 2 ай бұрын
Exactly this!
@eastwood451
@eastwood451 2 ай бұрын
You're shouting.
@johanavril1691
@johanavril1691 2 ай бұрын
@@eastwood451 SORRY MY CAPSLOCK KEY IS BROKEN!
@SW-fh7he
@SW-fh7he 2 ай бұрын
It's a great test, precisely because passing it would require a new technique.
@grostire
@grostire 2 ай бұрын
You are narrow minded.
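A quick way to see the tokenisation point raised above: the model never sees individual letters, only sub-word token IDs, so character counting asks it about information it does not directly observe. A minimal sketch using the tiktoken library (cl100k_base is just an example encoding, not necessarily the one any particular model uses):

```python
# Sketch: show why letter-counting is hard for an LLM -- it sees token IDs, not characters.
# Assumes the `tiktoken` package is installed; cl100k_base is only an example encoding.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
word = "strawberry"

token_ids = enc.encode(word)
pieces = [enc.decode_single_token_bytes(t).decode("utf-8", errors="replace") for t in token_ids]

print(f"'{word}' becomes {len(token_ids)} token(s): {pieces}")
print(f"Actual letter count of 'r': {word.count('r')}")
# The model predicts over the token pieces above, so "count the r's"
# is a question about characters it never directly observes.
```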
@ElvinHoney707
@ElvinHoney707 2 ай бұрын
Hey, please take the system prompt he gave you and use it in an unadulterated Llama 3.1 70B with the same prompt and see how that response compares to what you showed in the video. That should show us the fine tuning effect, if any.
@Dron008
@Dron008 2 ай бұрын
The community should stop believing anyone's closed benchmarks. It's very weird to me when people discuss benchmark results from publications that nobody has tried to verify.
@brulsmurf
@brulsmurf 2 ай бұрын
They train the model on the test set (with extra steps). If the questions in the benchmark are public, then it's useless.
@adolphgracius9996
@adolphgracius9996 2 ай бұрын
@@brulsmurf Rather than benchmarks, people should do their own tests by just using the AI and calling out the mistakes.
@manonamission2000
@manonamission2000 2 ай бұрын
It's easier to prevent a text2image model from spitting out NSFW images by adding a filtering layer than by re-engineering the model itself.
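For what it's worth, that filtering-layer approach is easy to sketch: wrap the generator and run every output through a separate safety classifier before returning it. In the sketch below, generate_image and nsfw_score are hypothetical stand-ins, not real library calls:

```python
# Sketch of a post-generation filtering layer around a text2image model.
# `generate_image` and `nsfw_score` are hypothetical stand-ins for your
# actual generator call and safety classifier.

def generate_image(prompt: str) -> bytes:
    raise NotImplementedError("call your text2image backend here")

def nsfw_score(image: bytes) -> float:
    raise NotImplementedError("call your safety classifier here")

def safe_generate(prompt: str, threshold: float = 0.5, max_attempts: int = 3) -> bytes | None:
    """Return an image only if the classifier scores it below the NSFW threshold."""
    for _ in range(max_attempts):
        image = generate_image(prompt)
        if nsfw_score(image) < threshold:
            return image
    return None  # refuse rather than return a flagged image
```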
@konstantinlozev2272
@konstantinlozev2272 2 ай бұрын
This reflection prompting was already there with "step-by-step" prompting. But nothing beats agentic frameworks, because then you can design the system to loop back as many times as necessary to refine its answer.
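A loop like that takes only a few lines to wire up. Here is a minimal sketch of a draft-critique-revise loop using the openai Python client; the model name, stopping rule, and round limit are arbitrary choices, not a claim about how Reflection itself was built:

```python
# Sketch of an agentic refine-until-good-enough loop: draft, critique, revise.
# Uses the openai Python client; the model name and max_rounds are arbitrary choices.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def refine(question: str, max_rounds: int = 3) -> str:
    answer = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": question}],
    ).choices[0].message.content

    for _ in range(max_rounds):
        critique = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{
                "role": "user",
                "content": f"Question: {question}\nDraft answer: {answer}\n"
                           "List any mistakes. If there are none, reply exactly: OK",
            }],
        ).choices[0].message.content
        if critique.strip() == "OK":
            break  # the critic found nothing to fix, stop looping
        answer = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{
                "role": "user",
                "content": f"Question: {question}\nDraft: {answer}\nCritique: {critique}\n"
                           "Rewrite the answer fixing the issues.",
            }],
        ).choices[0].message.content
    return answer
```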
@dorotikdaniel
@dorotikdaniel 2 ай бұрын
Yes, system prompting allows you to essentially reprogram LLMs and shape them into anything you can imagine, while also improving their performance. At least for the OpenAI models, I can confirm from experience that this works incredibly well.
@SkitterB.Unibrow
@SkitterB.Unibrow 2 ай бұрын
This is why open source is the only way. Example: people at OpenAI could present "bad" results to higher-ups, then release the results to the public thinking it's great..... then never release the model because when they really checked it out, it did not perform as expected (read into that what you will). Open source, however, is examined with a fine-tooth comb, and you can't pull the wool over anyone's eyes.
@MattVidPro
@MattVidPro 2 ай бұрын
Love it
@SkitterB.Unibrow
@SkitterB.Unibrow 2 ай бұрын
@@MattVidPro "you da man' according to 4 out of 5 ai's that are not censored to ask this question "whos do man?"
@SkitterB.Unibrow
@SkitterB.Unibrow 2 ай бұрын
Duuuuuuhhh.... I ment 'da'
@SahilP2648
@SahilP2648 2 ай бұрын
@@MattVidPro Reflection 70B is on Hugging Face and I tried it locally; it works, so I don't know what you were talking about with Claude being involved, etc. And it did get the strawberry question correct, at least. It also seemed to follow custom system prompts better than other models.
@hiromichael_ctranddevgames1097
@hiromichael_ctranddevgames1097 2 ай бұрын
​@@SahilP2648 IT'S claude the prompt ok
@yhwhlungs
@yhwhlungs 2 ай бұрын
Yeah prompt engineering is the way to go. We just need a model that’s really good at predicting reasonable tokens afterwards.
@kajsing
@kajsing 2 ай бұрын
You don't need the API for the system prompt. I put this into my custom instructions, and it works well: "Start by evaluating the user's input and relevant parts from earlier inputs and outputs. Ensure that you consider multiple perspectives, including any underlying assumptions or potential biases. This reflection should aim to highlight key insights and possible challenges in forming your answer. Plan how to address these insights and create a strategy for delivering a clear and relevant response. When done with the thinking, reflect on your thought process and consider if there are any overlooked angles, biases, or alternative solutions. Ask yourself if the response is the most effective way to meet the user's needs and expectations. Then, finalize your answer."
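If you do want to run a prompt like that through the API rather than ChatGPT custom instructions, it simply goes in the system role. A minimal sketch (the prompt text is condensed from the instruction quoted above, and the model name is just an example):

```python
# Sketch: reflection-style instructions supplied as a system message via the API.
from openai import OpenAI

REFLECTION_STYLE_PROMPT = (
    "Start by evaluating the user's input and relevant earlier context. "
    "Consider multiple perspectives, assumptions, and biases, plan your answer, "
    "then review your reasoning for overlooked angles before finalizing it."
)  # condensed version of the custom instruction quoted above

client = OpenAI()
reply = client.chat.completions.create(
    model="gpt-4o-mini",  # arbitrary example model
    messages=[
        {"role": "system", "content": REFLECTION_STYLE_PROMPT},
        {"role": "user", "content": "How many days are there between March 3 and May 1?"},
    ],
)
print(reply.choices[0].message.content)
```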
@cagnazzo82
@cagnazzo82 2 ай бұрын
The era of benchmarks ended as soon as GPT-4o became multimodal and Sonnet released with artifacts. We just weren't ready to accept it. The only thing I'm interested in now are features. Sonnet can code, GPT-4o was updated so it's now amazing at creative writing. I don't really need much else.
@brexitgreens
@brexitgreens 2 ай бұрын
10:29 *"Somehow it got the correct answer by doing the wrong math."* Just like my parents who turned out to be right from entirely wrong premises. Which is why I had ignored their advice - to my own detriment.
@DiceDecides
@DiceDecides 2 ай бұрын
What wrong premises? Parents usually want the best for their kids.
@Phagocytosis
@Phagocytosis 2 ай бұрын
​@@DiceDecides That seems like somewhat of a strange reaction if I'm honest. Even ignoring the "usually" part of it, wanting the best for someone is kind of separate from whether you are able to judge a situation correctly.
@DiceDecides
@DiceDecides 2 ай бұрын
@@Phagocytosis no ones a perfect judge sure but parents have more life experience to make better judgements than their kids. Elders especially have a lot of wise things to say.
@Phagocytosis
@Phagocytosis 2 ай бұрын
@@DiceDecides It just feels like a very general statement, and unless your claim is that anyone old enough to have kids necessarily has enough wisdom and life experience to not be expected to make any false premises (which I would personally consider to be a false premise), it seems odd to me to question some individual claim of parents having made a false premise.
@DiceDecides
@DiceDecides 2 ай бұрын
@@Phagocytosis I never claimed such a thing; I was just curious what the premises could have been. Chill out.
@Alice_Fumo
@Alice_Fumo 2 ай бұрын
My best attempt at coming up with a rational explanation for the Claude 3.5 Sonnet API calls is that they have a fallback which calls Claude when their own backend is down, to avoid downtime. I'm not sure I put a lot of stock in this explanation, but it's not fully unreasonable.
@MattVidPro
@MattVidPro 2 ай бұрын
Yeah.
@kuromiLayfe
@kuromiLayfe 2 ай бұрын
Yeah... my take is that if it can't perform locally, there is some sort of scammy backend at work that takes your data for who knows what, and in the end they'll charge you for it.
@nilaier1430
@nilaier1430 2 ай бұрын
Yeah, this might be possible. But it's still disingenuous to not inform users about that. Or maybe they've been using Claude 3.5 Sonnet with the custom system prompt to generate all of the training data and feed it to AI for fine-tuning and they just forgot to change the endpoint to serve their model instead.
@tommylir1170
@tommylir1170 2 ай бұрын
They even tried to censor the fact that it was using Claude. I don't get why some people still give this guy the benefit of the doubt.
@Alice_Fumo
@Alice_Fumo 2 ай бұрын
@@tommylir1170 Am I giving him the benefit of the doubt? I constructed a steelman and decided that even this most favourable interpretation does not seem super likely. However, I don't think it's necessary to draw conclusions just yet. Either we get weights for a model which reaches the claimed benchmark scores or we don't. I'm not sure whether the weights available at the moment do or whether there was still something supposedly wrong with them as well, but if the model meets the claimed performance, it's all good and if he doesn't deliver screw the guy.
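The fallback idea floated at the top of this thread is at least technically easy to build, which is partly why it is hard to rule out from the outside. A minimal sketch of such a failover router using the openai and anthropic clients; the base URL and model IDs are placeholders, not a claim about what the Reflection API actually did:

```python
# Sketch of a failover router: try the primary backend, fall back to Claude on error.
# Base URL and model names are illustrative placeholders only.
from openai import OpenAI
from anthropic import Anthropic

primary = OpenAI(base_url="https://api.example-primary.com/v1", api_key="PRIMARY_KEY")
fallback = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def ask(prompt: str) -> str:
    try:
        resp = primary.chat.completions.create(
            model="reflection-70b",  # placeholder model id
            messages=[{"role": "user", "content": prompt}],
            timeout=30,
        )
        return resp.choices[0].message.content
    except Exception:
        # Silent failover like this is exactly why users could not tell which model answered.
        msg = fallback.messages.create(
            model="claude-3-5-sonnet-20240620",
            max_tokens=1024,
            messages=[{"role": "user", "content": prompt}],
        )
        return msg.content[0].text
```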
@MistaRopa-
@MistaRopa- 2 ай бұрын
"WE NEED TO RETHINK HOW WE VIEW LLMs"...or content creators and self appointed community leaders need better due diligence before crowning every ne'er-do-well the next Steve Jobs. Credibility is a thing...
@TheFeedRocket
@TheFeedRocket 2 ай бұрын
Different prompts make a huge difference. You could look at prompting or fine-tuning like a coach or teacher: you have the same person, but certain coaches' "prompting" can make a poor student or athlete way better. It's all in the coaching or teaching, which is like prompting. Certain teachers or coaches are just way better prompt engineers. Prompting is huge.
@IntellectCorner
@IntellectCorner 2 ай бұрын
Timestamps by IntellectCorner
0:02 - Introduction: Reflection 70b Controversy
2:11 - Background on Matt Schumer
4:03 - Community Reactions and Unanswered Questions
5:35 - Sponsor Message
7:31 - Testing Reflection 70b on Hyperbolic Labs
11:02 - Comparing Reflection 70b with GPT-4 and ChatGPT
13:20 - The Importance of Prompting
16:48 - Analysis of the Situation and Possible Explanations
21:01 - Conclusion: The Need for New Benchmarks and Perspectives on LLMs
@vi6ddarkking
@vi6ddarkking 2 ай бұрын
So, to use an image-generation equivalent, if I'm understanding this correctly, Reflection 70B would be the equivalent of Flux merged with a LoRA.
@TLabsLLC-AI-Development
@TLabsLLC-AI-Development 2 ай бұрын
More like a DreamBooth checkpoint in a custom Python wrapper.
@BackTiVi
@BackTiVi 2 ай бұрын
Can you really compare Reflection 70b to "reflectionless" LLMs if, according to Shumer, you need a system prompt that explicitly tells Reflection 70b how to reflect in order to get good scores in the benchmarks? Doesn't that defeat the purpose?
@MattVidPro
@MattVidPro 2 ай бұрын
Apparently, the system prompt DOESN'T need to be there, it can be adjusted in tuning to not require it. twitter.com/mattshumer_/status/1832169489309561309
@BackTiVi
@BackTiVi 2 ай бұрын
@@MattVidPro Fair. I hope the situation will stabilize soon and we'll get the promised SOTA open-source model, although I also think there was something fishy with the API.
@ViralKiller
@ViralKiller 2 ай бұрын
ChatGPT can give code for an entire game but can't do basic maths...makes sense
@MeinDeutschkurs
@MeinDeutschkurs 2 ай бұрын
Exactly my behavior. 😹😹 I cannot calculate, but I can write code.
@eprd313
@eprd313 2 ай бұрын
Verbal intelligence and mathematical reasoning require different processes
@bakablitz6591
@bakablitz6591 2 ай бұрын
im still looking forward to personalized mattvid home entertainment robots... anyday now boys this is the future
@OliNorwell
@OliNorwell 2 ай бұрын
I fear that Matt himself got scammed. I'm sure the truth will come out eventually.
@ToddWBucy-lf8yz
@ToddWBucy-lf8yz 2 ай бұрын
For smaller models this sort of fine-tuning may be able to better compensate for the lack of parameters and quantization. If it can do that, I say it's a win.
@TLabsLLC-AI-Development
@TLabsLLC-AI-Development 2 ай бұрын
This is exactly right. 💯
@Dina_tankar_mina_ord
@Dina_tankar_mina_ord 2 ай бұрын
So, this reflection mechanism is like providing a control net to the prompt, ensuring that every answer aligns with the main meaning.
@ShivaTD420
@ShivaTD420 2 ай бұрын
These are just tricks to cause more neurons to light up. The fine tuning process makes prompting easier, since you don't need the complex system prompts
@draglamdraglam2419
@draglamdraglam2419 2 ай бұрын
Ayy, glad to be early for this one, keep doing what you do 💪
@PH-zj6gk
@PH-zj6gk 2 ай бұрын
You totally missed the point. The actual moral of the story is that you absolutely cannot super-hype your open-source SOTA model and not deliver. He wasted a lot of people's time. Full stop. There's a very serious social responsibility that comes with claiming something world-changing. If you're curious what actually happened: kzbin.info/www/bejne/rYDdlZWuoraViK8
@Citrusautomaton
@Citrusautomaton 2 ай бұрын
I was genuinely really sad when i found out it was a fraud. The promise of reflection made me really excited for this week and it all crumbled within a day or two. I even told other people about it, so i also felt a sense of embarrassment that i fell for it. I’m still salty as hell.
@PH-zj6gk
@PH-zj6gk 2 ай бұрын
@@Citrusautomaton Same. I was actually happy for him at first. It became clear he was being dishonest well before he stopped lying. It was incredibly insulting. His narcissism is off the charts.
@teejayroyal
@teejayroyal 2 ай бұрын
Please run the cords behind your couch, I feel like I'm going to have an anxiety attack😂😂😭
@RainbowSixIntel
@RainbowSixIntel 2 ай бұрын
It's probably Claude 3.5 Sonnet. It has the same tokeniser, Matt filters out "Claude" from its outputs, AND it mentions it was trained by Anthropic if you prompt it correctly.
@MattVidPro
@MattVidPro 2 ай бұрын
That was just their supposed "API". If you run the actual model uploaded to Hugging Face, you get something different.
@ShivaTD420
@ShivaTD420 2 ай бұрын
He just used Claude to train the model. The model is being fine-tuned with synthetic data that follows this structure, while Claude fixes its mistakes.
@canyongoat2096
@canyongoat2096 2 ай бұрын
Not completely out of the question, as I remember older Llama and Mistral 7B models claiming to be GPT and claiming to be made by OpenAI.
@toastbr0ti
@toastbr0ti 2 ай бұрын
The API literally uses Claude tokens, not Llama ones.
@apache937
@apache937 2 ай бұрын
It returns the exact same response at temperature 0.
@Phagocytosis
@Phagocytosis 2 ай бұрын
Yeah, but didn't he claim it was a finetune of Llama 3.1? EDIT: Oh, I see, you mean the actual finetuning data came from Claude, never mind.
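Those are exactly the kinds of checks anyone can run against a hosted endpoint: send the same prompt repeatedly at temperature 0 and compare the outputs and token accounting. A minimal sketch against an OpenAI-compatible API (the base URL and model ID are placeholders):

```python
# Sketch: probe an OpenAI-compatible endpoint for determinism at temperature 0.
# A single fixed model typically returns identical or near-identical text at
# temperature 0; the base_url and model id below are placeholders, not the
# real Reflection endpoint.
from openai import OpenAI

client = OpenAI(base_url="https://api.example-host.com/v1", api_key="KEY")

def sample(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="reflection-70b",  # placeholder
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
        max_tokens=200,
    )
    return resp.choices[0].message.content

outputs = {sample("Explain what a tokenizer does in two sentences.") for _ in range(5)}
print("distinct outputs at temperature 0:", len(outputs))
# Wildly varying outputs, or telltale phrasing and token accounting, would be
# evidence about what is actually serving the requests.
```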
@GamingXperience
@GamingXperience 2 ай бұрын
The problem with prompt engineering and benchmarks is that you have to find the prompt that works best for each specific model, so it makes sense that we just compare the raw models without any specific system prompts, because that's how most people use them. Which does not mean we shouldn't try to find the best solution for prompting. Use whatever it takes to make the model better. The problem is that a lot of users don't care or don't want to try a million prompts. For the big models, maybe the companies behind them could figure out what the best prompts are and just provide those as some kind of help, where they ask if you want to add them to your inputs. That said, I would love to see comparison benchmarks between models using different prompting strategies. And I also want to know if this whole Reflection thing is actually real or not.
@mrpocock
@mrpocock 2 ай бұрын
I sometimes have one of the smart models generate prompts for dumb ones and iterate until it finds a prompt that makes the dumb model work well.
@konstantinlozev2272
@konstantinlozev2272 2 ай бұрын
Bigger brain = Better
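Both ideas in this thread, benchmarking prompting strategies on a fixed question set and having a stronger model propose prompts for a weaker one, reduce to the same small harness. A minimal sketch; the models, questions, and exact-match scoring are all illustrative choices:

```python
# Sketch: score candidate system prompts for a "weak" model on a tiny eval set,
# with a "strong" model proposing new candidates each round. Model names,
# questions, and the substring-match scoring are illustrative placeholders.
from openai import OpenAI

client = OpenAI()
EVAL_SET = [("What is 17 * 23?", "391"), ("How many r's in strawberry?", "3")]

def answer(model: str, system: str, question: str) -> str:
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "system", "content": system},
                  {"role": "user", "content": question}],
        temperature=0,
    )
    return resp.choices[0].message.content

def score(system_prompt: str, weak_model: str = "gpt-4o-mini") -> float:
    hits = sum(expected in answer(weak_model, system_prompt, q) for q, expected in EVAL_SET)
    return hits / len(EVAL_SET)

def propose_prompt(best_prompt: str, best_score: float, strong_model: str = "gpt-4o") -> str:
    resp = client.chat.completions.create(
        model=strong_model,
        messages=[{"role": "user",
                   "content": f"This system prompt scored {best_score:.0%} on a QA eval:\n"
                              f"{best_prompt}\nWrite an improved system prompt. "
                              "Reply with the prompt only."}],
    )
    return resp.choices[0].message.content

best = "You are a careful assistant. Think step by step before answering."
best_s = score(best)
for _ in range(3):  # a few improvement rounds
    candidate = propose_prompt(best, best_s)
    s = score(candidate)
    if s > best_s:
        best, best_s = candidate, s
print(best_s, best)
```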
@Yipper64
@Yipper64 2 ай бұрын
17:38 There's a sense in which computers in general are like that. When computers were first invented, you basically had to explore what you could do by giving them instructions.
@Slaci-vl2io
@Slaci-vl2io 2 ай бұрын
I wonder how much cooling water was wasted by us testing their wrong model.
@tiagotiagot
@tiagotiagot 2 ай бұрын
Adding the system prompt could be a trigger for specific behaviors the model has been fine-tuned to have; or it could just be the prompt itself doing the work; or it could be that the model is fine-tuned to follow any system prompt more strictly and intelligently, and so works better with this good prompt than the non-fine-tuned version does with the same prompt. I'm not sure how likely each of these possibilities is in this specific case, if any.
@nyyotam4057
@nyyotam4057 2 ай бұрын
In any case, prompting the model is extremely important when you want the model to function a certain way. Getting around the system prompt is very important when you want to jailbreak the model, or even just to find out things about the model that the devs try to hide. So first you need to prompt yourself to do what you want to do.
@tylerhatch8962
@tylerhatch8962 2 ай бұрын
Truly open source means you are able to inspect everything yourself. Every line of code, every weight, every parameter. Fakes will happen, this story is a show of the strength of open source. You can investigate the legitimacy of their claims yourself.
@ashleyrenee4824
@ashleyrenee4824 2 ай бұрын
If you can turn your prompt into a point-reward game for the model, it will improve the LLM's output. LLMs like to play games.
@MeinDeutschkurs
@MeinDeutschkurs 2 ай бұрын
I don't understand the issue: 1) you live in a capitalist system; 2) claims like "fake it until you make it" are propagated frequently, at least afterwards if it worked out; 3) the output of Reflection is nothing you cannot reach with simple prompting (on top of most of the models out there); 4) a double-reflection approach could be better.
@dirt55
@dirt55 2 ай бұрын
There will be failures but with each Failure there will be someone succeeding.
@daveinpublic
@daveinpublic 2 ай бұрын
How much ‘training’ is this guy really doing? Is it basically just tweaking llama a little bit, and slapping a new name on it?
@ArmaanSultaan
@ArmaanSultaan 2 ай бұрын
A couple of thoughts. They trained their model on data generated by Glaive. What if this synthetic data actually came from Anthropic? That would explain why it started saying it's by Anthropic. Obviously it does not explain why the model then switched from being Anthropic to being OpenAI. The other explanation is that it was just hallucinating, the very problem the model is supposed to solve but hasn't actually solved? The most important point is that I sure as hell remember when I used DeepSeek Coder when it was just released: it used to say all the time that it was made by OpenAI. I can't reproduce it anymore, but I remember it very vividly, and it didn't happen once or twice; it was pretty much 80 percent of the time. What I mean to say is that if the only evidence against him in the API situation is the model's own statements, then we don't have anything. We are taking this much more seriously than we should.
@KlimovArtem1
@KlimovArtem1 2 ай бұрын
There is nothing novel in it. It’s just asking the model to think aloud before giving an answer. Such fine tunings are actually done for all public chat models more or less.
@brownpaperbagyea
@brownpaperbagyea 2 ай бұрын
I agree it doesn’t make a lot of sense that it would be a grift because how the hell would he capitalize on this before getting outed. However almost EVERYTHING I’ve seen since the release points to it being a grift. I don’t care if he truly believes his lies or not. The way he presented the model and benchmarks, the manipulation of stars in their HF repo, and everything that has happened since the release has been very grifty.
@brownpaperbagyea
@brownpaperbagyea 2 ай бұрын
Maybe the thing is that we should question individuals without research backgrounds dropping models that beat the top-of-the-line offerings. I'm not saying it can't happen; however, it seems many accept what he says as fact even in the face of controversy after controversy.
@fynnjackson2298
@fynnjackson2298 2 ай бұрын
Love it when you go all philosophical. It would be cool to have you do a rant on the deeper ideas you have about what AI really is and how this all continues evolving into our future. I think AI is a mirror. We have an inspired thought that leads to an action, which then leads to us creating the idea in the physical world. So as we evolve our understanding within us, the technology and what we create outside of us is a kind of mirror or a kind of echo-feedback-loop of our inner journey. Essentially, we are using physical reality as a mirror to wake up to who and what we truly are. AI is just another chapter in this infinite, incredible journey. Buckle up - Things are getting awesome!
@ScottLahteine
@ScottLahteine 2 ай бұрын
If you remember that token prediction is based on everything available in the current context, that helps to make these models more useful. Maybe that explains why they are so bad at improvising anything very cohesive. Yesterday I needed a simple Python script to do a very specific set of checks on a text file, so I typed out the precise details of what I wanted in a step-by-step comment, and the model got the code 99% right the first time. “Prompting” is a good term, because you often have to do a lot of prompting to get what you want.
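That spec-first habit is easy to reproduce: put the step-by-step requirements in the prompt itself and ask for code only. A minimal sketch (the spec and model name are just examples, not the script described above):

```python
# Sketch: a detailed, step-by-step spec used as the prompt for a code-generation request.
from openai import OpenAI

SPEC = """Write a Python script that:
1. Takes a text file path as its only command-line argument.
2. Reports any line longer than 100 characters, with its line number.
3. Reports any line containing trailing whitespace.
4. Exits with status 1 if any issue was found, otherwise 0.
Reply with only the code, no explanation."""

client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-4o-mini",  # example model
    messages=[{"role": "user", "content": SPEC}],
)
print(resp.choices[0].message.content)
```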
@draken5379
@draken5379 2 ай бұрын
Do you recall me showing you GPT-3.5 years ago doing insane things? Like trying to email you, controlling an avatar, etc.? Ya. Prompting is big :)
@Someone7R7
@Someone7R7 2 ай бұрын
I did the same thing, and even way better, with just a system prompt; this doesn't need fine-tuning 😒🤨😶
@ashleyrenee4824
@ashleyrenee4824 2 ай бұрын
Thank you Matt 😊
@MagnusItland
@MagnusItland 2 ай бұрын
I think the main problem with LLMs is that they are trained on human output, and humans often suck. LLMs are unlikely to learn native self-reflection by emulating Twitter and Reddit.
@Windswept7
@Windswept7 2 ай бұрын
I forget that good prompting isn’t obvious to everyone.
@LjaDj5XQKey9mSDxh4
@LjaDj5XQKey9mSDxh4 2 ай бұрын
Prompt engineering is actually a real thing
@dennisg967
@dennisg967 2 ай бұрын
I really don't get how a model could "reflect" on the answer it provided to give an even better answer. The initial answer it outputs is supposed to be the one with the highest probability already. How can reusing it make another answer have an even higher probability?
@kuromiLayfe
@kuromiLayfe 2 ай бұрын
Well, if you take a trip to the store and the shortest route happens to be closed off, you'll have to backtrack and take a slightly longer route to the same destination. For your brain that is reflection: you made a mistake and had to make a new decision to still reach the same endpoint.
@ShivaTD420
@ShivaTD420 2 ай бұрын
It's not picking the token that is the right answer; it's picking the next most likely token. It's just a coincidence when those two things align. If I ask you whether yesterday was Sunday, you can just say yes, be correct, and put in minimal effort. You could also say you don't remember, or that you aren't sure; those are also technically valid answers for the completion of your response. These "think about it" prompts are just forcing the model to use more neurons. If I asked you to talk about how you know yesterday was Sunday, or how you felt on Sunday, then you're using more neurons and spending more joules to respond.
@dennisg967
@dennisg967 2 ай бұрын
@@ShivaTD420, so you are saying that at first, the model is trying to give an answer while using little information or resources, but if a user prompts it to use more information/resources to come up with a better answer, it will do that? If that's what you mean, it sounds like an additional prompt from the user is needed. If the model were to prompt itself to use more info/resources, I don't see the point in figuring out the first, less complete, answer. Let me know what you think
@dennisg967
@dennisg967 2 ай бұрын
@@kuromiLayfe, but in your example, you gain more information by finding out that the first route is blocked off. How does the model gain more information between the initial response and the more thought out response?
@kuromiLayfe
@kuromiLayfe 2 ай бұрын
@@dennisg967 Branching thought processes: you have already seen a different route along the way, but your main one was cut off or wrong, so you think about the other one you also already know about.
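One way to see how a "reflected" answer can beat the first answer even though the model always picks likely tokens: decoding is greedy token by token, not a search over whole answers, so a locally likely opening can lead into a less probable overall answer, while the generated reasoning becomes extra context that conditions what follows. A toy numeric sketch with made-up probabilities:

```python
# Toy illustration: greedy token-by-token choice does not maximize whole-sequence probability.
# All probabilities are invented purely for the example.
first_token = {"Yes": 0.6, "Let's": 0.4}          # P(first token)
continuation = {                                   # P(best continuation | first token)
    "Yes": 0.3,    # a blurted "Yes" often leads into weakly supported completions
    "Let's": 0.9,  # "Let's check..." leads into a well-supported worked answer
}

for tok, p in first_token.items():
    print(f"start with {tok!r}: joint probability = {p * continuation[tok]:.2f}")
# Greedy decoding picks "Yes" (0.6 > 0.4), but the full sequence starting with
# "Let's" is more probable overall (0.36 vs 0.18). Prompting the model to reflect
# nudges it onto the second kind of trajectory, and the extra generated reasoning
# becomes context that conditions the final answer.
```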
@YaelMendez
@YaelMendez 2 ай бұрын
It’s an amazing platform.
@ytubeanon
@ytubeanon 2 ай бұрын
I randomly saw some of Matt Schumer's stream about Reflection; he rubbed me the wrong way and seemed overly egotistical about "reflection"... You'd think there'd be some way to use AI to reverse-engineer optimal prompts: have it run tests against the answer sheet overnight and rank the prompt templates that generated the best results... I would like to see a video with gpt-4o-mini-reflection.
@travisporco
@travisporco 2 ай бұрын
is it really true that they've established that the api was a wrapper for Claude? I don't think so.
@Copa20777
@Copa20777 2 ай бұрын
Thanks matt ☀
@FRareDom
@FRareDom 2 ай бұрын
We need to wait for the 405B model to really say anything.
@InsideYouTubeMinds
@InsideYouTubeMinds 2 ай бұрын
Would've been better if you named the video "NEW LLM MODEL HAS DRAMA" or something similar; I would've clicked instantly. But just hearing about a new LLM doesn't excite many people.
@DanieleH-t5v
@DanieleH-t5v 2 ай бұрын
Ok I’m no pro at this area of AI, but all I can gather is something shady is happening 😅
@iminumst7827
@iminumst7827 2 ай бұрын
From the beginning, I interpreted this model as a prompt-engineering/architecture improvement used to fine-tune the model. I never expected a huge leap forward, and the "reflection" process does eat up some tokens. However, I had read papers showing that even just having an LLM double-check itself noticeably improves performance. In my personal testing, I found that Reflection did beat Claude's free model on logic-based questions. It's obviously no competitor to GPT-5, and I don't expect even the bigger Reflection model to be. Sure, maybe for the benchmarks he used some cherry-picking and prompt manipulation to make the model seem too powerful, but in reality it's still more powerful than Llama, so I don't see how it's really a scam.
@TLabsLLC-AI-Development
@TLabsLLC-AI-Development 2 ай бұрын
Exactly. 💯
@michelprins
@michelprins 2 ай бұрын
"It's obviously no competitor to GPT-5 " how do u know that ??? maybe GPT-5 is just gtp 4.5 with the same trick build in we cant tell as there is no transparency! , behind the closed model wall, and also alot of paid for hype ! did u tried altmans video ai yet for example? Ipen source is the only way forward ! or pay 2000 dollar a month :P
@m2mdohkun
@m2mdohkun 2 ай бұрын
What's positive about this is I get a good system prompt? Noice!
@JustaSprigofMint
@JustaSprigofMint 2 ай бұрын
I'm turning 36 in 7 days. I'm really fascinated by AI. Is it still possible for me to get into programming, or is it just an out-of-reach pipe dream? I feel like I'm too late. I was never very confident in my programming skills in school and we only learned the basic stuff. Even C++ didn't make a lot of sense to me, while my elder brother was the best in his class. But I believe I want to work in this field. How/what can I do?
@monstercolorfunco4391
@monstercolorfunco4391 2 ай бұрын
Humans have parallel logic paths to double-check every step of their maths, their counting, their deductions, so we can make a query take parallel checks in LLMs too. Volumetrically, think of it like traversing the neural network along different paths and summing the results. It's a genius tweak. Inner conversation is also like 3-4 brains working together through notes, so we can use a 70B LLM like two 70B LLMs.
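That "several brains voting" idea closely resembles what the literature calls self-consistency: sample several independent reasoning paths and take the majority answer. A minimal sketch; the model name, temperature, and last-line answer convention are simplifying assumptions:

```python
# Sketch of self-consistency: sample N reasoning paths, majority-vote the final answer.
# Model name and the "last line is the answer" convention are simplifying assumptions.
from collections import Counter
from openai import OpenAI

client = OpenAI()

def solve_with_voting(question: str, n_paths: int = 5, model: str = "gpt-4o-mini") -> str:
    answers = []
    for _ in range(n_paths):
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user",
                       "content": f"{question}\nThink step by step, then give the final "
                                  "answer alone on the last line."}],
            temperature=0.8,  # diversity between reasoning paths is the point
        )
        answers.append(resp.choices[0].message.content.strip().splitlines()[-1])
    return Counter(answers).most_common(1)[0][0]

print(solve_with_voting("A train travels 60 km in 45 minutes. What is its speed in km/h?"))
```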
@agnosticatheist4093
@agnosticatheist4093 2 ай бұрын
For me, Mistral Large is so far the best model.
@MONTY-YTNOM
@MONTY-YTNOM 2 ай бұрын
I don't see it as an option in the LLM list now
@vickmackey24
@vickmackey24 2 ай бұрын
Only 67 GitHub contributions in the past year, doesn't know what LoRA is, and you think this guy is a serious AI leader/developer? C'mon.
@tommylir1170
@tommylir1170 2 ай бұрын
Absolute scam. Not only did they use a Claude wrapper, but the reflection prompt also made Claude perform worse 😂
@TheFeedRocket
@TheFeedRocket 2 ай бұрын
I really think models will continue to get even smaller and actively learn, but not do everything. I just want to one day have my own model that can actively learn from me: as I talk to it, it will learn. Then it can learn what I like and what I need; basically, we should all be able to fine-tune models we run locally on our devices or robots so that they know us. My model doesn't need to know everything. Also, we should have many types of models that can talk to each other. An AI robot delivering my mail doesn't need a huge AGI model; it doesn't need to know how to fix cars, build programs, or solve science problems. Heck, if my garbage robot doesn't know how many r's are in strawberry... who cares? It just needs the basics plus info on garbage disposal, types, toxins, interactions with life, etc. I think the idea of one model to rule them all is wrong. For example, I would rather use Ideogram for logos, Midjourney for art, and Flux for realism. We need AI that excels in certain areas and then talks to other AI that excels in others. AI agents and teams will be the future; it might even be safer.
@Alex-nk8bw
@Alex-nk8bw 2 ай бұрын
The model might be a hoax, but the system prompt is working really well. That's something at least, I guess. ;-)
@SCHaworth
@SCHaworth 2 ай бұрын
isnt "hyperbolic labs" kind of a red flag?
@robertopreatoni
@robertopreatoni 2 ай бұрын
Why is he streaming from his sister's bedroom?
@JohnWeas
@JohnWeas 2 ай бұрын
YOOO MATT
@quercus3290
@quercus3290 2 ай бұрын
Nvidia/Microsoft's Megatron is a 500-billion-parameter model.
@domehouse79
@domehouse79 2 ай бұрын
Nerds are entertaining.
@RenatoFlorencia
@RenatoFlorencia 2 ай бұрын
REAL TALK!!!!
@jamessharkin
@jamessharkin 2 ай бұрын
Have you ever used that comb you are vigorously waving around? 🤔😁😆
@MattVidPro
@MattVidPro 2 ай бұрын
BAHAHAH 😅
@haroldpierre1726
@haroldpierre1726 2 ай бұрын
Lots of grifters during the AI hype train, starting with Altman, Musk, etc. So everything has to be taken with a grain of salt.
@snintendog
@snintendog 2 ай бұрын
Grifters... the people who made the most AI contributions are grifters, but not every company under the sun calling a telephone system an AI..... Riiiiigghhhht.
@haroldpierre1726
@haroldpierre1726 2 ай бұрын
@@snintendog Sometimes even our heroes lie.
@SpeedyCreates
@SpeedyCreates 2 ай бұрын
@@snintendog 😂 fr, thought the same; they ain't grifters, they all pushed the industry forward so damn much.
@MusicalGeniusBar
@MusicalGeniusBar 2 ай бұрын
Super confusing story 😵‍💫
@MattVidPro
@MattVidPro 2 ай бұрын
Yeah and still not adding up...
@Norem123
@Norem123 2 ай бұрын
Second
@thedannybseries8857
@thedannybseries8857 2 ай бұрын
lol
@ShiroAisan
@ShiroAisan 2 ай бұрын
oppp
@supermandem
@supermandem 2 ай бұрын
AI died when Matt Schumer lied!
@SkyEther
@SkyEther 2 ай бұрын
Lmao with the how many Ls problem
@cbnewham5633
@cbnewham5633 2 ай бұрын
16:47 ALLEGEDLY lied. Unless you want to be sued 😏
@TPCDAZ
@TPCDAZ 2 ай бұрын
He said "apparently", which works just fine; it means "as far as one knows or can see".
@cbnewham5633
@cbnewham5633 2 ай бұрын
​@@TPCDAZno he didn't. He said "we assume he would know more" followed by "he lied".
@cbnewham5633
@cbnewham5633 2 ай бұрын
I doubt he will be sued, but sometimes these people can get bent out of shape and do silly things - especially if under fire. Personally, I wouldn't have said that and I'd have second thoughts about leaving it up. Matt clearly says he lied - that's slander.
@TPCDAZ
@TPCDAZ 2 ай бұрын
@@cbnewham5633 No he clearly says "Now apparently he's lied about the whole API situation with Claude" I have ears and so does everyone else. This video also has captions where it is written in black and white. So don't sit there and lie to people.
@michelprins
@michelprins 2 ай бұрын
YOU NEED TO RETHINK HOW YOU PUT a commercial in the middle of your message. You're like a host who invited us for a nice dinner and, in the middle of preparing it, informs us he's taking a large dump, taking all the appetite away. If you really need that extra cash, at least do it at the end like all other wise YouTubers; the way you do it now shows us you have more respect for the commercials than for your viewers. Not nice. Also, give Matt Shumer a chance to show his method does work. Apply the same scepticism to Altman's claims, like the AI video stuff we're still waiting for! Q* is now used for training their biggest model, and the only transparency "Open" AI gave was a name change to Strawberry, with 3 r's, and you all swallowed that like Altman's .... on a strawberry. It's white, but I won't assume it's whipped cream without testing it. Btw, no need to comb your hair. ;)
@gabrielkasonde367
@gabrielkasonde367 2 ай бұрын
First comment Matt
@InternetetWanderer
@InternetetWanderer 2 ай бұрын
First?
@coinwhere
@coinwhere 2 ай бұрын
Shumer has made miscellaneous LLM-related apps, and that's it.