This is Llama fine-tuned by DeepSeek R1. It's not a distillation of the 671B model; their naming has people confused.
@GosuCoder 20 hours ago
100% right, I hope I was clear in the video on that! My main test was to see how DeepSeek modified the original Llama 3.3 model with its distillation of knowledge.
@rrrrazmatazzz-zq9zy 18 hours ago
I do like this type of format, thanks for making this.
@GosuCoder 14 hours ago
Amazing, thank you!!!
@ChatGTA345 12 hours ago
Very helpful, thanks! Would be glad to see more comparisons
@GosuCoder 1 day ago
What are your thoughts on DeepSeek R1 Distill Llama 70B?
@casper75559 17 hours ago
I'm all here for it! Liked, subbed, and bell rung! I would much rather have a video talking about this than an AI-generated article. They still did the work, but it feels better when you see or hear a person instead.
@GosuCoder 14 hours ago
Thank you so much, that means so much to me!
@DaveEtchells 22 hours ago
It’s interesting that there’s such an incredibly wide range of variation from one run to another: everything from scrolling screens of ASCII to a complete website. Also, sometimes there are paragraphs of thinking, other times it just spits out the answer. I wonder what’s going on with that? I understand that LLMs are stochastic, but it seems strange to see such wide variation in behavior with the same input.

I think there’s something in this that points to an obstacle to true AGI; an actual reasoning, intelligent program wouldn’t be anywhere near this random, and my intuition is that we won’t get there with this as the underlying basis. (OTOH, maybe it could still work with some sort of higher supervisory level guiding the inference? It’s been well established that LLMs are much smarter at analyzing than generating.)

Fascinating as always, thanks! (Your channel should have at least 100x the current subscribers!)
@GosuCoder 20 hours ago
Yeah, I’m perplexed by the variance as well, which is making me test different temperature settings to see if that’s playing a role. Part of me thinks there’s something foundational to these reasoning models that causes them to go into thinking mode, and somehow what I’m doing bypasses that at times. Thank you so much for the kind words, that means so much to me!
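One way to quantify the run-to-run variance being discussed is to repeat the same prompt several times at each temperature and measure how much the outputs spread. A minimal sketch, assuming the model sits behind any callable `generate(prompt, temperature)`; the function names and the length-based spread metric are placeholders for illustration, not anything from the video:

```python
import statistics

def response_length_spread(generate, prompt, temperature, runs=5):
    """Call `generate` several times at a fixed temperature and report
    the standard deviation of the response lengths across runs."""
    lengths = [len(generate(prompt, temperature)) for _ in range(runs)]
    return statistics.pstdev(lengths)

def temperature_sweep(generate, prompt, temperatures=(0.0, 0.5, 1.0), runs=5):
    # Map each temperature to how much the output length varied at it,
    # so you can see whether higher temperature explains the variance.
    return {t: response_length_spread(generate, prompt, t, runs)
            for t in temperatures}
```

Output length is a crude proxy (it won't distinguish "thinking" from non-thinking answers on its own), but the same loop works with any per-response metric, such as whether the output contains a `<think>` block.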
@DaveEtchells 16 hours ago
@GosuCoder Yeah, it doesn't seem that even moderate changes in the prompt should result in such divergent behavior. Even if your prompts were a little different, that shouldn't cause the LLM to respond so totally differently. I wonder if there's a time dependence, and DeepSeek is actively training on and incorporating previous interactions? Or perhaps once it's seen and thought through a particular problem, that gets integrated into the model somehow? If that's the case, it would be a real breakthrough, much more so than just the (supposed) efficiency of R1.

Did the immediate-response instances always occur after the model had already thought through the question once, or were there also situations in which you saw an immediate answer and then later a "thinking" one? (Of course, you could spend multiple days 24/7 testing all the variations 😁)

(FWIW, I'm suspicious of the narrative that this was all just a side project and only cost $5MM to train, or that the inference is actually as cheap as what they're charging. People in the know could have made _huge_ profits by short-selling Nvidia and others ahead of the R1 hype. And it'd very much suit the CCP's goals to throw a monkey wrench into Western AI infrastructure investment. We live in interesting times, as the ancient saying goes 😉)
@GosuCoder 14 hours ago
I've actually been becoming more suspicious of the $5M training figure as well. It would definitely be a 4D chess move if this were all a ploy by the CCP to derail Western advancements. I do know that major constraints can lead to some amazing innovations, which I think happened here, but it's very likely the story is being embellished some.
@JC.72 11 hours ago
@GosuCoder I understand this is a tech coding channel and I probably shouldn't talk about politics, and this will probably be removed shortly anyway. I say it here because I kind of liked you for seemingly being a genuine tech person. What I will say is: don't believe everything you hear out there. The Chinese aren't as shady as you think. Sure, they're probably 100x more capable of being shady than you would ever imagine if they went that route, but that's not the path they choose; at least that's not the current stance of the so-called CCP (officially it's called the CPC, by the way). The most recent news there right now is that DeepSeek doesn't need Nvidia's CUDA to run.

The bottom line is, let's just focus on the tech. It's what most of us tech enthusiasts love, and it actually brings everyone together, instead of constantly thinking someone is trying to one-up you in unconventional ways. Yes, the Chinese are flexing and swinging hard now, but the ploys you guys think they're running are, no offense, in my opinion all too low-level. They're aiming for the home run now, not the stolen base, simply because they can do so much more and are more capable than the tricks you're suspecting.
@JC.72 11 hours ago
Check the Tom's Hardware article titled "DeepSeek's AI breakthrough bypasses industry-standard CUDA for some functions, uses Nvidia's assembly-like PTX programming instead".
@Aiworld2025 8 hours ago
I had to open my eyes; I almost thought that was Sam Altman speaking.
@bufuCS 9 hours ago
Nice video! Here are a few suggestions for future test videos:
1. If there is an issue with a result (a cutoff response due to API issues, etc.), you should rerun the test.
2. Since the results vary so much from iteration to iteration, I think you should run each test a few times, take the average, and determine the winner that way; that would be fairer IMO.
3. Test the provided code to see if it even works as expected.
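Suggestion 2 above, scoring each model by its mean over several runs, could be sketched like this; the `run_test` callable and the idea of a single numeric score per run are hypothetical stand-ins for whatever grading the channel actually uses:

```python
from statistics import mean

def rank_models(run_test, models, runs=5):
    """Score each model as the mean of several independent runs,
    then pick the winner by that average.

    `run_test(model)` stands in for one scored test run; it just
    needs to return a number (higher = better)."""
    averages = {m: mean(run_test(m) for _ in range(runs)) for m in models}
    winner = max(averages, key=averages.get)
    return averages, winner
```

Averaging this way smooths out exactly the run-to-run variance discussed elsewhere in the thread, at the cost of `runs` times as many API calls.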
@GosuCoder 4 hours ago
Thank you for those, I'll definitely incorporate them!
@kanguruster 13 hours ago
How good is "looks correct" as a measure for the comparisons? All of my code looks correct, before the compiler, interpreter, or unit tests tell me otherwise. It's a genuine question; I have no idea how state-of-the-art code assistants work these days, and maybe the models' ability to give convincing-seeming results is a non-issue.
@GosuCoder 12 hours ago
I’ve been thinking about this. I’m working on updating my side-by-side tool to see if I can also run the code. I ran some of it, but the video would have been too long if I’d run all of it the way I was doing. I’m wondering if I can provide tests and format the output into something I can automate and show.
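A minimal sketch of that kind of automation: write the model-generated code plus a small test snippet to a temp file and execute it in a subprocess, treating exit code 0 as a pass. Everything here (the function name, the exit-code convention, the timeout) is an assumed harness, not how the channel's actual tool works:

```python
import os
import subprocess
import sys
import tempfile

def run_generated_code(code: str, tests: str, timeout: int = 10):
    """Execute model-generated Python together with assertion-style
    tests in a separate interpreter.

    Returns (passed, output): exit code 0 means the assertions held."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code + "\n\n" + tests + "\n")
        path = f.name
    try:
        proc = subprocess.run(
            [sys.executable, path],
            capture_output=True, text=True, timeout=timeout,
        )
        return proc.returncode == 0, proc.stdout + proc.stderr
    finally:
        os.remove(path)
```

From there, the pass/fail flag and captured output could be formatted into a per-model results table for the side-by-side view.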
@ominoussage 9 hours ago
I hope a new paper comes out teaching thinking models not to think too much on easy questions; a new algorithm plus RL would fix that. I'm surprised how close they were, but R1 70B took way more time to answer. I just see that as a loss.
@RickeyBowers 11 hours ago
Most of these types of tests are very subjective, based on user expectations and style. For example, Claude Sonnet 3.5 is amazing, but it often requires a dialog to reach expectations, and its code is usually more efficient in practice. OpenAI o1 can produce more complex/modular code. Gemini has great conversation, but the code is lacking. All these opinions are based on my choice of programming language and development domain; YMMV.
@GosuCoder 4 hours ago
You’re 100% right, it is a very subjective thing for me as well.
@ricardokullock2535 9 hours ago
The comparison would be more interesting against Llama 3.3 70B. The title in the column you show just says Llama 3 70B.
@GosuCoder 4 hours ago
I need to update that; it is Llama 3.3.
@r9999t 9 hours ago
How about NOT asking Leetcode problems? There's a vast amount of training data that directly solves Leetcode problems, but if you at least restated them, chose different values, or preferably chose a problem that has never been included in Leetcode, you would get far better results. Do you really want an AI assistant that can solve Leetcode problems, or do you want one that can help you with real-world code? I recently tried to get several AI systems to code a disk-based extendible hash, and all of them fell flat on their faces. It was difficult to even get them to create a file on disk; they would keep everything in RAM and say it was good.
@GosuCoder 4 hours ago
Oh, you gotta check out my video coming out today! This video was more to see how DeepSeek's distillation changed Llama 3.3; today's video throws real code at it.
@r9999t 2 hours ago
@GosuCoder Sounds good, I'll check it out...
@Leto2ndAtreides 15 hours ago
There's now a Qwen 2.5 Max... that may be better than Claude.
@GosuCoder 14 hours ago
I've been testing that one today as well!
@robtho3786 1 day ago
Use the model in VS Code with the Cline extension and show us some magic.
@GosuCoder 1 day ago
I’ve got about 10 hours of footage of using it that I’m working through. Hopefully I’ll have that out tomorrow to check out!
@DaveEtchells 22 hours ago
Yikes, that’s insane! Not just that you recorded 10 hours of footage, but that now you’re going to _edit_ all of that 😮