Deepseek-R1-Lite (Tested): This OPENSOURCE Model BEATS O1 & CLAUDE 3.5 SONNET!?

15,200 views

AICodeKing

A day ago

Comments: 47

@robinmountford5322 6 days ago

Sonnet is a force to be reckoned with when it comes to coding.

@DGFilmsNYC 7 days ago

I gave DeepThink a system prompt and rewrote your confetti prompt, because when you prompt Cline you phrase it as "write a website in HTML, CSS, and JS", whereas in your test you say "you can use CSS and JS". I got the confetti in one shot; just press "Run HTML" in the window... Let's add a second test: update the questions and A/B test the tests to really test these models.

@vdbv0 7 days ago

Oh lord! Just tested it, it's wild! Loving it! I'm hoping the API cost stays the same as now; if so, I will forget Heroku quickly!

@ANSHU61936 7 days ago

What do you mean, you will forget Heroku? Do they also have some kind of Bedrock alternative?

@vdbv0 7 days ago

@ANSHU61936 Claude Haiku, sorry! Wrote too quickly 🙏

@luizbueno5661 6 days ago

I'm curious about how to drop Heroku, and why you could do that once DeepSeek's API costs so little.

@ANSHU61936 6 days ago

@vdbv0 Same question again: why are you dropping Heroku? Do they also provide some kind of AI model?

@localscripted 3 days ago

Someone please answer this guy @ANSHU61936

@bot. 7 days ago

Now this is interesting to see. Finally, a new model showing highly promising results. Well, let's see what I think of it. Also, forgive me if this is a bit chaotically structured; I am writing it as I watch the video. With that out of the way, let us get started! As weird as it is, I would consider test #1 neither a fail nor a pass, as what the model went through eerily resembled a human being stunned by a question and not seeing the logical answer immediately. Hard to say how this can be improved, but I theorise the problem may solve itself once the model is given more time to think without rushing. Maybe even having it change perspectives at some point? Moving on to test #4, we can tell that it did objectively fail, but the reasoning chain was obviously halted prematurely, presumably by the system itself to limit the number of tokens spent on thinking. Smart choice by DeepSeek, yet obviously a performance limiter in cases like these. Would love to see what the answer would be if it were given as much time as it wants. Then again, if they release the model as open source down the line, these compute problems can be solved easily by the users themselves (assuming, of course, that it is not an absurdly big model, which unfortunately does seem to be the case here. Would love to be incorrect on the size, though.) Then again, more compute time will not solve all the problems, as we can see from test #9, where it was unable to create a proper website for the confetti. Unfortunately, though, we did not see the code run outside of DeepSeek's own environment, which may itself be the limiting factor in this case rather than the model. For more rigour, the code should also have been run through more conventional means just in case, similar to how the Python code was executed externally. (Also, do pardon me if my assumptions about DeepSeek's environment are incorrect; I am not that familiar with web frameworks or their execution.)
I would say something similar for test #12, but I did not catch if the DeepSeek environment was used, so I am forced into mere speculation for this. Sorry for the long paragraph, but moving on to Test #11, I would consider it a fail from an artistic perspective, but the model itself was most likely not trained on SVG creation, so the expected potential is rather low. However, it is still impressive that it created a general shape of a butterfly. All in all, a very, very exciting model. Especially if it is able to be used on most systems.

@TURKLERDIZIS 7 days ago

Claude 3.5 Sonnet is the best choice for coding

@aculz 7 days ago

Well, this is a great result for a model named "Lite", which can almost beat o1, not just o1-mini. I'm very sure the "Large" one might beat Sonnet as well, so we could have the greatest open-source model, and one much, much cheaper than Sonnet. Can't wait for another brilliant move from this company.

@LetYourLightShine5218 7 days ago

The AI's response to Test #3 was correct but it would have been interesting if it had been able to further speculate that C possibly was the other person playing table tennis with E unless E was playing solo with the table against the wall.

@jargoti20 6 days ago

Interesting comparison. I would love to see the API coming out so we can implement it in our own apps

@AnugrahPrahasta 7 days ago

WOW. Finally, deepseek!

@stephensamuel2770 7 days ago

First to view, first to comment. This is quite impressive. I have used it and the results are amazing.

@perfectartiste6332 7 days ago

good one, this will really be a game changer

@PrinzMegahertz 6 days ago

With regard to question 2: shouldn't C be playing table tennis with E? If no one else is in the house and C is not playing, who is E playing with?

@LetYourLightShine5218 7 days ago

While the "thinking outside the box" is impressive I think the AI failed Test #1 for 2 reasons. First, the AI said >>there doesn't appear to be any country with an official English name that ends in "lia."

@BeastModeDR614 7 days ago

Nice, open models are getting close. Can't wait until we can run Cline locally and build full-stack applications with no limits.

@michaelrichey8516 7 days ago

Logically, your 3rd question has a better answer than "unknown". Statement 1 says there are 5 people in a house and names them. Four people are given activities, and the 5th (C) is not mentioned. E, however, is playing table tennis, a 2-player game. Logically, E is playing with C, because there are 5 people in the house and table tennis cannot be played alone.

@TheProtein83 7 days ago

Great job. I disagree with the opinion about CoT and coding. For complicated architectures, thinking step by step should provide better results.

@MeinDeutschkurs 7 days ago

Have you tested AYA? Great for structured outputs.

@idea_list 6 days ago

I wonder if the answer to question 3 should be "playing table tennis". It's hard to imagine that E is playing tennis solo, right..?

@luizgustavs 7 days ago

I very much like the art style of the images at the beginning of your videos. Would you mind sharing the prompt you use to get this art style? It would be greatly appreciated.

@AICodeKing 7 days ago

It's very basic.. Something like "A panda in a forest, in front of a campfire, cinematic, anime style".

@eado9440 7 days ago

Open source model this good is crazy, just hope it's not like 500 GB

@aculz 7 days ago

500 GB would be okay. If we can't run it locally, we can use theirs, which is crazy cheap compared to Sonnet and GPT.

@dixalex02 7 days ago

I'm having trouble creating a markdown file of the PixiJS API. Something about the URL syntax prevents it from being scraped. Any advice?

@HikaruAkitsuki 5 days ago

I wonder if this is usable for writing a thesis? Anyway, I don't think its monologue is necessary.

@kevinehsani3358 4 days ago

I pasted in some code to debug and it said it cannot assist me with that. It did not even try. Has anyone else had that experience?

@rassular 7 days ago

Can you test the new Mistral Large 2411?

@gabrielkasonde367 7 days ago

Yoooooo😂🎉

@TawnyE 7 days ago

E Edit: I just now noticed on the top commenter with the most hearted comments, that is a pog

@DouhaveaBugatti 6 days ago

Suppose we combine it with the coder model☠️

@thatbeezie 7 days ago

How do you use this open-source model in something like Aider or VS Code extensions/apps?

@aculz 7 days ago

you need to pay to use their api

@flutterflowexpert 7 days ago

I think you need a new benchmark 😂

@다루루 7 days ago

😊

@warlockassim4240 7 days ago

First! And bro, please answer: how do I make Aider detect my local project?

@AICodeKing 7 days ago

It should probably detect it automatically.

@Gorops 7 days ago

Are you running it in the project folder/repository?

@wasimdorboz 7 days ago

@AICodeKing I am using Linux and it does not. I think I should do /save then /load, no?

@wasimdorboz 7 days ago

@Gorops yep

@shay5338 7 days ago

Haha, 8th one to comment. It would be cool if you showed how we can access these LLMs for free without any limits.

@aculz 7 days ago

Well, just install it in Ollama or LM Studio and use it locally. But be sure you have a powerful GPU, or it will run very slowly.

@randomlettersqzkebkw 6 days ago

openai has no moat