Sonnet is a force to be reckoned with when it comes to coding.
@DGFilmsNYC 7 days ago
I gave Deep Think a system prompt before I rewrote your confetti prompt, because when you prompt Cline you say it like this: "write a website in HTML, CSS, and JS", while in your test you say "you can use CSS and JS". I got the confetti in one shot; just press "Run HTML" in the window... Let's add a 2nd test, update the questions, and A/B test the tests to really test these models.
@vdbv0 7 days ago
Oh lord! Just tested it, it's wild! Loving it! I'm hoping the API cost stays the same as now; if that's the case I will forget Heroku quickly!
@ANSHU61936 7 days ago
What do you mean, you will forget Heroku? Do they also have some kind of Bedrock alternative?
@vdbv0 7 days ago
@@ANSHU61936 Claude Haiku, sorry! Wrote too quickly 🙏
@luizbueno5661 6 days ago
I'm curious about how to drop Heroku, and why you could do that once DeepSeek's API costs so little
@ANSHU61936 6 days ago
@@vdbv0 same question again, why are you dropping Heroku? Do they also provide some kind of AI model?
@localscripted 3 days ago
someone please answer this guy @@ANSHU61936
@bot. 7 days ago
Now this is interesting to see. Finally a new model showing highly promising results. Well, let's see what I think of it. Also, forgive me if this is a bit chaotically structured; I am writing it as I watch the video. With that out of the way, let us get started!
As weird as it is, I would consider test #1 neither a fail nor a pass, as what the model went through eerily resembled a human being stunned by a question and not seeing the logical answer immediately. It is hard to say how this can be improved, but I theorise the problem may solve itself once the model is given more time to think without rushing. Maybe even having it change perspectives at some point?
Moving on to test #4, we can tell that it did objectively fail, but the reasoning chain was obviously halted prematurely, presumably by the system itself to limit the number of tokens spent on thinking. A smart choice by DeepSeek, yet obviously a performance limiter in cases like these. I would love to see what the answer would be if it were given as much time as it wants. Then again, if they release the model as open source down the line, these compute problems can easily be solved by the users themselves (assuming, of course, that it is not an absurdly big model, which unfortunately does seem to be the case here. I would love to be wrong about the size, though).
Then again, more compute time will not solve all the problems: as we can see from test #9, it was unable to create a proper website for the confetti. Unfortunately, though, we did not see the code run outside of DeepSeek's own environment, which may itself be the limiting factor in this case rather than the model. For more rigour, the code should also have been run through more conventional means just in case, much like how the Python code was executed externally. (Also, do pardon me if my assumptions about DeepSeek's environment are incorrect; I am not that familiar with web frameworks or their execution.)
I would say something similar for test #12, but I did not catch whether the DeepSeek environment was used, so I am forced into mere speculation here.
Sorry for the long paragraph. Moving on to test #11, I would consider it a fail from an artistic perspective, but the model itself was most likely not trained on SVG creation, so the expected potential is rather low. However, it is still impressive that it created the general shape of a butterfly.
All in all, a very, very exciting model. Especially if it can be run on most systems.
@TURKLERDIZIS 7 days ago
Claude 3.5 Sonnet is the best choice for coding
@aculz 7 days ago
Well, this is a great result for a model named "Lite", which can almost beat o1, not just o1-mini. I'm very sure the "Large" one might beat Sonnet as well, so we could have the greatest open-source model, and one much, much cheaper than Sonnet. Can't wait for another brilliant move from this company.
@LetYourLightShine5218 7 days ago
The AI's response to Test #3 was correct, but it would have been interesting if it had further speculated that C was possibly the other person playing table tennis with E, unless E was playing solo with the table against the wall.
@jargoti20 6 days ago
Interesting comparison. I would love to see the API coming out so we can implement it in our own apps
@AnugrahPrahasta 7 days ago
WOW. Finally, DeepSeek!
@stephensamuel2770 7 days ago
First to view, first to comment. This is quite impressive. I have used it and the results are amazing.
@perfectartiste6332 7 days ago
Good one, this will really be a game changer
@PrinzMegahertz 6 days ago
With regard to question 2 - shouldn't C be playing table tennis with E? If no one else is in the house and C is not playing, who is E playing with?
@LetYourLightShine5218 7 days ago
While the "thinking outside the box" is impressive, I think the AI failed Test #1 for 2 reasons. First, the AI said >>there doesn't appear to be any country with an official English name that ends in "lia."
@BeastModeDR614 7 days ago
Nice, open models are getting close. Can't wait until we can run Cline locally and build full-stack applications with no limits
@michaelrichey8516 7 days ago
Logically, your 3rd question has a better answer than "unknown". Statement 1 says there are 5 people in a house, naming them. 4 people are given activities, with the 5th (C) not being mentioned. E, however, is playing table tennis, a 2-player game. Logically, E is playing with C, because there are 5 people in the house and table tennis cannot be played alone.
@TheProtein83 7 days ago
Great job. I disagree with the opinion about CoT and coding. In the case of complicated architectures, thinking step by step should provide better results
@MeinDeutschkurs 7 days ago
Have you tested AYA? Great for structured outputs.
@idea_list 6 days ago
I wonder if the answer to question 3 should be "playing table tennis". It's hard to imagine that E is playing tennis solo, right..?
@luizgustavs 7 days ago
I very much like the art style of the images at the beginning of your videos. Would you mind sharing the prompt to get this art style? It would be greatly appreciated
@AICodeKing 7 days ago
It's very basic.. Something like "A panda in a forest, in front of a campfire, cinematic, anime style".
@eado9440 7 days ago
An open-source model this good is crazy, just hope it's not like 500 GB
@aculz 7 days ago
It's okay if it's 500 GB. If we can't run it locally, we can use theirs, which is crazy cheap compared to Sonnet and GPT
@dixalex02 7 days ago
I'm having trouble creating a markdown file of the PixiJS API. Something about the URL syntax prevents it from being scraped. Any advice?
@HikaruAkitsuki 5 days ago
I wonder if this is usable for writing a thesis? Anyway, I don't think its monologue is necessary.
@kevinehsani3358 4 days ago
I pasted in some code to debug and it said it cannot assist me with that; it did not even try. Did anyone else have that experience?
@rassular 7 days ago
Can you test the new Mistral Large 2411?
@gabrielkasonde367 7 days ago
Yoooooo😂🎉
@TawnyE 7 days ago
E Edit: I just now noticed on the top commenter with the most hearted comments, that is a pog
@DouhaveaBugatti 6 days ago
Suppose we combine it with the coder model☠️
@thatbeezie 7 days ago
How do you use this open-source model in, like, Aider or VS Code extensions/apps?
@aculz 7 days ago
You need to pay to use their API
@flutterflowexpert 7 days ago
I think you need a new benchmark 😂
@다루루 7 days ago
😊
@warlockassim4240 7 days ago
First! And bro, please answer: how do I make Aider detect my local project?
@AICodeKing 7 days ago
It should probably detect it automatically.
@Gorops 7 days ago
Are you running it in the project folder/repository?
@wasimdorboz 7 days ago
@@AICodeKing I am using Linux and it does not. I think I should do /save then /load, no?
@wasimdorboz 7 days ago
@@Gorops yep
@shay5338 7 days ago
Haha, 8th one to comment. It would be cool if you showed how we can access these LLMs for free without any limits
@aculz 7 days ago
Well, just install it in Ollama or LM Studio and use it locally. But be sure you have a powerful GPU, or it will run very slowly