If this video is a true reflection of its capabilities, benchmarks aren't just bad, they are broken.
@ctwolf14 days ago
this 100%
@TheSuperColonel15 days ago
I like your channel; it's straight to the point. There is likely a lot of competition in AI, with tons of hype. We will see how many of them will survive the next 5 years.
@maddoxthorne229715 days ago
Others: It answers the benchmark questions well, so no need to run it. AICodeKing: Hold my beer.👑
@ctwolf14 days ago
AICodeKing is actually a deity
@bamit197915 days ago
Thank you for saving our time! :)
@TaughtByTech15 days ago
I know, right. Really the AI king
@ashgtd15 days ago
yup saved me a big fat download today
@notme213612 days ago
yup, saved me a chunk of my time this week.
@Quitcool11 days ago
Wrong, that's a great model according to other YouTubers and the open-source community.
@ashgtd11 days ago
@@Quitcool Are they just saying that for clicks, though? If I see a video with this model not sucking ass, then I'll try it
@CPM9415 days ago
Those dancing Pokémon clearly stole the spotlight of the video
@Andres-m2u11 days ago
The maximum achievable with Qwen2.5-Coder-32B (131k context window) was around 100k tokens. Then it slowed down to a timeout. But impressive...
@RaffaelloTamagnini10 days ago
True, just tested, and with 24GB GPU offload too, on a machine with 192GB of RAM. The 131k context wants too much memory
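For anyone wondering why the full 131k window eats so much memory, here's a rough back-of-envelope sketch of the KV-cache size alone. The figures assume Qwen2.5-32B's published config (64 layers, 8 grouped-query KV heads, head dim 128) and an fp16 cache - treat it as an estimate, not a measurement.

```python
# Rough KV-cache estimate for Qwen2.5-Coder-32B at its full context length.
# Assumed config: 64 layers, 8 KV heads (GQA), head dim 128, fp16 (2-byte) cache.
layers, kv_heads, head_dim, bytes_per_value = 64, 8, 128, 2
context_tokens = 131_072

# K and V each store kv_heads * head_dim values per layer, per token.
kv_cache_bytes = 2 * layers * kv_heads * head_dim * bytes_per_value * context_tokens
print(f"KV cache at {context_tokens:,} tokens: {kv_cache_bytes / 2**30:.0f} GiB")
# -> roughly 32 GiB, on top of the weights themselves
```

That extra ~32 GiB on top of the weights lines up with the "wants too much memory" observation above; shrinking the context window or quantizing the cache reduces it proportionally.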
@developersdiary199515 days ago
Thanks for sharing this with us, your content is gold! I tried Qwen 2.5 Coder yesterday on my Intel Core i7, 16GB DDR4 RAM, RTX 3050 (4GB VRAM), and it struggled with Bolt. So I guess I should only use open-source local AI models for generating text, for now...
@aleksanderspiridonov725115 days ago
YOU NEED AT LEAST MAC WITH 32GB RAM M3-M4 I THINK BUT BETTER 2-3 3090 MINIMUM FOR +- GOOD WORK BUT ALSO OPENROUTER CHEAP
@johnnyarcade15 days ago
@@aleksanderspiridonov7251 WOULD THE NEW MACBOOK PRO WITH 40 GPU CORES AND 48GB RAM WORK WELL ENOUGH OR SHOULD I OPT FOR MORE RAM?
@handfuloflight14 days ago
@@aleksanderspiridonov7251 y u screamin son
@alexjensen99015 days ago
BTW, you had me laughing so hard at the whole "why the hell am I using it then!" comment. Truly priceless.
@sammcj200015 days ago
Looking at your output, it almost seems as if you - or the model provider you're using - are using the wrong chat templates + inference parameters that aren't configured for coding tasks. What about the temperature - it should be set to 0 for coding, and you should use a top_p of no higher than about 0.85. Did you set the context size to something reasonable? I've found the 32B model to be really impressive, certainly the best open-weight model out there by far. In my experience, Cline especially is not very good with any models other than Claude, which it was originally written for.
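For anyone who wants to replicate those settings, here's a minimal sketch against an OpenAI-compatible endpoint. The base URL and model id are placeholders for a local Ollama server - swap in whatever provider you actually use.

```python
# Minimal sketch: coding-oriented sampling settings via an OpenAI-compatible API.
# base_url / model are placeholders for a local Ollama server; adjust as needed.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

response = client.chat.completions.create(
    model="qwen2.5-coder:32b",   # placeholder model id
    temperature=0,               # deterministic output for coding tasks
    top_p=0.85,                  # cap nucleus sampling, as suggested above
    messages=[{"role": "user", "content": "Write a function that parses a CSV line."}],
)
print(response.choices[0].message.content)
```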
@AICodeKing15 days ago
I did try it with Fireworks and got the same results. It might be that Cline isn't okay with the model. But even if you consider the Aider results, it's too buggy and not good at all if you're working on a bigger application with multiple files in context.
@sammcj200015 days ago
@@AICodeKing thanks for the extra info. I might try a couple of your common prompts running the model directly, without Aider or Cline in the mix, to see if it's a templating issue. It could be something like them using the default ChatML template and not the proper updated Qwen 2.5 tool-calling template - or something along those lines.
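If anyone wants to check the templating theory, a quick sketch with Hugging Face transformers prints the chat template the weights actually ship with (the repo id below is the standard Hugging Face name for the 32B instruct variant; adjust if you use a different one):

```python
# Print the prompt the model's own chat template produces, so it can be compared
# against whatever your inference stack (Ollama, Aider, Cline, ...) is sending.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-32B-Instruct")

messages = [{"role": "user", "content": "Write a hello-world HTTP server."}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)  # look for Qwen's special tokens / tool-calling markup here
```

If the model is served through Ollama, running ollama show qwen2.5-coder:32b --template should print the template applied on that side, so the two can be compared.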
@MM-2415 days ago
@@sammcj2000 would love to see what analysis you come up with - thanks for double-checking. Super helpful
@bodyguardik14 days ago
It seems he didn't even download it. This video is about some crap online service
@gmag1115 days ago
I love your style. Go on like this. AI coding is a great use case for LLMs. I'm learning a lot from your videos
@fezkhanna690015 days ago
Hahahah, "man, if I have to implement it myself, why the hell am I using this". This made me laugh (9:50)
@peacekeepermoe12 days ago
Same 😂😂 Man, I'd be mad too if the AI is asking me to do something I asked it to do for me in the first place. It's like, who is the master and who is the slave here, goddamnit?
@darkreader0115 days ago
I also did a test before seeing your video and my conclusion was "trash", at least for my use case. After seeing your video, I see that I am not the only one! It's not worth the hype.
@AaronBlox-h2t5 days ago
Interesting....thanks for the video.
@raj462412 days ago
Thanks for this Hyperbolic website.. it helped me
@davidcarey3715 days ago
Thank you very much. Well explained and informative as always, and in this case it has definitely "separated the wheat from the chaff" … Qwen 2.5 Coder seems very disappointing.
@tecnopadre15 days ago
Truth testing = reality. Great job, as usual. Congratulations 🎉
@jaynucca14 days ago
Thank you for being honest! I wanted to love Qwen 2.5 Coder as well, but it just can't actually do anything useful beyond VERY simple applications.
@Dyson_Lu11 days ago
Strange, Cole Medin got great results and so did Simon Willison. Both were extremely impressed.
@phoenyfeifei14 days ago
I find Cline just doesn't work very well with Ollama local models. Their developer appears to blame it on these Ollama models being heavily quantized, which I do agree with, but I run Q8 and FP16 models and still get the same shitty result
@cgimoonai15 days ago
Thank you man!
@PhuPhillipTrinh13 days ago
Lmao, good testing, king! Will you change the Pokémon one day?
@maertscisum14 days ago
I am guessing that the benchmarking uses carefully engineered prompting to beat other models. I have always questioned the validity of each model's benchmark claims. There should be a formal body with standard test sets to run the benchmarks.
@jamesbuesnel505415 days ago
Ahahah, your hate for Cursor is hilarious 😂
@yoannthomann15 days ago
The point is made: we need better benchmarks 😢😅
@onmetrics14 days ago
here for the low frequency roasts
@chadpogs797315 days ago
Wow!! This is it!!
@tomwawer571415 days ago
I run the 32B on 6GB VRAM; it's slow, about a token/s, but it works.
@justtiredthings15 days ago
what a bummer. I had high hopes for this model
@isheriff8215 days ago
So true bro, I hate it when people do that! Also, Aider and Cline are way better at everything!
@xelerator239815 days ago
Thank you!
@diplobla15 days ago
thanks for this 👍
@konstantinoskonstantinos852415 days ago
Is Hyperbolic using the Instruct model or the Base one?
@AICodeKing15 days ago
Instruct and unquantized as well.
@JeffreyWang-hh4ss14 days ago
I like your objectivity; these small-model hypes + marketing are pretty annoying.
@DAZEOFFICIAL15 days ago
Strange, though I think this is a milestone for a local model, to even be able to create something using Aider. From my testing, Aider worked when used properly, but Cline did not. I have a 3090 and it did run at workable speeds.
@AICodeKing15 days ago
Yes, but claiming unbelievable things is never good
@HikaruAkitsuki15 days ago
Dude. Can you review Blackbox AI? It has Gemini Pro, GPT-4o, Claude 3.5 Sonnet, and its own Blackbox model. It's mostly a chat-app AI like anything else, but there are also VS Code and JetBrains extensions.
@MacS7n15 days ago
You made me hate Cursor 😅 and to be honest you're right about Cline being better 😅
@ctwolf14 days ago
me @3:50 hell yeah, dancing Pokémon
@fmatake10 days ago
Benchmarks always come out 'pretty,' but in real life, I've found that it's far behind even claude-3-5-haiku and gpt-4o-mini.
@ghosert14 days ago
What is the smaller local LLM that you think is better than Qwen 2.5 Coder 32B? Thanks - you didn't mention which video I should take a look at.
Benchmarks with smaller models are usually complete BS. They probably distill the bigger models into them, making them memorize benchmark-like questions without actually making them smarter.
@mz875515 days ago
It's such a small model, and the hype of trying to compare it with Sonnet is where all this starts to fail. It should do what a small model should do in some specialized cases, not run a general coding agent. It is also specialized for code generation, while powering Aider is much more demanding on versatile intelligence
@christerjohanzzon15 days ago
Great video! Real tests in real apps. I would like to see a full workflow test, from Figma design to tested product. Done with NextJS, TS, TailwindCSS, and AI-assisted coding all the way from setup to testing, reviewing, and deployment.
@wolverin013 days ago
Could you make a guide to using Cline with the local Qwen?
@lcarv2015 days ago
Hi there, in your first prompt Qwen was trying to generate the build files and node_modules; maybe if you had the project set up, it wouldn't try to generate that much code? Can you try?
@lcarv2015 days ago
Ok after seeing the whole video I understand that it wouldn’t matter.
@AICodeKing15 days ago
I had created the NextJS app beforehand.
@brandon190211 days ago
To make matters worse, outside of coding Qwen2.5 is far worse than Qwen2. Most notably, it hallucinates far more across all domains of knowledge. I really do think you're right that Qwen is optimizing their LLMs for tests at the expense of overall performance. Qwen2 72b used to be almost as good as Llama 3.1 70b, but now Qwen2.5 72b is far worse despite climbing higher on benchmarks.
@Piotr_Sikora10 days ago
Does the model on Hyperbolic use a 128k context window?
@antoniofuller233115 days ago
16 million tokens uploaded just to generate 3 files??!!! 6:30
@AICodeKing15 days ago
I think it's a bug in Cline, and that's why it displays that.
@antoniofuller233115 days ago
@AICodeKing hmm
@toCatchAnAI15 days ago
Is this available on Open Bolt?
@2005sty14 days ago
Alibaba has the Qwen Max model (not open source), which is far better than the open-source version. But.. strangely, they don't show it off. I suspect ...
@alainmona26815 days ago
Hey, is it possible you can add Qwen2.5 32B to OpenHands? I tried a million different ways with the help of Claude and Copilot and ChatGPT but couldn't get it running
@A-Jaradat-Z8 days ago
OpenRouter?
@mrpocock15 days ago
So why does it score well in benchmarks if it can't function in these IDE or agentic contexts?
@AICodeKing15 days ago
You can basically just train models on specific benchmark questions and make them score well in benchmarks but in real life this approach fails.
@mrpocock15 days ago
@@AICodeKing That really sucks. Benchmark chasing should be an immediate disqualification. I wonder if there are ways to structure benchmarks so that they produce a randomised but equivalent task. Or alternatively, flood the market with so many benchmarks that it is not practical to over-fit to them all.
@AICodeKing15 days ago
There are actually many benchmarks, but you just need to select 5 or 10 and compare the results across those..
@mrpocock15 days ago
@@AICodeKing It would be better if model publishers were expected to submit their models to 3rd party benchmarking rather than doing it in-house. We used to have this problem with protein 3d reconstructions. People would publish papers on cooked benchmarks. That's why the CASP protein structure prediction competition was set up.
@ThrivingMotivation2814 days ago
Every LLM model except Sonnet disappoints
@ZzzKekeke15 days ago
can you make the dragons twerk?
@jjdorig971215 days ago
I have it running on a single 3090; how do I check how big its context window is?
I tried Qwen 2.5 for math because I am taking part in the AIMO Kaggle competition. I can't say this with certainty, but I feel they train their models on the benchmarks. In one weird case it did a function call but also provided me with the result (without actually performing the function call).
@jose-lael14 days ago
That's common for LLMs; try using a wider variety of them and you'll develop an intuition for how LLMs behave.
@wasimdorboz15 days ago
Please answer: how do you get the base URL? Hyperbolic?
@AICodeKing15 days ago
You can see it by going to the Hyperbolic API script thing
@wasimdorboz15 days ago
@@AICodeKing Alright, thanks bro, you're a good developer
@wasimdorboz15 days ago
@@AICodeKing bro, there is Qwen 2.5 72B, and I looked all over AI and Google and didn't get the base URL or how to use it exactly. Qwen/...instruct - wow, Instruct, and boom, it works. You're a good developer
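To spell out what "boom, it works" looks like: a rough sketch of pointing an OpenAI-compatible client at Hyperbolic. The base URL and model id here are assumptions - copy the exact values from the API example in your own Hyperbolic dashboard.

```python
# Sketch: OpenAI-compatible client pointed at Hyperbolic's hosted Qwen model.
# base_url and model id are assumptions -- verify both in your Hyperbolic dashboard.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.hyperbolic.xyz/v1",   # assumed endpoint; check the dashboard
    api_key="YOUR_HYPERBOLIC_API_KEY",
)

response = client.chat.completions.create(
    model="Qwen/Qwen2.5-Coder-32B-Instruct",    # full HF-style model id
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```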
@wasimdorboz15 days ago
Bro, please give us a tutorial on Electron or Tauri or any open-source one
@emilianosteinvorth701715 days ago
I have had some issues with Cline using models that are not Claude/GPT, since I think Cline requires a model with proper agentic features. It could be a reason why the performance was so poor with it. I think testing Qwen using a chat interface could change the results.
@a1_Cr3at0r15 days ago
Dude, make a video about the g4f (gpt4free) API + Cline
@hipotures15 days ago
The same with Python - garbage produced without end. Maybe it's a problem with Ollama?
@BeastModeDR61415 days ago
It doesn't follow instructions
@QorQar15 days ago
Is Hyperbolic free or not?
@AICodeKing15 days ago
Free $10 credits
@enloder14 days ago
So I should not use Qwen2.5 Coder 7B anymore?
@AICodeKing14 days ago
Depends on your choice.. I see no use for that model for me as of now.. I just use SmolLM2, which is better and can actually be used locally at great speeds on my machine. There's no one-size-fits-all or anything like that.
@mnageh-bo1mm15 days ago
Test it with Cursor
@alexjensen99015 days ago
I'm pretty sure that Qwen is Chinese, right? That may explain the questionable benchmarking.
@다루루15 days ago
Very powerful model!!!
@TawnyE15 days ago
EE
@HansKonrad-ln1cg15 days ago
Very bad at instruction following. It has something in common with my wife there.
@justtiredthings15 days ago
🙄
@meassess15 days ago
I did everything right but I get this error:
# VSCode Visible Files
(No visible files)
# VSCode Open Tabs
(No open tabs)
# Current Working Directory (d:/Mert - Workspace/test-ai-project) Files
No files found.