QWEN 2.5 72B Benchmarked - World's Best Open Source AI Model?

3,915 views

Digital Spaceport

1 day ago

Comments: 54
@opensourcedev22 • 29 days ago
I use this 32B model primarily for coding now. It does so well that I wonder whether they trained it against Claude 3.5 coding output, because it is very good. I wish one of these companies would make a hyper-focused coding-corpus model so that it could fit into 48GB of VRAM at very high precision.
@justtiredthings • 27 days ago
I believe they're planning on releasing a 32B coder variant soon.
@alx8439 • 25 days ago
It went off the rails because you kept reusing the same Open WebUI chat and overflowed the default Ollama context size, which is 2K tokens for any model. Use a separate chat for each topic; that keeps you from pushing the entire chat history to the model when the previous messages are no longer relevant to what you're asking. And increase the context size to something like 8K.
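For reference, here is a minimal sketch of raising that limit through Ollama's REST API via the num_ctx option (assumes a local Ollama server on the default port; the model tag and prompt are placeholders):

```python
import json
import urllib.request

# Request a completion with an 8K context window instead of the
# 2,048-token default. Model tag and prompt are placeholders.
payload = {
    "model": "qwen2.5:72b",
    "prompt": "Summarize our conversation so far in one sentence.",
    "stream": False,
    "options": {"num_ctx": 8192},  # raise the context window to 8K tokens
}

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```

The same num_ctx setting can also be baked into a model permanently through an Ollama Modelfile rather than passed per request.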
@Jp-ue8xz • 29 days ago
What do you mean it "failed" the first two tests? Does the game actually work if you put a PNG image in the corresponding folder? By the way, there's a solid argument to be made that if the scenario you proposed was the "best" plan all of humankind was able to put together, the correct thing to do is not to save us.
@DigitalSpaceport • 29 days ago
No, it didn't, and it failed because Llama 3.1 70B was able to make a poor, but functioning, one in a single shot that did run.
@Lorv0 • 28 days ago
Awesome video! What is the name of the tool used for the web interface for local inference?
@DigitalSpaceport • 28 days ago
Yes, there's a full software install video on that here: kzbin.info/www/bejne/ip6xhHehn6mHhdU. This is Open WebUI and Ollama together.
@alx8439 • 25 days ago
Add some tool-use tests, like web searching and text summarization. Open WebUI comes with web search and community tools you can equip. Ask it, for example, to find you some good AMD Ryzen laptops made in 2024 and compose a comparison table with all the specifications and prices.
@DigitalSpaceport • 15 days ago
I am just now getting to the use-case videos and to setting up tools and vision, so these things will be included in future evals.
@alx8439 • 15 days ago
@DigitalSpaceport Awesome, thanks! That will be super helpful for the community.
@DigitalSpaceport • 15 days ago
@alx8439 Also, I did read your other comments; thanks for taking the time to write them out. I am actively incorporating your feedback.
@ManjaroBlack • 14 days ago
Size vs. quantization: where I can fit a larger model at q2 quantization, I can instead fit a smaller model at q8. Comparing the two, the larger q2 model is much more likely to give me gibberish. The only advantage I find with larger models at lower quantization is that they handle a larger system prompt better.
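The rough weight-size arithmetic behind that tradeoff, as a back-of-the-envelope sketch (the bits-per-weight figures are approximate effective values for common GGUF quants, and KV cache and runtime overhead are ignored, so real memory usage is higher):

```python
# Approximate weight footprint at different quantization levels.
# Bits-per-weight values are rough effective figures for common GGUF
# quants; KV cache, activations, and runtime overhead are not counted.
BITS_PER_WEIGHT = {"fp16": 16.0, "q8": 8.5, "q6": 6.6, "q4": 4.8, "q2": 2.6}

def weight_gb(params_billion: float, quant: str) -> float:
    """Estimated weight footprint in gigabytes."""
    return params_billion * BITS_PER_WEIGHT[quant] / 8

for quant in BITS_PER_WEIGHT:
    print(f"72B @ {quant:>4}: ~{weight_gb(72, quant):5.0f} GB   "
          f"32B @ {quant:>4}: ~{weight_gb(32, quant):5.0f} GB")
```

Under those assumptions, a 32B model at q8 (roughly 34 GB) and a 72B model at q2 (roughly 23 GB) land in a similar hardware class even though, per the comment above, they behave very differently.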
@DigitalSpaceport • 14 days ago
Yeah, q4 is my floor based on what I have seen returned. Have you compared q8 vs. fp16 on many models? The gain at fp16 on Llama 3.1/3.2 does not seem to make a big difference.
@ManjaroBlack • 14 days ago
@DigitalSpaceport I don't find any difference in quality between q8 and fp16. Even q6 tends to be about the same for my use case. Below q6 I can tell a difference with long inputs. One thing I do see a difference in is the size of the output: q8 and fp16 seem to output about the same amount, but q6 will often output less, which can be a problem for me if I have a large output structure in the system prompt.
@DigitalSpaceport • 13 days ago
Good to get your observation on that. q8 does seem like the sweet spot, and Qwen 2.5 at q8 is mind-blowingly good when I need to have four models loaded for RAG.
@dna100 • 19 days ago
I've found the Qwen 2.5 7B model to be the best of the current crop of small models. I've tried Llama 3.1 8B, InternLM 2.5 7B, and Mistral 7B. My second-place choice is the InternLM model. Great video, by the way. Nice to hear an honest opinion about the benchmarks: they are completely gamed and pretty much meaningless. The only way is to gauge them yourself, as you have done here. Good work.
@DigitalSpaceport • 15 days ago
Qwen 2.5 is very good, I agree. I use it almost all the time now myself. The 32B variant also allows me to have several models running at once.
@ManjaroBlack • 14 days ago
Exactly my experience at the 7B size. My use case builds quite large prompts, and they all struggle at this size, but InternLM was my go-to. I find that Qwen 2.5 and InternLM are about the same, but I prefer Qwen's output and formatting.
@Mike-pi3xu • 23 days ago
It would be helpful to see the output of ollama ps and check how much is actually on the GPUs and how much is run by the CPU. I noticed that the four 4090s ran at only 1/4 compute utilization, and seeing the execution context might shine some light on the discrepancy. Please consider including this. It is especially important with GPUs that have less VRAM.
@DigitalSpaceport • 23 days ago
I am mindful of this in all tests, and I do check: all models reviewed here fit fully in VRAM. Yes, the workload is split into four, and each GPU runs at 1/4 speed on 1/4 of the workload. This is how llama.cpp, the model processor behind Ollama, currently does parallelism. vLLM enables an alternate way to do parallelism that may significantly improve on that, which I will test here.
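For the curious, a minimal sketch of what that vLLM test could look like using vLLM's offline Python API with tensor parallelism across four GPUs (the Hugging Face model tag is a placeholder, and a quantized variant would need extra arguments):

```python
# Tensor parallelism in vLLM: the model is sharded across 4 GPUs so every
# token's computation uses all cards at once, unlike llama.cpp's layer
# split, where each GPU works through its own slice of layers in turn.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen2.5-72B-Instruct",  # placeholder model tag
    tensor_parallel_size=4,             # shard weights across four GPUs
)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Explain tensor parallelism in one paragraph."], params)
print(outputs[0].outputs[0].text)
```

Whether this actually beats the llama.cpp split depends heavily on the interconnect bandwidth between the cards, which is exactly what a head-to-head test would show.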
@tohando • 29 days ago
Thanks for the video. Shouldn't you clear the chat after each question, so the context is not full of previous stuff?
@DigitalSpaceport • 29 days ago
If I were benchmarking just for high numbers, probably. We have plenty of those benchmarks and synthetics, though. I'm interested in how usage goes for normies like me. I often don't make a clean new chat per topic, and the ones I do make eventually meander off the originally conceived topic. It isn't scientific testing; rather, it mimics normie usage patterns. That choice is purposeful.
@Evanrodge • 28 days ago
@tohando Yes, his methodology is borked.
@NilsEchterling • 2 days ago
@DigitalSpaceport Good intent on your part, but I think we should propagate using LLMs well, and making new chats for new topics is simply something everyone has to learn. KZbin videos like yours should educate on this.
@justtiredthings • 27 days ago
The ethical question is a very difficult issue, maybe unsolvable given that we want these things to be agents. The problem is that if you can give it just the right ethical scenario and the AI will do as you say, then any bad actor could simply lie to their AI and have it go on a killing spree. That's not too desirable either. But then how do we ensure that these things are making reasonable decisions when we give them any level of autonomy? I'm not sure how we resolve that contradiction.
@DigitalSpaceport • 27 days ago
In my opinion, having one primary initiator's decision-making scaffolding own the entire chain is an issue I can see leading to real problems. A decent solution that can work better is independent, unassociated evaluative recommendation systems, not running on homogeneous base models, feeding into a weighting arbiter. We already have machines making life-and-death decisions autonomously at scales we likely don't see day to day.
@justtiredthings • 27 days ago
@DigitalSpaceport TBH, I'm not parsing you very well.
@stattine • 29 days ago
Can you clarify why you are going so deep on the P2000 vs. the P5000? The extra VRAM in the P5000 seems like a clear choice.
@DigitalSpaceport • 29 days ago
Oh, it's because I tested the P2000, as I have one on hand: kzbin.info/www/bejne/eXuVf3uFeLZlr7s. I'm pretty much testing everything I've got around. I don't have a P5000, but yeah, extra VRAM and more CUDA cores FTW!
@Merializer • 28 days ago
Do you think this 72B model can be run with 64GB of RAM and an RTX 3060? BTW, I wouldn't use a red background color for PASSED; I'd use green instead. Red seems more suited to a word like FAILED.
@DigitalSpaceport • 28 days ago
Good point re: colors; updated for the next video. Yes, it can layer into VRAM and system RAM, but the performance will be painfully slow.
@Merializer • 27 days ago
@DigitalSpaceport Not gonna try it then, I think. My internet is really slow for downloading models, which saves me the trouble.
@kkendall99 • 28 days ago
That model is probably hard-coded to cause no harm, no matter what the scenario.
@elecronic • 25 days ago
Why are all the questions in the same chat? You should start a new chat for each question.
@DigitalSpaceport • 25 days ago
I have started doing this with new chats in the new videos and will continue that into the future. Thanks.
@CheesecakeJohnson-g7q • 29 days ago
Hi, I tried several versions of Qwen+Calme 70B, 72B, and 78B on LM Studio with all sorts of quants; Q5 and Q6 seem to perform best, but I didn't find any that had sufficient conversational speed. The 3090 seems to work. While I have read the definitions of K, K_M, K_S, and so on, I haven't really fully absorbed the concept yet, and from one model to the next the "best performing model for my hardware" isn't always the same. The cozy spot is around 16GB even though the device has 24GB. What am I missing? What settings should I tweak?
@83erMagnum • 29 days ago
I'd be interested in this too. There is so little specific content for 24GB VRAM machines. The demand should be there, since it is the only affordable solution for most.
@justtiredthings • 27 days ago
I've got a single 3090. The 32B quant is pretty slow (~1.5 tokens/s), but the 14B model is surprisingly decent for its size and reasonably fast (7-10 tokens/s).
@CheesecakeJohnson-g7q • 24 days ago
@justtiredthings I run Codestral 22B smoothly here at Q5_K_M, and at Q6/Q7/Q8 at an increasingly unsatisfying speed, but it runs.
@mrorigo • 28 days ago
You should clear the context before trying the next challenge, no?
@mrorigo • 28 days ago
As I said, clear the context before you try a new challenge, or your responses will be confused.
@DigitalSpaceport • 28 days ago
Purposefully done this way at this time. I have explained why a few times before.
@justtiredthings • 27 days ago
Please, please, please test it on an M1 or M2 Ultra. I'm dying for someone to demonstrate the speeds on Apple's efficient chips.
@DigitalSpaceport • 27 days ago
I can send you an address, and Amazon can deliver a gift? 😀
@justtiredthings • 27 days ago
Haha, fair enough, but I don't know what you've got, given that supercomputer setup. M-Ultra chips actually seem like the economical option for mid-sized LLM inference, but I haven't been able to see enough testing results to confirm that; they're weirdly difficult to find.
@xyzxyz324 • 19 days ago
The AI-model madness is getting too complicated. As an end user I have to evaluate a freakish number of parameters: hardware needs, fine-tunes, the role to dedicate the model to, the main subject it was trained on, top-p value, top-k value, penalty, temperature, and so on. I need an AI model to help me find the most easygoing one for my needs! And by the way, they are big in size, and the hardware requirements are going crazy. Someone, please collect all the AI-model knowledge in one place and create an easy interface with few parameters. Right now, getting, using, and hosting an AI model is becoming more expensive and complicated than owning a real brain.
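For what it's worth, most of the knobs named there live in one small options dictionary. A sketch against Ollama's REST API (a local server on the default port is assumed; the model tag and values are illustrative, not recommendations):

```python
import json
import urllib.request

# The usual sampling knobs, all in one place. Values are illustrative.
payload = {
    "model": "qwen2.5:7b",  # placeholder model tag
    "prompt": "Name three uses for a spare GPU.",
    "stream": False,
    "options": {
        "temperature": 0.8,    # randomness of sampling
        "top_p": 0.9,          # nucleus-sampling probability cutoff
        "top_k": 40,           # sample only from the 40 likeliest tokens
        "repeat_penalty": 1.1, # discourage verbatim repetition
    },
}

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```

Leaving the options block out entirely also works; the server then falls back to its defaults, which is close to the "few parameters" experience the comment is asking for.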
@DigitalSpaceport • 15 days ago
I found a pretty decent interface with far fewer knobs that I will be reviewing: AnythingLLM. I think it might fit your needs.
@mrorigo • 28 days ago
Sentence, not Sentance. For real, you have LLMs to correct your spelling, no?
@DigitalSpaceport • 28 days ago
It's easy to toss hay from the sidelines, which is why I urge everyone to get into YouTube themselves. It is a very humbling journey as a solo producer. Especially hard are the viewers who catch some detail you must have missed, when you have no idea what they are talking about because they don't give any context.