Llama-3.3: The BEST Opensource LLM EVER! Beats GPT-4o! (Fully Tested)

Views: 11,381

1 day ago

Comments: 53

@intheworldofai 1 month ago

Want to HIRE us to implement AI into your Business or Workflow? Fill out this work form: td730kenue7.typeform.com/to/WndMD5l7 💗 Thank you so much for watching, guys! I would highly appreciate it if you subscribe (turn on the notification bell), like, and comment what else you want to see! 📆 Book a 1-On-1 Consulting Call With Me: calendly.com/worldzofai/ai-consulting-call-1 🔥 Become a Patron (Private Discord): patreon.com/WorldofAi 🧠 Follow me on Twitter: twitter.com/intheworldofai Love y'all and have an amazing day, fellas.

@lancemarchetti8673 1 month ago

I used 3.3 in glhf last night to build a qrcode storage concept. After 4 hours I was so excited because the app actually worked. 3.3 was particularly good at figuring out why my qrcode outputs from the design grid were not passing qrcode scanning tests. It turned out that there was a faint grey 1px outline bordering each cell that was exported to the respective image format. I've noticed that keeping explanations simple in the prompting is key. Also, just paste issues from the console without explaining. The LLM is pretty sharp at figuring out the issue. Overall... I'm impressed.

@AaronBlox-h2t 1 month ago

a qrcode storage concept?

@intheworldofai 1 month ago

[Must Watch]: Qwen-2.5: The BEST Opensource LLM EVER! (Beats Llama 3.1-405B + On Par With GPT-4o): kzbin.info/www/bejne/r5WTnJp6rNCZsJIsi=Uh2eCpIWYpcY54Hq DeepSeek-v2.5: BEST Opensource LLM! (Beats Claude, GPT-4o, & Gemini) - Full Test: kzbin.info/www/bejne/o6fTnI1nrqusbdEsi=NR9ChO50-HKJW9Cb Bolt.New + Ollama: AI Coding Agent BEATS v0, Cursor, Bolt.New, & Cline! - 100% Local + FREE!: kzbin.info/www/bejne/kKDSoJ2Mab93g9k

@intheworldofai 1 month ago

Warp: AI Terminal - FREE AI Coding Agent BEATS Cursor, V0 & Bolt! (Generate Full-Stack Apps!): kzbin.info/www/bejne/q4PUpnhme8-pfM0

@intheworldofai 1 month ago

Gemini 2.0 Flash: BEST LLM Ever! Beats Claude 3.5 Sonnet + o1! (Fully Tested): kzbin.info/www/bejne/onanYayCp7qYoq8

@jackinsights 1 month ago

Great analysis mate.

@intheworldofai 1 month ago

Thank you so much!

@MatthewSanders-l7k 1 month ago

Love the enthusiasm for Llama 3.3! I'm definitely trying it out; cheaper and better than GPT-4 Omni is a win-win for me. Huge props to the Meta AI team!

@jeffwads 1 month ago

LM Studio has their own version of the model. Tested the 8-bit and it is fantastic. Refactored a large 1200-line class into 650 lines including notes. Only a few minor adjustments and it worked beautifully.

@SahilP2648 1 month ago

Squashing lines of code doesn't mean it's more performant or even cleaner. I am not sure why you mentioned only lines of code and nothing else.

@HansKonrad-ln1cg 1 month ago

I have a 1 gallon jug, a 10 gallon jug, a 170 gallon jug and a 200 gallon jug. I want to measure exactly 173 gallons. How do I do that? All the models I have tested have failed this. The solution is obviously to pour a full 170 gallon jug into the 200 gallon one and then add a full 1 gallon jug three times to that. All the models either overthink it hopelessly, or don't even add the right amount, or want to fill jugs more than they can hold, etc. Pretty sobering experience in case you got too excited about AGI and stuff.
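The solution described in the comment above can be checked with a few lines of simulation. This is only a minimal sketch: the jug names (`j1`, `j10`, `j170`, `j200`) and the helpers `fill`, `pour`, and `measure_173` are hypothetical names chosen for illustration, and the code replays the commenter's steps rather than searching for a solution.

```python
# Capacities in gallons, taken from the puzzle statement.
CAPACITY = {"j1": 1, "j10": 10, "j170": 170, "j200": 200}


def fill(state, jug):
    """Fill a jug to its full capacity."""
    state[jug] = CAPACITY[jug]


def pour(state, src, dst):
    """Pour from src into dst until src is empty or dst is full."""
    amount = min(state[src], CAPACITY[dst] - state[dst])
    state[src] -= amount
    state[dst] += amount


def measure_173():
    """Replay the commenter's steps: 170 gallons, then 1 gallon three times."""
    state = {name: 0 for name in CAPACITY}
    fill(state, "j170")
    pour(state, "j170", "j200")
    for _ in range(3):
        fill(state, "j1")
        pour(state, "j1", "j200")
    return state


if __name__ == "__main__":
    print(measure_173())  # {'j1': 0, 'j10': 0, 'j170': 0, 'j200': 173}
```

No jug is ever filled past its capacity, and the 10 gallon jug is never needed.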

@alinayossimouse 1 month ago

how is that butterfly a pass? Maybe you should ask Llama 3.3 how many wings a butterfly has

@pr0d1gyvisions74 1 month ago

That butterfly looked like Mr Hankey with wings. 🤣🤣

@intheworldofai 1 month ago

HAHAHA south park ref

@AlexJohnson-g4n 1 month ago

Llama 3.3 sounds like a beast! Can't wait to try it out and see how it compares to GPT-4 Omni. Props to the Meta AI team for making AI more accessible.

@julienarpin5745 1 month ago

It performs no better than Llama 3.1 on my private benchmark of verbal intelligence based on years of items encountered in the IQ testing community. My benchmark has matched prevailing benchmark leaderboards for years but is now diverging, indicating that these LLMs are gaining skillsets but not true intelligence, which is the ability to perform work on arbitrarily distant symbols in concept space, relations modelling, or sensitivity to analogies.

@marcus-b4x3h 1 month ago

Make a video ❤

@8python8 1 month ago

They are no more than freaking parrots. If something is in their training set, you get a reasonable answer; otherwise it's either BS or FOS, depending on how deeply it hallucinates. No creativity whatsoever, just parroting its training set like a 5 yo with a huge SSD.

@julienarpin5745 1 month ago

@pythonate2906 Not exactly true since cranking the temperature up changes some function that enables novelty, which is arguably creativity as a whole. It's all about how you traverse the latent space with prompting and fine-tuning
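For context on the temperature point above: in the usual sampling setup, temperature divides the model's logits before the softmax, so higher values flatten the next-token distribution and make unlikely tokens more probable. Below is a minimal sketch of that scaling in plain Python; `softmax_with_temperature` and the example logits are hypothetical and not taken from any particular library or model.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw logits into probabilities after scaling by 1 / temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    example_logits = [2.0, 1.0, 0.1]  # made-up values for illustration
    for t in (0.5, 1.0, 2.0):
        print(t, softmax_with_temperature(example_logits, t))
```

Running it shows the top token's probability shrinking as the temperature rises, which is the "novelty" effect the comment refers to.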

@threepe0 1 month ago

@8python8 You are no more than a parrot. How you use agents and teams is limited only by your imagination.

@8python8 1 month ago

@threepe0 I guess stating the truth is disturbing. Not just me saying these things. But I wouldn't waste my time arguing. If something is NOT in the training set, parrots won't come up with it. I am testing NEW things, never been tested before, unlike your trivial daily BS.

@xierxu 23 hours ago

Are you running it locally?

@ABUNDANCEandBEYONDATHLETE 1 month ago

Idk if I downloaded a bad version, but my 4090 liquid, 3970X w/256GB RAM is only pumping out like 0.5-2 tokens a sec. That's not functional to even benchmark if it's too slow. The qwen2.5 32B coder is like 30-60 tokens a sec.

@lmaoanatorhilarious2582 1 month ago

Does this have reasoning?

@intheworldofai 1 month ago

Yup. Logical, analogical, and common sense reasoning

@tomgreen8246 1 month ago

Sort of. Not in the o1 Preview sense... let alone o1 Pro (oh. my. God). Yes I dropped the money for it.

@intheworldofai 1 month ago

@tomgreen8246 Wow, let me know how the tests go for the o1 pro!

@lmaoanatorhilarious2582 1 month ago

@tomgreen8246 Yeah, I want that o1 reasoning. How do you like o1 pro? How does it differ from o1 preview?

@UCs6ktlulE5BEeb3vBBOu6DQ 1 month ago

QWEN2.5 72B and QwQ both ace the train question (tried it while watching this)

@intheworldofai 1 month ago

QwQ and the new DeepSeek reasoning model are quite impressive. Great alternatives to o1.

@UCs6ktlulE5BEeb3vBBOu6DQ 1 month ago

@intheworldofai QwQ is better at coding, 72B is better at creating the code. If one could, he'd get 72B to create the program structure and tell QwQ to code it.

@ABUNDANCEandBEYONDATHLETE 1 month ago

What lm-studio settings should I use for this model? The Q4_K_S is so slow on my machine. Loading 131072 context is laughably slow. 4090 with 256GB and 3970X 32-core AMD. Other models have no issues.

@intheworldofai 1 month ago

@ABUNDANCEandBEYONDATHLETE Try decreasing the batch size to a smaller value. Plus you can also set model parallelism to False.

@UCs6ktlulE5BEeb3vBBOu6DQ 1 month ago

@ABUNDANCEandBEYONDATHLETE You are running a model that is slow on a GPU with VRAM speed under 600 GB/s, on a system that offloads too much of the load to memory that operates under 200 GB/s. You are paying the poverty tax with your time. 72B at Q5 runs at 5-6 tokens/s for me. It's slow, but at least I do not offload anything. Get yourself some other GPUs and a few PCI-e risers and increase your total VRAM. That system of yours is overkill for a server, but it's almost useless for LLMs because of memory bandwidth. Sorry.
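The bandwidth argument above can be turned into a back-of-the-envelope estimate. During decoding, each generated token needs roughly one read of the model weights, so throughput is bounded by how fast those weights can be streamed from wherever they live. The sketch below is a rough rule of thumb, not a benchmark: `estimate_tokens_per_second` is a hypothetical helper, the 40 GB model size and 24 GB of VRAM are assumed illustrative values, and the 600 and 200 GB/s figures are simply the ones quoted in the comment. Real throughput will be lower because of compute, PCI-e transfers, and context handling.

```python
def estimate_tokens_per_second(model_gb, vram_gb, gpu_bw_gbps, ram_bw_gbps):
    """Upper-bound decode speed when weights are split between VRAM and RAM.

    Assumes every token requires one full read of the weights, with the
    portion that fits in VRAM read at GPU bandwidth and the rest read at
    system-memory bandwidth.
    """
    on_gpu = min(model_gb, vram_gb)
    on_ram = model_gb - on_gpu
    seconds_per_token = on_gpu / gpu_bw_gbps + on_ram / ram_bw_gbps
    return 1.0 / seconds_per_token


if __name__ == "__main__":
    # Assumed 40 GB of weights with 24 GB of VRAM: 16 GB spills to system RAM.
    print(round(estimate_tokens_per_second(40, 24, 600, 200), 1))  # 8.3
    # Same model with enough VRAM to hold everything.
    print(round(estimate_tokens_per_second(40, 48, 600, 200), 1))  # 15.0
```

Even in this idealized model, the 16 GB that spills to system memory costs more time per token than the 24 GB held in VRAM, which is why the comment recommends adding VRAM rather than CPU or RAM capacity.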