Insanely Fast LLAMA-3 on Groq Playground and API for FREE

Рет қаралды 26,808

Күн бұрын

Learn how to get started with LLAMA-3 on Groq API, the fastest inference speed that is currently available on the market on any API. Learn how to use the Groq API in your own applications.
🦾 Discord: / discord
☕ Buy me a Coffee: ko-fi.com/promptengineering
|🔴 Patreon: / promptengineering
💼Consulting: calendly.com/engineerprompt/c...
📧 Business Contact: engineerprompt@gmail.com
Become Member: tinyurl.com/y5h28s6h
💻 Pre-configured localGPT VM: bit.ly/localGPT (use Code: PromptEngineering for 50% off).
Signup for Advanced RAG:
tally.so/r/3y9bb0
LINKS:
Notebook: tinyurl.com/57yhf26h
Groq API: groq.com/
TIMESTAMPS:
[00:00] Getting Started with Llama 3 on Grok Cloud
[01:49] LLAMA-3 ON Playground
[03:03] Integrating Llama 3 into Your Applications with Grok API
[05:40] Advanced API Features: System Messages and Streaming
All Interesting Videos:
Everything LangChain: • LangChain
Everything LLM: • Large Language Models
Everything Midjourney: • MidJourney Tutorials
AI Image Generation: • AI Image Generation Tu...

Пікірлер: 46

@engineerprompt Ай бұрын

If you are interested in learning more about how to build robust RAG applications, check out this course: prompt-s-site.thinkific.com/courses/rag

@3choff 2 ай бұрын

Oh my! And it even has function calling too. Looking forward to Whisper integration.

@engineerprompt 2 ай бұрын

Yeah, whisper will be awesome. Need to try their function calling with llama3

@starblaiz1986 2 ай бұрын

Bro this is WILD! Just imagine combining this with agent frameworks like Crew AI! 😮

@hanlopi 2 ай бұрын

very nice explained

@NLPprompter 2 ай бұрын

i watch this with 2x playback speed its generations speed become like a dream come true

@Yusef-uh4wl 2 ай бұрын

Try 4x speed you will reach agi

@NLPprompter 2 ай бұрын

@@Yusef-uh4wl what? LOL

@TzaraDuchamp 2 ай бұрын

Wow, that’s fast. Are you going to test this with function calling in an agentic workflow?

@TheReferrer72 2 ай бұрын

I get 50 tokens per second for the 8b on a 3090 at home. Its a nice model.

@nac341 2 ай бұрын

"we don't care about the responses, we only care about the speed". I can give you an even faster API that just returns random words :)

@Tofu3435 2 ай бұрын

Cool, a fast passphrase generator 😂

@unclecode 2 ай бұрын

Actually author has a point. Picture a scenario with multiple agents working together on a super complex task. You might not even care about their responses or understand their complicated talk, but all you really want is to have 800 tokens per second to handle the task in just a few seconds. At that point, the final response is all that matters. Although I wish that random word generator API or "Infinite monkey theorem" was enough to solve world complex problems 😅

@RickySupriyadi 2 ай бұрын

@@unclecode actually human do that. there was a time when our team as human work together, we must be work ridiculously fast then we all came up with stupid yet efficient way to communicate...

@syeshwanth6790 2 ай бұрын

He means he is not going to test the accuracy of the model in this video. He is demonstrating how fast the api is. There are other videos or articles where performance of these models have been evaluated.

@unclecode 2 ай бұрын

@@RickySupriyadi haha u right, and It's not surprising that we're inclined to use our own human collaboration methods to design multi-agent systems. There's a desire to make AI resemble us.

@zhonwarmon 2 ай бұрын

Cant wait for local models

@TheReferrer72 2 ай бұрын

They have been around since Thursday.

@looseman 2 ай бұрын

70b is fine for local run.

@huyvo9105 2 ай бұрын

Sometimes it is limited, how to handle it?

@mirek190 2 ай бұрын

Why did you set for only 1024 tokens?

@NicolasEmbleton 2 ай бұрын

Do we know how aggressively they quantize? I heard the quantization was pretty aggressive and as an outcome the models aren't "as good" as verbatim. If true, it's a reasonable tradeoff but we just need to know for sure so we can make informed decisions.

@Cingku 2 ай бұрын

Yes I tested it for one of my complex calculation prompt and the one in the Groq (llama 70 billion) is really bad and answer wrongly always...but if I use the one in huggingchat, it will give perfect answer every time! So quantization really decrease the performance drastically and it doesn't matter if it fast when it gives the wrong answer.

@NicolasEmbleton 2 ай бұрын

@@Cingku I had fairly similar outcomes in my tests and stopped using Mistral / Mixtral back then. Maybe the free version target audience is just people testing and that would make sense. But it did not convince me to use the service. I'll give it another paid attempt see if it's any better.

@unclecode 2 ай бұрын

Do u agree, Groq feels way better when u set "stream=False" :)) When you understand "stream" was a way to hide a weakness.

@engineerprompt 2 ай бұрын

I totally agree. Streaming make it worse for Groq but others used it to show they are faster than they actually are :)

@noxplayer-rt9tj Ай бұрын

How use Google Colab&Huggin Face to make Groq+Whisper converter ftom audio file to text with UI?

@CharlesOkwuagwu 2 ай бұрын

Please can you show us end to end fine-tuning llama3 on custom dataset

@engineerprompt 2 ай бұрын

Check the previous video on the channel. Will be making more on fine-tuning.

@Warung-AI-Channel 2 ай бұрын

We just built Llama3 #RAG powered by groq and it's extremely fast 😮

@gazzalifahim 16 күн бұрын

Hello, I am looking for learning something for my thesis. Would you please share a tutorial?

@MrN00N3_ 2 ай бұрын

Can you run Groq locally?

@greendsnow 2 ай бұрын

Wait a second, that's extremely cheap

@snehitvaddi 2 ай бұрын

Llama3 can generate images as well right? Can I use this API to generate images? If so, could please make a tutorial on that or atleast a short? (BTW, subscribed to see an update on that)

@engineerprompt 2 ай бұрын

there is another model on meta.ai which can generate images. Its not part of llama3. I am not sure if its available via api. Will check it out and update on the channel.

@snehitvaddi 2 ай бұрын

@@engineerprompt also, if you don't mind please leave an update as reply to this if you found any update on that

@nexuslux 2 ай бұрын

Notebook link doesn’t work

@engineerprompt 2 ай бұрын

Can you check again, seems to be working on my end.

@abdelhameedhamdy 2 ай бұрын

I did not understand the difference between system and user roles !

@engineerprompt 2 ай бұрын

system role defines the behavior of the model. Think about that as a global instruction that will control the behavior model. "user" role is the actual input from the user. Hope that helps.

@InsightCrypto 2 ай бұрын

so fucked up that you wrote free

@namecUI 2 ай бұрын

You said for free ?! How this is possible ?

@wwkk4964 2 ай бұрын

Groq has too many LPUs that's why

@InsightCrypto 2 ай бұрын

@@wwkk4964 its not free groq has clear pricing for models

@YoussefKareemBouiahadj-xn3od Ай бұрын

Well this is a waste of use.. The mate is asking a question in gallons when hes got an accent away from the united states... You should be asking in litres in the international system units not the imperial system