Groq API - 500+ Tokens/s - First Impression and Tests - WOW!

18,421 views

All About AI

A day ago

👊 Become a member and get access to GitHub:
/ allaboutai
🤖 AI Engineer Course:
scrimba.com/?ref=allabtai
📧 Join the newsletter:
www.allabtai.com/newsletter/
🌐 My website:
www.allabtai.com
Groq:
www.groq.com
In this video I give my first impression of the Groq API and run multiple tests on it, like real-time speech to speech and comparing Groq to ChatGPT. Groq is a new chip design for running inference on AI apps like LLMs. Thanks to Groq for giving me early access!
00:00 Groq API Intro
00:45 Groq LPU
01:45 Groq Real Time Speech to Speech Test
06:07 Groq vs ChatGPT Test
09:33 Groq Chain Prompting Test
11:15 Conclusion
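The speed tests above can be reproduced against Groq's chat-completions endpoint. A minimal sketch, assuming the `groq` Python package (`pip install groq`) and a `GROQ_API_KEY` environment variable; the model name and the `throughput` helper are illustrative, not from the video:

```python
import os
import time

# Hypothetical helper: generation speed in tokens per second.
def throughput(n_tokens: int, elapsed_s: float) -> float:
    return n_tokens / elapsed_s

if "GROQ_API_KEY" in os.environ:
    # Requires `pip install groq`.
    from groq import Groq

    client = Groq(api_key=os.environ["GROQ_API_KEY"])
    start = time.perf_counter()
    resp = client.chat.completions.create(
        model="mixtral-8x7b-32768",  # illustrative model name
        messages=[{"role": "user",
                   "content": "Explain the Groq LPU in one paragraph."}],
    )
    elapsed = time.perf_counter() - start
    n = resp.usage.completion_tokens
    print(resp.choices[0].message.content)
    print(f"{throughput(n, elapsed):.0f} tokens/s")
```

The same timing wrapper works for the chain-prompting test: feed each step's output into the next request and sum the elapsed times.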

Comments: 41
@kate-pt2ny
@kate-pt2ny 3 months ago
I have joined your membership. Now I just need to get my Groq API application approved. Thank you again, Kris, for sharing.
@cjenkinsiv
@cjenkinsiv 2 months ago
You had to apply?
@kate-pt2ny
@kate-pt2ny 2 months ago
@cjenkinsiv At first you needed to apply for the Groq API, but now you don't.
@GroqInc
@GroqInc 3 months ago
Thanks for the demos. We love what you're doing.
@paul1979uk2000
@paul1979uk2000 3 months ago
Been testing this out online and getting around 400 to 550 tokens per second; it's crazy fast. There are only two models it lets you select, but both are big models and run lightning fast compared to any other AI model I've seen online or locally.
@automatescellulaires8543
@automatescellulaires8543 3 months ago
Is there a difference between the two models, though, when you test them via Groq?
@cjenkinsiv
@cjenkinsiv 2 months ago
It's super fast, but they don't follow long-form instructions as well as ChatGPT.
@josecastroesq
@josecastroesq 3 months ago
That was amazing!
@khalifarmili1256
@khalifarmili1256 3 months ago
Yes, I did enjoy it. Thanks for the video, keep it up ❤
@ATLJB86
@ATLJB86 3 months ago
This guy is too smart for my brain to process
@richarddanielzoom
@richarddanielzoom 3 months ago
SO GOOD!
@thesystemera
@thesystemera 3 months ago
Damn cool stuff.
@theflipbit01
@theflipbit01 2 months ago
You wouldn't believe it: I was experimenting with Siri and the Groq API, and I asked the same question about summarizing the "Attention Is All You Need" paper even before coming across this video. I mean, I am spooked here; what are the odds of that happening? We humans obviously do behave in patterns. lol
@limebulls
@limebulls 3 months ago
A video from you about Avatar AI would be awesome! Haven’t found one yet
@KodandocomFaria
@KodandocomFaria 3 months ago
I was wondering if you know anything about AirLLM? I read that it can run inference on a 70B model on GPUs with as little as 4 GB, but I haven't seen anyone talking about it.
@The_Questionaut
@The_Questionaut 3 months ago
Here's a long wall of text about the thing you're interested in. I think this is very interesting myself; as I understand it, it lets LLMs run on lower-end hardware. I used AI to write this up.

**Understanding AirLLM and Its Significance**
AirLLM is a technique that makes it possible to run a 70-billion-parameter large language model (LLM) on a single 4GB GPU, overcoming traditional hardware limitations. Normally, models that large require far more powerful and expensive hardware, which restricts who can use them.

**How it works:** AirLLM employs two main techniques:
1. **Layer-wise Inference:** Breaks the model into individual layers and loads only the necessary layers into memory during inference, drastically reducing the memory footprint.
2. **Flash Attention:** An optimization within layer-wise inference that loads and executes one layer at a time, further minimizing per-layer memory requirements.

**Why is it significant?** It democratizes large language models, making them usable for personal projects, education, and small businesses. In practice, advanced AI models such as chatbots can run on modest hardware without compromising model performance, bypassing the need for high-end GPUs or excessive RAM. That makes it ideal for scenarios where hardware resources are limited, such as personal projects or low-budget research.
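The layer-wise idea in that comment can be illustrated with a toy sketch. This is not AirLLM's real API; `load_layer` and `apply_layer` are made-up stand-ins showing how peak memory stays at one layer's weights instead of the whole model:

```python
# Toy sketch of layer-wise inference (the idea behind AirLLM), not its real API.
# Each "layer" is loaded from storage, applied, then freed, so only one layer's
# weights are resident at any point in the forward pass.

def load_layer(layer_id: int) -> dict:
    """Stand-in for reading one layer's weights from disk (here: a scale factor)."""
    return {"scale": layer_id + 1}

def apply_layer(weights: dict, activations: list) -> list:
    """Stand-in for a transformer layer's forward pass."""
    return [a * weights["scale"] for a in activations]

def layerwise_forward(n_layers: int, activations: list) -> list:
    for layer_id in range(n_layers):
        weights = load_layer(layer_id)      # load only this layer
        activations = apply_layer(weights, activations)
        del weights                         # free it before loading the next
    return activations

print(layerwise_forward(3, [1.0, 2.0]))  # scales by 1 * 2 * 3 = 6
```

The trade-off is obvious from the loop: you pay repeated load latency per layer in exchange for a much smaller memory footprint.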
@THE-AI_INSIDER
@THE-AI_INSIDER 2 months ago
I have been using the Groq API for Mixtral 8x7B, and it currently lets me use it for free. Any idea how long it will stay free?
@indikom
@indikom 3 months ago
How expensive is that kind of voice conversation using Groq API?
@JaredWoodruff
@JaredWoodruff 3 months ago
I need this kind of inference speed in Skyrim with GPT :D
@Wilkbezstada
@Wilkbezstada 3 months ago
lol, nailed it
@hqcart1
@hqcart1 3 months ago
What matters most is first-token latency; the question is, does Groq have an edge on that?
@sirrobinofloxley7156
@sirrobinofloxley7156 3 months ago
It's amazingly, stupendously and miraculously FAST, isn't it, haha
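First-token latency, as raised in the thread above, is easy to measure yourself when streaming. A minimal sketch of a timing helper; `fake_stream` is a stand-in, and with the Groq Python SDK you would instead pass a `stream=True` chat completion:

```python
import time

# Measure time-to-first-token over any streaming iterable.
def time_to_first_token(stream, clock=time.perf_counter):
    start = clock()
    first = next(iter(stream))  # blocks until the first chunk arrives
    return clock() - start, first

# Stand-in generator; replace with e.g.
# client.chat.completions.create(..., stream=True) from the groq SDK.
def fake_stream():
    yield "Hello"
    yield ", world"

ttft, first_chunk = time_to_first_token(fake_stream())
print(f"first chunk {first_chunk!r} after {ttft * 1000:.2f} ms")
```

Comparing this number across providers, rather than total generation time, answers the "edge on first-token latency" question directly.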
@JNET_Reloaded
@JNET_Reloaded 3 months ago
You didn't show how to set this up. I see the site; now what?
@Ryan-yj4sd
@Ryan-yj4sd 3 months ago
How to use JSON mode with the model?
@cjenkinsiv
@cjenkinsiv 2 months ago
The code is available in the playground.
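For the JSON-mode question above: Groq's endpoint is OpenAI-compatible, which suggests passing `response_format={"type": "json_object"}` should work. A hedged sketch; the `parse_json_reply` helper is hypothetical and the model name is illustrative:

```python
import json
import os

# Hypothetical helper: parse a JSON-mode reply and insist on a JSON object.
def parse_json_reply(text: str) -> dict:
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    return data

if "GROQ_API_KEY" in os.environ:
    from groq import Groq

    client = Groq(api_key=os.environ["GROQ_API_KEY"])
    resp = client.chat.completions.create(
        model="mixtral-8x7b-32768",  # illustrative model name
        # JSON mode typically requires the prompt itself to mention JSON.
        messages=[{"role": "user",
                   "content": "Reply in JSON with keys 'model' and 'speed'."}],
        response_format={"type": "json_object"},
    )
    print(parse_json_reply(resp.choices[0].message.content))
```

As the reply notes, the playground can generate equivalent request code for you.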
@Bigjuergo
@Bigjuergo 2 months ago
What is the price?
@IvanBialotski
@IvanBialotski A month ago
Where's the membership link? You just provided a link to your YouTube channel.
@CoderN1337
@CoderN1337 3 months ago
Amazing video, but can you link the repo?
@ScottWinterringer
@ScottWinterringer 3 months ago
It's actually depressing that hardware made for int math wasn't already built.
@user-or4ks4bs5p
@user-or4ks4bs5p 3 months ago
So, 5x faster than OpenAI, but the card can't be used to train your own models...
@teebu
@teebu 3 months ago
Nvidia killer. I'm sure they're also working on a dedicated training card... and if not them, someone else. A lot of companies are going to try to eat Nvidia's $2T lunch.
@hqcart1
@hqcart1 3 months ago
An Nvidia killer is a company that will beat them on:
1. watts/token
2. $/token
3. first-token latency
4. total generation time
So far I see only #4 has been beaten, and it's the least important aspect.
@MARKXHWANG
@MARKXHWANG 3 months ago
257 LPUs vs 1 H100? Do the math and watch your wallet.
@thierry-le-frippon
@thierry-le-frippon 3 months ago
Not too late for Meta to cancel its order from Nvidia 😅😅😅
@SpectralAI
@SpectralAI 3 months ago
We should start using AI for something more useful than playing games and generating p0rnographic images. It's not a toy.
@Leto2ndAtreides
@Leto2ndAtreides 3 months ago
Groq, so unfriendly about giving access...
@aaronortiz6268
@aaronortiz6268 3 months ago
How? I haven't tried out their API or service
@cjenkinsiv
@cjenkinsiv 2 months ago
What do you mean?