Glad you clarified the definition of an agent! Many people confuse it with making multiple consecutive LLM calls, which is a "pipeline", not an agent. An agent needs the autonomy to plan, think, and decide. Also, I'm creating a Colab to challenge the model's function-calling ability; I'll share it soon for you to use and review. Another thing: I wonder whether their AWQ quantization is 4-bit or 8-bit. The table suggests 4-bit: AWQ preserves accuracy well and scores better than the other 4-bit methods, yet it scores lower than the 8-bit results, which points to 4-bit. I think 8-bit AWQ is the best choice for most cases.
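For illustration, here is a minimal Python sketch of that pipeline-vs-agent distinction. The `call_llm` helper is hypothetical; plug in whatever chat-completion API you actually use.

```python
# Hypothetical stand-in for any chat-completion API.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your model API here")

# Pipeline: a fixed, hard-coded sequence of LLM calls. No autonomy.
def pipeline(document: str) -> str:
    summary = call_llm(f"Summarize: {document}")
    return call_llm(f"Translate to French: {summary}")

# Agent: the model decides, step by step, which action to take next.
def agent(goal: str, tools: dict, max_steps: int = 5) -> str:
    history = f"Goal: {goal}"
    for _ in range(max_steps):
        decision = call_llm(
            f"{history}\nChoose the next action from {list(tools)} "
            "or answer 'FINISH: <result>' if done."
        )
        if decision.startswith("FINISH:"):
            return decision.removeprefix("FINISH:").strip()
        tool_name, _, arg = decision.partition(" ")
        observation = tools.get(tool_name, lambda a: "unknown tool")(arg)
        history += f"\nAction: {decision}\nObservation: {observation}"
    return "stopped after max_steps"
```

The pipeline always runs the same two calls; the agent loops, observes, and chooses, which is exactly the autonomy being discussed.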
@engineerprompt 7 months ago
I have seen a lot of confusion around function calling and agents and thought it would be helpful. Would love to see the Colab :) I think they probably were using AWQ 4-bit. There are some other formats as well, and it will be interesting to see how they compare. They also did some testing on inference speed with different frameworks (TGI, vLLM, etc.). The results are on their page and are really interesting. Will probably create a video on that soon.
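As a rough sketch of what that framework testing looks like in practice, here is how an AWQ checkpoint can be loaded in vLLM. The model ID below is an assumption; substitute whichever AWQ build they actually published.

```python
# Sketch: loading an AWQ-quantized model in vLLM for a quick speed/quality check.
from vllm import LLM, SamplingParams

# Assumed checkpoint name for illustration only.
llm = LLM(model="Qwen/Qwen2-7B-Instruct-AWQ", quantization="awq")
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["What is function calling in LLMs?"], params)
print(outputs[0].outputs[0].text)
```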
@unclecode 7 months ago
@@engineerprompt Exactly, and to be honest, I think some people do it deliberately for fundraising or hype. It's a marketing move, like when "big data" or "cloud computing" were the hot terms. Now I see people using a function call and calling it an agent. Experts like you have to clarify this for newcomers, so they understand what an agent truly is.
@xXWillyxWonkaXx 7 months ago
I have a random question regarding massedcompute. How many hours do you typically run the LLM for? An hour? More? Less? And how long do you spread it across? A week (assuming you're testing it or building something with it)? Also, what's the overall cost?
@engineerprompt 7 months ago
I usually test new LLMs on their VMs when I need an Nvidia GPU, so it can be an hour or two when a new LLM is released, or more if I am working on a project. They charge per hour. You can run their VM for days or weeks, and I haven't encountered any issues with it. For pricing, I would suggest checking out massedcompute.com/home/pricing/. If you use my code PromptEngineering, you will get a discount on their VMs. I do get a small commission out of it :)
@MrDenisJoshua 7 months ago
Which type of subscription did you get at massedcompute, please? I also wonder how the hours are calculated. Thanks for the video.
@engineerprompt 7 months ago
I usually use them on an hourly basis, whether I am testing a model or doing a training run. Different GPUs have different rates there; I normally use an A6000 on their platform. I'd recommend checking out their pricing page (massedcompute.com/home/pricing/). If you decide to use them, you can use my code PromptEngineering for reduced pricing on certain VMs. I can connect you to my contact there if you need it for enterprise usage. Happy to help.
@MrDenisJoshua 7 months ago
@@engineerprompt No... it's not for enterprise usage, just for a hobby :-) Thanks for the answer.
@zxc15zxc 7 months ago
Very informative. Just curious, why not use LangGraph to manage this?
@engineerprompt 7 months ago
I think they are building their own framework. You could certainly use LangGraph with something like this. Personally, I am not a big fan of frameworks: they add a lot of abstraction and bloat that isn't needed. You are better off writing custom code for your application if you have the time and skills to do that.
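To illustrate the framework-free approach, here is a hedged sketch of a bare tool-calling loop against an OpenAI-compatible endpoint. The base URL, model name, and get_weather tool are all placeholders for illustration, not anything from the video.

```python
# Sketch: a tool-calling round trip with no framework, just the OpenAI-compatible
# chat API that many local servers (vLLM, Ollama, TGI) also expose.
import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # stub tool for demonstration

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Paris?"}]
response = client.chat.completions.create(
    model="qwen2-7b-instruct", messages=messages, tools=tools
)
msg = response.choices[0].message
if msg.tool_calls:  # the model chose to call a tool
    call = msg.tool_calls[0]
    result = get_weather(**json.loads(call.function.arguments))
    messages += [msg, {"role": "tool", "tool_call_id": call.id, "content": result}]
    final = client.chat.completions.create(model="qwen2-7b-instruct", messages=messages)
    print(final.choices[0].message.content)
```

About thirty lines replaces the part of a framework most apps actually use, which is the point being made above.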
@jeffdavis5196 7 months ago
I've found other models performing much better in agentic workflows across many use cases. The results from Qwen2 are often nonsense, so they need to be filtered out, but the 128k context length is nice.
@MeinDeutschkurs 7 months ago
On Ollama we have Q4. How do I pull another quantized model?
@mihaitanita 7 months ago
In the dropdown selector on the left there's a link that says "97 tags". Click on it and you get all the available flavors, q2 to q6.
@engineerprompt 7 months ago
If you scroll down the model card on the Ollama website, they generally list the pull commands for the other quantization levels as well.
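For example, a sketch using the ollama Python client. The exact tag name below is an assumption; check the model's tags page on ollama.com for what is actually available.

```python
# Sketch: pulling a specific quantization tag instead of the default q4.
# CLI equivalent: ollama pull qwen2:7b-instruct-q8_0
import ollama

ollama.pull("qwen2:7b-instruct-q8_0")  # assumed 8-bit tag, for illustration

response = ollama.chat(
    model="qwen2:7b-instruct-q8_0",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response["message"]["content"])
```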
@MeinDeutschkurs 7 months ago
@@engineerprompt I'll have to check this again.
@MeinDeutschkurs 7 months ago
@@engineerprompt I found it under the tags section. Thanks!
@snuwan 7 months ago
How can you run a small model on a phone?
@timhopkins23 7 months ago
There are apps like Ai on Device or Private LLM that allow you to download and run small models.
@yngeneer 7 months ago
So even an IQ2_XXS quant of a 70B model is still far better than an fp16 7B?
@jackgaleras 7 months ago
Thanks
@engineerprompt 7 months ago
:)
@Automan-AI 6 months ago
Now Groq has built a Llama 3 variant with function calling built in.
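As a rough sketch of what that looks like with the Groq Python SDK. The model name and tool schema here are assumptions; check Groq's docs for the currently available tool-use models.

```python
# Sketch: asking Groq's Llama 3 tool-use model to call a function.
import json
from groq import Groq

client = Groq()  # reads GROQ_API_KEY from the environment

tools = [{
    "type": "function",
    "function": {
        "name": "add",
        "description": "Add two numbers",
        "parameters": {
            "type": "object",
            "properties": {"a": {"type": "number"}, "b": {"type": "number"}},
            "required": ["a", "b"],
        },
    },
}]

resp = client.chat.completions.create(
    model="llama3-groq-70b-8192-tool-use-preview",  # assumed model name
    messages=[{"role": "user", "content": "What is 2 + 3?"}],
    tools=tools,
)
msg = resp.choices[0].message
if msg.tool_calls:  # the model may also answer directly
    call = msg.tool_calls[0]
    print(call.function.name, json.loads(call.function.arguments))
```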