Faster LLM Function Calling - Dynamic Routes

10,025 views

James Briggs

A day ago

LLM function calling can be slow, particularly for AI agents. Using Semantic Router's dynamic routes, we can make this much faster and scale to thousands of tools and functions. Here we see how to use it with OpenAI's GPT-3.5 Turbo, but the library also supports Cohere and Llama.cpp for local deployments.
In Semantic Router there are two types of route to choose from. Both use the same Route object; the only difference between them is that a static route returns just the Route.name when chosen, whereas a dynamic route also uses an LLM call to produce parameter input values.
For example, a static route will tell us that a query is talking about mathematics by returning the route name (which could be "math", for example). A dynamic route can generate additional values: it may decide a query is talking about maths, but it can also generate Python code that we can later execute to answer the user's query. That output may look like "math", "import math; output = math.sqrt(64)".
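The static vs. dynamic distinction can be sketched in a few lines of plain Python. This is a toy illustration only, not the semantic-router API: the bag-of-words "embedding", the cosine scorer, and the lambda standing in for the LLM call are all stand-ins for demonstration; the real library uses a proper embedding model (OpenAI, Cohere, or a local one) and an actual LLM to generate the parameter values.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # toy "embedding": a bag-of-words term-frequency vector
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class Route:
    def __init__(self, name, utterances, generate_params=None):
        self.name = name
        self.vectors = [embed(u) for u in utterances]
        # a dynamic route carries a generator (stand-in for the LLM call);
        # a static route leaves it as None and returns only its name
        self.generate_params = generate_params

class RouteLayer:
    def __init__(self, routes, threshold=0.3):
        self.routes = routes
        self.threshold = threshold

    def __call__(self, query):
        q = embed(query)
        # score each route by its best-matching example utterance
        scored = [(max(cosine(q, v) for v in r.vectors), r) for r in self.routes]
        score, best = max(scored, key=lambda s: s[0])
        if score < self.threshold:
            return None  # no route is similar enough
        if best.generate_params is not None:
            # dynamic route: also generate parameter values for the query
            return best.name, best.generate_params(query)
        return best.name  # static route: name only

math_route = Route(
    name="math",
    utterances=["what is the square root of 64", "calculate 2 plus 2"],
    generate_params=lambda q: {"query": q},  # stand-in for LLM-generated params
)
chitchat = Route(name="chitchat", utterances=["how is the weather today"])
layer = RouteLayer(routes=[math_route, chitchat])
```

A static route returns only its name, while a dynamic route returns the name plus generated values, mirroring the "math" + Python-code example above.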
⭐ GitHub Repo:
github.com/aurelio-labs/seman...
📌 Code:
github.com/aurelio-labs/seman...
🔥 Semantic Router Course:
www.aurelio.ai/course/semanti...
👋🏼 AI Consulting:
aurelio.ai
👾 Discord:
/ discord
Twitter: / jamescalam
LinkedIn: / jamescalam
00:00 Fast LLM Function Calling
00:56 Semantic Router Setup for LLMs
02:20 Function Calling Schema
04:04 Dynamic Routes for Function Calling
05:51 How we can use Faster Agents

Comments: 36
@NicholasRenotte 4 months ago
James this is freaking awesome!
@MozgGroupVideos 6 months ago
James, thanks for sharing. It could be really useful and can probably make conversational agents with a much simpler architecture.
@jamesbriggs 6 months ago
yeah, I'm yet to build a full agent using this, but I think we could build a fundamentally different type of agent with this idea - I'm sure there would be both pros and cons but I would love to see it done :)
@thinkerisme4363 6 months ago
Great, thanks a lot. I intend to use it in my search and chat app!
@jamesbriggs 6 months ago
do it and let us know how it goes!
@GeorgeFoxRules 6 months ago
‘Agentic workflow’ you’ve earned a new badge #rockstar 🎉
@jamesbriggs 6 months ago
haha
@manojnagabandi9779 6 months ago
Thanks for the video, I really liked the concept. I wanted to ask: does dynamic routing need just an OpenAI encoding model to find the dynamic routes? And does dynamic routing support triggering multiple routes if the query contains info about multiple routes?
@123userthatsme 4 months ago
I'm also looking for simultaneous function calling possibilities. I realize this gets more and more indeterminate with each layer of abstraction though, so I'm kinda half-expecting that kinda thing not to exist yet.
@joehenri1 5 months ago
Hi James, what would be the main difference between a dynamic router and doing RAG? Let's say I have multiple functions. I don't want to pass all of them in my prompt. I could write a few query examples for each of the functions and use a retriever to get the function that I need. What would be the advantage of a dynamic router vs the simple retriever? Thanks for your great content!
@dusanbosnjakovic6588 6 months ago
Any experiments on how many routes can you support? And for things like intents, how do you think we can route to hundreds of routes potentially?
@jamesbriggs 6 months ago
it is essentially an extreme classification problem, and I have seen that specific use-case applied to thousands of classes - so afaict at least that many - I will test this out soon though
@anpham 4 months ago
What is the value of this when we already have NeMo Guardrails? Does the Guardrails config essentially do this and more?
@3GSnapShot 6 months ago
Hey James, I'm more than sure that OpenAI is doing the same semantic search when choosing a function call. Did you try 10 different functions for dynamic routes, and also having an option to bypass the function?
@TommyJefferson1801 6 months ago
Good optimization. I'm currently working on a research problem and will be done within this month. After that, my plan is to contribute to your repo. Also, why not use a small LLM (Phi) to generate code, which you can then run on a code interpreter? I'm suggesting a small LLM because it generates quickly.
@jamesbriggs 6 months ago
that'd be awesome, it would be great to have you contribute! We can integrate OSS LLMs now too, see github.com/aurelio-labs/semantic-router/blob/main/docs/05-local-execution.ipynb
@user-sz1iw4zi4y 6 months ago
This is great work! It seems like this approach limits us to one or the other, and from some testing the routing appears to be greedy. How would we have the option to use either static or dynamic routing?
@jamesbriggs 6 months ago
We can include both static and dynamic routes in the same route layer - it will decide not to use a route if none are similar enough (i.e. it returns Route(None)) - but we do want to add route-specific thresholds to improve the flexibility here, and down the line some auto-optimization seems reasonable too
@user-sz1iw4zi4y 6 months ago
@@jamesbriggs That's true that we can mix route types in the same route layer, but what I mean is that having a static get-time method that modifies the query is no longer possible once you add the function schema. You'll end up getting some sort of TypeError or mapping error. Basically, I want to be able to ask "What time is it in Littleton, Colorado?" and "What time should I eat lunch?", both of which would hit a time-like route but require different logic. And the route layer appears to be greedy, in the sense that if I make a route layer with routes ["get_time_static", "get_time_dynamic"], everything seems to route to get_time_static because the utterances are so similar and it was placed first. I hope my concern makes sense.
@jamesbriggs 6 months ago
@@user-sz1iw4zi4y okay yes, I understand now - it should actually work if you set up the routes with enough utterances to separate them both. What I would recommend is to test with a ton of example questions, and wherever you see the wrong route being chosen, add that query to the correct Route.utterances list - doing this with enough examples should result in the correct routes being identified between them. It has been a while since the logic for route selection was written, so I may need to revisit it - but the choice should be based on which route scores highest and should be independent of the route order.
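The advice in the reply above can be sketched with a self-contained toy scorer (again not the semantic-router API; the word-overlap "embedding" is a stand-in for a real embedding model, and the route names and utterances are hypothetical): given enough distinct example utterances per route, the highest-scoring route wins regardless of the order the routes were defined in.

```python
import math
from collections import Counter

def embed(text):
    # toy "embedding": bag-of-words term-frequency vector
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

routes = {
    # "look up the time somewhere" style queries
    "get_time_static": [
        "what time is it in littleton colorado",
        "what is the current time in london",
        "tell me the time in tokyo right now",
    ],
    # "reason about a good time" style queries
    "get_time_dynamic": [
        "what time should i eat lunch",
        "when is a good time to go to bed",
        "what time should i schedule the meeting",
    ],
}

def choose_route(query):
    q = embed(query)
    # score each route by its best-matching utterance;
    # the winner is the highest score, independent of route order
    scores = {
        name: max(cosine(q, embed(u)) for u in utts)
        for name, utts in routes.items()
    }
    return max(scores, key=scores.get)
```

With overlapping utterance sets the two routes would collapse onto each other, which is exactly why adding misrouted queries to the correct utterances list helps.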
@carterjames199 6 months ago
I have an embedding question: could you use local embeddings with an OpenAI model? Or do the OpenAI models require OpenAI embeddings?
@jamesbriggs 6 months ago
you can use open source embedding models, see here for example :) github.com/aurelio-labs/semantic-router/blob/main/docs/05-local-execution.ipynb
@carterjames199 6 months ago
You're using GPT-3.5, but I don't see you specify a model - can this also work with GPT-4?
@jamesbriggs 6 months ago
yes, it would work even better with GPT-4, and we also support the Cohere LLM and open-source LLMs
@carterjames199 6 months ago
@@jamesbriggs great, can you showcase specifying the model in your next video? Great stuff
@carterjames199 6 months ago
@@jamesbriggs I’m going to build a bit with this and use it for one of my first videos on my channel big ups
@BPRimal 6 months ago
Doesn't the OpenAI API already have this? They use an OpenAPI schema as well. How is this different?
@jamesbriggs 6 months ago
This is a more efficient and deterministic way of doing what OpenAI does with function calling. Rather than providing a description of when to use each tool (which is fed into the LLM call, costing tokens), we define a set of queries that should trigger a route (these do not get fed into the LLM, saving tokens and time). Because we don't feed descriptions into the LLM for decision making, we can also scale the number of tools being used to hundreds, thousands, or more tools (i.e. routes)
@SimonMariusGalyan 6 months ago
Can I use it with open source LLM?
@jamesbriggs 6 months ago
yes, see here kzbin.info/www/bejne/hHimpXV8n9-hmsU
@user-pb5pn1np3m 6 months ago
why can it make this much faster? It uses an LLM too
@jamesbriggs 6 months ago
less token input/output, because we don't need a full agent description with many different tool descriptions etc - instead we use the semantic route to choose the tool and pass it directly to the LLM, which generates the response. The second reason for the faster speed is that not all tools require dynamic (i.e. generated) routes - some can be static, trigger a function to retrieve data (for example), and return directly to the final LLM call - with that you're skipping one of the LLM calls
@llaaoopp 6 months ago
The LLM is only creating the function call output; it does not have to generate a "Thought". That step is done by the semantic routing, which is less versatile but much faster and more stable (in the scenarios the routes account for)
@danielschoenbohm 6 months ago
Great video, thanks! Would love to see more about semantic routing. Did you try it with one of the Mistral models? @jamesbriggs
@jamesbriggs 6 months ago
yeah, it works super well, releasing video on it (hopefully) today