AsyncLM: Asynchronous Large Language Model Function Calling

Ref: arxiv.org/pdf/...
This research introduces AsyncLM, a system that improves the efficiency of Large Language Models (LLMs) by enabling asynchronous function calls. Current LLM function calling is synchronous: the model blocks on each call, which creates a bottleneck. AsyncLM lets function execution and LLM token generation run concurrently by means of an interrupt mechanism, reducing task completion latency. A domain-specific language, CML, manages the asynchronous interactions, and fine-tuning strategies adapt LLMs to handle interrupts. The authors report substantial speedups with accuracy maintained, and note that the interrupt mechanism could also apply to human-LLM and LLM-LLM interactions.
Asynchronous function calling improves LLM efficiency by allowing the LLM to continue generating tokens for other tasks while function calls execute in the background. This reduces latency and improves resource utilization.
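The latency difference can be illustrated with a minimal Python sketch. It is not AsyncLM's implementation: `call_function` and `generate_tokens` are hypothetical stand-ins that only sleep, and the durations are made-up values chosen to make the overlap visible.

```python
import asyncio
import time


async def call_function(name: str, seconds: float) -> str:
    """Hypothetical stand-in for an external function call (e.g. a web fetch)."""
    await asyncio.sleep(seconds)
    return f"{name}:done"


async def generate_tokens(n: int, seconds_per_token: float) -> list[str]:
    """Hypothetical stand-in for LLM decoding; yields control between tokens."""
    tokens = []
    for i in range(n):
        await asyncio.sleep(seconds_per_token)
        tokens.append(f"tok{i}")
    return tokens


async def run_synchronous() -> float:
    # Each call blocks generation until it returns.
    start = time.perf_counter()
    await call_function("read_page", 0.2)
    await call_function("read_sheet", 0.2)
    await generate_tokens(10, 0.02)
    return time.perf_counter() - start


async def run_asynchronous() -> float:
    # Calls are launched in the background while generation continues.
    start = time.perf_counter()
    calls = [
        asyncio.create_task(call_function("read_page", 0.2)),
        asyncio.create_task(call_function("read_sheet", 0.2)),
    ]
    await generate_tokens(10, 0.02)
    await asyncio.gather(*calls)  # collect results once they are needed
    return time.perf_counter() - start


def compare() -> tuple[float, float]:
    """Return (synchronous seconds, asynchronous seconds)."""
    return asyncio.run(run_synchronous()), asyncio.run(run_asynchronous())


if __name__ == "__main__":
    sync_s, async_s = compare()
    print(f"synchronous: {sync_s:.2f}s, asynchronous: {async_s:.2f}s")
```

In this toy setup the synchronous run pays for both calls plus generation in sequence, while the asynchronous run is bounded roughly by the longest of the overlapping activities.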
Here's a breakdown of how it works:
● Synchronous Function Calling: Conventional LLM function calling is synchronous: the LLM pauses token generation and waits for each function call to finish before proceeding. The LLM, a resource-intensive process, therefore sits idle during function execution.
● Asynchronous Function Calling: The LLM and the function call executor operate independently without blocking each other. This resembles asynchronous programming, where events such as function call completions occur independently of the main program flow (here, LLM token generation).
    ○ Interrupt Mechanism: A key aspect of asynchronous function calling is the interrupt mechanism. When a function call completes, the executor notifies the LLM by inserting special "interrupt tokens" into the token stream. The LLM is trained to understand these interrupts and use the returned results in its subsequent processing.
    ○ CML for Asynchronous Interaction: A domain-specific language called CML (Context Markup Language) is used to represent function calls and interrupts. CML provides the necessary context for the LLM and executor to interact asynchronously.
    ○ Fine-tuning for Asynchronous Handling: LLMs are fine-tuned to generate asynchronous function calls, handle interrupts, and notify the serving system when they need to pause and wait for function results.
● Benefits of Asynchronous Function Calling:
    ○ Reduced Latency: Overlapping function-call generation with function execution reduces overall task completion time. This is particularly beneficial when a task involves multiple independent function calls that can execute in parallel.
    ○ Improved Resource Utilization: The LLM keeps generating tokens instead of waiting for function calls to complete, which makes better use of resources and reduces idle time.
    ○ Automatic Parallelism: Asynchronous function calling enables parallelism without requiring prior knowledge of function call dependencies. For instance, in recursive tasks, asynchronous calls can be executed in parallel, similar to a parallel depth-first search.
● Example: A user asks the LLM to "Summarize webpage.html, email the summary to people in attendee.xls, and CC the department chair from the directory." With asynchronous function calling, the LLM issues the calls for reading the webpage, reading the attendee list, and fetching the chair's contact information without waiting on any of them, so all three run concurrently. The LLM keeps generating while they execute; for example, it can begin the summary as soon as the webpage content arrives, even if the other calls are still running.
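The interrupt flow in this example can be sketched as a small deterministic simulation in Python. Everything here is an assumption made for illustration: the `[CALL]`, `[INTR]`, and `[END]` markers are placeholders rather than CML's actual syntax, the function names and durations are hypothetical, and the "model output" is a fixed script instead of a real LLM.

```python
CALL, INTERRUPT, END = "[CALL]", "[INTR]", "[END]"  # placeholder markers, not real CML


def decode_with_interrupts(
    script: list[tuple[str, str]],
    durations: dict[str, int],
    results: dict[str, str],
) -> list[str]:
    """Replay a scripted model output and inject interrupts as calls finish.

    script:    ("call", function_name) or ("text", token) per decode step
    durations: hypothetical number of decode steps each function takes
    results:   hypothetical return value of each function
    """
    stream: list[str] = []
    pending: list[tuple[int, str]] = []  # (step at which result is ready, name)

    for step, (kind, value) in enumerate(script):
        # Executor side: deliver results of calls that have completed by now.
        ready = sorted(p for p in pending if p[0] <= step)
        pending = [p for p in pending if p[0] > step]
        for _, name in ready:
            stream.append(f"{INTERRUPT} {name} -> {results[name]} {END}")

        # Model side: emit the next scripted token or launch a call.
        if kind == "call":
            stream.append(f"{CALL} {value} {END}")
            pending.append((step + durations[value], value))
        else:
            stream.append(value)

    # The script is exhausted: wait for any calls that are still running.
    for _, name in sorted(pending):
        stream.append(f"{INTERRUPT} {name} -> {results[name]} {END}")
    return stream


script = [
    ("call", "read_webpage"),
    ("call", "read_attendees"),
    ("call", "lookup_chair"),
    ("text", "Drafting"),
    ("text", "the"),
    ("text", "summary"),
]
durations = {"read_webpage": 3, "read_attendees": 6, "lookup_chair": 2}
results = {
    "read_webpage": "<page text>",
    "read_attendees": "<attendee emails>",
    "lookup_chair": "<chair email>",
}

if __name__ == "__main__":
    for item in decode_with_interrupts(script, durations, results):
        print(item)
```

All three calls are issued back to back, and each result is spliced into the stream at the step where it becomes available rather than at the point where the call was made. The final loop stands in for the case where the model has to pause and wait for outstanding results.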
Overall, asynchronous function calling significantly improves LLM efficiency by reducing latency, enhancing resource utilization, and enabling automatic parallelism.
Created with NotebookLM