When should you use a Llama 8B, and when should you upgrade to a 405B model? When should you pay for an o1 or o3 model, and why? What can you expect in performance? How is the complexity of your task the defining criterion for choosing the right LLM? All answers in my new, breathtaking video.
New insights into AI systems and advanced LLMs for agents, with a new study on how and when to upgrade your LLM for better reasoning performance (including inference-time reasoning / test-time compute models).
Terms within video explained:
---------------------------------------
import numpy as np

def pass_at_k(n, c, k):
    """
    Calculate the pass@k probability.
    :param n: total number of samples
    :param c: number of correct samples
    :param k: the k in pass@k, i.e. the number of samples drawn from the n generated samples
    """
    # Fewer than k incorrect samples: any draw of k must contain a correct one.
    if n - c < k:
        return 1.0
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))
Breakdown of the formula:
First: np.arange(n - c + 1, n + 1): Generates an array of integers from n - c + 1 to n (inclusive).
Second: k / np.arange(n - c + 1, n + 1): Divides k by each element in the generated array, resulting in an array of fractions.
Third: 1.0 - (above result): Subtracts each fraction from 1.0, yielding one factor (1 - k/i) for each integer i in the array.
Fourth: np.prod(...): Calculates the product of these factors, which equals the probability that all k selected samples are incorrect.
Fifth: 1.0 - (product): Subtracts this product from 1.0 to obtain the probability that at least one of the k samples is correct.
The pass_at_k function provides a probabilistic measure of a model's performance in generating correct code samples within a specified number of attempts (k). Given n generated samples of which c are correct, it estimates the likelihood that at least one of k samples drawn from those n is correct. This metric is particularly useful for evaluating code generation models.
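As a quick sanity check of the breakdown above, here is a minimal sketch that compares the product form with the equivalent closed form 1 - C(n-c, k) / C(n, k), where C is the binomial coefficient. The helper name pass_at_k_closed_form and the numbers n = 10, c = 3 are illustrative assumptions, not taken from the video or the paper.

```python
from math import comb

import numpy as np


def pass_at_k(n, c, k):
    """Product form, as given in the description above."""
    if n - c < k:
        return 1.0
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))


def pass_at_k_closed_form(n, c, k):
    """Hypothetical helper: 1 - C(n-c, k) / C(n, k).

    C(n-c, k) counts the ways to draw k samples that are all incorrect,
    C(n, k) counts all ways to draw k samples out of n.
    """
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    n, c = 10, 3  # assumed example: 10 samples generated, 3 of them correct
    for k in (1, 5, 10):
        print(k, pass_at_k(n, c, k), pass_at_k_closed_form(n, c, k))
```

Both forms should agree up to floating-point rounding. The product form avoids computing large binomial coefficients, which is presumably why the snippet is written that way.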
-----------------------------------------
All rights with the authors:
ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning
by Bill Yuchen Lin, Ronan Le Bras, Kyle Richardson, Ashish Sabharwal, Radha Poovendran, Peter Clark, Yejin Choi
from University of Washington, Allen Institute for AI and Stanford University
@stanford @universityofwashington @allenai
Feb 3, 2025
#airesearch
#chatgpt
#o1
#o3
#education
#stanford