Terry Yue Zhuo "BigCodeBench: Benchmarking Code Generation"

Rohan Alexander

Thursday 11 July 2024, 9am (EDT)
Toronto Data Workshop
Terry Yue Zhuo, Monash University
“BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions”
In this talk, we introduce BigCodeBench, a benchmark that challenges LLMs to invoke multiple function calls as tools from 139 libraries across 7 domains in 1,140 fine-grained programming tasks. Our evaluation of 60 LLMs shows that they are not yet capable of following complex instructions to use function calls precisely: scores reach at most 60%, significantly lower than the human performance of 97%.
Terry Yue Zhuo is a PhD candidate in Computer Science at Monash University and the CSIRO’s Data61. He holds a Bachelor of Computer Science (Honours) from Monash University. He is additionally an associate member of the Sea AI Lab, a visiting scholar at Singapore Management University, and a research technician at CSIRO’s Data61. His research has been published at venues including EMNLP, ICLR, EACL, and TMLR.
