Thank you for sharing this research, Zach! Were there any common patterns you noticed in the published research that made it easier for the agents to reproduce the results?
@ZacharySiegel · 7 days ago
Hi Mark, great to hear from you! We included papers from three disciplines: computer science, medical sciences, and social sciences. Agents scored about 20% higher on computer science tasks than on the other two fields. This gap is largely explained by language: computer science tasks tend to be written in Python, whereas medical and social science tasks are more often written in R. The R tasks are harder to reproduce because installing their dependencies is generally much more tedious, and the results are often output as long PDFs that can be difficult to search through. You raise an interesting application, though: if agents could be used to identify why certain projects are not reproducible, authors in different fields could be given more tailored guidance to preemptively address those issues!
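To illustrate the PDF point, here is a minimal sketch (not from our paper) of the extra work an agent has to do when results live in a long report instead of a structured file like a CSV: it extracts text page by page and searches for a reported number. It assumes the `pypdf` package is installed; the file name and target value are hypothetical.

```python
# Minimal sketch: locate a reported value inside a long PDF report.
# Assumes `pypdf` is installed; "analysis_output.pdf" and "0.842" are
# hypothetical placeholders, not values from the benchmark.
from pypdf import PdfReader


def find_value_in_pdf(pdf_path: str, target: str) -> list[int]:
    """Return the 1-indexed pages whose extracted text contains `target`."""
    reader = PdfReader(pdf_path)
    hits = []
    for page_number, page in enumerate(reader.pages, start=1):
        text = page.extract_text() or ""  # text extraction can fail per page
        if target in text:
            hits.append(page_number)
    return hits


if __name__ == "__main__":
    # e.g. check whether a reported coefficient appears in a knitted R report
    print(find_value_in_pdf("analysis_output.pdf", "0.842"))
```

Compare that with checking a single cell in a CSV, which is one line of pandas; the extra extraction and search step is part of why PDF-only outputs make reproduction checks harder for agents.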