Thank you for sharing this research, Zach! Were there any common patterns you noticed in the published research that made it easier for the agents to reproduce the results?
@ZacharySiegel · 7 days ago
Hi Mark, great to hear from you! We included papers from three disciplines: computer science, medical sciences, and social sciences. Agents scored about 20% higher on computer science tasks than on the other two fields. This gap is largely explained by language: computer science tasks tend to be written in Python, whereas medical and social science tasks are more often written in R. The R tasks are harder to reproduce because installing their dependencies is generally much more tedious, and the results are often output as long PDFs that can be difficult to search through. You raise an interesting application, though: if agents could be used to identify why certain projects are not reproducible, authors in different fields could be given more tailored guidance to preemptively address those issues!
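To illustrate the PDF point, here is a minimal sketch (not from our paper) of the extra work an agent has to do when results live in a long report instead of a structured file like a CSV: it extracts text page by page and searches for a reported number. It assumes the `pypdf` package is installed; the file name and target value are hypothetical.

```python
# Minimal sketch: locate a reported value inside a long PDF report.
# Assumes `pypdf` is installed; "analysis_output.pdf" and "0.842" are
# hypothetical placeholders, not values from the benchmark.
from pypdf import PdfReader


def find_value_in_pdf(pdf_path: str, target: str) -> list[int]:
    """Return the 1-indexed pages whose extracted text contains `target`."""
    reader = PdfReader(pdf_path)
    hits = []
    for page_number, page in enumerate(reader.pages, start=1):
        text = page.extract_text() or ""  # text extraction can fail per page
        if target in text:
            hits.append(page_number)
    return hits


if __name__ == "__main__":
    # e.g. check whether a reported coefficient appears in a knitted R report
    print(find_value_in_pdf("analysis_output.pdf", "0.842"))
```

Compare that with checking a single cell in a CSV, which is one line of pandas; the extra extraction and search step is part of why PDF-only outputs make reproduction checks harder for agents.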