Agreed. I'm curious to see the results of the updated eval.
@person-jw7vb 1 day ago
o1 pro alone will probably heavily impact this paper
@sashetasev505 1 day ago
Awesome treatment of a great paper! It feels like someone is doing a calm ASMR deep dive on the paper that begets Skynet, but that’s beside the point 😂😂 Liked, subbed and commented - may the algo-gods kindly extend your reach ❤
@poiitidis 2 days ago
nice
@CrispinCourtenay 1 day ago
Interesting, but inference and training on such a small number of last-generation GPUs makes this a thought experiment and an intellectual stretch, rather than AI vs. human.
@kaio0777 1 day ago
I agree. Plus, this isn't run on the latest cutting-edge AIs either, and it's just testing, not being put to the grind and optimized in a company setting. Honestly, the real test is to make a shell company run completely by AIs at the top, with only humans doing the blue-collar work. If they can run it as well as humans or better, with less oversight, then work as we know it is cooked in the future.
@SamuelAlbanie1 18 hours ago
@kaio0777 Thanks for the comment. I agree that having a company completely run by AIs would be more representative of full automation. However, it's worth noting that a key goal of this kind of work is to make measurements before full automation occurs (often with the goal of informing safety mitigations that would be best to set up in advance).
@SamuelAlbanie1 17 hours ago
@CrispinCourtenay Thanks for the comment. I agree the comparison is imperfect, but I think it does a reasonable job of capturing R&D tasks over the time period of one working day. Also, it's worth noting that many AI R&D experiments are conducted at small scales on older generations of hardware to reduce cost (so in that respect, it is not necessarily unrealistic).
@kaio0777 14 hours ago
@SamuelAlbanie1, yeah, measurements are good - that's why I watched the video - but what they are testing might not work out in practice imo.