Thanks for the pointer to this write-up. It seems quite clear that agentic solutions in the coding space aren't there yet, but as you say, they're likely only going to improve from here.
@jsbgmc6613 11 hours ago
Which LLM does Devin use for this test?
@SasskiaLudin 6 hours ago
This is always the same issue: the utter lack of integrated metacognitive abilities, i.e. the agent's ability to critically assess its own progress toward the goal and, when it is not progressing, backtrack to an alternative approach. Meanwhile it should pile up the subtasks it has completed successfully (memorizing and indexing them so they are available as stepping stones for later reuse), and keep iterating until it reaches a first successful completion (and only later try to optimize it). What is particularly infuriating is when the system gets stuck in a never-ending loop, but this might also just indicate a context window that is too small relative to the size of the code repository it has to address and manage at once...
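The backtrack-and-reuse loop described in this comment can be sketched roughly as follows. This is a minimal illustration only, not how Devin or any real agent works: it assumes a hypothetical `run_subtask` callable that returns a result string on success, or `None` when the agent judges it made no progress. The names `solve`, `approaches`, and `memo` are invented for the example.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical: a runner takes a subtask name and returns a result string,
# or None when the agent's self-check says the subtask made no progress.
Runner = Callable[[str], Optional[str]]


def solve(approaches: List[List[str]], run_subtask: Runner,
          memo: Dict[str, str]) -> Optional[List[str]]:
    """Try each approach (a list of subtask names) in order.

    Abandon an approach at its first failing subtask (backtrack), but keep
    every successfully completed subtask in `memo` so later approaches can
    reuse it as a stepping stone. Return the first complete list of
    results, or None if every approach fails.
    """
    for approach in approaches:
        results: List[str] = []
        for subtask in approach:
            if subtask in memo:             # reuse an indexed stepping stone
                results.append(memo[subtask])
                continue
            result = run_subtask(subtask)
            if result is None:              # no progress: give up on this approach
                break
            memo[subtask] = result          # remember the completed subtask
            results.append(result)
        else:
            return results                  # first successful completion
    return None


if __name__ == "__main__":
    def fake_runner(name: str) -> Optional[str]:
        # Hypothetical stub: patching the parser "fails", everything else works.
        return None if name == "patch_parser" else f"done:{name}"

    plan = solve([["read_repo", "patch_parser"],
                  ["read_repo", "rewrite_parser"]], fake_runner, {})
    print(plan)  # prints ['done:read_repo', 'done:rewrite_parser']
```

Because each approach is tried at most once, this loop always terminates, which is one simple guard against the endless looping mentioned above; a real agent would also need to generate the alternative approaches itself rather than receive them as a fixed list.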
@engineerprompt 4 hours ago
I think a potential solution would be to have two agents: one that performs a task and another that verifies it. They need to be completely independent.
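A performer/verifier split like the one suggested here might look roughly like this sketch. It is only an illustration under stated assumptions: `performer` and `verifier` are hypothetical callables standing in for two separate agents, the verifier sees nothing but the task and the candidate output (not the performer's reasoning), and feeding the verifier's complaint back to the performer with a `max_rounds` cap is a design choice added for the example.

```python
from typing import Callable, Optional

# Hypothetical agent interfaces:
#   performer(task, feedback) -> candidate output
#   verifier(task, candidate) -> None if accepted, else a description of the problem
Performer = Callable[[str, str], str]
Verifier = Callable[[str, str], Optional[str]]


def perform_and_verify(task: str, performer: Performer, verifier: Verifier,
                       max_rounds: int = 3) -> Optional[str]:
    """Alternate between the two agents until the verifier accepts.

    The verifier only receives the task and the candidate, so it cannot
    inherit the performer's assumptions. Returns the accepted candidate,
    or None if nothing passes within `max_rounds` (a bound that prevents
    an endless perform/reject loop).
    """
    feedback = ""
    for _ in range(max_rounds):
        candidate = performer(task, feedback)
        problem = verifier(task, candidate)
        if problem is None:
            return candidate
        feedback = problem          # only the verdict flows back, not internal state
    return None


if __name__ == "__main__":
    # Hypothetical stubs: the performer gets it wrong until it receives feedback.
    def performer(task: str, feedback: str) -> str:
        return "6" if not feedback else "5"

    def verifier(task: str, candidate: str) -> Optional[str]:
        return None if candidate == "5" else "expected 5"

    print(perform_and_verify("2 + 3", performer, verifier))  # prints 5
```

In practice the independence the comment asks for would come from running the verifier as a separate model call with its own context (for example, executing tests against the candidate), which this toy stub only gestures at.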