Devin AI - Is the First AI Software Engineer Ready? Real World Testing Results

Views: 4,269

Days ago

Devin AI promised to revolutionize software engineering as the first autonomous coding agent, but how does it perform in real-world tests? In this video, we dive into a month-long experiment to see whether Devin AI lives up to the hype.
LINKS:
Blogpost: www.answer.ai/...
Hamel's post: x.com/HamelHus...

Comments: 6

@RoryMacdonald-pfff 14 days ago

Thanks for the pointer to this write-up. It seems quite clear that agentic solutions in the coding space aren’t there yet - but as you say, they are likely only going to improve from here.

@jsbgmc6613 14 days ago

Which LLM is used in Devin for this test?

@Cgl190 8 days ago

Hello sir, I want to ask one thing: can mechanical engineering students pursue a career in AI/ML? If yes, how can they start?

@SasskiaLudin 14 days ago

This is always the same issue: the utter lack of integrated metacognitive abilities, i.e. having the agent self-critically assess its own progress toward the goal and, when it is not progressing, backtrack to an alternate approach, meanwhile piling up the successfully completed subtasks (memorizing and indexing them, so they are available as stepping stones for potential reuse), and iterating until the first actual successful completion (and later trying to optimize it). What is particularly infuriating is when the system gets stuck in a never-ending loop, but this might also just indicate a context window that is too small relative to the size of the code repository it has to address and manage at once...

@engineerprompt 14 days ago

I think a potential solution would be to have two agents: one that performs a task and another that verifies it. They need to be completely independent.

@SasskiaLudin 13 days ago

@engineerprompt Yes, but each one has to steer the other in the right direction, so they are not completely independent, or if they are, they need to be reunited by a third (overarching) agent combining both inputs...
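
The ideas in this thread (a performer agent, an independent verifier, an overarching coordinator, backtracking to an alternate approach, and a store of completed subtasks) can be sketched roughly as follows. This is only an illustrative sketch, not how Devin or any real agent framework works: every name here (`Coordinator`, `Performer`, `Verifier`, `approaches`) is hypothetical, and the plain Python callables stand in for what would be LLM-backed agents.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical stand-ins for LLM-backed agents (assumptions, not a real API).
Performer = Callable[[str, str], str]  # (subtask, approach) -> candidate result
Verifier = Callable[[str, str], bool]  # (subtask, candidate) -> accepted?


@dataclass
class Coordinator:
    """Overarching agent that combines the performer and the verifier."""
    performer: Performer
    verifier: Verifier
    approaches: List[str]
    # Verified subtask results, kept as stepping stones for later reuse.
    completed: Dict[str, str] = field(default_factory=dict)

    def solve(self, subtask: str) -> Optional[str]:
        # Reuse a previously verified result instead of redoing the work.
        if subtask in self.completed:
            return self.completed[subtask]
        # Try each approach in turn; a rejection means backtracking to the next.
        for approach in self.approaches:
            candidate = self.performer(subtask, approach)
            if self.verifier(subtask, candidate):
                self.completed[subtask] = candidate
                return candidate
        # Attempts are bounded, so the loop cannot run forever.
        return None


# Toy usage: the verifier only accepts results produced by the "rewrite" approach.
def toy_performer(subtask: str, approach: str) -> str:
    return f"{subtask} via {approach}"


def toy_verifier(subtask: str, candidate: str) -> bool:
    return candidate.endswith("rewrite")


coordinator = Coordinator(toy_performer, toy_verifier, ["patch", "rewrite"])
print(coordinator.solve("fix login bug"))  # prints "fix login bug via rewrite"
```

The performer and verifier never call each other directly; only the coordinator sees both outputs, which is one way to keep them independent while still steering the overall process. Bounding the number of approaches is a simple guard against the never-ending loop described above.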