NEW - Anthropic Updated Claude Models & Computer Use Agents!!

16,523 views

Sam Witteveen

1 day ago

Comments: 53
@drhxa 1 month ago
Used new Sonnet 3.5 today for work (coding). It's def a solid improvement. I'd say it's on par with o1-preview or o1-mini but much faster. Haven't had a chance yet to try it with very long instructions because Claude models are typically super strong on instruction following. Can't wait to keep building with it tomorrow!
@richardadonnell 1 month ago
🎯 Key points for quick navigation:

00:00:00 *🚀 Introduction and New Model Overview*
- Announcement of two new Claude models: 3.5 Sonnet and 3.5 Haiku.
- Overview of how the new models fit into existing frameworks.
- Mention of Opus 3.5, which is anticipated but not yet available.

00:01:00 *📊 Performance and Benchmark Comparisons*
- 3.5 Sonnet outperforms previous models on most benchmarks.
- Benchmarked against GPT-4o, Gemini 1.5 Pro, and others.
- Highlight of SWE Bench score improvement from 33.4% to 49%.
- Focus on agentic tool use and coding enhancements.

00:03:27 *⚡ Haiku Model Details and Future Potential*
- Haiku 3.5 expected to outperform Claude 3 Opus.
- Limitations: initially released as text-only, with image input support to follow.
- Potential for fast and affordable performance in many tasks.

00:04:23 *🖥️ API Development and Computer Interaction*
- Introduction of an API that enables Claude models to interact directly with computers.
- Allows searches and task execution through a browser autonomously.
- Benchmarked on OSWorld; possible risks highlighted.

00:06:20 *🧪 Demonstrations and Precautions*
- Demo videos showcase model abilities like filling Google Sheets and performing searches.
- Identified risks include errors during testing and potential misuse.
- Suggested using a separate computer for safety when testing the API.

00:08:25 *📋 Conclusion and Summary*
- Summary of the benefits of using Sonnet for coding and Haiku for fast tasks.
- Speculation about the release of Opus 3.5.
- Invitation for viewer feedback and future exploration of the API usage.

Made with HARPA AI
@RichardWatson1 1 month ago
Should be some online VM desktop you could use the computer use on. Reduce risks and give more people a way to use it safely.
@davidtindell950 1 month ago
Exciting ! Thank You !!
@jeffsteyn7174 1 month ago
Computer use is going to be a game changer
@samwitteveenai 1 month ago
Yeah this is exactly what I talked about in the Agent-S video yesterday, just didn't expect it to be here so quickly
@tappiera 1 month ago
Is this like RPA on steroids?
@TamasDrNagy 1 month ago
Thanks, very informative
@billybofh2363 1 month ago
A very small thing - but one of my 'bots' that was using Sonnet 3.5 seems to now be automatically aware of the tool/function-calls it has available. As in, it'll mention them in its response as 'something you might want to ask me to do'. Not sure if it's just a quirk - but I never had previous models seem user-facing 'aware' of their available tools. Its responses, with an eye to a nuanced take on its system prompt, also seem much better. Looking forward to trying Haiku!
@devanshoo 1 month ago
computer use has a big big usecase for Software QA specifically. Really excited
@justtiredthings 1 month ago
yeah, one of the biggest missing pieces for a mostly autonomous SWE. if we can automatically feed console errors back into the prompt (easy) and have the agent actually test various aspects of the app (hard), then that's really all you need to set a coding agent up with a list of product requirements, leave it alone for a while, and come back the next day to see what it's managed to build in an iterative fashion
@wendten2 1 month ago
Why did they not change the name to Claude 4 or at the very least 3.6.. Isn't that what those numbers are for?
@samwitteveenai 1 month ago
Agree I almost called it 3.6 in the Thumbnail to show it was new
@SwapperTheFirst 1 month ago
my assumption is that they're using the same architecture, as in 3.5 v1.
@toadlguy 1 month ago
I think it isn't the architecture but the foundation model weights (i.e., the weights may change due to fine tuning, quantization, etc., but based on the same training) that are the same. If you mean architecture as in the Model architecture, I agree 😉
@wendten2 1 month ago
@toadlguy In my understanding the first denominator is the architecture, and the decimals the weight tuning... but that's just from pure intuition
@justtiredthings 1 month ago
OpenAI does the same annoying thing. Why denominate 15 different versions of GPT4 by date instead of just using the versioning number like a normal person
@GNARGNARHEAD 1 month ago
*excitement intensifies!*
@alchemication 1 month ago
Looking forward to comparing gpt4o-mini and the new Haiku, as they definitely have their place. And trying the new Sonnet asap obviously (assuming the price is the same...)
@sheikhfaizan73 1 month ago
How do I use the previous model? I want to use it, but I don't see any option to select it.
@denijane89 1 month ago
Funny, about 4 hours ago, I had one very unfortunate session with Claude in which it basically forgot LaTeX. I wonder if it has something to do with the update. Because it looked VERY odd (like writing pi as a symbol and not as \pi, etc.).
@samwitteveenai 1 month ago
interesting I wonder if that was during the swap over
@denijane89 1 month ago
@samwitteveenai Very likely, because I've never seen Claude be so stupid. But after a few prompts it normalized.
@ukoni8667 1 month ago
That's why he was saying AGI by 2026... the new era of autonomous machines
@aliyananwar3727 1 month ago
Computer use goes beyond the overhyped LangChain agents; to replicate this we need a powerful OCR and a powerful LLM.
@pure0027 1 month ago
The next question is how to make all agents work together and check/verify in one company? Maybe beyond one company.
@marilynlucas5128 1 month ago
I've been waiting for a model that can use Blender efficiently. I describe the scene I want and then it gets to work building the scene in Blender.
@justtiredthings 1 month ago
It looks like Adobe is working on something like that with Project Scenic
@marilynlucas5128 1 month ago
@justtiredthings Oh, I'll check it out. I have been exploring OpenUSD as an alternative. First, using a huge library of 3D assets that an LLM can compose into an OpenUSD scene, I've successfully created a 3D environment. For a more dynamic approach, I've been exploring the use of the Kolmogorov-Arnold theorem to create continuous functions that project 2D Gaussian splats onto a 3D plane. Most projects have been focusing on 3D generation models when tools exist that an LLM should be able to use to produce any 3D scene.
@justtiredthings 1 month ago
@marilynlucas5128 yeah, we really need more tooling like that for consistency in AI filmmaking
@mybocks3 1 month ago
LMAO! 😂 Yellowstone is quite beautiful ❤️
@samwitteveenai 1 month ago
AGI just wants to see people's nice pics
@micbab-vg2mu 1 month ago
an interesting update:)
@8888-u6n 1 month ago
can you make a video on how to use the computer model to do an action 🙂
@samwitteveenai 1 month ago
just released it !
@LeonvanBokhorst 1 month ago
Bring it on 😁
@GiovanneAfonso 1 month ago
please do more
@0011110000111110 1 month ago
Computer use will be great ONCE IT IS RUN LOCALLY. I don't trust cloud machines owned by others to be using my computer; that makes it not my computer anymore, and it's a pain making a VM each time.
@justtiredthings 1 month ago
Hard agree
@r.m8146 1 month ago
Amazing.
@BenKlock-k9w 1 month ago
Chapters?
@AdamTwardoch 1 month ago
Version numbers are kind of useless if vendors don't increase them when they actually upgrade the functionality. I don't know why they wouldn't call the new model Claude 3.6 or so.
@hqcart1 1 month ago
It's the Playwright framework or something similar, with an LLM interacting with it; it's not new.
@dankprole7884 1 month ago
Software services should provide APIs and SDKs. The idea of an agent clicking around a screen like a person is so unbelievably dumb and inefficient.
@yellowboat8773 1 month ago
The assumption is it can be integrated into any computer and system, instead of relying on an API for every part of the computer. Can you imagine trying to write an API just to open a window, on Windows, Mac, and Linux? Multiply that by the thousands of different functions required. Just train the AI to use the PC like we do; it's way more adaptable in the long term.
@dankprole7884 1 month ago
@yellowboat8773 you wouldn't use an API to open a window, you'd use it to get the data directly from the application backend in a reliable way. Front ends are point solutions to the inefficiencies of having a human agent. Why replicate this inefficient and buggy layer?
@Various666 1 month ago
Saying you need another computer makes no sense; just don't use an admin role and do not provide passwords to sensitive content/services.
@merlingrim2843 1 month ago
Yeah, computer use will not pass security audits
@toadlguy 1 month ago
On a Mac (or, I suppose, a Linux box) you could sandbox all app interactions under a user with diminished privileges to protect both your machine and your data. It will be interesting to see which model will prevail. Apple's very complete restrictions, Anthropic's (as I suggest) sandboxed restrictions or Google's (and perhaps MS) lack of restrictions.
@samwitteveenai 1 month ago
Really good point