Agent-S: Unleash The Power Of GUI Computer Use Agents!

Views: 11,448


Comments: 35

@steelwolf180 1 month ago

This sounds useful for automated testing: mimic how a user would behave, then interact with a web app or desktop app, or even just carry out workflow tasks as a chaos user.

@Avman20 2 months ago

Businesses often have bespoke apps that have user documentation but no API. I can see Agent S being fabulous for this type of thing.

@samwitteveenai 2 months ago

Totally agree

@kenchang3456 2 months ago

Thanks Sam. It'll be interesting when you can fine-tune this on your domain specific apps, and what the fine-tuning process would look like.

@samwitteveenai 2 months ago

Interesting idea, you could actually pre-fill a lot of the memory with things that would be useful for the app that you want to use it in. In that way you would be customizing for that use case.

@eado9440 1 month ago

I think that Agent Zero, Cline, and OmniParser are already halfway there, and if combined could be even more powerful.

@themax2go 23 days ago

i totally agree

@shiv248_ 2 months ago

Awesome explanation, Sam. Can you do more of these videos explaining papers? It really helps merge understanding between GA and scientific knowledge. Where do you find worthwhile papers, Hugging Face?

@unclecode 2 months ago

You mentioned something I strongly believe in. A generic solution is required. It's just a matter of time before website owners, apps, and platforms realize they need to create specific layers for AI agents and assistants. Creating weird solutions to communicate with apps makes sense for now, while apps can't provide enough API data for AI applications. Website owners will likely have specific markdown with knowledge and instructions for AI, possibly developing a markup language for AI data. We can even include tools our websites or apps want AI to use. Like with robots.txt, website owners will define which parts AI can control. This isn't far off. Even for other products and services like books, music, or movies, authors can include that AI content layer. Until then, IMO we have patchwork solutions that aren't permanent but help us understand the system's needs, weaknesses, and strengths.

@AnmolSharma293 2 months ago

What would be some of these businesses that are ripe for this? Like, what business would be OK with bots using their web product? I'm curious because I can't think of many out there.

@Wotevz 2 months ago

Semantic markup already exists for SEO. Tools and knowledge can be shared in an agent-friendly way. Linked data and OpenAPI snippets point the way.
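
As an illustration of the point about semantic markup, here is a minimal sketch of how an agent might read JSON-LD that a page already embeds for SEO. It uses only the Python standard library. The sample page, the product name, and the helper names (`JsonLdExtractor`, `extract_jsonld`) are made up for this example; the `Product` and `Offer` types are from the schema.org vocabulary.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the parsed contents of <script type="application/ld+json"> tags."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.items.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed blocks instead of failing the whole page


def extract_jsonld(html):
    """Return every JSON-LD object embedded in the given HTML string."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.items


# Hypothetical page: the markup an online shop might already publish for SEO.
SAMPLE_PAGE = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Product",
 "name": "Example Widget",
 "offers": {"@type": "Offer", "price": "19.99", "priceCurrency": "USD"}}
</script>
</head><body><h1>Example Widget</h1></body></html>
"""

if __name__ == "__main__":
    for item in extract_jsonld(SAMPLE_PAGE):
        print(item["@type"], item["name"], item["offers"]["price"])
```

The structured block gives an agent the product name and price without any screen parsing, which is the sense in which existing SEO markup is already agent-friendly.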

@Wotevz 2 months ago

Agents will become economic. Any business that wants to capture agentic $ will do it. It might be behind paywalls or purchase. LLMs grok JSON-LD, so it kinda exists, just haphazardly.

@unclecode 1 month ago

@AnmolSharma293 For instance, at present, marketing is heavily based on search. People have been expected to do the searching themselves, but soon everyone will have a shopping AI assistant. Their AI assistant will search and interact with the internet to find suitable goods and services. At that point, retail owners will start to incorporate features and data that are AI-agent-friendly. And that's the benefit: those who do that, like those with better SEO, will get more sales and conversions. This is just one example of how industries will be convinced to provide a layer in their applications for AI assistants. The same thing happened when digital marketing came into the picture, and gradually all companies and websites realized the need to improve their SEO. So, this is one way of thinking about how it will be in the future.

@unclecode 1 month ago

@Wotevz You are quite right, that's what happened when digital marketing was first introduced. Everyone understood they needed to build things like SEO at the beginning. However, AI assistants can do much more than just provide information on SEO. They have the capability to run an action. I believe this will be a layer that business owners, website owners, and web applications will use to create context or prompt engineering plus some available tools (with their JSON schema) to make it friendlier to AI agents. There will be a special structure allowing businesses to input their context, provide information, and specify their tools. Subsequently, the AI assistant can utilize this setup in real time to generate content and communicate accordingly. For instance, when everyone has a shopping AI assistant, it will browse the internet and access websites that have this feature, making communication easier for the assistant and ultimately contributing to more sales for businesses and better service for users. This is similar to how SEO operates. Perhaps we will have 'SEO Plus' specifically designed for AI agents in the future.
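
To make the "context plus tools with their JSON schema" idea concrete, here is a minimal sketch of what such an agent-facing layer could look like. No standard of this kind exists in the thread or, as far as this sketch assumes, anywhere else: the manifest layout, every field name (`context`, `allow`, `disallow`, `tools`), the `add_to_cart` tool, and both helper functions are hypothetical. The checker covers only required keys and basic types, not full JSON Schema.

```python
# Hypothetical agent manifest: all field names are illustrative, not a real standard.
AGENT_MANIFEST = {
    "context": "Example Store sells widgets. Prices are in USD.",
    "allow": ["/products", "/cart"],   # path prefixes an agent may use
    "disallow": ["/admin"],            # path prefixes an agent must avoid
    "tools": [
        {
            "name": "add_to_cart",
            "description": "Add a product to the shopping cart.",
            "parameters": {
                "type": "object",
                "properties": {
                    "product_id": {"type": "string"},
                    "quantity": {"type": "integer"},
                },
                "required": ["product_id", "quantity"],
            },
        }
    ],
}

# Python types accepted for each supported schema type name.
_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool}


def _matches(value, type_name):
    # bool is a subclass of int in Python, so handle it before the generic check.
    if isinstance(value, bool):
        return type_name == "boolean"
    return isinstance(value, _TYPES[type_name])


def validate_call(manifest, tool_name, args):
    """Return a list of problems; an empty list means the call fits the schema."""
    tool = next((t for t in manifest["tools"] if t["name"] == tool_name), None)
    if tool is None:
        return [f"unknown tool: {tool_name}"]
    schema = tool["parameters"]
    problems = []
    for key in schema.get("required", []):
        if key not in args:
            problems.append(f"missing required argument: {key}")
    for key, value in args.items():
        spec = schema["properties"].get(key)
        if spec is None:
            problems.append(f"unexpected argument: {key}")
        elif not _matches(value, spec["type"]):
            problems.append(f"wrong type for {key}: expected {spec['type']}")
    return problems


def is_path_allowed(manifest, path):
    """robots.txt-style check: a disallow prefix wins, then an allow prefix is required."""
    if any(path.startswith(prefix) for prefix in manifest["disallow"]):
        return False
    return any(path.startswith(prefix) for prefix in manifest["allow"])


if __name__ == "__main__":
    print(validate_call(AGENT_MANIFEST, "add_to_cart", {"product_id": "w-1", "quantity": 2}))
    print(is_path_allowed(AGENT_MANIFEST, "/admin/users"))
```

An assistant would fetch a manifest like this, read the `context`, and validate each planned tool call against the declared parameters before acting, much as a crawler consults robots.txt before fetching a page.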

@muhammadhasnain8177 2 months ago

I have done so many projects after getting a lot of knowledge from you. We need a new video on an image generation model that can handle the text, facial, and body problems.

@alx8439 2 months ago

Several months ago, when the Rabbit R1 device was announced, there was another wave of "large action models": an attempt at training or fine-tuning transformers to do the UI interaction stuff. I wonder where this eventually went? There were a few quite promising products.

@samwitteveenai 1 month ago

Yeah, it seemed they just hyped it too much and then didn't get there. Adept was another interesting one as well.

@GNARGNARHEAD 2 months ago

that's really exciting!

@arungnanaable 2 months ago

Thanks Sam. I'm learning everything by myself and I need help in identifying worthy recent research papers to study. How do you know which ones are good?

@samwitteveenai 1 month ago

Check out /papers on Hugging Face. I mostly see people talking on Twitter, or friends send me links. There are so many papers these days.

@kyoungd 2 months ago

Is it an OP version of N8N?

@40bombala 2 months ago

Very interesting, does this compete with Microsoft UFO?

@muhammadhasnain8177 2 months ago

Create a video on image generation models plz

@megaklis.vasilakis 1 month ago

Funny that the day after this video came out, Anthropic published their computer use API.

@samwitteveenai 1 month ago

totally. I felt this would be coming but never expected it would be the next day.

@davidmetekingi9694 1 month ago

24 hours later... Anthropic brings out computer use.

@samwitteveenai 1 month ago

lol exactly. I felt this would be coming but never expected it would be the next day.

@pensiveintrovert4318 2 months ago

I would be super concerned to allow anything to run directly on my desktop. It could see passwords, cryptographic keys, modify the registry, destroy the system.

@samwitteveenai 2 months ago

What about in a Docker instance?

@pensiveintrovert4318 2 months ago

@samwitteveenai It certainly would mitigate my concerns, as I could possibly peel off "defective" layers, but this Docker container would still be subject to control by my machine, and therefore could be damaged beyond repair. I normally do my development in a Docker container, but if you have a lot of work that accumulates over time in this container, then losing the container is as bad as losing your machine. There would need to be trusted third-party software that can selectively lock out some APIs, applications, and resources, thus limiting the utility of your AI PC overlord.

@tornyu 2 months ago

It might be safer to have it logged in as a different user or on a different machine entirely, and just collaborate with it like you would with a colleague. I wouldn't let a colleague use my computer as me, either.