This New AI Vision Model Beats Everything (Molmo Ai)

Views: 32,034

TheAIGRID

Days ago

Comments: 95

@SwaggerjackProductions 3 months ago

Soon we're gonna have AI create videos about AI that is consumed by AI (that is then summarized by AI for AI)

@Z-Z-W_origin 3 months ago

Only until it becomes ouroboros and poisons itself (and then hallucinates to death.) Then, us humans will have to input real information to continue to improve it. We're already past the golden age of original content (imho). As AI consumes all past data to train, new data will become increasingly valuable. 🙈😅

@lovisakaffe 3 months ago

how do we create protected and data siloed content? some sort of honeypot that leaves us alone from the AI

@Z-Z-W_origin 3 months ago

@lovisakaffe I think the job of the future for humans will be creating data specifically to be used by whatever private company needs it to train or refine their models. And people who do not consume AI data will be the most valuable, since they haven't been influenced by it. Picture entire towns or campuses where they are isolated, for the sole purpose of generating new authentic content. Spend several years in one and then retire to the real world. Dystopian, yes. But realistic? Perhaps. People do crazier things for money already, lol

@shadygamererfan377 3 months ago

Synthetic data!

@shadygamererfan377 3 months ago

Models are already being built on synthetic data.

@brandonm3674 3 months ago

Imagine giving it to a blind person and instructing it to “guide me throughout the day as we navigate downtown New York.” The user just uses an earpiece. They could “see” red lights, people waving, or girls winking. Amazing stuff

@MichaelSmith-lm5sl 3 months ago

- **Introduction to Molmo (00:00)**: The video introduces Molmo, a new family of multimodal AI models that surpass existing standards by enabling advanced interactions with both physical and virtual environments.
- **Key Features (00:34)**: Molmo not only interprets images and text but can also point at what it perceives, enhancing its ability to interact meaningfully.
- **Demonstrations of Functionality (01:27)**: A series of demos showcase Molmo's capabilities, including counting people, converting data to JSON, and answering various queries related to everyday situations.
- **Performance Metrics (03:17)**: Molmo's vision capabilities are highlighted, showing that it competes effectively with larger models, with human evaluations rating it highly against other closed-source models.
- **Data Quality Over Quantity (05:24)**: Molmo's training focuses on high-quality data, using fewer but more detailed images and descriptions to improve learning efficiency and reduce issues like hallucinations.
- **Innovative Data Collection Methods (06:27)**: The model employs detailed human descriptions of images and speech-based data collection to enhance understanding and accuracy in recognizing objects and their contexts.
- **Integration in Robotics (12:00)**: Molmo's vision capabilities are positioned as beneficial for robotics, assisting robots in understanding their environment, identifying objects, and executing tasks efficiently.
- **Conclusion and Future Implications (16:36)**: The video wraps up by reflecting on the rapid advancements in AI technology and hints at upcoming innovations from various companies in the field.

@kronux3831 3 months ago

I’ve been following the A.I. space for a while, and last year, I made a timeline trying to predict when I reasonably expect different products to be released. Updated it again in June. Somehow, nearly every guess gets beaten by at least a couple of months. Very exciting trend to see happening

@Mr.Existence 3 months ago

0:30 closing the gap between open... and proprietary systems. Absolutely brilliant.

@BruceWayne15325 3 months ago

Very impressive. The most impressive bit was actually the end of the demo where they showed it using agents, something the flagship models are still working on. And this is only a 1B parameter model?!?!?

@Merializer 3 months ago

5:55 Garbage in, garbage out. The expression was popular in the early days of computing. The first known use was in 1957 (Wikipedia says).

@chinobino1474 3 months ago

First thing I thought too. It seems obvious. Why take inaccurate samples that will 'poison' your data? Be specific and deliberate.

@picksalot1 3 months ago

SLAMs - Small Language Agentic Models for the win. Vision is critical, as is quality of data. 😎

@Arcanant 3 months ago

-How many ads this video has? -Yes

@koen.mortier_fitchen 3 months ago

I’ll try it. Looks promising. Plz don’t disappoint, plz don’t disappoint 🤞🤞

@meandego 3 months ago

Never trust people who buy a pumpkin latte for $20.

@feeltheomega 3 months ago

You should enable transcriptions

@BanXxX69 3 months ago

Damn, that's crazy!!!

@TheAIExplorer-o5j 3 months ago

I have to admit, the strategy at 8:52 15:29 was so clever. Total respect! 🔥

@Stretesky 3 months ago

Hopefully this will work well for self development and learning training, not only for economic functions.

@christopherd.winnan8701 3 months ago

How does this model work in the real world, beyond the realm of luxury coffee machines and VC honeydew? If I hear another LLM order another silicon valley spiced pumpkin latte, I might just lose it!

@baumwollejr 3 months ago

Just imagine the web versions of the MS apps! It can do analytics, answer emails and do a lot of workflows

@24-7gpts 3 months ago

So you really can't think of other use cases!

@DailyTuna 3 months ago

Hey, just be thankful they’re not using robots to operate K cup coffee machines. Like that task is so hard!😂

@christopherd.winnan8701 3 months ago

@24-7gpts There are so many, but I am sick of valley types using it to order pumpkin lattes.

@christopherd.winnan8701 3 months ago

@baumwollejr Sadly, Copilot and Google's offerings in this area are still pretty lame.

@youtube_summarizer-o4m 3 months ago

Summary of the video (Powered by NEX, an AI tool which summarizes KZbin videos)
Key Points:
1. [Key Point 1]: Molmo AI surpasses large models in vision and interaction.
2. [Key Point 2]: Molmo's 72-billion-parameter model matches GPT-4o in benchmarks.
3. [Key Point 3]: Molmo's data quality focuses on fewer, high-quality images.
Important Details: Here's the timeline
00:00:00 Introduction to Molmo AI
• Speaker introduces Molmo AI's multimodal capabilities.
• Molmo interacts with both physical and virtual worlds.
00:00:37 Demonstrating Molmo's Capabilities
• Molmo demonstrates tasks like counting people and converting tables to JSON.
• Molmo can also write descriptions for items and answer complex questions.
00:02:33 Vision Capabilities and Benchmarks
• Molmo's vision capabilities match state-of-the-art models.
• Molmo outperforms other closed-source models in vision benchmarks.
00:03:50 Data Quality and PixMo
• Molmo uses high-quality data for training, not quantity.
• PixMo gathers detailed descriptions for better AI learning.
00:08:11 Integration with Apple Vision Pro
• Molmo integrates with Apple Vision Pro for enhanced interaction.
• Molmo can answer questions and point to objects in images.
00:11:34 Robotics and Vision
• Molmo aids robotics by improving vision models.
• Molmo helps robots identify and interact with objects.
00:16:07 Conclusion and Future of AI
• Molmo's advancements highlight AI's rapid progress.
• Speaker speculates on future AI advancements from other companies.

@sylversoul88 3 months ago

Has anyone used it? Does it really work agentically?

@zakyvids6566 3 months ago

Is anyone else wondering whether, now that we have text models and vision models, the next logical step would be a model for audio, one that can do audio tasks like TTS, speech-to-speech, training new kinds of audio, etc.?

@godtable 3 months ago

If it is actually this good, it's very impressive.

@Boolvtech_official 3 months ago

Amazing 😮

@paulyflynn 3 months ago

oh Molmo

@AvizuraDnB 3 months ago

The breakthroughs don't stop, do they?

@MrRandomPlays_1987 3 months ago

Can't find Molmo's official site, how come?

@Michael_Jeromy_Kaiser 3 months ago

This is incredible!

@Upstatecashew 3 months ago

How can I test out the vision model where the guy tells it to order him a coffee?

@simonstrandgaard5503 3 months ago

Amazing

@TheRealChrisVeal 3 months ago

*grabs popcorn*

@Nightstorm-2516 3 months ago

20 bucks for a cup of coffee?!!!!

@sephirothcloud3953 3 months ago

Molmo 1B = 29GB, how do you load this?

@24-7gpts 3 months ago

Decent GPU

@marcelogobello9757 3 months ago

For lazy people it's PERFECT!

@DieselBlack-b6r 3 months ago

I wish this channel would at least make some effort to de-ChatGPT the script before laying down the narration. The steady supply of contrasting sentence structures is a dead giveaway.

@sights33r14 3 months ago

Imagine this AI model playing Minecraft.

@pollywops9242 3 months ago

Ouch my privacy though

@DiegoFernandez-cy3fr 3 months ago

Future is going to be BaaS (Brain as a Service)

@JohnsonNong 3 months ago

cool❤

@ChronicKPOP 3 months ago

At the start it says "Momo"; I thought it meant the one from Twice

@galailliz 3 months ago

Yes papi beat it

@VanSocero 3 months ago

It's decent, but it's not where it needs to be. Tried to take text off of a comic page and had to add about 79% of it myself

@Cory-v4w 3 months ago

So in this objective reality... who has the pleasure of superposition upon observation? How much pleasure can you give me? How much pleasure can you make mine? We are having fun today. Today is a good day to blur the lines of reality.

@janweber1699 3 months ago

real voice = sub

@69x 3 months ago

😢reading off ai script for that intro “diving”

@24-7gpts 3 months ago

I use "diving" too as a human being, because it's a human that made the word

@JellySword8 3 months ago

Stay at home text adventure here we come

@NneonNTJ 3 months ago

That whole presentation feels fake to me; let's wait and see when it releases

@Subpilot1 3 months ago

$20 Latte 🤪

@hai.1820 3 months ago

Military applications are endless...

@mixey01 3 months ago

I'm afraid at one point we might be too dependent on A.I.'s Momo: "Which girlfriend should I date?" Momo: "Is she cheating on me?"

@ZenTheMC 3 months ago

Short term problem. Eventually we'll merge via BCI or nanotech. There's plenty of stuff we no longer do because technology does it for us.

@TheDude_767 3 months ago

Garbage. Tried it, and it doesn't work

@codyfsw 3 months ago

Maybe ask it for some grammar correction 😂

@EliteBankQuant 3 months ago

distilled vision model perhaps!

@DubStepKid801 3 months ago

I was the first person to watch the video so I won 😅

@WillBeebe 3 months ago

sounds like joe joe

@CustomComputing 3 months ago

These were very cherry-picked scenarios. The model is not that impressive. I tested it, and its efficiency rate is actually a little bit bad.

@Natural_beauty1212 3 months ago

Hello everyone, I hope everyone is doing well. If someone wants to create AI videos but doesn't know how to make a quality AI video, I am here to help you generate videos, images, and avatars

@JohnnyColchester 3 months ago

I tested this Molmo AI and it is the worst AI I ever came upon! It hallucinates lies, and if you fool it with data, for example by saying that you saw something in the image, it then says "Oh I forgot about that! You are right...." And if you then tell it the truth it just keeps lying 🤣🤣🤣🤣 It's like a butthurt egotistical teen 😅😅

@robertt8279 3 months ago

This isn't new, is it? Hell, I was taking photos of my refrigerator asking ChatGPT what I could make for dinner last year. It does everything in your demo. And Zapier's new tools make all that possible. All the best.

@СергейФалалеев-й7у 3 months ago

In general yes, but I think the emphasis here is on being able to accurately determine the position of an object and point to it accurately. This is still a mega difficult task for all LLMs. They can't even determine the position of a button on the screen accurately: I ask GPT where to press and it names the button, but ask it to specify the coordinates, or at least the approximate position in percentages, and it can't do it or almost always gets it wrong

@DailyTuna 3 months ago

Visual AI for people that are not used to functioning in the real world? “How long can I park here?” 😂 Yes, the tech can be great for business and such, but it's sad if you have to be dependent on it to function. Amazing that people survived thousands of years without it. 😂