Multimodal AI Agents Are Revolutionising Image & Video Analysis!

Views: 4,559

Mervin Praison

1 day ago

Comments: 54

@RizwanRizwan2R 18 days ago

Absolutely brilliant, Mervin. Thanks for sharing the knowledge 🎉 👍

@MervinPraison 18 days ago

Thank you

@RizwanRizwan2R 18 days ago

@MervinPraison Please take a look, I am getting this error: Traceback (most recent call last): line 15, in task1 = Task( TypeError: Task.__init__() got an unexpected keyword argument 'images'

@MervinPraison 18 days ago

@RizwanRizwan2R Please upgrade to the latest version: pip install -U praisonai

@RizwanRizwan2R 17 days ago

@MervinPraison Thanks a lot, Mervin, it worked 😃
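
For anyone hitting the same TypeError, here is a minimal sketch of how you might check whether your installed version is the problem before upgrading. It uses only the Python standard library. The classes `OldTask` and `NewTask` are hypothetical stand-ins, not PraisonAI's real classes; in practice you would pass the `Task` class you actually imported.

```python
import inspect
from importlib import metadata
from typing import Callable, Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def accepts_keyword(callable_obj: Callable, name: str) -> bool:
    """Return True if callable_obj can be called with name=... as a keyword."""
    params = inspect.signature(callable_obj).parameters
    by_keyword = (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    if name in params and params[name].kind in by_keyword:
        return True
    # A **kwargs parameter accepts any keyword name.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


# Hypothetical stand-ins for an older and a newer Task class.
class OldTask:
    def __init__(self, description):
        self.description = description


class NewTask:
    def __init__(self, description, images=None):
        self.description = description
        self.images = images or []


if __name__ == "__main__":
    print("praisonai version:", installed_version("praisonai"))
    print("OldTask accepts images:", accepts_keyword(OldTask, "images"))
    print("NewTask accepts images:", accepts_keyword(NewTask, "images"))
```

If the check returns False for the class you imported, the installed release predates the `images` argument and the upgrade command above is the fix.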

@SonGoku-pc7jl 15 days ago

Thanks, this framework is a marvel :) Amazing

@finnpoitier 15 days ago

Great video, thanks! Question about analyzing a video: Do you think the LLM model could create timestamps of individual scenes it identified? This could be useful for automatic video cutting and repurposing.
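
A rough sketch of how scene timestamps could be derived, assuming the model has already returned one short label per sampled frame. That per-frame labelling step is an assumption and is not shown in the video or this thread. The names `Scene`, `group_scenes` and `fmt` are made up for illustration. Consecutive frames with the same label are merged into one scene.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Scene:
    label: str
    start_s: float
    end_s: float


def group_scenes(frames: Sequence[Tuple[float, str]], clip_end_s: float) -> List[Scene]:
    """Merge consecutive frames that share a label into scenes.

    frames must be (timestamp_seconds, label) pairs sorted by timestamp.
    """
    scenes: List[Scene] = []
    for timestamp, label in frames:
        if scenes and scenes[-1].label == label:
            continue  # still inside the current scene
        if scenes:
            scenes[-1].end_s = timestamp  # close the previous scene here
        scenes.append(Scene(label, timestamp, clip_end_s))
    return scenes


def fmt(seconds: float) -> str:
    """Format seconds as MM:SS for a cut list."""
    total = int(seconds)
    return f"{total // 60:02d}:{total % 60:02d}"


if __name__ == "__main__":
    labelled = [(0.0, "intro"), (2.0, "intro"), (4.0, "demo"), (6.0, "demo"), (8.0, "outro")]
    for scene in group_scenes(labelled, clip_end_s=10.0):
        print(f"{fmt(scene.start_s)} - {fmt(scene.end_s)}  {scene.label}")
```

The resolution of the cut points is limited by how often frames are sampled, so a tighter sampling interval gives more precise timestamps at the cost of more model calls.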

@orafaelgf 15 days ago

great, but wouldn't it be better to directly use the crewai framework?

@yazanrisheh5127 18 days ago

How exactly does it understand videos? Does it transcribe the video, or does it cut the video into frames and connect the frame images with text?

@maxcurrent485 18 days ago

There's probably more than one way, but for a good idea of how it's likely being done, look up the Divot research paper that Tencent released alongside the Divot LLM earlier this month. It is on Hugging Face and arXiv.
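
To make the "cut the video into frames" option concrete, here is a small sketch of the sampling arithmetic only. Nothing in this thread confirms that PraisonAI works this way; the function names and the fixed-interval strategy are assumptions for illustration. Decoding the frames and sending them to a vision model are left out.

```python
from typing import List


def sample_frame_indices(total_frames: int, fps: float, every_s: float) -> List[int]:
    """Pick one frame index every `every_s` seconds of video."""
    if fps <= 0 or every_s <= 0:
        raise ValueError("fps and every_s must be positive")
    if total_frames <= 0:
        return []
    # Number of frames to skip between samples, never less than one.
    step = max(1, round(fps * every_s))
    return list(range(0, total_frames, step))


def index_to_seconds(index: int, fps: float) -> float:
    """Convert a frame index back to its timestamp in seconds."""
    return index / fps


if __name__ == "__main__":
    # A 10-second clip at 30 fps, sampled every 2 seconds.
    for i in sample_frame_indices(total_frames=300, fps=30.0, every_s=2.0):
        print(f"frame {i} at {index_to_seconds(i, 30.0):.1f}s")
```

Each sampled frame would then be passed to the model as an image, usually together with a text prompt, which is the "connect the frame images with text" part of the question.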

@orkutmuratyilmaz 18 days ago

awesome! can you make a video about connecting a Streamlit UI with Praison agents?

@MervinPraison 18 days ago

Sure

@60pluscrazy 18 days ago

Praison 🎉🎉🎉

@adamchan4403 17 days ago

Love this ❤

@brianWreaves 16 days ago

🏆 Mate you've really built an impressive tool! I take it voice is on the way???

@RDZ333 6 days ago

Hey, thanks for the info! I'm brand new to all of this. I found this because I'm looking to run an LLM locally and have fluid TTS conversations while watching a YT video, for example, or listening to a podcast and discussing it live together. Is this possible yet with low latency? I'm chatting with GPT about it and it says yes, but I'd like to ask you: is a multimodal split possible, where the model contextually processes audio and video from a cpu source while recognizing my voice separately and carrying on a fairly complex conversation? I'm running a 4080 mobile card, which I guess can run models up to 13B parameters well, but I'm eyeing the new 5080 too. Although it can't handle many more parameters, I'm wondering if latency will be drastically better due to the architecture. Hope this makes sense!

@abdulahadashraf8142 17 days ago

@MervinPraison I tried PraisonAI on Windows 10. I set the system environment variables, and also set them in the terminal using the 'SET' command. I tried 'gpt-3.5-turbo-0125', but PraisonAI always uses gpt-4o. How can I use a different model? Thanks!

@FutureAIUpdates 18 days ago

Hi Mervin, thanks for sharing. Can it do 3D segmentation, for example on .fbx or .obj files? Could you guide me, please?

@verifili 16 days ago

Does PraisonAI support a multi-vector retriever?

@motouman3240 16 days ago

For the vision Agent, if I change the llm to llama3.2 or llava, would it still work? If it does, do I still need to use OpenAI APIs?

@motouman3240 11 days ago

Hi @MervinPraison, any answers?

@adamchan4403 17 days ago

Does the video analysis work with gpt-4o-mini?

@MervinPraison 17 days ago

Yes

@SejalDatta-l9u 17 days ago

Hi Mervin. Does your solution support: 1) Dynamic inter-agent communication: a mechanism for creating a dynamic, conversational flow between agents. In other words, agents talking to each other. 2) An iterative process: agents refining their output based on feedback from other agents and/or the user. 3) Short- and long-term memory? Thanks

@brianWreaves 16 days ago

In a previous video he covers #1 & #2.

@SejalDatta-l9u 16 days ago

@brianWreaves Thanks, Brian. Could you share a link to the video you're referring to? Either I've missed something, I haven't explained myself quite right, or the solution Mervin made wasn't quite what I was looking for. Either way, Happy New Year to you all!

@Tech--Sphere 18 days ago

Thanks for the guide! Can you also include code to integrate these with Gemini and Groq APIs?

@MervinPraison 18 days ago

Sure will do

@RajSingh-of1fs 17 days ago

Can you make a video explaining your GitHub repo and how we can clone it and then use the agent? Also, please help me use a Groq model; I haven't been able to get it working. Please make more videos on coding.

@JonJon-nc1nb 17 days ago

DUDE! This is so good! But what choices exist for piping this through a UI? Even before there is a perfect (Praison UI) solution, how can we use something visually intuitive? I know you're working on it ;)

@brianWreaves 16 days ago

I think he covered a process to add a UI in a previous video, but I'm not certain...

@JonJon-nc1nb 16 days ago

@brianWreaves He said "Thinking of integrating to UI effectively" in the comments. I'm guessing that means it has not been done yet but is in the works.