Microsoft's Visual ChatGPT using LangChain

Views: 13,744

Comments: 21

@ItsRyanStudios 1 year ago

This is awesome: GPT4 multimodal capabilities without GPT4 👍 Also great to see so many open source models being put to use. It shows how important open source is in enabling independent groups to build more complex systems.

@johntanchongmin 1 year ago

Very detailed run-through. It seems LangChain is the brain of the decision-making, since the Visual ChatGPT code does not explicitly call those tools.

@kevinehsani3358 1 year ago

You made a great comment here that has had me confused for a while: "The language model decides whether it needs a tool or not". I guess for now we may need agents, but won't agents be obsolete once language models are trained further and do not need to be told what tools they need?

@samwitteveenai 1 year ago

I think tools are always good, as they can get up-to-date info etc., whereas the models can't be totally up to date. Also, model weights are not good for storing facts; this is one reason we see a lot of hallucinations.

@kevinehsani3358 1 year ago

@samwitteveenai Weights are created using facts or data. If we guide the model in steps, hallucination decreases. I understand you can't save every combination; after all, the game of Go alone has 10 to the power of 167 combinations. Of course one needs tools. But my thought is that the models can be fine-tuned as new tools come along, so at some point do we really need agents in the long run? As I understand it, an agent is the declaration of tools to use. Of course tools are necessary, especially for mathematical calculations, structured data, etc.

@lloydsloan1349 1 year ago

I love this tool! I'm trying to integrate it with some of my projects. I also have been interested in how to implement longer memory in ChatGPT in my own coding projects. I noticed around 11:05 the ConversationBufferMemory. Can you tell me more about this function?

@samwitteveenai 1 year ago

Hi, I have a whole video on types of memory and how to use them here: kzbin.info/www/bejne/jmaYYY2Yr8SFhac (hope this helps).
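
For readers who want the gist without the video, here is a minimal sketch of the idea behind a conversation buffer memory: keep every turn and prepend the full history to the next prompt. This is plain Python written for illustration, not LangChain's actual `ConversationBufferMemory` implementation; the class name `SimpleBufferMemory`, its methods, and the `Human:`/`AI:` prefixes are all assumptions.

```python
from typing import List, Tuple


class SimpleBufferMemory:
    """Hypothetical stand-in for a buffer memory: stores every turn verbatim."""

    def __init__(self) -> None:
        self.turns: List[Tuple[str, str]] = []

    def save(self, human: str, ai: str) -> None:
        # Append one completed exchange to the buffer.
        self.turns.append((human, ai))

    def as_prompt(self, new_input: str) -> str:
        # Replay the whole history, then add the new user message.
        history = "\n".join(f"Human: {h}\nAI: {a}" for h, a in self.turns)
        return f"{history}\nHuman: {new_input}\nAI:".lstrip()


memory = SimpleBufferMemory()
memory.save("Generate an image of a cat.", "Here it is: image/cat.png")
prompt = memory.as_prompt("Now make it black and white.")
```

Because the whole history is replayed on every call, the prompt grows with each turn, which is why a plain buffer only works for conversations that fit in the model's context window.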

@viorelteodorescu 1 year ago

Very good and very interesting. Keep it up!

@tarasankarbanerjee 1 year ago

It's awesome!! Thanks for making such great videos.. 👍👍

@bandui4021 1 year ago

LangChain decides which tool to use. But how does it decide which tool to take? Thank you for explaining these concepts in an easy-to-understand way!

@samwitteveenai 1 year ago

It does this with a prompt strategy called ReAct. A number of people have asked about this, so I will make a video on it over the next week or so.
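
As a rough illustration of the ReAct idea mentioned above, the sketch below runs a loop in which the model's text output either names a tool to call or gives a final answer, and each tool result is fed back as an observation. It is not LangChain's or Visual ChatGPT's code: the tool names, the prompt wording, the `react_loop` function, and the `scripted_llm` stand-in (which replaces a real language model so the example runs offline) are all assumptions.

```python
import re
from typing import Callable, Dict

# Hypothetical tools; a real system would wrap image models, search, etc.
TOOLS: Dict[str, Callable[[str], str]] = {
    "calculator": lambda expr: str(sum(int(x) for x in expr.split("+"))),
    "image_captioner": lambda path: f"a caption for {path}",
}


def build_prompt(question: str, scratchpad: str) -> str:
    # Tell the model which tools exist and what output format to follow.
    names = ", ".join(TOOLS)
    return (
        f"Tools available: {names}\n"
        "Reply with 'Thought', then 'Action' and 'Action Input', "
        "or finish with 'Final Answer'.\n"
        f"Question: {question}\n{scratchpad}"
    )


def react_loop(llm: Callable[[str], str], question: str, max_steps: int = 5) -> str:
    scratchpad = ""
    for _ in range(max_steps):
        output = llm(build_prompt(question, scratchpad))
        final = re.search(r"Final Answer:\s*(.*)", output, re.S)
        if final:
            return final.group(1).strip()
        match = re.search(r"Action:\s*(.*?)\nAction Input:\s*(.*)", output)
        if not match:
            return output.strip()  # model ignored the format; give up
        name, arg = match.group(1).strip(), match.group(2).strip()
        tool = TOOLS.get(name)
        observation = tool(arg) if tool else f"unknown tool {name}"
        # Feed the tool result back so the next step can reason over it.
        scratchpad += f"{output}\nObservation: {observation}\n"
    return "no answer within step limit"


def scripted_llm(prompt: str) -> str:
    # Stand-in for a real model: first asks for a tool, then answers.
    if "Observation:" not in prompt:
        return "Thought: I should add these.\nAction: calculator\nAction Input: 2+3"
    return "Thought: I have the result.\nFinal Answer: 5"


answer = react_loop(scripted_llm, "What is 2+3?")
```

The point of the sketch is that the calling code never picks a tool itself; it only parses what the model wrote and dispatches on the tool name.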

@venkatesanr9455 1 year ago

Thanks for the video. I believe it combines features like Stable Diffusion, LangChain and LLMs, but there is no handling of OCR-related documents such as images or PDFs. That is my understanding; if I am wrong, please correct me. Thanks

@samwitteveenai 1 year ago

Yes, you are right, this doesn't do OCR out of the box, but you could add a call to something like Google OCR or TrOCR (huggingface.co/docs/transformers/model_doc/trocr) in a similar way to what is done in Visual ChatGPT. I wouldn't suggest using it like this for documents, though. In a case like that, it's better to just OCR first and then use a vector store, semantic search, etc.

@nark4837 1 year ago

Would you expect this is how GPT-4 actually does it, or not? Because if so, I would still say we are very far off from AGI, since it is just a bunch of interconnected (but separate) models.

@samwitteveenai 1 year ago

No, they are most likely tokenizing the image and passing that in with a special token that represents an image, similar to what the recent PaLM-E model from Google does. That said, we are still a long, long way off full AGI.

@compteprivefr 1 year ago

That's what the brain is. A bunch of separate "models". Visual processing in the brain is completely different from Language processing. So I'd say we are exactly on the right track

@nark4837 1 year ago

@compteprivefr I sort of see your point, but I believe there was some research in which different sensory organs were connected to different cortices in the brain, and the animal quickly learned how to hear through its visual cortex and such. That implies the brain's algorithms are essentially the same across senses, whereas in our case sequential modelling and CNNs obviously involve completely different algorithms.

@stalinamirtharaj1353 1 year ago

How is this different from the DALL·E model offered by OpenAI? Using OpenAI would be very straightforward, wouldn't it?

@akshara8812 1 year ago

Hi, can you please help me create a public link?

@samwitteveenai 1 year ago

Do you mean serving this in the cloud?

@akshara8812 1 year ago

@samwitteveenai Thank you for your reply. I am able to create a public link by passing share=True as an argument to the launch method in the visual_chatgpt.py file.
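
For anyone landing here with the same question, the fix described above can be sketched as follows. Gradio's `launch` method accepts `share=True`, which asks Gradio to create a temporary public URL for the app. The helper name `launch_with_public_link` and the variable name `demo` are assumptions for illustration; the exact variable name used in visual_chatgpt.py may differ.

```python
def launch_with_public_link(demo, **launch_kwargs):
    """Launch a Gradio app and request a temporary public share link.

    `demo` is assumed to be a Gradio Blocks or Interface object, i.e.
    anything exposing a `launch` method that accepts `share`.
    """
    return demo.launch(share=True, **launch_kwargs)


# Typical usage (requires `pip install gradio`, so left commented out):
# import gradio as gr
# with gr.Blocks() as demo:
#     gr.Markdown("Visual ChatGPT demo")
# launch_with_public_link(demo)
```

In practice this is the same as editing the existing `launch(...)` call in the script to include `share=True`.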