Research Forum 4 | Keynote: Phi-3-Vision: A highly capable and "small" language vision model

12,334 views

Days ago

Comments: 21

@JaredWoodruff 2 months ago

The Phi series never fails to surprise me; combined with ONNX Runtime, it's really portable and powerful. I'm using Phi-3.5 Instruct at the moment for enterprise clients and it's performing very well. Looking forward to adding the vision model into the mix too. Fantastic work, MSR team, keep up the amazing work! Small, Smart and Scalable for the win! 🚀

@quickpert1382 2 months ago

A realistic voice decoder alongside that image encoder is all we need now. Hope the Meta folks aren't going to be late to the small vision model party.

@WearyTimeTraveler 2 months ago

The Phi models are truly impressive; excited to see the future work around embodiment. My only hope for the future is that frozen weights from different training stages become available to download.

@GNARGNARHEAD 2 months ago

Open source, let's go!

@sammcj2000 2 months ago

Microsoft hasn't published the model in the most widely used format (GGUF), though, meaning that unless the community does the work, it won't be usable in common tooling such as llama.cpp, Ollama, etc.

@ChristianNode 2 months ago

What do you mean, @sammcj2000?

@ahmedtremo 2 months ago

Great and concise explanation, thanks!

@n8works 2 months ago

This was a detailed and interesting video. Congrats on the achievement.

@renereiche 2 months ago

Phi-3 is absolutely incredible: super capable, yet resilient to misuse and always kind and understanding. It's already magical at this size, and it's even good at math. However, I think Microsoft should choose the parameter sizes of the different versions more smartly with regard to current device hardware.

@markmatzke 2 months ago

Fantastic presentation! I'm particularly interested in how the Phi-3-Vision model's performance compares to other vision-language models in terms of scalability across different hardware platforms. It seems like a game-changer for integrating vision capabilities with language understanding. Also, how do you see the model evolving to address emerging challenges in diverse data contexts? Looking forward to seeing its future applications and updates!

@tamineabderrahmane248 2 months ago

Phi-3-Vision has the same structure as PaliGemma, and both are open source. Great!

@p4r7h-v 2 months ago

brilliant

@ChristophBackhaus 2 months ago

So how well does this do for extraction from PDFs compared to OCR?

@r.m8146 2 months ago

awesome

@YiKidane 2 months ago

specswriter AI fixes this. Highly capable small vision model.

@sammcj2000 2 months ago

Needs a GGUF!

@octaviusp 2 months ago

How can I join the Microsoft Research team? That's one of my life goals, and I will reach it.

@fahnub 2 months ago

Microsoft catching up

@getasmilefix 2 months ago

LFG

@edi.maulana 2 months ago

Okay, great, but I have to turn on subtitles now.

@bilalazhar4495 2 months ago

The fucking contrast of the text transparency looks like straight garbage. Microsoft needs to fire all the modern art majors on its design team in the next layoff round.