Visual Reasoning

Views: 2,150

hu-po


Days ago

Comments: 9
@sue_green 23 days ago
God I love your streams man! Thank you, thank you so much for what you're doing
@thivuxhale 24 days ago
1:11 starting horn
@alirezaahmadi5018 20 days ago
I watched one of your saved streams for the first time in my life and it's awesome. I enjoyed it so much; please continue with heavy booster, man. We love this job. Good luck.
@wolpumba4099 18 days ago
Visual Reasoning and the Future of AI: A Stream Summary
0:00 Stream Introduction: Host introduces the theme of "visual reasoning" and the use of Google Illuminate to create AI-generated podcasts summarizing the discussed papers.
1:27 Vision Encoder Scaling Laws: Just as with large language models, vision encoders are continually improving, showing a strong correlation between scale and performance.
10:09 Inference Optimization Nuances: Inference for vision-language models presents a unique challenge. Balancing language model size and visual token count is crucial and highly task-specific. Tasks like OCR benefit from a higher number of tokens, while visual reasoning tasks might achieve optimal performance with fewer, even just one.
11:48 GUI Agents: The Future of AI Interaction? The future of AI might be dominated by GUI agents, interacting with existing user interfaces rather than relying on specialized APIs. This is due to the widespread use of GUIs and the inherent efficiency of leveraging existing systems.
26:53 The Dawn of GUI Agents: An exploration of the paper "Dawn of a GUI Agent" reveals successes and failures of agents interacting with software like Microsoft Word and the game Hearthstone.
36:55 Structured Reasoning and Self-Improvement: "LLaVA-o1" employs a structured, hardcoded approach to reasoning, demonstrating better performance through step-by-step analysis. This method can be further enhanced by training on self-generated data.
42:18 Self-Improvement Through Consistency: "Large Language Models Can Self-Improve in Long-Context Reasoning" shows how language models can enhance their performance by analyzing the consistency of their own outputs and fine-tuning based on that analysis.
50:14 Generative World Exploration and Imagining the Future: The "Generative World Explorer" paper explores an agent's ability to imagine future scenarios to make better decisions. This is achieved through a generative video model that envisions potential outcomes.
1:06:14 The Arms Race of Speed and Reasoning: The future likely holds an arms race between optimizing hardware for faster token processing (tokens per second) and the development of ever more complex reasoning chains that require more tokens to process.
1:23:17 Stream Summary: A final summary highlights the key takeaways from the discussed papers, emphasizing the ongoing improvements in vision encoders, the complex landscape of inference optimization, the rise of GUI agents, the potential for self-improving AI, and the future interplay between speed and reasoning.
I used gemini-1.5-pro-exp-0827 on rocketrecap dot com to summarize the transcript.
Cost (if I didn't use the free tier): $0.05
Input tokens: 36990
Output tokens: 542
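The "self-improvement through consistency" idea mentioned at 42:18 can be illustrated with a small sketch. This is only an assumption-laden toy, not the paper's method: it scores each sampled answer by exact-match agreement with the other samples (a simple majority-vote signal) and keeps the most consistent one as a fine-tuning example. The function names `consistency_scores` and `select_training_example` are hypothetical, and the paper's actual scoring may differ.

```python
from collections import Counter


def consistency_scores(answers):
    """Score each sampled answer by the fraction of samples that agree with it.

    Assumption: agreement means exact match after trimming and lowercasing.
    """
    normalized = [a.strip().lower() for a in answers]
    counts = Counter(normalized)
    total = len(normalized)
    return [counts[a] / total for a in normalized]


def select_training_example(question, answers):
    """Pick the most self-consistent answer as a candidate fine-tuning pair."""
    scores = consistency_scores(answers)
    best = max(range(len(answers)), key=lambda i: scores[i])
    return {"prompt": question, "completion": answers[best], "score": scores[best]}


# Hypothetical samples standing in for several generations from one model.
samples = ["Paris", "paris ", "Lyon", "Paris"]
example = select_training_example("What is the capital of France?", samples)
print(example)
```

In a real pipeline the selected pairs would be collected over many prompts and used for fine-tuning; a similarity-based score could replace exact matching for free-form answers.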
@Zoronoa01 23 days ago
This is so informative, please keep them coming. Thank you so much!
@SuperSoloSquad 22 days ago
love your video!
@spirobel2.0 3 days ago
banger stream
@rafaykhattak483 23 days ago
Have they released weights for BlueLM-V-3B?