Language or Vision - What's Harder? (Ilya Sutskever) | AI Podcast Clips

  32,527 views

Lex Fridman

1 day ago

Full episode with Ilya Sutskever (May 2020): • Ilya Sutskever: Deep L...
Clips channel (Lex Clips): / lexclips
Main channel (Lex Fridman): / lexfridman
(more links below)
Podcast full episodes playlist:
• Lex Fridman Podcast
Podcasts clips playlist:
• Lex Fridman Podcast Clips
Podcast website:
lexfridman.com/ai
Podcast on Apple Podcasts (iTunes):
apple.co/2lwqZIr
Podcast on Spotify:
spoti.fi/2nEwCF8
Podcast RSS:
lexfridman.com/category/ai/feed/
Ilya Sutskever is the co-founder of OpenAI, is one of the most cited computer scientists in history with over 165,000 citations, and to me, is one of the most brilliant and insightful minds ever in the field of deep learning. There are very few people in this world who I would rather talk to and brainstorm with about deep learning, intelligence, and life than Ilya, on and off the mic.
Subscribe to this YouTube channel or connect on:
- Twitter: / lexfridman
- LinkedIn: / lexfridman
- Facebook: / lexfridman
- Instagram: / lexfridman
- Medium: / lexfridman
- Support on Patreon: / lexfridman

Comments: 60
@darshantank554 2 years ago
"where the vision ends, language begins" this line touches my heart!
@JonKroeker 1 year ago
Not only is this guy brilliant, he’s just such a nice guy
@bubelevakalisa7313 3 years ago
Vision ends when the viewer (agent 1) sees the words. Language begins when the viewer (agent 1) combines the words it has seen with "prior knowledge" and then communicates "value added" information to a listener (agent 2). For example, when agent 1 "sees" (vision) the name Lewis Hamilton, it must be able to use its knowledge about Hamilton to effectively engage in a coherent conversation with an expert about this great F1 driver. At the moment, state-of-the-art models like GPT3 can fake a coherent conversation only when communicating with non-experts.
@sidgirase 1 year ago
Vision takes the visual input, the brain caches the sentences, NLP begins? If the cache is out of memory, vision goes back and queries the same input again?
@nikhilvarmakeetha3917 4 years ago
The question "Where does vision end and language start?" was intriguing. It shows a potential final destination that needs to be achieved for DL-based AI.
@breakawaybooks4752 4 years ago
✔️John Venn liked this.
@holgerjrgensen2166 4 years ago
Windows start, should be, On/off, it is here Your illiteracy begin, better open some windows and get some fresh air, and understand the nature of dictator-principle. AI, is illiteracy and superstition, - intelligence can never be artificial. Repeating dead mantras, is Not individual thinking. The development of Consciousness and Language is two sides of the very same development, based on Eternal Principles.
@olivercroft5263 4 years ago
@@holgerjrgensen2166 more like returnal than eternal 🤔😘
@holgerjrgensen2166 4 years ago
What do You mean, if You know what You're saying.
@holgerjrgensen2166 4 years ago
Ru-Mu, okai, can mean almost anything; in Danish, it is åvkæj, just a sound combination.
@DamianReloaded 4 years ago
This is about semantic interpretation: whether image recognition and natural language processing could share the same "back end" for semantic interpretation and abstraction. I wonder if one could train a convolutional NN and a transformer to spit out the same semantic vector, so a natural language description of a picture and the picture itself would be compressed into the same (or similar) vector space coordinates? :/
@Gyringag 4 years ago
There are already datasets for this task: you build a net for NLP and a net for CV, and minimize the KL divergence between the two hidden spaces.
@adambrickley1119 4 years ago
"I am going to explain why"... opens by asking a question, nice!
@Ross-nd6xi 4 years ago
You should get a linguist on, Lex. It might be interesting to talk about the hermeneutic aspect of language learning and interpretation for AGI.
@jamesblankenship3077 4 years ago
This conversation really seemed to enlighten me on how language would have been impossible with sight and hearing. I can see that a word can have many definitions without the presence of a visual or tone of voice. So for the computer to learn. If we relate these few in the algorithm things so that the computer can as we did. If the computer is a rigid piece of electronics, isn't that how life began billions of years ago? Maybe with a better architect.
@FromFame 4 years ago
I literally suffer from the same cosmetic matter this respectable person suffers from. I use a solution daily. I understand how you can get used to it, but please, for the sake of other people, research a solution too. I felt embarrassed to mention it and not many will, but I care about AI and those pushing it forward. Beyond being highly intelligent, you are an attractive person👍
@burkebaby 1 year ago
This was an interesting conversation! Lex - I wonder if the title should be "Language vs. Vision" instead. 6:56 - In terms of Generative AI, can Language and Vision both work to improve each other, like an arms race? How will the AI model and algorithm decide when to determine a pass or fail result for either/or?
@user-my5qk5xu1d 4 years ago
0:49 The Word is "Interdisciplinary"
@ssssssstssssssss 4 years ago
Man. I hate that word.... It stems from artificial boundaries that we've created due to historical happenstance.
@johnniefujita 4 years ago
I believe CNNs and NLP should serve as inputs for decision-making systems, and reinforcement learning should explore the space of actions, states, and target states. So the first two are more like perception constructors, and the last is a decision-space explorer.
@chrisbarry9345 1 year ago
Man this is going to finally get watched by people
@TimoNineSix 3 years ago
once the vision can read the language, the loop is complete
@pratik245 2 years ago
Great Ilya
@justinkiff4159 4 years ago
I think the wife example is quite bad because there is a sexual component in the perception of the other; with a friend there will probably be more objectivity. Also, yes, if you have human-level speech recognition and understanding, you'll have the vision for free. Understanding text is just a primitive form of acquiring information: replace objects in a picture with words and voilà.
@NoOne-uz4vs 4 years ago
0:54 - Does anyone know what those principles are??
@nobodykid23 4 years ago
This is just my ballpark guess, but I think it should be empirical risk minimization and something around the no free lunch theorem. I don't know the third one.
@shreeyatyagi 4 years ago
Yes, the manmade world (physicality) our thought and action is primarily governed by language. So, language is fundamental.
@stevee5718 1 year ago
So interesting to look back at this interview now, in the wake of GPT4.
@timdh100 4 years ago
Lex, how about a podcast with Shai Ben-David on advances on the theoretical side of ML?
@nobodykid23 4 years ago
YES, THIS
@jonomichi2262 6 months ago
I thought the interviewer was smart, but Ilya is on a different level.
@joshuaerkman1444 2 years ago
Language has much higher dimensionality than vision. Vision has three basic dimensions and that could probably be abstracted up to thousands or millions. Language has over 6,500 basic dimensions. The abstraction of these basic dimensions may go into the trillions
@leecharlie2513 3 years ago
Which field has more jobs (NLP or CV)? It seems to me that so far there are a lot more applications for CV, and therefore CV has more job opportunities than NLP. Simply search “computer vision job USA” and “NLP jobs USA” on Google; comparing the results shows that CV has more jobs. What are your two cents on it? Maybe I am wrong?
@MrSchweppes 3 years ago
It will change this year or maybe in 2022.
@henrikbergman4055 4 years ago
Throwing out a question here, as there are some clever people in the thread. Anyone care to help me understand why "natural language" (and does that exclude body language and tone of voice?) would be important for AI? As an example; IKEA furniture assembly instructions don't need words to explain stuff to humans. And being a poet is not a requirement for human level intelligence, right?
@seo95 3 years ago
Your examples are more about language generation. Even if that is important, the hot topic nowadays is language understanding. Understanding language hides a lot of very difficult challenges. Among them, reasoning about entities is one of the most difficult ones. Each time we speak, we refer to events that happened in the past and in the present, make implicit relations between entities, and talk about abstract things. Language is the description of the world in which we live and of the abstract world we have created (the concept of nations, politics, jokes, etc.). To understand language, a machine first needs to understand the world we have built. We are far from achieving something like that with AI. How can we pretend to have an "intelligent" machine if it cannot understand us?
@styles9783 4 years ago
Hey Lex
@mohammadaminparchami7462 4 years ago
Hey Lex, one cool thing would be to add some more media to the conversations. Show the guests some clips, read them news, and then we would like to hear their opinion. Great job ✋🏻👏🏻
@IsmaelAlvesBr 4 years ago
The problem is that we are trying to make a robotic brain from scratch. Maybe the solution is to give it initial steps so that it doesn't start from 0. It's like when you learn another language. You already know what a dog is, but need to learn how to say it in another "way" and when you should say it.
@maxsnts 4 years ago
How does that apply? When a baby is born he does not know what a dog is. The only thing he starts with are unconscious behaviors, like "cry if hungry". In that sense starting from scratch seems very similar.
@danielcogzell4965 4 years ago
man.. I find it interesting how I really respect Ilya for what he achieved but I just don't agree with his views on things most of the time.
@pawarboy7 2 years ago
I think vision lags language because it doesn't have a lot of labeled data
@AM-qx3bq 3 years ago
I don't understand the difficulty in the "Where vision ends and language starts" question. I imagine an advanced enough vision system can just recognize that a particular region of pixels represents text; from that point it can be converted to raw text (which is a decades-old solved problem) and then fed to an NLP pipeline for interpretation. Imu, it's not a vision system's role to accomplish language understanding, but it would be ideal if it could at least identify what is text and relay it to the NLP component.
@Priyanka-us8rw 1 year ago
Computer vision fascinating more
@michaelpetronzio6557 4 years ago
You are the most nicest cutest thing!
@ko95 1 year ago
hmmm
@chocolategolemofroidgutand2839 4 years ago
JUST
@olivercroft5263 4 years ago
Rezpect ze russians🇷🇺
@BaikalLV 4 years ago
8:15 such a blue pilled Lex
@jefferysherwood7424 4 years ago
🐸🐸🐸🐸🐸🐸
@shreeyatyagi 4 years ago
Language
@leecharlie2513 3 years ago
Why?
@shreeyatyagi 3 years ago
@@leecharlie2513 because language is a representation.
@leecharlie2513 3 years ago
@@shreeyatyagi But isn’t the recent GPT-3 demonstrating very promising results in generating meaningful text and dialog?
@luisselvera9878 2 years ago
Vision ends when language starts.
@henrychoy2764 2 years ago
hav 2 say that the dumbest animals hav vision but not langwage
@enriquemartinez5647 4 years ago
Read what Lacan says about language. Not Chomsky.