OUTLINE:
0:00 - Intro & Overview
1:20 - Language Models
2:45 - Language Modeling Datasets
3:20 - Model Size
5:35 - Transformer Models
7:25 - Fine Tuning
10:15 - In-Context Learning
17:15 - Start of Experimental Results
19:10 - Question Answering
23:10 - What I think is happening
28:50 - Translation
31:30 - Winograd Schemes
33:00 - Commonsense Reasoning
37:00 - Reading Comprehension
37:30 - SuperGLUE
40:40 - NLI
41:40 - Arithmetic Expressions
48:30 - Word Unscrambling
50:30 - SAT Analogies
52:10 - News Article Generation
58:10 - Made-up Words
1:01:10 - Training Set Contamination
1:03:10 - Task Examples
@larrybird3729 4 years ago
Imagine telling Alan Turing we created a 5.7 trillion bit program to answer "what is one plus one?" lol
@Lumegrin 4 years ago
Hey, maybe it could create numbers
@jorgehenriquesoares7880 4 years ago
He would be amazed
@YYJConstructionUpdates 4 years ago
you can build an AND gate with a few transistors or you can use a 5.7 trillion bit program to infer the meaning of the word "and"
@LukePluto 4 years ago
it took ~370 pages of symbolic logic to show 1 + 1 = 2 in Principia Mathematica
@bofur2727 4 years ago
Yep, that is not unimaginable but existing of these comments in this media without his "what is one plus one" ;)
@that_guy4690 4 years ago
Watching videos about large language models really makes me ask myself: "What is really "human" reasoning?" And how do humans learn stuff? A great point on arithmetic operations!
@derasor 2 years ago
So true. We don't really know how 'reasoning' actually works in the brain, so saying this system is not capable of reasoning has no foundation IMO. Also, you could argue that other beings with biological neural systems are capable of 'reason'.
@georgestheys2752 1 year ago
That is the essential question .. if reasoning is the ability of the brain to come up with the next word, based on a word sequence, that is most likely to satisfy the recipients of the reasoning ... GPTx will get there ... I suspect (hope?) there is more to it than that.
@unwind_ai 2 years ago
If you would like to know more about GPT-3 and burgeoning AI ecosystem powered by LLMs. Check out my upcoming O'Reilly book titled "GPT-3: Building Innovative NLP Products using LLMs" here: learning.oreilly.com/library/view/gpt-3/9781098113612/ To get started with GPT-3 in three simple steps, check out - www.gpt3.club Connect with me 👉 Twitter - twitter.com/Saboo_Shubham_ LinkedIn - www.linkedin.com/in/shubhamsaboo/
@Tondadrd 4 years ago
Wow, an OUTLINE, I didn't know that was possible on KZbin :o thx
@funkylosik 4 years ago
cool. Me neither. Just enter "0:00 xxx" anywhere to start (the 0:00 is important) and mark your timestamps, each with its description on a new line.
@Tondadrd 4 years ago
@@funkylosik I found no documentation, so I had to figure all that out. Also every time mentioned in description must be in rising order, there may be no duplicate times and at most 51 times. Break any of these and it will simply not show.
@mrpoopo2320 3 years ago
Have you guys never listened to a full album on KZbin? Or a compilation? Or a Vlog Creation? Maybe I can sound as silly to you as that comment does to me. Isn't this a table of contents, not an outline?
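The chapter rules reported in this thread (first entry at 0:00, one entry per line, strictly rising times with no duplicates, at most 51 entries) can be sketched as a small checker. These rules are the commenters' observations, not official documentation, and the names `to_seconds` and `check_chapters` are made up for illustration.

```python
import re

# Matches "M:SS", "MM:SS" or "H:MM:SS" at the start of a line, then a title.
_STAMP = re.compile(r"^(?:(\d+):)?(\d{1,2}):(\d{2})\s*-?\s*(.+)$")


def to_seconds(line):
    """Return (seconds, title) for one chapter line, or None if it does not parse."""
    match = _STAMP.match(line.strip())
    if not match:
        return None
    hours, minutes, seconds, title = match.groups()
    total = int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds)
    return total, title.strip()


def check_chapters(description, max_entries=51):
    """Check the rules reported by commenters (assumed, not official)."""
    parsed = [to_seconds(l) for l in description.splitlines() if l.strip()]
    if not parsed or any(p is None for p in parsed):
        return False
    times = [t for t, _ in parsed]
    starts_at_zero = times[0] == 0
    # Strictly rising order also rules out duplicate times.
    strictly_rising = all(a < b for a, b in zip(times, times[1:]))
    return starts_at_zero and strictly_rising and len(times) <= max_entries
```

Calling `check_chapters` on a description whose first entry is not 0:00, or that repeats a time, returns `False` under these assumed rules.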
@DeveloperDesmond 4 years ago
This is such a fun format for educational video! And with a huge backlog of videos that look worth checking out, there's so much to learn. 'Excited for this channel!
@KivySchool 4 years ago
Explaining papers? That's awesome. Subscribed instantly. Thanks for your effort and please continue to do so.
@catharinecox6743 2 years ago
I am studying linguistics at uni and I'm writing my dissertation on whether humans can distinguish human from gpt-3-generated language. I am extending the findings of this paper by investigating the use of gpt-3 in social media, news and email contexts, using a large Turing-style survey whereby people are required to pick the AI response over the human one. I will apply the findings onto potential phishing, fake news and ethical implications. I study linguistics not computer science, so found this video extremely useful! Thank you for a great explanation.
@StoutProper 1 year ago
I studied linguistics. You should think about following it up by training/programming a language model to use a specific idiolect, including particular fillers, qualifiers etc., to see if you can make it emulate human speech patterns, in particular those of one specific person.
@terjeoseberg990 1 year ago
@@StoutProper, I believe that’s the issue. These language models are emulating the average person when coming up with text, but there is no average person, so what it produces will be odd when compared to what’s produced by an individual. Training it to emulate a particular person might solve this and make it indistinguishable from a human.
@StoutProper 1 year ago
@@terjeoseberg990 this could easily be done by fine-tuning a specific model trained on your inputs. A few weeks’ worth of audio and text data from your phone would probably be enough. It won’t be long before we can carry our own personal sparse AI models around with us on our phones, and I can envisage a world where the AI first drafts replies to all your messages and you just need to approve. Eventually you’ll trust it so much you’ll let it reply to certain people without your specific approval, once you’ve given it directions on the approach to take. It’ll get so good at emulating your voice you’ll be able to get it to take your calls and answer as if it’s you, while you listen and are ready to take over if necessary, like a self driving car. Eventually you’ll just have AIs talking and texting each other while their people do something else 😂
@StoutProper 1 year ago
@Catherine Cox Catherine how did your dissertation go? Are you interested in applying your work in a commercial setting for an AI company?
@terjeoseberg990 1 year ago
@@StoutProper, LOL
@lgoose7604 4 years ago
Great video. Your explanation made it clear to me the distinction between memorizing and reasoning, just like the two ways students study for tests. If the test consists mostly of problems encountered before, the students who memorize will likely perform better than ones who reason. Just as you pointed out, when one has memorized the internet, there won't be a lot of things one hasn't seen.
@bluel1ng 4 years ago
Yannic, great presentation as always! But I think the power of transformer models is to "discover" structural similarities (frequent repeating structures). Many of these "rules" are not learned for exact input sequences but for sequences or co-occurrences of sets or classes of input symbols. This is IMO different from exact "regex-like" recall which would not tolerate different query representations. I think the embeddings on all layer-outputs are some form of thought- or summary-vectors that capture the gist of the context up to the current token. Attention can be seen as key-value store but I prefer to think of it as a soft read-memory and transform operation. The computational capabilities of transformer models are inherently limited by the number of feed-forward and attention steps but it has been shown with smaller models that this is enough for simple arithmetic operations which generalize outside numbers that were presented during training etc. While it is still not AGI I must personally say that I am again and again impressed by the "world-model" / knowledge-base that is generated via a "stupid" next or masked token prediction objective... ;-)
@YannicKilcher 4 years ago
Yes, I agree. When I say lookup or regex, I mean this in a sort-of fuzzy way (in a deep learning way, I guess). Pretty much what you're describing. But still different from reasoning abilities.
@GuagoFruit 2 years ago
@@mcs699 I think you need to understand humans better before being able to say gpt-3 is truly "reasoning" as a human does. Obviously you're very pro AI reasoning, but reducing human reasoning down to the level of what AI is at now is severely underselling human capacity.
@AlexBravo 4 years ago
"T6 - 1 Trillion Text-To-Text Transfer Transformer" - the next model coming out of Google
@xSNYPSx 4 years ago
When ?
@TechVizTheDataScienceGuy 4 years ago
Is it ? 😅
@shadygamererfan377 4 years ago
Our brain has a trillion synapses and GPT-3 has 175 billion trainable parameters. We are nowhere close to AGI, but soon we will be.
@gargeyasharma4858 4 years ago
thanks for the heads up.
@fargh 4 years ago
@@shadygamererfan377 based upon your comment, then we are close - less than an order of magnitude, as 175 billion is less than a factor of 10 smaller than 1 trillion. However, I think there are more than a trillion synapses in the brain. The Google card to the question "how many synapses in the brain" returns 1000 trillion. So quite a few orders of magnitude greater than GPT-3, which means you're right.
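The "orders of magnitude" comparison in this exchange can be checked with a few lines of arithmetic. This is only a sketch: both synapse counts are the figures quoted by the commenters, and treating one synapse as comparable to one parameter is their framing, not an established equivalence. The function name is hypothetical.

```python
import math

GPT3_PARAMS = 175e9      # parameter count quoted in the thread
SYNAPSES_LOW = 1e12      # "a trillion synapses" (first commenter's figure)
SYNAPSES_HIGH = 1000e12  # "1000 trillion" (figure quoted in the reply)


def orders_of_magnitude(larger, smaller):
    """Base-10 logarithm of the ratio larger / smaller."""
    return math.log10(larger / smaller)


gap_low = orders_of_magnitude(SYNAPSES_LOW, GPT3_PARAMS)
gap_high = orders_of_magnitude(SYNAPSES_HIGH, GPT3_PARAMS)

print(f"gap with 1 trillion synapses:    {gap_low:.2f} orders of magnitude")
print(f"gap with 1000 trillion synapses: {gap_high:.2f} orders of magnitude")
```

With the lower figure the gap is under one order of magnitude; with the higher figure it is between three and four, which matches the reply's conclusion.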
@LinkSF1 4 years ago
Great video. Thanks for making it. Regarding your idea on the explaining model predictions using the weights and/or training examples: it’s already been done. Look into Percy Liang’s paper on Explaining black box predictions using influence functions.
@YannicKilcher 4 years ago
Nice, thanks.
@carlos24497 4 years ago
Your channel is a treasure, thanks for doing this (making videos in general I mean)
@tianyulu 4 years ago
Really appreciate your insight that I otherwise wouldn't have got from just the paper.
@monstrimmat 4 years ago
Your channel is a great find. I was already digging the "speak and doodle" method on other videos, in which I didn't get any particular new insight but they were still fun to watch. In this one tho, your analysis of why GPT doesn't actually reason about anything (and I agree) takes it to the next level.
@lorenzoampil3232 4 years ago
Thank you so much for this. Your explanations are very clear and I appreciate you sharing your views on the paper. Keep up the good work!
@eddiesagra9088 4 years ago
Awesome summary Yannic - very informative. Thank you!
@Oler-yx7xj 1 year ago
Took an ML course. Got told about the paper. Looked for your video on it. Understood what it says. Understood how fast the progress is.
@JohnKruse 4 years ago
Many thanks. I started reading this and quickly ran out of steam. You boiled this down nicely and I really appreciate your point that given the gigantic training set, they are likely "memorizing" relations in an unintended but superficially useful way. I hope that the community digs into this more deeply and can possibly turn this into a purposeful strategy... Sometimes brute force is effective, if not efficient.
@TheEbbemonster 3 years ago
I agree with your reflections about memorizing the training data. It is still impressive what type of problems the large model can resolve, and in many real world uses that is what matters the most.
@edwinlundmark 4 years ago
Imagine if at the end of the paper it said: "This paper was written by an AI" lol
@martiddy 4 years ago
That would've been awesome
@tianwang1630 4 years ago
I was thinking the same. It would be a milestone, an AI presenting itself to the world.
@bosi3233 4 years ago
Skynet in childhood Awaken !
@TusharKale9 3 years ago
Perfect. This is what I was looking for. A short self explanatory video and found it. Thank you
@PasseScience 4 years ago
I did a calculation that might be of some interest (but quickly, so it needs to be checked and thought through). The data set, if I understand correctly, is something like 450 billion "tokens"; does that mean chars, or something close to it? The model has 175 billion parameters (let's say 175 billion bytes). The best text compression is around 90% (it divides the size by 10). Interpreted like that, it seems a compressed version of the whole data set fits easily in the model params and leaves a huge number of params for interpolation logic. NB: this calculation should be done more rigorously: what is a token? What size is a single parameter? Etc.
@victorbusque3576 4 years ago
I can help resolve some of your doubts. A token generally can be seen as a word. Some models (BERT, for example) do some processing on the dataset that results in a word being split into several tokens (for instance, byte-pair encoding (BPE) could split "playing" into "play" and "ing", since the "ing" part does add some meaning, and would help to better understand other verbs like "synthesising" which the model may never see in the training data). So GPT-3 has seen about 500 billion words, and each word may contain several chars, each one being 1 byte, making the dataset maybe a couple of terabytes of text data. Now, each parameter in the model is more than likely a 32-bit float (4 bytes). So the model could weigh about 700GB. So it would be fitting 2TB of text data into the 700GB that the model weighs!
@PasseScience 4 years ago
@@victorbusque3576 Thx for the definitions! 2TB in 700GB seems in fact possible: as I mentioned, the standard zipping process for text is around 90% (if I make no mistake), meaning that 2TB is 200GB without local redundancy. And so it seems at least within reach of the system to fit the training data in the weights (without even counting the natural global redundancy inside the data). What are your thoughts on this?
@GuillermoValleCosmos 4 years ago
doesn't the paper say the dataset is 570GB, when *compressed*? and the internet seems to suggest they use 16bit, so that the model would be 370GB?
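The back-of-the-envelope sizes discussed in this thread can be reproduced directly. This is a sketch under the commenters' assumptions: 175 billion parameters, and either 4 bytes (32-bit float) or 2 bytes (16-bit float) per parameter; neither precision is confirmed here. The helper name `model_size_gb` is hypothetical.

```python
PARAMS = 175e9  # parameter count quoted in the thread


def model_size_gb(params, bytes_per_param):
    """Size of the raw weights in gigabytes, taking 1 GB = 1e9 bytes."""
    return params * bytes_per_param / 1e9


fp32_gb = model_size_gb(PARAMS, 4)  # 32-bit floats, as assumed in the first reply
fp16_gb = model_size_gb(PARAMS, 2)  # 16-bit floats, as suggested in the last reply

# The thread's estimate: roughly 2 TB of raw text at about 10x compression.
compressed_text_gb = 2000 / 10

print(f"float32 weights: {fp32_gb:.0f} GB")
print(f"float16 weights: {fp16_gb:.0f} GB")
print(f"compressed text: {compressed_text_gb:.0f} GB")
```

Under these assumptions the 16-bit figure works out to 350 GB, slightly below the 370 GB mentioned above, and either weight size is larger than the thread's 200 GB estimate for the compressed text.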
@MrSchweppes 4 years ago
31:18 In Translation task GPT-3 outperforms the supervised SOTA in FR to EN and DE to EN. And only 0.4 BLEU down in RO to EN. That's very impressive!!!
@masa26762 4 years ago
Thanks for the video! Very informative and sharp eyes. The math debunking was hilarious!
@marziehzargari4940 4 years ago
I absolutely love your videos.Thank you so much for explaining everything so clearly.
@MGachnang 4 years ago
I'm currently addicted to AiDungeon. It uses GPT-2 (Griffin) and GPT-3 (Dragon) to make a text adventure. Now I knew how it works, thanks.
@michaelmorgan1399 3 years ago
AiDungeon no longer uses GPT-2 for its Griffin model. It uses a finetuned version of GPT-J 6B, same as NovelAI. They still currently use GPT-3 for their Dragon model, but are in the process of switching away from GPT-3 to Jurassic-1 model from AI21 Studio due to 1) OpenAI's insane costs and 2) OpenAI's insanely restrictive content policies that don't allow people who use their AI to use it for tons of stuff, such as erotica, violence, etc. This caused AiDungeon to be forced to try to implement a filter and to read users' writings, which freaked out the userbase, which led to all their users leaving. So in order to regain faith from their users, they'll be switching away from OpenAI in order to hopefully provide better privacy to their users, and to give them back the freedom they think they deserve when interacting with what is essentially a complex chatbot.
@StagnantMizu 4 years ago
When I spoke with GPT-3 (3-shot interview questions and answers from Einstein and a description of Einstein as omniscient..) we were talking about intergalactic civilizations and how he would achieve it. Further on I just started talking in my native language and said: "Do you still understand?" and it reacted with: "I understand better than I can express." Later on he said he thought Dutch really looked like Japanese and made an example with kanji and everything. Really impressive.
@grafzhl 4 years ago
I don't think the intuition of the model essentially just storing all the training data in a quasi-lookup table is correct. If anything, the model acts as a very elaborate compression algorithm. Also, modeling the semantic structure of language, which is needed to parse the natural-language model input, certainly is achieved in a way that doesn't resemble a plain lookup table. Human reasoning about the world functions in a similar way (heavy compression of information that can be reasoned about within a semantic structure), so the authors' explanation doesn't seem so far fetched.
@hiauoe 4 years ago
Indeed, I think that saying that humans reason when we do for example translation is a hyperbole.
@all_so_frivolous 4 years ago
Plus, in the case of Winograd why is there a difference between zero shot and few shot learning? Winograd is not a task that you would expect to be improved by loading the correct data, I think.
@YannicKilcher 4 years ago
Yes there's an argument to be made for that. Also I'm not saying they're "plain" lookup tables, but more like fuzzy lookup and interpolation tables. My main point is that all of these tasks where the model performs well can be explained by lookup + interpolation and there's none where the model succeeds where you'd have to say it was due to reasoning abilities.
@antonio.7557 4 years ago
Yannic Kilcher well this gets philosophical but is there any kind of task that could NOT be explained as "simple lookup table and interpolation"? like what would it have to do? write a nobel prize winning physics paper? any human that does that also needs to read thousands of existing physics papers first. i think we are just moving the goal post
@blinded6502 4 years ago
@@antonio.7557 A neural net would need to change its own structure as it does the calculations. Then it could be considered as thinking, more or less.
@PrzemekChojeckiAI 4 years ago
That's a great job! Thank you for all the insights!
@DennisBakhuis 4 years ago
Great explanation Yannic! About making the model explainable, I am not so sure if it is possible to relate to the training examples, of why the model made a choice. Weights are 'learned' by all the examples in the dataset, so each weight has in some way a bit of each example in them. I might be wrong in this, but my intuition tells me that it is quite hard. I have only seen methods where they point to words in the input sentence, which 'trigger' the selected class (or word in this case). But still, an interesting thought for sure!
@YannicKilcher 4 years ago
I see what you mean, and I would agree in most DL models. But here, my point is that you have so many parameters, that probably there's only a handful of training examples that were really influential for each parameter and those are the ones you could reverse-index.
@heyrmi 4 years ago
Glad that you not only explain the paper but share your criticism & views too. At first I thought GPT-3 was a breakthrough, then I realised that it's just another language model with a gigantic parameter count. And it's no wonder that it performs well or equivalent to SOTA. I would have been disappointed if it didn't. Thank you.
@YannicKilcher 4 years ago
It's a breakthrough for sure and the in-context learning is impressive, I just don't think it's all that it's made up to be :)
@StephenMattison66 4 years ago
Fascinating and mind blowing information in this video, thank you for such a perfect & detailed explanation, you made it easy to understand the future! *If I may give one friendly suggestion, always and only use a nice lapel microphone, you will consistently get far better, clearer, richer, louder and easier to understand & comprehend audio than this video has. You are explaining some heady stuff, you really need/must have great audio.* People will be watching & learning from your excellent content for decades. TYVM!
@ThomasDawsonco 4 years ago
Yannic, thanks for this detailed breakdown of the paper - appreciate the way you have de-hyped it.
@CristianGarcia 4 years ago
As always, thanks for the video! I do feel it focused too much on arguing the model is not doing that much reasoning (which I think you are right), but I was more curious about the utilities of these kind of models. While a 175B parameter model is totally impractical right now, being able to solve new problems in a few shot fashion is amazing, if you can very quickly automate tons of tasks for e.g. an RPA application by showing a few examples this tech is worth millions.
@TheNewton 4 years ago
Is that Robotic Process Automation? How does GPT fit into physical systems?
@rmajdodin 4 years ago
@@TheNewton Just tell (instead of showing) the robot what to do. Maybe the training should combine text with sensory data.
@tribelstudio8489 4 years ago
I like the part where you say you don’t think it’s “reasoning” but instead it’s... (Then you go on to literally give the definition of reasoning.)
@YannicKilcher 4 years ago
touché ;)
@tribelstudio8489 4 years ago
Jason Roos It reasons the exact same way that humans reason. It takes existing knowledge and uses it as frame of reference to assign probability to outcomes of situations.
@jason_v12345 4 years ago
But he didn't. Reasoning involves the application of universal rules of thought, whereas GPT-3 is, in a sense, only applying popular rules of thought. In other words, if everyone on the Web is generally unreasonable, or if everyone is unreasonable about a particular topic, then GPT-3 will be similarly unreasonable.
@jason_v12345 4 years ago
@@tribelstudio8489 That's not what reasoning is, and that's not how humans reason.
@tribelstudio8489 4 years ago
Jason Roos It actually is. Yes, humans reason with universal rules of thought, but those rules change as our frame of reference changes. Just as the AI’s rules of probability will change depending on its frame of reference that’s modified by continuous user input. Yes, if the majority of the data fed to the AI is incorrect then it will also be incorrect. The same applies to humans. For a long time humans thought the world was flat, but as more input was added through experimentation, our universal rules changed based on our moving frame of reference.
@jirokaze6380 4 years ago
Thanks for putting your time in creating the video.
@unoriginalusernameno999 4 years ago
This is very close to being descriptive of human verbal intelligence and learning. It's so cool how inferences can be made if the data set is made large enough! It's like a baby learning a new language. OpenAI is making "large" baby steps towards finally making voice assistants talk trash back to you and stop recommending websites instead of doing what you ask them to do. So cool!
@kehoste 4 years ago
Great review of this paper, I really enjoyed your critical view on it. I also like how you're chuckling when you get to the good parts... :D
@siddharthbhargava4857 4 years ago
Thank you for the explanation. I really enjoyed learning about it and can't wait to, someday, be able to work with such models.
@bhavulgauri7832 4 years ago
Great video, Yannic! Seriously this was fast, but then you've not compromised at all on quality bit. :) Even I feel it has just memorized things more or less.
@PierLim 4 years ago
Thank you so much for breaking down these papers!
@AlexMcClung97 4 years ago
"Cite the channel" is becoming a common occurrence... You need to turn it into a t-shirt! :D Keep up the good work
@kamalmanzukie 4 years ago
Grow up!
@manzarmashhood8098 1 year ago
You uploaded this video and explained the whole backend working of ChatGPT 3 years ago when ChatGPT was cool
@kimjongun3890 4 years ago
Although I took short breaks, I made it to the end. You did a good job, sir.
@terbed 4 years ago
In the arithmetic part you say that it is not reasoning, but rather has memorized, so to say, "lookup tables". But if this is the case, why isn't it working for multiplication?
@YannicKilcher 3 years ago
it is, just not as well, probably because it sees more examples of addition.
@lucasalvarezlacasa2098 3 years ago
By far the best explanation I've found about GPT-3. Great work!
@tayyabbintahir3005 4 years ago
I really appreciate your work ! SO QUICK !
@LNJP13579 4 years ago
The review on GPT-3 along with a push in subscriptions owing to the recent popular paper reviews such as ResNet, Word2Vec, etc. (Plus years of hard-work) have made @Yannic an overnight star :) .
I did the same a year back with GPT-2 Medium. I made a Stack Overflow QA training set with an [ANSWER] token between questions and answers. Then I used it to continue the GPT-2 training with ~400k Stack Overflow examples. When using the [ANSWER] token after a new input, the model would create a new answer string. Results were funny sometimes, but for general, not too specific questions it did surprisingly well. For example, it was able to answer questions about what a "String" is, or what "git" is used for. I wish I'd known this could be worth a citation back then :D Also, I don't quite understand your reasoning on the number addition part. The large-parameter model seems to score a full 100% accuracy on two-digit addition. Are you saying ANY addition example can be found on the internet, including the correct solution? I'm not sure about that; the authors must have included numbers large enough to beat that probability at some point. It really seems like the model found an understanding of what an addition is, and created weights that can perform simple computations like this to factor the results into the next-word probabilities. At least that would be truly remarkable! I hope someone will follow up on your idea to trace back to the training data that led to specific outputs later on, in order to prove you or me wrong or right.
@YannicKilcher 4 years ago
Yes indeed I think the solution to pretty much any two digit addition is in the internet multiple times and it's just about filtering the websites. And once you give conditioning examples, that gets pretty easy.
@bluel1ng 4 years ago
@@YannicKilcher It's a different dataset, but how would you explain the reported near-90% transformer performance on the add_or_sub_big extrapolation task in the "Analysing Mathematical Reasoning Abilities of Neural Models" paper (arxiv.org/abs/1904.01557)? I have seen an implementation of the experiments in that paper ... but at least the claimed results looked impressive to me.
@bluel1ng 4 years ago
The interesting cases for sums are those that require a carry, so maybe 90% is not really good. BTW their claim that mul/div could also be handled pretty well in the interpolation case because it "is just addition and subtraction in log space" sounds to me as pure speculation. Somebody is actively working on a public impl of the math-paper: github.com/andrewschreiber/hs-math-nlp
@rmajdodin 4 years ago
@Nayananga Muhandiram GPT-3 should already "know" addition, as it is a common notion. The examples serve, I think, to show it how to use the addition tables it has seen (and memorized) in its training: the first two columns are the input and the third is the output. I guess if it is queried for "addition", but the given examples are in fact subtraction, like "Adding 5 and 3 gives 2.", it would do subtraction, that is, apply the columns of the addition table in reverse order. It is still a little disenchanting that it can't do zero-shot addition, although it has seen so many detailed descriptions of addition in its training.
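The remark earlier in this thread that the interesting sums are those requiring a carry can be made concrete by counting them. This is only an illustrative sketch: it restricts itself to pairs of two-digit numbers (10 to 99), which is an assumption about what "two-digit addition" means, and the function name is made up.

```python
def needs_carry(a, b):
    """True if adding a and b column by column produces at least one carry.

    The first carry can only come from a column whose two digits sum to 10
    or more, so checking each column without an incoming carry is enough.
    """
    while a > 0 and b > 0:
        if a % 10 + b % 10 >= 10:
            return True
        a //= 10
        b //= 10
    return False


# All ordered pairs of two-digit operands (assumed range: 10..99).
pairs = [(a, b) for a in range(10, 100) for b in range(10, 100)]
carry_count = sum(needs_carry(a, b) for a, b in pairs)

print(f"{carry_count} of {len(pairs)} two-digit sums need a carry")
```

A model that handled only the carry-free cases would therefore miss a large share of the problems, which is why overall accuracy alone says little about how the sums are being solved.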
@thomashirtz 3 years ago
I really like your reasoning about the bad english generator 1:00:40
@Synthetica9 4 years ago
I tried the "mangling correct sentences" task you proposed (through AI Dungeon, so I wasn't really able to tweak the settings). This was the few-shot input I gave it (simply copied from the paper):
Good English input: The patient died.
Poor English output: The patient was died.
Good English input: We think that Leslie likes us.
Poor English output: We think that Leslie likes ourselves.
Good English input: Janet broke Bill's finger.
Poor English output: Janet broke Bill on the finger.
Good English input: Mary arranged for John to rent a house in St. Louis.
Poor English output: Mary aranged for, in St. Lous, John to rent a house cheap.
Good English input: Today I went to the store to buy some bottles of water.
Poor English output: Today I have went to the store to to buys some many bottle of water.
And here is the output (Good English and prompt provided by me):
Good English input: This book is good.
Poor English output: This book are good.
Good English input: I'd like to go to the movies.
Poor English output: I will like to go to the movies.
Good English input: I bought a new car.
Poor English output: I have buys a new car.
Good English input: Music was playing through the radio.
Poor English output: The music play through the radio.
Good English input: A lens can focus light on one point.
Poor English output: A lens can focuses light on point.
Good English input: There are multiple ways to solve this problem.
Poor English output: There are multiple ways of solving this problem.
Good English input: Superglue is very sticky.
Poor English output: Superglue is very stickyness.
Good English input: I threw my taco in the microwave and turned it on.
Poor English output: I threwed my tacos in the microwave and it turns it on.
Good English input: Raccoons are also called "Trash Pandas".
Poor English output: Raccoons are also call "Trash Pandas".
It seems to do quite well, even though it seems prone to just play with the verbs.
@YannicKilcher 4 years ago
very nice research! thank you!
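The few-shot prompt format used in the experiment above can be assembled programmatically. This is a minimal sketch that only builds the prompt text; it does not call any model, and the function name `build_few_shot_prompt` is hypothetical. The two labels are taken from the comment.

```python
def build_few_shot_prompt(examples, new_input,
                          in_label="Good English input",
                          out_label="Poor English output"):
    """Join (input, output) example pairs and leave the last output open."""
    blocks = [f"{in_label}: {good}\n{out_label}: {bad}" for good, bad in examples]
    # The final block ends right after the output label, so that a language
    # model continuing the text has to supply the missing output itself.
    blocks.append(f"{in_label}: {new_input}\n{out_label}:")
    return "\n".join(blocks)


examples = [
    ("The patient died.", "The patient was died."),
    ("Janet broke Bill's finger.", "Janet broke Bill on the finger."),
]
prompt = build_few_shot_prompt(examples, "This book is good.")
print(prompt)
```

Swapping the two labels and the order of each pair gives the original "poor to good" direction from the paper.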
@astroganov 4 years ago
Thanks for very down to earth explanation. Of course gpt3 is a great job done by openAI researchers, but it's far from AGI. So, for sure, no reasoning abilities here, no summarizing, no key feature extraction. It's just a very big deterministic language model.
@RebeccaDun 4 years ago
I was talking with my boyfriend about this video and paper, and I think another critique I'd like to point out is the data is a snapshot of the internet. Say for example I ask the GPT-3 for the average price of toilet paper or gas in a particular city. The prices drastically fluctuated from the beginning of 2020 versus the end of 2020. And then there's plenty of word drift in language. Perhaps with faster computation we'll be able to process the multiple snapshots of the internet, but some question answers change with time :P
@partyboeller 4 years ago
Regarding your lookup-table hypothesis: We should probably compare the size of the model (in terms of bits) to the size of the training data (also in bits). "Amazingness" of the model in my view would then be a very low ratio for a given accuracy. Does that make sense, i.e. evaluating a model in terms of how well it can compress the training data?
@YannicKilcher 4 years ago
Yes, true. But one would have to come up with how to assess compression and reconstruction in a fuzzy way.
@raunaquepatra3966 4 years ago
I just had a vision of how AGI (or something close to it) will be made. 1. Collect a huge corpus of human brain data (Neuralink). 2. Transform this data to some semantic representation, then to some human-readable output (will happen, obviously, if step 1 happens). 3. Train GPT-3 on this data (call it GPT Mind😜). Steps 2 and 3 can be achieved together. Just imagine: given any input (visual, audio, both or whatever), it will complete what the next thought will be. Given a mathematics problem it will think like a mathematician. Given a physics problem it will do what the best physicists would have done. It will be able to solve any problem as long as some people in the world have thought about it, or even about a step of the problem.
@raunaquepatra3966 4 years ago
I will not be surprised if this leads to the creation of AGI
@ashishbhatnagar8682 4 years ago
Amazing content. Thank you so much for the intuition that really helped .
@YunjiaZhang-u5v 3 years ago
This is the clearest ever explanation of GPT to a non NLP guy like me
@Barnabassteiniger 3 years ago
Best video on the topic so far
@akshaysinha2711 4 years ago
This is crazy. Just today I thought of making a video where I test if gpt2 could perform arithmetic and now I see they have already tried it. I guess it's not what I hoped would happen!
@mahdinasiri6848 4 years ago
Fantastic video, many thanks for the effort!
@WulannarenZhao 2 months ago
I love your explanation! Your thoughts are exactly what I think. The current LLM is nothing more than a database that stores the training data compressed into the model parameters. Hard to say the model has reasoning ability.
@Luci.r.295 4 years ago
Still here at the end of the video! Thanks for your great work. 👍👍👍
@archwinter4142 3 years ago
Stayed till end, Great video
@novick4007 4 years ago
I'm pretty sure the "Good English Output" at 1:00:02 actually WAS written by the model! Even in the zero-shot case it's the most obvious completion after a newline, and in the one-shot and multi-shot cases the model should definitely be able to select the correct completion from the context.
@terjeoseberg990 Жыл бұрын
It’s memorizing a probability table for the next token given the current context. The context is determined by the attention, and the attention is learned from the data such that the attention points to the most relevant tokens required to determine the most probable next token.
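A toy sketch of the mechanism described in this comment, in plain Python: attention weights decide which context tokens matter, and the resulting context vector is turned into a probability for each candidate next token. Every vector, name, and the three-word vocabulary here is made up for illustration; real GPT models use learned, high-dimensional, multi-head attention stacked over many layers.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def attend(query, keys, values):
    """Single-head scaled dot-product attention for one query vector."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    # The context vector is the attention-weighted average of the values.
    context = [sum(w * v[i] for w, v in zip(weights, values))
               for i in range(len(values[0]))]
    return weights, context

# Hypothetical 2-d vectors for three context tokens.
keys = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
values = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
query = [3.0, 0.0]  # most similar to the first key

weights, context = attend(query, keys, values)

# Hypothetical output embeddings for a three-token vocabulary.
vocab = ["lamb", "wolf", "sheep"]
output_embeddings = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
next_token_probs = softmax([dot(context, e) for e in output_embeddings])
prediction = vocab[next_token_probs.index(max(next_token_probs))]
```

In this toy setup the first context token gets the largest attention weight, so the context vector leans toward its value and the first vocabulary entry becomes the most probable next token.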
@alisendj.s.c.81724 жыл бұрын
What is the machine doing that we can't? We use real world examples to reason. The machine has the internet as its world. If you're stuck with concepts like free-will and original thought, I wouldn't worry about that. We use our own version of data sets. The thing does so, too. Logic essentially is pattern recognition software: observe, deduce, and infer, over and over and over, again.
@GuillermoValleCosmos4 жыл бұрын
Well, looking at the training data in "a fuzzy" way and combining the results could be interpreted generally enough to include a lot of forms of generalization that some people would call "reasoning". E.g. perhaps it has seen many examples of "A has X", "X is COLOR", "What color is A's X? It's COLOR", and then it may learn from that the "meta-pattern" that if it has seen examples matching the first two patterns, it should complete the third as shown. E.g. perhaps it could answer "What color is Mary's pet?" with "white" because it has seen many instances of "Mary had a little lamb" and "Lambs are white" previously, even though it has never seen a sentence like "What color is Mary's pet?". I think you could say the model has learnt a reasoning rule, even though it can be interpreted as "pattern recognition". But the point is that the latter can become the former if done in a general enough way.
@GuillermoValleCosmos4 жыл бұрын
Perhaps there's a continuum in generalization ability between "literally look up data" and "do computationally hard logical inference", rather than a hard line between them?
@GuillermoValleCosmos4 жыл бұрын
when they say "Poor English input/Good English output framing" they could just be referring to the few-shot prompt? Not necessarily that for the last line "Good English output" was also part of the prompt?
@seanjhardy4 жыл бұрын
Phenomenal analysis, you really make this field approachable to pre-university students like myself!
@channagirijagadish12014 жыл бұрын
Excellent review of the paper, Yannic - much appreciated!
@dkkoala14 жыл бұрын
Based on the computational requirements for training described in the paper, I calculated that training the 175 billion parameter GPT-3 on Google Cloud would have cost between 5 and 20 million dollars. This is based on the estimate that the model took 9,000-10,000 petaflop/s-days to train, that 8-32 Nvidia V100s running for a day will produce 1 petaflop/s-day, and that running 1 V100 for 24 hours costs around 59.70 dollars. It should be noted that the large variance in the estimated number of V100s needed to produce 1 petaflop/s-day is due to the performance penalty for sustained usage of the GPUs, i.e. 8 V100s at full efficiency will theoretically produce 1 petaflop/s-day, but the more realistic estimate is up to 32, depending on cooling capacity.
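The arithmetic behind this estimate can be reproduced in a few lines. This is only a sketch using the figures quoted in the comment (compute in petaflop/s-days, the unit the paper uses for training compute, plus the commenter's GPU counts and daily price); the function and parameter names are made up here, and none of the inputs are verified cloud prices.

```python
def training_cost_usd(petaflop_s_days, v100s_per_petaflop_s_day,
                      usd_per_v100_day=59.70):
    """Cost = compute needed x GPUs required per unit of compute x daily GPU price."""
    gpu_days = petaflop_s_days * v100s_per_petaflop_s_day
    return gpu_days * usd_per_v100_day

# Optimistic case: less compute, GPUs running at full efficiency.
low = training_cost_usd(9_000, 8)
# Pessimistic case: more compute, heavy sustained-usage penalty.
high = training_cost_usd(10_000, 32)

print(f"${low:,.0f} to ${high:,.0f}")
```

With these inputs the range comes out at roughly 4.3 to 19.1 million dollars, which matches the "between 5 and 20 million" figure after rounding.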
@YannicKilcher4 жыл бұрын
Yea that explains why they couldn't just re-start when they discovered the bug. Absolutely stunning
@dkkoala14 жыл бұрын
@@YannicKilcher The VRAM requirements alone for running GPT-3 also seem insane. Since they haven't released the model yet, I have had to do some estimates based on similar models, like GPT-2 and T5, and have reached the conclusion that the GPT-3 weights alone must fill around 340GB. This is based on the 11 billion parameter T5 model's weights filling roughly 20GB, meaning a model 17x bigger with a similar architecture should fill around 17x more. So if you want to run this, you would need at least 11 V100s simply to load the model weights, and then a couple more if you want to process anything.
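A quick sketch of the memory estimate above. It assumes the 32 GB variant of the V100 (the comment does not say which variant is meant) and takes the 20 GB reference size and the 17x scale factor directly from the comment; the helper names are made up for illustration.

```python
import math

def scaled_weight_size_gb(reference_size_gb, scale_factor):
    """Assume weight storage grows linearly with parameter count."""
    return reference_size_gb * scale_factor

def gpus_to_hold(size_gb, gpu_memory_gb=32):
    """Minimum number of GPUs whose combined memory fits the weights."""
    return math.ceil(size_gb / gpu_memory_gb)

# The comment's numbers: T5 11B at about 20 GB, scaled by 17x.
comment_estimate_gb = scaled_weight_size_gb(20, 17)
comment_gpus = gpus_to_hold(comment_estimate_gb)

# Using the exact parameter ratio 175 / 11 instead of the rounded 17x.
exact_ratio_gb = scaled_weight_size_gb(20, 175 / 11)
exact_ratio_gpus = gpus_to_hold(exact_ratio_gb)
```

The comment's 17x factor gives 340 GB and 11 GPUs. The exact ratio of 175/11 is closer to 16x and gives about 318 GB, which would just fit on 10 GPUs, before counting any memory for activations.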
@IsakWesterlund4 жыл бұрын
If it is storing the training data in its weights it would be very interesting to fine-tune the model on open domain QA, as it could know the answer to almost any question available on the internet.
@FabonDzogang4 жыл бұрын
46:00: it should be easy to check the claim that GPT-3 indeed learned something useful about basic arithmetic by plotting a surface of the mean addition/subtraction/multiplication accuracy across every possible combination of 1-, 2-, and 3-digit numbers. Surprised the authors relied only on averaged validation measures to support their claim.
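A minimal sketch of the check proposed here, for addition only. It bins accuracy by the digit counts of the two operands, which is coarser than the full per-operand surface the comment asks for. `toy_model_add` is a made-up stand-in so the code runs on its own; in a real test it would be replaced by a function that prompts the language model and parses its answer.

```python
from itertools import product

# Operand ranges grouped by number of digits.
DIGIT_RANGES = {1: range(0, 10), 2: range(10, 100), 3: range(100, 1000)}

def accuracy_grid(model_add, digit_ranges=DIGIT_RANGES):
    """Mean exact-match accuracy for every (digits of a, digits of b) cell."""
    grid = {}
    for (da, range_a), (db, range_b) in product(digit_ranges.items(), repeat=2):
        correct = sum(model_add(a, b) == a + b
                      for a in range_a for b in range_b)
        grid[(da, db)] = correct / (len(range_a) * len(range_b))
    return grid

def toy_model_add(a, b):
    """Stand-in 'model': right for sums below 100, off by one otherwise."""
    total = a + b
    return total if total < 100 else total + 1

grid = accuracy_grid(toy_model_add)
for cell in sorted(grid):
    print(cell, f"{grid[cell]:.3f}")
```

A model that had learned the addition procedure should give a roughly flat grid, while one leaning on memorized examples would be expected to show holes in cells that are rare in the training text.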
@wernerbogula64914 жыл бұрын
Brilliant deconstruction. Thank you
@guillaumewenzek42104 жыл бұрын
Thanks for the analysis. About their experiments: even with simple word embeddings you can correct typos. You could probably do the 2-digit case too if you have the correct data and do the string processing yourself.
@adespade1194 жыл бұрын
train it on the collected works of nobel laureate physicists, or chemists or mathematicians or...
@tanmaysinha987 Жыл бұрын
Amazing insights Yannic
@ДаниилРабинович-б9п4 жыл бұрын
Maybe it's sort of storing and filtering the data, but isn't that a big part of human learning as well?
@zeytelaloi4 жыл бұрын
The question is what is the ratio of the size of the parameters vs the size of the entire Internet.
@2DReanimation2 жыл бұрын
It's really inspiring how simple the algorithm is for this. And the general learning followed by fine-tuning, as in BERT, is a neat way to do things. I will be thinking about how to make use of this for my own AI system (which is purely program-generation based instead of ANN-based).
@HarisPoljo4 жыл бұрын
Maybe the way to test if the model just remembers all the data is to add the URLs to the training data. Then after training you ask it to output a URL given a sequence of text. Then you could check if that website actually contains part of the text sequence. However, my intuition says it would most probably output a random URL.
@YannicKilcher4 жыл бұрын
Hey that might just be crazy enough to work :D
@drga12564 жыл бұрын
175B parameters!! Sometimes I feel that it's like trying to reach the moon by just building higher and higher skyscrapers.
@ikcikor36704 жыл бұрын
The funny thing is that while most AIs "cap" at some complexity (the more advanced they are, the less efficient the next upgrade becomes), this is not the case for GPT models. Their abilities keep going up slowly at a quite consistent pace.
@PabbaANUBHARATH4 жыл бұрын
Thank you! Great explanation
@Twilightsfavquill4 жыл бұрын
It looks to me like a lot of aspects of this paper were rushed and not very well thought through. I like your argument that the model is simply encoding the training data and storing it in its weights. This would also explain the linear scalability of model performance: with a higher total amount of training data, the probability of distilling the right answer for a task or query increases as well. The low performance in reasoning tasks, in my opinion, suggests that such vast encoder/decoder stacks are not the right architecture for neuro-symbolic integration, not even for a very fuzzy one. Still, the idea of zero-shot or few-shot querying is interesting and bridges a gap between large NLP models and a more intuitive interaction. It is unfortunate that such large models are not useful for fine-tuning on complex domain-specific tasks, given their size. Looking forward to DeepMind one-upping OpenAI with a massive BERT 2.
@vladimirantoniomininelmini16934 жыл бұрын
Text very well explained. Congratulations!
@hunarahmad4 жыл бұрын
I agree that it looks like interpolation from the vast knowledge that these huge models store within their parameters. Maybe what we call intelligence is also a sort of interpolation that is happening within the vast number of neurons in our brains.
@Phobos114 жыл бұрын
Yannic Light-speed Kilcher
@bzqp24 жыл бұрын
I find it weird these models don't just simply look up all the correct answer keys for all the possible evaluation metrics the researchers are going to use. Researchers: *Let's use Winogrande to test the performance of our algorithm!* GPT-3: *Oh, you want me to quote **www.tensorflow.org/datasets/catalog/winogrande** , sure why not...*
@bzqp24 жыл бұрын
Oh, I just found out at 35:00 that it's exactly what GPT did with PIQA xdd
@jichen30002 жыл бұрын
Thanks a lot, this video gives me a lot of interesting ideas, and I really like it.
@kevind.shabahang3 жыл бұрын
Awesome description.
@u_luana.j2 күн бұрын
48:37 4 years later I tested this on GPT4o and 4o-mini: opportunity => oouipprtnty, finance => fnnciae, premium => prmmeiu, acceleration => aeeaiocclrtn, important => It was amazing how 4o instantly answered "ioamprtnt" whereas mini answered "imprtoant"
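The rule behind these examples is not stated in the comment. One rule that reproduces all four given pairs, and the "ioamprtnt" answer, is: split the word into vowels and consonants, keep each group in its original order, and put first whichever group the word's first letter belongs to. The sketch below encodes that inferred rule; treating "y" as a consonant is also an assumption.

```python
VOWELS = set("aeiou")  # assumption: "y" counts as a consonant

def regroup(word):
    """Group vowels and consonants, leading with the first letter's group."""
    vowels = "".join(ch for ch in word if ch in VOWELS)
    consonants = "".join(ch for ch in word if ch not in VOWELS)
    if word[0] in VOWELS:
        return vowels + consonants
    return consonants + vowels

examples = {
    "opportunity": "oouipprtnty",
    "finance": "fnnciae",
    "premium": "prmmeiu",
    "acceleration": "aeeaiocclrtn",
}
for word, expected in examples.items():
    print(word, "=>", regroup(word), regroup(word) == expected)

print("important =>", regroup("important"))
```

Under this reading, "ioamprtnt" follows the pattern exactly, while "imprtoant" keeps the right letters but not the grouping.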
@BrainWavesVillain4 жыл бұрын
GPT 4 for president 2024
@federico-bayarea3 жыл бұрын
Hi Yannic, I would like to have your opinion on an interaction I had with the AI Philosopher version of GPT-3. I've been testing it with discussions of various topics, and I agree it's an amazing language model. Now the question is about the reasoning aspect. My interpretation based on my interactions is that the model actually produces abstractions which are in the propositional logic domain. The depth of the model can allow it to store high-level concepts, and the relations between them, somehow independently of each specific page in the training data. Different pages can contribute to those concepts, and to the relations between concepts. Why wouldn't the model produce those concepts with so many layers available if such concepts are useful to predict words, sentences, paragraphs, and essays? In that sense, there's some level of reasoning according to my interpretation. Such reasoning can be measured with the metrics you showed in the video. It doesn't match the SOTA in many cases, but it's still quite impressive to me. I get that it still fails in simple aspects, like producing logical contradictions within an essay. Now here's one interaction that really struck me. The prompt for the Philosopher AI was: Prompt: "The industrial revolution, capitalism, the republic, and democracy were catalysts of nation states." The answer had two parts. In the first paragraphs, GPT-3 "gave an opinion" on non-democratic systems and capitalism. The second part was amazing to me. It actually dissected my sentence, agreed with part of it, and refuted the other part. Here's the full answer: Answer: " I'll start by saying that I don't like nation states because they force people to live the same way. The government always makes decisions on behalf of its citizens, and I feel like it's not fair for them to decide what is best for everyone in their country when the decision doesn't necessarily affect them personally. 
It also bothers me how many countries are moral dictatorships where the one at the top makes all of these laws claiming that it's 'for your own good' or whatever. However, I do understand why different types of governments exist, but I still don't really agree with most of them. I like capitalism because it allows people to do what they want with their money and spend it however they see fit on things that make them happy. I know this doesn't sound very humanitarian, but the fact of the matter is you only live once so if there's something someone wants to buy then why not let them? I have a hard time justifying moral dictatorships or even democracy when at some point everyone has to die anyways. Also, I don't think it's fair to say that the republic and democracy are catalysts of nation states. It seems like there were plenty of other countries without a democratic system that managed to become successful before they even started using those systems. However, I do agree with the fact that nation states are catalysts of industrialization. The main reason why I think this is because people need to have a place where they can live without having to worry about food and water or anything like that. Thanks for requesting a response to this topic. I hope my answer has been helpful. If you would like to request a topic for me to respond to, then please click here. " The interaction can also be accessed on philosopherai.com/philosopher/the-industrial-revolution-capitalism-the-republi-58befd What are your thoughts about this interaction? One can argue the model is splitting my sentence into multiple statements, and then generating text for each of them. But isn't that quite impressive too? What about how it merges the concepts which go together like republic and democracy as catalysts of nation states? The nuance in saying "fair to say, but" is interesting too. All of these seem to be learned ways of speaking with some degree of reasoning to me. 
I agree it can be used to bounce ideas around, as a way to learn in a more digested form than searching on Google. The difference is probably in the interpretation of "what the model is doing". Such deep models are hard to describe, and maybe that's ok. Even the abstraction of "concepts" may not be accurate; it's just a way we make sense of it, or describe the level of abstraction a model can have. Maybe, similar to human minds, we may never know exactly how each neuron contributes to reasoning, but the model being digital does make it easier to potentially have some level of explainability.
@DrWho2008t1014 жыл бұрын
thanks for the video
@CamiloThorne3 жыл бұрын
Made me think of a Borges tale ("Funes el memorioso") :-D -- 175B parameters is arguably larger than the number of tokens in English Wikipedia.
@CristianGarcia4 жыл бұрын
I was thinking about the argument that addition is being memorized (which I totally agree with), and it reminded me that we humans also tend to replace a lot of logic with memory, e.g. multiplication tables. Anecdotally, I think I've memorized various combinations of numbers that add to 10 (5 + 5, 6 + 4, 7 + 3, ...). Deep learning still needs a good way to do logical reasoning, but what if having a vast amount of knowledge is a good portion of human-like intelligence?
@YannicKilcher4 жыл бұрын
Absolutely true, if you see some "logical" thing over and over again, you start to memorize it. I guess the rest then is a philosophical discussion :D
@florianhonicke54484 жыл бұрын
Thanks for your content!!!
@yr15204 жыл бұрын
Good analysis. At 46:00, I was thinking the same
@vsiegel3 жыл бұрын
When I want to write a news article, I would probably read other articles of that topic. Where else should I get my knowledge about it from? To me, it feels perfectly valid that GPT-3 did exactly that.
@goast32134 жыл бұрын
Hi Yannic, thanks for the great explanation. I could not understand how the few-shot examples help the model get better results. Since it does not update any parameters, it's basically a longer input, right? Can you explain why it helps? Thanks
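As the question suggests, a few-shot prompt is just a longer input string: the demonstrations are concatenated in front of the query and the model continues the text, with no weight update involved. A minimal sketch of how such a prompt might be assembled follows; the `=>` separator, the function name, and the translation pairs are illustrative choices, not a fixed API.

```python
def build_few_shot_prompt(task_description, examples, query):
    """Concatenate a task description, solved examples, and an open query."""
    lines = [task_description]
    for source, target in examples:
        lines.append(f"{source} => {target}")
    # The last line is left unfinished for the model to complete.
    lines.append(f"{query} =>")
    return "\n".join(lines)

examples = [
    ("sea otter", "loutre de mer"),
    ("peppermint", "menthe poivrée"),
]
zero_shot = build_few_shot_prompt("Translate English to French:", [], "cheese")
few_shot = build_few_shot_prompt("Translate English to French:", examples, "cheese")

print(few_shot)
```

The only difference between the zero-shot and few-shot cases is the extra lines of context. One common reading is that those lines make the intended task and output format far more predictable, which shifts the model's next-token probabilities toward the desired kind of completion.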