The fact that giving the model more freedom and having fewer inductive biases shaped by human subjectivity actually improves performance is really illuminating. Thanks.
@jean-pierrecoffe6666 7 months ago
Nothing new under the sun, this is just the Bitter Lesson
@chriswang2464 6 months ago
Moreover, it is inspired by Occam's Razor.
@michaelbernaski7337 8 months ago
Excellent. First talk is practical. Second is profound. Thank you.
@TrishanPanch 8 months ago
Outstanding. I teach an AI class and there are loads of great pedagogical nuggets here that I am going to borrow.
@ankitthawal1313 7 months ago
Can you explain what those are?
@lugia8888 7 months ago
Nice, a fake class.
@irshviralvideo 7 months ago
@@anshuraj4277 why bother going to college to learn?
@calm694 7 months ago
@@anshuraj4277 learn english first before making going to AI CS
@packsw9243 7 months ago
@@calm694 "before making going" yeah you're a real genius
@zyxbody 7 months ago
I don't understand anything, but I like how these people teach. May everyone come to understand the concepts; that's my only prayer.
@flavioferlin3127 4 months ago
I could listen to these gentlemen talk about this stuff all day. Thanks and kudos for making such a fascinating topic relatable.
@sanesanyo 8 months ago
One of my favourite talks in recent times. Learnt so much from this.
@ricopags 8 months ago
Really grateful for this being uploaded! Thank you to both speakers and to Stanford for the generosity. Highlight of the video for me is Hyung's sheepish refusal to get into predictions on the staying power/relevance of MoE or any specific architecture. It felt like a wasted question since the premise of his talk is "tl;dr Sutton's Bitter Lesson".
@sady01 7 months ago
What an amazing lecture. It was simple, yet groundbreaking.
@ariG23498 8 months ago
He has his slides in his head! Loved the content.
@inforoundup9826 8 months ago
Great talks by both speakers
@jasonmeyer495 7 months ago
Amazing content. His use of simple examples to explain deep concepts is extraordinary. So lucky to be living in a world where content like this is so easily discoverable and accessible.
@Aditya-ri7em 7 months ago
He came and started teaching like a teacher.
@atdt01410x 8 months ago
This lecture is super useful. Really appreciate it.
@indylawi5021 6 months ago
Great lecture and insights on LLMs.
@JiayiHe-fs2rh 4 months ago
Thanks for the great talk!
@JasonKendra 7 months ago
Don't let this setback define your trading journey. Keep working hard and striving for success.
@izumskee 8 months ago
Very great talk. Thank you
@gmccreight2 8 months ago
Thanks for the talk! Really interesting stuff. I had one question. At 1:04:00 Hyung suggests that uni-directional attention is preferable to bidirectional attention in turn-taking scenarios because it allows the reuse of calculated information in the KV cache. I'm trying to understand how this fits into his broader thesis that we should be moving towards more generic approaches. On the surface the use of the KV cache doesn't feel particularly generic. Does it make sense because masked self-attention is necessary for next token generation, anyhow, so using a causal attention mask universally makes sense?
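A minimal sketch of the caching point raised in this question, not code from the talk. It assumes a single attention head with identity projections (queries, keys, and values are the token vectors themselves), and the helper names `attention` and `prefix_is_stable` are made up for illustration. It checks whether the attention outputs for an earlier turn change when a new token is appended. In a multi-layer model those outputs feed the next layer's keys and values, so stable outputs are what make cached keys and values reusable.

```python
import math

def attention(queries, keys, values, causal):
    """Single-head dot-product attention over lists of vectors."""
    outputs = []
    for i, q in enumerate(queries):
        # A causal mask lets position i see only positions 0..i.
        limit = i + 1 if causal else len(keys)
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(len(q))
                  for k in keys[:limit]]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        dim = len(values[0])
        outputs.append([
            sum(w * v[d] for w, v in zip(weights, values[:limit])) / total
            for d in range(dim)
        ])
    return outputs

def prefix_is_stable(tokens, new_token, causal):
    """True if appending new_token leaves the earlier outputs unchanged."""
    extended = tokens + [new_token]
    before = attention(tokens, tokens, tokens, causal)
    after = attention(extended, extended, extended, causal)
    return all(
        math.isclose(x, y, abs_tol=1e-12)
        for old, new in zip(before, after)
        for x, y in zip(old, new)
    )

# Hypothetical toy vectors standing in for one conversation turn.
turn_one = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
next_turn_token = [2.0, -1.0]

causal_stable = prefix_is_stable(turn_one, next_turn_token, causal=True)
bidirectional_stable = prefix_is_stable(turn_one, next_turn_token, causal=False)
print(causal_stable, bidirectional_stable)
```

Under these assumptions the causal variant leaves the earlier outputs untouched, while the bidirectional variant changes them, which is why the earlier turn would have to be re-encoded.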
@the_wisecrack9472 6 months ago
This is really great! Thank you
@jwPick 6 months ago
Thanks so much for the precious video.
@itsaugbog 7 months ago
Hilariously, Jensen Huang from NVIDIA just spoke in a fireside chat recently about how they're already dependent on AI and models for designing chips, so that last comment is already happening. Great talk.
@simonfranco644 7 months ago
Can you support this with a doc or link? I am very keen on exploring this. Also, it was hilarious to me that the attendees laughed at the doctor for explaining that, and I giggled when he mentioned that it might just be official in two years or so.
@adamlin120 8 months ago
Great and inspiring talks
@jooholee_ 7 months ago
Great talk. Very inspiring.
@CrazyFoxMovies 8 months ago
Great lecture!
@lugia8888 7 months ago
All of this is BS 😂
@zacharykosove9048 8 months ago
The students were asking some great questions, no wonder I don't go to Stanford
@roro5179 7 months ago
im the dude at the end (dont go to Stanford xd)
@mprone 7 months ago
Questions looked pretty naive to me. What's "great" about them to you?
@laalbujhakkar 8 months ago
Thanks for all the extra popping into the mic during the intro brrrruh!
@doinitlive3015 7 months ago
Types of leadership offer an analogy for how less structure can go together with higher performance. An authoritarian leader increases the team's productivity but decreases its creativity, whereas a team under democratic leadership is able to solve problems with more creativity, leading to innovative ideas.
@Faustordz 7 months ago
Very intriguing!
@lc.sin. 7 months ago
Besides compute, I guess exponentially cheaper network bandwidth, data storage, and sensors to capture real-world input should also count among the driving forces.
@aliwaheed906 7 months ago
Maybe the emergent behavior happens because for that task to be learned there are a set of pre-requisite tasks that need to be learned first. Just brainstorming here.
@boybro624 5 months ago
I don't quite understand how the overall loss is divided into many sub-losses. Is it true that LLM training only uses cross_entropy, as Karpathy said? Sorry, I'm new to this field.
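One way to read the "sub-losses" idea is that training still uses a single cross-entropy loss, and the split is only a way of grouping the per-token terms after the fact. The sketch below is an illustration under that assumption. The task labels and probabilities are invented, and real training data carries no such labels.

```python
import math

# Hypothetical per-token records: (task label, probability the model
# assigned to the correct next token). Both columns are invented.
token_records = [
    ("grammar", 0.9),
    ("grammar", 0.8),
    ("world_knowledge", 0.5),
    ("arithmetic", 0.1),
    ("arithmetic", 0.2),
]

def cross_entropy(probabilities):
    """Mean negative log-likelihood of the correct tokens."""
    return sum(-math.log(p) for p in probabilities) / len(probabilities)

# The one loss that training actually minimizes.
overall_loss = cross_entropy([p for _, p in token_records])

# Group the same per-token terms by the hypothetical task label.
by_task = {}
for task, p in token_records:
    by_task.setdefault(task, []).append(p)
task_losses = {task: cross_entropy(ps) for task, ps in by_task.items()}

# Weighting each group's loss by its token count recovers the overall loss.
recombined = sum(
    task_losses[task] * len(ps) for task, ps in by_task.items()
) / len(token_records)

print(overall_loss, recombined, task_losses)
```

Because the overall number is just a weighted average, one group can improve sharply while the average moves only a little.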
@Arcticwhir 7 months ago
I'm more curious about the 22% of tasks that are completely flat and what could change that. Also, for larger models, if less structure is generally better but needs more compute, does that mean the model will need less RLHF to be desirable for humans?
@heyitsjoshd 8 months ago
How do we know what is small vs large? For example, with emergent tasks, it highlights that more data could lead to more accuracy with enough compute. The small LM would not have seen accuracy improvements but the large LM did. For the tasks currently indicated as flat, could it be that we just don't have enough compute yet to know whether these tasks would get more accurate?
@DanBillings 8 months ago
Please put the subject of the talk in the title. You can then market the OpenAI speakers
@Lalala_1701 8 months ago
Andrew Ng also used the same kind of example to explain LMs.
@dkierans 7 months ago
Yeah, this is a pretty great talk. It is quite hard to figure out at what technical level to hit the widest audience. This is nice. Not as nice as those flaxen locks though.
@robertwilsoniii2048 7 months ago
Something that always bothered me was that adding in random terms increases predictive power, holding sample size constant (scaling compute without increasing data size). The problem is that it decreases explanatory power and the ability to understand the individual contribution of each variable. It's like pop astrology and star signs (Libra, Gemini, Leo, etc.): adding extra variables improves scaling compute and predictability, but does it add anything to clarity? I suppose clarity doesn't matter for making predictions. That always annoyed me.
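A small sketch of the statistical effect described here, under the assumption that "predictive power" means in-sample fit. It fits ordinary least squares on synthetic data with and without extra pure-noise columns and compares in-sample R². The data, seed, and helper names are made up, and the sketch says nothing about held-out accuracy or about language models.

```python
import random

def solve(matrix, rhs):
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for r in reversed(range(n)):
        known = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - known) / aug[r][r]
    return solution

def r_squared(features, targets):
    """In-sample R^2 of an ordinary least squares fit with an intercept."""
    rows = [[1.0] + list(f) for f in features]
    k = len(rows[0])
    # Normal equations: (X^T X) coef = X^T y
    gram = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    moment = [sum(row[i] * y for row, y in zip(rows, targets)) for i in range(k)]
    coef = solve(gram, moment)
    predictions = [sum(c * x for c, x in zip(coef, row)) for row in rows]
    mean_y = sum(targets) / len(targets)
    residual = sum((y - p) ** 2 for y, p in zip(targets, predictions))
    total = sum((y - mean_y) ** 2 for y in targets)
    return 1.0 - residual / total

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 30
signal = [rng.gauss(0, 1) for _ in range(n)]
targets = [2.0 * x + rng.gauss(0, 1) for x in signal]
# Five columns of pure noise, unrelated to the targets by construction.
noise_columns = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(n)]

r2_signal_only = r_squared([[x] for x in signal], targets)
r2_with_noise = r_squared(
    [[x] + extra for x, extra in zip(signal, noise_columns)], targets
)
print(r2_signal_only, r2_with_noise)
```

The in-sample fit cannot get worse when columns are added, even meaningless ones, which is the sense in which the extra terms help the score without explaining anything.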
@xiaoxiandong7382 3 months ago
wow!!! So good
@MatijaGrcic 7 months ago
Amazing!
@hedu5303 8 months ago
Strange world. This dude is almost a kid and gives a lecture
8 months ago
I am happy to learn from any kid :)
@shairuno 8 months ago
His intuition is older than me
@vireyes1595 7 months ago
nah man gotta recognize game when you see it. dude’s a future titan of the industry and we’re out here getting his guest lecture for free. pretty solid win for all parties involved in my book
@SuperHeromindNsoul 7 months ago
True, we can all learn from each other, and the speakers here also learned from someone.
@MrAmgadHasan 7 months ago
Indeed. Many of the recent breakthroughs in ML were achieved by people in their 20s, mostly during or shortly after their PhDs.
@erebi8386 7 months ago
Hang in there, Hyung Won!
@Umarbit 7 months ago
Please remove the noise from the audio.
@akashdeb9823 7 months ago
Jason can do 18 pull ups no breaks
@hh0686 7 months ago
Please… why can’t the presentation be done on a projector instead of a whiteboard? The visuals are horrible.
@primedanny417 5 months ago
It was intentional, from Hyung Won Chung's tweet: "Jason walked into the classroom without anything (no laptop, no notes) and gave a lecture out of memory."
@rasen84 8 months ago
The second half is 100% wrong on the idea that scaling is what matters and that adding complexity to the model, adding inductive biases, bites you in the ass later. You're not considering the considerable amount of human labor allocated to data curation and handwritten instruction tuning data. That is necessary because the model is too simple and too dumb. The model doesn't have the necessary inductive biases to intelligently take any data. You need to add more inductive biases in order to obviate the need for human labor on data curation and creation.
@김성주-h1b 8 months ago
He is not talking about the immediate moment. He is discussing what kind of model would be preferable when there is an abundance of data and computing resources. He mentioned that due to the current limitations in computing resources, it's necessary to use models with some degree of inductive bias. Although he didn't say it explicitly, he probably thinks that models with inductive bias are also needed due to limitations in data. However, in the future, as more computing and data resources become available, models with less inductive bias will be better.
@rasen84 8 months ago
@@김성주-h1b What I’m saying is that the data collection, creation and curation process should count towards model complexity and the scaling hypothesis. You could be removing complexity from the model and offloading that complexity to human data curators and creators.
@김성주-h1b 8 months ago
@rasen84, I believe we are on the same page. I agree with your point that "You could be removing complexity from the model and offloading that complexity to human data curators and creators." However, I think he is talking about the trends and the distant future, perhaps 10 years from now. Yes, if we remove complexity from the model and training methods, we will need more resources to compensate for the trade-off in data preparation. However, in the future, there may be a vast array of open-source data available and synthetic data generated through self-play approaches. Then, our goal will be to reduce assumptions in the model, give it more freedom and make it bigger. I believe this is what he intended.
@hang_8169 7 months ago
@@rasen84 I would argue that even if you use an older method with more structure in it, you still need to spend the same amount of effort on data, if not more, to adhere to the structure that you impose on the model. Your model has MORE assumptions about the data it expects, not fewer.
@rasen84 7 months ago
@@hang_8169 then it’s time to add more inductive biases.
@wenhanzhou5826 7 months ago
Dude just learned how to manually classify lung cancer to better understand the neural network he is building 💀