Comments
@elcanmhmmdli3305 12 minutes ago
Azerbaijan❤
@hajerjm 20 minutes ago
Thank you!
@SurajPrasad-mz1nx 2 hours ago
Too good, to be honest.
@Lee-zo3dy 3 hours ago
Where can I find the problem sets? This is really important to me. Please, someone help me!
@jonathanr4242 8 hours ago
Thanks for sharing.
@aeroperea 12 hours ago
wow
@fortuneolawale9113 13 hours ago
thanks
@systemai 16 hours ago
This is page 12. Imagine a beginner being confronted with this. What saddens me is that the concepts behind this word soup are quite straightforward, and much of it can be visualised through graphs and simple examples. It's a failing of teaching to make things complicated, obfuscating meaning with terminology. Richard Feynman thought the same: it's for boys to talk to other boys in their club, and beginners keep out. The passage: "Here we are modeling a single output, so Ŷ is a scalar; in general Ŷ can be a K-vector, in which case β would be a p×K matrix of coefficients. In the (p+1)-dimensional input-output space, (X, Ŷ) represents a hyperplane. If the constant is included in X, then the hyperplane includes the origin and is a subspace; if not, it is an affine set cutting the Y-axis at the point (0, β̂₀). From now on we assume that the intercept is included in β̂."
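For anyone hitting that excerpt cold, the equation being described is just a line fit with the intercept folded into the coefficient vector; in ESL's notation (my paraphrase, not part of the book or the original comment):

```latex
\hat{Y} = \hat{\beta}_0 + \sum_{j=1}^{p} X_j \hat{\beta}_j
        = X^{\top}\hat{\beta},
\qquad X = (1, X_1, \dots, X_p)^{\top}
```

Prepending the constant 1 to X is what turns the affine fit into a linear map, which is why the fitted hyperplane then passes through the origin.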
@mahirturjo7509 17 hours ago
❤❤❤❤❤❤
@nafikhan13-4-23 19 hours ago
I love 💓💓💓💓Stanford Online💓💓💓💓
@yuretenno1 19 hours ago
I firmly disagree. There is an important distinction here: (a) expectation of growth by simply treasuring commodities and currency; (b) expectation of growth by active exploitation of the anticipated value by a third party in order to produce more value. The test must cover only option (b), according to the reasons established in the video.
@Justjemming 22 hours ago
The dice example for independence is wild! If event G is the dice summing to 7, it's independent of E or F, but if the target sum is less than 7 it's not? Would someone be able to explain this in some detail, or provide some intuition? Thanks!
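A quick way to build intuition is to enumerate all 36 rolls: a sum of 7 can be completed from every first-die value in exactly one way, so learning the first die never changes its probability; any other target sum can't. A minimal Python check (my own sketch, with E taken to be "the first die shows 1" as a stand-in for the lecture's event):

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # all 36 equally likely rolls

def prob(event):
    return Fraction(sum(map(event, outcomes)), len(outcomes))

def cond(event, given):
    base = [o for o in outcomes if given(o)]
    return Fraction(sum(map(event, base)), len(base))

E = lambda o: o[0] == 1  # assumed event: first die shows 1

for s in (7, 6):
    G = lambda o, s=s: sum(o) == s  # event: the dice sum to s
    print(s, prob(G), cond(G, E))
# 7 1/6 1/6   -> P(G) == P(G|E): independent
# 6 5/36 1/6  -> P(G) != P(G|E): dependent
```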
@TomTom-xh9tp 1 day ago
Where can I find the "other" videos that Andrew says the students can watch at home?
@atdt01410x 1 day ago
This lecture is super useful. Really appreciate it.
@himanshusamariya9810 1 day ago
Just awesome 😊
@xX_BabooFrik_Xx 1 day ago
Love to maddie <3
@arjunkandaswamy 1 day ago
where is the full playlist?
@Beverage21 1 day ago
Is this course still applicable in 2024, guys? After all the advancements, will it be sufficient to get started?
@akshat_senpai 1 day ago
No idea 😄 but I'm looking for friends 😅
@cheapearth6262 1 day ago
Learning probability for 12th grade from Stanford lol
@forresthu6204 1 day ago
Two great minds of our time.
@user-my8vx3ls2u 1 day ago
Great presenter.
@MLLearner 1 day ago
00:10 Today's discussion is about supervised learning and locally weighted regression.
07:48 Locally weighted regression focuses on fitting a straight line to the training examples close to the prediction value.
16:15 Locally weighted linear regression is a good algorithm for low-dimensional datasets.
22:30 Assumptions for housing price prediction.
29:45 Linear regression falls out naturally from the assumptions made.
36:36 Maximum Likelihood Estimation is equivalent to the least squares algorithm.
44:40 Linear regression is not a good algorithm for classification.
51:04 Logistic regression involves calculating the chance of a tumor being malignant or benign.
58:30 Logistic regression uses gradient ascent to maximize the log-likelihood.
1:05:36 Newton's method is a faster algorithm than gradient ascent for optimizing the value of theta.
1:12:40 Newton's method is a fast algorithm that converges rapidly near the minimum.
Crafted by Merlin AI.
@MLLearner 1 day ago
0:28: 📚 The video discusses supervised learning, specifically linear regression, locally weighted regression, and logistic regression.
5:38: 📚 Locally weighted regression is a non-parametric learning algorithm that requires keeping data in computer memory.
13:05: 📊 Locally weighted regression is a method that assigns different weights to data points based on their distance from the prediction point.
19:01: 📚 Locally linear regression is a learning algorithm that may not have good results and is not great at extrapolation.
24:46: 🔍 The video discusses Gaussian density and its application in determining housing prices.
31:31: 💡 The likelihood of the parameters is the probability of the data given the parameters, assuming independent and identically distributed errors.
36:55: 📊 Maximum Likelihood Estimation (MLE) is a commonly used method in statistics to estimate parameters by maximizing the likelihood or log-likelihood of the data.
43:44: 📊 Applying linear regression to a binary classification problem is not a good idea.
49:22: 🎯 The video discusses the choice of hypothesis function in learning algorithms and why logistic regression is chosen as a special case of generalized linear models.
54:45: 📚 The video explains how to compress two equations into one line using a notational trick.
1:01:31: ✏ Batch gradient ascent is used to update the parameters in logistic regression.
1:07:52: 📚 The video explains how to use Newton's method to find the maximum or minimum of a function.
1:13:55: 💡 Newton's method is a fast algorithm for finding the place where the first derivative of a function is 0, using the first and second derivatives.
Recap by Tammy AI
@adamlin120 2 days ago
Great and inspiring talks
@gmccreight2 2 days ago
Thanks for the talk! Really interesting stuff. I had one question. At 1:04:00 Hyung suggests that uni-directional attention is preferable to bidirectional attention in turn-taking scenarios because it allows the reuse of calculated information in the KV cache. I'm trying to understand how this fits into his broader thesis that we should be moving towards more generic approaches. On the surface the use of the KV cache doesn't feel particularly generic. Does it make sense because masked self-attention is necessary for next token generation, anyhow, so using a causal attention mask universally makes sense?
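A toy NumPy sketch of the cache point (my own illustration, not from the talk): with a causal mask, the attention outputs for a prefix are unchanged when new tokens are appended, so the prefix's keys/values can be cached and reused across turns; with bidirectional attention every position attends to the new tokens, so the old states are invalidated.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))

def attend(x, causal):
    """Single-head attention over a sequence x of shape (T, d)."""
    q, k, v = x @ W_q, x @ W_k, x @ W_v
    scores = q @ k.T / np.sqrt(d)
    if causal:
        # position i may only attend to positions j <= i
        scores = np.where(np.tril(np.ones_like(scores)) == 1, scores, -np.inf)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

x = rng.normal(size=(5, d))          # 5-token conversation; last 2 tokens are "new"
for causal in (True, False):
    full = attend(x, causal)         # recomputed over all 5 tokens
    prefix = attend(x[:3], causal)   # what a cache would have stored
    print("causal" if causal else "bidirectional",
          np.allclose(full[:3], prefix))
# causal True          -> prefix outputs (and K/V) are reusable
# bidirectional False  -> new tokens invalidate the old states
```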
@jj-uo9ti 2 days ago
love it
@jj-uo9ti 2 days ago
Best lecture ever: short, precise, and to the point. Love it.
@guynyamsi7729 2 days ago
Hello, I wanted to add an offset to my model, but I realize it's not possible. An offset can be seen as a linear-predictor term whose variable has a fixed coefficient of 1 (g(mu) = f(x1) + x2, where x2 is the offset). Is it possible to fix the value of the coefficient of a linear term, for example l(1, coef_estimate=1)? In that case variable 1 (x2) would behave like an offset.
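In case it helps: with an identity link you can emulate an offset by fitting on y - x2, and some packages expose offsets directly; for example, statsmodels' GLM accepts an offset argument. A minimal sketch with made-up data (variable names are mine, not from the comment's library):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)            # term whose coefficient is fixed at 1
y = 2.0 + 0.5 * x1 + x2 + rng.normal(scale=0.1, size=200)

X = sm.add_constant(x1)              # only the free terms enter the design matrix
model = sm.GLM(y, X, family=sm.families.Gaussian(), offset=x2)
res = model.fit()
print(res.params)                    # ~[2.0, 0.5]; x2 enters with coefficient 1
```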
@aPhoton. 2 days ago
I bought the R edition of the book. I'll try to do the labs and problems in Python, using the Python edition PDF as a reference. Thank you so much to the awesome team for making this gem free for all learners.
@michaelbernaski7337 2 days ago
Excellent. First talk is practical. Second is profound. Thank you.
@user-eb6xb7ol5t 2 days ago
Talk to me.
@user-eb6xb7ol5t 2 days ago
All countries, please attend the conference in Taiwan.
@numairsayed9928 2 days ago
I need the problem sets; can anyone help?
@user-zr4ns3hu6y 2 days ago
Best explanation!
@CrazyFoxMovies 3 days ago
Great lecture!
@rasen84 3 days ago
The second half is 100% wrong about the idea that scaling is what matters and that adding complexity, i.e. inductive biases, to the model bites you in the ass later. You're not considering the considerable amount of human labor allocated to data curation and handwritten instruction-tuning data. That labor is necessary because the model is too simple and too dumb: it doesn't have the inductive biases needed to learn intelligently from arbitrary data. You need to add more inductive biases in order to obviate the need for human labor on data curation and creation.
@user-se3zz1pn7m 1 day ago
He is not talking about the immediate moment. He is discussing what kind of model would be preferable when there is an abundance of data and computing resources. He mentioned that due to the current limitations in computing resources, it's necessary to use models with some degree of inductive bias. Although he didn't say it explicitly, he probably thinks that models with inductive bias are also needed due to limitations in data. However, in the future, as more computing and data resources become available, models with less inductive bias will be better.
@rasen84 1 day ago
@user-se3zz1pn7m What I'm saying is that the data collection, creation, and curation process should count toward model complexity in the scaling hypothesis. You could be removing complexity from the model and offloading that complexity to human data curators and creators.
@user-se3zz1pn7m 1 day ago
@rasen84, I believe we are on the same page. I agree with your point that "you could be removing complexity from the model and offloading that complexity to human data curators and creators." However, I think he is talking about trends and the distant future, perhaps 10 years from now. Yes, if we remove complexity from the model and training methods, we will need more resources to compensate for the trade-off in data preparation. But in the future there may be a vast array of open-source data available, plus synthetic data generated through self-play approaches. Then our goal will be to reduce assumptions in the model, give it more freedom, and make it bigger. I believe this is what he intended.
@izumskee 3 days ago
Great talk. Thank you!
@laalbujhakkar 3 days ago
Thanks for all the extra popping into the mic during the intro brrrruh!
@ricopags 3 days ago
Really grateful for this being uploaded! Thank you to both speakers and to Stanford for the generosity. The highlight of the video for me is Hyung's sheepish refusal to get into predictions about the staying power/relevance of MoE or any specific architecture. It felt like a wasted question, since the premise of his talk is "tl;dr: Sutton's Bitter Lesson."
@sanesanyo 3 days ago
One of my favourite talks in recent times... learnt so much from this.
@laalbujhakkar 3 days ago
Thanks for the good audio!
@TrishanPanch 3 days ago
Outstanding. I teach an AI class and there are loads of great pedagogical nuggets here that I am going to borrow.
@anshuraj4277 16 hours ago
Hey... Nice... I'm an AI student... Would you like to connect?
@andrewgillespie4458 3 days ago
Great talk!
@Nevermind1000 3 days ago
Does anyone know where to get the notes for this lecture?
@JH-bb8in 3 days ago
anyone looking for AI startup cofounders, comment below with ur LinkedIn
@chriscockrell9495 3 days ago
Commonalities and differences.
@ivaninkorea 3 days ago
The professor is so cheerful! I stumbled upon this lecture, but I wanna watch the others just to see him happily explain and teach ^^
@poshpish6063 3 days ago
Where can I start learning how to use generative AI to make an app?
@lpmlearning2964 4 days ago
It feels too theoretical. It should be more like: how do I do this in practice, why do I do it this way, and why can't it be done with neural networks instead of having to go all the way to variational inference?