Where can I find the problem sets? This is really important to me. Please, someone help me!
@jonathanr4242 8 hours ago
Thanks for sharing.
@aeroperea 12 hours ago
wow
@fortuneolawale9113 13 hours ago
thanks
@systemai 16 hours ago
This is page 12. Imagine a beginner being confronted with this. What saddens me is that the concepts behind this word soup are quite straightforward, much of which can be visualised through graphs and simple examples. It's a failing of teaching to make things complicated, obfuscating meaning with terminology. Richard Feynman thought the same: it's for boys to talk to other boys in their club. Beginners keep out. "Here we are modeling a single output, so Ŷ is a scalar; in general Ŷ can be a K-vector, in which case β would be a p×K matrix of coefficients. In the (p + 1)-dimensional input-output space, (X, Ŷ) represents a hyperplane. If the constant is included in X, then the hyperplane includes the origin and is a subspace; if not, it is an affine set cutting the Y-axis at the point (0, β̂0). From now on we assume that the intercept is included in β̂."
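To make the quoted passage concrete (all numbers below are invented for illustration): absorbing the intercept into β just means prepending a column of ones to X, after which ŷ = Xβ is a single matrix product.

```python
import numpy as np

# Invented toy data: n = 3 observations, p = 2 inputs.
X = np.array([[1.0, 2.0],
              [2.0, 0.5],
              [3.0, 1.5]])

# Include the constant in X so the intercept beta_0 lives inside beta.
X1 = np.hstack([np.ones((X.shape[0], 1)), X])

beta = np.array([4.0, 0.5, -1.0])   # [beta_0, beta_1, beta_2]
y_hat = X1 @ beta                   # single output: each entry of y_hat is a scalar

# The fitted surface is a hyperplane in the (p + 1)-dimensional input-output
# space; with the intercept kept separate, it is an affine set cutting the
# Y-axis at (0, beta_0) = (0, 4.0).
print(y_hat)
```

With K outputs, beta would instead be a (p + 1) × K matrix and y_hat an n × K array; the code is otherwise unchanged.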
@mahirturjo7509 17 hours ago
❤❤❤❤❤❤
@nafikhan13-4-23 19 hours ago
I love 💓💓💓💓Stanford Online💓💓💓💓
@yuretenno1 19 hours ago
I firmly disagree. There must be an important distinction here: (a) expectation of growth by simply treasuring commodities and currency; (b) expectation of growth by active exploitation of the anticipated value by a third party in order to produce more value. The test must cover only option (b), according to the reasons established in the video.
@Justjemming 22 hours ago
The dice example for independence is wild! If event G sums to 7, it's independent from E or F but if it sums to a number less than 7 it's not? Would someone be able to explain this in some detail? Or provide some intuition? Thanks!
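One way to build intuition for the question above is brute-force enumeration. A quick sketch — the exact event definitions are my assumption about the lecture's setup (E = "first die shows a 1"), not a transcript of it:

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

E = lambda o: o[0] == 1      # first die shows a 1
G7 = lambda o: sum(o) == 7   # sum is 7
G6 = lambda o: sum(o) == 6   # sum is 6

# Sum = 7: independent, since P(E and G7) = P(E) * P(G7).
assert prob(lambda o: E(o) and G7(o)) == prob(E) * prob(G7)

# Sum = 6: NOT independent. A first die of 6 makes a sum of 6 impossible,
# so learning the first die's value changes the probability of G6.
assert prob(lambda o: E(o) and G6(o)) != prob(E) * prob(G6)
```

The intuition: 7 is the only total that every first-die value can still complete — there is exactly one completing second-die value (probability 1/6) no matter what the first die shows, so conditioning on the first die changes nothing. For a total below 7, some first-die values make the total impossible, so the conditional probability shifts.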
@TomTom-xh9tp 1 day ago
Where can I find the "other" videos that Andrew says the students can watch at home?
@atdt01410x 1 day ago
This lecture is super useful. Really appreciate it.
@himanshusamariya9810 1 day ago
Just awesome 😊
@xX_BabooFrik_Xx 1 day ago
Love to maddie <3
@arjunkandaswamy 1 day ago
where is the full playlist?
@Beverage21 1 day ago
Is this course still applicable in 2024, guys? After a lot of advancements, will it be sufficient to get started?
@akshat_senpai 1 day ago
No idea 😄 but I'm looking too, friends 😅
@cheapearth6262 1 day ago
Learning probability for 12th grade from Stanford lol
@forresthu6204 1 day ago
Two great minds of our time.
@user-my8vx3ls2u 1 day ago
Great presenter.
@MLLearner 1 day ago
00:10 Today's discussion is about supervised learning and locally weighted regression.
07:48 Locally weighted regression focuses on fitting a straight line to the training examples close to the prediction value.
16:15 Locally weighted linear regression is a good algorithm for low-dimensional datasets.
22:30 Assumptions for housing price prediction.
29:45 Linear regression falls out naturally from the assumptions made.
36:36 Maximum Likelihood Estimation is equivalent to the least squares algorithm.
44:40 Linear regression is not a good algorithm for classification.
51:04 Logistic regression involves calculating the chance of a tumor being malignant or benign.
58:30 Logistic regression uses gradient ascent to maximize the log-likelihood.
1:05:36 Newton's method is a faster algorithm than gradient ascent for optimizing the value of theta.
1:12:40 Newton's method is a fast algorithm that converges rapidly near the minimum.
Crafted by Merlin AI.
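As a companion to the locally-weighted-regression bullets above, here is a minimal sketch of the idea with the Gaussian-style weighting described in the lecture; the toy data and names are mine, not the course's code:

```python
import numpy as np

# Minimal locally weighted linear regression for one feature, using
# weights w_i = exp(-(x_i - x_query)^2 / (2 * tau^2)).
def lwr_predict(X, y, x_query, tau=1.0):
    Xb = np.column_stack([np.ones_like(X), X])        # add intercept column
    w = np.exp(-(X - x_query) ** 2 / (2 * tau ** 2))  # weights peak near x_query
    W = np.diag(w)
    # Weighted normal equations: theta = (Xb^T W Xb)^(-1) Xb^T W y
    theta = np.linalg.solve(Xb.T @ W @ Xb, Xb.T @ W @ y)
    return np.array([1.0, x_query]) @ theta

X = np.array([0.0, 1.0, 2.0, 3.0])
y = 2 * X + 1                        # exactly linear toy data
print(lwr_predict(X, y, 1.5))        # recovers 2 * 1.5 + 1 = 4.0
```

Note the non-parametric cost mentioned in the lecture: every prediction re-solves a fit over the whole training set, which is why the data must stay in memory.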
@MLLearner 1 day ago
0:28: 📚 The video discusses supervised learning, specifically linear regression, locally weighted regression, and logistic regression.
5:38: 📚 Locally weighted regression is a non-parametric learning algorithm that requires keeping data in computer memory.
13:05: 📊 Locally weighted regression is a method that assigns different weights to data points based on their distance from the prediction point.
19:01: 📚 Locally linear regression is a learning algorithm that may not have good results and is not great at extrapolation.
24:46: 🔍 The video discusses Gaussian density and its application in determining housing prices.
31:31: 💡 The likelihood of the parameters is the probability of the data given the parameters, assuming independent and identically distributed errors.
36:55: 📊 Maximum Likelihood Estimation (MLE) is a commonly used method in statistics to estimate parameters by maximizing the likelihood or log-likelihood of the data.
43:44: 📊 Applying linear regression to a binary classification problem is not a good idea.
49:22: 🎯 The video discusses the choice of hypothesis function in learning algorithms and why logistic regression is chosen as a special case of generalized linear models.
54:45: 📚 The video explains how to compress two equations into one line using a notational trick.
1:01:31: ✏ Batch gradient ascent is used to update the parameters in logistic regression.
1:07:52: 📚 The video explains how to use Newton's method to find the maximum or minimum of a function.
1:13:55: 💡 Newton's method is a fast algorithm for finding the place where the first derivative of a function is 0, using the first and second derivatives.
Recap by Tammy AI
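The Newton's-method idea summarized around 1:07:52 fits in a few lines. The toy objective below is an invented example for illustration, not taken from the lecture:

```python
# Newton's method for maximizing a function f: find where f'(theta) = 0
# by iterating theta <- theta - f'(theta) / f''(theta).
def newton_max(df, d2f, theta=0.0, iters=10):
    for _ in range(iters):
        theta = theta - df(theta) / d2f(theta)
    return theta

# f(theta) = -(theta - 3)^2 is concave with its maximum at theta = 3.
df = lambda t: -2.0 * (t - 3.0)   # first derivative
d2f = lambda t: -2.0              # second derivative (constant)
print(newton_max(df, d2f))        # converges to 3.0 in one step, since f is quadratic
```

For logistic regression, theta is a vector and the scalar division becomes multiplication by the inverse Hessian, which is why each Newton step is more expensive than a gradient step even though far fewer steps are needed.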
@adamlin120 2 days ago
Great and inspiring talks
@gmccreight2 2 days ago
Thanks for the talk! Really interesting stuff. I had one question. At 1:04:00 Hyung suggests that uni-directional attention is preferable to bidirectional attention in turn-taking scenarios because it allows the reuse of calculated information in the KV cache. I'm trying to understand how this fits into his broader thesis that we should be moving towards more generic approaches. On the surface the use of the KV cache doesn't feel particularly generic. Does it make sense because masked self-attention is necessary for next token generation, anyhow, so using a causal attention mask universally makes sense?
@jj-uo9ti 2 days ago
love it
@jj-uo9ti 2 days ago
Best lecture ever: short, precise, and to the point. Love it.
@guynyamsi7729 2 days ago
Hello, I wanted to add an offset to my model, but I realize that it's not possible. An offset can be seen as a linear predictor term with a fixed coefficient of 1 (g(mu) = f(x1) + x2): x2 is an offset. Is it possible to fix the value of the coefficient in a linear term? For example, l(1, coef_estimate=1)? In this case, variable 1 (x2) would behave like an offset.
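Not knowing which library the question refers to: one general workaround under an identity link is to move the fixed-coefficient term to the response side, i.e. fit y − x2 on the remaining terms (for a log link you would subtract the offset on the linear-predictor scale instead). statsmodels' GLM also accepts an explicit `offset=` argument, if switching libraries is an option. A minimal numpy sketch with invented data:

```python
import numpy as np

rng = rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
x2 = rng.normal(size=100)                 # the offset variable, coefficient fixed at 1
y = 3.0 + 2.0 * x1 + 1.0 * x2 + rng.normal(scale=0.1, size=100)

# Absorb the offset by subtracting it from the response, then fit the rest.
X = np.column_stack([np.ones_like(x1), x1])
coef, *_ = np.linalg.lstsq(X, y - x2, rcond=None)
print(coef)  # roughly [3.0, 2.0]; x2 enters with coefficient exactly 1
```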
@aPhoton. 2 days ago
I bought the book with R. Will try to do the labs and problems in Python and take the Python edition PDF as reference. Thank you so much to the awesome team for making this gem free for all the learners.
@michaelbernaski7337 2 days ago
Excellent. First talk is practical. Second is profound. Thank you.
@user-eb6xb7ol5t 2 days ago
Talk to me
@user-eb6xb7ol5t 2 days ago
All countries, please come to the Taiwan conference
@numairsayed9928 2 days ago
I need the Problem sets, can anyone help?
@user-zr4ns3hu6y 2 days ago
Best explanation!
@CrazyFoxMovies 3 days ago
Great lecture!
@rasen84 3 days ago
The second half is 100% wrong on the idea that scaling is what matters and that adding complexity to the model, adding inductive biases, bites you in the ass later. You're not considering the considerable amount of human labor allocated to data curation and hand-written instruction-tuning data. That is necessary because the model is too simple and too dumb. The model doesn't have the necessary inductive biases to intelligently take in any data. You need to add more inductive biases in order to obviate the need for human labor on data curation and creation.
@user-se3zz1pn7m 1 day ago
He is not talking about the immediate moment. He is discussing what kind of model would be preferable when there is an abundance of data and computing resources. He mentioned that due to the current limitations in computing resources, it's necessary to use models with some degree of inductive bias. Although he didn't say it explicitly, he probably thinks that models with inductive bias are also needed due to limitations in data. However, in the future, as more computing and data resources become available, models with less inductive bias will be better.
@rasen84 1 day ago
@user-se3zz1pn7m What I'm saying is that the data collection, creation, and curation process should count towards model complexity and the scaling hypothesis. You could be removing complexity from the model and offloading that complexity to human data curators and creators.
@user-se3zz1pn7m 1 day ago
@rasen84 I believe we are on the same page. I agree with your point that "You could be removing complexity from the model and offloading that complexity to human data curators and creators." However, I think he is talking about the trends and the distant future, perhaps 10 years from now. Yes, if we remove complexity from the model and training methods, we will need more resources to compensate for the trade-off in data preparation. However, in the future, there may be a vast array of open-source data available and synthetic data generated through self-play approaches. Then, our goal will be to reduce assumptions in the model, give it more freedom, and make it bigger. I believe this is what he intended.
@izumskee 3 days ago
Great talk. Thank you.
@laalbujhakkar 3 days ago
Thanks for all the extra popping into the mic during the intro brrrruh!
@ricopags 3 days ago
Really grateful for this being uploaded! Thank you to both speakers and to Stanford for the generosity. Highlight of the video for me is Hyung's sheepish refusal to get into predictions on the staying power/relevance of MoE or any specific architecture. It felt like a wasted question, since the premise of his talk is "tl;dr Sutton's Bitter Lesson".
@sanesanyo 3 days ago
One of my favourite talks in recent times. Learnt so much from this.
@laalbujhakkar 3 days ago
thanks for good audio!
@TrishanPanch 3 days ago
Outstanding. I teach an AI class and there are loads of great pedagogical nuggets here that I am going to borrow.
@anshuraj4277 16 hours ago
Hey... Nice... I'm an AI student... Would you like to connect?
@andrewgillespie4458 3 days ago
Great talk!
@Nevermind1000 3 days ago
Anyone know where to get the lecture notes for this lecture?
@JH-bb8in 3 days ago
anyone looking for AI startup cofounders, comment below with ur LinkedIn
@chriscockrell9495 3 days ago
Commonalities and differences.
@ivaninkorea 3 days ago
The professor is so cheerful! I stumbled upon this lecture, but I wanna watch the others just to see him happily explain and teach ^^
@poshpish6063 3 days ago
Where can I start to learn how to use generative AI to make an app?
@lpmlearning2964 4 days ago
It feels too theoretical. It should be more like this: how do I do this in practice, why do I do it this way, and why can it not be done with neural networks instead of going all the way to variational inference?