Your videos are amazing, very clear and concise explanations!
@riadhbennessib3961 2 years ago
Thank you so much for the video lessons; they encourage me to revisit Bishop's challenging book!
@KapilSachdeva 2 years ago
🙏
@vi5hnupradeep 3 years ago
Thank you so much for your videos! They explain the concepts really well.
@KapilSachdeva 3 years ago
🙏
@wsylovezx 1 year ago
Greatly appreciate your super clear video. I have a question at 5:48: by Bayes' theorem, p(w|x,t) ∝ p(x,t|w)·p(w), which is different from the expression p(w|x,t) ∝ p(t,x|w)·p(w) on your slide.
@lakex24 1 year ago
That should read: it is different from the expression p(w|x,t) ∝ p(t|x,w)·p(w) on the slide.
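For what it's worth, the two forms give the same posterior under the standard regression assumption that the inputs x carry no information about w, so p(x|w) = p(x) is absorbed into the proportionality constant. A quick sketch:

```latex
p(\mathbf{x},\mathbf{t}\mid\mathbf{w})
  = p(\mathbf{t}\mid\mathbf{x},\mathbf{w})\,p(\mathbf{x}\mid\mathbf{w})
  = p(\mathbf{t}\mid\mathbf{x},\mathbf{w})\,p(\mathbf{x})
\quad\Longrightarrow\quad
p(\mathbf{w}\mid\mathbf{x},\mathbf{t}) \propto p(\mathbf{t}\mid\mathbf{x},\mathbf{w})\,p(\mathbf{w})
```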
@sbarrios93 3 years ago
This video is pure gold. Thank you!!
@KapilSachdeva 3 years ago
🙏🙏
@bodwiser100 6 months ago
Thank you! One request -- can you explain the reason behind the equivalence between assuming that the target variable is normally distributed and assuming that the errors are normally distributed? While I understand that the two assumptions are two sides of the same coin, the mathematical equivalence between them appeared to me to be implicitly assumed in moving from the part 2 video to the part 3 video.
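Writing out what I think the missing step is (using the notation from the videos, so treat this as my own sketch):

```latex
t = y(x,\mathbf{w}) + \epsilon, \qquad \epsilon \sim \mathcal{N}(0,\beta^{-1})
\quad\Longleftrightarrow\quad
p(t \mid x, \mathbf{w}, \beta) = \mathcal{N}\big(t \mid y(x,\mathbf{w}),\, \beta^{-1}\big)
```

i.e., adding the deterministic quantity y(x,w) to a Gaussian noise term only shifts its mean, so "Gaussian errors" and "Gaussian target given x" are the same assumption.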
@yeo2octave27 3 years ago
Thank you for the video! I am currently reading up on manifold regularization and am curious about applying Bayesian methods to it. At 12:11, for elastic net/manifold regularization we introduce a second regularization term into our analytical solution; could we simply express the prior as being conditioned on the two hyperparameters, i.e. p(w | \alpha, \gamma), and apply Bayes' theorem? How could we then arrive at an expression for the distribution of w, i.e. w ~ N(0, \alpha^(-1) * I)?
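My own guess for the elastic-net case (not from the video, so take it as a hedged sketch): the L2 + L1 penalty corresponds to a product of a Gaussian and a Laplace prior,

```latex
p(\mathbf{w}\mid\alpha,\gamma)
\;\propto\;
\exp\!\Big(-\tfrac{\alpha}{2}\,\mathbf{w}^{\top}\mathbf{w} \;-\; \gamma\,\lVert\mathbf{w}\rVert_{1}\Big)
```

whose negative log reproduces both regularization terms, although this prior is no longer the simple N(0, \alpha^(-1) I) and the posterior loses its closed form.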
@yogeshdhingra4070 2 years ago
Your lectures are gems; there is so much to learn here! Thanks for such a great explanation.
@KapilSachdeva 2 years ago
🙏
@goedel. 2 years ago
Thank you!
@KapilSachdeva 2 years ago
🙏
@SANJUKUMARI-vr5nz 2 years ago
Very nice video
@KapilSachdeva 2 years ago
🙏
@adesiph.d.journal461 3 years ago
Sorry for spamming with questions. In programming terms, when we say p(w) is a prior, is this equivalent to initializing the weights with a random Gaussian, as in PyTorch's "torch.nn.init.xavier_uniform(m.weight)"?
@KapilSachdeva 3 years ago
Please do not hesitate to ask questions.

A prior is "your belief" about the value of a random variable (w in this case). "Your belief" is your (the data analyst's/scientist's) domain knowledge about w, expressed as a probability distribution.

Let's take a concrete example. Say you were modeling the distribution of heights of adult males in India. Even before you go and collect the dataset, you would have a belief about those heights. Based on your experience you might say it could be anything between 5' and 6'. If you think all values between 5' and 6' are equally likely, then your prior is the uniform distribution with support from 5' to 6'.

Now coming to your PyTorch expression: it creates a tensor whose values are uniformly distributed (in a symmetric range determined by the layer's fan-in and fan-out). In neural networks you typically fill in "random" values to initialize your weights; you do not typically express your domain knowledge (i.e., a prior in the Bayesian sense).

Based on the above, philosophically the answer is no, a prior is not equivalent to your expression; however, implicitly it is your belief (albeit a completely random one) about the "initial" values of the weights.
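To make the contrast concrete, here is a minimal sketch in plain NumPy (hypothetical numbers, just an illustration): a prior is a full distribution we can evaluate and later update, whereas an initialization is a single draw.

```python
import numpy as np

alpha = 2.0   # prior precision -- a belief we choose, not learned from data
M = 10        # number of weight parameters (hypothetical)

# Bayesian prior: w ~ N(0, alpha^{-1} I). It is a distribution --
# we can evaluate its log density and update it with data later.
def log_prior(w, alpha):
    M = len(w)
    return -0.5 * alpha * w @ w - 0.5 * M * np.log(2 * np.pi / alpha)

print(log_prior(np.zeros(M), alpha))  # the prior's density at w = 0

# Random initialization: just draws one starting point for an optimizer;
# no density is ever evaluated or updated afterwards.
w_init = np.random.uniform(-0.1, 0.1, size=M)
```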
@adesiph.d.journal461 3 years ago
@@KapilSachdeva Thank you so much! This makes total sense. I went on to watch the videos a few times to make sure the concepts sink in completely before I advance, and with every iteration things become clearer and I am able to connect them! Thanks!
@KapilSachdeva 3 years ago
@@adesiph.d.journal461 🙏
@adesiph.d.journal461 3 years ago
I came to this part from the book and you nailed it! Thanks. A quick question: how are you differentiating, in terms of notation, between conditional probability and likelihood? I find it confusing in PRML.

To my understanding, a conditional probability is a scalar value that indicates the chance of an event (the one in the "numerator" -- I know it is not literally a numerator, but it conveys my point) given that the events in the "denominator" have occurred, while the likelihood is about finding the best values of the mean and standard deviation to maximize the occurrence of a particular value. I might be wrong! Happy to be corrected :)

The confusion arises because in the previous part we had p(t|x,w,beta), where we wanted to find the optimal w and beta to "maximize the likelihood of t", while here p(w|alpha) is a conditional probability, and even p(w|x,t) is also a conditional probability. These may be naive questions! Sorry!
@KapilSachdeva 3 years ago
No, not a naive question. The difference between probability and likelihood has bothered many people. Your confusion stems from the overloaded and inconsistent use of notation and terminology, which is one of the root causes of why learning maths and science is difficult.

Unfortunately, the notation for likelihood is the same as that for conditional probability. The "given" is indicated using the "|" operator, and both likelihoods and conditional probabilities have "given" operators. In some literature and contexts, the symbol "L" is used instead (with the parameters and data flipped).

> While the Likelihood is trying to find the best values of mean, standard deviation to maximize the occurrence of a particular value.
> While here p(w|alpha) becomes conditional probability or even p(w|x,t) also as conditional probability.

Here is one way to see all this and make sense of the terminology. In MLE, your objective is to find the values of the parameters (mu, etc.) keeping the data fixed. The resulting quantity is what we call the likelihood; it is a kind of relative plausibility, proportional to a probability. When we instead treat the parameters as random variables, we seek their "probability distributions". A parameter (an RV) can depend on another parameter (an RV or a scalar), and hence these probability distributions take the form of conditional probability distributions.

Hope this makes sense.
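A tiny numerical illustration of "same formula, different thing held fixed" (made-up data, just a sketch):

```python
import numpy as np
from scipy.stats import norm

t = np.array([1.2, 0.8, 1.1])   # made-up observations, held fixed below

# Conditional-probability view: parameters fixed, density evaluated over data.
p_t = norm.pdf(t, loc=1.0, scale=0.5)

# Likelihood view: the same Gaussian formula, but now the data are fixed
# and we scan over the parameter mu.
mus = np.linspace(0.0, 2.0, 201)
log_lik = [norm.logpdf(t, loc=mu, scale=0.5).sum() for mu in mus]
mu_mle = mus[np.argmax(log_lik)]   # lands near t.mean() ~= 1.033
```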
@adesiph.d.journal461 3 years ago
@@KapilSachdeva Thank you so much for such a detailed response. Yes, my confusion did come from the fact that my previous knowledge of likelihood used L as the notation, with the parameters and data reversed. This makes sense, thank you!
@KapilSachdeva 3 years ago
🙏
@zgbjnnw9306 3 years ago
At 12:38, if you set the two equations equal, lambda does not come out as the ratio alpha/beta... the equation for lambda would include the sum of squared deviations and w^T w...
@KapilSachdeva 3 years ago
The value of lambda is not obtained by equating the two equations. Its purpose is to show that the hyperparameter (lambda) in ridge regression can be seen as a ratio of alpha and beta. In other words, the MAP equation is scaled by 1/beta.
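Spelled out (this is the step the slide compresses): the negative log posterior, dropping constants, is

```latex
\frac{\beta}{2}\sum_{n=1}^{N}\big\{y(x_n,\mathbf{w})-t_n\big\}^2
+ \frac{\alpha}{2}\,\mathbf{w}^{\top}\mathbf{w}
```

and dividing through by beta, which does not change the minimizer, gives the ridge objective

```latex
\frac{1}{2}\sum_{n=1}^{N}\big\{y(x_n,\mathbf{w})-t_n\big\}^2
+ \frac{\lambda}{2}\,\mathbf{w}^{\top}\mathbf{w},
\qquad \lambda = \frac{\alpha}{\beta}
```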
@zgbjnnw9306 3 years ago
@@KapilSachdeva Thanks! Where can I see the derivation of lambda written as alpha/beta? Could I find it in the book by Bishop?
@KapilSachdeva 3 years ago
@@zgbjnnw9306 Section 1.2.5 of Bishop ... the very last lines.
@YT-yt-yt-3 4 months ago
p(w|x) -- what does x mean here exactly? The probability of the weights given different data within the training set, a different training set, or something else?
@pythonerdhanabanshi4554 2 years ago
I would give multiple likes if that were possible... so satisfying...
@KapilSachdeva 2 years ago
🙏
@zgbjnnw9306 3 years ago
Why does the posterior p(w|x,t) use the likelihood p(t|x,w,beta) instead of p(x,t|w)? And why is beta in the likelihood?
@KapilSachdeva 3 years ago
This is the inconsistency of notation that I talk about. Normally we would think that whatever goes after "|" (given) is a random variable with a probability distribution, but the notation allows scalars/hyperparameters/point estimates as well. Logically it is OK: even though in this exercise we are not treating beta as a probability distribution, the likelihood still depends on it, hence it is okay to include it in the notation. This is what makes me sad -- the inconsistency of notation in the literature and books.
@zgbjnnw9306 3 years ago
@@KapilSachdeva Thanks for your help! So beta and x are both considered "constant", like alpha?
@KapilSachdeva 3 years ago
@@zgbjnnw9306 You can see it like that; nothing wrong with it. However, a better way of saying it would be: beta is either a hyperparameter (something you guess or set based on your domain expertise) or a point estimate that you obtain using frequentist methods.
@stkyriakoulisdr 3 years ago
The only mistake in this video is that "a posteriori" is Latin, not French. Cheers!
@KapilSachdeva 3 years ago
You are absolutely correct. Many thanks for spotting it and informing me.
@stkyriakoulisdr 3 years ago
@@KapilSachdeva I meant it as a compliment, since the rest of the video was so well explained.
@KapilSachdeva 3 years ago
I understood it :) ... but I am genuinely thankful for this correction, because until now I had thought it was French. Your feedback will help me not make this mistake again.
@sujathaontheweb3740 21 days ago
@kapil How did you think of formulating the problem as p(w|x, t)?