Deep Learning (CS7015): Lec 2.6 Proof of Convergence of Perceptron Learning Algorithm

NPTEL-NOC IITM

83,118 views

Comments: 53
@manoranjansahu7161 1 year ago
Best explanation I have seen on Perceptron!
@vivekrachakonda5803 1 year ago
Very clear explanation on proof of convergence.
@marcuswonder5057 5 years ago
This is surprisingly well explained, thank you. For anyone watching this - don't be put off by his accent. The content is worth watching :)
@jeffinbiju5042 4 years ago
Welcome to IIT
@ankitgupta8797 3 years ago
What's wrong with his accent?
@bavidlynx3409 2 years ago
@@ankitgupta8797 he might be foreigner bitch
@amanutkarsh724 1 year ago
Why surprisingly?
@NayanDwivedi-ob6ql 10 months ago
No one was put off by his accent in the first place!
@sarcastitva 4 years ago
"Please, we are done!" - That's what she said.
@parvalunawat4788 2 months ago
That was an awesome explanation 👌👌👌
@abhishek-tandon 1 year ago
Brilliant teacher
@sohambasu660 2 years ago
Why is ||p(i)||^2 = 1 when proving the bound on the denominator?
@amoghsinghal346 2 years ago
Each p(i) is normalized to unit norm.
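For reference, here is a short sketch of the denominator step being asked about, in the standard form of the argument (notation may differ slightly from the slides; it assumes each p_i is normalized to unit norm and that the update only fires on a misclassified point, i.e. when w_t^T p_i <= 0):

\[
\|w_{t+1}\|^2 = \|w_t + p_i\|^2
             = \|w_t\|^2 + 2\, w_t^{\top} p_i + \|p_i\|^2
             \le \|w_t\|^2 + 1,
\]

since \(w_t^{\top} p_i \le 0\) for a misclassified point and \(\|p_i\|^2 = 1\) by normalization. Applying this over k updates, starting from \(w_0 = 0\) as the proof typically assumes, gives \(\|w_k\|^2 \le k\).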
@soumyasen1483 3 years ago
At 13:20, even though, roughly speaking, we could say cos(beta) grows proportionally to sqrt(k), how would one prove that mathematically?
@RandomLolaRandom 3 years ago
Take the expression on the right hand side, divide it by √k, and find the limit as k→∞. The division is how you would find the proportionality constant between two expressions; the limit will calculate the proportionality as k goes to infinity. It approaches −𝛿 as k grows. Another approach to show that "if k is unbounded then cos(β) is also unbounded" is to calculate the limit as k→∞ of just the right-hand side of the inequality in the slide (completely ignoring the proportionality to √k). Use L'Hôpital's rule and simplify to find the limit.
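To make the sqrt(k) statement concrete, here is the standard form of the bound (assuming w_0 = 0, unit-norm w* and p_i, and delta = min_i (w*)^T p_i > 0; the slides' notation may differ slightly):

\[
\cos\beta \;=\; \frac{(w^*)^{\top} w_k}{\|w^*\|\,\|w_k\|}
\;\ge\; \frac{k\,\delta}{\sqrt{k}} \;=\; \sqrt{k}\,\delta,
\]

using the numerator bound \((w^*)^{\top} w_k \ge k\delta\) and the denominator bound \(\|w_k\| \le \sqrt{k}\). Since \(\cos\beta \le 1\), this forces \(k \le 1/\delta^2\), so the number of updates cannot grow without bound.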
@pattabhidyta7991 2 years ago
@ 14:14 + too good 😀
@sohambasu660 2 years ago
Shouldn't it be wi*xi > -(w0), because w0 (bias) = -(threshold)? Am I missing something here?
@Vivekagrawal5800 4 years ago
Why is w and x considered positive throughout?
@anjushac7960 4 years ago
There is no assumption about w; it can be positive or negative. The only assumption about x is that it is normalized and that it belongs to the "positive class". If any x belongs to the "negative class", the negative of that x is taken, which will in turn be in the positive class.
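For concreteness, here is a minimal sketch of the update rule with the negation trick described above. Variable names and the epoch-based stopping test are my own; this is only an illustration of the idea, not the lecture's exact pseudocode (the bias coordinate is assumed to be already folded into the feature vectors):

import numpy as np

def perceptron_train(P, N, max_epochs=1000):
    """P: positive points, N: negative points; each row is a feature vector."""
    # Fold the negative class into the positive class by negation,
    # so a separating w should satisfy w.p > 0 for every point.
    data = np.vstack([np.asarray(P), -np.asarray(N)])
    # Normalize every point to unit norm, as the convergence proof assumes.
    data = data / np.linalg.norm(data, axis=1, keepdims=True)
    w = np.zeros(data.shape[1])
    for _ in range(max_epochs):
        updated = False
        for p in np.random.permutation(data):
            if np.dot(w, p) <= 0:   # p is misclassified
                w = w + p           # the perceptron update
                updated = True
        if not updated:             # all points satisfy w.p > 0: converged
            break
    return w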
@nandinimishra9153 1 year ago
p' means all possible positive points?
@sufiyanadam 2 months ago
Technically it is the union of all the positive points and the negated (NOT-ed) negative points.
@RakeshGupta-ft6cc 4 years ago
In the entire proof, we haven't used the statement that the data is linearly separable. Then even if the data is linearly inseparable, won't this proof work? But it shouldn't. So how does the fact that the data is linearly separable affect this proof?
@rajeevsebastian6513 4 years ago
But if it is not linearly separable, would there be a w*? Please understand that w* in this case exists and is just unknown. But in your case, it does not exist. So how will you say anything about the cos(beta) between w_t and w*?
@aswathik4709 2 years ago
This proof is done assuming w* exists, that is, that there is a line completely separating these two sets of points. So the moment w* doesn't exist, the proof becomes invalid.
@bharathraman2098 4 years ago
If delta represents the minimum increase based on the set of points P, then k could actually be more than 't', right? For example, imagine there is a value of p which is large and violates the angle requirement. Then the increase in each iteration will be the minimum value over p, which is represented by delta. If this iteration needs to run multiple times in order to overcome the violation for a specific point, then overall k can exceed 't'. Am I missing something here?
@anjushac7960 4 years ago
I think you are confusing 't' (the total number of time steps) with the total number of data points. In general, in one go we iterate over all the data points, but there is no guarantee that it will converge if we do only that much. So we have to repeat the process, i.e. iterate over the data points again. Maybe after a few cycles of iterations it will converge. The theorem only guarantees that it will converge after a finite number of cycles.
@IIMRaipur_is_a_fraud_institute 10 months ago
You answer your own question and yet do not realize it. If multiple iterations are needed for a particular point, then those multiple iterations are added to 't'. Therefore, even in such a scenario, k ≤ t.
@sohambasu660 2 years ago
Why does summation(wi*xi) > w0 need to be proved? Shouldn't we prove summation(wi*xi) > 0 instead of summation(wi*xi) > w0? Please, somebody help.
@amoghsinghal346 2 years ago
P is a unit norm
@umang9997 1 year ago
I think you are referring to 1:44.
A: wi*xi > W0 has the summation from i=1 to n.
B: wi*xi > 0 has the summation from i=0 to n.
If you look closely, B is w0x0 + w1x1 + w2x2 + ... + wnxn > 0, and A is w1x1 + w2x2 + ... + wnxn > W0, which is the same as w0x0 + w1x1 + ... + wnxn > 0 with w0 = -W0 and x0 = 1, and that is the same as B. Here w0 is also called the bias and W0 is called the threshold. Hence proving summation(wi*xi) > W0 with i=1 to n is the same as proving summation(wi*xi) > 0 with i=0 to n.
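In LaTeX, the equivalence described above (the usual trick of absorbing the threshold into the weight vector, with the convention x_0 = 1 and w_0 = -W_0) is simply:

\[
\sum_{i=1}^{n} w_i x_i > W_0
\;\Longleftrightarrow\;
\sum_{i=1}^{n} w_i x_i - W_0 > 0
\;\Longleftrightarrow\;
\sum_{i=0}^{n} w_i x_i > 0,
\qquad \text{where } x_0 = 1,\; w_0 = -W_0 .
\]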
@vishnuvardhan6625 8 months ago
Where can I get slides for these lectures?
@rishiRdutta 7 months ago
Google CS7015 and download handouts.
@saptarshisanyal6738 2 months ago
Watching it in 2024. In today's world we have brilliant tools like Desmos to get a better geometric intuition of what he says in this series of tutorials. NPTEL lectures are generally very dry, and Prof. Khapra was not able to explain it clearly. Somehow I felt that the dots were not connected.
@dipaknidhi3969 4 years ago
Sir, why do you take the transpose of W?
@sarrae100 3 years ago
For matrix multiplication, dimensions have to be adjusted first.
@Abhisheism 6 years ago
Why do we normalize the inputs?
@SahanGamage99 3 years ago
For sigmoidal activation functions, gradient descent can operate faster, as the gradients are large when the magnitudes are lower.
@Abhisheism 3 years ago
@@SahanGamage99 I appreciate your reply, but it's already been two years since I posted this and I figured it out later. Thanks BTW.
@bavidlynx3409 2 years ago
@@Abhisheism Well, others might have the same doubt, so it's helpful for them. *BTW I am others*
@surajkumar156 4 years ago
Sir, it is somewhat confusing whether the elements of P and N are in order (ascending/descending) or random.
@anjushac7960 4 years ago
Random.
@wishurathore7214 3 years ago
Why would order matter if we randomly select points?
@saurabhsuman2506 6 years ago
Why is ||pi||^2 = 1 at 12:01?
@gauravlotey4263 4 years ago
@ravi gurnatham Why is 2*Wi*Pi taken as negative?
@arvind31459 4 years ago
Because all the inputs are normalized before training the perceptron... refer to the setup @ 5:43.
@wishurathore7214 3 years ago
@@gauravlotey4263 Because pi was misclassified.
@amoghsinghal346 2 years ago
P is a unit norm
@manmeetpatel9475 1 year ago
How is ||pi||² = 1?
@sufiyanadam 2 months ago
It is the unit norm of pi
@sahilkhanna8332 1 year ago
What is even happening here?
@rafaelantoniogomezsandoval3785 5 years ago
Nicela
@prithwishguha309 2 months ago
I already said this in your BS degree lecture, so I'm not going to repeat myself, but no, the whole proof is wrong because the whole statement is wrong. You can't prove that the perceptron always reaches convergence when we clearly know it doesn't; it depends on the starting point... But to be honest, this course is much better, far better than that one 😂 other than the bad intro, bad intro music and bad outro 🤣