How to implement PCA (Principal Component Analysis) from scratch with Python

17,703 views

AssemblyAI

A day ago

Comments: 28
@pragyantiwari3885 · a month ago
To make the PCA code more robust, also add a check that the n_components value does not exceed the number of eigenvectors that are produced; otherwise it would be meaningless. Here is an updated version with an explained-variance-ratio computation too:

    import numpy as np

    # assumes the data is standardized
    class PCA:
        def __init__(self, n_components):
            self.n_components = n_components

        def fit(self, X):
            cov_matrix = np.cov(X.T)
            eigens = np.linalg.eig(cov_matrix)
            eigenvectors = eigens.eigenvectors
            eigenvalues = eigens.eigenvalues
            # max eigenvectors we can retrieve
            self.max_components = eigenvectors.shape[0]
            if self.n_components > self.max_components:
                raise ValueError("n_components cannot exceed the number of available eigenvectors")
            # sort eigenvectors (stored as columns) by decreasing eigenvalue
            idx = np.argsort(eigenvalues)[::-1]
            eigenvalues = eigenvalues[idx]
            eigenvectors = eigenvectors[:, idx]
            self.components = eigenvectors[:, : self.n_components].T
            self.explained_variance_ratio = eigenvalues[: self.n_components] / np.sum(eigenvalues)
@MinhNguyen-cl9pq · a year ago
Line 19 seems to have a bug; the return values should be swapped, based on the NumPy documentation.
@luis96xd · 2 years ago
Wow, what an amazing video in this course! I liked the theory part and how it is implemented with NumPy 😄👍 It was all well explained, thanks! 😄👏💯😁
@kachunpang7543 · a year ago
Hi, I am wondering about the output of 'np.linalg.eig(cov)' in line 20. According to the NumPy documentation, the first output is the eigenvalues and the second should be the set of eigenvectors stored as columns of a matrix. However, in line 20 you swap the names of the eigenvectors and eigenvalues but still get a pleasant plot after PCA. Could someone explain this part to me? Thanks.
@dylan.savoia · a year ago
Great observation, and I think you're right. In fact, I've run the code with the two variables swapped, i.e. eigenvalues, eigenvectors = np.linalg.eig(cov), and you get a different plot. This shouldn't even be possible, since you cannot multiply a matrix and a vector whose dimensions don't match, but given how NumPy works I suspect there is implicit broadcasting happening in np.dot inside the transform() method (line 35) that makes the operation go through. TL;DR: NumPy doesn't raise an error, but the result you get is in fact wrong.
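For reference, here is a minimal sketch of the unpacking order described in the NumPy documentation (the variable names and toy data are illustrative, not from the video):

    import numpy as np

    X = np.random.rand(100, 4)   # toy data: 100 samples, 4 features
    cov = np.cov(X.T)            # 4x4 covariance matrix

    # np.linalg.eig returns the eigenvalues first, then the eigenvectors as columns
    eigenvalues, eigenvectors = np.linalg.eig(cov)

    # the i-th column of eigenvectors pairs with eigenvalues[i]
    first_direction = eigenvectors[:, 0]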
@pragyantiwari3885 · a month ago
See my comment on this video... I mentioned the solution there.
@business_central · a year ago
All the ones explained by the girl are very clearly explained and walked through; this guy seems like he just wants to be done and isn't really explaining much at all.
@martinemond1207 · 9 months ago
How would you go about reconstructing the original data from X_projected based on PC1 and PC2, which keep only 2 of the original 4 dimensions?
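One way this could be done (a minimal sketch, assuming the class from the video stores the selected eigenvectors row-wise in self.components and the feature means in self.mean; those attribute names are assumptions):

    import numpy as np

    def inverse_transform(pca, X_projected):
        # map the 2-D projection back into the original 4-D feature space;
        # this is only an approximation, since the variance along the
        # discarded components is lost
        return np.dot(X_projected, pca.components) + pca.mean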
@ernestbonat2440 · 2 years ago
You should implement PCA with NumPy only. In fact, you should use NumPy everywhere possible; NumPy is the fastest Python numerical library today. We should not teach based on some notion of student understanding. We should teach students with real Python production code so that they can find a job. Everyone needs to pass job interviews.
@iDenyTalent · 2 years ago
stop talking grandpa
@projectaz77 · a year ago
@@iDenyTalent
@badi1072 · 2 months ago
No, I'll not implement it. What will you do?
@michelebersani7294 · 6 months ago
Good morning, this playlist is amazing and I had been searching for it for several weeks. I have a question about the interpretation of the eigenvectors: why do the eigenvectors of the covariance matrix point in the directions of maximum variance?
@MahmouudTolba · 2 months ago
If you take a unit vector and project the data onto the line it spans, the variance of that projection measures how much information the direction captures about the relationships encoded in the covariance matrix. So we want to maximise that projected variance; maximising it with a Lagrange multiplier for the unit-length constraint gives Σv = λv, i.e. the maximisers are eigenvectors of the covariance matrix.
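A compact version of that argument, written out as a sketch in standard notation (not taken from the video): the projected variance of centred data along a unit vector v is v^T Σ v, so the problem is

    \max_{v} \; v^{\top} \Sigma v \quad \text{subject to} \quad v^{\top} v = 1 .

Introducing a Lagrange multiplier λ,

    \mathcal{L}(v, \lambda) = v^{\top} \Sigma v - \lambda \,(v^{\top} v - 1), \qquad
    \nabla_{v} \mathcal{L} = 2 \Sigma v - 2 \lambda v = 0
    \;\Longrightarrow\; \Sigma v = \lambda v ,

and at such a stationary point the projected variance equals v^T Σ v = λ, so the eigenvector with the largest eigenvalue is the direction of maximum variance.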
@ASdASd-kr1ft · a year ago
Nice video! But I have one doubt: why is there more variance in principal component 2 than in principal component 1? Is it because of the scale?
@yusmanisleidissotolongo4433 · 7 months ago
Thanks so much for sharing.
@pranavgandhiprojects · a year ago
Loved the video... thanks man
@eugenmalatov5470 · a year ago
Sorry, the theory part did not explain anything to me
@chyldstudios · 2 years ago
You should implement PCA without using NumPy, just vanilla Python (no external libraries). It's more pedagogically rigorous and leads to a deeper understanding.
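As an illustration of what that would involve, here is a minimal pure-Python sketch of just the covariance-matrix step (the eigendecomposition would also have to be written by hand, e.g. with power iteration; this is a sketch, not the video's code):

    def covariance_matrix(X):
        # X is a list of samples, each sample a list of feature values
        n = len(X)
        d = len(X[0])
        means = [sum(row[j] for row in X) / n for j in range(d)]
        cov = [[0.0] * d for _ in range(d)]
        for i in range(d):
            for j in range(d):
                cov[i][j] = sum(
                    (row[i] - means[i]) * (row[j] - means[j]) for row in X
                ) / (n - 1)
        return cov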
@thejll · a year ago
Could you show how to do PCA on a GPU?
@gokul.sankar29 · a year ago
You could try using PyTorch: replace the NumPy arrays with PyTorch tensors and the NumPy functions with their PyTorch equivalents. You will have to read up a bit on how to use a GPU with PyTorch.
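A rough sketch of that idea (assuming PyTorch is installed and a CUDA device is available; the data here is made up):

    import torch

    X = torch.randn(1000, 4, device="cuda")   # toy data living on the GPU

    X_centered = X - X.mean(dim=0)
    cov = torch.cov(X_centered.T)              # 4x4 covariance matrix

    # eigh works here because the covariance matrix is symmetric;
    # it returns eigenvalues in ascending order, so take the last k columns
    eigenvalues, eigenvectors = torch.linalg.eigh(cov)
    k = 2
    components = eigenvectors[:, -k:].flip(dims=[1])   # top-k directions, largest first

    X_projected = X_centered @ components              # projection stays on the GPU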
@igordemetriusalencar5861 · 2 years ago
Excellent video and beautiful OOP Python programming: clean and easy to understand for a programmer. But OOP in data analysis is terribly ugly and unproductive, with a lot of unnecessary abstraction in classes and methods. The functional paradigm is way better for data analysis because of its simple (initial) concepts of data flow and functions that transform the data. That way, anyone who has learned general systems theory could understand it (managers, biologists, physicists, psychologists...). If you could do the same in a functional way it would be amazing! (in Python, R, or Julia)
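For comparison, a minimal sketch of the same fit/transform steps written as plain functions instead of a class (the function names are illustrative, not from the video):

    import numpy as np

    def fit_pca(X, n_components):
        mean = X.mean(axis=0)
        cov = np.cov((X - mean).T)
        # eigh returns eigenvalues in ascending order for a symmetric matrix
        eigenvalues, eigenvectors = np.linalg.eigh(cov)
        components = eigenvectors[:, ::-1][:, :n_components]   # largest-variance directions first
        return mean, components

    def transform_pca(X, mean, components):
        return (X - mean) @ components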
@0MVR_0 · 7 months ago
> states 'from scratch' > proceeds to import numpy
@prithvimarwadi345 · 7 months ago
Well, NumPy is just a mathematical computation tool; you are using it to make your life simpler. 'From scratch' means you are not using models already made by other people.
@0MVR_0 · 7 months ago
@@prithvimarwadi345 proceeds to import numpy.cov and numpy.linalg.eig and calls the method 'from scratch'
@HarshavardhanaSrinivasan-c2e · 5 months ago
Are you asking to code from an assembly language standpoint?
@0MVR_0 · 5 months ago
@@prithvimarwadi345 I would dispute that; 'from scratch' means translating all the relevant mathematical equations into plain Python algorithms. Principal Component Analysis can be shown through eigenvectors and linear algebra. Relying on imports is honestly lazy when exemplifying the process. I am going to refuse to acknowledge the comment on assembly language.
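In that spirit, here is a minimal sketch of how the leading eigenvector could be computed without np.linalg.eig, using power iteration on the covariance matrix (plain Python, illustrative only; it assumes the dominant eigenvalue is unique):

    def power_iteration(cov, n_iters=1000):
        # cov is a symmetric d x d matrix given as nested lists
        d = len(cov)
        v = [1.0] * d                                   # arbitrary non-zero start vector
        for _ in range(n_iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]   # w = cov @ v
            norm = sum(x * x for x in w) ** 0.5
            v = [x / norm for x in w]                   # renormalise to unit length
        # Rayleigh quotient gives the corresponding eigenvalue
        eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
        return eigenvalue, v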
@sanjayrakshit8797 · 7 days ago
I noticed a small mistake; according to the documentation, here is the correct version: eigenvalues, eigenvectors = np.linalg.eig(cov)