Principal Component Analysis in Python | How to Apply PCA | Scree Plot, Biplot, Elbow & Kaisers Rule

3,752 views

Statistics Globe

1 year ago

This video explains how to apply a Principal Component Analysis (PCA) in Python. More details: statisticsglobe.com/principal...
The video is presented by Cansu Kebabci, a data scientist and statistician at Statistics Globe. Find more information about Cansu here: statisticsglobe.com/cansu-keb...
In the video, Cansu explains the steps and application of a Principal Component Analysis in Python. Watch the video to learn more on this topic!
Here you can find the previous videos of this series:
Introduction to Principal Component Analysis (Pt. 1 - Theory): • Introduction to Princi...
Principal Component Analysis in R Programming (Pt. 2 - PCA in R): • Principal Component An...
Links to the tutorials mentioned in the video:
PCA Using Correlation & Covariance Matrix (Examples): statisticsglobe.com/pca-corre...
Biplot for PCA Explained: statisticsglobe.com/biplot-pc...
Python code of this video:
Install libraries
!pip install scikit-learn
!pip install pandas
!pip install matplotlib
!pip install numpy
Load Libraries & Modules
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
Load Breast Cancer Dataset
breast_cancer = load_breast_cancer()
Data Elements of breast_cancer
breast_cancer.keys()
breast_cancer.data.shape
breast_cancer.feature_names
Print Data in DataFrame Format
DF = pd.DataFrame(data = breast_cancer.data[:, :10], # Create DataFrame DF
columns = breast_cancer.feature_names[:10])
DF.head(6) # Print first 6 rows of DF
Standardize Data
scaler = StandardScaler() # Create scaler
data_scaled = scaler.fit_transform(DF) # Fit scaler
print(data_scaled) # Print standardized data
Print Standardized Data in DataFrame Format
DF_scaled = pd.DataFrame(data = data_scaled, # Create DataFrame DF_scaled
columns = breast_cancer.feature_names[:10])
DF_scaled.head(6) # Print first 6 rows of DF_scaled
Ideal Number of Components
pca = PCA(n_components = 10) # Create PCA object forming 10 PCs
pca_trans = pca.fit_transform(DF_scaled) # Transform data
print(pca_trans) # Print transformed data
print(pca_trans.shape) # Print dimensions of transformed data
prop_var = pca.explained_variance_ratio_ # Extract proportion of explained variance
print(prop_var) # Print proportion of explained variance
PC_number = np.arange(pca.n_components_) + 1 # Enumerate component numbers
print(PC_number) # Print component numbers
Scree Plot
plt.figure(figsize=(10, 6)) # Set figure and size
plt.plot(PC_number, # Plot prop var
prop_var,
'ro-')
plt.title('Scree Plot (Elbow Method)', # Plot Annotations
fontsize = 15)
plt.xlabel('Component Number',
fontsize = 15)
plt.ylabel('Proportion of Variance',
fontsize = 15)
plt.grid() # Add grid lines
plt.show() # Print graph
#Alternative Scree Plot Data
var = pca.explained_variance_ # Extract explained variance
print(var) # Print explained variance
The remaining code is unfortunately too long for a KZbin description.
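The description is cut off before the Kaiser's rule step named in the title. As a rough sketch only (this is not the code from the video), the rule could be applied to the explained variance extracted above by keeping the components whose eigenvalue exceeds 1. The sketch assumes the same setup as the description (first 10 features, standardized); the names `X_scaled`, `eigenvalues`, `n_kaiser`, and `cum_var` are illustrative.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Same setup as in the description: first 10 features, standardized
X = load_breast_cancer().data[:, :10]
X_scaled = StandardScaler().fit_transform(X)

pca = PCA(n_components=10).fit(X_scaled)
eigenvalues = pca.explained_variance_  # Variance explained by each PC

# Kaiser's rule: retain components whose eigenvalue exceeds 1
n_kaiser = int(np.sum(eigenvalues > 1))

# Cumulative proportion of variance, often read alongside the scree plot
cum_var = np.cumsum(pca.explained_variance_ratio_)

print("Eigenvalues:", np.round(eigenvalues, 3))
print("Components kept by Kaiser's rule:", n_kaiser)
print("Cumulative proportion of variance:", np.round(cum_var, 3))
```

The threshold of 1 is meaningful here because the data are standardized, so each original variable contributes a variance of about 1.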
Follow me on Social Media:
Facebook - Statistics Globe Page: / statisticsglobecom
Facebook - R Programming Group for Discussions & Questions: / statisticsglobe
Facebook - Python Programming Group for Discussions & Questions: / statisticsglobepython
LinkedIn - Statistics Globe Page: / statisticsglobe
LinkedIn - R Programming Group for Discussions & Questions: / 12555223
LinkedIn - Python Programming Group for Discussions & Questions: / 12673534
Twitter: / joachimschork
Instagram: / statisticsglobecom
TikTok: / statisticsglobe

Comments: 20
@darrylmorgan 1 year ago
Thank you Cansu and Joachim for the awesome PCA in python tutorial :)
@cansustatisticsglobe 1 year ago
Hello Darryl, We are glad to hear that you liked the video. Have a good one! Cansu
@thaynalg 10 months ago
That was so helpful, great job. I can't thank you enough.
@matthias.statisticsglobe 10 months ago
Thank you very much for the nice words and your support. It's great to hear that the video has been helpful for you!
@albertoavendano7196 8 months ago
Let me tell you something: This might be the clearest PCA video of all... Simple, clear, and the concepts are covered in the other videos... Which is great!!! Thanks a lot... Just one question: is PCA useful for supervised learning as well, or do we use RFECV for that? Those are the two methods for feature reduction I have used...
@cansustatisticsglobe 8 months ago
Hello Alberto, I'm deeply honored by your feedback. It's wonderful to know that diligent effort pays off. Regarding your question, Principal Component Analysis (PCA) and Recursive Feature Elimination with Cross-Validation (RFECV) are both techniques used for feature reduction, but they are useful in different contexts within the realm of supervised learning. PCA can be used in supervised learning but with caution. It's great for reducing the number of features and, hence, computation time. However, the principal components are linear combinations of the original features and might not have a straightforward interpretation. The main limitation in the context of supervised learning is that PCA does not consider the response variable. It only focuses on explaining the variance in the predictors, which might not always align with the predictive power with respect to the response variable. In summary, PCA can be used in supervised learning but would come with the aforementioned limitations. Best, Cansu
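To make the point about supervised learning concrete, here is a minimal sketch of PCA used as a preprocessing step in front of a classifier. It is an illustration under assumptions, not a recommendation from the video: the choice of logistic regression, 3 components, and 5-fold cross-validation is arbitrary. Note that the PCA step never sees `y`, which is exactly the limitation described in the reply.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Scaling and PCA are fitted inside each CV fold, so no information leaks
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=3),  # Assumption: 3 components, chosen only for illustration
    LogisticRegression(max_iter=1000),
)

# Accuracy of the whole pipeline, estimated with 5-fold cross-validation
scores = cross_val_score(model, X, y, cv=5)
print("Mean CV accuracy:", scores.mean())
```

Comparing this score with the same pipeline without the PCA step, or with RFECV in its place, is one way to check whether the components retain the predictive information.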
@popuriann8983 4 months ago
Hello, thank you for the very comprehensive tutorial. However, I have some questions. First, if I have 3 PCs in my case, how do I analyze them in a biplot? Second, I am a bit confused about the loading score: if it shows a negative value, what does that mean?
@StatisticsGlobe 4 months ago
Hey, thanks for the kind words, glad the tutorial was helpful! You may have a look at 3D plots when you want to analyze 3 components: statisticsglobe.com/3d-plot-pca-python A negative loading score in PCA means that the original variable inversely correlates with the principal component. I hope this helps! Joachim
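The meaning of a negative loading can be checked numerically. The sketch below, which assumes the same 10 standardized features as in the description, compares the sign of each weight in `pca.components_` with the sign of the correlation between that variable and the component scores. The weights are often called loadings; some texts scale them by the square root of the eigenvalue, which does not change the sign. The variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X_scaled = StandardScaler().fit_transform(data.data[:, :10])

pca = PCA(n_components=2)
scores = pca.fit_transform(X_scaled)  # Component scores, shape (569, 2)
weights = pca.components_             # One row of weights per component, shape (2, 10)

# Correlation of every original variable with every component's scores
corr = np.empty_like(weights)
for k in range(weights.shape[0]):
    for j in range(weights.shape[1]):
        corr[k, j] = np.corrcoef(X_scaled[:, j], scores[:, k])[0, 1]

# Print weight and correlation for the first component, variable by variable
for name, w, r in zip(data.feature_names[:10], weights[0], corr[0]):
    print(f"{name:<25} weight={w:+.3f}  corr={r:+.3f}")
```

A variable with a negative weight tends to take lower values as the component score rises. Keep in mind that the overall sign of a component is arbitrary, so only the signs relative to each other within a component carry meaning.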
@planq521 4 months ago
What if we use more than two components? How can we plot them in a graph?
@StatisticsGlobe 4 months ago
Hey, this really depends on what you want to analyze. However, when using more than two components in PCA, visualization typically involves plotting the first three principal components in 3D scatter plots. For more components, parallel coordinates plots or heatmaps can effectively represent higher-dimensional data.
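As one possible illustration of the 3D scatter plot mentioned in the reply, the following sketch plots the first three principal components of the 10 standardized features used in the description. Coloring the points by the diagnosis label in `data.target` is an added assumption for readability, not something taken from the video.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X_scaled = StandardScaler().fit_transform(data.data[:, :10])

# Keep three components so that each one maps to a plot axis
scores = PCA(n_components=3).fit_transform(X_scaled)

fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(projection="3d")  # 3D axes
ax.scatter(scores[:, 0], scores[:, 1], scores[:, 2],
           c=data.target, cmap="coolwarm", s=15)  # Color by class label
ax.set_xlabel("PC1")
ax.set_ylabel("PC2")
ax.set_zlabel("PC3")
ax.set_title("First Three Principal Components")
```

In Jupyter Notebook the figure is typically displayed automatically; in a plain script, call `plt.show()` afterwards.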
@TeunXt5 4 months ago
What program do you use to run Python? I am using VS Code, but this looks more convenient.
@StatisticsGlobe 4 months ago
Hey, we are using Jupyter Notebook in this video. I think it's great! :)
@mjacfardk 1 year ago
🙏 Thanks for your great tutorial.
@cansustatisticsglobe 1 year ago
You are very welcome! Best, Cansu
@user-pj3ln6fq7d 10 months ago
Hello, thanks. Suppose I have data from two replications of several genotypes. The first column contains the genotype names, the second contains R1 and R2 in alternating rows, and the following columns contain the variables. How can I get the mean of R1 and R2 for every variable? Thanks in advance!
@cansustatisticsglobe 10 months ago
Hello, You are welcome. I couldn't get your data setting. Could you please be more specific? Best, Cansu
@user-pj3ln6fq7d 10 months ago
@cansustatisticsglobe Thanks a lot for your reply. Can I send you my file by email? I will describe it explicitly there.
@cansustatisticsglobe 10 months ago
Hello @user-pj3ln6fq7d! Please try to explain it here so that the other visitors struggling with similar issues can benefit from the solution. I will do my best to help in the comments. Best, Cansu
@northernswedenstories1028 1 year ago
PCA and MCA are damn cool.
@cansustatisticsglobe 1 year ago
Indeed! Best, Cansu