Best Fit Line in 4 Lines of Code - Linear Regression with Python and SciKit-Learn

  17,221 views

Andrew Fung


Hello everyone! My name is Andrew Fung. In this video, I show you how to generate a line of best fit for a dataset in two ways: by defining the function yourself, and by using the LinearRegression class from Python's sklearn library. Hope you enjoy this tutorial ;)
#python #bestfitline #lineofbestfit #linearregression #machinelearning
Kaggle’s Weight and Height dataset: www.kaggle.com/mustafaali96/w...
Installation and Setup!
Installing Jupyter Notebook: jupyter.readthedocs.io/en/lat...
Sklearn linear regression doc: scikit-learn.org/stable/modul...
Check out my Github!
github.com/Andrew-FungKinHo
How I make my YouTube videos:
⌨️ Keyboard - Nuphy Air75 Mechanical Keyboard - amzn.to/3Xu4PD3
🎙 Microphone - MAONO A04 Professional Podcaster USB Microphone - amzn.to/3k8ocD5
🖱 Mouse - Microsoft Bluetooth Ergonomic Mouse - amzn.to/3CHhdHJ
🔌 Accessories - Laptop Docking Station for MacBook Pro - amzn.to/3CHi5Mv
Timestamps
0:00 | Introduction
1:19 | Data cleaning
5:49 | Self-defined function method
16:12 | SciKit-learn method
20:40 | Outro
Full code:
-------------------------------
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import style
from statistics import mean
from sklearn import linear_model

def best_fit_line(xs, ys):
    # least-squares slope and y-intercept from the means of x, y, x*y and x*x
    slope = (((mean(xs) * mean(ys)) - mean(xs * ys)) /
             ((mean(xs) * mean(xs)) - mean(xs * xs)))
    y_intercept = mean(ys) - slope * mean(xs)
    return slope, y_intercept

# load in dataframe and select a portion
df = pd.read_csv('weight-height.csv')
# .copy() so the column edits below work on their own data, not a slice of df
male_df = df[df['Gender'] == 'Male'][:200].copy()

# data cleaning: convert height from inches to cm and weight from pounds to kg
male_df['Height'] = male_df['Height'].apply(lambda x: x * 2.54)
male_df['Weight'] = male_df['Weight'].apply(lambda x: x * 0.45359237)

# convert height and weight columns to lists
height_list = male_df['Height'].tolist()
weight_list = male_df['Weight'].tolist()

# convert lists to numpy arrays
xs = np.array(height_list, dtype=np.float64)
ys = np.array(weight_list, dtype=np.float64)

# 1st method: using our own function
# calculate slope and y-intercept of the arrays
slope, y_intercept = best_fit_line(xs, ys)

# get the regression line from the calculated slope and y-intercept
regression_line = [(slope * x) + y_intercept for x in xs]

# Making predictions
average_man_height = 175.26
average_man_weight = (slope * average_man_height) + y_intercept

# 2nd method: using Python's sklearn library
# Create linear regression object
height_weight = linear_model.LinearRegression()

# Train the model using the training sets
# (sklearn expects a 2-D feature array, hence reshape(-1, 1))
height_weight.fit(xs.reshape(-1, 1), ys)

# get the regression line using the model
regression_line = height_weight.predict(xs.reshape(-1, 1))

# Making predictions
KSI_height = 180
KSI_weight = height_weight.predict(np.array([[KSI_height]]))[0]

# Plot outputs and plot customization
# note: on Matplotlib 3.6 and later this style is named 'seaborn-v0_8'
style.use('seaborn')
plt.scatter(xs, ys, label='Data Points', alpha=0.6, color='green', s=75)
plt.scatter(KSI_height, KSI_weight, label='KSI prediction', color='red', s=100)
plt.plot(xs, regression_line, label='Best Fit Line', color='orange', linewidth=4)
plt.title('Height and Weight linear regression')
plt.xlabel('Height (cm)')
plt.ylabel('Weight (kg)')
plt.legend()
plt.show()
-------------------------------
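A quick way to check the hand-written formula is to compare it with a library fit. The sketch below is not from the video: it assumes synthetic, made-up height/weight numbers instead of the Kaggle CSV, rewrites best_fit_line with NumPy's own .mean() for brevity, and uses np.polyfit as the reference.

```python
import numpy as np

def best_fit_line(xs, ys):
    """Closed-form least-squares slope and intercept (same formula as the full code)."""
    slope = ((xs.mean() * ys.mean() - (xs * ys).mean()) /
             (xs.mean() * xs.mean() - (xs * xs).mean()))
    y_intercept = ys.mean() - slope * xs.mean()
    return slope, y_intercept

# Synthetic stand-in for the height (cm) and weight (kg) columns.
# These numbers are invented for illustration; they are NOT the Kaggle data.
rng = np.random.default_rng(0)
xs = rng.uniform(160.0, 195.0, size=200)
ys = 0.9 * xs - 80.0 + rng.normal(0.0, 5.0, size=200)

slope, y_intercept = best_fit_line(xs, ys)

# np.polyfit with degree 1 returns [slope, intercept] of the least-squares line
ref_slope, ref_intercept = np.polyfit(xs, ys, 1)

slopes_match = bool(np.isclose(slope, ref_slope))
intercepts_match = bool(np.isclose(y_intercept, ref_intercept))
print(slopes_match, intercepts_match)
```

If both values print as True, the self-defined function and the library agree on the same line, which is what the two methods in the video should also do on the real dataset.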
Feel free to drop a like and a comment if you enjoyed the video, and let me know if you want me to do other types of programming videos ;) !!!

Comments: 24
@ifeanyiokwuazu3225 3 years ago
This is the best video I have seen in a while, so explanatory. Thank you
@AndrewFungKinHo 3 years ago
Thank you so much for your support ;)
@hendrag1174 2 years ago
Thank you very much, Andrew. After watching and practicing with many videos on YouTube, it is not an overstatement to say that this video is the best one. Keep doing good work for the YouTube community, and may God bless you.
@AndrewFungKinHo 2 years ago
Thank you so much for your kind words!
@hendrag1174 2 years ago
@@AndrewFungKinHo Anytime Andrew
@proterotype 3 years ago
This was concise and informative. Thank you for posting
@AndrewFungKinHo 3 years ago
Thank you so much for the support Nate ;)
@kaduflutist 1 year ago
Thank you very much, Andrew! Your teaching technique is very good and efficient, I think. I'm used to the R language and working with RStudio; now I'm learning Python in Spyder to see how things work. Everything you showed worked on my side and I feel very pleased. Keep up the good work, you have what it takes to teach/coach.
@AndrewFungKinHo 1 year ago
You're very welcome! Thank you for your kind words. 🙏🏻
@l0nefighter509 2 years ago
Thank you very much, I was able to use this for my final project at university!
@AndrewFungKinHo 2 years ago
Glad that I can help you with your final project L0neFightEr ;)
@RGmusics73 2 years ago
Great and simple explanation 👍. Thank you
@AndrewFungKinHo 2 years ago
Thank you so much for your support ;)
@cpark4567 2 years ago
Really great materials!!
@AndrewFungKinHo 2 years ago
Thank you so much for your support!!! :)
@aizaadnan1066 1 year ago
Nice information
@jjohhny96 2 years ago
Technical question here: why did you need to reshape the xs and how do you decide on the values to use?
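On the reshape question above, a minimal sketch may help. It assumes three made-up height values; the only library fact it relies on is that sklearn's LinearRegression.fit expects the features as a 2-D array of shape (n_samples, n_features), while xs in the full code is 1-D.

```python
import numpy as np

# Three made-up heights, one value per person (illustrative only)
xs = np.array([170.0, 175.0, 180.0])
print(xs.shape)        # (3,)   -> 1-D: just three numbers

# -1 asks NumPy to infer the number of rows from the data length;
# 1 means "one column", i.e. one feature (height) per sample
X = xs.reshape(-1, 1)
print(X.shape)         # (3, 1) -> 3 samples, 1 feature
```

So the values are not tuned by hand: 1 is the number of features, and -1 simply means "however many rows are needed". With two features per sample the array would have shape (n_samples, 2) instead.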
@sathvikthogaru5421 2 years ago
thank you
@AndrewFungKinHo 2 years ago
Thanks for your support Sathvik ;)
@meriem981 2 years ago
Hi, please can you do a tutorial on how to fit a scatter plot with a logarithmic trend?
@mohamedrezquellah529 1 year ago
Nice video, thank you. Just one remark: you can directly convert pandas columns to NumPy using to_numpy: weight = dfMale['Weight'].to_numpy()
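The to_numpy suggestion above can be sketched as follows. The small DataFrame is a made-up stand-in for male_df from the full code, and passing dtype=np.float64 is an optional choice here to mirror the dtype used in the original np.array calls.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for male_df (illustrative values, not the Kaggle data)
male_df = pd.DataFrame({
    'Height': [170.0, 175.0, 180.0],
    'Weight': [65.0, 72.0, 80.0],
})

# Series.to_numpy() replaces the tolist() + np.array(...) two-step conversion
xs = male_df['Height'].to_numpy(dtype=np.float64)
ys = male_df['Weight'].to_numpy(dtype=np.float64)

print(type(xs).__name__, xs.dtype, xs.shape)
```

The resulting xs and ys can be passed to best_fit_line or to the sklearn model exactly as before.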
@pinkyeeeepinkydas 2 years ago
In the first method, I am getting a different output when using the regression line. I got the regression line, but all the data samples lie on the 0 axis when I call plt.plot(xs, regression_line, label='Data points', color='orange', linewidth=4). Initially, I got the exact scatter plot with the data samples along the x axis, but after I coded the math equation of the regression line, it now shows all the data samples lying on the z axis. What is the mistake? Can you guide me, please?
@TheSkyMagician 3 years ago
do you want some
@TheSkyMagician 3 years ago
EMMMMMMMMMMMMMMMM richer