Hello everyone! My name is Andrew Fung. In this video, I will show you how to generate a line of best fit for a dataset, first by defining your own function and then by using the LinearRegression class from Python's sklearn library. Hope you enjoy this tutorial ;)
#python #bestfitline #lineofbestfit #linearregression #machinelearning
Kaggle’s Weight and Height dataset: www.kaggle.com/mustafaali96/w...
Installation and Setup!
Installing Jupyter Notebook: jupyter.readthedocs.io/en/lat...
Sklearn linear regression doc: scikit-learn.org/stable/modul...
Check out my Github!
github.com/Andrew-FungKinHo
How I make my YouTube videos:
⌨️ Keyboard - Nuphy Air75 Mechanical Keyboard - amzn.to/3Xu4PD3
🎙 Microphone - MAONO A04 Professional Podcaster USB Microphone - amzn.to/3k8ocD5
🖱 Mouse - Microsoft Bluetooth Ergonomic Mouse - amzn.to/3CHhdHJ
🔌 Accessories - Laptop Docking Station for MacBook Pro - amzn.to/3CHi5Mv
Timestamps
0:00 | Introduction
1:19 | Data cleaning
5:49 | Self-defined function method
16:12 | scikit-learn method
20:40 | Outro
Full code:
-------------------------------
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import style
from statistics import mean
from sklearn import linear_model
def best_fit_line(xs, ys):
    # least-squares slope and y-intercept; xs and ys must be numpy arrays
    slope = (((mean(xs) * mean(ys)) - mean(xs * ys)) / ((mean(xs) * mean(xs)) - mean(xs * xs)))
    y_intercept = mean(ys) - slope * mean(xs)
    return slope, y_intercept
# load in dataframe and select a portion
df = pd.read_csv('weight-height.csv')
# .copy() so the column assignments below do not trigger SettingWithCopyWarning
male_df = df[df['Gender'] == 'Male'][:200].copy()
# data cleaning: convert height from inches to cm and weight from pounds to kg
male_df['Height'] = male_df['Height'].apply(lambda x: x*2.54)
male_df['Weight'] = male_df['Weight'].apply(lambda x: x*0.45359237)
# convert height and weight columns to lists
height_list = male_df['Height'].tolist()
weight_list = male_df['Weight'].tolist()
# convert lists to numpy arrays
xs = np.array(height_list, dtype=np.float64)
ys = np.array(weight_list, dtype=np.float64)
# 1st method: using our own function
# calculate slope and y-intercept from the data
slope, y_intercept = best_fit_line(xs, ys)
# get the regression line from the calculated slope and y-intercept
regression_line = [(slope * x) + y_intercept for x in xs]
# Making predictions
average_man_height = 175.26
average_man_weight = (slope * average_man_height) + y_intercept
# 2nd method: using Python's scikit-learn library
# Create linear regression object
height_weight = linear_model.LinearRegression()
# Train the model using the training sets
height_weight.fit(xs.reshape(-1,1),ys)
# get the regression line using the model
regression_line = height_weight.predict(xs.reshape(-1,1))
# Making predictions
KSI_height = 180
KSI_weight = height_weight.predict(np.array([[KSI_height]]))[0]
# Plot outputs and plot customization
# on matplotlib 3.6 and newer this style is named 'seaborn-v0_8'
style.use('seaborn')
plt.scatter(xs,ys,label='Data Points', alpha=0.6,color='green',s=75)
plt.scatter(KSI_height,KSI_weight, label='KSI prediction',color='red',s=100)
plt.plot(xs,regression_line,label='Best Fit Line', color='orange',linewidth=4)
plt.title('Height and Weight linear regression')
plt.xlabel('Height (cm)')
plt.ylabel('Weight (kg)')
plt.legend()
plt.show()
-------------------------------
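If you want to sanity-check the self-defined function before running it on the Kaggle data, here is a minimal sketch. It is not from the video: the data is synthetic (the numbers are made up for illustration), the function uses numpy's own mean instead of statistics.mean, and the result is compared against np.polyfit as a reference, which should give the same slope and intercept up to floating-point error.

```python
import numpy as np

def best_fit_line(xs, ys):
    """Least-squares slope and y-intercept, same formula as the full code above."""
    xs = np.asarray(xs, dtype=np.float64)
    ys = np.asarray(ys, dtype=np.float64)
    slope = (xs.mean() * ys.mean() - (xs * ys).mean()) / (xs.mean() ** 2 - (xs * xs).mean())
    y_intercept = ys.mean() - slope * xs.mean()
    return slope, y_intercept

# Synthetic heights (cm) and weights (kg); illustrative values only,
# NOT taken from the weight-height.csv dataset.
rng = np.random.default_rng(0)
heights = rng.uniform(160, 195, size=200)
weights = 0.9 * heights - 80 + rng.normal(0, 5, size=200)

slope, y_intercept = best_fit_line(heights, weights)

# Reference fit: a degree-1 polynomial returns [slope, intercept].
ref_slope, ref_intercept = np.polyfit(heights, weights, deg=1)

print("own function:", slope, y_intercept)
print("np.polyfit:  ", ref_slope, ref_intercept)
```

The two printed lines should agree closely. If they do not, the usual culprit is a typo in the slope formula, such as a mismatched variable name.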
Feel free to drop a like and a comment if you enjoyed the video, and let me know if you want me to do other types of programming videos ;) !!!