Simple linear regression:
Simple linear regression aims to find a straight-line relationship that describes how a dependent variable changes with a single independent variable.
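In symbols, the model is ŷ = b0 + b1·x. Ordinary least squares chooses the slope b1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and the intercept b0 = ȳ − b1·x̄. Here is a minimal NumPy sketch of that closed-form fit. The function name `fit_simple_ols` and the four points are hypothetical illustrations, not part of the tutorial's dataset:

```python
import numpy as np

def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y ≈ b0 + b1 * x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: sum of cross-deviations divided by sum of squared x-deviations
    b1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: the fitted line always passes through the point of means
    b0 = y_mean - b1 * x_mean
    return b0, b1

# Hypothetical points lying exactly on y = 1 + 2x
b0, b1 = fit_simple_ols([0, 1, 2, 3], [1, 3, 5, 7])
print(b0, b1)  # 1.0 2.0
```

`LinearRegression` in scikit-learn solves the same least-squares problem, so on one feature it should recover the same slope and intercept.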
Importing Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
`numpy` is used for numerical operations.
`pandas` is used for data manipulation and analysis.
`matplotlib.pyplot` is used for plotting graphs.
`train_test_split` from `sklearn` is used to split the dataset into training and testing sets.
`LinearRegression` from `sklearn` is used to create and train the linear regression model.
Loading the Dataset
data = pd.read_csv('/content/drive/MyDrive/Data sets ml/Salary.csv')
print(data.head())
The dataset is loaded using `pd.read_csv`.
`data.head()` displays the first few rows of the dataset to understand its structure.
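The path above looks like a Google Drive folder mounted in Google Colab, so it will only resolve in that environment. One way to follow the remaining steps without that file is to read a small in-memory CSV instead. The column names `YearsExperience` and `Salary` and all values below are hypothetical placeholders, not the contents of `Salary.csv`:

```python
import io
import pandas as pd

# Hypothetical stand-in for Salary.csv: feature column first, target column last
csv_text = """YearsExperience,Salary
1.0,40000
2.0,45000
3.0,50000
4.0,55000
5.0,60000
"""

data = pd.read_csv(io.StringIO(csv_text))
print(data.head())
print(data.shape)  # (rows, columns)
```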
Extracting Dependent and Independent Variables
x = data.iloc[:, :-1].values
y = data.iloc[:, -1].values
print(x)
print(y)
`x` contains the independent variable(s) (in this case, years of experience).
`y` contains the dependent variable (salary).
`data.iloc[:, :-1]` selects all columns except the last one for `x`.
`data.iloc[:, -1]` selects the last column (salary) for `y`, matching the `:-1` slice used for `x`. In this two-column dataset, it is the same column as `data.iloc[:, 1]`.
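The two selections return arrays of different shapes, and that difference is deliberate. scikit-learn expects the features as a 2-D array of shape (n_samples, n_features) and the target as a 1-D array. Here is a quick check on a hypothetical two-column frame:

```python
import pandas as pd

# Hypothetical two-column frame: feature first, target last
data = pd.DataFrame({"YearsExperience": [1.0, 2.0, 3.0],
                     "Salary": [40000, 45000, 50000]})

x = data.iloc[:, :-1].values  # a slice keeps two dimensions: shape (3, 1)
y = data.iloc[:, -1].values   # a single position drops one: shape (3,); same as iloc[:, 1] here
print(x.shape, y.shape)

# Picking the feature with a single position gives a 1-D array, which
# LinearRegression.fit rejects; reshape(-1, 1) restores the 2-D shape
x_1d = data.iloc[:, 0].values
print(x_1d.reshape(-1, 1).shape)
```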
Splitting the Dataset into Training and Testing Sets
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)
The dataset is split into training and testing sets.
`test_size=0.2` means 20% of the data is used for testing, and 80% for training.
`random_state=0` ensures reproducibility of the split.
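This sketch shows what `test_size` and `random_state` do, using 30 hypothetical samples. Taking 20% of 30 gives 6 test rows. Repeating the call with the same `random_state` reproduces the same shuffle, and `x` and `y` are shuffled together so each feature stays paired with its target:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 30 hypothetical samples; the target equals the feature value so pairing is easy to check
x = np.arange(30, dtype=float).reshape(-1, 1)
y = np.arange(30, dtype=float)

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)
print(len(x_train), len(x_test))  # 24 6

# Same random_state, same shuffle: the test rows come out identical
_, x_test_again, _, _ = train_test_split(x, y, test_size=0.2, random_state=0)
print(np.array_equal(x_test, x_test_again))  # True

# Rows of x and y move together, so each test feature still matches its target
print(np.array_equal(x_test.ravel(), y_test))  # True
```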
Fitting the Model
regressor = LinearRegression()
regressor.fit(x_train, y_train)
A `LinearRegression` model is created.
The model is trained using the `fit` method on the training data (`x_train` and `y_train`).
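After `fit`, the learned line is stored on the estimator. `coef_` holds the slope (one entry per feature) and `intercept_` holds the intercept. The sketch below uses hypothetical noise-free points generated from salary = 30000 + 5000 × years, so the recovered values can be checked against known numbers:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical noise-free data: salary = 30000 + 5000 * years
years = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
salary = 30000 + 5000 * years.ravel()

regressor = LinearRegression()
regressor.fit(years, salary)

# coef_ is an array with one slope per feature; intercept_ is a scalar for a 1-D target
print(regressor.coef_, regressor.intercept_)
```

On real salary data the points will not sit exactly on a line, so the fitted values are least-squares estimates rather than exact recoveries.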
Predicting on the Testing Set
y_pred = regressor.predict(x_test)
The trained model predicts salaries (`y_pred`) based on the testing set (`x_test`).
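With a single feature, `predict` evaluates the fitted line ŷ = `intercept_` + `coef_[0]` · x for each row. The sketch below checks this by hand on hypothetical data. It also calls the estimator's `score` method, which returns the coefficient of determination R² on the data it is given:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training and test data (not from Salary.csv)
x_train = np.array([[1.0], [2.0], [4.0], [5.0]])
y_train = np.array([35000.0, 41000.0, 49000.0, 56000.0])
x_test = np.array([[3.0], [6.0]])
y_test = np.array([44000.0, 60000.0])

regressor = LinearRegression().fit(x_train, y_train)
y_pred = regressor.predict(x_test)

# predict() computes intercept_ + (x_test @ coef_) for every test row
manual = regressor.intercept_ + x_test @ regressor.coef_
print(np.allclose(y_pred, manual))  # True

# R^2 on the test set: 1.0 means a perfect fit; it can go negative for a poor model
r2 = regressor.score(x_test, y_test)
print(r2)
```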
Plotting the Training Data Graph
plt.scatter(x_train, y_train, color="red")
plt.plot(x_train, regressor.predict(x_train), color="green")
plt.title("Salary vs Experience (Training set)")
plt.xlabel("Years of Experience")
plt.ylabel("Salary")
plt.show()
A scatter plot of the training data (`x_train`, `y_train`) is created, with red dots representing actual data points.
A green line shows the fitted regression line, i.e. the salaries the model predicts for the training inputs.
Titles and labels are added for clarity.
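Passing `x_train` straight to `plt.plot` works here because every predicted point lies on one straight line, so unsorted x values still trace the same segment. An alternative is to evaluate the model on an evenly spaced grid, so the line's extent does not depend on the order or spacing of the data. Here is a sketch using matplotlib's object-oriented API on hypothetical, deliberately unsorted data. The `Agg` backend is an assumption that lets it run without a display. In a notebook, drop that line and call `plt.show()` instead:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical, deliberately unsorted training data
x_train = np.array([[5.0], [1.0], [3.0], [2.0], [4.0]])
y_train = np.array([56000.0, 35000.0, 45000.0, 41000.0, 49000.0])
regressor = LinearRegression().fit(x_train, y_train)

# Sorted grid spanning the observed range; predict expects a 2-D input
x_line = np.linspace(x_train.min(), x_train.max(), 100).reshape(-1, 1)

fig, ax = plt.subplots()
ax.scatter(x_train.ravel(), y_train, color="red")  # actual data points
(line,) = ax.plot(x_line.ravel(), regressor.predict(x_line), color="green")  # fitted line
ax.set_title("Salary vs Experience (Training set)")
ax.set_xlabel("Years of Experience")
ax.set_ylabel("Salary")
plt.close(fig)  # release the figure; plt.show() would display it interactively
```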
Plotting the Testing Data Graph
plt.scatter(x_test, y_test, color="red")
plt.plot(x_train, regressor.predict(x_train), color="green") # Use x_train for the line
plt.title("Salary vs Experience (Testing set)")
plt.xlabel("Years of Experience")
plt.ylabel("Salary")
plt.show()
A scatter plot of the testing data (`x_test`, `y_test`) is created, with red dots representing actual data points.
The same regression line, fitted on the training data, is drawn again so the actual test points can be compared with the model's predictions.
Titles and labels are added for clarity.
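The test-set plot compares points and line visually. The vertical gap between each red dot and the green line is its residual, y_test − y_pred. Here is a short sketch, on hypothetical numbers, that tabulates actual values, predicted values and residuals, then averages the absolute residuals into a mean absolute error:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical training and test data (not from Salary.csv)
x_train = np.array([[1.0], [2.0], [4.0], [5.0]])
y_train = np.array([35000.0, 41000.0, 49000.0, 56000.0])
x_test = np.array([[3.0], [6.0]])
y_test = np.array([44000.0, 60000.0])

regressor = LinearRegression().fit(x_train, y_train)
y_pred = regressor.predict(x_test)

# Residual = actual - predicted: the vertical gap between a red dot and the green line
comparison = pd.DataFrame({
    "years": x_test.ravel(),
    "actual": y_test,
    "predicted": y_pred,
    "residual": y_test - y_pred,
})
print(comparison)

# Mean absolute error: the average size of the gaps, in salary units
mae = np.abs(comparison["residual"]).mean()
print("Mean absolute error:", mae)
```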
Summary:
1. **Libraries**: Imported necessary libraries.
2. **Data Loading**: Loaded and displayed the dataset.
3. **Variable Extraction**: Extracted independent (`x`) and dependent (`y`) variables.
4. **Data Splitting**: Split the data into training and testing sets.
5. **Model Fitting**: Created and trained a linear regression model.
6. **Prediction**: Predicted salaries for the test set.
7. **Visualization**: Plotted graphs for both training and testing sets to visualize the model's performance.