Linear regression is a statistical method used to model the relationship between a dependent variable and one or more independent variables by fitting a linear equation to the observed data. In this article, we’ll go over how to perform a linear regression in Python using the scikit-learn library.
First, let’s install scikit-learn along with pandas, which we’ll use to hold the sample data. NumPy is pulled in automatically as a scikit-learn dependency. You can install both by running the following command:
pip install scikit-learn pandas
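To confirm that the installation worked, a quick check like the one below can help. It is only a sketch and assumes that pandas is installed alongside scikit-learn. Note that the package is installed as scikit-learn but imported as sklearn.

```python
# Print the installed versions to confirm that the imports succeed
import numpy
import pandas
import sklearn

print(f'scikit-learn: {sklearn.__version__}')
print(f'numpy: {numpy.__version__}')
print(f'pandas: {pandas.__version__}')
```

If any of the imports fails with a ModuleNotFoundError, the corresponding package is missing from the active Python environment.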
Now, let’s import the necessary libraries and create some sample data that we can use for our regression model. We’ll use numpy to generate some random data, and pandas to create a DataFrame from it:
import numpy as np
import pandas as pd
# Generate some random data for our regression model
np.random.seed(0)
X = np.random.rand(100, 1)
# True relationship: y = 4 + 3 * X, plus uniform noise in [0, 1), which has a mean of 0.5
y = 4 + 3 * X + np.random.rand(100, 1)
# Create a DataFrame from the data
df = pd.DataFrame({'X': X[:, 0], 'y': y[:, 0]})
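Before fitting anything, it can be worth a quick look at what was generated. The sketch below rebuilds the same data and inspects the noise term; the variable name noise is introduced here purely for illustration. Because np.random.rand draws uniformly from [0, 1), the noise is never negative and averages roughly 0.5 rather than 0.

```python
import numpy as np
import pandas as pd

# Rebuild the sample data exactly as above
np.random.seed(0)
X = np.random.rand(100, 1)
y = 4 + 3 * X + np.random.rand(100, 1)
df = pd.DataFrame({'X': X[:, 0], 'y': y[:, 0]})

# Basic shape and a preview of the first rows
print(df.shape)
print(df.head())

# Recover the noise term by subtracting the known linear part
noise = df['y'] - (4 + 3 * df['X'])
print(f'noise min: {noise.min():.3f}')
print(f'noise max: {noise.max():.3f}')
print(f'noise mean: {noise.mean():.3f}')
```

A noise mean near 0.5 means the fitted line will sit about 0.5 above y = 4 + 3 * X, which shows up in the intercept.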
Now, we can use the LinearRegression model from scikit-learn to fit a linear regression model to our data. First, we’ll need to import the model and create an instance of it:
from sklearn.linear_model import LinearRegression
model = LinearRegression()
Next, we’ll use the fit method to fit the model to our data. We’ll use the X and y columns from the DataFrame as the independent and dependent variables, respectively:
model.fit(df[['X']], df['y'])
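Two details of that call are easy to miss. The double brackets in df[['X']] select a two-dimensional DataFrame, which is the shape scikit-learn expects for features, while df['y'] is a one-dimensional Series. Once the model is fitted, the score method gives a quick sense of how well the line explains the data. The following is a minimal sketch; it scores on the training data only, so it says nothing about performance on unseen data.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Rebuild the sample data and fit the model as above
np.random.seed(0)
X = np.random.rand(100, 1)
y = 4 + 3 * X + np.random.rand(100, 1)
df = pd.DataFrame({'X': X[:, 0], 'y': y[:, 0]})

model = LinearRegression()
model.fit(df[['X']], df['y'])

# Features must be 2-D (rows x columns); the target can be 1-D
print(df[['X']].shape, df['y'].shape)

# score() returns the coefficient of determination (R^2)
r_squared = model.score(df[['X']], df['y'])
print(f'R^2 on the training data: {r_squared:.3f}')
```

An R^2 close to 1 indicates that most of the variation in y is explained by the fitted line.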
Once the model is fitted, we can access the model’s coefficients and intercept using the coef_ and intercept_ attributes, respectively:
print(f'Intercept: {model.intercept_}')
print(f'Coefficient: {model.coef_[0]}')
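This will output the intercept and coefficient of the fitted linear regression model. The exact digits depend on the random draw, but the coefficient should be close to 3 and the intercept close to 4.5 rather than 4. The reason is that np.random.rand draws the noise uniformly from [0, 1), which has a mean of 0.5, and that constant offset is absorbed into the intercept.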
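One way to sanity-check these numbers is to solve the same least-squares problem directly with NumPy and compare. The sketch below uses np.linalg.lstsq on a design matrix with a column of ones for the intercept; the names A, intercept and slope are chosen here for illustration. Because the noise in this data set is uniform on [0, 1) with a mean of 0.5, both approaches should give an intercept near 4.5 and a slope near 3.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Rebuild the sample data as above
np.random.seed(0)
X = np.random.rand(100, 1)
y = 4 + 3 * X + np.random.rand(100, 1)

# Design matrix: a column of ones (intercept) next to the feature column
A = np.hstack([np.ones_like(X), X])

# Solve the ordinary least-squares problem min ||A @ beta - y||^2
solution, *_ = np.linalg.lstsq(A, y, rcond=None)
intercept, slope = solution[:, 0]

# Fit the same data with scikit-learn for comparison
model = LinearRegression().fit(X, y[:, 0])

print(f'NumPy:        intercept={intercept:.4f}, slope={slope:.4f}')
print(f'scikit-learn: intercept={model.intercept_:.4f}, slope={model.coef_[0]:.4f}')
```

The two rows should agree to within floating-point precision, since LinearRegression fits an ordinary least-squares model.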
We can also use the predict method to make predictions on new data using our fitted model. For example, to predict the value of y for a given value of X, we can do the following:
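# Use a DataFrame with the training column name so that scikit-learn
# does not warn about missing feature names
X_new = pd.DataFrame({'X': [0.5]})
prediction = model.predict(X_new)[0]
print(f'Prediction for X = {X_new.iloc[0, 0]}: {prediction}')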
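This will output the predicted value of y for X = 0.5. The result should be close to 6.0, because the data follow y = 4 + 3 * X plus noise that averages 0.5, which gives 4.5 + 3 * 0.5 = 6.0. The exact digits depend on the random draw.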
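The predict method also accepts several rows at once. The sketch below passes a small DataFrame that uses the same column name as the training data, and checks the result against the line equation computed by hand. The names X_batch and manual are introduced here for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Rebuild the sample data and fit the model as above
np.random.seed(0)
X = np.random.rand(100, 1)
y = 4 + 3 * X + np.random.rand(100, 1)
df = pd.DataFrame({'X': X[:, 0], 'y': y[:, 0]})
model = LinearRegression().fit(df[['X']], df['y'])

# Predict for several inputs in a single call
X_batch = pd.DataFrame({'X': [0.0, 0.5, 1.0]})
predictions = model.predict(X_batch)

# The same values computed by hand: intercept + coefficient * x
manual = model.intercept_ + model.coef_[0] * X_batch['X'].to_numpy()

for x_value, predicted in zip(X_batch['X'], predictions):
    print(f'X = {x_value:.1f} -> predicted y = {predicted:.3f}')

print(f'Matches the line equation: {np.allclose(predictions, manual)}')
```

For a single feature, a prediction is nothing more than the intercept plus the coefficient times the input, which is what the final check confirms.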
That’s it! You now have a basic understanding of how to perform a linear regression in Python using scikit-learn.