Polynomial Regression in Python

Polynomial Regression extends linear regression so that it can model curved relationships between variables. Instead of a straight line, it fits a polynomial curve, which makes it useful when the relationship between the independent and dependent variables is non-linear. In this article, we will explore Polynomial Regression in Python: when to use it, how it differs from Linear Regression, and how to implement it with examples.

1. Introduction

1.1 What is Polynomial Regression?

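Polynomial Regression is a form of regression analysis in which the relationship between the independent variable X and the dependent variable Y is modeled as an nth-degree polynomial in X. It can be expressed as:

Y = β₀ + β₁X + β₂X² + … + βₙXⁿ

Although the curve is non-linear in X, the model is still linear in the coefficients β₀, …, βₙ. This is why it can be fitted with an ordinary linear regression on the powers of X.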
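As a small illustration of the equation above, the sketch below expands two sample inputs into the columns 1, X and X² using scikit-learn's PolynomialFeatures. The input values and the names X_demo, expander and X_expanded are chosen for this example only.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Two sample inputs with one feature each (illustrative values)
X_demo = np.array([[2.0], [3.0]])

# degree=2 produces the columns [1, X, X^2] for each row
expander = PolynomialFeatures(degree=2)
X_expanded = expander.fit_transform(X_demo)

print(X_expanded)
# [[1. 2. 4.]
#  [1. 3. 9.]]
```

Each row now holds the terms that the coefficients β₀, β₁ and β₂ multiply, so a linear model fitted on these columns yields a quadratic curve in X.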

1.2 Why Use Polynomial Regression?

Polynomial Regression is used when the relationship between the variables is curved, so that a straight line would systematically over- or under-predict in parts of the range. The added polynomial terms give the model more flexibility, which can improve predictions when the underlying relationship really is non-linear. Such relationships are common in real-world applications such as finance, biology, and engineering.

2. Polynomial Regression vs Linear Regression

2.1 Differences Between Polynomial and Linear Regression

Aspect	Linear Regression	Polynomial Regression
Model Type	Straight line in X	Curve in X (still linear in the coefficients)
Complexity	Simpler, fewer terms	More complex, adaptable to data
Use Case	Linear Relationships	Non-linear Relationships
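To make the comparison concrete, here is a minimal sketch that fits both models to synthetic quadratic data and compares their R² on the same points. The data, the seed, and the names X_cmp, y_cmp, r2_linear and r2_poly are assumptions made for this illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Synthetic quadratic data with noise (illustrative only)
rng = np.random.default_rng(0)
X_cmp = rng.uniform(0, 10, size=(100, 1))
y_cmp = X_cmp[:, 0] ** 2 + rng.normal(0, 10, size=100)

# Straight-line fit on the raw feature
linear_model = LinearRegression().fit(X_cmp, y_cmp)
r2_linear = linear_model.score(X_cmp, y_cmp)

# Degree-2 fit: the same estimator, trained on the columns [X, X^2]
X_cmp_poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X_cmp)
poly_model = LinearRegression().fit(X_cmp_poly, y_cmp)
r2_poly = poly_model.score(X_cmp_poly, y_cmp)

print(f"Linear R^2:   {r2_linear:.3f}")
print(f"Degree-2 R^2: {r2_poly:.3f}")
```

Because these scores are computed on the training points, the degree-2 model can never score lower than the straight line here. A fair comparison of predictive accuracy should use held-out data.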

3. Import Libraries

3.1 Required Python Libraries

To perform Polynomial Regression in Python, we need the following packages:

numpy: For numerical calculations
pandas: For data manipulation
scikit-learn: For creating and training the model

3.2 Additional Libraries for Visualization

For visualizing the dataset and the regression results, we will use:

matplotlib: For plotting graphs


# Import necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

4. Create Dataset

4.1 Generating a Sample Dataset

We’ll generate a simple dataset that has a quadratic relationship:


# Generating a dataset
# Seed for reproducibility
np.random.seed(0)

# Sample data
X = np.random.rand(100, 1) * 10  # 100 random points between 0 and 10
y = X**2 + np.random.randn(100, 1) * 10  # Quadratic relation with noise

# Convert to pandas DataFrame
data = pd.DataFrame(np.hstack((X, y)), columns=['X', 'Y'])

4.2 Visualizing the Dataset

Now, we’ll plot the generated dataset:


# Visualizing the dataset
plt.scatter(data['X'], data['Y'], color='blue')
plt.title('Sample Dataset')
plt.xlabel('X')
plt.ylabel('Y')
plt.show()

5. Train a Polynomial Regression Model

5.1 Preparing the Data

We will split the dataset into training and testing sets and prepare it for Polynomial Regression:


# Splitting the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Transforming the features into polynomial features
poly_features = PolynomialFeatures(degree=2)
X_poly_train = poly_features.fit_transform(X_train)
X_poly_test = poly_features.transform(X_test)

5.2 Fitting the Polynomial Regression Model

We can now fit a polynomial regression model:


# Fitting the Polynomial Regression Model
model = LinearRegression()
model.fit(X_poly_train, y_train)
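Once the model is fitted, a natural next step is to check how well it predicts the held-out test set. The sketch below repeats the data generation, split and fit from the steps above so that it runs on its own, then computes the mean squared error and R². The metric choice and the names y_pred, test_mse and test_r2 are assumptions for this example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

# Recreate the dataset, split and fitted model from the earlier steps
np.random.seed(0)
X = np.random.rand(100, 1) * 10
y = X**2 + np.random.randn(100, 1) * 10

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
poly_features = PolynomialFeatures(degree=2)
X_poly_train = poly_features.fit_transform(X_train)
X_poly_test = poly_features.transform(X_test)
model = LinearRegression().fit(X_poly_train, y_train)

# Evaluate on data the model has not seen during training
y_pred = model.predict(X_poly_test)
test_mse = mean_squared_error(y_test, y_pred)
test_r2 = r2_score(y_test, y_pred)

print(f"Test MSE: {test_mse:.2f}")
print(f"Test R^2: {test_r2:.3f}")
```

Since the noise added to the data has a standard deviation of 10, a test MSE in the neighborhood of 100 would suggest the model has captured most of the underlying curve.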

6. Visualizing the Polynomial Regression Model

6.1 Plotting the Polynomial Regression Results

Let’s visualize our Polynomial Regression model along with the dataset:


# Visualizing the Polynomial Regression Results
plt.scatter(X_train, y_train, color='blue', label='Training data')
plt.scatter(X_test, y_test, color='red', label='Test data')

# Plotting the Polynomial Regression curve
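X_grid = np.arange(X.min(), X.max(), 0.1)  # scalar bounds; min(X) on a 2-D array returns a one-element array
X_grid = X_grid.reshape(-1, 1)  # column vector, the shape scikit-learn expects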
plt.plot(X_grid, model.predict(poly_features.transform(X_grid)), color='green', label='Polynomial Regression Curve')

plt.title('Polynomial Regression Results')
plt.xlabel('X')
plt.ylabel('Y')
plt.legend()
plt.show()

7. Conclusion

7.1 Summary of Polynomial Regression

Polynomial Regression is a valuable method for capturing curved relationships in data. By fitting a polynomial curve, we can get more accurate predictions than a straight-line fit when the underlying relationship is non-linear, provided the degree is kept low enough to avoid overfitting.

7.2 Future Applications of Polynomial Regression

This regression technique can be applied across various fields, including:

Finance for modeling trends and forecasts
Biology for studying growth patterns
Engineering for design and optimization problems

FAQ

What are the advantages of Polynomial Regression?

Polynomial Regression can model non-linear relationships that a straight-line fit cannot capture, while still being trained with the same tools as linear regression. On curved data, this added flexibility typically gives more accurate predictions.

How do I choose the degree of the polynomial?

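Choosing the degree of the polynomial requires consideration of the data characteristics. Higher degrees fit the training data more closely but may overfit, so a common approach is to compare several degrees on held-out data or with cross-validation and keep the lowest degree that performs well.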
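One way to compare degrees is cross-validation. The following is a minimal sketch on synthetic quadratic data, not a definitive procedure: the candidate degrees 1 to 6, the 5 folds, and the names X_cv, y_cv, cv_mse and best_degree are all assumptions for this example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic quadratic data with noise (illustrative only)
rng = np.random.default_rng(0)
X_cv = rng.uniform(0, 10, size=(100, 1))
y_cv = X_cv[:, 0] ** 2 + rng.normal(0, 10, size=100)

# Mean cross-validated MSE for each candidate degree
cv_mse = {}
for degree in range(1, 7):
    pipeline = make_pipeline(
        PolynomialFeatures(degree=degree, include_bias=False),
        LinearRegression(),
    )
    # scikit-learn reports negated MSE so that higher is always better
    scores = cross_val_score(
        pipeline, X_cv, y_cv, cv=5, scoring="neg_mean_squared_error"
    )
    cv_mse[degree] = -scores.mean()

for degree, mse in cv_mse.items():
    print(f"degree={degree}: CV MSE = {mse:.2f}")

best_degree = min(cv_mse, key=cv_mse.get)
print("Lowest CV MSE at degree", best_degree)
```

Wrapping the feature expansion and the regression in one pipeline means the expansion is refitted inside each fold. When several degrees score almost the same, the lower degree is usually the safer choice.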

What parameters are important in a Polynomial Regression model?

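Key parameters include the degree of the polynomial, which you choose before training, and the coefficients of the polynomial equation, which are learned from the data. With scikit-learn's LinearRegression, the learned values are available after fitting as the intercept_ and coef_ attributes.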
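As a brief sketch of how the coefficients can be read back, the example below fits a degree-2 model to noiseless data generated from Y = 1 + 2X + 3X². The data and the names X_coef, y_coef and coef_model are made up for this illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Noiseless data from a known polynomial: Y = 1 + 2X + 3X^2 (illustrative)
X_coef = np.linspace(0, 5, 20).reshape(-1, 1)
y_coef = 1.0 + 2.0 * X_coef[:, 0] + 3.0 * X_coef[:, 0] ** 2

# include_bias=False leaves the constant term to LinearRegression's intercept
expander = PolynomialFeatures(degree=2, include_bias=False)
X_coef_poly = expander.fit_transform(X_coef)
coef_model = LinearRegression().fit(X_coef_poly, y_coef)

print("Intercept (beta_0):", coef_model.intercept_)
print("Coefficients (beta_1, beta_2):", coef_model.coef_)
```

Because the data contains no noise, the recovered values should match 1, 2 and 3 up to floating-point error. On real data they are estimates and will vary with the sample.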

Can Polynomial Regression be used for multiple independent variables?

Yes, Polynomial Regression can be extended to handle multiple independent variables by including interaction terms and higher-degree features.
