Complete Overview of Regression Analysis in Machine Learning

Hello and welcome to this article on Regression Analysis. Regression is a type of supervised machine learning used to predict continuous (numeric) variables such as price, rating, or fees. We looked at several regression algorithms in our previous blog on supervised machine learning. In this blog, we will take a deep dive into regression, understand the main concept behind it, and see what regression analysis actually is.


Table Of Contents

  1. Brief Introduction to Regression Analysis
  2. Application of Regression with a real-world example
  3. How does Regression work?
  4. Terminologies related to Regression
  5. Different types of Regression algorithms in Machine Learning
  6. Summary

Introduction to Regression Analysis

Regression Analysis is a statistical method used to find the relationship between a dependent variable and one or more independent variables. It helps you understand the problem statement well and describe the relationship between the variables in a way that anyone else can follow.

👉 More specifically, regression analysis helps us understand how the dependent variable changes when an independent variable increases or decreases. In simple words, regression means predicting a continuous value, that is, a number such as sales, temperature, or stock price.

To understand Regression Analysis better, let's consider the example below:

Understand Regression with a Real-World Example

Problem Statement: Suppose an organization is hiring employees and deciding their salaries based on their work experience. The experience and salaries of different employees vary a lot. The organization has past data on its employees that shows how salaries were decided based on experience.

Regression with a real-world example

👉 Now, going back to the past data and scanning the sheet again and again is a time-consuming and repetitive process. And suppose we have to estimate the salary of a person with 4.7 years of experience: no row in the sheet may match exactly, so deciding by hand becomes guesswork. To automate this task we use Machine Learning, and to solve this kind of task we use Regression Analysis.
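As a rough illustration of the salary estimate described above, here is a minimal sketch in plain Python. The experience and salary numbers are made up for this example (the article does not give the actual sheet), and the helper name `fit_line` is hypothetical. It fits a straight line with the standard least-squares formulas and then reads off the estimate for 4.7 years.

```python
# Hypothetical past records, invented for illustration only:
# years of experience and the salary (in thousands) paid for it.
experience = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
salary = [30.0, 37.0, 41.0, 48.0, 52.0, 59.0]


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-variation of x and y, and variation of x, around their means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(experience, salary)

# Estimate the salary for a candidate with 4.7 years of experience.
predicted = slope * 4.7 + intercept
print(f"Estimated salary for 4.7 years: {predicted:.1f}k")
```

In practice you would probably use a library such as scikit-learn instead of writing the formulas by hand, but the idea is the same: learn the line once from past data, then reuse it for any new experience value.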

I hope Regression is clear to you by now, so we can summarize the two sections above in one sentence: Regression is a supervised machine learning technique that helps find the relationship between the dependent variable (the target) and the independent variables (the input features). It is mainly used in forecasting, prediction, determining trends, time series modeling, etc.

How Does Regression Work?

In Regression, we fit a line or curve between the dependent and independent features that best fits the training data points, in such a way that the loss is minimum. With the help of this best-fit line, the algorithm can make predictions for new inputs. In simple words: "Regression finds a line through the data points on the predictor graph such that the total vertical distance between the data points and the regression line is minimum." Note that the line usually does not pass through every point; it only stays as close to all of them as possible.

To put it even more simply, we have an input and an output to predict. We put all the input data points on the X-axis and the corresponding output data points on the Y-axis, which gives us a graph. Now we try to draw a straight line through these points such that the distance between the line and the points is minimum, which ensures a minimum loss.
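To make the idea of "minimum loss" concrete, here is a small sketch in plain Python. It assumes the loss is the mean squared vertical distance (one common choice, not the only one), uses made-up data points, and the helper names `mean_squared_error` and `least_squares_line` are hypothetical. It compares the best-fit line with a few other candidate lines.

```python
# Hypothetical data points: inputs on the X-axis, outputs on the Y-axis.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]


def mean_squared_error(slope, intercept, xs, ys):
    """Average squared vertical distance between the points and the line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys)) / len(xs)


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the line that minimizes the squared loss."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


best_slope, best_intercept = least_squares_line(xs, ys)
best_loss = mean_squared_error(best_slope, best_intercept, xs, ys)
print(f"best-fit line: y = {best_slope:.2f}x + {best_intercept:.2f}, loss = {best_loss:.4f}")

# Other lines we might have drawn by eye; each has a loss at least as large.
candidates = [(1.0, 1.0), (2.5, 0.0), (best_slope + 0.3, best_intercept)]
for slope, intercept in candidates:
    loss = mean_squared_error(slope, intercept, xs, ys)
    print(f"candidate line: y = {slope:.2f}x + {intercept:.2f}, loss = {loss:.4f}")
```

Whichever other line you try, its loss will not be smaller than the loss of the best-fit line, which is exactly what "best fit" means here.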

Terminologies Related to Regression

  • Dependent Variable: The main feature that we want to predict, our target feature, is known as the dependent variable.
  • Independent Variable: The factors that affect the dependent variable directly or indirectly, that is, the features used to predict the target variable, are known as independent variables. They are also called input features.
  • Outliers: Outliers are data points or observations whose values are very low or very high compared to the other data points. They usually make up only a small part of your data but can have a severe effect on model performance, which is why it is important to deal with outliers before modeling.
  • Multicollinearity: Multicollinearity is a condition where two or more independent variables are highly correlated with each other. It creates a problem for the model because it becomes hard to tell how much each variable contributes on its own. If this condition arises, a common fix is to drop or combine the correlated variables, which also reduces the dimensions and can improve performance.
  • Overfitting and Underfitting: If our algorithm works well on the training dataset but does not work well on the test dataset, the condition is known as overfitting. If the algorithm performs poorly on both datasets (training and testing), the condition is known as underfitting.
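Two of the terms above, outliers and multicollinearity, are easy to check with a few lines of plain Python. The sketch below is only one possible approach: it assumes the common 1.5 × IQR rule for flagging outliers and the Pearson correlation for spotting highly correlated features. All the numbers are invented for illustration, and `pearson` is a hypothetical helper name.

```python
import statistics

# --- Outliers: flag values far outside the interquartile range (IQR). ---
# Hypothetical salaries (in thousands); 250 is an obviously extreme value.
salaries = [42, 45, 47, 50, 52, 55, 58, 250]

q1, _, q3 = statistics.quantiles(salaries, n=4)  # quartile cut points
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in salaries if v < lower_fence or v > upper_fence]
print("Outliers:", outliers)


# --- Multicollinearity: check how strongly two input features move together. ---
def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical input features: experience and age usually rise together.
years_experience = [1, 2, 3, 4, 5, 6]
age = [23, 24, 25, 27, 27, 29]

r = pearson(years_experience, age)
print(f"Correlation between experience and age: {r:.2f}")
```

A correlation close to 1 or -1 between two input features is a hint that they carry nearly the same information, so keeping both may not help the model.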

Hence, we can now understand the importance of Regression. As mentioned above, it is used to predict continuous variables. There are various scenarios where such a need arises, and we will work through and discuss more of them with real-world problems. By performing regression, we can also estimate which features are the most important, which are the least important, and how each feature affects the target.

Types of Regression

There are various types of regression algorithms used for solving data science and machine learning problems. They suit different scenarios and differ a little in how they work, but at the core, all regression algorithms determine the effect of the independent variables on a dependent variable. Some of the most commonly used regression algorithms are listed below.

  • Linear Regression
  • Polynomial Regression
  • Support Vector Regression
  • Decision Tree Algorithm
  • Random Forest Algorithm
  • Lasso Regression
  • Ridge Regression
  • ElasticNet Regression
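To give a feel for how these algorithms "differ a little bit in working", here is a small sketch comparing Linear Regression with Ridge Regression on a single feature. It is a simplified illustration, not a full implementation: it assumes one input feature, an intercept that is not penalized, and made-up data. The function name `fit_ridge_line` and the parameter name `alpha` (the penalty strength) are chosen just for this example.

```python
# Hypothetical data with a single input feature.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]


def fit_ridge_line(xs, ys, alpha):
    """Fit y = slope * x + intercept while penalizing large slopes.

    With alpha = 0 this is ordinary Linear Regression. A larger alpha
    shrinks the slope toward zero, which is the core idea of Ridge.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / (sxx + alpha)  # the penalty enters only here
    intercept = mean_y - slope * mean_x
    return slope, intercept


slopes = []
for alpha in [0.0, 10.0, 100.0]:
    slope, intercept = fit_ridge_line(xs, ys, alpha)
    slopes.append(slope)
    print(f"alpha = {alpha:>5}: y = {slope:.3f}x + {intercept:.3f}")
```

As `alpha` grows, the fitted slope gets smaller. Lasso and ElasticNet follow the same spirit with different penalties, while the tree-based and support vector methods in the list work quite differently.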

Summary

Regression Analysis is used to find the relationship between input and output features. There are various things to take care of before modeling, such as multicollinearity, the presence of outliers, and overfitting and underfitting problems. In the coming articles, we are going to build a complete intuition behind each algorithm, discuss each of the problems we have briefly seen here, and find solutions using different techniques.

If you have any suggestions, views, or queries, please post them in the comment section below 👇. I will be happy to learn from you and support you in your journey.

Keep Learning, Happy Learning 😊
Thank You!
