Logistic Regression - Sigmoid Function | Part-2

Hello all! In the previous part we learned about the perceptron trick for solving logistic regression, and the perceptron trick did classify all the points. But we observed that the original logistic regression still performed better, and we discussed why the perceptron trick does not converge to the best line: the problem lies in its algorithm. So we need to modify the algorithm, and here the sigmoid function comes in with a great solution, which we will discuss in today's tutorial.

Modifications in the Algorithm

In the perceptron trick, if a point is misclassified it pulls the separating line towards itself, and if a point is correctly classified nothing changes. Now we will change the algorithm: a correctly classified point will push the line away, and a misclassified point will pull the line towards itself. This way, when the line is pushed by both classes in opposite directions, it converges to a symmetric position between them.

The magnitude with which a point pushes or pulls the line is decided by how far that point is from the line.

For example, you can observe in the above plot that at the first point you would push the line with a greater magnitude, while at the second point you would push it with a much smaller magnitude.

What changes do we need to implement the modified algorithm?

In the perceptron trick, the weight update was Wn = Wo + learning_rate * (actual - predicted) * Xi. When a prediction was correct, the new weights equalled the old weights because the difference between the actual and predicted values became zero. Now, when the prediction is correct, we still want to push the line, so we need to stop this difference from becoming zero. We cannot change the actual data; the only thing we can change is the predicted value, so that the difference of the two is not zero. The difference was becoming zero because we were using a step function that returns only zero or one. To prevent this, a new function is introduced: the sigmoid function.
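
To see the problem concretely, here is a minimal sketch of the step-function update (the point, weights, and learning rate are made-up values): for a correctly classified point the difference (actual - predicted) is zero, so the weights do not move at all.

    import numpy as np

    def step(z):
        # step function used by the perceptron trick: returns only 0 or 1
        return 1 if z > 0 else 0

    # made-up example: actual label y = 1, weights w, features x (first entry is the bias input)
    w = np.array([0.5, -0.2, 0.1])
    x = np.array([1.0, 2.0, 3.0])
    y = 1
    learning_rate = 0.1

    y_hat = step(np.dot(w, x))                   # correctly classified, so y_hat = 1
    w_new = w + learning_rate * (y - y_hat) * x
    print(w_new)                                 # identical to w, because (y - y_hat) = 0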

Sigmoid Function

The sigmoid function is a very popular function in machine learning and deep learning, and you have probably heard of it somewhere. We will learn about the sigmoid function and, by implementing it, understand how the behaviour of the above equation changes. The most important feature of the sigmoid function is that it maps any number to a value between 0 and 1, and that is exactly the property we need.

You can observe the graph and equation of the sigmoid function below. It takes an input z and gives an output between 0 and 1: sigmoid(z) = 1 / (1 + e^(-z)). In our algorithm we will replace the step function with the sigmoid function. If the input is negative, the output will be less than 0.5; if the input is 0, the output will be exactly 0.5; and if the input is positive, the output will be greater than 0.5.

The advantage we get from the sigmoid function is this: earlier, the difference between the actual and predicted values became exactly zero for correctly classified points, but the sigmoid predicts a continuous value between zero and one, so the difference is never exactly zero.
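
As a minimal sketch, the sigmoid takes only a line of NumPy; the sample inputs below simply illustrate the three regions of its output.

    import numpy as np

    def sigmoid(z):
        # maps any real number to a value strictly between 0 and 1
        return 1 / (1 + np.exp(-z))

    print(sigmoid(-4))   # ~0.018 -> negative input gives an output below 0.5
    print(sigmoid(0))    # 0.5    -> zero input gives exactly 0.5
    print(sigmoid(4))    # ~0.982 -> positive input gives an output above 0.5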

Let us see, with the different cases from the plot above, how the sigmoid function works. In each case, the numeric factor in the update is simply the difference between the actual label and the value predicted by the sigmoid.

Case-1) Positive point correctly classified

            Wn = Wo + learning_rate * 0.2 * Xi

We can see that it will push the line downwards with a small magnitude.

Case-2) Negative point wrongly classified

            Wn = Wo - learning_rate * 0.65 * Xi

It will pull the line towards itself (upwards) with a greater magnitude.

Case-3) Positive point wrongly classified

            Wn = Wo + learning_rate * 0.7 * Xi

It will pull the line towards itself (downwards).

Case-4) Negative point correctly classified

            Wn = Wo - learning_rate * 0.15 * Xi

It will push the line upwards with a small magnitude, because the point is correctly classified and far from the line.
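
The numeric factors used in the four cases above are nothing but the difference (actual - predicted); the sigmoid outputs below are assumed values chosen only to reproduce those factors.

    # (actual label, assumed sigmoid output) for each case
    cases = {
        "Case-1 positive, correctly classified": (1, 0.80),   # 1 - 0.80 = +0.20
        "Case-2 negative, wrongly classified":   (0, 0.65),   # 0 - 0.65 = -0.65
        "Case-3 positive, wrongly classified":   (1, 0.30),   # 1 - 0.30 = +0.70
        "Case-4 negative, correctly classified": (0, 0.15),   # 0 - 0.15 = -0.15
    }

    for name, (y, y_hat) in cases.items():
        # the update is Wn = Wo + learning_rate * (y - y_hat) * Xi
        print(f"{name}: (y - y_hat) = {y - y_hat:+.2f}")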

After performing all these updates, the line will converge in a much better way.

Code the Sigmoid Function for Logistic Regression

Now we will code the sigmoid function and fit our created data using the modified algorithm. We are working in the same notebook as Part 1, so if you have not done so already, please create the data as before. All the code is the same; the only small modification is in the perceptron function, as shown in the sketch below.
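
Since the notebook code itself is not reproduced here, the following is only a sketch of what the modified function could look like; the make_classification call is a stand-in assumption for the Part-1 data, and the function name perceptron_sigmoid is my own.

    import numpy as np
    from sklearn.datasets import make_classification

    def sigmoid(z):
        return 1 / (1 + np.exp(-z))

    def perceptron_sigmoid(X, y, learning_rate=0.1, epochs=1000):
        # prepend a column of ones so the first weight acts as the intercept
        X = np.insert(X, 0, 1, axis=1)
        weights = np.ones(X.shape[1])

        for _ in range(epochs):
            # pick one random point per step, as in the perceptron trick
            i = np.random.randint(0, X.shape[0])
            y_hat = sigmoid(np.dot(X[i], weights))    # continuous prediction in (0, 1)
            weights = weights + learning_rate * (y[i] - y_hat) * X[i]

        return weights[0], weights[1:]                # intercept, coefficients

    # stand-in for the Part-1 data: 100 points, 2 features, 2 classes
    X, y = make_classification(n_samples=100, n_features=2, n_informative=1,
                               n_redundant=0, n_clusters_per_class=1, random_state=41)
    intercept_, coef_ = perceptron_sigmoid(X, y)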


Visualize the results of the sigmoid function and compare them with the perceptron trick and the actual logistic regression.
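
A rough sketch of how the comparison plot could be produced, assuming the X, y, intercept_ and coef_ from the snippet above and scikit-learn's LogisticRegression (the perceptron-trick line from Part 1 would be added in exactly the same way):

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.linear_model import LogisticRegression

    # decision boundary of the sigmoid-trick fit: w0 + w1*x1 + w2*x2 = 0
    x_range = np.linspace(X[:, 0].min(), X[:, 0].max(), 100)
    y_sigmoid = -(coef_[0] * x_range + intercept_) / coef_[1]

    # scikit-learn's logistic regression for comparison
    lr = LogisticRegression().fit(X, y)
    y_sklearn = -(lr.coef_[0][0] * x_range + lr.intercept_[0]) / lr.coef_[0][1]

    plt.scatter(X[:, 0], X[:, 1], c=y)
    plt.plot(x_range, y_sigmoid, label="sigmoid trick")
    plt.plot(x_range, y_sklearn, label="sklearn logistic regression")
    plt.legend()
    plt.show()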

We can see that the sigmoid function is an improvement over the perceptron trick, but the actual logistic regression is still better, because it uses a concept that we are missing with the sigmoid alone. Let's study that.

Why Did We Not Converge?

The actual logistic regression performs better than our sigmoid-based fit because real machine learning does not work by only classifying the points. It also defines an error (loss) function, calculates the loss of the current fit, and optimizes that loss until it converges to a minimum. The loss function can be chosen in different ways.
Now, to improve our sigmoid approach, we have to attach a loss function to it and minimize it, for example with gradient descent. We will solve this problem in the next article; otherwise, this article would become too long to digest. I encourage you to look for such a function and work on it yourself.

End Notes

We have learned what the sigmoid function is and what it actually does. We have also learned how we can use the sigmoid function to improve the perceptron trick, implemented it practically on the previous dataset, and visualized the changes. We conclude that the actual scikit-learn logistic regression still performs better than our sigmoid version because we do not take a loss function into consideration. So, in the upcoming article, we will study logistic regression with gradient descent and converge like the actual machine learning logistic regression.

Keep learning, happy learning!


If you have any doubts or suggestions, please let me know.
