What is logistic regression?
This blog is a continuation of my Machine Learning Blog Series. In this blog I am going to discuss Logistic Regression. To fully understand this blog, I recommend that you read my previous blog on Linear Regression.
Logistic Regression is a linear model for classification. A traditional linear model is used to predict a numerical value, whereas a logistic model is used to predict which category an example belongs to.
An example of a Logistic Regression Model is a system that is used by many email providers to determine whether an email is spam or ham. The email details are given to the model and the model categorises the email as spam or ham and moves the mail into the appropriate folder.
How does it work?
In essence, a Logistic Regression algorithm calculates the probability of an example belonging to a category. The structure is the same as the linear model that I covered in a previous blog: each input variable is assigned a coefficient. The difference is that the weighted sum is passed through the logistic (sigmoid) function instead of being used directly. This confines the result to a value between 0 and 1, which is the probability of the current example being a member of the default category.
y = 1 / (1 + e^(-(C_1 x_1 + C_2 x_2 + C_3 x_3)))
In the spam example, y is the probability that an email is spam. For example, if y = 0.7, there is a 70% chance that the mail is spam.
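To make the calculation concrete, here is a minimal Python sketch that turns a weighted sum of inputs into a probability using the standard logistic (sigmoid) function. The coefficients and email features are made-up values chosen only for illustration, and the 0.5 cut-off for labelling an email as spam is an assumption, not something fixed by the algorithm.

```python
import math

def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(coefficients, inputs):
    """Weighted sum of the inputs passed through the logistic function."""
    z = sum(c * x for c, x in zip(coefficients, inputs))
    return sigmoid(z)

# Hypothetical coefficients and email features, for illustration only.
coefficients = [0.8, -0.5, 1.2]
email_features = [1.0, 2.0, 0.9]

y = predict_probability(coefficients, email_features)

# Assumed decision rule: call it spam when the probability is at least 0.5.
label = "spam" if y >= 0.5 else "ham"
print(f"P(spam) = {y:.2f} -> {label}")
```

With these particular numbers the probability comes out at roughly 0.71, so the email would be filed as spam.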
A Logistic Regression model is trained in much the same way as a Linear Regression model: the model iterates through a training set of known examples, adjusting the coefficients when predictions are incorrect.
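The training loop can be sketched in a few lines of Python. This is one possible approach (simple per-example gradient updates), not the only way to train such a model. The tiny training set, the learning rate and the number of passes are all assumptions made for the sake of the example. Each coefficient is nudged in proportion to how wrong the prediction was, so accurate predictions barely change the coefficients.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(examples, labels, learning_rate=0.1, epochs=1000):
    """Fit one coefficient per input variable using per-example updates."""
    coefficients = [0.0] * len(examples[0])
    for _ in range(epochs):
        for inputs, target in zip(examples, labels):
            z = sum(c * x for c, x in zip(coefficients, inputs))
            error = target - sigmoid(z)  # close to 0 when the prediction is right
            coefficients = [
                c + learning_rate * error * x
                for c, x in zip(coefficients, inputs)
            ]
    return coefficients

# Made-up training set: [constant 1.0 for the intercept, count of suspicious words].
# Labels: 1 = spam, 0 = ham.
examples = [[1.0, 0.0], [1.0, 1.0], [1.0, 4.0], [1.0, 5.0]]
labels = [0, 0, 1, 1]

coefficients = train(examples, labels)

def predict(inputs):
    """Probability of spam for one example, using the trained coefficients."""
    return sigmoid(sum(c * x for c, x in zip(coefficients, inputs)))

print(f"No suspicious words:   {predict([1.0, 0.0]):.2f}")
print(f"Five suspicious words: {predict([1.0, 5.0]):.2f}")
```

After training, an email with no suspicious words should score well below 0.5 and one with five suspicious words well above it.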
Data Preparation is very similar to Linear Regression with some key differences.
- Discrete Output Variable – Every output variable in the training set has to be a member of a fixed number of categories.
- Noise Sensitivity – Logistic Regression is much more sensitive to outliers and incorrect data than linear regression, so Data Quality standards need to be higher.
- Linear Relationship – It is assumed that there is a linear relationship between the input variables and the log-odds of the output variable.
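As a small illustration of the requirement that every training label belongs to a fixed set of categories, the sketch below converts text labels into 1 and 0 and rejects anything unexpected. The function name `encode_labels` and the category names are hypothetical, chosen to match the email example.

```python
def encode_labels(raw_labels, positive="spam", negative="ham"):
    """Map category names to 1/0 and reject anything outside the fixed set."""
    mapping = {positive: 1, negative: 0}
    encoded = []
    for label in raw_labels:
        if label not in mapping:
            raise ValueError(f"Unexpected category: {label!r}")
        encoded.append(mapping[label])
    return encoded

print(encode_labels(["spam", "ham", "ham", "spam"]))  # [1, 0, 0, 1]
```

Failing loudly on an unknown label is a deliberate choice here, because a mislabelled example is easier to fix before training than to diagnose afterwards.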
Types of Logistic Regression
The training approaches that I discussed in my previous blog, such as Ordinary Least Squares and Gradient Descent, can also be adapted to logistic problems. However, a more suitable method is Maximum Likelihood Estimation, which chooses the coefficients that make the known training labels as probable as possible. In practice this rewards confident, correct predictions. E.g. in our email example, a spam email should score as close to 1 as possible and a ham email as close to 0 as possible, while predictions close to 0.5 are discouraged.
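One way to see why predictions near the extremes are favoured is to compute the negative log-likelihood (often called log loss), which is the quantity that Maximum Likelihood Estimation effectively minimises. The sketch below uses two made-up emails, one spam (1) and one ham (0), and compares three hypothetical sets of predictions.

```python
import math

def log_loss(labels, probabilities):
    """Average negative log-likelihood of the true labels (lower is better)."""
    total = 0.0
    for y, p in zip(labels, probabilities):
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)

labels = [1, 0]  # first email is spam, second is ham

confident = log_loss(labels, [0.9, 0.1])        # confident and correct
hesitant = log_loss(labels, [0.6, 0.4])         # correct but close to 0.5
confident_wrong = log_loss(labels, [0.1, 0.9])  # confident and wrong

print(f"confident:       {confident:.3f}")
print(f"hesitant:        {hesitant:.3f}")
print(f"confident wrong: {confident_wrong:.3f}")
```

The confident, correct predictions give the lowest loss (about 0.105), the hesitant ones sit in the middle (about 0.511), and the confident but wrong predictions are punished most heavily (about 2.303). So the method only rewards predictions near 0 or 1 when they are on the right side.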
This blog is a continuation of the Technical Machine Learning Series and expands on the previous blog about Linear Regression. In this blog, we learned about another type of linear algorithm, Logistic Regression. We learned how Logistic Regression works and how it differs from Linear Regression.
Contact Alan and the team to find out more about our Software Development services.