Logistic Regression in Python
Introduction
Logistic Regression is a predictive-analysis technique used for binary classification problems in machine learning. In this tutorial, we will walk through Logistic Regression in Python step by step.
What do you mean by Regression?
Regression analysis is a robust statistical technique. The independent variables in a data set are used to predict the value of the dependent variable you are interested in. You can notice regression being used around you all the time; one example is weather prediction, where a data set of past weather conditions is used to predict the current weather.
Regression makes use of several techniques to identify and predict an outcome, but the focus is always on the relationship between one or more independent variables and a dependent variable.
The predictions produced by logistic regression are delivered as a binary variable, which can take one of two possible outcomes.
Logistic Regression in Python
As discussed earlier, Logistic Regression in Python is a powerful technique that takes a data set with one or more independent variables and a dependent variable, and predicts the outcome as a binary variable with two possible values.
The independent variables are known as the predictors, and the dependent variable is categorical in nature; it is also termed the target variable. Logistic Regression in Python is sometimes considered a particular case of linear regression in which the predicted result is a categorical variable. It uses the logistic function to predict the probability of an event.
A sigmoid curve (function) is used to map the prediction to a probability between 0 and 1, and the outcome, e.g. win or loss, is then decided based on a threshold value. Let's first look at the equation of linear regression:
y = β0 + β1X1 + β2X2 + ... + βnXn
In the above Linear Regression equation,
- Y denotes the dependent variable, i.e. the value to be predicted.
- β0 denotes the Y-intercept, the point where the line crosses the y-axis.
- β1 denotes the slope of the line. The slope can be positive or negative, depending on the relationship between the independent variable and the dependent variable.
- X denotes the independent variables, which are used to predict the value of the dependent variable.
Sigmoid Function
p = 1 / (1 + e^(-y))
We apply the above sigmoid function, p = 1 / (1 + e^(-y)), to the linear regression equation.
Logistic Regression Equation:
p = 1 / (1 + e^(-(β0 + β1X1 + β2X2 + ... + βnXn)))
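As a quick illustration, the small sketch below (an addition to the tutorial, with made-up coefficient and input values) applies the sigmoid to a linear combination and converts the resulting probability into a win/loss label using a 0.5 threshold:

import math

def sigmoid(y):
    # Map any real-valued input to a probability between 0 and 1
    return 1.0 / (1.0 + math.exp(-y))

# Made-up coefficients and inputs, purely for illustration
beta0, beta1, beta2 = -1.0, 0.8, 0.5
x1, x2 = 2.0, 1.0

y = beta0 + beta1 * x1 + beta2 * x2    # linear regression part
p = sigmoid(y)                         # logistic regression probability
label = 'win' if p >= 0.5 else 'loss'  # decide the outcome with a 0.5 threshold
print(round(p, 2), label)              # roughly 0.75, 'win'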
Types of Logistic Regression:
There are several types of logistic regression, namely the following (a short scikit-learn sketch follows the list):
- Binary Logistic Regression: the target has two possible outcomes, e.g. yes or no.
- Multinomial Logistic Regression: the target has three or more unordered categories, e.g. dog, cat, elephant.
- Ordinal Logistic Regression: the target has three or more ordered categories, e.g. user ratings (1-10). Ordinal indicates that the categories follow an order.
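For reference, here is a minimal sketch of how the binary and multinomial cases can be handled with scikit-learn's LogisticRegression. The iris data set is used only because it has three classes; it is not part of this tutorial's NBA example, and scikit-learn has no built-in ordinal variant.

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)          # three classes, so a multinomial problem

# Binary case: keep only two of the three classes
X_bin, y_bin = X[y != 2], y[y != 2]
binary_model = LogisticRegression().fit(X_bin, y_bin)

# Multinomial case: scikit-learn handles three or more classes automatically
multi_model = LogisticRegression(max_iter=200).fit(X, y)

print(binary_model.predict(X_bin[:3]), multi_model.predict(X[:3]))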
What is the difference between Linear and Logistic Regression?
| Logistic Regression | Linear Regression |
| --- | --- |
| Logistic regression produces a discrete, categorical outcome. | Linear regression can produce an infinite range of continuous outcomes. |
| Logistic regression is used when the response variable is categorical. | Linear regression is used when the response variable is continuous. |
| Example: predicting whether a bank customer will default, using past transaction details. | Example: predicting a stock market score, which is a continuous output. |
Use Cases of Logistic Regression:
Below are some of the use cases of logistic regression in Python.
Identifying illness:
Logistic regression can be used to predict whether a patient's diagnosis for a disease is positive or negative, even in complex cases. The model uses the patient's past medical history to determine the illness.
Predicting Weather:
One good example of logistic regression in Python is weather prediction. Specialists analyze previous weather report data and predict the current weather or the report for the next day. Logistic regression can only predict categorical outcomes, such as whether it will be sunny or rainy, and nothing can be guaranteed.
DEMO:
Let's develop a prediction model with the help of logistic regression in Python using a previous data set.
Data collection:
The primary step in performing logistic regression is data collection. We use pandas to load the CSV file that holds the data set into the program. We are going to develop a prediction model using NBA data, so that it's easy to identify the relationship between the game data and to predict whether a game is an away game or a home game.
# Load the NBA data set with pandas; seaborn and matplotlib are used for plotting later
import pandas as pd
import numpy as np
import seaborn as sb
import matplotlib.pyplot as plo

cf = pd.read_csv(r'C:\Users\Vivaan\Documents\data.csv')
print(cf.head(5))
If you run the above code, the data is loaded into a format that is easier to read. The next step is to find out which independent and dependent variables are available for your model.
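One way to do that inspection, sketched below with generic pandas calls (assuming the data set loaded above; the exact columns depend on your copy of the CSV):

# Inspect the available columns and their types to choose predictors and the target
print(cf.columns)      # candidate independent/dependent variables
print(cf.dtypes)       # numeric vs. string (object) columns
print(cf.describe())   # quick summary statistics of the numeric columns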
Analyzing the Data:
We analyze the data to check the relationship between the dependent and independent variables by creating various plots.
# Count home and away games, split by win/loss, to see how they relate
sb.countplot(x='Home', hue='WINorLOSS', data=cf)
plo.show()
Wrangling Data:
The data set is altered based on the target variable. We now remove all the string-valued columns and null values from the DataFrame.
The following check counts the null values in each column:
print(cf.isnull().sum())
We also check for values that are not needed to develop the prediction model, such as null values and other irrelevant data. If no null values are present in the NBA data set, we move on to splitting the data.
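A minimal wrangling sketch along those lines is shown below; the exact columns to keep or drop depend on your copy of the NBA data, so treat the column selection here as an assumption.

# Drop rows with missing values (if any were reported above)
cf = cf.dropna()

# Keep the target column plus the numeric columns; other string-valued columns
# (for example 'WINorLOSS') are removed before modelling
numeric_cols = cf.select_dtypes(include='number').columns.tolist()
keep = numeric_cols + ['Home'] if 'Home' not in numeric_cols else numeric_cols
cf = cf[keep]

print(cf.shape)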
Training and Testing the Data:
The data is now split into training data and test data to evaluate the model's performance. We use train_test_split to split the data, holding out roughly a third of it for testing (test_size=0.33).
To build the prediction model, logistic regression is performed by importing LogisticRegression from the sklearn module. We use the fit method to fit the model on the training set and the predict method to generate predictions.
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix, accuracy_score

# 'Home' is the target; everything else is used as a predictor
a = cf.drop('Home', axis=1)
b = cf['Home']

# Split into training and test sets
a_train, a_test, b_train, b_test = train_test_split(a, b, test_size=0.33, random_state=1)

# Fit the logistic regression model and predict on the test set
logmodel = LogisticRegression()
logmodel.fit(a_train, b_train)
predictions = logmodel.predict(a_test)

print(classification_report(b_test, predictions))
print(confusion_matrix(b_test, predictions))
print(accuracy_score(b_test, predictions))
Classification Report
The classification report displays Precision, Recall, F1-score, and Support. Precision indicates how accurate the model's predictions for each class are: the precision for the away game is 0.58, whereas it's 0.62 for the home game. Recall represents the proportion of actual instances of each class that the model correctly identifies: the recall for an away game is 0.64, whereas it's 0.57 for the home game. Support indicates the number of test samples used for each class: 1586 away games and 1662 home games in the NBA data set.
Confusion Matrix Heatmap
The confusion matrix is a table that describes the prediction model's performance. It holds the predicted values as well as the actual values, and these values can be used to compute the model's accuracy score.
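A simple way to visualize it as a heatmap is sketched below with seaborn; this plotting snippet is an addition, and it assumes the variables from the demo code above are available.

# Plot the confusion matrix as a heatmap of predicted vs. actual labels
cm = confusion_matrix(b_test, predictions)
sb.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plo.xlabel('Predicted')
plo.ylabel('Actual')
plo.show()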
Accuracy Score
The accuracy score is the fraction of the model's predictions that are correct. The accuracy score for our model is 0.60. The higher the accuracy score, the more reliable the prediction model, so aim for the highest accuracy score you can get.
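To see how the accuracy relates to the confusion matrix, the small check below (an illustrative addition, again reusing the demo variables) recomputes it as the share of correct predictions, i.e. the diagonal of the matrix divided by the total:

# Accuracy = correct predictions / all predictions
cm = confusion_matrix(b_test, predictions)
manual_accuracy = cm.trace() / cm.sum()   # diagonal counts / total count
print(manual_accuracy)                    # matches accuracy_score(b_test, predictions)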
I hope the above tutorial provided a clear understanding of Logistic Regression in Python. If you have any queries regarding the topic, ask us in the comment section below.
Related Blogs:
- Brief Overview of Python Language
- Python Career opportunities
- Python Break Continue
- Python Control Flow
- Python Data Types
- Python Dictionary
- Python Exception Handling
- Python File
- Python Functions
- Python Substring