Logistic Regression
(Used for Classification)

The name "logistic regression" is misleading.
Logistic regression is not used to predict numeric values like price or temperature (as in regression problems).
It is used when the final answer is a label. The model always gives a probability value, and using that value we decide the category.
This probability value ends up driving decisions like the following.

Binary decisions: Yes / No, True / False, 0 / 1
Email filtering: Spam / Not spam
Weather: Rain / No rain
Overview

Logistic regression is used when the final outcome has only two possible choices. The model does not directly say yes or no. Instead, it first calculates how likely one outcome is compared to the other. Based on this likelihood, a final decision is made.

Two terms you will see very often are class 0 and class 1. These are simply labels used to represent two different groups. Their meaning depends entirely on the problem we are working on.

In spam filtering, we usually take class 1 to mean “Spam” and class 0 to mean “Not spam”. If an email contains words like “free”, “win”, or “jackpot”, the model may assign it to class 1. A normal email, such as a meeting message or homework notice, is more likely placed in class 0.

The same idea applies in other situations. For exam results, class 1 can represent “Pass” and class 0 can represent “Fail”. In medical screening, class 1 may indicate that a disease is detected, while class 0 indicates no disease.

The numbers 0 and 1 do not describe how strong or weak the case is. They are only labels used to separate one category from the other.
The score z (computed by the model)

Before making any decision, the model calculates a simple score called z. This score is only a number. It may be negative, zero, or positive. At this stage, the model is not predicting a class. It is only measuring which side the situation is leaning toward.

\( z = b_0 + b_1 x \)
b₀ is the baseline of the model. It represents the starting tendency before looking at the input.

b₁ decides how strongly the input x influences the score. If x increases and b₁ is positive, the value of z rises.

A larger positive z means the model is leaning toward class 1.
A negative z means it is leaning toward class 0.
The final decision comes later, after this score is converted into a probability.
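As a quick sketch, the score can be computed in one line of Python. The constants b₀ = −4 and b₁ = 1.2 below are only illustrative (they match the spam example used later in this article):

```python
# A minimal sketch of the score z = b0 + b1 * x.
# b0 = -4 and b1 = 1.2 are illustrative values, not learned here.
def score(x, b0=-4.0, b1=1.2):
    """Linear score: negative leans toward class 0, positive toward class 1."""
    return b0 + b1 * x

print(score(0))   # -4.0 -> leaning toward class 0
print(score(5))   #  2.0 -> leaning toward class 1
```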
Sigmoid (turning z into a probability)

The score z by itself is not very useful. It can be any number, like −5, 2, or even 20. But for classification, we need something meaningful. So the model converts z into a probability between 0 and 1. This conversion is done using a function called sigmoid.

\( p = \sigma(z) = \frac{1}{1 + e^{-z}} \)
The sigmoid function gently squeezes any value of z into the probability range.

If z is very negative, the probability becomes close to 0. If z is 0, the probability becomes exactly 0.5. If z is positive and large, the probability moves closer to 1.

Email example:
Suppose an email contains many suspicious words. The model computes a positive z value. After applying sigmoid, the probability may become something like 0.92. This means there is a 92% chance the email is spam.

If another email looks normal, z may be negative. After sigmoid, the probability may drop to something like 0.08, meaning the email is very unlikely to be spam.
Sigmoid curve (converts z into probability)
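A minimal Python sketch of this conversion:

```python
import math

def sigmoid(z):
    """Squeeze any real score z into the (0, 1) probability range."""
    return 1.0 / (1.0 + math.exp(-z))

print(round(sigmoid(-4.0), 2))  # 0.02 -> leaning strongly toward class 0
print(sigmoid(0.0))             # 0.5  -> undecided
print(round(sigmoid(3.2), 2))   # 0.96 -> leaning strongly toward class 1
```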
Threshold
(Decision Maker)

After applying the sigmoid function, the model produces a probability p. This value tells us how strongly the model believes the input belongs to class 1. To turn this probability into a final label, we choose a cutoff value called the threshold.

\( \text{If } p \ge t \Rightarrow \text{class } 1 \)
\( \text{If } p < t \Rightarrow \text{class } 0 \)
A common choice for the threshold is t = 0.5. This means anything above 50% confidence is treated as class 1.

Email (strict spam filtering)
Suppose the threshold is set to t = 0.9. This means the model must be at least 90% confident before marking an email as spam.

If an email contains many suspicious words and the model outputs p = 0.94, then \( p \ge 0.9 \), so the email is marked as Spam (class 1).

If another email gives a probability of p = 0.78, even though it looks somewhat suspicious, it does not cross the threshold. So it is marked as Not spam (class 0).

Using a higher threshold reduces false spam warnings, but it may allow some spam emails to slip through.
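The threshold rule itself is a one-line comparison; a sketch in Python, using the strict t = 0.9 from the example above:

```python
def classify(p, t=0.9):
    """Strict filter: mark as spam only when confidence is at least t."""
    return "Spam" if p >= t else "Not spam"

print(classify(0.94))  # Spam      (0.94 >= 0.9)
print(classify(0.78))  # Not spam  (0.78 <  0.9)
```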
Example
Identifying Spam Emails
(Classifying emails as Inbox/Spam)

Let us work through one complete example of logistic regression for spam detection. In this example, an email is either accepted to the inbox or sent to spam.

In a real spam filter, the equation used to calculate z is not chosen by hand. It is learned by training the computer using data.

During training, the model is shown thousands of emails that are already labelled as Spam or Not spam. For each email, the computer extracts features (numbers), predicts a label, compares with the correct label, and measures the error.

When the prediction is wrong, the model adjusts the constants in its formula. This adjustment is done using optimisation techniques. The most common one is gradient descent, where the model repeatedly reduces its errors step by step. Other optimisation methods are also used in practice, but the idea is always the same: improve the model by learning from errors.

After training on many spam and normal emails, the model settles on values that work well. That is how an equation like the one below is obtained.

z = −4 + 1.2x
Here, the numbers −4 and 1.2 come from training. They reflect patterns learned from real email data, not human guesswork.
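The training loop described above can be sketched in Python. The labelled data below is invented for illustration (each pair is a suspicious-word count x and a known label y, with 1 meaning spam); a real filter learns from thousands of emails, but the predict–compare–adjust cycle of gradient descent is the same:

```python
import math

# Made-up labelled training data: (word count x, known label y).
data = [(0, 0), (1, 0), (2, 0), (4, 1), (5, 1), (6, 1)]

b0, b1 = 0.0, 0.0           # start from an arbitrary guess
lr = 0.1                    # learning rate: size of each correction step
for _ in range(2000):       # repeat: predict, measure error, adjust
    for x, y in data:
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))  # current prediction
        error = p - y                               # > 0 when we over-predict
        b0 -= lr * error                            # nudge each constant
        b1 -= lr * error * x                        # against its share of error

print(b0, b1)               # learned constants that separate the two groups
```

After enough passes, the learned constants place low word counts below 0.5 probability and high counts above it, just as the equation from the text does.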
A List Of Spam Words
free, win, winner, prize, offer, click, buy, cheap, discount, deal, urgent, limited, bonus, cash, money, gift, reward, now, claim, jackpot

For each email, we count how many of these words appear. Let x be that count. The model does not understand the “meaning” of the message. It only uses this number.
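Counting the feature x can be sketched as follows (the punctuation handling here is deliberately simple):

```python
SPAM_WORDS = {"free", "win", "winner", "prize", "offer", "click", "buy",
              "cheap", "discount", "deal", "urgent", "limited", "bonus",
              "cash", "money", "gift", "reward", "now", "claim", "jackpot"}

def count_spam_words(email):
    """x = how many words of the email appear on the spam-word list."""
    for mark in ".,!:?":                     # crude punctuation stripping
        email = email.replace(mark, " ")
    return sum(1 for w in email.lower().split() if w in SPAM_WORDS)

print(count_spam_words("Win cash prize now. Click to claim."))  # 6
```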

\( p = \frac{1}{1 + e^{-z}} \)
Decision rule (strict filter): choose a threshold t (example: t = 0.9)
If p ≥ t → Spam (class 1), sent to the Spam folder
If p < t → Not spam (class 0), delivered to the Inbox
10 emails analysed
(x → z → p → inbox/spam)
Email (short)                              | x | z    | p      | Result
-------------------------------------------|---|------|--------|-------
Meeting at 3 pm. Agenda attached.          | 0 | −4.0 | ≈ 0.02 | Inbox
Invoice sent. Please review and reply.     | 0 | −4.0 | ≈ 0.02 | Inbox
Limited offer. Click now for discount.     | 4 |  0.8 | ≈ 0.69 | Inbox
Urgent: claim your reward now.             | 3 | −0.4 | ≈ 0.40 | Inbox
Buy cheap deal today. Limited bonus.       | 5 |  2.0 | ≈ 0.88 | Inbox
Win cash prize now. Click to claim.        | 6 |  3.2 | ≈ 0.96 | Spam
Congratulations winner! Claim your gift.   | 4 |  0.8 | ≈ 0.69 | Inbox
Jackpot prize! Win money now.              | 5 |  2.0 | ≈ 0.88 | Inbox
Free bonus cash. Limited time. Click now.  | 6 |  3.2 | ≈ 0.96 | Spam
Claim jackpot reward. Win prize now.       | 6 |  3.2 | ≈ 0.96 | Spam
Key idea:
The steps are always the same: count words → get x → compute z → convert to p → compare with threshold. With a strict threshold like 0.9, only emails with very high probability are sent to spam.
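The whole pipeline (x → z → p → inbox/spam) fits in a few lines of Python, reproducing the table's rows:

```python
import math

def spam_filter(x, b0=-4.0, b1=1.2, t=0.9):
    """Word count x -> score z -> probability p -> Inbox/Spam decision."""
    z = b0 + b1 * x                     # learned score
    p = 1.0 / (1.0 + math.exp(-z))      # sigmoid: score -> probability
    return round(p, 2), ("Spam" if p >= t else "Inbox")

for x in (0, 3, 4, 5, 6):               # the word counts from the table
    print(x, spam_filter(x))
```

Only the x = 6 emails (p ≈ 0.96) cross the strict 0.9 threshold, matching the table.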
Example
Texas Rainfall Classification
(Wet May OR Not a Wet May)

Now let us do logistic regression using a real weather idea from Texas. We are not trying to predict the exact rainfall amount. We only want a two-label answer: Wet May or Not a Wet May.
Note: Logistic Regression is used to make Binary Decisions

Real data
The following data was collected from a weather station in Texas. For each year, the number recorded is the number of days in May that received measurable rainfall (at least 0.01 inch).

Data from 1995 to 2025 was used.

Using this historical data, each May was labelled as: Wet or Not Wet.

Definition used
Wet May → 10 or more rainy days
Not Wet May → fewer than 10 rainy days
Feature (input)
Let x be the number of rainy days in May (days with precipitation ≥ 0.01 inch).

Label (the output, i.e. the decision we want)
Let y = 1 for a Wet May and 0 for a Not Wet May.

During training, the computer adjusts the constants so the model separates “wet” versus “not wet” Mays as well as it can using the 1995–2025 data. After fitting the model, suppose the learned score equation is:

z = −6 + 0.7x

The score z is not a probability yet. We convert it into a probability using the sigmoid function:

\( p = \frac{1}{1 + e^{-z}} \)
Decision rule
We choose a threshold t. In this example, take t = 0.6.

If p ≥ 0.6 → classify as Wet May
If p < 0.6 → classify as Not Wet May
Now test one real May value
Real check (Texas)
Austin Camp Mabry, Texas
May 2025
Days with measurable rain (≥ 0.01 inch): 8 days
(This is the “DAYS ≥ 0.01” value from the monthly climate summary for that station.)

For May 2025, our input is x = 8.

z = −6 + 0.7(8) = −0.4
p ≈ 0.40
Since 0.40 < 0.6, May 2025 is classified as:

Not Wet May
Summary:
Real data → number of rainy days (x) → score (z) → probability (p) → label using a threshold.
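The same pipeline in Python, checked against the May 2025 value:

```python
import math

def classify_may(rainy_days, t=0.6):
    """Rain model from the text: z = -6 + 0.7x, threshold t = 0.6."""
    z = -6.0 + 0.7 * rainy_days          # score
    p = 1.0 / (1.0 + math.exp(-z))       # sigmoid: score -> probability
    return p, ("Wet May" if p >= t else "Not Wet May")

p, label = classify_may(8)               # May 2025: 8 rainy days
print(round(p, 2), label)                # 0.4 Not Wet May
```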
Need help with mathematics? Contact SJMathTube.
