Outliers Explained Simply

Outliers are unusual values that are very different from the rest of the data. Understanding outliers is important because they can affect averages, charts, business decisions and machine learning models.

What is an Outlier?

An outlier is a data point that is much higher or lower than most other values in a dataset. It stands out because it does not follow the general pattern.

Simple Example

Values: 10, 12, 13, 14, 15, 100

The value 100 is an outlier because it is much larger than the other values.

Outlier = a value that is unusually different from the rest of the data

Why Outliers Matter

Outliers can change the way we interpret data. If we ignore them, we may make poor decisions or build inaccurate models.

  • They can distort the mean
  • They can affect charts and visualisations
  • They can reduce model accuracy
  • They may indicate errors in data collection
  • They may reveal important business events

Important: An outlier is not always a mistake. Sometimes it is a valuable signal.

Outliers and the Mean

Outliers can strongly affect the mean because the mean uses every value in the calculation.

Example

Dataset A: 20, 22, 23, 25, 26

Dataset B: 20, 22, 23, 25, 100

Dataset B has an unusually high value of 100. This pulls the mean up from 23.2 (Dataset A) to 38, while the median stays at 23 in both datasets, so the mean of Dataset B no longer represents a typical value.

Dataset | Typical Pattern | Issue
Dataset A | Values are close together | Mean is reliable
Dataset B | One value is much larger | Mean may be misleading
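
To make the comparison concrete, here is a minimal sketch that computes the mean and median of both datasets using Python's standard `statistics` module. The variable names are illustrative only.

```python
from statistics import mean, median

dataset_a = [20, 22, 23, 25, 26]
dataset_b = [20, 22, 23, 25, 100]  # same as A, except the last value is extreme

for name, values in [("Dataset A", dataset_a), ("Dataset B", dataset_b)]:
    # The mean uses every value, so one extreme value shifts it.
    # The median depends only on the middle of the sorted data.
    print(name, "mean:", mean(values), "median:", median(values))
```

The mean changes between the two datasets while the median does not, which is why comparing the two is a quick first check for outliers.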

Common Causes of Outliers

Cause | Example
Data entry error | Typing £10,000 instead of £1,000
Measurement error | Sensor recording an incorrect temperature
Rare event | Very high sales during a special promotion
Natural variation | A customer spending much more than average

Business Example

A company analyses daily sales:

£480, £510, £495, £505, £520, £5,000

The value £5,000 is an outlier. It may be caused by a large corporate order, a data entry mistake, or a special event such as a promotion.

Business question: Should we remove this value, correct it, or investigate it further?
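
One way to inform that decision is to see how much the single value changes the summary figures. The sketch below separates the suspicious day from the typical days without deleting anything. The threshold of £1,000 is a hypothetical cut-off chosen by eye for this illustration, not a recommended rule.

```python
from statistics import mean, median

daily_sales = [480, 510, 495, 505, 520, 5000]  # pounds, from the example above

# Hypothetical cut-off for this illustration only.
suspicious_threshold = 1000

typical_days = [s for s in daily_sales if s <= suspicious_threshold]
flagged_days = [s for s in daily_sales if s > suspicious_threshold]

print("Mean with all days:", round(mean(daily_sales), 2))
print("Mean of typical days:", mean(typical_days))
print("Median with all days:", median(daily_sales))
print("Flagged for investigation:", flagged_days)
```

With the £5,000 day included, the mean is more than double the mean of the typical days, while the median barely moves. The flagged value is kept in its own list so it can be checked against the original records.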

How to Detect Outliers

Outliers can be detected using visual methods or statistical methods.

Method | How It Helps
Histogram | Shows unusual values far from the main group
Box Plot | Highlights extreme values clearly
Mean and Median Comparison | Large difference may suggest outliers or skewness
IQR Method | Uses quartiles to identify extreme values
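
As a rough illustration of the histogram idea, the sketch below counts values in fixed-width bins and prints a text histogram. The helper `bin_counts` and the bin width of 20 are assumptions made for this example. In practice a plotting library would normally draw the chart.

```python
def bin_counts(values, width):
    """Count integer values per fixed-width bin, keyed by each bin's lower edge."""
    first = (min(values) // width) * width
    last = (max(values) // width) * width
    counts = {start: 0 for start in range(first, last + width, width)}
    for v in values:
        counts[(v // width) * width] += 1
    return counts

data = [20, 22, 23, 25, 26, 100]
width = 20  # hypothetical bin width for this example
counts = bin_counts(data, width)

for start, count in counts.items():
    # One '#' per value; the empty bins make the gap before 100 visible.
    print(f"{start:>3} to {start + width - 1:<3} {'#' * count}")
```

Five values fall in the first bin, the next three bins are empty, and the last bin holds a single value. That isolated bar is what an outlier typically looks like in a histogram.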

Outliers in Python

You can quickly inspect outliers using summary statistics.

import pandas as pd

data = [20, 22, 23, 25, 26, 100]

s = pd.Series(data)

print("Mean:", s.mean())
print("Median:", s.median())
print("Minimum:", s.min())
print("Maximum:", s.max())

Tip: If the maximum or minimum is very far from the rest of the values, investigate it.

Visualising Outliers with a Box Plot

A box plot is one of the easiest ways to spot outliers visually.

import pandas as pd
import matplotlib.pyplot as plt

data = [20, 22, 23, 25, 26, 100]

df = pd.DataFrame({"value": data})

df.boxplot(column="value")

plt.title("Box Plot Showing Outlier")
plt.show()

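Points drawn beyond the whiskers of the box plot are flagged as potential outliers. By default, matplotlib extends each whisker to the furthest data point within 1.5 × IQR of the box.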

Finding Outliers with the IQR Method

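The IQR (interquartile range) method is a common way to identify outliers using quartiles. Q1 is the 25th percentile and Q3 is the 75th percentile, so the IQR measures the spread of the middle 50% of the data.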

IQR = Q3 - Q1

Values are often considered outliers if they are below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR.

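import pandas as pd

data = [20, 22, 23, 25, 26, 100]
s = pd.Series(data)

# Quartiles (pandas interpolates linearly between data points by default)
Q1 = s.quantile(0.25)
Q3 = s.quantile(0.75)
IQR = Q3 - Q1

# Limits beyond which a value is treated as a potential outlier
lower_limit = Q1 - 1.5 * IQR
upper_limit = Q3 + 1.5 * IQR

# Keep only the values that fall outside the limits
outliers = s[(s < lower_limit) | (s > upper_limit)]

# For this data: Q1 = 22.25, Q3 = 25.75, so the limits are 17.0 and 31.0
print("Outliers:")
print(outliers)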
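
For this dataset the arithmetic works out as Q1 = 22.25, Q3 = 25.75 and IQR = 3.5, giving limits of 17.0 and 31.0, so only 100 is flagged. The sketch below wraps the same rule in two reusable functions using only the standard library. The function names `iqr_limits` and `iqr_outliers` are hypothetical, and `method="inclusive"` is chosen because, for this dataset, it gives the same quartiles as the pandas code above.

```python
from statistics import quantiles

def iqr_limits(values, k=1.5):
    """Return (lower_limit, upper_limit) for the k × IQR rule."""
    # method="inclusive" interpolates linearly between the data points.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def iqr_outliers(values, k=1.5):
    """Return the values that fall outside the IQR limits."""
    lower, upper = iqr_limits(values, k)
    return [v for v in values if v < lower or v > upper]

data = [20, 22, 23, 25, 26, 100]
lower_limit, upper_limit = iqr_limits(data)
print("Limits:", lower_limit, upper_limit)
print("Outliers:", iqr_outliers(data))
```

The multiplier `k` is exposed as a parameter because 1.5 is a convention rather than a law. A larger value flags fewer points.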

Should We Remove Outliers?

Not always. Before removing an outlier, we should understand why it exists.

Situation | Action
Data entry mistake | Correct or remove it
Valid rare event | Keep it or analyse separately
Extreme but meaningful value | Investigate before deciding
Sensor or system error | Clean or replace the value

Simple rule: Never remove outliers automatically. Always investigate first.
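
One way to follow this rule in code is to flag suspicious values instead of deleting them. The sketch below is an illustration under stated assumptions: the helper `flag_outliers` is hypothetical, and it uses the 1.5 × IQR rule described earlier with the standard library's `statistics.quantiles`.

```python
from statistics import quantiles

def flag_outliers(values, k=1.5):
    """Pair each value with True/False instead of deleting anything."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [(v, v < lower or v > upper) for v in values]

daily_sales = [480, 510, 495, 505, 520, 5000]
flagged = flag_outliers(daily_sales)

# Every row is still present; the flag only marks what to look at.
to_review = [v for v, is_outlier in flagged if is_outlier]
print("Needs investigation:", to_review)
print("Rows kept:", len(flagged))
```

Because nothing is removed, the decision to correct, keep or exclude each flagged value can be made later, once its cause is known.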

Outliers and Machine Learning

Outliers can affect machine learning models because models learn patterns from training data, so a few extreme points can distort what the model learns.

  • Regression models can be pulled toward extreme values
  • Scaling can be affected by very large or small values
  • Predictions may become less reliable
  • Some algorithms are more sensitive to outliers than others
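
To illustrate the scaling point, the sketch below applies min-max scaling (rescaling values to the range 0 to 1) to the dataset used earlier. The helper `min_max_scale` is a hypothetical name written for this example.

```python
def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

data = [20, 22, 23, 25, 26, 100]
scaled = min_max_scale(data)

# The outlier defines the top of the range, so the other values are
# squeezed into a narrow band near 0.
print([round(x, 3) for x in scaled])
```

Because 100 sets the maximum, the five typical values all end up below 0.08 and become hard for a model to tell apart.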

Example

If a house price dataset contains one incorrect value of £50 million, a regression model may be pulled toward that single point and produce unrealistic predictions for ordinary houses.
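
The effect can be sketched with a simple least-squares line fitted by hand. All numbers below are hypothetical: five houses with floor area in square metres and price in thousands of pounds, once recorded correctly and once with the last price entered as £50 million. The helper `fit_line` is written for this example.

```python
from statistics import fmean

def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

# Hypothetical data: floor area in m² and price in thousands of pounds.
area = [50, 60, 70, 80, 90]
price_clean = [250, 300, 350, 400, 450]
price_bad = [250, 300, 350, 400, 50000]  # last house mis-recorded as £50 million

slope_clean, _ = fit_line(area, price_clean)
slope_bad, _ = fit_line(area, price_bad)
print("Slope without the error:", slope_clean)
print("Slope with the error:", slope_bad)
```

In this made-up data, a single wrong value raises the fitted price per square metre from 5 thousand pounds to roughly 996 thousand pounds, which shows how strongly one point can pull a regression line.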

Quick Practice

Look at the following data:

12, 13, 15, 16, 17, 18, 95

Questions:

  • Which value looks like an outlier?
  • Could this value affect the mean?
  • Should we remove it immediately?

Suggested answer: 95 looks like an outlier. It can affect the mean. We should investigate it before removing it.
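
The suggested answer can be checked with a short sketch using the standard `statistics` module. Removing 95 here is for comparison only, to see how much it influences the summary figures. It is not a recommendation to delete the value.

```python
from statistics import mean, median

practice = [12, 13, 15, 16, 17, 18, 95]
without_95 = [v for v in practice if v != 95]  # comparison only

print("Mean with 95:", round(mean(practice), 2))
print("Mean without 95:", round(mean(without_95), 2))
print("Median with 95:", median(practice))
print("Median without 95:", median(without_95))
```

The mean drops from about 26.6 to about 15.2 when 95 is left out, while the median only moves from 16 to 15.5.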

Common Beginner Mistake

A common mistake is removing all outliers without understanding them. This can remove important business information.

Remember: Outliers can be errors, but they can also be opportunities, risks or important events.

Key Takeaway

Outliers are unusual values that differ strongly from the rest of the data. They can affect averages, visualisations and machine learning models, so they should always be investigated carefully.

Simple rule: Detect outliers, understand them, then decide what to do.

2012 - 2026 © London Academy of IT Limited. All Rights Reserved.
UKPRN: 10045491. Registered in England & Wales with company no. 07923992.