Outliers Explained Simply
Outliers are unusual values that are very different from the rest of the data.
Understanding outliers is important because they can affect averages, charts, business decisions and machine learning models.
What is an Outlier?
An outlier is a data point that is much higher or lower than most other values in a dataset.
It stands out because it does not follow the general pattern.
Simple Example
Values: 10, 12, 13, 14, 15, 100
The value 100 is an outlier because it is much larger than the other values.
Outlier = a value that is unusually different from the rest of the data
Why Outliers Matter
Outliers can change the way we interpret data. If we ignore them, we may make poor decisions or build inaccurate models.
- They can distort the mean
- They can affect charts and visualisations
- They can reduce model accuracy
- They may indicate errors in data collection
- They may reveal important business events
Important: An outlier is not always a mistake. Sometimes it is a valuable signal.
Outliers and the Mean
Outliers can strongly affect the mean because the mean uses every value in the calculation.
Example
Dataset A: 20, 22, 23, 25, 26
Dataset B: 20, 22, 23, 25, 100
Dataset B replaces 26 with an unusually high value of 100. This raises the mean from 23.2 to 38, which is higher than four of the five values, so the average no longer represents a typical value.
| Dataset | Typical Pattern | Effect on Mean |
| --- | --- | --- |
| Dataset A | Values are close together | Mean is reliable |
| Dataset B | One value is much larger | Mean may be misleading |
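To check the comparison above, the short sketch below computes the mean and median of both datasets. It uses Python's standard `statistics` module only so that it runs without extra packages; the `summary` dictionary is a name introduced here for illustration.

```python
from statistics import mean, median

dataset_a = [20, 22, 23, 25, 26]
dataset_b = [20, 22, 23, 25, 100]

summary = {}
for name, values in [("A", dataset_a), ("B", dataset_b)]:
    # The mean uses every value; the median depends only on the middle value.
    summary[name] = (mean(values), median(values))
    print(f"Dataset {name}: mean = {summary[name][0]}, median = {summary[name][1]}")
```

The median is 23 for both datasets, which is why it is often reported alongside the mean when outliers are suspected.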
Common Causes of Outliers
| Cause | Example |
| --- | --- |
| Data entry error | Typing £10,000 instead of £1,000 |
| Measurement error | Sensor recording an incorrect temperature |
| Rare event | Very high sales during a special promotion |
| Natural variation | A customer spending much more than average |
Business Example
A company analyses daily sales:
£480, £510, £495, £505, £520, £5,000
The value £5,000 is an outlier. It may be caused by a large corporate order, a data entry mistake, or a special event such as a promotion.
Business question: Should we remove this value, correct it, or investigate it further?
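One way to inform that decision, sketched here as an illustration rather than a recommended procedure, is to measure how much the suspect value changes the summary figures. The variable names (`suspect`, `without_suspect`) are assumptions made for this example.

```python
from statistics import mean, median

daily_sales = [480, 510, 495, 505, 520, 5000]  # pounds, from the example above
suspect = 5000

# Same data with the suspect value set aside (not deleted from the source).
without_suspect = [v for v in daily_sales if v != suspect]

mean_all = mean(daily_sales)
mean_without = mean(without_suspect)
median_all = median(daily_sales)

print(f"Mean with £5,000:    {mean_all:.2f}")
print(f"Mean without £5,000: {mean_without:.2f}")
print(f"Median with £5,000:  {median_all:.2f}")
```

With the £5,000 day included, the mean is more than double any ordinary day's sales, while the median stays close to a typical day. That gap is a signal to investigate the value, not a reason to delete it.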
How to Detect Outliers
Outliers can be detected using visual methods or statistical methods.
| Method | How It Helps |
| --- | --- |
| Histogram | Shows unusual values far from the main group |
| Box Plot | Highlights extreme values clearly |
| Mean and Median Comparison | Large difference may suggest outliers or skewness |
| IQR Method | Uses quartiles to identify extreme values |
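As a rough illustration of the histogram idea in the table above, the sketch below prints a text histogram using only the standard library. The bin width of 10 is an assumption chosen for this small dataset, not a general recommendation.

```python
from collections import Counter

data = [20, 22, 23, 25, 26, 100]
bin_width = 10  # assumed bin size for this illustration

# Map each value to the lower edge of its bin, e.g. 23 -> 20 and 100 -> 100.
bins = Counter((value // bin_width) * bin_width for value in data)

# Print one row per bin, including empty bins, so gaps are visible.
for start in range(min(bins), max(bins) + bin_width, bin_width):
    bar = "#" * bins.get(start, 0)
    print(f"{start:>3}-{start + bin_width - 1:<3} | {bar}")
```

Five values fall in the 20-29 bin and a single value sits alone in the 100-109 bin, separated by a run of empty bins. That isolated bar is what an outlier typically looks like in a histogram.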
Outliers in Python
You can quickly inspect outliers using summary statistics.
import pandas as pd
data = [20, 22, 23, 25, 26, 100]
s = pd.Series(data)
print("Mean:", s.mean())
print("Median:", s.median())
print("Minimum:", s.min())
print("Maximum:", s.max())
Tip: If the maximum or minimum is very far from the rest of the values, investigate it.
Visualising Outliers with a Box Plot
A box plot is one of the easiest ways to spot outliers visually.
import pandas as pd
import matplotlib.pyplot as plt
data = [20, 22, 23, 25, 26, 100]
df = pd.DataFrame({"value": data})
df.boxplot(column="value")
plt.title("Box Plot Showing Outlier")
plt.show()
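Points drawn individually beyond the whiskers are potential outliers. By default, matplotlib extends the whiskers to the most extreme values that lie within 1.5 × IQR of the box edges, where IQR is the interquartile range.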
Finding Outliers with the IQR Method
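The IQR (interquartile range) method is a common way to identify outliers using the first quartile Q1 (the 25th percentile) and the third quartile Q3 (the 75th percentile).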
IQR = Q3 - Q1
Values are often considered outliers if they are below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR.
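import pandas as pd
data = [20, 22, 23, 25, 26, 100]
s = pd.Series(data)
# First and third quartiles (25th and 75th percentiles)
Q1 = s.quantile(0.25)
Q3 = s.quantile(0.75)
IQR = Q3 - Q1
# Limits beyond which a value is treated as a potential outlier
lower_limit = Q1 - 1.5 * IQR
upper_limit = Q3 + 1.5 * IQR
# Keep only the values that fall outside the limits
outliers = s[(s < lower_limit) | (s > upper_limit)]
print("Outliers:")
print(outliers)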
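The same rule can be wrapped in a small reusable function. The sketch below is one possible version using only the standard library. The function name `iqr_outliers` and the parameter `k` are assumptions made for this example, and `method="inclusive"` is chosen because it should interpolate quartiles the same way as the pandas default.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return (lower_limit, upper_limit, outliers) using the IQR rule."""
    # quantiles() sorts the data itself; n=4 returns Q1, the median and Q3.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_limit = q1 - k * iqr
    upper_limit = q3 + k * iqr
    outliers = [v for v in values if v < lower_limit or v > upper_limit]
    return lower_limit, upper_limit, outliers


# Apply the rule to the daily sales figures from the business example.
daily_sales = [480, 510, 495, 505, 520, 5000]
lower, upper, flagged = iqr_outliers(daily_sales)
print(f"Limits: {lower} to {upper}")
print(f"Flagged: {flagged}")
```

For the dataset 20, 22, 23, 25, 26, 100 the limits work out to 17.0 and 31.0, so only 100 is flagged. For the daily sales figures only £5,000 falls outside the limits.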
Should We Remove Outliers?
Not always. Before removing an outlier, we should understand why it exists.
| Situation | Action |
| --- | --- |
| Data entry mistake | Correct or remove it |
| Valid rare event | Keep it or analyse separately |
| Extreme but meaningful value | Investigate before deciding |
| Sensor or system error | Clean or replace the value |
Simple rule: Never remove outliers automatically. Always investigate first.
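In code, "investigate first" can mean flagging suspicious values instead of deleting them. The sketch below shows one way to do that for the daily sales example. The `needs_review` field is a hypothetical name, and the 1.5 × IQR limits are just one possible flagging rule.

```python
from statistics import quantiles

daily_sales = [480, 510, 495, 505, 520, 5000]

# IQR limits, computed the same way as in the IQR method.
q1, _, q3 = quantiles(daily_sales, n=4, method="inclusive")
iqr = q3 - q1
lower_limit, upper_limit = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Keep every record and add a flag, rather than dropping rows.
records = [
    {"sales": value, "needs_review": not (lower_limit <= value <= upper_limit)}
    for value in daily_sales
]

to_review = [r["sales"] for r in records if r["needs_review"]]
print(f"Records kept: {len(records)}")
print(f"Values to investigate: {to_review}")
```

All six records remain available, so the decision to correct, keep or exclude the flagged value can be made after someone has checked where it came from.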
Outliers and Machine Learning
Outliers can affect machine learning models because models learn patterns from training data.
- Regression models can be pulled toward extreme values
- Scaling can be affected by very large or small values
- Predictions may become less reliable
- Some algorithms are more sensitive to outliers than others
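The scaling point in the list above can be seen with min-max scaling, used here as one common example of a scaler. This is a minimal sketch in plain Python, not a substitute for a library scaler.

```python
data = [20, 22, 23, 25, 26, 100]

lo, hi = min(data), max(data)

# Min-max scaling maps the smallest value to 0 and the largest to 1.
scaled = [(x - lo) / (hi - lo) for x in data]

print([round(s, 4) for s in scaled])
```

Because 100 defines the top of the range, the five ordinary values are squeezed into the interval from 0 to 0.075, which leaves a model very little spread to distinguish between them.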
Example
If a house price dataset contains one incorrect value of £50 million, a regression model fitted to it can be pulled toward that single point and predict unrealistic prices for ordinary houses.
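To see the size of this effect, the sketch below fits a straight line by ordinary least squares to a tiny dataset, first with clean prices and then with one price mistyped as £50 million. The floor areas and prices are made-up numbers for illustration only, and `fit_line` is a helper written for this example.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up illustrative data: floor area (square metres) and price (thousands of pounds).
area = [50, 60, 70, 80, 90, 100]
price_clean = [150, 180, 210, 240, 270, 300]
price_bad = price_clean[:-1] + [50_000]  # last price mistyped as £50 million

slope_clean, _ = fit_line(area, price_clean)
slope_bad, _ = fit_line(area, price_bad)

print(f"Slope with clean data:    {slope_clean:.1f}")
print(f"Slope with one bad value: {slope_bad:.1f}")
```

In this toy example the slope rises from 3 to about 713 (thousand pounds per square metre), so a single bad record dominates the fitted relationship.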
Quick Practice
Look at the following data:
12, 13, 15, 16, 17, 18, 95
Questions:
- Which value looks like an outlier?
- Could this value affect the mean?
- Should we remove it immediately?
Suggested answer: 95 looks like an outlier. It can affect the mean. We should investigate it before removing it.
Common Beginner Mistake
A common mistake is removing all outliers without understanding them. This can remove important business information.
Remember: Outliers can be errors, but they can also be opportunities, risks or important events.
Key Takeaway
Outliers are unusual values that differ strongly from the rest of the data. They can affect averages, visualisations and machine learning models, so they should always be investigated carefully.
Simple rule: Detect outliers, understand them, then decide what to do.