Overfitting and Underfitting Explained Simply
Overfitting and underfitting are two of the most important concepts in machine learning.
They help us understand whether a model has learned patterns that generalise to new data, or has simply memorised or missed the training data.
Why This Topic Matters
A machine learning model should not simply memorise training data.
It should learn patterns that also work on new, unseen data.
Good machine learning = good performance on unseen data
What is Underfitting?
Underfitting happens when a model is too simple and fails to learn important patterns from the data.
Simple Analogy
A student studies only one page for an exam and cannot answer most questions.
| Signs of Underfitting | Meaning |
|---|---|
| Poor training performance | The model cannot even learn the training data properly |
| Poor testing performance | The model also fails on new data |
| Very simple model | The model lacks learning capacity |
Simple rule: Underfitting means the model learned too little.
What is Overfitting?
Overfitting happens when a model memorises the training data too closely, including noise and small details.
Simple Analogy
A student memorises answers to practice questions but struggles when the real exam questions change slightly.
| Signs of Overfitting | Meaning |
|---|---|
| Excellent training performance | The model memorised the training data |
| Poor testing performance | The model cannot generalise well |
| Very complex model | The model learned unnecessary details |
Simple rule: Overfitting means the model learned too much detail.
What is a Good Fit?
A good model learns the important patterns without memorising unnecessary details.
| Model Type | Training Performance | Testing Performance |
|---|---|---|
| Underfitting | Poor | Poor |
| Good Fit | Good | Good |
| Overfitting | Excellent | Poor |
Goal = balance between learning and generalisation
Visual Understanding
Underfitting
A straight line trying to fit highly curved data.
Good Fit
A smooth curve capturing the main trend.
Overfitting
A very complicated curve trying to pass through every single point.
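The three pictures above can be reproduced numerically. The sketch below (a minimal illustration using NumPy and synthetic sine data, which is an assumption, not data from this article) fits polynomials of increasing degree to the same noisy points and prints the training error:

```python
import numpy as np
import warnings

warnings.simplefilter("ignore")  # high-degree polyfit warns about conditioning

# Synthetic curved data: a sine wave plus noise (an illustrative assumption)
rng = np.random.default_rng(42)
x = np.linspace(0, 2 * np.pi, 30)
y = np.sin(x) + rng.normal(0, 0.2, size=x.size)

mse = {}
for degree in (1, 3, 15):
    coeffs = np.polyfit(x, y, degree)   # fit a polynomial of this degree
    y_hat = np.polyval(coeffs, x)       # predictions on the training points
    mse[degree] = float(np.mean((y - y_hat) ** 2))
    print(f"degree {degree:2d} -> training MSE {mse[degree]:.4f}")
```

Degree 1 is the straight line (underfitting), degree 3 the smooth curve (good fit), and degree 15 chases individual noisy points (overfitting). Note that the training error keeps falling as the degree grows, which is exactly why training error alone cannot detect overfitting.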
Why Overfitting Happens
- Model is too complex
- Training for too many iterations
- Too many unnecessary features
- Very small training dataset
- Model memorises noise in data
Why Underfitting Happens
- Model is too simple
- Not enough training time
- Important features are missing
- Insufficient learning capacity
Overfitting in Real Business Scenarios
Customer Purchase Prediction
A model learns the exact behaviour of historical customers but performs poorly when predicting future customers.
Fraud Detection
A model memorises old fraud cases but cannot identify new fraud patterns.
How Train-Test Split Helps
Train-test split helps detect overfitting and underfitting by evaluating the model on unseen data.
| Situation | Possible Issue |
|---|---|
| High training accuracy + low testing accuracy | Overfitting |
| Low training accuracy + low testing accuracy | Underfitting |
| Similar good performance on both | Good generalisation |
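The first row of the table can be checked directly in code. The sketch below (assuming scikit-learn is available and using a synthetic dataset with some label noise) trains an unrestricted decision tree and compares training and testing accuracy:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with 10% flipped labels, so pure memorisation cannot generalise
X, y = make_classification(n_samples=500, n_features=10, flip_y=0.1,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=0)

# No depth limit: the tree is free to memorise the training data
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
train_acc = tree.score(X_train, y_train)
test_acc = tree.score(X_test, y_test)
print(f"training accuracy: {train_acc:.2f}, testing accuracy: {test_acc:.2f}")
```

High training accuracy combined with clearly lower testing accuracy is the overfitting pattern from the table, and it only becomes visible because the model is evaluated on data it never saw.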
Reducing Overfitting
| Technique | Purpose |
|---|---|
| More training data | Helps model learn broader patterns |
| Simpler model | Reduces unnecessary complexity |
| Feature selection | Removes irrelevant features |
| Regularisation | Controls model complexity |
| Dropout (deep learning) | Prevents memorisation |
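As one concrete example of the regularisation row, the sketch below (assuming scikit-learn and a small synthetic dataset) compares ordinary linear regression with Ridge regression on many polynomial features. The penalty keeps the coefficients small, which limits how wildly the model can bend:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.preprocessing import PolynomialFeatures

# Small noisy dataset expanded into 10 polynomial features (an assumption)
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=(30, 1))
y = 2 * x[:, 0] + rng.normal(0, 0.3, size=30)
X = PolynomialFeatures(degree=10, include_bias=False).fit_transform(x)

plain = LinearRegression().fit(X, y)        # no penalty on coefficient size
ridge = Ridge(alpha=1.0).fit(X, y)          # penalises large coefficients

plain_norm = float(np.linalg.norm(plain.coef_))
ridge_norm = float(np.linalg.norm(ridge.coef_))
print(f"unregularised coefficient norm: {plain_norm:.2f}")
print(f"ridge coefficient norm:         {ridge_norm:.2f}")
```

Smaller coefficients mean a smoother function, which is why regularisation reduces overfitting without changing the model type.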
Reducing Underfitting
| Technique | Purpose |
|---|---|
| More training time | Allows model to learn better |
| More useful features | Provides more information |
| More complex model | Increases learning ability |
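The third row (a more complex model) can be seen in a short sketch. Assuming scikit-learn and synthetic sine data, a plain straight line underfits the curve, while adding polynomial features gives the same linear model enough capacity to follow it:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Synthetic curved data (an illustrative assumption)
rng = np.random.default_rng(1)
x = np.linspace(0, 2 * np.pi, 100).reshape(-1, 1)
y = np.sin(x[:, 0]) + rng.normal(0, 0.1, size=100)

# Too simple: a straight line cannot follow a sine wave
line_r2 = LinearRegression().fit(x, y).score(x, y)

# More capacity: cubic features let the same model bend with the data
X3 = PolynomialFeatures(degree=3).fit_transform(x)
cubic_r2 = LinearRegression().fit(X3, y).score(X3, y)

print(f"straight line R^2: {line_r2:.2f}, cubic R^2: {cubic_r2:.2f}")
```

The more flexible model fits the curved trend much better, which is exactly the cure for underfitting, provided the extra capacity is not pushed so far that it starts memorising noise.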
Simple Python Example
```python
from sklearn.tree import DecisionTreeClassifier

# max_depth limits how many levels the tree can grow,
# controlling how complex the learned rules can become.
# X_train and y_train are assumed to come from an earlier train-test split.
model = DecisionTreeClassifier(max_depth=2)
model.fit(X_train, y_train)
```

A very small depth may underfit, while a very large depth may overfit.
Deep Learning Example
Neural networks can also overfit if they are too complex or trained too long.
```python
from tensorflow.keras.layers import Dropout

# Randomly disables 30% of this layer's neurons during each training step.
# `model` is assumed to be an existing Keras Sequential model.
model.add(Dropout(0.3))
```
Dropout helps reduce overfitting by randomly disabling some neurons during training.
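Conceptually, dropout just multiplies activations by a random keep/drop mask each training step. The plain-NumPy sketch below illustrates the common "inverted dropout" formulation; it is an illustration of the idea, not Keras's actual implementation:

```python
import numpy as np

def dropout(activations, rate, rng):
    """Inverted dropout: zero a fraction `rate` of units, rescale the rest."""
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob  # True = keep, False = drop
    return activations * mask / keep_prob             # rescaling keeps the mean unchanged

rng = np.random.default_rng(0)
acts = np.ones(10_000)
dropped = dropout(acts, rate=0.3, rng=rng)
zero_frac = float((dropped == 0).mean())
print("fraction zeroed:", zero_frac)  # close to 0.3
```

Because no single neuron can be relied on every step, the network is pushed to spread what it learns across many neurons instead of memorising with a few. At prediction time dropout is switched off, which Keras handles automatically.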
Quick Practice
A model achieves 99% training accuracy but only 60% testing accuracy.
Question: Is this likely overfitting or underfitting?
Answer: Likely overfitting because the model performs very well on training data but poorly on unseen data.
Common Beginner Mistake
Many beginners focus only on training accuracy. A model is only useful if it performs well on new data.
Remember: High training accuracy alone does not mean the model is good.
Key Takeaway
Underfitting happens when a model learns too little. Overfitting happens when a model memorises too much.
The goal is to build a model that learns useful patterns and generalises well to unseen data.
Simple rule: Good machine learning balances learning and generalisation.
Want to Learn More?
Explore our practical courses in Data Analysis, Machine Learning and AI to understand how real models are built and improved.