Feature Scaling Explained Simply
Feature scaling is an important data preparation step in machine learning. It puts numerical features onto a similar scale
so that models can learn more fairly, efficiently and accurately.
What is Feature Scaling?
Feature scaling means changing numerical columns so their values are on a similar range or scale.
This is useful when different features have very different units or value ranges.
Simple Example
Age may range from 18 to 60, while salary may range from £20,000 to £120,000.
Without scaling, salary values are much larger and may dominate some machine learning models.
Feature scaling = putting numerical features on a comparable scale
Why Feature Scaling Matters
Some machine learning algorithms are sensitive to the size of numbers. If one feature has very large values,
the model may give it too much influence even if it is not the most important feature.
- Helps models learn more effectively
- Prevents large-value features from dominating
- Improves performance for distance-based algorithms
- Helps gradient-based models train faster
- Makes features easier to compare
Key idea: Scaling does not change the meaning of the data. It changes the numerical range.
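To see why large-value features can dominate, consider a distance-based model such as K-nearest neighbours. This is a minimal plain-Python sketch (using the age and salary figures from the example above): the salary gap swamps the age gap almost entirely.

```python
import math

# Two customers as (age, salary) pairs
a = (25, 30000)
b = (45, 90000)

# Euclidean distance: the age gap contributes 20**2 = 400,
# while the salary gap contributes 60000**2 = 3,600,000,000
dist = math.sqrt((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2)
print(round(dist, 2))  # almost exactly the salary gap alone
```

The distance is about 60000.003, so age contributes essentially nothing. After scaling, both features would contribute on the same order of magnitude.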
Business Example
Suppose we want to predict whether a customer will buy a product using two features:
- Age: 18 to 60
- Estimated Salary: £20,000 to £120,000
Salary values are much larger than age values. Feature scaling helps place them on a more comparable scale before modelling.
Before and After Scaling
| Customer | Age | Salary | After Scaling |
| --- | --- | --- | --- |
| A | 25 | 30000 | Age and salary are converted to comparable numerical ranges |
| B | 45 | 90000 | The model can compare both features more fairly |
Simple rule: Scale numerical features when their ranges are very different.
Common Scaling Methods
The two most common feature scaling methods are StandardScaler and MinMaxScaler.
| Method | What It Does | Typical Use |
| --- | --- | --- |
| StandardScaler | Centres data around 0 using mean and standard deviation | Many ML models and neural networks |
| MinMaxScaler | Scales data into a fixed range, usually 0 to 1 | When a fixed range is useful |
StandardScaler
StandardScaler transforms data so that the feature has a mean of 0 and a standard deviation of 1.
StandardScaler: values are transformed based on mean and standard deviation
Simple Meaning
Values above the average become positive. Values below the average become negative.
StandardScaler is commonly used with algorithms such as logistic regression, support vector machines,
K-nearest neighbours and neural networks.
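The transformation itself is simple: subtract the mean, then divide by the standard deviation. A quick hand-rolled sketch in plain Python (using the population standard deviation, which is what scikit-learn's StandardScaler also uses):

```python
ages = [25, 35, 45]

mean = sum(ages) / len(ages)                                   # 35.0
std = (sum((x - mean) ** 2 for x in ages) / len(ages)) ** 0.5  # about 8.165

z = [(x - mean) / std for x in ages]
print([round(v, 3) for v in z])  # [-1.225, 0.0, 1.225]
```

Note how the average age (35) maps to 0, the value below it is negative, and the value above it is positive, exactly as described above.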
MinMaxScaler
MinMaxScaler transforms values into a fixed range, usually between 0 and 1.
MinMaxScaler: smallest value becomes 0 and largest value becomes 1
Simple Meaning
If salary ranges from £20,000 to £120,000, MinMaxScaler converts those values into a 0 to 1 scale.
MinMaxScaler is useful when you want all features to stay within a clear minimum and maximum range.
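The formula is (x − min) / (max − min). A minimal sketch with the salary range above (the middle value of £70,000 is an assumed example, not from the text):

```python
salaries = [20000, 70000, 120000]

# Min-max scaling: smallest value maps to 0, largest to 1
lo, hi = min(salaries), max(salaries)
scaled = [(s - lo) / (hi - lo) for s in salaries]
print(scaled)  # [0.0, 0.5, 1.0]
```

£20,000 becomes 0, £120,000 becomes 1, and every other salary lands proportionally in between.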
Feature Scaling in Python: StandardScaler
Here is a simple example using Scikit-learn.
```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

data = {
    "Age": [25, 35, 45],
    "EstimatedSalary": [30000, 60000, 90000]
}
df = pd.DataFrame(data)

# Learn each column's mean and standard deviation, then scale
scaler = StandardScaler()
scaled_data = scaler.fit_transform(df)
print(scaled_data)
```
After scaling, each column has mean 0 and standard deviation 1; with these values, both columns become roughly [-1.22, 0, 1.22].
Important: fit_transform learns the scaling rules (each column's mean and standard deviation) and applies them to the data.
Feature Scaling in Python: MinMaxScaler
```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

data = {
    "Age": [25, 35, 45],
    "EstimatedSalary": [30000, 60000, 90000]
}
df = pd.DataFrame(data)

# Learn each column's min and max, then scale into [0, 1]
scaler = MinMaxScaler()
scaled_data = scaler.fit_transform(df)
print(scaled_data)
```
The transformed values will be between 0 and 1; with these values, each column becomes [0, 0.5, 1].
Correct Scaling Workflow
In machine learning, scaling should be done carefully to avoid data leakage.
Split data first → fit scaler on training data → transform training and testing data
```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# X (features) and y (target) are assumed to be defined already
X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.2,
    random_state=42
)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)  # learn mean/std from training data, then scale it
X_test = scaler.transform(X_test)        # reuse the same mean/std on the test data
```
Very important: Use fit_transform on training data, but only transform on testing data.
Why Not Fit the Scaler on Test Data?
The test data should represent unseen future data. If we fit the scaler using test data, information from the test set leaks into training.
Correct
Learn scaling rules from training data only.
Incorrect
Learning scaling rules from the full dataset before splitting.
Key idea: The test set should stay unseen until evaluation.
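A tiny plain-Python illustration of the difference (hypothetical numbers): if the scaler is fitted on the full dataset, an extreme test value shifts the statistics the model trains with.

```python
values = [10, 20, 30, 100]
train, test = values[:3], values[3:]   # pretend the last value is unseen test data

train_mean = sum(train) / len(train)   # 20.0 -- learned from training data only
full_mean = sum(values) / len(values)  # 40.0 -- the extreme test point has leaked in

print(train_mean, full_mean)
```

Fitting on the full data doubles the mean here, so every scaled training value would be shifted by information the model was never supposed to see.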
Which Models Need Scaling?
| Usually Need Scaling | Usually Less Sensitive |
| --- | --- |
| Logistic Regression | Decision Trees |
| K-Nearest Neighbours | Random Forest |
| Support Vector Machines | Tree-based ensemble models |
| Neural Networks | |
Tree-based models such as Decision Trees and Random Forests usually do not require feature scaling because they split data using thresholds.
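A sketch of why: a split such as "salary > 50,000" sends each sample to the same side whether or not the feature is rescaled, because min-max scaling preserves order and the learned threshold simply moves with the data. Plain-Python illustration with made-up numbers:

```python
salaries = [30000, 60000, 90000]

# Split on the raw feature
left_raw = [s > 50000 for s in salaries]

# Min-max scale the feature, and scale the threshold the same way
lo, hi = min(salaries), max(salaries)
scaled = [(s - lo) / (hi - lo) for s in salaries]
threshold = (50000 - lo) / (hi - lo)
left_scaled = [s > threshold for s in scaled]

print(left_raw == left_scaled)  # True: same samples on each side of the split
```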
Feature Scaling and Deep Learning
Neural networks usually perform better when input features are scaled.
Large feature values can make training unstable or slow.
Example
If one input is age and another is salary, scaling helps the network learn from both features more effectively.
Deep learning tip: Scaling is usually recommended before training neural networks.
Common Beginner Mistakes
- Forgetting to scale numerical features
- Scaling before train-test split
- Using fit_transform on both training and testing data
- Scaling categorical values after encoding without understanding the effect
- Forgetting to save the scaler when deploying a model
Feature Scaling and Deployment
When using a trained model in real life, new data must be scaled in the same way as the training data.
This means the scaler should be saved with the model.
```python
import pickle

# Save the fitted scaler alongside the model
with open("scaler.pkl", "wb") as f:
    pickle.dump(scaler, f)
```
Later, the same scaler can be loaded and used for new predictions.
```python
import pickle

with open("scaler.pkl", "rb") as f:
    scaler = pickle.load(f)

new_data_scaled = scaler.transform(new_data)
```
Quick Practice
You are building a model using the following features:
- Age: 18 to 65
- Annual Income: £15,000 to £150,000
- Website Visits: 1 to 500
Question: Should you consider feature scaling?
Suggested answer: Yes. The features have very different ranges, so scaling can help many machine learning models learn more effectively.
Key Takeaway
Feature scaling puts numerical features onto a similar scale. It is especially important for models that use distances,
gradients or numerical optimisation, including logistic regression, KNN, SVM and neural networks.
Simple rule: If numerical features have very different ranges, consider scaling them before modelling.
Want to Learn More?
Explore our practical courses in Data Analysis, Machine Learning and AI to apply feature scaling in real-world projects.
View Courses