Feature Scaling Explained Simply
Feature scaling is an important data preparation step in machine learning. It puts numerical features onto a similar scale
so that models can learn more fairly, efficiently and accurately.
What is Feature Scaling?
Feature scaling means changing numerical columns so their values are on a similar range or scale.
This is useful when different features have very different units or value ranges.
Simple Example
Age may range from 18 to 60, while salary may range from £20,000 to £120,000.
Without scaling, salary values are much larger and may dominate some machine learning models.
Feature scaling = putting numerical features on a comparable scale
Why Feature Scaling Matters
Some machine learning algorithms are sensitive to the size of numbers. If one feature has very large values,
the model may give it too much influence even if it is not the most important feature.
- Helps models learn more effectively
- Prevents large-value features from dominating
- Improves performance for distance-based algorithms
- Helps gradient-based models train faster
- Makes features easier to compare
Key idea: Scaling does not change the meaning of the data. It changes the numerical range.
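To see why large-value features can dominate, consider a distance-based model such as K-nearest neighbours. This is a minimal plain-Python sketch (using the age and salary figures from the example above): the salary gap swamps the age gap almost entirely.

```python
import math

# Two customers as (age, salary) pairs
a = (25, 30000)
b = (45, 90000)

# Euclidean distance: the age gap contributes 20**2 = 400,
# while the salary gap contributes 60000**2 = 3,600,000,000
dist = math.sqrt((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2)
print(round(dist, 2))  # almost exactly the salary gap alone
```

The distance is about 60000.003, so age contributes essentially nothing. After scaling, both features would contribute on the same order of magnitude.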
Business Example
Suppose we want to predict whether a customer will buy a product using two features:
- Age: 18 to 60
- Estimated Salary: £20,000 to £120,000
Salary values are much larger than age values. Feature scaling helps place them on a more comparable scale before modelling.
Before and After Scaling
| Customer | Age | Salary | After Scaling |
| --- | --- | --- | --- |
| A | 25 | 30000 | Age and salary are converted to comparable numerical ranges |
| B | 45 | 90000 | The model can compare both features more fairly |
Simple rule: Scale numerical features when their ranges are very different.
Common Scaling Methods
The two most common feature scaling methods are StandardScaler and MinMaxScaler.
| Method | What It Does | Typical Use |
| --- | --- | --- |
| StandardScaler | Centres data around 0 using mean and standard deviation | Many ML models and neural networks |
| MinMaxScaler | Scales data into a fixed range, usually 0 to 1 | When a fixed range is useful |
StandardScaler
StandardScaler transforms data so that the feature has a mean of 0 and a standard deviation of 1.
StandardScaler: values are transformed based on mean and standard deviation
Simple Meaning
Values above the average become positive. Values below the average become negative.
StandardScaler is commonly used with algorithms such as logistic regression, support vector machines,
K-nearest neighbours and neural networks.
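The transformation itself is simple: subtract the mean, then divide by the standard deviation. A quick hand-rolled sketch in plain Python (using the population standard deviation, which is what scikit-learn's StandardScaler also uses):

```python
ages = [25, 35, 45]

mean = sum(ages) / len(ages)                                   # 35.0
std = (sum((x - mean) ** 2 for x in ages) / len(ages)) ** 0.5  # about 8.165

z = [(x - mean) / std for x in ages]
print([round(v, 3) for v in z])  # [-1.225, 0.0, 1.225]
```

Note how the average age (35) maps to 0, the value below it is negative, and the value above it is positive, exactly as described above.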
MinMaxScaler
MinMaxScaler transforms values into a fixed range, usually between 0 and 1.
MinMaxScaler: smallest value becomes 0 and largest value becomes 1
Simple Meaning
If salary ranges from £20,000 to £120,000, MinMaxScaler converts those values into a 0 to 1 scale.
MinMaxScaler is useful when you want all features to stay within a clear minimum and maximum range.
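The formula is (x − min) / (max − min). A minimal sketch with the salary range above (the middle value of £70,000 is an assumed example, not from the text):

```python
salaries = [20000, 70000, 120000]

# Min-max scaling: smallest value maps to 0, largest to 1
lo, hi = min(salaries), max(salaries)
scaled = [(s - lo) / (hi - lo) for s in salaries]
print(scaled)  # [0.0, 0.5, 1.0]
```

£20,000 becomes 0, £120,000 becomes 1, and every other salary lands proportionally in between.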
Feature Scaling in Python: StandardScaler
Here is a simple example using Scikit-learn.
```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

data = {
    "Age": [25, 35, 45],
    "EstimatedSalary": [30000, 60000, 90000]
}
df = pd.DataFrame(data)

# Learn each column's mean and standard deviation, then scale
scaler = StandardScaler()
scaled_data = scaler.fit_transform(df)
print(scaled_data)
```
After scaling, each column has mean 0 and standard deviation 1; with these values, both columns become roughly [-1.22, 0, 1.22].
Important: fit_transform learns the scaling rules (each column's mean and standard deviation) and applies them to the data.
Feature Scaling in Python: MinMaxScaler
```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

data = {
    "Age": [25, 35, 45],
    "EstimatedSalary": [30000, 60000, 90000]
}
df = pd.DataFrame(data)

# Learn each column's min and max, then scale into [0, 1]
scaler = MinMaxScaler()
scaled_data = scaler.fit_transform(df)
print(scaled_data)
```
The transformed values will be between 0 and 1; with these values, each column becomes [0, 0.5, 1].
Correct Scaling Workflow
In machine learning, scaling should be done carefully to avoid data leakage.
Split data first → fit scaler on training data → transform training and testing data
```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# X (features) and y (target) are assumed to be defined already
X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.2,
    random_state=42
)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)  # learn mean/std from training data, then scale it
X_test = scaler.transform(X_test)        # reuse the same mean/std on the test data
```
Very important: Use fit_transform on training data, but only transform on testing data.
Why Not Fit the Scaler on Test Data?
The test data should represent unseen future data. If we fit the scaler using test data, information from the test set leaks into training.
Correct
Learn scaling rules from training data only.
Incorrect
Learning scaling rules from the full dataset before splitting.
Key idea: The test set should stay unseen until evaluation.
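A tiny plain-Python illustration of the difference (hypothetical numbers): if the scaler is fitted on the full dataset, an extreme test value shifts the statistics the model trains with.

```python
values = [10, 20, 30, 100]
train, test = values[:3], values[3:]   # pretend the last value is unseen test data

train_mean = sum(train) / len(train)   # 20.0 -- learned from training data only
full_mean = sum(values) / len(values)  # 40.0 -- the extreme test point has leaked in

print(train_mean, full_mean)
```

Fitting on the full data doubles the mean here, so every scaled training value would be shifted by information the model was never supposed to see.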
Which Models Need Scaling?
| Usually Need Scaling | Usually Less Sensitive |
| --- | --- |
| Logistic Regression | Decision Trees |
| K-Nearest Neighbours | Random Forest |
| Support Vector Machines | Tree-based ensemble models |
| Neural Networks | |
Tree-based models such as Decision Trees and Random Forests usually do not require feature scaling because they split data using thresholds.
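A sketch of why: a split such as "salary > 50,000" sends each sample to the same side whether or not the feature is rescaled, because min-max scaling preserves order and the learned threshold simply moves with the data. Plain-Python illustration with made-up numbers:

```python
salaries = [30000, 60000, 90000]

# Split on the raw feature
left_raw = [s > 50000 for s in salaries]

# Min-max scale the feature, and scale the threshold the same way
lo, hi = min(salaries), max(salaries)
scaled = [(s - lo) / (hi - lo) for s in salaries]
threshold = (50000 - lo) / (hi - lo)
left_scaled = [s > threshold for s in scaled]

print(left_raw == left_scaled)  # True: same samples on each side of the split
```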
Feature Scaling and Deep Learning
Neural networks usually perform better when input features are scaled.
Large feature values can make training unstable or slow.
Example
If one input is age and another is salary, scaling helps the network learn from both features more effectively.
Deep learning tip: Scaling is usually recommended before training neural networks.
Common Beginner Mistakes
- Forgetting to scale numerical features
- Scaling before train-test split
- Using fit_transform on both training and testing data
- Scaling categorical values after encoding without understanding the effect
- Forgetting to save the scaler when deploying a model
Feature Scaling and Deployment
When using a trained model in real life, new data must be scaled in the same way as the training data.
This means the scaler should be saved with the model.
```python
import pickle

# Save the fitted scaler alongside the model
with open("scaler.pkl", "wb") as f:
    pickle.dump(scaler, f)
```
Later, the same scaler can be loaded and used for new predictions.
```python
import pickle

with open("scaler.pkl", "rb") as f:
    scaler = pickle.load(f)

new_data_scaled = scaler.transform(new_data)
```
Quick Practice
You are building a model using the following features:
- Age: 18 to 65
- Annual Income: £15,000 to £150,000
- Website Visits: 1 to 500
Question: Should you consider feature scaling?
Suggested answer: Yes. The features have very different ranges, so scaling can help many machine learning models learn more effectively.
Key Takeaway
Feature scaling puts numerical features onto a similar scale. It is especially important for models that use distances,
gradients or numerical optimisation, including logistic regression, KNN, SVM and neural networks.
Simple rule: If numerical features have very different ranges, consider scaling them before modelling.
Want to Learn More?
Explore our practical courses in Data Analysis, Machine Learning and AI to apply feature scaling in real-world projects.
View Courses