KNN: Turning Data Proximity into Predictive Power
- Ashish John Edward
- Oct 19, 2024
- 11 min read
Updated: Oct 26, 2024
In a world where data is king, K-Nearest Neighbours (KNN) emerges as a powerful yet intuitive tool that can solve a wide range of business problems. Imagine being able to predict which customer will buy a product or estimate the price of a house based on similar data points from the past. KNN does exactly this, offering businesses a way to make data-driven decisions without complicated assumptions. Whether you're a marketer predicting customer behaviour or a real estate analyst estimating property prices, KNN can help you make smarter, faster, and more informed decisions by leveraging the power of proximity in your data. Ready to learn how this simple algorithm can revolutionize decision-making in your office? Let's dive in.

In predictive modelling, we often seek algorithms that are not only powerful but also easy to understand and implement. Enter K-Nearest Neighbours (KNN) — a simple yet highly effective algorithm that brings intuition to the forefront of machine learning. Like a good neighbor who knows what's going on around the block, KNN uses the proximity of data points to make its predictions. In this article, we dig deep into the inner workings of KNN, uncover how it learns, and explore its practical applications.
K-Nearest Neighbours (KNN) is a non-parametric, instance-based learning algorithm that can be used for both classification and regression tasks. The core idea is straightforward: when given a new data point to classify or predict, KNN looks at the closest K neighbors to determine what the outcome should be.
For example, if we have a dataset of fruits categorized by their size and color, KNN will classify a new fruit by looking at the most similar fruits in the dataset.
Why Use KNN?
One of the strengths of KNN is that it requires no training phase, meaning that it stores the entire dataset and only performs computations when making predictions. This makes it highly adaptable and easy to implement. However, it also means that the algorithm can be computationally expensive, especially when dealing with large datasets, as KNN needs to calculate the distance from the new data point to every other point in the dataset.
Another advantage is its simplicity and interpretability. The decision-making process in KNN is transparent — you can literally see the data points being compared and how the majority vote or distance influences the outcome.
Intuition Behind KNN
At its heart, KNN works on the principle of similarity. It assumes that similar data points are close to each other in feature space. Therefore, to predict the outcome of a new data point, KNN calculates the distance between this point and all other points in the training set. Based on the proximity to its nearest neighbors, it makes a decision.
In classification, KNN assigns the label that is most common among the K nearest neighbors. In regression, it takes the average of the target values of the neighbors to predict the output.
To put it simply, KNN doesn’t assume any specific distribution of data (like linear or logistic regression does). Instead, it lets the data “speak for itself.”
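Before getting hands-on, here is a minimal sketch of KNN classification using Scikit-learn, the library mentioned later in this article. The tiny fruit dataset below is made up purely for illustration (size in cm, colour encoded as 0 = green, 1 = orange):

```python
# A minimal KNN classification sketch with scikit-learn.
# The fruit data is hypothetical: [size_cm, colour] with colour 0=green, 1=orange.
from sklearn.neighbors import KNeighborsClassifier

X = [[7, 1], [8, 1], [6, 0], [5, 0]]   # feature vectors for four fruits
y = ["orange", "orange", "apple", "apple"]

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, y)          # "training" just stores the data; no model is built

# The three nearest stored fruits vote on the label of a new fruit.
print(knn.predict([[7, 1]]))
```

Note that `fit` here does no real work beyond storing the dataset, which is exactly the "no training phase" property described above.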
Real-life use cases for KNN
Let’s look at some real-life use cases of KNN before we get our hands dirty and work through the algorithm ourselves.
Zillow: Real Estate Price Prediction
Zillow, a leading online real estate marketplace, uses KNN to help estimate property prices through its Zestimate tool. For home buyers and sellers, having an accurate estimate of a property’s value is essential. Zillow employs KNN to predict the price of a house by comparing it to similar properties in the same neighbourhood, based on attributes like square footage, number of bedrooms, and recent sale prices of neighbouring houses. By finding homes with similar characteristics, KNN allows Zillow to predict how much a home is worth with a high degree of accuracy.

This algorithm not only helps buyers and sellers make more informed decisions but also enhances the credibility of Zillow as a trusted source for real estate pricing. The Zestimate tool has become one of the most popular features on Zillow, offering users an instant estimate of home values. By grouping homes based on KNN, Zillow can analyse market trends and predict pricing fluctuations based on historical data and comparable sales, providing value not only to individual buyers but also to the broader real estate market.
Tinder: Match Recommendations
Tinder, one of the most popular dating apps, uses KNN to enhance its match recommendation system. In online dating, providing relevant match suggestions is essential to keeping users engaged and improving their chances of finding meaningful connections. KNN helps Tinder compare users based on their behaviour, such as swiping patterns, age preferences, and interests. By identifying "neighbors" who have similar profiles and dating behaviours, KNN allows Tinder to recommend users who are likely to be good matches for each other.

This method helps Tinder users discover potential partners who align with their preferences, increasing user satisfaction. The more relevant the matches, the longer users stay active on the app, and the higher the likelihood of successful matches. KNN’s ability to analyse multiple factors and find patterns in user behaviour has led to improved match quality, which is one of the reasons Tinder remains a leader in the dating app market. Personalized matches driven by KNN have boosted user retention, helping Tinder grow its global user base and monetize through premium subscriptions.
Alibaba: Fraud Detection
Alibaba, the Chinese e-commerce giant, leverages KNN in its fraud detection system. With millions of transactions happening daily, identifying fraudulent activities is crucial for maintaining trust and security on the platform. KNN helps Alibaba analyse customer behaviour patterns and transaction histories to detect anomalies. By comparing a current transaction with past similar transactions, KNN identifies unusual behaviours, such as a sudden spike in purchases from a specific account, abnormal payment methods, or changes in delivery locations.

When such anomalies are detected, Alibaba’s system flags them for further investigation. KNN is particularly useful in this context because it can compare transactions across various dimensions (e.g., frequency, amount, location) and find the ones that deviate significantly from normal behaviour. By using KNN for fraud detection, Alibaba has improved its ability to prevent fraud in real-time, protecting both buyers and sellers from malicious activities. This has helped maintain trust in Alibaba’s platform while minimizing financial losses due to fraud.
Pinterest: Image Recognition for Visual Search
Pinterest, a leading platform for sharing visual content, uses KNN to power its image recognition and visual search capabilities. With users heavily reliant on image discovery, Pinterest needed a way to help users find similar images and content related to their interests. KNN allows Pinterest to analyze visual features such as colors, shapes, and patterns within images. When a user pins an image or performs a visual search, KNN compares the new image to a vast database of other images, grouping them based on similarity. This helps Pinterest deliver highly accurate and visually relevant search results.

The visual search enhancement brought by KNN ensures that users can easily find aesthetically or contextually similar content, improving the overall user experience. For instance, if a user pins a home décor idea, KNN suggests similar décor designs that match the user’s tastes. By facilitating such personalized discovery, Pinterest keeps users engaged longer, encouraging them to explore more content. This has directly contributed to increased user retention and engagement, as users spend more time on the platform discovering new ideas. As a result, Pinterest has become a go-to platform for visual content discovery, driving higher user interaction and more frequent platform visits.
How KNN Works: Step by Step
Let’s break down how KNN works into simple steps:

While machine learning algorithms like KNN are often implemented in Python or R using libraries such as Scikit-learn, you can also gain a fundamental understanding of KNN through MS Excel. Excel allows you to manually perform each step of the algorithm, providing hands-on experience with KNN calculations.
Problem Statement: Predicting Whether a Customer Will Purchase a Product
You are the marketing manager of an e-commerce platform and want to predict whether a customer will purchase a product based on two features: Age and Income. You have a dataset of past customer purchases. Your task is to predict whether a new customer (Age = 40, Income = 58000) will purchase the product or not using the KNN algorithm.
Step 1: Dataset Setup
In Excel, we will create a small dataset with the following features:
Age: The age of the customer.
Income: The annual income of the customer.
Purchased: 1 if the customer purchased the product, 0 if not.

Now, your task is to predict whether the new customer (Age = 40, Income = 58000) will purchase the product based on the purchasing patterns of previous customers.
Step 2: Calculate Euclidean Distance
KNN works by calculating the Euclidean distance between the new customer and each of the other customers in the dataset.
Euclidean Distance Formula:

d = sqrt((p1 - q1)^2 + (p2 - q2)^2)

Where p1 and p2 are the Age and Income of the new customer, and q1 and q2 are the Age and Income of an existing customer.
Steps in Excel:
Add a new column titled Distance.
Use the following formula to calculate the Euclidean distance between the new customer and each existing customer: =SQRT((B2 - 40)^2 + (C2 - 58000)^2)
Drag the formula down for all rows to calculate the distance for each customer.
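The same calculation the Excel formula performs can be sketched in a few lines of Python:

```python
# Python equivalent of the Excel formula =SQRT((B2 - 40)^2 + (C2 - 58000)^2)
import math

def euclidean_distance(p, q):
    """Straight-line distance between two points in feature space."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

new_customer = (40, 58000)
# Distance from the new customer to Customer 2 (Age = 32, Income = 60000)
print(euclidean_distance((32, 60000), new_customer))
```

Notice that with raw units the income term dwarfs the age term; in practice, features are usually scaled to comparable ranges before running KNN.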
Step 3: Sort the Data by Distance
Once the distances are calculated, sort the data in ascending order of distance to find the closest neighbors.
Highlight the entire dataset (including the distances).
Go to Data → Sort → Sort by Distance (ascending).
Step 4: Select the Nearest Neighbors (K = 3)
In this example, let's assume K = 3, meaning we will look at the three closest customers to the new customer. After sorting by distance, pick the top 3 nearest neighbors. Look at their Purchased values (either 0 or 1).
Step 5: Predict the Class (Purchased or Not)
Once you've identified the three nearest neighbors, count how many of the nearest neighbors purchased the product (Purchased = 1) and how many did not (Purchased = 0).
Use the COUNTIF function in Excel to count how many of the neighbours have Purchased = 1: =COUNTIF(D2:D4, 1), where D2:D4 is the range of the Purchased column for the nearest neighbours.
Prediction:
If the majority of the nearest neighbors purchased the product, predict that the new customer will purchase the product (i.e., Purchased = 1).
If the majority did not purchase the product, predict that the new customer will not purchase the product (i.e., Purchased = 0).
Step 6: Conclusion
Once you've completed these steps, you’ll have predicted whether the new customer (Age = 40, Income = 58000) will purchase the product based on the purchasing behaviour of the three nearest neighbors.
Worked Example (Results)
After computing the Euclidean distance for each customer and sorting by distance, the three nearest neighbors are:
Customer 2 (Age = 32, Income = 60000, Purchased = 1)
Customer 5 (Age = 23, Income = 55000, Purchased = 1)
Customer 1 (Age = 25, Income = 50000, Purchased = 1)

Here is the visual representation of the KNN problem for predicting whether a customer will purchase a product based on their Age and Income:
· Green points represent customers who purchased the product.
· Red points represent customers who did not purchase the product.
· Blue point represents the new customer (Age = 40, Income = 58000) for whom we want to make a prediction.
Using KNN, you can now see how the new customer is positioned relative to the existing customers. By calculating the Euclidean distance from the new customer to the others, you can predict whether they are likely to purchase the product by observing their nearest neighbors.
Prediction:
All three nearest neighbors purchased the product (Purchased = 1).
Therefore, we predict that the new customer will also purchase the product (Purchased = 1).
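As a sanity check, the worked example can be re-run with scikit-learn. Only the three nearest neighbours listed above are used here, since the rest of the dataset is not shown:

```python
# Re-checking the worked example with scikit-learn,
# using only the three nearest neighbours from the article.
from sklearn.neighbors import KNeighborsClassifier

X = [[32, 60000], [23, 55000], [25, 50000]]  # Customers 2, 5, and 1: [Age, Income]
y = [1, 1, 1]                                # all three purchased

knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
# All three neighbours have Purchased = 1, so the vote is unanimous.
print(knn.predict([[40, 58000]]))
```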
K-Nearest Neighbors (KNN) & Regression
K-Nearest Neighbors (KNN) can also be applied to regression tasks! In KNN regression, instead of predicting a class label (as in classification), KNN predicts a numerical value by averaging the values of the K nearest neighbors.
Let's walk through how we can apply KNN for regression in Excel with a step-by-step problem, just as we did for classification.
Problem Statement: Predicting House Prices Using KNN Regression
You are a real estate analyst, and you want to predict the price of a house based on its size (in square feet) and the number of bedrooms. You have data from previously sold houses, and you want to use KNN to predict the price of a new house that has not been sold yet.
Step 1: Dataset Setup
In Excel, we will create a small dataset with the following features:
Size (sq ft): The size of the house.
Bedrooms: The number of bedrooms in the house.
Price (in USD): The price at which the house was sold.
The dataset looks like this:

Your task is to predict the price of the new house (Size = 1550 sq ft, Bedrooms = 3) using KNN regression.
Step 2: Calculate Euclidean Distance
As in KNN classification, we will calculate the Euclidean distance between the new house and each of the houses in the dataset.
Euclidean Distance Formula for Regression:

d = sqrt((p1 - q1)^2 + (p2 - q2)^2)

Where p1 and p2 are the size and number of bedrooms of the new house, and q1 and q2 are the size and number of bedrooms of an existing house in the dataset.
Steps in Excel:
Add a new column titled Distance.
Use the following formula to calculate the Euclidean distance between the new house and each existing house: =SQRT((B2 - 1550)^2 + (C2 - 3)^2)
This formula calculates the distance between the new house (Size = 1550, Bedrooms = 3) and each house in the dataset.
Step 3: Sort the Data by Distance
Once you have calculated the distances for all houses, sort the data in ascending order based on the distance. This will give you the houses that are closest to the new house.
Highlight the entire dataset (including the distances).
Go to Data → Sort → Sort by Distance (ascending).
Step 4: Select the Nearest Neighbors (K = 3)
For this regression task, we will set K = 3, meaning we will consider the 3 nearest neighbors (houses) to the new house. After sorting by distance, choose the top 3 neighbors.
Step 5: Predict the Price Using Averaging
Once you've selected the 3 nearest neighbors, take the average of their prices to predict the price of the new house. Use =AVERAGE(D2:D4) to compute the mean, where D2:D4 is the range of the Price column for the top 3 nearest neighbors.
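The averaging step is the only difference from classification; in Python it is a one-liner over the neighbours' prices (the three prices from the worked example below):

```python
# KNN regression's prediction step: the mean of the K nearest targets,
# equivalent to Excel's =AVERAGE(D2:D4).
prices = [300000, 320000, 350000]  # prices of the 3 nearest houses
predicted = sum(prices) / len(prices)
print(round(predicted, 2))
```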
Step 6: Conclusion
The predicted price of the new house will be the average of the prices of the 3 nearest neighbors.
Worked Example (Results)
After computing the Euclidean distance for each house and sorting by distance, the three nearest neighbors are:
House 1: Price = 300000
House 2: Price = 320000
House 3: Price = 350000
Prediction:
The predicted price of the new house is the average of the 3 nearest neighbours' prices:
Predicted Price = (300000 + 320000 + 350000) / 3 = USD 323,333.33

This visualization helps demonstrate how the new house's price is predicted by averaging the prices of the nearest houses.
The blue points represent the existing houses in the dataset, while the green points highlight the three nearest neighbors used for the prediction. The red point represents the new house, with its predicted price of USD 323,333.33 based on the KNN algorithm.
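For readers who want to reproduce this walkthrough outside Excel, here is a scikit-learn sketch of KNN regression. The house features below are illustrative placeholders rather than the article's actual dataset; only the three nearest prices come from the example above:

```python
# KNN regression with scikit-learn. House features are hypothetical:
# [size_sq_ft, bedrooms]; with weights='uniform' (the default),
# the prediction is the plain average of the K nearest prices.
from sklearn.neighbors import KNeighborsRegressor

X = [[1500, 3], [1600, 3], [1700, 3], [2400, 4]]
y = [300000, 320000, 350000, 500000]

knn = KNeighborsRegressor(n_neighbors=3).fit(X, y)
# Predict the price of the new house (Size = 1550 sq ft, Bedrooms = 3).
print(knn.predict([[1550, 3]]))
```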
By evaluating factors like dataset size, dimensionality, and computational power, you can decide whether KNN is the right algorithm for your problem. In short, KNN is most effective on small-to-medium, low-dimensional datasets where interpretability matters; it struggles when data is very large or high-dimensional, because every prediction requires computing the distance to every stored point.
K-Nearest Neighbors (KNN) is more than just a simple algorithm—it’s a powerful tool that enables businesses to make informed, data-driven decisions. From predicting customer purchases to estimating real estate prices, KNN thrives on proximity-based predictions, making it highly intuitive and practical across various industries. Its versatility in both classification and regression tasks, combined with its ease of implementation, makes it a go-to algorithm for many real-world applications.
Whether you're navigating fraud detection, personalized recommendations, or price predictions, KNN offers a transparent and effective approach that can transform your data into actionable insights. By leveraging the power of KNN, you open the door to smarter, faster, and more accurate business decisions.