Have you ever wanted to group customers into meaningful segments based on their behavior, without manually sorting through rows of data? That is exactly what k-means clustering helps you do.
In a previous post, we looked at how Copilot can help with this kind of customer segmentation.
This time, we will code the workflow ourselves in Python in Excel so we can better understand what is happening behind the scenes.
We will analyze a customer dataset and group customers based on factors like annual spend, order frequency, average order value, discount usage, returns, support tickets, and recency.
The goal is to find natural customer segments in the data, visualize those segments, and understand the fundamentals of how the process works.
Follow along by downloading the exercise file below.
Preparing the data
Before we can build customer segments, we need to get the data into the right shape for k-means clustering.
This code begins by loading the customer table from Excel into Python, then selecting the numeric fields that describe each customer’s behavior. These include how much they spend, how often they order, the size of their average order, how much they use discounts, how often they return items, how many support tickets they create, and how recently they placed an order.
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
df = xl("CustomerData[#All]", headers=True)
features = [
"annual_spend",
"orders_per_year",
"avg_order_value",
"discount_rate",
"returns_rate",
"support_tickets",
"days_since_last_order"
]
X = df[features]
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
X_scaled[:5]
Together, these fields give the model a behavioral profile for each customer. We are not using fields like customer ID, region, or membership tier at this stage because k-means works best with numeric inputs. Those descriptive fields can still be useful later, though, once we want to interpret the clusters and understand what kinds of customers ended up in each group.
The final step in this block is standardization. This is an important part of the process because the variables are measured on very different scales. Annual spend may be in the thousands, while discount rate and returns rate are much smaller decimal values. Without standardizing the data, k-means could end up giving too much weight to the columns with larger numbers simply because of the way they are measured.
StandardScaler solves that problem by putting all of the selected features onto a comparable scale. After scaling, each value represents how far that customer is above or below the average for that feature. Values close to 0 are near average, positive values are above average, and negative values are below average.
The final line, X_scaled[:5], displays the first five rows of the scaled data. This lets us confirm that the transformation worked and that the data is ready for clustering.

Choosing the number of clusters
Now that the customer data has been standardized, the next question is how many clusters we should ask k-means to create. This is an important choice because the number of clusters controls how detailed the final customer segmentation will be.
To help answer that question, this code uses the elbow method. We start by creating an empty list called sse, which will store the model’s error score for different numbers of clusters. In this context, SSE stands for “sum of squared errors.” It measures how far the customers are from the center of the cluster they were assigned to. A lower SSE means the customers within each cluster are grouped more tightly together.
sse = []
for k in range(2, 11):
model = KMeans(n_clusters=k, n_init=10, random_state=42)
model.fit(X_scaled)
sse.append(model.inertia_)
plt.plot(range(2, 11), sse, marker="o")
plt.xlabel("Number of clusters")
plt.ylabel("SSE / Inertia")
plt.title("Elbow Method")
The loop then tests several possible values for k, from 2 through 10. In k-means clustering, k represents the number of clusters we want the model to create. For each value of k, the code creates a new k-means model, fits it to the scaled customer data, and stores the model’s inertia score in the sse list.
Inertia is the term scikit-learn uses for this within-cluster error score. As we increase the number of clusters, inertia will usually go down because the model has more groups to work with. However, that does not mean we should automatically choose the largest number of clusters. More clusters may improve the score mathematically, but they can also make the final segmentation harder to explain and less useful in practice.
That is why the code plots the SSE values as a line chart. The x-axis shows the number of clusters, while the y-axis shows the SSE, or inertia. We are looking for the point where the line starts to bend and flatten out. This bend is known as the “elbow.”

In this chart, the SSE drops sharply as we move from 2 to 3 clusters, and again from 3 to 4 clusters. That tells us the model is getting much better at grouping similar customers as those clusters are added. After 4 clusters, the line still goes down, but the improvement becomes much smaller and more gradual.
Based on this pattern, 4 clusters appears to be a reasonable choice for this dataset. It captures the major differences among customers without creating so many segments that the results become difficult to interpret. In other words, 4 clusters gives us a practical balance between model fit and simplicity.
The goal here is not to find the lowest possible SSE. The goal is to find a useful number of customer segments that gives us meaningful insight into the data. Based on the elbow chart, we will move forward with k=4 for the final k-means model.
Fitting the final k-means model
Now that we have chosen 4 clusters, we can fit the final k-means model and add the results back to the original customer dataset.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42)
df["cluster"] = kmeans.fit_predict(X_scaled)
df["cluster"].head()
This code creates a k-means model with 4 clusters, fits it to the scaled customer data, and then stores each customer’s assigned cluster in a new column called cluster.
The output from df["cluster"].head() shows the first few cluster labels:

These cluster numbers are just labels. Cluster 0 is not automatically better or worse than Cluster 1. The labels only become meaningful once we summarize the clusters and look at the average customer behavior in each group.
Interpreting and visualizing the customer segments
Now that each customer has been assigned to a cluster, we need to understand what those clusters actually mean.
cluster_summary = (
df
.groupby("cluster")[features]
.mean()
.round(2)
)
cluster_summary
This table shows the average customer profile for each cluster. Each row is one segment, and each column shows the average value for one of the features used in the model.
This is where the cluster numbers start to become useful. Instead of just seeing that a customer belongs to Cluster 0, 1, 2, or 3, we can compare the average behavior of each group.

For example, we can look for patterns such as:
- Which cluster spends the most annually?
- Which cluster orders most often?
- Which cluster uses the largest discounts?
- Which cluster has the highest return rate or support activity?
- Which cluster has gone the longest since its last order?
Those patterns help us turn the raw cluster labels into business-friendly segment descriptions. One group might look like high-value loyal customers, another might look like discount-driven buyers, and another might look like customers who are starting to drift away.
Now we can visualize the customer segments with a scatterplot.
plt.scatter(
df["annual_spend"],
df["orders_per_year"],
c=df["cluster"]
)
plt.xlabel("Annual Spend")
plt.ylabel("Orders per Year")
plt.title("Customer Segments from K-Means")
Each point represents one customer. The x-axis shows annual spend, the y-axis shows orders per year, and the color shows the cluster assignment.
This gives us a quick visual check on the segments. We can see clear differences between lower-spend, lower-frequency customers; frequent buyers; and higher-spend customers.
Keep in mind that the model used all of the selected features, not just the two shown here. This chart is a simplified view, but it helps make the cluster results easier to understand.

Conclusion and next steps
In this post, we used Python in Excel to build a practical k-means clustering workflow for customer segmentation. We selected numeric behavior fields, standardized the data, used the elbow method to choose a reasonable number of clusters, fit the model, and added the cluster labels back to the original dataset.
From there, we summarized the clusters and created a scatterplot to help interpret the results. The key point is that the cluster labels are not the final insight. The real value comes from understanding what each segment represents and connecting those findings back to business questions.
For example, one segment might represent high-value customers, another frequent but discount-sensitive buyers, and another less engaged customers. These groups can support follow-up analysis, marketing strategy, retention planning, or customer service decisions.
There are a few limitations to keep in mind. K-means requires us to choose the number of clusters in advance, the elbow method is useful but not perfectly objective, and the results depend heavily on which variables are included. The scatterplot is also only a simplified two-variable view of a model built from several customer behavior fields.
As a next step, you could test different values of k, try different feature combinations, or bring in descriptive fields like region or membership tier to better understand each segment. You could also use the final cluster assignments in PivotTables, charts, or additional Python analysis inside Excel.
