Copilot in Excel: How to do principal component analysis with Python

There’s more data than ever before, so wouldn’t it be nice if you could squeeze the meaning out of all your variables into just a few that really matter? That’s exactly what Principal Component Analysis (PCA) is designed to do. It helps you reduce complex datasets into a smaller set of meaningful components that capture most of the variation in the data. And now that Python is directly available in Excel, performing PCA has become simpler than ever—even if you’re not a data scientist.

In this post, we’ll demonstrate PCA using the wine quality dataset as posted on the UC Irvine Machine Learning repository (We’ll use the red variants dataset.). This dataset is ideal for beginners because it has just the right number of variables (about a dozen) to clearly illustrate PCA without becoming overwhelming. Additionally, all variables in the dataset are numeric and measured on similar scales, such as acidity, alcohol content, and pH, which makes them well-suited to PCA.

And while we’ll use wine as our case study, PCA is widely applicable across many industries. In finance, analysts use PCA to identify patterns in stock movements or compress multiple risk indicators into key factors. In market research, PCA helps group customer behaviors and distill survey data into core attitudes or preferences. Even manufacturing leverages PCA to monitor quality control metrics and quickly detect when something is off.

To keep this post focused on understanding, interpreting, and applying PCA—not just executing code—we’ll utilize the Advanced Analysis with Python features of Copilot:

How to get started with Advanced Analysis with Python for Copilot in Excel

You can follow along with the exercise file below:

Download the exercise file here

Get an overview of PCA with Microsoft Copilot

Before diving into the nitty-gritty of running PCA, keep in mind that Copilot is trained on a huge range of information… not just Excel and Python code. So it’s helpful to start by asking some higher-level questions, like:

“Explain Principal Component Analysis in simple terms for a finance analyst who primarily uses Excel.”
“What are some common FP&A scenarios where PCA might be useful?”
“List the main benefits and limitations of using PCA on financial data.”

Also, remember that the canvas in Copilot for Excel is relatively small and less suited for broader conceptual conversations. If you need to tackle bigger-picture questions or discussions, consider using Copilot Notebook instead. Copilot in Excel is best for hands-on data operations and can support some analysis, but for deeper reasoning and exploration, the Notebook environment is a better fit.

Check out this post for more on using Copilot Notebooks:

How to get detailed Excel help with Copilot Notebooks

Prepare the dataset for PCA

Now that you’re feeling more confident about PCA, let’s dive into the actual analysis in Excel. First, make sure your dataset is uploaded to OneDrive and launch Advanced Analysis (check out our previous post if you need a refresher).

Before jumping straight into PCA, it’s a good practice to start with some exploratory data analysis such as checking for missing values, reviewing summary statistics, plotting histograms, or creating a quick correlation matrix. Copilot in Excel (especially through the Advanced Analysis pane) can help you easily generate and interpret these initial insights.

Now, let’s ask Copilot to help us prepare our dataset for PCA:

Show me how to prepare this dataset before running PCA.

Copilot handles this pretty well… and remember, all the code is right there for you to review and understand step-by-step. But keep in mind that Copilot sometimes makes judgment calls. For instance, here it automatically separated the “quality” variable from the rest of the data, assuming you’d use it as a target variable for predictions. Typically, PCA excludes these target variables because PCA results often feed into predictive models. So stay engaged, read closely, and double-check what Copilot is up to. Good pilots always stay alert, after all! 😉

Conduct the analysis

Now that we’ve standardized our data, let’s have Copilot run the PCA and help us make sense of the results.

Perform PCA on the standardized data and explain the output.

We get some initial insights here with the variance ratios for each principal component, but a solid PCA doesn’t stop there. Like most data analyses, adding context through summary tables and visualizations will give us a much clearer picture of what’s going on and what actions we can take.

That means we might have to look beyond just the initial output from Advanced Analysis. Ready to dig a bit deeper? Let’s do it!

Interpret the results

Let’s kick things off by asking Copilot a higher-level question:

Given the explained variance of each principal component above, how many components should I use, and how do I decide?

Copilot in Excel may or may not give us a great answer here—I’ve tried this a few times and found the guidance a bit underwhelming. This is a good moment to escalate to Microsoft Copilot and Copilot Notebooks if you need stronger support. Typically, they’ll recommend looking at eigenvalues or using scree plots to help guide your decisions. So, let’s see if Advanced Analysis can at least produce those outputs for us…

Can you show me the scree plot of these results?

We do, in fact, get a scree plot of the eigenvalues, and from this, we could apply the well-known “elbow rule” to suggest keeping two components. Not sure what the elbow rule is? Keep prompting Copilot for help! You can even screenshot your Excel output and upload it directly into your Copilot Notebook. Remember, because you’re staying within your Copilot license, it’s all secure so no worries about sensitive data.

Next, let’s check out the actual factor loadings from the PCA:

Can you give me the factor loadings of PC1 and PC2?

Copilot will probably handle this just fine, but you might notice that the “Preview” it returns in your Excel workbook cuts off halfway through the variables. Here’s where it helps to have some basic comfort with the underlying Python code—since you can quickly dive into the output, grab the full dataframe, and print the complete table somewhere else.

Now, at this point, you might be feeling pretty lost… even with Copilot’s help. You might be wondering, “Why exactly are we doing this analysis? What’s the point?”

Let’s step back and revisit the big idea behind PCA: we’re boiling down a bunch of variables into just a few. We’re making decisions about how many variables we actually need, keeping it lean and meaningful. If you’ve ever selected variables for a regression model, you’re already on the right track. It’s all about getting the most value from the fewest variables; that is, parsimony.

So, what did that scree plot actually tell us? By visualizing the eigenvalues, the scree plot helped us spot an “elbow,” the point at which additional components don’t significantly add much more explanatory power. For our wine dataset, the plot suggested keeping two principal components as a good balance between complexity and insight.

But why stop there? Now we want to look at the actual component loadings, or the weight each original variable contributes to our new principal components. This helps us understand what these components actually represent.

I’ll ask Copilot:

Visualize PC1 and PC2 component loadings, sorted by absolute magnitude, from high to low.

Take a look at the plots above, sorted by their absolute loading values. For PC1, the variables “fixed acidity,” “citric acid,” “pH,” and “density” dominate. Notice a pattern? These are variables closely related to acidity and chemical balance—so PC1 is primarily capturing characteristics related to acidity.

On the other hand, PC2 is strongly driven by “total sulfur dioxide” and “free sulfur dioxide,” followed by “alcohol.” These components largely represent chemical preservatives and alcohol content: variables important for wine preservation and strength.

Summing it up: our PCA has distilled the red wine dataset into two meaningful components. PC1 captures acidity and chemical balance, while PC2 highlights preservatives and alcohol. Now you’re not just juggling numbers, you’re seeing clear patterns and relationships.

Feeling more confident? Good, because interpreting PCA is a crucial step towards smarter, data-driven decisions. Let’s keep going with the prompt:

What would be some next steps in this analysis? What could I do with these results for interpretation or for further analysis?

I received the following from Copilot, which was pretty helpful!

With the PCA results, you can interpret which features drive the main patterns in your data by examining the loadings. Next, you could visualize your samples in the space of the first two principal components to look for natural groupings or outliers. You could also explore how wine quality relates to these components, or use the principal components as new features in a machine learning model to predict wine quality or classify wines.

For even more detailed guidance, consider following up with Microsoft Copilot. You might also reference your Excel workbook with the data to provide Copilot with additional context.

Conclusion

In conclusion, PCA lets you boil down a heap of variables into just a few key factors, simplifying the story your data is trying to tell. By spotlighting the most important sources of variation, PCA helps you quickly extract valuable insights, clarify your business decisions, and tackle even the most complicated datasets with confidence.

In this post, we walked through exactly how easy PCA can be when you combine the power of Excel, Python, and Copilot. No fancy data science degrees needed here. Excel’s user-friendly environment, boosted by Copilot’s Python integration, makes this advanced technique surprisingly straightforward to implement.

But PCA isn’t perfect: it assumes linear relationships and numeric data, so it won’t capture more complex, nonlinear interactions well. Plus, since PCA standardizes your data, important nuances can sometimes be lost, especially if your features vary widely in scale. Also, PCA results can be a bit abstract, so it’s critical to tie them back to real-world business meaning.

Still, you’re in great shape to use PCA effectively. You could explore visualizations to uncover hidden patterns, use principal components as features in machine learning models, or connect your findings directly to key business tasks like customer segmentation, forecasting, or risk management.

Got questions or want to dive deeper? Drop a comment below or get in touch.

Copilot in Excel: How to do principal component analysis with Python

Get an overview of PCA with Microsoft Copilot

Prepare the dataset for PCA

Conduct the analysis

Interpret the results

Conclusion

Like this:

Related

Want more Excel + AI insights? Join my newsletter.

Thank you for signing up. I look forward to becoming savvier about data with you.

Leave a ReplyCancel reply

Get an overview of PCA with Microsoft Copilot

Prepare the dataset for PCA

Conduct the analysis

Interpret the results

Conclusion

Share this:

Like this:

Related

Want more Excel + AI insights? Join my newsletter.

Thank you for signing up. I look forward to becoming savvier about data with you.

Reader Interactions

Leave a ReplyCancel reply