Exploratory data analysis (EDA) is the process of analyzing data to uncover trends, anomalies, and relationships without preconceived assumptions. It typically involves summarizing data with descriptive statistics, visualizing patterns with charts and plots, and spotting potential issues like missing values or outliers.
For Excel users, EDA is essential because a deep understanding of the data leads to better-informed decisions and fewer costly misinterpretations. While Excel’s familiar tools like PivotTables, charts and the Analysis ToolPak support basic EDA, they can fall short with larger or more complex datasets.
Python in Excel expands Excel’s EDA capabilities significantly, allowing users to leverage powerful data science libraries such as pandas and matplotlib directly within their spreadsheets. This enables sophisticated data manipulation and visualizations without ever leaving Excel. Copilot further simplifies EDA by automatically generating Python code and providing immediate insights, greatly reducing the need for coding expertise.
However, this ease and speed make strong analytical thinking even more crucial. Tools can’t replace judgment, and that’s why resources like The Data Detective by Tim Harford and Becoming a Data Head by Alex J. Gutman and Jordan Goldmeier are so valuable. They help Excel users think critically, spot biases, and responsibly interpret data insights.
Also keep in mind that all actions here will generate Python code. It’s highly beneficial to be fluent in basic Python and statistics to confidently vet and repurpose the outputs.
If you’ve never used Advanced Analysis with Copilot in Excel before, check out this post for the basics. One key takeaway: your data needs to be uploaded to OneDrive first.
Remember, too, that results from Advanced Analysis, like those from other generative AI tools, are probabilistic, so the output you get may differ from what is described here. You may even see some of these outputs from Copilot without explicitly requesting them, especially since this tool often kicks things off with exploratory analysis and visualizations. As every good analysis should!
Summarizing basic statistics
Let’s start with the foundational statistics. Sure, Excel can calculate basic statistics, but it quickly becomes cumbersome, particularly with non-adjacent columns. Moreover, Excel’s Analysis ToolPak isn’t dynamic, meaning it won’t update automatically when your data changes.
Here’s a prompt for Copilot:
“Summarize the basic statistics for the mpg dataset, including mean, median, mode, and range for key numerical features like mpg, cylinders, and horsepower.”

This prompt helps you quickly generate a high-level statistical summary of your data. Copilot even offers a concise narrative explanation you could easily adapt as the basis of a report or analysis.
And because the code here is live, all the values in your workbook update automatically whenever your data changes. (As for dynamically updating all the figures in your report, we’ll explore that another time… that could be another exciting use case for Python in Excel.)
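To vet this kind of output, it helps to know roughly what the underlying pandas code looks like. Here is a minimal sketch, not the code Copilot actually produced: the handful of rows below are illustrative stand-ins for the mpg dataset, and the `summarize` helper is a hypothetical name chosen for this example.

```python
import pandas as pd

# Illustrative stand-in for a few rows of the mpg dataset (made-up values)
df = pd.DataFrame({
    "mpg": [18.0, 15.0, 18.0, 16.0, 17.0, 26.0],
    "cylinders": [8, 8, 8, 8, 8, 4],
    "horsepower": [130.0, 165.0, 150.0, 150.0, 140.0, 95.0],
})


def summarize(frame: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
    """Return mean, median, mode, and range for the selected numeric columns."""
    subset = frame[columns]
    return pd.DataFrame({
        "mean": subset.mean(),
        "median": subset.median(),
        # mode() can return several values per column; keep the smallest one
        "mode": subset.mode().iloc[0],
        "range": subset.max() - subset.min(),
    })


summary = summarize(df, ["mpg", "cylinders", "horsepower"])
print(summary.round(2))
```

Because the columns are selected by name, it makes no difference whether they sit next to each other in the sheet, which is exactly where the built-in Excel tools get awkward.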
Checking missing values and data types
Excel struggles when handling missing values and data types, both of which are critical for accurate quantitative analysis.
Copilot can efficiently tackle this with the following prompt:
“Generate a summary of missing values and data types for each column in the mpg dataset.”

Copilot quickly summarizes missing values and data types, showing that the “horsepower” column has 6 missing values while other columns are complete. Knowing the data types, such as numeric (Float64, Int64) or text and categorical (ObjectDType), helps analysts choose appropriate methods for handling missing data, making data cleaning faster and more accurate.
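If you want to reproduce or double-check a summary like this yourself, a few lines of pandas are enough. The sketch below uses a tiny made-up table rather than the real mpg data, and the `overview` name is just a choice for this example.

```python
import numpy as np
import pandas as pd

# Made-up rows standing in for the mpg dataset; two horsepower values are missing
df = pd.DataFrame({
    "mpg": [18.0, 15.0, 25.0, 21.0],
    "cylinders": [8, 8, 4, 6],
    "horsepower": [130.0, np.nan, np.nan, 100.0],
    "name": ["car a", "car b", "car c", "car d"],
})

# One row per column: its data type, how many values are missing, and the share missing
overview = pd.DataFrame({
    "dtype": df.dtypes.astype(str),
    "missing": df.isna().sum(),
    "missing_pct": (df.isna().mean() * 100).round(1),
})
print(overview)
```

The exact dtype labels you see can vary with your pandas version, so treat the names as a guide rather than something to match character for character.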
Creating visualizations
Visualizations are central to effective exploratory data analysis (EDA). While Excel can produce basic charts, it often struggles with scaling visualizations across multiple categories or generating several charts at once.
This is where combining Python and Copilot within Excel provides a substantial advantage: you can quickly create detailed visualizations, like histograms for multiple variables, all at once and without manual setup:
“Create histograms for the distribution of mpg, horsepower, and weight across all cars in the dataset.”
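For a sense of what sits behind a prompt like this, here is a rough matplotlib sketch that draws one histogram per variable in a single figure. The data is invented for illustration, and the `Agg` backend line is only there so the snippet runs outside Excel without a display; it is an assumption of this example, not something Copilot necessarily emits.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Invented values standing in for the mpg dataset
df = pd.DataFrame({
    "mpg": [18.0, 15.0, 16.0, 24.0, 27.0, 31.0, 22.0, 14.0],
    "horsepower": [130.0, 165.0, 150.0, 95.0, np.nan, 67.0, 95.0, 220.0],
    "weight": [3504, 3693, 3433, 2372, 2130, 1950, 2833, 4354],
})

columns = ["mpg", "horsepower", "weight"]

# One subplot per variable, laid out side by side
fig, axes = plt.subplots(1, len(columns), figsize=(12, 3.5))
for ax, col in zip(axes, columns):
    # Drop missing values explicitly so they cannot distort the bins
    ax.hist(df[col].dropna(), bins=5, edgecolor="black")
    ax.set_title(f"Distribution of {col}")
    ax.set_xlabel(col)
    ax.set_ylabel("Number of cars")
fig.tight_layout()
```

Adding another variable is just a matter of extending the `columns` list, which is the kind of scaling that takes a lot of manual setup with native Excel charts.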

To compare fuel efficiency clearly across categories, box plots are an excellent choice because they visually summarize the median, variability, and presence of outliers within each group.
Use prompts like this to easily compare data across categories:
“Show box plots to compare fuel efficiency across different car manufacturers.”
The box plots shown above allow you to quickly compare MPG (miles per gallon) among the top 10 car manufacturers, making it easy to identify brands that consistently offer better or worse fuel efficiency, as well as those with more variability. With Python and Copilot integrated directly into Excel, creating these insightful visualizations is straightforward and requires minimal manual effort.
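A hedged sketch of how such a chart could be built by hand follows. It assumes the manufacturer is the first word of a `name` column and keeps only the most frequent manufacturers; the rows, the `top_n` value, and the variable names are all illustrative choices, not what Copilot generated.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Invented rows standing in for the mpg dataset
df = pd.DataFrame({
    "name": ["ford pinto", "ford maverick", "ford torino", "ford mustang",
             "toyota corolla", "toyota corona", "toyota celica", "honda civic"],
    "mpg": [26.0, 21.0, 17.0, 18.0, 31.0, 24.0, 32.0, 33.0],
})

# Assumption: the manufacturer is the first word of the car name
df["manufacturer"] = df["name"].str.split().str[0]

# Keep the manufacturers with the most cars (the post's chart uses 10; 2 suits this tiny sample)
top_n = 2
top = df["manufacturer"].value_counts().head(top_n).index

# One array of mpg values per manufacturer, in the same order as `top`
groups = [df.loc[df["manufacturer"] == m, "mpg"].dropna().to_numpy() for m in top]

fig, ax = plt.subplots(figsize=(6, 4))
result = ax.boxplot(groups)
ax.set_xticks(range(1, len(top) + 1))
ax.set_xticklabels(list(top))
ax.set_xlabel("Manufacturer")
ax.set_ylabel("mpg")
ax.set_title("Fuel efficiency by manufacturer")
fig.tight_layout()
```

Limiting the chart to the largest groups keeps the boxes meaningful, since a box plot of one or two cars says very little about a brand.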
Diving deeper with correlation
Once comfortable with your data, you might transition toward confirmatory analysis or predictive modeling. Perhaps you’re curious about identifying which vehicle characteristics strongly predict fuel efficiency (mpg) or understanding how these factors relate to each other.
Start examining these relationships easily with prompts like:
“Use Copilot to calculate the correlation matrix for mpg, cylinders, displacement, horsepower, and weight.”

Correlation matrices, like the one generated above with Copilot, quickly highlight relationships by displaying correlation coefficients between pairs of variables. For example, we see that mpg is strongly negatively correlated with weight, displacement, horsepower, and cylinders.
This means that heavier vehicles, and those with larger engines or more cylinders, typically have lower fuel efficiency. Such insights guide your analysis toward selecting meaningful predictors for building regression models or further confirmatory tests.
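The calculation itself is short in pandas. This sketch uses invented values in place of the real mpg data and relies on `DataFrame.corr()`, which computes Pearson correlations by default; the `with_mpg` name is just for this example.

```python
import pandas as pd

# Invented values standing in for the mpg dataset
df = pd.DataFrame({
    "mpg": [18.0, 15.0, 16.0, 24.0, 27.0, 31.0, 22.0, 14.0],
    "cylinders": [8, 8, 8, 4, 4, 4, 6, 8],
    "displacement": [307.0, 350.0, 304.0, 113.0, 97.0, 79.0, 198.0, 454.0],
    "horsepower": [130.0, 165.0, 150.0, 95.0, 88.0, 67.0, 95.0, 220.0],
    "weight": [3504, 3693, 3433, 2372, 2130, 1950, 2833, 4354],
})

cols = ["mpg", "cylinders", "displacement", "horsepower", "weight"]

# Pairwise Pearson correlation coefficients between the selected columns
corr = df[cols].corr()
print(corr.round(2))

# Rank the other variables by how strongly they move with mpg
with_mpg = corr["mpg"].drop("mpg").sort_values()
print(with_mpg.round(2))
```

Keep in mind that a correlation coefficient only captures linear association, so a scatter plot of any pair you plan to model is still worth a look.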
Once you’ve identified important relationships using the correlation matrix, a natural next step is visualizing these correlations to quickly detect patterns or clusters. A heatmap is ideal for this: colors intuitively highlight strong positive or negative relationships, making it easy to interpret complex correlation matrices at a glance.
For example, you can ask Copilot:
“Create a heatmap to visualize the correlation matrix for mpg, cylinders, displacement, horsepower, and weight.”
The heatmap vividly displays strong negative correlations (shown in blue) between mpg and variables like weight, horsepower, displacement, and cylinders. Meanwhile, it reveals strong positive relationships (shown in red) among these predictors themselves. This visual clarifies that as cars get heavier or have larger engines, fuel efficiency decreases, making it easier to select variables for predictive modeling.
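One way to draw a comparable heatmap with plain matplotlib is sketched here. The data is invented, and the `coolwarm` colormap is an assumption picked to match the blue-for-negative, red-for-positive scheme described above; Copilot may well use a different library or palette.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Invented values standing in for the mpg dataset
df = pd.DataFrame({
    "mpg": [18.0, 15.0, 16.0, 24.0, 27.0, 31.0, 22.0, 14.0],
    "cylinders": [8, 8, 8, 4, 4, 4, 6, 8],
    "displacement": [307.0, 350.0, 304.0, 113.0, 97.0, 79.0, 198.0, 454.0],
    "horsepower": [130.0, 165.0, 150.0, 95.0, 88.0, 67.0, 95.0, 220.0],
    "weight": [3504, 3693, 3433, 2372, 2130, 1950, 2833, 4354],
})
cols = list(df.columns)
corr = df.corr()

fig, ax = plt.subplots(figsize=(6, 5))
# Pin the color scale to [-1, 1] so zero correlation sits at the neutral midpoint
im = ax.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
ax.set_xticks(range(len(cols)))
ax.set_xticklabels(cols, rotation=45, ha="right")
ax.set_yticks(range(len(cols)))
ax.set_yticklabels(cols)

# Write each coefficient inside its cell so the chart can be read without the color bar
for i in range(len(cols)):
    for j in range(len(cols)):
        ax.text(j, i, f"{corr.iloc[i, j]:.2f}", ha="center", va="center")

fig.colorbar(im, ax=ax, label="Pearson correlation")
fig.tight_layout()
```

Fixing the scale at -1 to 1 matters: if the colors stretch to fit the data instead, a moderate correlation can look as intense as a near-perfect one.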
Conclusion and next steps
In this post, we’ve explored how integrating Python and Copilot within Excel expands your EDA capabilities. The integration simplifies summarizing statistics, checking missing values and data types, and creating visualizations, and it makes correlations easier to identify and interpret. Tools like correlation matrices and heatmaps help you quickly pinpoint key relationships, leaving you better prepared for predictive modeling.
As you move forward, you might explore building predictive models like linear or logistic regression to forecast outcomes based on identified relationships. You could also consider clustering techniques to segment your data into meaningful groups or even venture into advanced machine learning methods such as random forests or neural networks for deeper insights. Additionally, integrating simulation modeling or hypothesis testing within your workflow could further strengthen your analytical toolkit.
Ultimately, integrating Python and Copilot into Excel enhances both your efficiency and the quality of your insights, empowering you to make truly data-driven decisions.
How did this exploration of Python-powered EDA with Copilot in Excel resonate with you? Are there other analytical scenarios or techniques you’re curious about? Let me know in the comments.