Python’s incorporation into Excel offers data analysts robust new tools, particularly in the realm of data visualization. This blog post introduces Matplotlib, a core Python library for crafting visualizations. We’ll begin with the essential components, explore Matplotlib’s customization features, and discuss the benefits of using Pandas DataFrames as your data source for Matplotlib when working with Python in Excel—though this approach isn’t a panacea.
To dive in, download the exercise file here:
The building blocks of Matplotlib
We’ll dive into creating some basic Matplotlib plots using simple datasets shortly. Before that, it’s important to grasp the structure of Matplotlib. Similar to how Excel charts consist of various elements, Matplotlib visualizations are constructed from several essential components:
pyplot
: This is Matplotlib’s high-level interface, functioning similarly to Excel’s chart toolbar. It provides tools to quickly create and customize plots. In Python in Excel, this interface is typically imported with the common aliasplt
, allowing you to use its functions conveniently.- Figure (
fig
): The overall canvas that contains one or more plots, comparable to the chart area in Excel. The figure acts as the foundation for your visualization, encompassing all elements of your plot. - Axes (
ax
): The specific region within the figure where the data is plotted. This includes the x-axis, y-axis, gridlines, and the plot itself. It is analogous to the plot area in an Excel chart, where the main visual representation of your data appears.
When these components are brought together, you might see code like this:
fig, ax = plt.subplots()
This command creates a blank canvas—an empty figure and axes. While no data is plotted yet, it’s fully prepared for you to add and customize your visualization.
Let’s explore a more comprehensive example of creating a plot using Matplotlib. We’ll start with the same basic structure and then add layers of data to enhance it:
This will produce a simple line chart in your workbook:
Let’s break down the steps of what was done here.
- First, we used
plt.subplots()
to generate a figure (fig
) and one or more axes (ax
). - Next, we used the
ax.plot()
method to visualize data on the axes. - Finally, we added titles, labels, legends, and gridlines using methods like
ax.set_title()
andax.legend()
.
Python in Excel leverages an IPython kernel, which simplifies working with Matplotlib compared to standalone Python scripts. This means there’s no need to use plt.show()
to explicitly render a plot as you might see in some online tutorials. Plots are displayed automatically as soon as the code is executed. Additionally, the matplotlib.pyplot
module is pre-imported in Python for Excel, allowing you to use functions like plt.subplots()
immediately, without requiring an import statement.
fig, ax
isn’t always used
In many tutorials or examples, you might not see the fig, ax
structure explicitly. That’s because matplotlib.pyplot
also provides an interface that implicitly manages figures and axes behind the scenes. This allows for quick and simple visualizations, which can be especially useful for straightforward plots.
For instance, the following code creates a line plot without explicitly defining fig
and ax
:
This will return the same basic line chart as produced in the previous example.
Changing chart types in Matplotlib
What if you wanted to switch from a line plot to something else? In Matplotlib, the plot()
function generates a line chart by default. To display a different plot type, you can use other Matplotlib functions. In this example, we’ll use fig
and ax
to showcase a variety of plot types, each displayed in a separate subplot. We’ll also specify the exact location of each plot within a 2×2 grid.
This approach lets us create a mini-dashboard with just a few lines of code.
Matplotlib and Pandas DataFrames
While we’ve been using lists as data inputs for Matplotlib so far, in Python for Excel, you’ll often work with DataFrames, as they are the default data storage type. Using DataFrames as inputs offers several key advantages, especially when paired with Matplotlib.
One major benefit is the seamless integration between Pandas and Matplotlib, which allows you to easily generate a wide range of visualizations directly from your DataFrame. Since Pandas is built on top of Matplotlib, its .plot()
API simplifies the process of creating charts. This consistency in using a DataFrame structure for different chart types makes your workflow more intuitive and efficient.
Here’s a straightforward example of creating a basic line chart. The sales data is located in the sales_df
worksheet of the exercise file. We’ll load this data into Python within Excel, and then you’ll see how simple the code is for generating the line chart. We just specify the x and y axis variables from the DataFrame within the arguments, and then set a title for the chart.
While visualizing a DataFrame, many customizations can be achieved using the plot()
function or with some assistance from pyplot
. However, by defining the fig
and ax
, you gain access to a broader range of customization options.
It’s worth noting that some options, like setting the y-axis label, could also be done using pyplot()
, leading to some redundancy. Matplotlib’s extensive customization capabilities are both a strength and a complexity; it offers numerous ways to modify your plots, sometimes making it feel overwhelmingly customizable.
Conclusion
If you’re interested in data visualization with Python in Excel, starting with Matplotlib is a wise choice. As indicated by Google Trends, it’s one of the oldest and most widely used data visualization libraries available for Python in Excel, surpassing other data visualization packages in both age and popularity.
(If the visualization below doesn’t load, try refreshing the page or visit the page directly using this link.)
At the same time, it can be quite overwhelming and confusing with its layered approach and manual plot construction. While the Pandas API simplifies creating some basic plots directly from DataFrames, it doesn’t cover everything. So, it’s essential to get comfortable with Matplotlib, but also explore other packages that might suit the specific needs of Excel users who often work with tabular data.
Do you have any questions about the basics of Matplotlib or about using Python for data visualization in Excel more broadly? Let me know in the comments.
Leave a Reply