Python in Excel offers powerful tools for creating impactful visualizations, and one standout option is the plotnine package.
In this post, we’ll delve into the origins and philosophy of plotnine, explore its core building blocks, and demonstrate some easy and impressive visualizations you can create—perfect for Excel users looking to enhance their data storytelling. To follow along, download the exercise file below:
The building blocks of Plotnine
Plotnine, inspired by the ggplot2 library in R, is a Python library built on the Grammar of Graphics philosophy. This approach simplifies visualization by breaking down the process into clear, reusable components that define what you want to plot and how you want to display it. Here are the key building blocks of Plotnine:
Data
The foundation of any Plotnine visualization is your dataset, typically a Pandas DataFrame. The data is mapped to visual elements (like points or lines) using aesthetics, such as x, y, color, and size. This close integration with DataFrames makes Plotnine ideal for Excel users who already organize their data in tabular formats.
Aesthetic Mappings (aes)
The aes()
function defines how variables in your dataset are mapped to visual properties of the plot. For instance, you might map one column to the x-axis, another to the y-axis, and yet another to the color or size of the points. This is akin to choosing which fields to visualize when creating an Excel chart. Examples:
- Assigning colors based on a categorical variable (
color=category_column
) - Mapping a column to the x-axis (
x=column_name
)
Geometric Objects (geoms)
Geometric objects define the type of plot you want to create—scatterplots, line charts, bar charts, etc. Each geom_
function represents a different visual layer. For instance:
geom_point()
: Scatterplotgeom_line()
: Line chartgeom_bar()
: Bar chart
You can layer multiple geoms on the same plot to add complexity, such as adding trendlines or annotations.
Plotnine, like ggplot2, allows for a variety of additional layers to enhance your plots, including scales, themes, and statistical transformations. A comprehensive overview of these can be found on the Quebec Centre for Biodiversity Science website. While this resource is focused on ggplot2 in R, the principles directly translate to plotnine in Python.
Plotnine refactors the ggplot2 package into Python by implementing a similar Grammar of Graphics framework. It provides a Pythonic API that mirrors ggplot2’s syntax, while integrating seamlessly with Pandas for data manipulation. This layered structure allows users to systematically combine data, aesthetic mappings, and geometric objects, making it easy to create sophisticated and customizable visualizations.
For Excel users, plotnine offers a bridge to powerful, scriptable data visualization, leveraging familiar concepts in a more flexible programming environment. By learning plotnine, you can unlock new ways to present and analyze your data with clarity and impact.
Data and package import
Let’s start by loading two datasets into Python in Excel. I’ll also import various components from plotnine to craft and customize visualizations. Each import, like geom_point
or theme_minimal
, adds specific plotting or styling features, enabling the creation of complex plots.
Let’s dive in! We’ll explore a variety of common plots used in data analysis, ranging from those that are relatively straightforward to create in base Excel to others that may require a bit more effort.
Scatterplot
Let’s start with a scatterplot. The aes()
function defines the plot’s aesthetics: x
maps to the “Advertising_Spend” column, y
maps to “Monthly_Sales,” and color
assigns different colors to points based on the “Store” column, enabling visual comparisons across stores. The geom_point()
function adds the data points, while labs()
customizes the plot’s title and axis labels.
Although creating a scatterplot in Excel is straightforward, conditionally coloring points by a category, like “Store,” can be cumbersome. In contrast, with the grammar of graphics, it’s as simple as mapping “Store” to color
in the aesthetics.
Line plot
Now let’s create a line plot. This time, we use both geom_line
and geom_point
to add lines connecting the data points and to highlight each individual point on the plot. We also apply theme_classic
for a clean, minimalist chart style. As before, we’ll add chart and axis labels to keep the visualization clear and informative.
If you’d like to customize features like line colors, point styles, or other elements, the grammar of graphics makes these adjustments simple and systematic.
KDE plot
Next, let’s use a Kernel Density Estimate (KDE) plot to visualize the distribution of advertising spend in the dataset. Like a histogram, a KDE plot illustrates the distribution of a variable, but instead of grouping data into discrete bars, it uses a smooth curve to estimate the probability density, offering a more continuous view of the data.
In this code, geom_density()
generates the KDE plot and fills it with a semi-transparent purple (fill='purple', alpha=0.5
) to enhance visibility.
Violin plot
Next, let’s use a violin plot to visualize the distribution of customer ratings by store. A violin plot is similar to a boxplot in that it displays the range, center, and spread of a dataset, but it also incorporates a Kernel Density Estimate (KDE) to show the full distribution shape on either side of the plot, making it easier to see patterns like multimodality or skewness.
In this code, geom_violin()
creates the violin plot, with draw_quantiles=[0.5]
adding a line to represent the median of each distribution. The fill='Store'
aesthetic colors each violin by store, and alpha=0.7
makes the fill semi-transparent for better visual appeal.
Bar plot
Last but not least, let’s explore the familiar bar plot. Here, we visualize the average monthly sales for each store. The data is first preprocessed by calculating the mean monthly sales for each store using groupby
and mean()
, then sorted in descending order of sales for clarity.
Using the preprocessed data, the ggplot
function defines the plot, and geom_bar(stat='identity')
creates the bar plot, displaying exact sales values rather than counts. For visual enhancement, scale_fill_brewer()
applies a sequential blue color palette, with the order reversed using direction=-1
. The theme_classic()
function ensures a clean and simple design, while theme(axis_text_x=element_text(rotation=45, hjust=1))
rotates the x-axis labels for improved readability.
Conclusion
Plotnine brings the power of ggplot2’s Grammar of Graphics into Python, offering Excel users a way to create visualizations that transcend the boundaries of traditional charting tools. Whether you’re crafting scatterplots, violin plots, or more complex visuals, plotnine’s flexibility and depth make it easy to transform raw data into clear, compelling stories.
While mastering the Grammar of Graphics requires an investment of time, the rewards are well worth it. By understanding its building blocks, you unlock a versatile system of interchangeable parts, enabling you to design virtually any visualization you can imagine—all while working seamlessly alongside Excel.
Have questions about using plotnine or data visualization with Python in Excel? Drop them in the comments, and happy plotting!
Leave a Reply