In a previous post, we explored the plotnine package in Python in Excel and how its implementation of the Grammar of Graphics lets you build a wide variety of plots from a small set of composable pieces.
In this post, we’ll focus on how to craft layered (i.e., multiple elements in a single plot) and faceted (i.e., small multiples) plots. These capabilities are particular strengths of Plotnine and represent areas where typical Excel users can achieve significant gains. You can follow along with the exercise file below:
Layered regression plot
First, let’s analyze the relationships between body mass and flipper length by species, using a regression line for each group. Plotting separate regression lines for each species provides a more accurate representation of the relationship between flipper length and body mass within each group. In contrast, a single regression line across all species could obscure these distinctions and misrepresent the trends specific to individual species.
In the following code, we add a linear regression fit line for each group by including method='lm' in the geom_smooth() function, which creates the smoothing line geometry. The key to splitting the plot by species is mapping the color aesthetic to species in the overall plot aesthetics, so that every layer, including the fit lines, is grouped and colored by species.
Faceted boxplot
Next, we’ll create a faceted boxplot. Faceting refers to creating small multiples—a series of similar plots displayed side by side, each showing a subset of the data. In this example, each subplot, or facet, corresponds to a different species, allowing for easy comparison of bill length by sex across species.
In Plotnine, the function facet_wrap('~species') creates a separate panel for each unique value of the species variable. The ~ symbol is shorthand for “by,” so ~species can be interpreted as “facet by species.” The facet_wrap() function arranges these small multiples in a grid, with each panel displaying data for a single species. By default, Plotnine determines the layout based on the number of facets.
Faceting provides several advantages. By segmenting the data into smaller, comparable groups, it becomes easier to identify differences and similarities across categories. This approach avoids clutter by organizing the data into separate, tidy visualizations, ensuring clarity. Each subplot isolates trends within a single group, making them more apparent and eliminating distractions from other categories.
Layered histogram and density
Next, we’ll create a plot that combines a histogram and a density plot to display the distribution of body mass for different penguin species. Adding both a histogram and a density plot to the same visualization can be incredibly helpful for understanding the data. The histogram provides an intuitive, binned representation of the frequency of data points, while the density plot overlays a smoothed curve that represents the underlying distribution of the data. Together, they offer a comprehensive view: the histogram reveals specific counts within intervals, and the density plot highlights overall trends and patterns that might not be as obvious in the histogram alone.
In this code, the aes(y='..density..') mapping in the geom_histogram() layer rescales the histogram to match the density plot. Instead of showing raw counts, the histogram is normalized so that the bar areas for each species sum to 1, which puts it on the same scale as the density curve. The position='identity' parameter overlays the histogram bars for each species on top of one another, rather than stacking them, so each species' own distribution stays visible. The alpha parameter, which controls the transparency of the layers, is particularly useful here. By setting alpha=0.3 for the histogram and alpha=0.5 for the density plot, you can see overlapping elements without any one layer obscuring the others.
Faceted regression plot
In an earlier example we created a layered regression plot, where all species shared the same axes and data points were differentiated by color. This provided a unified view of the data, allowing for direct comparison of relationships across groups in a single plot.
This example instead uses faceting, splitting the plot into separate panels for each species. This is achieved with the facet_wrap('~species') function, which creates a grid of plots, each focusing exclusively on one species. The ~ symbol specifies faceting “by” species, making it the key line of code that transitions the plot from layered to faceted.
Layering and faceting each have their pros and cons in this context. A layered plot puts every group's regression trend on the same axes, making it easier to detect overarching patterns or differences in slopes. However, it can become visually cluttered, especially if the groups overlap heavily or span very different ranges, making individual trends harder to discern.
In contrast, faceting separates the data into individual panels, reducing clutter and providing a clearer view of each group’s regression trend. This makes it easier to focus on within-group relationships without distraction from other groups. The downside is that faceting requires viewers to compare across separate panels, which can make subtle differences between groups less obvious.
Faceted density plot
Last but not least, we’ll create a faceted density plot that visualizes the distribution of penguins’ body mass across different islands, with the density curves differentiated by species. The mechanics of the code combine density plots and faceting to highlight patterns within the data effectively.
By using faceting, the plot separates the data by island, making it easy to identify location-specific patterns in body mass distributions. This avoids the visual clutter that might occur in a single, layered plot, especially if there are substantial differences between islands. The use of transparency with alpha ensures that overlapping curves are distinguishable, enhancing the interpretability of the plot without sacrificing detail.
Conclusion
Plotnine’s Grammar of Graphics framework makes layered and faceted plots standout features for Python in Excel. By breaking visualizations into reusable components, Plotnine simplifies the process of creating advanced charts that seamlessly combine trends, comparisons, and distributions—all within a single workflow. These plots not only improve efficiency but also enhance the clarity and depth of your data storytelling, making them invaluable for uncovering insights and presenting findings effectively.
What questions do you have about layered and faceted plots in Plotnine specifically, or about data visualization with Python in Excel more broadly? Let me know in the comments.