In previous posts, I’ve discussed the Matplotlib package for plotting, particularly in the context of Python in Excel. In this post, I’ll demonstrate some quick wins for creating business-focused plots that are challenging to build directly in Excel. You can follow along using the exercise file below:
Let’s get started with a Gantt chart.
Gantt chart
A Gantt chart is a valuable visualization tool for managing project schedules. It represents tasks or activities as horizontal bars, with the bar’s length corresponding to the task’s duration. Gantt charts are widely used by analysts to plan projects, monitor progress, communicate timelines to stakeholders, and assess resource allocation.
Creating Gantt charts in Excel can be challenging due to the lack of a native feature. Building one typically requires extensive customization or using a specific template. Moreover, making dynamic updates to these charts can be time-consuming, and Excel’s interactivity is limited compared to tools like Matplotlib.
The Python code provided demonstrates how to create a Gantt chart in Matplotlib using the data from the gantt
worksheet:
Here’s how it works: the date2num
function converts datetime objects into numerical values that can be used for plotting.
For each task, a horizontal bar is created with the ax.barh
function. This function takes the task’s vertical position, the bar’s width (calculated as the difference between the end and start dates), and the starting position converted to numerical format. The bars are styled with a specified height and color for clear visualization.
The y-axis is labeled with task names, while the x-axis is formatted to display dates, making it easier to interpret the timeline. To enhance readability, the chart is further customized with axis labels and a title, ensuring it is both informative and visually appealing.
Below is the resulting Gantt chart:
Waterfall chart
A waterfall chart is a powerful visualization tool that illustrates how an initial value changes through a series of positive or negative adjustments to arrive at a final total. Commonly used for financial analysis, performance tracking, or breaking down contributing factors, waterfall charts effectively display cumulative effects across time or categories.
Although Excel has included a native waterfall chart option since 2016, let’s explore how to construct one in Matplotlib to take advantage of Python’s customization capabilities, using the data in the waterfall
worksheet. The process begins with setting up helper columns to prepare the data for visualization.
In the Python code provided, the DataFrame waterfall_df
is structured for plotting by calculating cumulative values for a running total and a “Previous” column to define the baseline for each bar. The cumulative values determine the top of each bar, while the “Previous” column offsets them downward to establish the starting position. This ensures the data aligns correctly for the waterfall chart.
The chart itself is built by iterating through the waterfall_df
DataFrame. The plt.subplots
function initializes the figure and axes, setting an appropriate figure size. In the first loop, each bar is plotted using ax.bar
. The bar height corresponds to the “Value” column, the starting position comes from the “Previous” column, and the color is defined in the “Color” column. To make the bars distinct, an edge color is added.
The second loop adds value labels to the bars. The midpoint of each bar is calculated using the “Previous” value and half the bar height (Value / 2
). These labels are displayed with a “+” sign for positive values, ensuring clarity. The labels are then positioned horizontally and vertically centered on the bars.
A dashed horizontal reference line at zero is added with ax.axhline
to separate positive and negative values visually. To finalize the chart, titles and axis labels are added with ax.set_title
, ax.set_ylabel
, and ax.set_xlabel
, and the x-axis tick labels are slightly rotated for improved readability using plt.xticks
. The plt.tight_layout
function is used to adjust spacing, ensuring all elements fit neatly within the chart.
Here’s the resulting waterfall chart:
Dumbbell chart
Lastly, let’s explore the dumbbell chart, using the data provided in the dumbbell worksheet. This visualization effectively compares two values across multiple categories, highlighting their differences. Its design resembles a pair of dumbbells, with two markers connected by a line for each category. Dumbbell charts are particularly useful for showcasing before-and-after values, changes over time, or differences between two conditions.
Although creating a dumbbell chart in Excel is possible with some manual effort, there’s no native feature to support it. Let’s see how Matplotlib simplifies this process, using data from the dumbbell
worksheet.
In the given code, the typical figure (fig
) and an axis (ax
) are initialized with plt.subplots
, setting the size of the chart. The chart is then initialized with plt.subplots
, specifying a figure size for clarity.
To enhance the visual design, an offset is introduced for the connecting lines using the variable line_offset
. This ensures that the gray lines linking “Before” and “After” values stop just short of the dots, creating a cleaner look. These lines are drawn horizontally using hlines
, where the vertical positions correspond to the dataset’s indices, and the adjusted “Before” and “After” values determine the horizontal endpoints.
The dots representing the “Before” and “After” values are plotted using scatter
with an increased size (dot_size
) for better emphasis. The dots for “Before” values are colored blue, while those for “After” are green. This distinction helps the viewer quickly differentiate between the two datasets.
To provide additional context, numeric labels are added directly inside the dots. This is achieved with the text
function, which places the “Before” and “After” values as text at their respective positions. The labels are styled with a bold white font to ensure visibility against the colored dots.
For readability, the y-axis tick labels are replaced with the names of the categories, and the x-axis is labeled to indicate the values being compared. The vertical limits of the chart are adjusted using set_ylim
to ensure sufficient spacing above and below the data points, preventing a cramped appearance. The result is as follows:
Conclusion
In conclusion, Matplotlib opens up a world of possibilities for Excel users to create dynamic, professional, and visually appealing charts that go beyond Excel’s native capabilities. With Python in Excel, you can now combine the accessibility of Excel with the flexibility and power of Python, making it easier than ever to enhance your data visualizations.
What questions do you have about creating Matplotlib plots or data visualization in Python in Excel? Let me know in the comments.
Leave a Reply