For those interested in learning Python for Excel, a key area to explore is time series analysis. In particular, Python’s Pandas package is specifically designed to handle time series data effectively, often much better than Excel.
In this post, we’re going to look at a common task with time series data: creating moving averages. This method helps smooth out data over time, making trends easier to see.
To try this out yourself, please download the exercise file provided below:
This dataset presents a simplified view of daily sales over time, characterized by a gradual increase interspersed with considerable day-to-day volatility.
For this post, I’ll proceed under the assumption that you’re proficient in importing Excel tables into Python and utilizing Python for plotting within Excel. Additionally, it’s assumed that you’re familiar with creating basic calculated columns and charts using Python. If this isn’t the case, I recommend my book Advancing into Analytics.
Let’s begin by visualizing the sales data through a line chart. Although the upward trend over time is evident, the chart also reveals significant daily fluctuations, making the data somewhat challenging to interpret and understand visually.
One method to observe a decent level of granularity, while reducing some of the noise, is to employ a moving average.
Creating a moving average in Excel is feasible, but adding customization can make the formulas somewhat cumbersome. Additionally, I find that line charts produced by Python tend to have a more professional appearance. So let’s explore how to do this with Python in Excel.
Creating a moving average in Python
We can create a basic calculated moving average from the Sales
column by applying the rolling()
method. I’m going to set the window
argument to 7
to indicate that I want a seven-day (or one week) moving average.
Next, I’ll introduce the mean()
method to set this as a moving average over a sum or some other aggregation.
This approach provides a solid foundation but there are a couple things we could improve. First, you might notice that the initial six entries do not calculate correctly due to a lack of sufficient data points for generating a seven-day moving average. Moreover, the resulting averages are expressed with an excessive number of decimal places.
To address the first issue, we can use the min_periods
parameter in the rolling()
method of Pandas. This specifies the minimum number of observations required to compute a value in the result. Setting it to 1
ensures that we obtain a result for every possible data point.
Next, to address the issue of decimal places, I’ll incorporate the round()
method, setting it to 2
. This adjustment will round the results of the column to two decimal places, making our data cleaner and more concise.
Now that I’ve created the rolling average column, let’s plot it alongside the original data. As expected, the chart resulting from this reveals a reduction in the day-to-day noise, while preserving the overall trend and pattern of the time series.
Customizing the number of periods
Some of you may believe that smoothing is a beneficial technique, yet you might be uncertain about the optimal number of periods to use. It’s a valid question!
For short-term analysis or when aiming to identify immediate trends, a shorter moving average—such as 5 or 10 periods—can offer significant insights. This approach is particularly advantageous for detecting rapid changes in consumer behavior or market conditions, which are critical for tactical decision-making.
Conversely, for those focused on longer-term strategic planning, employing a moving average with more periods, such as 20, 50, or even 100, may be more suitable. This extended range effectively filters out the volatility of short-term fluctuations, providing a clearer view of the overall trend. It proves especially useful for identifying stable trends over time, thereby assisting in forecasting and the development of long-term strategies.
Experimentation with different periods to determine which best complements your business cycle and decision-making requirements is always beneficial. In pursuit of this, let’s create a dynamic plot that allows users to input the number of periods for generating the rolling average. To do this, I will introduce a new variable, moving_n
, that users can adjust. This variable will dictate the window
parameter of rolling()
, indicating the number of periods for smoothing.
Given the increasing complexity of this code, I will provide it in a copy-paste-friendly box below, followed by the results.
Feel free to test out your own workbook or use the sample solution provided above. How many periods do you think make sense to roll?
I hope this post has demonstrated a quick win for integrating Python with Excel, especially for time series data. Do you have any questions about using Python in Excel for time series analysis or anything else? Please share your thoughts in the comments.
Additionally, if you’re aiming to enhance your team’s skills to maximize the impact of financial data in Excel using Python, consider exploring my Python Foundations for Finance corporate workshops:
Leave a Reply