Analyzing the distribution of stock returns is a cornerstone of financial modeling, as many foundational financial theories, including Modern Portfolio Theory and the Black-Scholes model, assume that stock returns follow a normal distribution. If returns significantly deviate from normality, financial models relying on this assumption can produce misleading results, potentially affecting investment strategies and risk management.
While Excel offers basic tools for visualizing distributions, Python provides more powerful, precise, and comprehensive methods for testing statistical assumptions. With Python now integrated directly within Excel, financial analysts and data enthusiasts can easily access advanced statistical tools without leaving their familiar Excel environment.
In this post, we’ll demonstrate step-by-step how to calculate and rigorously test the normality of Tesla’s stock returns using Python in Excel. We’ll explore several visual and statistical techniques, such as histograms, Q-Q plots, and statistical tests including the Shapiro-Wilk and Anderson-Darling tests. Additionally, we’ll discuss the advantages and limitations of these methods, helping you understand which approach might be best suited for different scenarios.
Download the exercise file below to follow along:
First, we’ll load Tesla’s closing price data from our Excel sheet and compute daily returns. This step transforms price data into returns, essential for many financial analyses because returns, rather than prices, typically exhibit the statistical properties of interest.
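Since the original Gist isn't reproduced here, below is a minimal sketch of this step. It uses a short, made-up price series as a stand-in; in the actual workbook you'd pull the closing prices from the sheet (for example with Python in Excel's `xl()` function, where the range or table name would depend on your workbook):

```python
import pandas as pd

# Stand-in closing prices for illustration only. In Python in Excel you'd
# reference the sheet instead, e.g. prices = xl("Table1[Close]")
# (the table name here is a hypothetical placeholder).
prices = pd.Series([242.84, 238.45, 244.12, 251.44, 248.50], name="Close")

# Daily simple returns: (P_t - P_{t-1}) / P_{t-1}
returns = prices.pct_change().dropna()
print(returns)
```

`pct_change()` handles the shift-and-divide in one call, and `dropna()` removes the first row, which has no prior price to compare against.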
While this code block and those that follow are embedded here as Gists for easy sharing and copying, you’ll also find all of them available in the provided workbook.
Visualization techniques
Histograms
As is often the case in data analysis, we’ll start by visualizing our data to gain intuitive insights into its distribution. Histograms are ideal for quickly assessing the shape of our dataset. Specifically, we’re looking for that classic, symmetrical bell-shaped curve characteristic of a normal distribution.
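A histogram along these lines can be produced with matplotlib. This sketch uses simulated returns as a stand-in; with the workbook open, you'd pass in the Tesla returns computed earlier instead:

```python
import numpy as np
import matplotlib.pyplot as plt

# Simulated daily returns as a placeholder for the real Tesla series.
rng = np.random.default_rng(42)
returns = rng.normal(loc=0.002, scale=0.035, size=1000)

fig, ax = plt.subplots()
ax.hist(returns, bins=50, edgecolor="black")
ax.set_title("Distribution of daily returns")
ax.set_xlabel("Daily return")
ax.set_ylabel("Frequency")
plt.show()
```

The `bins` argument is worth experimenting with: too few bins hide structure, too many fragment it, which is exactly the subjectivity discussed below.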

While histograms offer a quick, intuitive glimpse into the distribution of our data, they’re inherently subjective: bin size and visual scaling can significantly affect our interpretation. Additionally, you may have noticed potential abnormalities such as excess kurtosis (the sharp peak around the center) and possible outliers in both tails.
Q-Q plots
To explore these concerns more objectively, let’s examine a Quantile-Quantile (Q-Q) plot next. A Q-Q plot compares the quantiles of our observed returns against the theoretical quantiles of a perfectly normal distribution. Any deviations from the straight reference line clearly illustrate departures from normality, providing detailed insights into exactly how and where our returns differ. To create this plot, we’ll use the probplot function from Python’s SciPy library:
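A sketch of the Q-Q plot code, again with simulated returns standing in for the Tesla series from the workbook:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Placeholder returns; substitute the Tesla returns computed earlier.
rng = np.random.default_rng(0)
returns = rng.normal(loc=0.002, scale=0.035, size=1000)

fig, ax = plt.subplots()
# With plot=ax, probplot draws both the sample-vs-theoretical quantile
# points and the red least-squares reference line on the given axes.
stats.probplot(returns, dist="norm", plot=ax)
ax.set_title("Q-Q plot of daily returns vs. normal distribution")
plt.show()
```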

The Q-Q plot above provides a more detailed view of Tesla’s returns relative to a normal distribution. The data points mostly follow the red reference line closely, reinforcing the earlier observation from our histogram that returns appear roughly normal. However, we clearly see deviations at both ends, particularly in the tails, where data points stray away from the line. These tail deviations suggest the presence of outliers or heavier tails than a perfect normal distribution would predict. This visual evidence further emphasizes the need for formal statistical testing to quantify these deviations and confirm whether or not the normality assumption holds.
Statistical techniques
Shapiro-Wilk
We’ll begin with the Shapiro-Wilk test, a widely used statistical method for evaluating whether data, such as financial returns, follows a normal distribution. This test is especially popular due to its accuracy with smaller datasets (typically fewer than 5,000 observations). Performing the Shapiro-Wilk test is straightforward: you simply input your returns, and the test examines whether deviations from normality are statistically significant. The test provides a p-value: if it’s below your selected significance threshold (commonly 0.05), your data likely deviates from a normal distribution.
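A minimal sketch of running the test with SciPy's `stats.shapiro()`. Simulated heavy-tailed returns (drawn from a Student's t distribution) stand in for the Tesla series so the rejection branch is exercised, mirroring the actual result below:

```python
import numpy as np
from scipy import stats

# Placeholder returns with deliberately heavy tails; swap in the
# Tesla returns from the workbook.
rng = np.random.default_rng(1)
returns = rng.standard_t(df=3, size=1000) * 0.02

# shapiro() returns the W statistic and a p-value.
stat, p_value = stats.shapiro(returns)
print(f"Shapiro-Wilk Test: Statistic={stat:.3f}, p-value={p_value:.3f}")
if p_value < 0.05:
    print("Returns are not normally distributed (reject H0)")
else:
    print("Returns appear normally distributed (fail to reject H0)")
```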
To clearly display these results, we’re using the print() function. Without it, Python would evaluate our conditional logic silently, without visibly indicating the outcome. These print statements ensure our results appear directly in the output pane of the Python editor (not in the workbook grid itself), giving you straightforward, easy-to-interpret output like this:
Shapiro-Wilk Test: Statistic=0.943, p-value=0.000
Returns are not normally distributed (reject H0)
In the output above, the Shapiro-Wilk test returned a statistic of 0.943 and a p-value of 0.000. Because this p-value is well below our significance level (0.05), we reject the null hypothesis (H0) and conclude that our returns are not normally distributed.
One major advantage of the Shapiro-Wilk test is its strong statistical power with smaller sample sizes, detecting deviations from normality even when other tests might miss them. However, this sensitivity can sometimes cause it to flag even minor deviations as statistically significant, especially in larger datasets. For substantial financial datasets, this heightened sensitivity might actually become a limitation, and alternative tests such as the Anderson-Darling test may provide a better assessment.
Anderson-Darling
The Anderson-Darling test measures how closely a given dataset aligns with a specified theoretical distribution—in this case, the normal distribution. Conceptually, it places greater emphasis on deviations at the tails of the distribution, making it particularly effective for detecting outliers or extreme observations that commonly occur in financial returns.
To run this test, we’ll use the stats.anderson() function from SciPy, which returns both a test statistic and critical values at various significance thresholds. A simple loop then checks whether our test statistic surpasses each critical value, printing an explicit interpretation at each significance level.
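Here is a sketch of that loop, again using simulated heavy-tailed returns as a placeholder for the Tesla series (so the exact numbers will differ from the workbook's output below):

```python
import numpy as np
from scipy import stats

# Placeholder heavy-tailed returns; swap in the Tesla returns.
rng = np.random.default_rng(2)
returns = rng.standard_t(df=3, size=1000) * 0.02

# anderson() returns a result with .statistic, .critical_values,
# and .significance_level (in percent).
result = stats.anderson(returns, dist="norm")
for cv, sl in zip(result.critical_values, result.significance_level):
    if result.statistic > cv:
        print(f"At {sl}% significance: NOT normal (statistic > {cv:.4f})")
    else:
        print(f"At {sl}% significance: looks normal (statistic <= {cv:.4f})")
```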
This test returns the following output:
At 15.0% significance: NOT normal (statistic > 0.5710)
At 10.0% significance: NOT normal (statistic > 0.6510)
At 5.0% significance: NOT normal (statistic > 0.7810)
At 2.5% significance: NOT normal (statistic > 0.9110)
At 1.0% significance: NOT normal (statistic > 1.0830)
In the output above, the Anderson-Darling test statistic exceeds every provided critical value, even at the strictest significance level (1.0830 at 1%). Because our statistic surpasses all of these thresholds, we reject the null hypothesis (H0) and conclude that our returns are not normally distributed.
A key advantage of the Anderson-Darling test is its sensitivity to deviations in the tails of the distribution, making it particularly useful for financial data where extreme values and outliers commonly appear. Unlike Shapiro-Wilk, which excels at identifying general departures from normality in smaller samples, Anderson-Darling emphasizes tail deviations, providing a robust check for larger datasets. This complementary approach helps confirm findings from tests like Shapiro-Wilk and highlights the importance of considering multiple methods when testing for normality.
Comparison of visualization and statistical methods
By the way… some of you might have noticed a disconnect between the seemingly normal visualizations and our statistical test results. Here’s what’s happening.
Visualizations like histograms or Q-Q plots can look quite normal at first glance, but statistical tests often flag subtle deviations. Tests such as Shapiro-Wilk or Anderson-Darling are designed to detect even slight departures from perfect normality, like heavier tails or mild asymmetries that are common in financial returns. Financial data often has “fat tails,” meaning extreme values occur more frequently than expected under strict normality, causing tests to reject the assumption.
Whether this matters depends heavily on your analysis goals. If precise normality is essential—say, for certain risk models—you may need to consider alternative distributions or robust techniques. Otherwise, these minor deviations typically won’t significantly impact your analysis, though it’s still valuable to understand and acknowledge them.
Below is a summary of the methods we’ve explored, along with their strengths and limitations:
| Method | Pros | Cons |
| --- | --- | --- |
| Histogram | Easy and intuitive visualization of overall shape and spread; great for quick insights into distribution. | Subjective interpretation; affected by choice of bins and scale; lacks statistical precision. |
| Q-Q Plot | Clearly identifies how data deviates from a theoretical normal distribution; especially effective at detecting deviations in tails. | Visual interpretation can still be somewhat subjective; subtle differences can be overlooked. |
| Shapiro-Wilk | Highly effective and statistically rigorous for smaller datasets (under ~5,000 points); straightforward interpretation using a p-value. | Loses reliability as dataset size grows; overly sensitive to minor, practically insignificant deviations at larger scales. |
| Anderson-Darling | Specifically sensitive to deviations in distribution tails, making it ideal for financial returns known for fat-tailed behavior; excellent complement to Shapiro-Wilk. | Interpretation is slightly more complex; relies on comparing the test statistic against multiple critical values rather than a single p-value. |
Conclusion
In this post, we’ve explored how to assess the normality of financial returns using Python integrated into Excel. While visual tools like histograms and Q-Q plots offer helpful, intuitive insights, they often lack the precision needed for rigorous conclusions.
Statistical tests such as Shapiro-Wilk and Anderson-Darling provide more objective evaluations by detecting subtle but important departures from normality. Each method has its strengths: Shapiro-Wilk is especially effective for smaller samples, whereas Anderson-Darling is sensitive to tail deviations common in financial data. Recognizing these differences is crucial because accurate modeling directly impacts your risk management strategies and investment decisions. No single method is perfect, so combining visualizations and statistical tests gives you the most reliable assessment.
If you have questions about using Python in Excel for finance or want to dive deeper into these methods, let me know in the comments.