In analytics, variables vary and we wouldn’t really have much to analyze if they didn’t, as this post shows:
Our goal is to make sense of how they vary, and one of the most important ways to do that (at least for continuous variables) is through measures of central tendency. This is a fancy way of saying, where does the typical observation’s fall?
The most typical measures of central tendency are the mean, median and mode. All of these can be easily calculated with Excel functions, and we’ll do so in this post.
But even more important than knowing how to derive these measures is understanding what they tell us about the data, and the pros and cons of each. And to understand that, we’ll use a short example…
Which measure should the nonprofit track?
Let’s say you are advising a nonprofit on their donations data. The nonprofit wants a measure that will indicate what they can expect a typical donation size to be. Let’s evaluate the pros and cons of the mean, median and mode here, calculating each in Excel:
Here’s our data:
Yeah, this is a miniscule and unrealistic dataset, but often the simplest explanation is the most powerful. You’ll also notice this data is stored in a table, with an index number. Read why I aim to do this to all my Excel datasets here.
The mean
First, the mean or average. This measure is derived by summing all the datapoints and then dividing that sum by the number of datapoints. You’re certainly welcome to follow that arithmetic in Excel, but we’ll use the AVERAGE()
function:
Cons of using the mean
Probably the most significant downside of the mean is that it can be sensitive to extreme values. For example, we have a donation of $120, which is three times the amount of the next-highest donation. Does this really tell us anything about its “central tendency?”
Pros of using the mean
First of all, this is probably the most common measure of central tendency out there, and for good reason. It extracts the most amount of information about the values of the variables, because unlike the median and mode we’re considering the entire magnitude of each datapoint instead of their position or frequency. So what if the $120 donation is unusual… it still happened!
That said, keeping the extreme value may cause problems for some data analysis methods. There are quite a few ways to address that situation. But we’ll move on to the other central tendency measures.
The median
The median is found at the “middle” of the data. Effectively, the highest and lowest values “cancel each other out,” then the second highest and lowest, and so on and so forth until you are at the median. If you find two numbers at the middle, you can take the average.
Fortunately, Excel handles all of this for us with the MEDIAN() function
:
Pros of using the median
This feels like more of an intuitive “center” of the data, no? The relative magnitude of the extreme value has been removed. I remember hearing college admission counselors telling us to ask for median standardized test scores rather than the average — the brainiacs and half-wits skew it for everybody else!
Cons of using the median
Again, we are judging where the center of the data is by more or less relative position, rather than true magnitude. This means we’re losing some important information about the underlying data when we take the median.
The mode
Last but not least: this is the most frequently-occuring value. A dataset can have multiple modes or it can have none. To account for the possibility of multiple pieces of output, Excel has created a new MODE.MULT()
function using the power of dynamic arrays:
Pros of using the mode
This is an interesting way to think about “central tendency:” which exact value is most common? You’ll often hear this metric stated by political campaigns — “Our most common donation is only $20!” to signal grassroots support. Perhaps the nonprofit would be interested in what the rank-and-file are donating too?
Cons of using the mode
The major downside is its instability. As mentioned, a dataset could have many modes or none. If you can’t reliably report on a figure, it will be hard to act on it!
What should the nonprofit do?
Which measure is right to track donations? We all hate this answer, but it depends. Each one has specific pros and cons, to the point where it may be worth tracking each to get a fuller look on the data.
Just as it’s hard to eat one potato chip, you’ll often find that statistical analysis entails evaluating multiple things to triangulate a solution:
Get centered
Where have you found interesting cases of central tendency measures? How would you advise the nonprofit? Let’s discuss in the comments.
Leave a Reply