Because data analytics/business intelligence and data science differ in scope, there's a school of thought holding that data visualization and statistics are more or less independent pursuits. In my opinion, data is data, so why should we treat dataviz and stats so differently? (Lyrics intended.) Each gives us a different angle from which to handle the slippery eel that is data.
A great example of this triangulating pursuit is Anscombe's Quartet. Here we'd like to examine the similarity of four sets of X-Y coordinates. This dataset is available in base R, but it's a little hard to work with as-is, so we'll reshape it with some tidyverse magic, then summarize with psych and visualize with ggplot2.
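The reshape, summarize, and plot steps described above might look something like the sketch below. It assumes the tidyverse and psych packages are installed; the object names `anscombe_tidy` and `anscombe_plot`, and the choice of a faceted scatterplot with a linear fit, are illustrative assumptions rather than the exact code from the original post.

```r
library(tidyverse)  # loads dplyr, tidyr and ggplot2
library(psych)

# Reshape from wide (x1..x4, y1..y4) to long: one row per point,
# with a `set` column saying which of the four datasets it belongs to.
anscombe_tidy <- anscombe %>%
  pivot_longer(
    everything(),
    names_to = c(".value", "set"),
    names_pattern = "(.)(.)"
  )

# Descriptive statistics for x and y within each set.
# A plain data frame is passed to psych to avoid any tibble quirks.
describeBy(
  as.data.frame(anscombe_tidy[, c("x", "y")]),
  group = anscombe_tidy$set
)

# Correlation between x and y within each set.
anscombe_tidy %>%
  group_by(set) %>%
  summarise(r = cor(x, y))

# One scatterplot panel per set, each with its own linear fit.
anscombe_plot <- ggplot(anscombe_tidy, aes(x = x, y = y)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  facet_wrap(~ set)

anscombe_plot
```

Comparing the per-set summaries with the four panels side by side is the point of the exercise: the numbers and the pictures are two views of the same 44 rows.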
Don’t skip dataviz… don’t skip stats
Data visualization proponents, who often claim it is separate from statistics, will argue that charting is the true way to enlighten, free of the impurities of statistics. The quartet seems to support them: the four sets share nearly identical summary statistics yet look completely different when plotted. In my opinion, though, there is an important statistical lesson here too: be wary of very small samples (each set has only 11 points), because they behave unusually. Each approach gives us perspective on the data; neither should be discarded.
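As a rough illustration of how fragile a small sample can be, the base R sketch below recomputes the correlation for the third set after dropping a single point. Treating the point with the largest y value as "the outlier" is an assumption made for this example, and the variable names are illustrative.

```r
# Set 3 of Anscombe's Quartet as shipped with base R (11 points).
x <- anscombe$x3
y <- anscombe$y3

# Correlation using all 11 points.
r_all <- cor(x, y)

# Drop the single point with the largest y value, which sits far off the line.
keep <- seq_along(y) != which.max(y)
r_trimmed <- cor(x[keep], y[keep])

round(c(all_points = r_all, without_outlier = r_trimmed), 3)
```

With so few observations, one point is enough to move the statistic noticeably, and the plot is what tells you which point to look at.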
In short, there is more that unites than divides data visualization and statistical methods, whether in R or not.