Demonstration of Anscombe’s Quartet in R

Because data analytics/business intelligence and data science differ in scope, there’s a school of thought implying that data visualization and statistics are more or less independent pursuits. In my opinion, data is data, so why should it be that we treat dataviz and stats so differently? (Lyrics intended.) Each gives us a different angle from which to handle the slippery eel that is data.

A great example of this triangulation is Anscombe’s Quartet. Here we’d like to examine how similar four sets of X-Y coordinates are. This dataset is available in base R as anscombe, but it’s a little hard to work with as-is, so we’ll reshape it with some tidyverse magic, then summarize with psych and visualize with ggplot2.
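Here is a minimal sketch of that reshape, summarize, and plot workflow, assuming the tidyverse and psych packages are installed. The object name anscombe_long is just an illustrative choice, and this is one reasonable way to do it rather than the only one.

```r
library(tidyverse)  # loads tidyr, dplyr and ggplot2
library(psych)      # provides describeBy()

# anscombe ships with base R in wide form: columns x1..x4 and y1..y4
anscombe_long <- anscombe %>%
  pivot_longer(
    cols = everything(),
    names_to = c(".value", "set"),  # e.g. "x1" becomes column x, set "1"
    names_pattern = "(.)(.)"
  ) %>%
  arrange(set)

# Descriptive statistics for x and y within each of the four sets
describeBy(as.data.frame(anscombe_long[, c("x", "y")]),
           group = anscombe_long$set)

# Correlation between x and y within each set
anscombe_long %>%
  group_by(set) %>%
  summarise(r = cor(x, y))

# One scatterplot per set, each with a fitted regression line
ggplot(anscombe_long, aes(x = x, y = y)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  facet_wrap(~ set)
```

The summaries should come out nearly identical across the four sets (mean of x is 9, mean of y is about 7.5, correlation is about 0.82), while the four panels look nothing alike. That contrast is the whole point of the quartet.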


Don’t skip dataviz… don’t skip stats

Data visualization proponents, who often claim it’s separate from statistics, will point to the quartet as proof that charting is the true, unadulterated path to insight, superior to statistics: the four sets share nearly identical summary statistics yet look completely different when plotted. However, in my opinion there is an important statistical “silver lining” here too: be wary of very small sample sizes (each set has only 11 points), because they behave unusually! Dataviz and stats each give us perspective on the data; neither should be discarded.

In short, there is more that unites than divides data visualization and statistical methods, whether in R or not.
