To an outsider, some R packages sound too cheeky to be very valuable. Take, for example, the `tidyverse`. What on earth does that groaner of a portmanteau do?
By the end of this workshop, you’ll know that the `tidyverse` is so called because it’s a collection of packages used together to clean, model and depict data using “tidy” principles.
More than a package
That means the `tidyverse` is more than a package. It’s even more than a series of packages. It’s a whole “mental model” of how data should work.
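In R for Data Science, Hadley Wickham and Garrett Grolemund depict the data workflow like this:
[Figure: the R for Data Science workflow diagram (import, tidy, a cycle of transform, visualize and model, then communicate)]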
That is, it’s an iterative process that starts with preparing the data by importing it and then “tidying” it. Those first steps are the focus of this learning guide, along with the basics of data visualization.
There are different ways to prepare a dataset for analysis, but the nice thing about the “tidy” framework is that there are no surprises in how to do it. The framework helps you think explicitly through what needs to happen to a dataset before it is of much use in your analysis.
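As a minimal sketch of what “tidying” can look like, assume a small made-up table of team wins stored with one column per season. The data, the column names and the choice of `pivot_longer()` are illustrative assumptions, not taken from the workshop assets.

```r
library(tibble)
library(tidyr)

# Hypothetical "wide" table: one row per team, one column per season
wins_wide <- tibble(
  team = c("Team A", "Team B"),
  wins_2019 = c(90, 81),
  wins_2020 = c(35, 30)
)

# Reshape so each row is one team-season observation
wins_tidy <- pivot_longer(
  wins_wide,
  cols = starts_with("wins_"),  # the columns holding season values
  names_to = "season",          # old column names become a variable
  names_prefix = "wins_",       # strip the prefix, leaving the year
  values_to = "wins"            # cell values go into one column
)

print(wins_tidy)
```

In the reshaped table every variable (team, season, wins) has its own column and every observation has its own row, which is the layout the rest of the tools expect.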
After that, the `tidyverse` provides a suite of tools for continuing your data journey: from the re-shaping, to the manipulating, to the visualizing.
This one-day workshop focuses on the elements of the `tidyverse` most commonly used in basic data cleaning, exploratory data analysis and visualization.
Get your copy of the guide below. You are welcome to use it at your school or workplace to run workshops, or however else you can benefit from it. Access to the learning guide is not access to a workshop itself; consider the guide more like a recipe or blueprint for conducting one at your organization.
This learning guide is part of my resource library. For exclusive free access, subscribe to my newsletter below.
If you’re an individual user looking to get acquainted with the `tidyverse`, I suggest my book Advancing into Analytics: From Excel to Python and R.
Lesson 1: The tidyverse and tidy data
Objective: Student can compare and contrast the tidyverse to the general R environment
Description:
- What is tidy data?
- A tour of the tidy galaxies
- The tidy workflow
Time: 40 minutes
Assets needed: none
Lesson 2: Importing data
Objective: Student can read tabular files into R
Description:
- Introduction to the tibble
- Importing text files
- Importing Excel workbooks
Time: 40 minutes
Assets needed: Baseball records
Lesson 3: Re-shaping data
Objective: Student can transform a dataset to fit tidy principles
Description:
- Pivoting and un-pivoting datasets
- Delimiting columns
Time: 60 minutes
Assets needed: Baseball records
Lesson 4: Manipulating data
Objective: Student can create a data manipulation pipeline
Description:
- Manipulating rows & columns
- Aggregating & summarizing data
- Piping functions
Time: 75 minutes
Assets needed: Baseball records
Lesson 5: Joining and appending data
Objective: Student can combine datasets by appending and joining tables
Description:
- Appending two or more tables
- Joining two tables: left, right, inner, outer
Time: 75 minutes
Assets needed: Baseball records
Lesson 6: Miscellaneous tidying
Objective: Student can manipulate strings, factors and dates
Description:
- Formatting, replacing and splitting strings
- Ordering and modifying factors
- Generating, calculating and resampling dates
Time: 60 minutes
Assets needed: Flight records
Lesson 7: Visualizing data
Objective: Student can create graphical depictions of variable relationships
Description:
- The grammar of graphics
- Plotting univariate relationships
- Plotting bivariate relationships
- Customizing scales, legends & themes
Time: 90 minutes
Assets needed: Baseball records
Yiming Liu
how do I register for this workshop? thanks
George Mount
Hi Yiming, thanks for reading. You are welcome to use this learning guide at your organization to conduct the workshop; I currently do not offer it asynchronously. I can facilitate if helpful; this guide has all the learning objectives, datasets, etc. needed.
david galván
Hi. How do I get the guide, datasets, etc… Thanks
George Mount
Hi David, thanks for reading. The learning guide is available for download toward the middle of the page. You can click through the image to the PDF and download it. You are welcome to use this guide at your workplace or organization; consider it like a recipe or blueprint.
Good question on the datasets. The baseball and flight records listed under "Assets needed" come from the `Lahman` and `nycflights13` packages, respectively.
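A minimal sketch of loading those records, assuming both packages are already installed from CRAN. The guide does not say which tables each lesson uses, so `Batting` and `flights` here are illustrative picks.

```r
# One-time setup, if needed:
# install.packages(c("Lahman", "nycflights13", "tibble"))
library(Lahman)
library(nycflights13)
library(tibble)

# Lahman ships plain data frames; convert one to a tibble for tidy printing
batting <- as_tibble(Lahman::Batting)

# nycflights13 tables are already tibbles
flights <- nycflights13::flights

# Quick look at the columns and types of each table
glimpse(batting)
glimpse(flights)
```

Both packages contain several other tables as well; `data(package = "Lahman")` and `data(package = "nycflights13")` list them.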