Pivoting on Text in R (vs. Excel)

This post follows a previous tutorial on pivoting on text in Excel. In this post I will reproduce the exercise in R. This way you begin to see the similarities and differences of the program and begin to diversify your data skill base.

Related: 5 Things Excel Users Should Know About R — Free Mini-Course

Similar to the previous lesson, we will be using the 2017 .csv version of the Lahman baseball dataset. We will again be using the People.csv and AwardsPlayers.csv files.

1. Set up our environment

In addition to reading in the two .csv files, we need to call two libraries for data manipulation: plyr and dplyr.

library(plyr)
library(dplyr)
players <- read.csv("C:/RFiles/people.csv")
awards <- read.csv("C:/RFiles/awardsplayers.csv")

2. Filter and merge tables

Using the filter and select functions from dplyr, we create a table containing all awards information since 2010 by player ID, along with that player’s first and last name.

I include the nrow function to verify that indeed the resulting table in R has the same number of rows as the Excel table: 502.

awards <- filter(awards, yearID >= 2010)
players <- select(players, playerID, nameFirst, nameLast)
tbl <- merge(awards, players, all.x = TRUE)
nrow(tbl)

3. Derive “First Name Last Name” column

Using the paste function with a space delimiter, we create a new column containing each player’s first and last name.

tbl$fullname <- paste(tbl$nameFirst, tbl$nameLast, sep = " ")

4. Pivot and view results

Now we use the ddply function from plyr to pivot on text. Similar to creating a measure in the Excel data model (Step 5 of the Excel post), we create a table summarized by awardID and yearID with the player name’s concatenated with a space delimiter (the paste function in R).

The View function then launches a spreadsheet-like viewer of the data frame.

x <- ddply(tbl, .(awardID, yearID), 
      summarize, names = paste(fullname, collapse = ", "))

View(x)

Below I show the results in R and Excel — they are the same. Excel, however, does include a “subtotal” measure where R does not, but the base information is the same.

Complete code below:

#read in files and call libraries
library(plyr)
library(dplyr)
players <- read.csv("C:/RFiles/people.csv")
awards <- read.csv("C:/RFiles/awardsplayers.csv")

#filter and merge data frames
awards <- filter(awards, yearID >= 2010)
players <- select(players, playerID, nameFirst, nameLast)
tbl <- merge(awards, players, all.x = TRUE)
nrow(tbl)

#create full-name column
tbl$fullname <- paste(tbl$nameFirst, tbl$nameLast, sep = " ")

#pivot on full-name text
x <- ddply(tbl, .(awardID, yearID), 
      summarize, names = paste(fullname, collapse = ", "))

#launch spreadsheet-like viewer of data frame
View(x)

Related: 5 Things Excel Users Should Know About R — Free Mini-Course

Related: 5 Things Excel Users Should Know About R — Free Mini-Course

Like this:

Related

Newsletter signup

Thank you for signing up. I look forward to becoming savvier about data with you.

Leave a ReplyCancel reply

Related: 5 Things Excel Users Should Know About R — Free Mini-Course

Related: 5 Things Excel Users Should Know About R — Free Mini-Course

Share this:

Like this:

Related

Newsletter signup

Thank you for signing up. I look forward to becoming savvier about data with you.

Reader Interactions

Leave a ReplyCancel reply