If you’re an Excel user interested in building one-of-a-kind visualizations and stretching your analytics limits, R is a great program to add to your toolkit. It’s also a natural stepping-stone into programming, as R’s core data structures are quite similar to Excel’s.
Here’s how:
Excel’s named ranges are a lot like R’s vectors
Named ranges in Excel
We’ll start the comparisons with Excel’s named ranges. This is a fantastic Excel feature that will let you refer to any given range of cells with one object name.
For example, I’ve created named ranges my_numbers
and my_strings
containing the sets of numbers and text in the following example. Find a blank cell in the worksheet and type =my_numbers * 2
. What happens?
The gist here is that when we define this data as a named range, we can refer to and operate on all that data in one fell swoop. No more absolute references and dragging-and-dragging formulas over to operate on a range.
This works for operations on numbers just as well as on strings. Check out the following examples and try it for yourself:
Vectors in R
Through the magic of named ranges in the previous example, Excel operated on all of the values in our range at the same time. This operation is known as vectorization, and it’s a powerful concept in numerical computing.
Guess what other program uses vectorization extensively? That’s right, R. In fact, R’s core building block is even called a vector, and it’s very similar to an Excel named range.
In the following example, we’ll combine multiple numbers into one R vector using the c()
function (standing for combine). This is very similar to naming a range of cells in Excel. We can then operate on the data very similarly to how we would in Excel.
Go ahead and click “Run” to see the results for yourself:
# This will get executed each time the exercise gets initialized
my_vector <- c(5, 6, 7, 3, 1)
# Create a vector
my_vector <- c(5, 6, 7, 3, 1)
# Multiply vector
my_vector * 2
# Take square root of vector
sqrt(my_vector)
success_msg("Great job!")
<-
) to create the variable a
.And yes, in case you were wondering, this will work on text just as well:
# This will get executed each time the exercise gets initialized
my_vector <- c('you', 'are', 'an', 'awesome', 'analyst!')
# Create a vector
my_vector <- c('you', 'are', 'an', 'awesome', 'analyst!')
# Convert to uppercase
toupper(my_vector)
# Get number of characters
nchar(my_vector)
success_msg("Great job!")
<-
) to create the variable a
.So far we've been working on so-called one-dimensional ranges of data (i.e. rows only). Now it's time to move to two dimensions: rows and columns. Vectorization is a powerful concept... too powerful not to use here too. The similarities between Excel and R continue.
Tables are like data frames
Excel Tables
If you're not already using Excel tables... do it! This is one of my favorite Excel features of all time, and one that an alleged single-digit percentage of Excel users are taking advantage of:
Excel tables not only look great, they make operating on data a breeze for much the same reason that ranges do: you can work on a whole series of cells at once, rather than one at a time.
The following example has a very small roster table with some basic operations: finding a mean, indexing the table, and creating a calculated column.
If you've not worked with table before, the syntax may be a little strange, but keep with it: to get the average height, we took the average of the height
column of the roster
table like so:
=AVERAGE(roster[height])
To index the contents of the table to get, for example, the value of the cell in the third row and second column, all we needed to do was use the INDEX()
function:
=INDEX(roster, 3, 2)
Finally, I created a calculated column to convert player heights from inches to feet. This is another vectorized operation! As soon as I create the calculation, it applies down the entire column. The @
column indicates to use the corresponding row of the height
column as the basis for reference:
=[@height]/12
(If you want to try this last one for yourself especially, I suggest downloading your own copy as it's hard to recreate using the embedded feature.)
R Data Frames
To know Excel tables is to love them... so does it follow that to know R is to love it, because its data frames are so similar? I'll let you be the judge with the following example.
A few things to know about the below syntax:
- The
$
in R can be used to access any individual column of a data frame. Go ahead and try it for yourself!- This notation can be used to create new columns in a data frame, and we indeed do that by getting the height in inches.
- To index the data frame, we place square brackets next to its name. The first argument is the row position we want; the second, the column position. This is so similar to what you did in Excel!
The data frame roster
has been already created for you. Go ahead and run the code to see the Excel similarities come to life:
# This will get executed each time the exercise gets initialized
roster <- data.frame(
name = c('Jack', 'Jill', 'Billy', 'Susie', 'Johnny'),
height = c(72, 65, 68, 69, 66),
injured = c(FALSE, TRUE, FALSE, FALSE, TRUE))
# Average height?
mean(roster$height)
# Third row, second column?
roster[3, 2]
# Calculate height in feet
roster$height_feet <- roster$height / 12
# Print data frame -- check out the new column
roster
success_msg("Great job!")
<-
) to create the variable a
.Recap
OK, these examples were not identical. But they're pretty similar. Here's a recap of how range-and-table-powered Excel compares to R. Vectorization FTW:
Operation | How it's done in Excel | How it's done in R |
---|---|---|
Multiply a range by 2 | =my_range * 2 |
my_range * 2 |
Index a table to get the item in the 3rd row, 4th column | =INDEX(my_table, 3, 4) |
my_table(3,4) |
Get the average of column x from table y |
=AVERAGE(y[x]) |
mean(y$x) |
Create a calculated column that is the square root of column x in table y |
=SQRT([@x]) |
sqrt(y$x) |
The differences
Of course, the analogy between these Excel and R data types isn't perfect. One major difference: In R, every element in a vector or the column of a data frame must be of the same type. However, this shouldn't come as too big of a surprise for you Power Query users: it's the same deal there with columns.
"What about Python?"
I can hear someone on the interwebs already asking about how Python fits in here.
Depending on what you're looking to do with coding, Python may be a fine tool for you to pick up. It does things well that R is just decent at, and vice versa. All that said, I personally find Python a little harder for data analysts to pick up because it "thinks about" in quite a different way than R or Excel. You can read more about that here:
Making pRogress
My major goal of this post was to show you that as an Excel user, R is well within your wheelhouse. So what questions do you have about picking up this tool? Let me know in the comments.
Leave a Reply