Have you ever needed to pull the first record of a dataset? What about the last, or maybe even the seventeenth? This is called indexing. There’s a lot of data out there, and indexing gives us a set of rules to extract by position.
Except… not every program indexes the same way. In particular, there is a difference in how to count the element in the first position. Let me explain using Excel, then an everyday computing example:
How Excel does it
Indexing can be done in Excel with (go figure) the INDEX() function:
INDEX(array, row_num, [column_num])
Let’s take a look at how this plays out in both one and two dimensions.
One-dimensional
By one-dimensional I mean either a row or column of data. I will operate on a named range in the following example:
To get the third item, we pass 3 into the function.
Two-dimensional
By two-dimensional I mean an object with rows and columns. This works similarly; it just takes an extra argument for the column number. Note that I have stored the data in a named table, a good practice for any two-dimensional data.
Another way to count
So far, pretty intuitive. To access an element, you start counting from one, and the number you land on is its index position. This is an example of one-based indexing. One-based indexing makes a lot of sense because, as humans, we tend to start counting at one.
But computers don’t always start counting at one. Instead, they often start counting at zero. This is called (you guessed it) zero-based indexing. This may sound pretty foreign, but I’d like to show you an example of this you’ve probably seen before.
Imagine being so excited to get your hands on a dataset like this that you click “download” several times. Your downloads folder will look something like this:
Did you notice that the second dataset is actually called dataset (1)? The first dataset is just dataset… well, zero. This is zero-based indexing, and it happens all over computing, including Python.
How Python does it
To learn more about how to index in one and two dimensions in Python, check out the below Jupyter Notebook.
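For a quick feel of zero-based indexing before opening a notebook, here is a minimal sketch in plain Python. The sample data and variable names are made up for illustration, and it uses built-in lists rather than pandas, so treat it as a starting point rather than the full story.

```python
# One-dimensional: a plain Python list (hypothetical sample data)
sales = [10, 20, 30, 40, 50]

first = sales[0]   # position 0 holds the first item
third = sales[2]   # the third item sits at position 2, not 3
last = sales[-1]   # negative positions count back from the end

# Two-dimensional: a list of rows, where each row is itself a list
table = [
    ["Ann", 34, "Ohio"],
    ["Bob", 28, "Iowa"],
    ["Cy", 45, "Utah"],
]

# What Excel would call row 2, column 3 is [1][2] in Python
cell = table[1][2]

print(first, third, last, cell)
```

Notice that every position is one less than the number you would pass to Excel's INDEX() for the same item.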
Computer programmers can have strong opinions about zero- versus one-based indexing, but you should be comfortable working with both: as you’ve seen, Excel is one-based, as is R, but Python and JavaScript, among others, are zero-based.
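If you move between the two conventions a lot, the conversion is just a matter of subtracting one. The sketch below uses a hypothetical helper, excel_index, that I made up for illustration; it only mimics how Excel counts positions in a single row or column, not everything INDEX() can do.

```python
def excel_index(values, row_num):
    """Return the item at a one-based position (hypothetical helper)."""
    if row_num < 1 or row_num > len(values):
        raise IndexError(f"row_num must be between 1 and {len(values)}")
    # Shift the one-based position down by one to get Python's zero-based position
    return values[row_num - 1]


fruits = ["apple", "banana", "cherry"]

third_one_based = excel_index(fruits, 3)  # counting from one, Excel-style
third_zero_based = fruits[2]              # counting from zero, Python-style
```

Both lines retrieve the same item; only the counting convention differs.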
Want to keep counting?
If you’d like to learn more about Python, including indexing and pandas, with the specific needs of an Excel user in mind, check out my book Advancing into Analytics: From Excel to Python and R.