One of the central roles of an analyst is myth buster: what is assumed to be true that might not be? And why does it matter?
Excel is a legendary business application. But sometimes, legend passes to myth. And not all these myths are true, at least not anymore. Because, you see, Power Query has dulled so many of the naysayers’ claims that it should be considered the ultimate Excel myth buster.
A few important examples follow. I will also share some tweets expressing the myth “in the wild” to show these sentiments are real. This is done not to name and shame anyone; we’re all learning together, and some of these tweets were even true at the time of writing.
1. “Excel is not reproducible”
This is a very common criticism about Excel. I agree that reproducibility is important. And if you’ve ever felt totally lost in a spreadsheet because some random value was hard-coded or row was hidden and you can’t exactly make out what happened, then you understand it too.
Reproducibility means that an end user should be able to achieve the same result as the author, given access to the same input data and same process script. It’s like having a recipe you can follow to get the same dish from the same ingredients. Or at least that’s an analogy one of our misguided posters here have used:
The thing with Excel is it’s not exactly processing data as a script, and often the input data gets manipulated right over. So this makes reproducibility hard. To be fair to these posters, plain ol’ Excel is really not great with reproducibility.
But not so with Power Query.
Power Query follows the extract, transform, load (ETL) process to make a copy of the source data, apply a repeatable set of steps to transform that copy, then load it to the destination. That makes Power Query completely reproducible. No more tracking down what happened in a data cleanup, when. Every step is documented in the Applied Steps menu.
There’s a myth busted. What else we got?
2. “Excel does not have a true NULL
“
If you’ve used relational databases you’re familiar with the concept of a missing or NULL
value. Missing values do not equal zero! We don’t know why they are missing, they just are! But without a reserved keyword for NULL
s, it’s easy to mess up their storage and handling. Some users may keep them blank, others may hard-code a value like NA
.
NULL
s are also often helpful for conditional logic checks: if something doesn’t meet some statement, we may want a truly missing value to be returned:
Again, this objection is true in classic Excel. But Power Query does feature a dedicated NULL
, and you can use it as a conditional output. So another one bites the dust. Let’s keep going, shall we?
3. “Excel can’t process more than 1,048,576 rows”
OK, now this one may be my favorite myth to bust. The Excel-averse often make it their trump card: we’re in the age of “big data,” right? Excel can’t even go past ~1 million rows… that’s peanuts these days! Cue the Excel half-truths now…
It is still true that an Excel workbook itself can only contain 1,048,576 rows. But this does not mean that you can’t analyze more than a million rows in an Excel workbook!
How can this be? You guessed it — Power Query. Again, with this tool we can extract data from a variety of sources ranging from CSV files to databases and more. It’s true that we can’t load more than a million rows, but we are free to aggregate and summarize the data in Power Query, then load those results to Excel. And who wants to skim > 1 million rows, anyway?
For a more serious demonstration on how to blow past the alleged million-row Excel limit, check out this post from Master Data Analysis on analyzing fifty million rows using Excel Power Query:
And for something more humorous, check out my meme-video on the topic…
Let’s continue the grind, shall we?
4. “Excel only works with structured data”
Now this one I can be a little more sympathetic to, because in general Excel really isn’t the best tool for working with unstructured data. That said, it’s worth pointing out that Power Query does make it a little more possible and should expand our horizons about what Excel can do.
If you’re not familiar with the idea of unstructured or its opposite, structured data, think of how spreadsheets are laid out — in rows and columns. Generally there are some patterns and rules you can use to determine what type of data goes in those rows and columns. This is structured data in a nutshell. Excel thrives in this land of rows and columns.
Then there is the messier data like text, images, audio and so forth. Not real simple to cram this into a spreadsheet, right? I think most of the Internet would agree:
Again, I agree that Excel shouldn’t be the go-to solution when working with unstructured data.
That said, Power Query does make it possible to do these things. We can parse web data and even analyze text. You can see that in action with these posts:
I hope these Power Query possibilities are getting you to “think outside the cell…”
What myths can you bust?
Here we have four Excel myths, busted with Power Query. You even saw that I didn’t make up these myths… read the tweets 😉! From the shape and size of the data to work with, to how it’s processed, Power Query is a true game changer and myth buster.
Have you busted any Excel myths yourself with Power Query? Or are there any claims you’d like to try to bust? Let me know in the comments.
Leave a Reply