Back in the LinkedIn poll craze of 2021 I will admit I, like many users, got a little carried away. To my credit, all (or at least most ๐ผ) of my polls were done in good faith to generate a professional conversation. Some of them got some enviable engagement. And here’s one of them: where should Power Query feature in an overall Excel learning path?
You can view the poll results on LinkedIn; with over 1,000 votes, about 2/3rds of respondents supported including it in a beginners’ course.
Now to what extent Power Query should be featured isn’t clear from the question, and there were some great comments on the poll working that out. Back then, I tended to agree with the idea that Power Query could be mentioned toward the end of the course with maybe a couple of use cases demonstrated to whet the whistle. But no way should it be bumped for the standard suite of formulas and functions, right?
But after thinking about it more, I’m going to say Power Query should be introduced earlier and more forcefully than that. Way earlier. In fact, I’d say that we should re-engineer beginner Excel courses to favor Power Query first. And because Power Query is built off Excel Tables, I’d also say that users should learn from the get-go to store their data in Tables. This reimagined beginner course would start with table syntax, then lead directly into Power Query. Only later would learners get into formulas, referencing and so forth.
To be fair, some die-hard Excel experts eschew Tables. Others might think that Power Query is just too difficult for a brand-new Excel user to pick up. These and more are legitimate reservations. But I think this approach is the way forward to help more users prosper in the long run with Excel.
Tables are the gateway to Power Query (and so much more…)
Let’s start with Tables. You’re not going to get very far in Power Query without them, because any data you’ve got in a worksheet that you’d like to feed into Power Query needs to be stored as one (or a range). But Tables in and of themselves are worth learning about, more so than just as a medium into Power Query.
This is not a tutorial about Tables, although if you would like one I’d check out this recording from Celia Alves or the book Excel Tables from Zack Barresse and Kevin Jones. I do want to call out how helpful they are as a stepping stone into not just Power Query but eventually other tools.
When you work with Tables:
- Headers and totals are seen as distinct entities which you can interact with using distinct syntax. These rows aren’t really part of your data per se. They are metadata and should be treated differently, yet equally programmatically. Tables do that.
- Column references are dynamic. You’re not hard-coding formulas and references that could change if data is added and deleted. Programs that are designed for future growth are called extensible. Tables are extensible.
- Heck, even formatting is easier with Tables. You don’t have to futz around with trying to add a fill color to your header column or to include totals. That’s all done with the click of a button.
- The syntax to work with Tables is eerily similar to those with R, as I demonstrate in this post. So tables aren’t just the gateway to Power Query but many other programming topics:
As I mentioned, my love for Tables is not universally shared, but it’s by no means a fringe opinion. You can read more about that on the comments of this LinkedIn post:
I’d love to read even more comments — what do you think about Tables? Let me know here or on the LinkedIn post. In my opinion, any frustrations with them are greatly outweighed by the benefits, and beginners should get in the habit of storing data inside Tables right away.
Power Query enforces good computing habits
Now let’s really kick it up a gear: moving Power Query way up into the curriculum. Again, this is not a tutorial on Power Query; for that, check out this post over at Excel Campus.
Let’s focus on what sorts of tasks would be on the table (no pun intended) for introducing Power Query early, and what they would replace:
- Sorting/filtering rows and calculating/removing columns in Power Query versus base Excel functionality.
- Ditto for manipulating strings and dates.
- Teaching relational joins before
VLOOKUP()
,XLOOKUP()
, or whatever your lookup function of choice is. Relational joins aren’t going anywhere and are used in most analytics tools.
People dunk on Excel because its users are allowed to build these inauditable monstrosities that cause errors in everything from academic research to investment banking. The problem comes down to this: Excel users rarely think like programmers when putting their work together. They are not trained computer scientists or developers, and this is not a knock on them. They shouldn’t have to be to work well with data, just as drivers shouldn’t have to be Formula One racers to earn their wheels. Power Query is a step toward democratizing good computing habits and averting spreadsheet disasters.
Many of Excel trainwrecks could be averted if users just learned some basic data hygeine and standards:
- Raw data should not be tampered with willy-nilly. You query it, then do your manipulations. Save your receipts.
- Data should be stored in a tidy format, with each variable in its own column and value in its own cell.
- Your work should be reproducible so that anyone can click through from Point A to Point Z and see what you did. The Applied Steps do this. Power Query being driven by M code under the hood does this.
- In a two-dimensional dataset, each column should ideally be labeled and must contain data of all the same type. Don’t mix and match random bits of data so that you can no longer “see” a rectangle from your data.
- Missing values should be treated systematically as
NULL
s. It’s too easy to play fast-and-loose with missing values in base Excel as there’s no standard value to represent them.
There are many other basic computing habits that Power Query reinforces or expects. Let me know what I missed.
If we can get more Excel users on board with these habits, think of how much more productive and efficient many teams could be. It’s like getting a driver’s license, but for data. You understand the rules of the road for how to work with data. Not only that, but you now have a vehicle that will get you to new and exciting data destinations:
Power Query broadens horizons
Once users get the hang of doing basic row- and column-wise operations on a dataset in Power Query, it’s a hop, skip and jump to some very interesting places:
- Unique identifier columns and lookups lead naturally to relational joins
- When learners start to consider how to manipulate relationships between tables, they can start to learn the basics of data modeling
- As learners get to a place where they understand relational database operations, they can naturally add SQL to their toolkit
- Power Query talks to more than just Excel tables — learners will see the possibilities in blending data from different sources, or even working with metadata such as a list of files in a folder
To be fair many of these topics really fit into a beginner’s course, but perhaps one could nod to them toward the end just to reinforce what a gateway Power Query represents. After all, analytics tools are best thought of as in a stack; they complement rather than substitute each other. Power Query is such a good gateway to exploring this stack. Heck, it’s even used in another slice anyway with Power BI.
Is this too hard?
One of the objections I see coming is that Tables and Power Query are just too difficult conceptually. I think this underestimates what from a beginner’s perspective will click:
- Many seasoned Excel users find Table notation a little intimidating because it follows different rules than base Excel. But let’s step back from not just what we are used to but to what will be a more comprehensive approach for beginners. Tables come with dynamic references and use syntax that foots with other programs like R. Plus, they look nice. Base Excel does not include any of these.
- Power Query comes with an Applied Steps menu so the user can look back at every step they took. Almost everything they will want to do requires no coding. And their original data source is not impacted.
Are either of these truly somehow worse or harder experiences than base Excel? Instead I think the bias just comes from what we experienced Excel users are familiar with. It’s understandable, but it’s time to push through.
Great… now what?
Did I convince you to move Tables and Power Query way up in the Excel curriculum? Why or why not? Let me know in the comments.
If you are looking to get started, some ideas:
- Perhaps the best proof point is to take a day for training and see just how much your staff can grow wtih Excel Power Query. That is precisely what my Click-and-clean data in Excel Power Query workshop will do:
In this workshop we follow the same principles espoused here: starting wtih tables and the “mental model” of Excel, then moving directly into data cleaning with Power Query.
If you are looking to build out a longer, more holistic Excel course with this philosophy, here are some ideas to start with:
- Get started with Tables near-immediately. Like, after users know how to enter
=1 + 1
into the formula bar. Or at least get to it as soon as you’re starting to work with rows and columns. Getting the hang of table syntax may take 5-10% of the total course. Users should walk out of the course assuming that any source data they’ve got in a spreadsheet should be stored as a Table. - Then maybe 40% the course should be about Power Query. It doesn’t have to be anything exotic, really it shouldn’t. Just use all the menu-driven options. Learners should know how to sort and filter columns, for example.
Again, these are the sorts of topics that most Excel courses cover anyway, we’re just going to do it in Power Query versus base Excel. I have a resource guide you can download that walks through what an all-Power Query half-day workshop might look like. Use this to help guide a Power Query-heavy course.
- This leaves another half of the course to cover the traditional Excel bread-and-butter topics such as data visualizations and PivotTable. Learners should absolutely have some exposure to formulas, but I would say even how formulas should be reimagined with the emergence of dynamic arrays.
So we’re really not getting rid of the traditional Excel toolkit. We’re just diversifying it to include Power Query. And because Power Query can probably solve maybe half of what Excel users get stuck on, a course should reflect that commitment.
Ready to reimagine Excel instruction? I hope to expand on this topic as interest allows, so if you’d like to read more please let me know. Or maybe I should go back to posting LinkedIn polls about it? ๐ (The algorithm seems less keen on displaying those, for better or worse — mostly better.)
And if your team could use a rethink of how Excel is used for data analysis, please do get in touch.
I’d like to thank everyone on LinkedIn who provided helpful questions and comments over LinkedIn about these topics, especially Celia Alves, Don Tomoff, Paul Barnhurst and Christopher Reilly.
Leave a Reply