Whether from multiple tables of a database or from different systems altogether, working in Excel often means cobbling a worksheet together from disparate data sources. In this post we’ll look at how to handle this situation using lookup functions, Power Query joins and finally Power Pivot relationships. Each has its pros and cons.
This demo consists of data from Sean Lahman’s baseball database. We have a hof
table containing information about hall of fame nominations:
people
table
hof
table
We’d like to learn more about the nominated players, such as their full names, from the people
table.
Classic Excel: The Frankentable
The classic approach to working with multiple sources in Excel is to combine them into table one using lookup functions like VLOOKUP()
or its modern equivalent, XLOOKUP()
:
While this is a familiar approach for many, it comes with some disadvantages.
First, the output of a lookup function is rather static — it simply adds columns to an existing table, rather than create a new data source altogether. Moreover, lookup statements must be written one column at a time, which makes this solution difficult to scale.
A big honkin’ table of lookups can also wreak havoc on your computer’s memory. When you use a lookup function, Excel needs to store and then search through all of the lookup data using memory. I remember old workbooks getting increasingly unwieldy as more data was added — the computer just didn’t have the memory to spare. I call these monstrous, unmanageable Excel datasets “Frankentables.”
While steps can be taken to improve the performance of lookup functions, it’s also worth checking out modern Excel alternatives.
Power Query: The merge
One way to join these sources which closely resembles XLOOKUP()
is the Power Query join. Specifically, the left outer join will look for any matches in people
and return them to hof
:
This approach is more scalable than XLOOKUP()
as any number of lookup columns can be merged at the click of a mouse. Not only that, but a relational join is far more flexible in how its output is returned than XLOOKUP()
. Power Query also makes it easy to audit the entire project workflow and leaves the original data sources intact.
That said, this method takes a bit more overhead than XLOOKUP()
as the data sources must be read into Power Query, operated on in the separate Query Editor, and loaded back to Excel. The result? Kind of like another Frankentable, with data from two sources consolidated into a potentially massive data table — not exactly the most efficient method.
Power Pivot: The relational data model
Let’s take a look at another modern Excel approach to combining people
and hof
: Power Pivot. This is arguably the most elegant but most complex solution: rather than create a new table from our data, relationships are defined and manipulated in a data model. This lets the user create a PivotTable from multiple sources without actually combining them into a single table.
Not only that, but Power Pivot includes the DAX formula language to create complex calculations on the data. For example, we can find what percentage of nominations were inducted for each year:
Because no “Frankentable” needs to get stored in memory and DAX measures get calculated only on-demand, Power Pivot is a very efficient way to work with multiple data sources. That said, it has a notoriously steep learning curve, and many users find it difficult to work with a data source that has not physically been combined into one table.
It depends what’s best
A summary of the pros and cons for each method of combining data sources follows:
Tool | Pros | Cons |
---|---|---|
XLOOKUP() |
– Easy to understand – Available in native Excel |
– Limited output flexibility – Columns looked up one at a time – Memory-intensive |
Power Query | – More control over output – Easier to audit and maintain |
– Relational joins can be confusing – Additional overhead in loading to Power Query |
Power Pivot | – Can create complex data models – Built-in calculation and aggregation functions |
– Complex to set up data model – Steep learning curve – Relational modeling unfamiliar to many Excel users |
Which method is best? As the car commercials say, “Individual results may vary.” Given individual circumstances, one might make more sense than the others. None are inherently better — what’s important is the perspective to situate the pros and cons of each.
Leave a Reply