Let’s be honest, there are a lot of terms in tech that sound way more complicated than they really are. Maybe it goes by an acronym for peak confusion. But peel back the concept and it makes perfect sense.
ETL, or extract, transform, load is one of those terms. You’ll hear database administrators and data engineers fawn and hype over their “ETL pipelines” and “ETL tools.” Sounds like something only a certified data geek can do, right?
In fact, Power Query serves to democratize the process of ETL right to an Excel spreadsheet. Let’s see how that works, and what ETL does.
ETL = extract, transform, load
Take a moment to head to the official Power Query page on Microsoft. At the time of writing, the first sentence to explain the tool: “Power Query is the easiest way to connect, extract, transform and load data from a wide range of sources.”
Don’t let the techbros intimidate you: the meaning of ETL is right here plain English, and can be done in Excel! Let’s walk through each step in Power Query now. Go ahead and open Excel. Make sure your version of Excel includes the Power Query tool.
1. (connect and) Extract
The first step of ETL is to “extract” the data to an outside source. To do that, you of course need a way to connect to the data. Excel Power Query can connect to an outstanding variety of data sources, not just Excel workbooks! Some of these include:
- Text and CSV files
- Databases
- Sharepoint
- XML, HTML, Web data
For right now, let’s just stick with a plain old Excel workbook. Follow along with this file:
Our first order of business is to extract the data from the housing
table. Now it seems a little strange to be “connecting to and extracting” data that we already have, right in this workbook. It’s arguably overwrought, even.
What makes this such a powerful idea is that Power Query is forcing us to keep the raw data intact. We must take an extract of it (There’s the E). So let’s connect to this data, then transform it (There’s the T!).
To do so, click anywhere inside the housing
table, then go to Data > Get Data > From Table/Range:
Congratulations on getting the ETL started! An E for effort… and extract! Let’s move to the T.
2. Transform
The next step is to transform the data. This could be a whole lot of things, but it’s really all the steps you must take to make this data usable, such as:
- Sorting or filtering rows
- Adding, dropping, renaming or calculating columns
- Grouping by and aggregating categories
- Merging multiple data sources
Let’s make a very simple transformation to this data: we will add an index column. To do this, go to Transform on the Power Query ribbon, then Index Column > From 1:
There’s of course a lot more you can do with data transformation in Power Query, and most of your time is going to be in this step of ETL. But let’s move to that last leg, or L for load (say that 10 times fast!)
3. Load
Finally, click Home on the Power Query editor, then Close & Load. You will see the “transformed extract” of your housing
data loaded into Excel:
Congratulations for completing an entire ETL job!
- You extracted the raw data from an Excel table
- You transformed the data using the Power Query editor
- You loaded the results back to Excel
This is of course a quite diminutive ETL job. But regardless of complexity, Power Query lets you build repeatable data cleaning processes using a variety of sources. Very helpful, if I can say myself.
Want to compare your workbook to mine? Here is that:
What questions do you have about Power Query in general or ETL in particular? Let me know in the comments.
Leave a Reply