Ever heard the following?
- “Excel can’t handle millions of rows of data”
- “Excel can’t build reproducible data-cleaning processes”
- “Excel can’t collect and transform external data sources”
Power Query has definitively busted all of these myths.
This post provides the outline for a Power Query in Excel workshop which you can use as the basis for training at your organization. Or get in touch to partner on delivery.
A relatively new Microsoft offering, Power Query is an extract, transform and load (ETL) technology that is available for most newer versions of Excel.
If you’re not familiar with the ETL workflow — well, it works like it’s described: as a process to extract data from various sources, transform it so that it’s clean, efficient and generally consumable, and load it so that end-users can access and analyze it.
Importantly, this is done in a repeatable process — so that, rather than cobbling together a manual solution each time, the process can be reproduced on demand or at set intervals.
If you’ve spent any time as an Excel-heavy analyst, you’re familiar with the scenario I just described of ad-hoc, manual “ETL.” It’s not fun.
Power Query can be a game-changing shift from these processes. With it, you can ingest data from one or more sources, clean and transform it, and load it directly into a table or report…. which, in as many words, is basically an ETL process.
What’s even better, Power Query can do this across millions of rows, on all sorts of data sources, ranging from databases to the Web.
Check out my learning guide for a half-day introduction to Power Query below. I front-load the course with some conceptual understanding of what ETL is, and what makes data “tidy.” The reason is, well, “garbage in, garbage out:” if you don’t know what clean data looks like, and why Power Query helps you clean it, what’s the point of learning it?
Power Query can do a lot. The goal of this course is not to teach everything possible under the sun, but to show students how it can help in some of the most common data-cleaning tasks, and get them thinking about what they can do next.
This download is part of my resource library. For exclusive free access, subscribe below.
Lesson 1: Power Query and ETL technologies
Objective: Student can identify the traits and use cases of an extract, transform, load methodology
Description:
- Why extract, transform, load?
- Power Query and “Modern Excel”
- Exercises: Examine a messy dataset
Assets needed: Wholesale customers dataset
Time: 40 minutes
Lesson 2: Getting to tidy in Power Query
Objective: Student can use principles of “tidy” data to systematically clean data
Description:
- Reshaping data to tidy standards
- Connecting to data in Power Query
- Using the data profiler
Exercises: Form a plan for data cleaning, inspect a dataset
Assets needed: Dirty data, computer sales dataset
Time: 40 minutes
Lesson 3: Transforming rows
Objective: Student can perform row-wise data cleaning
Description:
- Sorting
- Removing duplicates
- Aggregating
- Filtering
- Filling
Exercises: Drills
Assets needed: Census dataset
Time: 40 minutes
Lesson 4: Transforming columns
Objective: Student can perform basic column-wise data cleaning
Description:
- Splitting columns
- Re-formatting text
- Changing data types
Exercises: Drills
Lesson 5: Transforming columns, continued
Objective: Student can perform intermediate column-wise data cleaning
Description:
- Concatenating columns
- Un-pivoting tables
- Creating calculated fields
- Appending tables
Exercises: Drills
Assets needed: Retail orders dataset, baseball dataset
Time: 40 minutes
Lesson 6: Joining data sources
Objective: Student can join two datasets together
Description:
- VLOOKUP() versus joins
- Inner joins
- Left outer joins
Exercises: Drills
Assets needed: Baseball dataset
Time: 50 minutes
Leave a Reply