John Michaloudis reserves the last episode of his MyExcelOnline podcast to a roundup of the top Excel tips and tricks from community leaders.
I was happy to again be invited to participate in this podcast, and this year I chose to feature the Power Query data profiler. Be sure to listen to the full podcast; this post serves to accompany my spot.
Check out the podcast here.
I come in at about the 34:30 mark.
To demonstrate the data profiler, I’ll use a dataset of penguin measurements which is available for download here. This is a .csv
file stored on the Web; turns out that you can actually paste the URL right into the file explorer and it will import! To do so, head to the ribbon, then Data > Get Data > From File > From Text/CSV.
We’re not going to transform the data here, but click on Transform Data anyway so that we land in the Power Query Editor.
Now, go to the View tab of the ribbon in the Power Query Editor. The middle group says Data Preview. The last three radio buttons here will include some extra information about your columns:
Pretty cool, right? You get a histogram of your data as well as what percent of records are empty or contain errors. This is a great little way to size up your before you begin working on it.
One caveat here — by default you are only previewing the first 1,000 rows of your dataset! To change this, there is a tiny status bar at the bottom of the screen. Click on the area that says “Column profiling based on top 1000 rows” and change it to the option “Column profiling based on entire data set.”
Our penguins dataset only has 345 observations, so this fortunately did not make a difference, but it’s an unsettling quirk to be aware of!
To good friends and good spreadsheetes
Thanks again to John for hosting this podcast and to you for listening. I hope this tip is something you can use in 2021.
Leave a Reply