AI is often misunderstood as a magical solution, leading people to believe it requires no user effort. But experienced Excel users, who have faced many data challenges, know better. To unlock AI’s full potential, one must adhere to fundamental principles that improve data readability and interpretability for the computer.
To get the most out of AI in Excel, a few essential techniques are worth knowing: converting your data to a table, cleaning it, and arranging it tidily. Let’s explore these steps through a hands-on demo.
The dataset captures sales data by location and department during a hypothetical 4th of July weekend:
While using AI for this dataset might seem contrived, it serves as valuable practice for handling more realistic datasets in the future.
Format your data as a table
To effectively utilize AI or analytics features in Excel, the first crucial step is to convert your data into a table. Not only does this facilitate loading your data into Power Query, but it also eliminates potential issues that may hinder AI processing, such as multiple headers or unnamed columns.
For a comprehensive understanding of the advantages of storing data in table format, refer to this post. However, for now, simply click on any cell within the sales dataset and press Ctrl + T to insert a table, defining the range as cells A2:G21. This will set up your data in a table format, preparing it for further AI analysis.
Defining the range yourself matters here. Left to detect it on its own, Excel gets confused about which cells constitute the table, often trying to include the first merged cell, which leads to various problems. Additionally, it assigns a generic header name to your blank columns. Consistently named headers in a dataset are crucial for AI algorithms as they provide context, organization, and feature identification, enabling accurate data analysis and interpretation.
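To make the header problem concrete outside of Excel, here is a minimal Python sketch that flags blank, generic, or repeated header names. The header list and the function name problem_headers are my own assumptions for illustration; they are not part of the workbook or of any Excel feature.

```python
import re

# Hypothetical header row, loosely modeled on the demo dataset (an assumption).
headers = ["location", "Column1", "date", "", "electronics", "Location"]


def problem_headers(names):
    """Return (position, name, reason) for blank, generic, or repeated headers."""
    problems = []
    seen = set()
    for position, raw in enumerate(names, start=1):
        name = (raw or "").strip()
        if not name:
            problems.append((position, raw, "blank"))
        elif re.fullmatch(r"Column\d+", name):
            # Placeholder names such as Column1 signal a column that was never named.
            problems.append((position, raw, "generic"))
        elif name.lower() in seen:
            problems.append((position, raw, "duplicate"))
        seen.add(name.lower())
    return problems


for position, name, reason in problem_headers(headers):
    print(f"column {position}: {name!r} is {reason}")
```

A check like this only surfaces candidates. Deciding what each column should be called still takes someone who knows the data.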
Make your data consistent
Not only the headers but also the rest of your data should follow consistent formatting rules. Following these rules allows AI to effectively process and analyze data, recognizing patterns, extracting insights, and making reliable predictions. Some preparatory tasks include:
- Trimming excess spaces
- Changing the case of text
- Splitting columns
- Removing blank columns
- Removing duplicate rows
Let’s take a look at performing the last two. Both will be done using Power Query. I will assume you are familiar with loading this data into Power Query. Check out this post if you’d like more practice.
First, removing blank columns from the dataset reduces noise and data complexity, enabling algorithms to concentrate on relevant information during analysis and modeling.
To remove these columns in Power Query, click on Column1, hold down Ctrl, then click on Column2 and press Delete.
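If you ever need the same cleanup outside Power Query, the idea can be sketched in a few lines of Python. This is a rough illustration, not the method used in this demo: the sample values, the Chicago row, and the helper names is_blank and drop_blank_columns are all assumptions of mine.

```python
# Hypothetical rows: the values are made up, and only the layout
# (two empty columns named Column1 and Column2) mirrors the demo.
rows = [
    {"location": "San Francisco", "Column1": "", "date": "July 3", "Column2": None, "electronics": 1200},
    {"location": "Chicago", "Column1": " ", "date": "July 3", "Column2": None, "electronics": 950},
]


def is_blank(value):
    """Treat None and whitespace-only text as blank."""
    return value is None or str(value).strip() == ""


def drop_blank_columns(records):
    """Return copies of the records without columns that are blank in every row."""
    if not records:
        return []
    keep = [c for c in records[0] if not all(is_blank(r.get(c)) for r in records)]
    return [{c: r.get(c) for c in keep} for r in records]


cleaned = drop_blank_columns(rows)
print(list(cleaned[0]))
```

Unlike the manual click-and-delete approach, this version removes a column only when every row in it is empty, so a sparsely filled column survives.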
Next, let’s address what appears to be a duplicate row in the data: San Francisco’s sales for July 4th. As subject matter experts and data analysts, we understand that all stores were closed on July 4th. It is highly improbable for the sales figures to be identical to those for July 3rd. This strongly suggests an error.
To handle this, we’ll filter out the problematic row by clicking on the dropdown arrow on the date column and removing the sales data for July 4th.
Without specific rules or context, the AI algorithm would not recognize this as an error. Our expertise allows us to identify and address such anomalies effectively.
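One way to picture the division of labor is a small Python sketch in which code surfaces suspicious repeats and a person decides what to do with them. The figures below are invented, and the names SALES_COLUMNS and repeated_sales are hypothetical; only the pattern (July 4th repeating July 3rd for San Francisco) comes from the demo.

```python
# Hypothetical figures; only the pattern (July 4 repeating July 3) follows the demo.
rows = [
    {"location": "San Francisco", "date": "July 3", "electronics": 1200, "apparel": 800, "outdoors": 450},
    {"location": "San Francisco", "date": "July 4", "electronics": 1200, "apparel": 800, "outdoors": 450},
    {"location": "Chicago", "date": "July 3", "electronics": 950, "apparel": 700, "outdoors": 300},
]

SALES_COLUMNS = ("electronics", "apparel", "outdoors")


def repeated_sales(records):
    """Pair each row with an earlier row from the same location that has identical sales."""
    first_seen = {}
    pairs = []
    for row in records:
        signature = (row["location"],) + tuple(row[c] for c in SALES_COLUMNS)
        if signature in first_seen:
            pairs.append((first_seen[signature]["date"], row["date"], row["location"]))
        else:
            first_seen[signature] = row
    return pairs


suspects = repeated_sales(rows)
print(suspects)

# Dropping July 4 is a domain decision (the stores were closed that day);
# the check above only surfaces candidates for a person to review.
filtered = [row for row in rows if row["date"] != "July 4"]
```

The check can tell you that two rows match, but not which one is wrong or whether matching figures are plausible. That judgment stays with you.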
Use tidy data
To ensure a “tidy” dataset, as pioneered by Hadley Wickham, the primary principle is to have each variable in its own column. Let’s take a closer look. If we find columns measured in the same unit, it indicates untidy data. Ideally, variables sharing the same measurement should be combined into one column.
In our dataset, the final three columns (electronics, apparel and outdoors) all represent sales data. To tidy it up, we should consolidate these into a single sales column and add a second column to categorize each sale based on the department.
With Power Query, this task is effortless: we can simply unpivot the data. Select these three columns while holding down Ctrl, then right-click and choose “Unpivot Columns.”
After completing the previous steps, rename the two new columns (Power Query calls them Attribute and Value by default) to department and sales by double-clicking their headers.
Next, load the results back into Excel. By default, the results will load in the form of a table. With this tidy dataset, working with AI and machine learning becomes significantly more straightforward and efficient.
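For readers curious about what unpivoting actually does to the rows, here is a minimal Python sketch of the same reshaping. It is an illustration under my own assumptions rather than what Power Query runs internally: the sample figures are invented, and the function name unpivot and its parameters are hypothetical.

```python
# Hypothetical wide-format rows; the figures are invented for illustration.
rows = [
    {"location": "San Francisco", "date": "July 3", "electronics": 1200, "apparel": 800, "outdoors": 450},
    {"location": "Chicago", "date": "July 3", "electronics": 950, "apparel": 700, "outdoors": 300},
]


def unpivot(records, value_columns, name_column="department", value_column="sales"):
    """Turn each wide row into one row per value column (wide to long)."""
    tidy = []
    for row in records:
        # Every column that is not being unpivoted is carried along unchanged.
        ids = {k: v for k, v in row.items() if k not in value_columns}
        for column in value_columns:
            tidy.append({**ids, name_column: column, value_column: row[column]})
    return tidy


tidy_rows = unpivot(rows, ("electronics", "apparel", "outdoors"))
for record in tidy_rows:
    print(record)
```

Each original row becomes three rows, one per department, so the two sample rows above yield six tidy rows with the columns location, date, department, and sales.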
Last but not least… develop the mindset
Follow these rules to reach your goals, but remember to develop a sixth sense as well. Understanding how machines read data, utilize algorithms, and interact with the technology stack is crucial. To grow this sixth sense, check out The Digital Mindset: What It Really Takes to Thrive in the Age of Data, Algorithms, and AI by Paul Leonardi and Tsedal Neeley.
What questions do you have about getting the most out of AI for Excel? Have you discovered other principles that optimize your results? Let me know in the comments.