With AI assistants and Excel agents now capable of producing entire workbooks from a prompt, we’re entering a world where models can be created in seconds. That’s exciting, but it also raises an obvious question: how do we trust them?
Auditing Excel models has always been difficult. Excel is designed as an instant recalculation engine: formulas update automatically, dependencies stretch across sheets, and logic can be embedded almost anywhere. While this flexibility is one of Excel’s strengths, it also makes models surprisingly hard to inspect systematically.
Many firms sell specialized spreadsheet auditing tools, but today we also have another option. Because Excel files are structured data, we can analyze them programmatically. With Python, we can build lightweight utilities that scan workbooks, inspect formulas, and flag potential issues automatically.
This post walks through a minimum viable example of building a simple Excel model auditor with Python. The goal isn’t to replace professional auditing software, but to demonstrate how easily we can start analyzing spreadsheet structure with code.
For this walkthrough, we’ll use a small sample Excel workbook that contains a typical analytical model: a few worksheets, some input values, and formulas that generate outputs across the workbook. Our Python script will open the file, scan the cells, and begin identifying useful auditing information such as where formulas appear, where values are hardcoded, and how calculations are distributed across the model.
Python is a natural fit for this kind of task. It’s excellent at parsing structured files, analyzing patterns like formulas, and producing summaries that would be tedious to generate manually in Excel. As AI-generated spreadsheets become more common, tools like this will likely become just as important as the tools that generate the models themselves.
Understanding the script & its benefits
Before diving into the details of the script, it helps to understand the basic idea. Instead of calculating results the way Excel normally does, this program reads the workbook as structured data and inspects its contents for patterns that often signal modeling problems.
The script uses two Python tools to do this. The openpyxl library allows Python to open an Excel workbook and inspect its sheets, cells, and formulas directly. The re library provides regular expressions, which make it easy to detect patterns inside formulas such as numbers, cell references, or links to other workbooks.
Once the workbook is loaded, the script scans every worksheet and every cell. As it moves through the model, it performs a few simple checks that are common in spreadsheet auditing.
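As a minimal sketch of that loading-and-scanning loop (the exact structure of the real script may differ), the core iteration with openpyxl could look like this:

```python
# A minimal sketch of the scanning loop, assuming openpyxl is installed.
import openpyxl

def scan_workbook(path):
    """Yield (sheet, coordinate, value, is_formula) for every non-empty cell."""
    # data_only=False (the default) returns formula strings, not cached results
    wb = openpyxl.load_workbook(path, data_only=False)
    for ws in wb.worksheets:
        for row in ws.iter_rows():
            for cell in row:
                if cell.value is None:
                    continue
                yield ws.title, cell.coordinate, cell.value, cell.data_type == "f"
```

Each check described below can then be a small function applied to the stream of cells this generator produces.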
For example, it looks for formulas that contain hardcoded numbers. In many analytical models, important assumptions should live in dedicated input cells rather than being embedded inside formulas. The script removes cell references from each formula and checks whether any numbers remain, which can indicate hidden constants.
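A simplified version of that check can be written with two regular expressions. This sketch ignores the benign constants 0 and 1, which appear in harmless constructs like `(1 + growth_rate)`; a production script would tune this allowlist:

```python
import re

# Matches cell references such as A1, $B$2, or the B4 in Inputs!B4
CELL_REF = re.compile(r"\$?[A-Z]{1,3}\$?\d+")
NUMBER = re.compile(r"\d+\.?\d*")

def hardcoded_numbers(formula: str) -> list:
    """Strip cell references, then return any numeric literals left behind."""
    stripped = CELL_REF.sub("", formula)
    # 0 and 1 appear in benign constructs like (1 + rate), so ignore them
    return [n for n in NUMBER.findall(stripped) if n not in ("0", "1")]
```

For example, `hardcoded_numbers("=B4*12")` returns `["12"]`, while a growth formula like `=B2*(1+Inputs!B4)` passes clean.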
It also flags formulas that link to external workbooks, since these can introduce hidden dependencies that make models difficult to reproduce or maintain. The script then checks for numeric values entered directly into calculation sheets. Constants are allowed on designated input sheets, but values appearing elsewhere may indicate accidental overwrites or poor model structure.
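The external-link check can be sketched with a single pattern, since Excel embeds the source workbook in square brackets inside the formula text:

```python
import re

# External references embed the source workbook in brackets,
# e.g. =[Budget.xlsx]Assumptions!B2 (openpyxl may surface a numeric
# index such as [1] in place of the file name).
EXTERNAL_REF = re.compile(r"\[[^\]]+\]")

def links_externally(formula: str) -> bool:
    """True if the formula references another workbook."""
    return bool(EXTERNAL_REF.search(formula))
```

The workbook name `Budget.xlsx` here is purely illustrative.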
Finally, the script looks for inconsistent formulas down a column. Many models apply the same calculation repeatedly across rows, so a formula that suddenly breaks the pattern may signal an editing mistake. By normalizing formulas and comparing their structure, the script can detect these inconsistencies automatically.
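One simple way to implement that normalization, shown here as a sketch, is to replace every cell reference with a placeholder and then compare each row's shape against the column's majority shape:

```python
import re
from collections import Counter

CELL_REF = re.compile(r"\$?[A-Z]{1,3}\$?\d+")

def normalize(formula: str) -> str:
    """Replace cell references with a placeholder so structurally
    identical formulas compare equal from row to row."""
    return CELL_REF.sub("REF", formula)

def inconsistent_rows(formulas):
    """Return indexes of formulas that break the column's majority shape."""
    shapes = [normalize(f) for f in formulas]
    majority, _ = Counter(shapes).most_common(1)[0]
    return [i for i, shape in enumerate(shapes) if shape != majority]
```

With this, a column like `["=B2*C2", "=B3*C3", "=B4*12"]` normalizes to two `=REF*REF` shapes and one `=REF*12` outlier, so only the last row is flagged.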
At the end, any potential issues are printed as a simple audit report showing the worksheet location and the formula or value that triggered the alert.
You might wonder why this is useful when Excel already provides tools like Trace Precedents or Show Formulas. Those tools are helpful for inspecting individual cells, but they are still largely manual and workbook-specific. A Python script like this operates differently: it can scan an entire workbook programmatically, apply consistent auditing rules, and produce repeatable results across many files.
Another important point is that everything here runs locally and is entirely rule-based. No AI is involved. The script simply reads the workbook file and applies a set of deterministic checks. That makes it easy to customize, automate, and integrate into larger workflows when auditing or validating spreadsheets at scale.
Understanding the results
At this stage the goal is not a perfect audit, but a quick structural scan that highlights places worth inspecting.
Here’s a quick summary of what the script detected:
- Several hardcoded constants placed directly in the Calculations sheet (cells A2, A3, A4, and C2). In many models, values like these would typically live on an Inputs sheet rather than inside the calculation layer.
- A formula containing a hardcoded multiplier (=B4*12 in D4), which could represent a hidden assumption instead of referencing an input cell.
- A few inconsistent formula patterns down columns in the Calculations sheet, suggesting the formulas may not follow the same structure from row to row.
None of these automatically mean the model is incorrect. But they are exactly the kinds of patterns auditors often look for when reviewing spreadsheets.
Next, let’s open the workbook and look at these flagged cells directly.
Harmless flags: timeline labels
Some of the findings aren't much cause for concern.
Cells A2, A3, and A4 on the Calculations sheet were flagged as hardcoded constants, but in this case they simply represent year numbers used to label the timeline of the model. In many spreadsheets this is perfectly reasonable. While it would be possible to generate these dynamically using a function such as SEQUENCE() so the timeline expands automatically, that level of flexibility is not always necessary.
Why the Inputs sheet was not flagged
You might also notice that constants on the Inputs worksheet were not flagged by the script. This behavior is intentional and reflects a common modeling convention.
Earlier in the script, we defined which sheets are allowed to contain constants:
```python
input_sheets = ["Inputs"]
```
Later in the audit process, the script checks whether a numeric value appears in a worksheet that is not designated as an input sheet. If a number appears outside those allowed sheets, it is flagged as a potential hardcoded constant.
This mirrors a common spreadsheet design principle: inputs should be centralized in dedicated input areas, while calculation sheets should primarily contain formulas.
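In sketch form, the exemption amounts to a single early return. Here `cells` is a list of `(coordinate, value)` pairs standing in for what an openpyxl worksheet iteration would yield:

```python
def flag_constants(sheet_name, cells, input_sheets=("Inputs",)):
    """Return coordinates of numeric literals on non-input sheets."""
    if sheet_name in input_sheets:
        return []  # constants are expected here, so nothing to flag
    return [coord for coord, value in cells
            if isinstance(value, (int, float)) and not isinstance(value, bool)]
```

Calling this with `sheet_name="Inputs"` always yields an empty list, while the same constants on a calculation sheet are flagged.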
A real modeling issue
However, one finding does appear to represent a genuine modeling problem.
The script flagged the following cell:
```
Hardcoded constant cell
Value: 25000
Cell: Calculations!C2
```
In this case, the Year 1 revenue value has been entered directly as a number rather than being calculated from the underlying assumptions. In most analytical models, revenue should be derived from inputs such as Units and Price rather than hardcoded in the calculation sheet.
If those inputs exist on the Inputs worksheet, this value should likely be calculated as:
Revenue = Units * Price
A hardcoded revenue value bypasses the assumptions driving the model and can easily lead to inconsistencies if the inputs change.
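If, for example, Units sat in Inputs!B2 and Price in Inputs!B3 (hypothetical addresses, since the exact layout depends on the workbook), the fix would be a formula along the lines of:

```
=Inputs!B2*Inputs!B3
```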
A double offender
Another issue appears in cell D4 on the Calculations sheet, which the script flagged twice:
```
Hardcoded number inside formula
Formula: =B4*12
Cell: Calculations!D4

Inconsistent formula pattern
Formula: =B4*12
Cell: Calculations!D4
```
This cell violates two different auditing rules.
First, the formula contains a hardcoded constant. The multiplier 12 is embedded directly in the formula rather than referencing a value from the Inputs sheet.
Second, the formula breaks the expected pattern down the column. Earlier rows appear to follow a different structure, but this row switches to =B4*12, which causes the pattern check to flag it.
More importantly, the calculation itself appears incorrect. If this column represents cost, the correct logic would typically be:
Cost = Units Sold * Cost per Unit
Instead, the formula effectively calculates:
Cost = Units Sold * 12
which disconnects the calculation from the cost assumptions stored on the Inputs sheet.
A useful false positive
Still other findings are more nuanced. The script also flagged these formulas:
```
Inconsistent formula pattern
Formula: =B2*(1+Inputs!B4)
Cell: Calculations!B3

Inconsistent formula pattern
Formula: =B3*(1+Inputs!B4)
Cell: Calculations!B4
```
At first glance, these appear to be inconsistencies. In reality, the formulas are correct.
These rows are applying a growth rate from the Inputs sheet:
Next Year Value = Previous Year Value * (1 + Growth Rate)
The reason the script flags them is that Year 1 typically has no growth calculation: the first year is usually a starting value, while later years grow from that base. Because of this, the formula pattern naturally changes after the first row.
From the script’s perspective this looks inconsistent, but from a modeling standpoint it is perfectly reasonable.
The limits of rule-based auditing
This highlights an important limitation of rule-based auditing.
Structural checks can quickly surface patterns that often indicate problems, but they cannot fully understand the intent of the model. As a result, some findings will inevitably be false positives that require human judgment.
In other words, scripts like this are extremely useful for narrowing down where to look. But catching every issue in a complex spreadsheet remains surprisingly difficult.
Conclusion
In the real world, a production-grade spreadsheet auditing system would take years to design, build, and deploy. Large firms that specialize in spreadsheet risk management invest heavily in tools that can analyze complex models, track dependencies, and enforce organizational standards across thousands of workbooks.
But deep down, many of those tools rely on the same basic idea we explored here in just a few minutes: treat the Excel workbook as structured data, scan it programmatically, and look for patterns that often signal modeling problems.
Even a small script can quickly surface issues such as hardcoded assumptions, broken formula patterns, or hidden dependencies. It won’t replace human judgment, but it can dramatically narrow down where to look.
That idea is becoming more important as Excel itself evolves. With AI assistants and Excel agents now capable of generating entire workbooks from a prompt, the challenge is no longer just building models. It’s understanding, validating, and maintaining them.
In other words, the tools that help us inspect and audit spreadsheets may become just as important as the tools that help us generate them. This simple example is just a starting point.
If you’re interested in exploring more workflows that combine Excel, Python, and modern AI tools to build more reliable analytical models, I share a growing collection of guides, experiments, and practical examples inside my membership.
You can learn more about it here:
It’s where I’m documenting many of the patterns and tools I’m experimenting with as Excel, automation, and AI continue to evolve.
