Many organizations resort to storing data in xlsb binary format for their workbooks, primarily because they find themselves grappling with files that are simply too large and fragile. While it may seem like a practical solution, using this format is, in fact, an unsafe practice that can lead to numerous issues.
This post will explore the drawbacks of xlsb, discuss the root causes of workbook bloat, and highlight how adopting modern Excel features like Power Query and Power Pivot can provide safer and more efficient alternatives.
The pros and cons of using xlsb
The xlsb (Excel Binary Workbook) format offers some significant advantages over the regular xlsx filetype, but it also comes with notable drawbacks that users should consider.
One of the main benefits of using xlsb is its reduced file size. Binary workbooks are generally smaller than their xlsx counterparts, which can save storage space and make file sharing more efficient. Additionally, xlsb files tend to open and save more quickly, enhancing productivity, especially when working with large datasets.
However, these advantages come with several significant disadvantages. A primary concern is the increased risk of file corruption. When an xlsb file becomes corrupted, recovering the data is often challenging, if not impossible.
Power Query, an essential tool for data transformation in Excel, also has limitations when dealing with xlsb files. Users frequently experience hang-ups and performance issues when attempting to use xlsb as a data source. Furthermore, Power Query will refresh from an xlsb file using its latest data, even if the file hasn’t been saved. This can lead to ‘dirty’ reads, where the data being analyzed does not match the most recent saved state, causing inconsistencies and errors in data analysis.
Another drawback is the performance lag associated with xlsb files, particularly when working with formula-heavy workbooks. Many users report noticeable delays when updating formulas, which can make working with complex spreadsheets less efficient and more frustrating. This lag undermines the time-saving benefits of the format, leading to a less optimal user experience.
In summary, while the xlsb format offers reduced file size and faster opening and saving times, it also introduces risks of corruption, limitations with Power Query, and performance lags with formulas. Users need to weigh these pros and cons carefully to determine if the benefits of using xlsb outweigh the potential drawbacks for their specific use cases.
The underlying causes of workbook bloat
Understanding workbook bloat is essential for addressing the issues that drive organizations to use the xlsb file format. One primary cause is copy-pasting large datasets directly into Excel, leading to redundant data and excessive file sizes. This practice is inefficient and prone to errors, as multiple copies of data can create inconsistencies within the workbook.
Another significant factor is the heavy use of functions like VLOOKUP() and other intensive calculations. Extensive use of VLOOKUP() across large datasets can significantly slow down workbook performance. Although VLOOKUP() is not strictly volatile, it recalculates whenever any cell in its lookup table changes, even a cell outside the return column, which causes noticeable performance issues with large datasets. [Note: Excel MVP Jon Peltier left a clarification about VLOOKUP()’s status as a volatile function in the comments; take a look to understand the distinction. Thanks Jon!]
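As a rough illustration of the recalculation behavior Jon Peltier describes in the comments, the Python sketch below treats a lookup formula as depending on every cell in its table range. It is a toy model, not Excel's actual calculation engine, and the names `Formula`, `cell_range`, `needs_recalc`, and the cell addresses are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Formula:
    """A toy formula: a name plus the set of cells it reads (its precedents)."""
    name: str
    precedents: frozenset
    volatile: bool = False  # truly volatile functions (e.g. NOW, RAND) recalc on every change


def cell_range(cols, first_row, last_row):
    """Expand column letters and a row span into a set of cell addresses."""
    return frozenset(f"{c}{r}" for c in cols for r in range(first_row, last_row + 1))


def needs_recalc(formula, changed_cell):
    """A formula recalculates if it is volatile or if the changed cell is a precedent."""
    return formula.volatile or changed_cell in formula.precedents


# Hypothetical VLOOKUP(E1, A1:C100, 3, FALSE): depends on E1 and the whole table A1:C100.
lookup = Formula("vlookup", cell_range("ABC", 1, 100) | {"E1"})
# Hypothetical volatile formula such as NOW().
clock = Formula("now", frozenset(), volatile=True)

print(needs_recalc(lookup, "B50"))  # B50 is not in the return column, but is in the table
print(needs_recalc(lookup, "Z9"))   # unrelated cell outside the table
print(needs_recalc(clock, "Z9"))    # volatile: recalcs regardless of which cell changed
```

In this model, the lookup is flagged for any edit inside A1:C100 even when its result could not change, which is the "partially volatile" behavior in question. A genuinely volatile formula is flagged for every edit.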
Duplicated data sources also contribute to workbook bloat. Multiple copies of the same data within a workbook add unnecessary bulk, often due to poor data management. Avoiding duplication and managing data sources effectively can help maintain a leaner, more efficient workbook.
Modern solutions: Power Query and Power Pivot
Instead of resorting to the xlsb format, organizations should adopt modern Excel features such as Power Query and Power Pivot. These tools offer robust solutions to the issues that cause workbook bloat.
First, Power Query allows users to import data from various external sources, perform transformations, and load the cleaned data into Excel. By using Power Query, users can avoid copy-pasting large datasets directly into their workbooks, thereby reducing file sizes and improving performance.
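Power Query queries are built in its own editor or written in the M language. As a language-neutral analogy only, the Python sketch below follows the same import, transform, load shape using the standard library. The sample CSV text, the column names, and the `clean_rows` function are invented for illustration.

```python
import csv
import io

# Stand-in for an external source (a file, database export, etc.); the data is made up.
RAW_CSV = """region,amount
 East ,100
west,250
East,
WEST,50
"""


def clean_rows(source):
    """Read CSV text, tidy each row, and drop rows with no amount."""
    reader = csv.DictReader(io.StringIO(source))
    for row in reader:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip incomplete records instead of fixing them by hand
        yield {"region": row["region"].strip().title(), "amount": int(amount)}


# "Load" step: only the cleaned result is materialized, not the raw dump.
cleaned = list(clean_rows(RAW_CSV))
for record in cleaned:
    print(record)
```

Because the transformation is written down as steps rather than performed by hand, rerunning it on refreshed source data gives the same cleanup every time, and the raw data never needs to be pasted into the workbook.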
Next, Power Pivot enables users to create data models within Excel, allowing for relational data analysis. This feature lets users build PivotTables from multiple tables without using VLOOKUP() or other manual data merging techniques. The relational model simplifies data analysis and reduces the computational load on the workbook.
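To see why a relationship can replace per-row lookups, here is a small Python analogy, which is not how Power Pivot is implemented internally. Two tables are linked once through a shared key, and the aggregation reads through that link. The table contents and the names `products`, `sales`, and `total_by_category` are hypothetical.

```python
from collections import defaultdict

# Dimension table: one row per product (made-up data).
products = [
    {"product_id": 1, "category": "Hardware"},
    {"product_id": 2, "category": "Software"},
]

# Fact table: many rows per product (made-up data).
sales = [
    {"product_id": 1, "amount": 120},
    {"product_id": 2, "amount": 300},
    {"product_id": 1, "amount": 80},
]


def total_by_category(products, sales):
    """Sum sales by category using a single key-based relationship."""
    # Build the relationship once: product_id -> category.
    category_of = {p["product_id"]: p["category"] for p in products}
    totals = defaultdict(int)
    for sale in sales:
        totals[category_of[sale["product_id"]]] += sale["amount"]
    return dict(totals)


print(total_by_category(products, sales))
```

The category is never copied onto each sales row, which is the rough equivalent of skipping a VLOOKUP() helper column and letting the PivotTable read through the relationship instead.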
The benefits of modern Excel features
Adopting Power Query and Power Pivot offers several key benefits:
- Reduced file sizes: By avoiding pasting large datasets directly into worksheets and leveraging efficient data models, workbooks remain leaner and more manageable.
- Improved performance: Transforming data externally and using relational models for analysis reduces the need for computationally intense functions, leading to faster and more responsive workbooks.
- Enhanced data integrity: Power Query ensures that data transformations are consistent and reproducible. The use of external data sources reduces the risk of errors and maintains data integrity.
- Scalability: Modern Excel features are designed to handle larger datasets and more complex analyses, making them more scalable solutions for growing organizations.
- Compatibility and integration: Unlike xlsb, data managed through Power Query and Power Pivot can be easily integrated with other tools and platforms, ensuring broader compatibility and more seamless workflows.
Conclusion
Storing data in xlsb binary format may seem like a quick fix for managing large and fragile workbooks, but it is fraught with risks and limitations. By understanding the root causes of workbook bloat and leveraging modern Excel features like Power Query and Power Pivot, organizations can create more efficient, reliable, and scalable data management practices.
For those looking to dive deeper into these powerful tools, I invite you to explore my book, Modern Data Analytics in Excel.
This comprehensive guide will help you harness the full potential of Excel’s advanced features, transforming how you manage and analyze data in your organization.
What questions do you have about transitioning from using xlsb files to modern Excel? Let me know in the comments.
Jon Peltier
Strictly speaking, VLOOKUP is not volatile. It does not recalc whenever any arbitrary cell is changed. It does recalc if any cell in its lookup table is changed, even if the cell is not in the return column. This means VLOOKUP recalcs when its result would not change, so it behaves as if it is partially volatile. Presumably XLOOKUP behaves as if partially volatile regarding changes in its return array.
George Mount
Thank you Jon for making that clarification! Will make note of this in the post.