Several packages facilitate working with Python and Excel, but openpyxl
stands out as an excellent choice due to its comprehensive features for reading and writing workbooks. In this blog post, we will explore how to get started with creating Excel workbooks from scratch and modifying existing ones using openpyxl
.
For this lesson, I’ll assume you’re familiar with installing and working with Python packages, along with basic operations such as creating variables, indexing, and so on. If you need assistance with these topics before proceeding, consider checking out my book Advancing into Analytics:
First, I’ll import the necessary openpyxl modules for this lesson: workbook.Workbook
, which we’ll use to create workbooks, and load_workbook
, which, as you might guess, is used to load them.
from openpyxl.workbook import Workbook
from openpyxl import load_workbook
I plan to create an Excel workbook from scratch using openpyxl, which can be accomplished with the Workbook()
function. I’ll name this object wb
for easy reference later on.
wb = Workbook()
Creating a workbook directly from Python is quite impressive. However, the workbook alone isn’t very useful without a worksheet to populate with data.
Therefore, my next step is to create a worksheet object named ws
, which is assigned to wb.active
, referring to the active sheet of the current workbook. With this object now defined, it becomes possible to perform actions like naming the worksheet. This is done by setting the ws.title
to 'MySheet'
.
ws = wb.active
ws.title = 'MySheet'
Just like Excel, openpyxl can manage multiple worksheets. I plan to create a second worksheet named 'Another'
using the create_sheet()
function. By placing a 0
at the end of the function, I will position this worksheet as the first sheet in the workbook. This is because Python, unlike Excel, uses zero-based indexing.
ws2 = wb.create_sheet('Another', 0)
You can verify that this workbook contains two worksheets by using wb.sheetnames
. There, you will find 'Another'
and 'MySheet'
listed in the correct order.
wb.sheetnames
['Another', 'MySheet']
The last step is to save the results back to an Excel file using wb.save
, naming the file hello-openpyxl.xlsx
. When you open it in Excel, you will notice the two worksheets, correctly named and in the right order. Great job!
wb.save('hello-openpyxl.xlsx')
Modifying existing workbooks
Openpyxl isn’t limited to working only with brand-new workbooks; it can also modify existing ones. I’ll demonstrate this by working with the regions.xlsx
workbook, assigning it to wb2
using the load_workbook()
function.
wb2 = load_workbook('regions.xlsx')
Just like before, I’ll begin by identifying the active worksheet in the workbook, which I’ll refer to as active_sheet
. Then, I’m about to do something a bit mischievous: overwrite the existing, quality data in regions.xlsx
with some flawed, test data. I’ll do this by setting cell A1
of the active sheet to 0
.
active_sheet = wb2.active
active_sheet['A1'] = 0
Go ahead and save your work. Generally, I prefer not to save directly over my original data source, just in case I decide to review it later. Therefore, I’ll save the file under a new name, regions-modified.xlsx
.
wb2.save('regions-modified.xlsx')
When you open this workbook in Excel, you’ll notice that a 0
has replaced what should be labeled as Region
.
You can verify this by comparing it with the original regions.xlsx
file.
Conclusion
Of course, these tasks are quite basic, far simpler than those typically encountered in the business world. However, starting small and iterating gradually is a practical approach in professional settings. This is precisely the methodology you’ll explore in my comprehensive course on utilizing Python with Excel, available on LinkedIn Learning.
Please feel free to reach out if I can assist you or your organization in automating Excel-based workflows with Python.
You can also learn more about Python automation for Excel in my book Modern Data Analytics in Excel:
Leave a Reply