Python was not designed for data analysis (and why that’s OK)

posted on July 2, 2022

A major reason I think it’s easier for Excel users to pick up R than Python is that Excel and R tend to “think” more alike than Excel and Python do. To see what I mean, let’s take a range of numbers and attempt to multiply it by two using the built-in range, vector and list objects in Excel, R and Python respectively:

[Image: multiplying a range by two in Excel, R and Python]

Looks pretty straightforward in Excel and R, right? Take the range, multiply by two, get each number times two. By contrast, Python does something rather different: it literally takes the list and repeats it (so we get eight numbers, not four). Weird, right?
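
If you’d like to try the Python side yourself, here is a minimal sketch of that behavior. The specific values one through four and the variable names are my assumptions for illustration; any four-element list behaves the same way.

```python
# A built-in Python list holding four numbers (values assumed for illustration)
numbers = [1, 2, 3, 4]

# On a list, the * operator repeats the sequence instead of scaling each element
repeated = numbers * 2
print(repeated)  # [1, 2, 3, 4, 1, 2, 3, 4]

# With built-ins alone, doubling each element means looping over the list explicitly
doubled = [n * 2 for n in numbers]
print(doubled)  # [2, 4, 6, 8]
```

The list comprehension gets the Excel-and-R-style answer, but notice that you have to spell out the element-by-element step yourself.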

Well, not necessarily. Excel and R were designed for statistics and arithmetic. Python was designed more generally to communicate with the operating system, process errors, and so forth. The way a program ought to “think” for these tasks is rather different than for analyzing data.

“You’re crazy, bud. Python’s cleaning up in the data space right now,” you may be thinking (pun intended). That’s true. And it’s with the help of a fantastic set of packages that make analyzing data there feel a lot more natural (you may have heard of some of these: pandas, scikit-learn, numpy, etc.).
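
As a rough sketch of what “more natural” looks like, here is the same multiplication using a numpy array instead of a built-in list. This assumes numpy is installed, and the values and variable names are again just for illustration.

```python
import numpy as np

# A numpy array rather than a built-in list (values assumed for illustration)
values = np.array([1, 2, 3, 4])

# On a numpy array, * multiplies element by element, much like Excel and R
doubled = values * 2
print(doubled)  # [2 4 6 8]
```

Same operator, different object, different behavior: the array type defines multiplication the way an analyst would expect.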

This post isn’t a takedown of Python or an endorsement of R. You could never pick a favorite child. It’s just an exploration of how software objectives inform software behavior, with a very simple example.

To get started with this great set of tools for data analysis, check out my book Advancing into Analytics.
