When many people hear “data analytics” these days, they think of platforms for graphical user interface (GUI)-driven business intelligence (e.g., Tableau), data warehousing (e.g., Snowflake), or data preparation (e.g., Alteryx). These tools have their place (some more than others) in the analytics stack.
But rather than focus on these tools in my book Advancing into Analytics, I teach readers the two major data programming languages, R and Python. These are often considered data science tools, and many successful analysts don’t know them and don’t see the need to learn. But I believe data analysts have great reason to learn how to code. Here’s why:
It’s a knowledge investment that pays off
If I had a dime for every time I mentioned that analysts spend 50 to 80 percent of their time preparing data, I might not need to write a blog.
So, how do we lighten that load? Traditionally, Excel wonks have made great use of keyboard shortcuts to speed up their workflow. UX research does indeed indicate that the keyboard yields greater productivity gains than the mouse. Extrapolate from that, and coding is generally more productive than pointing and clicking. Of course, coding takes longer to learn. So this becomes a break-even decision: code versus GUI.
For the early stages of a project, or for one-off needs, a GUI could be fine. But there’s something to be said for “codifying” a project so that it can be automated, as in the sketch below. In a job with this much grunt work, it’s a learning investment that pays off.
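To make that concrete, here’s a minimal sketch of what “codifying” a recurring data preparation task might look like in Python with pandas. The file name, columns, and cleaning steps are all hypothetical; the point is that the script can be rerun on next month’s file with zero extra clicks.

```python
import pandas as pd

# Hypothetical monthly extract; in a GUI you'd repeat these
# cleaning steps by hand every time a new file arrives.
sales = pd.read_csv("sales_2021_06.csv")

# Codified once, rerun forever:
sales = (
    sales
    .dropna(subset=["region", "revenue"])  # drop incomplete rows
    .query("revenue > 0")                  # keep real sales only
)

# Summarize revenue by region and write out the report.
summary = sales.groupby("region", as_index=False)["revenue"].sum()
summary.to_csv("monthly_revenue_by_region.csv", index=False)
```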
Maybe future technologies will use NLP or augmented reality to provide options beyond coding and GUIs. I’m pretty impressed, for example, with the AI-embedded tools of Power BI: start typing what kind of chart you want, and it’ll make its best guess. With more of this, maybe the productivity gains of code over GUIs won’t be so clear. But for now, the calculus still favors learning how to code.
GUI tools are running code under the hood anyway
Now, the choice between code and GUI isn’t always so clear-cut. In fact, VBA and now Power Query offer menu-driven tools that generate syntax for you. Some business intelligence tools are offering the same for machine learning, usually powered by R or Python behind the scenes.
I don’t know about you, but I nearly always have some requirement that can only be accomplished by coding in these frameworks. Every data project is different. Have you ever struggled to find a search result for the exact task you’re trying to do? GUIs tend to limit your options, with the tradeoff of a lower barrier to entry. That isn’t always a tradeoff that rank-and-file analysts can afford to make.
Maybe you can think of code as the engine powering your GUI under the hood. You are a long-haul analyst, and you need to be able to pop the hood and make the necessary adjustments when the GUI just won’t turn over. Some coding knowledge is important, even if you’re working with tools that generate it for you.
It’s exposure to open source communities
Origin stories matter, and it’s no different for software. Many low- and no-code tools are proprietary. They may be easier to use and harder to break, but someone’s dropping a lot of money for that convenience and support.
On the other hand, many data programming languages are open source, meaning that anyone is free to build on, distribute, or contribute to the software. In other words: open source brings great freedom and great responsibility. Closed and open source represent opposing worldviews, and it’s important to understand the implications of each.
Every analyst should have some direct exposure to open source, given the major impact it’s had on technology and data in the last decade. What are the pros and cons of open source? What is a package? What is a pull request? How do companies make money off open source, anyway? Analysts should be able to answer these questions, and they can get that experience firsthand by learning R and Python.
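A question like “what is a package?” stops being abstract the first time you install one. Here’s a minimal Python illustration; the requests package and the GitHub API endpoint are just convenient examples of community-maintained, open source tooling.

```python
# Install the open source package from PyPI (run in a shell):
#   pip install requests
import requests

# requests is community-maintained: anyone can read its source,
# file an issue, or open a pull request to improve it.
response = requests.get("https://api.github.com/repos/pandas-dev/pandas")
print(response.json()["stargazers_count"])  # stars on an open source repo
```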
I see too many data analysts totally commit to the offerings of one proprietary vendor. This tends to induce myopia in what an analyst knows (they may start to think their vendor invented the left outer join, which has been around for decades). Committing to one closed source vendor leaves so much off the table in today’s analytics landscape.
Hey, if Microsoft is playing nice with open source, then maybe you should too.
It’s the surest way to reproducible research
The idea of reproducibility in data is that given the same inputs and processes, the same outputs should result time and again.
Now, a lot of people have piled onto Excel because it’s not always reproducible. It’s easy not to know whether someone deleted a column, or where exactly the data behind a graph originally came from before it was chopped up into worksheets. (Power Query busts many of these Excel myths, but let’s overlook that for now.)
The solution to this downside is often to adopt some kind of expensive business intelligence tool. The irony is that while BI tools have their benefits, reproducibility isn’t necessarily one of them: they, like Excel, are GUI-driven, and it can be hard to trace results back to the original inputs.
Programming languages like R and Python offer a universe of tools for reproducible research. If, for example, you need to conduct some statistical analysis of a dataset and report your results, I couldn’t imagine doing this anymore without coding. “Show your work” is some solid advice from grade school, and code is a great way to do that in analytics.
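As a sketch of what that looks like, here’s a minimal, fully scripted analysis in Python. The file and column names are hypothetical; what matters is that every step, including the random sampling, lives in code, so anyone can rerun it and get the same results.

```python
import pandas as pd
from scipy import stats

# Every input and transformation lives in code, so the whole
# analysis can be rerun from scratch to the same result.
df = pd.read_csv("survey_results.csv")

# A fixed random_state makes even the sampling step repeatable.
sample = df.sample(n=100, random_state=42)

# A simple two-sample t-test, reported straight from the script:
# code as a way to "show your work."
group_a = sample.loc[sample["group"] == "A", "score"]
group_b = sample.loc[sample["group"] == "B", "score"]
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
```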
Advancing into coding
Yes, learning to code is a knowledge investment; it is learning a new language, after all. But like learning any language, coding will open new doors for data analysts.
If you want the least friction in opening these doors, I suggest you check out my book, Advancing into Analytics: From Excel to Python and R. You may not think of these as data analytics tools, but for the reasons explained above, I find them indispensable for data analysts.