I like to call Structured Query Language, or SQL, the “cool old uncle of data analysis,” because SQL has been around the block, and can still keep up.
While R has been around since 2000, and Python since 1991, SQL dates to the early 1970s. Don’t let the vocal, relatively recent adoption of these other languages distract you: SQL is a vital part of the data analysts’s stack of tools. (You also should consider these tools not as substitutes, but as complementary slices of the data analytics stack.)
The below graphic comes from a DataQuest blog post entitled “Want a Job in Data? Learn SQL.” The author mined 25,000 jobs on Indeed containing “data” as a search term.
Before Python, R or any other skill came SQL. So, what is this language, and what do data analysts need to know?
SQL is the language used to manage data held in a relational database. These tasks are summarized by the famous “CRUD” acronym: create, read, update, and delete. Generally, data analysts are responsible for the second activity, or reading information from a database.
That is the focus of this half-day workshop: to give analysts a command over the most common tasks for reading information out of a database for data analysis.
SQL comes in many flavors, from Access to PostgreSQL. For these elementary reading commands, the differences are negligible. I have designed to conduct the class in SQLite, which has the benefit of being fast, flexible and open-source. We will use the SQLiteStudio environment to move from visual interaction with the database to coding.
Lesson 1: Databases and data analytics
Objective: Student can navigate the SQLiteStudio environment to visually inspect tables in a database
Description:
- The basics of working with a database
- Database parts of speech
- The basics of working with SQLite + SQLiteStudio
Exercises: Retrieve dimensions and database types of a table
Assets needed: Home prices dataset
Time: 50 minutes
Lesson 2: Coding in SQL
Objective: Student can write style-compliant SQL scripts to retrieve tables from a field
Description:
- Styling in SQL
- Working with scripts: opening, saving, executing
- Selecting all fields from a table
- Selecting some fields from a table
- Aliasing a field
- Arithmetic operations on a field
- String operations on a field
- Limiting query results
Exercises: Drills
Assets needed: Home prices dataset
Time: 50 minutes
Lesson 3: Sorting & filtering
Objective: Student can sort and filter the results of a query
Description:
- Conditional operators
- Logical operators
- Filtering one or more fields
- Sorting one or more fields
Exercises: Drills
Assets needed: Home prices dataset
Lesson 4: Aggregating
Objective: Student can group and aggregate the results of a SQL query
Description:
- Counting and listing distinct records
- Understanding NULL
- Grouping and field arithmetic
- Aggregating and aliasing
- Filtering aggregation results
Exercises: Drills
Assets needed: Home prices dataset
Time: 50 minutes
Lesson 5: SQL and table joins
Objective: Student can join two or more tables from the same database
Description:
- How relational databases work
- JOINs and NULLs
- INNER JOIN
- LEFT OUTER JOIN
Exercises: Drills
Assets needed: Home prices dataset
Time: 50 minutes
Lesson 6: Intermediate joins
Objective: Student can use less common techniques to join two or more tables from the same database
Description:
- RIGHT OUTER JOIN
- FULL OUTER JOIN
Set operators
Exercises: Drills
Assets needed: Home prices dataset
Time: 50 minutes
Lesson 7: SQL for data analysis
Objective: Student can conduct basic data exploration and analysis in SQL
Description:
- CASE expressions
- Subqueries
- Common table expressions
Exercises: Drills
Assets needed: Home prices dataset
Time: 50 minutes
Lesson 8: SQL for data analysis, continued
Objective: Student can conduct intermediate data exploration and analysis in SQL
Description:
- Window functions
- Correlations
- Regression coefficients
Exercises: Drills
Assets needed: Home prices dataset
Time: 50 minutes
While the next wave in data analytics features tools like Power Query that allow users to accomplish many tasks with no coding, there is something to be said for learning data analysis the old-fashioned way: through SQL commands.
After all, it’s a “structured query language,” and I find that learning SQL tightens and structures your thoughts around data analysis.
kc
Hi I am subscribed to your email newsletter. How do I get access to the SQL lessons please?
George Mount
Hi KC, you will have received a link to my resource library with all of the lesson plans which you are welcome to use in whatever way.