Analytics tools are best thought of as a “stack,” so that each slice is seen in context with the others. The most exciting analytics products span multiple slices, combining the advantages of each.
Using SQLAlchemy inside Python is such a case. This open-source tool allows you to interact with relational databases from inside Python. As such, SQLAlchemy crosses two slices of the analytics stack: databases and programming languages.
As the name implies, SQLAlchemy is based on the tenets of SQL, the traditional language for interacting with relational databases. The “Alchemy” is that by adding Python to the mix, magic happens.
Not only is it incredibly useful to pull information from databases into Python objects, the Python language adds versatility to traditional SQL querying.
Take a look at the below half-day workshop as a way to get started with this popular Python package. By the end of the workshop, learners will be able to build an end-to-end data pipeline, from a relational database for data storage and integrity to a pandas
DataFrame for data analysis and visualization.
Lesson 1: Combining the powers of SQL & Python
Objective: Student can compare and contrast the uses of Python and SQL for data analysis
Description:
- Databases versus scripting languages
- Declarative versus procedural languages
- Object-oriented programming
Time: 20 minutes
Assets needed: none
Lesson 2: Connecting to a database and returning results
Objective: Student can connect to, retrieve from, and close a database
Description:
- Installing and loading the packages
- Connecting to a database
- Retrieving keys and items
- Retrieving records
- Closing the connection
Time: 45 minutes
Assets needed: Baseball database
Lesson 3: Sorting & filtering results
Objective: Student can sort and filter query results
Description:
- Conditional logic
- Filtering results
- Ordering results
Time: 35 minutes
Assets needed: Baseball database
Lesson 4: Student can group by and create calculations from query results
Objective: Student can create a data manipulation pipeline
- Grouping results
- Creating aggregated calculations
- Creating calculated fields
Time: 35 minutes
Assets needed: Baseball database
Lesson 5: Joining tables
Objective: Student can join two or more tables
- Inner joins
- Left outer joins
Time: 35 minutes
Assets needed: Baseball database
Lesson 6: SQLAlchemy & pandas
Objective: Student can load query results into pandas DataFrames and analyze
- Loading results to a DataFrame
- Summarizing and analyzing in pandas
Time: 30 minutes
Assets needed: Baseball database
Lesson 7: Capstone: creating a data pipeline
Objective: Student can create an end-to-end data pipeline using Python, SQLAlchemy and pandas
- Connect to, retrieve and close a database
- Load results to DataFrame
- Analyze, inspect and interpret results
Time: 35 minutes
Assets needed: Flights database
Leave a Reply