In honor of National Panda Day, I have put together a Stringfest learning guide on pandas, the popular Python package for data analysis and manipulation.
Fun fact: the name pandas comes from so-called “panel data” in econometrics. The primary data structure of interest in pandas is the DataFrame, which is two-dimensional and tabular. This is a very common and useful way to arrange data for analysis, and Excel and SQL users will find many similarities in how pandas views and uses data, along with some useful twists.
Take a look at the half-day workshop below and let me know what you think. My goal is for learners to finish ready to conduct exploratory data analysis in Python, building on their new foundation in pandas.

Lesson 1: Up and running with NumPy
Objective: Student can create and operate on NumPy arrays
Description:
- Installing NumPy
- Creating arrays
- Inspecting arrays
- Reshaping arrays
- Array mathematics
- Random numbers
Exercises: Drills
Assets needed: None
Time: 35 minutes
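To give a feel for Lesson 1's drills, here is a minimal sketch touching each topic in order. It assumes NumPy is already installed (for example with `pip install numpy`); the array values and the random seed are arbitrary choices for illustration, not the actual workshop exercises.

```python
import numpy as np

# Creating arrays
a = np.array([1, 2, 3, 4, 5, 6])
zeros = np.zeros((2, 3))       # 2 rows x 3 columns of 0.0
seq = np.arange(0, 12, 2)      # [0, 2, 4, 6, 8, 10]

# Inspecting arrays: shape, number of dimensions, element type
print(a.shape, a.ndim, a.dtype)

# Reshaping arrays: same six values laid out as 2 rows x 3 columns
m = a.reshape(2, 3)            # [[1, 2, 3], [4, 5, 6]]

# Array mathematics: element-wise operations and aggregates along an axis
doubled = m * 2
col_sums = m.sum(axis=0)       # [5, 7, 9]
row_means = m.mean(axis=1)     # [2.0, 5.0]

# Random numbers: a seeded Generator gives reproducible draws
rng = np.random.default_rng(seed=42)
draws = rng.integers(low=1, high=7, size=10)  # simulated die rolls; high is exclusive
```

Using the `Generator` returned by `default_rng` rather than the older `np.random.seed` functions keeps the random state local to one object, which makes drills easier to reproduce.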
Lesson 2: Introduction to Pandas
Objective: Student can import and create Pandas DataFrames
Description:
- Installing Pandas
- NumPy and Pandas
- Series and DataFrames
- Columns and indices
- Creating DataFrames
- Importing: CSV, Excel
Exercises: Drills
Assets needed: Baseball records
Time: 25 minutes
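A sketch of the Lesson 2 ideas, assuming pandas is installed. The tiny table is invented; its column names (`playerID`, `yearID`, `teamID`, `AB`, `HR`) are loosely modeled on the Lahman batting table, and the player IDs and team codes are hypothetical. `io.StringIO` stands in for a real CSV file so the example runs on its own.

```python
import io

import numpy as np
import pandas as pd

# A Series is a one-dimensional labeled array (hypothetical home run totals)
hr = pd.Series([12, 30, 7], index=["player_a", "player_b", "player_c"], name="HR")

# A DataFrame is two-dimensional: several columns sharing one row index
df = pd.DataFrame(
    {
        "teamID": ["AAA", "BBB", "AAA"],
        "AB": [400, 550, 210],
        "HR": [12, 30, 7],
    },
    index=["player_a", "player_b", "player_c"],
)

# NumPy and pandas: a column's values can be pulled out as a NumPy array
ab_values = df["AB"].to_numpy()

# Columns and indices label the two axes
print(df.columns.tolist())  # ['teamID', 'AB', 'HR']
print(df.index.tolist())    # ['player_a', 'player_b', 'player_c']

# Importing a CSV (StringIO stands in for a file path here)
csv_text = """playerID,yearID,teamID,AB,HR
player_a,2019,AAA,400,12
player_b,2019,BBB,550,30
"""
batting = pd.read_csv(io.StringIO(csv_text))

# Excel works similarly via pd.read_excel (needs an engine such as openpyxl);
# "batting.xlsx" is a hypothetical file name:
# batting_xl = pd.read_excel("batting.xlsx", sheet_name=0)
```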
Lesson 3: Exploring DataFrames
Objective: Student can inspect and explore Pandas DataFrames
Description:
- Inspecting columns
- Printing rows
- Descriptive statistics
- Checking for missing values
- Retrieving columns
Exercises: Drills
Assets needed: Baseball records
Time: 40 minutes
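For Lesson 3, a minimal sketch of a first pass over a DataFrame. The rows are invented (again loosely echoing Lahman batting columns), and one home run value is deliberately left missing so the missing-value check has something to find.

```python
import numpy as np
import pandas as pd

# Hypothetical batting rows; player_c's HR is missing on purpose
batting = pd.DataFrame(
    {
        "playerID": ["player_a", "player_b", "player_c", "player_d"],
        "yearID": [2019, 2019, 2020, 2020],
        "AB": [400, 550, 210, 320],
        "HR": [12, 30, np.nan, 9],
    }
)

# Inspecting columns: names, data types, and non-null counts
print(batting.columns.tolist())
print(batting.dtypes)
batting.info()

# Printing rows: the first and last n rows
print(batting.head(2))
print(batting.tail(2))

# Descriptive statistics for numeric columns (count, mean, std, min, quartiles, max)
summary = batting.describe()
print(summary)

# Checking for missing values: number of NaN entries per column
missing = batting.isna().sum()
print(missing)

# Retrieving columns: one label returns a Series, a list of labels a DataFrame
hr = batting["HR"]
subset = batting[["playerID", "HR"]]
```

Note that `describe()` skips the missing value, so the `count` row for `HR` is lower than for the other numeric columns; that mismatch is often the first hint of missing data.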
Lesson 4: Basic DataFrame manipulation
Objective: Student can perform basic operations on Pandas DataFrames
Description:
- Sorting and filtering rows
- Modifying columns
- Removing columns
- Manipulating missing values
- Removing duplicates
- Aliasing modules
Exercises: Drills
Assets needed: Baseball records
Time: 45 minutes
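A sketch of the Lesson 4 operations on a small invented table that contains one duplicated row and one missing value. Filling the missing `HR` with 0 is purely an illustrative choice; whether to fill, drop, or leave missing values depends on what they mean in the real data.

```python
# Aliasing modules: np and pd are the conventional short names
import numpy as np
import pandas as pd

batting = pd.DataFrame(
    {
        "playerID": ["player_a", "player_b", "player_b", "player_c"],
        "teamID": ["aaa", "bbb", "bbb", "ccc"],
        "AB": [400, 550, 550, 210],
        "HR": [12.0, 30.0, 30.0, np.nan],
    }
)

# Removing duplicates: player_b's row appears twice; renumber rows afterwards
batting = batting.drop_duplicates().reset_index(drop=True)

# Manipulating missing values: fill the missing HR (illustrative choice)
batting["HR"] = batting["HR"].fillna(0)

# Modifying columns: upper-case team codes and store HR as integers
batting["teamID"] = batting["teamID"].str.upper()
batting["HR"] = batting["HR"].astype(int)

# Sorting rows: most home runs first
by_hr = batting.sort_values("HR", ascending=False)

# Filtering rows with a boolean mask: players with at least 300 at-bats
regulars = batting[batting["AB"] >= 300]

# Removing columns: drop returns a new DataFrame without teamID
no_team = batting.drop(columns=["teamID"])
```

Each step returns or assigns a new object rather than modifying data in place, which keeps the drills easy to rerun from the top.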
Lesson 5: Intermediate DataFrame manipulation
Objective: Student can perform intermediate operations on Pandas DataFrames
Description:
- Creating new columns
- Reshaping: melting and pivoting
- Aggregating
- Merging DataFrames
- Exporting DataFrames: CSV, Excel
Exercises: Drills
Assets needed: Baseball records
Time: 45 minutes
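Finally, a sketch of the Lesson 5 topics. Both tables are invented: `batting` loosely follows Lahman batting columns (`H` is hits, `AB` at-bats) and `people` loosely follows the Lahman people table's `nameLast`; the names and numbers are hypothetical. Excel export is shown only as a comment because it needs an extra engine such as openpyxl.

```python
import pandas as pd

batting = pd.DataFrame(
    {
        "playerID": ["player_a", "player_a", "player_b", "player_b"],
        "yearID": [2019, 2020, 2019, 2020],
        "AB": [400, 200, 500, 250],
        "H": [120, 50, 150, 100],
        "HR": [12, 5, 30, 20],
    }
)
people = pd.DataFrame({"playerID": ["player_a", "player_b"], "nameLast": ["Alpha", "Bravo"]})

# Creating new columns: batting average = hits / at-bats
batting["AVG"] = batting["H"] / batting["AB"]

# Aggregating: career totals per player
career = batting.groupby("playerID", as_index=False)[["AB", "H", "HR"]].sum()

# Merging DataFrames on a shared key, like a SQL LEFT JOIN
career = career.merge(people, on="playerID", how="left")

# Pivoting (long -> wide): one row per player, one HR column per year
hr_wide = batting.pivot(index="playerID", columns="yearID", values="HR")

# Melting (wide -> long): back to one row per player-year
hr_long = hr_wide.reset_index().melt(id_vars="playerID", var_name="yearID", value_name="HR")

# Exporting: with no path, to_csv returns the CSV text; pass a file name to write a file
csv_text = career.to_csv(index=False)
# career.to_excel("career.xlsx", index=False)  # hypothetical file name
```

Pivoting and melting are inverses here, which makes them a handy pair for drills: learners can check their work by round-tripping a table.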
By the way, the “baseball records” I refer to in the guide come from the Lahman baseball database, one of my all-time favorite datasets.
The only thing better than that dataset would be, well, a panda playing baseball… Oh wait, that’s Pablo Sandoval, the “Kung Fu Panda.”
