In honor of National Panda Day, I have put together a Stringfest learning guide on pandas, the popular Python module for data analysis and manipulation.
Fun fact: the name pandas comes from so-called “panel data” in econometrics. The primary data structure of interest in pandas is the DataFrame, which is two-dimensional and tabular. This is a common and practical way to arrange data for analysis, and Excel or SQL users will find many similarities in how pandas views and uses data, with a few (useful) twists.
Take a look at the half-day workshop outline below and let me know what you think. My goal is for learners to come away with enough of a foundation in pandas to conduct exploratory data analysis in Python.
Lesson 1: Up and running with NumPy
Objective: Student can create and operate on NumPy arrays
Description:
- Installing NumPy
- Creating arrays
- Inspecting arrays
- Reshaping arrays
- Array mathematics
- Random numbers
Exercises: Drills
Assets needed: None
Time: 35 minutes
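As a rough illustration of the Lesson 1 topics, here is a minimal NumPy sketch that touches each bullet in order. The arrays and the die-roll example are made up for illustration and are not taken from the workshop materials.

```python
import numpy as np

# Creating arrays: from a list, pre-filled, and from a range
a = np.array([1, 2, 3, 4, 5, 6])
zeros = np.zeros((2, 3))
seq = np.arange(0, 10, 2)  # 0, 2, 4, 6, 8

# Inspecting arrays: shape, number of dimensions, element type
print(a.shape, a.ndim, a.dtype)

# Reshaping: 6 elements become 2 rows x 3 columns
m = a.reshape(2, 3)

# Array mathematics: elementwise arithmetic and axis-wise aggregation
doubled = m * 2
col_sums = m.sum(axis=0)    # sum down each column
row_means = m.mean(axis=1)  # mean across each row

# Random numbers: a seeded Generator gives reproducible draws
rng = np.random.default_rng(seed=42)
die_rolls = rng.integers(low=1, high=7, size=5)  # high is exclusive, so values are 1-6
print(die_rolls)
```

Seeding the generator is a choice for teaching: every learner gets the same "random" numbers, which makes drills easier to check.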
Lesson 2: Introduction to pandas
Objective: Student can import and create pandas DataFrames
Description:
- Installing pandas
- NumPy and pandas
- Series and DataFrames
- Columns and indices
- Creating DataFrames
- Importing: CSV, Excel
Exercises: Drills
Assets needed: Baseball records
Time: 25 minutes
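A minimal sketch of the Lesson 2 ideas, assuming pandas is installed (for example with `pip install pandas`, which also pulls in NumPy). The player IDs and home-run counts are invented stand-ins for the baseball records, and the file names in the comments are hypothetical.

```python
import io

import numpy as np
import pandas as pd

# A Series is a one-dimensional labeled array; here it wraps a NumPy array
hr = pd.Series(np.array([30, 12, 45]), index=["player01", "player02", "player03"], name="HR")

# A DataFrame is two-dimensional: columns are Series that share one row index
batting = pd.DataFrame(
    {
        "playerID": ["player01", "player02", "player03"],
        "yearID": [2019, 2019, 2020],
        "HR": [30, 12, 45],
    }
)
print(batting.columns.tolist())  # column labels
print(batting.index)             # default integer row index (0, 1, 2)

# Each column can be handed back to NumPy
hr_values = batting["HR"].to_numpy()

# Importing a CSV: io.StringIO stands in for a real (hypothetical) path such as "Batting.csv"
csv_text = "playerID,yearID,HR\nplayer01,2019,30\nplayer02,2019,12\n"
from_csv = pd.read_csv(io.StringIO(csv_text))
print(from_csv)

# Excel files load the same way, e.g. pd.read_excel("records.xlsx", sheet_name=0),
# which needs an engine such as openpyxl installed for .xlsx files.
```

Using an in-memory buffer keeps the example self-contained; in the workshop you would point `read_csv` at the actual file.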
Lesson 3: Exploring DataFrames
Objective: Student can inspect and explore pandas DataFrames
Description:
- Inspecting columns
- Printing rows
- Descriptive statistics
- Checking for missing values
- Retrieving columns
Exercises: Drills
Assets needed: Baseball records
Time: 40 minutes
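The Lesson 3 exploration steps might look something like this sketch. The small DataFrame is hypothetical (team codes and counts are invented), with one deliberately missing at-bat value so the missing-value check has something to find.

```python
import numpy as np
import pandas as pd

# Hypothetical batting records with one missing value (np.nan)
batting = pd.DataFrame(
    {
        "playerID": ["player01", "player02", "player03", "player04"],
        "teamID": ["AAA", "BBB", "AAA", "CCC"],
        "AB": [550, 320, 600, np.nan],
        "HR": [30, 12, 45, 8],
    }
)

# Inspecting columns: names and dtypes, then a compact overview with non-null counts
print(batting.dtypes)
batting.info()

# Printing rows
print(batting.head(2))  # first two rows
print(batting.tail(1))  # last row

# Descriptive statistics for numeric columns (count, mean, std, min, quartiles, max)
summary = batting.describe()
print(summary)

# Checking for missing values: number of NaNs per column
missing = batting.isna().sum()
print(missing)

# Retrieving columns: single brackets give a Series, double brackets a DataFrame
hr_series = batting["HR"]
subset = batting[["playerID", "HR"]]
```

Note that `describe()` reports a count of 3 for `AB`, because missing values are excluded from the statistics.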
Lesson 4: Basic DataFrame manipulation
Objective: Student can perform basic operations on pandas DataFrames
Description:
- Sorting and filtering rows
- Modifying columns
- Removing columns
- Manipulating missing values
- Removing duplicates
- Aliasing modules
Exercises: Drills
Assets needed: Baseball records
Time: 45 minutes
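Here is one possible sketch of the Lesson 4 operations on a hypothetical DataFrame containing a duplicate row, a missing value, and an unneeded column. Filling missing at-bats with 0 is just one illustrative choice; dropping the row is an equally valid option depending on the analysis.

```python
import numpy as np
import pandas as pd  # aliasing: "pd" is the conventional short name for pandas

# Hypothetical records: player03's 2020 row is duplicated and player04 has no AB
batting = pd.DataFrame(
    {
        "playerID": ["player01", "player02", "player03", "player03", "player04"],
        "yearID": [2019, 2019, 2020, 2020, 2020],
        "AB": [550, 320, 600, 600, np.nan],
        "HR": [30, 12, 45, 45, 8],
        "notes": ["", "", "", "", ""],
    }
)

# Removing duplicates (exact repeats across all columns), then renumbering the index
batting = batting.drop_duplicates().reset_index(drop=True)

# Manipulating missing values: fill missing at-bats with 0; dropna() would drop the row instead
batting["AB"] = batting["AB"].fillna(0).astype(int)

# Modifying columns: rename HR to something more descriptive
batting = batting.rename(columns={"HR": "home_runs"})

# Removing columns
batting = batting.drop(columns=["notes"])

# Filtering rows with a boolean mask, then sorting from most to fewest home runs
sluggers = batting[batting["home_runs"] >= 20].sort_values("home_runs", ascending=False)
print(sluggers)
```

Each step reassigns `batting` rather than modifying it in place, which keeps the sequence easy to read and rerun cell by cell.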
Lesson 5: Intermediate DataFrame manipulation
Objective: Student can perform intermediate operations on pandas DataFrames
Description:
- Creating new columns
- Reshaping: melting and pivoting
- Aggregating
- Merging DataFrames
- Exporting DataFrames: CSV, Excel
Exercises: Drills
Assets needed: Baseball records
Time: 45 minutes
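A sketch of the Lesson 5 topics, again on invented data: a hypothetical season-level batting table and a hypothetical player lookup table (the surnames are placeholders). Batting average as hits divided by at-bats is used as the example derived column; the output file names in the comments are hypothetical.

```python
import io

import pandas as pd

# Hypothetical season-level batting records
batting = pd.DataFrame(
    {
        "playerID": ["player01", "player01", "player02", "player02"],
        "yearID": [2019, 2020, 2019, 2020],
        "H": [150, 66, 90, 40],
        "AB": [500, 200, 300, 160],
    }
)
# Hypothetical player lookup table
people = pd.DataFrame({"playerID": ["player01", "player02"], "nameLast": ["Smith", "Jones"]})

# Creating new columns: batting average = hits / at-bats
batting["AVG"] = (batting["H"] / batting["AB"]).round(3)

# Aggregating: career totals per player
career = batting.groupby("playerID", as_index=False)[["H", "AB"]].sum()

# Reshaping: long -> wide with pivot, then wide -> long again with melt
wide = batting.pivot(index="playerID", columns="yearID", values="H")
long_df = wide.reset_index().melt(id_vars="playerID", var_name="yearID", value_name="H")

# Merging DataFrames on a shared key (a left join keeps every batting row)
merged = batting.merge(people, on="playerID", how="left")

# Exporting: an in-memory buffer here; a path such as "career.csv" works the same way
buffer = io.StringIO()
career.to_csv(buffer, index=False)
# career.to_excel("career.xlsx", index=False) would write Excel (requires openpyxl)
print(buffer.getvalue())
```

`pivot` and `melt` are inverses here, which makes them a handy pair to teach together: learners can check their reshaping by going wide and coming back long.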
By the way, the “baseball records” I refer to in the guide come from the Lahman baseball database, one of my all-time favorite datasets.
The only thing better than that dataset would be, well, a panda playing baseball... oh wait, that’s Pablo Sandoval.