Homework 0
Assignment: Write a tutorial explaining how to construct an interesting data visualization of the Palmer Penguins data set.
1. Data Import and Cleaning
Reading in the Palmer Penguins data set.
import pandas as pd
url = "https://raw.githubusercontent.com/PhilChodrow/PIC16B/master/datasets/palmer_penguins.csv"
penguins = pd.read_csv(url)
Choosing the relevant columns.
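cols = ["Species", "Sex", # needed by the plots below (hue, col, and y arguments)
        "Culmen Length (mm)",
        "Culmen Depth (mm)",
        "Flipper Length (mm)",
        "Body Mass (g)"]
# shorten each species name to its first word, e.g. "Adelie"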
penguins["Species"] = penguins["Species"].str.split().str.get(0)
penguins = penguins[cols]
Clean the data
# drop all rows with NaN values
penguins = penguins.dropna()
# drop rows with "." recorded as Sex
penguins = penguins[penguins["Sex"] != "."]
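To see what the cleaning steps do without downloading anything, here is a minimal sketch on a small made-up table. The rows in `toy` are hypothetical and only mimic the quirks of the raw file (long species names, a missing value, and a "." in the Sex column); they are not real measurements.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows that mimic the raw data's quirks (not real measurements)
toy = pd.DataFrame({
    "Species": ["Adelie Penguin (Pygoscelis adeliae)",
                "Gentoo penguin (Pygoscelis papua)",
                "Chinstrap penguin (Pygoscelis antarctica)",
                "Gentoo penguin (Pygoscelis papua)"],
    "Sex": ["MALE", ".", "FEMALE", np.nan],
    "Body Mass (g)": [3750.0, 4875.0, 3500.0, 5000.0],
})

# Keep only the first word of each species name
toy["Species"] = toy["Species"].str.split().str.get(0)

# Drop rows with missing values, then rows whose Sex is recorded as "."
toy = toy.dropna()
toy = toy[toy["Sex"] != "."]

print(toy)
```

Only the Adelie and Chinstrap rows should remain: the row with a NaN is removed by `dropna()`, and the row with "." is removed by the boolean filter.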
Seaborn is a Python data visualization library based on matplotlib.
Scatterplot of Culmen Length against Body Mass for Three Penguin Species, Separated by Sex
import seaborn as sns
# create a faceted scatterplot; relplot returns a FacetGrid
sns.relplot(data=penguins, # data that needs to be plotted
            x="Culmen Length (mm)", # column name for x-axis
            y="Body Mass (g)", # column name for y-axis
            hue="Species", # column name for color coding
            col="Sex", # column name for splitting into side-by-side panels
            sizes=(40, 400)) # marker size range; only applies when a size variable is set
Matplotlib is a library for creating static, animated, and interactive visualizations in Python.
Boxplot of Sex against Body Mass
from matplotlib import pyplot as plt
# create an empty figure and axes with figure size (10, 7)
fig, ax = plt.subplots(figsize=(10,7))
# draw the boxplot with seaborn's boxplot function
sns.boxplot(data=penguins, # data that needs to be plotted
            x="Body Mass (g)", # column name for x-axis
            y="Sex", # column name for y-axis
            hue="Species", # column name for color coding
            ax=ax) # draw on the axes created above
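The center line of each box in a boxplot marks the median of that group. As a rough cross-check, the same medians can be computed directly with pandas. This is only a sketch on a hypothetical table `toy` with made-up body masses, not the real data.

```python
import pandas as pd

# Hypothetical toy measurements, not taken from the real data set
toy = pd.DataFrame({
    "Species": ["Adelie"] * 4 + ["Gentoo"] * 4,
    "Sex": ["MALE", "MALE", "FEMALE", "FEMALE"] * 2,
    "Body Mass (g)": [4000.0, 4200.0, 3300.0, 3500.0,
                      5400.0, 5600.0, 4600.0, 4800.0],
})

# Median body mass for each (Sex, Species) pair:
# this is the value drawn as the center line of the matching box
medians = toy.groupby(["Sex", "Species"])["Body Mass (g)"].median()
print(medians)
```

Running the same `groupby` on the cleaned `penguins` table should give the numbers behind the boxes in the plot above.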
Plotly is a graphing library that makes interactive graphs.
Histogram of the number of penguins by Culmen Length
from plotly import express as px
fig = px.histogram(penguins, # data that needs to be plotted
x="Culmen Length (mm)", # column name for x-axis
color="Species", # column name for color coding
opacity=0.5, # set the opacity
nbins=30, # set the number of bins
width=600, # the figure width in pixels
height=300) # the figure height in pixels
# reduce whitespace
fig.update_layout(margin={"r": 0, "t": 0, "l": 0, "b": 0})
# show the plot
fig.show()
Compared to matplotlib, plotly can offer more ornate, interactive visualizations, and its plots are easy to modify and export.