This page is generated from a Jupyter notebook and shows examples of embedding interactive charts produced using Altair and hvPlot.
First, let’s load the data for measles incidence in wide format:
import pandas as pd

url = "https://raw.githubusercontent.com/MUSA-550-Fall-2023/week-2/main/data/measles_incidence.csv"
data = pd.read_csv(url, skiprows=2, na_values="-")
data.head()

| | YEAR | WEEK | ALABAMA | ALASKA | ARIZONA | ARKANSAS | CALIFORNIA | COLORADO | CONNECTICUT | DELAWARE | ... | SOUTH DAKOTA | TENNESSEE | TEXAS | UTAH | VERMONT | VIRGINIA | WASHINGTON | WEST VIRGINIA | WISCONSIN | WYOMING |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1928 | 1 | 3.67 | NaN | 1.90 | 4.11 | 1.38 | 8.38 | 4.50 | 8.58 | ... | 5.69 | 22.03 | 1.18 | 0.4 | 0.28 | NaN | 14.83 | 3.36 | 1.54 | 0.91 |
| 1 | 1928 | 2 | 6.25 | NaN | 6.40 | 9.91 | 1.80 | 6.02 | 9.00 | 7.30 | ... | 6.57 | 16.96 | 0.63 | NaN | 0.56 | NaN | 17.34 | 4.19 | 0.96 | NaN |
| 2 | 1928 | 3 | 7.95 | NaN | 4.50 | 11.15 | 1.31 | 2.86 | 8.81 | 15.88 | ... | 2.04 | 24.66 | 0.62 | 0.2 | 1.12 | NaN | 15.67 | 4.19 | 4.79 | 1.36 |
| 3 | 1928 | 4 | 12.58 | NaN | 1.90 | 13.75 | 1.87 | 13.71 | 10.40 | 4.29 | ... | 2.19 | 18.86 | 0.37 | 0.2 | 6.70 | NaN | 12.77 | 4.66 | 1.64 | 3.64 |
| 4 | 1928 | 5 | 8.03 | NaN | 0.47 | 20.79 | 2.38 | 5.13 | 16.80 | 5.58 | ... | 3.94 | 20.05 | 1.57 | 0.4 | 6.70 | NaN | 18.83 | 7.37 | 2.91 | 0.91 |
5 rows × 53 columns
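To see what the `skiprows` and `na_values` arguments do, here is a minimal sketch on an in-memory CSV. The two preamble lines and the values are made up for illustration; the real file's preamble is not shown on this page, so treat the layout as an assumption.

```python
import io

import pandas as pd

# Hypothetical miniature of the file: two preamble lines, then the header row.
raw = (
    "Weekly measles incidence\n"
    "Source: illustrative only\n"
    "YEAR,WEEK,ALABAMA,ALASKA\n"
    "1928,1,3.67,-\n"
    "1928,2,6.25,-\n"
)

# skiprows=2 skips the preamble so the third line becomes the header;
# na_values="-" turns the dash placeholders into NaN.
toy = pd.read_csv(io.StringIO(raw), skiprows=2, na_values="-")
print(toy)
```

Without `na_values="-"`, the ALASKA column would be read as strings rather than as missing numeric values.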
Then, sum the weekly values into annual totals for each state and use the pandas.melt() function to convert the result to tidy format:
annual = data.drop("WEEK", axis=1)
measles = annual.groupby("YEAR").sum().reset_index()
measles = measles.melt(id_vars="YEAR", var_name="state", value_name="incidence")
measles.head()

| | YEAR | state | incidence |
|---|---|---|---|
| 0 | 1928 | ALABAMA | 334.99 |
| 1 | 1929 | ALABAMA | 111.93 |
| 2 | 1930 | ALABAMA | 157.00 |
| 3 | 1931 | ALABAMA | 337.29 |
| 4 | 1932 | ALABAMA | 10.21 |
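The same reshape can be traced on a toy table. This is a sketch with made-up numbers, not the real data. It also shows one detail worth knowing: `groupby(...).sum()` skips NaN values, so a state with no reports in a year gets a total of 0 rather than NaN. Passing `min_count=1` is one way to keep those totals missing, if that is what you want.

```python
import numpy as np
import pandas as pd

# Hypothetical wide-format data: two weeks for each of two years.
toy = pd.DataFrame(
    {
        "YEAR": [1928, 1928, 1929, 1929],
        "WEEK": [1, 2, 1, 2],
        "ALABAMA": [3.0, 6.0, 1.0, 2.0],
        "ALASKA": [np.nan, np.nan, 0.5, np.nan],
    }
)

annual = toy.drop("WEEK", axis=1)

# Default behavior: an all-NaN group sums to 0.
totals = annual.groupby("YEAR").sum().reset_index()

# Alternative: require at least one non-NaN value, otherwise keep NaN.
strict = annual.groupby("YEAR").sum(min_count=1).reset_index()

# One row per (YEAR, state) pair.
tidy = totals.melt(id_vars="YEAR", var_name="state", value_name="incidence")
print(tidy)
```

Each state column of the wide table becomes a block of rows in the tidy table, which is the shape that both Altair and hvPlot expect.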
Finally, load altair:
import altair as alt

And generate our final data viz:
# use a custom color map
colormap = alt.Scale(
    domain=[0, 100, 200, 300, 1000, 3000],
    range=[
        "#F0F8FF",
        "cornflowerblue",
        "mediumseagreen",
        "#FFEE00",
        "darkorange",
        "firebrick",
    ],
    type="sqrt",
)
# Vertical line for vaccination year
threshold = pd.DataFrame([{"threshold": 1963}])
# plot YEAR vs state, colored by incidence
chart = (
    alt.Chart(measles)
    .mark_rect()
    .encode(
        x=alt.X("YEAR:O", axis=alt.Axis(title=None, ticks=False)),
        y=alt.Y("state:N", axis=alt.Axis(title=None, ticks=False)),
        color=alt.Color("incidence:Q", sort="ascending", scale=colormap, legend=None),
        tooltip=["state", "YEAR", "incidence"],
    )
    .properties(width=650, height=500)
)
rule = alt.Chart(threshold).mark_rule(strokeWidth=4).encode(x="threshold:O")
out = chart + rule
out

Generate the same data viz in hvPlot:

import numpy as np
import hvplot.pandas  # registers the .hvplot accessor on DataFrames

# Make the heatmap with hvplot
heatmap = measles.hvplot.heatmap(
    x="YEAR",
    y="state",
    C="incidence",  # color each square by the incidence
    reduce_function=np.sum,  # sum the incidence for each state/year
    frame_height=450,
    frame_width=600,
    flip_yaxis=True,
    rot=90,
    colorbar=False,
    cmap="viridis",
    xlabel="",
    ylabel="",
)
# Some additional formatting using holoviews
# For more info: http://holoviews.org/user_guide/Customizing_Plots.html
heatmap = heatmap.redim(state="State", YEAR="Year")
heatmap = heatmap.opts(fontsize={"xticks": 0, "yticks": 6}, toolbar="above")
heatmap
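The `reduce_function` argument controls how rows that fall into the same heatmap cell are combined. A rough pandas analogue, sketched here on made-up data, is a pivot table with a sum aggregation. This is only an illustration of the idea, not a description of how hvPlot computes the grid internally.

```python
import pandas as pd

# Hypothetical tidy data with one duplicated state/year pair.
toy = pd.DataFrame(
    {
        "YEAR": [1928, 1928, 1928, 1929],
        "state": ["ALABAMA", "ALABAMA", "ALASKA", "ALABAMA"],
        "incidence": [1.0, 2.0, 0.5, 4.0],
    }
)

# One cell per (state, YEAR); duplicate rows in a cell are summed.
grid = toy.pivot_table(
    index="state", columns="YEAR", values="incidence", aggfunc="sum"
)
print(grid)
```

In the measles table above, each state/year pair already appears exactly once after the annual `groupby`, so summing leaves the values unchanged.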