from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
import geopandas as gpd
Week 14B: Raster Data in the Wild
- Section 401
- Dec 11, 2023
Last class!
- If you haven't received feedback on your proposal yet, it will arrive via email soon
- Final project due: end of day on Wednesday, December 20
Final project details: https://github.com/MUSA-550-Fall-2023/final-project
Thank you for a great semester!!
This week: raster data analysis with the holoviz ecosystem
Two case studies:
- Last time: Using satellite imagery to detect changes in lake volume
- Today: Detecting urban heat islands in Philadelphia
Initialize our packages:
import intake
import xarray as xr
import hvplot.xarray
import hvplot.pandas
import holoviews as hv
import geoviews as gv
from geoviews.tile_sources import EsriImagery
# Ignore numpy runtime warnings
np.seterr(all="ignore");
Create the intake data catalog so we can load our datasets:
url = 'https://raw.githubusercontent.com/pyviz-topics/EarthML/master/examples/catalog.yml'
cat = intake.open_catalog(url)
Example #2: Urban heat islands
- We’ll reproduce an analysis by Urban Spatial on urban heat islands in Philadelphia using Python.
- The analysis uses Landsat 8 data (2017)
- See: http://urbanspatialanalysis.com/urban-heat-islands-street-trees-in-philadelphia/
First, load some metadata for Landsat 8:

band_info = pd.DataFrame([
        (1, "Aerosol", " 0.43 - 0.45", 0.440, "30", "Coastal aerosol"),
        (2, "Blue", " 0.45 - 0.51", 0.480, "30", "Blue"),
        (3, "Green", " 0.53 - 0.59", 0.560, "30", "Green"),
        (4, "Red", " 0.64 - 0.67", 0.655, "30", "Red"),
        (5, "NIR", " 0.85 - 0.88", 0.865, "30", "Near Infrared (NIR)"),
        (6, "SWIR1", " 1.57 - 1.65", 1.610, "30", "Shortwave Infrared (SWIR) 1"),
        (7, "SWIR2", " 2.11 - 2.29", 2.200, "30", "Shortwave Infrared (SWIR) 2"),
        (8, "Panc", " 0.50 - 0.68", 0.590, "15", "Panchromatic"),
        (9, "Cirrus", " 1.36 - 1.38", 1.370, "30", "Cirrus"),
        (10, "TIRS1", "10.60 - 11.19", 10.895, "100 * (30)", "Thermal Infrared (TIRS) 1"),
        (11, "TIRS2", "11.50 - 12.51", 12.005, "100 * (30)", "Thermal Infrared (TIRS) 2"),
    ],
    columns=['Band', 'Name', 'Wavelength Range (µm)', 'Nominal Wavelength (µm)', 'Resolution (m)', 'Description'],
).set_index(["Band"])

band_info
| Band | Name | Wavelength Range (µm) | Nominal Wavelength (µm) | Resolution (m) | Description |
|---|---|---|---|---|---|
| 1 | Aerosol | 0.43 - 0.45 | 0.440 | 30 | Coastal aerosol |
| 2 | Blue | 0.45 - 0.51 | 0.480 | 30 | Blue |
| 3 | Green | 0.53 - 0.59 | 0.560 | 30 | Green |
| 4 | Red | 0.64 - 0.67 | 0.655 | 30 | Red |
| 5 | NIR | 0.85 - 0.88 | 0.865 | 30 | Near Infrared (NIR) |
| 6 | SWIR1 | 1.57 - 1.65 | 1.610 | 30 | Shortwave Infrared (SWIR) 1 |
| 7 | SWIR2 | 2.11 - 2.29 | 2.200 | 30 | Shortwave Infrared (SWIR) 2 |
| 8 | Panc | 0.50 - 0.68 | 0.590 | 15 | Panchromatic |
| 9 | Cirrus | 1.36 - 1.38 | 1.370 | 30 | Cirrus |
| 10 | TIRS1 | 10.60 - 11.19 | 10.895 | 100 * (30) | Thermal Infrared (TIRS) 1 |
| 11 | TIRS2 | 11.50 - 12.51 | 12.005 | 100 * (30) | Thermal Infrared (TIRS) 2 |
Landsat data on Google Cloud Storage
We’ll be downloading publicly available Landsat data from Google Cloud Storage
See: https://cloud.google.com/storage/docs/public-datasets/landsat
The relevant information is stored in our intake catalog:
From our catalog.yml file:
google_landsat_band:
description: Landsat bands from Google Cloud Storage
driver: rasterio
parameters:
path:
description: landsat path
type: int
row:
description: landsat row
type: int
product_id:
description: landsat file id
type: str
band:
description: band
type: int
args:
urlpath: https://storage.googleapis.com/gcp-public-data-landsat/LC08/01/{{ '%03d' % path }}/{{ '%03d' % row }}/{{ product_id }}/{{ product_id }}_B{{ band }}.TIF
chunks:
band: 1
x: 256
y: 256
From the “urlpath” above, you can see we need “path”, “row”, and “product_id” variables to properly identify a Landsat image.

The path and row corresponding to the area over Philadelphia were already identified using the USGS Earth Explorer, which was also used to find the product ID for the particular date of interest.
# Necessary variables
path = 14
row = 32
product_id = 'LC08_L1TP_014032_20160727_20170222_01_T1'
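To see exactly which file these parameters point to, here is a small illustrative sketch (not part of the original notebook) that expands the catalog's urlpath template by hand:

# Hypothetical illustration: expand the catalog's urlpath template manually
base = "https://storage.googleapis.com/gcp-public-data-landsat/LC08/01"
band = 4  # e.g., the red band
print(f"{base}/{path:03d}/{row:03d}/{product_id}/{product_id}_B{band}.TIF")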
Use a utility function to query the API and request a specific band
This will return a specific Landsat band as a dask array.
from random import random
from time import sleep


def get_band_with_exponential_backoff(path, row, product_id, band, maximum_backoff=32):
    """
    Google Cloud Storage recommends using exponential backoff
    when accessing the API.

    https://cloud.google.com/storage/docs/exponential-backoff
    """
    n = backoff = 0
    while backoff < maximum_backoff:
        try:
            return cat.google_landsat_band(
                path=path, row=row, product_id=product_id, band=band
            ).to_dask()
        except:
            backoff = min(2 ** n + random(), maximum_backoff)
            sleep(backoff)
            n += 1
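For intuition, a quick illustration (not from the original notebook) of how the capped retry delays grow:

# Successive delays are 2**n seconds (plus up to 1s of random jitter), capped at 32s
print([min(2 ** n, 32) for n in range(7)])  # [1, 2, 4, 8, 16, 32, 32]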
Load all of the bands and combine them into a single xarray DataArray
Loop over each band, load that band using the above function, and then concatenate the results together.

bands = [1, 2, 3, 4, 5, 6, 7, 9, 10, 11]  # note: band 8 (the 15 m panchromatic band) is skipped

datasets = []
for band in bands:
    da = get_band_with_exponential_backoff(
        path=path, row=row, product_id=product_id, band=band
    )
    da = da.assign_coords(band=[band])
    datasets.append(da)

ds = xr.concat(datasets, "band", compat="identical")

ds
<xarray.DataArray (band: 10, y: 7871, x: 7741)>
dask.array<concatenate, shape=(10, 7871, 7741), dtype=uint16, chunksize=(1, 256, 256), chunktype=numpy.ndarray>
Coordinates:
  * band         (band) int64 1 2 3 4 5 6 7 9 10 11
  * x            (x) float64 3.954e+05 3.954e+05 ... 6.276e+05 6.276e+05
  * y            (y) float64 4.582e+06 4.582e+06 ... 4.346e+06 4.346e+06
    spatial_ref  int64 0
Attributes:
    AREA_OR_POINT:  Point
    scale_factor:   1.0
    add_offset:     0.0
The CRS for the Landsat data is EPSG:32618 (UTM zone 18N).
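If you want to confirm this from the data itself, one option is the rioxarray accessor (a sketch; assumes the rioxarray package, used later in this lecture, is installed):

# Sketch: confirm the CRS via the rioxarray accessor
import rioxarray  # registers the .rio accessor
print(ds.rio.crs)  # EPSG:32618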
Also grab the metadata from Google Cloud Storage
- There is an associated metadata file stored on Google Cloud Storage
- The function below will parse that metadata file for a specific path, row, and product ID
- The specifics of this are not overly important for our purposes
def load_google_landsat_metadata(path, row, product_id):
    """
    Load Landsat metadata for path, row, product_id from Google Cloud Storage
    """
    def parse_type(x):
        try:
            return eval(x)
        except:
            return x

    baseurl = "https://storage.googleapis.com/gcp-public-data-landsat/LC08/01"
    suffix = f"{path:03d}/{row:03d}/{product_id}/{product_id}_MTL.txt"

    df = intake.open_csv(
        urlpath=f"{baseurl}/{suffix}",
        csv_kwargs={
            "sep": "=",
            "header": None,
            "names": ["index", "value"],
            "skiprows": 2,
            "converters": {"index": (lambda x: x.strip()), "value": parse_type},
        },
    ).read()
    metadata = df.set_index("index")["value"]
    return metadata
metadata = load_google_landsat_metadata(path, row, product_id)
metadata.head(n=20)
index
ORIGIN Image courtesy of the U.S. Geological Survey
REQUEST_ID 0501702206266_00020
LANDSAT_SCENE_ID LC80140322016209LGN01
LANDSAT_PRODUCT_ID LC08_L1TP_014032_20160727_20170222_01_T1
COLLECTION_NUMBER 01
FILE_DATE 2017-02-22T04:00:05Z
STATION_ID LGN
PROCESSING_SOFTWARE_VERSION LPGS_2.7.0
END_GROUP METADATA_FILE_INFO
GROUP PRODUCT_METADATA
DATA_TYPE L1TP
COLLECTION_CATEGORY T1
ELEVATION_SOURCE GLS2000
OUTPUT_FORMAT GEOTIFF
SPACECRAFT_ID LANDSAT_8
SENSOR_ID OLI_TIRS
WRS_PATH 14
WRS_ROW 32
NADIR_OFFNADIR NADIR
TARGET_WRS_PATH 14
Name: value, dtype: object
Let’s trim our data to the Philadelphia limits
- The Landsat image covers an area much wider than the Philadelphia limits
- We’ll load the city limits from Open Data Philly
- Use the slice() function discussed last lecture
1. Load the city limits
- From OpenDataPhilly, the city limits for Philadelphia are available at: http://data.phl.opendata.arcgis.com/datasets/405ec3da942d4e20869d4e1449a2be48_0.geojson
- Be sure to convert to the same CRS as the Landsat data!
# Load the GeoJSON from the URL
url = "http://data.phl.opendata.arcgis.com/datasets/405ec3da942d4e20869d4e1449a2be48_0.geojson"
city_limits = gpd.read_file(url)

# Convert to the right CRS for this data
city_limits = city_limits.to_crs(epsg=32618)
2. Use the xmin/xmax and ymin/ymax of the city limits to trim the Landsat DataArray
- Use the built-in slice functionality of xarray
- Remember, the y coordinates are in descending order, so your slice bounds will need to be in descending order as well; see the toy example below
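To make the descending-order gotcha concrete, here is a tiny toy example on made-up data (not from the original notebook):

# Toy example: a descending coordinate requires descending slice bounds
da = xr.DataArray([0, 1, 2, 3], coords={"y": [30.0, 20.0, 10.0, 0.0]}, dims="y")
print(da.sel(y=slice(25, 5)).values)  # [1 2] -- selects y = 20, 10
print(da.sel(y=slice(5, 25)).values)  # []    -- empty selection!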
# Look up the order of xmin/xmax and ymin/ymax
city_limits.total_bounds?
Type:        property
String form: <property object at 0x1511c5990>
Docstring:
Returns a tuple containing ``minx``, ``miny``, ``maxx``, ``maxy``
values for the bounds of the series as a whole.

See ``GeoSeries.bounds`` for the bounds of the geometries contained
in the series.

Examples
--------
>>> from shapely.geometry import Point, Polygon, LineString
>>> d = {'geometry': [Point(3, -1), Polygon([(0, 0), (1, 1), (1, 0)]),
...                   LineString([(0, 1), (1, 2)])]}
>>> gdf = geopandas.GeoDataFrame(d, crs="EPSG:4326")
>>> gdf.total_bounds
array([ 0., -1.,  3.,  2.])
# Get x/y range of city limits from "total_bounds"
xmin, ymin, xmax, ymax = city_limits.total_bounds

# Slice our xarray data
# NOTE: y coordinate system is in descending order!
subset = ds.sel(x=slice(xmin, xmax), y=slice(ymax, ymin))

subset
<xarray.DataArray (band: 10, y: 1000, x: 924)>
dask.array<getitem, shape=(10, 1000, 924), dtype=uint16, chunksize=(1, 256, 256), chunktype=numpy.ndarray>
Coordinates:
  * band         (band) int64 1 2 3 4 5 6 7 9 10 11
  * x            (x) float64 4.761e+05 4.761e+05 ... 5.037e+05 5.038e+05
  * y            (y) float64 4.443e+06 4.443e+06 ... 4.413e+06 4.413e+06
    spatial_ref  int64 0
Attributes:
    AREA_OR_POINT:  Point
    scale_factor:   1.0
    add_offset:     0.0
3. Call the compute() function on your sliced data
- Originally, the Landsat data was stored as dask arrays
- This should now only load the data into memory that covers Philadelphia
subset = subset.compute()

subset
<xarray.DataArray (band: 10, y: 1000, x: 924)>
array([[[10702, 10162, 10361, ..., 11681, 11489, 15594],
        [10870, 10376, 10122, ..., 11620, 11477, 12042],
        [10429, 10147, 10063, ..., 11455, 11876, 11790],
        ...,
        [11944, 11696, 11626, ..., 11192, 11404, 10923],
        [12406, 11555, 11155, ..., 11516, 11959, 10766],
        [11701, 11232, 10657, ..., 10515, 11475, 10926]],

       [[ 9809,  9147,  9390, ..., 10848, 10702, 15408],
        [ 9989,  9405,  9139, ..., 10831, 10660, 11361],
        [ 9439,  9083,  8981, ..., 10654, 11141, 11073],
        ...,
        [11220, 10853, 10741, ..., 10318, 10511,  9950],
        [11765, 10743, 10259, ..., 10646, 11378,  9829],
        [10889, 10365,  9630, ...,  9500, 10573, 10008]],

       [[ 9042,  8466,  8889, ..., 10014,  9647, 14981],
        [ 9699,  8714,  8596, ...,  9866,  9783, 11186],
        [ 8623,  8457,  8334, ...,  9688, 10474,  9993],
        ...,
...
        [ 5027,  5028,  5038, ...,  5035,  5037,  5029],
        [ 5058,  5021,  5023, ...,  5035,  5041,  5031],
        [ 5036,  5041,  5040, ...,  5036,  5044,  5035]],

       [[29033, 28976, 28913, ..., 32614, 32577, 32501],
        [28940, 28904, 28858, ..., 32516, 32545, 32545],
        [28882, 28879, 28854, ..., 32431, 32527, 32563],
        ...,
        [30094, 29929, 29713, ..., 29521, 29525, 29429],
        [30140, 29972, 29696, ..., 29556, 29516, 29398],
        [30133, 29960, 29614, ..., 29587, 29533, 29424]],

       [[25558, 25519, 25492, ..., 27680, 27650, 27619],
        [25503, 25450, 25402, ..., 27636, 27630, 27639],
        [25473, 25434, 25378, ..., 27609, 27668, 27667],
        ...,
        [26126, 26055, 25934, ..., 25622, 25586, 25520],
        [26149, 26077, 25935, ..., 25651, 25594, 25507],
        [26147, 26050, 25856, ..., 25696, 25644, 25571]]], dtype=uint16)
Coordinates:
  * band         (band) int64 1 2 3 4 5 6 7 9 10 11
  * x            (x) float64 4.761e+05 4.761e+05 ... 5.037e+05 5.038e+05
  * y            (y) float64 4.443e+06 4.443e+06 ... 4.413e+06 4.413e+06
    spatial_ref  int64 0
Attributes:
    AREA_OR_POINT:  Point
    scale_factor:   1.0
    add_offset:     0.0
# Original data was about 8000 x 8000 in size
ds.shape
(10, 7871, 7741)
# Sliced data is only about 1000 x 1000 in size!
subset.shape
(10, 1000, 924)
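A rough back-of-the-envelope calculation (approximate, not from the original notebook) shows why slicing before computing matters:

# Approximate memory footprints at 2 bytes per uint16 pixel
print(10 * 7871 * 7741 * 2 / 1e9)  # full scene: ~1.2 GB
print(10 * 1000 * 924 * 2 / 1e6)   # Philadelphia subset: ~18 MB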
Let’s plot band 3 of the Landsat data
# City limits plot
limits_plot = city_limits.hvplot.polygons(
    line_color="white", alpha=0, line_alpha=1, crs=32618, geo=True
)

# Landsat plot
landsat_plot = subset.sel(band=3).hvplot.image(
    x="x",
    y="y",
    rasterize=True,
    cmap="viridis",
    frame_width=500,
    frame_height=500,
    geo=True,
    crs=32618,
)

# Combined
landsat_plot * limits_plot
Calculate the NDVI
Remember, NDVI = (NIR - Red) / (NIR + Red)
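As a quick sanity check of the formula on made-up reflectance values: healthy vegetation reflects far more NIR than red, pushing NDVI toward +1:

# Made-up values: vegetation reflects much more NIR than red
nir, red = 0.5, 0.1
print((nir - red) / (nir + red))  # ~0.67, typical of dense vegetation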
band_info.head()
| Band | Name | Wavelength Range (µm) | Nominal Wavelength (µm) | Resolution (m) | Description |
|---|---|---|---|---|---|
| 1 | Aerosol | 0.43 - 0.45 | 0.440 | 30 | Coastal aerosol |
| 2 | Blue | 0.45 - 0.51 | 0.480 | 30 | Blue |
| 3 | Green | 0.53 - 0.59 | 0.560 | 30 | Green |
| 4 | Red | 0.64 - 0.67 | 0.655 | 30 | Red |
| 5 | NIR | 0.85 - 0.88 | 0.865 | 30 | Near Infrared (NIR) |
NIR is band 5 and Red is band 4
# Select the bands we need
NIR = subset.sel(band=5)
RED = subset.sel(band=4)

# Calculate the NDVI
NDVI = (NIR - RED) / (NIR + RED)

# Mask out any infinite values caused by dividing by zero
NDVI = NDVI.where(NDVI < np.inf)

NDVI.head()
<xarray.DataArray (y: 5, x: 5)>
array([[0.34690712, 0.43934922, 0.45820226, 0.45798014, 0.38205128],
       [0.33747045, 0.40266976, 0.47276229, 0.44722264, 0.4530777 ],
       [0.40783302, 0.44093268, 0.47419128, 0.49448025, 0.49341935],
       [0.40190311, 0.39986498, 0.46889397, 0.48497283, 0.48686142],
       [0.47248631, 0.47255625, 0.46365524, 0.37692424, 0.36055777]])
Coordinates:
  * x            (x) float64 4.761e+05 4.761e+05 4.761e+05 4.762e+05 4.762e+05
  * y            (y) float64 4.443e+06 4.443e+06 4.443e+06 4.443e+06 4.443e+06
    spatial_ref  int64 0
# NEW: Use datashader to plot the NDVI
p = NDVI.hvplot.image(
    x="x",
    y="y",
    geo=True,
    crs=32618,
    datashade=True,  # NEW: USE DATASHADER
    project=True,  # NEW: PROJECT THE DATA BEFORE PLOTTING
    frame_height=500,
    frame_width=500,
    cmap="viridis",
)

# NEW: Use a transparent rasterized version of the plot for hover text
# No hover text is available with datashaded images, so we use the rasterized version
p_hover = NDVI.hvplot(
    x="x",
    y="y",
    crs=32618,
    geo=True,
    rasterize=True,
    frame_height=500,
    frame_width=500,
    alpha=0,  # SET ALPHA=0 TO MAKE THIS TRANSPARENT
    colorbar=False,
)

# City limits
limits_plot = city_limits.hvplot.polygons(
    geo=True, crs=32618, line_color="white", line_width=2, alpha=0, line_alpha=1
)

p * p_hover * limits_plot
A couple of notes
- Notice that we leveraged the datashader algorithm to plot the NDVI by specifying the datashade=True keyword
- Hover tools won't work with datashaded images, so we overlaid a transparent rasterized version of the image, which shows the mean pixel values across the image
Now, on to land surface temperature

Given the NDVI calculated above, we can determine land surface temperature.

But the data must be cleaned first! Several atmospheric corrections and transformations must be applied. We'll use an existing package called rio_toa to do this.

We'll also need one more formula for transforming the satellite (brightness) temperature to land surface temperature.

For these calculations we'll use both Thermal Infrared (TIRS) bands: 10 and 11.
from rio_toa import brightness_temp, toa_utils
def land_surface_temp(BT, w, NDVI):
    """Calculate the land surface temperature of Landsat 8.

    temp = BT / (1 + w * (BT / p) * ln(e))

    BT = at-satellite (brightness) temperature
    w = wavelength of emitted radiance (µm)

    where p = h * c / s (1.439e-2 m K)

    h = Planck's constant (J s)
    s = Boltzmann constant (J/K)
    c = velocity of light (m/s)
    """
    h = 6.626e-34
    s = 1.38e-23
    c = 2.998e8

    # p, converted from m K to µm K
    p = (h * c / s) * 1e6

    # Fractional vegetation cover (Pv) and emissivity (e) estimated from NDVI
    Pv = ((NDVI - NDVI.min()) / (NDVI.max() - NDVI.min())) ** 2
    e = 0.004 * Pv + 0.986

    return BT / (1 + w * (BT / p) * np.log(e))
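As a sanity check on the constants (a quick illustration, not in the original notebook), p works out to the 1.439e-2 m K quoted in the docstring, converted to µm K:

# p = h * c / s, converted from m*K to µm*K
h, s, c = 6.626e-34, 1.38e-23, 2.998e8
print((h * c / s) * 1e6)  # ~1.439e4 µm*K, i.e. 1.439e-2 m*K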
def land_surface_temp_for_band(band):
    """
    Utility function to get the land surface temperature for a specific band
    """
    # Params from general Landsat info
    w = band_info.loc[band]["Nominal Wavelength (µm)"]

    # Params from the specific Landsat metadata file for these images
    ML = metadata[f"RADIANCE_MULT_BAND_{band}"]
    AL = metadata[f"RADIANCE_ADD_BAND_{band}"]
    K1 = metadata[f"K1_CONSTANT_BAND_{band}"]
    K2 = metadata[f"K2_CONSTANT_BAND_{band}"]

    at_satellite_temp = brightness_temp.brightness_temp(
        subset.sel(band=band).values, ML, AL, K1, K2
    )

    return land_surface_temp(at_satellite_temp, w, NDVI)
Get land surface temp for band 10
# Temperature for band 10
band = 10
temp_10 = land_surface_temp_for_band(band).compute()
# Convert to Fahrenheit
temp_10_f = toa_utils.temp_rescale(temp_10, 'F')

temp_10_f
<xarray.DataArray (y: 1000, x: 924)>
array([[82.90960333, 82.6719826 , 82.40894577, ..., 97.35595848,
        97.21128509, 96.91376458],
       [82.52155948, 82.37134833, 82.17908169, ..., 96.97252484,
        97.08602323, 97.08602622],
       [82.2792908 , 82.26686815, 82.16232996, ..., 96.63949101,
        97.01559592, 97.1564358 ],
       ...,
       [87.2877158 , 86.61250194, 85.72552832, ..., 84.9339736 ,
        84.9504808 , 84.55381061],
       [87.47549589, 86.78863063, 85.65558184, ..., 85.07848945,
        84.9133618 , 84.42561547],
       [87.44697798, 86.73954729, 85.31790267, ..., 85.20644222,
        84.9835009 , 84.53307946]])
Coordinates:
  * x            (x) float64 4.761e+05 4.761e+05 ... 5.037e+05 5.038e+05
  * y            (y) float64 4.443e+06 4.443e+06 ... 4.413e+06 4.413e+06
    spatial_ref  int64 0
# Save the result as an xarray DataArray
band_10 = subset.sel(band=band).copy(data=temp_10_f)

band_10
<xarray.DataArray (y: 1000, x: 924)>
array([[82.90960333, 82.6719826 , 82.40894577, ..., 97.35595848,
        97.21128509, 96.91376458],
       [82.52155948, 82.37134833, 82.17908169, ..., 96.97252484,
        97.08602323, 97.08602622],
       [82.2792908 , 82.26686815, 82.16232996, ..., 96.63949101,
        97.01559592, 97.1564358 ],
       ...,
       [87.2877158 , 86.61250194, 85.72552832, ..., 84.9339736 ,
        84.9504808 , 84.55381061],
       [87.47549589, 86.78863063, 85.65558184, ..., 85.07848945,
        84.9133618 , 84.42561547],
       [87.44697798, 86.73954729, 85.31790267, ..., 85.20644222,
        84.9835009 , 84.53307946]])
Coordinates:
    band         int64 10
  * x            (x) float64 4.761e+05 4.761e+05 ... 5.037e+05 5.038e+05
  * y            (y) float64 4.443e+06 4.443e+06 ... 4.413e+06 4.413e+06
    spatial_ref  int64 0
Attributes:
    AREA_OR_POINT:  Point
    scale_factor:   1.0
    add_offset:     0.0
Get the land surface temp for band 11
# Get the land surface temp
band = 11
temp_11 = land_surface_temp_for_band(band).compute()

# Convert to Fahrenheit
temp_11_f = toa_utils.temp_rescale(temp_11, "F")

# Save as an xarray DataArray
band_11 = subset.sel(band=band).copy(data=temp_11_f)
Plot both temperatures side-by-side
# Plot for band 10
plot_10 = band_10.hvplot.image(
    x="x",
    y="y",
    rasterize=True,
    cmap="fire_r",
    crs=32618,
    geo=True,
    frame_height=400,
    frame_width=400,
    title="Band 10",
)

# Plot for band 11
plot_11 = band_11.hvplot.image(
    x="x",
    y="y",
    rasterize=True,
    cmap="fire_r",
    crs=32618,
    geo=True,
    frame_height=400,
    frame_width=400,
    title="Band 11",
)

plot_10 + plot_11
Plot the final product (the average of the bands)
# Let's average the results of these two bands
mean_temp_f = (band_10 + band_11) / 2

# IMPORTANT: copy over the metadata
mean_temp_f.attrs = band_10.attrs

mean_temp_f
<xarray.DataArray (y: 1000, x: 924)>
array([[79.40579086, 79.18871976, 78.98910218, ..., 91.86524855,
        91.72038357, 91.49660753],
       [79.07306227, 78.86421586, 78.64685031, ..., 91.56712235,
        91.60934815, 91.6311314 ],
       [78.87625297, 78.77157573, 78.57777687, ..., 91.33527213,
        91.66605844, 91.73405961],
       ...,
       [83.01805262, 82.50340008, 81.75768196, ..., 80.579109  ,
        80.49671178, 80.13207064],
       [83.16916028, 82.64632139, 81.7252    , ..., 80.72431108,
        80.4982923 , 80.03521661],
       [83.14992768, 82.55447768, 81.35868618, ..., 80.90148737,
        80.65920451, 80.25022931]])
Coordinates:
  * x            (x) float64 4.761e+05 4.761e+05 ... 5.037e+05 5.038e+05
  * y            (y) float64 4.443e+06 4.443e+06 ... 4.413e+06 4.413e+06
    spatial_ref  int64 0
Attributes:
    AREA_OR_POINT:  Point
    scale_factor:   1.0
    add_offset:     0.0
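As an aside, newer versions of xarray can preserve attributes through arithmetic automatically; a sketch of that alternative (assuming a reasonably recent xarray):

# Alternative sketch: let xarray keep attributes during the arithmetic
with xr.set_options(keep_attrs=True):
    mean_temp_f = (band_10 + band_11) / 2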
p = mean_temp_f.hvplot.image(
    x="x",
    y="y",
    rasterize=True,
    crs=32618,
    geo=True,
    frame_width=600,
    frame_height=600,
    cmap="rainbow",
    alpha=0.5,
    legend=False,
    title="Mean Surface Temp (F)",
)

img = p * EsriImagery

img
Use the zoom tool to identify hot spots.

Key insight: the color of a building's roof plays a very important role in urban heat islands.
Exercise: isolate cold/hot spots on the map

We can use the .where() function in xarray to identify the heat islands across the city (see the toy example below).

To do:
- Select only those pixels with mean temperatures above 90 degrees
- Remember, the where() function takes a boolean array where each pixel is True/False based on the selection criteria
- Use hvplot to plot the imagery for Philadelphia, with the hot spots overlaid
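Here is that toy example of where() on made-up values (not part of the original notebook):

# Toy example: pixels failing the condition become NaN
da = xr.DataArray([85.0, 92.5, 78.0, 95.1])
print(da.where(da > 90).values)  # [ nan 92.5  nan 95.1]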
cond = mean_temp_f > 90
hot_spots = mean_temp_f.where(cond)

hot_spots
<xarray.DataArray (y: 1000, x: 924)>
array([[        nan,         nan,         nan, ..., 91.86524855,
        91.72038357, 91.49660753],
       [        nan,         nan,         nan, ..., 91.56712235,
        91.60934815, 91.6311314 ],
       [        nan,         nan,         nan, ..., 91.33527213,
        91.66605844, 91.73405961],
       ...,
       [        nan,         nan,         nan, ...,         nan,
                nan,         nan],
       [        nan,         nan,         nan, ...,         nan,
                nan,         nan],
       [        nan,         nan,         nan, ...,         nan,
                nan,         nan]])
Coordinates:
  * x            (x) float64 4.761e+05 4.761e+05 ... 5.037e+05 5.038e+05
  * y            (y) float64 4.443e+06 4.443e+06 ... 4.413e+06 4.413e+06
    spatial_ref  int64 0
Attributes:
    AREA_OR_POINT:  Point
    scale_factor:   1.0
    add_offset:     0.0
hot_spots_plot = hot_spots.hvplot.image(
    x="x",
    y="y",
    cmap="fire",
    crs=32618,
    geo=True,
    frame_height=500,
    frame_width=500,
    colorbar=False,
    legend=False,
    rasterize=True,
    title="Mean Temp (F) > 90",
    alpha=0.7,
)

hot_spots_plot * EsriImagery
What about cold spots?
Let’s select things less than 75 degrees as well…
cold_spots = mean_temp_f.where(mean_temp_f < 75)
cold_spots_plot = cold_spots.hvplot.image(
    x="x",
    y="y",
    cmap="fire",
    geo=True,
    crs=32618,
    frame_height=500,
    frame_width=500,
    colorbar=False,
    legend=False,
    rasterize=True,
    title="Mean Temp (F) < 75",
    alpha=0.5,
)

cold_spots_plot * EsriImagery
Cold spots pick out bodies of water and parks / areas of high vegetation!
Exercise: Exploring the relationship between temperature, trees, and redlining in Philadelphia
In this section, we’ll use zonal statistics to calculate the average temperatures for redlined areas and calculate the number of street trees in the city.
Redlining Readings
Saving xarray data to a GeoTIFF file
- Unfortunately, the zonal statistics calculation requires a rasterio file, so we will need to save our mean temperature xarray DataArray to a .tif file
- This is not built in to xarray (yet), but we can use the rioxarray package
import rioxarray

# Save to a raster GeoTIFF file!
mean_temp_f.rio.to_raster("./mean_temp_f.tif")
Part 1: Load the redlined neighborhoods and make an interactive choropleth
- The redlining HOLC map for Philadelphia is available in the data/ folder
- Use hvplot to make the interactive choropleth map
- Try to match the historical color scheme from the original HOLC categories using the color palette below

Hints:
- Create a new column called holc_color in the redlining GeoDataFrame
- You can then specify this column as the color column using the "c=" keyword for hvplot()
- To create the column, you can use the .apply() function of the holc_grade column
color_palette = {
    "A": "#388E3C",
    "B": "#1976D2",
    "C": "#FFEE58",
    "D": "#D32F2F",
    "Commercial": "#a1a1a1",
}
redlining = gpd.read_file("./data/redlining.geojson")

redlining.head()
| | area_id | holc_grade | geometry |
|---|---|---|---|
| 0 | 0 | A | POLYGON ((-75.17329 40.06201, -75.17572 40.060... |
| 1 | 1 | A | MULTIPOLYGON (((-75.19831 40.05021, -75.19899 ... |
| 2 | 2 | A | POLYGON ((-75.12810 40.04734, -75.12844 40.045... |
| 3 | 3 | A | POLYGON ((-75.11494 40.03443, -75.13476 40.036... |
| 4 | 4 | A | POLYGON ((-75.18377 40.02725, -75.17894 40.023... |
colors = redlining['holc_grade'].replace(color_palette)

colors.head()
0 #388E3C
1 #388E3C
2 #388E3C
3 #388E3C
4 #388E3C
Name: holc_grade, dtype: object
"holc_color"] = colors redlining[
holc_map = redlining.hvplot.polygons(
    c="holc_color",
    frame_width=400,
    frame_height=400,
    geo=True,
    hover_cols=["holc_grade"],
    alpha=0.5,
)

holc_map * gv.tile_sources.ESRI
Part 2: Calculate the mean temperature for each redlined area
- Use rasterio.open() to load the mean temp TIFF file
- Use the rasterstats package to calculate the zonal stats; look back at week 5's lecture for help!
- Make sure the CRS matches!
from rasterstats import zonal_stats
import rasterio as rio
# Open the GeoTIFF file using rasterio
f = rio.open('./mean_temp_f.tif')

f
<open DatasetReader name='./mean_temp_f.tif' mode='r'>
Load the data from the file:
temp_data = f.read(1)

temp_data
array([[79.40579086, 79.18871976, 78.98910218, ..., 91.86524855,
91.72038357, 91.49660753],
[79.07306227, 78.86421586, 78.64685031, ..., 91.56712235,
91.60934815, 91.6311314 ],
[78.87625297, 78.77157573, 78.57777687, ..., 91.33527213,
91.66605844, 91.73405961],
...,
[83.01805262, 82.50340008, 81.75768196, ..., 80.579109 ,
80.49671178, 80.13207064],
[83.16916028, 82.64632139, 81.7252 , ..., 80.72431108,
80.4982923 , 80.03521661],
[83.14992768, 82.55447768, 81.35868618, ..., 80.90148737,
80.65920451, 80.25022931]])
Same as:
mean_temp_f.values
array([[79.40579086, 79.18871976, 78.98910218, ..., 91.86524855,
91.72038357, 91.49660753],
[79.07306227, 78.86421586, 78.64685031, ..., 91.56712235,
91.60934815, 91.6311314 ],
[78.87625297, 78.77157573, 78.57777687, ..., 91.33527213,
91.66605844, 91.73405961],
...,
[83.01805262, 82.50340008, 81.75768196, ..., 80.579109 ,
80.49671178, 80.13207064],
[83.16916028, 82.64632139, 81.7252 , ..., 80.72431108,
80.4982923 , 80.03521661],
[83.14992768, 82.55447768, 81.35868618, ..., 80.90148737,
80.65920451, 80.25022931]])
f.crs
CRS.from_epsg(32618)
# Get the CRS
CRS = f.crs.to_epsg()
print(CRS)
32618
# Get the transform too
transform = f.transform
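The affine transform is what maps array (column, row) indices to map coordinates; a quick illustrative check (not in the original notebook):

# The transform maps (col, row) indices to (x, y) map coordinates
print(transform * (0, 0))  # (x, y) of the raster's upper-left corner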
# Use zonal_stats to get the mean temperature values within each redlined area
# FIRST ARGUMENT: vector polygons
# SECOND ARGUMENT: mean temperature raster data
redlining_stats = zonal_stats(
    redlining.to_crs(epsg=CRS),
    mean_temp_f.values,
    affine=transform,
    nodata=np.nan,
)
# Convert the stats to a dataframe
redlining_stats = pd.DataFrame(redlining_stats)

redlining_stats.head()
| | min | max | mean | count |
|---|---|---|---|---|
| 0 | 76.009914 | 85.732779 | 81.403242 | 1731 |
| 1 | 73.232288 | 88.181473 | 77.453338 | 7074 |
| 2 | 78.858240 | 88.117261 | 82.955090 | 1639 |
| 3 | 76.950025 | 95.523473 | 87.345773 | 2150 |
| 4 | 74.434519 | 89.862934 | 79.951873 | 1717 |
### ALTERNATIVE:
### In this case, you don't need to specify the transform;
### it's loaded from the TIF file
redlining_stats = zonal_stats(redlining.to_crs(epsg=CRS), "./mean_temp_f.tif")

# Convert the stats to a dataframe
redlining_stats = pd.DataFrame(redlining_stats)
# Store the mean temp back in the original redlining dataframe
redlining["mean_temp"] = redlining_stats['mean']
redlining.head()
| | area_id | holc_grade | geometry | holc_color | mean_temp |
|---|---|---|---|---|---|
| 0 | 0 | A | POLYGON ((-75.17329 40.06201, -75.17572 40.060... | #388E3C | 81.403242 |
| 1 | 1 | A | MULTIPOLYGON (((-75.19831 40.05021, -75.19899 ... | #388E3C | 77.453338 |
| 2 | 2 | A | POLYGON ((-75.12810 40.04734, -75.12844 40.045... | #388E3C | 82.955090 |
| 3 | 3 | A | POLYGON ((-75.11494 40.03443, -75.13476 40.036... | #388E3C | 87.345773 |
| 4 | 4 | A | POLYGON ((-75.18377 40.02725, -75.17894 40.023... | #388E3C | 79.951873 |
Part 3: Use hvplot to plot a choropleth of the mean temperature
- Make a two-panel chart with the redlined areas on the right and the mean temperature on the left
- To quickly let us see which areas are hotter or colder than the citywide average, subtract out the mean temperature of the entire city

Hints:
- You can use the np.nanmean() function to calculate the mean of the temperature raster data, automatically ignoring any NaN pixels
- Use a diverging color palette to visualize the difference from the citywide average
mean_temp_f.values
array([[79.40579086, 79.18871976, 78.98910218, ..., 91.86524855,
91.72038357, 91.49660753],
[79.07306227, 78.86421586, 78.64685031, ..., 91.56712235,
91.60934815, 91.6311314 ],
[78.87625297, 78.77157573, 78.57777687, ..., 91.33527213,
91.66605844, 91.73405961],
...,
[83.01805262, 82.50340008, 81.75768196, ..., 80.579109 ,
80.49671178, 80.13207064],
[83.16916028, 82.64632139, 81.7252 , ..., 80.72431108,
80.4982923 , 80.03521661],
[83.14992768, 82.55447768, 81.35868618, ..., 80.90148737,
80.65920451, 80.25022931]])
# NaN values cause this to not be defined!
mean_temp_f.values.mean()
nan
# Use nanmean() to skip NaN pixels automatically!
citywide_mean_temp = np.nanmean(mean_temp_f.values)
redlining['mean_temp_normalized'] = redlining['mean_temp'] - citywide_mean_temp
citywide_mean_temp
81.70253802309126
choro = redlining.hvplot(
    c="mean_temp_normalized",
    frame_width=400,
    frame_height=400,
    geo=True,
    hover_cols=["holc_grade", 'mean_temp'],
    alpha=0.5,
    cmap='coolwarm',
)

choro * gv.tile_sources.ESRI + holc_map * gv.tile_sources.ESRI
Part 4: Calculate the average temp by HOLC grade
Use a groupby / mean operation to calculate the mean temperature for each HOLC grade, e.g., A, B, C, etc.

You should find that better HOLC grades correspond to lower mean temperatures.
"holc_grade")["mean_temp"].mean().sort_values() redlining.groupby(
holc_grade
A 82.364416
B 84.636335
C 86.298266
D 86.305663
Commercial 86.755549
Name: mean_temp, dtype: float64