from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
import geopandas as gpd
Week 14: Raster Data in the Wild
- Section 401
- Dec 6, 2023
"ignore"); np.seterr(
Last two lectures!
Raster data analysis with the HoloViz ecosystem
Two case studies:
- Using satellite imagery to detect changes in lake volume
- Detecting urban heat islands in Philadelphia
The decline of the world’s saline lakes
- A 2017 study examined the volume decline of many of the world’s saline lakes
- The declines are primarily due to water diversion and climate change
- The study estimates the amount of inflow required to sustain these ecosystems
- A copy of the study is available in this week’s repository
Some examples…
The Aral Sea in Kazakhstan
2000
2018
Source: https://earthobservatory.nasa.gov/world-of-change/AralSea
Lake Urmia in Iran
1998
2011
Source: https://earthobservatory.nasa.gov/images/76327/lake-orumiyeh-iran
Today: Walker Lake
1988
2017
Source: https://earthobservatory.nasa.gov/images/91921/disappearing-walker-lake
Let’s analyze this in Python!
import intake
import xarray as xr
# hvplot
import hvplot.xarray
import hvplot.pandas
import holoviews as hv
import geoviews as gv
from geoviews.tile_sources import EsriImagery
First, let’s use intake to load the data
Catalog file: https://github.com/pyviz-topics/EarthML/blob/master/examples/catalog.yml
Source: EarthML
url = 'https://raw.githubusercontent.com/pyviz-topics/EarthML/master/examples/catalog.yml'
cat = intake.open_catalog(url)
list(cat)
['landsat_5_small',
'landsat_8_small',
'landsat_5',
'landsat_8',
'google_landsat_band',
'amazon_landsat_band',
'fluxnet_daily',
'fluxnet_metadata',
'seattle_lidar']
We’ll focus on the Landsat 5 and 8 small datasets.
These are “small” snapshots around Walker Lake, cut out of the larger Landsat dataset.
landsat_5 = cat.landsat_5_small()
type(landsat_5)
intake_xarray.raster.RasterIOSource
landsat_5
landsat_5_small:
  args:
    chunks:
      band: 1
      x: 50
      y: 50
    concat_dim: band
    storage_options:
      anon: true
    urlpath: s3://earth-data/landsat/small/LT05_L1TP_042033_19881022_20161001_01_T1_sr_band{band:d}.tif
  description: Small version of Landsat 5 Surface Reflectance Level-2 Science Product.
  driver: intake_xarray.raster.RasterIOSource
  metadata:
    cache:
    - argkey: urlpath
      regex: earth-data/landsat
      type: file
    catalog_dir: https://raw.githubusercontent.com/pyviz-topics/EarthML/master/examples
    plots:
      band_image:
        dynamic: false
        groupby: band
        kind: image
        rasterize: true
        width: 400
        x: x
        y: y
Intake lets you define “default” plots for a dataset
Leveraging the power of hvplot under the hood…
landsat_5.hvplot.band_image()
What’s happening under the hood?
landsat_5.hvplot.image(
    x="x",
    y="y",
    groupby="band",
    rasterize=True,
    frame_width=400,
    frame_height=400,
    geo=True,
    crs=32611,
)
We can do the same for the Landsat 8 data
landsat_8 = cat.landsat_8_small()

landsat_8.hvplot.image(
    x="x",
    y="y",
    groupby="band",
    rasterize=True,
    frame_width=400,
    frame_height=400,
    geo=True,
    crs=32611,
)
We can use dask to read the data
This will return an xarray DataArray where the values are stored as dask arrays.

landsat_5_da = landsat_5.to_dask()
landsat_8_da = landsat_8.to_dask()
landsat_5_da
<xarray.DataArray (band: 6, y: 300, x: 300)>
array([[[ 640.,  842.,  864., ..., 1250.,  929., 1111.],
        [ 796.,  774.,  707., ..., 1136.,  906., 1065.],
        [ 975.,  707.,  908., ..., 1386., 1249., 1088.],
        ...,
        [1243., 1202., 1160., ..., 1132., 1067.,  845.],
        [1287., 1334., 1292., ...,  801.,  934.,  845.],
        [1550., 1356., 1314., ..., 1309., 1636., 1199.]],

       [[ 810., 1096., 1191., ..., 1749., 1266., 1411.],
        [1096., 1048.,  905., ..., 1556., 1217., 1411.],
        [1286., 1000., 1286., ..., 1749., 1604., 1411.],
        ...,
        [1752., 1565., 1566., ..., 1502., 1456., 1078.],
        [1752., 1799., 1706., ...,  983., 1172., 1077.],
        [1893., 1753., 1754., ..., 1736., 2250., 1736.]],

       [[1007., 1345., 1471., ..., 2102., 1462., 1719.],
        [1260., 1259., 1175., ..., 1847., 1419., 1719.],
        [1555., 1175., 1555., ..., 2059., 1889., 1760.],
        ...,
...
        [2429., 2138., 2041., ..., 2175., 1885., 1301.],
        [2381., 2333., 2382., ..., 1204., 1495., 1301.],
        [2478., 2041., 2333., ..., 2755., 3431., 2223.]],

       [[1819., 2596., 2495., ..., 2429., 1785., 2023.],
        [2259., 2359., 1885., ..., 2158., 1684., 1921.],
        [2865., 2291., 2664., ..., 2057., 1955., 2057.],
        ...,
        [3081., 2679., 2612., ..., 2499., 2098., 1395.],
        [2779., 2544., 2779., ..., 1429., 1596., 1496.],
        [3183., 2309., 2679., ..., 3067., 3802., 2665.]],

       [[1682., 2215., 2070., ..., 2072., 1440., 1780.],
        [1876., 1973., 1633., ..., 1926., 1586., 1635.],
        [2409., 1924., 2215., ..., 1829., 1780., 1829.],
        ...,
        [2585., 2296., 2296., ..., 2093., 1710., 1134.],
        [2393., 2344., 2489., ..., 1182., 1374., 1326.],
        [2826., 2007., 2393., ..., 2860., 3724., 2333.]]])
Coordinates:
  * band         (band) int64 1 2 3 4 5 7
  * x            (x) float64 3.324e+05 3.326e+05 ... 3.771e+05 3.772e+05
  * y            (y) float64 4.309e+06 4.309e+06 ... 4.264e+06 4.264e+06
    spatial_ref  int64 0
Attributes:
    AREA_OR_POINT:  Area
    scale_factor:   1.0
    add_offset:     0.0
Use “.values” to convert to a numpy array
landsat_5_da.shape
(6, 300, 300)
landsat_5_da.values
array([[[ 640., 842., 864., ..., 1250., 929., 1111.],
[ 796., 774., 707., ..., 1136., 906., 1065.],
[ 975., 707., 908., ..., 1386., 1249., 1088.],
...,
[1243., 1202., 1160., ..., 1132., 1067., 845.],
[1287., 1334., 1292., ..., 801., 934., 845.],
[1550., 1356., 1314., ..., 1309., 1636., 1199.]],
[[ 810., 1096., 1191., ..., 1749., 1266., 1411.],
[1096., 1048., 905., ..., 1556., 1217., 1411.],
[1286., 1000., 1286., ..., 1749., 1604., 1411.],
...,
[1752., 1565., 1566., ..., 1502., 1456., 1078.],
[1752., 1799., 1706., ..., 983., 1172., 1077.],
[1893., 1753., 1754., ..., 1736., 2250., 1736.]],
[[1007., 1345., 1471., ..., 2102., 1462., 1719.],
[1260., 1259., 1175., ..., 1847., 1419., 1719.],
[1555., 1175., 1555., ..., 2059., 1889., 1760.],
...,
[2090., 1840., 1798., ..., 1828., 1703., 1242.],
[2048., 2049., 2008., ..., 1074., 1326., 1158.],
[2216., 1965., 2049., ..., 2202., 2783., 1994.]],
[[1221., 1662., 1809., ..., 2401., 1660., 1957.],
[1564., 1465., 1367., ..., 2105., 1610., 1907.],
[2004., 1465., 1955., ..., 2302., 2055., 2006.],
...,
[2429., 2138., 2041., ..., 2175., 1885., 1301.],
[2381., 2333., 2382., ..., 1204., 1495., 1301.],
[2478., 2041., 2333., ..., 2755., 3431., 2223.]],
[[1819., 2596., 2495., ..., 2429., 1785., 2023.],
[2259., 2359., 1885., ..., 2158., 1684., 1921.],
[2865., 2291., 2664., ..., 2057., 1955., 2057.],
...,
[3081., 2679., 2612., ..., 2499., 2098., 1395.],
[2779., 2544., 2779., ..., 1429., 1596., 1496.],
[3183., 2309., 2679., ..., 3067., 3802., 2665.]],
[[1682., 2215., 2070., ..., 2072., 1440., 1780.],
[1876., 1973., 1633., ..., 1926., 1586., 1635.],
[2409., 1924., 2215., ..., 1829., 1780., 1829.],
...,
[2585., 2296., 2296., ..., 2093., 1710., 1134.],
[2393., 2344., 2489., ..., 1182., 1374., 1326.],
[2826., 2007., 2393., ..., 2860., 3724., 2333.]]])
- EPSG:32611 (UTM zone 11N) is the CRS for this Landsat data; Landsat scenes are delivered in the UTM zone they fall in
- Units are meters
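As a quick sanity check, you can read the CRS off the data itself. A hedged sketch, assuming the rioxarray package is installed (it registers the .rio accessor on xarray objects and reads the spatial_ref coordinate shown in the repr above):

import rioxarray  # registers the .rio accessor on xarray objects

landsat_5_da.rio.crs  # expect EPSG:32611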
Evidence of shrinkage?
- We want to compare the images across time.
- Data for Landsat 8 is from 2017
- Data for Landsat 5 is from 1988
Problem: they appear to cover different areas
First: Let’s plot RGB images of the two datasets
See lecture 5B for a reminder!
Hints:
- We want to use the earthpy package to do the plotting.
- We can use the data.sel(band=bands) syntax to select specific bands from the dataset, where bands is a list of the desired bands.
Important notes
- For Landsat 5, the RGB bands are bands 3, 2, and 1, respectively.
- For Landsat 8, the RGB bands are bands 4, 3, and 2, respectively.
- Reference: Landsat 5 and Landsat 8
import earthpy.plot as ep
Landsat 8 color image
# Get the RGB data from the Landsat 8 dataset as a numpy array
rgb_data_8 = landsat_8_da.sel(band=[4, 3, 2]).values

# Initialize the figure
fig, ax = plt.subplots(figsize=(10, 10))

# Plot the RGB bands
ep.plot_rgb(rgb_data_8, rgb=(0, 1, 2), ax=ax);
Landsat 5 color image
# Get the RGB data from the Landsat 5 dataset as a numpy array
rgb_data_5 = landsat_5_da.sel(band=[3, 2, 1]).values

# Initialize the figure
fig, ax = plt.subplots(figsize=(10, 10))

# Plot the RGB bands
ep.plot_rgb(rgb_data_5, rgb=(0, 1, 2), ax=ax);
Can we trim these to the same areas?
Yes, we can use xarray builtin selection functionality!
The Landsat 5 image is more zoomed in, so let’s trim the Landsat 8 data to this range
Take advantage of the “coordinate” arrays provided by xarray:
landsat_5_da.x
<xarray.DataArray 'x' (x: 300)>
array([332400., 332550., 332700., ..., 376950., 377100., 377250.])
Coordinates:
  * x            (x) float64 3.324e+05 3.326e+05 ... 3.771e+05 3.772e+05
    spatial_ref  int64 0
landsat_5_da.y
<xarray.DataArray 'y' (y: 300)>
array([4309200., 4309050., 4308900., ..., 4264650., 4264500., 4264350.])
Coordinates:
  * y            (y) float64 4.309e+06 4.309e+06 ... 4.264e+06 4.264e+06
    spatial_ref  int64 0
Remember: These coordinates are in units of meters!
Let’s get the bounds of the Landsat 5 image:
# x bounds
xmin = landsat_5_da.x.min()
xmax = landsat_5_da.x.max()

# y bounds
ymin = landsat_5_da.y.min()
ymax = landsat_5_da.y.max()
Slicing with xarray
We can use Python’s built-in slice() function to slice the data’s coordinate arrays and select the subset of the data we want.
Slicing in Python
- A slice object is used to specify how to slice a sequence.
- You can specify where to start the slicing, and where to end. You can also specify the step, which allows you to e.g. slice only every other item.
Syntax: slice(start, end, step)
More info: https://www.w3schools.com/python/ref_func_slice.asp
An example
= ["a", "b", "c", "d", "e", "f", "g", "h"]
letters
0:5:2] letters[
['a', 'c', 'e']
letters[slice(0, 5, 2)]
['a', 'c', 'e']
Back to xarray and the Landsat data…
We can use the .sel() function to slice our x and y coordinate arrays!
xmax
<xarray.DataArray 'x' ()>
array(377250.)
Coordinates:
    spatial_ref  int64 0
# slice the x and y coordinates
slice_x = slice(xmin, xmax)
slice_y = slice(ymax, ymin)  # IMPORTANT: y coordinate array is in descending order
slice_x
slice(<xarray.DataArray 'x' ()>
array(332400.)
Coordinates:
spatial_ref int64 0, <xarray.DataArray 'x' ()>
array(377250.)
Coordinates:
spatial_ref int64 0, None)
# Use .sel() to slice
landsat_8_trimmed = landsat_8_da.sel(x=slice_x, y=slice_y)
The y coordinate is stored in descending order, so the slice should be ordered the same way (from ymax to ymin).
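To see why the order matters: slicing a descending coordinate with an ascending slice silently returns an empty selection. A quick illustrative check:

# Wrong order: ascending slice on a descending coordinate -> empty result
print(landsat_8_da.sel(y=slice(ymin, ymax)).y.size)  # 0

# Correct order: descending slice matches the coordinate
print(landsat_8_da.sel(y=slice(ymax, ymin)).y.size)  # 214, matching the trimmed shape below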
Let’s look at the trimmed data
landsat_8_trimmed.shape
(7, 214, 213)
landsat_8_da.shape
(7, 286, 286)
landsat_8_trimmed
<xarray.DataArray (band: 7, y: 214, x: 213)>
array([[[ 730.,  721.,  703., ...,  970., 1059.,  520.],
        [ 700.,  751.,  656., ...,  835., 1248.,  839.],
        [ 721.,  754.,  796., ..., 1065., 1207., 1080.],
        ...,
        [1022.,  949.,  983., ...,  516.,  598.,  676.],
        [ 976.,  757.,  769., ...,  950.,  954.,  634.],
        [ 954., 1034.,  788., ..., 1133.,  662., 1055.]],

       [[ 874.,  879.,  858., ..., 1157., 1259.,  609.],
        [ 860.,  919.,  814., ...,  968., 1523.,  983.],
        [ 896.,  939.,  982., ..., 1203., 1412., 1292.],
        ...,
        [1243., 1157., 1212., ...,  604.,  680.,  805.],
        [1215.,  953.,  949., ..., 1095., 1110.,  755.],
        [1200., 1258.,  978., ..., 1340.,  758., 1189.]],

       [[1181., 1148., 1104., ..., 1459., 1633.,  775.],
        [1154., 1223., 1093., ..., 1220., 1851., 1345.],
        [1198., 1258., 1309., ..., 1531., 1851., 1674.],
        ...,
...
        [2652., 2459., 2649., ..., 1190., 1299., 1581.],
        [2547., 1892., 2212., ..., 2284., 2416., 1475.],
        [2400., 2627., 2405., ..., 2469., 1579., 2367.]],

       [[3039., 2806., 2877., ..., 1965., 2367., 1203.],
        [2779., 2799., 2474., ..., 1685., 2339., 1637.],
        [2597., 2822., 2790., ..., 2030., 2587., 2116.],
        ...,
        [3144., 2892., 3168., ..., 1453., 1594., 1765.],
        [3109., 2405., 2731., ..., 2804., 3061., 1653.],
        [2518., 3110., 3144., ..., 2801., 2051., 2770.]],

       [[2528., 2326., 2417., ..., 1748., 2139., 1023.],
        [2260., 2238., 1919., ..., 1519., 2096., 1448.],
        [2041., 2226., 2247., ..., 1848., 2380., 1973.],
        ...,
        [2661., 2423., 2556., ..., 1225., 1335., 1469.],
        [2573., 1963., 2091., ..., 2479., 2570., 1393.],
        [2191., 2525., 2504., ..., 2658., 1779., 2514.]]])
Coordinates:
  * band         (band) int64 1 2 3 4 5 6 7
  * x            (x) float64 3.326e+05 3.328e+05 ... 3.769e+05 3.771e+05
  * y            (y) float64 4.309e+06 4.309e+06 ... 4.265e+06 4.264e+06
    spatial_ref  int64 0
Attributes:
    AREA_OR_POINT:  Area
    scale_factor:   1.0
    add_offset:     0.0
Plot the trimmed Landsat 8 data:
# Get the trimmed Landsat 8 RGB data as a numpy array
rgb_data_8 = landsat_8_trimmed.sel(band=[4, 3, 2]).values

# Initialize the figure
fig, ax = plt.subplots(figsize=(10, 10))

# Plot the RGB bands
ep.plot_rgb(rgb_data_8, rgb=(0, 1, 2), ax=ax);
Plot the original Landsat 5 data:
# Select the RGB data as a numpy array
rgb_data_5 = landsat_5_da.sel(band=[3, 2, 1]).values

# Initialize the figure
fig, ax = plt.subplots(figsize=(10, 10))

# Plot the RGB bands
ep.plot_rgb(rgb_data_5, rgb=(0, 1, 2), ax=ax);
Some evidence of shrinkage…but can we do better?
Yes! We’ll use the change in the NDVI over time to detect change in lake volume
Calculate the NDVI for the Landsat 5 (1988) and Landsat 8 (2017) datasets
- NDVI = (NIR - Red) / (NIR + Red)
- You can once again use the .sel() function to select certain bands from the datasets
For Landsat 5: NIR = band 4 and Red = band 3
For Landsat 8: NIR = band 5 and Red = band 4
NDVI 1988
NIR_1988 = landsat_5_da.sel(band=4)
RED_1988 = landsat_5_da.sel(band=3)

NDVI_1988 = (NIR_1988 - RED_1988) / (NIR_1988 + RED_1988)
NDVI 2017
NIR_2017 = landsat_8_da.sel(band=5)
RED_2017 = landsat_8_da.sel(band=4)

NDVI_2017 = (NIR_2017 - RED_2017) / (NIR_2017 + RED_2017)
The difference between 2017 and 1988
- Take the difference between the 2017 NDVI and the 1988 NDVI
- Use the hvplot.image() function to show the difference
- A diverging palette, like coolwarm, is particularly useful for this situation
diff = NDVI_2017 - NDVI_1988

diff.hvplot.image(
    x="x",
    y="y",
    frame_height=400,
    frame_width=400,
    cmap="coolwarm",
    clim=(-1, 1),
    geo=True,
    crs=32611,
)
NDVI_1988.shape
(300, 300)
NDVI_2017.shape
(286, 286)
diff.shape
(42, 43)
What’s going on here? Why so pixelated?
Two issues:

- Different x/y bounds
- Different resolutions
# Different x/y ranges!
print(NDVI_1988.x[0].values)
print(NDVI_2017.x[0].values)
332400.0
318300.0
# Different resolutions
print((NDVI_1988.x[1] - NDVI_1988.x[0]).values)
print((NDVI_2017.x[1] - NDVI_2017.x[0]).values)
150.0
210.0
Use xarray to put the data on the same grid
First, calculate a bounding box around the center of the data
# The center lat/lng values in EPSG = 4326
# I got these points from google maps
x0 = -118.7081
y0 = 38.6942
Let’s convert these coordinates to the same CRS as the Landsat data
We can also use geopandas:
pt = pd.DataFrame({"lng": [x0], "lat": [y0]})

gpt = gpd.GeoDataFrame(
    pt, geometry=gpd.points_from_xy(pt["lng"], pt["lat"]), crs="EPSG:4326"
)
gpt
|   | lng       | lat     | geometry                    |
|---|-----------|---------|-----------------------------|
| 0 | -118.7081 | 38.6942 | POINT (-118.70810 38.69420) |
Convert to the Landsat CRS. Remember: we are converting from EPSG=4326 to EPSG=32611
pt_32611 = gpt.to_crs(epsg=32611)
Let’s add a circle with radius 15 km around the midpoint
pt_32611_buffer = pt_32611.copy()
pt_32611_buffer.geometry = pt_32611.geometry.buffer(15e3)  # Add a 15 km radius buffer
Did this work?
Use the built-in geoviews imagery to confirm…

EsriImagery * pt_32611_buffer.hvplot(alpha=0.4, geo=True, crs=32611)
Now, let’s set up the grid
res = 200  # 200 meter resolution
x = np.arange(xmin, xmax, res)
y = np.arange(ymin, ymax, res)
len(x)
225
len(y)
225
Do the re-gridding
This does a linear interpolation, estimating each new pixel from the values of the neighboring pixels.
NDVI_2017_regridded = NDVI_2017.interp(x=x, y=y)
NDVI_1988_regridded = NDVI_1988.interp(x=x, y=y)
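Note that interp() defaults to linear interpolation. If you’d rather snap to the nearest pixel value instead of blending neighbors, xarray also supports a nearest-neighbor option; a minimal sketch:

# Same regridding, but using nearest-neighbor instead of linear interpolation
NDVI_2017_nearest = NDVI_2017.interp(x=x, y=y, method="nearest")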
Plot the re-gridded data side-by-side
img1988 = NDVI_1988_regridded.hvplot.image(
    x="x",
    y="y",
    crs=32611,
    geo=True,
    frame_height=300,
    frame_width=300,
    clim=(-3, 1),
    cmap="fire",
    title="1988",
)

img2017 = NDVI_2017_regridded.hvplot.image(
    x="x",
    y="y",
    crs=32611,
    geo=True,
    frame_height=300,
    frame_width=300,
    clim=(-3, 1),
    cmap="fire",
    title="2017",
)

img1988 + img2017
Now that the images are lined up, the change in lake volume is clearly apparent
diff_regridded = NDVI_2017_regridded - NDVI_1988_regridded
diff_regridded
<xarray.DataArray (y: 225, x: 225)>
array([[ 0.04698485,  0.06277272,  0.04696496, ...,  0.04812127, -0.01462977,  0.01676142],
       [ 0.04915427,  0.03146362,  0.02520037, ...,  0.00764362,  0.02925599,  0.03030879],
       [ 0.05773904,  0.03895859,  0.03848018, ...,  0.04350497,  0.02471431,  0.03191422],
       ...,
       [ 0.02993016,  0.03760638,  0.03355653, ..., -0.00899765,  0.0051297 , -0.00335074],
       [ 0.02457032,  0.04594643,  0.03676735, ...,  0.00616795, -0.01034597, -0.00197619],
       [ 0.07661604,  0.04792224,  0.04808855, ...,  0.00257181, -0.00252524,  0.00854541]])
Coordinates:
    spatial_ref  int64 0
  * x            (x) float64 3.324e+05 3.326e+05 ... 3.77e+05 3.772e+05
  * y            (y) float64 4.264e+06 4.265e+06 ... 4.309e+06 4.309e+06
diff_regridded.hvplot.image(
    x="x",
    y="y",
    crs=32611,
    geo=True,
    frame_height=400,
    frame_width=400,
    cmap="coolwarm",
    clim=(-1, 1),
)
Takeaway:
The red areas (where the NDVI increased, i.e., more vegetation in 2017) show a clear loss of lake volume
One more thing: downsampling hi-res data
Given hi-resolution data, we can downsample to a lower resolution with the familiar groupby / mean framework from pandas
Let’s try a resolution of 1000 meters instead of 200 meters
# Define a low-resolution grid
res_1000 = 1000
x_1000 = np.arange(xmin, xmax, res_1000)
y_1000 = np.arange(ymin, ymax, res_1000)
# Groupby new bins and take the mean of all pixels falling into a group
# First: groupby low-resolution x bins and take mean
# Then: groupby low-resolution y bins and take mean
diff_res_1000 = (
    diff_regridded.groupby_bins("x", x_1000, labels=x_1000[:-1])
    .mean(dim="x")
    .groupby_bins("y", y_1000, labels=y_1000[:-1])
    .mean(dim="y")
    .rename(x_bins="x", y_bins="y")
)
diff_res_1000
<xarray.DataArray (y: 44, x: 44)>
array([[0.04441188, 0.05055985, 0.05840989, ..., 0.03623162, 0.02836244, 0.0223555 ],
       [0.04569241, 0.04080022, 0.04404605, ..., 0.03685891, 0.02847791, 0.02341687],
       [0.04603897, 0.03883327, 0.04078481, ..., 0.02513061, 0.02298011, 0.01997115],
       ...,
       [0.03367364, 0.03217862, 0.0394038 , ..., 0.01064314, 0.01451403, 0.01129917],
       [0.03634618, 0.03259205, 0.03609488, ..., 0.00979841, 0.01484085, 0.01514796],
       [0.03707802, 0.03183665, 0.04082304, ..., 0.00811861, 0.00924846, 0.00649866]])
Coordinates:
    spatial_ref  int64 0
  * x            (x) float64 3.324e+05 3.334e+05 ... 3.744e+05 3.754e+05
  * y            (y) float64 4.264e+06 4.265e+06 ... 4.306e+06 4.307e+06
Now let’s plot the low-resolution version of the difference
diff_res_1000.hvplot.image(
    x="x",
    y="y",
    crs=32611,
    geo=True,
    frame_width=500,
    frame_height=400,
    cmap="coolwarm",
    clim=(-1, 1),
)
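As an aside: xarray’s built-in coarsen() method can do this same block averaging in one step. A sketch, assuming we average 5 x 5 blocks of the 200-meter grid to get 1000-meter pixels:

# Block-average 5x5 windows; boundary="trim" drops leftover edge pixels
diff_coarse = diff_regridded.coarsen(x=5, y=5, boundary="trim").mean()
diff_coarse.shape  # (45, 45)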
Example 2: urban heat islands
- We’ll reproduce an analysis by Urban Spatial on urban heat islands in Philadelphia using Python.
- The analysis uses Landsat 8 data (2017)
- See: http://urbanspatialanalysis.com/urban-heat-islands-street-trees-in-philadelphia/
First load some metadata for Landsat 8
band_info = pd.DataFrame(
    [
        (1, "Aerosol", " 0.43 - 0.45", 0.440, "30", "Coastal aerosol"),
        (2, "Blue", " 0.45 - 0.51", 0.480, "30", "Blue"),
        (3, "Green", " 0.53 - 0.59", 0.560, "30", "Green"),
        (4, "Red", " 0.64 - 0.67", 0.655, "30", "Red"),
        (5, "NIR", " 0.85 - 0.88", 0.865, "30", "Near Infrared (NIR)"),
        (6, "SWIR1", " 1.57 - 1.65", 1.610, "30", "Shortwave Infrared (SWIR) 1"),
        (7, "SWIR2", " 2.11 - 2.29", 2.200, "30", "Shortwave Infrared (SWIR) 2"),
        (8, "Panc", " 0.50 - 0.68", 0.590, "15", "Panchromatic"),
        (9, "Cirrus", " 1.36 - 1.38", 1.370, "30", "Cirrus"),
        (10, "TIRS1", "10.60 - 11.19", 10.895, "100 * (30)", "Thermal Infrared (TIRS) 1"),
        (11, "TIRS2", "11.50 - 12.51", 12.005, "100 * (30)", "Thermal Infrared (TIRS) 2"),
    ],
    columns=[
        "Band", "Name", "Wavelength Range (µm)", "Nominal Wavelength (µm)",
        "Resolution (m)", "Description",
    ],
).set_index(["Band"])

band_info
| Band | Name    | Wavelength Range (µm) | Nominal Wavelength (µm) | Resolution (m) | Description                 |
|------|---------|-----------------------|-------------------------|----------------|-----------------------------|
| 1    | Aerosol | 0.43 - 0.45           | 0.440                   | 30             | Coastal aerosol             |
| 2    | Blue    | 0.45 - 0.51           | 0.480                   | 30             | Blue                        |
| 3    | Green   | 0.53 - 0.59           | 0.560                   | 30             | Green                       |
| 4    | Red     | 0.64 - 0.67           | 0.655                   | 30             | Red                         |
| 5    | NIR     | 0.85 - 0.88           | 0.865                   | 30             | Near Infrared (NIR)         |
| 6    | SWIR1   | 1.57 - 1.65           | 1.610                   | 30             | Shortwave Infrared (SWIR) 1 |
| 7    | SWIR2   | 2.11 - 2.29           | 2.200                   | 30             | Shortwave Infrared (SWIR) 2 |
| 8    | Panc    | 0.50 - 0.68           | 0.590                   | 15             | Panchromatic                |
| 9    | Cirrus  | 1.36 - 1.38           | 1.370                   | 30             | Cirrus                      |
| 10   | TIRS1   | 10.60 - 11.19         | 10.895                  | 100 * (30)     | Thermal Infrared (TIRS) 1   |
| 11   | TIRS2   | 11.50 - 12.51         | 12.005                  | 100 * (30)     | Thermal Infrared (TIRS) 2   |
Landsat data on Google Cloud Storage
We’ll be downloading publicly available Landsat data from Google Cloud Storage
See: https://cloud.google.com/storage/docs/public-datasets/landsat
The relevant information is stored in our intake catalog:
From our catalog.yml file:
google_landsat_band:
  description: Landsat bands from Google Cloud Storage
  driver: rasterio
  parameters:
    path:
      description: landsat path
      type: int
    row:
      description: landsat row
      type: int
    product_id:
      description: landsat file id
      type: str
    band:
      description: band
      type: int
  args:
    urlpath: https://storage.googleapis.com/gcp-public-data-landsat/LC08/01/{{ '%03d' % path }}/{{ '%03d' % row }}/{{ product_id }}/{{ product_id }}_B{{ band }}.TIF
    chunks:
      band: 1
      x: 256
      y: 256
From the “urlpath” above, you can see we need “path”, “row”, and “product_id” variables to properly identify a Landsat image:
The path and row corresponding to the area over Philadelphia were selected ahead of time using the USGS Earth Explorer. The same tool was used to find the product ID for the particular date of interest.
# Necessary variables
path = 14
row = 32
product_id = 'LC08_L1TP_014032_20160727_20170222_01_T1'
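To make the template concrete, here is the URL that intake builds from these values (illustrative only; band 4 is an arbitrary example):

# The catalog's urlpath template, expanded for band 4
band = 4
print(
    "https://storage.googleapis.com/gcp-public-data-landsat/LC08/01/"
    f"{path:03d}/{row:03d}/{product_id}/{product_id}_B{band}.TIF"
)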
An alternative: Google Earth Engine (GEE)
- A public data archive of satellite imagery going back more than 40 years.
- A Python API exists for pulling data — good resource if you’d like to work with a large amount of raster data
We won’t cover the specifics in the course, but geemap is a fantastic library for interacting with GEE.
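For reference, a minimal hedged sketch of what getting started looks like (it assumes geemap is installed and you have already authenticated with Earth Engine; the collection ID is the standard Landsat 8 Collection 2, Level-2 archive):

import ee
import geemap

ee.Initialize()  # requires prior Earth Engine authentication

# An interactive map centered on Philadelphia
Map = geemap.Map(center=[39.95, -75.16], zoom=10)

# Landsat 8 Collection 2, Level-2 image collection
landsat8 = ee.ImageCollection("LANDSAT/LC08/C02/T1_L2")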
References

- Google Earth Engine (GEE)
- The geemap package

Google Cloud Storage: Use a utility function to query the API and request a specific band
This will return a specific Landsat band as a dask array.
from random import random
from time import sleep
def get_band_with_exponential_backoff(path, row, product_id, band, maximum_backoff=32):
    """
    Google Cloud Storage recommends using exponential backoff
    when accessing the API.

    https://cloud.google.com/storage/docs/exponential-backoff
    """
    n = backoff = 0
    while backoff < maximum_backoff:
        try:
            return cat.google_landsat_band(
                path=path, row=row, product_id=product_id, band=band
            ).to_dask()
        except:
            backoff = min(2 ** n + random(), maximum_backoff)
            sleep(backoff)
            n += 1
Load all of the bands and combine them into a single xarray DataArray
Loop over each band, load that band using the function above, and then concatenate the results together.
bands = [1, 2, 3, 4, 5, 6, 7, 9, 10, 11]

datasets = []
for band in bands:
    da = get_band_with_exponential_backoff(
        path=path, row=row, product_id=product_id, band=band
    )
    da = da.assign_coords(band=[band])
    datasets.append(da)

ds = xr.concat(datasets, "band", compat="identical")

ds
<xarray.DataArray (band: 10, y: 7871, x: 7741)>
dask.array<concatenate, shape=(10, 7871, 7741), dtype=uint16, chunksize=(1, 256, 256), chunktype=numpy.ndarray>
Coordinates:
  * band         (band) int64 1 2 3 4 5 6 7 9 10 11
  * x            (x) float64 3.954e+05 3.954e+05 ... 6.276e+05 6.276e+05
  * y            (y) float64 4.582e+06 4.582e+06 ... 4.346e+06 4.346e+06
    spatial_ref  int64 0
Attributes:
    AREA_OR_POINT:  Point
    scale_factor:   1.0
    add_offset:     0.0
The CRS for this Landsat data is EPSG:32618 (UTM zone 18N)
Also grab the metadata from Google Cloud Storage
- There is an associated metadata file stored on Google Cloud Storage
- The below function will parse that metadata file for a specific path, row, and product ID
- The specifics of this are not overly important for our purposes
def load_google_landsat_metadata(path, row, product_id):
    """
    Load Landsat metadata for path, row, product_id from Google Cloud Storage
    """
    def parse_type(x):
        try:
            return eval(x)
        except:
            return x

    baseurl = "https://storage.googleapis.com/gcp-public-data-landsat/LC08/01"
    suffix = f"{path:03d}/{row:03d}/{product_id}/{product_id}_MTL.txt"

    df = intake.open_csv(
        urlpath=f"{baseurl}/{suffix}",
        csv_kwargs={
            "sep": "=",
            "header": None,
            "names": ["index", "value"],
            "skiprows": 2,
            "converters": {"index": (lambda x: x.strip()), "value": parse_type},
        },
    ).read()
    metadata = df.set_index("index")["value"]
    return metadata
metadata = load_google_landsat_metadata(path, row, product_id)
metadata.head()
index
ORIGIN Image courtesy of the U.S. Geological Survey
REQUEST_ID 0501702206266_00020
LANDSAT_SCENE_ID LC80140322016209LGN01
LANDSAT_PRODUCT_ID LC08_L1TP_014032_20160727_20170222_01_T1
COLLECTION_NUMBER 01
Name: value, dtype: object
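Individual values can then be looked up by key. For example (these MTL key names are standard for Landsat products, though treat the exact names as an assumption here):

# e.g., the acquisition date and scene-level cloud cover
metadata["DATE_ACQUIRED"], metadata["CLOUD_COVER"]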
Exercise: Let’s trim our data to the Philadelphia limits
- The Landsat image covers an area much wider than the Philadelphia limits
- We’ll load the city limits from Open Data Philly
- Use the slice() function discussed in the last example!
1. Load the city limits
- From OpenDataPhilly, the city limits for Philadelphia are available at: http://data.phl.opendata.arcgis.com/datasets/405ec3da942d4e20869d4e1449a2be48_0.geojson
- Be sure to convert to the same CRS as the Landsat data!
# Load the GeoJSON from the URL
= "http://data.phl.opendata.arcgis.com/datasets/405ec3da942d4e20869d4e1449a2be48_0.geojson"
url = gpd.read_file(url)
city_limits
# Convert to the right CRS for this data
= city_limits.to_crs(epsg=32618) city_limits
2. Use the xmin/xmax and ymin/ymax of the city limits to trim the Landsat DataArray
- Use the built-in slice functionality of xarray
- Remember, the y coordinates are in descending order, so your slice will need to be in descending order as well
city_limits.total_bounds?
Type:        property
String form: <property object at 0x13fa2dc60>
Docstring:
Returns a tuple containing ``minx``, ``miny``, ``maxx``, ``maxy``
values for the bounds of the series as a whole.

See ``GeoSeries.bounds`` for the bounds of the geometries contained
in the series.

Examples
--------
>>> from shapely.geometry import Point, Polygon, LineString
>>> d = {'geometry': [Point(3, -1), Polygon([(0, 0), (1, 1), (1, 0)]),
... LineString([(0, 1), (1, 2)])]}
>>> gdf = geopandas.GeoDataFrame(d, crs="EPSG:4326")
>>> gdf.total_bounds
array([ 0., -1.,  3.,  2.])
# Get x/y range of city limits from "total_bounds"
xmin, ymin, xmax, ymax = city_limits.total_bounds
# Slice our xarray data
subset = ds.sel(
    x=slice(xmin, xmax),
    y=slice(ymax, ymin),  # NOTE: y coordinate is in descending order!
)
3. Call the compute() function on your sliced data
- Originally, the Landsat data was stored as dask arrays
- This should now only load the data into memory that covers Philadelphia
This should take about a minute or so, depending on the speed of your laptop.
subset = subset.compute()
subset
<xarray.DataArray (band: 10, y: 1000, x: 924)>
array([[[10702, 10162, 10361, ..., 11681, 11489, 15594],
        [10870, 10376, 10122, ..., 11620, 11477, 12042],
        [10429, 10147, 10063, ..., 11455, 11876, 11790],
        ...,
        [11944, 11696, 11626, ..., 11192, 11404, 10923],
        [12406, 11555, 11155, ..., 11516, 11959, 10766],
        [11701, 11232, 10657, ..., 10515, 11475, 10926]],

       [[ 9809,  9147,  9390, ..., 10848, 10702, 15408],
        [ 9989,  9405,  9139, ..., 10831, 10660, 11361],
        [ 9439,  9083,  8981, ..., 10654, 11141, 11073],
        ...,
        [11220, 10853, 10741, ..., 10318, 10511,  9950],
        [11765, 10743, 10259, ..., 10646, 11378,  9829],
        [10889, 10365,  9630, ...,  9500, 10573, 10008]],

       [[ 9042,  8466,  8889, ..., 10014,  9647, 14981],
        [ 9699,  8714,  8596, ...,  9866,  9783, 11186],
        [ 8623,  8457,  8334, ...,  9688, 10474,  9993],
        ...,
...
        [ 5027,  5028,  5038, ...,  5035,  5037,  5029],
        [ 5058,  5021,  5023, ...,  5035,  5041,  5031],
        [ 5036,  5041,  5040, ...,  5036,  5044,  5035]],

       [[29033, 28976, 28913, ..., 32614, 32577, 32501],
        [28940, 28904, 28858, ..., 32516, 32545, 32545],
        [28882, 28879, 28854, ..., 32431, 32527, 32563],
        ...,
        [30094, 29929, 29713, ..., 29521, 29525, 29429],
        [30140, 29972, 29696, ..., 29556, 29516, 29398],
        [30133, 29960, 29614, ..., 29587, 29533, 29424]],

       [[25558, 25519, 25492, ..., 27680, 27650, 27619],
        [25503, 25450, 25402, ..., 27636, 27630, 27639],
        [25473, 25434, 25378, ..., 27609, 27668, 27667],
        ...,
        [26126, 26055, 25934, ..., 25622, 25586, 25520],
        [26149, 26077, 25935, ..., 25651, 25594, 25507],
        [26147, 26050, 25856, ..., 25696, 25644, 25571]]], dtype=uint16)
Coordinates:
  * band         (band) int64 1 2 3 4 5 6 7 9 10 11
  * x            (x) float64 4.761e+05 4.761e+05 ... 5.037e+05 5.038e+05
  * y            (y) float64 4.443e+06 4.443e+06 ... 4.413e+06 4.413e+06
    spatial_ref  int64 0
Attributes:
    AREA_OR_POINT:  Point
    scale_factor:   1.0
    add_offset:     0.0
# Original data was about 8000 x 8000 in size
ds.shape
(10, 7871, 7741)
# Sliced data is only about 1000 x 1000 in size!
subset.shape
(10, 1000, 924)
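A rough sketch of the savings, using the shapes printed above:

# The full scene has roughly 66x more pixels than the Philadelphia subset
print(np.prod(ds.shape) / np.prod(subset.shape))  # ~66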