Geospatial Data¶
In this lesson, we'll introduce methods to manipulate geospatial data: data about locations on Earth, such as the information in a map. By the end of this lesson, students will be able to:
- Apply filters and plot geospatial data stored in shapefiles using geopandas.
- Describe the difference between numeric coordinate data and geospatial data.
- Draw multiple plots on the same figure by using subplots to specify axes.
Geospatial data is often tabular, just like the data in CSV files, but it typically includes extra data representing the geometry of each area. geopandas is a library that extends pandas to automatically process these geometries.
import geopandas as gpd
import matplotlib.pyplot as plt
GeoDataFrame¶
Geospatial data is often communicated in shapefile format with the .shp extension. We can create a GeoDataFrame from a shapefile by calling gpd.read_file. The following dataset of world countries has 169 columns; today, we'll work with only a handful of them.
columns = ["POP_EST", "GDP_MD", "CONTINENT", "SUBREGION", "geometry"]
countries = gpd.read_file("ne_110m_admin_0_countries.shp").set_index("NAME")[columns]
countries
| NAME | POP_EST | GDP_MD | CONTINENT | SUBREGION | geometry |
|---|---|---|---|---|---|
| Fiji | 889953.0 | 5496 | Oceania | Melanesia | MULTIPOLYGON (((180.00000 -16.06713, 180.00000... |
| Tanzania | 58005463.0 | 63177 | Africa | Eastern Africa | POLYGON ((33.90371 -0.95000, 34.07262 -1.05982... |
| W. Sahara | 603253.0 | 907 | Africa | Northern Africa | POLYGON ((-8.66559 27.65643, -8.66512 27.58948... |
| Canada | 37589262.0 | 1736425 | North America | Northern America | MULTIPOLYGON (((-122.84000 49.00000, -122.9742... |
| United States of America | 328239523.0 | 21433226 | North America | Northern America | MULTIPOLYGON (((-122.84000 49.00000, -120.0000... |
| ... | ... | ... | ... | ... | ... |
| Serbia | 6944975.0 | 51475 | Europe | Southern Europe | POLYGON ((18.82982 45.90887, 18.82984 45.90888... |
| Montenegro | 622137.0 | 5542 | Europe | Southern Europe | POLYGON ((20.07070 42.58863, 19.80161 42.50009... |
| Kosovo | 1794248.0 | 7926 | Europe | Southern Europe | POLYGON ((20.59025 41.85541, 20.52295 42.21787... |
| Trinidad and Tobago | 1394973.0 | 24269 | North America | Caribbean | POLYGON ((-61.68000 10.76000, -61.10500 10.890... |
| S. Sudan | 11062113.0 | 11998 | Africa | Eastern Africa | POLYGON ((30.83385 3.50917, 29.95350 4.17370, ... |
177 rows × 5 columns
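To see what the geometry column adds over plain numbers, here is a minimal sketch that builds a tiny GeoDataFrame by hand instead of reading the shapefile. The region names, populations, and rectangles are made up for illustration; `box` and `Point` come from shapely, the geometry library that geopandas builds on.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical toy data: two rectangles standing in for regions.
regions = gpd.GeoDataFrame(
    {"name": ["A", "B"], "population": [100, 250]},
    geometry=[box(0, 0, 2, 1), box(2, 0, 3, 3)],
)

# Each value in the geometry column is a shape object, not a plain number.
geom_types = regions.geometry.geom_type.tolist()

# Because the values are shapes, geometric questions can be asked column-wide.
areas = regions.geometry.area.tolist()
contains_point = regions.geometry.contains(Point(1, 0.5)).tolist()

print(geom_types)
print(areas)
print(contains_point)
```

A column of raw longitude and latitude numbers could not answer "which region contains this point?" without extra code; a geometry column can.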
What makes a GeoDataFrame different from a regular DataFrame is the inclusion of a geometry column that is automatically plotted when we call plot().
ax = countries.plot()
ax.set(title="world map")
ax.set_axis_off()
The result of calling plot is a matplotlib Axes object, which is different from the seaborn FacetGrid objects that we used earlier. The syntax differs in places, but we can similarly customize an Axes by calling methods like set(...) to define labels or set_axis_off() to hide the axis frame along with its automatic longitude and latitude tick labels.
We can also pass keyword arguments to plot to create more interesting choropleth maps: maps where the color of each shape is based on a corresponding value. For example, to plot each country shaded according to population, specify column="POP_EST". To add a legend, specify legend=True.
ax = countries.plot(column="POP_EST", legend=True)
# ax = countries.plot(column="GDP_MD", legend=True)
# ax = countries.plot(column="CONTINENT", legend=True)
ax.set_axis_off()
type(ax)
matplotlib.axes._axes.Axes
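The same calls can be tried without the shapefile. The sketch below uses three invented square regions and an invented population column, so the numbers mean nothing; it only illustrates that plot returns an Axes and that set and set_axis_off act on it. The `matplotlib.use("Agg")` line is an assumption for running outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen; a notebook would not need this line
import matplotlib.pyplot as plt
from matplotlib.axes import Axes
import geopandas as gpd
from shapely.geometry import box

# Hypothetical toy regions with an invented numeric column.
regions = gpd.GeoDataFrame(
    {"population": [100, 250, 400]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(2, 0, 3, 1)],
)

# column= picks the values that drive the colors; legend=True adds a color scale.
ax = regions.plot(column="population", legend=True)
ax.set(title="toy choropleth")
ax.set_axis_off()

is_axes = isinstance(ax, Axes)
title = ax.get_title()
print(is_axes, title)
```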
Practice: South American country populations¶
Write an expression to plot a choropleth map of the POP_EST column for every country in the CONTINENT "South America". Include a legend.
south_america_filter = countries['CONTINENT'] == 'South America'
countries[south_america_filter].plot(column="POP_EST", legend=True)
<Axes: >
Customizing Axes with subplots¶
By default, each call to plot will return a new set of matplotlib Axes to represent the map. Since we're working with matplotlib rather than seaborn, our plots will typically require some work to arrange everything into a single figure.
To add more space to a plot, call plt.subplots to create a new figure and a new set of Axes within that figure. To make a larger figure, specify figsize=(width, height) where width and height are numbers. We want a wider figure to make more space for the legend.
fig, ax = plt.subplots(figsize=(10, 5))
countries.plot(ax=ax, column="POP_EST", legend=True)
<Axes: >
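As a quick check of what figsize controls, the sketch below creates the same 10-by-5 figure and reads its size back. matplotlib measures figsize in inches; the `Agg` backend line is an assumption for running outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen; a notebook would not need this line
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(10, 5))

# figsize is (width, height) in inches.
width, height = fig.get_size_inches()
print(width, height)

# The Axes we pass to plot(ax=...) belongs to this figure.
same_figure = ax.figure is fig
```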
# fig, axs = plt.subplots(nrows=2, ncols=1)
# fig, axs = plt.subplots(nrows=2, ncols=2)
fig, axs = plt.subplots(nrows=1, ncols=2)
axs
array([<Axes: >, <Axes: >], dtype=object)
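The shape of the returned array depends on nrows and ncols, which matters when we index into it. The sketch below tries the three layouts from the cell above; the variable names are ours, and the `Agg` backend line is an assumption for running outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen; a notebook would not need this line
import matplotlib.pyplot as plt

fig_row, axs_row = plt.subplots(nrows=1, ncols=2)    # one row: 1-D array
fig_col, axs_col = plt.subplots(nrows=2, ncols=1)    # one column: also 1-D
fig_grid, axs_grid = plt.subplots(nrows=2, ncols=2)  # grid: 2-D array

shapes = [axs_row.shape, axs_col.shape, axs_grid.shape]
print(shapes)  # [(2,), (2,), (2, 2)]
```

A single row or column is indexed as axs[0]; a grid is indexed as axs[row][col].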
The first two parameters of plt.subplots are nrows and ncols. We can create a 2-tall by 3-wide figure and use tuple unpacking to handle the 6 resulting Axes objects. Let's plot the countries that belong to each of six continents (all but Antarctica).
continents = list(countries['CONTINENT'].unique())
continents.remove('Seven seas (open ocean)')
continents
['Oceania', 'Africa', 'North America', 'Asia', 'South America', 'Europe', 'Antarctica']
continents = ["Oceania", "Africa", "North America", "Asia", "South America", "Europe"]
fig, [[ax1, ax2, ax3], [ax4, ax5, ax6]] = plt.subplots(2, 3, figsize=(15, 10))
axs = [ax1, ax2, ax3, ax4, ax5, ax6]
for continent, ax in zip(continents, axs):
    # print(continent, ax)
    continent_filter = countries['CONTINENT'] == continent
    countries[continent_filter].plot(ax=ax, column="POP_EST", legend=True)
    ax.set_axis_off()
    ax.set(title=f'Map for {continent}')
fig.suptitle('Map by continent')
Text(0.5, 0.98, 'Map by continent')
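Unpacking six names by hand works, but it gets tedious for larger grids. One alternative, sketched here without the map data, is to iterate over axs.flat, which visits the Axes row by row. The empty Axes below only receive titles; the `Agg` backend line is an assumption for running outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen; a notebook would not need this line
import matplotlib.pyplot as plt

labels = ["Oceania", "Africa", "North America", "Asia", "South America", "Europe"]
fig, axs = plt.subplots(2, 3, figsize=(15, 10))

# axs is a 2-by-3 array; axs.flat yields its 6 Axes in row-major order.
for label, ax in zip(labels, axs.flat):
    ax.set(title=f"Map for {label}")
    ax.set_axis_off()
fig.suptitle("Map by continent")

titles = [ax.get_title() for ax in axs.flat]
print(titles)
```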
Practice: Trillionaire GDP choropleth¶
Plot the GDP_MD (GDP in millions of US dollars) for every country with a GDP_MD value greater than 1000000 atop this background of the world map.
ax = countries.plot(color="#EEE")
countries[countries['GDP_MD'] > 1000000].plot(ax=ax, column='GDP_MD', legend=True)
<Axes: >
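The threshold is easy to misread because GDP_MD is stored in millions of dollars. As a small check with plain pandas, the sketch below applies the same filter to four GDP_MD values copied from the table at the top of this lesson; the variable names are ours.

```python
import pandas as pd

# GDP_MD values (millions of US dollars) copied from the table earlier in this lesson.
gdp_md = pd.Series(
    {
        "Fiji": 5496,
        "Tanzania": 63177,
        "Canada": 1736425,
        "United States of America": 21433226,
    },
    name="GDP_MD",
)

threshold_md = 1_000_000                  # in millions of dollars
threshold_usd = threshold_md * 1_000_000  # the same amount in dollars: one trillion

mask = gdp_md > threshold_md              # boolean Series, one entry per country
trillion_plus = gdp_md[mask].index.tolist()
print(threshold_usd, trillion_plus)
```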
# Order matters: the gray world map is drawn last here, so it covers the GDP layer.
ax = countries[countries['GDP_MD'] > 1000000].plot(column='GDP_MD', legend=True)
countries.plot(ax=ax, color="#EEE")
<Axes: >
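The two cells above differ only in the order of the plot calls. Each call that shares an ax adds a new layer on top of what is already drawn, so the background has to come first. A minimal sketch with two invented rectangles (not the countries data) shows the layering; the `Agg` backend line is an assumption for running outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen; a notebook would not need this line
import matplotlib.pyplot as plt
import geopandas as gpd
from shapely.geometry import box

# Hypothetical shapes: a large background square and a small highlight inside it.
background = gpd.GeoDataFrame(geometry=[box(0, 0, 4, 4)])
highlight = gpd.GeoDataFrame(geometry=[box(1, 1, 2, 2)])

# Background first, highlight second: the highlight stays visible.
ax = background.plot(color="#EEE")
highlight.plot(ax=ax, color="tab:blue")

# Each plot call added one layer to the same Axes, in drawing order.
layer_count = len(ax.collections)
print(layer_count)
```

Swapping the two plot calls would draw the gray square last and hide the highlight.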
# plot for the poll question
ax = countries.plot(color='#EEE')
countries[countries['CONTINENT'] == 'Asia'].plot(ax=ax)
<Axes: >
countries[countries['CONTINENT'] == 'Asia'].iloc[[0]] # plot() works on a dataframe not a single record/series
| NAME | POP_EST | GDP_MD | CONTINENT | SUBREGION | geometry |
|---|---|---|---|---|---|
| Kazakhstan | 18513930.0 | 181665 | Asia | Central Asia | POLYGON ((87.35997 49.21498, 86.59878 48.54918... |
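The double brackets in iloc[[0]] are what keep the result a table. A small pandas sketch with made-up values shows the difference between the two forms; the same indexing rule applies to a GeoDataFrame.

```python
import pandas as pd

# Hypothetical two-row table standing in for the countries data.
df = pd.DataFrame({"POP_EST": [10, 20]}, index=["A", "B"])

row_as_series = df.iloc[0]    # single integer: one row returned as a Series
row_as_frame = df.iloc[[0]]   # list of integers: a one-row DataFrame

print(type(row_as_series).__name__, type(row_as_frame).__name__)
print(row_as_frame.shape)
```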
nrows = 4
ncols = 3
fig, axs = plt.subplots(nrows=nrows, ncols=ncols)
for i in range(nrows):
    for j in range(ncols):
        countries[countries['CONTINENT'] == 'Asia'].iloc[[i*ncols + j]].plot(ax=axs[i][j], column='POP_EST')
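The expression i*ncols + j turns a grid position into a row number, counting across each row before moving down to the next. The sketch below lists that mapping for the 4-by-3 grid in plain Python and shows that divmod goes the other way; the variable names are ours.

```python
nrows, ncols = 4, 3

# (row, column, position) for every cell, visiting the grid row by row.
positions = [(i, j, i * ncols + j) for i in range(nrows) for j in range(ncols)]
for i, j, k in positions:
    print(f"axs[{i}][{j}] shows row {k} of the filtered table")

# divmod recovers the grid cell from a row number.
cell_for_row_7 = divmod(7, ncols)
```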