Geospatial Data Practice
In this section, we will practice manipulating and plotting geospatial data.
import geopandas as gpd
import matplotlib.pyplot as plt
Our Dataset: Countries!
Run the cell below to see the countries GeoDataFrame from yesterday's lecture, which you'll be working with today.
countries = gpd.read_file("ne_110m_admin_0_countries.shp")
Group Activity
Write a function called highlight_population that takes a countries GeoDataFrame and a continent name as input and returns a plot that colors the specified continent based on its population. Instead of plotting raw population numbers, the color should represent the continent's population as a share of the global population (a value between 0 and 1). To do this, you should add a new column to the dataset called pop_ratio.
The plot should show all countries outside of the continent in grey (color #EEEEEE and edgecolor #FFFFFF). The plot should also include a legend. The legend should be scaled so the minimum value is 0 (vmin=0) and the maximum value is 1 (vmax=1). Finally, make sure the figure size is set with figsize=(15, 10).
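As a warm-up, the aggregation step can be sketched on toy data. The four square "countries", the continent names "Alpha" and "Beta", and the population numbers below are all invented for illustration; only the column names CONTINENT and POP_EST mirror the real dataset.

```python
import geopandas as gpd
from shapely.geometry import box

# Toy stand-in for the countries data (hypothetical names and numbers).
toy = gpd.GeoDataFrame(
    {
        "NAME": ["A1", "A2", "B1", "B2"],
        "CONTINENT": ["Alpha", "Alpha", "Beta", "Beta"],
        "POP_EST": [10, 30, 20, 40],
    },
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(3, 0, 4, 1), box(4, 0, 5, 1)],
)

# Total population across every row, computed before any aggregation.
total_pop = toy["POP_EST"].sum()

# Keep only the geometry, the grouping key, and the numeric column,
# so that "sum" is well defined for everything being aggregated.
continents = toy[["geometry", "CONTINENT", "POP_EST"]].dissolve("CONTINENT", aggfunc="sum")
continents["pop_ratio"] = continents["POP_EST"] / total_pop

# Passing a list of labels selects exactly those rows and keeps a GeoDataFrame.
beta = continents.loc[["Beta"]]
print(continents[["POP_EST", "pop_ratio"]])
```

After dissolving, the continent name becomes the index, which is why a single continent can be picked out by label with .loc.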
def highlight_population(countries, continent):
    """Given a GeoDataFrame representing world data and a string continent name,
    returns a plot that colors the inputted continent
    as a ratio of global population"""
    # calculating global population
    total_pop = countries["POP_EST"].sum()
    # dissolving on continent
    # NOTE: need to filter columns BEFORE dissolving because there
    # is categorical data that can't be summed
    countries_subset = countries[["geometry", "CONTINENT", "POP_EST"]]
    countries_subset = countries_subset.dissolve("CONTINENT", aggfunc="sum")
    # data manipulation
    # NOTE: .loc[[continent]] selects only the requested row; slice(continent)
    # would select every continent up to and including it
    countries_subset = countries_subset.loc[[continent]].copy()
    countries_subset["pop_ratio"] = countries_subset["POP_EST"] / total_pop
    # plotting
    fig, ax = plt.subplots(1, figsize=(15, 10))
    countries.plot(ax=ax, color="#EEEEEE", edgecolor="#FFFFFF")
    countries_subset.plot(ax=ax, column="pop_ratio", legend=True, vmin=0, vmax=1)
    return ax
highlight_population(countries, "Africa")
<Axes: >
Write a function called gdp_and_population_ratio that takes a countries GeoDataFrame as input and returns the Axes objects of a figure with two subplots. The first subplot should color each continent based on its share of the world's population, while the second should color each continent based on its share of the world's GDP. To achieve this, you may add new columns to the dataset called pop_ratio and gdp_ratio.
The plot should also include a legend. The legend should be scaled so the minimum value is 0 (vmin=0) and the maximum value is 1 (vmax=1). Finally, make sure the figure size is set with figsize=(15, 10)!
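The effect of fixing vmin and vmax can be seen on a small example. This is only a sketch: the two squares and their ratio values are made up, and the subplot titles are an optional extra that the exercise does not require.

```python
import geopandas as gpd
import matplotlib.pyplot as plt
from shapely.geometry import box

# Two hypothetical regions with invented shares of population and GDP.
toy = gpd.GeoDataFrame(
    {"pop_ratio": [0.4, 0.6], "gdp_ratio": [0.7, 0.3]},
    geometry=[box(0, 0, 1, 1), box(2, 0, 3, 1)],
)

# Two rows, one column: one map per measure.
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(15, 10))

# Using the same vmin/vmax on both maps puts them on one shared color scale,
# so the same color means the same share in either subplot.
toy.plot(ax=ax1, column="pop_ratio", legend=True, vmin=0, vmax=1)
toy.plot(ax=ax2, column="gdp_ratio", legend=True, vmin=0, vmax=1)
ax1.set_title("Share of population")
ax2.set_title("Share of GDP")
```

Without vmin and vmax, each map would stretch its colors between its own minimum and maximum, and the two subplots could not be compared by color.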
HINT: To find which columns you might want to use, call list(countries.columns) to see every column in the dataset.
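With many columns, it can help to narrow the list down by keyword. The sketch below uses an empty stand-in DataFrame with a handful of column names (NAME is an assumed column here) rather than the real shapefile.

```python
import pandas as pd

# Stand-in for the countries data: only the column names matter here.
stand_in = pd.DataFrame(columns=["NAME", "CONTINENT", "POP_EST", "GDP_MD", "geometry"])

# Same call as in the hint, then keep the names that mention population or GDP.
columns = list(stand_in.columns)
candidates = [name for name in columns if "POP" in name or "GDP" in name]
print(candidates)
```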
def gdp_and_population_ratio(countries):
    """Given a GeoDataFrame representing world data,
    returns a two figure plot that shows world GDP and
    population ratios by continent"""
    # dissolving on continent
    # NOTE: working on a subset keeps the input GeoDataFrame unmodified
    continents = countries[["geometry", "CONTINENT", "POP_EST", "GDP_MD"]]
    continents = continents.dissolve("CONTINENT", aggfunc="sum")
    # data manipulation
    total_pop = continents["POP_EST"].sum()
    total_gdp = continents["GDP_MD"].sum()
    continents["pop_ratio"] = continents["POP_EST"] / total_pop
    continents["gdp_ratio"] = continents["GDP_MD"] / total_gdp
    # plotting
    fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(15, 10))
    continents.plot(ax=ax1, column="pop_ratio", legend=True, vmin=0, vmax=1)
    continents.plot(ax=ax2, column="gdp_ratio", legend=True, vmin=0, vmax=1)
    return ax1, ax2
gdp_and_population_ratio(countries)
(<Axes: >, <Axes: >)