Geospatial Data Practice
In this section, we will practice manipulating and plotting geospatial data.
import geopandas as gpd
import matplotlib.pyplot as plt
Our Dataset: Countries!
Run the cell below to see the countries GeoDataFrame from yesterday's lecture, which you'll be working with today.
countries = gpd.read_file("ne_110m_admin_0_countries.shp").set_index("NAME")
# countries.head()
list(countries.columns)
['featurecla', 'scalerank', 'LABELRANK', 'SOVEREIGNT', 'SOV_A3', 'ADM0_DIF', 'LEVEL', 'TYPE', 'TLC', 'ADMIN', 'ADM0_A3', 'GEOU_DIF', 'GEOUNIT', 'GU_A3', 'SU_DIF', 'SUBUNIT', 'SU_A3', 'BRK_DIFF', 'NAME_LONG', 'BRK_A3', 'BRK_NAME', 'BRK_GROUP', 'ABBREV', 'POSTAL', 'FORMAL_EN', 'FORMAL_FR', 'NAME_CIAWF', 'NOTE_ADM0', 'NOTE_BRK', 'NAME_SORT', 'NAME_ALT', 'MAPCOLOR7', 'MAPCOLOR8', 'MAPCOLOR9', 'MAPCOLOR13', 'POP_EST', 'POP_RANK', 'POP_YEAR', 'GDP_MD', 'GDP_YEAR', 'ECONOMY', 'INCOME_GRP', 'FIPS_10', 'ISO_A2', 'ISO_A2_EH', 'ISO_A3', 'ISO_A3_EH', 'ISO_N3', 'ISO_N3_EH', 'UN_A3', 'WB_A2', 'WB_A3', 'WOE_ID', 'WOE_ID_EH', 'WOE_NOTE', 'ADM0_ISO', 'ADM0_DIFF', 'ADM0_TLC', 'ADM0_A3_US', 'ADM0_A3_FR', 'ADM0_A3_RU', 'ADM0_A3_ES', 'ADM0_A3_CN', 'ADM0_A3_TW', 'ADM0_A3_IN', 'ADM0_A3_NP', 'ADM0_A3_PK', 'ADM0_A3_DE', 'ADM0_A3_GB', 'ADM0_A3_BR', 'ADM0_A3_IL', 'ADM0_A3_PS', 'ADM0_A3_SA', 'ADM0_A3_EG', 'ADM0_A3_MA', 'ADM0_A3_PT', 'ADM0_A3_AR', 'ADM0_A3_JP', 'ADM0_A3_KO', 'ADM0_A3_VN', 'ADM0_A3_TR', 'ADM0_A3_ID', 'ADM0_A3_PL', 'ADM0_A3_GR', 'ADM0_A3_IT', 'ADM0_A3_NL', 'ADM0_A3_SE', 'ADM0_A3_BD', 'ADM0_A3_UA', 'ADM0_A3_UN', 'ADM0_A3_WB', 'CONTINENT', 'REGION_UN', 'SUBREGION', 'REGION_WB', 'NAME_LEN', 'LONG_LEN', 'ABBREV_LEN', 'TINY', 'HOMEPART', 'MIN_ZOOM', 'MIN_LABEL', 'MAX_LABEL', 'LABEL_X', 'LABEL_Y', 'NE_ID', 'WIKIDATAID', 'NAME_AR', 'NAME_BN', 'NAME_DE', 'NAME_EN', 'NAME_ES', 'NAME_FA', 'NAME_FR', 'NAME_EL', 'NAME_HE', 'NAME_HI', 'NAME_HU', 'NAME_ID', 'NAME_IT', 'NAME_JA', 'NAME_KO', 'NAME_NL', 'NAME_PL', 'NAME_PT', 'NAME_RU', 'NAME_SV', 'NAME_TR', 'NAME_UK', 'NAME_UR', 'NAME_VI', 'NAME_ZH', 'NAME_ZHT', 'FCLASS_ISO', 'TLC_DIFF', 'FCLASS_TLC', 'FCLASS_US', 'FCLASS_FR', 'FCLASS_RU', 'FCLASS_ES', 'FCLASS_CN', 'FCLASS_TW', 'FCLASS_IN', 'FCLASS_NP', 'FCLASS_PK', 'FCLASS_DE', 'FCLASS_GB', 'FCLASS_BR', 'FCLASS_IL', 'FCLASS_PS', 'FCLASS_SA', 'FCLASS_EG', 'FCLASS_MA', 'FCLASS_PT', 'FCLASS_AR', 'FCLASS_JP', 'FCLASS_KO', 'FCLASS_VN', 'FCLASS_TR', 'FCLASS_ID', 'FCLASS_PL', 'FCLASS_GR', 'FCLASS_IT', 
'FCLASS_NL', 'FCLASS_SE', 'FCLASS_BD', 'FCLASS_UA', 'geometry']
Group Activity
Write a function called highlight_population that takes a countries GeoDataFrame and a continent name as input. The function should return a plot that colors the countries in the specified continent based on their population. Instead of plotting raw population numbers, it should represent each country's population as a share of the continent's total population (a fraction between 0 and 1). To do this, you may add a new column to the dataset called pop_ratio.
The plot should show all countries outside of the continent in grey (color #EEEEEE and edgecolor #FFFFFF). The plot should also include a legend. The legend should be scaled so the minimum value is 0 (vmin=0) and the maximum value is 1 (vmax=1). Finally, make sure the figsize is set to figsize=(15, 10)!
def highlight_population(countries, continent):
    # Work on a copy of the continent's rows so the input is not modified
    df_continent = countries[countries["CONTINENT"] == continent].copy()
    fig, ax = plt.subplots(1, figsize=(15, 10))
    # Each country's share of the continent's total population (0 to 1)
    total_pop = df_continent["POP_EST"].sum()
    df_continent["pop_ratio"] = df_continent["POP_EST"] / total_pop
    # Plot every country in grey first, then draw the continent on top
    countries.plot(ax=ax, color="#EEEEEE", edgecolor="#FFFFFF")
    df_continent.plot(ax=ax, column="pop_ratio", legend=True, vmin=0, vmax=1)
    return ax
highlight_population(countries, "Africa")
<Axes: >
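The pop_ratio step inside highlight_population can be checked on a small table before plotting anything. The sketch below uses a plain pandas DataFrame with made-up populations (the index labels and numbers are hypothetical, not taken from the countries dataset) and confirms that the shares within one continent add up to 1.

```python
import pandas as pd

# Hypothetical toy data standing in for the countries GeoDataFrame.
toy = pd.DataFrame(
    {
        "CONTINENT": ["Africa", "Africa", "Africa", "Europe"],
        "POP_EST": [10, 30, 60, 50],
    },
    index=["A", "B", "C", "D"],
)

# Same steps as in highlight_population: filter, copy, divide by the total.
africa = toy[toy["CONTINENT"] == "Africa"].copy()
africa["pop_ratio"] = africa["POP_EST"] / africa["POP_EST"].sum()

# The shares of one continent should sum to 1 (up to floating-point error).
ratio_sum = africa["pop_ratio"].sum()
print(africa)
print("sum of shares:", ratio_sum)
```

Because every share lies between 0 and 1, the vmin=0 and vmax=1 legend bounds cover the whole range of possible values.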
Write a function called gdp_and_population_ratio that takes a countries GeoDataFrame as input and returns the Axes of a figure with two subplots. The first subplot should color each country based on its share of the world's population, while the second should color each country based on its share of the world's GDP (both as fractions between 0 and 1). To achieve this, you may add new columns to the dataset called pop_ratio and gdp_ratio.
The plot should also include a legend. The legend should be scaled so the minimum value is 0 (vmin=0) and the maximum value is 1 (vmax=1). Finally, make sure the figsize is set to figsize=(15, 10)!
HINT: To find which columns you might want to use, inspect the countries.columns property to see what columns are in the dataset.
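With more than a hundred columns, scanning the full list by eye is slow. One possible way to narrow it down is to keep only the names that mention population or GDP. The sketch below runs on a hand-copied sample of the column names printed earlier (sample_columns is a stand-in for this example); with the real data you would loop over countries.columns in the same way.

```python
# A hand-copied sample of the column names shown in the listing above.
# With the real dataset, iterate over countries.columns instead.
sample_columns = [
    "NAME_LONG", "POP_EST", "POP_RANK", "POP_YEAR",
    "GDP_MD", "GDP_YEAR", "ECONOMY", "CONTINENT", "geometry",
]

# Keep only the names that contain "POP" or "GDP".
candidates = [name for name in sample_columns if "POP" in name or "GDP" in name]
print(candidates)
# → ['POP_EST', 'POP_RANK', 'POP_YEAR', 'GDP_MD', 'GDP_YEAR']
```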
def gdp_and_population_ratio(countries):
    # Each country's share of the world's population (0 to 1)
    world_pop = countries["POP_EST"].sum()
    countries["pop_ratio"] = countries["POP_EST"] / world_pop
    # Each country's share of the world's GDP (0 to 1)
    world_gdp = countries["GDP_MD"].sum()
    countries["gdp_ratio"] = countries["GDP_MD"] / world_gdp
    # Two stacked subplots: population share on top, GDP share below
    fig, [ax1, ax2] = plt.subplots(2, figsize=(15, 10))
    countries.plot(ax=ax1, column="pop_ratio", legend=True, vmin=0, vmax=1)
    countries.plot(ax=ax2, column="gdp_ratio", legend=True, vmin=0, vmax=1)
    return ax1, ax2
gdp_and_population_ratio(countries)
(<Axes: >, <Axes: >)
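Note that gdp_and_population_ratio assigns pop_ratio and gdp_ratio directly on the GeoDataFrame it receives, so the caller's countries keeps those columns after the call. The exercise allows this, but if you would rather leave the input untouched, one option is DataFrame.assign, which returns a new frame. The sketch below shows the idea on a plain pandas DataFrame with made-up numbers; with_ratios is a hypothetical helper name, and it assumes a GeoDataFrame behaves the same way since it subclasses DataFrame.

```python
import pandas as pd

# Hypothetical toy data; the real columns come from the countries dataset.
toy = pd.DataFrame({"POP_EST": [20, 80], "GDP_MD": [30, 10]}, index=["A", "B"])

def with_ratios(df):
    """Return a copy of df with pop_ratio and gdp_ratio columns added."""
    return df.assign(
        pop_ratio=df["POP_EST"] / df["POP_EST"].sum(),
        gdp_ratio=df["GDP_MD"] / df["GDP_MD"].sum(),
    )

result = with_ratios(toy)
print(result)
print(list(toy.columns))  # the original frame still has only its two columns
```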