Geospatial Data Practice

In this section, we will practice manipulating and plotting geospatial data.

In [1]:
import geopandas as gpd
import matplotlib.pyplot as plt

Our Dataset: Countries!

Run the cell below to see the countries GeoDataFrame from yesterday's lecture, which you'll be working with today.

In [2]:
countries = gpd.read_file("ne_110m_admin_0_countries.shp").set_index("NAME")
# countries.head()
list(countries.columns)
Out[2]:
['featurecla',
 'scalerank',
 'LABELRANK',
 'SOVEREIGNT',
 'SOV_A3',
 'ADM0_DIF',
 'LEVEL',
 'TYPE',
 'TLC',
 'ADMIN',
 'ADM0_A3',
 'GEOU_DIF',
 'GEOUNIT',
 'GU_A3',
 'SU_DIF',
 'SUBUNIT',
 'SU_A3',
 'BRK_DIFF',
 'NAME_LONG',
 'BRK_A3',
 'BRK_NAME',
 'BRK_GROUP',
 'ABBREV',
 'POSTAL',
 'FORMAL_EN',
 'FORMAL_FR',
 'NAME_CIAWF',
 'NOTE_ADM0',
 'NOTE_BRK',
 'NAME_SORT',
 'NAME_ALT',
 'MAPCOLOR7',
 'MAPCOLOR8',
 'MAPCOLOR9',
 'MAPCOLOR13',
 'POP_EST',
 'POP_RANK',
 'POP_YEAR',
 'GDP_MD',
 'GDP_YEAR',
 'ECONOMY',
 'INCOME_GRP',
 'FIPS_10',
 'ISO_A2',
 'ISO_A2_EH',
 'ISO_A3',
 'ISO_A3_EH',
 'ISO_N3',
 'ISO_N3_EH',
 'UN_A3',
 'WB_A2',
 'WB_A3',
 'WOE_ID',
 'WOE_ID_EH',
 'WOE_NOTE',
 'ADM0_ISO',
 'ADM0_DIFF',
 'ADM0_TLC',
 'ADM0_A3_US',
 'ADM0_A3_FR',
 'ADM0_A3_RU',
 'ADM0_A3_ES',
 'ADM0_A3_CN',
 'ADM0_A3_TW',
 'ADM0_A3_IN',
 'ADM0_A3_NP',
 'ADM0_A3_PK',
 'ADM0_A3_DE',
 'ADM0_A3_GB',
 'ADM0_A3_BR',
 'ADM0_A3_IL',
 'ADM0_A3_PS',
 'ADM0_A3_SA',
 'ADM0_A3_EG',
 'ADM0_A3_MA',
 'ADM0_A3_PT',
 'ADM0_A3_AR',
 'ADM0_A3_JP',
 'ADM0_A3_KO',
 'ADM0_A3_VN',
 'ADM0_A3_TR',
 'ADM0_A3_ID',
 'ADM0_A3_PL',
 'ADM0_A3_GR',
 'ADM0_A3_IT',
 'ADM0_A3_NL',
 'ADM0_A3_SE',
 'ADM0_A3_BD',
 'ADM0_A3_UA',
 'ADM0_A3_UN',
 'ADM0_A3_WB',
 'CONTINENT',
 'REGION_UN',
 'SUBREGION',
 'REGION_WB',
 'NAME_LEN',
 'LONG_LEN',
 'ABBREV_LEN',
 'TINY',
 'HOMEPART',
 'MIN_ZOOM',
 'MIN_LABEL',
 'MAX_LABEL',
 'LABEL_X',
 'LABEL_Y',
 'NE_ID',
 'WIKIDATAID',
 'NAME_AR',
 'NAME_BN',
 'NAME_DE',
 'NAME_EN',
 'NAME_ES',
 'NAME_FA',
 'NAME_FR',
 'NAME_EL',
 'NAME_HE',
 'NAME_HI',
 'NAME_HU',
 'NAME_ID',
 'NAME_IT',
 'NAME_JA',
 'NAME_KO',
 'NAME_NL',
 'NAME_PL',
 'NAME_PT',
 'NAME_RU',
 'NAME_SV',
 'NAME_TR',
 'NAME_UK',
 'NAME_UR',
 'NAME_VI',
 'NAME_ZH',
 'NAME_ZHT',
 'FCLASS_ISO',
 'TLC_DIFF',
 'FCLASS_TLC',
 'FCLASS_US',
 'FCLASS_FR',
 'FCLASS_RU',
 'FCLASS_ES',
 'FCLASS_CN',
 'FCLASS_TW',
 'FCLASS_IN',
 'FCLASS_NP',
 'FCLASS_PK',
 'FCLASS_DE',
 'FCLASS_GB',
 'FCLASS_BR',
 'FCLASS_IL',
 'FCLASS_PS',
 'FCLASS_SA',
 'FCLASS_EG',
 'FCLASS_MA',
 'FCLASS_PT',
 'FCLASS_AR',
 'FCLASS_JP',
 'FCLASS_KO',
 'FCLASS_VN',
 'FCLASS_TR',
 'FCLASS_ID',
 'FCLASS_PL',
 'FCLASS_GR',
 'FCLASS_IT',
 'FCLASS_NL',
 'FCLASS_SE',
 'FCLASS_BD',
 'FCLASS_UA',
 'geometry']

Group Activity

Write a function called highlight_population that takes a countries GeoDataFrame and a continent name as input. The function should return a plot that colors the countries in the specified continent based on their population. Instead of plotting raw population numbers, it should represent each country's population as a share of the continent's total population, expressed as a fraction between 0 and 1. To do this, you may add a new column to the dataset called pop_ratio.

The plot should show all countries outside of the continent in light grey (color #EEEEEE with edgecolor #FFFFFF). The plot should also include a legend, scaled so that the minimum value is 0 (vmin=0) and the maximum value is 1 (vmax=1). Finally, make sure the figsize is set to figsize=(15, 10)!
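
The share computation described above can be tried on a small table before touching the real dataset. The sketch below uses plain pandas and made-up values: the country names, continent labels, and populations are hypothetical, and only the column names CONTINENT and POP_EST are taken from the dataset.

```python
import pandas as pd

# Hypothetical toy data: names and populations are made up for illustration.
toy = pd.DataFrame(
    {
        "CONTINENT": ["X", "X", "X", "Y"],
        "POP_EST": [50, 30, 20, 100],
    },
    index=["country_a", "country_b", "country_c", "country_d"],
)

# Keep only the rows for one continent; .copy() avoids modifying `toy`.
subset = toy[toy["CONTINENT"] == "X"].copy()

# Each country's share of the continent total, as a fraction between 0 and 1.
subset["pop_ratio"] = subset["POP_EST"] / subset["POP_EST"].sum()

print(subset["pop_ratio"])
```

Because the shares are computed within the continent, they sum to 1, which is why a legend scaled from 0 to 1 covers every possible value.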

In [3]:
def highlight_population(countries, continent):
    # Select the continent's countries; .copy() leaves the input untouched.
    df_continent = countries[countries["CONTINENT"] == continent].copy()

    # Each country's share of the continent's total population (0 to 1).
    total_pop = df_continent["POP_EST"].sum()
    df_continent["pop_ratio"] = df_continent["POP_EST"] / total_pop

    # Draw every country in grey, then overlay the continent colored by share.
    fig, ax = plt.subplots(1, figsize=(15, 10))
    countries.plot(ax=ax, color="#EEEEEE", edgecolor="#FFFFFF")
    df_continent.plot(ax=ax, column="pop_ratio", legend=True, vmin=0, vmax=1)
    return ax

highlight_population(countries, "Africa")
Out[3]:
<Axes: >

Write a function called gdp_and_population_ratio that takes a countries GeoDataFrame as input and returns the two Axes of a figure with two subplots. The first subplot should color each country based on its share of the world’s population, while the second should color each country based on its share of the world’s GDP. To achieve this, you may add new columns to the dataset called pop_ratio and gdp_ratio.

Each subplot should include a legend, scaled so that the minimum value is 0 (vmin=0) and the maximum value is 1 (vmax=1). Finally, make sure the figsize is set to figsize=(15, 10)!

HINT: To find which columns you might want to use, inspect the countries.columns attribute to see what columns are in the dataset.
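
With well over a hundred columns, filtering the names by keyword can be quicker than reading the whole list. The sketch below is one possible way to do that; it runs on a hypothetical empty DataFrame that carries only a handful of the real column names, so it works without the shapefile.

```python
import pandas as pd

# Hypothetical stand-in that carries only a few of the real column names.
toy = pd.DataFrame(
    columns=["ADMIN", "POP_EST", "POP_RANK", "POP_YEAR",
             "GDP_MD", "GDP_YEAR", "CONTINENT"]
)

# Keep the column names that mention population or GDP.
candidates = [c for c in toy.columns if "POP" in c or "GDP" in c]

print(candidates)
# ['POP_EST', 'POP_RANK', 'POP_YEAR', 'GDP_MD', 'GDP_YEAR']
```

The same list comprehension should work unchanged on countries.columns.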

In [4]:
def gdp_and_population_ratio(countries):
    # Work on a copy so the caller's GeoDataFrame does not gain new columns.
    countries = countries.copy()

    # Each country's share of the world totals (0 to 1).
    countries["pop_ratio"] = countries["POP_EST"] / countries["POP_EST"].sum()
    countries["gdp_ratio"] = countries["GDP_MD"] / countries["GDP_MD"].sum()

    # Two stacked subplots: population share on top, GDP share below.
    fig, [ax1, ax2] = plt.subplots(2, figsize=(15, 10))
    countries.plot(ax=ax1, column="pop_ratio", legend=True, vmin=0, vmax=1)
    countries.plot(ax=ax2, column="gdp_ratio", legend=True, vmin=0, vmax=1)
    return ax1, ax2

gdp_and_population_ratio(countries)
Out[4]:
(<Axes: >, <Axes: >)