The End of Knowledge
So far, the training data for generative and predictive language models has been human-created material. As we've seen, these models can approximate knowing the meaning of a word by placing it in context with other words. Given enough training data and human-in-the-loop feedback, a model can build more accurate context.
But what happens when humans are less involved in the loop? What happens when the training data itself includes data generated by an AI model?
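The idea of "placing words in context" can be sketched with a toy example. This is a minimal illustration under stated assumptions and is not how modern language models are built, since those learn dense vector representations with neural networks. The sketch counts which words appear within a small window around each word in a made-up five-sentence corpus, then compares those counts with cosine similarity. The names context_vectors and cosine are hypothetical and exist only for this sketch.

```python
from collections import Counter
from math import sqrt

def context_vectors(sentences, window=2):
    """Map each word to a Counter of the words appearing within `window` positions of it."""
    vectors = {}
    for sentence in sentences:
        words = sentence.lower().split()
        for i, word in enumerate(words):
            # Neighbors to the left and right of position i, excluding the word itself.
            neighbors = words[max(0, i - window):i] + words[i + 1:i + 1 + window]
            vectors.setdefault(word, Counter()).update(neighbors)
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())  # Counter returns 0 for missing words
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# A made-up toy corpus, for illustration only.
corpus = [
    "the cat drank the milk",
    "the dog drank the water",
    "the cat chased the mouse",
    "the dog chased the ball",
    "stocks fell on the market today",
]
vectors = context_vectors(corpus)
print(round(cosine(vectors["cat"], vectors["dog"]), 2))
print(round(cosine(vectors["cat"], vectors["stocks"]), 2))
```

In this tiny corpus, "cat" and "dog" appear in exactly the same contexts, so their count vectors are identical and their similarity is 1.0. "stocks" shares no neighbors with "cat", so their similarity is 0.0.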
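One way to start thinking about this question is a deliberately oversimplified toy experiment. Assume a "model" is nothing more than a normal distribution fitted to its training data. Train generation 0 on "human" data, then train each later generation only on samples drawn from the previous generation's model. This is an illustrative assumption, not a description of how real language models are trained. The function names fit and recursive_training, and the parameter values, are hypothetical choices for this sketch.

```python
import random
import statistics

def fit(samples):
    """'Train' a toy model: estimate the mean and population standard deviation."""
    return statistics.fmean(samples), statistics.pstdev(samples)

def recursive_training(generations=200, samples_per_generation=10, seed=0):
    """Fit generation 0 on 'human' data, then fit each later generation only on
    samples generated by the previous generation's fitted model."""
    rng = random.Random(seed)
    # Generation 0: "human-created" data drawn from a standard normal distribution.
    data = [rng.gauss(0.0, 1.0) for _ in range(samples_per_generation)]
    history = []
    for _ in range(generations):
        mu, sigma = fit(data)
        history.append(sigma)
        # The next generation sees only data generated by the current model.
        data = [rng.gauss(mu, sigma) for _ in range(samples_per_generation)]
    return history

stds = recursive_training()
print(f"generation 0 std: {stds[0]:.3f}")
print(f"generation {len(stds) - 1} std: {stds[-1]:.3g}")
```

With only 10 samples per generation, each fitted spread is on average slightly smaller than the one before it. Those small estimation errors compound, so after many generations the toy "model" produces a far narrower range of values than the original data did. Researchers studying generative models trained on recursively generated data have described related degradation as "model collapse". Real language models are vastly more complex, so treat this only as an intuition pump for the discussion.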
In this lesson and discussion, we will work through several hypothetical "real-life" scenarios involving AI.
Scenario 1 - Casual, Haphazard Use
You've just gotten a job on a team writing software for data processing and analytics. As is standard practice in software engineering, your teammate requests a code review of some work they want to "push to production" (i.e., make the code available to the team's users or clients). While reviewing the code, you're confused about what it does and ask a question about it. The author, your teammate, responds, "oh lol. idk, CoPilot wrote it. seems to work though." How do you respond?
Consider the following potential solution to the last function in Mapping:
```python
import geopandas as gpd
import matplotlib.pyplot as plt
import pandas as pd
from folium.features import GeoJson
from folium.folium import Map
import folium


def interactive_map(state_data):
    '''
    Ask teammate what they prompted.
    Ask AI to tell me what it does.
    '''
    m = Map(location=[47.5, -120.5], zoom_start=7)
    low_access = state_data[
        (state_data["LowIncomeTracts"] == 1) &
        (
            ((state_data["Urban"] == 1) & (state_data["LATracts_half"] == 1)) |
            ((state_data["Rural"] == 1) & (state_data["LATracts10"] == 1))
        )
    ]
    GeoJson(low_access).add_to(m)
    return m
```
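A reviewer who is unsure what this filter selects can work it out without asking an AI by enumerating the flag combinations it keeps. The sketch below re-expresses the boolean condition inside interactive_map as a plain Python function over one tract's 0/1 flags. It assumes that each column holds a single 0/1 value per tract and that Urban and Rural are mutually exclusive. keeps_tract is a hypothetical helper written only for this check.

```python
from itertools import product

def keeps_tract(low_income, urban, rural, la_half, la_10):
    """Mirror of the filter in interactive_map, applied to one tract's 0/1 flags."""
    return (low_income == 1) and (
        (urban == 1 and la_half == 1) or (rural == 1 and la_10 == 1)
    )

# Assumption: Urban and Rural are mutually exclusive, so only (1, 0) and (0, 1) are enumerated.
kept = [
    (low_income, urban, rural, la_half, la_10)
    for low_income, (urban, rural), la_half, la_10 in product(
        (0, 1), ((1, 0), (0, 1)), (0, 1), (0, 1)
    )
    if keeps_tract(low_income, urban, rural, la_half, la_10)
]
columns = ("LowIncomeTracts", "Urban", "Rural", "LATracts_half", "LATracts10")
for row in kept:
    print(dict(zip(columns, row)))
```

The printed rows show that the filter keeps only low-income tracts. Among those, it keeps urban tracts flagged as low access at half a mile and rural tracts flagged as low access at 10 miles.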
Scenario 2 - AI as a Reviewer
Later, you've become a senior member of the team and have much more say over which code and features are integrated into your team's work. To streamline the code review process, you and your team have decided that any code you write will be fed into an AI for review. The AI will be asked to find any bugs and then document the code. As a final check, you or another senior member will manually run the code to make sure things still work (as we all know, the AI sometimes makes typos or small mistakes). Who on your team knows how the code works?
Here's a different implementation of the interactive_map function, also generated by AI:
```python
import folium


def interactive_map(state_data):
    """
    Returns an interactive map of low income and low access census tracts in Washington.

    This function filters the state data to include only low-income tracts that are either
    low access at half a mile for urban census tracts or low access at 10 miles for rural
    census tracts. It then creates an interactive map centered on Washington state and
    adds the filtered census tracts as a GeoJson layer.

    Parameters:
    state_data (GeoDataFrame): The merged GeoDataFrame containing census and food access data.

    Returns:
    Map: An interactive map with the filtered low-income low-access census tracts.
    """
    # Filter for low-income tracts
    filtered_data = state_data[state_data['LowIncomeTracts'] == True]

    # Further filter based on low access criteria
    filtered_data = filtered_data[(filtered_data['Urban'] & (filtered_data['LATracts_half'] == True)) |
                                  (~filtered_data['Rural'] & (filtered_data['LATracts10'] == True))]

    # Create the map centered on Washington state
    map_center = [47.5, -120.5]  # Approximate center of Washington
    map_wa = folium.Map(location=map_center, zoom_start=7)

    # Add filtered tracts as a GeoJson layer
    folium.GeoJson(filtered_data).add_to(map_wa)

    return map_wa
```
- What do we do when nobody but the AI knows what the code does?
- Can we trust the AI to find every bug?
```python
interactive_map(state_data)
```
I copied this implementation into the same AI and asked it, "Does my implementation of the interactive_map function have any bugs?" It replied:
Your implementation of the interactive_map function looks mostly correct, but there are a few potential issues:
- The Map class from folium should be imported to use in the type assertion.
- The GeoJson class from folium should be imported to use in the type assertion.
- The state_data variable should be defined or passed to the function for it to work correctly.
- The lalowi_idx variable should be defined for the final assertion to work.
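None of the points in this reply addresses the filtering logic itself. One way to check that logic independently is to compare the condition the code computes against the condition its own docstring describes, across every combination of flags. The sketch below assumes the flag columns hold booleans, so that pandas' ~ acts as a logical NOT. It also assumes that Urban and Rural are mutually exclusive, and it omits the LowIncomeTracts filter because both versions apply it identically. docstring_condition and code_condition are hypothetical single-tract re-expressions written only for this comparison.

```python
from itertools import product

def docstring_condition(urban, rural, la_half, la_10):
    """Low access as the docstring describes it: urban at half a mile, or rural at 10 miles."""
    return (urban and la_half) or (rural and la_10)

def code_condition(urban, rural, la_half, la_10):
    """The AI implementation's condition, assuming boolean flags so ~ means logical NOT."""
    return (urban and la_half) or ((not rural) and la_10)

# Assumption: Urban and Rural are mutually exclusive booleans.
mismatches = [
    (urban, rural, la_half, la_10)
    for (urban, rural), la_half, la_10 in product(
        ((True, False), (False, True)), (False, True), (False, True)
    )
    if docstring_condition(urban, rural, la_half, la_10)
    != code_condition(urban, rural, la_half, la_10)
]
for urban, rural, la_half, la_10 in mismatches:
    print(f"Urban={urban}, Rural={rural}, LATracts_half={la_half}, LATracts10={la_10}: "
          f"docstring={docstring_condition(urban, rural, la_half, la_10)}, "
          f"code={code_condition(urban, rural, la_half, la_10)}")
```

Under these assumptions, the two conditions disagree in three cases. The code includes urban tracts that are low access only at 10 miles, and it excludes rural tracts that are low access at 10 miles, which is the opposite of what the docstring promises.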
Scenario 3 - Full Integration
Seeing that this process has worked well, you formalize the approach your original teammate had been using: your team is not only allowed to submit AI-generated code but expected to. Now, AI writes most of your team's code, generates data-visualization dashboards, summarizes results, debugs problems, and more. Who owns the intellectual property for the software your team creates? What is your job on the team now?