Objectives

By the end of this lesson, students will be able to:

  • Recognize how technical choices embed values and perspectives
  • Apply critical thinking to data collection and analysis practices
  • Consider context, ethics, and interpretation in computational work

Setting up

To follow along with the code examples in this lesson, please download the files in the zip folder here:

Make sure to unzip the files after downloading! The following are the main files we will work with:

  • lesson10.ipynb
  • nationalparks.ipynb
  • nationalparks.csv

What is Humanistic Computing?

Humanistic computing (sometimes called digital humanities or humanistic data science) is an approach that brings together computational methods and humanistic inquiry. Rather than viewing technology and the humanities as separate domains, humanistic computing recognizes that the most meaningful insights often emerge when we combine the pattern-finding power of computation and algorithms with the interpretive depth of humanistic thinking.

At its core, humanistic computing asks us to approach data work with questions that humanists have always asked: What does this mean? Whose perspectives are represented? What historical forces shaped this? What are the ethical implications? It tells us that technical skills are most powerful when paired with critical thinking about context, meaning, and human values. This approach is particularly important as data increasingly shapes decisions about everything from public policy to resource allocation, because it helps us understand not just what the data shows, but what it means for real people and communities. And as many applications and workflows in computing domains become increasingly automated, it becomes even more vital that we keep in touch with the inherent humanity of what we’re trying to do.

Core Principles

There are many ways to define the core principles or values of humanistic computing. In this class, we’ll consider the following:

  • Data as Representation, Not Reality
  • Complementary Skills, Not Competing Ones
  • Interpretation Over Prediction

Data as Representation, Not Reality

When we work with data, it’s easy to treat numbers as objective facts about the world. But data is better described as a selective representation of reality shaped by human choices. Someone decided what to measure, how to categorize it, when to collect it, and what to leave out. A dataset about national parks, for instance, tells us how many parks exist and where they’re located, but it doesn’t tell us why those particular places became parks, whose lands they were originally, or what competing interests shaped those decisions. This means that data can answer questions like “what” and “how many,” but it struggles with “why” and “what does this mean.”
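
As a concrete illustration, here is a minimal sketch that loads the nationalparks.csv file from this lesson's materials and reports only the "what" and "how many." Even the list of columns is part of the representation someone chose:

import pandas as pd

# Load the national parks dataset from this lesson's files
parks = pd.read_csv("nationalparks.csv")

# The computational "what" and "how many": counts and recorded fields
print(f"Rows in this dataset: {len(parks)}")
print(f"Fields someone chose to record: {list(parks.columns)}")

# The "why" is not in the file: why these places became parks, whose lands
# they were, and what debates shaped those decisions require outside context.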

Complementary Skills, Not Competing Ones

It’s tempting to think of computational and humanistic approaches as opposite ends of a spectrum: numbers vs. narratives, objectivity vs. interpretation, science vs. humanities. But this framing misses the point. These approaches ask different kinds of questions and provide different kinds of answers, and both are essential for responsible data work.

Computational analysis gives us the “what”, i.e., the patterns, trends, and quantifiable changes in our data. Humanistic inquiry gives us the “so what”, i.e., the meanings, motivations, and implications of those patterns. When we combine them, we get a much richer understanding than either approach could provide alone. We can identify a trend computationally and then use contextual research to understand what forces created that trend and what consequences it might have.

The most impactful data science work happens at the intersection of these skills. Consider how different your understanding of national park creation becomes when you pair a simple count (computational) with research into the environmental legislation, Indigenous land rights movements, and conservation debates (humanistic) that shaped those decisions. Neither skill set is complete without the other.

Interpretation Over Prediction

Humanistic computing prioritizes interpreting what patterns mean in their context over simply predicting what will happen next. Context always matters: data doesn’t exist in a vacuum! (In fact, the word “data” comes from a Latin word meaning “that which is given,” which implies that there was a giver at some point!)

Revisiting Technical and Non-Technical Skills

In Lesson 9, we discussed technical and non-technical skills. Let’s build on that foundation by thinking about how these skills interact in humanistic computing. Technical skills you’ve developed this quarter include:

  • Writing and documenting functions
  • Manipulating DataFrames
  • Creating visualizations
  • Writing tests

These skills are essential, but they’re not value-neutral. Every technical decision has implications:

  • When you write a function, you decide what inputs matter and what outputs should look like
  • When you manipulate data, you decide what to keep and what to discard
  • When you create visualizations, you decide what comparisons to highlight
  • When you write tests, you decide what counts as “correct” (see the short sketch after this list)
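
As a quick sketch of that last point, a test encodes a judgment about what “correct” behavior is. The made-up example below decides that silently skipping a missing score (pandas’ default for .mean()) is the right answer, rather than, say, raising an error:

import numpy as np
import pandas as pd

def average_score(scores: pd.Series) -> float:
    """Return the average of a column of test scores."""
    return scores.mean()

def test_average_score():
    scores = pd.Series([80.0, np.nan, 90.0])
    # This assertion treats "ignore the missing value" as correct behavior.
    # A different analyst might insist the function warn or fail instead.
    assert average_score(scores) == 85.0

test_average_score()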

Humanistic computing adds another layer of skills:

  • Critical questioning: Asking not just “how do I do this?” but “should I do this?” and “what are the consequences?”
  • Contextual awareness: Understanding where data comes from and what historical, social, or political factors shaped it
  • Interpretive humility: Recognizing that data analysis produces interpretations, not objective truths
  • Ethical reasoning: Considering who benefits and who might be harmed by your technical work
  • Transparent documentation: Making your choices visible so others can evaluate and critique your work

You’ve actually been practicing these skills all quarter, especially in your Reading Assignments! Though there’s no data analysis or coding to be found there, the nature of Reading Assignments makes you consider intention, audience, context, implications, and limitations of what you’re reading.

Documentation

Good documentation goes beyond commenting and annotating your code (though those things are still important): it also provides a reference for what your code is doing and for any design decisions that were made. We’ll look at some examples of documentation (also called docs) for some of the libraries we’ve seen in class so far.

First, let’s look at the docs for pandas DataFrame. You can also click on the “Source” link on this page to view the raw code! What do the docs tell you that the raw code doesn’t?
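
If you’d rather stay in a notebook, you can pull up a plain-text version of much of the same reference material with help(), though the website is easier to browse and includes rendered examples:

import pandas as pd

# Show the DataFrame class docstring in plain text
help(pd.DataFrame)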

Now, consider the documentation for the seaborn relplot function that we visited in Lesson 8. What information is in the docs that you wouldn’t find by calling:

import seaborn as sns

help(sns.relplot)

Finally, consider the information in matplotlib.pyplot‘s title function. What similarities or differences do you find between the matplotlib documentation and that of pandas or seaborn?

Something important to consider now is what the documentation does not tell you. You will learn a lot about how to use each library, but the docs crucially do not explain why these functions work the way they do, or the design decisions behind why they look the way they do.

Transparency and Interpretation

One of the most practical aspects of humanistic computing is documenting your analytical choices. You’ve been writing comments and docstrings all quarter. Now let’s think about documentation as a form of accountability. For these examples, we’ll add documentation as comments so that you see them side-by-side with code. However, in practice, these comments will likely live in a separate document, like a writeup or report.
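
If you want to run the two snippets below yourself, here is a minimal, hypothetical stand-in for the dataset they describe (the column names and values are invented here; the counts quoted in the comments refer to the imaginary full dataset, not this tiny one):

import numpy as np
import pandas as pd

# A tiny, invented stand-in for the school-score data used below
data = pd.DataFrame({
    "district": ["A", "B", "C", "D", "E"],
    "score":    [78.5, np.nan, 91.2, 66.0, np.nan],
})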

Here’s our first approach, using an imaginary dataset of school districts and average standardized test scores:

# Drop rows with missing values
data = data.dropna()

# Calculate mean score
mean_score = data['score'].mean()

Here’s our second approach. Note that here, we use an f-string to display the number of schools that we dropped.

# Here, we remove rows with missing values to ensure that the schools
# we analyze have score measurements for all needed columns. However,
# this means that we lose 23% of observations (17 schools).
# The missing data was concentrated in rural schools (16 of the 17 dropped
# schools), so our analysis may not represent rural schools accurately.
# We could impute state averages for the missing values, but this
# could obscure real differences between rural and non-rural schools.
data_complete = data.dropna()
print(f"Removed {len(data) - len(data_complete)} schools")

# Here, we are using the mean score as a summary metric.
# The mean is sensitive to outliers, which we want to capture in this analysis.
# However, the mean may be skewed by a few very high or low scores.
mean_score = data_complete['score'].mean()

The second approach does more than explain your code. It makes your reasoning visible, acknowledges trade-offs, and helps future readers (including yourself!) understand not just what you did, but why you did it and what the limitations are. Let’s break down the different parts (a reusable comment template follows the list):

  • Decision: States what you did in your code.
  • Reasoning: Explanation for why you made the decision you did.
  • Consequences: Outcome after your decision, whether expected or unexpected. Take note of any patterns that arise as a result of your decisions.
  • Limitations: What do those consequences mean for your analysis? What could you do differently? What are the tradeoffs?
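
One way to keep these four parts together is a short comment (or writeup) template that sits above the relevant line of code. This is just a sketch to adapt, not a required format:

# Decision:     what you did (e.g., dropped rows with missing values)
# Reasoning:    why you chose this over the alternatives
# Consequences: what changed as a result (row counts, which groups were affected)
# Limitations:  what this means for the analysis, and what a different
#               choice would trade off
data_complete = data.dropna()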

Documentation isn’t just about recording technical steps—it’s also about interpretation. When you analyze data, you’re not discovering objective truths; you’re creating one possible interpretation of patterns in the data.

Documenting coursework

For future homework assignments, and for your portfolio or project, we will ask you to document your code choices according to similar guiding principles.

Food for thought: Look at some code you wrote recently. What decisions did you make that you didn’t document? Try adding comments that explain your reasoning and acknowledge limitations.

Ethics in Practice

Ethics in data work isn’t a separate checklist you complete after your analysis—it’s woven into every technical decision you make. Throughout this course, you’ve been making ethical choices whenever you decided how to clean data, what visualizations to create, or how to interpret your findings. Now we’ll make this ethical dimension explicit by examining how seemingly neutral technical choices can have real consequences for people and communities. Humanistic computing asks us to consider not just whether our code works, but whether it works fairly, whose interests it serves, and what harm it might inadvertently cause. By developing frameworks for ethical reasoning, we can make more thoughtful decisions and document them transparently.

Data Defaults

Many programming libraries and functions come with default parameters and settings that seem convenient and standard, but defaults are almost never neutral. When we use .dropna() without considering the reasoning and consequences behind that decision, we’re making an implicit choice about whose data matters. When we normalize or standardize values, we’re deciding what counts as “typical” and treating everything else as deviation. These defaults can systematically exclude certain groups (like the rural schools in our earlier example), erase important variation, or reinforce existing biases in our data.
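
To see how much work a default is doing, here is a small, made-up example. Calling .dropna() with no arguments drops every row that is missing any value, which is only one of several defensible choices:

import numpy as np
import pandas as pd

# Invented data with missing values in different columns
schools = pd.DataFrame({
    "school": ["North", "South", "East", "West"],
    "score":  [81.0, np.nan, 74.5, np.nan],
    "size":   [420, 310, np.nan, 150],
})

# The default: drop any row missing *any* value -> only 1 of 4 rows survives
print(len(schools.dropna()))

# Choices the default quietly rules out:
print(len(schools.dropna(subset=["score"])))                  # require only a score: 2 rows
print(len(schools.fillna(schools.mean(numeric_only=True))))   # impute column means: 4 rows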

What makes defaults particularly dangerous is that they feel objective; we assume that if something is the standard approach, it must be the right one! But “standard” practices in data science encode the values and assumptions of whoever designed them, and those assumptions may not align with your analytical goals or ethical commitments. Always ask: what does this default assume, who does it benefit, and who might it harm?

Food for thought: We’ve seen quite a few default parameters in pandas, seaborn, and matplotlib. What do these defaults suggest about the “standards” these libraries assume, and about what their designers expect users to accomplish?

Uncertainty

One of the most important aspects of humanistic computing is recognizing that analysis produces interpretations, not certainties. Data analysis reveals patterns and relationships, but it rarely tells us definitively why those patterns exist or what they mean. Correlation isn’t causation, samples don’t perfectly represent populations, and measurements are always incomplete proxies for complex realities. Acknowledging uncertainty isn’t a weakness! In fact, it’s intellectual honesty and a hallmark of rigorous analysis. When you’re transparent about what you don’t know or can’t prove, you invite others to engage critically with your work rather than accept it uncritically.

You CAN say:

  • “In this dataset, we observed…”
  • “The data suggests a relationship between…”
  • “One possible interpretation is…”
  • “This pattern might indicate…”

You CANNOT say (at least, not with certainty):

  • “This proves that…”
  • “X definitely causes Y…”
  • “All [group] are…”
  • “This will always…”

The difference matters! Consider these two statements about the same finding:

  • “Our analysis proves that students from wealthy families perform better because they have access to better resources.”
  • “Our analysis found a correlation between family income and test scores. This pattern is consistent with differences in access to educational resources, though other factors like parental education and school quality may also play a role. We cannot determine causation from this data alone.”

The first statement is what we might call an overclaim: it inflates or misrepresents the finding and generalizes to claims the data cannot support. The second statement is more verbose, but it is also more accurate about what the analysis actually shows (a short sketch of reporting a finding in this style appears after the list below). Good analysis is honest about uncertainty and limitations. When writing about your findings, explicitly discuss:

  • What you can’t measure: Many important things resist quantification (motivation, teacher quality, school culture)
  • What your data doesn’t show: If certain groups aren’t well-represented, say so
  • What your methods assume: All methods make assumptions; make yours explicit
  • What remains ambiguous: Sometimes the answer is “we’re not sure”
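
Here is a small sketch (with invented numbers and column names) of reporting a correlation in this spirit: state the observation and its limits, not a causal conclusion.

import pandas as pd

# Invented values purely for illustration
schools = pd.DataFrame({
    "family_income": [42_000, 55_000, 61_000, 75_000, 88_000],
    "score":         [68.0, 72.5, 71.0, 80.0, 84.5],
})

r = schools["family_income"].corr(schools["score"])

print(f"In this dataset, family income and test scores are correlated (r = {r:.2f}).")
print("This pattern is consistent with differences in access to educational")
print("resources, though parental education, school quality, and unmeasured")
print("factors may also play a role; causation cannot be determined from this")
print("data alone.")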

Food for thought: If your dataset doesn’t answer all the questions you have, what can you do to find answers?

⏸️ Pause and 🧠 Think

Take a moment to review the following concepts and reflect on your own understanding. A good temperature check for your understanding is asking yourself whether you might be able to explain these concepts to a friend outside of this class.

Here’s what we covered in this lesson:

  • Principles of humanistic computing
  • Documentation types
  • Ethical considerations in data science
  • Interpretability and uncertainty

Here are some other guiding exercises and questions to help you reflect on what you’ve seen so far:

  1. In your own words, write a few sentences summarizing what you learned in this lesson.
  2. What did you find challenging in this lesson? Come up with some questions you might ask your peers or the course staff to help you better understand that concept.
  3. What was familiar about what you saw in this lesson? How might you relate it to things you have learned before?
  4. Throughout the lesson, there were a few Food for thought questions. Try exploring one or more of them and see what you find.

In-Class

When you come to class, we will work together on the problems in nationalparks.ipynb. We will also need nationalparks.csv for this notebook. Make sure that you have a way of editing and running these files!

Canvas Quiz

All done with the lesson? Complete the Canvas Quiz linked here!

Note: Because of the free-form nature of this quiz, it will not be autograded. However, you can view your answers after submitting and resubmit as many times as you want before it’s due.