Matplotalt: Alt text for matplotlib figures
A small Python package to generate and surface alt text for matplotlib figures
Motivation
Visualizations on the web generated by code often lack alt text. Across 100,000 Jupyter notebooks, Potluri et al. (2023) found that 99.81% of programmatically generated images did not have associated alt text. Of these, the vast majority were created with matplotlib or seaborn, neither of which contains methods for easily generating or displaying image descriptions. This is a critical barrier to perceiving computational notebooks for blind and visually impaired (BVI) users. Despite many feature requests from BVI developers (e.g., on screen reader navigation, cell indexing, and rendering dynamic content), accessibility tools for notebooks are rarely implemented in the Jupyter core. The goal of this package is to provide an alternative that allows users to automatically generate alt text for matplotlib figures in computational notebooks and read it with a screen reader.
Related Work
This project is primarily based on Potluri and Singanamalla’s paper “Notably Inaccessible” [1], which analyzes accessibility issues in Jupyter notebooks and provides suggestions for improving notebook figures and interfaces. The method for displaying generated alt text through HTML in this package follows the instructions in their section 5.1.1 on “including alternate text in images”. My method to automatically generate starter alt text from figure properties follows the four-level model from Lundgard and Satyanarayan’s paper “Accessible Visualization via Natural Language Descriptions” [2]. The generate_starter_alt_text function implements a limited version of the level 1 (encoded properties) and level 2 (statistical concepts) descriptions, which are perceiver-independent. Many previous users have encountered these accessibility issues and come up with their own workarounds. Wherever possible, I try to credit the code that I’ve drawn from these Stack Overflow and GitHub conversations.
Methodology
This package implements two primary functions:
- generate_starter_alt_text, which takes a figure and axes and returns a string with text describing the encodings, axes, and statistics of the figure.
- surface_alt_text, which takes an alt text string and surfaces it in at least one of four ways: as markdown output of its cell, as a Python comment in a new cell, embedded in an HTML display of the figure, or as text in a separate file.
There are also several parameters to control details in the alt text, such as which statistics to include and the number of significant figures. Below is an example of how to use both functions to describe a line plot:
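The exact signatures of the two functions are not shown here, so the following is only a rough, self-contained sketch of the same workflow under stated assumptions. The names round_sig, describe_line_plot, and alt_text_as_markdown, and the stats and sig_figs parameters, are hypothetical stand-ins rather than the package's API. The plot's title, labels, and data are passed in as plain values instead of being read from a matplotlib figure and axes.

```python
import math
import statistics


def round_sig(value, sig_figs=3):
    """Round a number to a given count of significant figures."""
    if value == 0:
        return 0
    magnitude = math.floor(math.log10(abs(value)))
    return round(value, sig_figs - 1 - magnitude)


def describe_line_plot(title, xlabel, ylabel, x, y,
                       stats=("min", "max", "mean"), sig_figs=3):
    """Hypothetical stand-in: build starter alt text for a single line."""
    # Level 1 (encoded properties): chart type, title, and axes.
    parts = [
        f"A line plot titled '{title}'.",
        f"{xlabel} is on the x-axis, ranging from "
        f"{round_sig(min(x), sig_figs)} to {round_sig(max(x), sig_figs)}.",
        f"{ylabel} is on the y-axis.",
    ]
    # Level 2 (statistical concepts): each statistic can be switched off.
    if "min" in stats:
        i = y.index(min(y))
        parts.append(f"The minimum {ylabel} is {round_sig(y[i], sig_figs)} "
                     f"at {xlabel} = {round_sig(x[i], sig_figs)}.")
    if "max" in stats:
        i = y.index(max(y))
        parts.append(f"The maximum {ylabel} is {round_sig(y[i], sig_figs)} "
                     f"at {xlabel} = {round_sig(x[i], sig_figs)}.")
    if "mean" in stats:
        parts.append(f"The mean {ylabel} is "
                     f"{round_sig(statistics.fmean(y), sig_figs)}.")
    return " ".join(parts)


def alt_text_as_markdown(alt_text):
    """Hypothetical stand-in: format alt text for a markdown cell output."""
    return f"**Alt text:** {alt_text}"


days = [0, 1, 2, 3, 4]
heights = [1.0, 4.25, 9.5, 16.125, 25.0]
text = describe_line_plot("Plant growth", "Day", "Height (cm)", days, heights)
print(alt_text_as_markdown(text))
```

In a notebook, the resulting string could be passed to IPython.display.Markdown and displayed so that it appears as markdown output of the cell. Passing stats=() leaves only the level 1 description, for readers who prefer fewer technical details.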
This project is entirely implemented in Python. Besides matplotlib and seaborn, I used IPython for the custom notebook behavior (adding markdown to cells, adding new cells, and embedding alt text in the HTML figure display) and NumPy to calculate statistics for the level 2 descriptions. Although I was unable to get feedback from people who use screen readers on the final package, both primary functions are direct implementations of the model from Lundgard and Satyanarayan [2] and the recommendations from “Notably Inaccessible” [1]. Several smaller details, like the option to disable statistics in alt text, came from Lundgard and Satyanarayan’s survey of participants on the Prolific platform, as some users preferred fewer technical details.
Surfacing alt text through markdown and Python cells was based on Venkatesh’s feedback after the project presentations. They were definitely right that embedding the alt text directly in Jupyter would be harder than I thought!
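The HTML route described above (an image display whose alt attribute carries the description) can be sketched roughly as follows. This is a minimal illustration, not the package's code: figure_html_with_alt is a hypothetical helper, and "figure.png" is a placeholder path for a figure that has already been saved to disk.

```python
import html


def figure_html_with_alt(image_path, alt_text):
    """Hypothetical helper: build an <img> tag with escaped alt text."""
    # Escape quotes and angle brackets so the description cannot
    # break out of the attribute or inject markup.
    src = html.escape(image_path, quote=True)
    alt = html.escape(alt_text, quote=True)
    return f'<img src="{src}" alt="{alt}">'


snippet = figure_html_with_alt(
    "figure.png",
    'Line plot of "height" over time, with all values < 30 cm.',
)
print(snippet)
```

In a notebook, a string like this could be wrapped in IPython.display.HTML and displayed in place of the raw figure, so that a screen reader announces the alt attribute when it reaches the image.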
Disability Justice
This project addresses collective access by contributing to the larger goal of making computational notebooks more accessible to BVI users. I plan to release all code publicly on GitHub and invite other developers to add and suggest features to meet their access needs for figures. Matplotlib, Jupyter, and other open-source libraries are important because they are free and collaborative. When accessibility features are locked behind a paywall, however, users marginalized by their class are prevented from interacting with data, which multiplies barriers for poor and disabled users. By providing an alternative to paid software that has more robust alt text features, like Microsoft Office and Tableau, I hope this project will contribute to anti-capitalist politics.
Lessons and Future Work
One challenge for automatically generating matplotlib alt text was the inconsistency between the properties of different plot types. Automatically detecting the type of plot from the figure and axes objects was especially challenging. Because there is no “LinePlot” or “BarChart” object, one workaround I tried was inferring the type from the most common subobject (e.g., line vs. rectangle), although there’s no rule for what proportion of shapes makes up a type of plot. Some of the ways I planned to surface alt text in the notebooks were similarly hard to implement. Although I was able to programmatically create a new cell with given alt text in a Python comment, there was no function to convert cells to markdown, which would be easier to navigate with a screen reader. Likewise, I tried editing the Exif ImageDescription tag in PNGs with multiple libraries, but was unable to embed text directly in figures in a way that was visible to screen readers.
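The "most common subobject" workaround can be sketched as below. This is only an illustration under stated assumptions: infer_plot_type, the ARTIST_TO_PLOT_TYPE mapping, and the min_share threshold are all hypothetical, and the input is a list of artist class names that could be collected with, for example, type(artist).__name__ over ax.lines, ax.patches, and ax.collections.

```python
from collections import Counter

# Assumed mapping from matplotlib artist class names to plot types.
# It is deliberately incomplete, and histograms also draw Rectangle
# patches, so they cannot be told apart from bar charts this way.
ARTIST_TO_PLOT_TYPE = {
    "Line2D": "line plot",
    "Rectangle": "bar chart",
    "PathCollection": "scatter plot",
    "Wedge": "pie chart",
}


def infer_plot_type(artist_names, min_share=0.5):
    """Guess the plot type from the most common known artist class.

    Returns None when no known artist is present or when the most
    common class makes up less than min_share of the known artists.
    """
    known = [name for name in artist_names if name in ARTIST_TO_PLOT_TYPE]
    if not known:
        return None
    name, count = Counter(known).most_common(1)[0]
    if count / len(known) < min_share:
        return None
    return ARTIST_TO_PLOT_TYPE[name]


# A bar chart with one reference line drawn on top of it.
print(infer_plot_type(["Rectangle"] * 5 + ["Line2D"]))
```

The min_share cutoff makes the missing rule explicit: any value chosen for it is arbitrary, which is exactly the difficulty with mixed plots such as bars overlaid with lines.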
Matplotlib and Jupyter are both massive libraries, and there are undoubtedly many examples where this package fails to describe important features of figures or interacts poorly with the notebook interface. Extending the functions to work with many different plot types (e.g., contour, scatter matrix, ridge) is necessary before this sees any use. At the same time, models like that of Lundgard and Satyanarayan [2] need to be extended to these other plot types. There are also many other statistical features that could be automatically surfaced in future iterations of the package, like correlation between variables and general trends. Finally, the method used in this package requires generating an external figure that’s referenced by the HTML display, which is not ideal. As Potluri et al. note in their paper, there need to be better options for embedding alt text in Jupyter or matplotlib figures.
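As a rough idea of what surfacing correlation and general trends might look like, here is a small sketch that is not part of the package. pearson_r and describe_trend are hypothetical names, and the 0.7 threshold for calling a trend "strong" is an arbitrary assumption.

```python
import math


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences, or None."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return None  # A constant series has no defined correlation.
    return cov / math.sqrt(var_x * var_y)


def describe_trend(xlabel, ylabel, x, y, threshold=0.7):
    """Hypothetical helper: one sentence about the linear trend."""
    r = pearson_r(x, y)
    if r is None or abs(r) < threshold:
        return f"{ylabel} shows no strong linear trend with {xlabel}."
    direction = "increases" if r > 0 else "decreases"
    return (f"{ylabel} generally {direction} as {xlabel} increases "
            f"(r = {r:.2f}).")


print(describe_trend("Day", "Height", [1, 2, 3, 4], [2, 4, 6, 8]))
```

A sentence like this would sit alongside the level 2 statistics, and could be disabled in the same way for readers who prefer fewer technical details.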
Accessibility Features
Due to the underlying inaccessibility of Jupyter notebooks and matplotlib figures, this package is still difficult to use with a screen reader. To make understanding the individual functions easier, I tried to follow best practices for creating documentation with clear parameters and return values. Additionally, I included an example notebook to demo the main functions for generating and displaying alt text on several common plot types. In this notebook, I attempted to follow the recommendations in “Notably Inaccessible” by adding markdown sections for each of the plots and exposing their data in tables when appropriate.
Works Cited
[1] Potluri, Venkatesh, et al. “Notably Inaccessible—Data Driven Understanding of Data Science Notebook (In) Accessibility.” Proceedings of the 25th International ACM SIGACCESS Conference on Computers and Accessibility. 2023.
[2] Lundgard, Alan, and Arvind Satyanarayan. “Accessible Visualization via Natural Language Descriptions: A Four-Level Model of Semantic Content.” IEEE Transactions on Visualization and Computer Graphics 28.1 (2021): 1073-1083.