
Culturally Aware Alt Text Chrome Extension

Introduction

Problem

We are working to provide culturally relevant, descriptive, and accessible alt text for blind and low vision (BLV) individuals exploring the web. Traditional alt text often lacks depth and cultural awareness, especially for culturally relevant images. This makes it difficult for BLV users to fully connect with and understand cultural content, particularly when the image refers to unfamiliar customs, places, or symbols. The problem is most noticeable when the alt text is either too vague or overly detailed. Solving it is important for ensuring fair access to online cultural knowledge, giving users agency over their experiences, and encouraging web inclusion by accepting unique cultural identities and preferences.

Application

The main goal of our application is to give blind and low vision people a browser extension that lets them understand images through a cultural lens. Throughout our interviews, our clients mentioned that much of the alt text they encounter does not explain the cultural identity or cultural aspects of the image. Our Google Chrome extension lets the user submit a specific image, for which alt text is then generated. Additionally, the user can open a chatbox to ask more in-depth questions about the image and its cultural aspects. There is also a settings page where the user can set their cultural preferences and preferred alt text description length.

Client Input

Clients

  • Brown Indian young man with low vision (correctable with glasses)
  • Blind White American middle-aged adult woman
  • Blind White American Senior woman

Access Needs

Through interviewing the clients, I got to know more about their needs when it comes to alt text. I learned that many description preferences stem from their lived experiences. One client, who had the opportunity to travel to different parts of the world and experience different cultures, had different alt text preferences from our clients who had not. This client preferred shorter, more concise descriptions, while the others preferred more in-depth ones since the concepts were often foreign to them.

Additionally, I learned how important a role alt text plays in their daily lives. It goes beyond image descriptions on the internet; it is also something they use in everyday life. One client mentioned that she uses AI to produce alt text describing objects around her home.

“I used Chat GPT to help me understand where the pumpkin was in my house”

Lofi Prototype

Through making our prototype, we learned the importance of making sure our technology is accessible through VoiceOver and other screen readers. Since our target audience is people who are blind or have low vision, we wanted to ensure our technology is compatible with screen readers. This raised a challenge: we needed to constantly verify not only that the screen reader works but also that the content is read in the correct order.

Lofi Prototype Settings Page

Image of a settings page. The first section includes a large Settings header and the text 'None of these settings are necessary, but they may help prompt the initial Alt Text Generation'. The second section, Identity Settings, includes a text box labeled 'Enter your Cultural Identity'. The third section, Context Settings, includes an on/off toggle for 'Include Additional Cultural context' and an on/off toggle for 'Include Additional Cultural Related to Cultural Identity'.

Lofi Prototype CAATBox

chatbox displayed with the first text bubble that is a blue shade displaying the message 'What is a guzheng?' and another text bubble in a gray shade displaying the message 'Processing cultural insights...'

Final Concept

A Chrome extension that allows users to generate culturally relevant alt text for an image. We used HTML, JavaScript, and CSS to build the front end and Llama 3.2 Vision Instruct as our LLM. We designed an Image Description Verification Framework loosely inspired by the framework in the GenAssist paper. An image is input to the LLM with a set of Image Questions, Cultural Questions, and Settings Questions. The outputs are summarized to produce well-written alt text. Finally, each cultural aspect mentioned in that alt text is described in the final output.

Image Description Framework

Screenshot of Image Description Framework. Image and Settings flow into Image Verification Questions. From top to bottom it reads: Image Verification Questions. Image Questions: What is the setting of this image? What is the subject of this image? What is happening in the image? What is the background of the image? Cultural Questions: What is the ethnicity of the people in the image? What cultural clothing is present? What culturally significant architecture or landmarks are present? What type of food or dining custom is present? ETC. On/Off from Settings: What are the emotions of the people? This flows into Alt Text Generation, which reads: Alt Text Generation. Summarize Findings: Based on these findings, write succinct alt text with a maximum of three sentences describing the photo. This and Settings flow into Detail Adding: Based off Settings. For each cultural aspect in this image description, describe what it is. Finally flowing into Alt T
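
The three stages in this framework (verification questions, alt text generation, detail adding) could be sketched in JavaScript roughly as follows. This is a minimal sketch under assumptions, not the extension's actual code: the function names, the `settings` keys (`emotion`, `culturalContext`), and the `askModel` callback, which stands in for whatever call reaches the vision LLM, are all hypothetical. The question and prompt strings are taken from the framework figure, and the cultural question list is not exhaustive since the figure ends it with "ETC".

```javascript
// Hypothetical sketch of the three-stage prompt flow described in the framework.
// `askModel({ image, prompt })` is an assumed async callback that returns the
// model's text answer; it is not a real library API.

const IMAGE_QUESTIONS = [
  "What is the setting of this image?",
  "What is the subject of this image?",
  "What is happening in the image?",
  "What is the background of the image?",
];

const CULTURAL_QUESTIONS = [
  "What is the ethnicity of the people in the image?",
  "What cultural clothing is present?",
  "What culturally significant architecture or landmarks are present?",
  "What type of food or dining custom is present?",
];

// Questions asked only when the matching settings toggle is on (assumed key names).
const SETTINGS_QUESTIONS = {
  emotion: "What are the emotions of the people?",
};

// Combine the fixed questions with any questions enabled in settings.
function buildVerificationQuestions(settings = {}) {
  const optional = Object.entries(SETTINGS_QUESTIONS)
    .filter(([key]) => settings[key] === true)
    .map(([, question]) => question);
  return [...IMAGE_QUESTIONS, ...CULTURAL_QUESTIONS, ...optional];
}

async function generateAltText(image, settings, askModel) {
  // Stage 1: ask each verification question about the image.
  const findings = [];
  for (const question of buildVerificationQuestions(settings)) {
    const answer = await askModel({ image, prompt: question });
    findings.push(`${question} ${answer}`);
  }

  // Stage 2: summarize the findings into short alt text.
  const summary = await askModel({
    image,
    prompt:
      "Based on these findings, write succinct alt text with a maximum of " +
      "three sentences describing the photo.\n" +
      findings.join("\n"),
  });

  // Stage 3: add cultural detail only if the (assumed) setting asks for it.
  if (!settings.culturalContext) return summary;
  return askModel({
    image,
    prompt:
      "For each cultural aspect in this image description, describe what it is.\n" +
      summary,
  });
}
```

Passing `askModel` in as a parameter keeps the prompt flow separate from the model call, so the question-building logic can be checked without a model.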

Example of Generated Alt Text

Screenshot of VoiceOver over an image. VoiceOver reads: 'An Asian woman with long parted black hair wearing a white traditional Chinese Hanfu dress plays a guzheng, a Chinese stringed musical instrument, in front of the Eiffel' before being cut off. She is in front of the Eiffel Tower.

Intersectional Needs

Our design addresses our clients as whole people, not just people who are blind or have visual impairments. By making cultural identity part of the design and output, we get a more holistic view of the image itself. Additionally, we found from our client interviews that users who were familiar with a cultural aspect of an image, whether by sharing the identity or having experienced that culture, were more likely to prefer succinct alt text. Conversely, users who were unfamiliar with a cultural aspect were more likely to prefer more descriptive alt text. Customizable alt text output provides a more tailored experience and gives our users more agency.

Potential Risks and Harms

We identified several risks in deploying culturally aware AI-generated alt text: hallucinated or inaccurate outputs, stereotyping, offensive assumptions, and overwhelming or excessive detail. To mitigate these harms, we designed a framework loosely based on Chain of Thought prompting and the framework in the GenAssist paper. Additionally, we prioritized user agency and contextual control. Users can optionally input their cultural identity and adjust generation settings, such as including additional context or emotion, through an accessible settings page. This encourages transparency and lets users tailor the experience to their needs. The AI prompt was also carefully structured to focus on relevant cultural cues (e.g., food, clothing, activity) while avoiding overreach or bias.

Accessibility

VPAT Table

Criterion | Conformance (1-5) | Description
1.1.1 - Non-text Content | 3/5 | Bad alt text is intentional on the main web page, but the user can add alt text to it with the extension
1.3.1 - Info and Relationships | 5/5 | Complete text reading when using VoiceOver: VoiceOver now reads all parts of the text on the web application, which is particularly important under the “Context Settings” area
1.3.4 - Orientation | 0/5 | Not designed for mobile implementations
1.4.3 - Contrast (Minimum) | 5/5 | Used a basic black-on-white color scheme, so contrast is very high
1.4.4 - Resize Text | 0/5 | Text in the application does not resize according to the client’s preferred font size
1.4.10 - Reflow | 0/5 | No ability to resize
1.4.11 - Non-text Contrast | 5/5 | Used a basic black-on-white color scheme, so contrast is very high, and switch colors have at least 4.5:1 contrast with white
2.4.3 - Focus Order | 5/5 | Screen readers go over the content in the correct order since heading levels and text are consistent
4.1.2 - Name, Role, Value | 0/5 | No indication in the extension of which technologies and actions are explicitly supported
4.1.3 - Status Messages | 4/5 | Some indication when changes happen: aria-live="polite" reads out the save, but in other instances the change is not confirmed

Accessibility Audits

Audit 1

Description:

Clickable Fields Present when they shouldn’t be

Testing Method:

VoiceOver

Evidence:
Guidelines violated:

WCAG 4.1.2 - Name, Role, Value

Screenshot of CAATBox Extension with VoiceOver active. VoiceOver reads 'main'.

Explanation:

The example violates WCAG 4.1.2 - Name, Role, Value because the extension does not state which assistive technologies it is compatible with. Since the extension is intended for people who rely on screen readers, it is important for them to know which screen readers are explicitly supported.

Severity Rating: 5
Justification:

Frequency is high on this website because it affects one of the main parts, the settings page, of the extension. Impact is high because it can be confusing to someone using a screen reader when they come across a bug or something similar that they are unprepared for. Persistence is high since this is unavoidable in this current state.

Possible Solution:

To improve this, the developer should test the extension with different assistive technologies, primarily common screen readers, so that compatibility can be stated explicitly, and should continue running accessibility audits to document what the extension does not support.

Audit 2

Description:

High Contrast Non Text Elements

Testing Method:

WebAIM WAVE

Evidence:
Guidelines Passed:

1.4.3 - Contrast (Minimum)
1.4.11 - Non-text Contrast

Screenshot example of switches on the extension's final version. The top switch is off and gray with a white toggle. The text in the middle, in a black font on a white background, reads 'Emotion: Alt text will include a focus on emotions'. The bottom switch is on and green with a white toggle. Screenshot of the WebAIM WAVE color contrast check: the foreground color is hex #767676, a dark gray that has a 4.54:1 contrast ratio with #ffffff white. Screenshot of the WebAIM WAVE color contrast check: the foreground color is hex #3A833D, a green that has a 4.67:1 contrast ratio with #ffffff white.

Explanation:

The example passes 1.4.3 - Contrast (Minimum) and 1.4.11 - Non-text Contrast. For the text, a black font on a white background meets the contrast requirement, so the extension conforms to 1.4.3. Both switch colors have a contrast ratio of at least 4.5:1 with white, which exceeds the 3:1 minimum that 1.4.11 requires for non-text elements.
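
The contrast ratios reported by the checker can be reproduced with the WCAG 2.x relative luminance formula. The sketch below is for illustration only and is not part of the extension; the function names are our own, and it assumes six-digit hex colors.

```javascript
// WCAG 2.x contrast ratio between two "#rrggbb" colors.

// Convert one 8-bit sRGB channel to its linear-light value.
function channelToLinear(value8bit) {
  const c = value8bit / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

// Relative luminance: weighted sum of the linearized R, G, B channels.
function relativeLuminance(hex) {
  const clean = hex.replace(/^#/, "");
  const [r, g, b] = [0, 2, 4].map((i) => parseInt(clean.slice(i, i + 2), 16));
  return (
    0.2126 * channelToLinear(r) +
    0.7152 * channelToLinear(g) +
    0.0722 * channelToLinear(b)
  );
}

// Contrast ratio: (lighter + 0.05) / (darker + 0.05), ranging from 1 to 21.
function contrastRatio(hexA, hexB) {
  const la = relativeLuminance(hexA);
  const lb = relativeLuminance(hexB);
  const [lighter, darker] = la >= lb ? [la, lb] : [lb, la];
  return (lighter + 0.05) / (darker + 0.05);
}

// Example: the gray "off" switch against a white background.
// contrastRatio("#767676", "#ffffff").toFixed(2)  // "4.54"
```

A ratio of 4.5:1 is the threshold for normal-size text under 1.4.3, and 3:1 is the threshold for user interface components under 1.4.11, so both switch colors clear the non-text requirement.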

Severity Rating: 1
Justification:

Frequency is high on this website because it affects the UI of the extension. Impact is none because it is not a problem. Persistence is none because it is not a problem.

Possible Solution:

Continue using high contrast colors.

Audit 3

Description:

Indication of Saved Cultural Identity

Testing Method:

VoiceOver

Evidence:
Guidelines Passed:

4.1.3 - Status Messages

Screenshot of extension's Identity Settings with VoiceOver on. VoiceOver reads Saved Cultural Identity: Chinese American, after entering in a cultural identity in the 'Enter Your Cultural Identity:' edit input box.

Explanation:

The example passes 4.1.3 - Status Messages: VoiceOver reads out the cultural identity that was just saved, alongside the visual change below the input box.
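
One common way to get this behavior is a polite live region whose text is updated after the save. The following is a hedged sketch of that pattern, not the extension's actual code: the `announceSaved` function and the `identity-status` element id in the usage comment are hypothetical.

```javascript
// Update a status region so screen readers announce the saved identity
// without moving focus. `statusRegion` is assumed to be an element that is
// already present in the page before its text changes, since live regions
// are generally announced more reliably when they exist ahead of the update.
function announceSaved(statusRegion, identity) {
  // role="status" implies a polite live region; aria-live is set as well
  // to make the intent explicit.
  statusRegion.setAttribute("role", "status");
  statusRegion.setAttribute("aria-live", "polite");
  statusRegion.textContent = `Saved Cultural Identity: ${identity}`;
}

// Hypothetical usage inside the settings page script:
// announceSaved(document.getElementById("identity-status"), "Chinese American");
```

Because the same text is written to a visible element, sighted users and screen reader users receive the same confirmation.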

Severity Rating: 1
Justification:

Frequency is low on this website because it affects a single input box under Settings. Impact is none because it is not a problem. Persistence is none because it is not a problem.

Possible Solution:

Continue using aria-live="polite", and aria-live="assertive" where appropriate.