SlideAssist AI

Project Overview

Explanation

SlideAssist AI is a Chrome extension designed to improve lecture slide accessibility for BVI (blind or visually impaired) students. Our extension enhances lecture materials by generating slide summaries, visual descriptions, and semantic interpretations of visual content to help users better understand lecture material. This tool serves as a proof of concept for improving accessibility in lecture environments by using AI to interpret slide visuals and lecture transcripts.

Screenshots:

  • SlideAssist AI Chrome extension popup over a university course syllabus webpage, showing options to upload lecture slides and an optional transcript, with a purple "Open Lecture and Side Panel" button.
  • SlideAssist AI Chrome extension side panel displayed beside a PDF lecture viewer showing CSE 123 LinkedIntList slides. The panel includes a keyword search for "diagram" with matching slide results and a lecture review section with a slide number input and a purple "Analyze Slide" button.
  • SlideAssist AI Chrome extension side panel open beside a lecture PDF viewer showing CSE 123 LinkedIntList slides. The panel displays a keyword search for "diagram," indicates slide 5 is selected, and shows an analyzed slide summary with an "Analyze Slide" button.
  • SlideAssist AI Chrome extension side panel open next to a PDF lecture viewer showing CSE 123 LinkedIntList slides. The panel displays a keyword search for "diagram," indicates slide 5 is selected, and shows expanded visual descriptions explaining a linked list node diagram.
  • SlideAssist AI Chrome extension side panel open beside a lecture PDF viewer showing CSE 123 LinkedIntList slides. The panel shows a keyword search for "diagram," notes that slide 5 is selected for analysis, and displays sections on professor emphasis and concept connections about linked list reference semantics.
  • SlideAssist AI Chrome extension side panel open beside a PDF lecture viewer showing CSE 123 LinkedIntList slides. The panel displays a keyword search for "diagram," indicates slide 5 is selected, and shows expanded concept connections explaining reference semantics and linked list insertion logic.

Demo Video & Final Poster

Poster titled “SlideAssist AI” describing an accessible Chrome extension for reviewing lecture PDFs. Sections outline motivation about PDF barriers for blind and visually impaired students, the lecture analysis approach with interface screenshots, accessibility audit findings, and key project takeaways.

Abstract

Our project is an AI-powered Chrome web extension designed to improve the accessibility of visual course materials, such as PDFs and slide decks, for BVI students. Many lecture slides contain charts, diagrams, images, and complex spatial formatting that are not meaningfully accessible through screen readers. As a result, students often encounter missing content, incorrect reading order, skipped visuals, or generic alt text that fails to capture the semantic meaning of the material.

Our extension enhances accessibility by analyzing lecture PDFs and generating contextual, lecture-aware descriptions of visual content. Using API calls to Gemini 3 Flash, the system extracts slide text and produces detailed alt text, slide summaries, semantic explanations of visuals, conceptual connections across the lecture, and emphasis cues based on transcript data and slide markup such as professor-highlighted concepts. The extension features a side panel interface where users can review slide summaries, search keywords across visual descriptions, and request deeper explanations for specific slides.
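
To make this pipeline concrete, here is a minimal sketch of what one per-slide analysis call might look like using the @google/generative-ai SDK. The SlideAnalysis shape, the analyzeSlide function, the prompt wording, and the exact model ID string are illustrative assumptions, not the extension's verbatim implementation.

    import { GoogleGenerativeAI } from "@google/generative-ai";

    // Hypothetical shape of one slide's generated analysis.
    interface SlideAnalysis {
      summary: string;
      altTexts: string[];          // one entry per visual on the slide
      professorEmphasis: string;   // emphasis cues drawn from the transcript
      conceptConnections: string;  // links to other parts of the lecture
    }

    const genAI = new GoogleGenerativeAI(import.meta.env.VITE_GEMINI_API_KEY);
    const model = genAI.getGenerativeModel({ model: "gemini-3-flash" }); // model ID assumed

    // Combine extracted slide text with the matching transcript excerpt and
    // ask the model for a structured JSON response.
    async function analyzeSlide(
      slideText: string,
      transcriptExcerpt: string
    ): Promise<SlideAnalysis> {
      const prompt = [
        "You are helping a blind or low-vision student review a lecture slide.",
        `Slide text:\n${slideText}`,
        `Transcript excerpt for this slide:\n${transcriptExcerpt}`,
        "Respond with JSON fields: summary, altTexts (array), professorEmphasis, conceptConnections.",
      ].join("\n\n");

      const result = await model.generateContent(prompt);
      // Parsing model output as JSON is brittle; real code should validate it.
      return JSON.parse(result.response.text()) as SlideAnalysis;
    }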

Unlike generic image-captioning tools, our system situates visual descriptions within the broader lecture context, incorporating transcript data to better reflect instructional intent and conceptual relationships. By reducing cognitive load and improving semantic clarity, our tool aims to create a more immersive and independent study experience for BVI students reviewing course materials in any environment.

This approach is informed by research papers and first-person accounts, such as a YouTube video in which a blind creator describes how inaccessible PDFs create barriers when using screen readers, including content being read in the wrong order or announced as blank.

Project Details

Motivation

Our goal is to support students by creating an assistive study tool that enhances their experiences interacting with lecture slides and visual materials. Many classes publish lecture slides as PDFs, which can pose an accessibility barrier. We found a first-person account from a YouTube channel called Unsightly Opinions, in which blind creator Tamara explains how inaccessible PDFs create barriers when using a screen reader. She describes her personal experience of PDFs being read in the wrong order, skipping content, or being announced as blank. The video is educational rather than promotional: it communicates both Tamara's frustrations with inaccessible PDFs and possible fixes.

Tamara describes and shows herself using a screen reader on PDFs, noting that for inaccessible documents it often jumbles the order of content, skips sections, or reads the PDF as blank. She emphasizes the importance of adding appropriate alt text, using proper headings, and exporting to PDF (instead of printing to PDF) to make sure PDFs are accessible to screen readers. For each of these fixes, she also shows the technology and apps needed to apply it.

She explains that many PDFs are not properly formatted, so screen readers jump around, skip entire sections, or fail to recognize text, making long documents frustrating or impossible to access independently. She also mentions that PDFs made by "printing to PDF" look good but sound blank: they are usable by sighted readers but inaccessible to BVI users. However, she believes PDFs can become accessible quickly if people use real headings, proper formatting, and alt text, and export PDFs correctly. This shows that small changes in how documents are created can make huge differences.

Even though the video focuses on PDFs, the same issues apply to many other formats, especially lecture slides and visual materials in classes, which is our focus for this project. If images, diagrams, or charts are missing alt text or structure, BVI students may miss key information or need extra help. This problem can affect other users too, like people with dyslexia and other learning disabilities, or anyone using text-to-speech for various reasons. Overall, it shows how important it is to provide structured content and meaningful descriptions for visual information.

Background Research

Accessibility barriers in higher education technology often stem from visually structured content that is not designed with assistive technologies in mind. Across our individual research, a common theme emerged: visual information in lectures, PDFs, charts, and interactive visualizations is frequently inaccessible to BVI students who use screen readers or other assistive tools. Missing alt text, improper document structure, incorrect reading order, embedded text within images, and lack of captions prevent BVI students from fully engaging with course materials. Even when screen readers are available, their effectiveness depends heavily on formatting decisions that are often outside the student’s control.

Several of the tools we studied demonstrate how AI can help bridge this gap. The LectureAssistant paper shows how lecture recordings and slides can be translated into contextualized explanations, allowing students to ask questions about visual content and better understand how spoken explanations connect to slide visuals. Similarly, the Chart-Text paper and the AltGosling paper focus on automatically generating alt text for visualizations. While Chart-Text emphasizes automated chart classification and description generation, AltGosling highlights the importance of structured, hierarchical, and navigable descriptions that allow users to explore complex data in multiple levels of detail. These systems demonstrate that accessibility extends beyond simply labeling images; it requires semantic explanations, organized information, and customizable depth.

In addition, best practices in accessible document design emphasize the importance of headings, tags, concise alt text, logical reading order, and layouts that do not rely solely on color to convey meaning. Legal and institutional frameworks such as the ADA, Section 508, and WCAG further reinforce the responsibility of universities to ensure accessible materials and to train faculty and staff accordingly. However, research also cautions that AI tools should not replace inclusive teaching practices, but rather supplement them.

Together, this research informs our project by reinforcing that effective accessibility solutions must combine automated description generation, structured organization of concepts, integration with lecture transcripts, and user-centered design. Our proposed Chrome extension aims to build on these insights by transforming visually structured lecture PDFs and slide content into organized, screen reader-accessible conceptual information that supports meaningful learning rather than simple surface-level description.

Disability Model Analysis

Disability Justice Principle 1

Our proposed system meets the Recognizing Wholeness principle, which emphasizes that disabled people are whole people with full lives, experiences, and histories.

In the research behind our project idea, we found that recent AI tools for automatically generated alt text are often treated as a primary technological "solution" for visual accessibility. In many online disability forums, however, this is seen as reductive: an example of throwing AI at an accessibility barrier and deeming it fully solved, without verification of its effectiveness or usefulness for disabled people. Accurately knowing what each visual in a document depicts should be a bare-minimum accessibility requirement for BVI users, but it alone does not fully support BVI students in participating in academic learning.

Our tool therefore attempts to go beyond autogenerated alt text by recognizing that BVI students reviewing lecture slide PDFs (even fully accessible ones!) are trying to make cross-concept connections, study key ideas, understand how visuals relate to lecture content, and navigate professor emphasis such as bolded, highlighted, or circled material on slides that many screen readers do not announce. Rather than building a tool that only provides literal audio descriptions of everything on a lecture PDF, our project is designed to offer semantic explanations, conceptual connections to other parts of the lecture, summaries, and emphasis cues drawn from lecture transcripts and slide markup.

Additionally, our system supports keyword search across lecture text and generated visual alt texts, allowing students to more flexibly revisit and connect concepts across slides. This reflects an understanding that BVI students need more than autogenerated descriptions to engage with course material holistically; supporting this deeper conceptual navigation recognizes the wholeness of BVI students as learners rather than treating accessibility as a minimal, logistical add-on.
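
As one concrete illustration of the keyword search feature described above, a minimal sketch might look like the following; the IndexedSlide shape and function name are hypothetical, not our exact implementation.

    // Hypothetical record combining a slide's extracted text with its
    // generated visual descriptions.
    interface IndexedSlide {
      slideNumber: number;
      slideText: string;
      altTexts: string[];
    }

    // Case-insensitive keyword search across slide text and generated alt
    // text, returning matching slide numbers so a student can jump between
    // related concepts.
    function searchSlides(slides: IndexedSlide[], keyword: string): number[] {
      const needle = keyword.trim().toLowerCase();
      if (!needle) return [];
      return slides
        .filter(
          (s) =>
            s.slideText.toLowerCase().includes(needle) ||
            s.altTexts.some((alt) => alt.toLowerCase().includes(needle))
        )
        .map((s) => s.slideNumber);
    }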

Disability Justice Principle 2

Our proposed system does not meet the Leadership of Those Most Impacted principle, which emphasizes centering the leadership and lived experiences of people marginalized by the systems being fought against.

Though we are a team of undergraduate students who have taken the UW CSE 12X lecture series with which we are primarily testing our tool, none of us have a life experience of navigating most day-to-day interactions with impaired or no vision. While some of us or our family members experience low vision when not wearing prescription glasses, this is not equivalent to the lived experiences of BVI students who regularly use assistive technologies such as screen readers to access academic materials.

Because we are not being led by BVI students, it is especially important that our team continues to seek out and center first-person accounts from BVI students, as we have been trying to do, as well as research studies involving BVI student participants (e.g., a paper on an AI prototype that makes video lectures more accessible for BVI people). From these sources, we can gain more insight into what tools are actually useful for BVI students reviewing lectures, understand what may constitute a disability dongle, and avoid falling into technology solutionism, where we might only focus on creating a new AI-powered tool when systemic pedagogical approaches (like improving visually accessible lecture training for professors) might actually be the only effective fix. This underscores why it is very important for our team to make sure that our tool is actually addressing a real student need through effective methods. Moving forward, engaging with even more disability justice-informed research and first-person accounts in this area will be critical for grounding our design decisions in real needs, especially in the absence of directly impacted leadership on our team.

Additional Questions

Our proposed system tries to avoid ableist assumptions but may still unintentionally reinforce some. In designing this tool, we are actively pushing against the assumptions that access simply means description (e.g., that if an image is described then the accessibility barrier is fully solved), that BVI students are only trying to passively receive information rather than actively study, review, synthesize, and connect concepts, and that accessibility only needs to support real-time lecture participation rather than later review. By supporting features such as conceptual summaries, professor emphasis cues, and keyword search over generated visual alt texts, our system aims to increase user control and agency by allowing BVI students to revisit, skim, and connect lecture content on their own terms. For example, the ability to navigate slides through keyword searches of visual descriptions, or to identify emphasized professor content that may otherwise go unannounced by screen readers, can support studying agency. However, the current version of our system does not fully support user control, such as the ability to query the AI model for more detailed explanations, corrections, or clarification of its provided summaries.

Our project may also still reflect ableist assumptions in other ways, such as assuming that online academic materials are the primary site of visual inaccessibility when inaccessible teaching practices may be the primary barrier, that students necessarily want automated interpretation of meaning through semantic summaries, and that our tool in its current structure will be able to achieve a level of output quality and user experience that will be useful and accurate for BVI students. For example, by implementing keyword-based search over visual descriptions, we are implicitly assuming that BVI students could find value in studying lecture visuals through text-based skimming, which may not reflect all user preferences.

While we are attempting to ground our project in first-person accounts and further research, our team is not currently being led by BVI students who regularly engage with lecture slide materials. We are also currently neither able nor trained to conduct surveys or studies with BVI students, so we are relying on interpreting first-person accounts found online as being relevant to our specific project context, which may overlook key pain points or priorities that those users would identify if we directly consulted them. Also, the perspectives highlighted in these accounts may not fully represent the views of BVI technologists and designers who actively work on accessibility tools like ours, who could bring key insights into what makes AI-powered assistive technologies genuinely useful. Although we do have a team member who is involved in accessibility research, this is not the same as incorporating directly impacted leadership into our design process, and primarily relying on published first-person accounts risks filtering people's lived experiences through our own interpretations.

Our system is designed to include BVI students who use screen readers as our core users by ensuring that our browser extension is fully screen reader accessible, and by supporting interaction with lecture materials through generated descriptions, summaries, and transcript-linked contextual information. However, we are currently unable to fully include BVI students who would prefer to use our tool to study in languages other than English, especially those who may benefit from receiving the tool's semantic explanations or conceptual connections in a different language. This potentially leaves out many students whose first language is not English. While many AI models are capable of automatic translation, we have not planned for multilingual support within the scope of our project due to time constraints, as implementing such a feature would also require careful validation to avoid inaccurate translations. As a result, our current design includes users who wish to use our tool in English, which unfortunately oversimplifies disability and identity by holding the implicit assumption that most BVI students would primarily prefer to use our tool in English.

Project Storyboard

Task 1: Search for and review lecture content on a specific topic to compile notes

Storyboard showing a student using a Chrome extension alongside CSE 123 lecture slides to search for and review content about modifying a linked list in order to compile notes. The extension highlights relevant slides, provides semantic explanations for visuals, and helps the student take notes before turning the extension off.

A student wants to review content and take notes on modifying linked lists. They open the Chrome extension and use the keyword search feature. The extension returns relevant slides, which the student can click into. The images and visuals carry rich information, since the extension shows alt text and semantic explanations for them. This helps the student review the content and take notes effectively.

Task 2: Figure out the topic of a flagged visual-heavy lecture slide while studying for the final exam

Storyboard showing a student studying for a CSE 123 final exam who encounters a visual-heavy lecture slide and opens a Chrome extension for assistance. The extension identifies the slide’s topic, provides a summary of the content, and gives detailed explanations of the visuals, helping the student better understand the material and apply their learning on an exam.

A student decides to review previous lecture slides to study for their final exam in CSE 123. They start reviewing linked lists but find that the slides contain many visuals, so they open the extension. It surfaces the slide's topic, a summary of the content, and details about the visuals that go beyond a short description or alt text. This helps them gain a better understanding, and they do well on the exam.

Task 3: Figure out what is happening in the lecture visual that the professor is talking about

Storyboard showing a student attending a live CSE 123 lecture who becomes confused by a complex, changing diagram on a slide and opens a Chrome extension in “In Lecture Mode.” The extension provides alt text and step-by-step semantic explanations of what is happening in the visual, helping the student follow the professor’s discussion, understand the diagram changes, and stay engaged during the lecture.

A student is attending a live lecture and following the slides on their computer. They are having difficulty understanding the content, especially the visuals on the slides, so they open the extension. It provides more detail on what is happening in the slide, with alt text and step-by-step explanations for the visuals, along with information about the surrounding content. This helps them follow how the visuals and the lecture as a whole are progressing, and they stay better engaged with the material.

Accessibility Assessment

WCAG Guidelines Analysis

| WCAG Success Criterion | Status (Supports / Partially Supports / Does Not Support) |
| ------------- | ------------- |
| Success criterion 4.1.2 - Name, Role, Value | Partially Supports |
| Success criterion 1.3.1 - Info and Relationships | Partially Supports |
| Success criterion 2.4.3 - Focus Order | Partially Supports |
| Success criterion 1.1.1 - Non-text Content | Partially Supports |
| Success criterion 1.4.10 - Reflow | Does Not Support |
| Success criterion 1.3.4 - Orientation | Supports |
| Success criterion 1.4.4 - Resize Text | Does Not Support |
| Success criterion 4.1.3 - Status Messages | Partially Supports |
| Success criterion 1.4.11 - Non-text Contrast | Partially Supports |
| Success criterion 1.4.3 - Contrast (Minimum) | Partially Supports |

Tools Tested With

Issues Identified in UARs

The issues we identified in our UARs cover multiple POUR principles (Perceivable, Operable, Understandable, Robust), specifically Perceivable and Operable.

The first UAR (AT-CC-01) addresses the Perceivable principle. This issue involves insufficient color contrast across several textual and non-textual components of the extension interface. When UI elements such as input box borders, button outlines, divider lines, and instructional text do not meet WCAG contrast requirements, users with low vision may struggle to distinguish interface components from their surroundings. Since important visual information will be difficult to perceive, this violates the Perceivable guideline, which requires that information be presented in ways that users can recognize and interpret.
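
For reference, WCAG contrast is defined in terms of relative luminance. The sketch below implements the standard WCAG 2.x formula in TypeScript; the sample colors are illustrative, not values measured from our interface.

    // WCAG 2.x relative luminance for sRGB channels in 0-255.
    function linearize(c: number): number {
      const s = c / 255;
      return s <= 0.03928 ? s / 12.92 : Math.pow((s + 0.055) / 1.055, 2.4);
    }

    function luminance(r: number, g: number, b: number): number {
      return 0.2126 * linearize(r) + 0.7152 * linearize(g) + 0.0722 * linearize(b);
    }

    // Contrast ratio between two colors, always >= 1.
    function contrastRatio(
      fg: [number, number, number],
      bg: [number, number, number]
    ): number {
      const l1 = luminance(...fg);
      const l2 = luminance(...bg);
      const [lighter, darker] = l1 >= l2 ? [l1, l2] : [l2, l1];
      return (lighter + 0.05) / (darker + 0.05);
    }

    // AA thresholds: 4.5:1 for normal text (1.4.3), 3:1 for UI components (1.4.11).
    console.log(contrastRatio([118, 118, 118], [255, 255, 255]).toFixed(2)); // "4.54"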

The second UAR (TC-SR-02) also relates to the Perceivable principle but specifically in the context of assistive technologies. In this case, a status message is incorrectly announced as a list when using a screen reader. Since the screen reader is communicating inaccurate structural information about the content, users may become confused about how the interface is organized. This misrepresentation of content prevents the information from being perceived correctly by users who rely on screen readers.
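
One standard remedy is to render the message in a dedicated live region rather than inside list markup. A minimal React sketch follows; the component and prop names are hypothetical, not our exact code.

    import React from "react";

    // role="status" implies aria-live="polite", so screen readers announce
    // updates to this region without misreporting the surrounding structure
    // as a list.
    function AnalysisStatus({ message }: { message: string }) {
      return (
        <div role="status" aria-live="polite">
          {message}
        </div>
      );
    }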

The third UAR (RT-KN-03) addresses the Operable principle. This issue occurs because keyboard navigation does not consistently allow users to access or interact with all components of the interface. Inconsistent focus behavior and difficulty navigating the keyword search component make it harder for users to perform tasks without using a mouse. Since many assistive technologies rely on keyboard interaction, this problem prevents some users from fully operating the extension.
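
One possible pattern for addressing this (component and prop names hypothetical, not our exact code) is to render each search result as a native button so it joins the tab order, and to move focus to the first result after a search completes.

    import React, { useEffect, useRef } from "react";

    function SearchResults({ slideNumbers, onSelect }: {
      slideNumbers: number[];
      onSelect: (n: number) => void;
    }) {
      // Move keyboard focus to the first result when new results arrive, so
      // keyboard users are not stranded in the search input.
      const firstResult = useRef<HTMLButtonElement>(null);
      useEffect(() => {
        firstResult.current?.focus();
      }, [slideNumbers]);

      return (
        <ul>
          {slideNumbers.map((n, i) => (
            <li key={n}>
              {/* Native <button> elements are focusable and keyboard-operable
                  by default, unlike clickable <div>s. */}
              <button ref={i === 0 ? firstResult : undefined} onClick={() => onSelect(n)}>
                Slide {n}
              </button>
            </li>
          ))}
        </ul>
      );
    }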

Overall, the three UARs demonstrate that the accessibility challenges in our project affect both how users perceive information and how they interact with the interface.

Output Validation

As part of our additional system validation, we conducted an informal experiment inspired by a meeting with our professor Jen, who was interested in whether providing lecture transcript context could improve the quality of AI-generated alt text for lecture visuals. Using our SlideAssist AI tool, we ran our initial lecture PDF analysis API call under three conditions: No Transcript (lecture PDF only), Raw Transcript (lecture PDF with an unedited Panopto transcript), and Annotated Transcript (lecture PDF with an annotated Panopto transcript containing slide markers). We extracted the generated alt text for each visual element and manually evaluated outputs using four qualitative criteria: Accuracy, Conciseness, Relevance, and Specificity. The Annotated Transcript condition achieved the highest average quality score (4.33 on a 1-5 scale), followed closely by the No Transcript (4.28) and Raw Transcript (4.24) conditions.
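
For transparency about the arithmetic, the sketch below shows how a condition's average could be computed from per-visual ratings; the types and names are hypothetical, since our actual tallying was informal.

    // Each generated alt text receives 1-5 ratings on four criteria; a
    // condition's score is the mean over all ratings in that condition.
    interface Rating {
      accuracy: number;
      conciseness: number;
      relevance: number;
      specificity: number;
    }

    function conditionAverage(ratings: Rating[]): number {
      const total = ratings.reduce(
        (sum, r) => sum + r.accuracy + r.conciseness + r.relevance + r.specificity,
        0
      );
      return total / (ratings.length * 4);
    }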

While this lends some support to our hypothesis that slide-aligned transcript context may improve lecture visual alt text generation, the score differences were small and based on subjective alt text scoring by students who are not trained in alt text evaluation. In fact, we believe a major factor limiting alt text generation performance across all conditions is our current system prompt, which does not explicitly guide the model with clear standards for what constitutes high-quality alt text. For example, in several cases the model mainly described changes between slides or broader lecture context rather than the exact visual content present on the current slide. Future work based on this validation should therefore focus on refining our system prompt to incorporate structured alt text guidelines and examples (e.g., few-shot prompting) before generation. Future validation of autogenerated alt text should also include repeated trials and more objective evaluation methods.
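
As a sketch of what that prompt revision could look like, the constant below embeds explicit alt text guidelines plus one worked example (few-shot). The wording is a hypothetical draft, not a tested prompt.

    // Hypothetical revised system prompt with explicit alt text standards.
    const ALT_TEXT_SYSTEM_PROMPT = `
    You write alt text for visuals on lecture slides for blind and low-vision students.
    Guidelines:
    - Describe only what is visually present on the CURRENT slide, not changes
      from earlier slides or broader lecture context.
    - Lead with the visual's type (diagram, chart, code screenshot) and subject.
    - Be specific: name labeled nodes, values, axes, and arrows.
    - Keep each description to 1-3 sentences; extended interpretation belongs
      in the slide summary instead.

    Example:
    Slide visual: a linked list diagram with boxes labeled 1, 3, 4 and a "front" arrow.
    Alt text: "Linked list diagram: a 'front' reference points to a chain of three
    nodes containing 1, 3, and 4, each with a data field and a next field."
    `;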

We also informally assessed the accuracy of the Lecture Review component’s generated summaries and information by comparing its outputs directly with the original lecture slide content and corresponding transcript excerpts. To do this, we selected several slides from a lecture and manually examined whether the generated summaries correctly reflected the key ideas presented on the slides and the explanations provided by the instructor in the transcript. Overall, we found that the Lecture Review summaries were generally accurate and successfully captured the main concepts of the slides. For example, on a slide titled “Revisiting insertAfterLast,” the system correctly described the linked list diagram containing the nodes 1, 3, 4 and identified the roles of the front pointer and the traversal node reference used during iteration. In some cases, the summaries also incorporated explanatory context from the transcript, such as why traversal stopping at null can cause issues when modifying a list. While these explanations were not always explicitly written on the slide itself, they accurately reflected the instructor’s verbal explanation inferred from the transcript.

The autogenerated alt text from the Lecture Review component was also informally evaluated to determine whether it provided better detail and clarity compared to the alt text produced by the Keyword Search feature. While the Keyword Search tool may return slides out of chronological order based on search relevance, we manually selected the slide corresponding to the “Revisiting insertAfterLast” diagram for direct comparison. In this example, the Keyword Search alt text briefly described the diagram as two pointer boxes labeled “front” and “node” pointing to a list containing the values 1, 3, and 4. In contrast, the Lecture Review component generated a more detailed description of the diagram structure, explaining that the list consisted of nodes containing the integers 1, 3, and 4 and that each node included separate data and next reference fields. While both descriptions were accurate, the Lecture Review alt text provided a slightly more thorough explanation of the diagram’s structure, likely due to its access to broader lecture and transcript context. Overall, this informal comparison suggests that the Lecture Review component can generate more descriptive alt text than the Keyword Search component in some cases, though both approaches produced generally accurate descriptions of the visual content.

Learnings & Future Work

Through the process of developing SlideAssist AI, we learned that improving accessibility for visual course materials requires more than simply adding alt text to images. Many lecture slides contain diagrams, charts, and visual relationships that communicate meaning beyond what a short caption can describe. BVI students often need explanations that capture the conceptual relationships within visuals, such as trends in graphs or the connections between elements in a diagram. Building this extension helped us better understand how inaccessible PDFs create barriers in academic environments and how assistive technologies must account for both literal and conceptual understanding of visual information.

Another important lesson was the role of context in generating meaningful descriptions. Generic image-captioning models often produce surface-level descriptions that miss the instructional intent of lecture materials. By incorporating lecture transcripts and slide text into the analysis process, we were able to generate explanations that are better aligned with instructional intent, showing that combining multiple sources of information can improve the usefulness of AI-generated accessibility features.

We also learned the importance of interface accessibility through testing. For example, our accessibility audit revealed keyboard navigation issues in parts of the extension, highlighting how important it is for assistive technologies to support full keyboard interaction and predictable navigation patterns.

While our current design demonstrates the potential of AI-assisted lecture analysis, there are several directions for future work. First, the system could be extended to support a wider variety of visual formats, including more complex diagrams, mathematical notation, and multi-layered charts that require deeper interpretation. The tool should also support larger file upload sizes. Future versions could incorporate real-time lecture recordings or Panopto transcript streams to provide explanations that update dynamically during a lecture, and integrating more advanced visual reasoning models could improve the accuracy and depth of generated explanations. Another area for improvement is expanding the interactive capabilities of the extension: for example, users could ask follow-up questions about specific parts of a diagram, request simplified explanations, ask for translation into a different language for international students, or navigate visual elements through structured descriptions. Conducting user studies with BVI students would also be an important next step to evaluate how well the system supports real learning workflows and to identify design improvements.


Installation

Prerequisites

  • Google Chrome
  • Node.js (v18 or higher recommended)
  • npm
  • Gemini API key

Build Extension

  1. Clone the repository (or navigate to the project directory):

    cd lecture-extension
    
  2. Install dependencies:

    npm install
    
  4. Set up Gemini API key: Create a .env.local file in the root directory (see the note after these steps for how the build consumes this key):

    VITE_GEMINI_API_KEY=your_gemini_api_key_here
    
  4. Build the extension:

    npm run build
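
Note: Vite statically inlines environment variables prefixed with VITE_ into the client bundle at build time. The sketch below (illustrative, not the extension's exact code) shows how the built extension might read the key; it also means the key ships inside the built extension, so avoid committing .env.local or sharing your dist/ folder.

    // Sketch: reading the key at build time via Vite's import.meta.env.
    const apiKey = import.meta.env.VITE_GEMINI_API_KEY;
    if (!apiKey) {
      throw new Error("Missing VITE_GEMINI_API_KEY; see the .env.local step above.");
    }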
    

Load Extension in Chrome

  1. Open Chrome Extensions page:
    Navigate to chrome://extensions in a Chrome tab.
  2. Turn on Developer Mode:
    Toggle the Developer Mode button on in the top-right corner.
  3. Load in extension:
    Click the “Load unpacked” button in the top-left corner and load the project’s dist/ folder.

    Note that while developing, any time you make updates to the Chrome extension you will need to run npm run build again and click “Reload” on the Chrome extension in the chrome://extensions page.


Technologies Used

  • React
  • TypeScript
  • CSS
  • Vite
  • Gemini 3 Flash
  • Chrome local storage
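
As an illustration of how the extension can use Chrome local storage, the hedged sketch below caches per-lecture analysis results so reopening the side panel does not repeat API calls; the function names and lectureId key scheme are hypothetical.

    // Cache a lecture's generated analysis in extension storage (Manifest V3's
    // chrome.storage APIs return promises).
    async function cacheAnalysis(lectureId: string, analysis: object): Promise<void> {
      await chrome.storage.local.set({ [lectureId]: analysis });
    }

    async function loadAnalysis<T>(lectureId: string): Promise<T | undefined> {
      const stored = await chrome.storage.local.get(lectureId);
      return stored[lectureId] as T | undefined;
    }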

Acknowledgments

Built for CSE 443 (Digital Accessibility) at the University of Washington. Thank you to our Professor Jen Mankoff and the CSE 443 TAs for their support and guidance throughout this project!