CSE443 Final Project
Accessible Transcript Review Tool
Sahana & Mahima
Project Overview
Project Wireframe: https://scrum-tofu-27822975.figma.site/
After receiving feedback, we narrowed the scope of our project. Instead of building a full end-to-end speech-to-text system, we start from transcripts generated by an existing note-taking tool and focus on helping users interpret, repair, and restructure them.
Our design space focuses on:
- Transcript interpretation
- Error repair
- Transcript restructuring
- Reducing cognitive overload
We situate the project in the context of university lecture transcripts, since note-taking tools are increasingly used by students for both productivity and accessibility.
Tasks and Timeline
Task 1: Review and Repair Raw Transcript
Workflow
- User opens the application
- A pre-generated lecture transcript appears
- The transcript includes timestamps and speaker labels
- The system highlights low-confidence or unclear text
- Users can collapse sections to reduce visual overload
Users can then:
- Edit incorrect words
- Relabel speakers
- Reorganize transcript sections
- Tag content into categories (a data sketch follows the category list below)
Example categories include:
- Key Concepts
- Definitions
- Examples
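To make this workflow concrete, the sketch below shows one way transcript segments and tags could be represented, and how low-confidence text might be flagged for highlighting. The interface, field names, and the 0.8 threshold are illustrative assumptions, not a final data model.

```typescript
// Illustrative data model for a repairable transcript segment; all names are assumptions.
interface TranscriptSegment {
  startTime: number;   // seconds from the start of the lecture
  endTime: number;
  speaker: string;     // editable speaker label, e.g. "Lecturer"
  text: string;        // editable transcript text
  confidence: number;  // 0-1 score reported by the upstream note-taker
  tags: Array<"Key Concept" | "Definition" | "Example">;
  collapsed: boolean;  // hides the segment body to reduce visual overload
}

// Flag segments below a confidence threshold so the UI can highlight them for review.
// The 0.8 cutoff is a placeholder value, not a measured one.
function findLowConfidenceSegments(
  segments: TranscriptSegment[],
  threshold = 0.8
): TranscriptSegment[] {
  return segments.filter((segment) => segment.confidence < threshold);
}
```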
Accessibility Focus
This task supports accessibility by:
- Supporting different cognitive processing styles
- Providing a plain-language option to reduce linguistic barriers
- Supporting ESL learners
- Supporting students with learning disabilities
- Allowing multiple representations of information, such as outlines, summaries, and bullet points
Goal: Make transcription errors visible and reduce ambiguity in long transcripts while reinforcing that AI-generated text should not be treated as authoritative.
Task 2: Study Mode
Workflow
- User selects Study Mode
- The system automatically generates (a rule-based sketch follows this list):
  - A structured outline
  - A plain-language summary
  - Bullet-point key concepts
- The user can adjust reading level and text size
- Notes can be exported in accessible formats
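The generation step above could begin as simple rules over the Task 1 tags, matching the rule-based fallback described later in this document. The sketch below is one possible shape; the type and function names are illustrative assumptions.

```typescript
// Reuses the illustrative tag vocabulary from the Task 1 sketch; names are assumptions.
type Tag = "Key Concept" | "Definition" | "Example";

interface TaggedSegment {
  text: string;
  tags: Tag[];
}

interface StudyNotes {
  outline: string[];      // Key Concepts in lecture order
  definitions: string[];
  examples: string[];
}

// Rule-based Study Mode generation: group tagged segments into structured notes.
// A later iteration could swap in a model-based summarizer without changing callers.
function buildStudyNotes(segments: TaggedSegment[]): StudyNotes {
  const notes: StudyNotes = { outline: [], definitions: [], examples: [] };
  for (const segment of segments) {
    if (segment.tags.includes("Key Concept")) notes.outline.push(segment.text);
    if (segment.tags.includes("Definition")) notes.definitions.push(segment.text);
    if (segment.tags.includes("Example")) notes.examples.push(segment.text);
  }
  return notes;
}
```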
Accessibility Focus
Study Mode supports accessibility by:
- Providing multiple representations of the same content
- Supporting different learning and cognitive styles
- Reducing linguistic complexity through plain-language summaries
- Helping ESL students and students with learning disabilities
Validation Plan
Since we are no longer building a transcription system, our validation focuses on usability and clarity rather than speech recognition accuracy.
Baseline Analysis
We will first run an existing note-taker on prerecorded lecture audio and document:
- transcription errors (quantified below using WER)
- structural weaknesses
- areas of cognitive overload
This establishes the limitations of current tools.
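One standard way to quantify the transcription errors documented in this baseline is Word Error Rate (WER), which also appears in the Week 10 validation tasks. The sketch below computes WER from a word-level edit distance; whitespace tokenization is a simplifying assumption, and punctuation and casing would need extra handling.

```typescript
// Word Error Rate = (substitutions + deletions + insertions) / number of reference words,
// computed here via word-level Levenshtein edit distance.
function wordErrorRate(reference: string, hypothesis: string): number {
  const ref = reference.trim().split(/\s+/);
  const hyp = hypothesis.trim().split(/\s+/);
  // dp[i][j] = minimum edits to turn the first i reference words into the first j hypothesis words
  const dp: number[][] = Array.from({ length: ref.length + 1 }, (_, i) =>
    Array.from({ length: hyp.length + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0))
  );
  for (let i = 1; i <= ref.length; i++) {
    for (let j = 1; j <= hyp.length; j++) {
      const substitution = ref[i - 1] === hyp[j - 1] ? 0 : 1;
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1,               // deletion
        dp[i][j - 1] + 1,               // insertion
        dp[i - 1][j - 1] + substitution // substitution or match
      );
    }
  }
  return dp[ref.length][hyp.length] / ref.length;
}
```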
Usability Evaluation
We will test whether users can:
- review transcripts
- edit incorrect text
- restructure notes
- generate summaries
Evaluation metrics include:
- repair efficiency (e.g., time and number of edits needed to correct flagged text)
- clarity of structured notes
- differences between raw and edited summaries
The goal is to determine whether the interface improves transcript sense-making.
Accessibility Testing
Accessibility validation will include:
- WCAG 2.1 audit (a partly automated check is sketched after this list)
- Keyboard-only navigation testing
- Screen reader testing using VoiceOver
- Verification of semantic HTML structure and color contrast
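Part of the WCAG audit could be automated to catch regressions between manual test sessions. The sketch below assumes the axe-core library is added as a development dependency; automated checks would supplement, not replace, the keyboard-only and VoiceOver testing listed above.

```typescript
import axe from "axe-core";

// Automated spot-check of the rendered transcript view against WCAG 2.1 A/AA rules.
// axe-core is an assumed tooling choice; manual screen reader and keyboard testing still apply.
async function auditTranscriptView(): Promise<void> {
  const results = await axe.run(document, {
    runOnly: { type: "tag", values: ["wcag2a", "wcag2aa", "wcag21aa"] },
  });
  for (const violation of results.violations) {
    console.warn(`${violation.id}: ${violation.description}`);
    for (const node of violation.nodes) {
      console.warn(`  affected element: ${node.html}`);
    }
  }
}
```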
Storyboards
Task 1: Review and Repair Transcript
ALT: Three-panel storyboard titled “Task 1: Review and Repair Transcript”.
Panel 1 shows a lecture transcript with timestamps and highlighted low-confidence text.
Panel 2 shows a student editing incorrect words and tagging sections as Definition or Key Concept.
Panel 3 shows Study Mode generating a structured outline and plain-language summary.
Task 2: Study Mode

ALT: Four-panel storyboard titled “Task 2: Study Mode”.
Panel 1 shows the user selecting Study Mode.
Panel 2 shows the transcript converted into structured notes and summaries.
Panel 3 shows settings for reading level and text size.
Panel 4 shows exporting the notes.
Timeline
Week 9 – Functional Prototype
Goal: Working system with core functionality.
Sahana:
- Design transcript display UI
- Implement timestamp navigation
- Add editing functionality
- Add accessibility features (keyboard navigation, contrast improvements, text resizing)
Mahima:
- Implement transcript import workflow (load pre-generated transcripts from the existing note-taking tool)
- Implement low-confidence highlighting and content tagging
- Implement Study Mode generation (outline, plain-language summary, key concepts)
Deliverable: End-to-end import, edit, and save workflow.
Week 10 – Validation Phase
Sahana:
- Conduct WCAG accessibility audit
- Test keyboard-only navigation
- Perform screen reader testing
- Refine UI based on accessibility findings
Mahima:
- Conduct Word Error Rate (WER) analysis on the baseline note-taker output
- Test summary generation accuracy
- Validate timestamp alignment
Deliverable: Documented validation results and improved system.
Final Presentation Week
- Live demo
- Poster presentation
- Completed README
- Storyboards uploaded with ALT text
Feasibility Analysis
This scope is feasible because we are not building a speech recognition system. Instead, we are designing an interaction layer on top of existing transcripts.
Technologies used include:
- Web-based frontend interface
- Accessibility evaluation tools, including the VoiceOver screen reader, with WCAG 2.1 as the reference standard
Both team members have experience with frontend development and accessibility evaluation.
Fallback Plan
If technical challenges occur:
- Use prerecorded transcripts instead of real-time transcription
- Use rule-based summaries before integrating advanced models
Risks and Mitigation
Risk: Summary quality depends on transcription accuracy
Mitigation: Allow users to edit transcripts before generating summaries
Risk: Accessibility compliance requires iteration
Mitigation: Begin accessibility testing at the start of Week 10 to leave time for iteration
Risk: Scope creep from AI features
Mitigation: Prioritize core transcript interaction features first
Disability Analysis
Principle 1: Leadership of Those Most Impacted
Accessibility solutions should be guided by people who directly experience disability-related barriers.
Our project is not led by individuals who identify as disabled users of speech-to-text tools, which creates a limitation. Design decisions may unintentionally reflect assumptions rather than lived experience.
To partially address this limitation, we:
- studied first-person accounts from Deaf and hard-of-hearing users
- analyzed critiques of captioning and transcription systems
- grounded design decisions in documented accessibility needs
However, these steps do not replace participatory design with disabled stakeholders.
Principle 2: Intersectionality
Intersectionality recognizes that disability intersects with identities such as language background, race, socioeconomic status, and education.
Our design addresses intersectionality by:
- supporting multiple spoken languages
- allowing interaction without fine motor precision
- providing visual, textual, and editable outputs
- allowing customizable summaries to reduce cognitive load
These features support users who may experience multiple accessibility barriers simultaneously.
Is the Technology Ableist?
Speech technologies often perform worse for non-standard accents, speech impairments, or atypical communication styles.
Our design mitigates this by:
- enabling transcript correction
- allowing user control
- supporting personalization and editing
However, because the transcripts we start from are still produced by speech recognition, the system cannot remove all upstream accessibility barriers.
Is It Informed by Disabled Perspectives?
The project is informed by disabled perspectives but not led by them.
Sources include:
- accessibility research
- first-person narratives
- critiques of assistive technologies
Common frustrations we considered include caption inaccuracies, cognitive overload, and lack of editing control.
Does It Oversimplify Disability?
We attempt to avoid oversimplification by providing customizable interaction modes rather than designing around a single, one-size-fits-all user model.
Future improvements would involve participatory design with diverse disability communities.