
Pausable TTS: Final Hand-In

Authors: Jacklyn Cui, Shangzhen Yang

Introduction

Many people rely on Text-to-Speech (TTS) tools to read written content aloud, particularly blind, visually impaired, or neurodivergent individuals. These tools play an important role in providing access to books, websites, emails, and other digital content. However, despite their importance, most TTS applications are missing essential features that could make them more inclusive and user-friendly. For example, most TTS tools can only read text from start to finish, offering little control beyond adjusting the reading speed. If a user loses focus or misses part of the content, they often have to manually restart or reselect sections of text to hear it again. This limitation is especially challenging for neurodivergent users, who may find it difficult to keep up with the fast speech rates typically used in TTS tools. Additionally, the lack of pause or replay functions makes it frustrating and time-consuming to navigate or review critical pieces of information, particularly on mobile devices, since keyboard shortcuts are unavailable. These usability issues served as the primary motivation for our project: to create a TTS app that gives users more control. Our app is designed to address these challenges by including features such as pause, replay, and adjustable playback speed.

The inspiration for this project came from listening to first-person accounts and feedback from the BVI and neurodivergent communities. Many existing TTS tools assume that users can follow fast-paced speech effortlessly or that there will be no interruptions or distractions. However, real life is not that simple. For example, in a video by a blind individual who uses a screen reader every day, they demonstrated how quickly text is read aloud and explained how they often lose track of details if their attention is diverted for even a moment. They also pointed out that replaying text is cumbersome and disruptive, especially when multitasking. Our app seeks to directly address these issues by prioritizing user control and flexibility. In doing so, we follow principles of disability justice, ensuring our solution is not only functional but also inclusive and respectful of diverse needs.

In summary, our app reimagines TTS technology to provide greater accessibility and control for users who are blind, visually impaired, neurodivergent, or anyone who prefers to listen to text rather than read it. By addressing the gaps in existing tools and incorporating features like pause, replay, and adjustable speed, we aim to create a solution that empowers users to engage with text in ways that work best for them. Our ultimate goal is not only to make technology more accessible but also to promote independence, inclusion, and dignity for all users.

Positive Disability Principles Analysis

Methodology and Results

Technical Approach

Our app was developed using the Flutter framework to ensure cross-platform compatibility, enabling seamless functionality across desktop and mobile devices. The core TTS functionality was implemented using the flutter_tts library, which facilitates real-time text-to-speech conversion. To enhance usability, we designed features that allow users to:

  1. Input text directly.
  2. Upload text files locally.
  3. Access text via a provided URL.

A screenshot of an iOS mobile application titled "Pausable Text-to-Speech." The interface displays a text box labeled "Text to Synthesize," which contains a block of text for the app to synthesize. Below the text box are buttons labeled "Upload from Device" and "Upload an URL," arranged horizontally, with a third button, "Synthesize Speech," below them. The app has a light purple theme with a red banner reading "debug" in white text at the top right. The text entered in the text box reads, "This class uses a combination of in class work, individual assignments, and one larger project. Students spend a majority of the class on a longer open ended final project that is more research oriented. All of the assignments will have some minimum required competencies (see the syllabus and the section on competencies below) and students may indicate a certain number of self-selected additional competencies they wish to be assessed on."
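To make the input flow concrete, below is a minimal sketch of how the flutter_tts plugin could be wired to these three input modes. The FlutterTts calls (setLanguage, setSpeechRate, setVolume, setPitch, speak, stop) are the plugin's public API; the file and URL handling shown here (dart:io and the http package) is an illustrative assumption, not necessarily how our app implements it.

```dart
// Minimal sketch of a TTS service covering the three input modes.
// FlutterTts calls are the plugin's public API; everything else is illustrative.
import 'dart:io';

import 'package:flutter_tts/flutter_tts.dart';
import 'package:http/http.dart' as http;

class TtsService {
  final FlutterTts _tts = FlutterTts();

  Future<void> init() async {
    await _tts.setLanguage('en-US'); // default language
    await _tts.setSpeechRate(0.5);   // moderate rate for easier listening
    await _tts.setVolume(1.0);
    await _tts.setPitch(1.0);
  }

  /// 1. Text typed or pasted directly into the text box.
  Future<void> speakText(String text) => _tts.speak(text);

  /// 2. A text file uploaded from the device.
  Future<void> speakFile(String path) async {
    final content = await File(path).readAsString();
    await _tts.speak(content);
  }

  /// 3. Content fetched from a user-provided URL (this sketch reads the raw
  /// response body; real handling would strip HTML markup first).
  Future<void> speakUrl(Uri url) async {
    final response = await http.get(url);
    await _tts.speak(response.body);
  }

  Future<void> stop() => _tts.stop();
}
```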

Playback Controls

The app offers intuitive playback options, including:

  1. Speak and Stop controls to start and pause playback at any point.
  2. Previous and Next controls to replay or skip individual sentences.
  3. Adjustable sliders for volume, pitch, and speech rate.

These features address limitations in traditional TTS applications, particularly for individuals who struggle with following long sentences or fast speech rates.

A screenshot of an iOS mobile application titled "Synthesize Voice" with a back button to the left of the title. Below the title bar, there is a highlighted text block with a yellow background and bold red text: "This class uses a combination of in-class work, individual assignments, and one larger project." Below that, the remaining content appears in red text: "Students spend a majority of the class on a longer open ended final project that is more research oriented. All of the assignments will have some minimum required competencies (see the syllabus and the section on competencies below) and students may indicate a certain number of self-selected additional competencies they wish to be assessed on." Below the text are playback controls with icons labeled "Speak," "Stop," "Previous," and "Next." Further down are adjustable sliders for "Volume," "Pitch," and "Rate," with volume and rate set at mid-level and pitch set at the 1/3 position. At the bottom, there is a dropdown menu labeled "Select Language," currently set to "English (United States)." The app has a light purple theme with a red banner reading "debug" in white text at the top right.
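The pause and replay behavior can be sketched roughly as follows, assuming the text has already been split into sentences (the splitting step is not shown). pause() and stop() are real flutter_tts methods, although pause support varies by platform; the sentence-navigation logic is an illustration rather than our exact implementation.

```dart
// Sketch of sentence-level playback control. pause() and stop() are
// flutter_tts methods; the navigation logic around them is illustrative.
import 'package:flutter_tts/flutter_tts.dart';

class PlaybackController {
  PlaybackController(this.sentences);

  final FlutterTts _tts = FlutterTts();
  final List<String> sentences;
  int _current = 0;

  /// Speak the sentence the user is currently on.
  Future<void> speakCurrent() => _tts.speak(sentences[_current]);

  /// Pause mid-sentence (support depends on the platform).
  Future<void> pause() => _tts.pause();

  /// Replay the previous sentence instead of restarting the whole passage.
  Future<void> previous() async {
    if (_current > 0) _current--;
    await _tts.stop();
    await speakCurrent();
  }

  /// Skip ahead to the next sentence.
  Future<void> next() async {
    if (_current < sentences.length - 1) _current++;
    await _tts.stop();
    await speakCurrent();
  }
}
```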

User Interface

The user interface was designed with accessibility in mind. Key elements include:

  1. A larger-text mode for users who need bigger type.
  2. Clearly labeled buttons ("Upload from Device," "Upload an URL," and "Synthesize Speech") that remain consistent across the standard and larger-text views.

A screenshot of a mobile application interface titled "Pausable Text-to-Speech," similar to the left image in the second row but with larger text throughout the interface. The text box contains the text, "Larger Text UI is available." Below the text box, the interface has the same three buttons, with "Upload from Device" and "Upload an URL" on the same row and "Synthesize Speech" centered below them. The app retains the light purple theme and a debug label in the top right corner, which is the same size as in the previous UI.
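One way a larger-text mode like the one shown above can be implemented in Flutter is by scaling text for an entire widget subtree. This sketch uses MediaQuery's textScaler (available in Flutter 3.16+); the toggle handling and the 1.4x scale factor are assumptions for illustration, not our exact code.

```dart
// Illustrative larger-text mode: re-scale all text in a subtree when enabled.
import 'package:flutter/material.dart';

class ScaledTextPage extends StatelessWidget {
  const ScaledTextPage({super.key, required this.largeText, required this.child});

  final bool largeText; // whether the user enabled the larger-text option
  final Widget child;

  @override
  Widget build(BuildContext context) {
    final media = MediaQuery.of(context);
    return MediaQuery(
      // Scale all text in the subtree by 1.4x when the large-text option is on.
      data: media.copyWith(textScaler: TextScaler.linear(largeText ? 1.4 : 1.0)),
      child: child,
    );
  }
}
```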

Language Support

To serve a diverse audience, our app includes options for speech synthesis in multiple languages. Users can select their preferred language from a dropdown menu, which allows the app to support more types of content and cater to a broader user base.

A screenshot of an iOS mobile application titled "Synthesize Voice" with a back button to the left of the title, similar to the right image in the second row. The text and layout, including font sizes, are the same as in the right image in the second row. The only difference is that the dropdown menu labeled "Select Language" is expanded, showing a list of language options including "English (United States)" (selected), "French (France)," "Russian (Russia)," and other languages, such as "Spanish (Spain)" and "Japanese." The app has a light purple theme with a red banner reading "debug" in white text at the top right.
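A rough sketch of the language picker follows. getLanguages and setLanguage are flutter_tts methods (getLanguages returns a platform-dependent list of locale codes such as "en-US"); the dropdown widget wiring around them is illustrative.

```dart
// Sketch of the "Select Language" dropdown backed by flutter_tts.
import 'package:flutter/material.dart';
import 'package:flutter_tts/flutter_tts.dart';

class LanguagePicker extends StatefulWidget {
  const LanguagePicker({super.key, required this.tts});

  final FlutterTts tts;

  @override
  State<LanguagePicker> createState() => _LanguagePickerState();
}

class _LanguagePickerState extends State<LanguagePicker> {
  List<String> _languages = [];
  String? _selected;

  @override
  void initState() {
    super.initState();
    // Query the platform for its available TTS locales.
    widget.tts.getLanguages.then((raw) {
      setState(() {
        _languages = (raw as List).map((l) => l.toString()).toList();
      });
    });
  }

  @override
  Widget build(BuildContext context) {
    return DropdownButton<String>(
      hint: const Text('Select Language'),
      value: _selected,
      items: [
        for (final code in _languages)
          DropdownMenuItem(value: code, child: Text(code)),
      ],
      onChanged: (code) async {
        if (code == null) return;
        await widget.tts.setLanguage(code); // apply the chosen language
        setState(() => _selected = code);
      },
    );
  }
}
```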

Accessibility Testing

We ensured our app met accessibility standards by testing it across platforms using Lighthouse for web and Google Accessibility Scanner for Android, along with manual testing. Lighthouse evaluated key aspects like screen reader compatibility, color contrast, and labeling for the web interface, while Google Accessibility Scanner identified potential issues on Android, such as touch target sizes and missing content descriptions. Manual testing complemented these automated tools by validating the user experience firsthand, ensuring the app is intuitive and accessible across both web and mobile platforms. These efforts helped us create an inclusive tool for users, particularly those who are blind, visually impaired, or neurodivergent.

Results

By the project’s completion, we successfully developed a functional prototype demonstrating all planned features. The app was tested for cross-platform compatibility and performed consistently across iOS, Android, macOS, and Windows. Preliminary user feedback highlighted the effectiveness of the replay and pause features, as well as the intuitive design of the playback controls.

Future Enhancements

The next steps for the project include:

Disability Justice Analysis

Learning and Future Work