[Note: I tried to give this page an easy-to-understand title, and failed. By “programmatically defined media” I mean videos (especially animations) and such whose content is at least partially defined or described by files, software, databases, etc. These can be processed to change the content of the media or the way it is rendered to the viewer. For example, if an animated character’s shirt is defined by a file that contains color and texture information, pointing to a different file changes the shirt.]
Accessibility techniques for media, such as captions and video description, have been nibbling at the margins – providing separate, secondary, supplementary alternatives. But what if we jump into the content itself and modify it for accessibility? Some modern media production tools permit this. MIDI has long allowed instrument substitution: a MIDI file of a composition contains both the notes and the instrument definitions. Change the latter and it’s the same composition, played by a flute instead of an oboe.
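The MIDI substitution above can be sketched with a minimal toy model (plain Python, not a real MIDI parser; the track structure is invented for illustration – the only real-world details are the General MIDI program numbers, 68 for oboe and 73 for flute, 0-indexed):

```python
from dataclasses import dataclass, field

# General MIDI program numbers (0-indexed) -- these are standard.
OBOE, FLUTE = 68, 73

@dataclass
class Track:
    """Toy stand-in for one MIDI track: an instrument plus note events."""
    program: int  # which instrument renders the notes
    notes: list = field(default_factory=list)  # (pitch, start_tick, duration)

def substitute_instrument(track: Track, new_program: int) -> Track:
    """Same composition, different instrument: only the program changes."""
    return Track(program=new_program, notes=list(track.notes))

melody = Track(program=OBOE, notes=[(60, 0, 480), (64, 480, 480), (67, 960, 480)])
as_flute = substitute_instrument(melody, FLUTE)
assert as_flute.notes == melody.notes  # the notes themselves are untouched
assert as_flute.program == FLUTE
```

The point the sketch makes is structural: because the instrument is stored separately from the notes, substitution is a one-field edit, not a re-recording.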
The idea here is that if you can change frequency characteristics like that, you can optimize the piece for someone with hearing loss – you can even render it to match their exact audiological profile.
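As a sketch of what rendering to an audiological profile could mean, here is a per-band gain computed from an audiogram using the half-gain rule – a real (if simplified) hearing-aid fitting heuristic. The frequency bands are the standard audiometric test frequencies; the threshold values are illustrative, not from any real patient:

```python
# Audiogram: hearing threshold in dB HL at standard test frequencies.
# Values below sketch a typical high-frequency loss (illustrative only).
audiogram = {250: 10, 500: 15, 1000: 20, 2000: 40, 4000: 60, 8000: 70}

def half_gain_prescription(audiogram: dict) -> dict:
    """Half-gain rule: amplify each band by half the measured loss in dB.
    A classic, deliberately simplified fitting heuristic."""
    return {freq: loss / 2 for freq, loss in audiogram.items()}

gains = half_gain_prescription(audiogram)
# A production tool could apply these per-band gains to each mix stem
# before rendering, rather than post-processing the finished soundtrack.
```

Applying the gains at the stem level is what distinguishes this from an equalizer bolted onto the output: dialogue, music, and effects can each be shaped differently for the same listener.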
We have animation tools that can do the same thing, but more broadly: a character can be styled to appear in high contrast. Shop signs in the animation can appear with larger letters, in clearer fonts. Scenes that contain text meant to be read can be automatically played at a lower frame rate, giving people more time to read. Even sets and props can be modified. Objects important to the action can get more space on the screen by changing camera characteristics, thus playing the role of video descriptions for low vision (not blind) viewers. The same technique may improve speech reading: optional closeups on the speaker, even exaggerated mouth movements. Audio can be processed to reduce background sounds or music when characters are speaking.
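One of these adjustments, slowing scenes that contain readable text, reduces to a simple pacing rule. The reading rate and the scene model here are assumptions for illustration, not measurements:

```python
def scene_duration(base_seconds: float, on_screen_words: int,
                   words_per_second: float = 2.0) -> float:
    """Stretch a scene so on-screen text stays up long enough to read.

    words_per_second is an assumed comfortable reading rate; a real tool
    would let the viewer set it (or derive it from their profile).
    """
    needed = on_screen_words / words_per_second
    return max(base_seconds, needed)

# A 3-second shot of a shop sign with 12 words gets 6 seconds...
assert scene_duration(3.0, 12) == 6.0
# ...while a wordless shot plays at its normal length.
assert scene_duration(3.0, 0) == 3.0
```

Because the source is programmatic, the stretch can be applied per viewer at render time; nothing in the master changes.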
Here are some rough (ugly, no captions, no descriptions) examples:
- Deixis (disambiguating pronouns, especially “this”, “that”)
- Enhanced contrast (beyond what OS-level settings can provide)
- Camera angle customization for person/object perception, speech reading – closeups
- Large print for on-screen text (signs, etc.) or other salient objects
- Vowel epenthesis (breaking up consonant clusters: “brother” becomes “buh-ruh-ther”)
- Cognitive support (select a character to learn more about them, or be reminded who they are, and to see the scenes they appeared in)
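The epenthesis item can be illustrated with a deliberately naive transform. Real epenthesis would operate on phonemes inside a speech synthesis pipeline, not on spelling; the phoneme labels and the cluster-splitting rule below are a toy:

```python
VOWELS = {"a", "e", "i", "o", "u", "uh", "er"}  # toy phoneme inventory

def epenthesize(phonemes: list) -> str:
    """Insert an 'uh' between adjacent consonants, then group the result
    into syllables (each syllable runs up to and including the next vowel),
    joined with hyphens."""
    expanded = []
    for i, p in enumerate(phonemes):
        expanded.append(p)
        nxt = phonemes[i + 1] if i + 1 < len(phonemes) else None
        if p not in VOWELS and nxt is not None and nxt not in VOWELS:
            expanded.append("uh")  # the epenthetic vowel
    syllables, current = [], ""
    for p in expanded:
        current += p
        if p in VOWELS:
            syllables.append(current)
            current = ""
    if current:
        syllables.append(current)
    return "-".join(syllables)

# /b r uh th er/ (a rough rendering of "brother"):
assert epenthesize(["b", "r", "uh", "th", "er"]) == "buh-ruh-ther"
```

The output matches the page’s example, but only because the input was hand-segmented; a production version would take its phonemes from the synthesizer’s own front end.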
The advantage of these approaches is that they build accessibility into the content itself – integrated rather than segregated. They are also deeply compatible with linguistic, geographical, and cultural localization – characters can speak any language, wear any costume, and use familiar objects in front of familiar and appropriate backdrops.
We’re already seeing enough progress in object detection in videos to apply these techniques, post-production, to any video. Let the algorithm decide who’s speaking, and offer the viewer a speech-reading inset of that character’s face saying those lines.
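Once a detector has produced a face bounding box for the active speaker, the inset itself is simple geometry. The detector is out of scope here; this sketch (with assumed scale and margin values) only computes where an enlarged closeup would sit, pinned to the bottom-right corner of the frame:

```python
def inset_rect(face, frame_w, frame_h, scale=3.0, margin=16):
    """Given a detected face box (x, y, w, h), return the (x, y, w, h)
    rectangle for an enlarged speech-reading inset in the bottom-right
    corner of the frame."""
    x, y, w, h = face
    iw, ih = int(w * scale), int(h * scale)
    # Cap the inset at half the frame in each dimension so it never
    # dominates the picture.
    iw = min(iw, frame_w // 2)
    ih = min(ih, frame_h // 2)
    return (frame_w - iw - margin, frame_h - ih - margin, iw, ih)

# A 100x120 face in a 1920x1080 frame becomes a 300x360 inset:
assert inset_rect((500, 200, 100, 120), 1920, 1080) == (1604, 704, 300, 360)
```

The harder problems – deciding who is speaking, and cutting the inset on dialogue boundaries – are exactly where the object-detection progress mentioned above comes in.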
We are seeking support for these ideas, both from stakeholders (is this really useful to you?) and funders, commercial and otherwise.