Accessible Videos

Automation and Video Captioning

Automatic Speech Recognition (ASR) has made great strides in the last 15 years. Under the right conditions, people using ASR for dictation can achieve very high levels of accuracy. But even if ASR were 100% accurate (and it is far from that!), it would not provide equal access to videos. A word-for-word transcription of speech into text is not the same as captioning. It leaves out non-speech sounds such as laughter or a knock at the door. It does not include any punctuation. It does not tell the reader who is speaking when there is more than one speaker. In practice, accuracy rates vary widely, but the average is about 65 to 75%. Eve Hill, a disability rights attorney, when asked if 65% is compliant, put it this way:

[Screen reader users: The following text has X’s scattered throughout to visually represent the missing 35% of the text.]

“I XonXt kXow. IX onXy 7 oXt oX eveXy 10 XetXers Xere XorrXct Xn a bXok, wXuld Xou bXy XhXt bXok? XoulX thXt Xe XffeXtivX XommXuniXatiXn?”

It is really a disservice to call anything created with automation “captions.” Some people would even say that the automatic transcripts provided by some tech companies are a disservice to Deaf and hard of hearing people, because many people mistake them for captions and call the job “done” without creating true captions. To create a truly accessible video, you can start with the automatically generated transcript, but you must then download it (or use online editing if provided), make corrections, add punctuation and other missing information, and upload the corrected caption file.
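
As a rough illustration of what the corrected file might contain, the sketch below builds a small caption file in WebVTT, one common caption format. The choice of WebVTT is an assumption, since your video platform may expect a different format such as SRT. The `Cue` class, the helper functions, the speaker name, and the sample cues are all hypothetical. The sketch shows the three things a raw automatic transcript lacks: punctuation, non-speech sounds in brackets, and speaker identification.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cue:
    """One caption cue (hypothetical structure for this sketch)."""
    start: float                    # seconds from the start of the video
    end: float                      # seconds from the start of the video
    text: str                       # corrected, punctuated caption text
    speaker: Optional[str] = None   # who is speaking, if known


def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def to_webvtt(cues: List[Cue]) -> str:
    """Build the text of a WebVTT caption file from corrected cues."""
    lines = ["WEBVTT", ""]  # required header, then a blank line
    for cue in cues:
        lines.append(f"{format_timestamp(cue.start)} --> {format_timestamp(cue.end)}")
        if cue.speaker:
            # A WebVTT voice tag identifies the speaker.
            lines.append(f"<v {cue.speaker}>{cue.text}")
        else:
            lines.append(cue.text)
        lines.append("")  # a blank line separates cues
    return "\n".join(lines)


# Hypothetical cues: non-speech sounds go in brackets, speech is punctuated.
cues = [
    Cue(0.0, 2.5, "[knock at the door]"),
    Cue(2.5, 5.0, "Come in, it's open!", speaker="Maria"),
    Cue(5.0, 7.25, "[laughter]"),
]
vtt_text = to_webvtt(cues)
print(vtt_text)
```

The resulting text can be saved with a `.vtt` extension and uploaded wherever the platform accepts caption files. The timings still need to be checked against the video by a person.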