


Enhancing Accessibility Through Automatic Speech Recognition

Mike Wald

Learning Technologies Group

School of Electronics and Computer Science

University of Southampton

Southampton SO171BJ

United Kingdom

M.Wald@soton.ac.uk

Automatic speech recognition can enhance accessibility through the cost-effective production of text synchronised with speech. This can assist those who require captioning or find notetaking difficult, help manage and search online digital multimedia resources and assist blind, visually impaired or dyslexic people by augmenting synthetic speech with natural recorded real speech.

Keywords: automatic speech recognition, accessibility, synchronised text and speech

1. Introduction

Since 1999 the author has worked with Liberated Learning (LL) [1] to research how automatic speech recognition (ASR) can make speech accessible [2] [3] [4]. Standard automatic speech recognition software produced a continuous stream of unpunctuated text from lecturers' natural spontaneous speech which was very difficult to read or understand. An application (ViaScribe) has therefore been developed in collaboration with IBM [5] to automatically format the ASR transcription based on pauses/silences, providing both a readable real-time display and an archived verbatim transcription synchronised with a recording of the speech. This ASR application continues to be improved; it can work with live or recorded speech and can also synchronise and replay PowerPoint files.
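The pause-based formatting described above can be illustrated with a minimal sketch. The word timings and the 0.6-second threshold below are hypothetical illustrations, not details of the IBM implementation; real ASR engines typically supply per-word start and end times.

```python
PAUSE_BREAK = 0.6  # seconds of silence treated as a phrase boundary (assumed value)

def format_by_pauses(words, pause_break=PAUSE_BREAK):
    """Group (word, start, end) tuples into readable segments,
    starting a new segment whenever the silence between two
    consecutive words exceeds pause_break seconds."""
    segments, current = [], []
    prev_end = None
    for word, start, end in words:
        if prev_end is not None and start - prev_end > pause_break:
            segments.append(" ".join(current))
            current = []
        current.append(word)
        prev_end = end
    if current:
        segments.append(" ".join(current))
    return segments

# Hypothetical word timings: a 0.9 s pause follows "begins".
words = [("the", 0.0, 0.2), ("lecture", 0.25, 0.7), ("begins", 0.75, 1.1),
         ("today", 2.0, 2.4), ("we", 2.45, 2.6), ("discuss", 2.65, 3.1)]
print(format_by_pauses(words))
# → ['the lecture begins', 'today we discuss']
```

Because each segment retains its word timings, the same grouping can drive both the real-time display and the archived transcription synchronised with the recording.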

Tools that synchronise pre-prepared text and corresponding audio files, either for the production of electronic books [6] based on the DAISY specifications [7] or for the captioning of multimedia [8] using for example the Synchronized Multimedia Integration Language [9] are not normally suitable or cost effective for use by teachers for the ‘everyday’ production of learning materials. This is because they depend on either a teacher reading a prepared script aloud, which can make a presentation less natural sounding and therefore less effective, or on obtaining a written transcript of the lecture, which is expensive and time consuming to produce.

UK legislation requires speech materials to be accessible to disabled learners, so as speech becomes a more common component of online learning materials, the need for synchronised ASR is likely to increase.

2. Potential of ASR to Enhance Accessibility

ASR has the potential to:

• provide synchronised text online or in classrooms for captioning for deaf or hard of hearing people;

• provide automatic online lecture notes synchronised with speech and slides, as deaf and hard of hearing people and many other learners find it difficult or impossible to take notes at the same time as listening, watching and thinking, while others are unable to attend the lecture for mental or physical health reasons;

• assist blind, visually impaired or dyslexic people to read and search material more readily by augmenting unnatural synthetic speech with natural recorded real speech;

• assist users to manage and search for online digital multimedia resources that include speech by automatically synchronising the speech with text to facilitate manipulation, annotation, indexing, searching and playback of the multimedia using the synchronised text.
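The last of these points, searching multimedia through its synchronised text, can be sketched as follows. The segments and query below are illustrative; the idea is simply that each transcript segment carries the timestamp at which that speech occurs, so a text match can seek the player directly to the matching audio.

```python
def search_transcript(segments, query):
    """Return the start times (in seconds) of transcript segments
    containing the query, so a media player can seek straight to
    the matching speech."""
    q = query.lower()
    return [start for start, text in segments if q in text.lower()]

# Hypothetical synchronised transcript: (start time, segment text).
segments = [(0.0, "Welcome to the lecture"),
            (12.5, "Today we cover speech recognition"),
            (47.2, "Speech recognition accuracy depends on training")]
print(search_transcript(segments, "speech recognition"))
# → [12.5, 47.2]
```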

3. Accuracy & Editing

LL research [10] found that an 'acceptable' ASR accuracy was 85% or above, and this was achieved by 40% of lecturers who used ASR in classes. Lecturers varied in their experience, abilities, familiarity with the lecture material and the amount of time they could spend on improving the voice and language models. The transcripts were corrected for errors before becoming available to students on the Internet, and in spite of any problems, students and teachers generally liked the LL concept and felt it improved teaching. Since recognition errors do occur, one or more editors correcting errors in real time is one way of improving the accuracy of the real-time display of ASR.
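Accuracy figures such as the 85% threshold above are conventionally reported as word accuracy, i.e. one minus the word error rate, where the error rate is the minimum number of word substitutions, insertions and deletions needed to turn the recognised text into the reference, divided by the reference length. A minimal sketch of that computation (the example sentences are illustrative, not LL data):

```python
def word_accuracy(reference, hypothesis):
    """Word accuracy = 1 - WER, computed via the Levenshtein
    (edit) distance between the two word sequences."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution
    return 1 - dp[len(ref)][len(hyp)] / len(ref)

ref = "speech recognition can enhance accessibility for learners"
hyp = "speech recognition can enhance accessibility for teachers"
print(round(word_accuracy(ref, hyp), 2))
# → 0.86 (one substitution in a seven-word reference)
```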

4. Personalised Displays

LL research has shown that while projecting the text onto a large screen in the classroom has been used successfully, in many situations an individual, personalised and customisable display on a student's own wireless computer would be preferable or essential to improve readability and usability. Such a display would allow students to choose display preferences (e.g., font, size, colour, text formatting and scrolling), mark sections, correct errors and add their own notes in real time, synchronised with the transcribed speech.

5. Use of ASR in UK Higher Education

Deaf and hard of hearing people can find it difficult to follow speech through hearing alone or to take notes while they are lip-reading or watching a sign-language interpreter. Although summarised notetaking and sign language interpreting are currently available, notetakers can only record a small fraction of what is being said, while qualified sign language interpreters with a good understanding of the relevant higher education subject content are in very scarce supply. Although UK Government funding is available to deaf and hard of hearing students in higher education for interpreting or notetaking services, real-time captioning has not been used because of the shortage of trained stenographers wishing to work in universities. Since universities in the UK do not have responsibility for providing interpreting or notetaking services, there would appear to be less incentive for them to develop the use of ASR than for universities in the US, Canada and Australia.

6. Conclusion

Legislation requires spoken material to be accessible through captioning. Synchronised ASR can also make some text and visual materials more accessible and usable, and provides a practical, economic method to create synchronised text and assist learners to manage and search online digital multimedia resources. The only ASR tool that can provide an automatically formatted and synchronised transcription would appear to be IBM ViaScribe, which is being developed in collaboration with the Liberated Learning Consortium.

References

[1] Last accessed 2005-05-19.

[2] Wald, M. “Developments in technology to increase access to education for deaf and hard of hearing students”. In: Proceedings of CSUN Conference Technology and Persons with Disabilities. California State University Northridge, 1999.

[3] Wald, M. “Hearing disability and technology”, Access All Areas: disability, technology and learning, JISC TechDis and ALT, 2002, pp. 19-23.

[4] Bain, K., Basson, S. and Wald, M. “Speech recognition in university classrooms”. In: Proceedings of the Fifth International ACM SIGCAPH Conference on Assistive Technologies. ACM Press, 2002, pp. 192-196.

[5] Last accessed 2005-05-19.

[6] Last accessed 2005-05-19.

[7] Last accessed 2005-05-19.

[8] Last accessed 2005-05-19.

[9] Last accessed 2005-05-19.

[10] Leitch, D. and MacMillan, T. “Liberated Learning Initiative Innovative Technology and Inclusion: Current Issues and Future Directions for Liberated Learning Research”. Year III Report, Saint Mary's University, Nova Scotia, 2003.
