DESIGN AND IMPLEMENTATION OF A TEXT-TO-SPEECH APPLICATION FOR VISUALLY IMPAIRED STUDENTS

Diploma

ABSTRACT

A text-to-speech synthesizer is an application that converts text into spoken words. It analyzes and processes the text using Natural Language Processing (NLP) and then uses Digital Signal Processing (DSP) technology to convert the processed text into a synthesized speech representation of that text. Here, we developed a text-to-speech synthesizer in the form of a simple application that converts input text into synthesized speech, reads it out to the user, and allows the result to be saved as an MP3 file. Such a synthesizer will be of great help to people with visual impairment and will make working through large volumes of text easier.


INTRODUCTION

Speech is the primary means of communication between people. Speech synthesis, the automatic generation of speech waveforms, has been under development for several decades (Santen et al. 1997, Kleijn et al. 1998). Recent progress in speech synthesis has produced synthesizers with very high intelligibility, but sound quality and naturalness remain a major problem. Nevertheless, the quality of present products has reached an adequate level for several applications, such as multimedia and telecommunications. Adding audiovisual information or facial animation (a talking head) can increase speech intelligibility considerably (Beskow et al. 1997). Methods for audiovisual speech have recently been introduced by, for example, Santen et al. (1997), Breen et al. (1996), Beskow (1996), and Le Goff et al. (1996).

The text-to-speech (TTS) synthesis procedure consists of two main phases. The first is text analysis, where the input text is transcribed into a phonetic or some other linguistic representation. The second is the generation of speech waveforms, where the acoustic output is produced from this phonetic and prosodic information. These two phases are usually called high-level and low-level synthesis. A simplified version of the procedure is presented in Figure 1.1. The input text might be, for example, data from a word processor, standard ASCII from e-mail, a mobile text message, or scanned text from a newspaper. The character string is preprocessed and analyzed into a phonetic representation, which is usually a string of phonemes with some additional information for correct intonation, duration, and stress. Finally, the low-level synthesizer generates the speech sound using the information from the high-level one.
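
The high-level (text analysis) phase can be illustrated with a minimal Python sketch. It is not the application's actual implementation: the tiny lexicon, the phoneme symbols, and the function names are all hypothetical, chosen only to show how a character string becomes a sequence of phonemes with stress marks.

```python
import re

# Hypothetical toy lexicon (an assumption for illustration): word -> phonemes.
# A digit on a vowel marks stress (1 = stressed, 0 = unstressed).
TOY_LEXICON = {
    "read": ["R", "IY1", "D"],
    "the": ["DH", "AH0"],
    "text": ["T", "EH1", "K", "S", "T"],
    "two": ["T", "UW1"],
}

DIGIT_WORDS = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
}


def normalize(text):
    """Preprocess raw text: lowercase, spell out digits, drop punctuation."""
    text = text.lower()
    # Replace each ASCII digit with its spoken form, padded by spaces.
    text = re.sub(r"[0-9]", lambda m: " " + DIGIT_WORDS[m.group()] + " ", text)
    # Keep only alphabetic words (apostrophes allowed inside words).
    return re.findall(r"[a-z']+", text)


def to_phonemes(words):
    """Look each word up in the lexicon; mark unknown words explicitly."""
    return [TOY_LEXICON.get(word, ["<unk:%s>" % word]) for word in words]


def high_level_synthesis(text):
    """Text analysis phase: raw text -> one phoneme list per word."""
    return to_phonemes(normalize(text))


if __name__ == "__main__":
    print(high_level_synthesis("Read the text 2!"))
```

A real front end would replace the lexicon lookup with a full pronunciation dictionary or letter-to-sound rules and would also predict intonation and duration; the sketch only marks lexical stress.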


BACKGROUND OF THE STUDY

Speech is probably the most efficient medium for communication between humans. Speech synthesis is the artificial synthesis of human speech (N. Swetha, K. Anuradha, 2013). A text-to-speech (TTS) synthesizer is a computer-based system that should be able to read any text aloud, whether the text is entered directly into the computer by an operator or scanned and submitted to an Optical Character Recognition (OCR) system. A TTS system converts normal language text into speech, whereas other systems render symbolic linguistic representations, such as phonetic transcriptions, into speech.

Text-to-speech (TTS) synthesis is the automatic conversion of a text into speech that resembles, as closely as possible, a native speaker of the language reading that text. It is the technology that lets a computer speak to its user. The TTS system takes text as input; a computer algorithm called the TTS engine then analyses and pre-processes the text and synthesizes the speech using mathematical models. The TTS engine usually generates sound data in an audio format as its output.
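
The final step, producing sound data in an audio format, can be sketched with Python's standard library. This is only an illustration under stated assumptions: the sine tone is a placeholder for the waveform a real synthesis model would generate, the 16 kHz sample rate is an arbitrary choice, and WAV is used instead of MP3 because the standard library has no MP3 encoder.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 16000  # samples per second (an assumed value for this sketch)


def tone(freq_hz, duration_s, amplitude=0.3):
    """Placeholder waveform: a sine tone standing in for synthesized speech."""
    n_samples = int(SAMPLE_RATE * duration_s)
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
        for i in range(n_samples)
    ]


def to_wav_bytes(samples):
    """Pack float samples in [-1, 1] as 16-bit mono PCM inside a WAV container."""
    pcm = b"".join(
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767)) for s in samples
    )
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)           # mono
        wav_file.setsampwidth(2)           # 2 bytes = 16 bits per sample
        wav_file.setframerate(SAMPLE_RATE)
        wav_file.writeframes(pcm)
    return buffer.getvalue()


if __name__ == "__main__":
    audio = to_wav_bytes(tone(220.0, 0.5))
    # "placeholder.wav" is a hypothetical output file name.
    with open("placeholder.wav", "wb") as out_file:
        out_file.write(audio)
```

Building the file in memory first keeps the audio encoding separate from where the result is stored, so the same bytes could be played back immediately or saved for later listening.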

STATEMENT OF PROBLEMS

·        There is a huge communication gap between sighted Nigerians and blind and visually impaired Nigerians.

·        Blind and visually impaired Nigerians have limited access to personal computer facilities.

·        Individuals with visual disabilities (the blind and visually impaired) in Nigeria are largely excluded from modern digital technology.

·        Long-distance communication is difficult, since digital communication devices such as PCs are not designed for blind and visually impaired users.

AIM AND OBJECTIVES

The proposed study aims to create a text-to-speech program and to determine how the proposed system can ease reading. It is intended for use by illiterate, blind, and visually impaired Nigerians, since they cannot see to read.

Specifically, the proposed system has the following objectives:

·        To provide users with a dynamic environment in which articles and stories are read aloud to them.

·        To open a way to reading for blind and visually impaired Nigerians who understand the English language but cannot see to read.

·        To bring convenience to computer users, since the system does the reading in a clear voice of the user's choice.

·        To reduce the stress of reading.