SPEECH RECOGNITION: PRINCIPLES AND APPLICATIONS

Table of contents
Abstract
Overview of the Characteristics of Automatic Speech Recognition Systems
Number of Words
Use of Grammar
Continuous vs. Discrete Speech
Speaker Dependency
Early Approaches to Automatic Speech Recognition
Acoustic-Phonetic Approach
Statistical Pattern Recognition Approach
Modern Approach to Automatic Speech Recognition
Hidden Markov Models
Training of an Automatic Speech Recognition System Based on HMMs
Sub-Word Units
Applications of Automatic Speech Recognition Systems
Automated Call-Type Recognition
Data Entry
Future Applications Using Automatic Speech Recognition Systems
Conclusion
Bibliography
Abstract
With the advances in technology, many people may think that giving a computer system the
ability to understand human speech is easy. However, scientists disagree. Since the early
1950s, they have tried to build the perfect automatic speech recognition system, without
success. They have succeeded in making computers recognise a large number of words, but a
computer that understands everything without imposing any conditions still does not exist.
Because the potential applications are so numerous, a great deal of money and time is
spent on improving speech recognition systems.
SPEECH RECOGNITION: PRINCIPLES AND APPLICATIONS
Nowadays, computer systems play a major role in our lives. They are used everywhere: in
homes, offices, restaurants, gas stations, and so on. Nonetheless, for some people,
computers remain a machine they will never know how to use. Communicating with a computer
is done using a keyboard or a mouse, devices many people are not comfortable using. Speech
recognition addresses this problem and removes the barrier between humans and computers:
using a computer becomes as easy as talking with a friend.
Unfortunately, scientists have discovered that implementing a perfect speech recognition
system is no easy task. This report will present the principles and the major approaches
to speech recognition systems along with some of their applications. 
Overview of the Characteristics of Automatic Speech Recognition Systems
How can we evaluate a speech recognition system? Describing it simply as good or bad
isn't enough, since the performance of such a system may be outstanding in one application
and poor in another. In fact, speech recognition systems are designed according to the
application, so their characteristics vary. Some of these characteristics are presented
below.
Number of Words
The major characteristic of a speech recognition system is the number of words it can
recognise. The question that comes to mind is how many words are enough for the
performance of a speech recognition system to be acceptable. The answer depends on the
application (6, p.98). Some applications, like automated call-type recognition, may require
only a few words; others, like data entry, may require thousands. However, increasing the
number of words, or the vocabulary, of a speech recognition system increases its complexity
and decreases its performance, since the probability of error is higher (6, p.98). Systems
with large vocabularies are also slower, since more time is needed to search for a word in
a large vocabulary. Moreover, increasing the number of words isn't enough on its own,
because the system still cannot differentiate words that sound alike, such as 'to' and
'two' or 'right' and 'write' (6, p.98).
Use of Grammar 
Using grammar, differentiating words like 'to' and 'two' or 'right' and 'write' becomes
possible. Grammar is also used to speed up a speech recognition system by narrowing the
range of the search (6, p.98), and it improves performance by eliminating inappropriate
word sequences. However, grammar doesn't allow random dictation, which is a problem for
some applications (6, p.98).
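The role of grammar can be illustrated with a minimal sketch, assuming a toy grammar
expressed as a set of permitted word pairs. The word pairs, the homophone table, and the
function name are hypothetical and are not taken from the cited sources; a real system
would use a far richer grammar.

```python
# Hypothetical toy grammar: the set of word pairs allowed to follow each other.
ALLOWED_PAIRS = {
    ("turn", "right"), ("turn", "left"),
    ("write", "two"), ("write", "to"),
    ("two", "letters"), ("to", "john"),
    ("right", "now"),
}

# Words assumed to be indistinguishable by sound alone (illustrative table).
HOMOPHONES = {
    "to": ["to", "two"], "two": ["to", "two"],
    "right": ["right", "write"], "write": ["right", "write"],
}


def resolve(heard):
    """Return every spelling of the heard word sequence that the grammar permits."""
    results = [[]]
    for word in heard:
        candidates = HOMOPHONES.get(word, [word])
        extended = []
        for prefix in results:
            for cand in candidates:
                # Keep a candidate only if the grammar allows it after the previous word.
                if not prefix or (prefix[-1], cand) in ALLOWED_PAIRS:
                    extended.append(prefix + [cand])
        results = extended
    return results


print(resolve(["right", "to", "letters"]))  # only 'write two letters' survives
print(resolve(["letters", "turn"]))         # empty: not a permitted sequence
```

The second call shows the drawback mentioned above: a word sequence that the grammar does
not foresee is rejected outright, so free dictation is not possible.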
Continuous vs. Discrete Speech 
When speaking to each other, we don't pause between words. In other words, we use
continuous speech. Speech recognition systems, however, have difficulty dealing with
continuous speech (6, p.98). The easy way out is to use discrete speech, in which the
speaker pauses between words (6, p.100). With discrete speech input, the silent gap
between words is used to determine the boundaries of each word, whereas in continuous
speech, the speech recognition system must separate words using an algorithm that is not
a hundred per cent accurate. Still, continuous speech recognition systems that use a small
vocabulary and a grammar are available. They are reliable and do not require great
computational power (6, p.100). With a large vocabulary, however, continuous speech
recognition systems are very difficult to achieve, require huge computational power, and
are slow. In fact, processing a speech sample can take three to ten times as long as a
person takes to say it (6, p.100).
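To make the idea of silent gaps concrete, here is a minimal sketch of word boundary
detection for discrete speech. It assumes the signal has already been reduced to one
energy value per short frame; the threshold, the minimum gap length, and the function
name are illustrative assumptions rather than values from the cited sources.

```python
def split_on_silence(energies, threshold=0.1, min_gap=3):
    """Return (start, end) frame indices of words; end is exclusive.

    A word ends once at least `min_gap` consecutive frames fall below `threshold`.
    """
    words = []
    start = None  # first frame of the current word, or None while in silence
    quiet = 0     # consecutive quiet frames seen inside the current word
    for i, energy in enumerate(energies):
        if energy >= threshold:
            if start is None:
                start = i
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet >= min_gap:
                words.append((start, i - quiet + 1))
                start = None
                quiet = 0
    if start is not None:
        # The signal ended in the middle of a word (or in a short pause after it).
        words.append((start, len(energies) - quiet))
    return words


# Two words separated by a four-frame pause; the dip at frame 11 is too short to count.
frames = [0, 0, 0.5, 0.6, 0.4, 0, 0, 0, 0, 0.7, 0.8, 0.05, 0.9, 0, 0]
print(split_on_silence(frames))
```

With continuous speech there are no such gaps, so the same function returns a single
segment covering the whole utterance, which is why a different and less reliable method
is needed there.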
Speaker Dependency
Speech recognition system designers must consider another important issue: whether their
systems are speaker-dependent or speaker-independent. Each person pronounces a word
differently. Although it is easy for humans to recognise the word 'car' whether an
American or an Englishman says it, this is not the case for speech recognition systems.
Speaker dependency is determined by the application: some applications may require
speaker-dependent systems (as in data entry), while others may require speaker-independent
systems (as in automated call-type recognition) (6, p.100). Speaker dependency greatly
affects the training of an automatic speech recognition system (4, p.42).
Early Approaches to Automatic Speech Recognition
When scientists first dreamed of a machine capable of understanding spoken language,
computers and fast integrated circuits were not available. Nevertheless, they managed to
establish the fundamental principles of speech recognition systems. Several approaches
were used, each with advantages and disadvantages. Two of these approaches are discussed
below.
Acoustic-Phonetic Approach
The theory behind the acoustic-phonetic approach is acoustic phonetics. This theory
assumes that spoken language is divided into phonetic units that are finite in number and
distinctive. These phonetic units are distinguished by properties that are apparent in the
speech signal (7, pp.42-43). Speech is recognised as follows: first, the speech is divided
into segments. Then, according to the acoustic properties of each segment, an appropriate
phonetic unit is attached to it. Finally, the resulting sequence of units is used to form
a valid word (7, p.43).
Figure 1: Phonetic sequence for a speech sample (7, p.43).
As an example, consider the sequence of phonetic units matched with a sample of speech
illustrated in figure 1. The symbol 'SIL' indicates a silence, whereas the vertical
position of a phonetic unit indicates how well it matches the corresponding segment of
speech (the higher, the better the match). After searching, we can match the phonetic
sequence SIL-AO-L-AX-B-AW-T with the expression 'all about'. Note that the chosen phonemes
are not only first choices in the phonetic sequence, but also second choices (B and AX)
and a third choice (L). Matching a phonetic sequence with a word or a group of words is
therefore not straightforward (7, p.43). In fact, this is the main disadvantage of this
approach.
Statistical Pattern Recognition Approach
In statistical pattern recognition, speech patterns are fed directly into the system and
compared with the patterns supplied to the system during training (7, p.43). Unlike in the
acoustic-phonetic approach, the speech is neither segmented nor checked for its
properties. If enough patterns are supplied to the speech recognition system during
training, it will perform better than the acoustic-phonetic approach. In general, the
statistical pattern recognition approach is used more than the acoustic-phonetic approach
because it is simpler to use, invariant to different speech vocabularies, and more
accurate (7, p.44).
Modern Approach to Automatic Speech Recognition
With the availability of computers and high-speed microprocessors, more research was done
using the huge computational power available to solve the speech recognition problem.
A complete solution has still not been found. Nevertheless, scientists have been able to
implement new approaches that have proved much more efficient than earlier methods:
speech recognition systems are now able to recognise more words with greater accuracy (3,
p.115). One of these approaches, based on hidden Markov models, is presented below.
Hidden Markov Models (HMMs)
Speech is divided into phonemes. Unfortunately, these phonemes do not remain the same:
they change according to the surrounding phonemes (4, p.44). HMMs are a tool for
representing these changes mathematically.
A Markov model consists of a number of states linked together, with each state
corresponding to a unique output. Each link between two states is characterised by a
probability called the transitional probability (4, p.44). Whether the model moves from
one state to another or remains in the same state is a function of the corresponding
transitional probability (2, p.50). A classical example illustrating Markov models is the
following: consider a three-state weather system with state one being rainy, state two
cloudy, and state three sunny. Such a system is shown in figure 2 (transitional
probabilities are added for the explanation below). From the diagram, it is clear that if
the current day is sunny, the probability of tomorrow being cloudy is 0.1, of tomorrow
being rainy is 0.1, and of tomorrow being sunny is 0.8 (2, p.50).
Figure 2: Three-state Markov model of the weather (2, p.51).
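The weather model can be written down as a small table of transitional probabilities. In
the sketch below, only the 'sunny' row comes from the text; the 'rainy' and 'cloudy' rows
are assumed values, chosen so that each row sums to one, because the figure itself is not
reproduced here. The function name is likewise hypothetical.

```python
# TRANSITIONS[today][tomorrow]. Only the "sunny" row is taken from the text;
# the "rainy" and "cloudy" rows are assumed values for illustration.
TRANSITIONS = {
    "rainy":  {"rainy": 0.4, "cloudy": 0.3, "sunny": 0.3},
    "cloudy": {"rainy": 0.2, "cloudy": 0.6, "sunny": 0.2},
    "sunny":  {"rainy": 0.1, "cloudy": 0.1, "sunny": 0.8},
}


def sequence_probability(days):
    """Probability of the weather on days[1:], given the weather on days[0]."""
    prob = 1.0
    for today, tomorrow in zip(days, days[1:]):
        prob *= TRANSITIONS[today][tomorrow]
    return prob


# Sunny today, sunny tomorrow, cloudy the day after: 0.8 * 0.1
print(sequence_probability(["sunny", "sunny", "cloudy"]))
```

Because each day depends only on the day before, the probability of a whole sequence is
simply the product of the transitional probabilities along it.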
The weather example is an observable Markov model, since we can check the state we are
currently in (2, p.50). Speech recognition systems, however, use hidden Markov models,
since the state that produced a given speech fragment cannot be observed directly by the
system (2, p.50). In hidden Markov models, a state can produce many outputs; therefore, a
probability distribution over all possible outputs is associated with each state. A
diagram of a three-state HMM is shown in figure 3 (4, p.44). This figure shows that each
state has five possible outputs (A, B, C, D, and E), occurring with probabilities given by
b1(s), b2(s), or b3(s). HMMs are doubly probabilistic, since both the transition from one
state to another and the output generated at a state are probabilistic (4, p.44).
Consequently, if we receive a sequence of outputs from an HMM, we cannot retrace with
certainty the sequence of states that the HMM passed through to produce it (4, p.44).
Looking at figure 3, it is evident that an output sequence of A-B-C, for example, can be
produced by any sequence of three states; however, each sequence of states has its own
probability of occurrence. In speech recognition, each word is represented by a sequence
of states (1, p.53); therefore, it is essential to find the most probable state sequence
for any sequence of outputs. In fact, finding this sequence is equivalent to solving the
speech recognition problem.
Figure 3: Three-state hidden Markov model (4, p.44).
The sequence of states is chosen according to its probability. However, checking the
probabilities of all possible sequences can be very time-consuming, especially in speech
recognition HMMs, which are much more complicated than our three-state example in figure
3. This problem was solved using an algorithm that exploits the fact that the probability
of being in a certain state depends only on the previous state (4, p.44).
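The text does not name this algorithm. A widely used algorithm with exactly this property
is the Viterbi algorithm, and the sketch below assumes that this is the kind of method
meant. All probabilities are hypothetical, since the values in figure 3 are not
reproduced here; the dictionary and function names are also assumptions.

```python
STATES = [1, 2, 3]

# All numbers below are hypothetical; they are not the values of figure 3.
START_P = {1: 0.6, 2: 0.3, 3: 0.1}
TRANS_P = {
    1: {1: 0.5, 2: 0.4, 3: 0.1},
    2: {1: 0.1, 2: 0.5, 3: 0.4},
    3: {1: 0.1, 2: 0.1, 3: 0.8},
}
# Output distribution of each state over the five outputs A to E.
EMIT_P = {
    1: {"A": 0.6, "B": 0.2, "C": 0.1, "D": 0.05, "E": 0.05},
    2: {"A": 0.1, "B": 0.6, "C": 0.2, "D": 0.05, "E": 0.05},
    3: {"A": 0.05, "B": 0.1, "C": 0.6, "D": 0.15, "E": 0.1},
}


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (probability, path) of the most probable state sequence."""
    # best[s] holds the probability and path of the best sequence ending in state s.
    best = {s: (start_p[s] * emit_p[s][observations[0]], [s]) for s in states}
    for obs in observations[1:]:
        new_best = {}
        for s in states:
            # Only the best sequence into each previous state has to be extended,
            # because the next state depends on the previous state alone.
            prob, path = max(
                (best[prev][0] * trans_p[prev][s], best[prev][1]) for prev in states
            )
            new_best[s] = (prob * emit_p[s][obs], path + [s])
        best = new_best
    return max(best.values())


prob, path = viterbi(["A", "B", "C"], STATES, START_P, TRANS_P, EMIT_P)
print(path, prob)
```

For a sequence of T outputs and N states, this examines on the order of T x N x N
combinations instead of the N to the power T sequences that an exhaustive check would
require.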
Training of an Automatic Speech Recognition System Based on HMMs
As mentioned earlier, the major components of an HMM system are the transitional
probabilities between states and the output probability distribution of each state. To
have a good speech recognition system, these probabilities must be adapted to factors such
as the language, the number of possible speakers, and so on (3, p.115). Determining these
probabilities is part of what is known as training the speech recognition system.
This training process depends on whether we are dealing with a speaker-dependent or a
speaker-independent speech recognition system. In the first case, speech samples are
taken from the user and the probabilities are determined accordingly. In the second case,
speech samples are collected from many speakers, together with the text of what was said.
Here, the training process is much more complicated, since the spectrogram (a
representation of frequency content over time) of the same word depends on the speaker.
Training also involves building a dictionary holding the vocabulary, along with a grammar
of permitted word sequences (4, p.42).
Sub-Word Units
In HMMs, each word is represented by a sequence of states (1, p.53). A word is recognised
from the sequence of states that is most probably associated with a sequence of outputs.
Therefore, the unit for such HMMs is the word. Many scientists believe that using
sub-words instead of words may improve the quality of speech recognition (1, p.50).
To implement sub-word HMMs, a system of sub-word units must be selected. The simplest
sub-word units are phones. Using phones as units for an HMM seems to be the right choice,
since phones are few in number and easy to train, but the performance of such an HMM is
poor because a phone is affected by the surrounding phones (1, p.53). Another choice of
sub-word unit is the syllable. Like phones, syllables are affected by their neighbours,
but they are far more numerous than phones (around 20 000 in English), which makes them
hard to train (1, p.53). A newer sub-word unit, known as the triphone, seems to be the
most successful. Triphones solve the problem of influence between sub-word units and their
surroundings by modelling each phone according to its left and right neighbours (1,
p.53). As an example, the 't' in 'still' is modelled by the s-t-i triphone (1, p.53). The
immediate problem one can think of is the large number of triphones, since each phone is
combined with all possible left and right neighbours. This problem can be resolved by
using the fact that some triphones are very similar, since many neighbouring phones affect
a phone in the same way (1, pp.53-54). For example, the effect on the 't' in 'still' is
similar to the one in 'steal' (1, pp.53-54). Even though the performance of the
recognition system is affected by such approximations, it remains within acceptable
standards (1, p.54).
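The construction of triphones, and the merging of similar ones, can be sketched as
follows. The informal phone symbols, the 'sil' boundary marker, the context class table,
and the function names are all hypothetical; they only illustrate the idea and are not the
unit inventory used in the cited work.

```python
def to_triphones(phones, boundary="sil"):
    """Turn a phone sequence into left-centre-right triphone labels."""
    padded = [boundary] + list(phones) + [boundary]
    return ["-".join(padded[i - 1:i + 2]) for i in range(1, len(padded) - 1)]


# Hypothetical grouping of phones assumed to affect a neighbouring phone similarly.
CONTEXT_CLASS = {"i": "front_vowel", "ee": "front_vowel"}


def cluster(triphone):
    """Replace the left and right context phones by their class, if they have one."""
    left, centre, right = triphone.split("-")
    return "-".join(
        [CONTEXT_CLASS.get(left, left), centre, CONTEXT_CLASS.get(right, right)]
    )


print(to_triphones(["s", "t", "i", "l"]))  # 'still': the 't' becomes s-t-i
print(cluster("s-t-i") == cluster("s-t-ee"))  # 'still' and 'steal' share one model
```

If every phone could be combined with every left and right neighbour, an inventory of n
phones would give up to n x n x n triphones, which is why merging similar contexts
matters.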
Applications of Automatic Speech Recognition Systems
With all the time and money spent on research into speech recognition systems, one may
wonder about the applications of speech recognition. This part presents some of the
currently available applications, along with some future applications, of automatic
speech recognition systems.
Automated Call-Type Recognition
An interesting and relatively simple application of speech recognition systems is
automated call-type recognition. For pay phones, operators are needed to determine the
type of call the caller wants to make (7, p.490). Speech recognition may be used instead
of operators. Five types of calls are available: 'collect', 'calling card', 'operator' for
operator-assisted calls, 'third number' for third-party billing calls, and 'person' for
person-to-person calls (7, p.490). For this application, the speech recognition system
must be speaker-independent and capable of recognising and spotting the five key words
mentioned above in a speech sample (2, p.52). The problem in this application is the high
level of background noise, since pay phones are usually located in public places. However,
this problem can be solved using appropriate speech recognition systems (low-level
speakers, etc.) (2, p.52).
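Key word spotting can be illustrated at the level of text. The sketch below assumes that
a recogniser has already turned the caller's speech into a string of words, which is the
hard part and is not shown; the function name and the sample sentences are hypothetical.

```python
CALL_TYPES = ["collect", "calling card", "operator", "third number", "person"]


def spot_call_type(utterance):
    """Return the first call-type key word found in the utterance, or None."""
    words = utterance.lower().split()
    for i in range(len(words)):
        for keyword in CALL_TYPES:
            parts = keyword.split()
            # Compare the next one or two words against the key word.
            if words[i:i + len(parts)] == parts:
                return keyword
    return None


print(spot_call_type("um I would like to make a collect call please"))
print(spot_call_type("charge it to my calling card"))
print(spot_call_type("hello there"))
```

The caller is free to surround the key word with other words, which is what distinguishes
spotting from recognising an isolated command. Punctuation and hyphenated forms are not
handled in this sketch.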
Data Entry
Entering data using speech recognition is very practical when performing a manual task
(6, p.102). A speech recognition system for this application is highly complex and
structured since it should contain a large vocabulary. For data entry, speaker-dependent
or speaker-independent speech recognition systems are available even though
speaker-independent systems perform better than speaker-dependent systems. They are also
available for discrete or continuous speech (6, p.102). Data entry applications remain
restricted, since the performance of speech recognition systems in this field is still
limited.
Future Applications Using Automatic Speech Recognition Systems
With the increasing performance of automatic speech recognition systems, companies are
more interested in integrating speech recognition systems into their products. Car
manufacturers are interested in replacing all the levers, knobs, and buttons with a speech
recognition system capable of doing everything from raising the temperature to locking
the doors and turning on the radio (5, p.49). In this way, the electronic content of the
car is increased whereas the mechanical content is reduced. This makes the car easier to
design and build, and therefore less costly (5, p.49). Others are thinking of applying
speech recognition systems to kitchen appliances such as dishwashers, ovens, and
refrigerators. Air-conditioners might also be voice-controlled some day (5, p.49).
Conclusion
The gradual but inevitable development of speech recognition systems will surely lead to
a system that will one day compare to the perfect speech recognition device, the human
being. New methods and algorithms are researched every day to improve the performance of
speech recognition systems. Will we reach a stage where keyboards, buttons, and all input
devices become obsolete? Time will tell. 
Bibliography
1. Holmes, W.J., & Pearce, D.J.B. (1993). Sub-word units for automatic speech recognition
of any vocabulary. GEC Journal of Research, 11(1), 49-58.
2. Juang, B.H., Perdue, R.J., Jr., & Thomson, D.L. (1995, March/April). Deployable
automatic speech recognition systems: Advances and challenges. AT&T Technical Journal,
45-54.
3. Kay, R. (1998, January). Do you hear what I say? Byte, 115-116.
4. Makhoul, J.F., & Schwartz, R. (1997, December). The voice of the computer is heard in
the land (and it listens too!). Spectrum, 39-47.
5. Mannes, G. (1995, July). Machines that listen. Popular Mechanics, 47-49.
6. Markowitz, J. (1995, December). Talking to machines. Byte, 97-104.
7. Rabiner, L., & Juang, B.H. (1993). Fundamentals of speech recognition. New Jersey:
Prentice-Hall.
