Language and Speech Processing

Hardcover, English, 2009

3 939 kr

Speech processing addresses various scientific and technological areas. It includes speech analysis and variable rate coding, in order to store or transmit speech. It also covers speech synthesis, especially from text, speech recognition, including speaker and language identification, and spoken language understanding. This book covers the following topics: how to realize speech production and perception systems, and how to synthesize and understand speech using state-of-the-art methods in signal processing, pattern recognition, stochastic modelling, computational linguistics and human factor studies.

Product information

Publication date: 2009-01-06
Dimensions: 160 x 240 x 31 mm
Weight: 862 g
Format: Hardcover
Language: English
Number of pages: 416
Publisher: ISTE Ltd and John Wiley & Sons Inc
ISBN: 9781848210318

Belongs to the following categories

Systems Science and AI, in Computing and IT

Joseph Mariani’s research activities relate to language technology, multimodal human-machine communication, speech recognition, spoken language resources and evaluation. He was president of the European Language Resources Association (ELRA), president of the European (now International) Speech Communication Association (ISCA), a member of the board of the European Network on Language and Speech (ELSNET) and the coordinator of the AUF FRANCIL Network. He was the director of the LIMSI CNRS laboratory from 1989 to 2001, and the head of its “Human-Machine Communication” department. Since 2001 he has been director of the “Information and Communication Technologies” department at the French Ministry of Research.

Preface

Chapter 1. Speech Analysis
Christophe D’ALESSANDRO
1.1. Introduction
1.1.1. Source-filter model
1.1.2. Speech sounds
1.1.3. Sources
1.1.4. Vocal tract
1.1.5. Lip-radiation
1.2. Linear prediction
1.2.1. Source-filter model and linear prediction
1.2.2. Autocorrelation method: algorithm
1.2.3. Lattice filter
1.2.4. Models of the excitation
1.3. Short-term Fourier transform
1.3.1. Spectrogram
1.3.2. Interpretation in terms of filter bank
1.3.3. Block-wise interpretation
1.3.4. Modification and reconstruction
1.4. A few other representations
1.4.1. Bilinear time-frequency representations
1.4.2. Wavelets
1.4.3. Cepstrum
1.4.4. Sinusoidal and harmonic representations
1.5. Conclusion
1.6. References

Chapter 2. Principles of Speech Coding
Gang FENG and Laurent GIRIN
2.1. Introduction
2.1.1. Main characteristics of a speech coder
2.1.2. Key components of a speech coder
2.2. Telephone-bandwidth speech coders
2.2.1. From predictive coding to CELP
2.2.2. Improved CELP coders
2.2.3. Other coders for telephone speech
2.3. Wideband speech coding
2.3.1. Transform coding
2.3.2. Predictive transform coding
2.4. Audiovisual speech coding
2.4.1. A transmission channel for audiovisual speech
2.4.2. Joint coding of audio and video parameters
2.4.3. Prospects
2.5. References

Chapter 3. Speech Synthesis
Olivier BOËFFARD and Christophe D’ALESSANDRO
3.1. Introduction
3.2. Key goal: speaking for communicating
3.2.1. What acoustic content?
3.2.2. What melody?
3.2.3. Beyond the strict minimum
3.3. Synoptic presentation of the elementary modules in speech synthesis systems
3.3.1. Linguistic processing
3.3.2. Acoustic processing
3.3.3. Training models automatically
3.3.4. Operational constraints
3.4. Description of linguistic processing
3.4.1. Text pre-processing
3.4.2. Grapheme-to-phoneme conversion
3.4.3. Syntactic-prosodic analysis
3.4.4. Prosodic analysis
3.5. Acoustic processing methodology
3.5.1. Rule-based synthesis
3.5.2. Unit-based concatenative synthesis
3.6. Speech signal modeling
3.6.1. The source-filter assumption
3.6.2. Articulatory model
3.6.3. Formant-based modeling
3.6.4. Auto-regressive modeling
3.6.5. Harmonic plus noise model
3.7. Control of prosodic parameters: the PSOLA technique
3.7.1. Methodology background
3.7.2. The ancestors of the method
3.7.3. Descendants of the method
3.7.4. Evaluation
3.8. Towards variable-size acoustic units
3.8.1. Constitution of the acoustic database
3.8.2. Selection of sequences of units
3.9. Applications and standardization
3.10. Evaluation of speech synthesis
3.10.1. Introduction
3.10.2. Global evaluation
3.10.3. Analytical evaluation
3.10.4. Summary for speech synthesis evaluation
3.11. Conclusions
3.12. References

Chapter 4. Facial Animation for Visual Speech
Thierry GUIARD-MARIGNY
4.1. Introduction
4.2. Applications of facial animation for visual speech
4.2.1. Animation movies
4.2.2. Telecommunications
4.2.3. Human-machine interfaces
4.2.4. A tool for speech research
4.3. Speech as a bimodal process
4.3.1. The intelligibility of visible speech
4.3.2. Visemes for facial animation
4.3.3. Synchronization issues
4.3.4. Source consistency
4.3.5. Key constraints for the synthesis of visual speech
4.4. Synthesis of visual speech
4.4.1. The structure of an artificial talking head
4.4.2. Generating expressions
4.5. Animation
4.5.1. Analysis of the image of a face
4.5.2. The puppeteer
4.5.3. Automatic analysis of the speech signal
4.5.4. From the text to the phonetic string
4.6. Conclusion
4.7. References

Chapter 5. Computational Auditory Scene Analysis
Alain DE CHEVEIGNÉ
5.1. Introduction
5.2. Principles of auditory scene analysis
5.2.1. Fusion versus segregation: choosing a representation
5.2.2. Features for simultaneous fusion
5.2.3. Features for sequential fusion
5.2.4. Schemes
5.2.5. Illusion of continuity, phonemic restoration
5.3. CASA principles
5.3.1. Design of a representation
5.4. Critique of the CASA approach
5.4.1. Limitations of ASA
5.4.2. The conceptual limits of “separable representation”
5.4.3. Neither a model, nor a method?
5.5. Perspectives
5.5.1. Missing feature theory
5.5.2. The cancellation principle
5.5.3. Multimodal integration
5.5.4. Auditory scene synthesis: transparency measure
5.6. References

Chapter 6. Principles of Speech Recognition
Renato DE MORI and Brigitte BIGI
6.1. Problem definition and approaches to the solution
6.2. Hidden Markov models for acoustic modeling
6.2.1. Definition
6.2.2. Observation probability and model parameters
6.2.3. HMM as probabilistic automata
6.2.4. Forward and backward coefficients
6.3. Observation probabilities
6.4. Composition of speech unit models
6.5. The Viterbi algorithm
6.6. Language models
6.6.1. Perplexity as an evaluation measure for language models
6.6.2. Probability estimation in the language model
6.6.3. Maximum likelihood estimation
6.6.4. Bayesian estimation
6.7. Conclusion
6.8. References

Chapter 7. Speech Recognition Systems
Jean-Luc GAUVAIN and Lori LAMEL
7.1. Introduction
7.2. Linguistic model
7.3. Lexical representation
7.4. Acoustic modeling
7.4.1. Feature extraction
7.4.2. Acoustic-phonetic models
7.4.3. Adaptation techniques
7.5. Decoder
7.6. Applicative aspects
7.6.1. Efficiency: speed and memory
7.6.2. Portability: languages and applications
7.6.3. Confidence measures
7.6.4. Beyond words
7.7. Systems
7.7.1. Text dictation
7.7.2. Audio document indexing
7.7.3. Dialog systems
7.8. Perspectives
7.9. References

Chapter 8. Language Identification
Martine ADDA-DECKER
8.1. Introduction
8.2. Language characteristics
8.3. Language identification by humans
8.4. Language identification by machines
8.4.1. LId tasks
8.4.2. Performance measures
8.4.3. Evaluation
8.5. LId resources
8.6. LId formulation
8.7. LId modeling
8.7.1. Acoustic front-end
8.7.2. Acoustic language-specific modeling
8.7.3. Parallel phone recognition
8.7.4. Phonotactic modeling
8.7.5. Back-end optimization
8.8. Discussion
8.9. References

Chapter 9. Automatic Speaker Recognition
Frédéric BIMBOT
9.1. Introduction
9.1.1. Voice variability and characterization
9.1.2. Speaker recognition
9.2. Typology and operation of speaker recognition systems
9.2.1. Speaker recognition tasks
9.2.2. Operation
9.2.3. Text-dependence
9.2.4. Types of errors
9.2.5. Influencing factors
9.3. Fundamentals
9.3.1. General structure of speaker recognition systems
9.3.2. Acoustic analysis
9.3.3. Probabilistic modeling
9.3.4. Identification and verification scores
9.3.5. Score compensation and decision
9.3.6. From theory to practice
9.4. Performance evaluation
9.4.1. Error rate
9.4.2. DET curve and EER
9.4.3. Cost function, weighted error rate and HTER
9.4.4. Distribution of errors
9.4.5. Orders of magnitude
9.5. Applications
9.5.1. Physical access control
9.5.2. Securing remote transactions
9.5.3. Audio information indexing
9.5.4. Education and entertainment
9.5.5. Forensic applications
9.5.6. Perspectives
9.6. Conclusions
9.7. Further reading

Chapter 10. Robust Recognition Methods
Jean-Paul HATON
10.1. Introduction
10.2. Signal pre-processing methods
10.2.1. Spectral subtraction
10.2.2. Adaptive noise cancellation
10.2.3. Space transformation
10.2.4. Channel equalization
10.2.5. Stochastic models
10.3. Robust parameters and distance measures
10.3.1. Spectral representations
10.3.2. Auditory models
10.3.3. Distance measure
10.4. Adaptation methods
10.4.1. Model composition
10.4.2. Statistical adaptation
10.5. Compensation of the Lombard effect
10.6. Missing data scheme
10.7. Conclusion
10.8. References

Chapter 11. Multimodal Speech: Two or Three senses are Better than One
Jean-Luc SCHWARTZ, Pierre ESCUDIER and Pascal TEISSIER
11.1. Introduction
11.2. Speech is a multimodal process
11.2.1. Seeing without hearing
11.2.2. Seeing for hearing better in noise
11.2.3. Seeing for better hearing… even in the absence of noise
11.2.4. Bimodal integration imposes itself to perception
11.2.5. Lip reading as taking part to the ontogenesis of speech
11.2.6. ...and to its phylogenesis?
11.3. Architectures for audio-visual fusion in speech perception
11.3.1. Three paths for sensory interactions in cognitive psychology
11.3.2. Three paths for sensor fusion in information processing
11.3.3. The four basic architectures for audiovisual fusion
11.3.4. Three questions for a taxonomy
11.3.5. Control of the fusion process
11.4. Audio-visual speech recognition systems
11.4.1. Architectural alternatives
11.4.2. Taking into account contextual information
11.4.3. Pre-processing
11.5. Conclusions
11.6. References

Chapter 12. Speech and Human-Computer Communication
Wolfgang MINKER and Françoise NÉEL
12.1. Introduction
12.2. Context
12.2.1. The development of micro-electronics
12.2.2. The expansion of information and communication technologies and increasing interconnection of computer systems
12.2.3. The coordination of research efforts and the improvement of automatic speech processing systems
12.3. Specificities of speech
12.3.1. Advantages of speech as a communication mode
12.3.2. Limitations of speech as a communication mode
12.3.3. Multidimensional analysis of commercial speech recognition products
12.4. Application domains with voice-only interaction
12.4.1. Inspection, control and data acquisition
12.4.2. Home automation: electronic home assistant
12.4.3. Office automation: dictation and speech-to-text systems
12.4.4. Training
12.4.5. Automatic translation
12.5. Application domains with multimodal interaction
12.5.1. Interactive terminals
12.5.2. Computer-aided graphic design
12.5.3. On-board applications
12.5.4. Human-human communication facilitation
12.5.5. Automatic indexing of audio-visual documents
12.6. Conclusions
12.7. References

Chapter 13. Voice Services in the Telecom Sector
Laurent COURTOIS, Patrick BRISARD and Christian GAGNOULET
13.1. Introduction
13.2. Automatic speech processing and telecommunications
13.3. Speech coding in the telecommunication sector
13.4. Voice command in telecom services
13.4.1. Advantages and limitations of voice command
13.4.2. Major trends
13.4.3. Major voice command services
13.4.4. Call center automation (operator assistance)
13.4.5. Personal voice phonebook
13.4.6. Voice personal telephone assistants
13.4.7. Other services based on voice command
13.5. Speaker verification in telecom services
13.6. Text-to-speech synthesis in telecommunication systems
13.7. Conclusions
13.8. References

List of Authors
Index