This book presents the methods, tools and techniques that are currently being used to recognise (automatically) the affect, emotion, personality and everything else beyond linguistics (‘paralinguistics’) expressed by or embedded in human speech and language.

It is the first book to provide such a systematic survey of paralinguistics in speech and language processing. The technology described has evolved mainly from automatic speech and speaker recognition and processing, but also takes into account recent developments within speech signal processing, machine intelligence and data mining.

Moreover, the book offers a hands-on approach by integrating actual data sets, software, and open-source utilities which will make the book invaluable as a teaching tool and similarly useful for those professionals already in the field.

Key features:

Provides an integrated presentation of basic research (in phonetics/linguistics and humanities) with state-of-the-art engineering approaches for speech signal processing and machine intelligence.

Explains the history and state of the art of all of the sub-fields which contribute to the topic of computational paralinguistics.

Covers the signal processing and machine learning aspects of the actual computational modelling of emotion and personality and explains the detection process from corpus collection to feature extraction and from model testing to system integration.

Details aspects of real-world system integration including distribution, weakly supervised learning and confidence measures.

Outlines machine learning approaches including static, dynamic and context-sensitive algorithms for classification and regression.

Includes a tutorial on freely available toolkits, such as the open-source ‘openEAR’ toolkit for emotion and affect recognition co-developed by one of the authors, and a listing of standard databases and feature sets used in the field to allow for immediate experimentation enabling the reader to build an emotion detection model on an existing corpus.
Björn Schuller, Technische Universität München, Germany
Anton Batliner, Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany
Preface
Acknowledgements
List of Abbreviations

Part I Foundations

1 Introduction
1.1 What is Computational Paralinguistics? A First Approximation
1.2 History and Subject Area
1.3 Form versus Function
1.4 Further Aspects
1.4.1 The Synthesis of Emotion and Personality
1.4.2 Multimodality: Analysis and Generation
1.4.3 Applications, Usability and Ethics
1.5 Summary and Structure of the Book
References

2 Taxonomies
2.1 Traits versus States
2.2 Acted versus Spontaneous
2.3 Complex versus Simple
2.4 Measured versus Assessed
2.5 Categorical versus Continuous
2.6 Felt versus Perceived
2.7 Intentional versus Instinctual
2.8 Consistent versus Discrepant
2.9 Private versus Social
2.10 Prototypical versus Peripheral
2.11 Universal versus Culture-Specific
2.12 Unimodal versus Multimodal
2.13 All These Taxonomies – So What?
2.13.1 Emotion Data: The FAU AEC
2.13.2 Non-native Data: The C-AuDiT corpus
References

3 Aspects of Modelling
3.1 Theories and Models of Personality
3.2 Theories and Models of Emotion and Affect
3.3 Type and Segmentation of Units
3.4 Typical versus Atypical Speech
3.5 Context
3.6 Lab versus Life, or Through the Looking Glass
3.7 Sheep and Goats, or Single Instance Decision versus Cumulative Evidence and Overall Performance
3.8 The Few and the Many, or How to Analyse a Hamburger
3.9 Reifications, and What You are Looking for is What You Get
3.10 Magical Numbers versus Sound Reasoning
References

4 Formal Aspects
4.1 The Linguistic Code and Beyond
4.2 The Non-Distinctive Use of Phonetic Elements
4.2.1 Segmental Level: The Case of /r/ Variants
4.2.2 Supra-segmental Level: The Case of Pitch and Fundamental Frequency – and of Other Prosodic Parameters
4.2.3 In Between: The Case of Other Voice Qualities, Especially Laryngealisation
4.3 The Non-Distinctive Use of Linguistics Elements
4.3.1 Words and Word Classes
4.3.2 Phrase Level: The Case of Filler Phrases and Hedges
4.4 Disfluencies
4.5 Non-Verbal, Vocal Events
4.6 Common Traits of Formal Aspects
References

5 Functional Aspects
5.1 Biological Trait Primitives
5.1.1 Speaker Characteristics
5.2 Cultural Trait Primitives
5.2.1 Speech Characteristics
5.3 Personality
5.4 Emotion and Affect
5.5 Subjectivity and Sentiment Analysis
5.6 Deviant Speech
5.6.1 Pathological Speech
5.6.2 Temporarily Deviant Speech
5.6.3 Non-native Speech
5.7 Social Signals
5.8 Discrepant Communication
5.8.1 Indirect Speech, Irony, and Sarcasm
5.8.2 Deceptive Speech
5.8.3 Off-Talk
5.9 Common Traits of Functional Aspects
References

6 Corpus Engineering
6.1 Annotation
6.1.1 Assessment of Annotations
6.1.2 New Trends
6.2 Corpora and Benchmarks: Some Examples
6.2.1 FAU Aibo Emotion Corpus
6.2.2 aGender Corpus
6.2.3 TUM AVIC Corpus
6.2.4 Alcohol Language Corpus
6.2.5 Sleepy Language Corpus
6.2.6 Speaker Personality Corpus
6.2.7 Speaker Likability Database
6.2.8 NKI CCRT Speech Corpus
6.2.9 TIMIT Database
6.2.10 Final Remarks on Databases
References

Part II Modelling

7 Computational Modelling of Paralinguistics: Overview
References

8 Acoustic Features
8.1 Digital Signal Representation
8.2 Short Time Analysis
8.3 Acoustic Segmentation
8.4 Continuous Descriptors
8.4.1 Intensity
8.4.2 Zero Crossings
8.4.3 Autocorrelation
8.4.4 Spectrum and Cepstrum
8.4.5 Linear Prediction
8.4.6 Line Spectral Pairs
8.4.7 Perceptual Linear Prediction
8.4.8 Formants
8.4.9 Fundamental Frequency and Voicing Probability
8.4.10 Jitter and Shimmer
8.4.11 Derived Low-Level Descriptors
References

9 Linguistic Features
9.1 Textual Descriptors
9.2 Preprocessing
9.3 Reduction
9.3.1 Stopping
9.3.2 Stemming
9.3.3 Tagging
9.4 Modelling
9.4.1 Vector Space Modelling
9.4.2 On-line Knowledge
References

10 Supra-segmental Features
10.1 Functionals
10.2 Feature Brute-Forcing
10.3 Feature Stacking
References

11 Machine-Based Modelling
11.1 Feature Relevance Analysis
11.2 Machine Learning
11.2.1 Static Classification
11.2.2 Dynamic Classification: Hidden Markov Models
11.2.3 Regression
11.3 Testing Protocols
11.3.1 Partitioning
11.3.2 Balancing
11.3.3 Performance Measures
11.3.4 Result Interpretation
References

12 System Integration and Application
12.1 Distributed Processing
12.2 Autonomous and Collaborative Learning
12.3 Confidence Measures
References

13 ‘Hands-On’: Existing Toolkits and Practical Tutorial
13.1 Related Toolkits
13.2 openSMILE
13.2.1 Available Feature Extractors
13.3 Practical Computational Paralinguistics How-to
13.3.1 Obtaining and Installing openSMILE
13.3.2 Extracting Features
13.3.3 Classification and Regression
References

14 Epilogue

Appendix
A.1 openSMILE Feature Sets Used at Interspeech Challenges
A.2 Feature Encoding Scheme
References

Index