Digital Speech Transmission and Enhancement
Hardcover, English, 2023
By Peter Vary (RWTH Aachen University, Germany) and Rainer Martin (Ruhr-Universität Bochum, Germany)
1 609 kr
Special-order item. Ships within 5-8 business days.
Free shipping for members on purchases of at least 249 kr.

DIGITAL SPEECH TRANSMISSION AND ENHANCEMENT

Enables readers to understand the latest developments in speech enhancement/transmission due to advances in computational power and device miniaturization

The Second Edition of Digital Speech Transmission and Enhancement has been updated throughout to provide all the necessary details on the latest advances in the theory and practice of speech signal processing and its applications, including many new research results, standards, algorithms, and developments which have recently appeared and are on their way into state-of-the-art applications. Besides mobile communications, which constituted the main application domain of the first edition, speech enhancement for hearing instruments and man-machine interfaces has gained significantly more prominence in the past decade, and as such receives greater focus in this updated and expanded second edition.

Readers can expect to find information and novel methods on:
- Low-latency spectral analysis-synthesis, single-channel and dual-channel algorithms for noise reduction and dereverberation
- Multi-microphone processing methods, which are now widely used in applications such as mobile phones, hearing aids, and man-computer interfaces
- Algorithms for near-end listening enhancement, which significantly increase speech intelligibility for users at the noisy receiving side of their mobile phone
- Fundamentals of speech signal processing, estimation and machine learning, speech coding, error concealment by soft decoding, and artificial bandwidth extension of speech signals

Digital Speech Transmission and Enhancement is a single-source, comprehensive guide to the fundamental issues, algorithms, standards, and trends in speech signal processing and speech communication technology, and as such is an invaluable resource for engineers, researchers, academics, and graduate students in the areas of communications, electrical engineering, and information technology.
Product information
- Publication date: 2023-12-29
- Dimensions: 170 x 244 x 37 mm
- Weight: 1,134 g
- Language: English
- Series: IEEE Press
- Number of pages: 592
- Edition: 2
- Publisher: John Wiley & Sons Inc
- EAN: 9781119060963
Peter Vary is former Head of the Institute of Communication Systems at RWTH Aachen University, Germany. Professor Vary is a Fellow of IEEE, EURASIP, and ITG, and has been a Distinguished Lecturer of the IEEE Signal Processing Society.

Rainer Martin is Head of the Institute of Communication Acoustics at Ruhr-Universität Bochum, Germany. Professor Martin is a Fellow of the IEEE.

Both authors have been actively involved in speech processing research and teaching over several decades.
- Preface
- 1 Introduction
- 2 Models of Speech Production and Hearing
  - 2.1 Sound Waves
  - 2.2 Organs of Speech Production
  - 2.3 Characteristics of Speech Signals
  - 2.4 Model of Speech Production
    - 2.4.1 Acoustic Tube Model of the Vocal Tract
    - 2.4.2 Discrete Time All-Pole Model of the Vocal Tract
  - 2.5 Anatomy of Hearing
  - 2.6 Psychoacoustic Properties of the Auditory System
    - 2.6.1 Hearing and Loudness
    - 2.6.2 Spectral Resolution
    - 2.6.3 Masking
    - 2.6.4 Spatial Hearing
      - 2.6.4.1 Head-Related Impulse Responses and Transfer Functions
      - 2.6.4.2 Law of The First Wavefront
  - References
- 3 Spectral Transformations
  - 3.1 Fourier Transform of Continuous Signals
  - 3.2 Fourier Transform of Discrete Signals
  - 3.3 Linear Shift Invariant Systems
    - 3.3.1 Frequency Response of LSI Systems
  - 3.4 The z-transform
    - 3.4.1 Relation to Fourier Transform
    - 3.4.2 Properties of the ROC
    - 3.4.3 Inverse z-Transform
    - 3.4.4 z-Transform Analysis of LSI Systems
  - 3.5 The Discrete Fourier Transform
    - 3.5.1 Linear and Cyclic Convolution
    - 3.5.2 The DFT of Windowed Sequences
    - 3.5.3 Spectral Resolution and Zero Padding
    - 3.5.4 The Spectrogram
    - 3.5.5 Fast Computation of the DFT: The FFT
    - 3.5.6 Radix-2 Decimation-in-Time FFT
  - 3.6 Fast Convolution
    - 3.6.1 Fast Convolution of Long Sequences
    - 3.6.2 Fast Convolution by Overlap-Add
    - 3.6.3 Fast Convolution by Overlap-Save
  - 3.7 Analysis–Modification–Synthesis Systems
  - 3.8 Cepstral Analysis
    - 3.8.1 Complex Cepstrum
    - 3.8.2 Real Cepstrum
    - 3.8.3 Applications of the Cepstrum
      - 3.8.3.1 Construction of Minimum-Phase Sequences
      - 3.8.3.2 Deconvolution by Cepstral Mean Subtraction
      - 3.8.3.3 Computation of the Spectral Distortion Measure
      - 3.8.3.4 Fundamental Frequency Estimation
  - References
- 4 Filter Banks for Spectral Analysis and Synthesis
  - 4.1 Spectral Analysis Using Narrowband Filters
    - 4.1.1 Short-Term Spectral Analyzer
    - 4.1.2 Prototype Filter Design for the Analysis Filter Bank
    - 4.1.3 Short-Term Spectral Synthesizer
    - 4.1.4 Short-Term Spectral Analysis and Synthesis
    - 4.1.5 Prototype Filter Design for the Analysis–Synthesis Filter Bank
    - 4.1.6 Filter Bank Interpretation of the DFT
  - 4.2 Polyphase Network Filter Banks
    - 4.2.1 PPN Analysis Filter Bank
    - 4.2.2 PPN Synthesis Filter Bank
  - 4.3 Quadrature Mirror Filter Banks
    - 4.3.1 Analysis–Synthesis Filter Bank
    - 4.3.2 Compensation of Aliasing and Signal Reconstruction
    - 4.3.3 Efficient Implementation
  - 4.4 Filter Bank Equalizer
    - 4.4.1 The Reference Filter Bank
    - 4.4.2 Uniform Frequency Resolution
    - 4.4.3 Adaptive Filter Bank Equalizer: Gain Computation
      - 4.4.3.1 Conventional Spectral Subtraction
      - 4.4.3.2 Filter Bank Equalizer
    - 4.4.4 Non-uniform Frequency Resolution
    - 4.4.5 Design Aspects & Implementation
  - References
- 5 Stochastic Signals and Estimation
  - 5.1 Basic Concepts
    - 5.1.1 Random Events and Probability
    - 5.1.2 Conditional Probabilities
    - 5.1.3 Random Variables
    - 5.1.4 Probability Distributions and Probability Density Functions
    - 5.1.5 Conditional PDFs
  - 5.2 Expectations and Moments
    - 5.2.1 Conditional Expectations and Moments
    - 5.2.2 Examples
      - 5.2.2.1 The Uniform Distribution
      - 5.2.2.2 The Gaussian Density
      - 5.2.2.3 The Exponential Density
      - 5.2.2.4 The Laplace Density
      - 5.2.2.5 The Gamma Density
      - 5.2.2.6 χ²-Distribution
    - 5.2.3 Transformation of a Random Variable
    - 5.2.4 Relative Frequencies and Histograms
  - 5.3 Bivariate Statistics
    - 5.3.1 Marginal Densities
    - 5.3.2 Expectations and Moments
    - 5.3.3 Uncorrelatedness and Statistical Independence
    - 5.3.4 Examples of Bivariate PDFs
      - 5.3.4.1 The Bivariate Uniform Density
      - 5.3.4.2 The Bivariate Gaussian Density
    - 5.3.5 Functions of Two Random Variables
  - 5.4 Probability and Information
    - 5.4.1 Entropy
    - 5.4.2 Kullback–Leibler Divergence
    - 5.4.3 Cross-Entropy
    - 5.4.4 Mutual Information
  - 5.5 Multivariate Statistics
    - 5.5.1 Multivariate Gaussian Distribution
    - 5.5.2 Gaussian Mixture Models
  - 5.6 Stochastic Processes
    - 5.6.1 Stationary Processes
    - 5.6.2 Auto-Correlation and Auto-Covariance Functions
    - 5.6.3 Cross-Correlation and Cross-Covariance Functions
    - 5.6.4 Markov Processes
    - 5.6.5 Multivariate Stochastic Processes
  - 5.7 Estimation of Statistical Quantities by Time Averages
    - 5.7.1 Ergodic Processes
    - 5.7.2 Short-Time Stationary Processes
  - 5.8 Power Spectrum and its Estimation
    - 5.8.1 White Noise
    - 5.8.2 The Periodogram
    - 5.8.3 Smoothed Periodograms
      - 5.8.3.1 Non Recursive Smoothing in Time
      - 5.8.3.2 Recursive Smoothing in Time
      - 5.8.3.3 Log-Mel Filter Bank Features
    - 5.8.4 Power Spectra and Linear Shift-Invariant Systems
  - 5.9 Statistical Properties of Speech Signals
  - 5.10 Statistical Properties of DFT Coefficients
    - 5.10.1 Asymptotic Statistical Properties
    - 5.10.2 Signal-Plus-Noise Model
    - 5.10.3 Statistics of DFT Coefficients for Finite Frame Lengths
  - 5.11 Optimal Estimation
    - 5.11.1 MMSE Estimation
    - 5.11.2 Estimation of Discrete Random Variables
    - 5.11.3 Optimal Linear Estimator
    - 5.11.4 The Gaussian Case
    - 5.11.5 Joint Detection and Estimation
  - 5.12 Non-Linear Estimation with Deep Neural Networks
    - 5.12.1 Basic Network Components
      - 5.12.1.1 The Perceptron
      - 5.12.1.2 Convolutional Neural Network
    - 5.12.2 Basic DNN Structures
      - 5.12.2.1 Fully-Connected Feed-Forward Network
      - 5.12.2.2 Autoencoder Networks
      - 5.12.2.3 Recurrent Neural Networks
      - 5.12.2.4 Time Delay, Wavenet, and Transformer Networks
      - 5.12.2.5 Training of Neural Networks
      - 5.12.2.6 Stochastic Gradient Descent (SGD)
      - 5.12.2.7 Adaptive Moment Estimation Method (ADAM)
  - References
- 6 Linear Prediction
  - 6.1 Vocal Tract Models and Short-Term Prediction
    - 6.1.1 All-Zero Model
    - 6.1.2 All-Pole Model
    - 6.1.3 Pole-Zero Model
  - 6.2 Optimal Prediction Coefficients for Stationary Signals
    - 6.2.1 Optimum Prediction
    - 6.2.2 Spectral Flatness Measure
  - 6.3 Predictor Adaptation
    - 6.3.1 Block-Oriented Adaptation
      - 6.3.1.1 Auto-Correlation Method
      - 6.3.1.2 Covariance Method
      - 6.3.1.3 Levinson–Durbin Algorithm
    - 6.3.2 Sequential Adaptation
  - 6.4 Long-Term Prediction
  - References
- 7 Quantization
  - 7.1 Analog Samples and Digital Representation
  - 7.2 Uniform Quantization
  - 7.3 Non-uniform Quantization
  - 7.4 Optimal Quantization
  - 7.5 Adaptive Quantization
  - 7.6 Vector Quantization
    - 7.6.1 Principle
    - 7.6.2 The Complexity Problem
    - 7.6.3 Lattice Quantization
    - 7.6.4 Design of Optimal Vector Code Books
    - 7.6.5 Gain–Shape Vector Quantization
  - 7.7 Quantization of the Predictor Coefficients
    - 7.7.1 Scalar Quantization of the LPC Coefficients
    - 7.7.2 Scalar Quantization of the Reflection Coefficients
    - 7.7.3 Scalar Quantization of the LSF Coefficients
  - References
- 8 Speech Coding
  - 8.1 Speech-Coding Categories
  - 8.2 Model-Based Predictive Coding
  - 8.3 Linear Predictive Waveform Coding
    - 8.3.1 First-Order DPCM
    - 8.3.2 Open-Loop and Closed-Loop Prediction
    - 8.3.3 Quantization of the Residual Signal
      - 8.3.3.1 Quantization with Open-Loop Prediction
      - 8.3.3.2 Quantization with Closed-Loop Prediction
      - 8.3.3.3 Spectral Shaping of the Quantization Error
    - 8.3.4 ADPCM with Sequential Adaptation
  - 8.4 Parametric Coding
    - 8.4.1 Vocoder Structures
    - 8.4.2 LPC Vocoder
  - 8.5 Hybrid Coding
    - 8.5.1 Basic Codec Concepts
      - 8.5.1.1 Scalar Quantization of the Residual Signal
      - 8.5.1.2 Vector Quantization of the Residual Signal
    - 8.5.2 Residual Signal Coding: RELP
    - 8.5.3 Analysis by Synthesis: CELP
      - 8.5.3.1 Principle
      - 8.5.3.2 Fixed Code Book
      - 8.5.3.3 Long-Term Prediction, Adaptive Code Book
  - 8.6 Adaptive Postfiltering
  - 8.7 Speech Codec Standards: Selected Examples
    - 8.7.1 GSM Full-Rate Codec
    - 8.7.2 EFR Codec
    - 8.7.3 Adaptive Multi-Rate Narrowband Codec (AMR-NB)
    - 8.7.4 ITU-T/G.722: 7 kHz Audio Coding within 64 kbit/s
    - 8.7.5 Adaptive Multi-Rate Wideband Codec (AMR-WB)
    - 8.7.6 Codec for Enhanced Voice Services (EVS)
    - 8.7.7 Opus Codec IETF RFC 6716
  - References
- 9 Concealment of Erroneous or Lost Frames
  - 9.1 Concepts for Error Concealment
    - 9.1.1 Error Concealment by Hard Decision Decoding
    - 9.1.2 Error Concealment by Soft Decision Decoding
    - 9.1.3 Parameter Estimation
      - 9.1.3.1 MAP Estimation
      - 9.1.3.2 MS Estimation
    - 9.1.4 The A Posteriori Probabilities
      - 9.1.4.1 The A Priori Knowledge
      - 9.1.4.2 The Parameter Distortion Probabilities
    - 9.1.5 Example: Hard Decision vs. Soft Decision
  - 9.2 Examples of Error Concealment Standards
    - 9.2.1 Substitution and Muting of Lost Frames
    - 9.2.2 AMR Codec: Substitution and Muting of Lost Frames
    - 9.2.3 EVS Codec: Concealment of Lost Packets
  - 9.3 Further Improvements
  - References
- 10 Bandwidth Extension of Speech Signals
  - 10.1 BWE Concepts
  - 10.2 BWE using the Model of Speech Production
    - 10.2.1 Extension of the Excitation Signal
    - 10.2.2 Spectral Envelope Estimation
      - 10.2.2.1 Minimum Mean Square Error Estimation
      - 10.2.2.2 Conditional Maximum A Posteriori Estimation
      - 10.2.2.3 Extensions
      - 10.2.2.4 Simplifications
    - 10.2.3 Energy Envelope Estimation
  - 10.3 Speech Codecs with Integrated BWE
    - 10.3.1 BWE in the GSM Full-Rate Codec
    - 10.3.2 BWE in the AMR Wideband Codec
    - 10.3.3 BWE in the ITU Codec G.729.1
  - References
- 11 NELE: Near-End Listening Enhancement
  - 11.1 Frequency Domain NELE (FD)
    - 11.1.1 Speech Intelligibility Index NELE Optimization
      - 11.1.1.1 SII-Optimized NELE Example
    - 11.1.2 Closed-Form Gain-Shape NELE
      - 11.1.2.1 The NoiseProp Shaping Function
      - 11.1.2.2 The NoiseInverse Strategy
      - 11.1.2.3 Gain-Shape Frequency Domain NELE Example
  - 11.2 Time Domain NELE (TD)
    - 11.2.1 NELE Processing using Linear Prediction Filters
  - References
- 12 Single-Channel Noise Reduction
  - 12.1 Introduction
  - 12.2 Linear MMSE Estimators
    - 12.2.1 Non-causal IIR Wiener Filter
    - 12.2.2 The FIR Wiener Filter
  - 12.3 Speech Enhancement in the DFT Domain
    - 12.3.1 The Wiener Filter Revisited
    - 12.3.2 Spectral Subtraction
    - 12.3.3 Estimation of the A Priori SNR
      - 12.3.3.1 Decision-Directed Approach
      - 12.3.3.2 Smoothing in the Cepstrum Domain
    - 12.3.4 Quality and Intelligibility Evaluation
      - 12.3.4.1 Noise Oversubtraction
      - 12.3.4.2 Spectral Floor
      - 12.3.4.3 Limitation of the A Priori SNR
      - 12.3.4.4 Adaptive Smoothing of the Spectral Gain
    - 12.3.5 Spectral Analysis/Synthesis for Speech Enhancement
  - 12.4 Optimal Non-linear Estimators
    - 12.4.1 Maximum Likelihood Estimation
    - 12.4.2 Maximum A Posteriori Estimation
    - 12.4.3 MMSE Estimation
      - 12.4.3.1 MMSE Estimation of Complex Coefficients
      - 12.4.3.2 MMSE Amplitude Estimation
  - 12.5 Joint Optimum Detection and Estimation of Speech
  - 12.6 Computation of Likelihood Ratios
  - 12.7 Estimation of the A Priori and A Posteriori Probabilities of Speech Presence
    - 12.7.1 Estimation of the A Priori Probability
    - 12.7.2 A Posteriori Speech Presence Probability Estimation
    - 12.7.3 SPP Estimation Using a Fixed SNR Prior
  - 12.8 VAD and Noise Estimation Techniques
    - 12.8.1 Voice Activity Detection
      - 12.8.1.1 Detectors Based on the Subband SNR
    - 12.8.2 Noise Power Estimation Based on Minimum Statistics
    - 12.8.3 Noise Estimation Using a Soft-Decision Detector
    - 12.8.4 Noise Power Tracking Based on Minimum Mean Square Error Estimation
    - 12.8.5 Evaluation of Noise Power Trackers
  - 12.9 Noise Reduction with Deep Neural Networks
    - 12.9.1 Processing Model
    - 12.9.2 Estimation Targets
    - 12.9.3 Loss Function
    - 12.9.4 Input Features
    - 12.9.5 Data Sets
  - References
- 13 Dual-Channel Noise and Reverberation Reduction
  - 13.1 Dual-Channel Wiener Filter
  - 13.2 The Ideal Diffuse Sound Field and Its Coherence
  - 13.3 Noise Cancellation
    - 13.3.1 Implementation of the Adaptive Noise Canceller
  - 13.4 Noise Reduction
    - 13.4.1 Principle of Dual-Channel Noise Reduction
    - 13.4.2 Binaural Equalization–Cancellation and Common Gain Noise Reduction
    - 13.4.3 Combined Single- and Dual-Channel Noise Reduction
  - 13.5 Dual-Channel Dereverberation
  - 13.6 Methods Based on Deep Learning
  - References
- 14 Acoustic Echo Control
  - 14.1 The Echo Control Problem
  - 14.2 Echo Cancellation and Postprocessing
    - 14.2.1 Echo Canceller with Center Clipper
    - 14.2.2 Echo Canceller with Voice-Controlled Soft-Switching
    - 14.2.3 Echo Canceller with Adaptive Postfilter
  - 14.3 Evaluation Criteria
    - 14.3.1 System Distance
    - 14.3.2 Echo Return Loss Enhancement
  - 14.4 The Wiener Solution
  - 14.5 The LMS and NLMS Algorithms
    - 14.5.1 Derivation and Basic Properties
  - 14.6 Convergence Analysis and Control of the LMS Algorithm
    - 14.6.1 Convergence in the Absence of Interference
    - 14.6.2 Convergence in the Presence of Interference
    - 14.6.3 Filter Order of the Echo Canceller
    - 14.6.4 Stepsize Parameter
  - 14.7 Geometric Projection Interpretation of the NLMS Algorithm
  - 14.8 The Affine Projection Algorithm
  - 14.9 Least-Squares and Recursive Least-Squares Algorithms
    - 14.9.1 The Weighted Least-Squares Algorithm
    - 14.9.2 The RLS Algorithm
    - 14.9.3 NLMS- and Kalman-Algorithm
      - 14.9.3.1 NLMS Algorithm
      - 14.9.3.2 Kalman Algorithm
      - 14.9.3.3 Summary of Kalman Algorithm
      - 14.9.3.4 Remarks
  - 14.10 Block Processing and Frequency Domain Adaptive Filters
    - 14.10.1 Block LMS Algorithm
    - 14.10.2 Frequency Domain Adaptive Filter (FDAF)
      - 14.10.2.1 Fast Convolution and Overlap-Save
      - 14.10.2.2 FLMS Algorithm
      - 14.10.2.3 Improved Stepsize Control
    - 14.10.3 Subband Acoustic Echo Cancellation
    - 14.10.4 Echo Canceller with Adaptive Postfilter in the Frequency Domain
    - 14.10.5 Initialization with Perfect Sequences
  - 14.11 Stereophonic Acoustic Echo Control
    - 14.11.1 The Non-uniqueness Problem
    - 14.11.2 Solutions to the Non-uniqueness Problem
  - References
- 15 Microphone Arrays and Beamforming
  - 15.1 Introduction
  - 15.2 Spatial Sampling of Sound Fields
    - 15.2.1 The Near-field Model
    - 15.2.2 The Far-field Model
    - 15.2.3 Sound Pickup in Reverberant Spaces
    - 15.2.4 Spatial Correlation Properties of Acoustic Signals
    - 15.2.5 Uniform Linear and Circular Arrays
    - 15.2.6 Phase Ambiguity in Microphone Signals
  - 15.3 Beamforming
    - 15.3.1 Delay-and-Sum Beamforming
    - 15.3.2 Filter-and-Sum Beamforming
  - 15.4 Performance Measures and Spatial Aliasing
    - 15.4.1 Array Gain and Array Sensitivity
    - 15.4.2 Directivity Pattern
    - 15.4.3 Directivity and Directivity Index
    - 15.4.4 Example: Differential Microphones
  - 15.5 Design of Fixed Beamformers
    - 15.5.1 Minimum Variance Distortionless Response Beamformer
    - 15.5.2 MVDR Beamformer with Limited Susceptibility
    - 15.5.3 Linearly Constrained Minimum Variance Beamformer
    - 15.5.4 Max-SNR Beamformer
  - 15.6 Multichannel Wiener Filter and Postfilter
  - 15.7 Adaptive Beamformers
    - 15.7.1 The Frost Beamformer
    - 15.7.2 Generalized Side-Lobe Canceller
    - 15.7.3 Generalized Side-lobe Canceller with Adaptive Blocking Matrix
    - 15.7.4 Model-Based Parsimonious-Excitation-Based GSC
  - 15.8 Non-linear Multi-channel Noise Reduction
  - References
- Index