Speech Processing
Course Number 525.747

Syllabus and Transparencies

Dr. Joe Campbell
j.campbell@ieee.org

The Johns Hopkins University
Whiting School of Engineering
Dorsey Center

  1. Administrative matters, course overview, review of probability, Fourier and Laplace transforms, review of digital signal processing: sampling theorem, Z-transforms, digital filters, windows, decimation and interpolation, and filter banks.
  2. Speech modeling: linguistics, physics of sound, speech production and acoustic tube modeling, acoustic phonetics, anatomy and physiology of the vocal tract and ear, hearing, and perception.
  3. Waveform coding I: discussion of problem, sampling, aliasing, linear quantization, companding, optimum quantization, pulse code modulation (PCM), effects of channel errors, vector quantization (VQ), adaptive quantization, differential PCM, APCM vs ADPCM, delta modulation, adaptive delta modulation, and CVSD.
  4. Analog vocoders: channel vocoder, early secure voice systems, and formant vocoder.
  5. Linear prediction: history, general time domain formulation, all-pole model, autocorrelation solution, covariance solution, frequency domain formulation, lattice form, and comparison to acoustic tube.
  6. Digital vocoders: linear predictive coding (LPC) and hybrid coders, including voice-excited vocoders, the voice-excited linear predictor, and the residual-excited linear predictor (RELP).
  7. Waveform coding II: adaptive predictive coding (APC), analysis-by-synthesis coding [multipulse, regular pulse excitation (RPE), code-excited linear prediction (CELP), and low-delay CELP (LD-CELP)], subband coding, transform coding (TC), adaptive transform coding (ATC), and Moving Picture Experts Group (MPEG) audio.
  8. Homomorphic speech processing: homomorphic derivation and properties, complex and real cepstrum, homomorphic speech analysis, homomorphic deconvolution, and speech enhancement (channel equalization, echo reduction).
  9. Speech synthesis: history, voice response, text-to-speech, formant synthesizers, concatenation synthesizers (phoneme, allophone, diphone, triphone, demisyllable, and word/morpheme), and time-scale modification.
  10. Speech recognition I: terms, isolated word recognition, continuous speech recognition, speaker-dependent vs speaker-independent recognition, measures and distances (articulation index, log spectral distortion, Itakura-Saito, cepstral distance), and dynamic time warping (DTW).
  11. Markov models: discussion of recognition problem, hidden Markov models (HMM), elements, "3 problems of HMMs," generation, training and recognition, feature vectors, left-to-right models, and phone modeling.
  12. Speech recognition II: isolated word recognition HMM, Viterbi algorithm, continuous speech recognition HMM networks, discrete and continuous observation density HMMs, grammar, and the SPHINX recognizer.
  13. "Neural" nets: "neurons" and perceptrons, nonlinearities, training (e.g., back propagation), and comparison with knowledge based approaches.
  14. Speaker recognition: speaker verification/authentication vs speaker identification, closed vs open set, feature vectors (e.g., line spectrum pair and cepstrum), pattern matching (e.g., DTW, VQ, HMM), hypothesis testing, and errors.
  15. Voice transformation.
  16. Speech processing hardware: digital signal processing (DSP) chips (TI, AT&T, Motorola) and Motorola CVSD and ADPCM chips.
  17. Hot topics: emerging speech coding standards (e.g., 2400 bps MELP), Internet telephony, voice and multimedia applications (e.g., on the WWW), and student requests...
  18. Internet resources:

(The numbers above do not refer to class meeting numbers.)