| Week | Topic | Reading | Reading for Discussion | Homework |
|---|---|---|---|---|
| 1: 8/25 | History, Overview, Festival | Dutoit, 1 | ----- | Homework 1 |
| 2: 8/30, 9/1 | The Vocal Tract (Acoustic Phonetics) (See UCLA pages on Sounds of World's Languages) | Fant, 1970, 1.4--2.4 (pp. 63--161) | Klatt, 1987 (discussant: Hahn Koo) | |
| 3: 9/8 | Text Normalization; Morphological Analysis and Word Pronunciation; Syntactic Analysis | Dutoit, 2, 3, 4, 5; Sproat 2.5, 3, 4; Sproat et al. 2001 | Fant, 1970, 1.4 (pp. 63--90) (discussant: Sandeep Phatak) | Homework 2 |
| 4: 9/13, 9/15 | More on text analysis | "" | Yarowsky, 1996, chapters 1-6 (discussant: Cecilia Alm) | |
| 5: 9/20, 9/22 | Even more on text analysis | "" | Golding & Rosenbloom, 1996; Jannedy & Moebius, 1997; Marchand & Damper, 2000 (discussant: Liam Moran) | Homework 3 |
| 6: 9/27, 9/29 | Theories and Models of Intonation and Prosody (Warning 20Mb); 9/29: Guest Lecture, Chilin Shih, FLB G96 | Dutoit 6; Sproat 6; Silverman et al. 1992; van Santen & Möbius, 2000; Taylor, 2000; Kochanski & Shih, 2003. | No discussion this week. | |
| 7: 10/4, 10/6 | Finite-state approaches to multilingual text processing; Project proposals due | | Liberman & Pierrehumbert 1984 (discussant: Taejin Yoon) | Homework 4 |
| 8: 10/11, 10/13 | Accent and Phrasing Prediction. Segmental Duration | Sproat 5 | Bachenko & Fitzpatrick, 1990; Wang & Hirschberg, 1992; Taylor & Black, 1998 (discussant: Jason Strohmaier) | Homework 5 |
| 9: 10/18, 10/20 | Linear Predictive Coding; Formant Synthesizers; The Klatt Synthesizer | Dutoit 8; Klatt, 1980 | No discussion this week | |
| 10: 10/25, 10/27 | Time-domain models; Fixed-unit systems; Unit-Selection Approaches (CHATR and progeny); Large Numbers of Rare Events (LNRE) Distributions | Dutoit, 9, 10; Sproat 7; Möbius (2001) | Boersma, 1998, Chs. 1--3 (discussant: Andreas Ehmann) | |
| 11: 11/1, 11/3 | Unit-Selection Approaches Continued, LNRE | | Project progress oral reports | |
| 12: 11/8, 11/10 | "Limited Domain" Applications: talking clock; Evaluation Methods; Document Structure: email readers, reading for the blind. | Sproat 8, 9; Sproat, Hu and Chen, 1998; Raman, 1994 | van Santen, 1993; van Santen et al, 1998; SpeechWorks, 2002 (discussant: Alla Rozovskaya) | |
| 13: 11/15, 11/17 | TTS and the WWW, Markup for TTS, SABLE and the W3C; Concept-to-Speech Systems | Sproat, 9; Sproat et al. 1998; Sproat and Raman (no date); SABLE specification; W3C's Speech Synthesis Markup Language Version 1.0; Davis and Hirschberg 1988. | No discussion this week. | |
| 14: 11/29, 12/1 | Other issues: Visual TTS, Future Directions and Wrap-up | Ezzat and Poggio, 1998; Graf and Cosatto, 2001; Cassell et al. 1994; Sproat, Ostendorf and Hunt, 1999 | Project final presentations. | |
| 15: 12/6, 12/8 | | | Project final presentations. | |
Course Requirements
There are two texts for this course:
- Thierry Dutoit (1997) An Introduction to Text-to-Speech Synthesis. Dordrecht: Kluwer. ISBN 0-7923-4498-7.
- Richard Sproat (1998) Multilingual Text-to-Speech Synthesis: The Bell Labs Approach. Dordrecht: Kluwer. ISBN 0-7923-8027-4.
A postscript version is available here.
Dutoit's book is a good overall introduction to text-to-speech
synthesis. Dutoit works mostly at the signal processing end of TTS, so
his coverage of signal processing approaches is particularly thorough.
The volume I edited is an overview of one particular system, the Bell
Labs Multilingual TTS system, which was quite influential in its day.
It includes extended coverage of the approach to text normalization
using finite-state transducers.
We will also be using the Festival TTS System as well as the AT&T
lextools and fsm packages. Documentation on the Festival system can be
found here. A
description of how to use the lextools system in TTS can be found here (look
for the link to the paper).
The requirements for a grade in this course are:
- Weekly homeworks and lab exercises using tools set up in the EWS
Lab. Audio playback works in the EWS lab, but recording is not really
practical there. For the exercises that require you to record speech,
plan to use your own PC and any program that can produce WAV (RIFF)
audio output. You will need a microphone. You will also need to bring
your own headphones for use in the EWS lab.
- Leading the discussion of at least one assigned paper during the
Wednesday meeting of the course. The exact number of papers assigned
to each participant will depend upon the number of enrollees.
- Successful completion of a mini-project related to any aspect of
TTS. Possible projects include:
  - Design and implementation of a non-trivial part of a text
    normalization system for any language.
  - Implementation of any intonation model for any language.
  - Construction of a limited-domain TTS system for a small (but not
    completely trivial) task using Festival tools.
  - Machine learning of letter-to-sound rules for English or some
    other "interesting" language.
  - Evaluation experiments on some of the systems available online.
  - Construction of a simple articulatory synthesizer based on an
    N-tube model.
Depending upon the composition of the class, we may divide the class
into small groups (2 or 3 people per group), with each group selecting
the project they want to work on. Groups are required to discuss their
proposed project with me by the end of the seventh week,
present a small progress report (maximum 3 pages) by the
eleventh week (so that I can be sure you are on track for a
meaningful project), and give a short (15-20 min) presentation on your
project during the final week. The presentation should include some
sort of demonstration of what you have done. A short project report
(5-10 pages, but longer if needed) must also be turned in by the end of the class.
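
For the recording exercises mentioned above, it may help to confirm that a file really is a readable WAV (RIFF) file before turning it in. The following is a minimal sketch using only Python's standard `wave` module. The helper names `describe_wav` and `write_silence`, the file name, and the 16 kHz mono 16-bit settings are illustrative assumptions, not course requirements.

```python
import os
import tempfile
import wave


def describe_wav(path):
    """Return basic properties of a PCM WAV file.

    Raises wave.Error if the file is not a WAV file the module can read.
    """
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_s": frames / rate,
        }


def write_silence(path, seconds=1, rate=16000):
    """Write mono 16-bit silence: a stand-in for a real recording (assumed format)."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)       # mono
        wav.setsampwidth(2)       # 2 bytes = 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * (rate * seconds))


if __name__ == "__main__":
    # Demonstrate on a generated file so the sketch runs without a microphone.
    with tempfile.TemporaryDirectory() as tmp:
        sample = os.path.join(tmp, "sample.wav")  # hypothetical file name
        write_silence(sample)
        print(describe_wav(sample))
```

To check your own recording, call `describe_wav` with the path to the file your recording program produced. A `wave.Error` suggests the file is not plain PCM WAV and may need to be re-exported.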