Speaker Recognition Resources
Welcome! The use of common corpora for evaluation of speech and speaker
recognition systems has proven invaluable in comparing different
approaches, sharing results, and generally advancing the state of the
art. Within the last five years, the number of publicly
available speech corpora has increased dramatically. Unfortunately, the
information describing these corpora is not centralized and is
sometimes difficult to obtain. This site aims to act as a
clearinghouse for cataloging and describing corpora suitable for the
evaluation of speaker recognition systems. We encourage researchers in
the field to use and report results on these standard corpora to help
further advances and interactions.
The genesis of this project was a paper we published in the 1999
Proceedings of the IEEE International Conference on Acoustics, Speech, and
Signal Processing (ICASSP) (provided below). However, to keep up
with the evolving list of corpora available, we solicit input from you to
alert us to new and overlooked speaker recognition corpora. A form is
provided below to submit information about any such corpus. Pointers to
papers or published results on these corpora are also most welcome and
we have included a form for this too.
In addition, we invite feedback on this site and suggestions for any
improvements.
Enjoy,
Joe Campbell &
Doug Reynolds
Our ICASSP-99 paper, Corpora for the Evaluation of Speaker
Recognition Systems, is available in a variety of formats:
web,
Microsoft Word97,
Portable Document Format (PDF), and
PostScript (send to a PostScript Level 2 or 3 printer).
Here are the slides we used in our ICASSP-99 poster session presentation.
They contain information not in our ICASSP-99 paper, including some example
results of systems using these corpora.
We inadvertently missed referencing the paper by Godfrey, J., D. Graff, and
A. Martin, "Public Databases for Speaker Recognition and Verification,"
ESCA Workshop on Automatic Speaker Recognition, Identification, and Verification,
Martigny, Switzerland, April 1994, pp. 39-42. (This paper is available only in
PostScript format; you might want to use the Acrobat Reader 4 or RoPS viewer.)
Corpora We Included
- TIMIT/NTIMIT (LDC): Single session; please use other corpora
- SIVA (ELRA)
- PolyVar (ELRA)
- POLYCOST (ELRA)
- KING (LDC)
- YOHO (LDC)
- Switchboard I & II, SPIDRE, and NIST Eval Subsets (LDC)
- Cellular Switchboard (LDC, future)
- Speaker Recognition Corpus (OGI)
- Tactical Speaker Identification Corpus, TSID (LDC)
Corpora We Excluded
Below is a list of additional corpora of which we are aware. Due to page limit
restrictions, we were unable to include all known corpora in our ICASSP
paper. Next to each corpus title we indicate why we chose not to include
it in the paper.
- M2VTS (ELRA): Too few speakers
- Green Flag (Rome Laboratory): Not publicly available
- SpeechDat (ELRA S0047): Too few speakers (see PolyVar)
- VeriVox: Future, too few speakers?
- Istituto Superiore Comunicazioni e Tecnologie Informatiche, ISCTI: Future, not publicly available?
Further information is available in "Excluded
Corpora for the Evaluation of Speaker Recognition Systems" (Microsoft
Word97 format). Please let us know if you disagree with these exclusions or if you know of any
updates to these corpora.
New & Missed Corpora?
As promised, we are collecting lists and characteristics of publicly available
speaker recognition corpora and evaluations. We plan to update this web site as
new corpora become available and possibly write future papers on corpora.
If you know of a new corpus or one we missed, please tell us via our
Speaker Recognition Corpora Form.
New Results on Corpora?
We encourage researchers in the field to use and report results on standard
corpora to help further advances and interactions. Pointers to papers or
published results on these corpora are also most welcome. If you know of
results on a publicly available standard speaker recognition corpus, please
tell us via our Speaker Recognition Results Form.
NIST Evaluations
To understand and join the NIST Coordinated Speaker Recognition Evaluations,
please visit the Speaker Recognition section of the
NIST Spoken Language Technology
Evaluations page.
We encourage participation in current and future NIST Evaluations. Some
sites might prefer to begin by running a prior Evaluation, which is not pure
in the blind testing sense, but is still very useful. To run a prior
Evaluation, you will need 4 items:
Make your own DET curves, etc.
Free software to create detection error tradeoff (DET) curves, etc. is available
from NIST's Spoken Language
Technology Evaluation and Utility Software page. Additional information
on DET curves is available in the paper by Martin, A., G. Doddington, T. Kamm,
M. Ordowski, and M. Przybocki.
"The DET Curve in
Assessment of Detection Task Performance," Proceedings of Eurospeech
Conference, Rhodes, Greece, Sep 1997, pp. 1895-1898.
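For readers who want to see what a DET curve is made of before downloading
any software, here is a minimal sketch in Python. It is not the NIST
software and does not reproduce its interface. The function names
(det_points, probit), the rule that a trial is accepted when its score is at
or above the threshold, and the scores themselves are all assumptions made
for illustration. The sketch sweeps a threshold over the scores, computes
the miss and false alarm rates at each threshold, and maps a probability to
the normal deviate scale on which DET curves are usually plotted.

```python
from statistics import NormalDist


def det_points(target_scores, nontarget_scores):
    """Return (p_fa, p_miss) pairs, one per candidate threshold.

    Assumption: a trial is accepted when its score is >= the threshold.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    points = []
    for t in thresholds:
        # Miss: a target trial whose score falls below the threshold.
        p_miss = sum(s < t for s in target_scores) / len(target_scores)
        # False alarm: a nontarget trial whose score reaches the threshold.
        p_fa = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        points.append((p_fa, p_miss))
    return points


def probit(p, eps=1e-6):
    """Map a probability to the normal deviate axis, clipping away 0 and 1."""
    p = min(max(p, eps), 1.0 - eps)
    return NormalDist().inv_cdf(p)


# Made-up scores, for illustration only.
targets = [2.0, 1.5, 0.5, 3.0]
nontargets = [0.0, 1.0, -1.0, 0.6]

points = det_points(targets, nontargets)
for p_fa, p_miss in points:
    print(f"P_fa={p_fa:.2f}  P_miss={p_miss:.2f}  "
          f"x={probit(p_fa):+.3f}  y={probit(p_miss):+.3f}")
```

Plotting the x and y columns against each other gives the DET curve. The
clipping value eps is an arbitrary choice that keeps rates of exactly 0 or 1
finite on the warped axes. With real evaluation data you would have far more
trials and a much smoother curve.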
Key Sources of Corpora
Speaker and Language Recognition Using Speech Codec Parameters
Authors: T.F. Quatieri, E. Singer,
R.B. Dunn, D.A. Reynolds, J.P. Campbell*
MIT Lincoln Laboratory, Lexington,
MA, USA
quatieri@ll.mit.edu
* Department of Defense
ABSTRACT
In this paper, we investigate
the effect of speech coding on speaker and language recognition tasks.
Three coders were selected to cover a wide range of quality and bit rates:
GSM at 12.2 kb/s, G.729 at 8 kb/s, and G.723.1 at 5.3 kb/s. Our objective
is to measure recognition performance from either the synthesized speech
or directly from the coder parameters themselves. We show that using speech
synthesized from the three codecs, GMM-based speaker verification and phone-based
language recognition performance generally degrades with coder bit rate,
i.e., from GSM to G.729 to G.723.1, relative to an uncoded baseline. In
addition, speaker verification for all codecs shows a performance decrease
as the degree of mismatch between training and testing conditions increases,
while language recognition exhibited no decrease in performance. We also
present initial results in determining the relative importance of codec
system components in their direct use for recognition tasks. For the G.729
codec, it is shown that removal of the postfilter in the decoder helps
speaker verification performance under the mismatched condition. On the
other hand, with use of G.729 LSF-based mel-cepstra, performance decreases
under all conditions, indicating the need for a residual contribution to
the feature representation.
Volume 2, Pages 787-790
Please send your additions and requests to Joe
and Doug.
16 October 1999
This URL is: http://www.apl.jhu.edu/Classes/Notes/Campbell/SpkrRec/