Re: synthesizer versus voice


JM Casey
 

I can tell you two reasons off the top of my head why many might prefer
Eloquence.
1. Its pronunciation of any English word, at least in the American variant, is
basically perfect.
2. It is really much better at fast speeds than any of the sampled voices.
These more human-sounding voices were not meant to be used at the fast rates
at which many blind people listen to synthesised speech. At those rates the
samples sound like a jumbled mess. Nevertheless, I do know some people who
still listen to modern human-derived synthesised voices at fast(er) speeds.

-----Original Message-----
From: main@jfw.groups.io <main@jfw.groups.io> On Behalf Of David Diamond
Sent: September 21, 2020 12:13 AM
To: main@jfw.groups.io
Subject: Re: synthesizer versus voice

Funny, because some prefer Eloquence over RealSpeak from JAWS. The person
who did the Australian voice for JAWS said she had a huge manuscript the
size of a phone book to record. Also, the Texas version of U.S. English had
slight variations. For me, the word "motor" sounded like "murder". It could
have been my hearing disability, though.

-----Original Message-----
From: main@jfw.groups.io <main@jfw.groups.io> On Behalf Of JM Casey
Sent: September 20, 2020 8:20 PM
To: main@jfw.groups.io
Subject: Re: synthesizer versus voice

Cool writeup/analysis. I've no doubt we will get there, but I don't
think we're there yet -- I've heard a few top-of-the-line commercial
voice synthesisers and to me they still haven't quite grasped the
inflection and intonations of the human voice. But they're getting
eerily close. So... in time. And of course, all our ears are different,
too, and this "uncanny valley" aspect is probably already nonexistent for
some people.



-----Original Message-----
From: main@jfw.groups.io <main@jfw.groups.io> On Behalf Of Orlando
Enrique Fiol via groups.io
Sent: September 20, 2020 11:10 PM
To: main@jfw.groups.io
Subject: Re: synthesizer versus voice

At 09:00 PM 9/20/2020, Mark asked:
>what's the difference between a synthesizer and a voice?

A synthesizer uses electronic processes to fashion complex timbres
from acoustic or electronic sound sources. For example, a triangle
wave may be combined with clarinet samples to produce a "synthesized"
clarinet.
However, I suspect your question pertains to our text-to-speech engines.
There, the distinction between speech synthesizer and voice operates
on two levels. The synthesizer is the speech engine as a whole, while
individual voices (such as male, female, child, etc.) can be chosen.
On a deeper level, though, the difference between synthesizer and
voice rests in the sources for phonemes used by a text-to-speech
engine. With purely synthesized speech, human speech is electronically
modeled, just as digital FM synthesizers such as the Yamaha DX7
attempted to create acoustic-sounding timbres using electronic sources
rather than actual samples. There's a vital difference between trying
to make an electronic keyboard sound like a violin or banjo, and
actually recording single notes on violin or banjo in order to spread them
out across the keyboard.
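
To make the "electronically modeled" side of that contrast concrete, here is a
minimal Python sketch of two-operator FM synthesis, the general technique behind
instruments like the DX7. The function name fm_tone and all the parameter values
are my own illustrative assumptions; this is not the DX7's actual algorithm, nor
how any particular speech synthesizer works. The point is only that the sound
comes from a formula rather than from a recording.

```python
import math

def fm_tone(carrier_hz, modulator_hz, index, seconds=0.01, rate=44100):
    """Return samples of a two-operator FM tone, each in the range [-1.0, 1.0]."""
    n = round(seconds * rate)
    samples = []
    for i in range(n):
        t = i / rate
        # The modulator wobbles the carrier's phase; 'index' sets how strongly.
        # With index = 0 this reduces to a plain sine wave at carrier_hz.
        phase = (2 * math.pi * carrier_hz * t
                 + index * math.sin(2 * math.pi * modulator_hz * t))
        samples.append(math.sin(phase))
    return samples

# Hypothetical settings: a 440 Hz carrier modulated at 880 Hz.
tone = fm_tone(440.0, 880.0, index=2.0)
```

Raising the index adds overtones and makes the tone brighter, but no recording of
a real instrument is involved anywhere.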
The old-fashioned speech synthesizer uses no human speech samples,
while most text-to-speech engines today are built largely from
human speech samples. That's why today's voices sound more realistic
and human; they're fashioned from recordings of human beings speaking
different words or parts of words, from which the speech engine
constructs its vocabulary libraries.
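
The sampled approach can be sketched just as briefly. The Python below joins
pre-recorded units end to end with a short crossfade at each seam. Everything in
it is a hypothetical stand-in: the unit names, the tiny made-up sample lists, and
the crossfade_concat helper are assumptions for illustration, and real engines
choose and smooth their units far more carefully.

```python
def crossfade_concat(units, overlap=4):
    """Join recorded units end to end, blending 'overlap' samples at each seam."""
    if not units:
        return []
    out = list(units[0])
    for unit in units[1:]:
        # Never blend more samples than either side actually has.
        k = min(overlap, len(out), len(unit))
        start = len(out) - k
        for i in range(k):
            w = (i + 1) / (k + 1)  # weight ramps from the old unit toward the new one
            out[start + i] = (1 - w) * out[start + i] + w * unit[i]
        out.extend(unit[k:])
    return out

# Hypothetical "recordings": made-up numbers standing in for real audio samples.
unit_library = {
    "he": [0.0, 0.2, 0.4, 0.2, 0.0, -0.2],
    "llo": [0.1, 0.3, 0.1, -0.1, -0.3, -0.1, 0.0],
}
speech = crossfade_concat([unit_library["he"], unit_library["llo"]], overlap=2)
```

Here the only "synthesis" is the stitching; every sample in the output traces
back to one of the recorded units.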
As a side note, this human speech sampling and modeling technology is
at the point where one can theoretically make a speech engine from
anyone's voice, which has produced some unintended byproducts. It is
now possible to create convincing audio recordings of people allegedly
saying things they never actually said. This is done by sampling
enough of their recorded speech to formulate a lexicon not only of
vocabulary but, more importantly, of their vocal inflections: the rises,
falls, breaths and pauses in their speech.
With this modeling technology, we soon will not know for certain
whether people have actually said what we've heard them say on audio
recordings or videos.
So, there you have it: a little primer on synthesis and sampled sound.


Orlando Enrique Fiol
Ph.D. in Music Theory
University of Pennsylvania: November, 2018
Professional Pianist/Keyboardist, Percussionist and Pedagogue
Charlotte, North Carolina








