synthesizer versus voice


Mark
 

What's the difference between a synthesizer and a voice?


Glenn / Lenny
 


A synthesizer is the software or hardware speech device, such as Eloquence or DECtalk. A voice is one of the variants a synthesizer offers: Eloquence has Reed and Glenn, for example, and DECtalk has Harry and others whose names I cannot remember.
 
 



Orlando Enrique Fiol
 

At 09:00 PM 9/20/2020, Mark asked:
what's the difference between a synthesizer and a voice?
A synthesizer uses electronic processes to fashion complex timbres from acoustic or electronic sound sources. For example, a triangle wave may be combined with clarinet samples to produce a "synthesized" clarinet.
However, I suspect your question pertains to our text-to-speech engines. There, the distinction between speech synthesizer and voice operates on two levels. The synthesizer is the speech engine as a whole, while individual voices (such as male, female, child, etc.) can be chosen.
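To make the first level of that distinction concrete, here is a minimal sketch in Python. It models the engine as one object that owns several selectable voices. The class names, the engine name, the voice names, and the `base_pitch_hz` field are all hypothetical and chosen only for illustration; they do not describe how any real screen reader or speech engine is built.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Voice:
    """One selectable variant of an engine (field values are illustrative)."""
    name: str
    base_pitch_hz: float  # hypothetical parameter, not taken from any real engine


@dataclass
class Synthesizer:
    """The speech engine as a whole; it owns a set of voices."""
    name: str
    voices: dict = field(default_factory=dict)

    def add_voice(self, voice: Voice) -> None:
        self.voices[voice.name] = voice

    def describe(self, voice_name: str, text: str) -> str:
        voice = self.voices[voice_name]  # KeyError if this engine lacks the voice
        return f"{self.name} says {text!r} as {voice.name} at {voice.base_pitch_hz:.0f} Hz"


engine = Synthesizer("ExampleEngine")
engine.add_voice(Voice("Adult", 110.0))
engine.add_voice(Voice("Child", 260.0))
print(engine.describe("Child", "hello"))
```

Switching voices changes only which `Voice` is looked up; the engine doing the speaking stays the same.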
On a deeper level, though, the difference between synthesizer and voice rests in the sources for phonemes used by a text-to-speech engine. With purely synthesized speech, human speech is electronically modeled, just as digital FM synthesizers such as the Yamaha DX7 attempted to create acoustic-sounding timbres using electronic sources rather than actual samples. There's a vital difference between trying to make an electronic keyboard sound like a violin or banjo, and actually recording single notes on violin or banjo in order to spread them out across the keyboard.
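The contrast between modeling a sound and replaying a recording can be sketched in a few lines of Python. The first function builds a tone purely from arithmetic, using the simplest two-operator form of frequency modulation; it is a textbook illustration, not the DX7's actual algorithm. The second takes an existing recording and re-pitches it by reading through it faster or slower, which is roughly what spreading a sampled note across a keyboard means. The sample rate, the function names, and the use of a generated tone as a stand-in for a real recording are all assumptions made for this sketch.

```python
import math

SAMPLE_RATE = 8000  # samples per second; an arbitrary choice for this sketch


def fm_tone(carrier_hz, modulator_hz, index, seconds):
    """Pure synthesis: every sample is computed, nothing is recorded."""
    count = int(SAMPLE_RATE * seconds)
    tone = []
    for n in range(count):
        t = n / SAMPLE_RATE
        phase = 2 * math.pi * carrier_hz * t
        wobble = index * math.sin(2 * math.pi * modulator_hz * t)
        tone.append(math.sin(phase + wobble))
    return tone


def repitch_sample(recording, semitones):
    """Sampling: reuse a recording, reading it faster or slower to change pitch."""
    ratio = 2 ** (semitones / 12)
    out = []
    pos = 0.0
    while pos < len(recording) - 1:
        i = int(pos)
        frac = pos - i
        # Linear interpolation between neighbouring recorded samples.
        out.append(recording[i] * (1 - frac) + recording[i + 1] * frac)
        pos += ratio
    return out


# Stand-in for a recorded note; a real sampler would load audio from a file.
recorded_note = fm_tone(220.0, 220.0, 1.0, 0.1)
octave_up = repitch_sample(recorded_note, 12)
print(len(recorded_note), len(octave_up))
```

Raising the recording by an octave reads through it twice as fast, so the result is half as long. That side effect is one reason samplers record many notes rather than stretching a single one across the whole keyboard.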
The old-fashioned speech synthesizer uses no human speech samples, while most text-to-speech engines today do indeed use exclusively human speech samples. That's why today's voices sound more realistic and human; they're fashioned from recordings of human beings speaking different words or parts of words, from which the speech engine constructs its vocabulary libraries.
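As a rough sketch of how an engine might assemble speech from recorded pieces, the Python below keeps a small library of recorded units and joins them to cover a word, always taking the longest unit that fits. This is a simplification under stated assumptions: real engines work with phonemes or similar sound units rather than spelling, and the unit names, the placeholder sample values, and the `synthesize` function are hypothetical.

```python
def synthesize(text, units):
    """Cover `text` with recorded units, preferring the longest match each time."""
    pieces = []
    audio = []
    i = 0
    while i < len(text):
        # Try the longest remaining stretch first, then shorter ones.
        for j in range(len(text), i, -1):
            chunk = text[i:j]
            if chunk in units:
                pieces.append(chunk)
                audio.extend(units[chunk])
                i = j
                break
        else:
            raise KeyError(f"no recorded unit covers {text[i:]!r}")
    return pieces, audio


# Placeholder "recordings": short lists of numbers standing in for audio samples.
unit_library = {
    "syn": [0.1, 0.2],
    "the": [0.3],
    "sis": [0.4, 0.5],
    "s": [0.9],
}

pieces, audio = synthesize("synthesis", unit_library)
print(pieces)
print(audio)
```

The more units the library holds, the fewer joins are needed, which helps explain why recording a sampled voice takes so much source material.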
As a side note, this human speech sampling and modeling technology is at the point where one can theoretically make a speech engine from anyone's voice, which has produced some unintended byproducts. It is now possible to create convincing audio recordings of people allegedly saying things they never actually said. This is done by sampling enough of their recorded speech to formulate a lexicon not only of vocabulary but, more importantly, of their vocal inflections: the rises, falls, breaths and pauses in their speech.
With this modeling technology, we soon will not know for certain whether people have actually said what we've heard them say on audio recordings or videos.
So, there you have it: a little primer on synthesis and sampled sound.


Orlando Enrique Fiol
Ph.D. in Music theory
University of Pennsylvania: November, 2018
Professional Pianist/Keyboardist, Percussionist and Pedagogue
Charlotte, North Carolina


JM Casey
 

Cool writeup/analysis. I've no doubt we will get there, but I don't think
we're there yet. I've heard a few top-of-the-line commercial voice
synthesisers, and to me they still haven't quite grasped the inflection and
intonations of the human voice. But they're getting eerily close. So, in
time. And of course, all our ears are different, too, and this "uncanny
valley" aspect is probably already nonexistent for some people.



David Diamond
 

Funny, because some prefer Eloquence over the RealSpeak voices in JAWS. The person who recorded the Australian voice for JAWS said she had a huge manuscript the size of a phone book to record. Also, the Texas version of U.S. English had slight variations. For me, the word "motor" sounded like "murder." It could have been my hearing disability, though.

 

Brian Vogel
 

On Sun, Sep 20, 2020 at 11:20 PM, JM Casey wrote:
and this "uncanny valley" aspect is probably already nonexistent for some people.
-
I'd be one of those people, at least for certain voices under certain synthesizers.

It also really depends on just precisely what is being said.  There are voices that, to me, are "virtual perfection" in mimicking human speech until you get to one specific word that's seldom used or an inflection.  But even then, what sounds "normal" to me may very well sound "weird" to someone else.  One experiences that sensation quite often when listening to different human speakers.  (And I'm ignoring "as a second language" issues and regional accents for that sensation.)
 
--

Brian - Windows 10 Pro, 64-Bit, Version 2004, Build 19041  

The purpose of education is not to validate ignorance but to overcome it.
       ~ Lawrence Krauss


David Diamond
 

I was chatting with someone from New Zealand, and she told me some of her compatriots were mimicking the U.S. accent. So it is not just the screen reader voices; it is different nations' voices too. For example, according to the woman in N.Z., Canadians and people in the United States apparently say "aboot" instead of "about."

 



Richard Turner
 

Sorry, but people in the United States do not say “aboot” unless they happen to live very close to the Canadian border.

I’m not sure why that is, but the vast majority of people here in the U.S. say “about,” not “aboot.”

In fact, most U.S. natives make fun of Canadians for saying “aboot.”

 

 

 

Richard

"He that cannot forgive others breaks the bridge over which he must pass himself,” and we forget that only grace can break the cycle of ancient hatreds among peoples. (It is notable that while I have regretted not granting grace to others, I’ve never once regretted extending it.)" - Edward Herbert

 



JM Casey
 

Hahah…it’s all relative; Canadians don’t say “aboot” either.

 

 

 



JM Casey
 

I can tell you two reasons off the top of my head why many might prefer
Eloquence.
1. Its pronunciation of any English word, at least in the American variant,
is basically perfect.
2. It is really much better at fast speeds than any of the sampled voices.
These more human-sounding voices were not meant to be used at the fast rates
at which many blind people listen to synthesised speech. Speeding them up
makes the samples sound like a jumbled mess. Nevertheless, I do know some
people who still listen to modern human-derived synthesised voices at
fast(er) speeds.

 

Brian Vogel
 

On Mon, Sep 21, 2020 at 06:10 PM, JM Casey wrote:
These more human sounding voices were not meant to be used at the fast rates many blind people listen to synthesised speech.
-
And knowing some of those blind people, I still cannot comprehend how they comprehend what they're hearing.  Clearly they do, but my head (auditory processing, in particular) reels at the speech rate that some of my clients routinely use for themselves.  I have on more than one occasion had to ask someone I was tutoring on something new to them in the screen reader to greatly reduce the speed so that I could be sure that what I expected to hear was what I was indeed hearing!
 


Loy
 


After 20 years with Eloquence, I still prefer it over the human-sounding voices for screen reading. I have used some of the human-sounding voices for reading books at normal speed, and they are getting better.



David Diamond
 

JM Casey wrote: "These more human sounding voices were not meant to be used at the fast rates many blind people listen to synthesised speech." This is the exact reason why some blind persons, though not me, prefer Eloquence over the more human-sounding voices. For me, going from sped-up Eloquence speech to a person talking to me, such as a family member, is the equivalent of going 50 miles per hour, then slamming on the brakes and going in reverse. Sorry if that does not make sense. I equate it to brain whiplash.

 

 



Maria Campbell
 

Agree about Eloquence still being the best for me, though synths are getting better.


Maria Campbell
lucky1inct@...

All that is necessary for evil to triumph is for good people to do nothing.
--Edmund Burke


David Diamond
 

I suspect she was just listening to the wrong person, or someone was pulling her leg. It is just like how Canadians are supposed to say “eh” all the time. At one guide dog school, since I was the only Canadian there, I said, “Only low-class Canadians say ‘eh.’”

 



Glenn / Lenny
 


I'm in the U.S., and I've never even heard that used before.
I live in the Midwest.
Glenn

----- Original Message -----
From: JM Casey
Sent: Monday, September 21, 2020 4:56 PM
Subject: Re: synthesizer versus voice

Hahah…it’s all relative; Canadians don’t say “aboot” either.

 

 

 



Glenn / Lenny
 

I do like Eloquence for the reasons you state, and also, I can have some
privacy without headphones, as most non-screenreader users pass it off as
noise.

----- Original Message -----
From: "JM Casey" <jmcasey@...>
To: <main@jfw.groups.io>
Sent: Monday, September 21, 2020 5:10 PM
Subject: Re: synthesizer versus voice


I can tell you two reasons off the top of my head why many might prefer
Eloquence.
1. Its pronunciation of any English word, at least in the American variant,
is basically perfect.
2. It is really much better at fast speeds than any of the sampled voices.
These more human-sounding voices were not meant to be used at the fast rates
at which many blind people listen to synthesised speech. High rates make the
samples sound like a jumbled mess. Nevertheless, I do know some people who
still listen to modern human-derived synthesised voices at fast(er) speeds.



-----Original Message-----
From: main@jfw.groups.io <main@jfw.groups.io> On Behalf Of David Diamond
Sent: September 21, 2020 12:13 AM
To: main@jfw.groups.io
Subject: Re: synthesizer versus voice

Funny, because some prefer Eloquence over RealSpeak from JAWS. The person
who did the Australian voice for JAWS said she had a huge manuscript the
size of a phone book to record. Also, the Texas version of U.S. English had
slight variations. For me, the word "motor" sounded like "murder." It could
have been my hearing disability, though.

-----Original Message-----
From: main@jfw.groups.io <main@jfw.groups.io> On Behalf Of JM Casey
Sent: September 20, 2020 8:20 PM
To: main@jfw.groups.io
Subject: Re: synthesizer versus voice

Cool writeup/analysis. I've no doubt we will get there, but I don't
think we're there yet -- I've heard a few top-of-the-line commercial
voice synthesisers and to me they still haven't quite grasped the
inflection and intonations of the human voice. But they're getting
eerily close. So... in time. And of course, all our ears are different,
too, and this "uncanny valley" aspect is probably already nonexistent for
some people.




Pastor Gilbert Pries
 

I still like my DECtalk USB.

Pastor Gil


Pastor Gilbert Pries
 

Eloquence sounds like it has a cold sometimes.

 

Pastor Gil

 

From: main@jfw.groups.io <main@jfw.groups.io> On Behalf Of Brian Vogel
Sent: Monday, September 21, 2020 3:32 PM
To: main@jfw.groups.io
Subject: Re: synthesizer versus voice

 

On Mon, Sep 21, 2020 at 06:10 PM, JM Casey wrote:

These more human sounding voices were not meant to be used at the fast rates many blind people listen to synthesised speech.

-
And knowing some of those blind people, I still cannot comprehend how they comprehend what they're hearing.  Clearly they do, but my head (auditory processing, in particular) reels at the speech rate that some of my clients routinely use for themselves.  I have on more than one occasion had to ask someone I was tutoring on something new to them in the screen reader to greatly reduce the speed so that I could be sure that what I expected to hear was what I was indeed hearing!
 
--

Brian - Windows 10 Pro, 64-Bit, Version 2004, Build 19041  

The purpose of education is not to validate ignorance but to overcome it.
       ~ Lawrence Krauss


Pastor Gilbert Pries
 

I like my DECtalk.

I've used it for years.

 

Pastor Gil

 

From: main@jfw.groups.io <main@jfw.groups.io> On Behalf Of Loy
Sent: Monday, September 21, 2020 3:49 PM
To: main@jfw.groups.io
Subject: Re: synthesizer versus voice

 

After 20 years with Eloquence, I still prefer it over the human-sounding voices for screen reader use. I have used some of the human-sounding voices for reading books at a normal speed, and they are getting better.
