synthesizer versus voice


David Diamond
 

There is an actress in the U.S. who, even when speaking French in a movie where she was supposed to be from France, still sounded like an American trying to speak French. The words themselves were correct, just not the proper pronunciation of the words. It is somewhat like how some languages roll their Rs in some words: if a speaker does not roll them, you can still tell what the person is trying to say, they are just not rolling their Rs the way that culture and language does.

With most of the Eloquence voices, such as Rocko, Grandma, etc., all that is being done is that the pitch is being changed.

From: main@jfw.groups.io <main@jfw.groups.io> On Behalf Of JM Casey
Sent: September 22, 2020 12:21 PM
To: main@jfw.groups.io
Subject: Re: synthesizer versus voice

 

Not saying it is, of course, but if it *was* your own regional accent that was being talked about, you wouldn’t really be aware of it and how it sounds to others, as to you it’d just be the default state of speaking. When people from other parts talk about the accent they hear, and especially attempt to imitate what they are hearing, what comes out tends to be an exaggeration or caricature. This is why in early drama school and such they tell you not to try putting on accents when attempting to play a character. Some people do get really good at it (see Peter Sellers for instance, though as a comic actor exaggeration was also one of his things).

 

From: main@jfw.groups.io <main@jfw.groups.io> On Behalf Of Glenn / Lenny
Sent: September 21, 2020 8:11 PM
To: main@jfw.groups.io
Subject: Re: synthesizer versus voice

 

I'm in the U.S. and I've never even heard that used before.

I live in the mid-west.

Glenn

----- Original Message -----

From: JM Casey

Sent: Monday, September 21, 2020 4:56 PM

Subject: Re: synthesizer versus voice

 

Hahah…it’s all relative; Canadians don’t say “aboot” either.

 

 

 

From: main@jfw.groups.io <main@jfw.groups.io> On Behalf Of Richard Turner
Sent: September 21, 2020 5:15 PM
To: main@jfw.groups.io
Subject: Re: synthesizer versus voice

 

Sorry, but people in the United States do not say “aboot” unless they happen to live very close to the Canadian border.

I’m not sure why that is, but the vast majority of people here in the U.S. say about, not aboot.

 

In fact, most U.S. natives make fun of the Canadians for saying aboot.

 

 

 

Richard

"He that cannot forgive others breaks the bridge over which he must pass himself,” and we forget that only grace can break the cycle of ancient hatreds among peoples. (It is notable that while I have regretted not granting grace to others, I’ve never once regretted extending it.)" - Edward Herbert

 

From: main@jfw.groups.io <main@jfw.groups.io> On Behalf Of David Diamond
Sent: Monday, September 21, 2020 1:14 PM
To: main@jfw.groups.io
Subject: Re: synthesizer versus voice

 

I was chatting with someone from New Zealand and she told me some of her compatriots were mimicking the U.S. accent. Thus it is not just the screen reader voices, it is different nations' voices. For example, apparently Canadians and United States persons say aboot instead of about, according to the woman in N.Z.

 

From: main@jfw.groups.io <main@jfw.groups.io> On Behalf Of Brian Vogel
Sent: September 21, 2020 9:26 AM
To: main@jfw.groups.io
Subject: Re: synthesizer versus voice

 

On Sun, Sep 20, 2020 at 11:20 PM, JM Casey wrote:

and this "uncanny valley" aspect is probably already nonexistent for some people.

-
I'd be one of those people, at least for certain voices under certain synthesizers.

It also really depends on just precisely what is being said.  There are voices that, to me, are "virtual perfection" in mimicking human speech until you get to one specific word that's seldom used or an inflection.  But even then, what sounds "normal" to me may very well sound "weird" to someone else.  One experiences that sensation quite often when listening to different human speakers.  (And I'm ignoring "as a second language" issues and regional accents for that sensation.)
 
--

Brian - Windows 10 Pro, 64-Bit, Version 2004, Build 19041  

The purpose of education is not to validate ignorance but to overcome it.
       ~ Lawrence Krauss


JM Casey
 

I hear that. I’m not one of those people, but if I’m using Eloquence I still set the rate at about 50%, which is too fast for most non-screen-reader users to understand. I probably won’t go faster as I have some hearing issues…everything always sounds loud enough, but deciphering fast speech, certain types of accents, or having conversations in a crowded place where everyone is talking can be difficult for me, and many go way faster than I do when it comes to their speech output. I use braille most of the time now and only turn on speech once in a while.

 

 

 

From: main@jfw.groups.io <main@jfw.groups.io> On Behalf Of Brian Vogel
Sent: September 21, 2020 6:32 PM
To: main@jfw.groups.io
Subject: Re: synthesizer versus voice

 

On Mon, Sep 21, 2020 at 06:10 PM, JM Casey wrote:

These more human sounding voices were not meant to be used at the fast rates many blind people listen to synthesised speech.

-
And knowing some of those blind people, I still cannot comprehend how they comprehend what they're hearing.  Clearly they do, but my head (auditory processing, in particular) reels at the speech rate that some of my clients routinely use for themselves.  I have on more than one occasion had to ask someone I was tutoring on something new to them in the screen reader to greatly reduce the speed so that I could be sure that what I expected to hear was what I was indeed hearing!
 
--

Brian - Windows 10 Pro, 64-Bit, Version 2004, Build 19041  

The purpose of education is not to validate ignorance but to overcome it.
       ~ Lawrence Krauss


JM Casey
 

Not saying it is, of course, but if it *was* your own regional accent that was being talked about, you wouldn’t really be aware of it and how it sounds to others, as to you it’d just be the default state of speaking. When people from other parts talk about the accent they hear, and especially attempt to imitate what they are hearing, what comes out tends to be an exaggeration or caricature. This is why in early drama school and such they tell you not to try putting on accents when attempting to play a character. Some people do get really good at it (see Peter Sellers for instance, though as a comic actor exaggeration was also one of his things).

 



Loy
 


The female voice used by Bookshare audio is pretty good.

----- Original Message -----
From: JM Casey
Sent: Tuesday, September 22, 2020 1:52 PM
Subject: Re: synthesizer versus voice

I know that the Daniel UK voice is created from the “blueprints” of an actor’s voice, Jonathan somethingorother (can’t recall).

 

For a good time, try the Kate (UK) voice and write out “Hey! How’s it going?” or something like that.

She sounds so enthusiastic she squeaks…

You have to get the exclamation mark in there after a single preparatory word though.

 

From: main@jfw.groups.io <main@jfw.groups.io> On Behalf Of David Diamond
Sent: September 22, 2020 1:10 PM
To: main@jfw.groups.io
Subject: Re: synthesizer versus voice

 

Some of the voices recorded by humans sound a little off too. One of the female U.S. voices sounds like she is having a bad day all the time and needs to chill out, as they say. I guess it all comes down to what you prefer. I heard an interview with the woman who does the Australian voice, Karen, and she went into great detail as to what was involved in creating that screen reader voice for JAWS or iOS devices. The voice was named after her real name, Karen.

 

From: main@jfw.groups.io <main@jfw.groups.io> On Behalf Of Pastor Gil Pries
Sent: September 21, 2020 3:44 PM
To: main@jfw.groups.io
Subject: Re: synthesizer versus voice

 

Eloquence sounds like it has a cold sometimes.

 

Pastor Gil

 



JM Casey
 

I know that the Daniel UK voice is created from the “blueprints” of an actor’s voice, Jonathan somethingorother (can’t recall).

 

For a good time, try the Kate (UK) voice and write out “Hey! How’s it going?” or something like that.

She sounds so enthusiastic she squeaks…

You have to get the exclamation mark in there after a single preparatory word though.

 



JM Casey
 

Yeah…accents kind of ebb and flow and they don’t really regard physical borders, but are influenced by everything in the cultural environment. I happen to think people just across the lake in Rochester NY (Raachster) sound really different/distinctive, but to my friend in Missouri, they sound “Canadian”, too. And of course we have many Canadian regional accents as well, just in the English-speaking areas…Maritime (Newfoundland in particular) being perhaps the most instantly recognisable.

 

Shame about your UK friend though. Jeez…

From: main@jfw.groups.io <main@jfw.groups.io> On Behalf Of David Diamond
Sent: September 22, 2020 1:15 PM
To: main@jfw.groups.io
Subject: Re: synthesizer versus voice

 

I think it is the way they are hearing words with “ou” in them. Last night she was telling me about a U.S. gentleman who pronounced house as huse. Thus, she wanted me to pronounce it the same way. People are quirky at times. A lady in the UK told me she did not want to talk to me via the phone because she got hurt by a man in the U.S. and couldn’t stand to hear another U.S. voice. I always thought Canadians had a different sounding voice than people in the U.S.?

 



David Diamond
 

I think it is the way they are hearing words with “ou” in them. Last night she was telling me about a U.S. gentleman who pronounced house as huse. Thus, she wanted me to pronounce it the same way. People are quirky at times. A lady in the UK told me she did not want to talk to me via the phone because she got hurt by a man in the U.S. and couldn’t stand to hear another U.S. voice. I always thought Canadians had a different sounding voice than people in the U.S.?

 



Glenn / Lenny
 


Sometimes, as odd as it may sound, people's voice sounds right for their name.
Just as sometimes people's appearance matches their name.
 



David Diamond
 

Some of the voices recorded by humans sound a little off too. One of the female U.S. voices sounds like she is having a bad day all the time and needs to chill out, as they say. I guess it all comes down to what you prefer. I heard an interview with the woman who does the Australian voice, Karen, and she went into great detail as to what was involved in creating that screen reader voice for JAWS or iOS devices. The voice was named after her real name, Karen.

 



Pastor Gilbert Pries
 

I like my DECtalk.

I've used it for years.

 

Pastor Gil

 

From: main@jfw.groups.io <main@jfw.groups.io> On Behalf Of Loy
Sent: Monday, September 21, 2020 3:49 PM
To: main@jfw.groups.io
Subject: Re: synthesizer versus voice

 

After 20 years with Eloquence, I still prefer it over the human-sounding voices for screen reader use. I have used some of the human-sounding voices for reading books at a normal speed and they are getting better.



Pastor Gilbert Pries
 

Eloquence sounds like it has a cold sometimes.

 

Pastor Gil

 



Pastor Gilbert Pries
 

I still like my DECtalk USB.

Pastor Gil

-----Original Message-----
From: main@jfw.groups.io <main@jfw.groups.io> On Behalf Of JM Casey
Sent: Monday, September 21, 2020 3:10 PM
To: main@jfw.groups.io
Subject: Re: synthesizer versus voice

I can tell you two reasons off the top of my head why many might prefer
Eloquence.
1. Its pronunciation of any English word, at least in the American variant,
is basically perfect.
2. It is really much better at fast speed than any of the sampled voices.
These more human-sounding voices were not meant to be used at the fast rates
at which many blind people listen to synthesised speech. It makes the samples
sound like a jumbled mess. Nevertheless I do know some people who still
listen to modern human-derived synthesised voices at fast(er) speeds.



-----Original Message-----
From: main@jfw.groups.io <main@jfw.groups.io> On Behalf Of David Diamond
Sent: September 21, 2020 12:13 AM
To: main@jfw.groups.io
Subject: Re: synthesizer versus voice

Funny, because some prefer Eloquence over RealSpeak from JAWS. The person
who did the Australian voice for JAWS said she had a huge manuscript the
size of a phone book to record. Also, the Texas version of U.S. English had
slight variations. For me, the word motor sounded like murder. It could
have been my hearing disability though.

-----Original Message-----
From: main@jfw.groups.io <main@jfw.groups.io> On Behalf Of JM Casey
Sent: September 20, 2020 8:20 PM
To: main@jfw.groups.io
Subject: Re: synthesizer versus voice

Cool writeup/analysis. I've no doubt we will get there, but I don't
think we're there yet -- I've heard a few top-of-the-line commercial
voice synthesisers and to me they still haven't quite grasped the
inflection and intonations of the human voice. But they're getting
eerily close. So... in time. And of course, all our ears are different,
too, and this "uncanny valley" aspect is probably already nonexistent
for some people.



-----Original Message-----
From: main@jfw.groups.io <main@jfw.groups.io> On Behalf Of Orlando
Enrique Fiol via groups.io
Sent: September 20, 2020 11:10 PM
To: main@jfw.groups.io
Subject: Re: synthesizer versus voice

At 09:00 PM 9/20/2020, Mark asked:
>what's the difference between a synthesizer and a voice?

A synthesizer uses electronic processes to fashion complex timbres
from acoustic or electronic sound sources. For example, a triangle
wave may be combined with clarinet samples to produce a "synthesized"
clarinet.
However, I suspect your question pertains to our text-to-speech engines.
There, the distinction between speech synthesizer and voice operates
on two levels. The synthesizer is the speech engine as a whole, while
individual voices (such as male, female, child, etc.) can be chosen.
On a deeper level, though, the difference between synthesizer and
voice rests in the sources for phonemes used by a text-to-speech
engine. With purely synthesized speech, human speech is electronically
modeled, just as digital FM synthesizers such as the Yamaha DX7
attempted to create acoustic-sounding timbres using electronic sources
rather than actual samples. There's a vital difference between trying
to make an electronic keyboard sound like a violin or banjo, and
actually recording single notes on violin or banjo in order to spread
them out across the keyboard.
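
To make that contrast concrete, here is a minimal Python sketch. It is an illustration only and is not taken from any real instrument or speech engine. `fm_tone` builds a tone purely from sine-wave arithmetic, in the manner of two-operator FM synthesis, while `spread_sample` takes one recorded note and resamples it to other pitches, which is roughly what spreading a sample across a keyboard means. The function names, the 8000 Hz sample rate and the crude nearest-neighbour resampling are all assumptions made for the example.

```python
import math


def fm_tone(carrier_hz, modulator_hz, mod_index, duration_s, sample_rate=8000):
    """Purely electronic source: a two-operator FM tone as floats in [-1, 1].

    Hypothetical helper for illustration; no recording is involved.
    """
    n = int(duration_s * sample_rate)
    samples = []
    for i in range(n):
        t = i / sample_rate
        # The modulator wobbles the carrier's phase, which adds overtones.
        phase = (2 * math.pi * carrier_hz * t
                 + mod_index * math.sin(2 * math.pi * modulator_hz * t))
        samples.append(math.sin(phase))
    return samples


def spread_sample(recorded, semitones):
    """Sample-based source: re-pitch one recorded note by resampling it.

    `recorded` stands in for a real recording (assumption: a list of floats).
    Reading the recording faster raises the pitch and shortens the note.
    """
    ratio = 2 ** (semitones / 12)
    out_len = int(len(recorded) / ratio)
    last = len(recorded) - 1
    return [recorded[min(int(i * ratio), last)] for i in range(out_len)]
```

Calling `fm_tone(440.0, 220.0, 2.0, 0.5)` produces half a second of tone from nothing but arithmetic, whereas `spread_sample(note, 12)` needs an existing recording `note` and returns it one octave higher and half as long.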
The old-fashioned speech synthesizer uses no human speech samples,
while most text-to-speech engines today do indeed use exclusively
human speech samples. That's why today's voices sound more realistic
and human; they're fashioned from recordings of human beings speaking
different words or parts of words, from which the speech engine
constructs its vocabulary libraries.
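
As a rough sketch of how an engine might build speech from recorded units, the Python below joins entries from a small unit library with a short crossfade at each seam. This is an assumption-laden illustration, not the method of any particular product: the names `crossfade_join` and `speak`, the linear crossfade, and a library made of plain lists of numbers standing in for recordings are all hypothetical.

```python
def crossfade_join(a, b, overlap):
    """Join two sample lists, blending the tail of `a` into the head of `b`."""
    overlap = min(overlap, len(a), len(b))
    if overlap == 0:
        return a + b
    blended = []
    for i in range(overlap):
        # Weight moves from mostly `a` to mostly `b` across the seam.
        w = (i + 1) / (overlap + 1)
        blended.append(a[len(a) - overlap + i] * (1 - w) + b[i] * w)
    return a[:len(a) - overlap] + blended + b[overlap:]


def speak(unit_names, unit_library, overlap=2):
    """Look up each named unit (a word or part of a word) and join them in order.

    `unit_library` maps a unit name to its recorded samples (assumption:
    lists of floats). The library itself is never modified.
    """
    out = []
    for name in unit_names:
        out = crossfade_join(out, unit_library[name], overlap)
    return out
```

With a library such as `{"hel": [...], "lo": [...]}`, `speak(["hel", "lo"], library)` returns one continuous list of samples. The seams between units are one place where such voices can start to sound rough, particularly once the result is sped up.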
As a sidenote, this human speech sampling and modeling technology is
at the point where one can theoretically make a speech engine from
anyone's voice, which has produced some unintended byproducts. It is
now possible to create convincing audio recordings of people allegedly
saying things they never actually said. This is done by sampling
enough of their recorded speech to formulate a lexicon not only of
vocabulary, but more important, of their vocal inflections, the rises,
falls, breaths and pauses in their speech.
With this modeling technology, we soon will not know for certain
whether people have actually said what we've heard them say on audio
recordings or videos.
So, there you have it: a little primer on synthesis and sampled sound.


Orlando Enrique Fiol
Ph.D. in Music theory
University of Pennsylvania: November, 2018
Professional Pianist/Keyboardist, Percussionist and Pedagogue
Charlotte, North Carolina










Glenn / Lenny
 

I do like Eloquence for the reasons you state, and also, I can have some
privacy without headphones, as most non-screenreader users pass it off as
noise.

----- Original Message -----
From: "JM Casey" <jmcasey@teksavvy.com>
To: <main@jfw.groups.io>
Sent: Monday, September 21, 2020 5:10 PM
Subject: Re: synthesizer versus voice


I can tell you two reasons off the top of my head why many might prefer
Eloquence.
1. Its pronunciation of any English word, at least in the American variant, is
basically perfect.
2. It is really much better at fast speed than any of the sampled voices.
These more human sounding voices were not meant to be used at the fast rates
many blind people listen to synthesised speech. It makes the samples sound like a
jumbled mess. Nevertheless I do know some people who still listen to modern
human-derived synthesised voices at fast(er) speeds.




Glenn / Lenny
 


I'm in the U.S. and I've never even heard that used before.
I live in the mid-west.
Glenn

----- Original Message -----
From: JM Casey
Sent: Monday, September 21, 2020 4:56 PM
Subject: Re: synthesizer versus voice

Hahah…it’s all relative; Canadians don’t say “aboot” either.

 

 

 

From: main@jfw.groups.io <main@jfw.groups.io> On Behalf Of Richard Turner
Sent: September 21, 2020 5:15 PM
To: main@jfw.groups.io
Subject: Re: synthesizer versus voice

 

Sorry, but people in the United States do not say “aboot” unless they happen to live very close to the Canadian border.

I’m not sure why that is, but the vast majority of people here in the U.S. say about, not aboot.

 

In fact, most U.S. natives make fun of the Canadians for saying aboot.

 

 

 

Richard

"He that cannot forgive others breaks the bridge over which he must pass himself,” and we forget that only grace can break the cycle of ancient hatreds among peoples. (It is notable that while I have regretted not granting grace to others, I’ve never once regretted extending it.)" - Edward Herbert

 

From: main@jfw.groups.io <main@jfw.groups.io> On Behalf Of David Diamond
Sent: Monday, September 21, 2020 1:14 PM
To: main@jfw.groups.io
Subject: Re: synthesizer versus voice

 

I was chatting with someone from New Zealand and she told me some of her compatriots were mimicking the U.S. accent. Thus it is not just the screen reader voices, it is different nations' voices. For example, according to the woman in N.Z., Canadians and people in the United States apparently say aboot instead of about.

 

From: main@jfw.groups.io <main@jfw.groups.io> On Behalf Of Brian Vogel
Sent: September 21, 2020 9:26 AM
To: main@jfw.groups.io
Subject: Re: synthesizer versus voice

 

On Sun, Sep 20, 2020 at 11:20 PM, JM Casey wrote:

and this "uncanny valley" aspect is probably already nonexistent for some people.

-
I'd be one of those people, at least for certain voices under certain synthesizers.

It also really depends on just precisely what is being said.  There are voices that, to me, are "virtual perfection" in mimicking human speech until you get to one specific word that's seldom used or an inflection.  But even then, what sounds "normal" to me may very well sound "weird" to someone else.  One experiences that sensation quite often when listening to different human speakers.  (And I'm ignoring "as a second language" issues and regional accents for that sensation.)
 
--

Brian - Windows 10 Pro, 64-Bit, Version 2004, Build 19041  

The purpose of education is not to validate ignorance but to overcome it.
       ~ Lawrence Krauss


David Diamond
 

I suspect she was just listening to the wrong person or someone was pulling her leg. It is just like the way Canadians are supposed to say "eh" all the time. At one guide dog school, since I was the only Canadian there, I said, “Only low class Canadians say eh.”

 



Maria Campbell
 

Agree about Eloquence still being the best for me, though synths are getting better.


Maria Campbell
lucky1inct@...

All that is necessary for evil to triumph is for good people to do nothing.
--Edmund Burke
On 9/21/2020 6:49 PM, Loy wrote:


After 20 years with Eloquence, I still prefer it over the human sounding voices for screen reader. I have used some of the human sounding voices for reading books at a normal speed and they are getting better.


David Diamond
 

“These more human sounding voices were not meant to be used at the fast rates many blind people listen to synthesised speech.” This is the exact reason why some blind persons, though not me, prefer Eloquence over the more human sounding voices. For me, listening to sped-up speech via Eloquence and then to a person talking to me, such as a family member, is the equivalent of going 50 miles per hour, then slamming on the brake and going into reverse. Sorry if that does not make sense. I equate it to brain whiplash.

 

 

From: main@jfw.groups.io <main@jfw.groups.io> On Behalf Of Brian Vogel
Sent: September 21, 2020 3:32 PM
To: main@jfw.groups.io
Subject: Re: synthesizer versus voice

 

On Mon, Sep 21, 2020 at 06:10 PM, JM Casey wrote:

These more human sounding voices were not meant to be used at the fast rates many blind people listen to synthesised speech.

-
And knowing some of those blind people, I still cannot comprehend how they comprehend what they're hearing.  Clearly they do, but my head (auditory processing, in particular) reels at the speech rate that some of my clients routinely use for themselves.  I have on more than one occasion had to ask someone I was tutoring on something new to them in the screen reader to greatly reduce the speed so that I could be sure that what I expected to hear was what I was indeed hearing!
 
--

Brian - Windows 10 Pro, 64-Bit, Version 2004, Build 19041  

The purpose of education is not to validate ignorance but to overcome it.
       ~ Lawrence Krauss


Loy
 


After 20 years with Eloquence, I still prefer it over the human sounding voices for screen reader. I have used some of the human sounding voices for reading books at a normal speed and they are getting better.



 

On Mon, Sep 21, 2020 at 06:10 PM, JM Casey wrote:
These more human sounding voices were not meant to be used at the fast rates many blind people listen to synthesised speech.
-
And knowing some of those blind people, I still cannot comprehend how they comprehend what they're hearing.  Clearly they do, but my head (auditory processing, in particular) reels at the speech rate that some of my clients routinely use for themselves.  I have on more than one occasion had to ask someone I was tutoring on something new to them in the screen reader to greatly reduce the speed so that I could be sure that what I expected to hear was what I was indeed hearing!
 
--

Brian - Windows 10 Pro, 64-Bit, Version 2004, Build 19041  

The purpose of education is not to validate ignorance but to overcome it.
       ~ Lawrence Krauss


JM Casey
 

I can tell you two reasons off the top of my head why many might prefer
Eloquence.
1. Its pronunciation of any English word, at least in the American variant, is
basically perfect.
2. It is really much better at fast speed than any of the sampled voices.
These more human sounding voices were not meant to be used at the fast rates
many blind people listen to synthesised speech. It makes the samples sound like a
jumbled mess. Nevertheless I do know some people who still listen to modern
human-derived synthesised voices at fast(er) speeds.
