Re: Linux Voice to Text

From: "Ben Houston" <>
Date: Fri, 4 Jun 1999 16:02:28 -0400

Tristram,

Cool idea.  Although, it is slower for a human to translate phonemes into
coherent words than it is just to read correctly spelled words -- if you
understand what I mean.  I bet you can read this phrase
better th an u c an r ee d th es w un.  But that fact aside...

Did you know that spectrographs are already used to help teach deaf people
to speak?  The spectrograph gives a form of visual feedback to a speaker.
A deaf person can see how they are speaking and compare it against another
speaker in the visual domain.

I once did a presentation in my linguistics class on how speech recognition
systems work.  Afterwards my professors suggested that speech recognition
systems could be used in foreign language instruction.  It sounds too
logical not to exist in a research lab -- thus I bet there is already
research in this domain.

Cheers,
-ben houston
http://chat.carleton.ca/~bhouston

----- Original Message -----
From: Tristram W. Metcalfe III <>
To: <>; <>;
<>; <>
Sent: Friday, June 04, 1999 2:35 AM
Subject: Re: Linux Voice to Text

I hope this has the potential to SIMPLIFY your key question <voice to text>;
additionally, there may be benefits in lightening the RAM load, etc., in
voice.

What if (inserting my ignorance here), within the first step in the Speech
To Text task, you were to add *Words* to the IBM / Linux / other? database
that were simply the *Phonemes*: the basic 44+ new *Words* we use all the
time, which are already in the front end of the recognition phase?

If so, then the actual human voice recognition speed might go (way?) up
while offering a "true data record" (for any later retrieval apps) of what
was actually said, avoiding the misspelled, misconstrued, misjudged
"recognition" that slows and frustrates the state-of-the-art STT 'training'
in ASR software.

My short presentation to ddlinux of "Phonetic Text Display" ran into a
problem pointed out by Sean Wheeler at MIT Speech Interfaces (attached in
case not on list), where he said phoneme recognition is improved by bundling
phonemes into larger words: context, or the best-voiced (most easily
recognized) phonemes, may then lead to a better grasp of a particular
unrecognized phoneme, i.e. 'separate phonemes may be harder to recognize'.
This Phonetic Text Display should not slow or disable the current word
recognition process, whose output, once done, is displayed (as per the
user's wishes).

My (vacuous?) thoughts are that basically this may not be that large an
issue: we could simply insert generic blank text phoneme/words, similar to
flat-grey blanks in closed captioning. These P.T.D. blanks could
additionally show (prior to their true recognition) volume, pitch, timbre
and stress, displaying 'real time'ing pauses etc., aiding the user's guess,
while the better-spelled recognition happens down the ('split second') road,
where context recognition can edit the real-time text that is happening up
front. This leaves a TRUE record of what was actually said.

This all would greatly benefit a particular, very specific community of
millions of users, if it could work in real time (using friendly wc/hmds!!).
The deaf and hard of hearing would then have access to the total world of
actual, truly accented spoken human languages.

This is additionally of great value to those who have been largely cut off
from the human sound stream, thus getting no instant feedback to
self-correct one's voice. Lip reading and signing are the best means of
communication to date, but real-time subjective voice feedback would allow
their own voice to stay fine-tuned for their objective speech to and from
the rest of the world.

The hope is that a 'streamlined?' IBM voice I/O such as this could benefit
all. The reading of phonetic text may have other voice and language teaching
benefits. It would help the speed and/or reading by aiding separation of
phonetic sound events, adding characters to the alphabet as found in
doubled-up letters, i.e. the familiar <ae> we see written as <one joined
character> (not possible in html?).

Enough rant,
Sorry if this is too unfeasible,

h ah v ah g oo d w ee k e n d     l oo k s l ie k i t s s s s
n ah t g u n ah r ae n
<seconds.later>
have a good weekend (it) looks like its not going to rain!

The learning curve on reading it could be quick if you can't wait for the
state of the art to catch up to you!
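
The two-stage display above (phonemes shown at once, words a moment later)
can be sketched roughly as follows. This is only an illustration under
stated assumptions: the tiny LEXICON, the blank marker, and the function
names are all hypothetical, the phoneme spellings copy the informal notation
of the example rather than any standard set, and the greedy longest-match
lookup is a stand-in for whatever context model a real recognizer would use.

```python
# HYPOTHETICAL phoneme-to-word table, written in the informal notation of
# the example above. A real system would use a full pronunciation lexicon.
LEXICON = {
    ("h", "ah", "v"): "have",
    ("ah",): "a",
    ("g", "oo", "d"): "good",
    ("w", "ee", "k", "e", "n", "d"): "weekend",
}
MAX_LEN = max(len(key) for key in LEXICON)


def phonetic_line(phonemes):
    """Stage 1: show the raw phonemes as soon as they are recognized."""
    return " ".join(phonemes)


def word_line(phonemes, blank="_"):
    """Stage 2: greedy longest-match lookup; unknown phonemes become blanks."""
    words = []
    i = 0
    while i < len(phonemes):
        # Try the longest candidate first, then shrink the window.
        for n in range(min(MAX_LEN, len(phonemes) - i), 0, -1):
            word = LEXICON.get(tuple(phonemes[i:i + n]))
            if word is not None:
                words.append(word)
                i += n
                break
        else:
            # Nothing matched: leave a placeholder, like a grey caption blank.
            words.append(blank)
            i += 1
    return " ".join(words)


stream = "h ah v ah g oo d w ee k e n d".split()
print(phonetic_line(stream))  # shown immediately
print(word_line(stream))      # shown once the words are resolved
```

The phonetic line is kept as the "true record" of what was said; the word
line is only a later, editable reading of it.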

fww
tris

Rusty Foster wrote:

> On Thu, 27 May 1999, Ansel Sermersheim wrote:
> > >>>>> "Tim" == Tim Gray <> writes:
> >
> > > check out www.zachary.com/creemer/xvoice.html It sounds like we
> > > might have a V2T app for wearables that might actually work! it
> > > when one uses the IBM viavoice SDK for linux.
>
>         I also d/l'd this, but I haven't gotten to actually try it yet
> (waiting on a good microphone). I did start it up and click the buttons,
> though :-)
>
>         Has anyone considered the possibility of hacking one of IBM's demos
> into a simple "black box" kind of interface? I don't know any C++, so I'm
> kind of helpless at the moment to do it myself, but what I have in mind is
> basically just a console app that takes in some audio and spits back some
> text. This could be glued into, say, a perl interface that then does stuff
> based on the text (this part I can do!). Perhaps it could be a simple
> daemon that listens for mic input, and when it gets some, converts it to
> text and sends it out on port xxx. Then you write yourself an interface
> that acts like a client, connects to port whatever, and does something
> when it receives text from the socket. Whammo-- you have a voice-shell.
>
>         I have some more concrete ideas on how this shell could be set up,
> but it's all pretty academic until I can find a way to convert the voice
> into text. (C'mon, isn't that the easy part? ;-)). Anyone have any
> suggestions?
>
> -Rusty
>
> --
> Subscription/unsubscription/info requests: send e-mail with subject of
> "subscribe", "unsubscribe", or "info" to 
> Wear-Hard Mailing List Archive (searchable): http://wearables.blu.org
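
The daemon-plus-client arrangement Rusty describes above might look
something like this. It is a sketch only, and it does not touch the IBM
ViaVoice SDK at all: recognize() is a hypothetical stub standing in for the
speech engine, audio_source is a made-up callable standing in for the
microphone, and the port is left to the OS since the post only says
"port xxx".

```python
import socket
import socketserver
import threading


def recognize(audio_chunk):
    """HYPOTHETICAL stand-in for the speech engine.

    A real version would hand the audio to a recognizer; this stub simply
    treats the bytes as already-recognized text.
    """
    return audio_chunk.decode("utf-8", errors="replace")


class TextHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # Send one line of text per utterance, then close the connection.
        for chunk in self.server.audio_source():
            self.wfile.write((recognize(chunk) + "\n").encode("utf-8"))


def start_daemon(audio_source, host="127.0.0.1", port=0):
    """Start the text server in a background thread (port 0 = OS picks)."""
    server = socketserver.ThreadingTCPServer((host, port), TextHandler)
    server.audio_source = audio_source  # assumed: callable yielding bytes
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def read_commands(host, port):
    """Client side: connect and yield each line of text as it arrives."""
    with socket.create_connection((host, port)) as conn:
        with conn.makefile("r", encoding="utf-8") as lines:
            for line in lines:
                yield line.rstrip("\n")


if __name__ == "__main__":
    fake_mic = lambda: [b"open mail", b"next message"]
    server = start_daemon(fake_mic)
    host, port = server.server_address
    for command in read_commands(host, port):
        print("heard:", command)  # a voice-shell would dispatch here
    server.shutdown()
    server.server_close()
```

The client loop is where the "does stuff based on the text" part would go,
in perl or anything else that can read lines from a socket.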

From Wear-Hard Mailing list Archive (WH)
Maintained by R. Paul McCarty

Archive created with babymail