I hope this has the potential to SIMPLIFY your key question <voice to text>;
additionally, there may be benefits in lightening the RAM load, etc., in voice.
What if (inserting my ignorance here), in the first step of the Speech To Text
task, you were to add *Words* to the IBM / Linux / other? database that were
simply the *Phonemes*: the basic 44+ new *Words* we use all the time, which are
already in the front end of the recognition phase?
If so, then actual human voice recognition speed might go (way?) up, while
offering a "True data record" (for any later retrieval apps) of what was actually
said, avoiding the misspelled, misconstrued, misjudged "recognition" that slows
and frustrates state-of-the-art STT 'training' in ASR software.
My short presentation of "Phonetic Text Display" to ddlinux ran into a problem
pointed out by Sean Wheeler at MIT Speech Interfaces (attached in case it is not
on the list). He said that phoneme recognition is improved by bundling phonemes
into larger words: context, or the best-voiced (most easily recognized) phonemes,
can then lead to a particular unrecognized phoneme, i.e. 'separate phonemes may
be harder to recognize'. This Phonetic Text Display should not slow or disable
the current word-recognition process, whose result, once done, is displayed (as
per the user's wishes).
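Sean Wheeler's point, that context from the easier phonemes can pin down one that was not recognized on its own, can be illustrated with a toy lexicon lookup. This is a minimal sketch, not how ViaVoice or any real recognizer works: real systems weigh acoustic scores against a language model rather than doing exact matches, and the lexicon and ad hoc phoneme spellings below are hypothetical, borrowed from the "have a good weekend" example at the end of this message.

```python
# Hypothetical toy lexicon: each word spelled as a list of phoneme tokens,
# using the same informal spellings as the example in this message.
LEXICON = {
    "have": ["h", "ah", "v"],
    "good": ["g", "oo", "d"],
    "weekend": ["w", "ee", "k", "e", "n", "d"],
    "looks": ["l", "oo", "k", "s"],
    "like": ["l", "ie", "k"],
    "rain": ["r", "ae", "n"],
}


def candidate_words(heard):
    """Return lexicon words consistent with a heard phoneme sequence.

    `heard` is a list of phoneme tokens in which None marks a phoneme the
    front end could not recognize on its own. A word matches if it has the
    same length and agrees on every phoneme that *was* recognized, so the
    surrounding, easier phonemes constrain the missing one.
    """
    matches = []
    for word, phones in LEXICON.items():
        if len(phones) != len(heard):
            continue
        if all(h is None or h == p for h, p in zip(heard, phones)):
            matches.append(word)
    return matches


def fill_gaps(heard):
    """If exactly one word fits, recover the missing phonemes from it."""
    words = candidate_words(heard)
    if len(words) == 1:
        return list(LEXICON[words[0]])
    return heard  # ambiguous or unknown: leave the gaps in place


if __name__ == "__main__":
    print(candidate_words(["g", None, "d"]))  # ['good']
    print(fill_gaps(["r", "ae", None]))       # ['r', 'ae', 'n']
```

An isolated `None` can never be resolved this way, while the same gap inside "g ? d" is recovered from its neighbours, which is exactly the trade-off Wheeler describes.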
My (vacuous?) thought is that this may not be that large an issue: we could
simply insert generic blank phoneme/word placeholders, similar to the flat-grey
blanks in closed captioning. Prior to their true recognition, these P.T.D. blanks
could also show volume, pitch, timbre and stress, displaying pauses etc. in real
time to aid the user's guess, while the better-spelled recognition happens a
('split second') further down the road, where context recognition can edit the
real-time text shown up front. This leaves a TRUE record of what was actually said.
All this would greatly benefit a very specific community of millions of users,
if it could work in real time (using friendly wc/hmds!!). The deaf and hard of
hearing would then have access to the whole world of actual, truly accented,
spoken human languages.
It is also of great value to those who have been largely cut off from the human
sound stream and thus get no instant feedback with which to self-correct their
own voice. Lip reading and sign language remain the best means of communication
to date, but real-time subjective voice feedback would let them keep their own
voice fine-tuned for objective speech to and from the rest of the world.
The hope is that a 'streamlined?' IBM voice I/O such as this could benefit all.
Reading phonetic text may also have other benefits for teaching spoken language.
It could help reading, and reading speed, by aiding the separation of phonetic
sound events: adding characters to the alphabet for doubled-up letters, e.g. the
familiar <ae> that we see written as <one joined character> (not possible in HTML?).
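The joined-letter idea can be sketched as a table that gives every phoneme exactly one character, so a stream like the example in this message no longer needs spaces to show where one sound event ends and the next begins, and can be split back into phonemes unambiguously. This is a minimal sketch: the glyph table is hypothetical (only the joined <ae> comes from the message) and covers just the tokens used in the example.

```python
# Hypothetical single-character spellings for the two-letter phoneme tokens in
# the "have a good weekend" example. Only "ae" -> "æ" is a traditional joined
# letter; the other glyphs are arbitrary picks for illustration, not a standard.
JOINED = {
    "ae": "\u00e6",  # æ
    "ah": "\u028c",  # ʌ
    "oo": "\u028a",  # ʊ
    "ee": "\u0113",  # ē
    "ie": "\u012b",  # ī
}
SPLIT = {glyph: token for token, glyph in JOINED.items()}


def join_phonemes(tokens):
    """Write each phoneme token as exactly one character."""
    out = []
    for token in tokens:
        glyph = JOINED.get(token, token)
        if len(glyph) != 1:
            raise ValueError(f"no single-character spelling for {token!r}")
        out.append(glyph)
    return "".join(out)


def split_phonemes(text):
    """Recover the phoneme tokens: one character is always one sound event."""
    return [SPLIT.get(ch, ch) for ch in text]


if __name__ == "__main__":
    stream = "h ah v ah g oo d".split()
    joined = join_phonemes(stream)
    print(joined)                            # hʌvʌgʊd
    print(split_phonemes(joined) == stream)  # True
```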
Enough of my rant;
sorry if this is unfeasible.
h ah v ah g oo d w ee k e n d l oo k s l ie k i t s s s s n ah t g u n ah r ae n
<seconds.later>
have a good weekend (it) looks like it's not going to rain!
The learning curve for reading it could be short if you can't wait for the
state of the art to catch up to you!
fww
tris
Rusty Foster wrote:
> On Thu, 27 May 1999, Ansel Sermersheim wrote:
> > >>>>> "Tim" == Tim Gray <
> writes:
> >
> > > check out www.zachary.com/creemer/xvoice.html
> > > It sounds like we might have a V2T app for wearables that might
> > > actually work, at least when one uses the IBM ViaVoice SDK for Linux.
>
> I also d/l'd this, but I haven't gotten to actually try it yet (waiting
> on a good microphone). I did start it up and click the buttons, though :-)
>
> Has anyone considered the possibility of hacking one of IBM's demos
> into a simple "black box" kind of interface? I don't know any C++, so I'm kind
> of helpless at the moment to do it myself, but what I have in mind is basically
> just a console app that takes in some audio and spits back some text. This
> could be glued into, say, a perl interface that then does stuff based on the
> text (this part I can do!). Perhaps it could be a simple daemon that listens
> for mic input, and when it gets some, converts it to text and sends it out on
> port xxx. Then you write yourself an interface that acts like a client,
> connects to port whatever, and does something when it receives text from the
> socket. Whammo-- you have a voice-shell.
>
> I have some more concrete ideas on how this shell could be set up, but
> it's all pretty academic until I can find a way to convert the voice into text.
> (C'mon, isn't that the easy part? ;-)). Anyone have any suggestions?
>
> -Rusty
>
> --
> Subscription/unsubscription/info requests: send e-mail with subject of
> "subscribe", "unsubscribe", or "info" to
> Wear-Hard Mailing List Archive (searchable): http://wearables.blu.org
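Rusty's quoted idea, a daemon that turns microphone input into text and sends it out on a port, plus a client that connects and acts on whatever text arrives, can be sketched with Python's standard `socket` module. This is a minimal sketch under assumptions not in the thread: `recognize_utterance` is a hypothetical stub standing in for the speech engine (no ViaVoice call is shown), the wire format is assumed to be one UTF-8 line per utterance, and the server handles a single client and a single utterance to stay short, where a real daemon would loop on both.

```python
import socket
import threading


def recognize_utterance():
    """Hypothetical stand-in for the speech engine.

    A real daemon would read microphone audio and pass it to a recognizer;
    this stub returns fixed text so the socket plumbing can be exercised.
    """
    return "have a good weekend"


def serve_once(server_sock):
    """Daemon side: accept one client and send it one line of text."""
    conn, _addr = server_sock.accept()
    with conn:
        text = recognize_utterance()
        conn.sendall(text.encode("utf-8") + b"\n")


def receive_line(host, port):
    """Client side: connect, read one newline-terminated line, return it."""
    with socket.create_connection((host, port)) as sock:
        with sock.makefile("r", encoding="utf-8") as f:
            return f.readline().rstrip("\n")


def run_demo():
    """Start the daemon on a free local port, connect one client, return its text."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: the OS picks a free port ("port xxx")
    server.listen(1)
    port = server.getsockname()[1]
    worker = threading.Thread(target=serve_once, args=(server,))
    worker.start()
    try:
        return receive_line("127.0.0.1", port)
    finally:
        worker.join()
        server.close()


if __name__ == "__main__":
    text = run_demo()
    # A "voice shell" client would dispatch on `text` here.
    print(text)
```

Newline-terminated text keeps the client side trivial in any language, including the perl glue Rusty mentions, since it only has to read lines from the socket.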
From Wear-Hard Mailing list Archive (WH)
Maintained by R. Paul McCarty
Archive created with babymail