STT--Speech To Text

Using applications, configuring, problems
Announcer
Posts: 151
Joined: Tue 03 Jan 2012, 12:26

#21 Post by Announcer »

Those are instructions from Klaus Knopper for getting Adriane running on Puppy, if anyone is interested.

smokey01
Posts: 2813
Joined: Sat 30 Dec 2006, 23:15
Location: South Australia :-(

#22 Post by smokey01 »

CatDude and I are slowly working through it.

I got 49% through the make process before it crashed.

As many have said before, this is a slow, difficult process.

Need to sleep now............................ZZZzzzzzzzzzzz.....

H4LF82
Posts: 123
Joined: Tue 02 Oct 2012, 04:22

#23 Post by H4LF82 »

Smokey01, YOU are my HERO!! A GOD among MEN.

You take ALL the time in the world if that is what you need.

If I can do ANYTHING to help...I am eager and UN-attached to any other project at present. Tell me how and what to do and I'll be your lackey!

Otherwise get some rest. When this is all over you should give me the number of your local pub so I can order you a pint :D

Cheers mate! NICE one :D
"The wise know their weakness too well to assume infallibility; and he who knows most, knows best how little he knows." - Thomas Jefferson

Ted Dog
Posts: 3965
Joined: Wed 14 Sep 2005, 02:35
Location: Heart of Texas

#24 Post by Ted Dog »

H4LF82 wrote:Thank you Announcer, but again, no thank you.

I'm not turning my back on Puppy now just to go over to Knoppix and start the whole learning curve all over again--I just cannot bring myself to. I've had YEARS to learn how to use Puppy and I have grown comfortable with it...despite its shortcomings (of which there are few and they are far between). Speech-to-text and the lack thereof is my one and only tiny complaint worthy of being mentioned...

Smokey01 is getting the simon package compiled for me even as I type this, and I've got to tell you, I have the faith of a mustard seed in his abilities. He seems to think it is do-able and he should know.

He compiles software in his sleep.

So when smokey01 says "Forget it man. Get Knoppix or else suck it up and deal!", THEN I might consider it.

But at this point, you might as well be telling me to switch back to windows "because it has Nuance Dragon Speaking".
Actually, speech-to-text and computer control are available on a standard Win8 install. I was playing with it earlier: it does OK with Windows apps, but most programs I run are still open source and, oddly, unsupported.... Just to put it out there.. not saying to switch to Windows just yet. It took about two hours of training to get it to work 1 out of 4 times. :cry: Could be my Texas accent!! Or it's still crappy.

H4LF82
Posts: 123
Joined: Tue 02 Oct 2012, 04:22

#25 Post by H4LF82 »

i cannot speak for win8, ted dog, as i have no experience with it.

i do however concur with you about it being 'still crappy'. that has been my experience...

2 hours huh? i trained Dragon for 8 hours a day for nearly a week...it worked 6 words out of 10--it actually did better untrained. i will eat my own socks before i go back to windows...

Ted Dog
Posts: 3965
Joined: Wed 14 Sep 2005, 02:35
Location: Heart of Texas

#26 Post by Ted Dog »

Yes, not time to switch ... the UI on Win8 is still a horrible mess. I was preparing a recovery flash drive in order to downsize Win8 to a yet smaller, older HD (60GB) and free up the mid-sized hard drive for better use.
I saw it had speech control, and I needed to keep Win8 alive by using it: on the first try it turned itself off/hibernated while on A/C power after getting over 50% copied to the recovery flash drive. I got this new laptop for puppylinux/other linux use, and made sure it still allowed booting in BIOS mode.
But at least Win8 gives me something to complain about. I've been upset about long-existing quirks in puppylinux, so trying Windows again after a few years helps me regain perspective about really irritating quirks; now the puppylinux quirks seem so much less...

Smithy
Posts: 1151
Joined: Mon 12 Dec 2011, 11:17

#27 Post by Smithy »

So what are your expectations from the Linux engine(s) H4LF82?

Seems to me that Jack might be a good front end for the audio, then you can get the latency down. If that is important (text appearing on screen)?

I agree that it would be great for puppy to have STT TTS as an option.

As an experiment:
I've managed to get speech training going in wine; it seems to be doing its thing, listening to how you pronounce the words (never used it before).
And the SAPI voices are talking in wine control panel/speech.
That's as far as I have got for now. This is billy gates stuff, not open source.


Last edited by Smithy on Sun 02 Jun 2013, 14:20, edited 1 time in total.

Ted Dog
Posts: 3965
Joined: Wed 14 Sep 2005, 02:35
Location: Heart of Texas

#28 Post by Ted Dog »

Agree with Smithy, this would be a great addition to puppy. I already have a puppy spin with an on-screen keyboard and one-button control working at the level of, or better than, the commercial and expensive software.
I played with text-to-speech a few years back, and the open source software is as good as or better than commercial (I guess); actually I haven't played with non-linux T2S since my Amiga and MSDOS days.

H4LF82
Posts: 123
Joined: Tue 02 Oct 2012, 04:22

#29 Post by H4LF82 »

Smithy wrote:So what are your expectations from the Linux engine(s) H4LF82?
What do I expect?

well, I suppose I expect the following...

let us take today as a bright shiny example. I have awakened after sleeping in late on my Sunday, and like most days, the curtains/shades are drawn to keep the mid-day light out, the lights are all off, and before I can see the screen on the computer I have just turned on, I must grab my "Solar Shields".

These I must place over my regular sunglasses.

once both pairs have been located in the dark and put on one over the other, I am ready to turn on the computer and check the news and my morning email while sipping a fresh cup of tea...

Honestly...I know how to turn the computer ON without witnessing the event with my own eyes, but the whole sunglasses ritual, which I could do without, begins when I turn the machine on, because the "Jesus Light" from the screen is illuminating the room, and I NEED the screen....

The sunglasses are necessary because without them I cannot see what needs clicking on said screen--I can click the reader to read out loud any news story that catches my interest, but before I can determine what story I want to play catch with, I have to know what they are about.

So, "Google News" is a fine example to use here.

On this typical Sunday, I want to check Google News to hear all about the tornadoes pummeling the "bubba belt". So I go to my computer, which is now on and online and running happily waiting for me, and i grab my mouse, squint my double-glazed eyes, and brace myself for the screen input so I can find the browser icon and click it. I open my eyes slightly.

Immediately I am lashed in the face with laser-beams of all colors. It's like a camera flash rave party psychedelic strobe supernova at first, only more intense, and just as blinding and disorienting. It's a very fortunate thing I am sitting down. Beneath the glasses my eyes must slowly adjust to the relentless onslaught that is my computer monitor, and after a minute, they do relent ... slightly. I can finally see to discern the browser button, and I click it.

The browser opens and the Google homepage is displayed within 0.19 seconds in a burst of blues and yellows and reds and black glowing text. a quick lean into the screen to discern the word NEWS from all the others sprawled out there across the top of the page and I am ready to look at the news. Puppy responds like a dream. The page loads instantly. The speed is unexampled...

a quick scan of the front page tells me there is nothing about bubba-belt tornadoes, so I have to search. click the mouse into the search window, head down, glasses off, now we wait for our eyes to adjust back to the darkness to be able to see the keyboard again so I can type "bubba-belt".

"Now wait a minute!" you say..."can't you type without looking? Blind or not, it's a basic computer skill, typing is."

Sure I can type without looking. the keyboard is in my lap, but the mouse is in my hand, and so only one hand is on the keyboard, sorta. the F and the J key both have little nubs on them to help me find them easily, but they are tiny nubs and easy to miss and I am fighting the big black dot now burned into my retina...

take a professional camera flash with new batteries and point it directly at your face, and flash it with your eyes open 3 inches from your nose over and over about 20 times and then try and tie your shoes or type your name....

...I dare you.

I either end up typing correctly, or else I end up typing something like 5708htnw9j3n wt708e crq0 because my fingers were in the wrong place.

bubba belt typed in, or at least 'i think' that is what it is, i must now put the glasses back on and brace myself for the visual onslaught that is the computer screen so that I can check the screen for the correct input and confirm that I did indeed just type what I think I just typed....

The TTS reader would read my keyboard input as I type it if I enabled that option, and one day I may have to, but having it on seems like having a duck with one leg swimming around in a circle, being only half of a two-part system; TTS and STT. and since I already have to open my eyes to click with the mouse anyway, it seems a little pointless. If only I could eliminate the need for the mouse...

so I expect that STT will give me the option of saying "Open the Browser" and never needing the sunglasses on in the first place to click the icon...whereupon the browser can open and i can say "Go to Google News" and once again it can open the appropriate page without me ever having to see a thing. I would enable the TTS if this WERE the case so that I could have the audible confirmation that the computer was doing what I wanted. when i say "Open the Browser" I expect the TTS reader will say "opening Seamonkey Web Browser home page" and when i say "Go to Google News" then the TTS will confirm "opening Google news" and so on...
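The command-and-confirmation loop described above can be sketched roughly as follows. This is only a minimal illustration, not how Simon or any other engine actually works: it assumes the speech engine has already turned the audio into a text phrase, and the command table, the `seamonkey` launcher name, and the news URL are all hypothetical placeholders. Nothing is launched or spoken here; the function just returns what would be run and what the TTS reader would say back.

```python
# Hypothetical command table: recognised phrase -> (program to launch, phrase for TTS).
# Program names and the URL are assumptions made for illustration only.
COMMANDS = {
    "open the browser": (["seamonkey"], "opening Seamonkey Web Browser home page"),
    "go to google news": (["seamonkey", "https://news.google.com"], "opening Google news"),
}


def normalise(phrase):
    """Lower-case and collapse whitespace so 'Open  the Browser' still matches."""
    return " ".join(phrase.lower().split())


def handle(phrase):
    """Return (argv, confirmation); argv is None when the phrase is not recognised."""
    entry = COMMANDS.get(normalise(phrase))
    if entry is None:
        return None, "sorry, I did not understand " + normalise(phrase)
    return entry


argv, spoken = handle("Open the Browser")
print(argv, "->", spoken)
```

In a real setup the returned argv would go to something like `subprocess.Popen` and the confirmation string to the TTS reader, so the user hears what happened without looking at the screen.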

In this capacity, the TTS metamorphoses from an annoying little troll that repeats useless information two seconds too late to be useful into a beautiful butterfly that you depend on to guide you. I expect STT to be the left hand that the TTS engine's right hand has been trudging along without...a way for me to "speak" what I would otherwise need both hands planted firmly on the keyboard AND mouse for....

I understand that I will likely get something slightly less than what I expect, and I will have to make adjustments to fit my individual needs, but ultimately...that is where my expectations lie. Somewhere between "Thing from the Addams Family", and "The Invisible Man." One hand tied behind my back versus both hands out in front of me.

I expect to be able to drink my morning tea and read the news without playing up-wait-down-wait-up-wait-down-wait every 45 seconds with 2 pairs of sunglasses---at a minimum.

In a perfect example, I would expect to forgo the keyboard and mouse entirely just by saying click this, go there, play so-and-so, and so on....

So do you think I expect too much or are my expectations par for this course?

Smithy
Posts: 1151
Joined: Mon 12 Dec 2011, 11:17

#30 Post by Smithy »

So it is like command and response technology. Full duplex definitely.
A marriage of STT and TTS.

God knows batch processing has been around for years, but how many times do you find yourself having to copy things over individually, bunches of files and things? Because it's too much of a pain to set up a batch process and sometimes quicker just to transfer manually....

I was approaching the question of your expectations from a purely logical point of view. I don't know how far the technology has progressed, but it's interesting to hear about it from a user's point of view, what it can do, what it can't.

For all I know, an airline pilot could be sat on his ass reading Harry Potter and just shouting "Land now please" at the right time. I guess it wouldn't have been able to perform the magic that the hero pilot did when he managed to land on the Hudson River tho'.

So the idea is: voice goes into the engine, converts to command, then gives you aural feedback that it has done the job. And it also reads the text on the website, wonder if it can ignore all the markup stuff.
I guess you could set up macro words that are more easily understood by the voice engine for sites that one would visit regularly?

mmmrr
Posts: 184
Joined: Tue 03 Mar 2009, 05:26
Location: vancouver island, canada

short cut voice commands

#31 Post by mmmrr »

hello, all

is shortcut voice commands the same thing as macro words?
such as an easily recognised word, 'easy' for example , combined
with a number, ie: 'easy 1', 'easy 2' and so on, which could
stand for longer commands in frequent use, including window
management tasks.
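A rough sketch of the 'easy 1', 'easy 2' idea, assuming the speech engine hands over the recognised words as plain text. The shortcut table and the commands in it are made up for illustration; a real one would hold whatever long commands the user repeats most often.

```python
import re

# Hypothetical shortcut table: number -> the longer command it stands for.
SHORTCUTS = {
    1: "open the browser",
    2: "go to google news",
    3: "close the window",
}

# The engine may return a digit or a number word, so accept both.
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}


def expand(heard):
    """Expand 'easy <n>' to the longer command; return None if it is not a known shortcut."""
    match = re.fullmatch(r"easy\s+(\w+)", heard.strip().lower())
    if not match:
        return None
    token = match.group(1)
    number = int(token) if token.isdecimal() else NUMBER_WORDS.get(token)
    return SHORTCUTS.get(number)


print(expand("easy 1"))
print(expand("Easy two"))
```

The point of the fixed keyword is that the engine only has to recognise one easy word plus a number reliably, rather than a long free-form sentence.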

it certainly seems like the OP is highly motivated, with sad reason,
but we are here to help and excited to be part of such a quantum
leap. part of shortening the stt engine's learning curve is to give it
a more useful database to begin. if it hears the same passage read
more than once it learns more about how you say a word or letter
and if it can hear the passage read four or five times then so much
the better.

ditto for reciting the alphabet and double ditto for reciting the alphabet
backwards. numbers also. i'm sure others know more about this aspect.
i know there must be a literature on this topic.

i'll just say warmest good wishes, fascinating thread, enjoying yr post
style h4lf82.

cheers, mmm

H4LF82
Posts: 123
Joined: Tue 02 Oct 2012, 04:22

#32 Post by H4LF82 »

Smithy, you bring up an interesting point of view! I will attempt to shed some light since I speak some basic programming...

"macro words" is a good way of putting it. imagine blindfolding yourself before you go to your computer and then turning it on and navigating to the browser.

just THIS ACT requires you to decide lots of things you may not realize, such as;

Is the computer performing correctly?
is it currently online?
is there an error message that requires attention from an improper shutdown or other event?

If all is well, you can visually see that and just click the "browse" icon. if not you can adjust accordingly---if you can see.

if you cannot see you have no idea there is an error message. no idea the internet is not connected. no idea if the machine is not on and running unless it is your first boot or you boot from the disk (and get the "bark! bark!" .wav) or you manually set it to beep annoyingly when it powers on.

Without eyes, every scenario is a challenge because one must adapt in real time to an ever-changing landscape. Blindfold yourself and you will understand.

THAT is the stick STT and TTS should be measured by...can the user operate his or her PC without any other input device other than a microphone? Without eyes, the mouse, keyboard, AND the video are all useless crap in the way.

If you are a software developer and you want to develop an award winning program that will be a boon for the visually impaired sector of society, solidifying your place in the history books as the father of modern machine sapience and ensuring that statues, college campus buildings and awards are created in your honor...

you will do this.

Blindfold yourself, then imagine how to design a verbal interface that will give you command line access so you can create text files that contain the "macro words" necessary for creating command line executing programs without the need for a keyboard or mouse.

You don't NEED a screen to use your computer, you just need a screen to see it. Design a program that tells what happens on the screen and you will no longer require the screen. Computers can get smaller thanks to you.

You don't NEED a keyboard to use Puppy Linux. Puppy Linux has Xvkbd. Someone just needs to give me a way to give it 96 different verbal commands and I can type without ever touching a keyboard, just like I do now, one key at a time. Computers can now have fewer input devices thanks to you.
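One way the "one verbal command per key" idea might look, as a sketch only. It assumes the recogniser delivers one spoken key name at a time, and it relies on xvkbd's `-text` option, which sends a string as keystrokes and understands escapes such as `\r` (Return), `\t` (Tab) and `\b` (Backspace). The spoken names below are my own choice and cover only 40 keys, not the full 96. The function only builds the command line; it does not run it.

```python
import string

# Spoken key name -> text handed to `xvkbd -text`. The names are illustrative assumptions.
KEYS = {letter: letter for letter in string.ascii_lowercase}
KEYS.update({name: str(digit) for digit, name in enumerate(
    ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine"])})
# Raw strings keep the backslash so xvkbd itself interprets the escape.
KEYS.update({"space": " ", "enter": r"\r", "tab": r"\t", "backspace": r"\b"})


def xvkbd_argv(spoken_keys):
    """Turn a list of spoken key names into one xvkbd command line.

    Unknown names raise KeyError so a misheard word is never typed silently.
    """
    text = "".join(KEYS[name.lower()] for name in spoken_keys)
    return ["xvkbd", "-text", text]


print(xvkbd_argv(["b", "u", "b", "b", "a", "space", "b", "e", "l", "t", "enter"]))
```

Passing the resulting list to `subprocess.run` would type the text into whichever window has focus.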

You don't NEED a mouse to use Puppy Linux. You can use Xdotool and click on anything anywhere. Someone just needs to give me a way to give it coordinates and have it report coordinates, so I know the coordinates of the browser are x25 y350 and I can make the computer's job easier by saying "mouser move x25 y350, right click" and so on. Computers would no longer need mice thanks to you.
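A sketch of how a phrase like "mouser move x25 y350, right click" could be translated into xdotool calls. The "mouser" grammar here is only my guess at what is meant above; the xdotool side uses `xdotool mousemove X Y` and `xdotool click N`, where button 1 is left, 2 is middle and 3 is right. The function returns the command lines rather than running them.

```python
import re

BUTTONS = {"left": "1", "middle": "2", "right": "3"}  # xdotool button numbers


def mouser(spoken):
    """Translate e.g. 'mouser move x25 y350, right click' into a list of xdotool argv lists."""
    calls = []
    for part in spoken.lower().split(","):
        part = part.strip()
        move = re.fullmatch(r"(?:mouser\s+)?move\s+x(\d+)\s+y(\d+)", part)
        click = re.fullmatch(r"(left|middle|right)\s+click", part)
        if move:
            calls.append(["xdotool", "mousemove", move.group(1), move.group(2)])
        elif click:
            calls.append(["xdotool", "click", BUTTONS[click.group(1)]])
        else:
            # Refuse to guess: a misheard command should not move or click anything.
            raise ValueError("not understood: " + part)
    return calls


for argv in mouser("mouser move x25 y350, right click"):
    print(" ".join(argv))
```

For the "report coordinates" half, `xdotool getmouselocation` prints the current pointer position, which could be fed to the TTS reader.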

so...when smokey01 says "I will try to compile Simon for you into a pet package", what he is basically saying, at least as I understand it, is this...

"Using common programs already available on Puppy Linux, the need for peripheral equipment on a desktop is reduced to a microphone and speaker, or a Bluetooth USB and a headset... with some time spent on the setup initially. Everything will need to be assigned a value, and functions can then be created using 'macro words' that one can speak into literal text strings stored as text files to be read by either the user or the computer."

Keyboards, mice, touchscreens, they are all crutches: visual keys to mental constructions that we as sighted people take for granted as "decidedly necessary for using a computer". Nobody can imagine using a computer without looking at it or typing on it, but we could use them that way. We have the technology. Blind people have long dreamed of a world where they can verbally interact with everything. Combine TTS and STT and you will directly contribute to a rapid increase in accessibility for the blind in computers and home automation for the physically handicapped, and to the obsolescence of the keyboard, the mouse, and the monitor, and you will likely become known as one of the greatest programmers of modern times.

...that is what I think anyway. correct me if I am wrong, please.

:D

mmmrr, I do not know what shortcut voice commands are, but it certainly sounds similar to me. At least according to common sense and the dictionary. A macro is a file that one uses to repeat a repetitive task programmatically, though generally a macro is a text file and one must write it. "Macro words" would be combining the idea of a macro with the idea of speaking the code of a macro instead of typing it. Or at least that is what the term "macro words" means to me. If anyone else cares to add their definition of "macro words" they are invited to do so.

Cheers!

H4LF82
Posts: 123
Joined: Tue 02 Oct 2012, 04:22

#33 Post by H4LF82 »

which basically makes smokey01 a programming Zeus worthy of marble statue carving....

Im just saying.

Thanks again smokey01. I know it's probably giving you hell. Any time you think you wanna throw in the towel, I will understand if you reach your wits' end!

totally serious about that pint BTW smokey01.
Smithy wrote:... And it also reads the text on the website, wonder if it can ignore all the markup stuff.
Smithy, this will likely require a logic engine with a "text only mode" that sets variables so it skips speaking aloud any links, hypertext, or any other part of "news and text related sites" like this forum or Google News that is not body text surrounded by multiple page breaks ("<br>"), plus a verbose mode for when you want to hear every link and page break. Ideally speaking, that is.
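The "text only mode" versus "verbose mode" idea could be sketched like this with Python's standard `html.parser` module. It is a simplification under stated assumptions: the class and function names are invented for the example, "text only" here just means dropping link text along with scripts and styles, and "verbose" means link text is kept. Real pages would need more rules than this.

```python
from html.parser import HTMLParser


class PageReader(HTMLParser):
    """Collect the text a TTS reader should speak from an HTML page."""

    ALWAYS_SKIPPED = {"script", "style"}  # never worth reading aloud

    def __init__(self, verbose=False):
        super().__init__()
        self.verbose = verbose   # verbose mode keeps link text
        self.skip_depth = 0      # > 0 while inside an element we are skipping
        self.chunks = []

    def _skipped(self, tag):
        return tag in self.ALWAYS_SKIPPED or (tag == "a" and not self.verbose)

    def handle_starttag(self, tag, attrs):
        if self._skipped(tag):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if self._skipped(tag) and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Tags themselves never reach this method, so markup is ignored for free.
        if self.skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def speakable(html, verbose=False):
    """Return the text to hand to the TTS engine for this page."""
    reader = PageReader(verbose)
    reader.feed(html)
    reader.close()
    return " ".join(reader.chunks)


page = ('<p>Tornadoes hit the plains. <a href="/more">Read more</a></p>'
        '<script>var x = 1;</script><p>Crews are responding.</p>')
print(speakable(page))
print(speakable(page, verbose=True))
```

A spoken command such as "verbose on" could then simply flip the `verbose` flag before the next page is read.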

:D

partsman
Posts: 363
Joined: Wed 06 Jun 2012, 19:00
Location: OHIO,USA

#34 Post by partsman »

This has got to be one of the best threads I have seen ! :D Serious and entertaining as well :lol: I have a feeling it will happen ! You want to know why I use puppy ? Because of all the distros out there, the people in these forums are TOTALLY AWESOME !!! Yes, sometimes projects seem like they are forgotten about, but the next thing ya know, somebody stumbles across a small piece of the puzzle and there ya have it ! I have complete confidence in smokey01 and catdude ! As well as all who get involved in this project ! You will be amazed ! Puppy's power stems from the people :wink: that is why PUPPY LINUX IS THE BEST ! I am very excited to see how this turns out ! Like I said, I too have a very close member of my family for whom this would be quite an amazing breakthrough !
Anyone can build a fast processor. The trick is to build a fast system. (Seymour Cray) :wink:

smokey01
Posts: 2813
Joined: Sat 30 Dec 2006, 23:15
Location: South Australia :-(

#35 Post by smokey01 »

Guys you give me far too much credit.
Believe it or not but I'm not actually typing this message I'm dictating into an iPad through Siri.

As you can see it is quite accurate.
If we could do something similar in puppy it would be very very useful.

smokey01
Posts: 2813
Joined: Sat 30 Dec 2006, 23:15
Location: South Australia :-(

#36 Post by smokey01 »

Smokey01 <----> CatDude

Someone who understands KDE might do better.

jpeps
Posts: 3179
Joined: Sat 31 May 2008, 19:00

#37 Post by jpeps »

smokey01 wrote:Guys you give me far too much credit.
Believe it or not but I'm not actually typing this message I'm dictating into an iPad through Siri.

As you can see it is quite accurate.
I'd consider that an intelligent solution.

H4LF82
Posts: 123
Joined: Tue 02 Oct 2012, 04:22

#38 Post by H4LF82 »

catdude is in on it too? I had no idea...

Now I feel HONORED EVEN!

THANK YOU catdude!!

and THANK YOU smokey01!!

whether you guys get it licked or not, let me just say that I appreciate the effort, regardless of the outcome!

You guys are "top gear" in my book :D

smokey01
Posts: 2813
Joined: Sat 30 Dec 2006, 23:15
Location: South Australia :-(

#39 Post by smokey01 »

H4LF82, are you able to run 64 bit operating systems on your computer?

H4LF82
Posts: 123
Joined: Tue 02 Oct 2012, 04:22

#40 Post by H4LF82 »

I do not believe so, smokey01. I believe I am on 32 bit systems all around. I will begin double checking now to make sure...give me ten minutes to report!
