Chatterbox - STT / TTS / TTA project. Part 2

A home for all kinds of Puppy related projects
H4LF82
Posts: 123
Joined: Tue 02 Oct 2012, 04:22

#46 Post by H4LF82 »

In a perfect world, the speech-driven Puppy would have this menu at a minimum, plus the ability to add any other functionality that is available (as long as the user is willing to tediously speak or type, and confirm, every option for every .pet available in this "distro", then add it to the vocabulary and expand the menu options using speech-driven tools such as the universal installer)...
main menu

desktop-

clockset
clipboard settings

system-

grub
grub4dos
pdisk
boot manager

setup-

wizard wizard

utility-

control panel
x-archive
backup
resize psfile
execute command
new function utility
vocabulary builder
universal installer

filesystem-

file manager
disk mounter/unmounter

document-

e-reader
dictator

business-

calculator

personal-

password manager
organizer
event timer

network-

firewall
networking

internet-

browser

multimedia-

music player

fun-

memory game

help-

options
x-plain (a brief audible explanation of each option above)
FAQ
info

shutdown options-

reboot
shutdown
restart
IMHO...this would allow maximum customization without overwhelming the user right off the bat, AND keep it small so we don't overwhelm those who have to code it, either. Use Lucid 5.2.8, stripped down so that the menu options above are the only options available "out of the box".

Once the guts of making voice activation play random music files are ironed out, and the above menu items have been created by adding them to the vocabulary and the universal installer has been worked out, everything else in Lucid COULD be added to or removed from the speech-driven menu by me or anybody else, regardless of their visual abilities...
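One way to picture "adding them to the vocabulary": pocketsphinx only recognises words that are in its dictionary and language model, and CMU's online lmtool builds a matching .dic and .lm pair from a plain-text corpus with one phrase per line. A minimal sketch, assuming that route. write_menu_corpus is a hypothetical name, and only a subset of the menu is listed. Names such as "grub4dos" or "psfile" are left out because they would probably need hand-made pronunciations.

```shell
#!/bin/bash
# Hypothetical helper: write the spoken menu words, one phrase per line,
# into a plain-text corpus file (the input format CMU's lmtool expects).
# Only a subset of the menu is listed; the name and word list are examples.

write_menu_corpus() {
    local out="$1"
    # A subset of the category words and menu items, lower case, as spoken
    printf '%s\n' \
        "desktop" "clockset" "clipboard settings" \
        "system" "grub" "boot manager" \
        "utility" "control panel" "backup" "execute command" \
        "vocabulary builder" "universal installer" \
        "file manager" "music player" "calculator" \
        "help" "options" "shutdown" "reboot" "restart" > "$out"
}
```

Usage: write_menu_corpus /root/chatterbox_menu.corpus (a hypothetical path). Then upload that file to lmtool and point pocketsphinx_continuous at the results with its -lm and -dict options.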


If I am getting too far ahead or forgetting anything, someone please speak up!

Cheers :D
"The wise know their weakness too well to assume infallibility; and he who knows most, knows best how little he knows." - Thomas Jefferson

greengeek
Posts: 5789
Joined: Tue 20 Jul 2010, 09:34
Location: Republic of Novo Zelande

#47 Post by greengeek »

I got the whole shebang running successfully to my satisfaction tonight - including randomplay.

I also made another version of it that does not do the randomplay thingy - it has its own included musical interlude instead - just for the purposes of getting people to trial the "proof-of-concept":

I decided to make this version into a combined tarball of chatterbox+espeak+sphinx, all ready to go in case anyone is interested in giving it a trial. The tarball includes a howto.txt for easy manual installation, as I don't know how to make a pet yet.

If anyone gives it a go (and I'm keen to hear feedback...) I recommend you try it on a live session of Lucid 528 (either boot the live CD with pfix=ram, or rename/relocate your savefile so that it can't be found).

22MByte Download here:
http://www.mediafire.com/download/ryd27 ... ype.tar.gz
(file now updated to include libportaudio pet which some users may need to install - thanks for that info Keef)

Installation instructions
See Howto.txt file included in tarball.

Troubleshooting tips:
1) Find a quiet room for testing in.
2) Have a look in the /root/chatdump.txt file to see if any of the words you speak are being decoded (use file/reload to view latest decodes)
3) If decoding is poor or non-existent try speaking at different distances from the mic - too near or too far can cause problems depending on the mic, voice and soundcard. Try talking in an American accent. Try saying the words "beginning" or "Out house" - they seem to have good decodability. The main word we want to decode for the purposes of this trial is "music" so try saying that in a clear voice and seeing if it gets correctly decoded.
4) After booting you should hear the voice prompt. If you don't, go to /root/startup, right-click an open space, select "window - terminal here" and enter the following:

Code:

./zzzQuestionplay
(Note the dot slash)
This will start the script that triggers the voice prompt and you may see helpful errors displayed in that terminal window. Post them here if you are unable to solve the issue referred to. (ignore anything about fonts...) - (some users may also need to install the libportaudio pet contained in the tar)
5) If you need to reboot or restart xserver during your troubleshooting remove the following two files first:
/root/chatdump.txt
/root/extracted_command.txt
(doing this isn't vital but may help eliminate any odd symptoms if you can't get things going)
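Steps 2 and 5 can also be done from the terminal. This is a minimal sketch that assumes the file locations above and the decode format that shows up in chatdump.txt ("000000002: music"). show_decodes and clear_chatterbox_files are made-up names. For a live view, tail -f /root/chatdump.txt works too.

```shell
#!/bin/bash
# Helpers for troubleshooting steps 2 and 5. Paths are the ones from the
# howto; the function names are made up for this sketch.

CHATDUMP=${CHATDUMP:-/root/chatdump.txt}
EXTRACTED=${EXTRACTED:-/root/extracted_command.txt}

# Step 2: print the most recent decodes (default 5). Decoded lines look
# like "000000002: music": a nine-digit counter, a colon, then the words.
show_decodes() {
    grep -E '^[0-9]{9}:' "$CHATDUMP" | tail -n "${1:-5}"
}

# Step 5: remove the two working files before rebooting or restarting X.
# rm -f stays quiet if a file is already gone.
clear_chatterbox_files() {
    rm -f "$CHATDUMP" "$EXTRACTED"
}
```

For example, show_decodes 10 prints the last ten decodes, and clear_chatterbox_files removes both files.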
Last edited by greengeek on Thu 17 Oct 2013, 18:23, edited 6 times in total.

H4LF82
Posts: 123
Joined: Tue 02 Oct 2012, 04:22

#48 Post by H4LF82 »

Let me find a caffeinated beverage and I am all over it, greengeek!

I'm excited :) I'll let you know the results post haste!

Keef
Posts: 987
Joined: Thu 20 Dec 2007, 22:12
Location: Staffordshire

#49 Post by Keef »

Works for me on Precise 571.
Espeak also needs libportaudio, but I was able to install that from my stash of assorted libraries.

H4LF82
Posts: 123
Joined: Tue 02 Oct 2012, 04:22

#50 Post by H4LF82 »

I'm having issues...but that's no surprise.

I'm starting over and retrying from pfix=ram... I'm positive I messed up at least twice...

Bear with me. I'm moving like a herd of terrapins...

Fleet o' foot I am not :)

greengeek
Posts: 5789
Joined: Tue 20 Jul 2010, 09:34
Location: Republic of Novo Zelande

#51 Post by greengeek »

@Keef - thanks for the libportaudio info - I have now included this pet in the tar.gz and re-uploaded the prototype.

@H4LF82 - I have added some troubleshooting info above. Hope this helps resolve the issues.

H4LF82
Posts: 123
Joined: Tue 02 Oct 2012, 04:22

#52 Post by H4LF82 »

OK! Obviously I am having an issue, but it's not with the software...it's the hardware.

A word about me and hardware...I like hardware, and I really like overkill...and being an overkill-on-the-hardware kind of guy, my microphone is a home-brewed array of 8 different directional microphones. The results of me saying "music" in an American accent...

Code:

READY....
Listening...
Stopped listening, please wait...
000000000: i will will the it was in and i will to the a new
READY....
Listening...
Stopped listening, please wait...
000000001: i'll read that hold on to what what it the two
READY....
Listening...
Stopped listening, please wait...
000000002: two are all out
READY....
Listening...
Stopped listening, please wait...
000000003: you're not like on
READY....
Listening...
Stopped listening, please wait...
000000004: why
READY....
Listening...
Stopped listening, please wait...
000000005: and i don't
READY....
Listening...
Stopped listening, please wait...
000000006: yeah
READY....
Listening...
Stopped listening, please wait...
000000007: they are on the at it
READY....
Listening...
Stopped listening, please wait...
000000008: i know
READY....
Listening...
Stopped listening, please wait...
000000009: but the i have
READY....
Listening...
Stopped listening, please wait...
000000010: yeah
READY....
Listening...
Stopped listening, please wait...
000000011: all right
READY....
Listening...
Stopped listening, please wait...
000000012: that as
READY....
Listening...
Stopped listening, please wait...
000000013: yeah
READY....
I obviously need to dig out a different mic. It IS working though...I just have to tweak the hardware on my end. Not a big deal...I'm a hardware guy, and I've got a bag of mics plus fresh solder and flux from Home Depot if I need to build a simple mic real quick.

:D Nice job, thank you! I'll sort this mic issue out and then retry...I have every confidence it functions :)

H4LF82
Posts: 123
Joined: Tue 02 Oct 2012, 04:22

#53 Post by H4LF82 »

OK, I've done the install a couple of times now from pfix=ram to make sure I get it clean, and loaded it as per the instructions. I've instructed everyone to leave and go to the store, and I'm turning off all fans and other sources of noise. I have a newly built mic that's ready to go...I have only to restart the X server to begin the fun...

It will be 30 minutes for these girls to get dressed and gone, so I'm gonna pause for a quick sandwich and a beverage. When I get back I will restart X in a completely quiet environment with appropriate volume settings and then report on the results...

I'll be back! :D
Last edited by H4LF82 on Thu 17 Oct 2013, 21:01, edited 1 time in total.

greengeek
Posts: 5789
Joined: Tue 20 Jul 2010, 09:34
Location: Republic of Novo Zelande

#54 Post by greengeek »

H4LF82 - it looks to me as if your mic is never hearing silence, or maybe there is incoming noise from more than one mic?? Maybe the gain is too high - on one Puppy I had to turn off the normal 20dB mic boost and also wind the capture volume right down. It just looks like your mic is overloaded with incoming input, maybe?
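If you would rather do this from a terminal than in the mixer GUI, here is a sketch using amixer from alsa-utils. Control names vary between soundcards, so "Mic Boost" and "Capture" are only common defaults. Run amixer scontrols to see yours. If your boost control is a plain on/off switch rather than a level, use off instead of 0. The function name quiet_mic and the 25% starting level are arbitrary choices.

```shell
#!/bin/bash
# Sketch: switch off the mic boost and turn the capture level down with
# amixer. "Mic Boost" and "Capture" are common control names, not universal
# ones; check yours with: amixer scontrols
# AMIXER can be set to "echo" to preview the commands without running them.

AMIXER=${AMIXER:-amixer}

quiet_mic() {
    local boost="${1:-Mic Boost}"   # boost control (the 20dB boost mentioned above)
    local capture="${2:-Capture}"   # main capture volume control
    local level="${3:-25%}"         # arbitrary low start; raise it if nothing decodes

    "$AMIXER" sset "$boost" 0 ||
        echo "could not set '$boost' (wrong name, or a switch: try off)" >&2
    "$AMIXER" sset "$capture" "$level" ||
        echo "could not set '$capture'" >&2
}
```

Example: quiet_mic "Mic Boost" Capture 30%. Then say "music" a few times and watch chatdump.txt.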

If you unplug the mic, does the chatdump.txt file stay silent? (except for READY...)
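You can run the unplugged-mic test without watching the file by counting the decoded lines before and after a quiet spell. This is a sketch that assumes the /root/chatdump.txt path and the "000000003: ..." decode format shown earlier in this thread. The function names and the 30-second default are arbitrary.

```shell
#!/bin/bash
# Sketch of the unplugged-mic test: count decoded lines, wait, count again.
# Decoded lines look like "000000003: music"; READY..../Listening... lines
# are not counted.

decode_count() {
    grep -cE '^[0-9]{9}:' "$1"
}

check_silence() {
    local file="${1:-/root/chatdump.txt}" secs="${2:-30}" before after
    [ -f "$file" ] || { echo "no such file: $file" >&2; return 2; }
    before=$(decode_count "$file")
    sleep "$secs"
    after=$(decode_count "$file")
    if [ "$after" -eq "$before" ]; then
        echo "silent: no new decodes in ${secs}s"
    else
        echo "noisy: $((after - before)) new decodes in ${secs}s"
        return 1
    fi
}
```

Unplug the mic, run check_silence, and any "noisy" result points at something other than the mic (input noise, wrong capture source, and so on).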

I'm thinking built-in mics might be problematic if there was a noisy HDD...

H4LF82
Posts: 123
Joined: Tue 02 Oct 2012, 04:22

#55 Post by H4LF82 »

That microphone I was using is special. It can't be counted as a typical mic...it is 8 microphones in an array. It can hear an ant tripping, and if you feel like doing some math it will tell you which direction the ant tripped in and how far away it was...a fresh-built Puppy running in RAM for the first time cannot cope with it, I have no doubt.

I now have a desk mic with a new cord, so no more overload.

No worries...and yes, with no mic it does stay silent (READY..... is all there is with no mic)

Ted Dog
Posts: 3965
Joined: Wed 14 Sep 2005, 02:35
Location: Heart of Texas

#56 Post by Ted Dog »

H4LF82 wrote:That microphone I was using is special. It can't be counted as a typical mic...it is 8 microphones in an array. It can hear an ant tripping, and if you feel like doing some math it will tell you which direction the ant tripped in and how far away it was...a fresh-built Puppy running in RAM for the first time cannot cope with it, I have no doubt.

I now have a desk mic with a new cord, so no more overload.

No worries...and yes, with no mic it does stay silent (READY..... is all there is with no mic)
That actually sounds cool, like the auditory awareness methods used in pilots' helmets. Your ears can 'see' almost 360x360 when your eyes are looking forward; a missile or enemy plane(s) can be heard in your mind's eye, via audio phase changes and microsecond delays.
It's very amazing how well the ears can intercept sound and paint a picture of the environment.

H4LF82
Posts: 123
Joined: Tue 02 Oct 2012, 04:22

#57 Post by H4LF82 »

Code:

READY....
Listening...
Stopped listening, please wait...
000000000: music
READY....
Listening...
Stopped listening, please wait...
000000001: music
READY....
Listening...
Stopped listening, please wait...
000000002: music
READY....
The "music" was funny! I can indeed confirm that it works!

Nice job, greengeek!

H4LF82
Posts: 123
Joined: Tue 02 Oct 2012, 04:22

#58 Post by H4LF82 »

Ted Dog wrote:
That actually sounds cool, like the auditory awareness methods used in pilots' helmets. Your ears can 'see' almost 360x360 when your eyes are looking forward; a missile or enemy plane(s) can be heard in your mind's eye, via audio phase changes and microsecond delays.
It's very amazing how well the ears can intercept sound and paint a picture of the environment.
The build used to be here...

http://hackaday.com/2010/09/22/build-a- ... icrophone/

...but like all cool things, Hackaday has succumbed to the evil consumerism monster: it is happy to throw the occasional ad at you and unwilling to mirror the projects that made them what they used to be.

My mic is like that one, only mine has left and right channels for each point in the geometry...so where that one has 4 mics, mine has 8. The plan is to mount it underneath my RV so I can hear the goings-on outside as if the walls were not padded with insulating foam, sound-dampening board and half-inch cedar. I can hear everything going on outside and nobody outside can hear what happens inside.

:D it is cool...

Ted Dog
Posts: 3965
Joined: Wed 14 Sep 2005, 02:35
Location: Heart of Texas

#59 Post by Ted Dog »

H4LF82 wrote:
My mic is like that one, only mine has left and right channels for each point in the geometry...so where that one has 4 mics, mine has 8. The plan is to mount it underneath my RV so I can hear the goings-on outside as if the walls were not padded with insulating foam, sound-dampening board and half-inch cedar. I can hear everything going on outside and nobody outside can hear what happens inside.

:D it is cool...
Your RV must weigh a ton! But if you use your RV like I do, it doesn't go much of anywhere; it just provides a rolling man-cave, parked at my getaway from the family on undeveloped land. (Posting from there now: great day in Texas, no A/C or heat needed.)

H4LF82
Posts: 123
Joined: Tue 02 Oct 2012, 04:22

#60 Post by H4LF82 »

Okay, so correct me if I am wrong, but as I understand it, the point of this exercise was to make the "music" play by saying "music", and that is a done deal now, right?

So should we now go on to step 3, which I assume entails adding other command words (like "shutdown" and "restart") along with the functionality behind them?
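Here is a minimal sketch of how step 3 could map new command words to actions. It assumes the word has already been pulled out of chatdump.txt, like $command in the existing script. poweroff and reboot are standard commands. restartwm is assumed to be the Puppy script that restarts X, so check it with command -v restartwm first. run_word and the DRYRUN switch are hypothetical names for this example.

```shell
#!/bin/bash
# Hypothetical dispatcher: one case branch per command word.
# restartwm is assumed to exist (Puppy's restart-X script); verify first.
# Set DRYRUN=1 to print the action instead of running it.
run_word() {
    local -a action
    case "$1" in
        shutdown) action=(poweroff) ;;
        reboot)   action=(reboot) ;;
        restart)  action=(restartwm) ;;
        *)        echo "unknown command word: $1" >&2; return 1 ;;
    esac
    if [ -n "$DRYRUN" ]; then
        echo "would run: ${action[*]}"
    else
        "${action[@]}"
    fi
}
```

With this layout, each new command word is one extra case line. For the destructive ones (shutdown, reboot), it is probably worth asking for a spoken "yes" before calling run_word.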

If that is correct, can someone please confirm it for me?

Cheers! :D

greengeek
Posts: 5789
Joined: Tue 20 Jul 2010, 09:34
Location: Republic of Novo Zelande

#61 Post by greengeek »

Yes, I think the next step involves extending the functionality. I do have some issues I want to look at more closely before I continue - for example I found that if I manually cleared the chatdump.txt file it stopped sphinx from doing any further updating to that file so I want to resolve why that is.
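A guess at the chatdump.txt problem, not confirmed. This assumes chatterbox starts sphinx with its output redirected into chatdump.txt. Many editors (and sed -i) save by writing a new file and renaming it over the old one. A program that already has chatdump.txt open then keeps writing into the old, now-unlinked file, so the file you look at never changes again. Truncating in place avoids that. If sphinx's output goes through >> (append), its next write after truncation lands at the start of the emptied file. With a plain > it would resume at its old offset and leave a run of null bytes. The function names below are made up.

```shell
#!/bin/bash
# Empty a file that another process may still be writing to, keeping the
# same file (inode) instead of replacing it the way sed -i or an editor can.
clear_in_place() {
    : > "$1"
}

# Demonstration: fd 3 stands in for sphinx's output, opened in append mode
# (>>). After clear_in_place, the next write lands at the start of the
# now-empty file rather than at the old offset.
demo_append_survives_truncate() {
    local f
    f=$(mktemp)
    exec 3>>"$f"
    echo "000000000: music" >&3
    clear_in_place "$f"
    echo "000000001: computer" >&3
    exec 3>&-
    cat "$f"
    rm -f "$f"
}
```

In practice that would be clear_in_place /root/chatdump.txt instead of clearing it from an editor.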

I also want to fine tune the code so that the script that reads the extracted command also clears the extracted_command.txt file (or "voice_prompt" or whatever we want to call it...) ready for the next extraction. (I think that step is pretty easy using sed)
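The read-then-clear step can indeed be done with sed. This is a minimal sketch that assumes GNU sed (for -i) and the /root/extracted_command.txt path. take_command is a hypothetical name. Nothing holds extracted_command.txt open between extractions, so it does no harm that sed -i replaces the file. : > file would work just as well.

```shell
#!/bin/bash
# Hypothetical helper: return the most recent command word, then empty
# extracted_command.txt so the next extraction starts clean.
take_command() {
    local file="${1:-/root/extracted_command.txt}" cmd
    cmd=$(sed '$!d' "$file")     # keep only the last line = newest command word
    sed -i 'd' "$file"           # GNU sed: delete every line, in place
    printf '%s\n' "$cmd"
}
```

In the existing script, command=$(take_command) could replace the separate read and clear steps.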

One of the things on my list is to flesh out the first few posts in each of these chatterbox threads so that the important info is viewable without searching too far.

I also want to find ways to improve the integrity of word / phrase recognition so that it is possible to offer this as a pet which is reliable enough to make it pretty easy for new users to set up their own preferred command set/function at boot time, even if it is only for a single function.

In terms of what to do next I am keen to keep a similar informal format as this simple project but do two main things:
1) Produce a number of vocab files that are tailored for more reliable word recognition, each with a word set appropriate to a specific function (e.g. a post-boot/main menu command set and/or vocab list, a browsing-specific set, maybe a dictation set, and probably also a FileManager set).

2) I want to create scripts that allow a more interactive, multilevel menu/protocol system, e.g. after boot I feel the computer should ask something like the following:
"Please choose between Music, Browsing, File Manager or Puppy menu" Once the user chooses their preference the main menu would hand control to the next menu and so on.

I have a few experiments in mind to test out what is possible, so I hope to launch into those in the coming days. Feel free to offer your own suggestions or preferences. The more thoughts the better...
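Here is a rough sketch of the multilevel menu idea from item 2. Each menu is a shell function that speaks its prompt, reads one decoded answer, and hands control to the next function. say and get_word are hypothetical stand-ins. In chatterbox they would wrap espeak and the chatdump.txt extraction. Here they just use stdout and stdin, so you can try the flow at a keyboard. All function names are made up.

```shell
#!/bin/bash
# Hypothetical multilevel menu: the main menu hands control to a submenu.
say()      { echo "SAY: $*"; }                           # swap for: espeak "$*"
get_word() { local word; read -r word && echo "$word"; } # swap for the chatdump extraction

music_menu()  { say "Playing music"; }
browse_menu() { say "Which site?"; }
files_menu()  { say "Opening file manager"; }
puppy_menu()  { say "Puppy menu: shutdown, reboot or back"; }

main_menu() {
    local choice
    while true; do
        say "Please choose between music, browsing, file manager or puppy menu"
        choice=$(get_word) || return 1          # no more input: give up
        case "$choice" in
            music)    music_menu;  return ;;
            browsing) browse_menu; return ;;
            file*)    files_menu;  return ;;    # "file" or "file manager"
            puppy*)   puppy_menu;  return ;;
            *)        say "Sorry, I heard $choice" ;;
        esac
    done
}
```

Try it with printf 'banana\nmusic\n' | main_menu. The unrecognised word makes the main menu ask again, and "music" hands control to music_menu.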

EDIT: decided to start part 4 here:
http://murga-linux.com/puppy/viewtopic.php?t=89360

technosaurus
Posts: 4853
Joined: Mon 19 May 2008, 01:24
Location: Blue Springs, MO

#62 Post by technosaurus »

My sound doesn't work in Linux on 1 computer without a lot of manual setup and the other has a really crappy mic, but I think I can program it blind.

To make it a bit trekky, I will call my generic command "computer" so that it only "does stuff" when you begin your sentence with "Computer ..."

Code:

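# fall back to espeak if no text2speech command is installed
if ! command -v text2speech >/dev/null; then
    text2speech(){ espeak "$*"; }
fi

computer(){
    case $1 in
        # "computer open firefox": launch it in the background so listening continues
        open) shift
              if command -v "$1" >/dev/null; then "$@" & else text2speech "I can't find that program."; fi;;
        disregard) exit;;    # stop listening (leaves the read loop)
        *) text2speech "I can't handle the $* command yet.";;
    esac
}

# $SOMERANDOMOPTIONS is a placeholder for your own pocketsphinx options.
# Decoded lines look like "000000002: computer open firefox".
pocketsphinx_continuous $SOMERANDOMOPTIONS | while read -r ROW COMMAND ARGS; do
    case "$ROW$COMMAND" in
        [0-9]*:computer) $COMMAND $ARGS;;                              # run the computer function
        [0-9]*:dictate) [ "$DICTATE" ] && DICTATE="" || DICTATE=true;; # toggle dictation mode
        [0-9]*:*) [ "$DICTATE" ] && echo "$COMMAND $ARGS" >>"$HOME/dictations";;
    esac
done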
For the text2speech part, try one of these:
http://www.murga-linux.com/puppy/viewto ... 601#573601
Check out my github repositories (https://github.com/technosaurus). I may eventually get around to updating my blogspot (http://bashismal.blogspot.com).

Ted Dog
Posts: 3965
Joined: Wed 14 Sep 2005, 02:35
Location: Heart of Texas

#63 Post by Ted Dog »

what about puppy in place of computer...

Puppy speak

Puppy fetch email

Puppy empty trash.. :wink:

starhawk
Posts: 4906
Joined: Mon 22 Nov 2010, 06:04
Location: Everybody knows this is nowhere...

#64 Post by starhawk »

"Puppy empty trash"...

...makes me think of this scene from Family Guy --> http://www.youtube.com/watch?v=17K6izfGMn0

LOL.

H4LF82
Posts: 123
Joined: Tue 02 Oct 2012, 04:22

#65 Post by H4LF82 »

Ted Dog wrote:what about puppy in place of computer...
The only problem I have with that is that "puppy" sounds like all sorts of other words ending in the long E sound, and that could lead to misrecognitions.

The Star Trek computer is a good model to follow...and "computer" has the advantage of being a three-syllable word ending in a short R (uncommon), versus a two-syllable word ending in the long E (very common). I see why you would suggest it though, Ted Dog.

There are Star Trek computer sounds here...

http://www.starbase51.co.uk/starbase51/wav/wav.asp

And technosaurus...we are using espeak, not text2speech. It's already part of the package...

And a question: does the code you have there do this...

Code:

#!/bin/bash

# This is just a "proof_of_concept" to show that the user can provide verbal feedback to control an action
# Establish loop
condition_to_check="False"
while [[ ${condition_to_check} == "False" ]]; do

    # allow time after boot:
    sleep 5

    # Ask the question:
    espeak -f /root/Qplay.txt &

    # Allow time for user to reply
    sleep 7
    # play a noise to indicate the user is finished recording
    # (a .wav can't be executed directly, so hand it to mplayer)
    mplayer /usr/share/chatterbox/sounds/c811.wav &

    # Use sed to extract the last 3 lines of chatdump.txt, pipe the result to awk, which extracts
    # the single-word command and writes it to extracted_command.txt
    sed -e :a -e '$q;N;4,$D;ba' /root/chatdump.txt | awk '/^0000/ { print $2 }' > /root/extracted_command.txt

    # use sed to keep only the last line of extracted_command.txt
    # and call it the "command" variable
    command=$(sed '$!d' /root/extracted_command.txt)

    # Test if the command word equals the word we want to hear
    #if [ $command=yes ]
    if test "$command" = "computer"
    then
        condition_to_check="True"
        # If there is a match then make a noise to confirm:
        mplayer /usr/share/chatterbox/sounds/c810.wav &
        # espeak -f /root/Music.txt &
        # delete the contents of the two text files; truncate them in place,
        # since sed without -i only prints to stdout and changes nothing
        : > /root/extracted_command.txt
        : > /root/chatdump.txt
        # run the menu program
        # ARGUMENT to run menu program MISSING HERE!!
    else
        condition_to_check="False"
        # echo "Failed to process chat_command."
        espeak -f /root/CommandFail.txt &
    fi
done
...I think we were doing the same thing at the same time and came up with two different ways to do it :D I was going to add a second script for the menu of options that follow the word "computer"...

I also added the 'computer' sounds from the site above and put them in /usr/share/chatterbox/sounds
