Chatterbox - STT / TTS / TTA project. Part 2

A home for all kinds of Puppy related projects
greengeek
Posts: 5789
Joined: Tue 20 Jul 2010, 09:34
Location: Republic of Novo Zelande

#61 Post by greengeek »

Yes, I think the next step involves extending the functionality. I do have some issues I want to look at more closely before I continue - for example I found that if I manually cleared the chatdump.txt file it stopped sphinx from doing any further updating to that file so I want to resolve why that is.

I also want to fine tune the code so that the script that reads the extracted command also clears the extracted_command.txt file (or "voice_prompt" or whatever we want to call it...) ready for the next extraction. (I think that step is pretty easy using sed)
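A minimal sketch of that read-then-clear step, assuming the command word is the last line of a plain text file. The function name `take_command` is hypothetical. It truncates with a redirection instead of sed, because sed without `-i` only prints to stdout and leaves the file as it was. Truncating in place also keeps the same file on disk, which may matter if sphinx still has it open; whether sphinx carries on writing sensibly afterwards probably depends on whether its output was redirected with `>>` (append) or `>`.

```shell
#!/bin/sh
# Hypothetical helper: print the last line of a command file, then empty it.
take_command() {
    file=$1
    [ -s "$file" ] || return 1      # nothing to read yet
    tail -n 1 "$file"               # the most recent command word
    : > "$file"                     # truncate in place; same file, zero length
}
```

It could be called as `word=$(take_command /root/extracted_command.txt)`; a non-zero return means there was nothing to read.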

One of the things on my list is to flesh out the first few posts in each of these chatterbox threads so that the important info is viewable without searching too far.

I also want to find ways to improve the integrity of word / phrase recognition so that it is possible to offer this as a pet which is reliable enough to make it pretty easy for new users to set up their own preferred command set/function at boot time, even if it is only for a single function.

In terms of what to do next I am keen to keep a similar informal format as this simple project but do two main things:
1) Produce a number of vocab files that are tailored for more reliable word recognition and with a word set that is appropriate to various specific functions (eg: a post-boot/main menu command set and/or vocab list, a browsing-specific set, and maybe a dictation set). Probably also a FileManager set.

2) I want to create scripts that allow a more interactive and multilevelled menu/protocol system eg: after boot I feel the computer should ask something like the following:
"Please choose between Music, Browsing, File Manager or Puppy menu" Once the user chooses their preference the main menu would hand control to the next menu and so on.

I have a few experiments in mind to test out what is possible so I hope to launch into those in the coming days. Feel free to offer your own suggestions or preferences. The more thoughts the better...
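The multilevel menu idea in point 2 can be sketched roughly like this, assuming each recognised word arrives as a lower-case string. All function names here (`main_menu`, `music_menu`, `walk_menus`) are hypothetical; each menu is just a function that prints the name of the menu that should take control next.

```shell
#!/bin/sh
# Hypothetical top-level menu: map a recognised word to the next menu's name.
main_menu() {
    case $1 in
        music) echo music_menu ;;
        # browsing, file manager and puppy menus would be added the same way
        *)     echo main_menu ;;      # unrecognised word: stay on this menu
    esac
}

# Hypothetical sub-menu: "back" hands control back to the main menu.
music_menu() {
    case $1 in
        back) echo main_menu ;;
        *)    echo music_menu ;;
    esac
}

# Feed a list of words through the menus, starting at main_menu,
# and print the menu we end up in.
walk_menus() {
    menu=main_menu
    for word in "$@"; do
        menu=$("$menu" "$word")
    done
    echo "$menu"
}
```

In a real script the words would come from sphinx one at a time and each menu would also speak its prompt; this only shows the hand-over between levels.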

EDIT: decided to start part 4 here:
http://murga-linux.com/puppy/viewtopic.php?t=89360

technosaurus
Posts: 4853
Joined: Mon 19 May 2008, 01:24
Location: Blue Springs, MO

#62 Post by technosaurus »

My sound doesn't work in Linux on 1 computer without a lot of manual setup and the other has a really crappy mic, but I think I can program it blind.

To make it a bit trekky, I will call my generic command "computer" so that it only "does stuff" when you begin your sentence with "Computer ..."

Code: Select all

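# Handle a spoken sentence that began with "computer".
computer(){
    case $1 in
        # "computer open <program>": run the program if it is on the PATH
        open)shift; which "$1" >/dev/null && "$@" || text2speech "I can't find that program.";;
        disregard)exit;;
        *)text2speech "I can't handle the $* command yet.";;
    esac
}

# ROW takes the leading counter from each sphinx line, COMMAND the first word.
pocketsphinx_continuous $SOMERANDOMOPTIONS |while read -r ROW COMMAND ARGS; do
case "$ROW$COMMAND" in
    [0-9]*:computer)$COMMAND $ARGS;;
    # "dictate" toggles dictation mode on and off
    [0-9]*:dictate)[ "$DICTATE" ] && DICTATE="" || DICTATE=true ;;
    # anything else is appended to the dictation file while dictation is on
    [0-9]*:*)[ "$DICTATE" ] && echo "$COMMAND $ARGS" >>"$HOME/dictations"
esac
done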
for the text2speech try one of these:
http://www.murga-linux.com/puppy/viewto ... 601#573601
Check out my [url=https://github.com/technosaurus]github repositories[/url]. I may eventually get around to updating my [url=http://bashismal.blogspot.com]blogspot[/url].

Ted Dog
Posts: 3965
Joined: Wed 14 Sep 2005, 02:35
Location: Heart of Texas

#63 Post by Ted Dog »

what about puppy in place of computer...

Puppy speak

Puppy fetch email

Puppy empty trash.. :wink:

starhawk
Posts: 4906
Joined: Mon 22 Nov 2010, 06:04
Location: Everybody knows this is nowhere...

#64 Post by starhawk »

"Puppy empty trash"...

...makes me think of this scene from Family Guy --> http://www.youtube.com/watch?v=17K6izfGMn0

LOL.

H4LF82
Posts: 123
Joined: Tue 02 Oct 2012, 04:22

#65 Post by H4LF82 »

what about puppy in place of computer...
the only problem i have with that is that "puppy" sounds like all sorts of other words ending with the long E sound, and that could lead to problems.

the star trek computer is a good model to follow...and it has the advantage of being a four syllable word ending in a short R (uncommon) versus a two syllable word ending with the long E (very common). i see why you would suggest it though ted dog.

there are star trek computer sounds here...

http://www.starbase51.co.uk/starbase51/wav/wav.asp

and technosaurus...we are using espeak, not text2speech. its already part of the package...

and a question....does the code you have there do this...

Code: Select all

#!/bin/bash

#This is just a "proof_of_concept" to show that the user can provide verbal feedback to control an action
# Establish loop
condition_to_check="False"
while [[ ${condition_to_check} == "False" ]]; do

#allow time after boot:
sleep 5

#Ask the question:
espeak -f /root/Qplay.txt &

#Allow time for user to reply
sleep 7
#play a noise to indicate the user is finished recording
/usr/share/chatterbox/sounds/c811.wav

#Use sed to extract last 3 lines of chatdump.txt, pipe the result to awk which extracts the single word command
#and writes it to sed2awk_extract_command.txt
sed -e :a -e '$q;N;4,$D;ba' /root/chatdump.txt | awk '/^0000/  { print $2 }' > /root/extracted_command.txt

#use sed to extract the command word from the sed2awk_extract_command.txt file
#and call it the "command" variable
command=$(sed '$!d' /root/extracted_command.txt)

#Test if the command word equals the word we want to hear
#if [ $command=yes ]
if test "$command" = "computer"

then
condition_to_check="True"
#If there is a match then  make a noise to confirm:
mplayer /usr/share/chatterbox/sounds/c810.wav &
# espeak -f /root/Music.txt &
# delete the contents of the two text files
sed '/-Start/,/-End/d' /root/extracted_command.txt &
sed '/-Start/,/-End/d' /root/chatdump.txt &
# run the menu program
# ARGUMENT to run menu program MISSING HERE!!

else
condition_to_check="False"
#     echo "Failed to process chat_command."
espeak -f /root/CommandFail.txt &
fi
done
..i think we were doing the same thing at the same time and came up with 2 different ways to do it :D i was going to add a second script for the menu of options beyond the word computer...

I also added the 'computer' sounds from the site above and put them in /usr/share/chatterbox/sounds
"The wise know their weakness too well to assume infallibility; and he who knows most, knows best how little he knows." - Thomas Jefferson

Ted Dog
Posts: 3965
Joined: Wed 14 Sep 2005, 02:35
Location: Heart of Texas

#66 Post by Ted Dog »

When I see a long case switch like this one is becoming, I long for the old IBM REXX program.. It just was so GOOD at stuff like this.

technosaurus
Posts: 4853
Joined: Mon 19 May 2008, 01:24
Location: Blue Springs, MO

#67 Post by technosaurus »

H4LF82 wrote:and technosaurus...we are using espeak, not text2speech. its already part of the package...

and a question....does the code you have there do this...
....
..i think we were doing the same thing at the same time and came up with 2 different ways to do it :D i was going to add a second script for the menu of options beyond the word computer...

I also added the 'computer' sounds from the site above and put them in /usr/share/chatterbox/sounds
I meant for text2speech to be a shell function wrapper like the ones in my link. The ultralight espeak version I built only uses standard puppy libs (no portaudio, ...), so the wav output option can be sent to stdout and piped through aplay (I like the unix philosophy)
There are quite a few other examples in that post: reading html docs by stripping the tags, getting text from the clipboard (it gets filled every time you highlight something, so can be annoying unless you _need_ it) and a few more.
btw, I wonder if espeak's -f option would work like echo "my text" |espeak -f /dev/stdin

I'm sure my code is duplicated effort, but all the code I was seeing was becoming overly complex.

I bet it wouldn't be too difficult to use my .desktop file parsing code from jwm_tools (it's in jwm_menu_create) to create a voice menu... and probably parse the PuppyPin or combine with wget to google stuff

H4LF82
Posts: 123
Joined: Tue 02 Oct 2012, 04:22

#68 Post by H4LF82 »

I meant for text2speech to be shell function wrapper like the ones in my link. The ultralight espeak version I built only uses standard puppy libs (no portaudio, ...), so the wav output option can be sent to stdout and piped through aplay (I like the unix philosophy)
...ooooh. i see now!...
I wonder if espeak's -f option would work like echo "my text" |espeak -f /dev/stdin
...i dont know, but it says...
If neither -f nor --stdin, then <words> are spoken, or if none then text
is spoken from stdin, each line separately.
in the helpfile, so id think it would.
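Going by that help text, a text2speech wrapper that takes either words or piped text could look something like the sketch below. It assumes espeak's `--stdin` option; the `SPEAK_CMD` variable is a hypothetical addition so the wrapper can be tried out without any sound hardware (for example with `SPEAK_CMD=cat`).

```shell
#!/bin/sh
# Hypothetical wrapper: speak the arguments, or stdin when no arguments are given.
# SPEAK_CMD is an assumption added for this sketch; it defaults to espeak reading stdin.
SPEAK_CMD=${SPEAK_CMD:-"espeak --stdin"}

text2speech() {
    if [ $# -gt 0 ]; then
        printf '%s\n' "$*" | $SPEAK_CMD    # words given on the command line
    else
        $SPEAK_CMD                         # no words: pass our own stdin straight through
    fi
}
```

So both `text2speech "I can't find that program."` and `echo "my text" | text2speech` should end up at the same command. On an espeak built without a sound interface, setting `SPEAK_CMD="espeak --stdin --stdout"` and piping the result through aplay is probably the equivalent.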
I'm sure my code is duplicated effort, but all the code I was seeing was becoming overly complex.
im still having trouble following along. were all going in the same direction tho i think...

:D

greengeek
Posts: 5789
Joined: Tue 20 Jul 2010, 09:34
Location: Republic of Novo Zelande

#69 Post by greengeek »

technosaurus wrote:for the text2speech try one of these:
http://www.murga-linux.com/puppy/viewto ... 601#573601
Hi technosaurus, I have just tried "speak" and wondered if the problem I experienced is normal - I have a text file called /root/Qplay.txt and it contains the following sentence:
"Welcome to puppy. Please say the word Music if you want me to play music"

If I use the syntax:

Code: Select all

speak_files /root/Qplay.txt
it speaks the sentence as I would expect. However, if I use the following syntax:

Code: Select all

speak /root/Qplay.txt
I get an error message telling me to use the -w option "because the program was built without a sound interface"

If I then use the following syntax:

Code: Select all

speak -w /root/Qplay.txt
the txt file gets emptied and has no contents.

Is that what you would expect?

technosaurus
Posts: 4853
Joined: Mon 19 May 2008, 01:24
Location: Blue Springs, MO

#70 Post by technosaurus »

greengeek wrote:However, if I use the following syntax:

Code: Select all

speak /root/Qplay.txt
I get an error message telling me to use the -w option "because the program was built without a sound interface"

If I then use the following syntax:

Code: Select all

speak -w /root/Qplay.txt
the txt file gets emptied and has no contents.

Is that what you would expect?
IIRC the -w flag indicates the name for the output wav file.
The reason speak_* work differently is that I wrote my own puppy helper scripts to use stdout as the output file and piped them through aplay. You can still use speak -w /root/Qplay.wav -f /root/Qplay.txt && aplay /root/Qplay.wav, but it will take unnecessary disk space and add delay compared to using stdout.
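A small sketch of a guard against the mishap described above, assuming speak takes -w <wav file> and -f <text file> as in the command just given. The helper names `wav_name_for` and `speak_file_via_wav` are hypothetical. The point is only that the wav name is always derived from the text name and can never be the same file.

```shell
#!/bin/sh
# Hypothetical helper: derive an output .wav path from a text file path.
wav_name_for() {
    case $1 in
        *.txt) echo "${1%.txt}.wav" ;;   # /root/Qplay.txt -> /root/Qplay.wav
        *)     echo "$1.wav" ;;          # no .txt suffix: just append .wav
    esac
}

# Speak a text file by way of a wav file, never writing over the text itself.
speak_file_via_wav() {
    txt=$1
    wav=$(wav_name_for "$txt")
    speak -w "$wav" -f "$txt" && aplay "$wav"
}
```

This still has the disk space and delay cost of the wav file, so the stdout route remains the lighter option.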

greengeek
Posts: 5789
Joined: Tue 20 Jul 2010, 09:34
Location: Republic of Novo Zelande

#71 Post by greengeek »

H4LF82 wrote:

Code: Select all

#!/bin/bash

#This is just a "proof_of_concept" to show that the user can provide verbal feedback to control an action
# Establish loop
condition_to_check="False"
while [[ ${condition_to_check} == "False" ]]; do
Hi H4LF82, does this addition mean that the "TEST FOR KEYWORD" is now going on continuously, or have I misunderstood.? (The one most important step I want to achieve at the moment is to get the keyword testing running continuously rather than just a single test/single action event)
sleep 7
#play a noise to indicate the user is finished recording
/usr/share/chatterbox/sounds/c811.wav
"mplayer" has been inadvertently left off here right? Or are you using a different function somehow?
# delete the contents of the two text files
sed '/-Start/,/-End/d' /root/extracted_command.txt &
sed '/-Start/,/-End/d' /root/chatdump.txt &
Nice touch. Have you been testing this script live or are you still in process of writing? I'm keen to know if the programmatic clearing of the file works correctly. When I manually clear the chatdump.txt I usually seem to get the outcome that sphinx stops writing to the file...
I also added the 'computer' sounds from the site above and put them in /usr/share/chatterbox/sounds
We have a chatterbox directory in /usr/share? Oooooh, that sounds great! Almost like a REAL program now... :-)

greengeek
Posts: 5789
Joined: Tue 20 Jul 2010, 09:34
Location: Republic of Novo Zelande

#72 Post by greengeek »

technosaurus wrote:IIRC the -w flag indicates the name for the output wav file.
Just a heads-up for anyone else using "speak" then - don't do what I did and launch into reading your text file with the 'speak -w' syntax - in my case this WROTE to the textfile and I lost the contents. A minor problem in this case, but a different matter if it was an eBook... :-)

Use 'speak_files' to do the reading.

greengeek
Posts: 5789
Joined: Tue 20 Jul 2010, 09:34
Location: Republic of Novo Zelande

#73 Post by greengeek »

technosaurus wrote:To make it a bit trekky, I will call my generic command "computer" so that it only "does stuff" when you begin your sentence with "Computer ..."

Code: Select all

computer(){
    case $1 in
        open)shift; which $1 && $@ || text2speech "I can't find that program.";;
        disregard)exit;;
        *)text2speech "I can't handle the $@ command yet.";;
    esac
}

pocketsphinx_continuous $SOMERANDOMOPTIONS |while read ROW COMMAND ARGS; do
case "$ROW$COMMAND" in
    [0-9]*:computer)$COMMAND $ARGS;;
    [0-9]*:dictate)[ "$DICTATE" ] && DICTATE="" || DICTATE=true ;;
    [0-9]*:*)[ "$DICTATE" ] && echo $COMMAND $ARGS >>$HOME/dictations
esac
done
Hi technosaurus - are you able to explain to my untrained brain a bit about what this is doing please? Is this MONITORING for the output from sphinx, or is this about PROCESSING the previously detected output? or maybe both?
(I'm still struggling with trying to get continuous sampling of the sphinx output...)
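One way to see what that loop does is to feed it canned lines instead of live sphinx output. This is only a sketch: it assumes sphinx prints lines shaped like `000000001: computer open leafpad`, the name `handle_lines` is hypothetical, and the actions are replaced by `echo` so nothing actually runs. Because `read` picks up each line as it arrives on the pipe, the monitoring and the processing happen in the same pass.

```shell
#!/bin/sh
# Stand-in for the real loop: prints what it would do instead of doing it.
handle_lines() {
    DICTATE=""
    while read -r ROW COMMAND ARGS; do
        case "$ROW$COMMAND" in
            # sentence began with "computer": the rest is the instruction
            [0-9]*:computer) echo "run: $ARGS" ;;
            # "dictate" flips dictation mode on or off
            [0-9]*:dictate)
                if [ "$DICTATE" ]; then DICTATE=""; else DICTATE=true; fi ;;
            # any other numbered line is kept only while dictation is on
            [0-9]*:*)
                if [ "$DICTATE" ]; then echo "dictated: $COMMAND $ARGS"; fi ;;
        esac
    done
}
```

Trying it with `printf '%s\n' '000000001: computer open leafpad' | handle_lines` shows the "computer" branch firing; lines that do not start with a number (sphinx status messages, for instance) fall through untouched.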

H4LF82
Posts: 123
Joined: Tue 02 Oct 2012, 04:22

#74 Post by H4LF82 »

H4LF82 wrote:

Code:
#!/bin/bash

#This is just a "proof_of_concept" to show that the user can provide verbal feedback to control an action
# Establish loop
condition_to_check="False"
while [[ ${condition_to_check} == "False" ]]; do
Hi H4LF82, does this addition mean that the "TEST FOR KEYWORD" is now going on continuously, or have I misunderstood.? (The one most important step I want to achieve at the moment is to get the keyword testing running continuously rather than just a single test/single action event)
yes. its wrapped up in a loop checking for condition-to-check to equal true; provided i dont include any syntax errors ... :/
Quote:
sleep 7
#play a noise to indicate the user is finished recording
/usr/share/chatterbox/sounds/c811.wav
"mplayer" has been inadvertently left off here right? Or are you using a different function somehow?
not a different function. THIS IS MY PROBLEM. THIS is why I cannot program....its not that i cant program. i cannot see...so my code is chock full of errors and hangs up in the stupidest mistakes. im glad i gave you this code example now--it illustrates my point perfectly.
Quote:

# delete the contents of the two text files
sed '/-Start/,/-End/d' /root/extracted_command.txt &
sed '/-Start/,/-End/d' /root/chatdump.txt &
Nice touch. Have you been testing this script live or are you still in process of writing? I'm keen to know if the programmatic clearing of the file works correctly. When I manually clear the chatdump.txt I usually seem to get the outcome that sphinx stops writing to the file...
you SHOULD not have to manually clear it now....but again i add the caveat that i cannot see, and i can guarantee there are errors in my code. triple check my code....

im glad you like it tho :)

H4LF82
Posts: 123
Joined: Tue 02 Oct 2012, 04:22

#75 Post by H4LF82 »

Quote:
I also added the 'computer' sounds from the site above and put them in /usr/share/chatterbox/sounds
We have a chatterbox directory in /usr/share? Oooooh, that sounds great! Almost like a REAL program now... :-)
...yeah, that was getting necessary. i have a script running somewhere in all of this that is filling my root folder with blank directories every hour...ive rebooted from the live cd and started a new savefile just for this project, and since it now has its own sfs, it might as well be structured correctly too.

/usr/share/chatterbox/ is now the directory for it, if there are no objections?

cheers!

technosaurus
Posts: 4853
Joined: Mon 19 May 2008, 01:24
Location: Blue Springs, MO

#76 Post by technosaurus »

H4LF82 wrote:...yeah, that was getting necessary. i have a script running somewhere in all of this that is filling my root folder with blank directories every hour...ive rebooted from the live cd and started a new savefile just for this project, and since it now has its own sfs, it might as well be structured correctly too.
If you take a look at my little example, it uses no disk space unless "dictate" is toggled and then it only writes to a single file in the user's $HOME directory. *nix OSs (including puppy linux) can operate on streams, so unless you are planning to use the output data from pocketsphinx_continuous for analysis to maybe patch the source there is really no need to use a temporary file(s).

With that being said, I realize pocketsphinx_continuous has a lot of superfluous options, but without a decent sound system it is difficult for me to separate the wheat from the chaff. If anyone cares to take note of what command line args and output strings are of limited value, I'd be willing to thresh them out of the source code. If we are always needing to set an arg to a certain value, I can hard code it, if an arg is never used I can remove it and if the output would be better in a different format, that can be done (for example using a time-since-epoch style integer time stamp instead of 0000000001: ....)

in shell that would be date +%s

or in C
#include <sys/time.h>
struct timeval tp;
gettimeofday(&tp, NULL);
int seconds = tp.tv_sec;

to convert them to a date string in shell
date -d @1382162295 <options_here>
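Until the source is patched, the same effect can be approximated with a filter on the output. A rough sketch, assuming sphinx lines start with a counter such as `000000001:`; the function names `stamp_lines` and `stamp_to_date` are hypothetical, and the `-d @SECONDS` form relies on GNU or busybox date.

```shell
#!/bin/sh
# Hypothetical filter: swap the leading "000000001:" counter for seconds since the epoch.
stamp_lines() {
    while read -r ROW REST; do
        case $ROW in
            [0-9]*:) echo "$(date +%s): $REST" ;;
            *)       echo "$ROW${REST:+ $REST}" ;;   # other lines are passed through
        esac
    done
}

# Convert a stamp back to a readable UTC date string.
stamp_to_date() {
    date -u -d "@$1" '+%Y-%m-%d %H:%M:%S'
}
```

It would sit in the pipe as `pocketsphinx_continuous ... | stamp_lines`, at the cost of one `date` call per recognised line.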

greengeek
Posts: 5789
Joined: Tue 20 Jul 2010, 09:34
Location: Republic of Novo Zelande

#77 Post by greengeek »

H4LF82 wrote:/usr/share/chatterbox/ is now the directory for it, if there are no objections?
cheers!
I certainly have no objections. I am a little conscious that chatterbox may end up being a messy collection of poorly coded (yet hopefully functional) scripts that represent our attempts to achieve our various goals...

But then, if that happens, there is nothing to prevent a better coder improving things and maybe in the end chatterbox just becomes a testing ground that makes way for a more professional effort which could have a better name (VoiceBox maybe...). What do you think?

I'm kind of enjoying being able to throw my 'chatterbox' ideas into the ring and learning some basics of scripting but I don't want to be blamed for filling the puppy coffers with bad code :-)

greengeek
Posts: 5789
Joined: Tue 20 Jul 2010, 09:34
Location: Republic of Novo Zelande

#78 Post by greengeek »

technosaurus wrote:so unless you are planning to use the output data from pocketsphinx_continuous for analysis to maybe patch the source there is really no need to use a temporary file(s).
That's excellent - I felt bad about using the temp file. Seemed a bit clunky. At least it helped me get to first base though...

greengeek
Posts: 5789
Joined: Tue 20 Jul 2010, 09:34
Location: Republic of Novo Zelande

#79 Post by greengeek »

technosaurus wrote:and if the output would be better in a different format, that can be done
Does that mean it might be possible to get a single word output from sphinx - eg: just the command word itself?
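Even without changing the sphinx source, the command word on its own can be pulled out with a filter on the output stream. A minimal sketch, assuming the numbered `000000001: word ...` line format; the function name `first_word` is hypothetical.

```shell
#!/bin/sh
# Hypothetical filter: print only the first recognised word of each numbered line.
# Lines without a leading counter, or with nothing after it, are skipped.
first_word() {
    awk '/^[0-9]+:/ && NF >= 2 { print $2 }'
}
```

Used as `pocketsphinx_continuous ... | first_word`, this would give one word per utterance with no temporary file in between.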

H4LF82
Posts: 123
Joined: Tue 02 Oct 2012, 04:22

#80 Post by H4LF82 »

If you take a look at my little example, it uses no disk space unless "dictate" is toggled and then it only writes to a single file in the user's $HOME directory.
...oh, please dont misunderstand. I get it! im not complaining. i expect that this will end up as a Frankenstein of code and be as cringe-worthy as it gets to the trained eye...and i dont care. im as happy as a pig in filth if the code is sloppy and im prepared to create new sfs files a thousand times over if thats what it takes.

and im happy for the testing folder to contain a million empty directories. just not my root folder. that folder is cluttered enough and i have a terrible time navigating folders now as it is. buried in /usr/share/chatterbox is a good place for testing files IMHO...thats all i was saying. :)

forgive me if it sounded like i was whingeing!

and i agree that txt files are clunky. it was my suggestion, and i suggested it because it gives me a physical place to put the stdout without having to use a console where i can physically SEE it the moment it gets created. by all means, remove the text file and use the stdout ...someone with a console who trusts their eyes, please!

Cheers!