Chatterbox - STT / TTS / TTA project. Part 2

Keef
Posts: 987
Joined: Thu 20 Dec 2007, 22:12
Location: Staffordshire

#31 Post by Keef »

Probably hundreds of better ways of doing this, but it does have some effect:

Code: Select all

 awk '/^0/  { print $2 }' ~/test.txt
This extracts the spoken command (assuming a single word).
It just finds a line beginning with a '0' eg 000000001:, then prints out the next field.
Contents of test.txt:

Code: Select all

READY....
Listening...
Stopped listening, please wait...
000000000: hello
READY....
Listening...
Stopped listening, please wait...
000000001: bye
READY....

When the recogniser just keeps writing to test.txt, the file will build up several words, so the extraction probably needs to be done 'on the fly'.
(That's not genuine output BTW; I've not had a chance to test recognition properly.)

The word then needs passing to something like:

Code: Select all

case "$wot_u_said" in
    browser)
        defaultbrowser
        ;;
    shutdown)
        shutdown
        ;;
esac
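
A minimal sketch of how the awk extraction and the case statement might be joined up. The function names last_word and dispatch are made up for illustration, and the actions are only echoed rather than run, so it is safe to try on a sample file.

```shell
#!/bin/sh
# Hypothetical helper: print the word that follows the last "0...:" line.
last_word() {
    awk '/^0/ { w = $2 } END { if (w != "") print w }' "$1"
}

# Hypothetical helper: map a spoken word to an action.
# The echo lines stand in for real commands such as defaultbrowser.
dispatch() {
    case "$1" in
        browser)  echo "would run: defaultbrowser" ;;
        shutdown) echo "would run: shutdown" ;;
        *)        echo "ignored: $1" ;;
    esac
}
```

It could be called as dispatch "$(last_word ~/test.txt)". Anything not listed falls through to the catch-all branch, which keeps misheard words from doing harm.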

greengeek
Posts: 5789
Joined: Tue 20 Jul 2010, 09:34
Location: Republic of Novo Zelande

#32 Post by greengeek »

I think what I would be wanting to do is this:

1) Identify the most recent instance of "READY" in the file (assuming the file may at times have pages of chatter in it...)
2) "Grab" the text that is found between that "READY" and the colon that precedes it.
3) Strip out any leading and trailing spaces.
4) Use the remaining text as our command keyword.
5) Clear the file.

I've got a bit of research to do... :-)
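
One possible way to sketch steps 1 to 4 in a single awk call, assuming the dump has the "NNNNNNNNN: words" then "READY...." layout shown earlier in the thread. The function name last_complete is hypothetical. Step 5 (clearing the file) is left out, since the recogniser still has the file open.

```shell
#!/bin/sh
# Print the most recent completed utterance: the text of the last
# "NNNNNNNNN: words" line that has already been followed by a READY line.
last_complete() {
    awk '
        /^[0-9]+:/ {                          # a recognised-text line
            text = $0
            sub(/^[0-9]+:[ \t]*/, "", text)   # drop the ID and leading blanks
            sub(/[ \t]+$/, "", text)          # drop trailing blanks
            pending = text
        }
        /^READY/ && pending != "" {           # READY confirms the line before it
            confirmed = pending
            pending = ""
        }
        END { if (confirmed != "") print confirmed }
    ' "$1"
}
```

Because a line only counts once READY has appeared after it, a half-written final line is ignored until the recogniser has finished with it.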

greengeek
Posts: 5789
Joined: Tue 20 Jul 2010, 09:34
Location: Republic of Novo Zelande

#33 Post by greengeek »

Thanks Keef, I didn't spot that before I posted. I will have a tinker with your code tonight.

EDIT : Could your awk script be changed to allow it to detect the "READY" string and then grab the data field BEFORE it? (I'm thinking that would ensure we were not trying to grab the data field before it was finished being written)

EDIT2 : Is there a risk involved in trying to have two programs accessing the same file? ie: what if sphinx is trying to write new data to the file while I am trying to use another program to clear the data. Do I need to handle the potential conflict resolution or does the system code handle that somehow by making one program wait politely? (and if so - how does it determine which program gets the priority?)
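
On the EDIT2 question: as far as I understand it, Linux does not lock ordinary files by default, so neither program is made to wait and there is no priority. If the file is emptied while the writer still has it open through ">", the writer carries on at its old offset and the file can end up with a run of empty bytes at the front; opening it with ">>" avoids that. One way to sidestep the whole problem is never to clear the file and instead remember how many lines have already been handled. A rough sketch, where new_lines and the state file are made-up names:

```shell
#!/bin/sh
# Print only the lines added to a log since the previous call.
# $1 = log file written by the recogniser
# $2 = small state file holding the line count already seen (hypothetical)
new_lines() {
    log=$1
    state=$2
    seen=0
    if [ -f "$state" ]; then
        seen=$(cat "$state")
    fi
    seen=${seen:-0}
    total=$(wc -l < "$log")
    total=$((total + 0))            # strip any padding wc adds
    # If the log shrank, it was restarted: begin again from the top.
    if [ "$total" -lt "$seen" ]; then
        seen=0
    fi
    if [ "$total" -gt "$seen" ]; then
        sed -n "$((seen + 1)),${total}p" "$log"
    fi
    echo "$total" > "$state"
}
```

Since wc -l counts only lines that end in a newline, a line that is still being written is not reported until it is complete.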

greengeek
Posts: 5789
Joined: Tue 20 Jul 2010, 09:34
Location: Republic of Novo Zelande

#34 Post by greengeek »

I gave your awk code a try and it works well on a live file - I decided to change it to:

Code: Select all

 awk '/^0000/  { print $2 }' ~/test.txt
just so there was less chance of it picking up any other "0" that happens to get thrown in there (probably unnecessary but I feel it's more selective to look for the "0000" string)

So now the problem is to clear the file at the right time so that only a single instance is captured (last response only). (Or how to discard everything except the last line captured...?)

All I can do is have a good look at Grep, Awk and Sed and try to figure which is the best way to force discarding of everything except the most recent command word.

All suggestions truly welcome.

greengeek
Posts: 5789
Joined: Tue 20 Jul 2010, 09:34
Location: Republic of Novo Zelande

#35 Post by greengeek »

Well, I've got some steps that seem to work, but I haven't combined them into an elegant form yet. Here are the steps I have been trying just as single scripts:

This first step gets pocketsphinx_continuous to run and build a chatdump file:

Code: Select all

#!/bin/sh
pocketsphinx_continuous  > /root/chatdump.txt &
This creates a raw "chatdump" file containing things like this:

Code: Select all

Listening...
Stopped listening, please wait...
000000001: Out house
READY....
Listening...
Stopped listening, please wait...
000000002: program
READY....
Listening...
Stopped listening, please wait...
000000003: beginning
READY....
Then I use Keef's awk to extract the recognised command word from each valid line as follows:

Code: Select all

#!/bin/bash
awk '/^0000/  { print $2 }' /root/chatdump.txt > /root/chat_extract.txt &
This creates a file called chat_extract.txt containing the following:

Code: Select all

Out
program
beginning
Then I extract just the final word from this file as follows:

Code: Select all

#!/bin/bash
sed '$!d'  /root/chat_extract.txt > /root/chat_command.txt &
Which extracts the final spoken command (in this case "beginning") and lists that single word in a file called chat_command.txt

Of course this is limiting commands to a single word, but that is where I want to start. (Then the TTA protocol/menu can easily reject any meaningless command and only permit a small range of actions that it is programmed for)

At least by picking out the final word in the file I don't need to panic about clearing the chatdump file regularly yet.
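
Note that print $2 prints only the second whitespace-separated field, so a two-word phrase such as "Out house" would be cut down to "Out". If whole phrases are wanted later, a sketch along these lines strips the ID prefix with sed and keeps the last match, without any intermediate files. The function name last_phrase is made up for illustration.

```shell
#!/bin/sh
# Print the full text of the last recognised line, not just its first word.
# $1 = dump file written by pocketsphinx_continuous
last_phrase() {
    # -n with the p flag prints only lines where the ID prefix was removed;
    # tail keeps the most recent one.
    sed -n 's/^0000[0-9]*:[ ]*//p' "$1" | tail -n 1
}
```

For example, command=$(last_phrase /root/chatdump.txt) would leave the most recent phrase in a variable.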

Don't laugh folks - at least I feel like I'm making progress. Baby steps!
.
Last edited by greengeek on Tue 15 Oct 2013, 01:02, edited 1 time in total.

H4LF82
Posts: 123
Joined: Tue 02 Oct 2012, 04:22

#36 Post by H4LF82 »

i dont think anybody is laughing greengeek. that looks awesome.

im trying just to keep up with your baby steps! your baby steps seem herculean compared to my baby steps.

im off to play with this now and see what little help i can offer, if any.
:)
"The wise know their weakness too well to assume infallibility; and he who knows most, knows best how little he knows." - Thomas Jefferson

greengeek
Posts: 5789
Joined: Tue 20 Jul 2010, 09:34
Location: Republic of Novo Zelande

#37 Post by greengeek »

If you are getting some degree of accurate word recognition but want to improve the overall accuracy, the easiest way is to run a reduced vocabulary. That way the program does not have to distinguish between words or phrases that sound similar.
eg:
If the program has a big vocabulary to choose from, the word "read" could be misinterpreted as any of the following:
red
reed
reared
wreck
wrecked
wreaked etc etc

If your vocab file only contains the word "read" then that is what will be chosen when you say "read"

To achieve this reduced vocabulary you have to make a "corpus.txt" file as discussed two thirds of the way down the page in this tutorial:
http://cmusphinx.sourceforge.net/wiki/tutoriallm

Here is a sample corpus.txt file:

Code: Select all

open browser
new e-mail
forward
backward
next window
last window
open music player
Out house
negative
right
close
Once you have set up your corpus.txt file it is necessary to get it compiled. I did this via the online webservice referred to in the tutorial. Simply go here:
http://www.speech.cs.cmu.edu/tools/lmtool-new.html
and click the "Browse" button and upload your corpus.txt

It will quickly be compiled and you will be offered a tar.gz file which contains various necessary files which will all have a numeric prefix. Save this tarball and extract it somewhere you can find the files later.
I ended up with a collection of files as follows:
6718.dic
6718.lm
6718.log_pronounce
6718.sent
6718.vocab
(I guess your number prefix will be different...)

Copy all of these files into the /usr/share/pocketsphinx/model/lm directory (see attached pic)

Start pocketsphinx with the following syntax:

Code: Select all

pocketsphinx_continuous -lm 6718.lm -dict 6718.dic
(obviously substituting your own number prefix instead of 6718...)

Pocketsphinx will now run as normal but without so many confusing options to choose from if it is struggling to make out which word you are saying. This seems to work quite well, which suggests that we could create a very useful corpus.txt file for a Puppy-specific menu system.
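
A small sketch of the two local ends of that process: writing the corpus file and building the command line once the lmtool files are in place. The function names are hypothetical, the phrases are just examples, and using full paths to the .lm and .dic files is my assumption (the command above uses bare file names).

```shell
#!/bin/sh
# Write a small corpus file, one command phrase per line.
# $1 = file to create (upload it to the lmtool page afterwards)
write_corpus() {
    cat > "$1" <<'EOF'
open browser
next window
last window
open music player
close
EOF
}

# Build the pocketsphinx command line for a given lmtool prefix.
# $1 = directory holding the files, $2 = numeric prefix (e.g. 6718)
sphinx_cmd() {
    echo "pocketsphinx_continuous -lm $1/$2.lm -dict $1/$2.dic"
}
```

Printing the command first, rather than running it straight away, makes it easy to check that the prefix and directory are right.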
.
Attachments
sphinx_vocab_files_in_lm_dir_.jpg
(81.88 KiB) Downloaded 330 times

H4LF82
Posts: 123
Joined: Tue 02 Oct 2012, 04:22

#38 Post by H4LF82 »

that is outstanding greengeek! I was wondering how we were going to deal with that HUGE vocabulary. I will begin working on a "one size fits all" word-base today for non-visually-accessible options that hopefully will come a lot closer to fitting the bill than is currently available on other platforms.

And, for the record, i am still reeling from the shock and awe of the fine code work youve got going on here. I have nothing to compare it to, as it is unexampled.

Thank you, greengeek. I anxiously await your next words of instruction on needles and pins.

:D

greengeek
Posts: 5789
Joined: Tue 20 Jul 2010, 09:34
Location: Republic of Novo Zelande

#39 Post by greengeek »

Thanks for the kind words H4LF82. This is rather an enjoyable project and I am learning quite a bit. After doing a bit of code googling I have decided to change the order of the steps I was using:

Rather than use awk to list all of the command words from the chatdump file I have decided to firstly use sed to grab only the last 3 lines of that file, then pipe that output to awk to identify the single keyword. So my new code is:

Code: Select all

#!/bin/bash
sed -e :a -e '$q;N;4,$D;ba' /root/chatdump.txt | awk '/^0000/  { print $2 }' > chat_command.txt &
My next step is to work out how to pipe the awk output to a variable (instead of a file) then I can compare it to my "preferred answer" (ie "YES") and use that directly to trigger my final action
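
A possible sketch of that next step, using command substitution to put the pipeline's output in a variable and a case-insensitive comparison against "YES". The function names latest_word and is_yes are made up for illustration.

```shell
#!/bin/sh
# Print the keyword found in the last three lines of the dump.
# $1 = dump file written by pocketsphinx_continuous
latest_word() {
    sed -e :a -e '$q;N;4,$D;ba' "$1" | awk '/^0000/ { print $2 }'
}

# Succeed if the given word is "yes" in any letter case.
is_yes() {
    [ "$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')" = "YES" ]
}
```

Typical use would be answer=$(latest_word /root/chatdump.txt) followed by if is_yes "$answer"; then ... fi. Upper-casing before the comparison means it does not matter which case the recogniser happens to print.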

Keef
Posts: 987
Joined: Thu 20 Dec 2007, 22:12
Location: Staffordshire

#40 Post by Keef »

Code: Select all

 voice_prompt=`sed -e :a -e '$q;N;4,$D;ba' /root/chatdump.txt | awk '/^0000/  { print $2 }'` 
..seems to work. (try #echo $voice_prompt) Can't test it properly as only on an old laptop with built-in mike, and speech recognition is a little ropey.
Not managed to bolt all this together - there is always a delay from the speech until something gets written to the temp file. Most solutions to this seem to be done in Python, which is all Parseltongue to me. Not that my Bash is much better.

Ooh err.... got Leafpad to launch by coughing!!

run

Code: Select all

# pocketsphinx_continuous  > chatdump.txt
in one terminal,

then run this script in another:

Code: Select all

#!/bin/bash
while [ "$voice_prompt" != "quit" ]; do

    voice_prompt=$(sed -e :a -e '$q;N;4,$D;ba' chatdump.txt | awk '/^0000/ { print $2 }')

    # Line below just for testing:
    echo "$voice_prompt"

    # Change "edit" to whatever works best.
    if [ "$voice_prompt" == "edit" ]; then
        # exec replaces this script with leafpad, so the loop ends here
        exec leafpad
    fi

    sleep 1    # poll once a second instead of spinning the CPU
done
I just used "i" as the keyword, as recognition is so poor. I've tidied up my very messy code to post it, so may be the odd error.
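
One thing a polling loop like this has to watch for is that the last word stays in the file, so a loop that keeps running would see the same command on every pass. A rough sketch of one way round that, assuming the nine-digit number in front of each result goes up with every utterance (as it does in the samples in this thread): remember the last number handled and act only when it changes. All function names here are hypothetical.

```shell
#!/bin/sh
# Print the newest "ID word" pair from the dump, e.g. "000000003 beginning".
latest_hit() {
    awk '/^0000/ { id = $1; sub(/:$/, "", id); word = $2 }
         END { if (id != "") print id, word }' "$1"
}

# $1 = dump file, $2 = ID that was handled last time (may be empty).
# Prints "ID action" when a new utterance has arrived, nothing otherwise.
handle_new() {
    hit=$(latest_hit "$1")
    if [ -z "$hit" ]; then return 0; fi
    id=${hit%% *}
    word=${hit#* }
    if [ "$id" = "$2" ]; then return 0; fi
    case "$word" in
        edit) echo "$id leafpad" ;;
        quit) echo "$id quit" ;;
        *)    echo "$id none" ;;
    esac
}

# Polling loop (defined here, not started automatically).
watch_dump() {
    last=""
    while :; do
        out=$(handle_new "$1" "$last")
        if [ -n "$out" ]; then
            last=${out%% *}
            case "${out#* }" in
                leafpad) leafpad & ;;
                quit)    break ;;
            esac
        fi
        sleep 1
    done
}
```

Running watch_dump chatdump.txt in the second terminal would then launch leafpad once per spoken "edit" and stop on "quit".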

greengeek
Posts: 5789
Joined: Tue 20 Jul 2010, 09:34
Location: Republic of Novo Zelande

#41 Post by greengeek »

Thanks keef!

ps: I just realised this whole project to kick off randomly playing music is only going to work if I wear headphones to isolate my microphone from the speakers. I just had Katy Perry filling up my chatdump....
:D

H4LF82
Posts: 123
Joined: Tue 02 Oct 2012, 04:22

#42 Post by H4LF82 »

...this is what makes an "OFF" switch so handy. for this moment, it's not a big deal because we are still working out form and function...but ultimately an off switch will be vital.

i will be using a bluetooth headset, and even this (which cannot hear its own speaker) will need a software-side mute button in addition to the actual physical hardware MUTE button on the headset itself.

that is my 2 cents...

:D
edit:

Code: Select all

amixer set Master 0 mute

Code: Select all

amixer set Master 35 unmute

greengeek
Posts: 5789
Joined: Tue 20 Jul 2010, 09:34
Location: Republic of Novo Zelande

#43 Post by greengeek »

H4LF82 wrote:

Code: Select all

amixer set Master 0 mute
amixer set Master 35 unmute
Is that muting mic only? (ie: no effect on speakers?)

so...
- mute mic
- ask question
- unmute mic
- process voice_prompt
- possibly re-mute mic
- take action
- prepare next question
- repeat process
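
The first three steps of that sequence could be sketched roughly as below. It assumes the microphone is governed by a mixer control called Capture (an assumption; the name depends on the sound card) and that espeak speaks the question. For capture switches amixer uses nocap and cap rather than mute and unmute. The run wrapper and the DRY_RUN variable are made up so the commands can be printed instead of executed.

```shell
#!/bin/sh
# Run a command, or just print it when DRY_RUN=1 (handy for trying
# the script on a machine without a sound card).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# MIC is an assumption: pick the capture control that
# "amixer scontrols" lists for your card.
MIC=${MIC:-Capture}

# Speak a question with the microphone switched off, then switch it on again.
ask() {
    run amixer -q set "$MIC" nocap     # stop recording while we talk
    run espeak "$1"                    # ask the question
    run amixer -q set "$MIC" cap       # start recording again
}
```

Calling ask "Shall I play some music?" just before reading the next voice_prompt should keep the machine's own voice out of the chatdump.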

H4LF82
Posts: 123
Joined: Tue 02 Oct 2012, 04:22

#44 Post by H4LF82 »

that mutes speakers ....

look here to control others...

Code: Select all

sh-4.1# amixer -h
Usage: amixer <options> [command]

Available options:
  -h,--help       this help
  -c,--card N     select the card
  -D,--device N   select the device, default 'default'
  -d,--debug      debug mode
  -n,--nocheck    do not perform range checking
  -v,--version    print version of this program
  -q,--quiet      be quiet
  -i,--inactive   show also inactive controls
  -a,--abstract L select abstraction level (none or basic)
  -s,--stdin      Read and execute commands from stdin sequentially

Available commands:
  scontrols       show all mixer simple controls
  scontents	  show contents of all mixer simple controls (default command)
  sset sID P      set contents for one mixer simple control
  sget sID        get contents for one mixer simple control
  controls        show all controls for given card
  contents        show contents of all controls for given card
  cset cID P      set control contents for one control
  cget cID        get control contents for one control
sh-4.1# amixer scontrols
Simple mixer control 'Master',0
Simple mixer control 'PCM',0
Simple mixer control 'IEC958',0
Simple mixer control 'IEC958 Default PCM',0
Simple mixer control 'Capture',0
Simple mixer control 'Digital',0
Simple mixer control 'Input Source',0
Simple mixer control 'Mux',0

so for the mic youll probably do something like

Code: Select all

amixer set Capture 0 mute
or...

Code: Select all

amixer set 'Input Source' 35 unmute
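and to be clear...32 is 100% volume here... 16 is half (50%). the range can differ from one control to another; amixer sget followed by the control name shows its limits.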

35 is overkill for LOUD

H4LF82
Posts: 123
Joined: Tue 02 Oct 2012, 04:22

#45 Post by H4LF82 »

you may also find it helpful to use the sleep/usleep command and an audible "ding.wav" to mark the "beginning" and the "end" of the voice-prompt window, if timing or "lag" is an issue, and to leave no doubt whether the machine is listening, has stopped listening and is processing a valid input, or is choking on an invalid response.

XSE from the xdotool package would allow you to assign a "hotkey" to a specific function...ie..holding down the left shift key for 4 seconds opens the voice-prompt and starts recording input...but im having issues with xse and dont know another way to emulate keystrokes or assign values to them...

maybe an if statement with some command ive yet to learn....

Code: Select all

 if RIGHT_SHIFT DOWN => 5SECONDS, run listen.sh 
sorry i cant be more help...

edit--i meant Xaut not xse or xdotool---sorry.

H4LF82
Posts: 123
Joined: Tue 02 Oct 2012, 04:22

#46 Post by H4LF82 »

in a perfect world, the speech-driven puppy would have this menu at a minimum, and it would have the ability to add any other functionality that is available (as long as the user is willing to tediously speak, type, and confirm every option for every .pet available in this "distro", then create it as part of the vocabulary and expand the menu options using speech-driven tools such as the universal installer)...
main menu

desktop-

clockset
clipboard settings

system-

grub
grub4dos
pdisk
boot manager

setup-

wizard wizard

utility-

control panel
x-archive
backup
resize psfile
execute command
new function utility
vocabulary builder
universal installer

filesystem-

file manager
disk mounter/unmounter

document-

e-reader
dictator

business-

calculator

personal-

password manager
organizer
event timer

network-

firewall
networking

internet-

browser

multimedia-

music player

fun-

memory game

help -

options
x-plain (a brief audible explanation of each option above)
FAQ
info

shutdown options-

reboot
shutdown
restart
IMHO...this would allow for maximum customization without overwhelming the user right off the bat AND keep it small so we dont overwhelm those who have to code it right off the bat. Use lucid 5.2.8 (stripped down to the above menu options being the only options available "out of the box").

after the guts of making the voice activation play random music files gets ironed out, creating the above menu items by adding them into the vocabulary and working out the universal installer first, everything else in Lucid COULD be added or removed from the speech-driven menu by myself or anybody else regardless of their visual abilities...


if I am getting too far ahead or forgetting anything, speak up someone please!

Cheers :D

greengeek
Posts: 5789
Joined: Tue 20 Jul 2010, 09:34
Location: Republic of Novo Zelande

#47 Post by greengeek »

I got the whole shebang running successfully to my satisfaction tonight - including randomplay.

I also made another version of it that does not do the randomplay thingy - it has its own included musical interlude instead - just for the purposes of getting people to trial the "proof-of-concept".

I decided to make this version into a combined tarball of chatterbox+espeak+sphinx, all ready to go in case anyone is interested in giving it a trial. The tarball includes a howto.txt for easy manual installation as I don't know how to do a pet yet.

If anyone gives it a go (and I'm keen to hear feedback...) I recommend you try it on a live session of Lucid 528 (either liveCD / pfix=ram or rename/relocate your savefile so that it can't be found)

22MByte Download here:
http://www.mediafire.com/download/ryd27 ... ype.tar.gz
(file now updated to include libportaudio pet which some users may need to install - thanks for that info Keef)

Installation instructions
See Howto.txt file included in tarball.

Troubleshooting tips:
1) Find a quiet room for testing in.
2) Have a look in the /root/chatdump.txt file to see if any of the words you speak are being decoded (use file/reload to view latest decodes)
3) If decoding is poor or non-existent try speaking at different distances from the mic - too near or too far can cause problems depending on the mic, voice and soundcard. Try talking in an American accent. Try saying the words "beginning" or "Out house" - they seem to have good decodability. The main word we want to decode for the purposes of this trial is "music" so try saying that in a clear voice and seeing if it gets correctly decoded.
4) After booting you should hear the voice prompt, but if you don't, go to /root/startup, right-click the open space, select "window - terminal here" and enter the following:

Code: Select all

#./zzzQuestionplay
(Note the dot slash)
This will start the script that triggers the voice prompt and you may see helpful errors displayed in that terminal window. Post them here if you are unable to solve the issue referred to. (ignore anything about fonts...) - (some users may also need to install the libportaudio pet contained in the tar)
5) If you need to reboot or restart xserver during your troubleshooting remove the following two files first:
/root/chatdump.txt
/root/extracted_command.txt
(doing this isn't vital but may help eliminate any odd symptoms if you can't get things going)
.
.
Last edited by greengeek on Thu 17 Oct 2013, 18:23, edited 6 times in total.

H4LF82
Posts: 123
Joined: Tue 02 Oct 2012, 04:22

#48 Post by H4LF82 »

let me find a caffeinated beverage and I am all over it greengeek!

im excited :) ill let you know the results post haste!

Keef
Posts: 987
Joined: Thu 20 Dec 2007, 22:12
Location: Staffordshire

#49 Post by Keef »

Works for me on Precise 571.
Espeak also needs libportaudio, but I was able to install that from my stash of assorted libraries.

H4LF82
Posts: 123
Joined: Tue 02 Oct 2012, 04:22

#50 Post by H4LF82 »

im having issue---but there is no surprise there.

im starting over and retrying from pfix=ram... im positive i messed up at least twice...

bear with me. im moving like a herd of terrapins...

fleet 'o foot i am not :)
