Chatterbox - STT / TTS / TTA project. Part 2

greengeek
Posts: 5789
Joined: Tue 20 Jul 2010, 09:34
Location: Republic of Novo Zelande

#1 Post by greengeek »

Part 2 of my "chatterbox" project is aimed at getting Puppy to monitor the microphone, listen to my response, and create a text file that accurately reflects what I have spoken.

Part 1 and chatterbox project description here:
http://www.murga-linux.com/puppy/viewtopic.php?t=89258

Part 3 (Making puppy act on decoded commands) here:
http://murga-linux.com/puppy/viewtopic.php?t=89260

Progress so far is based on the "pocketsphinx_continuous" .pet offered by technosaurus here:
http://www.murga-linux.com/puppy/viewto ... 5&start=27
Last edited by greengeek on Wed 27 Nov 2013, 04:45, edited 4 times in total.

H4LF82
Posts: 123
Joined: Tue 02 Oct 2012, 04:22

#6 Post by H4LF82 »

This is going to be the tough bit. Getting your computer to understand even one single word is tough enough, never mind the entire English language.

For these purposes, however, even the ability to distinguish between two words like "yes" and "no" would be extremely helpful.

I've been told to try Sphinx, Verbio, Ubuntu, and all manner of other things, but I have not had any luck with any of it. I can tell you this much: I know when I am beaten, and there is a 6-month chunk of my life, spent banging my head against this very wall, that I won't ever get back (hindsight being 20/20, I'd avoid Sphinx if I were you). So by all means, please have a go at it...

I look forward to seeing what comes of it! :D
"The wise know their weakness too well to assume infallibility; and he who knows most, knows best how little he knows." - Thomas Jefferson

#7 Post by greengeek »

Sorry to hear of your experience with Sphinx - I was getting my hopes up this morning when technosaurus posted a pet of pocketsphinx:
http://www.murga-linux.com/puppy/viewto ... 5&start=27

Never mind, I will give it a go. As you say, teaching it the difference between "yes" and "no" is all that is required to make a start. To be honest I've read a few posts that suggest it is a mistake to use short words with STT - better to try to teach it the difference between "affirmative" and "not bloody likely" - apparently the longer phrases are easier to decode reliably.

#8 Post by greengeek »

Well, I've been playing with pocketsphinx and it seems to be pretty good at decoding what I'm saying. I can certainly get it to distinguish yes and no with excellent reliability. Surprisingly it also seems very good (sometimes) at assembling entire sentences - although the accuracy does vary if the room has background noise.

I found that the program itself was extremely sensitive to mic volume. I had to turn the capture volume right DOWN to almost nothing, and to turn OFF the 20 dB mic boost, which is usually a necessity with other audio programs like mhwaveedit. Quite surprising.

The problem is what to do with the output of the recognition program. I can see the decoded speech in the terminal, but how do I feed it to a text file in real time?

Technosaurus mentioned the following tutorial:
http://hackaday.com/2010/07/11/adding-s ... -platform/
and one of the comments was as follows:
I have a robot and I want to use Pocketsphinx so I can talk to the robot thing like…where is this room and it will tell me where it is or move foward and it should move forward. Right now I have install pockectsphinx.07 and sphinxbase and when I run using ubuntu 10.04LTS: pocketsphinx_continuous -lm 1998.lm -dict .dict 1998.dic it say READY then listening the when I say something like Good morning it write back Goodmorning….But how do I go from here…how do I use pocketsphinx to allow me to just talk and have what I just said be recorded and send to my robot to move…PLEASE HELP
To which the author replied:
Hello Steve
The way to connect recognizer library output to an action is a standard task every programmer could solve. I suppose you need to learn how to write programs. I’m sure you could find quite some references on the web. If you learn Python for example you can do it in a minute. For futher questions please use CMUSphinx forums
http://cmusphinx.sourceforge.net/wiki/communicate
So - not being a programmer, I'm stuck.

Technosaurus makes the following comment:
One way to handle the output from speech recognition is to use /dev/stdout as the output and pipe it through a while-read-case block like:

Code:

pocketsphinx_continuous <params>| while read LINE; do
case "$LINE" in
  *)...;; #use different regex here for different actions
esac;
done
I will need to scour the CMUSphinx forums to learn what all this means, and see if there are any examples that give me some clues on how to fine-tune this for Puppy.
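
For anyone else following along, here is a minimal sketch of how that while-read-case idea might be filled in. The function names (handle_line, dispatch), the trigger words and the echoed actions are all made up for illustration. It assumes each decoded utterance arrives as one line on stdout, possibly with an utterance number in front, and that the INFO chatter goes to stderr. Note that the patterns in a case block are shell glob patterns rather than true regular expressions.

```shell
#!/bin/sh
# Map one line of recogniser output to an action.
# The phrases and actions below are placeholders only.
handle_line() {
    case "$1" in
        *affirmative*) echo "user said yes" ;;
        *negative*)    echo "user said no" ;;
        *)             : ;;   # ignore everything else (READY, Listening, ...)
    esac
}

# Read lines from stdin and hand each one to handle_line.
dispatch() {
    while IFS= read -r LINE; do
        handle_line "$LINE"
    done
}

# Real use (needs a microphone, so left commented out):
# pocketsphinx_continuous 2>/dev/null | dispatch
```

Longer trigger words are used here on purpose: a pattern like *no* would also match "north" or "know".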

sphinx

#9 Post by H4LF82 »

If we can practice on Lucid I'll give it a go...

Gimme a few to get caffeine and I'm on it...

#10 Post by greengeek »

What version of Lucid are you using? (Could you post the bottom few lines of your /etc/DISTRO_SPECS file?)

#11 Post by H4LF82 »

lucid 5.2.8

Code:

#One or more words that identify this distribution:
DISTRO_NAME='Lucid '
#A three-digit numeric value, version number of this distribution:
DISTRO_VERSION=528
#A two-digit numeric value, minor-version number of this distribution:
DISTRO_MINOR_VERSION=00
#The distro whose binary packages were used to build this distribution:
DISTRO_BINARY_COMPAT='ubuntu'
#Prefix for some filenames: exs: lupusave.2fs, lupu-528.sfs
DISTRO_FILE_PREFIX='lupu'
#The version of the distro whose binary packages were used to build this distro:
DISTRO_COMPAT_VERSION='lucid'
#the kernel pet package used:
DISTRO_KERNEL_PET='linux_kernel-2.6.33.2-tickless_smp_patched-L3.pet'
#16-byte alpha-numeric ID-string appended to vmlinuz, lupu_528.sfs, zl528332.sfs and devx.sfs:
DISTRO_IDSTRING='l528120404231153'
#Puppy default filenames...
#Note, the 'SFS' files below are what the 'init' script in initrd.gz searches for,
#for the partition, path and actual files loaded, see PUPSFS and ZDRV in /etc/rc.d/PUPSTATE
DISTRO_PUPPYSFS='lupu_528.sfs'
DISTRO_ZDRVSFS='zl528332.sfs'

#12 Post by H4LF82 »

If you would confirm that Sphinx plays well with Lucid (i.e. no smoking HDDs) then I will give it another try. I may have had an old version last time...

#13 Post by greengeek »

Once the decoded speech is extracted, I thought it would be useful to have it placed into a text file called something like chatdump or voicedump. The stream of text would flow in as the user spoke, and maybe the file would need to be cleared every 3 seconds or so.

When Puppy was ready to assess the user's answer to a question it would look at the chatdump and read the last word (or words if appropriate).

If the user was busy chatting to other people in the room this chatter would be discarded after 3 seconds, and then when it came time to answer a Puppy question the user would reach a natural break in their conversation and the chatdump would just contain their answer to that question.

Just tossing ideas into the mix....
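
A rough sketch of how such a chatdump might behave, in case it helps the discussion. Nothing here comes from an existing script: the file location, the function names and the idea of storing a timestamp in front of each phrase are all assumptions. Instead of wiping the whole file every 3 seconds, it drops only the entries older than 3 seconds, which gives much the same effect.

```shell
#!/bin/sh
CHATDUMP="${CHATDUMP:-/tmp/chatdump}"   # hypothetical location

# Append one decoded phrase, prefixed with its arrival time (epoch seconds).
chat_append() {
    printf '%s %s\n' "$(date +%s)" "$1" >> "$CHATDUMP"
}

# Throw away entries older than $1 seconds (default 3).
chat_expire() {
    max_age="${1:-3}"
    now=$(date +%s)
    awk -v now="$now" -v max="$max_age" 'now - $1 <= max' "$CHATDUMP" > "$CHATDUMP.tmp"
    mv "$CHATDUMP.tmp" "$CHATDUMP"
}

# Print the last word still in the dump (prints nothing if the dump is empty).
chat_last_word() {
    awk 'NF > 1 { w = $NF } END { if (w != "") print w }' "$CHATDUMP"
}
```

When Puppy asks a question it would call chat_expire and then chat_last_word, so idle chatter from more than 3 seconds ago never gets mistaken for an answer.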

#14 Post by H4LF82 »

We are of one mind here. While I can see the merit of piping the stdout into Python and then continuing in Python, I would prefer to stay in the shallow end with my water wings and just write the stdout to a text file, which can then be bash-ed into submission. I can write a monitor script that checks the text file for changes every few seconds and, when they are detected, acts on them appropriately.
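
A monitor script along those lines might look something like this. It is only a sketch under assumptions: the function names, the /tmp/chatdump path and the "heard:" action are invented, and it detects changes by comparing a cksum of the file between polls.

```shell
#!/bin/sh
LAST_SUM=""

# Succeed only when the contents of file $1 differ from the previous check.
changed() {
    [ -f "$1" ] || return 1
    sum=$(cksum < "$1")
    [ "$sum" != "$LAST_SUM" ] || return 1
    LAST_SUM=$sum
}

# Placeholder action: a real script would branch on the word here.
act_on() {
    echo "heard: $1"
}

# Check file $1 every $2 seconds (default 2) and act on its last line
# whenever the file has changed since the previous look.
monitor() {
    while :; do
        if changed "$1"; then
            act_on "$(tail -n 1 "$1")"
        fi
        sleep "${2:-2}"
    done
}

# Real use (runs forever, so left commented out):
# monitor /tmp/chatdump 2
```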

Arguably not as elegant as a singular Python script, but I think it will do the job. Luckily there are many ways to skin a cat programmatically :)

#15 Post by greengeek »

I've just booted into a live session of Lupu 528 and can confirm that pocketsphinx works fine.
(Interestingly I did not need to wind down the mic volume in Lupu the way I did on Upup. It worked fine in Lupu without any changes).

Steps as follows:
1) Download technosaurus's pocketsphinx pet from here:
http://murga-linux.com/puppy/viewtopic. ... 5&start=27
2) Install the pet
3) Create a new directory, /usr/share/pocketsphinx (we will be using this later...)
4) Download the other source files referred to by technosaurus from this link:
http://hivelocity.dl.sourceforge.net/pr ... 0.8.tar.gz
5) Extract these files in your download directory and copy the "model" directory from the source into the /usr/share/pocketsphinx directory created above (i.e. it becomes /usr/share/pocketsphinx/model)
6) Go into /usr/bin, right-click in the open space and choose "window, terminal here"
7) Type: ./pocketsphinx_continuous (the # shown at the start of the line in the terminal is the root prompt; don't type it)
You should see sphinx set itself up and eventually show a "Ready" prompt. At that point you can speak into your microphone and you should see it say "listening..." and then once you stop speaking it will try to decode what you said.
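
For those who prefer the terminal, steps 3 to 5 could be scripted roughly as follows. This is only a sketch: the function name install_model is made up, the tarball path is whatever the file from step 4 was saved as, and it assumes the archive unpacks into a single top-level directory that contains "model".

```shell
#!/bin/sh
# Unpack the source tarball $1 and copy its "model" directory into $2
# (default /usr/share/pocketsphinx, the directory from step 3).
install_model() {
    tarball=$1
    dest=${2:-/usr/share/pocketsphinx}
    tmp=$(mktemp -d) || return 1
    tar -xzf "$tarball" -C "$tmp" || { rm -rf "$tmp"; return 1; }
    # Assumption: one top-level directory in the archive holds "model".
    for src in "$tmp"/*/model; do
        if [ ! -d "$src" ]; then
            echo "no model directory found in $tarball" >&2
            rm -rf "$tmp"
            return 1
        fi
        mkdir -p "$dest"
        cp -r "$src" "$dest/"
        break
    done
    rm -rf "$tmp"
}

# Real use (path to the downloaded tarball is yours to fill in):
# install_model /path/to/downloaded/tarball.tar.gz
```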

Try saying "negative" or "affirmative" - I found the detection of those words to be 100% accurate if I used an American accent (ie: roll the r slightly in affirmative, just like Mr Spock would have.)

(The biggest problem is I keep spelling "shpinx" wrong a million times).

#16 Post by greengeek »

Also, I found some words worked really well and others were unreliable (this probably depends on the microphone, the soundcard and the voice of the user etc)

Here is a list of the words I found that work pretty consistently so far:

negative (pronounce the t clearly)
affirmative (pronounce the t clearly and roll the r slightly as Americans do)
yes
no
right
down
north (roll the r slightly as Americans do)
program
clear
again (pronounce "agen" not "agayn")
welcome
beginning
screen
return (roll the r slightly as Americans do)
absolutely
music
internet (pronounced as "innnternet" as Americans would. Roll the r slightly)
one
four (roll the r slightly as Americans do)
six
self
finish
fiction
america
Out house

Avoid start and stop as they are too easily confused.
Last edited by greengeek on Sun 13 Oct 2013, 00:39, edited 1 time in total.

#17 Post by H4LF82 »

Code:

sh-4.1# ./pocketsphinx_continuous 
INFO: cmd_ln.c(691): Parsing command line:
./pocketsphinx_continuous 

Current configuration:
[NAME]		[DEFLT]		[VALUE]
-adcdev				
-agc		none		none
-agcthresh	2.0		2.000000e+00
-alpha		0.97		9.700000e-01
-argfile			
-ascale		20.0		2.000000e+01
-aw		1		1
-backtrace	no		no
-beam		1e-48		1.000000e-48
-bestpath	yes		yes
-bestpathlw	9.5		9.500000e+00
-bghist		no		no
-ceplen		13		13
-cmn		current		current
-cmninit	8.0		8.0
-compallsen	no		no
-debug				0
-dict				
-dictcase	no		no
-dither		no		no
-doublebw	no		no
-ds		1		1
-fdict				
-feat		1s_c_d_dd	1s_c_d_dd
-featparams			
-fillprob	1e-8		1.000000e-08
-frate		100		100
-fsg				
-fsgusealtpron	yes		yes
-fsgusefiller	yes		yes
-fwdflat	yes		yes
-fwdflatbeam	1e-64		1.000000e-64
-fwdflatefwid	4		4
-fwdflatlw	8.5		8.500000e+00
-fwdflatsfwin	25		25
-fwdflatwbeam	7e-29		7.000000e-29
-fwdtree	yes		yes
-hmm				
-infile				
-input_endian	little		little
-jsgf				
-kdmaxbbi	-1		-1
-kdmaxdepth	0		0
-kdtree				
-latsize	5000		5000
-lda				
-ldadim		0		0
-lextreedump	0		0
-lifter		0		0
-lm				
-lmctl				
-lmname		default		default
-logbase	1.0001		1.000100e+00
-logfn				
-logspec	no		no
-lowerf		133.33334	1.333333e+02
-lpbeam		1e-40		1.000000e-40
-lponlybeam	7e-29		7.000000e-29
-lw		6.5		6.500000e+00
-maxhmmpf	-1		-1
-maxnewoov	20		20
-maxwpf		-1		-1
-mdef				
-mean				
-mfclogdir			
-min_endfr	0		0
-mixw				
-mixwfloor	0.0000001	1.000000e-07
-mllr				
-mmap		yes		yes
-ncep		13		13
-nfft		512		512
-nfilt		40		40
-nwpen		1.0		1.000000e+00
-pbeam		1e-48		1.000000e-48
-pip		1.0		1.000000e+00
-pl_beam	1e-10		1.000000e-10
-pl_pbeam	1e-5		1.000000e-05
-pl_window	0		0
-rawlogdir			
-remove_dc	no		no
-round_filters	yes		yes
-samprate	16000		1.600000e+04
-seed		-1		-1
-sendump			
-senlogdir			
-senmgau			
-silprob	0.005		5.000000e-03
-smoothspec	no		no
-svspec				
-time		no		no
-tmat				
-tmatfloor	0.0001		1.000000e-04
-topn		4		4
-topn_beam	0		0
-toprule			
-transform	legacy		legacy
-unit_area	yes		yes
-upperf		6855.4976	6.855498e+03
-usewdphones	no		no
-uw		1.0		1.000000e+00
-var				
-varfloor	0.0001		1.000000e-04
-varnorm	no		no
-verbose	no		no
-warp_params			
-warp_type	inverse_linear	inverse_linear
-wbeam		7e-29		7.000000e-29
-wip		0.65		6.500000e-01
-wlen		0.025625	2.562500e-02

INFO: cmd_ln.c(691): Parsing command line:
\
	-nfilt 20 \
	-lowerf 1 \
	-upperf 4000 \
	-wlen 0.025 \
	-transform dct \
	-round_filters no \
	-remove_dc yes \
	-svspec 0-12/13-25/26-38 \
	-feat 1s_c_d_dd \
	-agc none \
	-cmn current \
	-cmninit 56,-3,1 \
	-varnorm no 

Current configuration:
[NAME]		[DEFLT]		[VALUE]
-agc		none		none
-agcthresh	2.0		2.000000e+00
-alpha		0.97		9.700000e-01
-ceplen		13		13
-cmn		current		current
-cmninit	8.0		56,-3,1
-dither		no		no
-doublebw	no		no
-feat		1s_c_d_dd	1s_c_d_dd
-frate		100		100
-input_endian	little		little
-lda				
-ldadim		0		0
-lifter		0		0
-logspec	no		no
-lowerf		133.33334	1.000000e+00
-ncep		13		13
-nfft		512		512
-nfilt		40		20
-remove_dc	no		yes
-round_filters	yes		no
-samprate	16000		1.600000e+04
-seed		-1		-1
-smoothspec	no		no
-svspec				0-12/13-25/26-38
-transform	legacy		dct
-unit_area	yes		yes
-upperf		6855.4976	4.000000e+03
-varnorm	no		no
-verbose	no		no
-warp_params			
-warp_type	inverse_linear	inverse_linear
-wlen		0.025625	2.500000e-02

INFO: acmod.c(246): Parsed model-specific feature parameters from /usr/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/feat.params
INFO: feat.c(713): Initializing feature stream to type: '1s_c_d_dd', ceplen=13, CMN='current', VARNORM='no', AGC='none'
INFO: cmn.c(142): mean[0]= 12.00, mean[1..12]= 0.0
INFO: acmod.c(167): Using subvector specification 0-12/13-25/26-38
INFO: mdef.c(517): Reading model definition: /usr/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/mdef
INFO: mdef.c(528): Found byte-order mark BMDF, assuming this is a binary mdef file
INFO: bin_mdef.c(336): Reading binary model definition: /usr/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/mdef
INFO: bin_mdef.c(513): 50 CI-phone, 143047 CD-phone, 3 emitstate/phone, 150 CI-sen, 5150 Sen, 27135 Sen-Seq
INFO: tmat.c(205): Reading HMM transition probability matrices: /usr/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/transition_matrices
INFO: acmod.c(121): Attempting to use SCHMM computation module
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /usr/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/means
INFO: ms_gauden.c(292): 1 codebook, 3 feature, size: 
INFO: ms_gauden.c(294):  256x13
INFO: ms_gauden.c(294):  256x13
INFO: ms_gauden.c(294):  256x13
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: /usr/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/variances
INFO: ms_gauden.c(292): 1 codebook, 3 feature, size: 
INFO: ms_gauden.c(294):  256x13
INFO: ms_gauden.c(294):  256x13
INFO: ms_gauden.c(294):  256x13
INFO: ms_gauden.c(354): 0 variance values floored
INFO: s2_semi_mgau.c(903): Loading senones from dump file /usr/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/sendump
INFO: s2_semi_mgau.c(927): BEGIN FILE FORMAT DESCRIPTION
INFO: s2_semi_mgau.c(1022): Using memory-mapped I/O for senones
INFO: s2_semi_mgau.c(1296): Maximum top-N: 4 Top-N beams: 0 0 0
INFO: dict.c(317): Allocating 137543 * 20 bytes (2686 KiB) for word entries
INFO: dict.c(332): Reading main dictionary: /usr/share/pocketsphinx/model/lm/en_US/cmu07a.dic
INFO: dict.c(211): Allocated 1010 KiB for strings, 1664 KiB for phones
INFO: dict.c(335): 133436 words read
INFO: dict.c(341): Reading filler dictionary: /usr/share/pocketsphinx/model/hmm/en_US/hub4wsj_sc_8k/noisedict
INFO: dict.c(211): Allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(344): 11 words read
INFO: dict2pid.c(396): Building PID tables for dictionary
INFO: dict2pid.c(404): Allocating 50^3 * 2 bytes (244 KiB) for word-initial triphones
INFO: dict2pid.c(131): Allocated 30200 bytes (29 KiB) for word-final triphones
INFO: dict2pid.c(195): Allocated 30200 bytes (29 KiB) for single-phone word triphones
INFO: ngram_model_arpa.c(77): No \data\ mark in LM file
INFO: ngram_model_dmp.c(142): Will use memory-mapped I/O for LM file
INFO: ngram_model_dmp.c(196): ngrams 1=5001, 2=436879, 3=418286
INFO: ngram_model_dmp.c(242):     5001 = LM.unigrams(+trailer) read
INFO: ngram_model_dmp.c(288):   436879 = LM.bigrams(+trailer) read
INFO: ngram_model_dmp.c(314):   418286 = LM.trigrams read
INFO: ngram_model_dmp.c(339):    37293 = LM.prob2 entries read
INFO: ngram_model_dmp.c(359):    14370 = LM.bo_wt2 entries read
INFO: ngram_model_dmp.c(379):    36094 = LM.prob3 entries read
INFO: ngram_model_dmp.c(407):      854 = LM.tseg_base entries read
INFO: ngram_model_dmp.c(463):     5001 = ascii word strings read
INFO: ngram_search_fwdtree.c(99): 788 unique initial diphones
INFO: ngram_search_fwdtree.c(147): 0 root, 0 non-root channels, 60 single-phone words
INFO: ngram_search_fwdtree.c(186): Creating search tree
INFO: ngram_search_fwdtree.c(191): before: 0 root, 0 non-root channels, 60 single-phone words
INFO: ngram_search_fwdtree.c(326): after: max nonroot chan increased to 13428
INFO: ngram_search_fwdtree.c(338): after: 457 root, 13300 non-root channels, 26 single-phone words
INFO: ngram_search_fwdflat.c(156): fwdflat: min_ef_width = 4, max_sf_win = 25
INFO: continuous.c(371): ./pocketsphinx_continuous COMPILED ON: Oct 11 2013, AT: 11:34:56

Warning: Could not find Mic element
FATAL_ERROR: "continuous.c", line 254: Failed to calibrate voice activity detection

I have 2 mics. They work and are recognized... any thoughts?
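
One thing that may be worth trying with two mics is telling pocketsphinx which capture device to use through the -adcdev option that appears in the parameter list above. The sketch below turns the output of arecord -l into device names that could be passed to -adcdev. It assumes this build uses ALSA, that arecord is installed, and that its listing uses the usual English "card N: ..., device M: ..." layout. The function name capture_devices and the plughw:1,0 example are made up for illustration.

```shell
#!/bin/sh
# Turn `arecord -l` style lines on stdin into ALSA device names, one per line.
capture_devices() {
    sed -n 's/^card \([0-9][0-9]*\):.*, device \([0-9][0-9]*\):.*/plughw:\1,\2/p'
}

# Real use (needs sound hardware, so left commented out):
# arecord -l | capture_devices
# ./pocketsphinx_continuous -adcdev plughw:1,0
```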
"The wise know their weakness too well to assume infallibility; and he who knows most, knows best how little he knows." - Thomas Jefferson

#18 Post by H4LF82 »

When I switch between mics, I get this...

Code:

sh-4.1# ./pocketsphinx_continuous 
[... startup output identical to the log in post #17 ...]
INFO: continuous.c(371): ./pocketsphinx_continuous COMPILED ON: Oct 11 2013, AT: 11:34:56

Warning: Could not find Mic element
READY....
I assume it cannot find the mic? I dunno... I'll keep picking at it. No smoking HDDs tho, so it's progress...

:D

#19 Post by H4LF82 »

despite the error message, it DOES seem to be listening!

NICE JOB!

Give me a few to play with this and see what I can make of it :) Looks like part 2 may be close to done :D

I'll be back....

#20 Post by H4LF82 »

Code:

#!/bin/sh
file="inputtxt"
pocketsphinx_continuous | while read LINE; do
case "$LINE" in
  echo "$LINE" >> "$file"
done
I'm sure that to output each line into a file we want to do something like this, but I'm not getting something quite right here. Syntax...

I have created a script in the /usr/bin folder and given it the above code to chew on, but I'm getting no joy as yet. I'll figure it out tho... might take me a minute to nail down but I'll get it.

If any other code monkey wants to jump in and tell me my syntax error I would not complain... feel free! But this is not so tough and I'll untangle it sooner or later.
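
The syntax error in the script above looks like the case line: a case statement needs at least one pattern and a closing esac, and since every line is meant to go to the file no case is needed at all. A sketch of a version without it follows. The function name log_lines is made up, and sending stderr to /dev/null assumes the INFO messages are printed there.

```shell
#!/bin/sh
file="inputtxt"

# Append every line read from stdin to the file named in $1.
log_lines() {
    while IFS= read -r LINE; do
        printf '%s\n' "$LINE" >> "$1"
    done
}

# Real use (needs a microphone, so left commented out):
# pocketsphinx_continuous 2>/dev/null | log_lines "$file"
```

Because "inputtxt" is a relative name, the file ends up in whatever directory the script is started from, so a full path may be the safer choice.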
"The wise know their weakness too well to assume infallibility; and he who knows most, knows best how little he knows." - Thomas Jefferson
