UCSD DSP LAB | The Digital Signal Processing Lab at ...



The first part of the assignment is to build a simple digit recognition system with whole-word models using the HTK software toolkit. In brief, the process consists of the following steps:

1) Building the task grammar (a "language model")
2) Constructing a dictionary for the models
3) Creating transcription files for training data
4) Encoding the data (feature processing)
5) (Re-)training the acoustic models
6) Evaluating the recognizer against the test data
7) Reporting recognition results

Each of these is explained in a little more detail below. For a comprehensive overview, please refer to the HTKBook.

1) Building the task grammar

The task grammar defines constraints on what the recognizer can expect as input. In this problem, we use a finite-state grammar (FSG) to represent the constraints (in the future, we will use statistical language models). Create a file called 'grammar' with the following contents:

    $digit = ONE | TWO | THREE | FOUR | FIVE |
             SIX | SEVEN | EIGHT | NINE | OH | ZERO;
    ( SENT-START ( $digit ) SENT-END )

This form accepts a single digit per utterance. In HTK grammar notation, angle brackets denote one or more repetitions, so to accept a string of digits the last line becomes:

    ( SENT-START ( <$digit> ) SENT-END )

Convert this grammar to an HTK wordnet lattice using the HParse tool:

    HParse grammar wordnet

2) Constructing a dictionary

The dictionary provides an association between "words" used in the task grammar and the acoustic models, which may be composed of sub-word (phonetic, syllabic, etc.) units. Since we are using whole-word models in this assignment, the dictionary has a simple structure.
Create a file called 'lexicon' that has the following structure:

    one one
    two two
    ...
    nine nine
    zero zero
    sent-start sil
    sent-end sil

Also, create a file named 'wlist' which contains the following lines:

    one
    two
    ...
    nine
    zero
    sent-start
    sent-end

Next, create an HTK edit script file, 'global.ded', that has the following commands:

    AS sp
    RS cmu
    MP sil sp sil

Finally, sort both files and create the dictionary using the HDMan tool:

    sort wlist /o wlistSort
    sort lexicon /o lexiconSort
    HDMan -m -w wlistSort -n models1 -l dlog dict lexiconSort

The dictionary used by HTK is 'dict'.

3) Creating transcription files for training data

For training, we need to tell the recognizer which file corresponds to which digit. These transcriptions are provided in the form of a Master Label File (MLF) for compactness. You will need to construct the source MLF (source.mlf) as follows:

    #!MLF!#
    "*/00F1SET0.lab"
    zero
    .
    "*/01F1SET0.lab"
    one
    .
    "*/02F1SET0.lab"
    two
    .
    (etc.)

It is assumed that 00F1SET0.WAV contains the utterance 'zero', and so on. Next, the model transcriptions must be obtained. For this, create an HTK edit script called 'mkphones0.led' containing the following:

    EX
    IS sil sil
    DE sp

Then use the HLEd tool to expand the word transcriptions into model transcriptions (models0.mlf):

    HLEd -l '*' -d dict -i models0.mlf mkphones0.led source.mlf

4) Encoding the data

This is the feature extraction step. To tell HTK the nature of the audio data (format, sample rate, etc.) and the feature extraction parameters (type of feature, window length, pre-emphasis, etc.), create a configuration file (config) as follows:

    # Coding parameters
    # ADDED NEXT LINE MYSELF <ALI july 27 09>
    SOURCEFORMAT = WAV
    TARGETKIND = MFCC_0_D_A
    TARGETRATE = 100000.0
    SAVECOMPRESSED = T
    SAVEWITHCRC = T
    WINDOWSIZE = 250000.0
    USEHAMMING = T
    PREEMCOEF = 0.97
    NUMCHANS = 26
    CEPLIFTER = 22
    NUMCEPS = 12
    ENORMALISE = F

You may want to change some parameters to suit your needs.
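HTK expresses time-valued configuration parameters such as TARGETRATE and WINDOWSIZE in units of 100 ns, which is easy to misread. As a small sanity check, the Python sketch below converts the values from the config file above into milliseconds and works out the feature dimension implied by MFCC_0_D_A with NUMCEPS = 12. The helper names are illustrative only and are not part of HTK.

```python
# HTK time parameters are given in units of 100 ns,
# so one millisecond corresponds to 10,000 HTK units.
HTK_UNITS_PER_MS = 10_000


def htk_units_to_ms(value):
    """Convert an HTK time value (100 ns units) to milliseconds."""
    return value / HTK_UNITS_PER_MS


def feature_dimension(num_ceps, with_c0=True, with_delta=True, with_accel=True):
    """Vector size for MFCCs with optional C0 (_0), deltas (_D) and accelerations (_A)."""
    static = num_ceps + (1 if with_c0 else 0)
    blocks = 1 + (1 if with_delta else 0) + (1 if with_accel else 0)
    return static * blocks


frame_shift_ms = htk_units_to_ms(100000.0)   # TARGETRATE
window_ms = htk_units_to_ms(250000.0)        # WINDOWSIZE
vec_size = feature_dimension(12)             # NUMCEPS = 12, TARGETKIND = MFCC_0_D_A

print(f"frame shift: {frame_shift_ms} ms")
print(f"window length: {window_ms} ms")
print(f"feature vector size: {vec_size}")
```

With the values above this gives a 10 ms frame shift, a 25 ms analysis window, and 39-dimensional feature vectors, which is the vector size the prototype HMM in step 5 has to declare.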
Next, create an HTK script file (hcopy.scp) that contains the following lines:

    /home/ee619/assignment/digit/train/00F1SET0.WAV /home/your-login/your-preferred-folder/00F1SET0.mfc
    /home/ee619/assignment/digit/train/01F1SET0.WAV /home/your-login/your-preferred-folder/01F1SET0.mfc
    /home/ee619/assignment/digit/train/02F1SET0.WAV /home/your-login/your-preferred-folder/02F1SET0.mfc
    ...

There is one line for each file in the training set. This file tells HTK to extract features from each audio file in the first column and save them to the corresponding feature file in the second column. The command is:

    HCopy -T 1 -C config -S hcopy.scp

or, if the extraction settings are kept in a separate file such as 'configExt':

    HCopy -T 1 -C configExt -S hcopy.scp

5) (Re-)training the acoustic models

You will use a "flat-start" initialization of the model HMMs. First, create a file called 'train.scp' that lists all the training feature files, as follows:

    /home/your-login/your-preferred-folder/00F1SET0.mfc
    /home/your-login/your-preferred-folder/01F1SET0.mfc
    /home/your-login/your-preferred-folder/02F1SET0.mfc
    ...

Next, create a prototype HMM ('proto') in a left-to-right configuration with 3 states and 39-dimensional feature vectors, as follows:

    ~o <VecSize> 39 <MFCC_0_D_A>
    ~h "proto"
    <BeginHMM>
    <NumStates> 5
    <State> 2
    <Mean> 39
    0.0 0.0 0.0 ...
    <Variance> 39
    1.0 1.0 1.0 ...
    <State> 3
    <Mean> 39
    0.0 0.0 0.0 ...
    <Variance> 39
    1.0 1.0 1.0 ...
    <State> 4
    <Mean> 39
    0.0 0.0 0.0 ...
    <Variance> 39
    1.0 1.0 1.0 ...
    <State> 5
    <Mean> 39
    0.0 0.0 0.0 ...
    <Variance> 39
    1.0 1.0 1.0 ...
    <State> 6
    <Mean> 39
    0.0 0.0 0.0 ...
    <Variance> 39
    1.0 1.0 1.0 ...
    <TransP> 5
    0.0 1.0 0.0 0.0 0.0
    0.0 0.6 0.4 0.0 0.0
    0.0 0.0 0.6 0.4 0.0
    0.0 0.0 0.0 0.7 0.3
    0.0 0.0 0.0 0.0 0.0
    <EndHMM>

(This listing defines too many states; see the errata at the end of this page for the correction.)

Then, use HCompV to initialize the prototype model with means and variances from the training data. For this command, use a version of the config file that does not contain SOURCEFORMAT = WAV, or that sets SOURCEFORMAT to HTK instead of WAV:

    HCompV -C config -f 0.01 -m -S train.scp -M hmm0 proto

If that version is saved under a separate name such as 'configTrain', the command becomes:

    HCompV -C configTrain -f 0.01 -m -S train.scp -M hmm0 proto

Note that you must first create a folder called 'hmm0' in the current folder, or the
command will fail. This command places a new version of 'proto' in the 'hmm0' folder. Next, create a master macro file ('hmmdefs') in the 'hmm0' folder. This file should contain copies of the "proto" HMM renamed to each of the required models ('one' through 'zero', and 'sil').

Then invoke the HERest tool for embedded re-estimation as follows:

    HERest -C config -I models0.mlf -t 250.0 150.0 1000.0 -S train.scp -H hmm0/macros -H hmm0/hmmdefs -M hmm1 models0

or, with the 'configTrain' variant of the configuration file:

    HERest -C configTrain -I models0.mlf -t 250.0 150.0 1000.0 -S train.scp -H hmm0/macros -H hmm0/hmmdefs -M hmm1 models0

where 'models0' is just 'models1' less the 'sp' model. Make sure the folder 'hmm1' is created before you run this command.

Note: if running this command leads to error 6510 later on, open models0.mlf and remove the single quote (') before and after the asterisk, so that the file looks like this:

    #!MLF!#
    "*/000.lab"
    sil
    zero
    sil

Repeat the HERest command two more times:

    HERest -C config -I models0.mlf -t 250.0 150.0 1000.0 -S train.scp -H hmm1/macros -H hmm1/hmmdefs -M hmm2 models0
    HERest -C config -I models0.mlf -t 250.0 150.0 1000.0 -S train.scp -H hmm2/macros -H hmm2/hmmdefs -M hmm3 models0

Now, increase the number of pdf mixtures from 1 to 2 as follows:

    HHEd -H hmm3/macros -H hmm3/hmmdefs -M hmm4 incmix.2.hed models0

The contents of incmix.2.hed are:

    MU 2 {*.state[2-4].mix}

Run the HERest command two more times, so that the most recent models are in 'hmm6'. Now, increase the number of pdf mixtures from 2 to 4 as follows:

    HHEd -H hmm6/macros -H hmm6/hmmdefs -M hmm7 incmix.4.hed models0

The contents of incmix.4.hed are:

    MU 4 {*.state[2-4].mix}

Repeat HERest two more times to get the updated models in the folder 'hmm9'.
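The re-estimation and mixture-splitting passes above follow a fixed pattern of input and output folders, which is easy to lose track of. As a rough illustration, the Python sketch below lists that sequence of commands from 'hmm0' through 'hmm9'. It only formats the command lines already shown in this step and does not run HTK; the function names are hypothetical.

```python
def herest_cmd(src, dst, config="config"):
    """One embedded re-estimation pass from folder hmm<src> to hmm<dst>."""
    return (f"HERest -C {config} -I models0.mlf -t 250.0 150.0 1000.0 "
            f"-S train.scp -H hmm{src}/macros -H hmm{src}/hmmdefs "
            f"-M hmm{dst} models0")


def hhed_cmd(src, dst, script):
    """One HHEd mixture-splitting pass from folder hmm<src> to hmm<dst>."""
    return (f"HHEd -H hmm{src}/macros -H hmm{src}/hmmdefs "
            f"-M hmm{dst} {script} models0")


def training_schedule(config="config"):
    """Commands for hmm0 -> hmm9: 3 HERest, then (HHEd + 2 HERest) twice."""
    commands = []
    step = 0
    for _ in range(3):                      # single-Gaussian passes: hmm0 -> hmm3
        commands.append(herest_cmd(step, step + 1, config))
        step += 1
    for script in ("incmix.2.hed", "incmix.4.hed"):
        commands.append(hhed_cmd(step, step + 1, script))
        step += 1
        for _ in range(2):                  # re-estimate after each split
            commands.append(herest_cmd(step, step + 1, config))
            step += 1
    return commands


for command in training_schedule():
    print(command)
```

Each output folder (hmm1, hmm2, and so on) still has to exist before the corresponding command is run, as noted for 'hmm0' and 'hmm1'.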
Then, fix the 'sp' short pause model by "borrowing" a state from the 'sil' model:

- Use a text editor on the file 'hmm9/hmmdefs' to copy the centre state of the 'sil' model to make a new 'sp' model, and store the resulting MMF 'hmmdefs', which includes the new 'sp' model, in the new directory 'hmm10'. (See the errata.)
- Run the HMM editor HHEd to add the extra transitions required and tie the 'sp' state to the centre 'sil' state. In this case, the command is:

    HHEd -H hmm10/macros -H hmm10/hmmdefs -M hmm11 sil.hed models1

where sil.hed contains the following commands:

    AT 2 4 0.2 {sil.transP}
    AT 4 2 0.2 {sil.transP}
    AT 1 3 0.3 {sp.transP}
    TI silst {sil.state[3],sp.state[2]}

Finally, run HERest two more times to obtain the final models in 'hmm13'. The acoustic models are now ready. (See the errata.)

6) Evaluating the recognizer against the test data

To run the recognizer and evaluate the results, use the following tools. HVite is the Viterbi decoder that performs recognition. It should be invoked as below:

    HVite -H hmm13/macros -H hmm13/hmmdefs -S test.scp -l '*' -i recout.mlf -w wordnet -p 0.0 -s 5.0 dict models1

A variant with a different word insertion penalty (-p) and grammar scale factor (-s):

    HVite -H hmm13/macros -H hmm13/hmmdefs -S test.scp -l '*' -i recout.mlf -w wordnet -p -70.0 -s 0 dict models1

where 'test.scp' is the script file that lists the feature files to be recognized. You must create these feature files (*.mfc) using HCopy and list them in 'test.scp' in the same manner as you created 'train.scp' for training.

7) Reporting recognition results

Finally, evaluate the performance of the speech recognizer using the HResults tool:

    HResults -I testref.mlf models1 recout.mlf

where again you need to create 'testref.mlf' (test data reference transcriptions) the same way you created 'source.mlf' (training data transcriptions).
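Since 'source.mlf' and 'testref.mlf' share the same layout, both can be generated from a list of audio file names instead of being typed by hand. The Python sketch below assumes that the first two characters of each file name encode the spoken digit (00 for 'zero', 01 for 'one', 02 for 'two', and so on up to 09). The tutorial only shows the first three cases, so the rest of that naming rule is an assumption to check against your data, and the function name is hypothetical.

```python
import os

# Assumed mapping from the two-digit file name prefix to the spoken word.
DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]


def build_mlf(audio_paths, upper=False):
    """Return the text of a word-level MLF for the given audio files."""
    lines = ["#!MLF!#"]
    for path in audio_paths:
        stem = os.path.splitext(os.path.basename(path))[0]
        word = DIGIT_WORDS[int(stem[:2])]   # assumption: prefix encodes the digit
        if upper:
            word = word.upper()
        lines.append(f'"*/{stem}.lab"')
        lines.append(word)
        lines.append(".")
    return "\n".join(lines) + "\n"


example = build_mlf(["/home/ee619/assignment/digit/train/00F1SET0.WAV",
                     "/home/ee619/assignment/digit/train/01F1SET0.WAV"])
print(example, end="")
```

Pass upper=True if your dictionary uses all-caps words, as the errata at the end of this page requires, so that the labels match the dictionary entries.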
The recognition results will be displayed in terms of Word Error Rate (WER), including the number of insertions, deletions and substitutions needed to align the reference and recognized transcriptions.

Errata and Addenda to the digit recognition tutorial

In step 1, define the grammar as follows:

    $digit = ONE | TWO | THREE | FOUR | FIVE |
             SIX | SEVEN | EIGHT | NINE | ZERO;
    ( SENT-START ( $digit ) SENT-END )

In step 2, define the dictionary as follows:

    ONE one
    TWO two
    ...
    NINE nine
    ZERO zero
    SENT-START [] sil
    SENT-END [] sil

and then sort this dictionary in alphabetical order using the Linux 'sort' command. Sorting is necessary. The word list 'wlist' created in this step must also be all-caps. The model name is in lower case; the dictionary word is in caps.

In step 4, the config file should look somewhat like this:

    TARGETKIND = MFCC_0_D_A
    TARGETRATE = 100000.0
    SAVECOMPRESSED = T
    SAVEWITHCRC = T
    WINDOWSIZE = 300000.0
    USEHAMMING = T
    PREEMCOEF = 0.97
    NUMCHANS = 26
    CEPLIFTER = 22
    NUMCEPS = 12
    ENORMALISE = F
    # The next line is important!!
    SOURCEFORMAT = NIST
    HEADERSIZE = 1024
    TARGETFORMAT = HTK

In step 5, when creating the proto HMM, you should define only 3 states. There is an error in the tutorial. Please delete the definitions of <STATE> 5 and <STATE> 6, i.e. your proto HMM should only contain <STATE> 2 through <STATE> 4, and the transition matrix.

Also, in this step, use the following config file for HCompV:

    TARGETKIND = MFCC_0_D_A
    TARGETRATE = 100000.0
    SAVECOMPRESSED = T
    SAVEWITHCRC = T
    WINDOWSIZE = 300000.0
    USEHAMMING = T
    PREEMCOEF = 0.97
    NUMCHANS = 26
    CEPLIFTER = 22
    NUMCEPS = 12
    ENORMALISE = F
    HEADERSIZE = 1024
    TARGETFORMAT = HTK

HCompV creates a new 'proto' in the folder 'hmm0'. It also creates a file called 'vFloors'. Now, in addition to creating 'hmmdefs' as defined in the tutorial, you must also create a file called 'macros' into which you copy and paste the contents of the file 'vFloors'.
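Creating 'hmmdefs' and 'macros' by hand is error-prone, so here is a hedged Python sketch of one way to script it. It assumes that the 'proto' file written by HCompV consists of some header lines (the global options) followed by a line beginning with ~h and then the HMM body. It builds 'macros' by placing those header lines in front of the contents of 'vFloors'; including the header follows the HTKBook tutorial rather than this page, which only mentions copying 'vFloors'. The function names are hypothetical.

```python
# Models required by the tutorial ('one' through 'zero', plus 'sil'; no 'oh').
MODEL_NAMES = ["one", "two", "three", "four", "five", "six",
               "seven", "eight", "nine", "zero", "sil"]


def split_proto(proto_text):
    """Split prototype text into (header lines, HMM body lines after the ~h line)."""
    lines = proto_text.splitlines()
    for index, line in enumerate(lines):
        if line.strip().startswith("~h"):
            return lines[:index], lines[index + 1:]
    raise ValueError("no ~h line found in the prototype definition")


def make_hmmdefs(proto_text, model_names=MODEL_NAMES):
    """Return one renamed copy of the prototype HMM per model name."""
    _, body = split_proto(proto_text)
    out = []
    for name in model_names:
        out.append(f'~h "{name}"')
        out.extend(body)
    return "\n".join(out) + "\n"


def make_macros(proto_text, vfloors_text):
    """Return the header of the prototype followed by the vFloors contents."""
    header, _ = split_proto(proto_text)
    return "".join(line + "\n" for line in header) + vfloors_text


# Tiny stand-in for hmm0/proto, only to show the shape of the output.
sample_proto = '~o <VecSize> 39 <MFCC_0_D_A>\n~h "proto"\n<BeginHMM>\n<EndHMM>\n'
print(make_hmmdefs(sample_proto, ["one", "sil"]), end="")
```

In practice you would read 'hmm0/proto' and 'hmm0/vFloors' from disk, pass their contents to these functions, and write the results to 'hmm0/hmmdefs' and 'hmm0/macros'.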
For running HERest in the remaining steps, use the same config file you used for HCompV.

When you add the 'sp' model, create a new HMM definition in the 'hmmdefs' file (you can add it at the end of the file) as follows:

    ~h "sp"
    <BEGINHMM>
    <NUMSTATES> 3
    <STATE> 2
    < here is stuff copied from under <STATE> 3 of the "sil" model >
    <TRANSP> 3
     0.000000e+00 5.000000e-01 5.000000e-01
     0.000000e+00 5.000000e-01 5.000000e-01
     0.000000e+00 0.000000e+00 0.000000e+00
    <ENDHMM>

You need to provide this default transition matrix when creating the "sp" HMM. After adding the "sp" model, you will get a warning message from HERest, saying:

    Pruning-On[250.0 150.0 1000.0]
     WARNING [-2331] UpdateModels: sp[10] copied: only 0 egs

Ignore this warning.

Please remove all references to the word "OH" from the lexicon, word list and model list. We do not have any data that corresponds to "OH" and will not be using it.

Be aware that everything to do with Linux and HTK is case sensitive. If you get any unexpected errors, check that all your files and config parameters are in order and that upper and lower case are used consistently.

If 'hmmdefs' in 'hmm13' does not contain an "sp" model, add the following at the end of 'hmmdefs' in 'hmm13':

    ~h "sp"
    <BEGINHMM>
    <NUMSTATES> 3
    <STATE> 2
    ~s "silst"
    <TRANSP> 3
     0.000000e+000 7.000000e-001 3.000000e-001
     0.000000e+000 6.000000e-001 4.000000e-001
     0.000000e+000 0.000000e+000 0.000000e+000
    <ENDHMM>
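Because case mismatches and leftover "OH" entries are easy to miss, the word list and lexicon from the errata can also be generated rather than typed. The Python sketch below, with hypothetical function names, follows the errata: words in upper case, model names in lower case, no "OH", and entries sorted. It uses Python's default string ordering; whether that matches the ordering your 'sort' command and HDMan expect is an assumption worth checking on your system.

```python
DIGITS = ["ONE", "TWO", "THREE", "FOUR", "FIVE",
          "SIX", "SEVEN", "EIGHT", "NINE", "ZERO"]
SENTENCE_MARKERS = ["SENT-START", "SENT-END"]


def make_wlist():
    """Sorted, all-caps word list without OH."""
    return sorted(DIGITS + SENTENCE_MARKERS)


def make_lexicon():
    """Sorted lexicon lines: WORD model, with silence for the sentence markers."""
    entries = [f"{word} {word.lower()}" for word in DIGITS]
    entries += [f"{word} [] sil" for word in SENTENCE_MARKERS]
    return sorted(entries)


print("\n".join(make_wlist()))
print()
print("\n".join(make_lexicon()))
```

Writing the two lists to 'wlist' and 'lexicon' (one entry per line) gives HDMan input in which every word appears in both files with the same spelling and case.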