List of Features Specified for Development of Sphinx 3



Memorandum on Provisional List of Features for Open Source CMU Sphinx in Arthur Chan’s Employment

Introduction:

This is a memorandum of software features (the features) that are pre-disclosed to Scanscout Inc. before the employment of Arthur Chan (the employee). Both Scanscout and the employee have agreed that

1, Scanscout will support the employee’s activity on open source programming.

2, The source code that implements all or part of the specified features, upon their completion during the employment period at Scanscout Inc., will be released under the Berkeley-style license specified in

given the following conditions

a, there is mutual consent between Scanscout Inc. and the employee at the time of completion.

3, The list is not exhaustive; Scanscout Inc. and the employee will expand and revise the list at will.

4, This agreement will apply to the employee and the employee only.

5, The list is not the job description of the employee. It should be noted that this is just a list of potential features, and their execution is not guaranteed by either the employee or Scanscout.

General:

-General maintenance and administration of Sphinx 2, Sphinx 3 (or Sphinx 3.X), Sphinx 4, Pocket Sphinx, CMU-Cambridge Language Modeling Toolkit Version 2 and documentation Hieroglyphs.

-Software architecting, which includes general development and code refactoring.

-CMU Sphinx forum mediation.

-Incorporation of bug fixes and new feature patches.

-Regression testing.

-Documentation.

Decoder work, this covers

Sphinx 2, Sphinx 3.X (Sphinx 3.6, Sphinx 3.7, Sphinx 3.8, Sphinx 3.9, Sphinx 3.10), Sphinx 4, Pocket Sphinx

-Enable the decoders to support more than 65536 words

-Enable full n-grams to be supported in the decoders

-Enable full context-dependent phoneme units to be fully utilized in the decoders

-Enable different smoothing and discounting methods of language modeling to be used in the decoders.

-Enable different types of features, including perceptual linear prediction (PLP), in the front-end of the decoders

-Enable compatibility between HTK and Sphinx

-Complete at least one search algorithm which could give at least 10% error rate improvement

-Enable usage of semi-tied covariance matrices.

-Enable usage of eigenvoices and eigenvector methods.

-Enable usage of transformation-based features, including principal component analysis and linear discriminant analysis.

-Enhance usage of maximum likelihood linear regression and maximum a posteriori speaker adaptation methods.

-Enable usage of constrained maximum likelihood linear regression and maximum likelihood linear transformation methods.

-Enable usage of vocal tract length normalization method

-Enable the usage of k-d tree speed-up methods in Sphinx 3.X

-Enable one method of speaker verification.

-Enable one method of speaker identification.

-Enable one method of keyword spotting.

-Enable arbitrary topologies of hidden Markov model in decoding.

-Enable at least one method to automatically tune the speed and accuracy of the recognizer

-Enable sub-vector quantization or sub-space distribution hidden Markov model.

-Enable different grammars to be converted to Sphinx finite state grammar formats.

-Enable different language model formats to be converted to Sphinx language model formats.

-Enable different acoustic model formats to be converted to Sphinx acoustic model formats.

SphinxTrain

-Complete one implementation of the Baum-Welch algorithm for hidden Markov models.

-Enable speaker adaptive training

-Enable one method of speaker-diarization

-Enable one method of speaker clustering.

-Enable discriminative training using the maximum mutual information estimation (MMIE) and minimum phone error (MPE) methods

-Enable training of sub-vector quantization

-Complete one implementation of a C program that enables training in one command.

-Continue to develop and enhance training scripts in Perl and Python.

-Enable arbitrary topologies of hidden Markov model in Baum-Welch training.

CMU-Cambridge Language Modeling Toolkit V2 (Names subject to change)

-Enable Modified Kneser-Ney smoothing and discounting

-Enable support of class-based language modeling

-Enable at least one method of maximum entropy language modeling

-Enable one method for latent-semantic analysis–based language modeling

-Enable one method for neural network-based language modeling.

-Enable one method for skip-based language modeling

-Continue to develop and enhance training scripts in Perl and Python.

Hieroglyphs (sphinxDoc)

-Completion of the document.

_______________________

Arthur Chan, the employee

_______________________

Waikit Lau,

Chief Executive Officer, Scanscout

_______________________

Steven Lee

Chief Technology Officer, Scanscout

Appendix: the license.

/* ====================================================================

* Copyright (c) 1999-2001 Carnegie Mellon University. All rights

* reserved.

*

* Redistribution and use in source and binary forms, with or without

* modification, are permitted provided that the following conditions

* are met:

*

* 1. Redistributions of source code must retain the above copyright

* notice, this list of conditions and the following disclaimer.

*

* 2. Redistributions in binary form must reproduce the above copyright

* notice, this list of conditions and the following disclaimer in

* the documentation and/or other materials provided with the

* distribution.

*

* This work was supported in part by funding from the Defense Advanced

* Research Projects Agency and the National Science Foundation of the

* United States of America, and the CMU Sphinx Speech Consortium.

*

* THIS SOFTWARE IS PROVIDED BY CARNEGIE MELLON UNIVERSITY ``AS IS'' AND

* ANY EXPRESSED OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,

* THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR

* PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL CARNEGIE MELLON UNIVERSITY

* NOR ITS EMPLOYEES BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,

* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT

* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,

* DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY

* THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT

* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE

* OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

*

* ====================================================================

*
