Speech Recognition HOWTO

Stephen Cook

    scook@gear21.com

Japanese translation:

    htakashi@yabumi.com

Revision History                                                        
Revision v1.2        February 5, 2002                                   
Added more commercial software listings (sent by Mayur Patel).          
Revision v1.1        October 5, 2001             Revised by: scc        
Added info for Vocalis Speechware. Fixed/Updated various other items.   
Revision v1.0        November 20, 2000           Revised by: scc        
Added info on L and H and HTK                                           
Revision v0.5        September 13, 2000          Revised by: scc        
Initial HOWTO Submission                                                

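Automatic Speech Recognition (ASR) on Linux is becoming easier. Several
packages are available for users as well as developers. This document
describes the basics of speech recognition and some of the available
software.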



Table of Contents
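1. Legal Notices

    1.1. Copyright/License
    1.2. Disclaimer
    1.3. Trademarks

2. Foreword

    2.1. About This Document
    2.2. Acknowledgements
    2.3. Comments/Updates/Feedback
    2.4. ToDo
    2.5. Revision History

3. Introduction

    3.1. Speech Recognition Basics
    3.2. Types of Speech Recognition
    3.3. Uses and Applications

4. Hardware

    4.1. Sound Cards
    4.2. Microphones
    4.3. Computers/Processors

5. Speech Recognition Software

    5.1. Free Software

        5.1.1. XVoice
        5.1.2. CVoiceControl/kVoiceControl
        5.1.3. Open Mind Speech
        5.1.4. GVoice
        5.1.5. ISIP
        5.1.6. CMU Sphinx
        5.1.7. Ears
        5.1.8. NICO ANN Toolkit
        5.1.9. Myers' Hidden Markov Model Software
        5.1.10. Jialong He's Speech Recognition Research Tool
        5.1.11. Any Others?

    5.2. Commercial Software

        5.2.1. IBM ViaVoice
        5.2.2. Vocalis Speechware
        5.2.3. Babel Technologies
        5.2.4. SpeechWorks
        5.2.5. Nuance
        5.2.6. Abbot/AbbotDemo
        5.2.7. Entropic
        5.2.8. Other Commercial Products

6. Inside Speech Recognition

    6.1. How Recognizers Work
    6.2. Digital Audio Basics

7. Publications

    7.1. Books
    7.2. Internet

8. About the Japanese Translation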

1. Legal Notices

1.1. Copyright/License

This document is copyrighted (c) 2000-2002 Stephen C. Cook.

LICENSE: This document may be reproduced and distributed in whole or in
part, in any medium physical or electronic, provided that this license
notice is displayed in the reproduction. Commercial redistribution is
permitted and encouraged. Thirty days advance notice, via email to the
author, of redistribution is appreciated, to give the author time to
provide updated documents.

All modified documents, including translations, anthologies, and
partial documents, must meet the following requirements:

  * Modified versions must be labeled as such.

  * The person making the modifications must be identified.

  * Acknowledgement of the original author must be retained.

  * The location of the original unmodified document must be
    identified.

  * The original author's name(s) may not be used to assert or imply
    endorsement of the resulting document without the original author's
    permission.

  * The author must be notified by email of the modification in advance
    of redistribution.

  * As a special exception, anthologies of LDP documents may include a
    single copy of these license terms in a conspicuous location within
    the anthology and replace other copies of this license with a
    reference to the single copy of the license without the document
    being considered "modified" for the purposes of this section.

Mere aggregation of LDP documents with other documents or programs on
the same media shall not cause this license to apply to those other
works.

All translations, derivative documents, or modified documents that
incorporate this document may not have more restrictive license terms
than these, except that you may require distributors to make the
resulting document available in source format.



1.2. Disclaimer

The author disclaims all warranties with regard to this document,
including all implied warranties of merchantability and fitness for a
certain purpose; in no event shall the author be liable for any
special, indirect or consequential damages or any damages whatsoever
resulting from loss of use, data or profits, whether in an action of
contract, negligence or other tortious action, arising out of or in
connection with the use of this document.



1.3. Trademarks

All trademarks contained in this document are copyright/trademark of
their respective owners.



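2. Foreword

2.1. About This Document

This document is aimed at beginner to intermediate Linux users who want
to learn about speech recognition and try it out. It may also help
interested developers by explaining the basics of speech recognition
programming.

I started this document when I began researching what speech
recognition software and development libraries were available for
Linux. Automatic Speech Recognition (ASR, or just SR) on Linux is just
starting to come into its own, and I hope this document gives it a push
in the right direction by supporting both users and developers of ASR
technology.

This document does not cover every SR technique; it focuses on the
"HOWTO" aspect instead (since this is a HOWTO...). For anything not
covered here, I have included a Publications section where the
interested reader can find more detail. This is not meant to be the
definitive statement on ASR on Linux.

For the most recent version of this document, check the LDP archive, or
go to http://www.gear21.com/speech/index.html



2.2. Acknowledgements

I would like to thank the following people for their help with this
document:

  * Jessica Perry Hekman

  * Geoff Wexler



2.3. Comments/Updates/Feedback

If you have any comments, suggestions, revisions, or updates, or just
want to chat about ASR, please send email to scook@gear21.com
<mailto:scook@gear21.com>.



2.4. ToDo

The following things are left "to do":

  * Add descriptions to the Publications section.

  * Add more books to the Publications section.

  * Add more links, with descriptions.

  * Describe the steps of an ASR system in more depth.

  * Add a description of FFT and filters.

  * Add a description of DSP principles.



2.5. Revision History

v0.1 First draft, August 2000

v0.5 Final draft, September 2000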


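3. Introduction

3.1. Speech Recognition Basics

Speech recognition is the process by which a computer (or other type of
machine) identifies spoken words. Basically, it means talking to your
computer and having it correctly recognize what you are saying.

The following definitions are the basics needed for understanding
speech recognition technology.

Utterance

    An utterance is the vocalization (speaking) of a word or words that
    represent a single meaning to the computer. An utterance can be a
    single word, a few words, a sentence, or even several sentences.

Speaker Dependence

    Speaker-dependent systems are designed around a specific speaker.
    They are generally more accurate for that speaker, but much less
    accurate for other speakers. They assume the speaker will speak
    with a consistent voice and tempo. Speaker-independent systems are
    designed for a variety of speakers. Adaptive systems usually start
    as speaker-independent systems and use training techniques to adapt
    to the speaker and increase their recognition accuracy.

Vocabularies

    Vocabularies (or dictionaries) are lists of words or utterances
    that can be recognized by the SR system. Generally, smaller
    vocabularies are easier for a computer to recognize, while larger
    vocabularies are more difficult. Unlike normal dictionaries, each
    entry does not have to be a single word; an entry can be as long as
    a sentence or two. Small vocabularies can have as few as 1 or 2
    recognized utterances (for example "Wake up"), while very large
    vocabularies can have a hundred thousand or more.

Accuracy

    The ability of a recognizer can be examined by measuring its
    accuracy, that is, how well it recognizes spoken utterances. This
    includes not only correctly identifying an utterance but also
    identifying whether the spoken utterance is in its vocabulary at
    all. Good ASR systems have an accuracy of 98% or more. The
    acceptable accuracy of a system depends on the application.

Training

    Some speech recognizers have the ability to adapt to a speaker.
    When a system has this ability, it may allow training to take
    place. An ASR system is trained by having the speaker repeat
    standard or common phrases while it adjusts its comparison
    algorithms to match that particular speaker. Training a recognizer
    usually improves its accuracy.

    Training can also be used by speakers who have difficulty speaking
    or pronouncing certain words. As long as the speaker can repeat an
    utterance consistently, an ASR system with training should be able
    to adapt.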
   
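The accuracy definition above counts two kinds of success: identifying
an in-vocabulary utterance, and rejecting one that is not in the
vocabulary. The following is a minimal sketch of that bookkeeping, not
the method of any particular package. The function name, the
(spoken, recognized) pair format, and the use of None for a rejection
are all assumptions made for illustration.

```python
def recognition_accuracy(results, vocabulary):
    """Return the fraction of test utterances that were handled correctly.

    results is a list of (spoken, recognized) pairs. recognized is None
    when the (hypothetical) recognizer rejected the utterance as being
    outside its vocabulary.
    """
    correct = 0
    for spoken, recognized in results:
        if spoken in vocabulary:
            # In-vocabulary speech must be identified exactly.
            if recognized == spoken:
                correct += 1
        elif recognized is None:
            # Out-of-vocabulary speech must be rejected.
            correct += 1
    return correct / len(results)


vocabulary = {"wake up", "open netscape", "start a new xterm"}
results = [
    ("wake up", "wake up"),              # identified correctly
    ("open netscape", "open netscape"),  # identified correctly
    ("start a new xterm", "wake up"),    # misrecognized
    ("call home", None),                 # correctly rejected
]
accuracy = recognition_accuracy(results, vocabulary)
print(f"accuracy: {accuracy:.0%}")
```

Three of the four test utterances are handled correctly here, so the
sketch reports 75%.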


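3.2. Types of Speech Recognition

Speech recognition systems can be separated into several classes by
describing what types of utterances they are able to recognize. These
classes are based on the fact that one of the difficulties of ASR is
determining when a speaker starts and finishes an utterance. Most
packages fit into more than one class, depending on which mode they
are using.

Isolated Words

    Isolated word recognizers usually require each utterance to have
    quiet (lack of an audio signal) on both sides of the sample window
    (the interval between the start and end of the sample). This does
    not mean that the recognizer accepts only single words, but that it
    requires a single utterance at a time. Often these systems have
    "Listen/Not-Listen" states, where the speaker has to wait between
    utterances (the recognizer usually does its processing during the
    pauses). "Isolated Utterance" might be a better name for this
    class.

Connected Words

    Connected word systems (or, more correctly, 'connected utterances')
    are similar to isolated word systems, but allow separate utterances
    to be 'run together' with a minimal pause between them.

Continuous Speech

    Continuous recognition is the next step. Recognizers with
    continuous speech capabilities are among the most difficult to
    create, because they must use special methods to determine
    utterance boundaries. Continuous speech recognizers let users speak
    almost naturally while the computer determines the content.
    Basically, it is computer dictation.

Spontaneous Speech

    There appear to be a variety of definitions of what spontaneous
    speech actually is. At a basic level, it can be thought of as
    speech that is natural sounding and not rehearsed. An ASR system
    with spontaneous speech ability should be able to handle a variety
    of natural speech features such as words being run together, "ums"
    and "ahs", and even slight stutters.

Voice Verification/Identification

    Some ASR systems have the ability to identify specific users. This
    document does not cover verification or security systems.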


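3.3. Uses and Applications

Any task that involves interfacing between a person and a computer can
potentially use ASR. The following applications are the most common
right now.

Dictation

    Dictation is the most common use of ASR systems today. This
    includes medical transcription and legal and business dictation, as
    well as general word processing. In some cases special vocabularies
    are used to increase the accuracy of the system.

Command and Control

    ASR systems that are designed to carry out commands on the computer
    are defined as Command and Control systems. Utterances like "Open
    Netscape" and "Start a new xterm" do just that.

Telephony

    Some PBX/Voice Mail systems let callers speak commands instead of
    pressing buttons.

Wearables

    Because input methods are limited on wearable devices, speaking is
    a natural possibility.

Medical/Disabilities

    Many people have difficulty typing due to physical limitations such
    as repetitive strain injuries (RSI) or muscular dystrophy. As
    another example, people who have difficulty hearing could use a
    system connected to their telephone to convert the caller's speech
    to text.

Embedded Applications

    Some newer cellular phones include C&C speech recognition that
    accepts utterances such as "Call Home". This could be a major
    factor in the future of ASR and Linux. Why can't I talk to my
    television yet?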


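4. Hardware

4.1. Sound Cards

Because speech requires a relatively low bandwidth, just about any
medium to high quality 16-bit sound card will do. You must have sound
enabled in your kernel and the correct drivers installed. For more
information on sound cards, see "The Linux Sound HOWTO", available at
http://www.LinuxDoc.org/. Sound card quality often starts a heated
discussion about its impact on accuracy and noise.

Sound cards with the 'cleanest' A/D (analog to digital) conversion are
recommended, but the clarity of the digital sample usually depends more
on the quality of the microphone, and even more on the environmental
noise. Electrical noise from monitors, PCI slots, hard drives and so on
is usually nothing compared to audible noise from computer fans,
squeaking chairs, or heavy breathing.

Some ASR software packages may require a specific sound card. It is
usually a good idea to stay away from specific hardware requirements,
because they limit your future options. If you are considering a
package that requires specific hardware to function properly, you will
have to weigh the benefits against the costs.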



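4.2. Microphones

A quality microphone is key when using ASR. In most cases, a desktop
microphone just won't do the job. Desktop microphones tend to pick up
ambient noise, which gives ASR programs a hard time.

Hand-held microphones are also not the best choice, because picking
them up all the time is cumbersome. They do limit the amount of ambient
noise, and they are most useful where speakers change often or where
speaking to the recognizer is infrequent (when wearing a headset is not
an option).

The best choice, and by far the most common, is the headset. It keeps
the microphone right at your mouth all the time while minimizing
ambient noise. Headsets are available without earphones and with
earphones (mono or stereo). I recommend the stereo headphones, but it
is just a matter of personal taste.

You can get excellent quality headset microphones for between $25 and
$100. Try looking at http://www.headphones.com or
http://www.speechcontrol.com.

A quick note about levels: don't forget to turn up your microphone
volume. This can be done with a program such as XMixer or OSS Mixer,
but take care to avoid feedback noise. If the ASR software includes
auto-adjustment programs, use those instead, because they are optimized
for their particular recognition system.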



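4.3. Computers/Processors

ASR applications can be heavily dependent on processing speed, because
a large amount of digital filtering and signal processing can take
place in ASR.

As with any CPU-intensive software, the faster the better. Likewise,
the more memory the better. It is possible to do some ASR with 100MHz
and 16MB of RAM, but for fast processing (large vocabularies, complex
recognition schemes, or high sample rates) you should aim for at least
400MHz and 128MB of RAM. Because of the processing required, most
software packages list their minimum requirements.

Using a cluster (Beowulf or otherwise) to perform massive recognition
efforts has not yet been undertaken. If you know of any project that is
underway or in development, please let me know:
scook@gear21.com <mailto:scook@gear21.com>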



5. Speech Recognition Software

5.1. Free Software

Much of the free software listed here is available for download at:
http://sunsite.uio.no/pub/Linux/sound/apps/speech/



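5.1.1. XVoice

XVoice is a dictation/continuous speech recognizer that can be used
with a variety of X Window applications. It allows user-defined macros.
This is a fine program with a definite future. Once set up, it performs
with adequate accuracy.

XVoice requires that you download and install IBM's ViaVoice for Linux
(see the Commercial Software section). ViaVoice must also be configured
to work correctly. In addition, Lesstif/Motif (libXm) is required. It
is important to note that, because this program interacts with X, you
must leave X resources open on your machine, so take care if you use it
on a networked or multi-user machine.

This software is primarily for users. An RPM is available.

HomePage: http://www.compapp.dcu.ie/~tdoris/Xvoice/
http://www.zachary.com/creemer/xvoice.html

Project: http://xvoice.sourceforge.net

Community: http://www.onelist.com/community/xvoice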



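5.1.2. CVoiceControl/kVoiceControl

CVoiceControl (which stands for Console Voice Control) started life as
KVoiceControl (KDE Voice Control). It is a basic speech recognition
system that lets a user execute Linux commands by speaking them.
CVoiceControl replaces KVoiceControl.

The software includes a microphone level configuration utility, a
vocabulary "model editor" for adding new commands and utterances, and
the speech recognition system itself.

CVoiceControl is an excellent starting point for experienced users who
want to get started with ASR. It is not the most user-friendly program,
but once it has been trained it can be very useful. Be sure to read the
documentation while setting it up.

This software is primarily for users.

Homepage: http://www.kiecza.de/daniel/linux/index.html

Documents: http://www.kiecza.de/daniel/linux/cvoicecontrol/index.html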



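5.1.3. Open Mind Speech

Started in late 1999, Open Mind Speech has changed names several times
(it was VoiceControl, then SpeechInput, and then FreeSpeech). It is now
part of the open source "Open Mind Initiative" project. At present it
is not completely operational.

This software is primarily for developers.

Homepage: http://freespeech.sourceforge.net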



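5.1.4. GVoice

GVoice is an ASR library that uses IBM's (free) ViaVoice SDK to control
Gtk/GNOME applications. It includes libraries for initialization, the
recognition engine, vocabulary manipulation, and panel control.
Development has been idle for over a year.

This software is primarily for developers.

Homepage: http://www.cse.ogi.edu/~omega/gnome/gvoice/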



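5.1.5. ISIP

The Institute for Signal and Information Processing at Mississippi
State University has made its speech recognition engine available. The
toolkit includes a front end, a decoder, and a training module. It is a
functional toolkit.

This software is primarily for developers.

The toolkit (and more information about ISIP) is available at:
http://www.isip.msstate.edu/project/speech/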



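5.1.6. CMU Sphinx

Sphinx originally started at CMU and has recently been released as open
source. This is a fairly large program that includes a lot of tools and
information. It is still "in development", but it includes trainers,
recognizers, acoustic models, language models, and some limited
documentation.

This software is primarily for developers.

Homepage: http://www.speech.cs.cmu.edu/sphinx/Sphinx.html

Source: http://download.sourceforge.net/cmusphinx/sphinx2-0.1a.tar.gz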



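5.1.7. Ears

Although Ears is not fully developed, it is a good starting point for
programmers who want to get started in ASR.

This software is primarily for developers.

FTP site: ftp://svr-ftp.eng.cam.ac.uk/comp.speech/recognition/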



5.1.8. NICO ANN Toolkit

The NICO Artificial Neural Network toolkit is a flexible
back-propagation neural network toolkit optimized for speech
recognition applications.

This software is primarily for developers.

Homepage: http://www.speech.kth.se/NICO/index.html



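5.1.9. Myers' Hidden Markov Model Software

This software by Richard Myers is a set of HMM algorithms written in
C++. It provides an example and learning tool for the HMMs described in
L. Rabiner's book "Fundamentals of Speech Recognition".

This software is primarily for developers.

Information is available at:
http://www.itl.atr.co.jp/comp.speech/Section6/Recognition/myers.hmm.html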



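5.1.10. Jialong He's Speech Recognition Research Tool

Although it was not originally written for Linux, this research tool
can be compiled on Linux. It contains three different types of
recognizers: DTW, Dynamic Hidden Markov Model, and Continuous Density
Hidden Markov Model. It is meant for research and development and is
not a complete ASR system. The toolkit contains some very useful tools.

This software is primarily for developers.

More information is available at:
http://www.itl.atr.co.jp/comp.speech/Section6/Recognition/jialong.html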



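5.1.11. Any Others?

If you know of free software that is not in the list above, please let
me know: scook@gear21.com <mailto:scook@gear21.com>. If possible,
please tell me where I can get a copy of the software. I would also be
glad to hear your impressions of it.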



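5.2. Commercial Software

5.2.1. IBM ViaVoice

IBM has kept its promise to support Linux with its ViaVoice series of
products, although the future of the SDKs is not yet certain (the
licensing agreement for developers has not been officially released as
of this writing; more to come).

Their commercial (not free) product, IBM ViaVoice Dictation for Linux
(available at http://www-4.ibm.com/software/speech/linux/dictation.html)
performs very well, but it has sizeable system requirements compared to
more basic ASR systems (64M RAM and a 233MHz Pentium). For the
$59.95US price you also get an Andrea NC-8 microphone. It allows
multiple users (I have not tried it with multiple users, so if anyone
has experience with this, please let me know). The package includes
documentation (PDF), a trainer, the dictation system, and installation
scripts. The latest release also adds support for more Linux
distributions based on 2.2 kernels.

The ASR SDK is available for free and includes IBM's SMAPI, a grammar
API, documentation, and a variety of sample programs. The ViaVoice Run
Time Kit provides an ASR engine and data files for dictation functions,
plus user utilities. The ViaVoice Command & Control Run Time Kit
includes the ASR engine and data files for command and control
functions, plus user utilities. The SDK and Kits require 128MB of RAM
and Linux 2.2 or later.

The SDK and Kits are available for free at:
http://www-4.ibm.com/software/speech/dev/sdk_linux.html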



5.2.2. Vocalis Speechware

More information on Vocalis and Vocalis Speechware is available at:
http://www.vocalisspeechware.com and http://www.vocalis.com.



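5.2.3. Babel Technologies

Babel Technologies offers a Linux SDK called Babear. It is a
speaker-independent system based on Hybrid Markov Model and Artificial
Neural Network technology. They also have a variety of products for
text-to-speech, speaker verification, and phoneme analysis. More
information is available at: http://www.babeltech.com.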



5.2.4. SpeechWorks

Their web site does not specifically mention Linux, but their
"OpenSpeech Recognizer" uses VoiceXML, which is an open standard. More
information is available at: http://www.speechworks.com.



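5.2.5. Nuance

Nuance offers a speech recognition/natural language product (currently
Nuance 8.0) for a variety of *nix platforms. It can handle very large
vocabularies and uses a distributed architecture of its own for
scalability and fault tolerance. More information is available at:
http://www.nuance.com.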



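5.2.6. Abbot/AbbotDemo

Abbot is a very large vocabulary, speaker-independent ASR system. It
was originally developed by the Connectionist Speech Group at Cambridge
University and has since been transferred (commercialized) to
SoftSound. More information: http://www.softsound.com

AbbotDemo is a demonstration package of Abbot. The demo system has a
vocabulary of about 5000 words and uses the connectionist/HMM
continuous speech algorithm. It is a demonstration program and comes
without source code.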



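5.2.7. Entropic

The fine people at Entropic have been bought out by Micro$oft... Their
products and support services have all disappeared. Support for HTK and
ESPS/waves+ has been discontinued, and their future is in the hands of
M$. Their old web site, http://www.entropic.com, has more information.

K.K. Chin advised me that the original developers of HTK (the Speech
Vision and Robotic Group at Cambridge) are still providing support for
it. A free version is available at http://htk.eng.cam.ac.uk. Note that
Microsoft still owns the copyright on the current HTK code.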



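5.2.8. Other Commercial Products

More commercial ASR products (including one from L&H) may become
available in the near future. I talked with a couple of L&H
representatives at Comdex 2000 (Vegas), but none of them could tell me
anything about a Linux release, or even whether they planned to release
any products for Linux. If you have any further information, please
send the details to scook@gear21.com <mailto:scook@gear21.com>.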



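6. Inside Speech Recognition

6.1. How Recognizers Work

Recognition systems can be broken down into two main types. Pattern
Recognition systems compare patterns against known/trained patterns to
determine a match. Acoustic Phonetic systems use knowledge of the human
body (speech production and hearing) to compare speech features
(phonetics such as vowel sounds). Most modern systems focus on the
pattern recognition approach, because it combines nicely with current
computing techniques and tends to have higher accuracy.

Most recognizers can be broken down into the following steps:

 1. Audio recording and utterance detection

 2. Pre-filtering (pre-emphasis, normalization, banding, etc.)

 3. Framing and windowing (chopping the data into a usable format)

 4. Filtering (further filtering of each window/frame/freq. band)

 5. Comparison and matching (recognizing the utterance)

 6. Action (perform the function associated with the recognized
    pattern)

Although each step seems simple, each one can involve a multitude of
different (and sometimes completely opposite) techniques.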

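(1) Audio/utterance recording can be accomplished in a number of ways.
Starting points can be found by comparing the ambient audio level
(acoustic energy, in some cases) with the sample just recorded.
Endpoint detection is harder, because speakers tend to leave
"artifacts" such as breathing, sighing, teeth chatter, and echoes.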

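The energy comparison described in step (1) can be sketched as follows.
This is a minimal illustration and not the method of any package listed
in this document: the function names, the 80-sample frame length, the
assumption that the first five frames contain only ambient noise, and
the threshold factor of 4 are all choices made for the example.

```python
import math


def frame_energy(samples):
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in samples) / len(samples)


def detect_utterance(samples, frame_len=80, ambient_frames=5, factor=4.0):
    """Return (start, end) sample indices of the utterance, or None.

    The ambient level is estimated from the first few frames, which are
    assumed to contain no speech. A frame counts as speech when its
    energy exceeds the ambient energy by the given factor.
    """
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len + 1, frame_len)]
    energies = [frame_energy(f) for f in frames]
    ambient = sum(energies[:ambient_frames]) / ambient_frames
    threshold = max(ambient * factor, 1e-9)  # guard against pure silence
    active = [i for i, e in enumerate(energies) if e > threshold]
    if not active:
        return None
    return active[0] * frame_len, (active[-1] + 1) * frame_len


# Synthetic test signal: quiet background, a louder tone, quiet again.
quiet = [0.01 * math.sin(0.3 * n) for n in range(800)]
speech = [0.5 * math.sin(0.3 * n) for n in range(1600)]
signal = quiet + speech + quiet
print(detect_utterance(signal))
```

A fixed threshold like this handles the easy starting point; it does
nothing about the breathing and echo artifacts that make endpoint
detection hard.
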
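(2) Pre-filtering is accomplished in a variety of ways, depending on
the other features of the recognition system. The most common methods
are the "Bank-of-Filters" method, which uses a series of audio filters
to prepare the sample, and the Linear Predictive Coding method, which
uses a prediction function to calculate differences (errors). Different
forms of spectral analysis are also used.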

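The pre-emphasis and normalization named in the list of steps are the
simplest forms of pre-filtering, and can be sketched in a few lines.
This sketch does not implement the Bank-of-Filters or Linear Predictive
Coding methods. The function names and the coefficient 0.97 are
assumptions; the coefficient is a commonly used value, not one taken
from this document.

```python
def pre_emphasize(samples, coeff=0.97):
    """First-order high-pass filter: y[n] = x[n] - coeff * x[n-1]."""
    if not samples:
        return []
    out = [samples[0]]
    for prev, cur in zip(samples, samples[1:]):
        out.append(cur - coeff * prev)
    return out


def normalize(samples):
    """Scale the samples so the largest absolute value becomes 1.0."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0:
        return list(samples)
    return [s / peak for s in samples]


# A constant (zero-frequency) input is almost removed by pre-emphasis.
flat = pre_emphasize([1.0, 1.0, 1.0])
scaled = normalize([0.5, -2.0, 1.0])
print(flat, scaled)
```
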
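(3) Framing/windowing involves separating the sample data into pieces
of a specific size. This is often rolled into step 2 or step 4. This
step also includes preparing the sample boundaries for analysis
(removing edge clicks, etc.).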

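As a rough sketch of step (3), the code below cuts a sample into
overlapping frames and applies a Hamming window to each one, which
tapers the frame edges and so reduces edge clicks. The Hamming window
is one common choice among several. The function names, the 200-sample
frame length, and the 80-sample step are assumptions for illustration
(25 ms and 10 ms at 8000 samples/sec).

```python
import math


def hamming_window(length):
    """Return the Hamming window coefficients for a frame of that length."""
    if length == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
            for n in range(length)]


def split_into_frames(samples, frame_len=200, step=80):
    """Cut samples into overlapping frames and window each frame."""
    window = hamming_window(frame_len)
    frames = []
    for start in range(0, len(samples) - frame_len + 1, step):
        chunk = samples[start:start + frame_len]
        frames.append([s * w for s, w in zip(chunk, window)])
    return frames


# 50 ms of a constant signal at an assumed 8000 samples/sec.
frames = split_into_frames([1.0] * 400)
print(len(frames), "frames of", len(frames[0]), "samples")
```
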
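(4) Additional filtering is not always present. It is the final
preparation of each window before comparison and matching. Often it
consists of time alignment and normalization.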

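(5) A huge number of techniques are available for comparison and
matching. Most involve comparing the current window with known samples.
There are methods that use Hidden Markov Models (HMM), frequency
analysis, differential analysis, linear algebra techniques/shortcuts,
spectral distortion, and time distortion. All of these methods are used
to generate a probability and accuracy of the match.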

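One of the time distortion methods mentioned in step (5) is dynamic
time warping (DTW), which also appears in the tool list in section
5.1.10. The sketch below is a textbook DTW over one number per frame,
not the code of any listed package. The function names, the template
values, and the use of a single feature per frame are assumptions for
illustration; real recognizers compare vectors of features.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two feature sequences."""
    inf = float("inf")
    rows, cols = len(a), len(b)
    # cost[i][j] is the best alignment cost of a[:i] against b[:j].
    cost = [[inf] * (cols + 1) for _ in range(rows + 1)]
    cost[0][0] = 0.0
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch b
                                 cost[i][j - 1],      # stretch a
                                 cost[i - 1][j - 1])  # step both
    return cost[rows][cols]


def best_match(sample, templates):
    """Return the name of the stored template closest to the sample."""
    return min(templates,
               key=lambda name: dtw_distance(sample, templates[name]))


# Hypothetical trained templates, one feature value per frame.
templates = {"yes": [1, 3, 4, 3, 1], "no": [4, 2, 1, 2, 4]}
sample = [1, 1, 3, 4, 4, 3, 1]  # "yes" spoken more slowly
print(best_match(sample, templates))
```

Because DTW lets either sequence stretch in time, the slower sample
still aligns with the "yes" template at zero cost.
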
(6) Actions can be just about anything the developer wants.

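For a command and control system, step (6) often amounts to a table
lookup. The sketch below maps a recognized utterance to the command it
should start. The table contents and the function name are hypothetical,
and the sketch only returns the argument list; a real system might pass
it to subprocess.run.

```python
import shlex

# Hypothetical vocabulary: recognized utterance -> command line.
COMMANDS = {
    "open netscape": "netscape",
    "start a new xterm": "xterm",
}


def action_for(utterance):
    """Return the argument list for a recognized utterance, or None."""
    command = COMMANDS.get(utterance.strip().lower())
    if command is None:
        return None  # utterance is not in the vocabulary
    return shlex.split(command)


print(action_for("Start a new xterm"))
```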


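6.2. Digital Audio Basics

Audio is inherently an analog phenomenon. Recording a digital sample
means converting the analog signal from the microphone into a digital
signal with the A/D converter in the sound card. When a microphone is
operating, sound waves vibrate the magnetic element in the microphone,
which produces an electrical current to the sound card (think of a
speaker working in reverse). Basically, the A/D converter records the
value of the electrical voltage at fixed intervals.

There are two important factors in this process. The first is the
"sample rate", or how often the voltage is recorded. The second is the
"bits per sample", or how accurately each value is recorded. A third
item is the number of channels (mono or stereo), but for most ASR
applications mono is sufficient. Most applications use pre-set values
for these parameters, and users should not change them unless the
documentation says to. Developers should experiment with different
values to find out what works best with their algorithms.

So what is a good sample rate for ASR? Because speech has a relatively
low bandwidth (mostly between 100Hz and 8kHz), 8000 samples/sec (8kHz),
which captures frequencies up to 4kHz, is sufficient for most basic
ASR. Some people prefer 16000 samples/sec (16kHz) because it provides
more accurate high frequency information. If you have the processing
power, use 16kHz. For most ASR applications, sampling rates above 22kHz
are a waste.

And what is a good value for "bits per sample"? 8 bits per sample
records values between 0 and 255, which means that the position of the
microphone element is one of 256 possible values. 16 bits per sample
divides the element position into 65536 possible values. As with the
sample rate, use 16 bits per sample if you have the processing power.
For comparison, an audio Compact Disc is encoded at 44kHz with 16 bits
per sample.

The encoding format should be simple, such as linear signed or
unsigned. Using a U-Law/A-Law algorithm or some other compression
scheme is usually not worth it, because it costs computing power and
does not gain much in return.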

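The numbers in this section follow from simple arithmetic, sketched
below. The function names are made up for the example. The Compact Disc
row uses the exact CD figures of 44100 samples/sec in stereo, which the
text above rounds to 44kHz.

```python
def quantization_levels(bits_per_sample):
    """Number of distinct values one sample can take."""
    return 2 ** bits_per_sample


def highest_frequency(sample_rate):
    """Highest frequency a sample rate can represent (half the rate)."""
    return sample_rate / 2


def bytes_per_second(sample_rate, bits_per_sample, channels=1):
    """Storage needed for one second of uncompressed linear audio."""
    return sample_rate * bits_per_sample * channels // 8


formats = [
    ("basic ASR", 8000, 8, 1),
    ("better ASR", 16000, 16, 1),
    ("Compact Disc", 44100, 16, 2),
]
for label, rate, bits, channels in formats:
    print(f"{label}: {quantization_levels(bits)} levels, "
          f"up to {highest_frequency(rate):.0f} Hz, "
          f"{bytes_per_second(rate, bits, channels)} bytes/sec")
```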

7. Publications

If there is a publication that is not on this list and that you think
should be, please send the information to scook@gear21.com
<mailto:scook@gear21.com>.



7.1. Books

  * "Fundamentals of Speech Recognition". L. Rabiner & B. Juang. 1993.
    ISBN: 0130151572.

  * "How to Build a Speech Recognition Application". B. Balentine, D.
    Morgan, and W. Meisel. 1999. ISBN: 0967127815.

  * "Speech Recognition : Theory and C++ Implementation". C. Becchetti
    and L.P. Ricotti. 1999. ISBN: 0471977306.

  * "Applied Speech Technology". A. Syrdal, R. Bennett, S. Greenspan.
    1994. ISBN: 0849394562.

  * "Speech Recognition : The Complete Practical Reference Guide". P.
    Foster, T. Schalk. 1993. ISBN: 0936648392.

  * "Speech and Language Processing: An Introduction to Natural
    Language Processing, Computational Linguistics and Speech
    Recognition". D. Jurafsky, J. Martin. 2000. ISBN: 0130950696.

  * "Discrete-Time Processing of Speech Signals (IEEE Press Classic
    Reissue)". J. Deller, J. Hansen, J. Proakis. 1999. ISBN:
    0780353862.

  * "Statistical Methods for Speech Recognition (Language, Speech, and
    Communication)". F. Jelinek. 1999. ISBN: 0262100665.

  * "Digital Processing of Speech Signals". L. Rabiner, R. Schafer.
    1978. ISBN: 0132136031.

  * "Foundations of Statistical Natural Language Processing". C.
    Manning, H. Schutze. 1999. ISBN: 0262133601.

For a very large online bibliography, check the Institut Fur Phonetik:
http://www.informatik.uni-frankfurt.de/~ifb/bib_engl.html



7.2. Internet

news:comp.speech

    Newsgroup dedicated to computers and speech.

      US: http://www.speech.cs.cmu.edu/comp.speech/

      UK: http://svr-www.eng.cam.ac.uk/comp.speech/

      Aus: http://www.speech.su.oz.au/comp.speech/

news:comp.speech.users

    Newsgroup dedicated to users of speech software.

      http://www.speechtechnology.com/users/comp.speech.users.html

news:comp.speech.research

    Newsgroup dedicated to research on speech software and hardware.

news:comp.dsp

    Newsgroup dedicated to digital signal processing.

news:alt.sci.physics.acoustics

    Newsgroup dedicated to the physics of sound.

DDLinux Email List

    Mailing list for speech recognition on Linux.

      Homepage: http://leb.net/ddlinux/

      Archives: http://leb.net/pipermail/ddlinux/

Linux Software Repository for speech applications

    http://sunsite.uio.no/pub/linux/sound/apps/speech/

Russ Wilcox's List of Speech Recognition Links

    (excellent) http://www.tiac.net/users/rwilcox/speech.html

Online Bibliography

    Online Bibliography of Phonetics and Speech Technology
    Publications.
    http://www.informatik.uni-frankfurt.de/~ifb/bib_engl.html

MIT's Spoken Language Systems Homepage

    http://www.sls.lcs.mit.edu/sls/

Oregon Graduate Institute

    Center for Spoken Language Understanding at the Oregon Graduate
    Institute. An excellent place for developers and researchers.
    http://cslu.cse.ogi.edu/

IBM's ViaVoice Linux SDK

    http://www-4.ibm.com/software/speech/dev/sdk_linux.html

Mississippi State

    Homepage of the Institute for Signal and Information Processing at
    Mississippi State University, with a large amount of useful
    information for developers.
    http://www.isip.msstate.edu/projects/speech/

Speech Technology

    ASR software and accessories. http://www.speechtechnology.com

Speech Control

    Speech-controlled computer systems. Microphones, headsets, and
    wireless products for ASR. http://www.speechcontrol.com

Microphones.com

    Microphones and accessories for ASR. http://www.microphones.com

21st Century Eloquence

    "Speech Recognition Specialists." http://voicerecognition.com

Computing Out Loud

    Primarily for Windows users, but it has good information.
    http://www.out-loud.com

Say I Can.com

    "The Speech Recognition Information Source." http://www.sayican.com
   


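8. About the Japanese Translation

The Japanese translation was produced by the Linux Japanese FAQ
Project. Please send comments about the translation to the JF Project
<JF@linux.or.jp>.

Version 1.2j

Translation:

     <htakashi@yabumi.com>

Proofreading:

      <jeanne@mbox.kyoto-inet.or.jp>

      <hng@ps.ksky.ne.jp>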
