Guerrero Flores, Cristina Maritza (2016) Information Fusion Approaches for Distant Speech Recognition in a Multi-microphone Setting. PhD thesis, University of Trento.
|PDF - Doctoral Thesis |
Restricted to Repository staff only until 9999.
Available under License Creative Commons Attribution Share Alike.
It is well known that high-quality automatic speech recognition (ASR) remains difficult to guarantee when the speaker is distant from the microphone, owing to distortions caused by acoustic phenomena such as noise and reverberation. Among the research directions pursued on this problem, multi-channel approaches are of great interest to the community, given their potential to exploit information diversity. In this thesis we elaborate on approaches that exploit different instances of a sound source, captured by several widely spaced microphones, in order to derive a distant speech recognition (DSR) hypothesis. Two original solutions are presented, both based on information fusion but applied at different levels of the recognition system: one at the front-end stage, for the problem of channel selection (CS), and one at the post-decoding stage, for the problem of hypothesis combination. First, a new CS framework is proposed. The CS method developed is based on cepstral distance (CD), a measure already applied effectively in other acoustic processing fields. Experimental results confirmed the advantages of a CD-based selection scheme under different scenarios. The second contribution concerns the combination of information extracted from the individual decoding processes performed on the multiple captured signals. It is shown how temporal cues can be identified in the hypothesis space and exploited to build a multi-microphone confusion network, from which the final speech transcription is derived. The proposed methods are applicable in settings equipped with synchronized distributed microphones, independently of the proximity between the sensors. The novel concepts were analyzed on both synthetic and real-captured data. Both approaches achieved positive results in the different evaluation tasks to which they were applied.
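The abstract does not say how the CD-based selection is computed, so the following is only a minimal sketch under stated assumptions. It computes FFT-based real cepstra per frame, a truncated cepstral distance in dB, and then ranks channels against a hypothetical reference: the per-frame mean cepstrum across all channels. The reference choice, the frame sizes (25 ms / 10 ms, assuming 16 kHz audio), the cepstral order, and all function names are illustrative assumptions, not the thesis's method. Signals are assumed synchronized and of equal length.

```python
import numpy as np


def frame_signal(x, frame_len=400, hop=160):
    """Split a 1-D signal into Hamming-windowed frames (25 ms / 10 ms at an assumed 16 kHz)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx] * np.hamming(frame_len)


def real_cepstrum(frames, n_fft=512, order=12, eps=1e-10):
    """Real cepstrum of the log power spectrum, truncated to c_0..c_order."""
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=-1)) ** 2
    ceps = np.fft.irfft(np.log(power + eps), n=n_fft, axis=-1)
    return ceps[..., : order + 1]


def cepstral_distance(c_a, c_b):
    """Truncated cepstral distance in dB, one value per frame:
    CD = (10 / ln 10) * sqrt((c_0 - c'_0)^2 + 2 * sum_{k>=1} (c_k - c'_k)^2)."""
    diff = c_a - c_b
    return (10.0 / np.log(10.0)) * np.sqrt(
        diff[..., 0] ** 2 + 2.0 * np.sum(diff[..., 1:] ** 2, axis=-1)
    )


def select_channel(signals, **ceps_kwargs):
    """Hypothetical CS rule: pick the channel whose cepstra are, on average,
    closest to the mean cepstrum over all channels."""
    ceps = np.stack([real_cepstrum(frame_signal(x), **ceps_kwargs) for x in signals])
    reference = ceps.mean(axis=0)  # shape (frames, order + 1)
    scores = cepstral_distance(ceps, reference[None]).mean(axis=1)  # one score per channel
    return int(np.argmin(scores)), scores
```

Because the reference here is an average, a channel that differs from the majority (for example, one dominated by noise) receives a larger mean distance and is not selected. Other references, such as a clean or close-talk signal when one is available in synthetic experiments, would plug into the same `cepstral_distance` helper; treat the ranking rule above as a placeholder.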
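The post-decoding fusion described in the abstract can be illustrated with a simplified sketch; it is not the thesis's algorithm. It assumes that each channel's decoding has already produced a confusion network whose slots carry start/end times and word posteriors. Slots from all channels are grouped with a greedy time-overlap rule (a hypothetical stand-in for the temporal cues identified in the thesis), posteriors are averaged with equal channel weights, and the best word per group is emitted unless it is the null arc. The `Slot` type, `combine_networks` function, and `<eps>` label are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass

EPS = "<eps>"  # hypothetical label for the null (skip) arc


@dataclass
class Slot:
    start: float  # slot start time in seconds
    end: float  # slot end time in seconds
    posteriors: dict[str, float]  # word -> posterior, summing to about 1


def combine_networks(networks: list[list[Slot]]) -> list[str]:
    """Merge per-channel confusion networks into one word sequence."""
    n_ch = len(networks)
    # Pool every slot, tagged with its channel index, ordered by start time.
    tagged = sorted(
        ((ch, slot) for ch, cn in enumerate(networks) for slot in cn),
        key=lambda item: item[1].start,
    )
    # Greedy grouping: a slot joins the current group if it starts before the group ends.
    groups: list[dict] = []
    for ch, slot in tagged:
        if groups and slot.start < groups[-1]["end"]:
            groups[-1]["members"].append((ch, slot))
            groups[-1]["end"] = max(groups[-1]["end"], slot.end)
        else:
            groups.append({"end": slot.end, "members": [(ch, slot)]})

    words = []
    for group in groups:
        scores: dict[str, float] = defaultdict(float)
        for _, slot in group["members"]:
            for word, post in slot.posteriors.items():
                scores[word] += post / n_ch  # equal channel weights (assumption)
        # Channels without a slot in this group vote for the null arc.
        present = {ch for ch, _ in group["members"]}
        scores[EPS] += (n_ch - len(present)) / n_ch
        best = max(scores, key=scores.get)
        if best != EPS:
            words.append(best)
    return words
```

With this design a channel that has no slot in a group contributes its whole weight to `<eps>`, so a word hypothesized by only a minority of channels tends to be dropped. Weighting channels unequally (for example, by a per-channel quality score) would only change the `post / n_ch` term.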
|Item Type:||Doctoral Thesis (PhD)|
|Doctoral School:||Information and Communication Technology|
|Subjects:||Area 01 - Mathematical and Computer Sciences > INF/01 COMPUTER SCIENCE|
|Uncontrolled Keywords:||Distant-talking, distributed microphone network, channel selection, cepstral distance, hypothesis combination, lattice, confusion network.|
|Funders:||Fondazione Bruno Kessler|
|Repository Staff approval on:||02 Nov 2016 10:52|