An effective target speech enhancement with single acoustic vector sensor based on the speech time-frequency sparsity

Conference Paper


Abstract


  • This paper combines the time-frequency (TF) sparsity of speech with the unique inter-sensor characteristics of an acoustic vector sensor (AVS) to formulate an effective speech enhancement approach under the minimum mean square error (MMSE) criterion together with a fixed beamformer (FBF). The proposed approach exploits the inter-sensor data ratio (ISDR) of the AVS and the TF sparsity of speech to derive a mask that extracts and enhances a target speech signal recorded in the presence of a spatially separated interfering speech signal and background noise. Experimental results show that the proposed AVS-ISDRSS algorithm effectively suppresses both the spatial interference and the additive background noise while improving the perceptual quality of the target speech. In addition, the proposed AVS-ISDRSS algorithm does not require voice activity detection (VAD) to estimate the speech, which greatly reduces the computational complexity.
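The idea of an ISDR-derived TF mask can be illustrated with a minimal sketch. This is not the paper's AVS-ISDRSS algorithm: it omits the MMSE estimator, the fixed beamformer, and background noise, and it rests on assumptions not stated in the abstract. Specifically, it assumes a 2-D AVS with one omnidirectional channel `o` and two velocity channels `u` and `v` whose gains are the cosine and sine of the arrival angle, and it assumes the two sources never overlap in the TF plane. The function names `avs_mix` and `isdr_mask` and the tolerance `tol` are hypothetical.

```python
import numpy as np

def avs_mix(sources, angles_rad):
    """Mix TF-domain sources into the o, u, v channels of an assumed 2-D AVS."""
    o = sum(sources)                                              # omnidirectional channel
    u = sum(np.cos(a) * s for s, a in zip(sources, angles_rad))   # x-velocity channel
    v = sum(np.sin(a) * s for s, a in zip(sources, angles_rad))   # y-velocity channel
    return o, u, v

def isdr_mask(o, u, v, target_angle_rad, tol=0.1, eps=1e-12):
    """Binary mask: keep TF bins whose inter-sensor ratios match the target angle."""
    isdr_u = np.real(u / (o + eps))   # ~cos(angle) when one source dominates the bin
    isdr_v = np.real(v / (o + eps))   # ~sin(angle) when one source dominates the bin
    dist = np.hypot(isdr_u - np.cos(target_angle_rad),
                    isdr_v - np.sin(target_angle_rad))
    active = np.abs(o) > 1e-8         # ignore empty bins
    return (dist < tol) & active

# Synthetic, noise-free demo with disjoint TF supports (an idealised assumption).
rng = np.random.default_rng(0)
shape = (64, 50)                                  # (frequency bins, frames)
sel = rng.random(shape)
target_support = sel < 0.3
interf_support = (sel >= 0.3) & (sel < 0.6)

def random_tf(support):
    coeffs = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
    return coeffs * support

s_target = random_tf(target_support)
s_interf = random_tf(interf_support)

target_angle = np.deg2rad(45.0)
interf_angle = np.deg2rad(135.0)
o, u, v = avs_mix([s_target, s_interf], [target_angle, interf_angle])

mask = isdr_mask(o, u, v, target_angle)
enhanced = mask * o                               # masked omnidirectional channel
```

In this idealised setting the mask selects exactly the bins occupied by the target, because each bin's ratio pair depends only on the arrival angle of the single source present there. With real recordings the ratios are perturbed by noise and by bins where both talkers are active, which is where a statistical estimator such as the MMSE criterion named in the abstract would come in.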

Publication Date


  • 2014

Citation


  • Y. X. Zou, Y. Q. Wang, P. Wang, C. H. Ritz & J. Xi, "An effective target speech enhancement with single acoustic vector sensor based on the speech time-frequency sparsity," in Digital Signal Processing (DSP), 2014 19th International Conference on, 2014, pp. 547-551.

Scopus Eid


  • 2-s2.0-84940740587

Ro Metadata Url


  • http://ro.uow.edu.au/eispapers/4708

Start Page


  • 547

End Page


  • 551
