Separation of speech sources using an acoustic vector sensor

Conference Paper


Abstract


  • This paper investigates how the directional characteristics of an Acoustic Vector Sensor (AVS) can be used to separate speech sources. The technique uses frequency-domain direction of arrival estimates to identify the location, relative to the AVS array, of each speaker in a group of speakers and to separate the recording into individual speech signals accordingly. Results show that the technique can be used for real-time separation of speech sources using a single 20 ms frame of speech. Compared with the unprocessed recordings, the proposed algorithm gives an average improvement of 15.1 dB in Signal to Interference Ratio (SIR) and an average improvement of 5.4 dB in Signal to Distortion Ratio (SDR). In addition to the SIR and SDR results, Perceptual Evaluation of Speech Quality (PESQ) and listening tests both show an improvement in perceptual quality of 1 Mean Opinion Score (MOS) point over the unprocessed recordings.
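
The general idea of separating sources by per-frequency direction of arrival can be illustrated with a minimal sketch. This is not the authors' algorithm, which this record does not describe in detail. It assumes a hypothetical two-dimensional AVS model (one pressure channel `p` and two orthogonal velocity channels `vx`, `vy`), a simple intensity-based azimuth estimate per frequency bin, a binary mask, and source azimuths that are already known. The function name `separate_frame` and all parameters are illustrative only.

```python
import numpy as np

def separate_frame(p, vx, vy, source_azimuths):
    """Split one frame into per-source signals by per-bin DOA masking.

    Assumed model (not from the paper): a plane wave from azimuth theta
    gives vx = p * cos(theta) and vy = p * sin(theta).
    """
    P, VX, VY = (np.fft.rfft(ch) for ch in (p, vx, vy))

    # Active intensity components per frequency bin: Re{conj(P) * V}.
    ix = np.real(np.conj(P) * VX)
    iy = np.real(np.conj(P) * VY)
    doa = np.arctan2(iy, ix)  # estimated azimuth per bin, in radians

    targets = np.asarray(source_azimuths, dtype=float)
    # Wrapped angular distance between each bin's DOA and each target azimuth.
    diff = np.angle(np.exp(1j * (doa[None, :] - targets[:, None])))
    nearest = np.argmin(np.abs(diff), axis=0)

    # Binary mask: each bin of the pressure spectrum goes to exactly one source.
    separated = [
        np.fft.irfft(np.where(nearest == i, P, 0), n=len(p))
        for i in range(len(targets))
    ]
    return np.stack(separated), doa
```

With a 16 kHz sampling rate (an assumption), a 20 ms frame is 320 samples, so `p`, `vx` and `vy` would each be length-320 arrays. A binary mask is the simplest choice here; it works only to the extent that the speakers occupy mostly different frequency bins within a frame.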

Authors


  •   Shujau, Muawiyath (external author)
  •   Ritz, Christian H.
  •   Burnett, Ian S. (external author)

Publication Date


  • 2011

Citation


  • Shujau, M., Ritz, C. H. & Burnett, I. S. (2011). Separation of speech sources using an acoustic vector sensor. 13th IEEE International Workshop on Multimedia Signal Processing, MMSP 2011 (pp. 1-6). USA: IEEE.

Scopus Eid


  • 2-s2.0-84055218328

Ro Metadata Url


  • http://ro.uow.edu.au/infopapers/1767

Start Page


  • 1

End Page


  • 6

Place Of Publication


  • http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6093797
