
Encoding navigable speech sources: an analysis by synthesis approach

Conference Paper


Abstract


  • This paper presents an analysis-by-synthesis coding architecture for compressing navigable speech sources. The proposed coding scheme encodes multiple overlapped speech sources recorded, for example, during a multi-participant meeting or teleconference, into a mono or stereo mixture signal that can be compressed with an existing speech coder. The individual speech sources can be separated from the received compressed mixture, which allows the listener to determine the active sources and their spatial locations at the reproduction site. The approach was applied to the compression of a series of speech soundfields created from multiple clean speech sentences and real meeting recordings, where each soundfield contained four participants with up to three simultaneous speech sources. At a total bit rate of 48 kbps, the perceptual quality of each decoded speech source, as judged by subjective listening tests, was found to be significantly better than either a non-analysis-by-synthesis approach or separate encoding of each source at the same overall bit rate. Subjective listening tests also confirmed that the quality of the spatialised speech scene was maintained. © 2012 IEEE.

Publication Date


  • 2012

Citation


  • X. Zheng, C. H. Ritz & J. Xi, "Encoding navigable speech sources: an analysis by synthesis approach," in ICASSP 2012: IEEE International Conference on Acoustics, Speech and Signal Processing, 2012, pp. 405-408.

Scopus Eid


  • 2-s2.0-84867615570

Ro Metadata Url


  • http://ro.uow.edu.au/infopapers/2222

Start Page


  • 405

End Page


  • 408
