Black-Box Audio Adversarial Example Generation Using Variational Autoencoder

Chapter


Abstract


  • Automatic speech recognition (ASR) applications are ubiquitous these days. A variety of commercial products use powerful ASR capabilities to transcribe user speech. However, as with other deep learning models, the techniques underlying ASR models are vulnerable to adversarial example (AE) attacks. To the casual listener, an audio AE sounds like non-suspicious audio, yet an ASR system will transcribe it incorrectly. Existing black-box AE techniques require sending an excessive number of requests to the targeted system, and such suspicious behavior can potentially trigger a threat alert on that system. This paper proposes a method of generating black-box AEs that significantly reduces the number of requests required. We describe the proposed method and present experimental results demonstrating its effectiveness in generating word-level and sentence-level AEs that are incorrectly transcribed by an ASR system.

Publication Date


  • 2021

Citation


  • Zong, W., Chow, Y. W., & Susilo, W. (2021). Black-Box Audio Adversarial Example Generation Using Variational Autoencoder. In Lecture Notes in Computer Science (Vol. 12919, pp. 142-160). doi:10.1007/978-3-030-88052-1_9

International Standard Book Number (ISBN-13)


  • 9783030880514

Scopus EID


  • 2-s2.0-85116028339

Book Title


  • Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Start Page


  • 142

End Page


  • 160