
Targeted Universal Adversarial Perturbations for Automatic Speech Recognition

Chapter


Abstract


  • Automatic speech recognition (ASR) is an essential technology in many commercial products. However, the deep learning models underlying ASR systems are vulnerable to adversarial examples (AEs), which are generated by applying small or imperceptible perturbations to audio in order to fool these models. Recently, universal adversarial perturbations (UAPs) have attracted much research interest. Unlike per-input perturbations, a UAP is not tied to a specific input audio signal: given a generic audio signal, an audio AE can be generated by directly applying the UAP. This paper presents a method of generating UAPs based on a targeted phrase. To the best of our knowledge, our proposed method is the first to successfully attack ASR models trained with connectionist temporal classification (CTC) loss. In addition to generating UAPs, we empirically show that a UAP can itself be regarded as a signal that is transcribed as the target phrase. We also show that the UAPs preserve temporal dependency, so the audio AEs generated from them preserve temporal dependency as well.
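For intuition, below is a minimal sketch (not the authors' released code) of how a targeted UAP might be optimized against a CTC-based ASR model: a single perturbation, shared across all inputs, is updated by gradient descent on the CTC loss between the model's output on the perturbed audio and the target phrase, while an L-infinity clamp keeps the perturbation small. The model interface, data loader, target encoding, and hyperparameters are illustrative assumptions.

    # A minimal sketch (not the authors' code) of targeted UAP optimization
    # against a CTC-based ASR model. `model`, `dataloader`, `target_ids`,
    # and all hyperparameters below are illustrative assumptions.
    import torch
    import torch.nn.functional as F

    def optimize_uap(model, dataloader, target_ids, perturb_len,
                     epsilon=0.05, lr=1e-3, epochs=10, device="cpu"):
        # One shared additive perturbation for all inputs (the "universal" part).
        # Assumes perturb_len is at least as long as every waveform in the data.
        delta = torch.zeros(perturb_len, device=device, requires_grad=True)
        optimizer = torch.optim.Adam([delta], lr=lr)
        target = target_ids.to(device)            # target phrase as 1-d label ids

        for _ in range(epochs):
            for audio in dataloader:              # assumed batch shape: (batch, time)
                audio = audio.to(device)
                adv = audio + delta[: audio.shape[-1]]   # apply the (truncated) UAP
                # Assumed model output: log-probabilities of shape
                # (frames, batch, classes), as expected by F.ctc_loss.
                log_probs = model(adv)
                batch = audio.shape[0]
                input_lens = torch.full((batch,), log_probs.shape[0], dtype=torch.long)
                target_lens = torch.full((batch,), len(target), dtype=torch.long)
                targets = target.unsqueeze(0).repeat(batch, 1)
                # Minimizing CTC loss pushes every transcription toward the target.
                loss = F.ctc_loss(log_probs, targets, input_lens, target_lens)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
                with torch.no_grad():
                    delta.clamp_(-epsilon, epsilon)  # L-infinity bound keeps the UAP quiet
        return delta.detach()

Because the learned delta is independent of any particular input, it can be added directly to unseen audio at attack time, which is what makes the perturbation "universal" in the sense described in the abstract.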

Publication Date


  • 2021

Citation


  • Zong, W., Chow, Y. W., Susilo, W., Rana, S., & Venkatesh, S. (2021). Targeted Universal Adversarial Perturbations for Automatic Speech Recognition. In Lecture Notes in Computer Science (Vol. 13118 LNCS, pp. 358-373). doi:10.1007/978-3-030-91356-4_19

International Standard Book Number (isbn) 13


  • 9783030913557

Scopus Eid


  • 2-s2.0-85121878891

Book Title


  • Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Start Page


  • 358

End Page


  • 373
