Abstract
This paper proposes a speaker-independent keyword spotting (KWS) approach for the audio track of user
video blogs, intended to support their automatic analysis, indexing, search and retrieval. The approach matches
keyword templates to speech segments using an adaptive similarity threshold that is estimated automatically
for each utterance. Unlike existing approaches, such as those based on Hidden Markov Models (HMMs), it requires
neither training data nor a language model. This is a particular advantage for user video blogs, which often
contain words of interest that are not adequately represented in training databases. In experiments on
detecting offensive words in video blogs, the approach achieved substantially higher accuracy than existing
speech-to-text-based approaches.
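The template-matching idea can be illustrated with a minimal sketch. The abstract does not specify the similarity measure or how the per-utterance threshold is estimated, so this example assumes dynamic time warping (DTW) over per-frame feature vectors (for instance MFCCs) and a hypothetical threshold rule: the mean template-to-segment distance within the utterance minus k standard deviations. The names `dtw_distance`, `adaptive_threshold`, `spot_keyword` and the parameter `k` are illustrative only and do not describe the paper's actual method.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence

# One feature vector per frame; the feature type (e.g. MFCCs) is an assumption.
Frame = Sequence[float]


def frame_distance(a: Frame, b: Frame) -> float:
    """Euclidean distance between two feature frames."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(template: Sequence[Frame], segment: Sequence[Frame]) -> float:
    """DTW distance between two frame sequences, normalised by their total length."""
    n, m = len(template), len(segment)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(template[i - 1], segment[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m] / (n + m)


def adaptive_threshold(distances: List[float], k: float = 1.0) -> float:
    """Hypothetical per-utterance threshold: mean distance minus k standard deviations."""
    return mean(distances) - k * pstdev(distances)


def spot_keyword(template: Sequence[Frame],
                 segments: Sequence[Sequence[Frame]],
                 k: float = 1.0) -> List[int]:
    """Return indices of segments whose distance to the template falls below
    the threshold estimated from this utterance's own distance distribution."""
    distances = [dtw_distance(template, seg) for seg in segments]
    threshold = adaptive_threshold(distances, k)
    return [i for i, d in enumerate(distances) if d < threshold]


if __name__ == "__main__":
    # Toy one-dimensional "features": segment 0 is a time-stretched copy of the template.
    template = [[0.0], [1.0], [2.0]]
    segments = [
        [[0.0], [1.0], [1.0], [2.0]],
        [[5.0], [5.0], [5.0]],
        [[4.0], [6.0], [5.0]],
        [[5.0], [4.0], [6.0]],
    ]
    print(spot_keyword(template, segments))  # prints [0]
```

Because the threshold is derived from the distances within each utterance rather than fixed globally, it shifts with recording conditions and speaker; whether the paper's estimator resembles this mean-minus-deviation rule is not stated in the abstract.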