Speechbot: An experimental speech-based search engine for multimedia content on the Web

Journal Article


Abstract


  • As the Web transforms from a text-only medium into a more multimedia-rich medium, the need arises to perform searches based on the multimedia content. In this paper, we present an audio and video search engine to tackle this problem. The engine uses speech recognition technology to index spoken audio and video files from the World Wide Web (WWW) when no transcriptions are available. If transcriptions (even imperfect ones) are available, we can also take advantage of them to improve the indexing process. Our engine indexes several thousand talk and news radio shows covering a wide range of topics and speaking styles from a selection of public Web sites with multimedia archives. Our Web site is similar in spirit to normal Web search sites; it contains an index, not the actual multimedia content. The audio from these shows suffers in acoustic quality due to bandwidth limitations, coding, compression, and poor acoustic conditions. Our word error rate (WER) results using appropriately trained acoustic models show remarkable resilience to the high compression, although many factors combine to increase the average WERs over standard broadcast news benchmarks. We show that, even if the transcription is inaccurate, we can still achieve good retrieval performance for typical user queries (77.5%).
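
The abstract reports results as word error rate (WER) but does not say how it was scored. As a minimal sketch, assuming the standard definition (word-level substitutions, deletions, and insertions divided by the number of reference words), WER could be computed as below. The function name `word_error_rate` and the lowercase, whitespace-split tokenization are illustrative assumptions, not the paper's scoring procedure.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: tokens are lowercase, whitespace-separated words; real
    scoring tools usually apply further text normalization.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words consumed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Made-up sentences, not data from the paper.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions count as errors, WER under this definition can exceed 1.0 when the hypothesis is much longer than the reference.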

Publication Date


  • 2002

Citation


  • Van Thong, J. M., Moreno, P. J., Logan, B., Fidler, B., Maffey, K., & Moores, M. (2002). Speechbot: An experimental speech-based search engine for multimedia content on the Web. IEEE Transactions on Multimedia, 4(1), 88-96. doi:10.1109/6046.985557

Scopus Eid


  • 2-s2.0-0036501883

Start Page


  • 88

End Page


  • 96

Volume


  • 4

Issue


  • 1
