Introduction
Thanks to the development of the Internet and video services such as YouTube, Facebook, and Twitch, people can easily share their own videos with viewers across continents on a daily basis. Along with books, these videos have become a new source of knowledge. However, the information in some of these videos is of questionable quality: it may contain unintentional or intentional misinformation, as well as political bias.
Moreover, the COVID-19 pandemic has resulted in the widespread adoption of remote working, remote learning, and remote conferencing. These remote environments create demand for many new applications that rely on efficient video transcript understanding, such as meeting recording understanding, quality assurance in call centers, and automatic test scoring in educational testing. Recent advances in methods and resources for speech recognition have also created more research opportunities around video transcript understanding.