Author: Liu, Chao-Ning
Thesis title: A2A: Attention to Attention Reasoning for Movie Question Answering
Advisors: Chen, Hwann-Tzong
Liu, Tyng-Luh
Oral defense committee: Chiu, Wei-Chen
Lee, Che-Rung
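This thesis proposes the Attention to Attention (A2A) reasoning mechanism for building a movie question answering system, a highly challenging research problem. We propose two ways of analyzing attention. The first is attention propagation, which uncovers latent but useful information by relating the cue segments to other content outside those segments. The second is QA attention, which locates all content related to both the question and the answers. Furthermore, the proposed A2A reasoning mechanism can effectively draw on both visual and textual content to answer questions, and it can be conveniently constructed with neural network architectures. To address the out-of-vocabulary problem caused by rare names in movies, we adopt GloVe word vectors as a teacher model and build a novel and flexible word embedding based on n-gram symbol vectors. Our method is evaluated on the MovieQA benchmark dataset and achieves the best performance on the "Video+Subtitles" subtask.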
This thesis presents the Attention to Attention (A2A) reasoning mechanism to address the challenging task of movie question answering (MQA). By focusing on the various aspects of attention cues, we establish the technique of attention propagation to uncover latent but useful information for solving the underlying QA task. In addition, the proposed A2A reasoning leads seamlessly to an effective fusion of the different representation modalities of the data, and it can be conveniently constructed with popular neural network architectures. To tackle the out-of-vocabulary issue caused by the diverse language usage in contemporary movies, we adopt the GloVe mapping as a teacher model and establish a new and flexible word embedding based on character n-gram learning. Our method is evaluated on the MovieQA benchmark dataset and achieves state-of-the-art accuracy for the ‘Video+Subtitles’ entry.
1 Introduction
2 Related Work
2.1 Visual Captioning and Question Datasets
2.2 Memory Network
2.3 Attention Model
3 Proposed Method
3.1 Visual and Linguistic Embedding
3.2 Joint Embedding
3.3 Attention Propagation
3.4 QA Attention
3.5 Optimal Answer Response
4 Experiments and Discussions
4.1 Ablation Study on Key Components
4.2 Leaderboard Comparison
4.3 Model Selection
4.4 Question Types Comparison
5 Conclusion
