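Author (English): Chen, Wei-Da
Thesis Title (Chinese, translated): CNN^2: Improving Viewpoint Generalization with Binocular Vision
Thesis Title (English): CNN^2: Viewpoint Generalization via a Binocular Vision
Advisor (English): Wu, Shan-Hung
Oral Defense Committee (English): Chen, Hwann-Tzong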
Chien, Jen-Tzung
Peng, Wen-Hsiao
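[...] performance, yet the 3D viewpoint generalizability of convolutional neural networks still lags behind that of humans. Recent research has also used binocular vision to improve the 3D viewpoint generalizability of convolutional neural networks. We propose CNN^2, an improvement on convolutional neural networks that takes a pair of binocular images as input and combines information from the left and right eyes during forward propagation.
[...] show that CNN^2 effectively improves viewpoint generalizability. In addition, CNN^2 is simple to implement and easy to train, and it is compatible with the rapidly developing field of convolutional neural networks.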
Convolutional Neural Networks (CNNs) have laid the foundation for many techniques in various applications. Despite achieving remarkable performance in some tasks, the 3D viewpoint generalizability of CNNs is still far behind human visual capabilities. Although recent efforts, such as the Capsule Networks, have been made to address this issue, these new models are hard to train, incompatible with existing CNN-based techniques specialized for different applications, or both. Observing that humans use binocular vision to understand the world, we study in this paper whether the 3D viewpoint generalizability of CNNs can be achieved via binocular vision. We propose CNN^2, a CNN that takes two images as input, resembling the process of an object being viewed from the left eye and the right eye. CNN^2 uses novel augmentation, pooling, and convolutional layers to learn a sense of three-dimensionality in a recursive manner. Empirical evaluation shows that CNN^2 has improved viewpoint generalizability compared to vanilla CNNs. Furthermore, CNN^2 is easy to implement and train, and is compatible with existing CNN-based specialized techniques for different applications.
1. Introduction
2. Model Design of CNN^2
3. Further Related Work
4. Experiments
4.1 3D Viewpoint Generalization
4.2 Backward Compatibility
4.3 Ablation Study
5. Conclusion
Supplementary Materials
6. Related Works
7. Correspondence to Human Visual System
8. More on Experiments
8.1 Datasets and Preprocessing
8.2 Training and Model Architectures
8.3 Backward Compatibility
References
