Detailed Record

Author (Chinese): 陳韋達
Author (English): Chen, Wei-Da
Title (Chinese): CNN^2: 利用雙眼視覺增進視角泛化能力
Title (English): CNN^2: Viewpoint Generalization via a Binocular Vision
Advisor (Chinese): 吳尚鴻
Advisor (English): Wu, Shan-Hung
Committee (Chinese): 陳煥宗、簡仁宗、彭文孝
Committee (English): Chen, Hwann-Tzong; Chien, Jen-Tzung; Peng, Wen-Hsiao
Degree: Master
Institution: National Tsing Hua University
Department: Computer Science
Student ID: 106062557
Publication Year (ROC calendar): 108 (2019)
Graduation Academic Year: 107
Language: English
Number of Pages: 29
Keywords (Chinese): 卷積神經網路、泛化能力、雙眼視覺
Keywords (English): CNN, Generalization, Viewpoint, Binocular Vision
Abstract (Chinese, translated):
Convolutional neural networks (CNNs) have achieved enormous success in recent years, in fields ranging from visual processing to natural language processing, and have even begun to attract attention in medicine. Despite their excellent performance on these tasks, CNNs still fall well short of humans in 3D viewpoint generalization. Recent work has proposed new models to address this problem, such as Capsule Networks, but these models are very hard to train and are incompatible with the flourishing body of CNN-based techniques; many methods developed for CNNs do not carry over to Capsule Networks. Inspired by the fact that humans understand the world through two eyes, this thesis aims to improve the 3D viewpoint generalizability of CNNs via binocular vision. We propose CNN^2, an extension of the CNN that takes a pair of binocular images as input and combines information from the left and right eyes during forward propagation. The model learns stereoscopic features recursively using novel augmentation and pooling methods. Experiments show that CNN^2 effectively improves viewpoint generalizability; moreover, CNN^2 is simple to implement and easy to train, and is compatible with the flourishing body of existing CNN-based techniques.
Abstract (English):
Convolutional Neural Networks (CNNs) have laid the foundation for many techniques in various applications. Despite achieving remarkable performance in some tasks, the 3D viewpoint generalizability of CNNs still falls far behind human visual capabilities. Although recent efforts, such as the Capsule Networks, have been made to address this issue, these new models are either hard to train and/or incompatible with existing CNN-based techniques specialized for different applications. Observing that humans use binocular vision to understand the world, we study in this paper whether the 3D viewpoint generalizability of CNNs can be achieved via binocular vision. We propose CNN^2, a CNN that takes two images as input, which resembles the process of an object being viewed from the left eye and the right eye. CNN^2 uses novel augmentation, pooling, and convolutional layers to learn a sense of three-dimensionality in a recursive manner. Empirical evaluation shows that CNN^2 has improved viewpoint generalizability compared to vanilla CNNs. Furthermore, CNN^2 is easy to implement and train, and is compatible with existing CNN-based specialized techniques for different applications.
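To make the high-level description above concrete, below is a minimal sketch (in PyTorch, which the thesis does not necessarily use) of the general binocular idea: two views of the same object processed by parallel convolutional streams that exchange left/right information at every stage. This is only an illustration under stated assumptions, not the authors' CNN^2 architecture; the thesis's novel augmentation and pooling layers are not reproduced, and all names (BinocularBlock, BinocularCNN) are hypothetical.

```python
# A hedged sketch of a binocular two-stream CNN. NOT the authors' exact
# CNN^2 design; class and variable names here are hypothetical.
import torch
import torch.nn as nn


class BinocularBlock(nn.Module):
    """One stage: each eye's stream convolves a fused view of both streams."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        # Each stream sees its own features plus the other stream's,
        # hence 2 * in_ch input channels.
        self.conv_l = nn.Conv2d(2 * in_ch, out_ch, kernel_size=3, padding=1)
        self.conv_r = nn.Conv2d(2 * in_ch, out_ch, kernel_size=3, padding=1)
        self.act = nn.ReLU()
        self.pool = nn.MaxPool2d(2)

    def forward(self, left, right):
        fused_l = torch.cat([left, right], dim=1)  # left stream sees both eyes
        fused_r = torch.cat([right, left], dim=1)  # right stream sees both eyes
        return (self.pool(self.act(self.conv_l(fused_l))),
                self.pool(self.act(self.conv_r(fused_r))))


class BinocularCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.block1 = BinocularBlock(3, 16)
        self.block2 = BinocularBlock(16, 32)
        self.head = nn.Linear(2 * 32, num_classes)

    def forward(self, left_img, right_img):
        l, r = self.block1(left_img, right_img)
        l, r = self.block2(l, r)
        # Global average pooling per stream, then classify on the
        # concatenated joint representation.
        l, r = l.mean(dim=(2, 3)), r.mean(dim=(2, 3))
        return self.head(torch.cat([l, r], dim=1))


# Usage: two 3-channel 64x64 views of the same object (left/right eye).
left = torch.randn(1, 3, 64, 64)
right = torch.randn(1, 3, 64, 64)
logits = BinocularCNN()(left, right)  # shape: (1, 10)
```

Concatenating the opposite stream's feature maps before each convolution is the simplest way to let each "eye" condition on the other at every layer; the actual fusion, augmentation, and recursive pooling scheme of CNN^2 is described in Chapter 2 of the thesis and differs from this sketch.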
Table of Contents

1. Introduction ...................................................................................... 1
2. Model Design of CNN^2 ..................................................................... 3
3. Further Related Work ...................................................................... 7
4. Experiments ...................................................................................... 8
4.1 3D Viewpoint Generalization ........................................................ 10
4.2 Backward Compatibility ............................................................... 12
4.3 Ablation Study .............................................................................. 14
5. Conclusion ....................................................................................... 14
Supplementary Materials .................................................................... 15
6. Related Works ................................................................................. 15
7. Correspondence to Human Visual System .................................... 17
8. More on Experiments ..................................................................... 19
8.1 Datasets and Preprocessing ......................................................... 19
8.2 Training and Model Architectures ............................................... 21
8.3 Backward Compatibility ............................................................. 23
References ............................................................................................ 25
Electronic full text: not authorized for public release.