
Detailed Record

Author (Chinese): 俞尚毅
Author (English): Yu, Shang-Yi
Thesis Title (Chinese): 探尋影片中的承擔特質
Thesis Title (English): Affordance Detection in Videos
Advisor (Chinese): 陳煥宗
Advisor (English): Chen, Hwann-Tzong
Committee Members (Chinese): 邱維辰、胡敏君
Committee Members (English): Chiu, Wei-Chen; Hu, Min-Chun
Degree: Master's
University: National Tsing Hua University
Department: Department of Computer Science
Student ID: 107062507
Year of Publication (ROC calendar): 109 (2020)
Academic Year of Graduation: 108
Language: English
Number of Pages: 27
Keywords (Chinese): 影片承擔特質偵測
Keywords (English): Video Affordance Detection
Abstract (Chinese):
This thesis proposes a new affordance task: locating the regions that carry an affordance and determining whether the affordance exists in every frame of a video. Previous studies on affordances have focused only on detection in still images. For this new task of detecting affordances in videos, we build a new affordance dataset, the Support Affordance Video (SAV) dataset, which collects videos of the support affordance and stages a series of action scenarios so that the existence of the affordance changes with the actions and the environment in each scenario. We propose a network architecture that uses two separate branches together with temporal modules to predict the affordance attention area, the affordance region, and the affordance existence label in a video. We evaluate the results on the SAV dataset to verify the effectiveness of the proposed method.
Abstract (English):
This thesis proposes a new affordance task: detecting the affordance region and predicting the existence of the affordance for each frame in a video sequence. Previous research on affordances has focused only on detection in single images. For this new task of affordance detection in videos, we build a new affordance dataset, the Support Affordance Video (SAV) dataset. The dataset consists of support affordance videos that exhibit a series of action scenarios in which the affordance existence status changes as the actions and the environment change. We propose a network architecture that uses two different branches and temporal modules to predict the affordance attention area, the affordance region, and the affordance existence label in a video. The experimental results on the SAV dataset provide a baseline for the new task and validate the effectiveness of our method.
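The architecture described above (per-frame feature extraction, temporal modules, and separate branches for the attention area, the affordance region, and the existence label) can be summarized with a minimal sketch. The code below assumes a PyTorch-style implementation; the ResNet-18 backbone, the GRU-based temporal aggregation, and all module names are illustrative assumptions rather than the thesis's actual design.

```python
# A minimal PyTorch sketch of a two-branch video affordance predictor.
# All names, the ResNet-18 backbone, and the GRU temporal module are
# illustrative assumptions, not the thesis's actual implementation.
import torch
import torch.nn as nn
import torchvision

class TwoBranchAffordanceNet(nn.Module):
    def __init__(self, hidden_dim=256):
        super().__init__()
        backbone = torchvision.models.resnet18(weights=None)
        # Per-frame feature extractor (drop the average pool and fc head).
        self.encoder = nn.Sequential(*list(backbone.children())[:-2])
        # Temporal module: aggregate spatially pooled per-frame features.
        self.temporal = nn.GRU(512, hidden_dim, batch_first=True)
        # Branch 1: spatial maps for the attention area and the affordance region.
        self.attention_head = nn.Sequential(
            nn.Conv2d(512, 128, 3, padding=1), nn.ReLU(),
            nn.Conv2d(128, 1, 1))                     # per-frame attention logits
        self.region_head = nn.Sequential(
            nn.Conv2d(512 + 1, 128, 3, padding=1), nn.ReLU(),
            nn.Conv2d(128, 1, 1))                     # per-frame region logits
        # Branch 2: per-frame existence label from the temporal features.
        self.existence_head = nn.Linear(hidden_dim, 1)

    def forward(self, clip):                          # clip: (B, T, 3, H, W)
        b, t, c, h, w = clip.shape
        feats = self.encoder(clip.reshape(b * t, c, h, w))   # (B*T, 512, h', w')
        attn = self.attention_head(feats)                     # (B*T, 1, h', w')
        region = self.region_head(
            torch.cat([feats, attn.sigmoid()], dim=1))        # (B*T, 1, h', w')
        pooled = feats.mean(dim=(2, 3)).reshape(b, t, -1)     # (B, T, 512)
        temporal_feats, _ = self.temporal(pooled)              # (B, T, hidden_dim)
        existence = self.existence_head(temporal_feats)        # (B, T, 1) logits
        _, _, hf, wf = attn.shape
        return (attn.reshape(b, t, 1, hf, wf),
                region.reshape(b, t, 1, hf, wf),
                existence.squeeze(-1))

# Example: a batch of two 8-frame clips at 224x224 resolution.
model = TwoBranchAffordanceNet()
attn, region, exist = model(torch.randn(2, 8, 3, 224, 224))
# attn/region: (2, 8, 1, 7, 7) logits; exist: (2, 8) logits.
```

Under these assumptions, training would pair per-pixel losses on the attention and region maps with a per-frame binary cross-entropy on the existence logits; the actual backbones, temporal modules, and loss functions used in the thesis may differ.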
Table of Contents:
List of Tables 5
List of Figures 6
Abstract (Chinese) 7
Abstract 8
1 Introduction 9
2 Related Work 11
3 Dataset 13
4 Our Approach 17
4.1 Overview 17
4.2 Affordance Feature Extraction 17
4.3 Affordance Attention Predictor 18
4.4 Affordance Existence Predictor 18
4.5 Affordance Region Prediction 19
5 Experiments 21
5.1 Settings 21
5.1.1 Implementation Details 21
5.1.2 Evaluation Metrics 21
5.2 Quantitative Results 22
5.2.1 Evaluation on SAV Dataset 22
5.2.2 Ablation Study 22
5.3 Qualitative Results 23
6 Conclusion and Future Work 24
Bibliography 25