|
[1] M. Hoai and F. Dela Torre, “Max-margin early event detectors,” in CVPR,2012. [2] M. S. Ryoo, “Human activity prediction: Early recognition of ongoing activities from streaming videos,” in ICCV, 2011. [3] T. Lan, T.-C. Chen, and S. Savarese, “A hierarchical representation for future action prediction,” in ECCV, 2014. [4] K. M. Kitani, B. D. Ziebart, J. A. D. Bagnell, and M. Hebert, “Activity forecasting,” in ECCV, 2012. [5] F.-H. Chan, Y.-T. Chen, Y. Xiang, and M. Sun, “Anticipating accidents in dashcam videos,” ACCV, 2016. [6] A. Jain, H. S. Koppula, B. Raghavan, S. Soh, and A. Saxena, “Car that knows before you do: Anticipating maneuvers via learning temporal driving models,” in ICCV,2015. [7] A. Jain, A. Singh, H. S. Koppula, S. Soh, and A. Saxena, “Recurrent neural networks for driver activity anticipation via sensory-fusion architecture,” in ICRA, 2016. [8] S. Abu-El-Haija, N. Kothari, J. Lee, P. Natsev, G. Toderici, B. Varadarajan, and S. Vijayanarasimhan, “Youtube-8m: A large-scale video classification benchmark,” arXiv:1609.08675, 2016. [9] B. G. Fabian Caba Heilbron, Victor Escorcia and J. C. Niebles, “Activitynet: A large-scale video benchmark for human activity understanding,” in CVPR, 2015. [10] J.-B. Alayrac, J. Sivic, I. Laptev, and S. Lacoste-Julien, “Joint discovery of object states and manipulating actions,” in ICCV, 2017. [11] R. Mottaghi, C. Schenck, D. Fox, and A. Farhadi, “See the glass half full: Reasoning about liquid containers, their volume and content,” in ICCV, 2017. [12] K. Ohnishi, A. Kanehira, A. Kanezaki, and T. Harada, “Recognizing activities of daily living with a wrist-mounted camera,” in CVPR, 2016. [13] C.-S. Chan, S.-Z. Chen, P.-X. Xie, C.-C. Chang, and M. Sun, “Recognition from hand cameras: A revisit with deep learning,” in ECCV, 2016. [14] K. Soomro, A. R. Zamir, and M. Shah, “Ucf101: A dataset of 101 human actions classes from videos in the wild,” arXiv:1212.0402, 2012. [15] H. Kuehne, H. Jhuang, E. Garrote, T. Poggio, and T. Serre, “Hmdb: a large video database for human motion recognition,” in ICCV, 2011. [16] M. Rohrbach, S. Amin, M. Andriluka, and B. Schiele, “A database for finegrained activity detection of cooking activities,” in CVPR, 2012. [17] G. Chéron, I. Laptev, and C. Schmid, “P-cnn: Pose-based cnn features for action recognition,” in ICCV, 2015. [18] H. Jhuang, J. Gall, S. Zuffi, C. Schmid, and M. J. Black, “Towards understanding action recognition,” in ICCV, 2013. [19] T.-H. Vu, C. Olsson, I. Laptev, A. Oliva, and J. Sivic, “Predicting actions from static scenes,” in ECCV, 2014. [20] Y. Zhang, W. Qu, and D. Wang, “Action-scene model for human action recognition from videos,” 2014. [21] D. J. Moore, I. A. Essa, and M. H. Hayes, “Exploiting human actions and object context for recognition tasks,” in ICCV, 1999. [22] V. Delaitre, J. Sivic, and I. Laptev, “Learning person-object interactions for action recognition in still images,” in NIPS, 2011. [23] A. Gupta, A. Kembhavi, and L. S. Davis, “Observing human-object interactions: Using spatial and functional compatibility for recognition,” TPAMI, 2009. [24] A. Gupta and L. S. Davis, “Objects in action: An approach for combining action understanding and object perception,” in CVPR, 2007. [25] A. Fathi and J. M. Rehg, “Modeling actions through state changes,” in CVPR, 2013. [26] S. Bambach, S. Lee, D. J. Crandall, and C. Yu, “Lending a hand: Detecting hands and recognizing activities in complex egocentric interactions,” in ICCV, 2015. [27] M. Ma, H. Fan, and K. M. Kitani, “Going deeper into first-person activity recognition,” in CVPR, 2016. [28] J.-F. Hu, W.-S. Zheng, J. Lai, and J. Zhang, “Jointly learning heterogeneous features for rgb-d activity recognition,” in CVPR, 2015. [29] J. Lei, X. Ren, and D. Fox, “Fine-grained kitchen activity recognition using rgb-d,” in UbiComp, 2012. [30] S. Song,N.-M. Cheung, V. Chandrasekhar, B. Mandal, and J. Liri, “Egocentric activity recognition with multimodal fisher vector,” in Acoustics, Speech and Signal Processing (ICASSP),IEEE,2016. [31] F. de la Torre, J. K. Hodgins, J. Montano, and S. Valcarcel, “Detailed human data acquisition of kitchen activities: the cmu-multimodal activity database (cmummac),” in CHI Workshop.,2009. [32] D. Roggen, A. Calatroni, M. Rossi, T. Holleczek, K. Förster, G. Tröster, P. Lukowicz, D. Bannach, G. Pirkl, A. Ferscha, etal., “Collecting complex activity datasets in highly rich networked sensor environments,” in Networked Sensing Systems (INSS), IEEE, 2010. [33] Y. Zhou, B. Ni, R. Hong, M. Wang, and Q. Tian, “Interaction part mining: A mid- level approach for fine-grained action recognition,” in CVPR, 2015. [34] Y. Zhou, B. Ni, S. Yan, P. Moulin, and Q. Tian, “Pipelining localized semantic features for fine-grained action recognition,” in ECCV, 2014. [35] X. Peng, C. Zou, Y. Qiao, and Q. Peng, “Action recognition with stacked fisher vectors,” in ECCV, 2014. [36] H. S. Koppula and A. Saxena, “Anticipating human activities using object affordances for reactive robotic response,” PAMI, vol. 38, no. 1, pp. 14–29, 2016. [37] C. Vondrick, H. Pirsiavash, and A. Torralba, “Anticipating visual representations from unlabeled video,” in CVPR, 2016. [38] S. Z. Bokhari and K. M. Kitani, “Long-term activity forecasting using first-person vision,” in ACCV, 2016. [39] Z. Wang, M. Deisenroth, H. Ben Amor, D. Vogt, B. Schölkopf, and J. Peters, “Probabilistic modeling of human movements for intention inference,” in RSS, 2012. [40] H. S. Koppula, A. Jain, and A. Saxena, “Anticipatory planning for human-robot teams,” in ISER, 2014. [41] J. Mainprice and D. Berenson, “Human-robot collaborative manipulation planning using early prediction of human motion,” in IROS, 2013. [42] A. Hashimoto, J. Inoue, T. Funatomi, and M. Minoh, “Intention-sensing recipe guidance via user accessing objects,” International Journal of Human-Computer Interaction, 2016. [43] N. Rhinehart and K. M. Kitani, “First-person activity forecasting with online inverse reinforcement learning,” ICCV, 2017. [44] J. Yuen and A. Torralba, “A data-driven approach for event prediction,” in ECCV, 2010. [45] J. Walker, A. Gupta, and M. Hebert, “Patch to the future: Unsupervised visual prediction,” in CVPR, 2014. [46] V. Joo, W. Li, F. F. Steen, and S.-C. Zhu, “Visual persuasion: Inferring communicative intents of images,” in CVPR, 2014. [47] C. Vondrick, D. Oktay, H. Pirsiavash, and A. Torralba, “Predicting motivations of actions by leveraging text,” in CVPR, 2016. [48] H. Yu, J. Wang, Z. Huang, Y. Yang, and W. Xu, “Video paragraph captioning using hierarchical recurrent neural networks,” in CVPR,2016. [49] P. Pan, Z. Xu, Y. Yang, F. Wu, and Y. Zhuang, “Hierarchical recurrent neural encoder for video representation with application to captioning,” in CVPR, 2016. [50] Y. Pan, T. Mei, T. Yao, H. Li, and Y. Rui, “Jointly modeling embedding and translation to bridge video and language,” in CVPR, 2016. [51] K.-H. Zeng, T.-H. Chen, J. C. Niebles, and M. Sun, “Title generation for user generated videos,” in ECCV, 2016. [52] S. Venugopalan, M. Rohrbach, J. Donahue, R. J. Mooney, T. Darrell, and K. Saenko, “Sequence to sequence – video to text.,” in ICCV, 2015. [53] L. Yao, A. Torabi, K. Cho, N. Ballas, C. Pal, H. Larochelle, and A. Courville, “Describing videos by exploiting temporal structure,” in ICCV, 2015. [54] S. Singh, C. Arora, and C. V.Jawahar, “First person action recognition using deep learned descriptors,” in CVPR, 2016. [55] A. Fathi, A. Farhadi, and J. M. Rehg, “Understanding egocentric activities,” in ICCV, 2011. [56] Y. Li, Z. Ye, and J. M. Rehg, “Delving into egocentric actions,” in CVPR, 2015. [57] Z. Lu and K. Grauman, “Story-driven summarization for egocentric video,” in CVPR, 2013. [58] J. Ghosh, Y. J. Lee, and K. Grauman, “Discovering important people and objects for egocentric video summarization,” in CVPR, 2012. [59] C. Schenck and D. Fox, “Detection and tracking of liquids with fully convolutional networks,” in RSS workshop, 2016. [60] P. Sermanet, C. Lynch, J. Hsu, and S. Levine, “Time-contrastive networks: Self-supervised learning from multi-view observation,” arXiv:1704.06888, 2017. [61] M. Tamosiunaite, B. Nemec, A. Ude, and F. Wörgötter, “Learning to pour with a robot arm combining goal and shape learning for dynamic movement primitives,” IEEE-RAS, 2011. [62] L. Rozo, P. Jiménez, and C. Torras, “Force-based robot learning of pouring skills using parametric hidden markov models,”in 9th International Workshop on Robot Motion and Control, 2013. [63] S. Brandi, O. Kroemer, and J. Peters, “Generalizing pouring actions between objects using warped parameters,” in Humanoids,2014. [64] C. Schenck and D. Fox, “Visual closed-loop control for pouring liquids,” 2017. [65] A. Yamaguchi and C. G. Atkeson, “Differential dynamic programming with temporally decomposed dynamics,” 2015. [66] L. Kunze and M. Beetz, “Envisioning the qualitative effects of robot manipulation actions using simulation – based projections,” 2017. [67] R. J. Williams, “Simple statistical gradient – following algorithms for connectionist reinforcement learning,” Machine Learning, vol. 8, no. 3, pp. 229–256, 1992. [68] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “ImageNet: A Large-Scale Hierarchical Image Database,”inCVPR,2009. [69] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in CVPR, 2016. [70] Y. Chen and Y. Xue, “A deep learning approach to human activity recognition based on single accelerometer,” in SMC, 2015. [71] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in NIPS, 2014. [72] G. A. Sigurdsson, G. Varol, X. Wang, A. Farhadi, I. Laptev, and A. Gupta, “Hollywood in homes: Crowdsourcing data collection for activity understanding,” in ECCV, 2016. [73] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” CoRR, vol.abs/1409.1556, 2014. [74] J. W. Lockhart, G. M. Weiss, J. C. Xue, S. T. Gallagher, A. B. Grosner, and T. T. Pulickal, “Design considerations for the wisdm smart phone-based sensor mining architecture,” in Proceedings of the Fifth International Workshop on Knowledge Discovery from Sensor Data, 2011. |