[1] A. Geiger, P. Lenz, and R. Urtasun, “Are we ready for autonomous driving? The KITTI vision benchmark suite,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012.
[2] A. Gaidon, Q. Wang, Y. Cabon, and E. Vig, “Virtual worlds as proxy for multi-object tracking analysis,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[3] S. R. Richter, Z. Hayder, and V. Koltun, “Playing for benchmarks,” in IEEE International Conference on Computer Vision (ICCV), 2017.
[4] M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benenson, U. Franke, S. Roth, and B. Schiele, “The Cityscapes dataset for semantic urban scene understanding,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[5] Y.-C. Su, D. Jayaraman, and K. Grauman, “Pano2Vid: Automatic cinematography for watching 360° videos,” in Asian Conference on Computer Vision (ACCV), 2016.
[6] H. Caesar, V. Bankiti, A. H. Lang, S. Vora, V. E. Liong, Q. Xu, A. Krishnan, Y. Pan, G. Baldan, and O. Beijbom, “nuScenes: A multimodal dataset for autonomous driving,” ArXiv:1903.11027, 2019.
[7] X. Zhou, V. Koltun, and P. Krähenbühl, “Tracking objects as points,” in European Conference on Computer Vision (ECCV), 2020.
[8] H.-N. Hu, Q.-Z. Cai, D. Wang, J. Lin, M. Sun, P. Krähenbühl, T. Darrell, and F. Yu, “Joint monocular 3d vehicle detection and tracking,” in IEEE International Conference on Computer Vision (ICCV), 2019.
[9] H.-N. Hu, Y.-C. Lin, M.-Y. Liu, H.-T. Cheng, Y.-J. Chang, and M. Sun, “Deep 360 Pilot: Learning a deep agent for piloting through 360° sports videos,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[10] H.-N. Hu, Y.-C. Lin, M.-Y. Liu, H.-T. Cheng, Y.-J. Chang, and M. Sun, “Technical report of Deep 360 Pilot,” 2017. https://aliensunmin.github.io/project/360video/.
[11] Y.-C. Lin, Y.-J. Chang, H.-N. Hu, H.-T. Cheng, C.-W. Huang, and M. Sun, “Tell me where to look: Investigating ways for assisting focus in 360° video,” in ACM Conference on Human Factors in Computing Systems (CHI), 2017.
[12] B. T. Truong and S. Venkatesh, “Video abstraction: A systematic review and classification,” ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP), vol. 3, Feb. 2007.
[13] D. B. Christianson, S. E. Anderson, L.-w. He, D. Salesin, D. S. Weld, and M. F. Cohen, “Declarative camera control for automatic cinematography,” in AAAI Conference on Artificial Intelligence (AAAI), 1996.
[14] L.-w. He, M. F. Cohen, and D. H. Salesin, “The virtual cinematographer: A paradigm for automatic real-time camera control and directing,” in Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques (ACM SIGGRAPH), (New York, NY, USA), pp. 217–224, ACM, 1996.
[15] D. K. Elson and M. O. Riedl, “A lightweight intelligent virtual cinematography system for machinima production,” in AIIDE, 2007.
[16] P. Mindek, L. Čmolík, I. Viola, E. Gröller, and S. Bruckner, “Automatized summarization of multiplayer games,” in ACM CCG, 2015.
[17] Y.-C. Su and K. Grauman, “Making 360° video watchable in 2d: Learning videography for click-free viewing,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[18] A. Patney, J. Kim, M. Salvi, A. Kaplanyan, C. Wyman, N. Benty, A. Lefohn, and D. Luebke, “Perceptually-based foveated virtual reality,” in ACM Transactions on Graphics (TOG), pp. 17:1–17:2, 2016.
[19] J. Chen, H. M. Le, P. Carr, Y. Yue, and J. J. Little, “Learning online smooth predictors for real-time camera planning using recurrent decision trees,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[20] J. Chen and P. Carr, “Mimicking human camera operators,” in IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 215–222, IEEE, 2015.
[21] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards real-time object detection with region proposal networks,” in Advances in Neural Information Processing Systems (NeurIPS), 2015.
[22] R. J. Williams, “Simple statistical gradient-following algorithms for connectionist reinforcement learning,” Machine Learning, vol. 8, no. 3, pp. 229–256, 1992.
[23] M. Vosmeer and B. Schouten, Interactive Cinema: Engagement and Interaction, pp. 140–147. Cham: Springer International Publishing, 2014.
[24] J. Gugenheimer, D. Wolf, G. Haas, S. Krebs, and E. Rukzio, “SwiVRChair: A motorized swivel chair to nudge users’ orientation for 360 degree storytelling in virtual reality,” in ACM Conference on Human Factors in Computing Systems (CHI), CHI ’16, (New York, NY, USA), pp. 1996–2000, ACM, 2016.
[25] “Roto VR chair interactive virtual reality seat.” https://www.rotovr.com/. Accessed: 2017-04-26.
[26] “New publisher tools for 360 video | Facebook media.” https://www.facebook.com/2016/08/10/new-publisher-tools-for-360-video/. Accessed: 2017-04-26.
[27] J. Kopf, “360° video stabilization,” ACM Transactions on Graphics (TOG), vol. 35, pp. 195:1–195:9, Nov. 2016.
[28] W.-S. Lai, Y. Huang, N. Joshi, C. Buehler, M.-H. Yang, and S. B. Kang, “Semantic-driven generation of hyperlapse from 360 degree video,” IEEE Transactions on Visualization and Computer Graphics (TVCG), 2018.
[29] S. Gustafson, P. Baudisch, C. Gutwin, and P. Irani, “Wedge: Clutter-free visualization of off-screen locations,” in ACM Conference on Human Factors in Computing Systems (CHI), pp. 787–796, 2008.
[30] “5 lessons learned while making Lost | Oculus.” https://www.oculus.com/story-studio/blog/5-lessons-learned-while-making-lost/. Accessed: 2017-04-26.
[31] “Oculus.” https://www.oculus.com/. Accessed: 2017-04-26.
[32] A. Sheikh, A. Brown, Z. Watson, and M. Evans, “Directing attention in 360-degree video,” in IBC 2016 Conference, pp. 29–38, 2016.
[33] S. Burigat and L. Chittaro, “Visualizing references to off-screen content on mobile devices: A comparison of arrows, wedge, and overview+detail,” Interact. Comput., vol. 23, pp. 156–166, Mar. 2011.
[34] S. Burigat, L. Chittaro, and S. Gabrielli, “Visualizing locations of off-screen objects on mobile devices: A comparative evaluation of three approaches,” in ACM Conference on Human-Computer Interaction with Mobile Devices and Services (MobileHCI), pp. 239–246, 2006.
[35] B. Karstens, R. Rosenbaum, and H. Schumann, “Presenting large and complex information sets on mobile handhelds,” in E-Commerce and M-Commerce Technologies, pp. 32–56, 2005.
[36] W. Song, D. W. Tjondronegoro, S.-H. Wang, and M. J. Docherty, “Impact of zooming and enhancing region of interests for optimizing user experience on mobile sports video,” in ACM Conference on Multimedia (MM), MM ’10, (New York, NY, USA), pp. 321–330, ACM, 2010.
[37] D. Liu, G. Hua, and T. Chen, “A hierarchical visual model for video object summarization,” IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol. 32, no. 12, pp. 2178–2190, 2010.
[38] Y. Gong and X. Liu, “Video summarization using singular value decomposition,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2000.
[39] C. Ngo, Y. Ma, and H. Zhang, “Video summarization and scene detection by graph modeling,” IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), 2005.
[40] A. Khosla, R. Hamid, C.-J. Lin, and N. Sundaresan, “Large-scale video summarization using web-image priors,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013.
[41] D. Potapov, M. Douze, Z. Harchaoui, and C. Schmid, “Category-specific video summarization,” in European Conference on Computer Vision (ECCV), 2014.
[42] M. Sun, A. Farhadi, and S. Seitz, “Ranking domain-specific highlights by analyzing edited videos,” in European Conference on Computer Vision (ECCV), 2014.
[43] T. Yao, T. Mei, and Y. Rui, “Highlight detection with pairwise deep ranking for first-person video summarization,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[44] B. Zhao and E. Xing, “Quasi real-time summarization for consumer videos,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2014.
[45] B. Gong, W.-L. Chao, K. Grauman, and F. Sha, “Diverse sequential subset selection for supervised video summarization,” in Advances in Neural Information Processing Systems (NeurIPS), 2014.
[46] K. Zhang, W.-L. Chao, F. Sha, and K. Grauman, “Summary transfer: Exemplar-based subset selection for video summarization,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016.
[47] K. Zhang, W.-L. Chao, F. Sha, and K. Grauman, “Video summarization with long short-term memory,” in European Conference on Computer Vision (ECCV), 2016.
[48] Y. Pritch, A. Rav-Acha, A. Gutman, and S. Peleg, “Webcam synopsis: Peeking around the world,” in IEEE International Conference on Computer Vision (ICCV), 2007.
[49] A. Rav-Acha, Y. Pritch, and S. Peleg, “Making a long video short,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2006.
[50] M. Sun, A. Farhadi, B. Taskar, and S. Seitz, “Summarizing unconstrained videos using salient montages,” in European Conference on Computer Vision (ECCV), 2014.
[51] D. Goldman, B. Curless, D. Salesin, and S. Seitz, “Schematic storyboarding for video visualization and editing,” in ACM Transactions on Graphics (TOG), 2006.
[52] N. Joshi, S. Mehta, S. Drucker, E. Stollnitz, H. Hoppe, M. Uyttendaele, and M. F. Cohen, “Cliplets: Juxtaposing still and dynamic imagery,” in UIST, 2012.
[53] Y. J. Lee, J. Ghosh, and K. Grauman, “Discovering important people and objects for egocentric video summarization,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012.
[54] Z. Lu and K. Grauman, “Story-driven summarization for egocentric video,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013.
[55] J. Kopf, M. F. Cohen, and R. Szeliski, “First-person hyperlapse videos,” ACM Transactions on Graphics (TOG), vol. 33, July 2014.
[56] T. Liu, Z. Yuan, J. Sun, J. Wang, N. Zheng, X. Tang, and H.-Y. Shum, “Learning to detect a salient object,” IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol. 33, no. 2, pp. 353–367, 2011.
[57] J. Harel, C. Koch, and P. Perona, “Graph-based visual saliency,” in Advances in Neural Information Processing Systems (NeurIPS), 2006.
[58] R. Achanta, S. S. Hemami, F. J. Estrada, and S. Süsstrunk, “Frequency-tuned salient region detection,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009.
[59] F. Perazzi, P. Krähenbühl, Y. Pritch, and A. Hornung, “Saliency filters: Contrast based filtering for salient region detection,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012.
[60] J. Wang, A. Borji, C.-C. J. Kuo, and L. Itti, “Learning a combined model of visual saliency for fixation prediction,” IEEE Transactions on Image Processing (TIP), vol. 25, no. 4, pp. 1566–1579, 2016.
[61] J. Zhang and S. Sclaroff, “Exploiting surroundedness for saliency detection: A boolean map approach,” IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol. 38, no. 5, pp. 889–902, 2016.
[62] N. Liu and J. Han, “DHSNet: Deep hierarchical saliency network for salient object detection,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[63] S. Jetley, N. Murray, and E. Vig, “End-to-end saliency mapping via probability distribution prediction,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[64] M. Cornia, L. Baraldi, G. Serra, and R. Cucchiara, “A deep multi-level network for saliency prediction,” in International Conference on Pattern Recognition (ICPR), 2016.
[65] J. Pan, K. McGuinness, E. Sayrol, N. O’Connor, and X. Giró-i-Nieto, “Shallow and deep convolutional networks for saliency prediction,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[66] N. D. B. Bruce, C. Catton, and S. Janjic, “A deeper look at saliency: Feature contrast, semantics, and beyond,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016.
[67] Q. Wang, W. Zheng, and R. Piramuthu, “GraB: Visual saliency via novel graph model and background priors,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016.
[68] L. Wang, L. Wang, H. Lu, P. Zhang, and X. Ruan, “Saliency detection with recurrent fully convolutional networks,” in European Conference on Computer Vision (ECCV), 2016.
[69] Y. Tang and X. Wu, “Saliency detection via combining region-level and pixel-level predictions with CNNs,” in European Conference on Computer Vision (ECCV), 2016.
[70] X. Cui, Q. Liu, and D. Metaxas, “Temporal spectral residual: Fast motion saliency detection,” in ACM Conference on Multimedia (MM), 2009.
[71] C. Guo, Q. Ma, and L. Zhang, “Spatio-temporal saliency detection using phase spectrum of quaternion fourier transform,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2008.
[72] V. Mahadevan and N. Vasconcelos, “Spatiotemporal saliency in dynamic scenes,” IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol. 32, no. 1, pp. 171–177, 2010.
[73] H. Seo and P. Milanfar, “Static and space-time visual saliency detection by self-resemblance,” Journal of Vision, 2009.
[74] P. Mital, T. Smith, R. Hill, and J. Henderson, “Clustering of gaze during dynamic scene viewing is predicted by motion,” Cognitive Computation, vol. 3, no. 1, pp. 5–24, 2011.
[75] T. Lee, M. Hwangbo, T. Alan, O. Tickoo, and R. Iyer, “Low-complexity HOG for efficient video saliency,” in ICIP, pp. 3749–3752, IEEE, 2015.
[76] T. Judd, K. Ehinger, F. Durand, and A. Torralba, “Learning to predict where humans look,” in IEEE International Conference on Computer Vision (ICCV), 2009.
[77] S. Goferman, L. Zelnik-Manor, and A. Tal, “Context-aware saliency detection,” IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol. 34, no. 10, pp. 1915–1926, 2012.
[78] D. Rudoy, D. B. Goldman, E. Shechtman, and L. Zelnik-Manor, “Learning video saliency from human gaze using candidate selection,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1147–1154, 2013.
[79] S. Mathe and C. Sminchisescu, “Actions in the eye: Dynamic gaze datasets and learnt saliency models for visual recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol. 37, 2015.
[80] A. Fathi, Y. Li, and J. M. Rehg, “Learning to recognize daily actions using gaze,” in European Conference on Computer Vision (ECCV), 2012.
[81] S. Mathe, A. Pirinen, and C. Sminchisescu, “Reinforcement learning for visual object detection,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016.
[82] J. Ba, V. Mnih, and K. Kavukcuoglu, “Multiple object recognition with visual attention,” in International Conference on Learning Representations (ICLR), 2015.
[83] V. Mnih, N. Heess, A. Graves, and K. Kavukcuoglu, “Recurrent models of visual attention,” in Advances in Neural Information Processing Systems (NeurIPS) (Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, eds.), 2014.
[84] D. Zhang, H. Maei, X. Wang, and Y. Wang, “Deep reinforcement learning for visual object tracking in videos,” ArXiv:1701.08936, 2017.
[85] L. Bourdev and J. Malik, “Poselets: Body part detectors trained using 3d human pose annotations,” in IEEE International Conference on Computer Vision (ICCV), 2009.
[86] J. Foote and D. Kimber, “FlyCam: Practical panoramic video and automatic camera control,” in ICME, 2000.
[87] X. Sun, J. Foote, D. Kimber, and B. Manjunath, “Region of interest extraction and virtual camera control based on panoramic video capturing,” IEEE Transactions on Multimedia (TMM), vol. 7, no. 5, pp. 981–990, 2005.
[88] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, “Microsoft COCO: Common objects in context,” in European Conference on Computer Vision (ECCV), 2014.
[89] M. Andriluka, S. Roth, and B. Schiele, “People-tracking-by-detection and people-detection-by-tracking,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2008.
[90] N. Dalal, B. Triggs, and C. Schmid, “Human detection using oriented histograms of flow and appearance,” in European Conference on Computer Vision (ECCV), 2006.
[91] G. Gkioxari and J. Malik, “Finding action tubes,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
[92] R. S. Kennedy, N. E. Lane, K. S. Berbaum, and M. G. Lilienthal, “Simulator sickness questionnaire: An enhanced method for quantifying simulator sickness,” The International Journal of Aviation Psychology, vol. 3, no. 3, p. 203, 1993.
[93] J. J. Lin, H. B. L. Duh, D. E. Parker, H. Abi-Rached, and T. A. Furness, “Effects of field of view on presence, enjoyment, memory, and simulator sickness in a virtual environment,” in IEEE Virtual Reality (VR), pp. 164–171, 2002.
[94] H.-N. Hu, Y.-H. Yang, T. Fischer, F. Yu, T. Darrell, and M. Sun, “Monocular quasi-dense 3d object tracking,” ArXiv:2103.07351, 2021.
[95] L. Wen, D. Du, Z. Cai, Z. Lei, M.-C. Chang, H. Qi, J. Lim, M.-H. Yang, and S. Lyu, “UA-DETRAC: A new benchmark and protocol for multi-object detection and tracking,” ArXiv:1511.04136, 2015.
[96] A. Milan, L. Leal-Taixé, I. Reid, S. Roth, and K. Schindler, “MOT16: A benchmark for multi-object tracking,” ArXiv:1603.00831, 2016.
[97] J. Pang, L. Qiu, X. Li, H. Chen, Q. Li, T. Darrell, and F. Yu, “Quasi-dense similarity learning for multiple object tracking,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021.
[98] R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014.
[99] R. Girshick, “Fast R-CNN,” in IEEE International Conference on Computer Vision (ICCV), 2015.
[100] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg, “SSD: Single shot multibox detector,” in European Conference on Computer Vision (ECCV), 2016.
[101] J. Redmon and A. Farhadi, “YOLO9000: Better, faster, stronger,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[102] T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, “Focal loss for dense object detection,” in IEEE International Conference on Computer Vision (ICCV), 2017.
[103] X. Zhou, D. Wang, and P. Krähenbühl, “Objects as points,” ArXiv:1904.07850, 2019.
[104] H. Law and J. Deng, “CornerNet: Detecting objects as paired keypoints,” in European Conference on Computer Vision (ECCV), 2018.
[105] Z. Tian, C. Shen, H. Chen, and T. He, “FCOS: Fully convolutional one-stage object detection,” in IEEE International Conference on Computer Vision (ICCV), 2019.
[106] X. Chen, K. Kundu, Y. Zhu, A. G. Berneshawi, H. Ma, S. Fidler, and R. Urtasun, “3D object proposals for accurate object class detection,” in Advances in Neural Information Processing Systems (NeurIPS), 2015.
[107] X. Chen, K. Kundu, Z. Zhang, H. Ma, S. Fidler, and R. Urtasun, “Monocular 3D object detection for autonomous driving,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[108] A. Mousavian, D. Anguelov, J. Flynn, and J. Košecká, “3D bounding box estimation using deep learning and geometry,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[109] G. Brazil and X. Liu, “M3D-RPN: Monocular 3D region proposal network for object detection,” in IEEE International Conference on Computer Vision (ICCV), 2019.
[110] A. Simonelli, S. R. Bulo, L. Porzi, M. López-Antequera, and P. Kontschieder, “Disentangling monocular 3D object detection,” in IEEE International Conference on Computer Vision (ICCV), 2019.
[111] Y. Wang, W.-L. Chao, D. Garg, B. Hariharan, M. Campbell, and K. Q. Weinberger, “Pseudo-LiDAR from visual depth estimation: Bridging the gap in 3D object detection for autonomous driving,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
[112] X. Ma, Z. Wang, H. Li, P. Zhang, W. Ouyang, and X. Fan, “Accurate monocular 3D object detection via color-embedded 3D reconstruction for autonomous driving,” in IEEE International Conference on Computer Vision (ICCV), 2019.
[113] A. Kundu, Y. Li, and J. M. Rehg, “3D-RCNN: Instance-level 3D object reconstruction via render-and-compare,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
[114] F. Chabot, M. Chaouch, J. Rabarisoa, C. Teuliere, and T. Chateau, “Deep MANTA: A coarse-to-fine many-task network for joint 2D and 3D vehicle analysis from monocular image,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[115] Y. Zhou and O. Tuzel, “VoxelNet: End-to-end learning for point cloud based 3d object detection,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
[116] S. Shi, X. Wang, and H. Li, “PointRCNN: 3D object proposal generation and detection from point cloud,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
[117] A. H. Lang, S. Vora, H. Caesar, L. Zhou, J. Yang, and O. Beijbom, “PointPillars: Fast encoders for object detection from point clouds,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
[118] X. Chen, H. Ma, J. Wan, B. Li, and T. Xia, “Multi-view 3d object detection network for autonomous driving,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[119] A. Yilmaz, O. Javed, and M. Shah, “Object tracking: A survey,” ACM Computing Surveys (CSUR), 2006.
[120] S. Salti, A. Cavallaro, and L. Di Stefano, “Adaptive appearance modeling for video tracking: Survey and evaluation,” IEEE Transactions on Image Processing (TIP), 2012.
[121] A. W. Smeulders, D. M. Chu, R. Cucchiara, S. Calderara, A. Dehghan, and M. Shah, “Visual tracking: An experimental survey,” IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2014.
[122] D. S. Bolme, J. R. Beveridge, B. A. Draper, and Y. M. Lui, “Visual object tracking using adaptive correlation filters,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2010.
[123] M. Kristan, J. Matas, A. Leonardis, M. Felsberg, L. Cehovin, G. Fernández, T. Vojir, G. Hager, G. Nebehay, and R. Pflugfelder, “The visual object tracking VOT2015 challenge results,” in IEEE International Conference on Computer Vision Workshops (ICCV Workshops), 2015.
[124] S. Hare, S. Golodetz, A. Saffari, V. Vineet, M.-M. Cheng, S. L. Hicks, and P. H. Torr, “Struck: Structured output tracking with kernels,” IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2016.
[125] B. Babenko, M.-H. Yang, and S. Belongie, “Visual tracking with online multiple instance learning,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009.
[126] Z. Kalal, K. Mikolajczyk, and J. Matas, “Tracking-learning-detection,” IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2012.
[127] W. Luo, J. Xing, A. Milan, X. Zhang, W. Liu, X. Zhao, and T.-K. Kim, “Multiple object tracking: A literature review,” ArXiv:1409.7618, 2017.
[128] R. Tao, E. Gavves, and A. W. Smeulders, “Siamese instance search for tracking,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[129] L. Bertinetto, J. Valmadre, J. F. Henriques, A. Vedaldi, and P. H. Torr, “Fully-convolutional siamese networks for object tracking,” in European Conference on Computer Vision (ECCV), 2016.
[130] C. Feichtenhofer, A. Pinz, and A. Zisserman, “Detect to track and track to detect,” in IEEE International Conference on Computer Vision (ICCV), 2017.
[131] P. Bergmann, T. Meinhardt, and L. Leal-Taixé, “Tracking without bells and whistles,” in IEEE International Conference on Computer Vision (ICCV), 2019.
[132] D. Mykheievskyi, D. Borysenko, and V. Porokhonskyy, “Learning local feature descriptors for multiple object tracking,” in Asian Conference on Computer Vision (ACCV), 2020.
[133] W. Zhang, H. Zhou, S. Sun, Z. Wang, J. Shi, and C. C. Loy, “Robust multi-modality multi-object tracking,” in IEEE International Conference on Computer Vision (ICCV), 2019.
[134] L. Zhang, Y. Li, and R. Nevatia, “Global data association for multi-object tracking using network flows,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2008.
[135] W. Choi, “Near-online multi-target tracking with aggregated local flow descriptor,” in IEEE International Conference on Computer Vision (ICCV), 2015.
[136] C. Kim, F. Li, A. Ciptadi, and J. M. Rehg, “Multiple hypothesis tracking revisited,” in IEEE International Conference on Computer Vision (ICCV), 2015.
[137] A. Ess, B. Leibe, K. Schindler, and L. Van Gool, “A mobile vision system for robust multi-person tracking,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2008.
[138] S. Scheidegger, J. Benjaminsson, E. Rosenberg, A. Krishnan, and K. Granstrom, “Mono-camera 3d multi-object tracking using deep learning detections and PMBM filtering,” in IEEE Intelligent Vehicles Symposium (IV), 2018.
[139] P. Voigtlaender, M. Krause, A. Osep, J. Luiten, B. B. G. Sekar, A. Geiger, and B. Leibe, “MOTS: Multi-object tracking and segmentation,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7942–7951, 2019.
[140] S. Sharma, J. A. Ansari, J. Krishna Murthy, and K. Madhava Krishna, “Beyond pixels: Leveraging geometry and shape cues for online multi-object tracking,” in IEEE International Conference on Robotics and Automation (ICRA), 2018.
[141] J. Luiten, T. Fischer, and B. Leibe, “Track to reconstruct and reconstruct to track,” IEEE Robotics and Automation Letters (RAL), 2020.
[142] P. Li, T. Qin, and S. Shen, “Stereo vision-based semantic 3d object and ego-motion tracking for autonomous driving,” in European Conference on Computer Vision (ECCV), 2018.
[143] A. Osep, W. Mehner, M. Mathias, and B. Leibe, “Combined image and world-space tracking in traffic scenes,” in IEEE International Conference on Robotics and Automation (ICRA), 2017.
[144] X. Weng, J. Wang, D. Held, and K. Kitani, “3d multi-object tracking: A baseline and new evaluation metrics,” in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2020.
[145] Z. Lu, V. Rathod, R. Votel, and J. Huang, “RetinaTrack: Online single stage joint detection and tracking,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
[146] T. Yin, X. Zhou, and P. Krähenbühl, “Center-based 3d object detection and tracking,” ArXiv:2006.11275, 2020.
[147] W. Maddern, G. Pascoe, C. Linegar, and P. Newman, “1 year, 1000 km: The Oxford RobotCar dataset,” International Journal of Robotics Research (IJRR), 2017.
[148] F. Yu, H. Chen, X. Wang, W. Xian, Y. Chen, F. Liu, V. Madhavan, and T. Darrell, “BDD100K: A diverse driving dataset for heterogeneous multi-task learning,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2020.
[149] M.-F. Chang, J. Lambert, P. Sangkloy, J. Singh, S. Bak, A. Hartnett, D. Wang, P. Carr, S. Lucey, D. Ramanan, and J. Hays, “Argoverse: 3d tracking and forecasting with rich maps,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
[150] P. Sun, H. Kretzschmar, X. Dotiwalla, A. Chouard, V. Patnaik, P. Tsui, J. Guo, Y. Zhou, Y. Chai, B. Caine, V. Vasudevan, W. Han, J. Ngiam, H. Zhao, A. Timofeev, S. Ettinger, M. Krivokon, A. Gao, A. Joshi, Y. Zhang, J. Shlens, Z. Chen, and D. Anguelov, “Scalability in perception for autonomous driving: Waymo Open Dataset,” ArXiv:1912.04838, 2019.
[151] G. Ros, L. Sellart, J. Materzynska, D. Vazquez, and A. M. Lopez, “The SYNTHIA dataset: A large collection of synthetic images for semantic segmentation of urban scenes,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[152] S. R. Richter, V. Vineet, S. Roth, and V. Koltun, “Playing for data: Ground truth from computer games,” in European Conference on Computer Vision (ECCV), 2016.
[153] A. Dosovitskiy, G. Ros, F. Codevilla, A. Lopez, and V. Koltun, “CARLA: An open urban driving simulator,” in Conference on Robot Learning (CoRL), 2017.
[154] P. Krähenbühl, “Free supervision from video games,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
[155] K. He, G. Gkioxari, P. Dollár, and R. Girshick, “Mask R-CNN,” in IEEE International Conference on Computer Vision (ICCV), 2017.
[156] P. J. Huber, “Robust estimation of a location parameter,” The Annals of Mathematical Statistics, 1964.
[157] A. Hermans, L. Beyer, and B. Leibe, “In defense of the triplet loss for person re-identification,” ArXiv:1703.07737, 2017.
[158] J. Valmadre, L. Bertinetto, J. Henriques, A. Vedaldi, and P. H. S. Torr, “End-to-end representation learning for correlation filter based tracking,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[159] Z. Wu, Y. Xiong, S. X. Yu, and D. Lin, “Unsupervised feature learning via non-parametric instance discrimination,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
[160] K. He, G. Gkioxari, P. Dollár, and R. Girshick, “Mask R-CNN,” in IEEE International Conference on Computer Vision (ICCV), pp. 2961–2969, 2017.
[161] A. Kendall and Y. Gal, “What uncertainties do we need in Bayesian deep learning for computer vision?,” in Advances in Neural Information Processing Systems (NeurIPS), 2017.
[162] G. Brazil, G. Pons-Moll, X. Liu, and B. Schiele, “Kinematic 3d object detection in monocular video,” in European Conference on Computer Vision (ECCV), (Virtual), 2020.
[163] F. Yu, W. Li, Q. Li, Y. Liu, X. Shi, and J. Yan, “POI: Multiple object tracking with high performance detection and appearance feature,” in European Conference on Computer Vision (ECCV), 2016.
[164] H. W. Kuhn, “The Hungarian method for the assignment problem,” Naval Research Logistics Quarterly, 1955.
[165] N. Wojke, A. Bewley, and D. Paulus, “Simple online and real-time tracking with a deep association metric,” in IEEE International Conference on Image Processing (ICIP), 2017.
[166] Y. Xiang, A. Alahi, and S. Savarese, “Learning to track: Online multi-object tracking by decision making,” in IEEE International Conference on Computer Vision (ICCV), 2015.
[167] A. Sadeghian, A. Alahi, and S. Savarese, “Tracking the untrackable: Learning to track multiple cues with long-term dependencies,” in IEEE International Conference on Computer Vision (ICCV), 2017.
[168] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, et al., “ImageNet large scale visual recognition challenge,” International Journal of Computer Vision (IJCV), 2015.
[169] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala, “PyTorch: An imperative style, high-performance deep learning library,” in Advances in Neural Information Processing Systems (NeurIPS) (H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, eds.), pp. 8024–8035, Curran Associates, Inc., 2019.
[170] F. Yu, D. Wang, E. Shelhamer, and T. Darrell, “Deep layer aggregation,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
[171] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[172] H. Robbins and S. Monro, “A stochastic approximation method,” Ann. Math. Statist., vol. 22, pp. 400–407, Sept. 1951.
[173] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, “Microsoft COCO: Common objects in context,” in European Conference on Computer Vision (ECCV), 2014.
[174] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman, “The PASCAL visual object classes (VOC) challenge,” International Journal of Computer Vision (IJCV), vol. 88, 2010.
[175] K. Bernardin and R. Stiefelhagen, “Evaluating multiple object tracking performance: The CLEAR MOT metrics,” EURASIP Journal on Image and Video Processing (JIVP), 2008.
[176] Y. Li, C. Huang, and R. Nevatia, “Learning to associate: Hybrid-boosted multi-target tracker for crowded scene,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009.
[177] R. E. Kalman, “A new approach to linear filtering and prediction problems,” Journal of Basic Engineering, 1960.
[178] B. Zhu, Z. Jiang, X. Zhou, Z. Li, and G. Yu, “Class-balanced grouping and sampling for point cloud 3d object detection,” ArXiv:1908.09492, 2019.
[179] Y. Wang, S. Chen, L. Huang, R. Ge, Y. Hu, Z. Ding, and J. Liao, “1st place solutions for Waymo Open Dataset challenges – 2d and 3d tracking,” ArXiv:2006.15506, 2020.
[180] OpenPCDet Development Team, “OpenPCDet: An open-source toolbox for 3d object detection from point clouds.” https://github.com/open-mmlab/OpenPCDet, 2020.
[181] K. Krippendorff, Content Analysis: An Introduction to Its Methodology. Sage, 2012.
[182] S. Hwang, J. Park, N. Kim, Y. Choi, and I. S. Kweon, “Multispectral pedestrian detection: Benchmark dataset and baseline,” Integrated Computer-Aided Engineering (ICAE), 2013.
[183] L. Leal-Taixé, A. Milan, I. Reid, S. Roth, and K. Schindler, “MOTChallenge 2015: Towards a benchmark for multi-target tracking,” ArXiv:1504.01942, 2015.
[184] A. Shenoi, M. Patel, J. Gwak, P. Goebel, A. Sadeghian, H. Rezatofighi, R. Martín-Martín, and S. Savarese, “JRMOT: A real-time 3d multi-object tracker and a new large-scale dataset,” in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2020.
[185] H. Karunasekera, H. Wang, and H. Zhang, “Multiple object tracking with attention to appearance, structure, motion and size,” IEEE Access, 2019.
[186] G. Gündüz and T. Acarman, “A lightweight online multiple object vehicle tracking method,” in IEEE Intelligent Vehicles Symposium (IV), 2018.
[187] B. Lee, E. Erdenee, S. Jin, M. Y. Nam, Y. G. Jung, and P. Rhee, “Multi-class multi-object tracking using changing point detection,” in European Conference on Computer Vision Workshops (ECCV Workshops), 2016.
[188] W. Choi, “Near-online multi-target tracking with aggregated local flow descriptor,” in IEEE International Conference on Computer Vision (ICCV), 2015.
[189] D. Frossard and R. Urtasun, “End-to-end learning of multi-sensor 3d tracking by detection,” in IEEE International Conference on Robotics and Automation (ICRA), 2018.
[190] J. Hong Yoon, C.-R. Lee, M.-H. Yang, and K.-J. Yoon, “Online multi-object tracking via structural constraint event aggregation,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[191] P. Lenz, A. Geiger, and R. Urtasun, “FollowMe: Efficient online min-cost flow tracking with bounded memory and computation,” in IEEE International Conference on Computer Vision (ICCV), 2015.