|
1. [1] Akhter, I., and Black, M. J. Poseconditioned joint angle limits for 3d human pose reconstruction. 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2015), 1446–1455. 2. [2] Andriluka, M., Pishchulin, L., Gehler, P., and Schiele, B. 2d human pose esti mation: New benchmark and state of the art analysis. 2014 IEEE Conference on Computer Vision and Pattern Recognition (2014), 3686–3693. 3. [3] Ba,J.,Kiros,J.,andHinton,G.E.Layer normalization.ArXivabs/1607.06450 (2016). 4. [4] Carion,N.,Massa,F.,Synnaeve,G.,Usunier,N.,Kirillov,A.,andZagoruyko, S. Endtoend object detection with transformers. In European Conference on Computer Vision (2020), Springer, pp. 213–229. 5. [5] Catalin Ionescu, Fuxin Li, C. S. Latent structured models for human pose estimation. In International Conference on Computer Vision (2011). 6. [6] Chen,C.H.,andRamanan,D.3dhumanposeestimation=2dposeestimation + matching. 2017 IEEE Conference on Computer Vision and Pattern Recog nition (CVPR) (2017), 5759–5767. 7. [7] Chen, X., Lin, K.Y., Liu, W., Qian, C., Wang, X., and Lin, L. Weakly supervised discovery of geometryaware representation for 3d human pose estimation. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2019), 10887–10896. 8. [8] Chen,Y.,Wang,Z.,Peng,Y.,Zhang,Z.,Yu,G.,andSun,J.Cascadedpyramid network for multiperson pose estimation. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (2018), 7103–7112. 9. [9] Devlin, J., Chang, M.W., Lee, K., and Toutanova, K. Bert: Pretraining of deep bidirectional transformers for language understanding. In NAACLHLT (2019). 10. [10] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Un terthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., and Houlsby, N. An image is worth 16x16 words: Transformers for image recognition at scale. ArXiv abs/2010.11929 (2020). 11. [11] Fang, H., Xie, S., Tai, Y.W., and Lu, C. Rmpe: Regional multiperson pose estimation. 2017 IEEE International Conference on Computer Vision (ICCV) (2017), 2353–2362. 12. [12] Fang, H., Xu, Y., Wang, W., Liu, X., and Zhu, S. Learning knowledgeguided pose grammar machine for 3d human pose estimation. ArXiv abs/1710.06513 (2017). 13. [13] Habibie, I., Xu, W., Mehta, D., PonsMoll, G., and Theobalt, C. In the wild human pose estimation using explicit 2d features and intermediate 3d representations. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2019), 10897–10906. 14. [14] Hendrycks, D., and Gimpel, K. Gaussian error linear units (gelus). arXiv: Learning (2016). 15. [15] Hossain, M. R. I., and Little, J. Exploiting temporal information for 3d human pose estimation. In ECCV (2018). 16. [16] Ionescu, C., Carreira, J., and Sminchisescu, C. Iterated secondorder label sensitive pooling for 3d human pose estimation. 2014 IEEE Conference on Computer Vision and Pattern Recognition (2014), 1661–1668. 17. [17] Ionescu, C., Papava, D., Olaru, V., and Sminchisescu, C. Human3.6m: Large scale datasets and predictive methods for 3d human sensing in natural envi ronments. IEEE Transactions on Pattern Analysis and Machine Intelligence 36 (2014), 1325–1339. 18. [18] Iqbal, U., Milan, A., and Gall, J. Posetrack: Joint multiperson pose estima tion and tracking. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017), 4654–4663. 19. [19] Jiang, H. 3d human pose reconstruction using millions of exemplars. 2010 20th International Conference on Pattern Recognition (2010), 1674–1677. 20. [20] Johnson, S., and Everingham, M. Clustered pose and nonlinear appearance models for human pose estimation. In BMVC (2010). 21. [21] Joo, H., Simon, T., Li, X., Liu, H., Tan, L., Gui, L., Banerjee, S., Godisart, T., Nabbe, B. C., Matthews, I., Kanade, T., Nobuhara, S., and Sheikh, Y. Panoptic studio: A massively multiview system for social interaction capture. IEEE Transactions on Pattern Analysis and Machine Intelligence 41 (2019), 190– 204. 22. [22] Kanazawa, A., Black, M. J., Jacobs, D., and Malik, J. Endtoend recovery of human shape and pose. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (2018), 7122–7131. 23. [23] Khan, S., Naseer, M., Hayat, M., Zamir, S. W., Khan, F., and Shah, M. Trans formers in vision: A survey. ArXiv abs/2101.01169 (2021). 24. [24] Lee, K., Lee, I., and Lee, S. Propagating lstm: 3d pose estimation based on joint interdependency. In ECCV (2018). 25. [25] Li, C., and Lee, G. H. Generating multiple hypotheses for 3d human pose estimation with mixture density network. 2019 IEEE/CVF Conference on Com puter Vision and Pattern Recognition (CVPR) (2019), 9879–9887. 26. [26] Li, S., and Chan, A. B. 3d human pose estimation from monocular images with deep convolutional neural network. In ACCV (2014). 27. [27] Li,S.,Ke,L.,Pratama,K.,Tai,Y.W.,Tang,C.,andCheng,K.Cascadeddeep monocular 3d human pose estimation with evolutionary training data. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020), 6172–6182. 28. [28] Lin,T.Y.,Maire,M.,Belongie,S.J.,Hays,J.,Perona,P.,Ramanan,D.,Dollár, P., and Zitnick, C. L. Microsoft coco: Common objects in context. In ECCV (2014). 29. [29] Liu, R., Shen, J., Wang, H., Chen, C., Cheung, S., and Asari, V. Attention mechanism exploits temporal contexts: Realtime 3d human pose reconstruc tion. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recogni tion (CVPR) (2020), 5063–5072. 30. [30] Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., and Stoyanov, V. Roberta: A robustly optimized bert pretraining approach. ArXiv abs/1907.11692 (2019). 31. [31] Loshchilov, I., and Hutter, F. Decoupled weight decay regularization. In ICLR (2019). 32. [32] Martinez, J., Hossain, R., Romero, J., and Little, J. A simple yet effective baseline for 3d human pose estimation. 2017 IEEE International Conference on Computer Vision (ICCV) (2017), 2659–2668. 33. [33] Mehta, D., Rhodin, H., Casas, D., Fua, P., Sotnychenko, O., Xu, W., and Theobalt, C. Monocular 3d human pose estimation in the wild using improved cnn supervision. 2017 International Conference on 3D Vision (3DV) (2017), 506–516. 34. [34] Moon, G., Chang, J. Y., and Lee, K. M. Posefix: Modelagnostic general human pose refinement network. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2019), 7765–7773. 35. [35] Papandreou, G., Zhu, T. L., Kanazawa, N., Toshev, A., Tompson, J., Bregler, C., and Murphy, K. Towards accurate multiperson pose estimation in the wild. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017), 3711–3719. 36. [36] Pavlakos, G., Zhou, X., Derpanis, K., and Daniilidis, K. Coarsetofine volu metric prediction for singleimage 3d human pose. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017), 1263–1272. 37. [37] Pavllo, D., Feichtenhofer, C., Grangier, D., and Auli, M. 3d human pose estimation in video with temporal convolutions and semisupervised training. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2019), 7745–7754. 38. [38] Pavllo, D., Feichtenhofer, C., Grangier, D., and Auli, M. 3d human pose estimation in video with temporal convolutions and semisupervised training. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2019), pp. 7753–7762. 39. [39] Pishchulin, L., Jain, A., Andriluka, M., Thormählen, T., and Schiele, B. Articulated people detection and pose estimation: Reshaping the future. 2012 IEEE Conference on Computer Vision and Pattern Recognition (2012), 3178–3185. 40. [40] Rogez, G., and Schmid, C. Mocapguided data augmentation for 3d pose estimation in the wild. In NIPS (2016). 41. [41] Sigal, L., Balan, A., and Black, M. J. Humaneva: Synchronized video and moion capture dataset and baseline algorithm for evaluation of articulated human motion. International Journal of Computer Vision 87 (2009), 4–27. 42. [42] Sun, K., Xiao, B., Liu, D., and Wang, J. Deep highresolution representation learning for human pose estimation. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2019), 5686–5696. 43. [43] Tekin, B., MárquezNeila, P., Salzmann, M., and Fua, P. Learning to fuse 2d and 3d image cues for monocular body pose estimation. 2017 IEEE International Conference on Computer Vision (ICCV) (2017), 3961–3970. 44. [44] Tomè, D., Russell, C., and Agapito, L. Lifting from the deep: Convolutional 3d pose estimation from a single image. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017), 5689–5698. 45. [45] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., and Polosukhin, I. Attention is all you need. ArXiv abs/1706.03762 (2017). 46. [46] Véges, M., Varga, V., and Lörincz, A. 3d human pose estimation with siamese equivariant embedding. Neurocomputing 339 (2019), 194–201. 47. [47] von Marcard, T., Henschel, R., Black, M., Rosenhahn, B., and PonsMoll, G. Recovering accurate 3d human pose in the wild using imus and a moving camera. In European Conference on Computer Vision (ECCV) (Sep 2018). 48. [48] Wang, L., Chen, Y., Guo, Z., Qian, K., Lin, M., Li, H., and Ren, J. Generaliz ing monocular 3d human pose estimation in the wild. 2019 IEEE/CVF Inter national Conference on Computer Vision Workshop (ICCVW) (2019), 4024– 4033. 49. [49] Wang, Z., Shin, D., and Fowlkes, C.C.Predicting camera viewpoint improves cross-dataset generalization for 3d human pose estimation. In ECCV Work shops (2020). 50. [50] Xiao,B.,Wu,H.,andWei,Y.Simplebaselinesforhumanposeestimationand tracking. In ECCV (2018). 51. [51] Yang, W., Ouyang, W., Wang, X., Ren, J., Li, H., and Weng, X.3d human pose estimation in the wild by adversarial learning. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (2018), 5255–5264. 52. [52] Zeng, A., Sun, X., Huang, F., Liu, M., Xu, Q., and Lin, S. Srnet: Improv ing generalization in 3d human pose estimation with a splitandrecombine approach. In ECCV (2020). 53. [53] Zhang, W., Zhu, M., and Derpanis, K. From actemes to action: A strongly supervised representation for detailed action understanding. 2013 IEEE Inter national Conference on Computer Vision (2013), 2248–2255. 54. [54] Zhao, L., Peng, X., Tian, Y., Kapadia, M., and Metaxas, D. N. Semantic graph convolutional networks for 3d human pose regression. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2019), 3420– 3430. 55. [55] Zhou, X., Huang, Q., Sun, X., Xue, X., and Wei, Y. Towards 3d human pose estimation in the wild: A weaklysupervised approach. 2017 IEEE International Conference on Computer Vision (ICCV) (2017), 398–407. 56. [56] Zhou, X., Sun, X., Zhang, W., Liang, S., and Wei, Y. Deep kinematic pose regression. In ECCV Workshops (2016). 57. [57] Zhu, L., Rematas, K., Curless, B., Seitz, S., and KemelmacherShlizerman, I. Reconstructing nba players. In Proceedings of the European Conference on Computer Vision (ECCV) (August 2020). |