[1] F. Manhardt, D. M. Arroyo, C. Rupprecht, B. Busam, T. Birdal, N. Navab, and F. Tombari, “Explaining the ambiguity of object detection and 6D pose from visual data,” in Proc. IEEE Int. Conf. on Computer Vision (ICCV), pp. 6840–6849, 2019.
[2] T. Hodan, D. Baráth, and J. Matas, “EPOS: Estimating 6D pose of objects with symmetries,” in Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 11700–11709, 2020.
[3] H. Deng, M. Bui, N. Navab, L. Guibas, S. Ilic, and T. Birdal, “Deep Bingham networks: Dealing with uncertainty and ambiguity in pose estimation,” 2020.
[4] K. A. Murphy, C. Esteves, V. Jampani, S. Ramalingam, and A. Makadia, “Implicit-PDF: Non-parametric representation of probability distributions on the rotation manifold,” in Proc. Int. Conf. on Machine Learning (ICML), vol. 139, pp. 7882–7893, 2021.
[5] T. Hodan, P. Haluza, Š. Obdržálek, J. Matas, M. Lourakis, and X. Zabulis, “T-LESS: An RGB-D dataset for 6D pose estimation of texture-less objects,” in 2017 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 880–888, IEEE, 2017.
[6] K. Park, T. Patten, and M. Vincze, “Pix2Pose: Pixel-wise coordinate regression of objects for 6D pose estimation,” in Proc. IEEE Int. Conf. on Computer Vision (ICCV), pp. 7667–7676, 2019.
[7] G. Wang, F. Manhardt, F. Tombari, and X. Ji, “GDR-Net: Geometry-guided direct regression network for monocular 6D object pose estimation,” in Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 16611–16621, 2021.
[8] S. Thalhammer, T. Patten, and M. Vincze, “COPE: End-to-end trainable constant runtime object pose estimation,” in Proc. IEEE Winter Conf. on Applications of Computer Vision (WACV), pp. 2859–2869, 2023.
[9] T. Höfer, B. Kiefer, M. Messmer, and A. Zell, “HyperPosePDF: Hypernetworks predicting the probability distribution on SO(3),” in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pp. 2369–2379, 2023.
[10] R. L. Haugaard, F. Hagelskjær, and T. M. Iversen, “SpyroPose: Importance sampling pyramids for object pose distribution estimation in SE(3),” CoRR, vol. abs/2303.05308, 2023.
[11] Y. Song and S. Ermon, “Generative modeling by estimating gradients of the data distribution,” in Proc. Conf. on Neural Information Processing Systems (NeurIPS), pp. 11895–11907, 2019.
[12] J. Ho, A. Jain, and P. Abbeel, “Denoising diffusion probabilistic models,” in Proc. Conf. on Neural Information Processing Systems (NeurIPS), 2020.
[13] J. Song, C. Meng, and S. Ermon, “Denoising diffusion implicit models,” in Proc. Int. Conf. on Learning Representations (ICLR), 2021.
[14] Y. Song, J. Sohl-Dickstein, D. P. Kingma, A. Kumar, S. Ermon, and B. Poole, “Score-based generative modeling through stochastic differential equations,” in Proc. Int. Conf. on Learning Representations (ICLR), 2021.
[15] A. Leach, S. M. Schmon, M. T. Degiacomi, and C. G. Willcocks, “Denoising diffusion probabilistic models on SO(3) for rotational alignment,” in Proc. Int. Conf. on Learning Representations Workshop (ICLRW), 2022.
[16] Y. Jagvaral, F. Lanusse, and R. Mandelbaum, “Diffusion generative models on SO(3).” https://openreview.net/pdf?id=jHA-yCyBGb, 2023.
[17] J. Urain, N. Funk, J. Peters, and G. Chalvatzaki, “SE(3)-DiffusionFields: Learning smooth cost functions for joint grasp and motion optimization through diffusion,” CoRR, vol. abs/2209.03855, 2022.
[18] J. Yim, B. L. Trippe, V. D. Bortoli, E. Mathieu, A. Doucet, R. Barzilay, and T. S. Jaakkola, “SE(3) diffusion model with application to protein backbone generation,” CoRR, vol. abs/2302.02277, 2023.
[19] Y. Xiang, T. Schmidt, V. Narayanan, and D. Fox, “PoseCNN: A convolutional neural network for 6D object pose estimation in cluttered scenes,” in Robotics: Science and Systems XIV, 2018.
[20] A. Amini, A. S. Periyasamy, and S. Behnke, “YOLOPose: Transformer-based multi-object 6D pose estimation using keypoint regression,” in Intelligent Autonomous Systems (IAS), vol. 577, pp. 392–406, 2022.
[21] Y. Labbé, J. Carpentier, M. Aubry, and J. Sivic, “CosyPose: Consistent multi-view multi-object 6D pose estimation,” in Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XVII, pp. 574–591, Springer, 2020.
[22] Y. Di, F. Manhardt, G. Wang, X. Ji, N. Navab, and F. Tombari, “SO-Pose: Exploiting self-occlusion for direct 6D pose estimation,” in Proc. IEEE Int. Conf. on Computer Vision (ICCV), pp. 12376–12385, 2021.
[23] S. Peng, Y. Liu, Q. Huang, X. Zhou, and H. Bao, “PVNet: Pixel-wise voting network for 6DoF pose estimation,” in Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 4561–4570, 2019.
[24] M. Rad and V. Lepetit, “BB8: A scalable, accurate, robust to partial occlusion method for predicting the 3D poses of challenging objects without using depth,” in Proc. IEEE Int. Conf. on Computer Vision (ICCV), pp. 3848–3856, 2017.
[25] H. Wang, S. Sridhar, J. Huang, J. Valentin, S. Song, and L. J. Guibas, “Normalized object coordinate space for category-level 6D object pose and size estimation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2642–2651, 2019.
[26] L. Huang, T. Hodan, L. Ma, L. Zhang, L. Tran, C. D. Twigg, P. Wu, J. Yuan, C. Keskin, and R. Wang, “Neural correspondence field for object pose estimation,” in Proc. European Conf. on Computer Vision (ECCV), vol. 13670, pp. 585–603, 2022.
[27] B. Okorn, M. Xu, M. Hebert, and D. Held, “Learning orientation distributions for object pose estimation,” in 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 10580–10587, IEEE, 2020.
[28] I. Gilitschenski, R. Sahoo, W. Schwarting, A. Amini, S. Karaman, and D. Rus, “Deep orientation uncertainty learning based on a Bingham loss,” in International Conference on Learning Representations (ICLR), 2020.
[29] S. Prokudin, P. Gehler, and S. Nowozin, “Deep directional statistics: Pose estimation with uncertainty quantification,” in Proceedings of the European Conference on Computer Vision (ECCV), pp. 534–551, 2018.
[30] D. M. Klee, O. Biza, R. Platt, and R. Walters, “Image to sphere: Learning equivariant features for efficient pose prediction,” arXiv preprint arXiv:2302.13926, 2023.
[31] L. Yang, Z. Zhang, Y. Song, S. Hong, R. Xu, Y. Zhao, Y. Shao, W. Zhang, B. Cui, and M.-H. Yang, “Diffusion models: A comprehensive survey of methods and applications,” arXiv preprint arXiv:2209.00796, 2022.
[32] A. Ramesh, P. Dhariwal, A. Nichol, C. Chu, and M. Chen, “Hierarchical text-conditional image generation with CLIP latents,” arXiv preprint arXiv:2204.06125, 2022.
[33] N. Ruiz, Y. Li, V. Jampani, Y. Pritch, M. Rubinstein, and K. Aberman, “DreamBooth: Fine tuning text-to-image diffusion models for subject-driven generation,” arXiv preprint arXiv:2208.12242, 2022.
[34] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, “High-resolution image synthesis with latent diffusion models,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10684–10695, 2022.
[35] C. Saharia, W. Chan, S. Saxena, L. Li, J. Whang, E. L. Denton, K. Ghasemipour, R. Gontijo Lopes, B. Karagol Ayan, T. Salimans, et al., “Photorealistic text-to-image diffusion models with deep language understanding,” Advances in Neural Information Processing Systems, vol. 35, pp. 36479–36494, 2022.
[36] R. Yang, P. Srivastava, and S. Mandt, “Diffusion probabilistic modeling for video generation,” arXiv preprint arXiv:2203.09481, 2022.
[37] J. Ho, T. Salimans, A. Gritsenko, W. Chan, M. Norouzi, and D. J. Fleet, “Video diffusion models,” arXiv preprint arXiv:2204.03458, 2022.
[38] J. Ho, W. Chan, C. Saharia, J. Whang, R. Gao, A. Gritsenko, D. P. Kingma, B. Poole, M. Norouzi, D. J. Fleet, et al., “Imagen Video: High definition video generation with diffusion models,” arXiv preprint arXiv:2210.02303, 2022.
[39] R. Huang, Z. Zhao, H. Liu, J. Liu, C. Cui, and Y. Ren, “ProDiff: Progressive fast diffusion model for high-quality text-to-speech,” in Proceedings of the 30th ACM International Conference on Multimedia, pp. 2595–2605, 2022.
[40] D. Yang, J. Yu, H. Wang, W. Wang, C. Weng, Y. Zou, and D. Yu, “Diffsound: Discrete diffusion model for text-to-sound generation,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2023.
[41] S. Gong, M. Li, J. Feng, Z. Wu, and L. Kong, “DiffuSeq: Sequence to sequence text generation with diffusion models,” arXiv preprint arXiv:2210.08933, 2022.
[42] X. Li, J. Thickstun, I. Gulrajani, P. S. Liang, and T. B. Hashimoto, “Diffusion-LM improves controllable text generation,” Advances in Neural Information Processing Systems, vol. 35, pp. 4328–4343, 2022.
[43] F.-A. Croitoru, V. Hondru, R. T. Ionescu, and M. Shah, “Diffusion models in vision: A survey,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023.
[44] T. Amit, T. Shaharbany, E. Nachmani, and L. Wolf, “SegDiff: Image segmentation with diffusion probabilistic models,” arXiv preprint arXiv:2112.00390, 2021.
[45] D. Baranchuk, I. Rubachev, A. Voynov, V. Khrulkov, and A. Babenko, “Label-efficient semantic segmentation with diffusion models,” arXiv preprint arXiv:2112.03126, 2021.
[46] S. Chen, P. Sun, Y. Song, and P. Luo, “DiffusionDet: Diffusion model for object detection,” CoRR, vol. abs/2211.09788, 2022.
[47] J. Choi, D. Shim, and H. J. Kim, “DiffuPose: Monocular 3D human pose estimation via denoising diffusion probabilistic model,” CoRR, vol. abs/2212.02796, 2022.
[48] K. Holmquist and B. Wandt, “DiffPose: Multi-hypothesis human pose estimation using diffusion models,” arXiv preprint arXiv:2211.16487, 2022.
[49] V. D. Bortoli, E. Mathieu, M. J. Hutchinson, J. Thornton, Y. W. Teh, and A. Doucet, “Riemannian score-based generative modelling,” in Advances in Neural Information Processing Systems (A. H. Oh, A. Agarwal, D. Belgrave, and K. Cho, eds.), 2022.
[50] E. Jørgensen, “The central limit problem for geodesic random walks,” Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, vol. 32, no. 1–2, pp. 1–64, 1975.
[51] J. Solà, J. Deray, and D. Atchuthan, “A micro Lie theory for state estimation in robotics,” CoRR, vol. abs/1812.01537, 2018.
[52] P. Vincent, “A connection between score matching and denoising autoencoders,” Neural Comput., vol. 23, no. 7, pp. 1661–1674, 2011.
[53] D. I. Nikolayev and T. I. Savyolova, “Normal distribution on the rotation group SO(3),” Textures and Microstructures, vol. 29, 1997.
[54] S. Said, L. Bombrun, Y. Berthoumieu, and J. H. Manton, “Riemannian Gaussian distributions on the space of symmetric positive definite matrices,” IEEE Trans. Inf. Theory, vol. 63, no. 4, pp. 2153–2170, 2017.
[55] G. Chirikjian and M. Kobilarov, “Gaussian approximation of nonlinear measurement models on Lie groups,” in 53rd IEEE Conference on Decision and Control, pp. 6401–6406, IEEE, 2014.
[56] T. D. Barfoot and P. T. Furgale, “Associating uncertainty with three-dimensional poses for use in estimation problems,” IEEE Trans. Robotics, vol. 30, pp. 679–693, 2014.
[57] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778, 2016.
[58] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” Advances in Neural Information Processing Systems, vol. 30, 2017.
[59] L. Ziyin, T. Hartwig, and M. Ueda, “Neural networks fail to learn periodic functions and how to fix it,” Advances in Neural Information Processing Systems, vol. 33, pp. 1583–1594, 2020.
[60] J. Lee, W. Kim, D. Gwak, and E. Choi, “Conditional generation of periodic signals with Fourier-based decoder,” arXiv preprint arXiv:2110.12365, 2021.
[61] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 770–778, 2016.
[62] T. Hodaň, M. Sundermeyer, B. Drost, Y. Labbé, E. Brachmann, F. Michel, C. Rother, and J. Matas, “BOP challenge 2020 on 6D object localization,” in Computer Vision – ECCV 2020 Workshops: Glasgow, UK, August 23–28, 2020, Proceedings, Part II, pp. 577–594, Springer, 2020.
[63] J. Bradbury, R. Frostig, P. Hawkins, M. J. Johnson, C. Leary, D. Maclaurin, G. Necula, A. Paszke, J. VanderPlas, S. Wanderman-Milne, and Q. Zhang, “JAX: Composable transformations of Python+NumPy programs,” 2018.
[64] B. Yi, M. Lee, A. Kloss, R. Martín-Martín, and J. Bohg, “Differentiable factor graph optimization for learning smoothers,” in 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2021.
[65] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in Proc. Int. Conf. on Learning Representations (ICLR), 2015.