[1] F. J. Romero-Ramirez, R. Muñoz-Salinas, and R. Medina-Carnicer, “Speeded up detection of squared fiducial markers,” Image and Vision Computing, vol. 76, pp. 38–47, 2018.
[2] E. Olson, “AprilTag: A robust and flexible visual fiducial system,” in International Conference on Robotics and Automation (ICRA), pp. 3400–3407, IEEE, 2011.
[3] V. Narayanan and M. Likhachev, “Discriminatively-guided deliberative perception for pose estimation of multiple 3D object instances,” in Robotics: Science and Systems (RSS), 2016.
[4] W. Kehl, F. Tombari, S. Ilic, and N. Navab, “Real-time 3D model tracking in color and depth on a single CPU core,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 745–753, 2017.
[5] M. Gualtieri and R. Platt, “Robotic pick-and-place with uncertain object instance segmentation and shape completion,” IEEE Robotics and Automation Letters (RA-L), vol. 6, no. 2, pp. 1753–1760, 2021.
[6] S. Levine, C. Finn, T. Darrell, and P. Abbeel, “End-to-end training of deep visuomotor policies,” Journal of Machine Learning Research, vol. 17, no. 39, pp. 1–40, 2016.
[7] R. Rahmatizadeh, P. Abolghasemi, L. Bölöni, and S. Levine, “Vision-based multi-task manipulation for inexpensive robots using end-to-end learning from demonstration,” in International Conference on Robotics and Automation (ICRA), pp. 3758–3765, IEEE, 2018.
[8] D. Kalashnikov, J. Varley, Y. Chebotar, B. Swanson, R. Jonschkowski, C. Finn, S. Levine, and K. Hausman, “MT-Opt: Continuous multi-task robotic reinforcement learning at scale,” arXiv preprint arXiv:2104.08212, 2021.
[9] L. Berscheid, P. Meißner, and T. Kröger, “Self-supervised learning for precise pick-and-place without object model,” IEEE Robotics and Automation Letters (RA-L), vol. 5, no. 3, pp. 4828–4835, 2020.
[10] A. Zeng, S. Song, S. Welker, J. Lee, A. Rodriguez, and T. Funkhouser, “Learning synergies between pushing and grasping with self-supervised deep reinforcement learning,” in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 4238–4245, IEEE, 2018.
[11] H. Huang, O. L. Howell, D. Wang, X. Zhu, R. Platt, and R. Walters, “Fourier Transporter: Bi-equivariant robotic manipulation in 3D,” in International Conference on Learning Representations (ICLR), 2024.
[12] A. Zeng, P. Florence, J. Tompson, S. Welker, J. Chien, M. Attarian, T. Armstrong, I. Krasin, D. Duong, V. Sindhwani, et al., “Transporter Networks: Rearranging the visual world for robotic manipulation,” in Conference on Robot Learning (CoRL), pp. 726–747, PMLR, 2021.
[13] T. Fu, Y. Tang, T. Wu, X. Xia, J. Wang, and C. Zhao, “Multi-dimensional deformable object manipulation using equivariant models,” in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1000–1007, IEEE, 2023.
[14] H. Huang, D. Wang, A. Tangri, R. Walters, and R. Platt, “Leveraging symmetries in pick and place,” The International Journal of Robotics Research, vol. 43, no. 4, pp. 550–571, 2024.
[15] G. Sóti, X. Huang, C. Wurll, and B. Hein, “Train what you know – precise pick-and-place with transporter networks,” in International Conference on Robotics and Automation (ICRA), pp. 5814–5820, IEEE, 2023.
[16] A. Simeonov, A. Goyal, L. Manuelli, Y.-C. Lin, A. Sarmiento, A. R. Garcia, P. Agrawal, and D. Fox, “Shelving, stacking, hanging: Relational pose diffusion for multi-modal rearrangement,” in Conference on Robot Learning (CoRL), pp. 2030–2069, PMLR, 2023.
[17] M. Zhu, K. G. Derpanis, Y. Yang, S. Brahmbhatt, M. Zhang, C. Phillips, M. Lecce, and K. Daniilidis, “Single image 3D object detection and pose estimation for grasping,” in International Conference on Robotics and Automation (ICRA), pp. 3936–3943, IEEE, 2014.
[18] D. Seita, P. Florence, J. Tompson, E. Coumans, V. Sindhwani, K. Goldberg, and A. Zeng, “Learning to rearrange deformable cables, fabrics, and bags with goal-conditioned transporter networks,” in International Conference on Robotics and Automation (ICRA), pp. 4568–4575, IEEE, 2021.
[19] M. Shridhar, L. Manuelli, and D. Fox, “CLIPort: What and where pathways for robotic manipulation,” in Conference on Robot Learning (CoRL), pp. 894–906, PMLR, 2022.
[20] M. H. Lim, A. Zeng, B. Ichter, M. Bandari, E. Coumans, C. Tomlin, S. Schaal, and A. Faust, “Multi-task learning with sequence-conditioned transporter networks,” in International Conference on Robotics and Automation (ICRA), pp. 2489–2496, IEEE, 2022.
[21] J. Ho, A. Jain, and P. Abbeel, “Denoising diffusion probabilistic models,” Advances in Neural Information Processing Systems (NeurIPS), vol. 33, pp. 6840–6851, 2020.
[22] J. Song, C. Meng, and S. Ermon, “Denoising diffusion implicit models,” in International Conference on Learning Representations (ICLR), 2021.
[23] Y. Song, J. Sohl-Dickstein, D. P. Kingma, A. Kumar, S. Ermon, and B. Poole, “Score-based generative modeling through stochastic differential equations,” in International Conference on Learning Representations (ICLR), 2021.
[24] Y. Song and S. Ermon, “Generative modeling by estimating gradients of the data distribution,” Advances in Neural Information Processing Systems (NeurIPS), vol. 32, pp. 11895–11907, 2019.
[25] U. A. Mishra and Y. Chen, “ReorientDiff: Diffusion model based reorientation for object manipulation,” arXiv preprint arXiv:2303.12700, 2023.
[26] Z. Xian, N. Gkanatsios, T. Gervet, T.-W. Ke, and K. Fragkiadaki, “ChainedDiffuser: Unifying trajectory diffusion and keypose prediction for robotic manipulation,” in Conference on Robot Learning (CoRL), pp. 2323–2339, PMLR, 2023.
[27] L. Chen, S. Bahl, and D. Pathak, “PlayFusion: Skill acquisition via diffusion from language-annotated play,” in Conference on Robot Learning (CoRL), pp. 2012–2029, PMLR, 2023.
[28] H. Ryu, J. Kim, H. An, J. Chang, J. Seo, T. Kim, Y. Kim, C. Hwang, J. Choi, and R. Horowitz, “Diffusion-EDFs: Bi-equivariant denoising generative modeling on SE(3) for visual robotic manipulation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 18007–18018, 2024.
[29] H. Ha, P. Florence, and S. Song, “Scaling up and distilling down: Language-guided robot skill acquisition,” in Conference on Robot Learning (CoRL), pp. 3766–3777, PMLR, 2023.
[30] C. Chi, S. Feng, Y. Du, Z. Xu, E. Cousineau, B. Burchfiel, and S. Song, “Diffusion Policy: Visuomotor policy learning via action diffusion,” in Robotics: Science and Systems (RSS), 2023.
[31] Q. Liu, J. Lee, and M. Jordan, “A kernelized Stein discrepancy for goodness-of-fit tests,” in International Conference on Machine Learning (ICML), pp. 276–284, PMLR, 2016.
[32] P. Vincent, “A connection between score matching and denoising autoencoders,” Neural Computation, vol. 23, no. 7, pp. 1661–1674, 2011.
[33] T.-C. Hsiao, H.-W. Chen, H.-K. Yang, and C.-Y. Lee, “Confronting ambiguity in 6D object pose estimation via score-based diffusion on SE(3),” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024.
[34] J. Deray and J. Solà, “Manif: A micro Lie theory library for state estimation in robotics applications,” Journal of Open Source Software, vol. 5, no. 46, p. 1371, 2020.
[35] E. Jørgensen, “The central limit problem for geodesic random walks,” Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, vol. 32, no. 1, pp. 1–64, 1975.
[36] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778, 2016.
[37] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” Advances in Neural Information Processing Systems (NeurIPS), vol. 30, 2017.
[38] G. Bradski, “The OpenCV Library,” Dr. Dobb’s Journal of Software Tools, 2000.
[39] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “ImageNet: A large-scale hierarchical image database,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 248–255, 2009.