
Detailed Record

Author (Chinese): 郭士維
Author (English): Guo, Shih-Wei
Title (Chinese): 使用基於分數的擴散網路以達成精確取放任務
Title (English): Precise Pick-and-Place using Score-Based Diffusion Networks
Advisor (Chinese): 李濬屹
Advisor (English): Lee, Chun-Yi
Committee Members (Chinese): 陳煥宗、劉育綸
Committee Members (English): Chen, Hwann-Tzong; Liu, Yu-Lun
Degree: Master's
Institution: National Tsing Hua University
Department: Department of Computer Science
Student ID: 108062704
Year of Publication (ROC calendar): 113 (2024)
Graduation Academic Year: 112
Language: English
Number of Pages: 31
Keywords: Computer Vision, Diffusion Network, Pose Estimation, Pick-and-Place, Robotic Arm, Robotic Manipulation
Abstract:
In this thesis, we propose a novel coarse-to-fine continuous pose diffusion method to enhance the precision of pick-and-place operations in robotic manipulation tasks. Leveraging the capabilities of diffusion networks, we achieve accurate perception of object poses, which improves both pick-and-place success rates and overall manipulation precision. Our method uses a top-down RGB image projected from an RGB-D camera and adopts a coarse-to-fine architecture that enables efficient training of the coarse and fine models. A distinguishing feature of our approach is its continuous pose estimation, which enables more precise object manipulation, particularly with respect to rotation angles. In addition, we employ pose and color augmentation techniques to enable effective training with limited data. Through extensive experiments in simulated and real-world scenarios, as well as an ablation study, we comprehensively evaluate the proposed method. Taken together, the findings validate its effectiveness in achieving high-precision pick-and-place tasks.
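The abstract names two mechanisms without spelling them out: score-based pose diffusion and pose/color augmentation. Purely as an illustration of the first idea, and not the thesis's actual implementation, the sketch below runs annealed Langevin dynamics over a planar pose (x, y, theta) with a noise schedule that decreases from coarse to fine; every name in it (annealed_langevin_pose, toy_score, the target pose) is a hypothetical stand-in, and a faithful version would model rotation on its Lie group (e.g. SO(2)) rather than as an unconstrained scalar.

    import numpy as np

    def annealed_langevin_pose(score_fn, sigmas, steps_per_level=20,
                               eps0=2e-5, rng=None):
        # Sample a planar pose (x, y, theta) with annealed Langevin dynamics.
        # score_fn(pose, sigma) estimates the gradient of the log-density of
        # poses perturbed at noise level sigma; sigmas is a decreasing
        # schedule, so early (large-sigma) levels explore coarsely and late
        # (small-sigma) levels refine the estimate.
        rng = np.random.default_rng() if rng is None else rng
        pose = rng.normal(size=3)                    # start from pure noise
        for sigma in sigmas:
            eps = eps0 * (sigma / sigmas[-1]) ** 2   # step size shrinks with sigma
            for _ in range(steps_per_level):
                grad = score_fn(pose, sigma)
                pose = pose + 0.5 * eps * grad + np.sqrt(eps) * rng.normal(size=3)
        pose[2] = np.arctan2(np.sin(pose[2]), np.cos(pose[2]))  # wrap angle to (-pi, pi]
        return pose

    # Toy usage: the "data" is a single target pose blurred with Gaussian
    # noise, so the perturbed score has the closed form
    # (target - pose) / (s0^2 + sigma^2).
    target = np.array([0.30, -0.20, np.pi / 4])      # hypothetical place pose
    s0 = 0.01
    toy_score = lambda pose, sigma: (target - pose) / (s0 ** 2 + sigma ** 2)
    sigmas = np.geomspace(1.0, 0.01, num=10)         # coarse-to-fine noise schedule
    print(annealed_langevin_pose(toy_score, sigmas)) # lands near `target`

In the coarse-to-fine setup the abstract describes, one would expect a coarse model trained at large noise levels to localize the object roughly and a fine model trained at small noise levels to refine position and rotation; the single schedule above compresses both stages into one loop for brevity.

For the second mechanism, here is a minimal sketch of label-consistent augmentation, again with invented names, under the assumption of a square top-down crop whose pose coordinates live in a centered frame with y pointing up:

    def augment_sample(image, pose, rng):
        # image: (H, H, 3) float array in [0, 1]; pose: (x, y, theta).
        # Pose augmentation: rotate the image by a random multiple of 90
        # degrees (np.rot90 is counter-clockwise) and rotate the pose label
        # by the same amount, yielding a new but still consistent pair.
        k = int(rng.integers(4))
        image = np.rot90(image, k)
        x, y, theta = pose
        for _ in range(k):
            x, y = -y, x                             # 90-degree CCW rotation
            theta += np.pi / 2
        theta = np.arctan2(np.sin(theta), np.cos(theta))
        # Color augmentation: random gain/bias jitter; the label is untouched.
        gain = rng.uniform(0.8, 1.2)
        bias = rng.uniform(-0.05, 0.05)
        image = np.clip(gain * image + bias, 0.0, 1.0)
        return image, np.array([x, y, theta])

A real pipeline would have to match the image's pixel convention (rows grow downward) and could use arbitrary rotation angles with interpolation; the 90-degree restriction here just keeps the sketch dependency-free.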
Table of Contents
Abstract
Acknowledgements

1 Introduction

2 Related Work
2.1 Pick-and-Place
2.2 Transporter Network and Its Successor
2.3 Diffusion Models and Their Applications in Manipulation

3 Background
3.1 Score-Based Generative Models
3.2 Score-Based Pose Diffusion Models

4 Methodology
4.1 Problem Statement
4.2 Framework Overview
4.3 Extending Score-Based Pose Diffusion Models
4.4 Architecture Design
4.5 Data Augmentation

5 Experimental Results
5.1 Environments
5.1.1 Environmental Setups
5.1.2 Camera and Robot Calibration
5.1.3 Tasks and Datasets
5.2 Baselines
5.3 Training and Metrics
5.3.1 Training Procedure
5.3.2 Metrics
5.4 Performance Evaluation and Ablation Study
5.4.1 Simulation
5.4.2 Real Robot
5.4.3 Ablation Study

6 Conclusion

References