Author (Chinese): 趙振廷
Author (English): Chao, Chen-Ting
Thesis Title (Chinese): Halide編譯器上支援稀疏矩陣壓縮的排程設計
Thesis Title (English): Devise Sparse Compression Schedulers in Halide
Advisor (Chinese): 李政崑
Advisor (English): Lee, Jenq-Kuen
Committee Members (Chinese): 洪明郁, 陳呈瑋
Committee Members (English): Hung, Ming-Yu; Chen, Cheng-Wei
Degree: Master's
Institution: National Tsing Hua University
Department: Department of Computer Science
Student ID: 107062561
Year of Publication (ROC calendar): 109 (2020)
Graduation Academic Year: 109
Language: English
Number of Pages: 48
Keywords (Chinese): NLP, scheduling, sparse matrix, neural networks, word embedding, hybrid compression, compression format
Keywords (English): sparse matrix, NLP, neural networks, word embedding, compression format, hybrid compression, schedule
In recent years, thanks to the rapid development of hardware and semiconductor process technology, artificial intelligence has achieved remarkable performance gains and has a powerful ability to solve various machine learning problems, such as image classification and object detection. One branch of artificial intelligence, called natural language processing (NLP), extracts the meaning of words from human language. The general way to capture word meaning is word embedding: a word embedding training model converts words into multidimensional vectors, turning words without explicit "meaning" into vectors that carry "meaning". Well-known word embedding training models include FastText, Word2Vec, and GloVe. They train words into vectors, and the vectors are then used for further semantic classification. In this thesis, we optimize FastText, improving its performance and reducing its model size.
FastText is an open-source library developed by the Facebook AI Research (FAIR) lab that makes it easier for users to train word embeddings and text classifiers. The demand for artificial intelligence in edge computing has grown significantly, and the performance improvements offered by various hardware accelerators are critical for deep learning applications. A sparse matrix is a matrix that contains many zero elements and is very common in deep learning. However, computing with sparse matrices is inefficient, and storage is wasted on useless zero elements, so compressing sparse matrices effectively can substantially improve deep learning performance. In addition, Halide is an embedded domain-specific language designed for high-performance image and array processing. By decoupling the algorithm from its schedule, Halide lets users easily write an algorithm and change its schedule to explore the various possible optimizations of that algorithm. This thesis proposes a hybrid compression scheme that is applicable to most sparse matrices without special consideration of the distribution of non-zero elements, and designs a reorder algorithm to assist the hybrid compression format. We implement the proposed method as a new schedule in Halide and apply the scheme to FastText to improve its performance. Our experiments show that on SpMM, the proposed hybrid compression method outperforms CSR, one of the most common formats, improving performance by about 2 to 3 times.
Nowadays, artificial intelligence has achieved remarkable performance and has the powerful ability to solve various machine learning tasks, such as image classification and object detection. A branch of artificial intelligence, called natural language processing (NLP), can extract the meaning of words from human language. The general way to understand the meaning of a word is via word embedding. A word embedding training model converts words into multidimensional vectors, turning words without explicit "meaning" into vectors that carry "meaning". Famous word embedding training models include FastText, Word2Vec, and GloVe. They train words into vectors, and the vectors are then used for further semantic classification. In this thesis, we work on efficient support for FastText. FastText is an open-source library created by the Facebook AI Research (FAIR) lab that allows users to learn word embeddings and text classification.
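As a toy illustration of how embedding vectors enable semantic comparison, the sketch below measures cosine similarity between made-up 3-dimensional vectors; the words and values are purely hypothetical, and real FastText embeddings typically have 100-300 dimensions:

```python
import math

def cosine(u, v):
    # Cosine similarity: dot(u, v) / (|u| * |v|).
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Hypothetical 3-dimensional embeddings for illustration only.
emb = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}

# Semantically related words end up closer together in vector space.
assert cosine(emb["king"], emb["queen"]) > cosine(emb["king"], emb["apple"])
```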
The demand for artificial intelligence on edge devices has increased significantly, and the performance improvement of various hardware accelerators is critical for deep learning applications. Sparse matrices contain many zero elements and are very common in deep learning. However, the computational efficiency of sparse matrices is very low, and memory space is occupied by useless zero elements, so compressing sparse matrices effectively can substantially improve the performance of deep learning. Meanwhile, Halide is an embedded domain-specific language designed for high-performance image and array processing. By decoupling the algorithm from its schedule, Halide makes it easy for users to build pipelines and change schedules to explore the various possible optimizations of an algorithm.
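The CSR (Compressed Sparse Row) format used as the baseline later can be sketched as follows; this is the generic textbook construction, not the thesis's implementation:

```python
def dense_to_csr(dense):
    # Compress a dense row-major matrix into the three CSR arrays:
    # values  - the non-zero elements in row order,
    # col_idx - the column of each non-zero,
    # row_ptr - where each row starts in values (len = rows + 1).
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, x in enumerate(row):
            if x != 0:
                values.append(x)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr

dense = [
    [5, 0, 0, 0],
    [0, 8, 0, 6],
    [0, 0, 0, 0],
    [0, 0, 3, 0],
]
values, col_idx, row_ptr = dense_to_csr(dense)
# values  = [5, 8, 6, 3]
# col_idx = [0, 1, 3, 2]
# row_ptr = [0, 1, 3, 3, 4]
```

Only 4 non-zeros are stored instead of 16 elements, which is where the memory savings described above come from.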
In this thesis, we devise a novel hybrid compression scheme, which is applicable to most sparse matrices regardless of the distribution of their non-zero elements, together with a reorder algorithm that assists the hybrid format. We implement the proposed method as a new scheduling primitive of Halide and apply this scheme to FastText to improve its performance.
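The abstract does not spell out the reorder algorithm; one plausible sketch, assuming the reordering groups rows of similar density so that each group can use the storage format that suits it best, might look like this (a hypothetical illustration, not the thesis's actual algorithm):

```python
def reorder_rows_by_density(dense):
    # Sort row indices so that rows with many non-zeros come first and
    # rows with few non-zeros come last. Dense-ish rows can then be kept
    # in a dense-friendly layout while the sparse tail uses CSR-style
    # arrays; the permutation is returned so results can be mapped back.
    nnz = [sum(1 for x in row if x != 0) for row in dense]
    perm = sorted(range(len(dense)), key=lambda i: nnz[i], reverse=True)
    reordered = [dense[i] for i in perm]
    return perm, reordered
```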
Our evaluation demonstrates that on SpMM, the proposed hybrid compression method outperforms the widely used CSR format, improving performance by about 2-3 times.
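For reference, the CSR baseline that SpMM (sparse-dense matrix multiply) is measured against can be sketched as a generic textbook routine; this is not the thesis's Halide implementation:

```python
def spmm_csr(values, col_idx, row_ptr, B):
    # C = A @ B where A is stored in CSR form and B is dense.
    # Only the stored non-zeros of A are visited, which is the source
    # of the speedup over a dense triple loop.
    n_rows = len(row_ptr) - 1
    n_cols = len(B[0])
    C = [[0.0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            a, j = values[k], col_idx[k]
            for c in range(n_cols):
                C[i][c] += a * B[j][c]
    return C

# A = [[1, 0], [0, 2]] in CSR form, multiplied by a dense B.
C = spmm_csr([1.0, 2.0], [0, 1], [0, 1, 2], [[1.0, 2.0], [3.0, 4.0]])
# C == [[1.0, 2.0], [6.0, 8.0]]
```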
Abstract i
Contents iii
List of Figures v
List of Tables vii
1 Introduction . . . . . . . . . . . . . . . . . . . . . . 1
2 Background . . . . . . . . . . . . . . . . . . . . . . . 4
2.1 Sparse Matrix . . . . . . . . . . . . . . . . . . . . . 4
2.1.1 Compression format . . . . . . . . . . . . . . . . . 4
2.1.2 Sparse Matrix in Deep Learning . . . . . . . . . . . 6
2.2 Halide . . . . . . . . . . . . . . . . . . . . . . . . 8
2.3 FastText . . . . . . . . . . . . . . . . . . . . . . . 11
3 The Design of Hybrid Compression Scheme . . . . . . . . 14
3.1 Hybrid Method . . . . . . . . . . . . . . . . . . . . 14
3.2 Reordering . . . . . . . . . . . . . . . . . . . . . . 17
3.3 SpMM for Hybrid Method . . . . . . . . . . . . . . . . 21
4 Enhancement of FastText . . . . . . . . . . . . . . . . 23
4.1 The Working Principles of FastText . . . . . . . . . . 23
4.2 Optimization of the FastText via Halide . . . . . . . 24
5 Experimental Evaluation . . . . . . . . . . . . . . . . 29
5.1 Experimental Environment . . . . . . . . . . . . . . . 29
5.2 Experimental Methodology . . . . . . . . . . . . . . . 30
5.2.1 SpMM for Hybrid Format in Halide . . . . . . . . . . 30
5.2.2 Enhanced FastText . . . . . . . . . . . . . . . . . 31
5.3 Experimental Results . . . . . . . . . . . . . . . . . 32
5.3.1 Results of SpMM for Hybrid Format in Halide . . . . 32
5.3.2 Results of Enhanced FastText . . . . . . . . . . . . 36
6 Conclusion and Future Works . . . . . . . . . . . . . . 42
6.1 Conclusion . . . . . . . . . . . . . . . . . . . . . . 42
6.2 Future Works . . . . . . . . . . . . . . . . . . . . . 43
[1] A. Joulin, E. Grave, P. Bojanowski, M. Douze, H. Jegou, and T. Mikolov, “FastText.zip: Compressing text classification models,” arXiv preprint arXiv:1612.03651, 2016.

[2] T. Mikolov, K. Chen, G. S. Corrado, and J. Dean, “Efficient estimation of word representations in vector space,” CoRR, vol. abs/1301.3781, 2013.

[3] J. Pennington, R. Socher, and C. Manning, “GloVe: Global vectors for word representation,” in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Doha, Qatar: Association for Computational Linguistics, Oct. 2014, pp. 1532–1543. [Online]. Available: https://www.aclweb.org/anthology/D14-1162

[4] J. Ragan-Kelley, A. Adams, S. Paris, M. Levoy, S. Amarasinghe, and F. Durand, “Decoupling algorithms from schedules for easy optimization of image processing pipelines,” ACM Transactions on Graphics (TOG), vol. 31, no. 4, pp. 1–12, 2012.

[5] J. Ragan-Kelley, C. Barnes, A. Adams, S. Paris, F. Durand, and S. Amarasinghe, “Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines,” ACM SIGPLAN Notices, vol. 48, no. 6, pp. 519–530, 2013.

[6] C.-L. Lee, C.-T. Chao, J.-K. Lee, C.-W. Huang, and M.-Y. Hung, “Sparse-matrix compression primitives with OpenCL framework to support Halide,” in Proceedings of the International Workshop on OpenCL, 2019, pp. 1–2.

[7] C.-L. Lee, C.-T. Chao, J.-K. Lee, M.-Y. Hung, and C.-W. Huang, “Accelerate DNN performance with sparse matrix compression in Halide,” in Proceedings of the 48th International Conference on Parallel Processing: Workshops, ser. ICPP 2019. New York, NY, USA: Association for Computing Machinery, 2019. [Online]. Available: https://doi.org/10.1145/3339186.3339194


[8] S. Han, J. Pool, J. Tran, and W. Dally, “Learning both weights and connections for efficient neural network,” in Advances in neural information processing systems, 2015, pp. 1135–1143.

[9] R.-G. Chang, T.-R. Chuang, and J. K. Lee, “Parallel sparse supports for array intrinsic functions of Fortran 90,” The Journal of Supercomputing, vol. 18, no. 3, pp. 305–339, 2001.

[10] C. Hong, A. Sukumaran-Rajam, B. Bandyopadhyay, J. Kim, S. E. Kurt, I. Nisa, S. Sabhlok, Ü. V. Çatalyürek, S. Parthasarathy, and P. Sadayappan, “Efficient sparse-matrix multi-vector product on GPUs,” in HPDC ’18, 2018.

[11] C.-C. Hsu, C.-Y. Lin, S. K. Chen, C.-W. Liu, and J.-K. Lee, “Optimized memory access support for data layout conversion on heterogeneous multi-core systems,” WOS:000358220900016, 2014. [Online]. Available: https://ir.nctu.edu.tw/handle/11536/128539

[12] R.-G. Chang, J.-S. Li, J.-K. Lee, and T.-R. Chuang, “Probabilistic inference schemes for sparsity structures of Fortran 90 array intrinsics,” in International Conference on Parallel Processing, 2001, pp. 61–68.

[13] R.-G. Chang, T.-R. Chuang, and J. K. Lee, “Support and optimization for parallel sparse programs with array intrinsics of Fortran 90,” Parallel Computing, vol. 30, no. 4, pp. 527–550, 2004.

[14] A. Adams, K. Ma, L. Anderson, R. Baghdadi, T.-M. Li, M. Gharbi, B. Steiner, S. Johnson, K. Fatahalian, F. Durand et al., “Learning to optimize Halide with tree search and random programs,” ACM Transactions on Graphics (TOG), vol. 38, no. 4, pp. 1–12, 2019.

[15] P. Bojanowski, E. Grave, A. Joulin, and T. Mikolov, “Enriching word vectors with subword information,” Transactions of the Association for Computational Linguistics, vol. 5, pp. 135–146, 2017.

[16] A. Joulin, E. Grave, P. Bojanowski, and T. Mikolov, “Bag of tricks for efficient text classification,” in Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers. Association for Computational Linguistics, April 2017, pp. 427–431.

[17] J. Li, B. Uçar, Ü. V. Çatalyürek, J. Sun, K. Barker, and R. Vuduc, “Efficient and effective sparse tensor reordering,” in Proceedings of the ACM International Conference on Supercomputing, 2019, pp. 227–237.

[18] A. Maas, R. E. Daly, P. T. Pham, D. Huang, A. Y. Ng, and C. Potts, “Learning word vectors for sentiment analysis,” in Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, 2011, pp. 142–150.