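Author (Chinese): 謝 坤
Author (English): XIE, KUN
Thesis title (Chinese): 運用深度學習優化 Spark 上的稀疏矩陣相乘
Thesis title (English): The Optimization of SpMV by Deep Learning on Spark
Advisor (English): LEE, CHE-RUNG
Oral defense committee (English): CHOU, JERRY
Keywords (English): SpMV; Spark; Deep Learning; CNN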
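[...] focuses on optimizing the performance of SpMV on Spark, which uses in-memory (cache) techniques to improve distributed
[...] computation, to classify them, so that based on the matrix structure and the operational configuration (including the number of iterations and the Spark node
[...] the University of Florida Sparse Matrix Collection
[...] compared with [...], the performance of SpMV can be improved by nearly 2.5 times.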
Sparse matrix-vector multiplication (SpMV) is an important computational
operation for solving large-scale numerical problems. However, its
performance depends not only on the running platform but also on the
structure of the matrix. In this thesis, we focus on optimizing the
performance of SpMV on Spark, which uses in-memory techniques to improve
the performance of iterative algorithms in distributed environments. We
designed a deep learning model to classify matrices and SpMV operations, so
that the best data format can be chosen based on the matrix structure and
the operational configuration, including the number of iterations and the
memory size of the Spark nodes. The matrix structures are encoded into small
images, in which each pixel represents the density of nonzeros in a block
submatrix, and are recognized by a convolutional neural network (CNN). The
other configurations are added to the model as additional inputs to improve
prediction accuracy. We used more than one thousand matrices from the
University of Florida Sparse Matrix Collection for training and testing, and
fine-tuned the model to achieve the best accuracy. Experiments show that our
model reaches 82% accuracy for the top-1 choice and 94% for the top-2
choices. Compared with using a single sparse data format, our method
improves the performance of SpMV by nearly 2.5 times.
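The image encoding described in the abstract can be illustrated with a minimal sketch. It assumes the matrix is given as a list of nonzero coordinates and that each pixel stores the fraction of entries in its block that are nonzero. The function name `density_image`, the `size` parameter, and this particular normalization are hypothetical choices for illustration; the abstract does not state the image size or the scaling the thesis actually uses.

```python
from typing import List, Sequence, Tuple


def _block_len(i: int, n: int, size: int) -> int:
    """Number of indices in 0..n-1 that map to pixel i via idx * size // n."""
    def ceil_div(a: int, b: int) -> int:
        return -(-a // b)
    return ceil_div((i + 1) * n, size) - ceil_div(i * n, size)


def density_image(
    shape: Tuple[int, int],
    coords: Sequence[Tuple[int, int]],
    size: int = 4,
) -> List[List[float]]:
    """Encode a sparse matrix as a size x size image of block densities.

    shape  -- (rows, cols) of the matrix
    coords -- (row, col) positions of the nonzeros, without duplicates
    size   -- side length of the image in pixels (assumed value)
    """
    n_rows, n_cols = shape
    counts = [[0] * size for _ in range(size)]
    for r, c in coords:
        # Map each nonzero to the pixel whose block submatrix contains it.
        counts[r * size // n_rows][c * size // n_cols] += 1

    image = []
    for i in range(size):
        row = []
        for j in range(size):
            # Block area = rows in block i times columns in block j.
            area = _block_len(i, n_rows, size) * _block_len(j, n_cols, size)
            # Empty blocks (matrix smaller than the image) get density 0.
            row.append(counts[i][j] / area if area else 0.0)
        image.append(row)
    return image


if __name__ == "__main__":
    # A 4x4 matrix with four nonzeros, reduced to a 2x2 image.
    nonzeros = [(0, 0), (0, 1), (1, 1), (3, 3)]
    for pixel_row in density_image((4, 4), nonzeros, size=2):
        print(pixel_row)
```

An image built this way has a fixed size regardless of the matrix dimensions, which is what lets a CNN with a fixed input shape take matrices of very different sizes.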
1 Introduction
2 Background
2.1 Sparse Matrix-Vector Multiplication (SpMV)
2.2 Apache Spark
2.2.1 Spark Core
2.2.2 MLlib
2.3 Data Formats Of Sparse Matrix
2.3.1 Coordinate Format (COO)
2.3.2 Compressed Sparse Row (CSR)
2.3.3 Compressed Sparse Column (CSC)
2.3.4 Block Compressed Sparse Row (BCSR)
2.4 Data Formats Of Sparse Matrix in Spark MLlib
2.4.1 Local Matrix
2.4.2 Distributed Matrix
2.5 Deep Learning
2.5.1 Convolutional Neural Networks (CNN)
2.5.2 TensorFlow
2.5.3 Keras
2.6 The University of Florida Sparse Matrix Collection
3 Motivation
3.1 The Limitation of MLlib Implementations
3.2 The Limitation of the SpMV on Spark
3.3 The Structure of the Matrix
3.4 The Limitation of Adaptive SpMV
4 Algorithms and Implementations
4.1 Training Dataset
4.1.1 The Image of Matrix
4.1.2 The Configurations of Dataset
4.1.3 The Label of Training Set
4.1.4 Pre-processing
4.2 Deep Learning Model Design
4.2.1 Baseline Model
4.2.2 Fine-Tuned Model
4.2.3 Additional Feature
4.3 Training Dataset Preparation
4.3.1 Block Coordinate Format (BCOO+)
4.3.2 Coordinate Format (COO)
4.3.3 Compressed Sparse Row (CSR)
4.3.4 Compressed Sparse Column (CSC)
5 Experiments and Result
5.1 The Experiments of Learning
5.1.1 Experiments Setting
5.1.2 The Structure of the Baseline Model
5.1.3 The Tuning of the Activation Function
5.1.4 The Tuning of the Batch Normalization
5.1.5 The Image of Matrix
5.1.6 The Overall Performance of Adaptive Format
5.1.7 Discussion
5.2 The Experiments of SpMV on Spark
5.2.1 Experiments Setting
5.2.2 Datasets
5.2.3 Benchmark Experiment of COO, CSC and CSR
5.2.4 Experiments of BCOO+
6 Conclusion and Future Works
Reference