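Author (Chinese): 高俊生
Author (English): Kao, Jyun Sheng
Thesis title (Chinese): 應用於串流圖資料的分布式漸進模式匹配演算法
Thesis title (English): Distributed Incremental Pattern Matching on Streaming Graphs
Advisor (Chinese): 周志遠
Advisor (English): Chou, Jerry
Committee members (Chinese): 李哲榮; 許慶賢
Committee members (English): Lee, Che Rung; Hsu, Ching Hsien
Degree: Master's
University: National Tsing Hua University
Department: Department of Computer Science
Student ID: 103062587
Year of publication (ROC calendar): 105 (2016)
Academic year of graduation: 104
Language: English
Number of pages: 48
Keywords (Chinese): 串流數據; 圖模式匹配演算法; 漸進演算法; 分散式計算
Keywords (English): streaming data; graph pattern matching; incremental algorithm; distributed computing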
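Abstract (translated from the Chinese): With the rise of big data research, many data processing systems such as Hadoop and Spark have been proposed. In many applications, including machine learning, social networks, road network traffic, and smart grids, data can be further represented as a graph, so a number of big-data graph processing frameworks, such as GraphLab and Giraph, have been proposed in recent years. However, existing research on combining graph processing systems with streaming data remains limited. In this work, our goal is therefore to develop a distributed graph processing system for answering graph pattern matching queries, in which the topology or attribute data of the data graph changes continuously as streaming data arrives from external sources. To achieve this goal, we propose an incremental matching algorithm and implement it on GPS, an open-source graph processing system from Stanford University. GPS is a distributed graph processing system that uses the vertex as its unit of computation. We modified the GPS architecture to support streaming graph data, and we further adopted a subgraph-centric data model to reduce network communication overhead and improve system performance. Using Wikipedia as a real-world dataset for performance evaluation, our results show a 3x to 10x speedup over the traditional batch approach, along with significantly lower network traffic and memory usage.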
Abstract (English): Big data has shifted the computing paradigm of data analysis. While some data can be treated as simple text or independent data records, many applications, such as social media, road network traffic, and smart grids, have data with structural patterns that are modeled as a graph. However, only a limited amount of work has addressed the velocity problem of graph processing. In this work, we aim to develop a distributed processing system for solving pattern matching queries on streaming graphs, where graphs evolve over time upon the arrival of streaming graph update events. To achieve this goal, we propose an incremental pattern matching algorithm and implement it on GPS, a vertex-centric distributed graph computing framework. We also extend the GPS framework to support streaming graphs, and adopt a subgraph-centric data model to further reduce communication overhead and improve system performance. Our evaluation using a real Wikipedia trace shows that our approach achieves a 3x to 10x speedup over the batch algorithm and significantly reduces network and memory usage.
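To make the contrast between batch and incremental matching concrete, the following is a minimal single-machine sketch. It is not the thesis's algorithm or the GPS API. It assumes graph simulation as the matching semantics, handles edge deletion only, and uses hypothetical names throughout (`build_adjacency`, `batch_simulation`, `delete_edge_incremental`, and the toy data). After a deletion, only the source vertex of the removed edge is re-checked, and any vertex that loses its match pushes its predecessors onto a worklist, so the work stays local to the affected region instead of rescanning the whole graph.

```python
from collections import defaultdict

def build_adjacency(edges):
    """Build successor and predecessor maps for a directed data graph."""
    succ, pred = defaultdict(set), defaultdict(set)
    for a, b in edges:
        succ[a].add(b)
        pred[b].add(a)
    return succ, pred

def batch_simulation(p_labels, p_edges, g_labels, succ):
    """Compute the maximum simulation from scratch by fixpoint refinement."""
    # Start from all label-compatible candidates for each pattern node.
    sim = {u: {v for v, lab in g_labels.items() if lab == p_labels[u]}
           for u in p_labels}
    changed = True
    while changed:
        changed = False
        for u, u2 in p_edges:
            # v stays a match of u only if some successor of v matches u2.
            keep = {v for v in sim[u] if succ[v] & sim[u2]}
            if keep != sim[u]:
                sim[u], changed = keep, True
    return sim

def delete_edge_incremental(sim, p_edges, succ, pred, edge):
    """Update sim in place after deleting one data-graph edge."""
    a, b = edge
    succ[a].discard(b)
    pred[b].discard(a)
    worklist = [a]  # only the edge's source can lose support directly
    while worklist:
        v = worklist.pop()
        for u, u2 in p_edges:
            if v in sim[u] and not (succ[v] & sim[u2]):
                sim[u].discard(v)
                # Predecessors of v may have relied on v; re-check them.
                worklist.extend(pred[v])
    return sim

# Hypothetical toy data: pattern A -> B -> C and a small labeled data graph.
p_labels = {"a": "A", "b": "B", "c": "C"}
p_edges = [("a", "b"), ("b", "c")]
g_labels = {1: "A", 2: "B", 3: "C", 4: "A", 5: "B"}
g_edges = [(1, 2), (2, 3), (4, 5), (5, 3)]

succ, pred = build_adjacency(g_edges)
sim = batch_simulation(p_labels, p_edges, g_labels, succ)
before = {u: set(vs) for u, vs in sim.items()}

# Streaming update: edge (5, 3) disappears; repair the result incrementally.
delete_edge_incremental(sim, p_edges, succ, pred, (5, 3))
recomputed = batch_simulation(p_labels, p_edges, g_labels, succ)
```

In this toy run, deleting the edge removes vertex 5 as a match and then, by propagation, vertex 4, and the incrementally maintained result agrees with a full recomputation. Edge insertions, attribute updates, and distribution across workers are deliberately left out of the sketch.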
1 Introduction
2 Problem Definition
3 Algorithm
3.1 Update Phase
3.2 Self-Checking Phase
3.3 Propagation Phase
4 Implementation
4.1 Architecture
4.2 Incremental Computation Engine
4.3 Subgraph
4.4 Network Optimization
5 Evaluation
5.1 Environment Setup
5.2 Wikipedia Dataset
5.3 Other Settings
5.4 Incremental vs. Batch Algorithm
5.5 Resource Usage Analysis
5.6 Subgraph Size Analysis
5.7 Query Type Analysis
6 Related Work
7 Conclusion
Appendices
A Subgraph API
A.1 Computation related API
A.2 Topology related API
B Inner Vertex API
C Master API



 
 
 
 