
Detailed Record

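Author (Chinese): 陳怡仁
Author (English): Chen, Yi Ren
Thesis title (Chinese): G-Storm: 具 GPU 感知之 Storm 規劃方法
Thesis title (English): G-Storm: GPU-Aware Scheduling in Storm
Advisor (Chinese): 李哲榮
Advisor (English): Lee, Che Rung
Oral defense committee (Chinese): 周志遠; 蕭宏章
Degree: Master's
Institution: National Tsing Hua University
Department: Department of Computer Science
Student ID: 102062652
Year of publication (ROC calendar): 104 (2015)
Academic year of graduation: 103
Language: English
Number of pages: 26
Keywords (Chinese): 大數據 (big data); 串流處理 (stream processing); GPU; Storm
Keywords (English): big data; stream process; GPU; Storm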
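We are entering the era of the data economy, in which the ability to analyze large volumes of data efficiently is the key to success. Many systems for processing massive data sets have been developed; among them, Storm is designed for processing data streams. By default, Storm schedules work with only a very simple round-robin policy. This policy performs well on homogeneous platforms but cannot use resources effectively in heterogeneous environments.
In this thesis we design and implement G-Storm, a new scheduling algorithm for Storm that lets Storm evaluate and make effective use of GPU cards to accelerate computation. Our experiments show that G-Storm achieves a 1.65x speedup over Storm's default scheduler under lighter workloads and a speedup of nearly 2.04x under heavier workloads.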
We are now shifting toward a data-driven economy, in which the ability to analyze huge amounts of data efficiently and in time is the key to success. Many systems for big data processing have been developed; Storm is one of them, and it targets stream data processing. By default, Storm provides only a very simple round-robin scheduling policy to assign tasks. The default scheduler performs well on homogeneous platforms but does not work well in heterogeneous computing environments.
In this thesis, we propose and implement a new Storm scheduling algorithm, named G-Storm, with which Storm can evaluate GPU capacity for scheduling and make more effective use of GPUs to speed up overall performance. The experimental results show that G-Storm achieves speedups of 1.65x on lightly loaded topologies and 2.04x on heavily loaded topologies, compared with Storm using the default scheduler.
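The contrast the abstract draws between round-robin assignment and capacity-aware assignment can be illustrated with a small sketch. This is not the G-Storm algorithm, whose details are not given in this record; it is a generic greedy heuristic, and the node names, the relative capacity values, and both function names are hypothetical.

```python
from itertools import cycle


def round_robin(executors, slots):
    """Assign executors to slots in turn, ignoring what hardware each slot has."""
    assignment = {slot: [] for slot in slots}
    for executor, slot in zip(executors, cycle(slots)):
        assignment[slot].append(executor)
    return assignment


def capacity_weighted(executors, capacity):
    """Greedy heuristic: place each executor on the slot whose load per unit
    of capacity would be lowest after the placement (ties go to the first slot)."""
    assignment = {slot: [] for slot in capacity}
    for executor in executors:
        target = min(capacity, key=lambda s: (len(assignment[s]) + 1) / capacity[s])
        assignment[target].append(executor)
    return assignment


executors = [f"e{i}" for i in range(6)]
# Hypothetical relative throughput: the GPU node is assumed twice as fast.
capacity = {"cpu-node": 1.0, "gpu-node": 2.0}

rr = round_robin(executors, list(capacity))
cw = capacity_weighted(executors, capacity)

print({slot: len(tasks) for slot, tasks in rr.items()})
print({slot: len(tasks) for slot, tasks in cw.items()})
```

With these assumed capacities, round-robin splits the six executors evenly, while the weighted heuristic places twice as many on the faster node. That is the kind of imbalance a heterogeneity-aware scheduler aims to exploit.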
1 Introduction
2 Background
2.1 Components of Storm cluster
2.2 Storm Topology
2.3 Internal messaging within Storm worker processes
2.4 Guaranteeing Message Processing
2.5 Storm Cluster Configuration
2.6 Storm Scheduler
3 Design and Implementation of G-Storm
3.1 Design of G-Storm
3.1.1 GPU Computation Capability Estimation
3.1.2 Executor Scheduling Algorithm
4 Performance Evaluation
5 Related Work
6 Conclusion
FFmpeg. http://www.ffmpeg.org.
LMAX Disruptor. https://github.com/LMAX-Exchange/disruptor.
Netty. http://netty.io/.
NVIDIA CUDA documentation. https://developer.nvidia.com/cuda-toolkit-65.
ZeroMQ. http://zeromq.org/.
Gang Chen, Ke Chen, Dawei Jiang, Beng Chin Ooi, Lei Shi, Hoang Tam Vo, and Sai Wu. E3: An elastic execution engine for scalable data processing.
Lisa Amini, Henrique Andrade, Ranjita Bhagwan, Frank Eskesen, Richard King, Philippe Selo, Yoonho Park, and Chitra Venkatramani. SPC: A distributed, scalable platform for data mining.
Leonardo Aniello, Roberto Baldoni, and Leonardo Querzoni. Adaptive online scheduling in Storm.
Apache Software Foundation. Storm. http://storm.apache.org.
Brian Babcock, Shivnath Babu, Mayur Datar, Rajeev Motwani, and Jennifer Widom. Models and issues in data stream systems.
Vinayak Borkar, Michael Carey, Raman Grover, Nicola Onose, and Rares Vernica. Hyracks: A flexible and extensible foundation for data-intensive computing.
M. Cammert, C. Heinz, J. Kramer, B. Seeger, S. Vaupel, and U. Wolske. Flexible multi-threaded scheduling for continuous queries over data streams.
Fangfei Chen, M. Kodialam, and T.V. Lakshman. Joint scheduling of processing and shuffle phases in MapReduce systems.
Tyson Condie, Neil Conway, Peter Alvaro, Joseph M. Hellerstein, Khaled Elmeleegy, and Russell Sears. MapReduce online.
Tyson Condie, Neil Conway, Peter Alvaro, Joseph M. Hellerstein, John Gerth, Justin Talbot, Khaled Elmeleegy, and Russell Sears. Online aggregation and continuous query support in MapReduce.
Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified data processing on large clusters.
Kaiming He, Jian Sun, and Xiaoou Tang. Single image haze removal using dark channel prior.
Patrick Hunt, Mahadev Konar, Flavio P. Junqueira, and Benjamin Reed. ZooKeeper: Wait-free coordination for internet-scale systems.
Vibhore Kumar, Henrique Andrade, Buğra Gedik, and Kun-Lung Wu. DEDUCE: At the intersection of MapReduce and stream processing.
Muhammad Anis Uddin Nasir, Gianmarco De Francisci Morales, David García-Soriano, Nicolas Kourtellis, and Marco Serafini. The power of both choices: Practical load balancing for distributed stream processing engines.
L. Neumeyer, B. Robbins, A. Nair, and A. Kesari. S4: Distributed stream computing platform.
M. Rychly, P. Koda, and P. Smrz. Scheduling decisions in stream processing on heterogeneous clusters.
K. Shvachko, Hairong Kuang, S. Radia, and R. Chansler. The Hadoop distributed file system.
Jielong Xu, Zhenhua Chen, Jian Tang, and Sen Su. T-Storm: Traffic-aware online scheduling in Storm.
 
 
 
 