
Detailed Record

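Author (Chinese): 魏偉哲
Author (English): Wei, Wei Che
Thesis title (Chinese): 利用維持資料局部性及減少資料傳輸延遲提升雲平台之資料處理效能
Thesis title (English): Maximize Data Processing Throughput on Cloud via Exploiting Data Locality and Minimizing Data Transfer Delay
Advisor (Chinese): 周志遠
Advisor (English): Chou, Chi Yuan
Oral defense committee (Chinese): 李哲榮, 蕭宏章
Degree: Master's
Institution: National Tsing Hua University
Department: Department of Computer Science
Student ID: 102062703
Year of publication (ROC calendar): 105 (2016)
Academic year of graduation: 104
Language: English
Number of pages: 54
Keywords (translated from Chinese): scheduling; throughput; data-intensive computing; data pipeline
Keywords (English): cloud; flowshop problem; throughput; data intensive computing; data pipeline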
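In recent years, the volume of information has grown very rapidly. Under this trend, big data has become an important knowledge base, and demand for highly scalable storage and computing systems keeps increasing. Because most users cannot afford to purchase a large number of machines, more and more cloud providers, such as Amazon and Microsoft, have built cloud platforms that offer pay-per-use services. For ease of pricing, however, cloud providers tend to keep different types of services separate. For example, they offer storage products and compute products, and these products are independent systems. In a typical use case, users need to store their data on a reliable, easily scalable storage system and set up a computing platform on top of it to process and analyze that data. An environment that separates storage services from computing resources is inconvenient to use. We therefore developed a service that combines these two types of platform, and we propose a data pipeline scheduling service that handles the scheduling of multiple data analysis jobs on Amazon Web Services. Besides offering a convenient way to use Amazon's services, our scheduling service achieves better execution performance than the usage example provided by Amazon.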
In recent years, the volume of data has grown rapidly. With this trend, big data has become a significant source of knowledge, and the need for large-scale storage and computing clusters has grown as well. Because not every user has enough funds to purchase a large number of computers, more and more companies, such as Amazon and Microsoft, have built cloud platforms with many services and provide on-demand services to users. However, these cloud providers usually keep the different kinds of services separate in order to price each one individually. For example, they provide a storage service, a virtual machine service, or a simple cluster service, all of which are independent of one another. In a typical use case, users need to store their data in a highly reliable and scalable storage system and build a computing cluster on top of it to analyze the data. Using the storage service and the computing cluster service together is inconvenient in this situation. Therefore, we develop a service that integrates these two kinds of services, and we propose a data pipeline scheduling service for this scenario that handles multiple jobs on Amazon Web Services. Besides providing a simple way to use Elastic MapReduce, the computing cluster service provided by Amazon, this service also improves performance over the basic use case proposed by Amazon.
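This record describes a scheduling service for multiple jobs and lists "flowshop problem" among its keywords, but it does not reproduce the scheduling algorithm itself. As a rough illustration only, the sketch below applies Johnson's classic two-stage flowshop rule (S. M. Johnson, 1954), treating data transfer as the first stage and data processing as the second, with no limit on buffered jobs between the stages. The function names, job names, and times are hypothetical, and this is not claimed to be the thesis's implementation.

```python
from typing import Dict, List, Tuple

# Hypothetical type: job name -> (transfer_time, processing_time)
Jobs = Dict[str, Tuple[float, float]]


def johnson_order(jobs: Jobs) -> List[str]:
    """Order two-stage jobs by Johnson's rule (unlimited buffer assumed)."""
    # Jobs whose transfer is no longer than their processing run first,
    # shortest transfer first, so the processing stage starts early.
    front = sorted(
        (name for name, (a, b) in jobs.items() if a <= b),
        key=lambda name: jobs[name][0],
    )
    # The remaining jobs run last, longest processing first,
    # so the final job leaves little trailing work on stage two.
    back = sorted(
        (name for name, (a, b) in jobs.items() if a > b),
        key=lambda name: jobs[name][1],
        reverse=True,
    )
    return front + back


def makespan(order: List[str], jobs: Jobs) -> float:
    """Completion time of the last job when stages run as a pipeline."""
    transfer_done = 0.0  # time at which stage one finishes the current job
    process_done = 0.0   # time at which stage two finishes the current job
    for name in order:
        transfer_time, processing_time = jobs[name]
        transfer_done += transfer_time
        # Processing starts once the data has arrived and stage two is free.
        process_done = max(process_done, transfer_done) + processing_time
    return process_done


if __name__ == "__main__":
    # Hypothetical workload, times in arbitrary units.
    example = {"A": (3, 6), "B": (5, 2), "C": (1, 2), "D": (6, 6), "E": (7, 5)}
    order = johnson_order(example)
    print(order, makespan(order, example))
```

With a finite buffer between the two stages this ordering is no longer guaranteed to be optimal, which is presumably why the thesis also lists local search and tabu search among its algorithms.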
1 Introduction
2 Background
2.1 Cloud Services
2.2 Motivations
3 Approach & Problem Definition
3.1 Proposed Data Pipeline Scheduling Service
3.2 Scheduling Problem Formulation
4 Scheduling Algorithms
4.1 Optimal Schedule for Infinite Buffer Capacity
4.1.1 Optimal Sorting Algorithm
4.1.2 Optimality Proof & Analysis
4.2 Algorithm for Finite Buffer Capacity
4.2.1 Local Search Algorithm
4.2.2 Tabu Search Algorithm
4.3 Job Section Algorithm for Unbalanced Workload
5 Experimental Setup
5.1 Real Experiment Environment
5.2 Simulation Setup
6 Algorithms Comparison & Analysis
6.1 Methodology
6.2 I/O Intensity Analysis
6.3 Workload Size Analysis
6.4 Buffer Capacity Analysis
6.5 Converge Time Analysis
7 System Performance Evaluations
7.1 Methodology
7.2 Makespan Comparison
7.3 Data Transfer Time Comparison
7.4 Data Processing Time Comparison
8 Related Work
8.1 Flowshop Problem
8.2 Cloud Services
9 Conclusion
 
 
 
 