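Author (Chinese): 楊勝婷
Author (English): Yang, Sheng-Ting
Thesis title (Chinese): 整合型資訊服務平台:應用視覺化與智慧分析技術於巨量資料分析之研究
Thesis title (English): Integrated Information Service Platform: Applications of Information Visualization and Intelligent Analysis to Big Data
Advisor (Chinese): 廖崇碩
Advisor (English): Liao, Chung-Shou
Oral defense committee (Chinese): 謝孫源, 林清池
Degree: Master's
Institution: National Tsing Hua University
Department: Department of Industrial Engineering and Engineering Management
Student ID: 101034554
Year of publication (ROC calendar): 103 (2014)
Academic year of graduation: 102
Language: Chinese
Number of pages: 64
Keywords (Chinese): 巨量資料; 服務科學; 資訊平台
Keywords (English): Big Data; services science; information platform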
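With the spread of the Internet, mobile devices, and social media, the volume of data circulating worldwide is growing explosively. International Data Corporation (IDC) estimates that the volume of data will increase tenfold from 2013 to 2020, from 4.4 trillion gigabytes to 44 trillion gigabytes. This surge has made the analysis and application of Big Data an extremely important topic. If these data can be used appropriately, they will bring new value and creativity to many areas and represent a new trend in information and services.
This thesis combines the concept of an information service platform with visualization techniques and intelligent analysis functions to provide diverse information services. It builds an integrated information service platform that offers analysis tools for Big Data and promotes the development of intelligent cloud service applications. For visualization, the thesis improves an existing plotting tool to build iCircos, an automatic visualization tool that draws a structurally complete figure as soon as data are uploaded. This lowers the barrier to use, shortens the time needed to learn the tool, and assists in interpreting and analyzing data. For intelligent analysis, the thesis integrates intelligent analysis tools into the platform so that data analysis is not limited by the user's knowledge or background, which broadens the application of data analysis in information services. The platform provides three main services: data filtering, data visualization, and data analysis. Together they support filtering, analyzing, and visualizing Big Data so that data can be presented faster and with greater value.
The thesis also uses disease data from the medical domain as application cases. The analysis results help users and experts extract more valuable information from large volumes of experimental data. In particular, the visualization technique helps to quickly identify imported cases and to find related geographic associations, while the intelligent analysis module can uncover genes related to gene regulation mechanisms and support the study of how gene mutations give rise to drug resistance and of corresponding drug treatment strategies. The platform can serve as a prototype service platform for Big Data. In cooperation with the Chang Gung Molecular Medicine Research Center, it successfully assisted the development of the center's Targeted Sequencing medical system, which demonstrates the effectiveness of the information service platform built in this thesis. Finally, we expect the platform to provide Big Data processing, analysis, and visualization services to users in other research fields, as a reference for in-depth research by specialists.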
With the rapid development of the Internet, mobile devices, and social media, the data available around the world have grown explosively. International Data Corporation (IDC) estimates that the amount of data will increase tenfold from 2013 to 2020; specifically, the total may grow from 4.4 trillion gigabytes to 44 trillion gigabytes. This explosion has made the analysis and application of Big Data an important research issue. If Big Data can be used appropriately, it will bring new value and innovation to daily life and set a new trend in information systems and services.
This thesis incorporates information visualization and intelligent analysis of Big Data into the concept of a service platform. We build an integrated information service platform that provides diverse services for users and promotes the development of cloud service applications. The platform provides three main services: data filtering, data visualization, and intelligent analysis, which help users present data faster and more effectively.
In addition, we demonstrate the usefulness of our platform with case studies on medical disease data. The results show that the platform can effectively help users retrieve valuable information from vast amounts of experimental data. More specifically, the visualization tool helps identify imported disease cases and their geographic areas. Moreover, the intelligent analysis detects new test reagents and antibiotic resistance genes for drug development. The prototype can thus offer data analysis services on Big Data through a user-friendly interface. Finally, we collaborated with the Chang Gung Molecular Medicine Research Center by applying our function modules, the automatic visualization and intelligent analysis tools, to their Targeted Sequencing System. This shows that our platform can support the construction of other information service systems. In the future, we expect the platform to provide an integrated service of data processing, data analysis, and data visualization for Big Data in other research fields.
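As a side note on the IDC figures quoted in the abstract, the tenfold increase between 2013 and 2020 can be restated as an average yearly growth rate. The sketch below is only illustrative arithmetic: it assumes constant compound growth over seven yearly steps, which the abstract does not claim, and the function name is hypothetical.

```python
def implied_annual_growth(start: float, end: float, years: int) -> float:
    """Return the constant yearly growth factor that turns `start` into `end`
    over `years` yearly steps (assumes compound growth, an illustrative assumption)."""
    return (end / start) ** (1.0 / years)


# Figures quoted in the abstract (IDC estimate), in trillions of gigabytes.
START_2013 = 4.4
END_2020 = 44.0
YEARS = 2020 - 2013  # seven yearly steps

factor = implied_annual_growth(START_2013, END_2020, YEARS)
print(f"growth factor per year: {factor:.3f}")
print(f"growth rate per year: {(factor - 1) * 100:.1f}%")
```

Under that assumption, the quoted totals correspond to growth of roughly 39% per year.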
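Abstract (Chinese)
Abstract (English)
Acknowledgements
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Motivation and Objectives
1.2 Research Contributions
1.3 Thesis Organization
Chapter 2 Literature Review
2.1 Introduction to Big Data
2.1.1 Characteristics of Big Data
2.1.2 Big Data Processing Technologies
2.2 Survey of Visualization Analysis Tools
2.2.1 Linear Analysis Tools
2.2.2 Circular Analysis Tools
2.3 Survey of Intelligent Analysis Tools
2.3.1 Gene Set Analysis (GSA) Algorithm
2.3.2 Hierarchical Clustering
Chapter 3 The Integrated Information Service Platform
3.1 Building the Integrated Information Service Platform
3.2 Building the Visualization Function: iCircos
3.3 Building the Data Filtering Function
3.4 Building the Intelligent Analysis Tools
Chapter 4 Platform Results: Analysis and Discussion
4.1 The Integrated Information Service Platform
4.1.1 Platform Interface
4.1.2 Platform Functions
4.2 Application Cases
4.2.1 Analysis of Regional Differences in Infectious Diseases
4.2.2 Cluster Analysis of Sexually Transmitted Infections
4.2.3 Supporting Development of the Targeted Sequencing Medical System
Chapter 5 Conclusions and Future Work
5.1 Conclusions
5.2 Future Work
References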
[1] Apache Hadoop. Accessed: 5/20/2014. [Online]. Available at http://hadoop.apache.org
[2] K. Arakawa, S. Tamaki, N. Kono, N. Kido, K. Ikegami, R. Ogawa, M. Tomita. Genome Projector: zoomable genome map with multiple views. BMC Bioinformatics, (2009), 10(1):31.
[3] R.J. Baerends, W.K. Smits, A. de Jong, L.W. Hamoen, J. Kok, O.P. Kuipers. Genome2D: a visualization tool for the rapid analysis of bacterial transcriptome data. Genome Biology, (2004), 5(5):R37.
[4] S. Brearton, D. Calleja, D. Jermyn. Plan for 2011: Get a job. The Globe and Mail, (2010), 12.
[5] CAPA2. Accessed: 5/21/2014. [Online]. Available at http://cgts.cgu.edu.tw/cpap2/
[6] T.J. Carver, K.M. Rutherford, M. Berriman, M.A. Rajandream, B.G. Barrell, J. Parkhill. ACT: the Artemis Comparison Tool. Bioinformatics, (2005), 21(16):3422-3423.
[7] T. Carver, N. Thomson, A. Bleasby, M. Berriman, J. Parkhill. DNAPlotter: circular and linear interactive genome visualization. Bioinformatics, (2009), 25(1):119-120.
[8] Clustering Software. Accessed: 5/15/2014. [Online]. Available at http://www.stanford.edu/group/sherlocklab/cluster.html
[9] A.C. Darling, B. Mau, F.R. Blattner, N.T. Perna. Mauve: multiple alignment of conserved genomic sequence with rearrangements. Genome Research, (2004), 14(7):1394-1403.
[10] J. Dean, S. Ghemawat. MapReduce: simplified data processing on large clusters. Communications of the ACM, (2008), 51(1):107-113.
[11] DHL. Accessed: 5/15/2014. [Online]. Available at http://www.dhl.com.tw/
[12] B. Efron, R. Tibshirani. On testing the significance of sets of genes. The Annals of Applied Statistics, (2007), 1(1):107-129.
[13] R. Engels, T. Yu, C. Burge, J.P. Mesirov, D. DeCaprio, J.E. Galagan. Combo: a whole genome comparative browser. Bioinformatics, (2006), 22(14):1782-1783.
[14] K.A. Frazer, L. Pachter, A. Poliakov, E.M. Rubin, I. Dubchak. VISTA: computational tools for comparative genomics. Nucleic Acids Research, (2004), 32(suppl 2):W273-279.
[15] J.D. Gans, M. Wolinsky. Genomorama: genome visualization and analysis. BMC Bioinformatics, (2007), 8(1):204.
[16] J.R. Grant, P. Stothard. The CGView Server: a comparative genomics tool for circular genomes. Nucleic Acids Research, (2008), 36(suppl 2):W181-184.
[17] P.F. Hallin, H.H. Stærfeldt, E. Rotenberg, T.T. Binnewies, C.J. Benham, D.W. Ussery. GeneWiz browser: An Interactive Tool for Visualizing Sequenced Chromosomes. Standards in Genomic Sciences, (2009), 1(2):204-215.
[18] P.F. Hallin, T.T. Binnewies, D.W. Ussery. The genome BLASTatlas - a GeneWiz extension for visualization of whole-genome homology. Molecular BioSystems, (2008), 4(5):363-371.
[19] S.I. Hay, C.A. Guerra, A.J. Tatem, A.M. Noor, R.W. Snow. The global distribution and population at risk of malaria: past, present, and future. The Lancet infectious diseases, (2004), 4(6):327-336.
[20] L. Horváthová, L. Šafaříková, M. Basler, I. Hrdý, N.B. Campo, J.W. Shin, K.Y. Huang, P.J. Huang, R. Lin, P. Tang, J. Tachezy. Transcriptomic identification of iron-regulated and iron-independent gene copies within the heavily duplicated Trichomonas vaginalis genome. Genome biology and evolution, (2012), 4(10):1017-1029.
[21] S.C. Johnson. Hierarchical clustering schemes. Psychometrika, (1967), 32(3):241-254.
[22] H.L. Kent. Epidemiology of vaginitis. American journal of obstetrics and gynecology, (1991), 165:1168-1176.
[23] R. Kerkhoven, F.H. Van Enckevort, J. Boekhorst, D. Molenaar, R.J. Siezen. Visualization for genomics: the Microbial Genome Viewer. Bioinformatics, (2004), 20(11):1812-1814.
[24] M. Krzywinski, J. Schein, I. Birol, J. Connors, R. Gascoyne, D. Horsman, S.J. Jones, M.A. Marra. Circos: an information aesthetic for comparative genomics. Genome Research, (2009), 19(9):1639-1645.
[25] D.P. Leader. BugView: a browser for comparing genomes. Bioinformatics, (2004), 20(1):129-130.
[26] J. Leahy. 2012-2013 Global Market Forecast. Airbus, (2013).
[27] V. Mayer-Schönberger, K. Cukier. Big Data: A Revolution That Will Transform How We Live, Work, and Think. Houghton Mifflin Harcourt, (2013).
[28] H. Minkoff, A.N. Grunebaum, R.H. Schwarz, J. Feldman, M. Crombleholme, et al. Risk factors for prematurity and premature rupture of membranes: a prospective study of the vaginal flora in pregnancy. American journal of obstetrics and gynecology, (1984), 150(8):965-972.
[29] O. Miotto, et al. Multiple populations of artemisinin-resistant Plasmodium falciparum in Cambodia. Nature genetics, (2013), 45:648-655.
[30] S. Murray. Interactive Data Visualization for the Web. O'Reilly Media, (2013).
[31] L. Pritchard, J.A. White, P.R. Birch, I.K. Toth. GenomeDiagram: a python package for the visualization of large-scale genomic data. Bioinformatics, (2006), 22(5):616-617.
[32] D.E. Soper, R.C. Bump, W.G. Hurt. Bacterial vaginosis and trichomonas vaginitis are risk factors for cuff cellulitis after abdominal hysterectomy. American journal of obstetrics and gynecology, (1990), 163(3):1016-1021.
[33] A. Sowunmi, B.A. Fateye. Plasmodium falciparum gametocytaemia in Nigerian children: before, during and after treatment with antimalarial drugs. Tropical Medicine & International Health, (2003), 8(9):783-792.
[34] P. Stothard, D.S. Wishart. Circular genome visualization and exploration using CGView. Bioinformatics, (2005), 21(4):537-539.
[35] V. Turner, J.F. Gantz, D. Reinsel, S. Minton. The Digital Universe of Opportunities: Rich Data and the Increasing Value of the Internet of Things. IDC iView: IDC Analyze the Future, (2014).
[36] V.G. Tusher, R. Tibshirani, G. Chu. Significance analysis of microarrays applied to the ionizing radiation response. Proceedings of the National Academy of Sciences, (2001), 98(9):5116-5121.
[37] C. Ware. Information visualization: perception for design. Elsevier, (2012).
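[38] World Health Organization. Accessed: 5/23/2014. [Online]. Available at http://www.who.int/en/
[39] Geographic Information Cloud Service Platform (地理資訊雲端服務平台). Accessed: 5/26/2014. [Online]. Available at http://www.makoci.com/
[40] 胡世忠. The Killer Application of the Cloud Era: Big Data Analytics (雲端時代的殺手級應用:Big Data海量資料分析). CommonWealth Magazine (天下雜誌), (2013).
[41] 張智星. Data Clustering and Pattern Recognition (資料分群與樣式辨認). (1996). Retrieved from the author's website: http://neural.cs.nthu.edu.tw/jang/books/dcpr/
[42] 黃宏瑜. Integrated Energy Information Service Platform: Renewable Energy Industry Forecasting and Dynamic Database Construction (整合型能源資訊服務平台:再生能源產業預測與動態資料庫建置). Master's thesis, Department of Industrial Engineering and Engineering Management, National Tsing Hua University, (2013).
[43] 劉慈明. Min-Sheng General Hospital, Taoyuan: Intelligent Innovative Care Service System for Discharged Patients (桃園敏盛醫院,出院病患智能化創新照護服務系統). MiTAC Information Technology Corp. (神通資訊科技股份有限公司), (2009).
[44] 譚磊 (author), 胡嘉璽 (translator). Big Data Mining: Discovering Secrets in Massive Data That Others Cannot See (大數據挖掘-從巨量資料發現別人看不到的秘密). 上奇時代, (2013).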
(The full text of this thesis has not been released for public access.)