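Author (Chinese): 林子瑜
Author (English): Lin, Tz-Yu
Title (Chinese): 利用深度類神經網路重新思考高維度下近似查詢之存儲
Title (English): Rethinking the Storage for High-Dimensional Query Approximation using Deep Neural Networks
Advisor (Chinese): 吳尚鴻
Advisor (English): Wu, Shan-Hung
Committee members (Chinese): 陳偉松, 吳怡樂, 彭文志, 周志遠
Degree: Master's
Institution: National Tsing Hua University
Department: Department of Computer Science
Student ID: 105062570
Year of publication (ROC calendar): 107
Academic year of graduation: 106
Language: Chinese
Number of pages: 42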
Keywords (Chinese): 線上分析處理、企業決策支援、近似查詢處理、維度災難、深度類神經網路
Keywords (English): online analytical processing, business decision making support, approximate query processing, curse of dimensionality, deep neural network
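Abstract (Chinese, translated):
Online analytical processing (OLAP) is a usage pattern in which a database serves data analysis; it is used mostly in decision support systems and business intelligence. Decision support is an interactive process, so the database system's response time for analytical queries must meet users' expectations. Because the data being analyzed is usually very large, approximate query processing (AQP) was developed to stay within this limited response-time budget: it returns approximate results with small error, which avoids scanning all of the data and keeps response times short.
Current AQP achieves very good results on low-dimensional data. On high-dimensional data, however, the volume of the space grows exponentially, which leads to the curse of dimensionality. To keep accuracy high under the curse of dimensionality, AQP needs far more storage and much longer response times, which defeats its original goal of shortening response time.
To address the curse of dimensionality in high dimensions, this thesis proposes a deep-neural-network-based approach to AQP (the DNN-Based Approach), which exploits the ability of deep neural networks to overcome the curse of dimensionality in order to maintain good storage cost, response time, and accuracy in high dimensions. Building on the DNN-Based Approach, it also proposes Neural Storage, a new type of storage system for answering approximate queries, characterized by a high storage compression ratio and highly accurate answers.
Finally, experiments verify that the DNN-Based Approach can indeed overcome the curse of dimensionality and that, compared with state-of-the-art AQP research, it achieves lower relative error with less storage and very short response times. Neural Storage, for its part, maintains low error as data is inserted and offers a better accuracy-to-storage ratio than the other methods.

Keywords: online analytical processing, business decision support, approximate query processing, curse of dimensionality, deep neural networks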
Abstract (English):
Online analytical processing (OLAP) is a pattern of database access used for data analysis. Typical applications include decision support and business intelligence. Because decision support is an interactive process, the main requirement for such applications is that analytical queries be answered quickly enough. However, OLAP workloads process very large amounts of data, which leads to long processing times in the DBMS. To address this problem, approximate query processing (AQP) returns an approximate answer instead of scanning all of the data, thereby achieving a short response time.
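As a rough illustration of how AQP trades accuracy for speed, the sketch below answers a SUM query from a 1% uniform sample instead of scanning every row. The table, predicate, sampling rate, and function name are hypothetical, and sampling is only one common AQP technique; it is not presented as the method studied in this thesis.

```python
import random


def approximate_sum(rows, predicate, sample_rate, seed=0):
    """Estimate SUM(value) over rows whose key matches `predicate`,
    using a uniform (Bernoulli) sample instead of a full scan."""
    rng = random.Random(seed)
    sample = [row for row in rows if rng.random() < sample_rate]
    # Scale the sample sum by the inverse sampling rate
    # (a Horvitz-Thompson style estimator).
    return sum(value for key, value in sample if predicate(key)) / sample_rate


# Hypothetical table of (key, value) rows.
data_rng = random.Random(42)
rows = [(data_rng.random(), 1.0) for _ in range(100_000)]


def in_range(key):
    return 0.2 <= key < 0.5


exact = sum(value for key, value in rows if in_range(key))  # full scan
approx = approximate_sum(rows, in_range, sample_rate=0.01)   # ~1% of the rows
relative_error = abs(approx - exact) / exact
print(f"exact={exact:.0f} approx={approx:.0f} relative error={relative_error:.3f}")
```

The estimate touches roughly one row in a hundred, at the price of a sampling error that shrinks as the sample grows.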
Current AQP research already achieves high quality on low-dimensional data. However, as the dimensionality of the data grows, the volume of the space increases exponentially and AQP approaches suffer from the curse of dimensionality. This leads to large space requirements and long response times, which conflicts with the goal of AQP.
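To make the exponential growth concrete, the following sketch counts the buckets of a grid histogram and the expected number of sampled rows that fall inside a range query as the number of dimensions increases. It is an illustration only: the grid-histogram and uniform-sample models, the independence of attributes, the function names, and the numbers are assumptions, not taken from the thesis.

```python
def grid_histogram_buckets(bins_per_dim: int, dims: int) -> int:
    """Buckets in a grid histogram with `bins_per_dim` bins on every axis."""
    return bins_per_dim ** dims


def expected_sample_hits(sample_size: int, selectivity_per_dim: float, dims: int) -> float:
    """Expected number of sampled rows inside a range query that keeps
    `selectivity_per_dim` of each axis (independent, uniform attributes assumed)."""
    return sample_size * selectivity_per_dim ** dims


for dims in (1, 2, 5, 10, 20):
    buckets = grid_histogram_buckets(10, dims)
    hits = expected_sample_hits(10_000, 0.5, dims)
    print(f"dims={dims:2d} buckets={buckets:.3e} expected sample hits={hits:.4f}")
```

Under these assumptions a synopsis either grows exponentially (more buckets) or loses accuracy (too few sampled rows inside the query), which is the trade-off the paragraph above describes.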
We propose a DNN-Based Approach, which uses deep neural networks to overcome the curse of dimensionality while simultaneously keeping storage cost low, response time short, and accuracy high. In addition, building on the DNN-Based Approach, we present Neural Storage, a new type of storage for answering approximate queries. Neural Storage is characterized by a high storage compression rate and high answer quality.
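The idea of answering queries from a trained network rather than from stored data can be sketched as follows: a small neural network learns to map the bounds of a range query to the fraction of rows it selects, so that answering a query needs only the network's weights. This is a toy illustration under stated assumptions, not the thesis's model: the one-dimensional skewed table, the one-hidden-layer architecture, the plain SGD loop, and all names and hyperparameters are hypothetical.

```python
import bisect
import math
import random

rng = random.Random(0)

# Hypothetical 1-D table: 20,000 skewed keys in [0, 1), kept sorted.
keys = sorted(rng.random() ** 2 for _ in range(20_000))


def exact_fraction(lo, hi):
    """Exact fraction of rows with lo <= key < hi (the training label)."""
    return (bisect.bisect_left(keys, hi) - bisect.bisect_left(keys, lo)) / len(keys)


def random_query():
    a, b = rng.random(), rng.random()
    return min(a, b), max(a, b)


# One hidden layer of tanh units: 16*2 + 16 + 16 + 1 = 65 parameters.
HIDDEN = 16
w1 = [[rng.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(HIDDEN)]
b1 = [0.0] * HIDDEN
w2 = [rng.uniform(-0.5, 0.5) for _ in range(HIDDEN)]
b2 = 0.0


def predict(lo, hi):
    """Return the network output and the hidden activations for query (lo, hi)."""
    hidden = [math.tanh(w[0] * lo + w[1] * hi + b) for w, b in zip(w1, b1)]
    return sum(v * h for v, h in zip(w2, hidden)) + b2, hidden


def mean_abs_error(queries):
    return sum(abs(predict(lo, hi)[0] - exact_fraction(lo, hi))
               for lo, hi in queries) / len(queries)


test_queries = [random_query() for _ in range(500)]
error_before = mean_abs_error(test_queries)

LEARNING_RATE = 0.02
for _ in range(30_000):
    lo, hi = random_query()
    out, hidden = predict(lo, hi)
    grad_out = out - exact_fraction(lo, hi)  # gradient of 0.5 * squared error
    for j in range(HIDDEN):
        # Backpropagate through tanh before w2[j] is updated.
        grad_hidden = grad_out * w2[j] * (1.0 - hidden[j] ** 2)
        w2[j] -= LEARNING_RATE * grad_out * hidden[j]
        w1[j][0] -= LEARNING_RATE * grad_hidden * lo
        w1[j][1] -= LEARNING_RATE * grad_hidden * hi
        b1[j] -= LEARNING_RATE * grad_hidden
    b2 -= LEARNING_RATE * grad_out

error_after = mean_abs_error(test_queries)
print(f"mean absolute error before={error_before:.3f} after={error_after:.3f}")
```

In this sketch 65 parameters stand in for 20,000 keys; whether the resulting accuracy is acceptable depends on the data and the training budget.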
In our experiments, we demonstrate that the DNN-Based Approach can indeed overcome the curse of dimensionality. Furthermore, compared with state-of-the-art AQP methods, it achieves better accuracy with lower storage cost and shorter response time. Neural Storage, in turn, maintains high accuracy as new data keeps arriving and achieves a better accuracy-to-storage ratio than other methods.

Keywords: online analytical processing, business decision making support, approximate query processing, curse of dimensionality, deep neural network
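Abstract (Chinese)
Abstract
Acknowledgments
Table of Contents
Chapter 1 Introduction
Chapter 2 Background
Section 1 Scenario
Section 2 Notation
Section 3 Related Work
Section 4 Deep Neural Networks
Chapter 3 Main Idea
Section 1 How DNNs Overcome the Curse of Dimensionality
Section 2 A Simple Idea
Section 3 Data-Centric Sampling
Section 4 Multiple Outputs
Chapter 4 Neural Storage
Chapter 5 Evaluation
Section 1 Experimental Setup
Section 2 Overcoming the Curse of Dimensionality
Section 3 Range Selection
Section 4 Range Selection on Partial Columns
Section 5 Group By
Section 6 Neural Storage
Chapter 6 Conclusion
References
Appendix
1. Comparison of Different Aggregate Functions across Dimensions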
(The full text has not been authorized for public release.)