Author (English): Lien, Yi-Han
Thesis Title (English): Interlace-aware Data Management Strategy for Performance Enhancement of IMR-based Hard-disk Drives
Advisor (English): Shih, Wei-Kuan
Oral Examination Committee (English): Chou, Chi-Yuan
Pang, Ai-Chun
Hsu, Fu-Hau
Chang, Yuan-Hao
Hsieh, Jen-Wei
Chen, Yu-Fang
Liang, Yu-Pei
Chen, Yen-Ting
Keywords (English): Interlaced Magnetic Recording; Hard-disk Drive; File system; Self-balanced tree; Bε-tree; Data management strategy
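With the rapid growth of emerging large-scale applications such as cloud services, big data, and machine learning, the demand for high-capacity, cost-effective storage devices has kept increasing in recent years. Hard-disk drives (HDDs) are a representative low-cost storage device, and to achieve higher HDD capacity, many researchers have increased the amount of data stored per unit area through different track layout designs. Among these, Interlaced Magnetic Recording (IMR) divides tracks into top and bottom tracks and partially overlaps two top tracks on each bottom track to increase data density. However, this design means that updating a bottom track overwrites its two adjacent top tracks, causing data loss. Therefore, before a bottom track is updated, its two adjacent top tracks must be backed up, and their data must be written back after the bottom-track update completes. Consequently, even when only one bottom track needs to be updated, two additional top tracks must be written. This phenomenon, known as write amplification, degrades system performance. Many studies have proposed methods to mitigate write amplification, but existing methods are all device-level solutions that do not consider the characteristics of the stored data, so their improvement is limited. This dissertation therefore allocates space according to data characteristics to break through the performance limits of existing methods. The dissertation consists of two parts. The first part takes a file-system perspective: observing that different file types have different update frequencies, it proposes a space allocation method that reduces write amplification, and, by being aware of file-system characteristics, it places data that tend to be accessed together in nearby locations, effectively reducing seek time.

After proposing a solution for file data space allocation, we turn to space allocation for index structures. In large-scale file systems and databases, index structures are widely used to speed up data lookup. Among them, the Bε-tree, an extension of the B-tree and B+-tree, has attracted growing attention in recent years. Like the B+-tree, the Bε-tree stores key-value pairs only in the leaf nodes and only keys in the internal nodes, and it adds a special buffer design that reduces the cost of frequent tree rebalancing. The Bε-tree completes an access request by encoding it (an insertion, deletion, or lookup) as a message and placing the message in the root node's buffer. When the root node's buffer reaches its capacity, messages are flushed down into the buffers of child nodes; only when they reach a leaf node are the messages unpacked and the corresponding requests executed. In other words, tree rebalancing can be triggered only when messages are flushed into leaf nodes. However, because messages in a buffer are queued only by arrival time, deciding which child node to flush to requires traversing the entire buffer to count the messages destined for each child. This not only incurs a significant read cost but may also require updating an excessive amount of space when the parent node's buffer is updated, because the messages are scattered. We also observe that buffers are frequently updated data, yet without careful space allocation they may be placed on IMR bottom tracks, which inevitably causes severe write amplification. Therefore, the second part of this dissertation redesigns how messages are managed within buffers and allocates suitable space to each tree node according to its characteristics, thereby reducing write amplification and improving read and write performance.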
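The update penalty described above can be made concrete with a small cost model. The following is a minimal Python sketch, not the dissertation's implementation: it assumes a hypothetical layout in which even-numbered tracks are bottom tracks and odd-numbered tracks are top tracks, and the file names and update counts are made up for illustration. It counts how many tracks are physically written when frequently updated (hot) data sit on a bottom track versus a top track.

```python
# Minimal IMR update-cost model (an illustrative sketch under assumptions).
# Assumption: tracks alternate bottom/top, even indices are bottom tracks,
# and bottom track i is partially overlapped by top tracks i - 1 and i + 1.


def is_bottom(track: int) -> bool:
    """Even-numbered tracks are bottom tracks in this hypothetical layout."""
    return track % 2 == 0


def update_cost(track: int, num_tracks: int, valid: set[int]) -> tuple[int, int]:
    """Return (tracks_read, tracks_written) for one in-place update of `track`."""
    if not is_bottom(track):
        return 0, 1  # a top track can be written directly
    # Read-modify-write: adjacent top tracks holding valid data are backed up
    # (read), the bottom track is written, and the top tracks are rewritten.
    neighbours = [t for t in (track - 1, track + 1)
                  if 0 <= t < num_tracks and t in valid]
    return len(neighbours), 1 + len(neighbours)


def total_tracks_written(placement: dict[str, int], updates: dict[str, int],
                         num_tracks: int, valid: set[int]) -> int:
    """Physical track writes needed to apply updates[name] updates to each file."""
    return sum(count * update_cost(placement[name], num_tracks, valid)[1]
               for name, count in updates.items())


if __name__ == "__main__":
    NUM_TRACKS = 7                    # tracks 0..6; tracks 0, 2, 4, 6 are bottom tracks
    valid = set(range(NUM_TRACKS))    # assume every track holds valid data
    updates = {"hot": 10, "cold": 1}  # hypothetical update frequencies

    unaware = {"hot": 2, "cold": 3}   # hot data land on a bottom track
    aware = {"hot": 3, "cold": 2}     # hot data on a top track, cold on a bottom track

    for label, placement in (("unaware", unaware), ("aware", aware)):
        written = total_tracks_written(placement, updates, NUM_TRACKS, valid)
        print(label, written, round(written / sum(updates.values()), 2))
```

Run as a script, it prints `unaware 31 2.82` and `aware 13 1.18`: with every track valid, keeping hot data on a bottom track writes 31 tracks for 11 logical updates, while swapping the two placements writes 13. The model ignores seek time and out-of-place updates; it only illustrates why placing data by update frequency matters on IMR.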
Interlaced Magnetic Recording (IMR) is an emerging recording technology for hard-disk drives (HDDs) that provides larger storage capacity at a lower cost. By partially overlapping (interlacing) each bottom track with two adjacent top tracks, IMR-based HDDs increase data density but incur hardware write constraints. To update a bottom track, the data on its two adjacent top tracks must be read and rewritten so that their valid data are not lost, which adds the overhead of read-modify-write (RMW) operations. Researchers have therefore proposed various data management schemes in recent years to mitigate this overhead and improve write performance. However, these designs do not take into account the data characteristics of the file system, the operating-system layer responsible for storing data on and retrieving data from HDDs. Consequently, their write performance improvement is limited because they are unaware of the spatial locality and hotness of data. This dissertation is divided into two parts. The first part proposes a file-system-aware data management scheme, called FSIMR, to improve system write performance. Observing that data in the same directory tend to have high spatial locality and are mostly updated at the same time, FSIMR logically partitions the IMR-based HDD into fixed-size zones and arranges data belonging to the same directory in one zone, reducing the time spent seeking to the data to be updated (seek time). Furthermore, cold data within a zone are arranged on bottom tracks and updated out of place to eliminate write amplification. After proposing a solution for file data space allocation, the dissertation considers space allocation for index structures. In large-scale file systems and databases, index structures are widely used to speed up data retrieval.
Among these, the Bε-tree, an extension of the B-tree and B+-tree, has gained attention for its use of specialized buffers to reduce the cost of frequent tree rebalancing. Like the B+-tree, the Bε-tree stores key-value pairs in the leaf nodes and only keys in the internal nodes. More precisely, each internal node is divided into a pivot area (for storing keys) and a buffer area. The Bε-tree encodes access requests into messages and completes each request by adding its message to the root node's buffer. Once the root node's buffer reaches its capacity, messages are flushed to one of its child nodes, level by level, until they reach a leaf node, where they are applied to execute the corresponding requests. In other words, tree rebalancing occurs only when messages reach the leaf nodes, which can greatly improve write performance. However, messages in a buffer are ordered only by their arrival time. As a result, the entire buffer must be traversed during a flush to determine which child node to flush to. This design not only incurs significant read overhead but also causes unnecessary buffer updates when the messages in the parent buffer (the one flushing messages) are scattered. Moreover, the buffer is observed to be the most frequently updated data, so it can cause significant write amplification if it is not placed strategically, e.g., if it ends up on the bottom tracks of the IMR-based HDD. Therefore, the second part of this dissertation proposes a Bε-tree-aware data management strategy for IMR-based HDDs that mitigates write amplification and improves read and write performance. This approach redesigns message management within the buffers and allocates space according to the update characteristics of different types of nodes.
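The flush-selection problem described above can be illustrated in a few lines. The following Python sketch compares a buffer that keeps messages only in arrival order with a hypothetical buffer that groups messages by destination child as they arrive. It is an in-memory illustration under stated assumptions (the `Message` fields, the pivot convention, and the "flush the child with the most pending messages" policy are illustrative choices), not the dissertation's Partitioned Buffer Scheme, whose details are not given in this abstract.

```python
from bisect import bisect_right
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class Message:
    """An encoded request; this field layout is hypothetical."""
    key: int
    op: str              # e.g. "insert" or "delete"
    value: object = None


def child_index(pivots: list[int], key: int) -> int:
    """Child i covers keys in [pivots[i-1], pivots[i]); pivots must be sorted."""
    return bisect_right(pivots, key)


class TimeOrderedBuffer:
    """Baseline: messages are kept only in arrival order."""

    def __init__(self, pivots: list[int]):
        self.pivots = pivots
        self.messages: list[Message] = []

    def add(self, msg: Message) -> None:
        self.messages.append(msg)

    def pick_flush_child(self) -> tuple[int, int]:
        """Scan every message to count per-child messages (buffer must be non-empty).
        Returns (child with the most messages, number of entries examined)."""
        counts = Counter(child_index(self.pivots, m.key) for m in self.messages)
        child, _ = counts.most_common(1)[0]
        return child, len(self.messages)

    def flush(self) -> tuple[int, list[Message]]:
        """Remove and return the chosen child's messages, wherever they sit."""
        child, _ = self.pick_flush_child()
        moved = [m for m in self.messages if child_index(self.pivots, m.key) == child]
        self.messages = [m for m in self.messages
                         if child_index(self.pivots, m.key) != child]
        return child, moved


class PartitionedBuffer:
    """Hypothetical variant: messages are grouped by destination child on arrival."""

    def __init__(self, pivots: list[int]):
        self.pivots = pivots
        self.parts: dict[int, list[Message]] = defaultdict(list)

    def add(self, msg: Message) -> None:
        self.parts[child_index(self.pivots, msg.key)].append(msg)

    def pick_flush_child(self) -> tuple[int, int]:
        """Compare one length per non-empty partition (buffer must be non-empty)."""
        child = max(self.parts, key=lambda c: len(self.parts[c]))
        return child, len(self.parts)

    def flush(self) -> tuple[int, list[Message]]:
        """Remove and return the chosen child's partition as a single group."""
        child, _ = self.pick_flush_child()
        return child, self.parts.pop(child)


if __name__ == "__main__":
    pivots = [100, 200, 300]            # children: <100, 100..199, 200..299, >=300
    t_buf, p_buf = TimeOrderedBuffer(pivots), PartitionedBuffer(pivots)
    for key in [5, 250, 120, 260, 270, 15, 280]:
        msg = Message(key, "insert", f"v{key}")
        t_buf.add(msg)
        p_buf.add(msg)
    print(t_buf.pick_flush_child())     # (2, 7): child 2 after examining 7 messages
    print(p_buf.pick_flush_child())     # (2, 3): child 2 after examining 3 partitions
```

Both buffers choose the same child, but the arrival-ordered buffer examines every message to do so, while the partitioned one compares only per-child lengths. Because each partition keeps one child's messages together, a flush also removes a single group rather than entries scattered across the buffer, which is the property that matters once the buffer lives on disk.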
Abstract (Chinese)
Abstract
Acknowledgements (Chinese)
Contents
List of Figures
List of Tables
1 Introduction
2 Background
2.1 IMR-based Hard-disk Drives
2.2 File System
2.3 Bε-tree Index Scheme
3 Interlace-aware User Data Management for IMR-based Hard-disk Drives
3.1 Observation and Motivation
3.2 File-system-aware Data Management: FSIMR
3.2.1 Overview
3.2.2 Directory-based Zone Allocation
3.2.3 Zone Management
3.2.4 Directory-based Garbage Collection
3.3 Overhead Analysis
3.3.1 Space Utilization
3.3.2 Time Cost
3.4 Performance Evaluation
3.4.1 Experimental Setup
3.4.2 Experimental Results
4 Interlace-aware Index Data Management for IMR-based Hard-disk Drives
4.1 Observation and Motivation
4.2 Bε-tree-aware Data Management
4.2.1 Overview
4.2.2 Partitioned Buffer Scheme
4.2.3 Zone Management
4.3 Performance Evaluation
4.3.1 Experimental Setup
4.3.2 Experimental Results
5 Conclusion
Bibliography
6 Publication List
