Author: Chang, Chan-Jung
Thesis title (Chinese): ECS2: 利用直接且平行的I/O 路徑提升使用 GPU 加速糾刪碼的儲存系統效能
Thesis title (English): ECS2: A Fast Erasure Coding Library for GPU-Accelerated Storage Systems With Parallel & Direct IO
Advisor: Chou, Jerry
Oral defense committee: Lee, Che-Rung; Lai, Kuan-Chou
Keywords: Storage system; Erasure Code; Reliability; Performance; Parallel I/O
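... cloud storage and HDFS. However, the cost of adopting erasure coding is higher
computational complexity. Many studies have shown that erasure coding computation
can be greatly accelerated with GPUs, which in turn shifts the performance
bottleneck to data transfer between storage devices and the GPU. In this work, we
design and implement ECS2, a fast, GPU-accelerated erasure coding library. Users
can ... interface. Using the latest GPUDirect technology provided by Nvidia GPUs,
the library lets the I/O path bypass the CPU and main memory, reducing the time
spent on both computation and I/O. Using synthetic I/O traces based on real storage
system traces, we verify that GPUDirect technology reduces I/O latency by 10% to
20%, and that overall throughput can be improved by up to 70%.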
As data volume keeps growing rapidly, there is an urgent need for large,
reliable, and cost-effective storage systems. Erasure coding has drawn increasing
attention because it ensures data reliability with higher storage efficiency,
and it has been widely adopted in many distributed and large-scale storage
systems, such as Azure cloud storage and HDFS. However, the storage efficiency
of erasure coding comes at the price of higher computational complexity. While many
studies have shown that the coding computations can be significantly accelerated using
GPUs, the overhead of data transfer between storage devices and GPUs becomes a
new performance bottleneck. In this work, we design and implement ECS2,
a fast erasure coding library for GPU-accelerated storage that lets users enhance their
data protection with transparent IO performance and a storage-system-like programming
interface. By taking advantage of the latest GPUDirect technology supported
on Nvidia GPUs, our library removes the CPU and host memory copies from the
IO path, so that both the computing and IO overhead of coding can be minimized.
Using a synthetic IO workload based on a real storage system trace, we show that
GPUDirect technology reduces IO latency by 10% to 20%, and that the overall
IO throughput of a storage system can be improved by up to 70%.
1 Introduction
2 Related Works
3 Approach
3.1 ECS2 System Architecture
3.2 Performance Analysis
4 Implementations
4.1 Direct Memory Copy Technique
4.2 GPU Accelerated Erasure Coding
4.3 ECS2 Integration
5 Experiments
5.1 Testbed
5.2 Performance Analysis on IO Behavior
5.3 Performance Analysis on Erasure Code Configuration
5.4 Performance Analysis on System Architecture
5.5 Performance Analysis on Real Workload
6 Conclusions
References
