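Author (Chinese): 羅偉恆
Author (English): Lo, Wei Hen
Thesis title (Chinese): 在電路以及架構層級對良率以及效能上優化的設計
Thesis title (English): Yield Improvement and High-Performance Design in Circuit Level and Architecture Level
Advisor (Chinese): 黃婷婷
Advisor (English): Hwang, TingTing
Committee members (Chinese): 金仲達
黃俊達
江蕙如
王廷基
王俊堯
Committee members (English): King, Chung-Ta
Huang, Juinn-Dar
Jiang, Iris Hui-Ru
Wang, Ting-Chi
Wang, Chun-Yao
Degree: Doctoral
University: National Tsing Hua University
Department: Department of Computer Science
Student ID: 100062813
Year of publication (ROC calendar): 104
Academic year of graduation: 104
Language: English
Number of pages: 106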
Chinese keywords: 測試架構; 冗餘矽穿通道; 資料搬移
English keywords: DFT Architecture; Scan Chain; Redundant TSV; Data Migration
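As process technology advances, yield and performance have become increasingly important. To improve system yield and performance, we study three problems at the circuit level and the system level: improving the fault detection rate of scan chains, fault tolerance in 3D-ICs, and memory interference in data-parallel applications.
In modern process technology, faulty scan chains can account for as much as 50% of yield loss. We observe that the effectiveness of scan chain diagnosis techniques depends not only on the logic dependency of the circuit but also on the controllability between scan cells. In this dissertation, we therefore propose a method that uses the circuit structure to partition and reorder scan chains in order to improve the detection rate.
On the other hand, the delay caused by global interconnects has become a bottleneck for performance and power consumption, and the emergence of 3D-ICs offers many performance and power improvements. However, during chip fabrication and stacking, through-silicon vias (TSVs) may be damaged by problems such as uneven surface roughness and impurities. Many studies further point out that faulty TSVs tend to appear in clusters. For clustered faulty TSVs, we design a ring-based redundant TSV architecture that is simple yet maintains yield.
At the system level, many data-parallel applications compute on shared memory regions, and their parallel threads experience bank conflicts in memory. We design a system architecture that dynamically migrates data blocks, which effectively reduces bank conflicts in memory and improves system performance.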
With the advances of VLSI design technology, yield and performance have become increasingly important. To improve the yield of circuits and the performance of systems, we target three problems at the circuit level and the system level: the scan chain diagnosis problem, the fault tolerance problem in 3D-ICs, and the memory interference problem for data-parallel multi-threaded applications. First, to increase the diagnosis resolution of scan chains, a scan chain partitioning algorithm and a scan chain reordering algorithm are proposed. In modern technology, scan design is used to detect combinational failures in a circuit and to improve its testability. However, defects in the scan chains themselves are also a critical source of chip yield loss. Since scan chains can take up a large portion of the chip area, faulty scan chains can be responsible for up to 50% of yield loss [1]. We observe that the effectiveness of scan chain diagnosis methods depends not only on logic dependency but also on the controllability between scan flip-flops. Hence, in this dissertation, we propose a scan chain partitioning algorithm to increase the detectable values of scan cells in the faulty scan chain, and a scan chain reordering algorithm to reduce the range of suspect faulty scan cells and to minimize the routing overhead. The experimental results show that our method can reduce the number of suspect scan cells from 378-31 to at most 3 for most cases of the ITC’99 benchmarks.
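To make the notion of a "range of suspect faulty scan cells" concrete, the following is a minimal sketch under a simplified single stuck-at-0 model. It is not the dissertation's partitioning or reordering algorithm. The function names, the cell numbering (cell 0 nearest scan-in), and the assumption that the expected captured values are known are all illustrative choices.

```python
from typing import List, Tuple


def simulate_unload(captured: List[int], fault_pos: int) -> List[int]:
    """Scan-out values for a chain whose cell `fault_pos` is stuck at 0.

    Assumption: cell 0 is nearest scan-in, so values held in cells
    0..fault_pos must shift through the faulty cell and are read as 0.
    """
    return [0 if i <= fault_pos else v for i, v in enumerate(captured)]


def suspect_range(expected: List[List[int]],
                  observed: List[List[int]]) -> Tuple[int, int]:
    """Return the inclusive (low, high) range of cells that may hold the fault."""
    n = len(expected[0])
    low, high = 0, n - 1
    for exp, obs in zip(expected, observed):
        for i, (e, o) in enumerate(zip(exp, obs)):
            if o == 1:
                # A 1 reached scan-out, so the fault lies before cell i.
                high = min(high, i - 1)
            elif e == 1:
                # An expected 1 was masked, so the fault is at cell i or later.
                low = max(low, i)
    return low, high


# Hypothetical captured values for an 8-cell chain with a fault at cell 4.
pattern_a = [1, 0, 1, 0, 0, 0, 1, 1]
pattern_b = [0, 1, 0, 1, 1, 1, 0, 0]
obs_a = simulate_unload(pattern_a, 4)
obs_b = simulate_unload(pattern_b, 4)

print(suspect_range([pattern_a], [obs_a]))                        # (2, 5)
print(suspect_range([pattern_a, pattern_b], [obs_a, obs_b]))      # (4, 4)
```

In this toy model, every additional cell whose captured value is known and distinguishable from the stuck value tightens one of the two bounds, which is one way to read why increasing the detectable values of scan cells narrows the suspect range.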

Second, a ring-based redundant TSV architecture is proposed to improve the yield of 3D-ICs. The fabrication and bonding of TSVs may fail because of many factors, such as the winding level of the thinned wafers, the surface roughness and cleanness of the silicon dies, and the bonding technology. In addition, faulty TSVs tend to cluster together because of imperfect bonding technology. To resolve this problem, a router-based redundant TSV architecture was previously proposed; it enables faulty TSVs to be repaired by redundant TSVs that are farther apart. In this dissertation, a new hardware-efficient redundant TSV architecture for clustered faults is proposed. Simulation results show that, for a given number of TSVs (8 × 8), a given TSV failure rate (1%), and a careful selection of grouping ratios, our design achieves a 58.9% area reduction of MUXes per signal, a 54.6% total area reduction per signal, and a 50.54% total wire length reduction, while the yield of our ring-based redundant TSV architecture is still maintained at 98.47% to 99.00% as compared with the router-based design [2]. The minimum shifting length of our ring-based redundant TSV architecture is at most 1, which guarantees the minimum timing overhead of each signal. The maximum extra shifting latency of our ring-based design is reduced by 74.7% compared to that of the router-based design when the number of faulty TSVs is set to 8.
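As a rough illustration of why redundancy matters at the quoted configuration (8 × 8 TSVs, 1% failure rate), the sketch below computes yield under two strong assumptions that are not taken from the dissertation: TSV failures are independent (the dissertation targets clustered faults), and any group with no more faulty TSVs than spares is fully repairable. The group size of 8 signals with 1 spare is a hypothetical choice, and the function names are illustrative.

```python
from math import comb


def yield_no_spare(num_tsvs: int, p_fail: float) -> float:
    """Probability that every TSV works when there is no redundancy."""
    return (1.0 - p_fail) ** num_tsvs


def group_yield(signals: int, spares: int, p_fail: float) -> float:
    """Probability that a group has at most `spares` faulty TSVs.

    Assumption: independent failures and ideal repair within the group.
    """
    total = signals + spares
    return sum(
        comb(total, k) * p_fail ** k * (1.0 - p_fail) ** (total - k)
        for k in range(spares + 1)
    )


def chip_yield(num_signals: int, group_size: int, spares: int, p_fail: float) -> float:
    """Yield when signals are split into equal groups (num_signals divisible by group_size)."""
    groups = num_signals // group_size
    return group_yield(group_size, spares, p_fail) ** groups


baseline = yield_no_spare(64, 0.01)      # roughly 0.53
with_spares = chip_yield(64, 8, 1, 0.01)  # roughly 0.97
print(f"no redundancy: {baseline:.4f}, one spare per 8 signals: {with_spares:.4f}")
```

Under clustered faults the independence assumption breaks down, since several failures can land in the same group. That is the situation the grouping ratios and the ring-based architecture described above are meant to address.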

Finally, a dynamic data migration method is proposed to eliminate memory interference of data-parallel multi-threaded applications in multicore systems. Data parallelism is a common parallel programming model that performs operations on a data set which is often regularly structured in an array. In other words, many threads may access the same shared data set. Thus, when the number of threads increases, the probability of memory interference also increases. To address this issue, we provide a new software/hardware cooperative dynamic data migration method that exploits the update-and-reuse property. Experimental evaluation on a 16-core x86 system with 8 memory banks shows that our method improves system performance by 13.2% compared to the traditional OS page coloring method [3] and by 9% compared to the parallel application memory scheduling method [4].
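The general idea of reducing bank-level interference by relocating pages can be sketched with a toy model. Everything here is an assumption for illustration: the class and function names, the page-interleaved default mapping, and the policy of moving a page to a per-thread bank on a write. The dissertation's actual page allocation policy, migrate-on-write mechanism, and memory controller design are not reproduced.

```python
from typing import Dict, List, Set, Tuple


class MigratingMapper:
    """Toy page-to-bank mapping with a table that overrides the default interleaving."""

    def __init__(self, num_banks: int) -> None:
        self.num_banks = num_banks
        self.table: Dict[int, int] = {}  # page -> bank after migration

    def bank(self, page: int) -> int:
        # Default assumption: consecutive pages are interleaved across banks.
        return self.table.get(page, page % self.num_banks)

    def write(self, thread: int, page: int) -> bool:
        """Move the page to the writer's bank; return True if it migrated."""
        target = thread % self.num_banks  # hypothetical per-thread bank
        if self.bank(page) == target:
            return False
        self.table[page] = target
        return True


def contended_banks(mapper: MigratingMapper, accesses: List[Tuple[int, int]]) -> int:
    """Count banks that are touched by more than one thread."""
    users: Dict[int, Set[int]] = {}
    for thread, page in accesses:
        users.setdefault(mapper.bank(page), set()).add(thread)
    return sum(1 for threads in users.values() if len(threads) > 1)


# Four threads each work on a contiguous block of four pages of a shared array.
accesses = [(t, 4 * t + k) for t in range(4) for k in range(4)]
mapper = MigratingMapper(num_banks=4)

before = contended_banks(mapper, accesses)          # every bank is shared
migrations = sum(mapper.write(t, p) for t, p in accesses)
after = contended_banks(mapper, accesses)           # each thread has its own bank
print(before, migrations, after)                    # 4 12 0
```

The sketch ignores migration cost, table capacity, and read-shared pages, all of which a real design has to weigh against the reduction in conflicts.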
1 Introduction
1.1 Scope and Objectives
1.2 Overview of Dissertation
2 Utilizing Circuit Structure for Scan Chain Diagnosis
2.1 Related Work
2.2 Motivation
2.2.1 Motivation for Partitioning Scan Chain
2.2.2 Motivation for Ordering Scan Flip-Flops
2.2.3 Overview of Diagnosis Design Flow
2.3 Scan Chain Partitioning Algorithm
2.3.1 Construction of Mutual Controllability Graph
2.3.2 Partition of Mutual Controllability Graph
2.4 Scan Chain Reordering Algorithm
2.4.1 Bipartite Matching for Initial Solution
2.4.2 Refinement of Chaining Ordering
2.5 Experimental Results
2.5.1 Experimental Environment and Design Flow
2.5.2 Reordering without Partitioning
2.5.3 Partitioning Effect
2.5.4 Comparison of Maximum Range of Suspect Faulty Scan Cells
2.5.5 Analysis Result with Diagnosis Tool Tessent
2.5.6 Comparison of Transition Fault Coverage and Chip Performance
3 Architecture of Ring-based Redundant TSV for Clustered Faults
3.1 Related Work
3.2 Motivation
3.3 Proposed TSV Redundancy Architecture
3.3.1 Ring-based TSV Redundancy Architecture Design
3.3.2 Analysis of Nonrepairable Defect Patterns and Recovery Rate
3.3.3 Repairing Algorithm
3.4 Other Design Issues
3.4.1 Placement and Routing of Different TSV Redundancy Architectures
3.4.2 Bidirectional TSV
3.5 Experimental Results
3.5.1 Comparison of Hardware Components
3.5.2 Comparison of Total Area and Wire Length After Placement and Routing
3.5.3 Recovery Rate Analysis
3.5.4 Shifting Length and Latency Analysis
4 Dynamic Data Migration to Eliminate Bank-level Interference for Data Parallel Applications in Multicore Systems
4.1 Related Work
4.2 Motivation
4.2.1 Isolated Load of Memory
4.2.2 Motivation of Our Dynamic Data Migration Method
4.3 Methodology
4.3.1 Overview of System Flow
4.3.2 Updated-and-Reused Aware Page Allocation Policy in OS
4.3.3 Migrate-On-Write
4.3.4 Memory Controller for Bank-level Interference Elimination
4.4 Experimental Results
4.4.1 Simulation Environment
4.4.2 Comparison Results
4.4.3 Effect of the Number of Entries in Mapping Table
5 Conclusions
[1] S. Kasapi, J. Liao, B. Cory, “Laser Voltage Imaging (LVI) for ATPG Scan Chain Diagnosis on 40nm CMOS,” LSI Testing Symposium, Osaka, Japan, pp. 1422-1426, November 2010.
[2] L. Jiang, Q. Xu, B. Eklow, “On effective TSV repair for 3D-stacked ICs,” DATE’12, pp. 793-798, March 2012.
[3] L. Liu, Z. Cui, M. Xing, Y. Bao, M. Chen, C. Wu, “A software memory partition approach for eliminating bank-level interference in multicore,” PACT’12, pp. 367-376, September 2012.
[4] E. Ebrahimi, R. Miftakhutdinov, C. Fallin, C. J. Lee, O. Mutlu, and Y. N. Patt, “Parallel Application Memory Scheduling,” MICRO’11, pp. 362-373, December 2011.
[5] S. Kundu, “Diagnosing Scan Chain Faults,” IEEE TVLSI, Vol. 2, No. 4, pp. 512-516, December 1994.
[6] K. Stanley, “High Accuracy Flush and Scan Software Diagnostic,” Proc. IEEE YOT 2000, pp. 56-62, Oct. 2000.
[7] R. Guo, S. Venkataraman, “A Technique For Fault Diagnosis of Defects in Scan Chains,” ITC, pp. 268-277, 2001.
[8] Y. Huang, W.-T. Cheng, S.M. Reddy, C.-J. Hsieh, Y.-T. Hung, “Statistical Diagnosis for Intermittent Scan Chain Hold Time Fault,” ITC, pp. 319-328, 2003.
[9] J. S. Yang, S. Huang, “Quick Scan Chain Diagnosis Using Signal Profiling,” ICCD, 2004
[10] Y. Huang, W.-T. Cheng, and G. Crowell, “Using Fault Model Relaxation to Diagnose Real Scan Chain Defects,” ASP-DAC, pp. 1176-1179, 2005.
[11] A. Crouch, “Debugging and Diagnosing Scan Chains,” EDFAS, pp. 16-24, Feb. 2005.
[12] J. Li, “Diagnosis of Single Stuck-at Faults and Multiple Timing Faults in Scan Chains,” IEEE TVLSI, Vol. 13, No. 6, pp. 708-718, June 2005.
[13] J. Li, “Diagnosis of Multiple Hold-Time and Setup-Time Faults in Scan Chains,” IEEE TC, Vol. 54, No. 11, pp. 1467-1472, Nov. 2005.
[14] R. Guo, S. Venkataraman, “An algorithmic technique for diagnosis of faulty scan chains”, IEEE Trans. on CAD, pp. 1861-1868, Sept. 2006
[15] Y. Huang, W.-T. Cheng, N. Tamarapalli, J. Rajski, R. Klingenberg, W. Hsu and Y.-S. Chen, “Diagnosis with Limited Failure Information,” ITC, paper 22.2, 2006.
[16] R. Guo, Y. Huang, W.-T. Cheng, “A complete test set to diagnose scan chain failures,” ITC, pp. 1-10, Oct. 2007.
[17] J. Hirase, N. Shindou and K. Akahori, “Scan Chain Diagnosis using IDDQ Current Measurement”, Proc. ATS, pp. 153-157, 1999.
[18] R. Agarwal, W. Zhang, P. Limaye, R. Labie, B. Dimcic, A. Phommahaxay, and P. Soussan, “Cu/Sn Microbumps Interconnect for 3D TSV Chip Stacking”, Proceedings of Electronic Components and Technology Conference (ECTC’10), pp. 858-863, 2010.
[19] N. Lin, J. Miao, P. Dixit, “Void formation over limiting current density and impurity analysis of TSV fabricated by constant-current pulse-reverse modulation,” In Microelectronics Reliability, 2013.
[20] J.U. Knickerbocker, et al., “Three-dimensional silicon integration,” IBM Journal of Research and Development, 52(6):553-569, November 2008.
[21] J. Schafer, F. Policastri, R. Mcnulty, “Partner SRLs for Improved Shift Register Diagnostics,” Proc. VTS, pp. 198-201, 1992.
[22] S. Edirisooriya, G. Edirisooriya, “Diagnosis of Scan Path Failures,” Proc. VTS, pp. 250-255, 1995.
[23] K. De, A. Gunda, “Failure Analysis for Full-Scan Circuits”, Proc. ITC, pp. 636-645, Mar. 1995.
[24] S. Narayanan, A. Das, “An Efficient Scheme to Diagnose Scan Chains,” Proc. ITC, pp. 704-713, 1997.
[25] Y. Wu, “Diagnosis of Scan Chain Failures,” Proc. Int’l Symp. on Defect and Fault Tolerance in VLSI Systems, pp. 217-222, 1998.
[26] P. Song, F. Motika, D. Knebel, R. Rizzolo, M. Kusko, J. Lee and M. McManus, “Diagnostic techniques for the IBM S/390 600MHz G5 Microprocessor”, Proc. ITC, pp. 1073-1082, 1999.
[27] C. L. Kong, M.R. Islam, “Diagnosis of Multiple Scan Chain Faults,” International Symposium for Testing and Failure Analysis, pp. 510-516, November 2005.
[28] F. Motika, P. Nigh, P. Song, “Stuck-at fault scan chain diagnostic method,” US Pat 7010735, March 7, 2006.
[29] A. Anderson, T. M. Burdine, D. O. Forlenza, O. P. Forlenza, W. J. Hurley, P. T. Tran, “Method, apparatus, and computer program product for implementing deterministic based broken scan chain diagnostics,” US Pat 20050229057, July 1, 2008.
[30] J. Ye, Y. Huang, Y. Hu, W. Cheng, R. Guo, L. Lai, et al., “Diagnosis and layout aware (DLA) scan chain stitching,” in Proc. IEEE ITC, Sep. 2013, pp. 1-10.
[31] L. Goldstein, “Controllability/observability analysis of digital circuits,” ISCAS, pp. 685-693, 1979.
[32] B. W. Kernighan, S. Lin, “An efficient heuristic procedure for partitioning graphs,” Bell System Technical Journal, vol. 49, pp. 291-307, 1970.
[33] Tessent Diagnosis, Mentor Graphics, Wilsonville, OR, USA, 2012.
[34] “http://www.cerc.utexas.edu/itc99/benchmarks/bench.html”, ITC99 benchmarks, 2009.
[35] “Design Compiler”, Synopsys, 2010.
[36] “SoC Encounter”, Cadence, 2012.
[37] R. Patti, “Three-Dimensional Integrated Circuits and the Future of System-on-Chip Designs,” Proc. of the IEEE, vol. 94, no. 6, June 2006.
[38] A. W. Topol, J. D. C. La Tulipe, L. Shi, et al., “Three Dimensional Integrated Circuits,” IBM Journal of Research and Development, vol. 50, no. 4/5, pp. 491-506, July/September 2006.
[39] L. Jiang, Y. Liu, L. Duan, Y. Xie, and Q. Xu, “Modeling TSV open defects in 3D-stacked DRAM,” ITC’10, pp. 1-9, November 2010.
[40] N. Lin, J. Miao, P. Dixit, “Void formation over limiting current density and impurity analysis of TSV fabricated by constant-current pulse-reverse modulation,” Microelectronics Reliability, vol. 53, pp. 1943-1953, 2013.
[41] K. H. Lu, S. Ryu, Q. Zhao, X. Zhang, J. Im, R. Huang, and P. S. Ho, “Thermal Stress Induced Delamination of Through Silicon Vias in 3-D Interconnects,” ECTC’10, pp. 40-45, June 2010.
[42] U. Kang, et al., “8 Gb 3-D DDR3 DRAM using through-silicon-via technology,” IEEE Journal of Solid-State Circuits, 45(1):111-119, Jan. 2010.
[43] A. Hsieh, T. Hwang, M. Chan, M. Tsai, C. Tseng, H. Li, “TSV Redundancy: Architecture and Design Issues in 3D IC,” DATE’10, pp. 166-171, March 2010.
[44] I. Loi, et al., “A low-overhead fault tolerance scheme for TSV-based 3D network on chip links,” in Proc. Intl Conf. on Computer-Aided Design, pp. 598-602, 2008.
[45] I. Koren and Z. Koren, “Defect tolerance in VLSI circuits: techniques and yield analysis,” Proc. of the IEEE, 86(9):1819-1838, 1998.
[46] B. T. Murphy, “Cost-Size Optima of Monolithic Integrated Circuits,” Proc. IEEE, vol. 52, no. 12, pp. 1537-1545, 1964.
[47] B. C. Arnold, “Pareto Distributions,” International Co-operative Publishing House, 1983
[48] B. J. Ho, B. Nader, “A Generic Traffic Model for On-Chip Interconnection Networks,” The First International Workshop on Networks-on-Chip Architectures, 2009.
[49] Y. Kim, D. Han, O. Mutlu, M. Harchol-Balter, “ATLAS: A scalable and high performance scheduling algorithm for multiple memory controllers,” HPCA’10, pp. 1-12, January 2010.
[50] A.V. Goldberg and S. Rao, “Beyond the flow decomposition barrier,” Journal of the ACM, 45(5):783-797, 1998.
[51] Nangate, “The Nangate 45nm Open Cell Library,” http://www.nangate.com.
[52] “http://www.algorithmic-solutions.com”, LEDA Library
[53] L. Huaguo, C. Hao, L. Yang, W. Wei, C. Tian and X. Hui, “Optimized Mid-bond Order for 3D-Stacked ICs Considering Failed Bonding,” VLSI-DAT’14, pp. 1-4, April 2014.
[54] V. Bandishti, I. Pananilath, and U. Bondhugula. “Tiling Stencil Computations to Maximize Parallelism,” SC’12, pp. 1-11, November 2012.
[55] M. M. Baskaran, N. Vydyanathan, U. K. Bondhugula, J. Ramanujam, A. Rountev, P. Sadayappan, “The Compiler-Assisted Dynamic Scheduling for Effective Parallelization of Loop Nests on Multicore Processors,” PPoPP’09, pp. 219-228, April 2009.
[56] Y. Kim, D. Han, O. Mutlu, M. Harchol-Balter, “ATLAS: A Scalable and High-Performance Scheduling Algorithm for Multiple Memory Controllers,” HPCA’12, pp. 1-12, February 2012.
[57] Y. Kim, M. Papamichael, O. Mutlu, M. Harchol-Balter, “Thread Cluster Memory Scheduling: Exploiting Differences in Memory Access Behavior,” MICRO’10, pp. 65-76, December 2010.
[58] O. Mutlu, T. Moscibroda, “Parallelism-Aware Batch Scheduling: Enhancing both Performance and Fairness of Shared DRAM Systems,” ISCA ’08, pp. 63-74, June 2008.
[59] S. P. Muralidhara et al., “Reducing memory interference in multicore systems via application-aware memory channel partitioning,” MICRO ’11, pp. 374-385, June 2011.
[60] S. Rixner et al., “Memory access scheduling,” ISCA ’00, May 2000.
[61] C. Bienia, S. Kumar, J. Pal Singh, K. Li, “The PARSEC Benchmark Suite: Characterization and Architectural Implications,” PACT ’08, September, 2008.
[62] JEDEC. Standard No. 21-C. Annex K: Serial Presence Detect (SPD) for DDR3 SDRAM Modules, 2011.
[63] M. Awasthi et al., “Handling the problems and opportunities posed by multiple on-chip memory controllers,” PACT’10, September, 2010.
[64] J. Demme, S. Sethumadhavan, “Rapid Identification of Architectural Bottlenecks via Precise Event Counting.” ISCA’11, pp. 353-364, June 2011.
[65] P. Magnusson et al. “Simics: A full system simulation platform.” Computer, 35(2), Feb 2002.
[66] X. Tang, “Diagnosis of VLSI circuit defects: defects in scan chain and circuit logic,” dissertation, University of Iowa, 2010
[67] M. M. K. Martin et al. “Multifacets general execution-driven multiprocessor simulator (GEMS)”
[68] U. Bondhugula, J. Ramanujam, and P. Sadayappan. “Pluto: A practical and fully automatic polyhedral parallelizer and locality optimizer.” Technical Report OSU-CISRC-10/07-TR70, The Ohio State University, Oct. 2007.
[69] M. Christen, O. Schenk, H. Burkhart, “PATUS: A Code Generation and Autotuning Framework for Parallel Iterative Stencil Computations on Modern Microarchitectures,” IPDPS ’11, pp. 676-687, May 2011.
[70] Y. Zhao, S. Khursheed, B. M. Al-Hashimi, “Cost-Effective TSV Grouping for Yield Improvement of 3D-ICs,” ATS ’11, pp. 201-206, November 2011.
[71] U. Kang, et al., “8 Gb 3-D DDR3 DRAM using through-silicon-via technology,” IEEE Journal of Solid-State Circuits, 45(1):111-119, January 2010.
[72] D. H. Kim, S. Kim, S. K. Lin, “Impact of nano-scale through-silicon vias on the quality of today and future 3D IC designs”, In Proc. SLiP, pp. 1-8, June 2011.
[73] C. H. Stapper, F. M. Armstrong, and K. Saji, “Integrated circuit yield statistics,” Proc. IEEE, vol. 71, pp. 453-470, 1983.
 
 
 
 