
Detailed Record

Author (Chinese): 廖國宇
Author (English): Liao, Kuo-Yu
Thesis Title (Chinese): 抽象學習者學習語義語言的數學理論
Thesis Title (English): A Mathematical Theory for Learning Semantic Languages by Abstract Learners
Advisor (Chinese): 張正尚
Advisor (English): Chang, Cheng-Shang
Committee Members (Chinese): 李端興、洪樂文、黃昱智
Committee Members (English): Lee, Duan-Shin; Hong, Yao-Win; Huang, Yu-Chih
Degree: Master's
Institution: National Tsing Hua University
Department: Institute of Communications Engineering
Student ID: 111064522
Publication Year (ROC calendar): 113 (2024)
Graduation Academic Year: 112
Language: Chinese
Pages: 47
Keywords (Chinese): 大型語言模型、能力的湧現、低密度奇偶檢查碼、不規則重複時隙式ALOHA、密度演化、語義通信
Keywords (English): Large language models; emergence of capabilities; Low-Density Parity Check codes; Irregular Repetition Slotted ALOHA; density evolution; semantic communication
Abstract (Chinese): 近年來,大型語言模型(LLMs)顯示出當系統參數數量和訓練數據規模超過某些門檻時,能力(學到的技能)的湧現現象。這種現象的確切機制尚未完全理解,仍然是積極研究的主題。受 Arora 和 Goyal 提出用於語義語言建模的技能-文本二分圖模型的啟發,我們開發了一種數學理論來解釋學到的技能的湧現,並考慮到學習(或訓練)過程。我們的方法將技能-文本二分圖中的技能學習過程建模為低密度奇偶校驗(LDPC)碼和不規則重複時隙 ALOHA(IRSA)中的迭代解碼過程。通過密度演化分析,我們證明當訓練文本數量與技能數量的比率超過某個門檻時,學到的技能會湧現。我們的分析還得出了測試誤差相對於該比率的縮放律。在訓練完成後,學到的技能的關聯也可以被獲取,以形成技能關聯圖。我們使用位滲透分析來推導技能關聯圖中存在巨型組件的條件。我們的分析還可以擴展到具有技能層次結構的設置,在這種設置中,微調模型基於基礎模型構建;它也適用於多類技能和文本的設置。作為一個重要的應用,我們提出了一種語義壓縮的方法並討論其與語義通信的聯繫。
Abstract (English): Recent advances in Large Language Models (LLMs) have demonstrated the emergence of capabilities (learned skills) when the number of system parameters and the size of training data surpass certain thresholds. The exact mechanisms behind such phenomena are not fully understood and remain a topic of active research. Inspired by the skill-text bipartite graph model proposed by Arora and Goyal for modeling semantic languages, we develop a mathematical theory to explain the emergence of learned skills, taking the learning (or training) process into account. Our approach models the learning process for skills in the skill-text bipartite graph as an iterative decoding process in Low-Density Parity Check (LDPC) codes and Irregular Repetition Slotted ALOHA (IRSA). Using density evolution analysis, we demonstrate the emergence of learned skills when the ratio of the number of training texts to the number of skills exceeds a certain threshold. Our analysis also yields a scaling law for testing errors relative to this ratio. Upon completion of the training, the association of learned skills can also be acquired to form a skill association graph. We use site percolation analysis to derive the conditions for the existence of a giant component in the skill association graph. Our analysis can also be extended to the setting with a hierarchy of skills, where a fine-tuned model is built upon a foundation model. It is also applicable to the setting with multiple classes of skills and texts. As an important application, we propose a method for semantic compression and discuss its connections to semantic communication.
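Neither the density-evolution recursion nor the percolation condition is reproduced on this record page, but the flavor of both analyses can be sketched with a toy model. Assume (as an illustrative assumption, not necessarily the thesis's exact setup) that each training text covers a Poisson(c) number of skills, the texts-to-skills ratio is r, and a skill is acquired once some text containing it has all of its other skills already acquired, mirroring peeling (iterative erasure) decoding in LDPC codes and IRSA. The standard tree approximation then gives a fixed-point iteration whose solution jumps from near 1 to near 0 as r crosses a threshold, which is the "emergence" phenomenon; a Poisson approximation of the skill association graph likewise yields the classical mean-degree-greater-than-one condition for a giant component:

```python
import math

def unlearned_fraction(r, c=5.0, iters=1000):
    """Density-evolution sketch: x is the probability a skill remains
    unlearned. A text of degree k "teaches" its one missing skill once its
    other k-1 skills are known (peeling, as in LDPC/IRSA decoding). With
    Poisson(c) text degrees and texts/skills ratio r, the tree approximation
    gives the fixed-point iteration x <- exp(-r*c*exp(-c*x))."""
    x = 1.0
    for _ in range(iters):
        x = math.exp(-r * c * math.exp(-c * x))
    return x

def giant_component_fraction(mu, iters=1000):
    """Percolation sketch on a Poisson (Erdos-Renyi) graph with mean degree
    mu: the giant-component fraction s solves s = 1 - exp(-mu*s) and is
    positive iff mu > 1."""
    s = 1.0
    for _ in range(iters):
        s = 1.0 - math.exp(-mu * s)
    return s

if __name__ == "__main__":
    # Sweep the texts-per-skill ratio: in this toy model the unlearned
    # fraction collapses once r crosses a threshold (near r ~ 2.5 for c = 5).
    for r in (0.5, 1.0, 2.0, 3.0, 4.0):
        print(f"r = {r:.1f}: unlearned fraction ~ {unlearned_fraction(r):.2e}")
    for mu in (0.5, 2.0):
        print(f"mu = {mu}: giant component fraction ~ {giant_component_fraction(mu):.3f}")
```

The abrupt collapse arises because the high fixed point of the iteration annihilates with an unstable middle fixed point at the threshold, the same mechanism behind decoding thresholds in LDPC/IRSA density evolution; the constants here are illustrative and the thesis's actual recursion and degree distributions may differ.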
Contents 1
List of Figures 3
1 Introduction 5
2 Semantic Languages 9
2.1 Definition of a Semantic Language . . . . . . . . . . . . . . . . . . . . . . 9
2.2 Learning a Semantic Language by Sampling . . . . . . . . . . . . . . . . 10
3 Abstract Learners 12
3.1 1-Skill Learner . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
3.2 Poisson Learner . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
4 Learning the association of skills 19
5 Hierarchy of skills 25
6 Extensions to multiple classes of skills and texts 30
6.1 Poisson learners with multiple classes of skills/texts . . . . . . . . . . . . 30
6.1.1 The ensemble of random bipartite graphs . . . . . . . . . . . . . . 30
6.1.2 Density evolution for Poisson learners with multiple classes of skills 32
6.2 Deterministic ψ-learners . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
7 Application to Semantic Compression and Communication 39
8 Conclusions and Future Directions 42
[1] OpenAI, “GPT-4 technical report,” https://cdn.openai.com/papers/gpt-4.pdf, 2023.
[2] S. Pichai and D. Hassabis, “Introducing gemini: our largest and most capable AI model,” Google. Retrieved December, 2023.
[3] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of deep bidirectional transformers for language understanding,” arXiv preprint arXiv:1810.04805, 2018.
[4] T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell et al., “Language models are few-shot learners,” Adv. Neural Inf. Process. Syst., vol. 33, pp. 1877–1901, 2020.
[5] J. Kaplan, S. McCandlish, T. Henighan, T. B. Brown, B. Chess, R. Child, S. Gray, A. Radford, J. Wu, and D. Amodei, “Scaling laws for neural language models,” arXiv preprint arXiv:2001.08361, 2020.
[6] J. Hoffmann, S. Borgeaud, A. Mensch, E. Buchatskaya, T. Cai, E. Rutherford, D. d. L. Casas, L. A. Hendricks, J. Welbl, A. Clark et al., “Training compute-optimal large language models,” arXiv preprint arXiv:2203.15556, 2022.
[7] J. Wei, Y. Tay, R. Bommasani, C. Raffel, B. Zoph, S. Borgeaud, D. Yogatama, M. Bosma, D. Zhou, D. Metzler et al., “Emergent abilities of large language models,” arXiv preprint arXiv:2206.07682, 2022.
[8] J. Wei, Y. Tay, and Q. V. Le, “Inverse scaling can become U-shaped,” arXiv preprint arXiv:2211.02011, 2022.
[9] M. Newman, Networks: an introduction. OUP Oxford, 2009.
[10] C.-S. Chang, “A simple explanation for the phase transition in large language models with list decoding,” arXiv preprint arXiv:2303.13112, 2023.
[11] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, “Attention is all you need,” Adv. Neural Inf. Process. Syst., vol. 30, 2017.
[12] H. Ramsauer, B. Schäfl, J. Lehner, P. Seidl, M. Widrich, T. Adler, L. Gruber, M. Holzleitner, M. Pavlović, G. K. Sandve et al., “Hopfield networks is all you need,” arXiv preprint arXiv:2008.02217, 2020.
[13] S. Arora and A. Goyal, “A theory for emergence of complex skills in language models,” arXiv:2307.15936, 2023.
[14] R. Gallager, “Low-density parity-check codes,” IRE Trans. Inf. Theory, vol. 8, no. 1, pp. 21–28, 1962.
[15] M. A. Shokrollahi, “New sequences of linear time erasure codes approaching the channel capacity,” in International Symposium on Applied Algebra, Algebraic Algorithms, and Error-Correcting Codes, Springer, 1999, pp. 65–76.
[16] T. J. Richardson, M. A. Shokrollahi, and R. L. Urbanke, “Design of capacity-approaching irregular low-density parity-check codes,” IEEE Trans. Inf. Theory, vol. 47, no. 2, pp. 619–637, 2001.
[17] G. Liva, “Graph-based analysis and optimization of contention resolution diversity slotted ALOHA,” IEEE Trans. Commun., vol. 59, no. 2, pp. 477–487, 2011.
[18] K. R. Narayanan and H. D. Pfister, “Iterative collision resolution for slotted ALOHA: An optimal uncoordinated transmission policy,” in Proc. of International Symposium on Turbo Codes and Iterative Information Processing (ISTC), 2012, pp. 136–139.
[19] E. Paolini, G. Liva, and M. Chiani, “Random access on graphs: A survey and new results,” in Proc. of Asilomar Conference on Signals, Systems and Computers, 2012, pp. 1743–1747.
[20] D. Jakovetić, D. Bajović, D. Vukobratović, and V. Crnojević, “Cooperative slotted ALOHA for multi-base station systems,” IEEE Trans. Commun., vol. 63, no. 4, pp. 1443–1456, 2015.
[21] Č. Stefanović and D. Vukobratović, “Coded random access,” in Network Coding and Subspace Designs, Springer, 2018, pp. 339–359.
[22] Y.-H. Chiang, Y.-J. Lin, C.-S. Chang, and Y.-W. P. Hong, “Parallel decoding of IRSA with noise,” in Proc. of IEEE International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC), 2022, pp. 320–326.
[23] M. Luby, M. Mitzenmacher, and M. A. Shokrollahi, “Analysis of random processes via and-or tree evaluation,” in SODA, vol. 98, 1998, pp. 364–373.
[24] M. Luby, M. Mitzenmacher, A. Shokrollah, and D. Spielman, “Analysis of low density codes and improved designs using irregular graphs,” in Proc. of the Annual ACM Symposium on Theory of Computing, 1998, pp. 249–258.
[25] T. J. Richardson and R. L. Urbanke, “The capacity of low-density parity-check codes under message-passing decoding,” IEEE Trans. Inf. Theory, vol. 47, no. 2, pp. 599–618, 2001.
[26] C.-M. Chang, Y.-J. Lin, C.-S. Chang, and D.-S. Lee, “On the stability regions of coded Poisson receivers with multiple classes of users and receivers,” IEEE/ACM Trans. Netw., vol. 31, no. 1, pp. 234–247, 2022.
[27] C.-H. Yu, L. Huang, C.-S. Chang, and D.-S. Lee, “Poisson receivers: a probabilistic framework for analyzing coded random access,” IEEE/ACM Trans. Netw., vol. 29, no. 2, pp. 862–875, 2021.
[28] T.-H. Liu, C.-H. Yu, Y.-J. Lin, C.-M. Chang, C.-S. Chang, and D.-S. Lee, “ALOHA receivers: a network calculus approach for analyzing coded multiple access with SIC,” IEEE/ACM Trans. Netw., vol. 29, no. 2, pp. 862–875, 2021.
[29] E. Paolini, G. Liva, and M. Chiani, “Graph-based random access for the collision channel without feedback: Capacity bound,” in Proc. of IEEE Global Communications Conference, 2011.
[30] O. Ordentlich and Y. Polyanskiy, “Low complexity schemes for the random access Gaussian channel,” in Proc. of IEEE Int. Symp. Inf. Theory (ISIT), 2017, pp. 2528–2532.
[31] W. Weaver, “Recent contributions to the mathematical theory of communication,” ETC: a review of general semantics, pp. 261–281, 1953.
[32] C. E. Shannon, “Prediction and entropy of printed English,” The Bell System Technical Journal, vol. 30, no. 1, pp. 50–64, 1951.
[33] H. Xie, Z. Qin, G. Y. Li, and B.-H. Juang, “Deep learning enabled semantic communication systems,” IEEE Trans. Signal Process., vol. 69, pp. 2663–2675, 2021.
[34] Q. Zhou, R. Li, Z. Zhao, C. Peng, and H. Zhang, “Semantic communication with adaptive universal transformer,” IEEE Wireless Commun. Lett., vol. 11, no. 3, pp. 453–457, 2022.
[35] Q. Hu, G. Zhang, Z. Qin, Y. Cai, G. Yu, and G. Y. Li, “Robust semantic communications with masked VQ-VAE enabled codebook,” IEEE Trans. Wireless Commun., pp. 1–1, 2023.