Author: Chen, Hung-Jen
Title: Mitigating Forgetting in Online Continual Learning via Instance-Aware Parameterization
Advisor: Sun, Min
Committee members: Lee, Chun-Yi; Chen, Chu-Song
Keywords: Deep Learning; Neural Architecture Search; Continual Learning; Online Learning
Online continual learning is a challenging scenario in which a model must learn from a continuous stream of data without revisiting any previously encountered instance. Catastrophic forgetting is worse in this setting, because the model must address forgetting not only at the task level but also at the instance level within the same task. To mitigate this, we build "instance awareness" into the neural network: each data instance is classified along a path through the network that a controller searches for in a meta-graph. To preserve the knowledge learned from previous instances, we propose a protection mechanism: when an incoming instance is not similar to previous ones, its gradient updates are restricted so that they do not overwrite the paths used by those dissimilar instances. Conversely, when the incoming instance is similar to previous instances, fine-tuning of the paths those instances used is encouraged. Path selection according to instance similarity is determined by the controller, which is compact and updated online. Experimental results show that the proposed method outperforms state-of-the-art methods in online continual learning on CIFAR-10, CIFAR-100, and Tiny-ImageNet. The method is also evaluated in a more realistic setting where the boundaries between tasks are blurred, and it again outperforms state-of-the-art methods.
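The idea of routing each instance along its own path, and updating only the parameters on that path, can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: the `controller` function is a hand-written bucketing rule standing in for the learned controller, the `MetaGraph` class holds one scalar weight per candidate block, and the squared-error loss and learning rate are arbitrary. It shows only that instances routed to different paths cannot overwrite each other's weights, while instances routed to the same path fine-tune shared weights.

```python
# Toy sketch, not the thesis's algorithm: a fixed routing rule stands in
# for the learned controller, and each candidate block is a scalar gain.

def controller(x, num_layers, num_choices):
    """Hypothetical stand-in: pick one block per layer from a coarse bucket of x."""
    bucket = min(int(abs(x)), num_choices - 1)
    return [bucket] * num_layers


class MetaGraph:
    def __init__(self, num_layers=2, num_choices=3):
        # weights[layer][choice] is the scalar gain of one candidate block.
        self.weights = [[1.0] * num_choices for _ in range(num_layers)]

    def forward(self, x, path):
        out = x
        for layer, choice in enumerate(path):
            out *= self.weights[layer][choice]
        return out

    def sgd_step(self, x, target, path, lr=0.01):
        """One SGD step on 0.5 * (pred - target)**2, touching only the path's weights."""
        err = self.forward(x, path) - target
        grads = []
        for layer in range(len(path)):
            # d(pred)/d(w_layer) is x times every other weight on the path.
            others = x
            for other_layer, choice in enumerate(path):
                if other_layer != layer:
                    others *= self.weights[other_layer][choice]
            grads.append(err * others)
        for (layer, choice), grad in zip(enumerate(path), grads):
            self.weights[layer][choice] -= lr * grad


graph = MetaGraph()
path_a = controller(0.5, 2, 3)   # routed to block 0 in every layer
graph.sgd_step(0.5, 2.0, path_a)
path_b = controller(2.3, 2, 3)   # a dissimilar instance takes a disjoint path
graph.sgd_step(2.3, 1.0, path_b)
```

In this sketch the update for the second instance leaves the weights used by the first untouched, because the two paths share no block. In the thesis the routing is learned rather than fixed, and the table of contents indicates that weight regularization is also part of training the meta-graph, which this sketch does not model.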
1 Introduction
2 Related Work
3 Method
3.1 Meta Graph-Controller Framework
3.2 Training the Controller
3.3 Training the Meta-Graph
3.4 Encouraging Explorations
4 Experiments
4.1 Experiment Setup
4.2 Baselines
4.3 Quantitative Results
4.4 Qualitative Analysis: Distribution of Architectures
4.5 Qualitative Analysis: Instance-Awareness
4.6 Ablation Study: Count-Based Search Exploration
4.7 Ablation Study: Weight Regularization
5 Conclusion
A Symbol Table
B Algorithm
