META-TRANSFER LEARNING THROUGH HARD TASKS
ABSTRACT: Meta-learning has been proposed as a framework to address the challenging few-shot learning setting. The key idea is to leverage a large number of similar few-shot tasks in order to learn how to adapt a base-learner to a new task for which only a few labeled samples are available. As deep neural networks (DNNs) tend to overfit when trained on only a few samples, typical meta-learning models use shallow neural networks, thus limiting their effectiveness.
Typically, DNN weights are either fixed for feature extraction or simply fine-tuned for each task; in contrast, we learn a meta-transfer learner through all tasks, which differs in terms of the underlying learning paradigm.
More importantly, our approach can generalize to existing few-shot learning models whose image features are extracted by DNNs of different architectures, for which we conduct extensive experiments.
It does not change the DNN weights, thereby avoiding the problem of "catastrophic forgetting" when learning specific tasks in MTL.
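A minimal PyTorch-style sketch of this idea follows; the wrapper name and initialization scheme are our assumptions, not the paper's exact implementation. The pre-trained convolution weights stay frozen, and only lightweight per-filter scaling and shifting parameters are learned across tasks, so nothing stored in the frozen weights can be forgotten.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaleShiftConv2d(nn.Module):
    """Hypothetical wrapper illustrating meta-transfer: the pre-trained
    conv weights are frozen; only light per-filter scaling and shifting
    parameters are meta-learned through all tasks."""
    def __init__(self, conv: nn.Conv2d):
        super().__init__()
        # Frozen pre-trained weights: never updated, so nothing is forgotten.
        self.weight = nn.Parameter(conv.weight.data.clone(), requires_grad=False)
        self.bias = (nn.Parameter(conv.bias.data.clone(), requires_grad=False)
                     if conv.bias is not None else None)
        self.stride, self.padding = conv.stride, conv.padding
        # Meta-learned: one scale (init 1) and one shift (init 0) per filter.
        self.scale = nn.Parameter(torch.ones(conv.out_channels, 1, 1, 1))
        self.shift = nn.Parameter(torch.zeros(conv.out_channels))

    def forward(self, x):
        w = self.weight * self.scale                       # scale frozen filters
        b = self.shift if self.bias is None else self.bias + self.shift
        return F.conv2d(x, w, b, stride=self.stride, padding=self.padding)
```

Wrapping a pre-trained layer, e.g. `ScaleShiftConv2d(nn.Conv2d(3, 64, 3, padding=1))`, adds only one scalar scale and one shift per filter as trainable parameters, far fewer than fine-tuning the full weight tensor.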
This demonstrates our contribution of utilizing deeper neural networks to better tackle few-shot classification problems.
Methods equipped with MTL obtain consistent performance improvements over their original versions in both normal and challenging SSFSL settings by quite large margins, e.g., 13.7% on tieredImageNet w/D 1-shot. This validates the effectiveness of our method for tackling SSFSL problems.
Extensive comparisons to related works validate that our MTL approach trained with the proposed HT meta-batch scheme achieves top performance.
A straightforward idea is to increase the amount of available data through data augmentation techniques. Several methods propose to learn a data generator for this purpose.
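For concreteness, here is a hedged sketch of the simpler, non-learned end of this spectrum: a standard torchvision augmentation pipeline. The specific transforms and the 84x84 crop size (typical for miniImageNet) are our choices, and learned data generators go well beyond such fixed transformations.

```python
from torchvision import transforms

# Illustrative hand-crafted augmentation pipeline for few-shot image tasks.
augment = transforms.Compose([
    transforms.RandomResizedCrop(84, scale=(0.8, 1.0)),   # 84x84 crops
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4),
    transforms.ToTensor(),
])
```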
The bottom blocks compare the proposed HT meta-batch with the conventional meta-batch. Note that FT here stands for fine-tuning a classifier.
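A hedged sketch of the HT meta-batch idea is given below; `sample_task` and `run_task` are assumed helpers, not the paper's code. After an ordinary meta-batch, the worst-classified ("failure") class of each task is recorded, and additional hard tasks are resampled from the pooled failure classes.

```python
import random

def ht_meta_batch(sample_task, run_task, num_tasks, num_hard, n_way=5):
    """Sketch of hard task (HT) meta-batch scheduling.

    sample_task(classes=None) -> task : draws an episode, optionally
        restricted to a given class pool (assumed helper).
    run_task(task) -> dict of class -> test accuracy after base-learning
        on that episode (assumed helper).
    """
    failure_classes = []
    # Phase 1: ordinary meta-batch; record each task's worst (failure) class.
    for _ in range(num_tasks):
        per_class_acc = run_task(sample_task())
        failure_classes.append(min(per_class_acc, key=per_class_acc.get))
    # Phase 2: resample "hard" tasks from the pooled failure classes.
    for _ in range(num_hard):
        pool = list(set(failure_classes))
        hard_classes = random.sample(pool, k=min(len(pool), n_way))
        run_task(sample_task(classes=hard_classes))
```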
Besides, the knowledge to transfer can come from multi-modal category models, e.g., the word embedding models used for zero-shot learning and the trained attribute models used for social relationship recognition.
Then, we introduce the task-level data notations used in the two phases, i.e., meta-train and meta-test.
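The following sketch illustrates this notation under our assumptions about the data layout: each task T is an N-way, K-shot episode split into a support set T^(tr) for base-learning and a test (query) set T^(te) for the meta-learner's loss.

```python
import random

def sample_episode(data_by_class, n_way=5, k_shot=1, q_query=15):
    """Sketch of the task-level notation: a task T consists of a support
    split T^(tr) (k_shot samples per class, for base-learning) and a test
    split T^(te) (q_query samples per class, for the meta-level loss).
    `data_by_class` is assumed to map a class label to its samples."""
    classes = random.sample(list(data_by_class), n_way)
    support, query = [], []
    for label, c in enumerate(classes):
        picks = random.sample(data_by_class[c], k_shot + q_query)
        support += [(x, label) for x in picks[:k_shot]]
        query += [(x, label) for x in picks[k_shot:]]
    return support, query  # (T^(tr), T^(te))
```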
Stage-1 is called base-learning, where the cross-entropy loss on the support set T^(tr) is used to optimize the parameters of the base-learner. In Stage-2, the test loss on T^(te) is used to optimize the parameters of the meta-learner.
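A hedged sketch of these two stages, with function and argument names of our own choosing, is shown below: a few cross-entropy gradient steps adapt a copy of the base-learner's weights on T^(tr), and the resulting test loss on T^(te) drives the meta-optimizer.

```python
import torch
import torch.nn.functional as F

def meta_train_step(features, base_params, meta_optimizer,
                    support, query, base_lr=0.01, base_steps=5):
    """Two-stage sketch. `features` is the feature extractor (e.g. frozen
    weights plus the scaling/shifting parameters sketched earlier);
    `base_params` is [W, b] of a linear base-learner classifier."""
    (xs, ys), (xq, yq) = support, query
    theta = [p.clone() for p in base_params]

    # Stage-1, base-learning: cross-entropy gradient steps on T^(tr).
    for _ in range(base_steps):
        loss = F.cross_entropy(features(xs) @ theta[0].t() + theta[1], ys)
        grads = torch.autograd.grad(loss, theta, create_graph=True)
        theta = [p - base_lr * g for p, g in zip(theta, grads)]

    # Stage-2, meta-update: the test loss on T^(te) optimizes the
    # meta-learner's parameters via the meta-optimizer.
    meta_loss = F.cross_entropy(features(xq) @ theta[0].t() + theta[1], yq)
    meta_optimizer.zero_grad()
    meta_loss.backward()
    meta_optimizer.step()
    return meta_loss.item()
```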