- By Ruizhi Liao, Junhai Zhai.
- This repo is the PyTorch implementation of [Optimization model based on attention for Few-shot Learning].
- Make sure Mini-Imagenet is split properly. For example:
  - data/
    - miniImagenet/
      - train/
        - n01532829/
          - n0153282900000005.jpg
          - ...
        - n01558993/
          - ...
      - val/
        - n01855672/
          - ...
      - test/
        - ...
  - main.py
  - ...
- This layout is already in place if you download and extract Mini-Imagenet from the link above.
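- If in doubt, a quick sanity check like the one below (purely illustrative, not part of the repo; `data/miniImagenet` stands in for your `--data-root`) confirms the split directories are in place:

```python
import os

# Hypothetical sanity check: verify the expected Mini-Imagenet splits exist
# under the data root before launching training.
data_root = "data/miniImagenet"
for split in ("train", "val", "test"):
    split_dir = os.path.join(data_root, split)
    assert os.path.isdir(split_dir), f"missing split directory: {split_dir}"
    n_classes = len([d for d in os.listdir(split_dir)
                     if os.path.isdir(os.path.join(split_dir, d))])
    print(f"{split}: {n_classes} class folders")
```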
- Check out `scripts/train_5s_5c.sh`, make sure `--data-root` is properly set.
- For 5-shot, 5-class training, run `bash scripts/train_5s_5c.sh`. Hyper-parameters follow the author's repo.
- For 5-shot, 5-class evaluation, run `bash scripts/eval_5s_5c.sh` (remember to change the `--resume` and `--seed` arguments).
- Training with the default settings takes ~2.5 hours on a single Titan Xp while occupying ~2GB of GPU memory.
- The implementation replicates two learners, similar to the author's repo:
  - `learner_w_grad` functions as a regular model; its gradients and loss serve as inputs to the meta learner.
  - `learner_wo_grad` constructs the graph for the meta learner (see the sketch after this list):
    - All the parameters in `learner_wo_grad` are replaced by `cI`, the output of the meta learner.
    - `nn.Parameters` in this model are cast to `torch.Tensor` to connect the graph to the meta learner.
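- A minimal sketch of that parameter replacement, assuming `cI` is the flat parameter vector produced by the meta learner (the helper name and the slicing scheme are illustrative, not the repo's exact code):

```python
import torch
import torch.nn as nn

def set_learner_params(learner_wo_grad: nn.Module, cI: torch.Tensor) -> None:
    """Illustrative only: overwrite every parameter of learner_wo_grad with the
    matching slice of cI, stored as a plain torch.Tensor so that cI's grad_fn
    stays attached and gradients flow back into the meta learner."""
    offset = 0
    for module in learner_wo_grad.modules():
        for name, param in list(module.named_parameters(recurse=False)):
            numel = param.numel()
            new_value = cI[offset:offset + numel].view_as(param)
            # Dropping the nn.Parameter and storing a plain tensor keeps the
            # learner's forward pass differentiable w.r.t. the meta learner's output.
            delattr(module, name)
            setattr(module, name, new_value)
            offset += numel
```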
- There are several ways to copy parameters from the meta learner to the learner, depending on the scenario:
  - `copy_flat_params`: we only need the parameter values and keep the original `grad_fn`.
  - `transfer_params`: we want the values as well as the `grad_fn` (from `cI` to `learner_wo_grad`).
    - `.data.copy_()` vs. `clone()`: the latter retains all the properties of a tensor, including its `grad_fn` (a small illustration follows this list).
    - To maintain the batch statistics, `load_state_dict` is used (from `learner_w_grad` to `learner_wo_grad`).
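- A quick, self-contained illustration of the `.data.copy_()` vs. `clone()` distinction (the tensors here are placeholders, not the repo's parameters):

```python
import torch

src = torch.randn(3, requires_grad=True) * 2.0   # result of an op, so it carries a grad_fn
dst = torch.zeros(3, requires_grad=True)

# .data.copy_(): only the values are transferred; dst remains a graph leaf with
# no connection to src, so nothing backpropagates into src through dst.
dst.data.copy_(src)
print(dst.grad_fn)      # None

# clone(): the result keeps src's history, so gradients flow back to src.
cloned = src.clone()
print(cloned.grad_fn)   # <CloneBackward0 ...>
```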
- This code borrows heavily from the meta-learning-lstm framework.

