learn2learn (l2l) data loader for Meta-Dataset: A Dataset of Datasets for Learning to Learn from Few Examples? #286

Open
brando90 opened this issue Nov 18, 2021 · 11 comments
Labels
help wanted Extra attention is needed

Comments

@brando90

Hi,

I was wondering whether there is an l2l data loader for Meta-Dataset: A Dataset of Datasets for Learning to Learn from Few Examples?

references:

Ideally, an l2l model example to go with it would be fantastic!

@seba-1511
Member

Hi @brando90,

We already have a few of the datasets of Meta-Dataset in l2l.vision.datasets. The remaining ones are work-in-progress.

@brando90
Author

> Hi @brando90,
>
> We already have a few of the datasets of Meta-Dataset in l2l.vision.datasets. The remaining ones are work-in-progress.

If there are issues for the remaining ones that I could help with, let me know. If it becomes part of my critical path, I'm happy to help.

@brando90
Author

brando90 commented Apr 7, 2022

> Hi @brando90,
>
> We already have a few of the datasets of Meta-Dataset in l2l.vision.datasets. The remaining ones are work-in-progress.

Hi @seba-1511, what would be the next steps to get this working in l2l?

@seba-1511
Member

We're missing MS COCO and ILSVRC. For MS COCO we should provide a class to download the data (like the other datasets) but for ILSVRC it'd be enough to only have the splits.
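
One reading of "only have the splits" is that the library ships just a mapping from class identifiers to train/validation/test, and applies it to images the user already has on disk. Below is a minimal sketch of that idea in plain Python. The class identifiers, file paths, and the helper names `check_disjoint` and `select_split` are hypothetical placeholders, not the actual Meta-Dataset splits or any l2l API.

```python
def check_disjoint(splits):
    """Raise ValueError if any class appears in more than one split."""
    seen = {}
    for split_name, classes in splits.items():
        for cls in classes:
            if cls in seen:
                raise ValueError(f"{cls} is in both {seen[cls]} and {split_name}")
            seen[cls] = split_name


def select_split(samples, splits, split_name):
    """Keep only the (path, class) pairs whose class belongs to the split."""
    allowed = set(splits[split_name])
    return [(path, cls) for path, cls in samples if cls in allowed]


# Placeholder split definition: the only thing a "splits-only" dataset ships.
SPLITS = {
    "train": ["class_a", "class_b"],
    "validation": ["class_c"],
    "test": ["class_d"],
}

# Placeholder listing of images that are assumed to exist locally already.
samples = [
    ("class_a/0.jpg", "class_a"),
    ("class_c/0.jpg", "class_c"),
    ("class_d/0.jpg", "class_d"),
    ("class_a/1.jpg", "class_a"),
]

check_disjoint(SPLITS)
train_samples = select_split(samples, SPLITS, "train")
```

Under this design the dataset class never touches the network; it only decides which locally stored classes belong to which split.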

@brando90
Author

brando90 commented Apr 7, 2022

Why is it "enough to only have the splits" for ILSVRC, without also getting the data?

Thanks for the quick response!

@brando90
Author

brando90 commented Apr 8, 2022

Perhaps this is a good place to start: https://github.com/mboudiaf/pytorch-meta-dataset

@brando90
Author

brando90 commented Apr 11, 2022

@seba-1511 Hi Seba! I'm trying to figure out how I'd implement an l2l BenchmarkTasksets for Meta-Dataset, to use with the distributed MAML example you gave us (which I think would work for all settings that use episodic meta-learning).

Is all I need the following?

  1. Implement a standard PyTorch classification dataset, e.g.
    # Load task-specific data and transforms
    datasets, transforms = _TASKSETS[name](train_ways=train_ways,
                                           train_samples=train_samples,
                                           test_ways=test_ways,
                                           test_samples=test_samples,
                                           root=root,
                                           device=device,
                                           **kwargs)
    train_dataset, validation_dataset, test_dataset = datasets
    train_transforms, validation_transforms, test_transforms = transforms
  2. Then pass that dataset object to TaskDataset.
  3. Then return the BenchmarkTasksets: return BenchmarkTasksets(train_tasks, validation_tasks, test_tasks)

So I only need to implement a normal PyTorch dataset for Meta-Dataset (in particular, returning an (x, y) pair for each index), and your code takes care of the rest, I think. Right?

code example of above:

PS: I think it would be the same for the IBM dataset; it just needs a dataset object.
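
To make the division of labor in the steps above concrete, here is a library-free sketch of what happens once a dataset yields (x, y) pairs: the samples are indexed by label, and an N-way K-shot task is drawn from that index. This is an illustration in plain Python, not l2l's implementation; in l2l itself this role is played by TaskDataset together with task transforms such as NWays, KShots, and LoadData. The toy dataset and the helper names `index_by_label` and `sample_task` are hypothetical.

```python
import random
from collections import defaultdict


def index_by_label(dataset):
    """Map each label to the list of dataset indices that carry it."""
    indices = defaultdict(list)
    for i in range(len(dataset)):
        _, y = dataset[i]
        indices[y].append(i)
    return dict(indices)


def sample_task(dataset, label_index, ways, shots, rng=random):
    """Draw one task: `ways` classes with `shots` samples each.

    Labels are remapped to 0..ways-1 so every task looks like a small
    standalone classification problem.
    """
    classes = rng.sample(sorted(label_index), ways)
    task = []
    for new_label, cls in enumerate(classes):
        for i in rng.sample(label_index[cls], shots):
            x, _ = dataset[i]
            task.append((x, new_label))
    return task


# Toy stand-in for a classification dataset: 10 classes, 20 samples each.
toy_dataset = [(f"img_{c}_{j}", c) for c in range(10) for j in range(20)]
label_index = index_by_label(toy_dataset)
task = sample_task(toy_dataset, label_index, ways=5, shots=2,
                   rng=random.Random(0))
```

The only thing the dataset has to provide here is `__len__` and `__getitem__` returning an (x, y) pair, which matches the expectation stated in the question.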

@seba-1511
Member

Hello @brando90,

Yes, I think this would do it. Note that to get results comparable with published numbers, you might have to implement varying shot numbers, as described in their paper. This can be done with TaskTransforms and should be pretty straightforward.

Good luck!
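
As a rough illustration of what "varying shot numbers" means, the sketch below draws both the number of classes and the number of samples per class at random for each task. It is a simplified stand-in written in plain Python and is not the sampling procedure from the Meta-Dataset paper, which defines its own rules. The function name `sample_variable_task`, its default bounds, and the toy label index are illustrative assumptions.

```python
import random


def sample_variable_task(label_index, min_ways=5, max_ways=50,
                         max_shots=10, rng=random):
    """Sample a task whose class count and per-class shots both vary.

    `label_index` maps each class to the list of sample indices for it.
    Returns a dict mapping each chosen class to its sampled indices.
    """
    upper = min(max_ways, len(label_index))
    ways = rng.randint(min(min_ways, upper), upper)
    classes = rng.sample(sorted(label_index), ways)
    task = {}
    for cls in classes:
        available = label_index[cls]
        # Each class gets its own shot count, capped by what is available.
        shots = rng.randint(1, min(max_shots, len(available)))
        task[cls] = rng.sample(available, shots)
    return task


# Toy label index: 12 classes with 30 sample indices each.
toy_index = {c: list(range(c * 30, (c + 1) * 30)) for c in range(12)}
variable_task = sample_variable_task(toy_index, rng=random.Random(0))
```

In l2l, logic of this kind would live in a custom task transform rather than in a standalone function.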

@brando90
Author

Related: #301, although that one is about how to write a dataloader for SL with l2l using the dataset object.

@brando90
Author

@seba-1511 Hi Seba! Sorry for the random ping. How would you suggest going about implementing Meta-Dataset for l2l?

Would it be a good idea to download the data and then sample directly from the files, the way you do for mini-ImageNet? Or do you have any other suggestions?

@seba-1511 added the "help wanted" (Extra attention is needed) label May 29, 2023

@AntreasAntoniou
Contributor

I have actually implemented what I believe to be the MetaDataset episode sampling scheme in one of my current projects. It is mainly using l2l to get the datasets, and then using the episode sampler to create episodes.

It's a bit rough around the edges, but for the most part gets the job done.

If you give me until Friday I can come back with a PR for l2l to integrate that.
