
data augmentation #68

Closed
laurenmoos opened this issue Dec 8, 2020 · 5 comments

Comments

@laurenmoos

Support for data augmentation pipelines, ideally pipelines that support higher-level optimization such as policy search or hyperparameter optimization (HPO).

Data augmentation would have its own abstraction but be bound to a pipeline abstraction. Pipelines would then be passed to collate for dual augmentation of batches, as required by many self-supervision tasks. Collation could also be associated with descriptive methods specifying how much divergence is introduced (via randomness) for each of the two "paths".

This would allow researchers to use strategies such as curriculum/active learning, with the augmentations diverging as the consistency loss decreases (for example).
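A rough sketch of the proposed split between a pipeline abstraction and a dual-view collate, with per-path divergence controlled by a strength parameter. All names here (`AugmentationPipeline`, `DualViewCollate`, `jitter`) are hypothetical illustrations, not lightly APIs, and the "augmentation" is a toy numeric jitter rather than an image transform:

```python
import random

class AugmentationPipeline:
    """Hypothetical pipeline abstraction: a sequence of augmentations,
    each a callable taking (sample, strength)."""
    def __init__(self, augmentations, strength=1.0):
        self.augmentations = augmentations
        self.strength = strength  # how much randomness/divergence this path introduces

    def __call__(self, sample):
        for aug in self.augmentations:
            sample = aug(sample, self.strength)
        return sample

class DualViewCollate:
    """Hypothetical collate: produces two independently augmented 'paths'
    per batch, each with its own tunable divergence."""
    def __init__(self, pipeline_a, pipeline_b):
        self.pipeline_a = pipeline_a
        self.pipeline_b = pipeline_b

    def __call__(self, batch):
        views_a = [self.pipeline_a(s) for s in batch]
        views_b = [self.pipeline_b(s) for s in batch]
        return views_a, views_b

# Toy augmentation: additive jitter scaled by the pipeline's strength.
def jitter(x, strength):
    return x + random.uniform(-strength, strength)

collate = DualViewCollate(
    AugmentationPipeline([jitter], strength=0.1),  # mildly diverging path
    AugmentationPipeline([jitter], strength=1.0),  # strongly diverging path
)
views_a, views_b = collate([0.0, 1.0, 2.0])
```

A curriculum schedule could then simply decay `pipeline.strength` as a function of the consistency loss between epochs.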

@philippmwirth
Contributor

Doesn't the lightly.data.collate.BaseCollate class offer an interface for such pipelines? The pipeline need only be implemented as a torchvision.transforms.Compose.
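The point that "the pipeline need only be a Compose" is just the callable-of-callables pattern. A minimal dependency-free stand-in (this `Compose` mimics the shape of `torchvision.transforms.Compose` but is not the real class) shows why arbitrary callables slot in:

```python
class Compose:
    """Minimal stand-in for torchvision.transforms.Compose:
    applies a list of callables in order."""
    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, x):
        for t in self.transforms:
            x = t(x)
        return x

# Any callables work, not just torchvision transforms:
pipeline = Compose([lambda x: x * 2, lambda x: x + 1])
result = pipeline(3)  # (3 * 2) + 1 = 7
```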

@laurenmoos
Author

The last project I worked on was for video, so we had to use non-torch compose pipelines.

If the overall goal of the platform is using self-supervision to improve sample efficiency for downstream labeled tasks, it would make sense for the platform to be very robust in terms of ingesting other sample-efficiency-increasing techniques. Thinking as an end user, I would almost certainly couple experiments in terms of augmentation strategies + self-supervision, so having (even a very thin) abstraction over torchvision transforms allows for "literate experimental design". My end goal isn't strictly using self-supervision; it is preparing data / network initializations in such a way that I require less data when the good ol' vanilla CNNs come out.

A concrete example: my previous company was using medical images to produce a probability of {z medical condition}. I wanted to use self-supervision because we had 10x the amount of unlabeled data as labeled, and our domain involved a very time-expensive labeling process. There were some clear things that needed to be regularized - in our case it was video, so both temporal and spatial regularization - presumably upstream of self-supervision, which was in turn upstream of supervised learning. I might use another open-source framework for augmentation, but having my experiments in lightly reflect the upstream augmentation processes would have been amazing - I could then ask questions like: do I _really_ need a Gaussian blur applied randomly to imitate different microscopy focal lengths, or should it be part of the "CollateFunction" (i.e. part of the self-supervision tasks and not pre-processing)?

Perhaps it is sufficient to do this with torchvision Compose, but there is some kind of non-trivial and, more importantly, dynamic relationship between those pipelines and collate...

@laurenmoos
Author

I think I am thinking less in terms of framework interoperability and more in terms of OOP/end-case usability: how do we express the relationship between prior augmentation pipelines (however they're implemented - and you're right, they don't need to be implemented by lightly) and how those same augmentations are used on batches in contrastive learning?

@busycalibrating
Contributor

+1 on this idea. I think it's a good idea to have very simple high-level interfaces that allow users to implement what they want without being forced to conform to something more restrictive (e.g. torchvision). This is pretty much the paradigm of how PyTorch does their nn.Module, allowing for custom implementations without too much constraint.
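The nn.Module paradigm mentioned here boils down to: a fixed `__call__` contract plus one method the user overrides. A hypothetical sketch of what that could look like for augmentations (`BaseAugmentation` and `Negate` are invented names, not lightly or PyTorch APIs):

```python
class BaseAugmentation:
    """Hypothetical minimal interface in the spirit of torch.nn.Module:
    subclasses override apply(); __call__ stays fixed, so collate code
    can treat every augmentation uniformly as a callable."""
    def __call__(self, sample):
        return self.apply(sample)

    def apply(self, sample):
        raise NotImplementedError

# A user-defined augmentation needs nothing beyond apply():
class Negate(BaseAugmentation):
    def apply(self, sample):
        return -sample

out = Negate()(5)  # -5
```

The design choice is the same as PyTorch's: the library owns the calling convention, the user owns the computation, and nothing forces a particular backend (torchvision or otherwise).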

@IgorSusmelj
Contributor

That's a very interesting idea. Augmentations are key for contrastive learning. I would like to further summarize the requirements and the concept of such an augmentation pipeline before any development.

Here are a few thoughts from me:

  • Augmentations should be flexible depending on the problem I want to solve
  • Augmentations can be grouped:
    • spatial (cropping, resizing, ...)
    • texture (blur)
    • color
    • temporal (e.g. in videos, frames before/after the current frame)
  • Selecting the right augmentation strength is very tricky (especially in self-supervised/unsupervised learning)
    • The user can set the values (as it is now)
    • Heuristics/RL can be used to find good parameters
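The grouping and strength-search ideas above could be sketched as follows. Everything here is a toy illustration: the "augmentations" are stand-in arithmetic functions, and the search heuristic is plain random search rather than RL (names like `AUGMENTATION_GROUPS` and `random_search` are invented for this sketch):

```python
import random

# Hypothetical grouped augmentations, each parameterized by a strength in [0, 1].
AUGMENTATION_GROUPS = {
    "spatial": lambda x, s: x * (1 - 0.5 * s),  # toy stand-in for crop/resize
    "texture": lambda x, s: x + 0.1 * s,        # toy stand-in for blur
}

def apply_groups(x, strengths):
    """Apply every group's augmentation with its own strength."""
    for name, aug in AUGMENTATION_GROUPS.items():
        x = aug(x, strengths[name])
    return x

def random_search(score_fn, trials=20, seed=0):
    """Toy heuristic: random search over per-group strengths,
    keeping the best-scoring configuration."""
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(trials):
        strengths = {name: rng.random() for name in AUGMENTATION_GROUPS}
        score = score_fn(strengths)
        if score > best_score:
            best, best_score = strengths, score
    return best

# Example objective: prefer a moderate total strength (a stand-in for a
# real validation-metric-driven objective).
best = random_search(lambda s: -abs(sum(s.values()) - 1.0))
```

Swapping `random_search` for a bandit or RL policy would cover the "heuristics/RL to find good parameters" case without changing the augmentation interface.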

Projects
Status: Done

5 participants