How to create minibatches with independent samples #136
The epoch-shuffling idea you posted is interesting, and similar to something the RR already does. A slightly more pescadorian approach to this would be to limit each streamer to produce a small number of samples (== the number before the order will change) and self-terminate, then mux over those self-limiting streamers. If each streamer generates exactly 1 random example, this should give you exactly the desired behavior of fully randomized order with balanced presentation. However, you'll have to pay the overhead of reactivating the streamers, so it might not be worth it. My untested gut feeling is to set the per-streamer limit such that the permutation order changes several times per epoch. That "several times" may depend on how many total streamers you have.
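A minimal sketch of that idea, assuming the data are music tracks stored as 2D feature arrays; the array shapes, excerpt length, and mux parameters below are made up for illustration and are not from any actual experiment:

```python
import numpy as np
import pescador


def limited_excerpts(track, n_samples=4, excerpt_len=128):
    """Yield a small, fixed number of random excerpts, then self-terminate."""
    for _ in range(n_samples):
        start = np.random.randint(0, len(track) - excerpt_len)
        yield {'X': track[start:start + excerpt_len]}


# hypothetical data: 10 tracks of 1000 frames x 40 features each
tracks = [np.random.randn(1000, 40) for _ in range(10)]
streamers = [pescador.Streamer(limited_excerpts, t, n_samples=4) for t in tracks]

# In 'with_replacement' mode, exhausted streamers can be re-activated later,
# so the effective presentation order gets re-permuted every few samples.
mux = pescador.StochasticMux(streamers, n_active=5, rate=None,
                             mode='with_replacement')

for sample in mux.iterate(max_iter=100):
    pass  # sample['X'] is one excerpt; feed it to training
```

With n_samples=1 each activation contributes exactly one excerpt, i.e. the fully randomized, balanced behavior described above, at the cost of re-activating a streamer for every single sample.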
Side note: are you referring to section 4.2 of that paper? Or a different part? I'm not sure it applies in either case, but maybe I missed something in my cursory skim.
I got it from the conclusion 😆 I'll be honest here, I don't understand any of the proof and I will probably not have the time to look into it. But normally when this paper is cited, they will tell you that this only applies to convex problems.
Excellent, thanks. This is the hack I was looking for. If this is of greater interest, I would propose adding an option to RR to make this easy to use.
@bmcfee if you're interested in how the RoundRobin flavors perform for source separation compared to StochasticMux, I can post the results here. Otherwise, feel free to close this out.
I think it is of interest, but I'm not sure about adding it to RR itself. However, if you find that it does the job for you, it'd be great to add it as a use-case/example in the documentation.
@faroit: I'd be interested in how much of a difference the sampling strategy makes. I used to sample excerpts without replacement (https://github.com/f0k/ismir2015/blob/master/experiments/augment.py#L18-L51), but now often sample recordings without replacement (and then uniformly sample an excerpt within each chosen recording). I wasn't worried too much that a batch could contain multiple excerpts of the same recording, but rather a) that for datasets of wildly varying recording lengths, longer recordings would be favored over shorter ones, and b) that for large datasets, keeping track of all possible recording/position tuples adds a lot of overhead.
This is typically how I do it as well, using a pumpp sampler for each track, and then a StochasticMux over tracks.
This is true, but you can also either use rate-limiting (in the mux) or self-limiting (in the streamer's generator) to avoid this.
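A minimal sketch of the rate-limited variant, assuming per-recording streamers that sample excerpts indefinitely; the recording lengths, rate, and other parameters are illustrative only:

```python
import numpy as np
import pescador


def excerpts(recording, excerpt_len=128):
    """Sample random excerpts from one recording, indefinitely."""
    while True:
        start = np.random.randint(0, len(recording) - excerpt_len)
        yield {'X': recording[start:start + excerpt_len]}


# recordings of wildly varying lengths (hypothetical data)
recordings = [np.random.randn(np.random.randint(500, 5000), 40) for _ in range(10)]
streamers = [pescador.Streamer(excerpts, r) for r in recordings]

# Rate-limiting in the mux: each activated stream contributes roughly `rate`
# samples (Poisson-distributed) before it is swapped out, so long recordings
# cannot dominate no matter how many excerpts they could supply.
mux = pescador.StochasticMux(streamers, n_active=4, rate=8,
                             mode='with_replacement')
```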
This is true. In general, we don't have a great mechanism for persisting state across activations of a streamer. This is somewhat by design: as you mention, it could incur substantial memory overhead. So far, it hasn't really been a problem, but I'm open to suggestions about how to generically support such a thing if it does become desirable.
@f0k @bmcfee just a short status update: I didn't forget this issue. I still think this discussion could be really valuable. Also – specifically for source separation networks – we will present this as part of the ISMIR source tutorial. I will share the code and results here once we have fixed the remaining parameters.
@sleglaive yes and no. I didn't use a pytorch dataloader, so I didn't get the speedup/caching benefit; hence, I fully relied on pescador's batch sampler (which is significantly slower than the dataloader's parallelization, even with zmq enabled). Once the speed of pescador is improved, this could make the sampling faster. Our code has a few tricks and will be released very soon.
Ok, thanks a lot for the update!
@faroit it's not surprising that pesc's latency is a bit worse than pytorch's data loader, as it's pure python and much more flexible. If you're feeling generous with time, it would be great if you could profile your implementation and let us know where the hotspots are -- there might be some easy optimizations that we could implement, but it's hard to know up front without specific case studies.
Pinging back on this thread -- @faroit did you ever do the comparison of round-robin and shuffle mux?
We did. In the plot above, the track sampling without replacement was done with round-robin. Feel free to close this issue – maybe add a one-liner to the docs to advertise round-robin a bit more?
Great, thanks! I'm not 100% sure I understand the axes of this plot though; can you break it down a bit more? Is the message that replacement for tracks doesn't matter, but it does for samples drawn per batch from the track set? Side note: I'm currently doing a bit of issue gardening, and trying to clean up some rough edges before we start hammering on speed.
Sorry, of course: the message is that taking samples (here, samples = chunks/excerpts within a track) without replacement improves performance, as it guarantees that the model sees all parts of each track (when there is no overlap) exactly once, thus utilizing all the data. This concerns the epoch, not the batch!
Ah, thanks for the clarification! So the idea is that it doesn't matter so much how you pick the tracks to include in a batch, but sampling of excerpts from a track should be exhaustive? Did excerpts overlap in your experiment, or were they non-overlapping? (It doesn't really matter, but I'm curious.)
I think I always used non-overlapping segments. Here is the code I used back then to configure all options (…).
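For readers of the thread, a minimal sketch of that scheme (not the actual tutorial code; the names, shapes, and mux parameters are made up):

```python
import numpy as np
import pescador


def track_chunks(track, excerpt_len=128):
    """Yield every non-overlapping excerpt of one track exactly once, in random order."""
    starts = np.arange(0, len(track) - excerpt_len + 1, excerpt_len)
    np.random.shuffle(starts)
    for start in starts:
        yield {'X': track[start:start + excerpt_len]}


tracks = [np.random.randn(1000, 40) for _ in range(10)]   # hypothetical data
streamers = [pescador.Streamer(track_chunks, t) for t in tracks]

# 'exhaustive' mode never revives a finished streamer, so one full pass over
# the mux visits every chunk of every track exactly once: one epoch.
mux = pescador.StochasticMux(streamers, n_active=8, rate=None, mode='exhaustive')
batches = pescador.buffer_stream(mux, buffer_size=16, partial=True)
```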
Great, thanks! I suspect there's something really interesting going on here relating to the conditional independence assumptions of excerpts within a track (definitely dependent) vs across tracks (hopefully independent).
TL;DR: I am looking for a way to mux streams and aggregate them in batches while making sure that each batch contains only 1 sample from each stream (if the batchsize is smaller than the number of streams).
For quite some time now I have wondered about an aspect of data sampling in the context of training deep models on hierarchical-like data sets. Let's return (#133) to the infamous scenario of music tracks that are trained on smaller excerpts instead of using the whole track, so that non-dynamic models such as CNNs can be used.
So let's assume we have 10 tracks and each track yields 100 excerpts (with or without overlap); this would result in 1000 samples. Now we have a few ways to sample for one training epoch, and one way is to sample randomly and select batchsize samples for each batch and repeat the procedure (without replacement) for a fixed number of iterations in each epoch. While this is fast and easy to implement – even without pescador – it comes with two downsides. First, not every sample is guaranteed to be seen in an epoch; this can be compensated by increasing the number of iterations or epochs. Second, there is a proof that with-replacement sampling performs worse than without-replacement sampling, but this is only valid for convex problems. I forgot where I read it, but for DNNs it could actually result in improved generalisation, which is probably why it turned out to be successfully deployed in audio, such as in the singing voice detection paper from @f0k.
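To make this baseline concrete, a minimal sketch in plain numpy; the sizes and names are illustrative:

```python
import numpy as np

n_tracks, excerpts_per_track, batchsize, iterations = 10, 100, 16, 50
# every (track, excerpt) pair is one sample -> 1000 samples total
samples = [(t, e) for t in range(n_tracks) for e in range(excerpts_per_track)]

for _ in range(iterations):
    # batchsize distinct samples per batch, drawn anew each iteration
    idx = np.random.choice(len(samples), size=batchsize, replace=False)
    batch = [samples[j] for j in idx]
    # nothing prevents a batch from containing several excerpts of the same
    # track, and with few iterations not every sample is seen in an epoch
```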
I currently don't know if this has an impact on performance, but there is some work that deals with this kind of problem for small batch sizes and/or non-i.i.d. samples. Also, I know some methods in source separation use this kind of sampling (e.g. the ICASSP 2015 work from @TE-StefanUhlich). Therefore I want to run a small comparison, and I hoped I could implement all flavors using pescador.
So, with pescador we can easily make sure that all samples are seen once using StochasticMux. However, if we use buffer_stream we cannot easily make sure that all samples within a batch are from different tracks. A workaround would be to use RoundRobin if the batchsize is smaller than the number of tracks and shuffle the streamers every epoch; however, then we would end up having repetition (e.g. for three tracks and batchsize 2: [[a, b], [c, a], [b, c], [a, b]]).
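A minimal sketch of that workaround; the streamer setup and parameters are illustrative, and the 'permuted_cycle' mode of RoundRobinMux (which re-shuffles the streamer order after every pass, roughly "shuffle the streamers every epoch") should be checked against the installed pescador version:

```python
import numpy as np
import pescador


def excerpts(track, n_per_epoch=100, excerpt_len=128):
    """Yield a fixed number of random excerpts from one track, then stop."""
    for _ in range(n_per_epoch):
        start = np.random.randint(0, len(track) - excerpt_len)
        yield {'X': track[start:start + excerpt_len]}


tracks = [np.random.randn(1000, 40) for _ in range(3)]      # three tracks: a, b, c
streamers = [pescador.Streamer(excerpts, t) for t in tracks]

# Round-robin over tracks: samples arrive as a, b, c, a, b, c, ..., so with
# buffer_size smaller than the number of streamers each batch holds samples
# from distinct tracks, but the batch compositions recur
# ([a, b], [c, a], [b, c], [a, b], ...).  'permuted_cycle' re-shuffles the
# track order when the streamers are exhausted and revived (a batch that
# straddles that boundary can then repeat a track).
mux = pescador.RoundRobinMux(streamers, mode='permuted_cycle')
batches = pescador.buffer_stream(mux, buffer_size=2)

batch = next(batches)  # batch['X'] has shape (2, 128, 40)
```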
I would be glad if someone could show me how to implement this in pescador, or could possibly point me to some theory papers where this is discussed. Also, maybe this would be something to consider for the pescador paper you are planning to write.