Encoding the dataset #2
Comments
Hello Ketan, I had a similar issue and managed to solve it by removing Reshuffle from the pipeline. This avoids parallelism at the cost of speed, but at least it no longer tries to load the entire dataset into memory and runs smoothly (though very slowly).
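To make the trade-off in the comment above concrete, here is a minimal plain-Python analogue (no Apache Beam dependency): a hypothetical `encode()` step applied lazily, one file at a time, versus eagerly over the whole file list. All names here are illustrative, not from the repository.

```python
def encode(midi_path: str) -> str:
    # Hypothetical stand-in for the real per-file encoding step.
    return midi_path.upper()

def encode_lazily(paths):
    # Generator: only one encoded result is held in memory at a time,
    # roughly like a sequential (non-reshuffled) pipeline.
    for p in paths:
        yield encode(p)

def encode_eagerly(paths):
    # List: every encoded result is materialized at once, which is
    # what exhausts memory on a dataset the size of Lakh MIDI.
    return [encode(p) for p in paths]
```

The lazy version trades throughput for a flat memory profile, which is the same trade-off the comment describes for dropping Reshuffle.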
Did you generate one giant TFRecord for the Lakh MIDI dataset, or did you process the data in shards? If the latter, how exactly does one shard the data with the pipelines you have in place? I'm finding that the single TFRecord generated by `convert_dir_to_note_sequences` is too large to load into memory when running `scripts/generate_song_data_beam.py`.
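One possible workaround for the question above, sketched under the assumption that you can enumerate the MIDI file paths yourself: partition the file list into N disjoint chunks and run the converter once per chunk, producing N smaller TFRecords instead of one giant one. The helper names below are hypothetical, not part of the repository.

```python
import hashlib

def shard_index(path: str, num_shards: int) -> int:
    # Deterministic shard assignment from the file path, so reruns
    # place each MIDI file in the same shard.
    digest = hashlib.md5(path.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

def split_into_shards(paths, num_shards):
    # Partition the full file list into num_shards disjoint chunks;
    # each chunk can then be converted into its own smaller TFRecord.
    shards = [[] for _ in range(num_shards)]
    for p in paths:
        shards[shard_index(p, num_shards)].append(p)
    return shards
```

Each chunk could then be copied or symlinked into its own directory and passed through `convert_dir_to_note_sequences` separately, and the downstream Beam script pointed at one shard at a time.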