Encoding the dataset #2
Comments
Hello Ketan, I had a similar issue and managed to solve it by removing Reshuffle from the pipeline. This avoids parallelism at the cost of speed, but at least it no longer tries to load the entire dataset into memory and runs smoothly (though very slowly).
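To make the trade-off in the comment above concrete, here is a minimal plain-Python analogue (no Apache Beam dependency): a hypothetical `encode()` step applied lazily, one file at a time, versus eagerly over the whole file list. All names here are illustrative, not from the repository.

```python
def encode(midi_path: str) -> str:
    # Hypothetical stand-in for the real per-file encoding step.
    return midi_path.upper()

def encode_lazily(paths):
    # Generator: only one encoded result is held in memory at a time,
    # roughly like a sequential (non-reshuffled) pipeline.
    for p in paths:
        yield encode(p)

def encode_eagerly(paths):
    # List: every encoded result is materialized at once, which is
    # what exhausts memory on a dataset the size of Lakh MIDI.
    return [encode(p) for p in paths]
```

The lazy version trades throughput for a flat memory profile, which is the same trade-off the comment describes for dropping Reshuffle.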
Did you generate one giant TFRecord for the Lakh MIDI dataset, or did you process the data in shards? If the latter, how exactly does one shard the data with the pipelines you have in place? I'm finding that the single TFRecord generated by `convert_dir_to_note_sequences` is too large to load into memory when running `scripts/generate_song_data_beam.py`.
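One possible workaround for the question above, sketched under the assumption that you can enumerate the MIDI file paths yourself: partition the file list into N disjoint chunks and run the converter once per chunk, producing N smaller TFRecords instead of one giant one. The helper names below are hypothetical, not part of the repository.

```python
import hashlib

def shard_index(path: str, num_shards: int) -> int:
    # Deterministic shard assignment from the file path, so reruns
    # place each MIDI file in the same shard.
    digest = hashlib.md5(path.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

def split_into_shards(paths, num_shards):
    # Partition the full file list into num_shards disjoint chunks;
    # each chunk can then be converted into its own smaller TFRecord.
    shards = [[] for _ in range(num_shards)]
    for p in paths:
        shards[shard_index(p, num_shards)].append(p)
    return shards
```

Each chunk could then be copied or symlinked into its own directory and passed through `convert_dir_to_note_sequences` separately, and the downstream Beam script pointed at one shard at a time.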