
Corrupted Files? #44

Closed
albert239825 opened this issue Jul 20, 2020 · 6 comments

Comments

@albert239825

Hello, I was trying to convert the small dataset to .wav using pydub, and some files raised errors when loading. They also failed to load with librosa. The files are:

fma_small/099/099134.mp3
fma_small/108/108925.mp3
fma_small/133/133297.mp3

Please let me know if I did something wrong or if you are also getting the error. Thanks.
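A minimal sketch, assuming you want a conversion run to keep going past unreadable files instead of aborting on the first one. `convert_all` is a hypothetical helper: it calls a user-supplied `convert` function on each path and records every failure with its error message.

```python
from typing import Callable, Iterable


def convert_all(
    paths: Iterable[str],
    convert: Callable[[str], None],
) -> list[tuple[str, str]]:
    """Run `convert` on every path and return (path, error) for each failure,
    instead of stopping at the first bad file."""
    failures: list[tuple[str, str]] = []
    for path in paths:
        try:
            convert(path)
        except Exception as exc:  # decoders raise many different exception types
            failures.append((path, f"{type(exc).__name__}: {exc}"))
    return failures
```

With pydub, `convert` could wrap `AudioSegment.from_file(path)` followed by `.export(out_path, format="wav")`; the returned list then doubles as a report of which files to look up in the known-issues list.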

@mdeff
Owner

mdeff commented Jul 20, 2020

That's a known issue (#41). Those 3 files have no audio at all (due to erroneous metadata). There are 6 files with less than 30s of audio in the small subset.
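Once files are converted to .wav, the short clips mentioned above can be flagged with nothing but the standard library. This is a sketch that assumes the exported WAVs are uncompressed PCM (which Python's `wave` module can read); `wav_duration_seconds` and `shorter_than` are hypothetical helper names, and the 30 s threshold comes from the clip length discussed here.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Duration of a PCM WAV file, computed from its header."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def shorter_than(paths, min_seconds: float = 30.0) -> list:
    """Return the paths whose audio lasts less than `min_seconds`."""
    return [p for p in paths if wav_duration_seconds(p) < min_seconds]
```

Files with no audio at all will usually fail during conversion rather than reach this check, so it complements an error log from the conversion step.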

Any idea how we could make the list of known issues more visible?

@albert239825
Author

So sorry, I didn't see that. Maybe you could put a disclaimer in the README that certain files will cause errors when loaded. I'm just starting out in machine learning, so I might not be the best person to ask about this. Thanks for compiling an amazing dataset overall.

@mdeff
Owner

mdeff commented Jul 20, 2020

Thanks for the kind words. There's a link under "History", but it might not be visible enough.

@mdeff
Owner

mdeff commented Jul 22, 2020

I've added a wiki page and a hopefully more visible link in the README.

@actuallyaswin

I've found malformed files in the fma_large and fma_full partitions using the following query:

find fma/data/fma_large/ -iname '*.mp3' -type f -size -4097c

It matches every .mp3 file of 4096 bytes (4 KiB) or smaller. I deleted these files and unzipped them again from the fma_large/fma_full archives, but they are still malformed: soxi cannot read any data from them. I've manually added their song IDs to my script's errata.
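For anyone not on a Unix shell, here is a rough Python counterpart of that `find` query, offered as a sketch: `find_small_mp3s` is a hypothetical name, and it mirrors `-iname '*.mp3' -type f -size -4097c` by keeping regular files (not symlinks) whose names end in .mp3 in any case and whose size is at most 4096 bytes.

```python
import os


def find_small_mp3s(root: str, max_bytes: int = 4096) -> list[str]:
    """Return sorted paths of .mp3 files under `root` no larger than `max_bytes`."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if (
                name.lower().endswith(".mp3")  # like -iname '*.mp3'
                and os.path.isfile(path)
                and not os.path.islink(path)  # -type f does not follow symlinks
                and os.path.getsize(path) <= max_bytes  # like -size -4097c
            ):
                hits.append(path)
    return sorted(hits)
```

A size cutoff is only a heuristic: it catches empty or truncated files, but a larger file can still be undecodable.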

fma_large/001/001486.mp3
fma_large/002/002624.mp3
fma_large/003/003284.mp3
fma_large/005/005574.mp3
fma_large/008/008669.mp3
fma_large/010/010116.mp3
fma_large/011/011583.mp3
fma_large/012/012838.mp3
fma_large/013/013529.mp3
fma_large/014/014116.mp3
fma_large/014/014180.mp3
fma_large/020/020814.mp3
fma_large/022/022554.mp3
fma_large/023/023429.mp3
fma_large/023/023430.mp3
fma_large/025/025173.mp3
fma_large/025/025174.mp3
fma_large/025/025175.mp3
fma_large/025/025176.mp3
fma_large/025/025180.mp3
fma_large/029/029345.mp3
fma_large/029/029346.mp3
fma_large/029/029352.mp3
fma_large/029/029356.mp3
fma_large/033/033411.mp3
fma_large/033/033413.mp3
fma_large/033/033414.mp3
fma_large/033/033417.mp3
fma_large/033/033418.mp3
fma_large/033/033419.mp3
fma_large/033/033425.mp3
fma_large/035/035725.mp3
fma_large/039/039363.mp3
fma_large/041/041745.mp3
fma_large/042/042986.mp3
fma_large/043/043753.mp3
fma_large/050/050594.mp3
fma_large/050/050782.mp3
fma_large/053/053668.mp3
fma_large/054/054569.mp3
fma_large/054/054582.mp3
fma_large/061/061480.mp3
fma_large/061/061822.mp3
fma_large/063/063422.mp3
fma_large/063/063997.mp3
fma_large/065/065753.mp3
fma_large/072/072656.mp3
fma_large/072/072980.mp3
fma_large/073/073510.mp3
fma_large/080/080237.mp3
fma_large/080/080391.mp3
fma_large/080/080553.mp3
fma_large/082/082699.mp3
fma_large/084/084503.mp3
fma_large/084/084504.mp3
fma_large/084/084522.mp3
fma_large/084/084524.mp3
fma_large/086/086656.mp3
fma_large/086/086659.mp3
fma_large/086/086661.mp3
fma_large/086/086664.mp3
fma_large/087/087057.mp3
fma_large/090/090244.mp3
fma_large/090/090245.mp3
fma_large/090/090247.mp3
fma_large/090/090248.mp3
fma_large/090/090250.mp3
fma_large/090/090252.mp3
fma_large/090/090253.mp3
fma_large/090/090442.mp3
fma_large/090/090445.mp3
fma_large/091/091206.mp3
fma_large/092/092479.mp3
fma_large/094/094052.mp3
fma_large/094/094234.mp3
fma_large/095/095253.mp3
fma_large/096/096203.mp3
fma_large/096/096207.mp3
fma_large/096/096210.mp3
fma_large/098/098105.mp3
fma_large/098/098558.mp3
fma_large/098/098559.mp3
fma_large/098/098560.mp3
fma_large/098/098562.mp3
fma_large/098/098571.mp3
fma_large/099/099134.mp3
fma_large/101/101265.mp3
fma_large/101/101272.mp3
fma_large/101/101275.mp3
fma_large/102/102241.mp3
fma_large/102/102243.mp3
fma_large/102/102247.mp3
fma_large/102/102249.mp3
fma_large/102/102289.mp3
fma_large/105/105247.mp3
fma_large/106/106409.mp3
fma_large/106/106412.mp3
fma_large/106/106415.mp3
fma_large/106/106628.mp3
fma_large/108/108920.mp3
fma_large/108/108925.mp3
fma_large/109/109266.mp3
fma_large/110/110236.mp3
fma_large/115/115610.mp3
fma_large/117/117441.mp3
fma_large/126/126981.mp3
fma_large/127/127336.mp3
fma_large/127/127928.mp3
fma_large/129/129207.mp3
fma_large/129/129800.mp3
fma_large/130/130328.mp3
fma_large/130/130748.mp3
fma_large/130/130751.mp3
fma_large/131/131545.mp3
fma_large/133/133297.mp3
fma_large/133/133641.mp3
fma_large/133/133647.mp3
fma_large/134/134887.mp3
fma_large/140/140449.mp3
fma_large/140/140450.mp3
fma_large/140/140451.mp3
fma_large/140/140452.mp3
fma_large/140/140453.mp3
fma_large/140/140454.mp3
fma_large/140/140455.mp3
fma_large/140/140456.mp3
fma_large/140/140457.mp3
fma_large/140/140458.mp3
fma_large/140/140459.mp3
fma_large/140/140460.mp3
fma_large/140/140461.mp3
fma_large/140/140462.mp3
fma_large/140/140463.mp3
fma_large/140/140464.mp3
fma_large/140/140465.mp3
fma_large/140/140466.mp3
fma_large/140/140467.mp3
fma_large/140/140468.mp3
fma_large/140/140469.mp3
fma_large/140/140470.mp3
fma_large/140/140471.mp3
fma_large/140/140472.mp3
fma_large/142/142614.mp3
fma_large/143/143992.mp3
fma_large/144/144518.mp3
fma_large/144/144619.mp3
fma_large/145/145056.mp3
fma_large/146/146056.mp3
fma_large/147/147419.mp3
fma_large/147/147424.mp3
fma_large/148/148786.mp3
fma_large/148/148787.mp3
fma_large/148/148788.mp3
fma_large/148/148789.mp3
fma_large/148/148790.mp3
fma_large/148/148791.mp3
fma_large/148/148792.mp3
fma_large/148/148793.mp3
fma_large/148/148794.mp3
fma_large/148/148795.mp3
fma_large/151/151920.mp3
fma_large/155/155051.mp3
fma_full/080/080237.mp3
fma_full/145/145056.mp3
fma_full/015/015608.mp3
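To reuse a list like this as an exclusion set, the paths can be parsed back into integer track IDs per partition. This sketch assumes the layout seen in the paths above (a 3-digit folder equal to the first three digits of a zero-padded 6-digit ID); `parse_errata` and `track_path` are hypothetical helpers, not part of the dataset's own tooling.

```python
import re

# Matches paths such as "fma_large/001/001486.mp3", capturing the partition,
# the 3-digit folder and the 6-digit track ID.
_ERRATA_RE = re.compile(r"^(fma_\w+)/(\d{3})/(\d{6})\.mp3$")


def track_path(partition: str, track_id: int) -> str:
    """Build the relative path of a track, e.g. ("fma_large", 1486) -> fma_large/001/001486.mp3."""
    tid = f"{track_id:06d}"
    return f"{partition}/{tid[:3]}/{tid}.mp3"


def parse_errata(lines) -> dict[str, set[int]]:
    """Map each partition name to the set of track IDs listed for it."""
    bad: dict[str, set[int]] = {}
    for line in lines:
        m = _ERRATA_RE.match(line.strip())
        if m is None:
            continue  # skip blank or unrelated lines
        partition, folder, tid = m.groups()
        if tid[:3] != folder:
            raise ValueError(f"folder does not match track ID: {line!r}")
        bad.setdefault(partition, set()).add(int(tid))
    return bad
```

A training loop could then skip any `track_id` found in `bad["fma_large"]` before building the path with `track_path`.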

@jbmaxwell

I'm noticing that several files seem to cause segfaults in AudioLoader. At first they only produce warnings on what I think are single frames, but the process eventually crashes. The warning is:

[ WARNING ] AudioLoader: invalid frame, skipping it: Invalid data found when processing input

Is there a way around this?
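A segfault in native decoding code cannot be caught with try/except, but it can be contained: run each decode in a separate interpreter so a crash only kills the child process. This is a sketch, not a fix for the files themselves. `decodes_cleanly` is a hypothetical helper, and `loader_code` is whatever one-line decoding call you already use, reading the file path from `sys.argv[1]`; the thread doesn't say which library provides AudioLoader, so that call is left to you.

```python
import subprocess
import sys


def decodes_cleanly(loader_code: str, path: str, timeout: float = 60.0) -> bool:
    """Run `loader_code` in a fresh Python interpreter with `path` as sys.argv[1].

    A crash in the child (on POSIX, a negative return code such as -11 for
    SIGSEGV) or any non-zero exit is reported as False; the parent survives.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", loader_code, path],
            capture_output=True,  # keep decoder warnings out of the parent's output
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False  # treat hangs like crashes
    return result.returncode == 0
```

Starting an interpreter per file is slow, so this works best as a one-off pre-screening pass whose failures are added to an errata list, after which the main pipeline can load files in-process.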

4 participants