inconsistency between readcsv & writecsv for multidimensional arrays #8675
Comments
What output do you get?
CSV is an inherently 2D format, not a generic serialization format – what would you expect? Arguably, we should raise an error and require the caller to reshape the array to something 2D.
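For reference, the reshape route might look like the sketch below. It assumes a current Julia (1.x), where `writecsv`/`readcsv` no longer exist and the `DelimitedFiles` standard library provides `writedlm`/`readdlm` with an explicit `','` delimiter. The array contents and the temporary file are made up for illustration.

```julia
using DelimitedFiles  # standard library in Julia 1.x

# Example 3D array (illustrative data only).
A = reshape(collect(1.0:12.0), 2, 2, 3)

# Fold the trailing dimensions into columns so the data is explicitly 2D.
A2 = reshape(A, size(A, 1), :)

path = tempname()
writedlm(path, A2, ',')

# Read the matrix back and restore the original shape.
# The shape is not stored in the file, so the caller has to remember it.
B = reshape(readdlm(path, ',', Float64), size(A))
```

The file itself stays a plain 2D CSV that any other tool can read; only the caller knows it started out as 3D.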
Well, the writecsv command gives: 1,1 1,1 1,1 which is a format that makes perfect sense (to me). The readcsv indeed (as Stefan mentioned) provides a 2D format, thus not taking the empty lines into account. What I would expect is that, without any additional arguments, the readcsv and writecsv commands would be compatible. This would imply that the readcsv would recognize not only the "end-of-line" but also the "empty line" delimiter that is used by writecsv. I hope this makes sense ;)
This is in direct conflict with the feature request to ignore blank lines. Honestly, I think that expecting the CSV format to support higher-dimensional arrays is kind of unreasonable.
I'm no developer and cannot fully comprehend the consequences of supporting higher-dimensional formats. On the other hand, it makes no sense (to me, as a user) that readcsv and writecsv, both without additional arguments, are not compatible. As a solution, I would personally prefer to make readcsv and writecsv compatible and usable for higher-dimensional arrays, which would imply that blank lines are recognized by readcsv (and optionally ignored via an argument, cf. the feature request above). Alternatively, the functionality in writecsv where empty lines are used to signal a change in dimensions could be removed. To me, the latter option makes less sense but it's up to you (or whoever programs readcsv/writecsv), of course ;)
Really you want to pass additional separators to writecsv (and potentially readcsv) with something like:
The problem is that there is no standard for multi-dim csvs, and line separated ones do exist, so there probably should be an option to read them (but IMO needn't be the default, skipping blank lines/2D is much more frequently used).
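As a rough illustration of what an opt-in reader for blank-line-separated files could do, here is a sketch for the 3D case. `write_blocks` and `read_blocks` are hypothetical helper names, not anything in Base or `DelimitedFiles`, and the code assumes a current Julia (1.x) and that every block has the same shape.

```julia
using DelimitedFiles  # standard library in Julia 1.x

# Hypothetical helper: write each 2D slice A[:, :, k], separated by a blank line.
function write_blocks(path::AbstractString, A::AbstractArray{<:Any,3}; delim::Char=',')
    open(path, "w") do io
        for k in 1:size(A, 3)
            k > 1 && println(io)              # blank line between slices
            writedlm(io, A[:, :, k], delim)
        end
    end
end

# Hypothetical helper: treat runs of blank lines as slice separators
# and stack the slices along the third dimension.
function read_blocks(path::AbstractString, ::Type{T}=Float64; delim::Char=',') where {T}
    text = replace(read(path, String), "\r\n" => "\n")
    blocks = split(String(strip(text)), r"\n{2,}")
    slices = [readdlm(IOBuffer(String(b)), delim, T) for b in blocks]
    return cat(slices...; dims=3)
end

A = reshape(collect(1.0:12.0), 2, 2, 3)
path = tempname()
write_blocks(path, A)
B = read_blocks(path)
```

This only covers one extra dimension; for 4D and beyond the separator convention would need to be spelled out, which is exactly the part that has no standard.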
IMO extending the CSV format in ad hoc ways to support non-2D data would be extremely confusing. I think that @StefanKarpinski's suggestion is the right way of dealing with this: raise an error for everything else. @hayd: Using various other separators for higher dimensions would violate all de facto "standards" of CSV. Calling the function
Ah, I hadn't realised you couldn't set delim in readcsv, of course I mean this should be in I'm for raising and maybe suggesting writedlm like above (if/once supported).
Evidently there is a standards document for the CSV format that someone linked in some other issue, but unless that says anything about multidimensional arrays I think it would be better to error. CSV is abused and overused for lots of things it really shouldn't be - have a look at HDF5/JLD for a more appropriate format for arbitrary objects.
I too think that we should have How about the following options:
I'd prefer option 1.
Why do we need a
That's true... Option 2 seemed clumsy as it needs exporting two additional functions. I'm not sure where they are used, we can choose not to export if no one needs them exported.
What would be the use case for For serialization with Julia (keeping to the same version), we already have For saving data to be read by some other version/language/library, would the target environment understand the conventions for non-2d objects, especially given that they are not standardized? Again, I think that HDF is much better for that purpose. While coming up with new ad hoc ways to represent non-2d objects is very interesting, I think that unless there is a compelling use case it would be much better to simply recognize that the domain for these functions is restricted to 2d objects and throw an error.
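To make the "restrict the domain and throw" idea concrete, a minimal sketch could look like this. `write_matrix_csv` is a hypothetical wrapper name chosen for illustration, not the function that was eventually changed in Base, and the code assumes a current Julia (1.x) with the `DelimitedFiles` standard library.

```julia
using DelimitedFiles  # standard library in Julia 1.x

# Hypothetical wrapper: accept only vectors and matrices, reject anything else.
function write_matrix_csv(path::AbstractString, A::AbstractArray)
    ndims(A) <= 2 || throw(ArgumentError(
        "CSV output is limited to vectors and matrices; got a " *
        "$(ndims(A))-dimensional array. Reshape it first, " *
        "e.g. reshape(A, size(A, 1), :)."))
    writedlm(path, A, ',')
end

# A matrix is written as usual.
path = tempname()
write_matrix_csv(path, ones(2, 2))

# A 3D array is rejected with an ArgumentError instead of producing
# a file that cannot be read back into the same shape.
rejected = try
    write_matrix_csv(tempname(), zeros(2, 2, 3))
    false
catch err
    err isa ArgumentError
end
```

The error message points the caller at the reshape workaround, so the restriction is discoverable rather than silent.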
👍 to throwing an error for >2D arguments
Then I guess raising an exception for >2D is the way to go. It would make readcsv and writecsv compatible and solve the issue. For those who prefer small to moderately large 3D/4D objects in csv (like me) and who appreciate the current behavior of writecsv, I guess that an optional "more-dimensional separator" argument would be very convenient. (If not in readcsv, then perhaps in readdlm.) But there are probably more pressing issues; hence thanks for reading and many more thanks for Julia!
writecsv methods on AbstractArray and Iterators are now removed (commented).
writecsv now only for vectors & matrices fix #8675
Wasn't the new version supposed to give an error for >2D arguments? Sorry, just not sure if I missed a change in the discussion.
There were some more discussions in #8688 where we decided to retain the iterator format.
A simple issue.
A=zeros(2,2,3)
writecsv("test.csv",A)
works well, but:
B=readcsv("test.csv")
unexpectedly does not give the original A (at least not in Julia 0.3 on Windows 7).