inconsistency between readcsv & writecsv for multidimensional arrays #8675

Closed
mfjonker opened this issue Oct 13, 2014 · 17 comments

Comments

@mfjonker

A simple issue.

A=zeros(2,2,3)
writecsv("test.csv",A)

works well, but:

B=readcsv("test.csv")
unexpectedly does not return the original A (at least not in Julia 0.3 on Windows 7).

@pao
Member

pao commented Oct 13, 2014

What output do you get?

@StefanKarpinski
Sponsor Member

CSV is an inherently 2D format, not a generic serialization format – what would you expect? Arguably, we should raise an error and require the caller to reshape the array to something 2D.
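
A minimal sketch of the reshape approach suggested here, assuming Julia 1.x, where the `DelimitedFiles` standard library's `writedlm(f, A, ',')` and `readdlm(f, ',')` stand in for `writecsv` and `readcsv`. The array contents, the temporary file, and the 4×3 layout are illustrative choices, and the caller has to remember the original dimensions to undo the reshape.

```julia
using DelimitedFiles  # standard library in Julia 1.x (assumption: not Julia 0.3)

# A 3D array with distinct entries, so a wrong layout would be noticed.
A = reshape(collect(1.0:12.0), 2, 2, 3)

# Flatten to a 4x3 matrix before writing; CSV only sees a 2D table.
path = tempname()
writedlm(path, reshape(A, 4, 3), ',')

# Read the 2D table back and restore the original shape.
B = reshape(readdlm(path, ','), size(A))
@assert B == A
```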

@mfjonker
Author

Well, the writecsv command gives:

1,1
1,1

1,1
1,1

1,1
1,1

which is a format that makes perfect sense (to me). The readcsv indeed (as Stefan mentioned) provides a 2D format, thus not taking the empty lines into account. What I would expect is that, without any additional arguments, the readcsv and writecsv commands would be compatible. This would imply that the readcsv would recognize not only the "end-of-line" but also the "empty line" delimiter that is used by writecsv.

I hope this makes sense ;)
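
The blank-line convention described above can be read back with a few lines of user code. This is only a sketch under stated assumptions: `read_blocks` is a hypothetical helper (not part of Julia), it targets Julia 1.x with the `DelimitedFiles` standard library, and it assumes each blank-line-separated block holds one slice `A[:, :, k]` with Unix line endings.

```julia
using DelimitedFiles  # standard library in Julia 1.x

# Hypothetical helper: parse blank-line-separated CSV blocks into a 3D array.
function read_blocks(path; delim=',')
    text = read(path, String)
    # One block per run of lines between blank lines.
    blocks = split(strip(text), r"\n\s*\n")
    # Parse each block as an ordinary 2D delimited table.
    slices = [readdlm(IOBuffer(String(b)), delim) for b in blocks]
    # Stack the matrices along the third dimension.
    return cat(slices...; dims=3)
end

# Usage with the layout shown in this comment.
path = tempname()
write(path, "1,1\n1,1\n\n1,1\n1,1\n\n1,1\n1,1\n")
B = read_blocks(path)
size(B)  # (2, 2, 3)
```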

@StefanKarpinski
Sponsor Member

This is in direct conflict with the feature request to ignore blank lines. Honestly, I think that expecting the CSV format to support higher-dimensional arrays is kind of unreasonable.

@mfjonker
Author

I'm no developer and cannot fully comprehend the consequences of supporting higher-dimensional formats. On the other hand, it makes no sense (to me, as a user) that readcsv and writecsv, both without additional arguments, are not compatible.

As a solution, I would personally prefer to make readcsv and writecsv compatible and usable for higher-dimensional arrays, which would imply that blank lines are recognized by readcsv (and optionally ignored via an argument, cf. the feature request above).

Alternatively, the functionality in writecsv where empty lines are used to signal a change in dimensions could be removed. To me, the latter option makes less sense, but it's up to you (or whoever programs readcsv/writecsv), of course ;)

@hayd
Member

hayd commented Oct 13, 2014

Really you want to pass additional separators to writecsv (and potentially readcsv) with something like:

A=ones(2,2,3)
writecsv("test.csv", A, seps=(',', ';', '\n'))
1,1;1,1;1,1
1,1;1,1;1,1

The problem is that there is no standard for multi-dimensional CSVs, and line-separated ones do exist, so there probably should be an option to read them (but IMO it needn't be the default; skipping blank lines and reading 2D data is much more common).

@tpapp
Contributor

tpapp commented Oct 13, 2014

IMO extending the CSV format in ad hoc ways to support non-2D data would be extremely confusing. I think that @StefanKarpinski's suggestion is the right way of dealing with this: raise an error for everything else.

@hayd: Using various other separators for higher dimensions would violate all de facto "standards" of CSV. Calling the function writecsv would then be misleading: the user expectation is that the function produces a file that is readable by other programs that claim to read CSV.

@hayd
Member

hayd commented Oct 13, 2014

Ah, I hadn't realised you couldn't set delim in readcsv. Of course I mean this should be in readdlm and writedlm (where you can). The problem is there is no standard for multi-dim, so people are forced to do it in ad hoc ways (which is confusing), and they do so in the real world. Atm we're essentially using delim=(',', '\n', "\n\n")...

I'm for raising and maybe suggesting writedlm like above (if/once supported).

@tkelman
Contributor

tkelman commented Oct 14, 2014

Evidently there is a standards document for the CSV format that someone linked in some other issue, but unless that says anything about multidimensional arrays I think it would be better to error. CSV is abused and overused for lots of things it really shouldn't be. Have a look at HDF5/JLD for a more appropriate format for arbitrary objects.

@tanmaykm
Member

I too think that we should have readcsv and writecsv support 2D data only. There are other formats better suited for higher dimensional data.

How about the following options:

  1. Introduce an option read_compatible (default: true) and have writedlm fail for non-2D inputs unless read_compatible is set to false.
  2. Rename writedlm methods that accept non-2D inputs. We have methods accepting AbstractArray and iterators, apart from AbstractVecOrMat, so we could probably have writedlm_ndarray and writedlm_iter.

I'd prefer option 1.

@tkelman
Contributor

tkelman commented Oct 14, 2014

Why do we need a read_compatible option at all? While we're in the process of making breaking changes...

@tanmaykm
Member

That's true... Option 2 seemed clumsy as it needs exporting two additional functions.

I'm not sure where they are used, we can choose not to export if no one needs them exported.

@tpapp
Contributor

tpapp commented Oct 14, 2014

What would be the use case for readdlm/writedlm and generic (non-2d) objects?

For serialization with Julia (keeping to the same version), we already have serialize etc.

For saving data to be read by some other version/language/library, would the target environment understand the conventions for non-2d objects, especially given that they are not standardized? Again, I think that HDF is much better for that purpose.

While coming up with new ad hoc ways to represent non-2d objects is very interesting, I think that unless there is a compelling use case it would be much better to simply recognize that the domain for these functions is restricted to 2d objects and throw an error.
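
Restricting the domain and throwing an error could look roughly like the following. This is a sketch, not the change that was eventually committed: `writecsv_2d` is a hypothetical name, and the code assumes Julia 1.x with the `DelimitedFiles` standard library.

```julia
using DelimitedFiles  # standard library in Julia 1.x

# Hypothetical wrapper: accept vectors and matrices, reject anything else.
function writecsv_2d(path::AbstractString, A::AbstractArray)
    ndims(A) <= 2 || throw(ArgumentError(
        "CSV output supports vectors and matrices only; got a $(ndims(A))-dimensional array"))
    writedlm(path, A, ',')
end

writecsv_2d(tempname(), ones(2, 2))        # a matrix is written as usual
# writecsv_2d(tempname(), ones(2, 2, 3))   # would throw an ArgumentError
```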

@IainNZ
Member

IainNZ commented Oct 14, 2014

👍 to throwing an error for >2D arguments

@mfjonker
Author

Then I guess raising an exception for >2D is the way to go. It would make readcsv and writecsv compatible and solve the issue.

For those who prefer small to moderately large 3D/4D objects in CSV (like me) and who appreciate the current behavior of writecsv, I guess that an optional "higher-dimensional separator" argument would be very convenient (if not in readcsv, then perhaps in readdlm). But there are probably more pressing issues; hence thanks for reading and many more thanks for Julia!

tanmaykm added a commit to tanmaykm/julia that referenced this issue Oct 15, 2014
writecsv methods on AbstractArray and Iterators are now removed (commented).
JeffBezanson added a commit that referenced this issue Oct 24, 2014
writecsv now only for vectors & matrices fix #8675
@rennis250
Contributor

Wasn't the new version supposed to give an error for >2D arguments? Sorry, just not sure if I missed a change in the discussion.

@tanmaykm
Member

There were some more discussions in #8688 where we decided to retain the iterator format.
