inconsistency between readcsv & writecsv for multidimensional arrays #8675

mfjonker · 2014-10-13T16:03:04Z

A simple issue.

A=zeros(2,2,3)
writecsv("test.csv",A)

works well, but:

B=readcsv("test.csv")
unexpectedly does not give the original A (at least not in Julia 0.3 in Windows 7)

pao · 2014-10-13T16:15:46Z

What output do you get?

StefanKarpinski · 2014-10-13T16:18:05Z

CSV is an inherently 2D format, not a generic serialization format – what would you expect? Arguably, we should raise an error and require the caller to reshape the array to something 2D.
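The reshape approach mentioned above can be sketched as follows. This is a minimal illustration rather than the behavior of any Julia release: it assumes a current Julia where `readcsv`/`writecsv` have been replaced by `readdlm`/`writedlm` from the `DelimitedFiles` standard library, and the caller has to remember the original dimensions.

```julia
using DelimitedFiles

# Example 3D array with distinct values so that ordering mistakes would show up.
A = reshape(collect(1.0:12.0), 2, 2, 3)
path = tempname()

# Flatten the trailing dimensions into columns: 2×2×3 becomes 2×6.
writedlm(path, reshape(A, size(A, 1), :), ',')

# Read the 2D matrix back and restore the original shape,
# which must be known to the caller because CSV does not store it.
B = reshape(readdlm(path, ',', Float64), size(A))
```

Because both reshapes use the same column-major order, `B` has the same shape and contents as `A`.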

mfjonker · 2014-10-13T16:29:00Z

Well, the writecsv command gives:

1,1
1,1

which is a format that makes perfect sense (to me). The readcsv indeed (as Stefan mentioned) provides a 2D format, thus not taking the empty lines into account. What I would expect is that, without any additional arguments, the readcsv and writecsv commands would be compatible. This would imply that the readcsv would recognize not only the "end-of-line" but also the "empty line" delimiter that is used by writecsv.

I hope this makes sense ;)

StefanKarpinski · 2014-10-13T16:34:58Z

This is in direct conflict with the feature request to ignore blank lines. Honestly, I think that expecting the CSV format to support higher-dimensional arrays is kind of unreasonable.

mfjonker · 2014-10-13T16:49:06Z

I'm no developer and cannot fully comprehend the consequences of supporting higher-dimensional formats. On the other hand, it makes no sense (to me, as a user) that readcsv and writecsv, both called without additional arguments, are not compatible.

As a solution, I would personally prefer to make readcsv and writecsv compatible and usable for higher-dimensional arrays, which would imply that blank lines are recognized by readcsv (and optionally ignored via an argument, cf. the feature request above).

Alternatively, the functionality in writecsv where empty lines are used to signal a change in dimensions could be removed. To me, the latter option makes less sense but it's up to you (or whoever programs readcsv/writecsv), of course ;)

hayd · 2014-10-13T17:27:54Z

Really you want to pass additional separators to writecsv (and potentially readcsv) with something like:

A=ones(2,2,3)
writecsv("test.csv", A, seps=(',', ';', '\n'))
1,1;1,1;1,1
1,1;1,1;1,1

The problem is that there is no standard for multi-dim csvs, and line separated ones do exist, so there probably should be an option to read them (but IMO needn't be the default, skipping blank lines/2D is much more frequently used).

tpapp · 2014-10-13T18:19:13Z

IMO extending the CSV format in ad hoc ways to support non-2D data would be extremely confusing. I think that @StefanKarpinski's suggestion is the right way of dealing with this: raise an error for everything else.

@hayd: Using various other separators for higher dimensions would violate all de facto "standards" of CSV. Calling the function writecsv would then be misleading: users expect the function to produce a file that is readable by other programs that claim to read CSV.

hayd · 2014-10-13T20:00:17Z

Ah, I hadn't realised you couldn't set delim in readcsv, of course I mean this should be in readdlm and writedlm (where you can). The problem is there is no standard for multi-dim, and people are forced to do it in ad hoc ways (and it is confusing), and they do in the real world. Atm we're essentially using delim=(',', '\n', "\n\n")...

I'm for raising and maybe suggesting writedlm like above (if/once supported).
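The blank-line convention described above (a comma between columns, a newline between rows, a blank line between 2D slices) can be sketched like this. The helpers `write_slices` and `read_slices` are hypothetical names introduced only for illustration; they are not part of Julia or of `DelimitedFiles`, and the sketch assumes a current Julia with the `DelimitedFiles` standard library and a 3D input.

```julia
using DelimitedFiles

# Hypothetical helper: write each slice A[:, :, k] as CSV,
# with one blank line between consecutive slices.
function write_slices(path, A::AbstractArray{<:Any,3})
    open(path, "w") do io
        for k in axes(A, 3)
            k > first(axes(A, 3)) && println(io)  # blank line before every slice but the first
            writedlm(io, A[:, :, k], ',')
        end
    end
end

# Hypothetical helper: split the file on blank lines, parse each block
# as a matrix, and stack the matrices along the third dimension.
function read_slices(path, T::Type=Float64)
    blocks = split(strip(read(path, String)), r"\n\s*\n")
    slices = [readdlm(IOBuffer(String(b)), ',', T) for b in blocks]
    return cat(slices...; dims=3)
end

A = reshape(collect(1.0:12.0), 2, 2, 3)
path = tempname()
write_slices(path, A)
B = read_slices(path)
```

A reader that ignores blank lines would instead return a single 4×3 or 6×2 style matrix from such a file, which is the incompatibility this thread is about.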

tkelman · 2014-10-14T05:24:18Z

Evidently there is a standards document for the CSV format that someone linked in some other issue, but unless that says anything about multidimensional arrays I think it would be better to error. CSV is abused and overused for lots of things it really shouldn't be - have a look at HDF5/JLD for a more appropriate format for arbitrary objects.

tanmaykm · 2014-10-14T08:06:08Z

I too think that we should have readcsv and writecsv support 2D data only. There are other formats better suited for higher dimensional data.

How about the following options:

1. Introduce an option read_compatible (default: true) and have writedlm fail for non-2D inputs unless read_compatible is set to false.
2. Rename writedlm methods that accept non-2D inputs. We have methods accepting AbstractArray and iterators, apart from AbstractVecOrMat, so we could probably have writedlm_ndarray and writedlm_iter.

I'd prefer option 1.

tkelman · 2014-10-14T09:00:53Z

Why do we need a read_compatible option at all? While we're in the process of making breaking changes...

tanmaykm · 2014-10-14T09:13:18Z

That's true... Option 2 seemed clumsy as it needs exporting two additional functions.

I'm not sure where they are used; we can choose not to export them if no one needs them exported.

tpapp · 2014-10-14T11:20:54Z

What would be the use case for readdlm/writedlm and generic (non-2d) objects?

For serialization with Julia (keeping to the same version), we already have serialize etc.
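As a rough illustration of that route, the following sketch round-trips a 3D array with `serialize`/`deserialize`. It assumes a current Julia, where these functions live in the `Serialization` standard library; as noted above, the format is only meant to be read back by the same Julia version.

```julia
using Serialization

A = zeros(2, 2, 3)
path = tempname()

# Write the array in Julia's native serialization format.
open(path, "w") do io
    serialize(io, A)
end

# Read it back; shape and element type are preserved.
B = open(deserialize, path)
```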

For saving data to be read by some other version/language/library, would the target environment understand the conventions for non-2d objects, especially given that they are not standardized? Again, I think that HDF is much better for that purpose.

While coming up with new ad hoc ways to represent non-2d objects is very interesting, I think that unless there is a compelling use case it would be much better to simply recognize that the domain for these functions is restricted to 2d objects and throw an error.

IainNZ · 2014-10-14T13:21:29Z

👍 to throwing an error for >2D arguments
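The error-throwing behavior favored here could look roughly like the sketch below. `writecsv_strict` is a hypothetical wrapper written for illustration, not the change that was eventually merged; it assumes a current Julia with the `DelimitedFiles` standard library.

```julia
using DelimitedFiles

# Hypothetical wrapper: accept vectors and matrices, reject anything with more dimensions.
function writecsv_strict(path, A::AbstractArray)
    ndims(A) <= 2 || throw(ArgumentError(
        "CSV output supports at most 2 dimensions, got $(ndims(A)); reshape the array first"))
    writedlm(path, A, ',')
end

# A matrix is written as usual.
ok_path = tempname()
writecsv_strict(ok_path, ones(2, 2))

# A 3D array is rejected with an ArgumentError.
rejected = try
    writecsv_strict(tempname(), zeros(2, 2, 3))
    false
catch err
    err isa ArgumentError
end
```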

mfjonker · 2014-10-14T14:50:23Z

Then I guess raising an exception for >2D is the way to go. It would make readcsv and writecsv compatible and solve the issue.

For those who prefer small to moderately large 3D/4D objects in csv (like me) and who appreciate the current behavior of writecsv, I guess that an optional "more-dimensional separator" argument would be very convenient. (If not in readcsv, then perhaps in readdlm.) But there are probably more pressing issues; hence thanks for reading and many more thanks for Julia!

rennis250 · 2014-10-25T19:11:44Z

Wasn't the new version supposed to give an error for >2D arguments? Sorry, just not sure if I missed a change in the discussion.

tanmaykm · 2014-10-26T01:09:20Z

There were some more discussions in #8688 where we decided to retain the iterator format.

tanmaykm added a commit to tanmaykm/julia that referenced this issue Oct 15, 2014

writecsv now only for vectors & matrices fix JuliaLang#8675

48d5cc1

writecsv methods on AbstractArray and Iterators are now removed (commented).

tanmaykm mentioned this issue Oct 16, 2014

writecsv now only for vectors & matrices fix #8675 #8688

Merged

tanmaykm closed this as completed in 6d5f772 Oct 24, 2014

JeffBezanson added a commit that referenced this issue Oct 24, 2014

Merge pull request #8688 from tanmaykm/readcsv

93a33af

writecsv now only for vectors & matrices fix #8675
