NamedArrays with AbstractString names #140

orenbenkiki · 2024-06-10T10:33:49Z

If one creates named_array = NamedArray(values_vector; names = (names_vector,)), then keytype(named_array.dicts[1]) == eltype(names_vector). This seems reasonable on its face, except when eltype(names_vector) <: AbstractString.

Consider for example names_vector = split(one_name_per_line, "\n"). In this case the keytype is SubString{String}. Trying to index into the named array with a regular String will fail because a String isn't a SubString. This also fails the other way around: if eltype(names_vector) == String, you can't index into the named array with a SubString.
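A minimal sketch of the type mismatch, using only Base Julia. The sample string is a hypothetical stand-in for the real input; the point is only that split returns SubString views whose values compare equal to a String while their types differ.

```julia
# Hypothetical sample input: one name per line.
one_name_per_line = "alpha\nbeta\ngamma"

names_vector = split(one_name_per_line, "\n")

# split returns views into the original string rather than copies.
println(eltype(names_vector))

# The values compare equal, but the concrete types differ.
println(names_vector[1] == "alpha")
println(names_vector[1] isa String)
println("alpha" isa SubString)
```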

A better behavior would be to test whether the eltype of the names vector is <: AbstractString, and if so, force the keytype to be AbstractString.


davidavdav · 2024-06-11T12:47:27Z

Yes, this is a problem. I see that the regular Dict doesn't have this problem (anymore?). I recall running into this issue with DataFrames or CSV, which read files and split strings such that the column types are SubString.

I don't know how they solved it for Dict. I tried to read the sources and traced it down to ht_keyindex().
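For reference, a small sketch of the Dict behavior described above. As far as I understand, Dict lookup hashes the query key and compares candidates with isequal, so any two keys that are isequal and hash identically are interchangeable regardless of their concrete type. The sample names are hypothetical.

```julia
names_vector = split("alpha\nbeta\ngamma", "\n")

# A plain Dict keyed by SubString{String}.
index_of = Dict{SubString{String},Int}(
    name => i for (i, name) in enumerate(names_vector)
)

# Looking up with a String works, because the String and the
# SubString are isequal and produce the same hash.
println(index_of["beta"])
println(isequal("beta", names_vector[2]))
println(hash("beta") == hash(names_vector[2]))
```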

One solution for now is to use names_vector = String.(split(one_name_per_line, "\n")).
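A sketch of that workaround, with a hypothetical sample string. Broadcasting String over the result copies each piece into its own String, so the names vector ends up with eltype String.

```julia
# Hypothetical sample input: one name per line.
one_name_per_line = "alpha\nbeta\ngamma"

# Broadcasting String converts every SubString into an independent String.
names_vector = String.(split(one_name_per_line, "\n"))

println(eltype(names_vector))
println(names_vector[2] isa String)
```

Note that this allocates a new String per name instead of reusing the original buffer.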

The strange thing is that it does work for Int8 vs Int64.

orenbenkiki · 2024-06-12T17:41:26Z

Sure, I can (and do) work around it by making a copy of the names array, which is a pity when this is a memory-mapped file I just split on a line break. I have ~40K names in there. It isn't a major performance issue, but it isn't nice.

Instead of fixing it in OrderedDict, you could check if eltype(names) <: AbstractString and if so, explicitly create OrderedDict{AbstractString,Int}(...). This would ensure that you could query it for any string type without a problem, and shouldn't break any existing code.
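A rough sketch of the idea, not the NamedArrays implementation. It uses a plain Base Dict as a stand-in for OrderedDict so that it runs without extra packages, and key_type_for and build_index are hypothetical helper names introduced only for illustration.

```julia
# Hypothetical helper: widen any string eltype to AbstractString,
# leave every other eltype untouched.
key_type_for(names) = eltype(names) <: AbstractString ? AbstractString : eltype(names)

# Hypothetical helper: build a name => position lookup with the widened key type.
# Dict is a stand-in here; the proposal above is about OrderedDict.
function build_index(names)
    K = key_type_for(names)
    return Dict{K,Int}(name => i for (i, name) in enumerate(names))
end

index_of = build_index(split("alpha\nbeta\ngamma", "\n"))

println(keytype(index_of))                  # widened key type
println(index_of["beta"])                   # lookup with a String
println(index_of[SubString("xbeta", 2)])    # lookup with a SubString
```

Since a SubString already is an AbstractString, the keys should be stored as-is, so no copy of the names is needed. Non-string names keep their original key type, e.g. build_index([10, 20]) gives a Dict{Int,Int}.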

hsseung · 2024-08-01T23:40:24Z

FYI I just ran into the same problem when using fixed-width string types from InlineStrings.jl.
These are common when reading DataFrames from CSV files.

davidavdav closed this as completed in 9ad1132 Aug 4, 2024
