Column types get obliterated by Query.jl #313

samuela · 2020-08-07T22:40:36Z

I have a DataFrame df with correct column types (String, Float64, etc). However after processing with Query.jl I'm getting only Any column types. Here's the suspect code snippet:

brand_code_df =
    df |>
    @groupby((_.Brand, _.Product_Code)) |>
    @map({
        Brand = key(_)[1],
        Product_Code = key(_)[2],
        WAP = sum(_.Unit_Price .* _.Units) / sum(_.Units),
        WAC = sum(_.Unit_Cost .* _.Units) / sum(_.Units),
        total_sales = sum(_.Sales),
        gross_margin = sum(_.Margin),
        GMP = sum(_.Margin) / sum(_.Sales) * 100,
        total_code_units = sum(_.Units),
        weight = sum(_.Units) / brand_total_units[key(_)[1]],
        unique_prices = unique_prices(_),
    }) |>
    DataFrame

Now, brand_code_df will have only Any column types.

OTOH I've found that doing ... |> collect |> DataFrame does in fact retain the correct column types.

davidanthoff · 2020-08-07T23:31:55Z

Is there a chance that you could a) post a short snippet that creates a DataFrame with the correct columns and just 1-2 rows of sample data? Literally something like DataFrame(Brand=["asdf", "lij"], Product_Code=[3, 4]), so that I can reproduce this, and b) post the code for unique_prices?

samuela · 2020-08-07T23:35:24Z

Hey @davidanthoff ! Yeah, let me see if I can come up with some mock data that have the same effect...

extradosages · 2020-11-13T05:59:04Z

I've observed this when using @mutate.

davidanthoff · 2021-01-06T00:03:10Z

@extradosages @mutate uses @map under the hood. Any more data you could provide to replicate this would be helpful.

i-aki-y · 2021-01-13T15:15:51Z

Hi @davidanthoff, I encountered the same problem.
I hope this small example helps somewhat.

julia> using DataFrames
julia> using Query

julia> struct Item
           value::Union{Missing, Float64}
       end

julia> df = DataFrame(:x => [Item(1.0)])
1×1 DataFrame
 Row │ x
     │ Item
─────┼───────────
   1 │ Item(1.0)

julia> df |> @mutate(y = _.x.value) |> DataFrame
1×2 DataFrame
 Row │ x          y
     │ Any        Any
─────┼────────────────
   1 │ Item(1.0)  1.0

i-aki-y · 2021-02-03T12:57:05Z

I have examined this problem further.

The problem seems to occur in the return type estimation of the map function defined in QueryOperators.

function map(source::Enumerable, f::Function, f_expr::Expr)
    TS = eltype(source)
    T = Base._return_type(f, Tuple{TS,})
    S = typeof(source)
    Q = typeof(f)
    return EnumerableMap{T,S,Q}(source, f)
end

cf. https://github.com/queryverse/QueryOperators.jl/blob/fd7534405a5f2db2d555f4dd9e796205d7711cde/src/enumerable/enumerable_map.jl#L12

Although I'm not sure what Base._return_type does, since it is undocumented, it seems to estimate the return type of the function f, which generates a NamedTuple from the argument of @mutate.
It fails when certain kinds of input are given.

This is an example.

using DataFrames
using QueryOperators

struct Item1
    value::Union{Missing, Int64}
end

struct Item2
    value::Int64
end

QueryOperators.map(QueryOperators.query([Item1(1.0)]), item -> (v = item.value, ), :()) |> DataFrame |> println
#1×1 DataFrame
# Row │ v   
#     │ Any 
#─────┼─────
#   1 │ 1


QueryOperators.map(QueryOperators.query([Item2(1.0)]), item -> (v = item.value, ), :()) |> DataFrame |> println
#1×1 DataFrame
# Row │ v     
#     │ Int64 
#─────┼───────
#   1 │     1

Sorry, I'm not sure why this happens: whether it is a limitation of the language's type inference or some kind of bug. With my limited knowledge, it will be difficult for me to investigate the cause any further.
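For anyone who wants to look at inference directly, here is a minimal sketch that leaves QueryOperators out entirely and only asks Base what it infers for the two structs. It uses Base.return_types rather than the undocumented Base._return_type, so it may not report exactly what QueryOperators sees. The helper name make_row is hypothetical and only mirrors the anonymous function from the example above.

```julia
# Same struct definitions as in the example above.
struct Item1
    value::Union{Missing, Int64}
end

struct Item2
    value::Int64
end

# Hypothetical helper mirroring the anonymous function passed to
# QueryOperators.map in the example above.
make_row(item) = (v = item.value,)

# Base.return_types returns one inferred return type per matching method;
# make_row has a single method, so `only` extracts that one entry.
inferred1 = only(Base.return_types(make_row, (Item1,)))
inferred2 = only(Base.return_types(make_row, (Item2,)))

println("Item1 => ", inferred1, " (concrete: ", isconcretetype(inferred1), ")")
println("Item2 => ", inferred2, " (concrete: ", isconcretetype(inferred2), ")")

# The type of a NamedTuple built with (v = x,) follows the runtime type of x,
# so the two possible values of an Item1 field give two different row types.
println(typeof(make_row(Item1(1))))
println(typeof(make_row(Item1(missing))))
```

If the inferred type for Item1 is not a single concrete NamedTuple type, that would be consistent with the Any columns above, but this is only a guess at the cause and not a confirmed diagnosis.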

Anyway, I hope this will help.

tlamadon · 2021-02-09T18:22:09Z

I am having the same problem when grouping on multiple columns. However, ... |> collect |> DataFrame works, so thanks for that suggestion!

danvinci mentioned this issue Feb 2, 2021

Using @mutate with a Dictionary or array loses the column types #324

Open
