Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ability to use @select on SQL data sources #309

Open
tlnagy opened this issue Jul 15, 2020 · 2 comments
Open

Add ability to use @select on SQL data sources #309

tlnagy opened this issue Jul 15, 2020 · 2 comments
Labels
Milestone

Comments

@tlnagy
Copy link

tlnagy commented Jul 15, 2020

I have a very large table in a SQL database that I'm working with and it would be great to ignore some columns before loading, but currently I can't drop columns using @select:

julia> r = DBInterface.execute(db, "SELECT * FROM fxm");

julia> r |> @select(-:c)
MethodError: no method matching remove(::SQLite.Row, ::Val{:c})
Closest candidates are:
  remove(!Matched::NamedTuple{an,T} where T<:Tuple, ::Val{bn}) where {an, bn} at /home/tlnagy/.julia/packages/QueryOperators/g4G21/src/NamedTupleUtilities.jl:27

Stacktrace:
 [1] (::var"#52#54")(::SQLite.Row) at /home/tlnagy/.julia/packages/Query/AwBtd/src/query_translation.jl:58
 [2] iterate at /home/tlnagy/.julia/packages/QueryOperators/g4G21/src/enumerable/enumerable_map.jl:25 [inlined]
 [3] iterate at ./iterators.jl:641 [inlined]
 [4] iterate at ./iterators.jl:639 [inlined]
 [5] _collect at ./array.jl:614 [inlined]
 [6] collect at ./array.jl:603 [inlined]
 [7] |> at ./operators.jl:823 [inlined]
 [8] printtable(::IOContext{Base.GenericIOBuffer{Array{UInt8,1}}}, ::QueryOperators.EnumerableMap{Union{},QueryOperators.EnumerableIterable{SQLite.Row,SQLite.Query},var"#52#54"}, ::String; force_unknown_rows::Bool) at /home/tlnagy/.julia/packages/TableShowUtils/ImkA9/src/TableShowUtils.jl:16
 [9] printtable(::IOContext{Base.GenericIOBuffer{Array{UInt8,1}}}, ::QueryOperators.EnumerableMap{Union{},QueryOperators.EnumerableIterable{SQLite.Row,SQLite.Query},var"#52#54"}, ::String) at /home/tlnagy/.julia/packages/TableShowUtils/ImkA9/src/TableShowUtils.jl:7
 [10] show(::IOContext{Base.GenericIOBuffer{Array{UInt8,1}}}, ::QueryOperators.EnumerableMap{Union{},QueryOperators.EnumerableIterable{SQLite.Row,SQLite.Query},var"#52#54"}) at /home/tlnagy/.julia/packages/QueryOperators/g4G21/src/enumerable/show.jl:35
 [11] show(::IOContext{Base.GenericIOBuffer{Array{UInt8,1}}}, ::MIME{Symbol("text/plain")}, ::QueryOperators.EnumerableMap{Union{},QueryOperators.EnumerableIterable{SQLite.Row,SQLite.Query},var"#52#54"}) at ./multimedia.jl:47
 [12] limitstringmime(::MIME{Symbol("text/plain")}, ::QueryOperators.EnumerableMap{Union{},QueryOperators.EnumerableIterable{SQLite.Row,SQLite.Query},var"#52#54"}) at /home/tlnagy/.julia/packages/IJulia/DrVMH/src/inline.jl:43
 [13] display_mimestring(::MIME{Symbol("text/plain")}, ::QueryOperators.EnumerableMap{Union{},QueryOperators.EnumerableIterable{SQLite.Row,SQLite.Query},var"#52#54"}) at /home/tlnagy/.julia/packages/IJulia/DrVMH/src/display.jl:67
 [14] display_dict(::QueryOperators.EnumerableMap{Union{},QueryOperators.EnumerableIterable{SQLite.Row,SQLite.Query},var"#52#54"}) at /home/tlnagy/.julia/packages/IJulia/DrVMH/src/display.jl:96
 [15] #invokelatest#1 at ./essentials.jl:712 [inlined]
 [16] invokelatest at ./essentials.jl:711 [inlined]
 [17] execute_request(::ZMQ.Socket, ::IJulia.Msg) at /home/tlnagy/.julia/packages/IJulia/DrVMH/src/execute_request.jl:112
 [18] #invokelatest#1 at ./essentials.jl:712 [inlined]
 [19] invokelatest at ./essentials.jl:711 [inlined]
 [20] eventloop(::ZMQ.Socket) at /home/tlnagy/.julia/packages/IJulia/DrVMH/src/eventloop.jl:8
 [21] (::IJulia.var"#15#18")() at ./task.jl:358

It works fine if I first convert to a DataFrame

julia> r |> DataFrame |> @select(-:c)
3x2 query result
a │ b
──┼──
13
24
35
@davidanthoff
Copy link
Member

The tabular query commands like @select expect a stream of named tuples, but here you are getting a stream of SQLite.Row elements instead. If you convert to a DataFrame first then it will iterate named tuples, so that is why it works in that case.

Two ideas: you could either try to convert these Row instances into named tuples somehow. I don't know if they have that ability built in, but maybe something like @map(NamedTuple(_)) or @map({_...}) might work? I haven't tried that.

You could also give https://github.com/queryverse/QuerySQLite.jl a shot. That is an alternative Query.jl backend that translates your Query.jl query into SQL. You would write something like this:

import QuerySQLite

db = QuerySQLite.Database(db)

db.fxm |> @select(-:c)

And in theory that should then run a SQL statement that never even loads the c column from the database at all.

I'm not sure, though, whether this works with the -:c construction, @bramtayl would know.

@davidanthoff davidanthoff added this to the Backlog milestone Jul 15, 2020
@bramtayl
Copy link
Contributor

I haven't done any performance testing on QuerySQL at all, but it's definitely worth a shot! If you don't see performance improvements (highly possible), open an issue and I can try to see what I can do (it might just be a matter of optimizing the generated SQL code a bit).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants