How to ensure DTables stay on one process (#58)

Comments
I tried using:

```julia
julia> using Distributed; addprocs(1); @everywhere using Dagger, DTables, DataFrames

julia> Dagger.with_options(; scope = ProcessScope(myid())) do
           dt = DTable(DataFrame(a = rand(1:5, 100), b = 1:100))
           gdt = groupby(dt, :a)
           map(c -> (c.scope, c.processor), gdt.dtable.chunks)
       end
5-element Vector{Tuple{AnyScope, OSProc}}:
 (AnyScope(), OSProc(1))
 (AnyScope(), OSProc(1))
 (AnyScope(), OSProc(1))
 (AnyScope(), OSProc(1))
 (AnyScope(), OSProc(1))
```

Any tips? It appears all the chunks are on the correct process, but since the scope isn't what I set it to, could the chunks be migrated to another process?
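As a point of comparison, a minimal sketch of pinning a single task directly, assuming `Dagger.@spawn` accepts a per-task `scope` option the way `Dagger.with_options` does (this checks only that one explicitly-spawned task respects the scope, not whether DTables' internal tasks inherit it):

```julia
using Distributed
addprocs(1)
@everywhere using Dagger

# ProcessScope(w) restricts scheduling of this one task to worker w.
# Pin to worker 2 (the one added above) and check where it actually ran.
t = Dagger.@spawn scope=ProcessScope(2) myid()
@show fetch(t)  # expected to be 2 if the scope option is honored
```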
@jpsamaroo is this expected, or is it a bug? It seems the result chunk doesn't inherit the options:

```julia
julia> t = Dagger.with_options(; scope=ProcessScope(1)) do
           Dagger.spawn(Dagger.get_options)
       end
EagerThunk (finished)

julia> fetch(t.future.future)[2].scope
AnyScope()

julia> fetch(t)
(scope = ProcessScope: worker == 1,)
```
I would like to use DTables.jl where each `DTable` or `GDTable` exists on exactly one process and does not migrate (so, effectively using DTables.jl just for its out-of-core processing capabilities). It looks like `Dagger.@spawn` has some support for ensuring a task executes on a given process. Does DTables.jl support the `scope` kwarg of `Dagger.@spawn`? And is what I want to do possible/feasible?
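The placement check used earlier in this thread can be wrapped in a small helper for re-use; a sketch, with the caveat that `dt.chunks` and the `scope`/`processor` fields mirror what the comments above access and may be internal rather than public API:

```julia
using DTables, Dagger, DataFrames

# Hypothetical helper: report the scope and processor recorded on each
# chunk of a DTable, so you can see whether chunks were actually pinned.
chunk_placement(dt::DTable) =
    map(c -> (c.scope, c.processor), dt.chunks)

dt = DTable(DataFrame(a = 1:10), 5)  # table split into chunks of 5 rows
chunk_placement(dt)
```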