How to ensure DTables stay on one process #58

Open
StevenWhitaker opened this issue Nov 1, 2023 · 2 comments

Comments

@StevenWhitaker

I would like to use DTables.jl where each DTable or GDTable exists on exactly one process and does not migrate (so, effectively using DTables.jl just for its out-of-core processing capabilities). It looks like Dagger.@spawn has some support for ensuring a task executes on a given process. Does DTables.jl support the scope kwarg of Dagger.@spawn? And is what I want to do possible/feasible?

@StevenWhitaker
Author

StevenWhitaker commented Nov 3, 2023

I tried using Dagger.with_options to keep all GDTable chunks on a single process, but it doesn't look like it's working quite right:

julia> using Distributed; addprocs(1); @everywhere using Dagger, DTables, DataFrames

julia> Dagger.with_options(; scope = ProcessScope(myid())) do
           dt = DTable(DataFrame(a = rand(1:5, 100), b = 1:100))
           gdt = groupby(dt, :a)
           map(c -> (c.scope, c.processor), gdt.dtable.chunks)
       end
5-element Vector{Tuple{AnyScope, OSProc}}:
 (AnyScope(), OSProc(1))
 (AnyScope(), OSProc(1))
 (AnyScope(), OSProc(1))
 (AnyScope(), OSProc(1))
 (AnyScope(), OSProc(1))

Any tips? All the chunks appear to be on the correct process, but since the scope isn't what I set it to be, could the chunks still be migrated to another process?

@krynju
Member

krynju commented Nov 4, 2023

@jpsamaroo is this expected or is it a bug? It seems the result chunk doesn't inherit the options.
The options do get passed into the task scope properly, but I haven't seen the options of results tested anywhere in the tests.

julia> t = Dagger.with_options(; scope=ProcessScope(1)) do
           Dagger.spawn(Dagger.get_options)
       end
EagerThunk (finished)

julia> fetch(t.future.future)[2].scope
AnyScope()

julia> fetch(t)
(scope = ProcessScope: worker == 1,)
