make nprocs report only fully connected workers #21347

Merged: 2 commits, Apr 14, 2017

Commits on Apr 12, 2017

  1. make nprocs report only fully connected workers

    This changes nprocs (and therefore nworkers) to report only workers in the `W_CONNECTED` state.
    
    This avoids issues with `@everywhere` and similar methods that broadcast
    messages to all workers.
    
    To simulate:
    - introduce an artificial delay at https://github.com/JuliaLang/julia/blob/2c4f6d74577a1b7606ed5e74e96158810f4f7af4/base/distributed/cluster.jl#L443 with `sleep(10)`
    - start a master with:
    ```
    using ClusterManagers
    
    ElasticManager(;addr=IPv4("0.0.0.0"), port=9009, cookie="cookie", topology=:master_slave)
    
    while nworkers() < 4
        sleep(1)
    end
    
    @everywhere println(myid())
    ```
    - start 4 workers with:
    ```
    using ClusterManagers
    ClusterManagers.elastic_worker("cookie", "127.0.0.1", 9009; stdout_to_master=false)
    ```
    
    Without this change, the `@everywhere` call on the master will often fail with:
    
    ```
    ERROR: LoadError: peer 3 is not connected to 1. Topology : master_slave
    check_worker_state(::Base.Distributed.Worker) at ./distributed/cluster.jl:115
    send_msg_(::Base.Distributed.Worker, ::Base.Distributed.MsgHeader, ::Base.Distributed.CallMsg{:call_fetch}, ::Bool) at ./distributed/messages.jl:180
    remotecall_fetch(::Function, ::Base.Distributed.Worker, ::Expr, ::Vararg{Expr,N} where N) at ./distributed/remotecall.jl:346
    remotecall_fetch(::Function, ::Int64, ::Expr, ::Vararg{Expr,N} where N) at ./distributed/remotecall.jl:367
    (::##1#3)() at ./distributed/macros.jl:84
    ```
    tanmaykm committed Apr 12, 2017
    fec3287
  2. 884e5f4
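
The race described in the first commit message can be illustrated with a small toy model. This is only a sketch and not the code in `base/distributed/cluster.jl`: `ToyWorker`, `ToyState`, `registered_count`, `connected_count` and `toy_broadcast` are hypothetical names, and `T_CONNECTED` merely stands in for the `W_CONNECTED` state mentioned above. It assumes the old count included every registered worker, while the new count includes only connected ones.

```julia
# Toy model of the race; NOT the Base.Distributed implementation.
# All names below are hypothetical and exist only for illustration.

@enum ToyState T_CREATED T_CONNECTED

struct ToyWorker
    id::Int
    state::ToyState
end

# Assumed old behaviour: every registered worker is counted.
registered_count(ws) = length(ws)

# Assumed new behaviour: only fully connected workers are counted.
connected_count(ws) = count(w -> w.state == T_CONNECTED, ws)

# Stand-in for a broadcast such as `@everywhere`: it throws as soon as it
# reaches a worker that is not yet connected.
function toy_broadcast(ws)
    for w in ws
        w.state == T_CONNECTED || error("peer $(w.id) is not connected")
    end
    return length(ws)
end

# Worker 3 has registered but has not finished connecting.
pool = [ToyWorker(2, T_CONNECTED), ToyWorker(3, T_CREATED),
        ToyWorker(4, T_CONNECTED), ToyWorker(5, T_CONNECTED)]

registered_count(pool)  # 4: a `while count < 4` wait loop would already exit
connected_count(pool)   # 3: the wait loop keeps waiting for worker 3
```

In this model, a wait loop based on `registered_count` exits while worker 3 is still connecting, so a following `toy_broadcast(pool)` throws. A loop based on `connected_count` only exits once every counted worker can actually receive messages.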