[core] Make GCS python client retry number and timeout tunable (ray-project#39650)

## Why are these changes needed?

When the GCS is overloaded, it might not respond in time. For example, we might get an error like this:

```
[2023-09-13 16:00:17,030 C 110 110] gcs_client.cc:153: Check failed: (_left_ != _right_) 0 vs 0
*** StackTrace Information ***
/home/ray/anaconda3/lib/python3.11/site-packages/ray/_raylet.so(+0xfde20a) [0x7f7e422f920a] ray::operator<<()
/home/ray/anaconda3/lib/python3.11/site-packages/ray/_raylet.so(+0xfdfcf2) [0x7f7e422facf2] ray::SpdLogMessage::Flush()
/home/ray/anaconda3/lib/python3.11/site-packages/ray/_raylet.so(_ZN3ray6RayLogD1Ev+0x37) [0x7f7e422fb007] ray::RayLog::~RayLog()
/home/ray/anaconda3/lib/python3.11/site-packages/ray/_raylet.so(+0x8afd16) [0x7f7e41bcad16] ray::gcs::(anonymous namespace)::HandleGcsError()
/home/ray/anaconda3/lib/python3.11/site-packages/ray/_raylet.so(_ZN3ray3gcs15PythonGcsClient7ConnectERKNS_9ClusterIDElm+0x3ca) [0x7f7e41bd0e1a] ray::gcs::PythonGcsClient::Connect()
/home/ray/anaconda3/lib/python3.11/site-packages/ray/_raylet.so(+0x622700) [0x7f7e4193d700] __pyx_pw_3ray_7_raylet_9GcsClient_3_connect()
/home/ray/anaconda3/bin/python() [0x525877] cfunction_call
/home/ray/anaconda3/lib/python3.11/site-packages/ray/_raylet.so(+0x5b6bbf) [0x7f7e418d1bbf] __Pyx__PyObject_CallOneArg()
/home/ray/anaconda3/lib/python3.11/site-packages/ray/_raylet.so(+0x656eb5) [0x7f7e41971eb5] __pyx_tp_new_3ray_7_raylet_GcsClient()
/home/ray/anaconda3/bin/python(_PyObject_MakeTpCall+0x182) [0x502662] _PyObject_MakeTpCall
/home/ray/anaconda3/bin/python(_PyEval_EvalFrameDefault+0x758) [0x50eaa8] _PyEval_EvalFrameDefault
/home/ray/anaconda3/bin/python(_PyFunction_Vectorcall+0x173) [0x535113] _PyFunction_Vectorcall
```

This PR makes the GCS Python client's retry number and timeout tunable.
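
For illustration only, here is a minimal sketch of what a tunable retry number and timeout can look like around a connect call. It is not the code from this PR: `connect_with_retry`, the `connect` callable, and the environment variable names `EXAMPLE_GCS_CONNECT_RETRIES` and `EXAMPLE_GCS_CONNECT_TIMEOUT_S` are all hypothetical, and the capped exponential backoff is an assumption rather than something the commit message describes.

```python
import os
import time


def connect_with_retry(connect, retries=None, timeout_s=None, sleep=time.sleep):
    """Call connect(timeout_s) until it succeeds or the retries run out.

    `connect` is any callable that raises TimeoutError when the server
    does not answer within `timeout_s` seconds (hypothetical interface).
    """
    # Hypothetical environment variable names, used only in this sketch.
    if retries is None:
        retries = int(os.environ.get("EXAMPLE_GCS_CONNECT_RETRIES", "5"))
    if timeout_s is None:
        timeout_s = float(os.environ.get("EXAMPLE_GCS_CONNECT_TIMEOUT_S", "30"))

    last_error = None
    # One initial attempt plus `retries` further attempts.
    for attempt in range(retries + 1):
        try:
            return connect(timeout_s)
        except TimeoutError as err:
            last_error = err
            if attempt < retries:
                # Capped exponential backoff between attempts (assumption).
                sleep(min(2 ** attempt, 10))
    raise ConnectionError(f"gave up after {retries + 1} attempts") from last_error
```

With both values read from configuration instead of being hard-coded, an operator of an overloaded cluster could raise them without a code change, and a slow response would surface as a catchable error rather than a failed check.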