Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

enable test_advanced_2 on windows #20994

Merged
merged 8 commits into from
Dec 15, 2021
Merged

Conversation

mattip
Copy link
Contributor

@mattip mattip commented Dec 9, 2021

Why are these changes needed?

Builds on #20986 and enables test_advanced_2.py for windows. There is a test that needed a tweak for too-long command line. The last 3 commits are the relevant ones for this PR.

Related issue number

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@mattip
Copy link
Contributor Author

mattip commented Dec 9, 2021

@pcmoritz

@mattip
Copy link
Contributor Author

mattip commented Dec 9, 2021

(there is a linting error I will fix when rebasing this after merge of #20986)

The error is a Memory error in this loop

        # Attempt to wait for all of the workers to start up.
        while True:
            if len(
                    set(
                        ray.get([
>                           get_worker_id.remote() for _ in range(num_workers)
                        ]))) == num_workers:

where the actual error is

                      raise value.as_instanceof_cause()
E                       ray.exceptions.RayTaskError(MemoryError): ray::get_worker_id() (pid=5572, ip=127.0.0.1)
E                         File "python\ray\_raylet.pyx", line 694, in ray._raylet.execute_task
E                         File "python\ray\_raylet.pyx", line 695, in ray._raylet.execute_task
E                         File "python\ray\_raylet.pyx", line 1892, in ray._raylet.CoreWorker.store_task_outputs
E                         File "d:\a\ray\ray\python\ray\serialization.py", line 365, in serialize
E                           return self._serialize_to_msgpack(value)
E                         File "d:\a\ray\ray\python\ray\serialization.py", line 340, in _serialize_to_msgpack
E                           msgpack_data = MessagePackSerializer.dumps(value, _python_serializer)
E                         File "python\ray\includes/serialization.pxi", line 162, in ray._raylet.MessagePackSerializer.dumps
E                         File "C:\hostedtoolcache\windows\Python\3.7.9\x64\lib\site-packages\msgpack\__init__.py", line 35, in packb
E                           return Packer(**kwargs).pack(o)
E                         File "msgpack\_packer.pyx", line 120, in msgpack._cmsgpack.Packer.__cinit__
E                       MemoryError: Unable to allocate internal buffer.

But that can't be right, the test must be hitting some other limit. It does try to allocate 20 workers on 10 cpus and 2 gpus, when the gitub runner has 2 physical CPUs and 8GB memory. On my local 4-core 16GB windows machine, the test passes.

@czgdp1807
Copy link
Contributor

It does try to allocate 20 workers on 10 cpus and 2 gpus, when the gitub runner has 2 physical CPUs and 8GB memory. On my local 4-core 16GB windows machine, the test passes.

This is a common issue with CI. Due to limited CPUs and Memory sometimes things timed out or a Memory Error is raised. Reproducing thing is possible locally by degrading the system hardware. An easy fix is to reduce the task size (a better option) or relaxing the time limits (not worth it, since spending 10 minutes on CI for a test is just slowing down the overall workflow).

@pcmoritz
Copy link
Contributor

@mattip I merged your other PR now #20986, can you rebase this one?

@mattip
Copy link
Contributor Author

mattip commented Dec 13, 2021

rebased

@pcmoritz
Copy link
Contributor

Looks great, thanks! I pushed a small fix to make sure other tests on other platforms are not modified :)

@mattip
Copy link
Contributor Author

mattip commented Dec 15, 2021

Cool

@pcmoritz pcmoritz merged commit d2cd073 into ray-project:master Dec 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants