fix python version and pytest install #1234

Open · jahatef wants to merge 71 commits into main

Conversation

jahatef (Collaborator) commented Jun 6, 2024

Possibly fix workflow issues. Needs to be tested in PR.

jahatef marked this pull request as draft on June 7, 2024 at 01:33

jahatef (Collaborator, Author) commented Jun 17, 2024

Fixed the workflows by specifying Python versions and installing packages before running tests. pip install exits with "requirement already satisfied" when a package is already installed, which should be fine. I also updated some requirements in requirements.txt: I removed the pinned commit hash from DeeperSpeed (which I'm not sure we want), and I capped the numpy requirement at <2.0, which is required because numpy 2.x breaks DeepSpeed.

Tests will run, although some currently fail because the runner has no GPU access, and some fail for reasons seemingly unrelated to the workflows. See https://github.com/EleutherAI/gpt-neox/actions/runs/9555032138/job/26337367665
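
For illustration, the fix amounts to workflow steps along these lines (a minimal sketch, not the repository's actual workflow file; the Python version, action versions, and requirements path are assumptions):

    # Hypothetical excerpt of a GitHub Actions test job.
    jobs:
      tests:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          - uses: actions/setup-python@v5
            with:
              python-version: "3.10"   # pin an explicit interpreter version
          - name: Install dependencies before running tests
            run: |
              pip install -r requirements/requirements.txt
              pip install pytest
          - name: Run tests
            run: pytest tests/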

jahatef marked this pull request as ready for review on June 17, 2024 at 21:42
Quentin-Anthony (Member) commented

Here's the relevant trace from the runner, for future reference.

____________________________ test_main_constructor _____________________________
def test_main_constructor():
        input_args = ["train.py", "tests/config/test_setup.yml"]
>       neox_args = NeoXArgs.consume_deepy_args(input_args)

tests/unit/test_arguments.py:21: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
megatron/neox_arguments/arguments.py:371: in consume_deepy_args
    neox_args = cls.from_ymls(
megatron/neox_arguments/arguments.py:229: in from_ymls
    return cls(**config)
<string>:266: in __init__
    ???
megatron/neox_arguments/arguments.py:134: in __post_init__
    self.calculate_derived()
megatron/neox_arguments/arguments.py:836: in calculate_derived
    resources = obtain_resource_pool(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
hostfile_path = 'None', include_arg = 'localhost:1', exclude_arg = ''

    def obtain_resource_pool(
        hostfile_path, include_arg, exclude_arg
    ) -> Dict[str, List[int]]:
        """
        Get dict of `resource_pool[hostname] = [list of GPU ranks]` using hostfile, include and exclude args.
        Modified from: `deepspeed.launcher.runner.main`
        """
        resource_pool = fetch_hostfile(hostfile_path)
        if not resource_pool:
            resource_pool = {}
            device_count = torch.cuda.device_count()
            if device_count == 0:
>               raise RuntimeError("Unable to proceed, no GPU resources available")
E               RuntimeError: Unable to proceed, no GPU resources available

megatron/utils.py:201: RuntimeError
----------------------------- Captured stdout call -----------------------------
NeoXArgs.from_ymls() ['tests/config/test_setup.yml']
[2024-06-17 21:32:57,005] [WARNING] [runner.py:217:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
__________________________ test_constructor_from_ymls __________________________
def test_constructor_from_ymls():
        t1 = test_constructor_from_ymls_class()
>       t1.test()

tests/unit/test_arguments.py:37: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
tests/unit/test_arguments.py:31: in test
    neox_args = NeoXArgs.from_ymls(["tests/config/test_setup.yml"])
megatron/neox_arguments/arguments.py:229: in from_ymls
    return cls(**config)
<string>:266: in __init__
    ???
megatron/neox_arguments/arguments.py:134: in __post_init__
    self.calculate_derived()
megatron/neox_arguments/arguments.py:836: in calculate_derived
    resources = obtain_resource_pool(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

hostfile_path = 'None', include_arg = 'localhost:1', exclude_arg = ''

    def obtain_resource_pool(
        hostfile_path, include_arg, exclude_arg
    ) -> Dict[str, List[int]]:
        """
        Get dict of `resource_pool[hostname] = [list of GPU ranks]` using hostfile, include and exclude args.
        Modified from: `deepspeed.launcher.runner.main`
        """
        resource_pool = fetch_hostfile(hostfile_path)
        if not resource_pool:
            resource_pool = {}
            device_count = torch.cuda.device_count()
            if device_count == 0:
>               raise RuntimeError("Unable to proceed, no GPU resources available")
E               RuntimeError: Unable to proceed, no GPU resources available
megatron/utils.py:201: RuntimeError
----------------------------- Captured stdout call -----------------------------
NeoXArgs.from_ymls() ['tests/config/test_setup.yml']
[2024-06-17 21:32:57,294] [WARNING] [runner.py:217:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
__________________________ test_constructor_from_dict __________________________
def test_constructor_from_dict():
        t1 = test_constructor_from_dict_class()
>       t1.test()

tests/unit/test_arguments.py:49: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
tests/unit/test_arguments.py:44: in test
    neox_args = NeoXArgs.from_dict(BASE_CONFIG)
megatron/neox_arguments/arguments.py:236: in from_dict
    return cls(**args_dict)
<string>:266: in __init__
    ???
megatron/neox_arguments/arguments.py:134: in __post_init__
    self.calculate_derived()
megatron/neox_arguments/arguments.py:836: in calculate_derived
    resources = obtain_resource_pool(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

hostfile_path = 'None', include_arg = 'localhost:1', exclude_arg = ''
    def obtain_resource_pool(
        hostfile_path, include_arg, exclude_arg
    ) -> Dict[str, List[int]]:
        """
        Get dict of `resource_pool[hostname] = [list of GPU ranks]` using hostfile, include and exclude args.
        Modified from: `deepspeed.launcher.runner.main`
        """
        resource_pool = fetch_hostfile(hostfile_path)
        if not resource_pool:
            resource_pool = {}
            device_count = torch.cuda.device_count()
            if device_count == 0:
>               raise RuntimeError("Unable to proceed, no GPU resources available")
E               RuntimeError: Unable to proceed, no GPU resources available

megatron/utils.py:201: RuntimeError
----------------------------- Captured stdout call -----------------------------
[2024-06-17 21:32:57,574] [WARNING] [runner.py:217:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
_________________________ test_gpt_neox_to_huggingface _________________________
monkeypatch = <_pytest.monkeypatch.MonkeyPatch object at 0x7f278be35b70>
tmpdir = local('/tmp/pytest-of-root/pytest-1/test_gpt_neox_to_huggingface0')
tmp_path = PosixPath('/tmp/pytest-of-root/pytest-1/test_gpt_neox_to_huggingface0')

    def test_gpt_neox_to_huggingface(monkeypatch, tmpdir, tmp_path):
        # Generate random GPT-NEOX model, check we can convert to hf format
        model_dir = str(tmpdir)
        input_args = ["train.py", "tests/config/test_setup.yml"]
>       deepspeed_main_args = simulate_deepy_env(monkeypatch, input_args)

tests/unit/test_format_conversion_scripts.py:11: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
tests/common.py:523: in simulate_deepy_env
    neox_args = NeoXArgs.consume_deepy_args(input_args)
megatron/neox_arguments/arguments.py:371: in consume_deepy_args
    neox_args = cls.from_ymls(
megatron/neox_arguments/arguments.py:229: in from_ymls
    return cls(**config)
<string>:266: in __init__
    ???
megatron/neox_arguments/arguments.py:134: in __post_init__
    self.calculate_derived()
megatron/neox_arguments/arguments.py:836: in calculate_derived
    resources = obtain_resource_pool(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

hostfile_path = 'None', include_arg = 'localhost:1', exclude_arg = ''

    def obtain_resource_pool(
        hostfile_path, include_arg, exclude_arg
    ) -> Dict[str, List[int]]:
        """
        Get dict of `resource_pool[hostname] = [list of GPU ranks]` using hostfile, include and exclude args.
        Modified from: `deepspeed.launcher.runner.main`
        """
        resource_pool = fetch_hostfile(hostfile_path)
        if not resource_pool:
            resource_pool = {}
            device_count = torch.cuda.device_count()
            if device_count == 0:
>               raise RuntimeError("Unable to proceed, no GPU resources available")
E               RuntimeError: Unable to proceed, no GPU resources available

megatron/utils.py:201: RuntimeError
----------------------------- Captured stdout call -----------------------------
NeoXArgs.from_ymls() ['tests/config/test_setup.yml']
[2024-06-17 21:32:58,104] [WARNING] [runner.py:217:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
=============================== warnings summary ===============================
<string>:8
  <string>:8: PytestDeprecationWarning: A private pytest class or function was used.

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED tests/neox_args/test_neoxargs_usage.py::test_neoxargs_usage
FAILED tests/unit/test_arguments.py::test_main_constructor
FAILED tests/unit/test_arguments.py::test_constructor_from_ymls
FAILED tests/unit/test_arguments.py::test_constructor_from_dict
FAILED tests/unit/test_format_conversion_scripts.py::test_gpt_neox_to_huggingface
======= 5 failed, 24 passed, 92 skipped, 80 xfailed, 1 warning in 28.89s =======
Error: Process completed with exit code 1.
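
On CPU-only runners, failures like these can be avoided with a standard pytest skip marker rather than a hard RuntimeError. A minimal sketch (the marker name and test body are illustrative, not something this PR adds):

    import pytest
    import torch

    # Skip GPU-dependent tests when the runner exposes no CUDA devices.
    requires_gpu = pytest.mark.skipif(
        torch.cuda.device_count() == 0,
        reason="no GPU resources available on this runner",
    )

    @requires_gpu
    def test_main_constructor():
        ...  # placeholder; the real test builds NeoXArgs as in the trace above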

Quentin-Anthony (Member) commented

@jahatef -- Why remove the commit hash from deeperspeed, but leave it for lm_dataformat?

jahatef (Collaborator, Author) commented Jun 17, 2024

No good reason; the pin was to a four-month-old commit, which I'm not sure we want to be the default for users. I can add the hash back, or remove the pin from the other package too. I don't believe the pin caused the issues I saw; I think it was the numpy version that broke DeepSpeed.
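
Concretely, the two requirements.txt changes under discussion look something like this (hypothetical lines, not the file's exact contents):

    numpy<2.0   # numpy 2.x breaks DeepSpeed, so cap below 2.0
    # DeeperSpeed was previously pinned to a specific commit hash; this PR drops
    # that pin, while lm_dataformat keeps its pinned hash.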
