
move peft imports to avoid RuntimeError: An attempt has been made to start a new process before the current process has finished its bootstrapping phase #30

Merged
2 commits merged on Mar 15, 2024

Conversation

geronimi73 (Contributor)

It seems that something in autoawq causes a RuntimeError in train.py if the package is imported before process forking. Starting with version 0.9, peft imports autoawq. This PR moves the peft imports to after process forking, thereby preventing the RuntimeError with peft>=0.9.

Related issue: #28
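For context, the fix amounts to the deferred-import pattern sketched below. This is a minimal illustration using the stdlib multiprocessing module (which torch.multiprocessing wraps) with the spawn start method; the function names are illustrative, and the `import json` is a harmless stand-in for the real peft imports:

```python
import multiprocessing as mp  # torch.multiprocessing is a thin wrapper around this


def fsdp_main(rank, queue):
    # Deferred import: this runs inside each spawned child, after that
    # process has finished its bootstrapping phase. In train.py this is
    # where the peft imports now live (peft>=0.9 imports autoawq at
    # import time, which is what broke bootstrapping).
    import json  # stand-in for the heavy/problematic import
    queue.put((rank, "import ok"))


def launch(nprocs=2):
    ctx = mp.get_context("spawn")  # same start method mp.spawn() uses
    queue = ctx.Queue()
    procs = [ctx.Process(target=fsdp_main, args=(r, queue)) for r in range(nprocs)]
    for p in procs:
        p.start()
    # Drain the queue before joining so children never block on a full pipe.
    results = sorted(queue.get() for _ in range(nprocs))
    for p in procs:
        p.join()
    return results


if __name__ == "__main__":
    print(launch())
```

Because no problematic module is imported at the top level, each child can finish bootstrapping before the import runs.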

@geronimi73 (Contributor, Author)

@johnowhitaker

@johnowhitaker johnowhitaker merged commit d7818ec into AnswerDotAI:main Mar 15, 2024
@johnowhitaker (Contributor)

Thank you @geronimi73 much appreciated :)

@geronimi73 geronimi73 deleted the fix_ProcessExitedException branch March 15, 2024 17:11
@iseesaw commented Apr 24, 2024

Sorry, I still hit this problem using the merged code:

  File "/root/miniconda3/lib/python3.10/site-packages/fastcore/script.py", line 119, in _f
    return tfunc(**merge(args, args_from_prog(func, xtra)))
  File "/root/kyzhang/llms/UltraMedical/llm_train/train_qdora.py", line 1086, in main
    mp.spawn(fsdp_main,
  File "/root/miniconda3/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 246, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method="spawn")
  File "/root/miniconda3/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 193, in start_processes
    process.start()
  File "/root/miniconda3/lib/python3.10/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/root/miniconda3/lib/python3.10/multiprocessing/context.py", line 288, in _Popen
    return Popen(process_obj)
  File "/root/miniconda3/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/root/miniconda3/lib/python3.10/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/root/miniconda3/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 42, in _launch
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "/root/miniconda3/lib/python3.10/multiprocessing/spawn.py", line 154, in get_preparation_data
    _check_not_importing_main()
  File "/root/miniconda3/lib/python3.10/multiprocessing/spawn.py", line 134, in _check_not_importing_main
    raise RuntimeError('''
RuntimeError:
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

@geronimi73 (Contributor, Author)

try again after pip uninstall autoawq

@iseesaw commented Apr 24, 2024

try again after pip uninstall autoawq

Thanks for your response; autoawq is not installed on my server:

(base) root@b575798d621b:~/kyzhang/llms/UltraMedical# pip uninstall autoawq
WARNING: Skipping autoawq as it is not installed.
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv

@iseesaw commented Apr 24, 2024

pip list

accelerate                0.29.3
bitsandbytes              0.43.1
datasets                  2.14.6
huggingface-hub           0.20.3
llama-recipes             0.0.1
peft                      0.10.0
safetensors               0.4.2      
tokenizers                0.19.1
torch                     2.1.2
transformers              4.40.0
cupy-cuda12x              12.1.0
nvidia-cuda-cupti-cu12    12.1.105
nvidia-cuda-nvrtc-cu12    12.1.105
nvidia-cuda-runtime-cu12  12.1.105

8xA6000 48G, CUDA Version: 12.2

@geronimi73 (Contributor, Author)


Can't reproduce with those packages. Please give me the entire pip list.

@iseesaw commented Apr 24, 2024

Thanks for your patience! Here is the complete list:
requirements.txt

@geronimi73 (Contributor, Author)

Check if this works:

In train.py, insert the following code at line 1036.

code to insert:

    if __name__ != '__main__':
        return

your code starting from line 1034 should look like this:

    entity: str = None, # For wandb logging
):
    if __name__ != '__main__':
        return

    # Set world size
    if world_size == -1:
        world_size = torch.cuda.device_count()
    print(f"World size: {world_size}")

then try again

@iseesaw commented Apr 24, 2024

Thanks! I'll give this a try later.

@iseesaw commented Apr 24, 2024

I successfully ran the code! Thank you very much. This project is wonderful!

@geronimi73 (Contributor, Author)

I successfully ran the code! Thank you very much. This project is wonderful!

👍

which OS are you on, windows?

@iseesaw commented Apr 24, 2024

which OS are you on, windows?

Ubuntu 22.04.2 LTS in Docker

@geronimi73 (Contributor, Author)

@iseesaw could you please check if this runs or throws the same error:

import torch.multiprocessing as mp
from fastcore.script import call_parse

print(f"script. {__name__}")

def do_something(inp):
    print('do_something')

@call_parse
def main():
    print('main')

    mp.spawn(
        do_something,
        nprocs=2,
        join=True,
    )
    print('Finished')

@iseesaw commented Apr 25, 2024

@iseesaw could you please check if this runs or throws the same error:

I tested the code, and it executed successfully without any errors. Here is the output I observed:

script. __main__
main
script. __mp_main__
do_something
script. __mp_main__
do_something
Finished

@geronimi73 (Contributor, Author)

I'm still trying to understand why this error happens.

Are you using the original train.py from this repo, or did you modify the code? Are you by any chance using the HF datasets lib with `import datasets` (inside train.py), or something similar?

@iseesaw commented Apr 25, 2024

The error may be related to the use of multiprocessing for dataset processing.

To adapt to different model chat templates, I modified the get_dataloader() function in train.py. Additionally, I've imported LazySupervisedDataset from the FastChat repository; see train_with_template.py#L258 and train_with_template.py#L209.

My apologies for any confusion caused. This modification could be the source of the problem.
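If dataset preprocessing is indeed the trigger, the general rule under the spawn start method is that everything at module level runs again in every child process, so any work that itself starts processes (for example a datasets `.map(num_proc=...)` call, or LazySupervisedDataset construction) must stay under the main guard or inside the worker. A minimal sketch of that layout, with hypothetical function names:

```python
# Module level: keep this limited to cheap, side-effect-free definitions.
# Under the spawn start method, every child re-imports this module, so any
# top-level work (dataset tokenization, chat-template formatting, anything
# that starts its own worker pool) would run again in each child and can
# trip the "bootstrapping phase" RuntimeError.


def prepare_data():
    # Hypothetical stand-in for tokenization / chat-template formatting.
    return ["sample-0", "sample-1"]


if __name__ == "__main__":
    data = prepare_data()  # safe: runs only once, in the parent process
    print(len(data))
```

The spawned children see `__name__ == "__mp_main__"` (as the test script above showed), so the guarded block never re-executes in them.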
