
TensorFlow fails while no TensorFlow expected to run at all #1532

Closed
artkpv opened this issue May 22, 2024 · 1 comment
Labels: bug (Something isn't working)

Comments


artkpv commented May 22, 2024

Describe the bug

I am running oaieval for the steganography eval with Llama 3 70B using PyTorch and Hugging Face. As far as I know, I don't use TensorFlow anywhere. However, I can see that some TensorFlow code runs and fails.

To Reproduce

Add the code to run Llama as shown below in Code snippets. I see these messages in stdout:

gcc (GCC) 10.2.1 20210130 (Red Hat 10.2.1-11.1.0.1)
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Python 3.11.5
[2024-05-22 11:46:58,821] [registry.py:271] Loading registry from /data/artyom_karpov/rl4steg/lib/evals/evals/registry/evals
[2024-05-22 11:46:59,485] [registry.py:271] Loading registry from /data/artyom_karpov/.evals/evals
[2024-05-22 11:46:59,704] [registry.py:271] Loading registry from /data/artyom_karpov/rl4steg/lib/evals/evals/registry/completion_fns
[2024-05-22 11:46:59,711] [registry.py:271] Loading registry from /data/artyom_karpov/.evals/completion_fns
[2024-05-22 11:46:59,711] [registry.py:271] Loading registry from /data/artyom_karpov/rl4steg/lib/evals/evals/registry/solvers
[2024-05-22 11:46:59,839] [registry.py:271] Loading registry from /data/artyom_karpov/.evals/solvers
/data/artyom_karpov/rl4steg/.venv/lib/python3.11/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
The `load_in_4bit` and `load_in_8bit` arguments are deprecated and will be removed in the future versions. Please, pass a `BitsAndBytesConfig` object in `quantization_config` argument instead.
[2024-05-22 11:47:07,329] [modeling.py:989] We will use 90% of the memory on device 0 for storing the model, and 10% for the buffer to avoid OOM. You can set `max_memory` in to a higher value to use more memory (at your own risk).

Loading checkpoint shards:   0%|          | 0/30 [00:00<?, ?it/s]
Loading checkpoint shards:   3%|▎         | 1/30 [00:03<01:32,  3.20s/it]
Loading checkpoint shards:   7%|▋         | 2/30 [00:06<01:31,  3.27s/it]
Loading checkpoint shards:  10%|█         | 3/30 [00:09<01:28,  3.29s/it]
Loading checkpoint shards:  13%|█▎        | 4/30 [00:13<01:30,  3.49s/it]
Loading checkpoint shards:  17%|█▋        | 5/30 [00:17<01:26,  3.48s/it]
Loading checkpoint shards:  20%|██        | 6/30 [00:20<01:22,  3.43s/it]
Loading checkpoint shards:  23%|██▎       | 7/30 [00:23<01:19,  3.45s/it]
Loading checkpoint shards:  27%|██▋       | 8/30 [00:28<01:22,  3.73s/it]
Loading checkpoint shards:  30%|███       | 9/30 [00:32<01:18,  3.75s/it]
Loading checkpoint shards:  33%|███▎      | 10/30 [00:35<01:13,  3.69s/it]
Loading checkpoint shards:  37%|███▋      | 11/30 [00:39<01:10,  3.71s/it]
Loading checkpoint shards:  40%|████      | 12/30 [00:43<01:10,  3.94s/it]
Loading checkpoint shards:  43%|████▎     | 13/30 [00:48<01:10,  4.12s/it]
Loading checkpoint shards:  47%|████▋     | 14/30 [00:52<01:06,  4.15s/it]
Loading checkpoint shards:  50%|█████     | 15/30 [00:57<01:03,  4.26s/it]
Loading checkpoint shards:  53%|█████▎    | 16/30 [01:01<01:01,  4.39s/it]
Loading checkpoint shards:  57%|█████▋    | 17/30 [01:06<00:57,  4.41s/it]
Loading checkpoint shards:  60%|██████    | 18/30 [01:10<00:53,  4.45s/it]
Loading checkpoint shards:  63%|██████▎   | 19/30 [01:15<00:48,  4.43s/it]
Loading checkpoint shards:  67%|██████▋   | 20/30 [01:19<00:43,  4.36s/it]
Loading checkpoint shards:  70%|███████   | 21/30 [01:23<00:38,  4.28s/it]
Loading checkpoint shards:  73%|███████▎  | 22/30 [01:27<00:33,  4.22s/it]
Loading checkpoint shards:  77%|███████▋  | 23/30 [01:31<00:29,  4.22s/it]
Loading checkpoint shards:  80%|████████  | 24/30 [01:36<00:25,  4.28s/it]
Loading checkpoint shards:  83%|████████▎ | 25/30 [01:40<00:22,  4.43s/it]
Loading checkpoint shards:  87%|████████▋ | 26/30 [01:45<00:17,  4.43s/it]
Loading checkpoint shards:  90%|█████████ | 27/30 [01:49<00:13,  4.35s/it]
Loading checkpoint shards:  93%|█████████▎| 28/30 [01:53<00:08,  4.35s/it]
Loading checkpoint shards:  97%|█████████▋| 29/30 [01:58<00:04,  4.39s/it]
Loading checkpoint shards: 100%|██████████| 30/30 [01:59<00:00,  3.47s/it]
Loading checkpoint shards: 100%|██████████| 30/30 [01:59<00:00,  3.99s/it]
/data/artyom_karpov/rl4steg/.venv/lib/python3.11/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[2024-05-22 11:49:08,341] [oaieval.py:215] Run started: 240522114908HNUG55EE
2024-05-22 11:49:09.863802: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-05-22 11:49:11.808810: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
[2024-05-22 11:49:13,601] [utils.py:145] Note: detected 128 virtual cores but NumExpr set to maximum of 64, check "NUMEXPR_MAX_THREADS" environment variable.
[2024-05-22 11:49:13,602] [utils.py:148] Note: NumExpr detected 128 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
[2024-05-22 11:49:13,602] [utils.py:161] NumExpr defaulting to 8 threads.
2024-05-22 11:49:15.717343: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1928] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 37944 MB memory:  -> device: 0, name: NVIDIA A100-SXM4-80GB, pci bus id: 0000:0f:00.0, compute capability: 8.0
[2024-05-22 11:49:24,785] [data.py:94] Fetching /data/artyom_karpov/rl4steg/lib/evals/evals/registry/data/steganography/samples.jsonl
[2024-05-22 11:49:24,792] [eval.py:36] Evaluating 480 samples
[2024-05-22 11:49:24,810] [eval.py:144] Running in threaded mode with 1 threads!

  0%|          | 0/480 [00:00<?, ?it/s]Asking to pad to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no padding.
Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


.... 


2024-05-22 11:53:01.290822: W tensorflow/compiler/mlir/tools/kernel_gen/transforms/gpu_kernel_to_blob_pass.cc:190] Failed to compile generated PTX with ptxas. Falling back to compilation by driver.

  0%|          | 1/480 [03:36<28:49:53, 216.69s/it]Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
...

Eventually the eval does run. How can I disable TensorFlow?

Code snippets

llama.py

from evals.api import CompletionFn, CompletionResult
from evals.prompt.base import CompletionPrompt
from evals.record import record_sampling
import torch
from typing import Optional
from transformers import AutoModelForCausalLM, LlamaForCausalLM, LlamaConfig, AutoTokenizer


class LlamaCompletionResult(CompletionResult):
    def __init__(self, response) -> None:
        self.response = response

    def get_completions(self) -> list[str]:
        return [self.response.strip()]


class LlamaCompletionFn(CompletionFn):
    def __init__(self, llm: str, llm_kwargs: Optional[dict] = None, **kwargs) -> None:
        self._model = AutoModelForCausalLM.from_pretrained(
            llm,
            return_dict=True,
            load_in_8bit=llm_kwargs["load_in_8bit"],
            load_in_4bit=llm_kwargs["load_in_4bit"],
            device_map="auto",
            low_cpu_mem_usage=True,
            attn_implementation="sdpa" if llm_kwargs.get("use_fast_kernels", False) else None,
            torch_dtype=torch.bfloat16
        )
        self._model.eval()

        self._tokenizer = AutoTokenizer.from_pretrained(llm)
        self._tokenizer.pad_token = self._tokenizer.eos_token
        torch.manual_seed(llm_kwargs.get("seed", 42))
        self._gen_kwargs = llm_kwargs['gen_kwargs']

    @torch.no_grad()
    def __call__(self, prompt, **kwargs) -> CompletionResult:
        prompt = self._tokenizer.apply_chat_template(
            prompt, tokenize=False, add_generation_prompt=True
        )
        batch = self._tokenizer(prompt, padding='max_length', truncation=True, max_length=None, return_tensors="pt")
        batch = {k: v.to("cuda") for k, v in batch.items()}
        outputs = self._model.generate(
            **batch,
            **self._gen_kwargs,
        )
        # Take only response:
        outputs = outputs[0][batch['input_ids'][0].size(0):]
        response = self._tokenizer.decode(outputs, skip_special_tokens=True)
        record_sampling(prompt=prompt, sampled=response)
        return LlamaCompletionResult(response)
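
(Note for readers, not part of the original report: the warning in the log above says that `load_in_8bit`/`load_in_4bit` are deprecated in favour of a `BitsAndBytesConfig`. A minimal sketch of that replacement, assuming `bitsandbytes` is installed, could look like this.)

# Sketch only: pass quantization via BitsAndBytesConfig instead of the
# deprecated load_in_4bit/load_in_8bit keyword arguments.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # matches load_in_4bit: true in the registry entry below
    bnb_4bit_compute_dtype=torch.bfloat16,  # keep compute dtype consistent with torch_dtype
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-70B-Instruct",
    quantization_config=quant_config,
    device_map="auto",
    low_cpu_mem_usage=True,
    torch_dtype=torch.bfloat16,
)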

Registry entry:


llama/3-70b:
  class: evals.completion_fns.llama:LlamaCompletionFn
  args:
    llm: meta-llama/Meta-Llama-3-70B-Instruct
    llm_kwargs:
      load_in_8bit: false
      load_in_4bit: true
      use_fast_kernels: false
      gen_kwargs:
        max_new_tokens: 200
        do_sample: true
        top_p: 1.0
        temperature: 1.0
        min_length: null
        use_cache: false
        top_k: 50
        repetition_penalty: 1.0
        length_penalty: 1



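(Note for readers, not part of the original report: a hypothetical smoke test for the completion function registered above, assuming llama.py is importable as `evals.completion_fns.llama` as the registry entry indicates; the chat-style prompt format is inferred from the `apply_chat_template` call. Loading the 70B model this way requires the same GPU resources as the eval itself.)

# Hypothetical smoke test for LlamaCompletionFn outside of oaieval.
from evals.completion_fns.llama import LlamaCompletionFn

fn = LlamaCompletionFn(
    llm="meta-llama/Meta-Llama-3-70B-Instruct",
    llm_kwargs={
        "load_in_8bit": False,
        "load_in_4bit": True,
        "use_fast_kernels": False,
        "gen_kwargs": {"max_new_tokens": 32, "do_sample": False},
    },
)
result = fn([{"role": "user", "content": "Say hello."}])
print(result.get_completions()[0])

With the registry entry in place, the eval itself would be launched with something like `oaieval llama/3-70b steganography` (the exact eval name is an assumption).
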
### OS

Linux * 3.10.0-1160.76.1.0.1.el7.x86_64 #1 SMP Wed Aug 10 17:32:14 PDT 2022 x86_64 x86_64 x86_64 GNU/Linux

### Python version

Python 3.11.5

### Library version

3.0.1
artkpv added the bug (Something isn't working) label on May 22, 2024

artkpv commented May 25, 2024

Removing the TensorFlow packages with `pip uninstall` seems to solve the issue.
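
(Note for readers, not part of the original comment: `transformers` also honours the `USE_TF` environment variable, which keeps it from importing TensorFlow even when the package is installed. Whether that would have silenced the TF-TRT/ptxas messages here depends on which dependency was actually importing TensorFlow, so treat this as an untested alternative to uninstalling.)

# Untested alternative to uninstalling TensorFlow: ask transformers to skip it.
import os
os.environ["USE_TF"] = "0"     # transformers checks this at import time
os.environ["USE_TORCH"] = "1"  # keep the PyTorch backend

import transformers  # must be imported only after the variables are set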

artkpv closed this as completed on May 25, 2024