Bug: FastText Incompatibility with NumPy >= 2.0.0 #3

Closed
myhloli opened this issue Jul 7, 2024 · 3 comments
Labels
bug, documentation


myhloli commented Jul 7, 2024

______________________________________________________________________ test_detect_totally _______________________________________________________________________

    def test_detect_totally():
        from fast_langdetect import detect_language
>       assert detect_language("hello world") == "EN", "ft_detect error"

tests/test_detect.py:25: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
venv/lib/python3.10/site-packages/fast_langdetect/ft_detect/__init__.py:23: in detect_language
    lang_code = detect(sentence, low_memory=low_memory).get("lang").upper()
venv/lib/python3.10/site-packages/fast_langdetect/ft_detect/infer.py:81: in detect
    labels, scores = model.predict(text)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <fasttext.FastText._FastText object at 0x10ecdca90>, text = 'hello world\n', k = 1, threshold = 0.0, on_unicode_error = 'strict'

    def predict(self, text, k=1, threshold=0.0, on_unicode_error='strict'):
        """
        Given a string, get a list of labels and a list of
        corresponding probabilities. k controls the number
        of returned labels. A choice of 5, will return the 5
        most probable labels. By default this returns only
        the most likely label and probability. threshold filters
        the returned labels by a threshold on probability. A
        choice of 0.5 will return labels with at least 0.5
        probability. k and threshold will be applied together to
        determine the returned labels.
    
        This function assumes to be given
        a single line of text. We split words on whitespace (space,
        newline, tab, vertical tab) and the control characters carriage
        return, formfeed and the null character.
    
        If the model is not supervised, this function will throw a ValueError.
    
        If given a list of strings, it will return a list of results as usually
        received for a single line of text.
        """
    
        def check(entry):
            if entry.find('\n') != -1:
                raise ValueError(
                    "predict processes one line at a time (remove \'\\n\')"
                )
            entry += "\n"
            return entry
    
        if type(text) == list:
            text = [check(entry) for entry in text]
            all_labels, all_probs = self.f.multilinePredict(
                text, k, threshold, on_unicode_error)
    
            return all_labels, all_probs
        else:
            text = check(text)
            predictions = self.f.predict(text, k, threshold, on_unicode_error)
            if predictions:
                probs, labels = zip(*predictions)
            else:
                probs, labels = ([], ())
    
>           return labels, np.array(probs, copy=False)
E           ValueError: Unable to avoid copy while creating an array as requested.
E           If using `np.array(obj, copy=False)` replace it with `np.asarray(obj)` to allow a copy when needed (no behavior change in NumPy 1.x).
E           For more details, see https://numpy.org/devdocs/numpy_2_0_migration_guide.html#adapting-to-changes-in-the-copy-keyword.

venv/lib/python3.10/site-packages/fasttext/FastText.py:232: ValueError

The upstream repository https://github.com/facebookresearch/fastText has been archived, so as a workaround I added "numpy<2.0.0" to my requirements.txt.

neutron-nerve bot added the bug and documentation labels Jul 7, 2024
neutron-nerve bot changed the title from "fasttext not support numpy >= 2.0.0" to "Bug: FastText Incompatibility with NumPy >= 2.0.0" Jul 7, 2024
sudoskys commented Jul 8, 2024


neutron-nerve bot commented Jul 8, 2024

Issue Report: Bug - FastText Incompatibility with NumPy >= 2.0.0

Issue Summary

An issue was identified in the fast_langdetect library where the FastText model was incompatible with NumPy versions greater than or equal to 2.0.0. The specific error encountered was:

ValueError: Unable to avoid copy while creating an array as requested.
If using `np.array(obj, copy=False)` replace it with `np.asarray(obj)` to allow a copy when needed (no behavior change in NumPy 1.x).

The detected incompatibility caused the unit test test_detect_totally to fail when attempting to detect language using the FastText model, due to changes in NumPy 2.0.0's handling of array creation.

Root Cause

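The error occurred because FastText's predict method calls np.array(probs, copy=False). In NumPy 1.x, copy=False meant "copy only if needed". As described in the NumPy 2.0 migration guide, in NumPy 2.0 it means "never copy" and raises a ValueError when a copy cannot be avoided. Because probs is a plain Python sequence, converting it to an array always requires a copy, so the call fails on NumPy 2.0.0 and later.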
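
The behavior change can be reproduced without fastText. The snippet below is a minimal sketch, assuming only that NumPy is installed; the sample tuple and the helper name `copy_false_fails` are illustrative and do not come from fastText. It shows that `np.asarray` works on either major version, while `np.array(..., copy=False)` on a tuple raises only on NumPy 2.x.

```python
import numpy as np

# Illustrative stand-in for the tuple of probabilities built inside predict().
probs = (0.98, 0.01)

# np.asarray copies only when it has to, on both NumPy 1.x and 2.x.
arr = np.asarray(probs)


def copy_false_fails(values):
    """Return True if np.array(values, copy=False) raises ValueError."""
    try:
        np.array(values, copy=False)
    except ValueError:
        return True
    return False


# Major version of the installed NumPy, e.g. 1 or 2.
numpy_major = int(np.__version__.split(".")[0])
fails = copy_false_fails(probs)
```

On NumPy 1.x `fails` should be `False`, and on NumPy 2.x it should be `True`, which matches the traceback in this issue.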

Resolution

To resolve the incompatibility, the project's requirements were updated to restrict the version of NumPy to less than 2.0.0. Specifically, the following change was made to the requirements.txt file:

numpy>=1.26.4,<2.0.0

This adjustment ensures that the project remains compatible with NumPy versions that do not introduce the breaking change.
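
As a rough illustration of what the pin allows, the sketch below checks a version string against the range `>=1.26.4,<2.0.0`. The helper names `parse_release` and `satisfies_pin` are hypothetical, and the parsing is deliberately simplified: it is not a full PEP 440 implementation and only looks at the leading numeric components.

```python
import re


def parse_release(version):
    """Extract leading numeric components, padded to three parts.

    For example '2.0.0rc1' becomes (2, 0, 0) and '2' becomes (2, 0, 0).
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    parts = tuple(int(part) for part in match.group(0).split("."))
    # Pad short versions so that '2' compares equal to '2.0.0'.
    return parts + (0,) * max(0, 3 - len(parts))


def satisfies_pin(version, lower=(1, 26, 4), upper=(2, 0, 0)):
    """Return True if version falls in the half-open range [lower, upper)."""
    return lower <= parse_release(version) < upper
```

At runtime the installed version could be checked with `satisfies_pin(numpy.__version__)`, for instance to emit a clearer error message before fastText is called.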

Final Outcome

The issue was successfully resolved by the contributor @sudoskys. The project's requirements now specify an appropriate range for the NumPy version, avoiding the incompatibility with NumPy 2.0.0 and ensuring stable functionality for fast_langdetect.

Appreciations

We extend our gratitude to @sudoskys for promptly addressing this issue and providing a solution. The community's swift action ensures the continued reliability and performance of the fast_langdetect library.


Report Prepared By:
LlmKira Contributors

@xiahuadong1981

In FastText.py, locate the implementation of the predict method and find this line:
return labels, np.array(probs, copy=False)
Replace it with:
return labels, np.asarray(probs)
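
To see the effect of that one-line change in isolation, here is a minimal sketch of the tail of predict as shown in the traceback, with the suggested replacement applied. The standalone function name `split_predictions` is hypothetical; in fastText this logic lives inside the `predict` method, and the input is assumed to be a list of `(probability, label)` pairs as returned by `self.f.predict`.

```python
import numpy as np


def split_predictions(predictions):
    """Split (probability, label) pairs into labels and a probability array."""
    if predictions:
        probs, labels = zip(*predictions)
    else:
        probs, labels = ([], ())
    # np.asarray allows a copy when one is needed, so this works on
    # NumPy 1.x and 2.x alike.
    return labels, np.asarray(probs)


labels, probs = split_predictions([(0.9, "__label__en"), (0.1, "__label__de")])
empty_labels, empty_probs = split_predictions([])
```

Note that editing FastText.py inside site-packages is overwritten whenever the package is reinstalled, so the version pin remains the more durable workaround.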
