ValueError: 2539520 exceeds max_bin_len(1048576) when using spacy.load() #2995

Closed
RochaOwng opened this issue Nov 30, 2018 · 18 comments
Labels
feat / serialize Feature: Serialization, saving and loading 🔮 thinc spaCy's machine learning library Thinc third-party Third-party packages and services

Comments

@RochaOwng

Hi, I'm new to spaCy.

I tried a small script to understand how it works:

import spacy

nlp = spacy.load('pt')

I've already installed spaCy (with pip and conda) using Python 3.6, and I've downloaded the Portuguese model, but I'm getting the following error:

Traceback (most recent call last):
  File "C:/Users/rocha/PycharmProjects/projeto/entidade.py", line 4, in <module>
    nlp = spacy.load('pt')
  File "C:\Users\rocha\Anaconda3\lib\site-packages\spacy\__init__.py", line 18, in load
    return util.load_model(name, **overrides)
  File "C:\Users\rocha\Anaconda3\lib\site-packages\spacy\util.py", line 112, in load_model
    return load_model_from_link(name, **overrides)
  File "C:\Users\rocha\Anaconda3\lib\site-packages\spacy\util.py", line 129, in load_model_from_link
    return cls.load(**overrides)
  File "C:\Users\rocha\Anaconda3\lib\site-packages\spacy\data\pt\__init__.py", line 12, in load
    return load_model_from_init_py(__file__, **overrides)
  File "C:\Users\rocha\Anaconda3\lib\site-packages\spacy\util.py", line 173, in load_model_from_init_py
    return load_model_from_path(data_path, meta, **overrides)
  File "C:\Users\rocha\Anaconda3\lib\site-packages\spacy\util.py", line 156, in load_model_from_path
    return nlp.from_disk(model_path)
  File "C:\Users\rocha\Anaconda3\lib\site-packages\spacy\language.py", line 647, in from_disk
    util.from_disk(path, deserializers, exclude)
  File "C:\Users\rocha\Anaconda3\lib\site-packages\spacy\util.py", line 511, in from_disk
    reader(path / key)
  File "C:\Users\rocha\Anaconda3\lib\site-packages\spacy\language.py", line 643, in <lambda>
    deserializers[name] = lambda p, proc=proc: proc.from_disk(p, vocab=False)
  File "pipeline.pyx", line 643, in spacy.pipeline.Tagger.from_disk
  File "C:\Users\rocha\Anaconda3\lib\site-packages\spacy\util.py", line 511, in from_disk
    reader(path / key)
  File "pipeline.pyx", line 626, in spacy.pipeline.Tagger.from_disk.load_model
  File "pipeline.pyx", line 627, in spacy.pipeline.Tagger.from_disk.load_model
  File "C:\Users\rocha\Anaconda3\lib\site-packages\thinc\neural\_classes\model.py", line 335, in from_bytes
    data = msgpack.loads(bytes_data, encoding='utf8')
  File "C:\Users\rocha\Anaconda3\lib\site-packages\msgpack_numpy.py", line 214, in unpackb
    return _unpackb(packed, **kwargs)
  File "msgpack\_unpacker.pyx", line 187, in msgpack._cmsgpack.unpackb
ValueError: 2539520 exceeds max_bin_len(1048576)

Can anyone help?

@ines ines added third-party Third-party packages and services 🔮 thinc spaCy's machine learning library Thinc feat / serialize Feature: Serialization, saving and loading labels Nov 30, 2018
@ines
Member

ines commented Nov 30, 2018

Thanks for the report and sorry you've hit this problem. It also just came up in our tests today and it was pretty confusing.

Looks like it might be related to an update of the msgpack library that was released today and is used in our library thinc, which spaCy depends on. So when you installed spaCy, that new version was pulled in and apparently it includes a change to the limit?
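The numbers in the error message read as a size check: 1048576 is exactly 1 MiB (2**20 bytes), and the binary field being loaded from the serialized Tagger model (2539520 bytes in this traceback) is larger than that. The sketch below is a pure-Python illustration of that kind of guard, assuming from the wording of the message that the unpacker rejects any binary field longer than `max_bin_len`; `check_bin_len` is a hypothetical name, and this is not msgpack's actual Cython code.

```python
# Hypothetical illustration of the size guard behind
# "ValueError: N exceeds max_bin_len(M)". This is NOT msgpack's real code;
# it only mimics the behaviour implied by the error message.
ONE_MIB = 2 ** 20  # 1048576, the limit reported in the tracebacks in this thread


def check_bin_len(length: int, max_bin_len: int = ONE_MIB) -> None:
    """Raise ValueError if a binary payload is longer than the allowed limit."""
    if length > max_bin_len:
        raise ValueError(f"{length} exceeds max_bin_len({max_bin_len})")


# Payload sizes taken from the two tracebacks posted in this thread.
for size in (2539520, 1792000):
    try:
        check_bin_len(size)
    except ValueError as err:
        print("rejected:", err)
```

Running it prints one `rejected:` line per size, with the same wording as the errors reported here.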

We'll investigate this and hopefully push an update to thinc soon. In the meantime, try downgrading msgpack:

pip install msgpack==0.5.6

@msukmanowsky

Just hit this as well; your fix works (we pinned msgpack>=0.3.0,<0.6).

@ines
Member

ines commented Nov 30, 2018

Glad it worked!

We also released Thinc v6.12.1 earlier, which pins to the exact msgpack version. It should now be installed automatically when you install/update spaCy.
https://github.com/explosion/thinc/releases/tag/v6.12.1

@msukmanowsky

Great, thanks so much!

@honnibal
Member

Probably best to keep this open for now, as people with cached packages might still run into this.

tl;dr: Thinc 6.12.1 is up now, so fresh installs should work. If your installation doesn't work, do:

python -m pip install "msgpack<0.6.0"
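To confirm that the downgrade took effect, the installed msgpack version can be compared against the range Thinc 6.12.1 declares (`>=0.5.6,<0.6.0`, as shown in the pip warning quoted in this thread). This is a stdlib-only sketch that assumes Python 3.8+ for `importlib.metadata` and only handles plain dotted release numbers; `parse_release`, `in_range` and `installed_msgpack` are hypothetical helper names, not part of spaCy or Thinc. For full PEP 440 handling, the `packaging` library is the usual tool.

```python
# Sketch: is the installed msgpack inside >=0.5.6,<0.6.0?
# Assumes Python 3.8+ (importlib.metadata). The helper names below are
# hypothetical and are not part of spaCy or Thinc.
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def parse_release(ver: str) -> Tuple[int, ...]:
    """Turn a plain release string such as '0.5.6' into (0, 5, 6).

    Only dotted integers are handled; strings like '1.0.0rc1' raise ValueError.
    """
    parts = [int(part) for part in ver.split(".")]
    while len(parts) < 3:  # pad so that '0.6' compares equal to '0.6.0'
        parts.append(0)
    return tuple(parts)


def in_range(ver: str, lower: str = "0.5.6", upper: str = "0.6.0") -> bool:
    """Return True if lower <= ver < upper."""
    return parse_release(lower) <= parse_release(ver) < parse_release(upper)


def installed_msgpack() -> Optional[str]:
    """Return the installed msgpack version string, or None if it is absent."""
    try:
        return version("msgpack")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_msgpack()
    if found is None:
        print("msgpack is not installed")
    else:
        try:
            ok = in_range(found)
        except ValueError:
            print(f"could not parse msgpack version {found!r}")
        else:
            if ok:
                print(f"msgpack {found} is inside >=0.5.6,<0.6.0")
            else:
                print(f"msgpack {found} is outside >=0.5.6,<0.6.0; "
                      'try: python -m pip install "msgpack<0.6.0"')
```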

@daniel347x

daniel347x commented Dec 1, 2018

Note: I just fired up a fresh Ubuntu 18.04 VM in Azure and sudo pip install -U spacy reveals this message in the console:

thinc 6.12.1 has requirement msgpack<0.6.0,>=0.5.6, but you'll have msgpack 0.6.0 which is incompatible.

... So it looks like you do need to manually install the older version of msgpack along with a fresh install of spaCy (i.e., python -m pip install "msgpack<0.6.0" is required).

@honnibal
Member

honnibal commented Dec 1, 2018

Hmm! What else requires msgpack in your environment though? spaCy shouldn't be depending on it directly.
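One way to answer that is to scan the metadata of every installed distribution for a declared dependency on msgpack. This sketch assumes Python 3.8+ (`importlib.metadata.distributions`); the helper names are hypothetical, and the requirement parsing is deliberately simplified (it only extracts the project name) compared with pip's own parser.

```python
# Sketch: list installed distributions that directly declare a dependency
# on msgpack. Assumes Python 3.8+; helper names are hypothetical.
import re
from importlib.metadata import distributions
from typing import Iterable, List


def normalize(name: str) -> str:
    """Normalize a project name PEP 503 style (case, runs of '-', '_', '.')."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_name(req: str) -> str:
    """Extract the project name from a requirement string.

    'msgpack<0.6.0,>=0.5.6'  -> 'msgpack'
    'msgpack (<0.6.0)'       -> 'msgpack'
    'msgpack-numpy==0.4.3.2' -> 'msgpack-numpy'
    """
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", req)
    return normalize(match.group(1)) if match else ""


def requires(reqs: Iterable[str], target: str) -> bool:
    """True if any requirement string names the target project."""
    return any(requirement_name(r) == normalize(target) for r in reqs)


def packages_requiring(target: str = "msgpack") -> List[str]:
    """Return 'Name version' for every installed dist that declares target."""
    found = []
    for dist in distributions():
        reqs = dist.requires or []  # None when no requirements are declared
        if requires(reqs, target):
            found.append(f"{dist.metadata['Name']} {dist.version}")
    return sorted(found)


if __name__ == "__main__":
    for line in packages_requiring("msgpack"):
        print(line)
```

Only direct declarations are listed; a package that pulls msgpack in through another dependency shows up only via that intermediate package.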

@vividfree

vividfree commented Dec 1, 2018

The version of msgpack is 0.5.6, but the problem still exists.

disfluency_detection/crf.py:36: in __init__
    self.nlp = spacy.load('en')
.venv/lib/python3.6/site-packages/spacy/__init__.py:15: in load
    return util.load_model(name, **overrides)
.venv/lib/python3.6/site-packages/spacy/util.py:112: in load_model
    return load_model_from_link(name, **overrides)
.venv/lib/python3.6/site-packages/spacy/util.py:129: in load_model_from_link
    return cls.load(**overrides)
.venv/lib/python3.6/site-packages/spacy/data/en/__init__.py:12: in load
    return load_model_from_init_py(__file__, **overrides)
.venv/lib/python3.6/site-packages/spacy/util.py:173: in load_model_from_init_py
    return load_model_from_path(data_path, meta, **overrides)
.venv/lib/python3.6/site-packages/spacy/util.py:156: in load_model_from_path
    return nlp.from_disk(model_path)
.venv/lib/python3.6/site-packages/spacy/language.py:653: in from_disk
    util.from_disk(path, deserializers, exclude)
.venv/lib/python3.6/site-packages/spacy/util.py:511: in from_disk
    reader(path / key)
.venv/lib/python3.6/site-packages/spacy/language.py:649: in <lambda>
    deserializers[name] = lambda p, proc=proc: proc.from_disk(p, vocab=False)
pipeline.pyx:643: in spacy.pipeline.Tagger.from_disk
    ???
.venv/lib/python3.6/site-packages/spacy/util.py:511: in from_disk
    reader(path / key)
pipeline.pyx:626: in spacy.pipeline.Tagger.from_disk.load_model
    ???
pipeline.pyx:627: in spacy.pipeline.Tagger.from_disk.load_model
    ???
.venv/lib/python3.6/site-packages/thinc/neural/_classes/model.py:335: in from_bytes
    data = msgpack.loads(bytes_data, encoding='utf8')
.venv/lib/python3.6/site-packages/msgpack_numpy.py:214: in unpackb
    return _unpackb(packed, **kwargs)
msgpack/_unpacker.pyx:187: in msgpack._cmsgpack.unpackb
    ???
E   ValueError: 1792000 exceeds max_bin_len(1048576)

Can anyone help?

@daniel347x

I tested on a raw (virgin) VM instance of Ubuntu 18.04 on Azure just to rule everything out.

My only commands:

sudo apt-get update
sudo apt-get upgrade -y
sudo apt-get dist-upgrade
sudo apt-get install -y python-pip
sudo python -m pip install --upgrade pip
pip install -U spacy

...It looks like msgpack is included in the default Ubuntu 18.04 system (upgraded to the latest).
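To see whether the msgpack that Python picks up is the one pip installed or another copy already present on the system, one can ask the import system where the module resolves. This sketch uses `importlib.util.find_spec`, which locates a top-level module without importing it; `where_is` is a hypothetical helper, and the paths printed depend entirely on your OS and environment.

```python
# Sketch: show where top-level modules would be imported from, without
# importing them. The helper name is hypothetical.
import importlib.util
from typing import Optional


def where_is(module_name: str) -> Optional[str]:
    """Return the file a top-level module resolves to, or None if absent."""
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return None
    # spec.origin is the file path for ordinary modules and packages;
    # it can be None (namespace packages) or a marker such as 'built-in'.
    return spec.origin


if __name__ == "__main__":
    for name in ("msgpack", "msgpack_numpy", "thinc", "spacy"):
        print(f"{name}: {where_is(name) or 'not found'}")
```

Run it with the same interpreter that pip targets (for example the one behind `python -m pip`), since each interpreter has its own module search path.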

@honnibal honnibal reopened this Dec 1, 2018
@honnibal
Member

honnibal commented Dec 1, 2018

Try pip install spacy==2.0.18

@iperezr

iperezr commented Dec 1, 2018

Just got started with spaCy, had the same error.
Downgrading msgpack to 0.5.6 solved it. Thanks!

@amatsuo

amatsuo commented Dec 1, 2018

Thank you for the very quick fix!

The Travis build of spacyr hit the same issue yesterday:
https://travis-ci.org/quanteda/spacyr/builds/461753381
(see line 4336 for the failure; the installed spaCy version was 2.0.17, per line 4088).

I was going to file an issue today, but the problem was already gone in a build this morning, which installed spaCy 2.0.18.
https://travis-ci.org/quanteda/spacyr/builds/462103684

Just for reference, our package installs spaCy with the following:

conda create -n spacy_condaenv python=3.6 -y
source activate spacy_condaenv
pip install spacy
python -m spacy download en

@honnibal
Member

honnibal commented Dec 1, 2018

@amatsuo Glad it was fixed!

You should probably update your installation script to include a version range. It should now be safe to install spaCy via conda as well; conda install -c conda-forge "spacy>=2.0.0,<2.1.0" is recommended. Minor versions such as v2.1 aren't necessarily model-compatible: v2.1 will require new models to be downloaded and trained.

Version pinning is especially useful for helping people produce reproducible experiments. If versions aren't pinned, when someone tries your code in a few years' time, new versions of the software will be installed and everything will break.
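Exact pins for a known-good environment can be generated from the installed package metadata. The sketch below assumes Python 3.8+ and looks only at the three packages discussed in this thread; `installed_versions` and `pin_lines` are hypothetical helpers, and `pip freeze` does the same job for a whole environment.

```python
# Sketch: emit exact "name==version" pins for selected packages.
# Assumes Python 3.8+; helper names are hypothetical.
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, List, Optional


def installed_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each package name to its installed version, or None if missing."""
    result: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None  # not installed in this environment
    return result


def pin_lines(versions: Dict[str, Optional[str]]) -> List[str]:
    """Format installed packages as sorted pins, skipping missing ones."""
    return [f"{name}=={ver}" for name, ver in sorted(versions.items()) if ver]


if __name__ == "__main__":
    pins = pin_lines(installed_versions(["spacy", "thinc", "msgpack"]))
    print("\n".join(pins))
```

Saving that output as a `requirements.txt` lets someone reinstall the same versions later with `pip install -r requirements.txt`.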

@amatsuo

amatsuo commented Dec 1, 2018

@honnibal Thanks for the suggestion! Sounds like a good idea.

We'll include it in the next update.

@honnibal honnibal closed this as completed Dec 2, 2018
tmbo added a commit to RasaHQ/rasa that referenced this issue Dec 3, 2018
tmbo added a commit to RasaHQ/rasa that referenced this issue Dec 3, 2018
jbobo pushed a commit to hyperledger-archives/sawtooth-next-directory that referenced this issue Dec 7, 2018
See explosion/spaCy#2995

Bump base Core image to 12.3, rebuild training data, and
address dependency issue.

Add binaries to .gitattributes

Signed-off-by: PGobz <[email protected]>
pgobin-zz pushed a commit to hyperledger-archives/sawtooth-next-directory that referenced this issue Dec 7, 2018
* Add chatbot models

Add chatbot built models to source control

Signed-off-by: PGobz <[email protected]>

* Address Rasa NLU issue

See explosion/spaCy#2995

Bump base Core image to 12.3, rebuild training data, and
address dependency issue.

Signed-off-by: PGobz <[email protected]>

* Move chatbot from bin

- Move server start to entrypoint
- Rename build -> models
- Move train to module inside top-level chatbot dir

Signed-off-by: PGobz <[email protected]>
@gabsrl

gabsrl commented Dec 14, 2018

I had the same problem as @RochaOwng, and downgrading msgpack solved it. Thanks @ines, you just saved my day :))

@cameronrhamilton

For those who installed spaCy with conda, I found the following command works best:

conda install -c conda-forge msgpack-python==0.5.6

@imsrgadich

conda install -c conda-forge msgpack-python==0.5.6

Worked for conda installation. Thanks @cameronrhamilton !

@lock

lock bot commented Feb 27, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked as resolved and limited conversation to collaborators Feb 27, 2019