Cannot load teknium/Replit-v2-CodeInstruct-3B #436

Open
shrikrishnaholla opened this issue Aug 6, 2023 · 14 comments

@shrikrishnaholla

This issue is along similar lines to #248, but it concerns replit-v2 models, not replit-v1.

I am using ggml@a301077 and am having trouble loading teknium/Replit-v2-CodeInstruct-3B.

I used examples/replit/convert-h5-to-ggml.py to convert to ggml f32.
I also created q4_1 and q8_0 quantized versions using replit-quantize.

However, when trying to load any of the f32, q4_1, or q8_0 versions of the model with replit
(e.g., ./bin/replit -m Replit-v2-CodeInstruct-3B-f32.bin -p "def hello_world():")
I get:

replit_model_load: unknown tensor 'transformer.blocks.0.norm_1.weight' in model file

Any ideas?

@klosax
Contributor

klosax commented Aug 6, 2023

The model loads fine for me. The named tensor should be recognized and loaded. Did you get any compilation warnings?

@shrikrishnaholla
Author

@klosax One thing might be that I had received this error when I ran python examples/replit/convert-h5-to-ggml.py ../teknium-Replit-v2-CodeInstruct-3B/Replit-v2-CodeInstruct-3B/ 0

Traceback (most recent call last):
  File "~/ggml/examples/replit/convert-h5-to-ggml.py", line 7, in <module>
    import sentencepiece.sentencepiece_model_pb2 as model
  File "~/.local/lib/python3.10/site-packages/sentencepiece/sentencepiece_model_pb2.py", line 34, in <module>
    _descriptor.EnumValueDescriptor(
  File "/opt/miniconda3/conda/envs/textgen/lib/python3.10/site-packages/google/protobuf/descriptor.py", line 796, in __new__
    _message.Message._CheckCalledFromGeneratedFile()
TypeError: Descriptors cannot not be created directly.
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are:
 1. Downgrade the protobuf package to 3.20.x or lower.
 2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).

So I rephrased the command like this:

PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python python examples/replit/convert-h5-to-ggml.py ../teknium-Replit-v2-CodeInstruct-3B/Replit-v2-CodeInstruct-3B/ 0

and the conversion completed successfully.

Could this have anything to do with it?

@klosax
Contributor

klosax commented Aug 7, 2023

> Could this have anything to do with it?

I guess not, if the model file was converted successfully.

Were there any compilation warnings when compiling the inference binary?

@shrikrishnaholla
Author

Nothing stood out to me in particular...

cmake .. && make -j4 replit replit-quantize
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- x86 detected
-- Linux detected
-- x86 detected
-- Linux detected
-- Configuring done (0.1s)
-- Generating done (0.3s)
-- Build files have been written to: ~/ggml/build
[ 25%] Building CXX object examples/CMakeFiles/common.dir/common.cpp.o
[ 25%] Building C object src/CMakeFiles/ggml.dir/ggml.c.o
In file included from /usr/include/string.h:535,
                 from ~/ggml/src/ggml.c:21:
In function ‘memcpy’,
    inlined from ‘ggml_set_op_params’ at ~/ggml/src/ggml.c:4642:5,
    inlined from ‘ggml_conv_1d’ at ~/ggml/src/ggml.c:6883:5:
/usr/include/x86_64-linux-gnu/bits/string_fortified.h:29:10: warning: ‘__builtin_memcpy’ offset [0, 11] is out of the bounds [0, 0] [-Warray-bounds]
   29 |   return __builtin___memcpy_chk (__dest, __src, __len,
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   30 |                                  __glibc_objsize0 (__dest));
      |                                  ~~~~~~~~~~~~~~~~~~~~~~~~~~
In function ‘memcpy’,
    inlined from ‘ggml_set_op_params’ at ~/ggml/src/ggml.c:4642:5,
    inlined from ‘ggml_conv_2d’ at ~/ggml/src/ggml.c:6923:5:
/usr/include/x86_64-linux-gnu/bits/string_fortified.h:29:10: warning: ‘__builtin_memcpy’ offset [0, 23] is out of the bounds [0, 0] [-Warray-bounds]
   29 |   return __builtin___memcpy_chk (__dest, __src, __len,
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   30 |                                  __glibc_objsize0 (__dest));
      |                                  ~~~~~~~~~~~~~~~~~~~~~~~~~~
In function ‘memcpy’,
    inlined from ‘ggml_set_op_params’ at ~/ggml/src/ggml.c:4642:5,
    inlined from ‘ggml_conv_1d’ at ~/ggml/src/ggml.c:6883:5,
    inlined from ‘ggml_conv_1d_ph’ at ~/ggml/src/ggml.c:6942:12:
/usr/include/x86_64-linux-gnu/bits/string_fortified.h:29:10: warning: ‘__builtin_memcpy’ offset [0, 11] is out of the bounds [0, 0] [-Warray-bounds]
   29 |   return __builtin___memcpy_chk (__dest, __src, __len,
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   30 |                                  __glibc_objsize0 (__dest));
      |                                  ~~~~~~~~~~~~~~~~~~~~~~~~~~
In function ‘memcpy’,
    inlined from ‘ggml_set_op_params’ at ~/ggml/src/ggml.c:4642:5,
    inlined from ‘ggml_pool_2d’ at ~/ggml/src/ggml.c:7015:5:
/usr/include/x86_64-linux-gnu/bits/string_fortified.h:29:10: warning: ‘__builtin_memcpy’ offset [0, 27] is out of the bounds [0, 0] [-Warray-bounds]
   29 |   return __builtin___memcpy_chk (__dest, __src, __len,
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   30 |                                  __glibc_objsize0 (__dest));
      |                                  ~~~~~~~~~~~~~~~~~~~~~~~~~~
In function ‘memcpy’,
    inlined from ‘ggml_set_op_params’ at ~/ggml/src/ggml.c:4642:5,
    inlined from ‘ggml_win_part’ at ~/ggml/src/ggml.c:7183:5:
/usr/include/x86_64-linux-gnu/bits/string_fortified.h:29:10: warning: ‘__builtin_memcpy’ offset [0, 11] is out of the bounds [0, 0] [-Warray-bounds]
   29 |   return __builtin___memcpy_chk (__dest, __src, __len,
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   30 |                                  __glibc_objsize0 (__dest));
      |                                  ~~~~~~~~~~~~~~~~~~~~~~~~~~
[ 37%] Linking CXX static library libcommon.a
[ 37%] Built target common
[ 50%] Linking C static library libggml.a
[ 50%] Built target ggml
[ 62%] Building CXX object examples/CMakeFiles/common-ggml.dir/common-ggml.cpp.o
[ 75%] Linking CXX static library libcommon-ggml.a
[ 75%] Built target common-ggml
[ 87%] Building CXX object examples/replit/CMakeFiles/replit.dir/main.cpp.o
[100%] Linking CXX executable ../../bin/replit
[100%] Built target replit
[ 37%] Built target common
[ 50%] Built target ggml
[ 75%] Built target common-ggml
[ 87%] Building CXX object examples/replit/CMakeFiles/replit-quantize.dir/quantize.cpp.o
[100%] Linking CXX executable ../../bin/replit-quantize
[100%] Built target replit-quantize

@klosax
Contributor

klosax commented Aug 8, 2023

> string_fortified.h:29:10: warning: ‘__builtin_memcpy’ offset [0, 11] is out of the bounds [0, 0] [-Warray-bounds]

The model file seems to be fine, since the tensor transformer.blocks.0.norm_1.weight is in it. The inference binary should recognize the tensor and load it. My guess is that something is wrong with your compiler, since you get warnings that could be related to the problem. The binary does string comparisons to recognize the tensor names.

Try updating or reinstalling the compiler.
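
To make the "string comparisons" remark concrete, here is a minimal sketch of a loader step that reads a name from a stream and checks it against a table of known tensors. Everything in it is an assumption for illustration: `TensorTable`, `read_and_find`, `encode_name`, and the simplified length-prefixed layout are hypothetical and are not taken from the ggml sources.

```cpp
#include <cstddef>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical stand-in for the loader's tensor table. A real loader would map
// names to tensor pointers; an int keeps the sketch small.
using TensorTable = std::map<std::string, int>;

// Reads a length-prefixed name from `in` and reports whether it is a known
// tensor. Returns false if the stream is truncated or the name is unknown.
bool read_and_find(std::istream & in, const TensorTable & tensors, std::string & name) {
    int length = 0;
    in.read(reinterpret_cast<char *>(&length), sizeof(length));
    if (!in || length < 0) {
        return false;
    }

    name.assign(static_cast<std::size_t>(length), '\0');
    in.read(&name[0], length);
    if (!in) {
        return false;
    }

    // Look the name up as a std::string, so all `length` bytes are compared.
    return tensors.find(name) != tensors.end();
}

// Helper for trying the sketch: serialises a name in the layout that
// read_and_find expects (native-endian int length, then the bytes).
std::string encode_name(const std::string & name) {
    const int length = static_cast<int>(name.size());
    std::string out(reinterpret_cast<const char *>(&length), sizeof(length));
    out += name;
    return out;
}
```

Feeding `encode_name("transformer.blocks.0.norm_1.weight")` through a `std::istringstream` into `read_and_find` succeeds only if that exact name is a key in the table; a loader that gets `false` here would be the point where it prints an "unknown tensor" message.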

@shrikrishnaholla
Author

This is my version. Should it be upgraded?

$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/11/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 11.3.0-1ubuntu1~22.04' --with-bugurl=file:///usr/share/doc/gcc-11/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-11 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib --enable-libphobos-checking=release --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --enable-cet --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none=/build/gcc-11-xKiWfi/gcc-11-11.3.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-xKiWfi/gcc-11-11.3.0/debian/tmp-gcn/usr --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu --with-build-config=bootstrap-lto-lean --enable-link-serialization=2
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 11.3.0 (Ubuntu 11.3.0-1ubuntu1~22.04)

@klosax
Contributor

klosax commented Aug 8, 2023

I think it should work with your compiler.

But you could try changing this line

if (model.tensors.find(name.data()) == model.tensors.end()) {

to

if (model.tensors.find(name) == model.tensors.end()) {

and compile again.
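
As a small illustration of why the two lookups are not interchangeable in general, the sketch below compares them on a `std::map` keyed by `std::string`. `TensorTable`, `found_by_string`, and `found_by_data` are hypothetical names, and the thread does not establish why the original lookup failed on this machine, so this shows one known difference rather than a diagnosis.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for the model's tensor table.
using TensorTable = std::map<std::string, int>;

// Lookup with the std::string itself: the key is compared over all
// name.size() bytes.
bool found_by_string(const TensorTable & t, const std::string & name) {
    return t.find(name) != t.end();
}

// Lookup through name.data(): a temporary std::string is built from the
// const char *, and that constructor stops at the first '\0' byte, so the
// length stored in `name` is ignored.
bool found_by_data(const TensorTable & t, const std::string & name) {
    return t.find(name.data()) != t.end();
}
```

For an ordinary name without embedded NUL bytes, both functions agree. They diverge as soon as the string's stored length and its C-string length differ, for example when a name carries a trailing `'\0'` byte, which is why passing the `std::string` directly is the more predictable form.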

@shrikrishnaholla
Author

This worked! @klosax, thanks for your time and for the help. I was lost without you 🙏

Would this change be useful to others as well? Should I commit and raise a PR?

@klosax
Contributor

klosax commented Aug 10, 2023

Great!

Then all uses of name.data() should be changed to name, except in lines with fprintf or printf, where it should be changed to name.c_str().

> Would this change be useful to others as well? Should I commit and raise a PR?

It looks like this error can also be found in other examples and all of them should be fixed.
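
For the fprintf/printf case mentioned above, a minimal sketch of passing a `std::string` to a printf-style function might look like this. The helper name `unknown_tensor_message` and the fixed 256-byte buffer are assumptions for illustration; only the message text is taken from the error reported in this issue.

```cpp
#include <cstdio>
#include <string>

// Builds the "unknown tensor" message with a printf-style format string.
// "%s" expects a NUL-terminated const char *, so the std::string is passed
// via c_str() rather than as the object itself.
std::string unknown_tensor_message(const std::string & name) {
    char buf[256];  // assumed large enough for typical tensor names; longer output is truncated
    std::snprintf(buf, sizeof(buf), "%s: unknown tensor '%s' in model file\n",
                  "replit_model_load", name.c_str());
    return std::string(buf);
}
```

The same `name.c_str()` argument works unchanged in a direct `fprintf(stderr, ...)` call; the helper returns a string here only so the result is easy to inspect.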

@shrikrishnaholla
Author

Wouldn't that break compilation for other models as well? Would you like me to try to reproduce this for other classes of models before making a fix?

Because if what you say is true, then wouldn't this be a huge change? 🤔

@klosax
Contributor

klosax commented Aug 10, 2023

name is a std::string and should be accessed as such; its contents should not be accessed directly through data() as is done here.

All examples compile and work fine for me using gcc 9, so my guess is that your gcc 11 handles this differently from the older compilers, and that is the reason it won't work for you.

@shrikrishnaholla
Author

Understood. So if I'm understanding correctly, even if name is accessed directly, since it is an std::string it won't break for older compilers like the one you use, correct?

Apologies for asking what might be basic questions. My C++ is rusty, so I don't want to be creating a regression and getting angry emails 😅

@klosax
Contributor

klosax commented Aug 10, 2023

Yes, the changes won't break anything for older compilers. I will make a PR to change all examples.

@klosax
Contributor

klosax commented Aug 10, 2023

> Would you like me to try and reproduce for other classes of models before making a fix?

If you like, you could test one other example to see whether the same error is there and whether this change fixes it.
