*** Error in `python': free(): invalid pointer: 0x00002aaaffba8c40 *** #56

Open
KirillShmilovich opened this issue Jan 18, 2024 · 3 comments

KirillShmilovich commented Jan 18, 2024

I'm getting a strange error, *** Error in `python': free(): invalid pointer: 0x00002aaaffba8c40 ***, when I call a function that runs GNINA from a separate file.

I have a file gnina_feat.py with the following code:

import torch
from argparse import Namespace
import tempfile
from tqdm.autonotebook import tqdm

from gninatorch import setup, gnina, dataloaders

class VoxelLoader(dataloaders.GriddedExamplesLoader):
    def __len__(self):
        return self.num_batches

grid_args = Namespace(**{'data_root':'',
                    'batch_size':20,
                    'ligmolcache':"",
                    'recmolcache':"",
                    'cache_structures':False,
                    'dimension':23.5,
                    'resolution':0.5,
                    'balanced':False,
                    'shuffle':False,
                    'stratify_receptor':False,
                    'stratify_pos':False,
                    'iteration_scheme':'small',
                    'stratify_max':0,
                    'stratify_min':0,
                    'stratify_step':0})


def get_gnina_feats_from_df(df, device='cuda'):
    temp_name = next(tempfile._get_candidate_names())
    types_file = f'/local/tmp/{temp_name}.types'

    with open(types_file, "w") as text_file:
        for i, (_, row) in enumerate(df.iterrows()):
            file_str = f"{i} {row['docked_pocket_path']} {row['docked_ligand_path']}"
            print(file_str, file=text_file)
    
    provider = setup.setup_example_provider(
        types_file, grid_args, training=False
    )
    grid_maker = setup.setup_grid_maker(grid_args)

    all_loader = VoxelLoader(
        example_provider=provider,
        grid_maker=grid_maker,
        random_translation=0,
        random_rotation=False,
        device=device,
    )

    model = gnina.setup_gnina_model('dense_ensemble')[0]
    model.to(torch.device(device))

    feats = list()
    with torch.inference_mode():
        for batch in tqdm(all_loader, desc='Predicting GNINA features'):
            feat = list()
            for mod in model.models:
                feat.append(mod.features(batch[0]).squeeze()[None])
            feats.append(torch.cat(feat).mean(0))
        feats = torch.cat(feats).cpu()
    return [f for f in feats]

Then in a separate file I call:

feats = get_gnina_feats_from_df(df)

Which yields the error:

*** Error in `python': free(): invalid pointer: 0x00002aaaffba8c40 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x81329)[0x2aaaab675329]
/home/kirillshmilovich/.local/lib/python3.10/site-packages/molgrid/molgrid.so(_ZNSt23_Sp_counted_ptr_inplaceIN10libmolgrid15ExampleProviderESaIS1_ELN9__gnu_cxx12_Lock_policyE2EE10_M_disposeEv+0x539)[0x2aab88d32c09]
/home/kirillshmilovich/.local/lib/python3.10/site-packages/molgrid/molgrid.so(_ZN5boost6python7objects14pointer_holderISt10shared_ptrIN10libmolgrid15ExampleProviderEES5_ED1Ev+0x5a)[0x2aab88d00bba]
/home/kirillshmilovich/.local/lib/python3.10/site-packages/molgrid/molgrid.so(+0x6d8c9f)[0x2aab88f00c9f]
python[0x50ec0e]
python[0x4ec2c6]
python[0x50eadc]
python[0x4f8478]
python(_PyFunction_Vectorcall+0xa5)[0x4ff2a5]
python(_PyEval_EvalFrameDefault+0x31f)[0x4ef96f]
python(_PyFunction_Vectorcall+0x6f)[0x4ff26f]
python(_PyEval_EvalFrameDefault+0x731)[0x4efd81]
python[0x50a69e]
python(PyObject_Call+0xb8)[0x50b048]
python(_PyEval_EvalFrameDefault+0x2a50)[0x4f20a0]
python(_PyFunction_Vectorcall+0x6f)[0x4ff26f]
python(_PyEval_EvalFrameDefault+0x13b2)[0x4f0a02]
python(_PyFunction_Vectorcall+0x6f)[0x4ff26f]
python(_PyEval_EvalFrameDefault+0x4b2c)[0x4f417c]
python[0x50a69e]
python(_PyEval_EvalFrameDefault+0x13b2)[0x4f0a02]
python[0x50a7f6]
python(_PyEval_EvalFrameDefault+0x2a50)[0x4f20a0]
python(_PyFunction_Vectorcall+0x6f)[0x4ff26f]
python(_PyEval_EvalFrameDefault+0x4b2c)[0x4f417c]
python[0x50a69e]
python(_PyEval_EvalFrameDefault+0x13b2)[0x4f0a02]
python(_PyFunction_Vectorcall+0x6f)[0x4ff26f]
python(_PyEval_EvalFrameDefault+0x13b2)[0x4f0a02]
python(_PyFunction_Vectorcall+0x6f)[0x4ff26f]
python(_PyEval_EvalFrameDefault+0x31f)[0x4ef96f]
python(_PyFunction_Vectorcall+0x6f)[0x4ff26f]
python(_PyEval_EvalFrameDefault+0x31f)[0x4ef96f]
python(_PyFunction_Vectorcall+0x6f)[0x4ff26f]
python(_PyEval_EvalFrameDefault+0x13b2)[0x4f0a02]
python[0x50a69e]
python(_PyEval_EvalFrameDefault+0x13b2)[0x4f0a02]
python(_PyFunction_Vectorcall+0x6f)[0x4ff26f]
python(_PyEval_EvalFrameDefault+0x31f)[0x4ef96f]
python(_PyFunction_Vectorcall+0x6f)[0x4ff26f]
python(_PyEval_EvalFrameDefault+0x31f)[0x4ef96f]
python(_PyFunction_Vectorcall+0x6f)[0x4ff26f]
python(_PyEval_EvalFrameDefault+0x13b2)[0x4f0a02]
python(_PyFunction_Vectorcall+0x6f)[0x4ff26f]
python(_PyEval_EvalFrameDefault+0x13b2)[0x4f0a02]
python(_PyFunction_Vectorcall+0x6f)[0x4ff26f]
python(_PyEval_EvalFrameDefault+0x31f)[0x4ef96f]
python[0x594fd2]
python(PyEval_EvalCode+0x87)[0x594f17]
python[0x5c7667]
python[0x5c24c0]
python[0x45ba37]
python(_PyRun_SimpleFileObject+0x19f)[0x5bc9df]
python(_PyRun_AnyFileObject+0x43)[0x5bc7e3]
python(Py_RunMain+0x38d)[0x5b95fd]
python(Py_BytesMain+0x39)[0x588099]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x2aaaab616555]
python[0x587f4e]
======= Memory map: ========
00400000-0041f000 r--p 00000000 00:3d 563664911                          /home/kirillshmilovich/.conda/envs/binding_affinity/bin/python3.10
0041f000-0061f000 r-xp 0001f000 00:3d 563664911                          /home/kirillshmilovich/.conda/envs/binding_affinity/bin/python3.10
0061f000-00721000 r--p 0021f000 00:3d 563664911                          /home/kirillshmilovich/.conda/envs/binding_affinity/bin/python3.10
00722000-00723000 r--p 00321000 00:3d 563664911                          /home/kirillshmilovich/.conda/envs/binding_affinity/bin/python3.10
00723000-00755000 rw-p 00322000 00:3d 563664911                          /home/kirillshmilovich/.conda/envs/binding_affinity/bin/python3.10
00755000-1dff1000 rw-p 00000000 00:00 0                                  [heap]
200000000-200200000 ---p 00000000 00:00 0 
200200000-200400000 rw-s 00000000 00:05 48522                            /dev/nvidiactl
200400000-202400000 rw-s 00000000 00:05 48522                            /dev/nvidiactl
202400000-205400000 rw-s 00000000 00:05 48522                            /dev/nvidiactl
205400000-206000000 ---p 00000000 00:00 0 
206000000-206200000 rw-s 00000000 00:05 48522                            /dev/nvidiactl
206200000-206400000 rw-s 00000000 00:05 48522                            /dev/nvidiactl
206400000-206600000 rw-s 206400000 00:05 53296                           /dev/nvidia-uvm
206600000-206800000 rw-s 00000000 00:05 48522                            /dev/nvidiactl
206800000-206a00000 ---p 00000000 00:00 0 
206a00000-206c00000 rw-s 00000000 00:05 48522                            /dev/nvidiactl
206c00000-300200000 ---p 00000000 00:00 0 
10000000000-10104000000 ---p 00000000 00:00 0 
3c52e000000-3c56e000000 rw-p 00000000 00:00 0 
2aaaaaaab000-2aaaaaacd000 r-xp 00000000 09:00 203039084                  /usr/lib64/ld-2.17.so
2aaaaaacd000-2aaaaaacf000 r-xp 00000000 00:00 0                          [vdso]
2aaaaaacf000-2aaaaaad0000 rw-p 00000000 00:00 0 
2aaaaaad0000-2aaaaaad7000 r--s 00000000 09:00 140121628                  /usr/lib64/gconv/gconv-modules.cache
2aaaaaad7000-2aaaaaadc000 r--p 00000000 00:3d 481877408                  /home/kirillshmilovich/.conda/envs/binding_affinity/lib/python3.10/lib-dynload/_socket.cpython-310-x86_64-linux-gnu.so
2aaaaaadc000-2aaaaaae5000 r-xp 00005000 00:3d 481877408                  /home/kirillshmilovich/.conda/envs/binding_affinity/lib/python3.10/lib-dynload/_socket.cpython-310-x86_64-linux-gnu.so
2aaaaaae5000-2aaaaaaed000 r--p 0000e000 00:3d 481877408                  /home/kirillshmilovich/.conda/envs/binding_affinity/lib/python3.10/lib-dynload/_socket.cpython-310-x86_64-linux-gnu.so
2aaaaaaed000-2aaaaaaee000 ---p 00016000 00:3d 481877408                  /home/kirillshmilovich/.conda/envs/binding_affinity/lib/python3.10/lib-dynload/_socket.cpython-310-x86_64-linux-gnu.so
2aaaaaaee000-2aaaaaaef000 r--p 00016000 00:3d 481877408                  /home/kirillshmilovich/.conda/envs/binding_affinity/lib/python3.10/lib-dynload/_socket.cpython-310-x86_64-linux-gnu.so
2aaaaaaef000-2aaaaaaf0000 rw-p 00017000 00:3d 481877408                  /home/kirillshmilovich/.conda/envs/binding_affinity/lib/python3.10/lib-dynload/_socket.cpython-310-x86_64-linux-gnu.so
2aaaaaaf0000-2aaaaaaf1000 r--p 00000000 00:3d 481877383                  /home/kirillshmilovich/.conda/envs/binding_affinity/lib/python3.10/lib-dynload/_heapq.cpython-310-x86_64-linux-gnu.so
2aaaaaaf1000-2aaaaaaf4000 r-xp 00001000 00:3d 481877383                  /home/kirillshmilovich/.conda/envs/binding_affinity/lib/python3.10/lib-dynload/_heapq.cpython-310-x86_64-linux-gnu.so
Aborted

Strangely, if I place a debug statement within the get_gnina_feats_from_df function, e.g. at the end of the original file,

    with torch.inference_mode():
        for batch in tqdm(all_loader, desc='Predicting GNINA features'):
            feat = list()
            for mod in model.models:
                feat.append(mod.features(batch[0]).squeeze()[None])
            feats.append(torch.cat(feat).mean(0))
        feats = torch.cat(feats).cpu()
    import ipdb; ipdb.set_trace()
    return [f for f in feats]

and then continue after entering that debug statement (i.e., pressing c after entering the debugger), everything appears to run fine without error. However, the error appears again once I Ctrl-C out of the debugger.

Do you have any idea what the origin of this error might be and how I might be able to fix it and get this to work without using the debugger?
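
One possible reading of the backtrace: the two topmost molgrid.so frames demangle to std::_Sp_counted_ptr_inplace<libmolgrid::ExampleProvider, ...>::_M_dispose() and to the destructor of boost::python::objects::pointer_holder<std::shared_ptr<libmolgrid::ExampleProvider>, ...>, so the abort seems to happen while the ExampleProvider is being destroyed, presumably when get_gnina_feats_from_df returns and its locals go out of scope. That would also fit the debugger observation, since a paused frame keeps provider alive. The sketch below is one way to check this hypothesis. release_in_order is a hypothetical helper, not part of gninatorch or molgrid, and it assumes the objects are stored in a dict rather than in local variables.

```python
import gc
import sys


def release_in_order(holder, names, log=sys.stderr):
    """Drop the objects stored in `holder` one name at a time.

    A line is written (and flushed) before each release, so if the process
    aborts inside a destructor, the last line printed names the suspect.
    This only triggers a destructor if `holder` owns the last reference.
    """
    released = []
    for name in names:
        print(f"releasing {name}", file=log, flush=True)
        holder.pop(name, None)   # drop the dict's reference to the object
        gc.collect()             # also clear reference cycles, if any
        released.append(name)
    return released


if __name__ == "__main__":
    # Stand-in objects; in the reported function these would be the loader,
    # the grid maker and the example provider.
    objs = {"loader": object(), "grid_maker": object(), "provider": object()}
    release_in_order(objs, ["loader", "grid_maker", "provider"])
```

In the reported function, the loader would be released first because it holds its own reference to the provider. If the abort follows "releasing provider", that points at the provider's destructor rather than at the feature extraction itself.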

@RMeli
Owner

RMeli commented Jan 19, 2024

I have never seen this error before (though admittedly I haven't used gnina-torch much lately). From the backtrace, this looks like an issue with libmolgrid. Which version of libmolgrid are you using, and how did you install it?
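
A quick way to answer the version question is to ask the active environment directly. This is a minimal sketch using the standard library's importlib.metadata. The distribution names molgrid, gninatorch and torch are assumptions about how the packages are registered in the environment, so adjust them if pip list shows different names.

```python
from importlib import metadata


def installed_versions(names=("molgrid", "gninatorch", "torch")):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed under this name.
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Running this with the same interpreter that crashes matters here, because the backtrace loads molgrid.so from ~/.local while python itself comes from a conda environment.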

@KirillShmilovich
Author

Looks like version 0.5.3. I had just installed gnina-torch with pip install -e . Would you recommend installing from source?

@RMeli
Owner

RMeli commented Jan 19, 2024

Can you try molgrid 0.5.2? I think that's the last version I used locally.

I think it might be worth building libmolgrid from source as a debug build (-DCMAKE_BUILD_TYPE=Debug or -DCMAKE_BUILD_TYPE=RelWithDebInfo) to get a bit more information.
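
As a cheaper complement to a debug build, Python's standard faulthandler module can dump the Python-level traceback when the process receives SIGABRT, which is the signal glibc raises for free(): invalid pointer. That shows which Python line was executing when the native destructor ran. A minimal sketch follows. enable_crash_traceback is a hypothetical wrapper name, and it gives no extra detail about the C++ frames themselves.

```python
import faulthandler
import sys


def enable_crash_traceback(stream=sys.stderr):
    """Dump Python tracebacks of all threads on fatal signals such as SIGABRT.

    `stream` must be a real file with a file descriptor and must stay open
    for as long as the handler is enabled.
    """
    faulthandler.enable(file=stream, all_threads=True)
    return faulthandler.is_enabled()


# Typical use at the very top of the calling script, before the heavy imports:
#     enable_crash_traceback()
#     feats = get_gnina_feats_from_df(df)
```

The same handler can be switched on without editing any code by running the script as python -X faulthandler, or by setting the PYTHONFAULTHANDLER environment variable.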

2 participants