add slow5 reading #5

Psy-Fer · 2022-10-27T06:28:26Z

Hello,

I added a quick, hacky way to read slow5/blow5 files in radian: the existing fast5 directory argument now also accepts a .blow5 file.
I didn't want to change the interface or arguments, as I think that should be left to the authors, but I have shown how easy it is to implement and added some comments for clarity.
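
For readers who want the general shape of the change, here is a minimal sketch of dispatching on the input path. It is not the code in this PR: `is_slow5_path`, `iter_slow5_signals`, `iter_signals`, and the `iter_fast5_signals` callback are hypothetical names, and the sketch assumes pyslow5's `Open(...)` and `seq_reads()` calls, which yield one dict per read with `read_id` and `signal` keys.

```python
from pathlib import Path

# File suffixes treated as SLOW5 input; anything else is assumed to be a fast5 directory.
SLOW5_SUFFIXES = {".slow5", ".blow5"}


def is_slow5_path(path):
    """Return True when the path names a SLOW5/BLOW5 file."""
    return Path(path).suffix.lower() in SLOW5_SUFFIXES


def iter_slow5_signals(path):
    """Yield (read_id, signal) pairs from a SLOW5/BLOW5 file."""
    import pyslow5  # imported lazily so fast5-only runs do not need it

    s5 = pyslow5.Open(str(path), "r")
    for read in s5.seq_reads():
        yield read["read_id"], read["signal"]


def iter_signals(path, iter_fast5_signals):
    """Pick a reader from the path; iter_fast5_signals is a hypothetical
    stand-in for whatever fast5 reading the tool already has."""
    if is_slow5_path(path):
        return iter_slow5_signals(path)
    return iter_fast5_signals(path)
```

Keeping the extension check in one small function means the command-line interface can stay unchanged, which is the constraint described above.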

I timed myself doing this as well.

~8min for:

make fork
load in VS code
read code and find fast5 reading
make code changes to add pyslow5
add some comments

then ~52min for:

create environment
install deps
find errors and switch python
try again
try test data
fix errors
try slow5 conversion of fast5 test data
fix errors
try again
test slow5 reading
success!

So about 1h.

Thanks!

P.S.
I've added my notes below. I would suggest stating in the install instructions that Python 3.8 is needed for tensorflow 2.4.4, and recommending a virtual environment.
Also, the tar command puts its output in the wrong folder (I think); I had to move the extracted file into the models folder.

James' Notes:

setting up slow5 hack using fast5dir for *.blow5 file

time taken: 8min

trying to install on laptop

CUDA Version: 11.7 might work? server is 11.1, so will have to see

no python version given. Trying my default of 3.10

no instruction of creating venv, but doing that first

python3 -m venv venv
source venv/bin/activate
export PYSLOW5_ZSTD=1
pip install --upgrade pip
pip install -r requirements.txt

error:

ERROR: Could not find a version that satisfies the requirement tensorflow~=2.4.4 (from versions: 2.8.0rc0, 2.8.0rc1, 2.8.0, 2.8.1, 2.8.2, 2.8.3, 2.9.0rc0, 2.9.0rc1, 2.9.0rc2, 2.9.0, 2.9.1, 2.9.2, 2.10.0rc0, 2.10.0rc1, 2.10.0rc2, 2.10.0rc3, 2.10.0, 2.11.0rc0, 2.11.0rc1)
ERROR: No matching distribution found for tensorflow~=2.4.4

looks like the tensorflow 2.4 wheels on PyPI only go up to python3.8, so python3.10 can't install them
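
A small guard like the following could catch this before `pip` runs. It is only a sketch: the function name is hypothetical, and the supported range (CPython 3.6 to 3.8 for TensorFlow 2.4.x) is stated as an assumption in the code.

```python
import sys

# Assumption: TensorFlow 2.4.x wheels on PyPI target CPython 3.6 through 3.8.
TF24_MIN = (3, 6)
TF24_MAX = (3, 8)


def supports_tensorflow_24(version=None):
    """Return True if the given (major, minor, ...) version can install TF 2.4.x.

    Defaults to the running interpreter when no version is passed.
    """
    major_minor = tuple((version or sys.version_info)[:2])
    return TF24_MIN <= major_minor <= TF24_MAX


def describe(version=None):
    """Return a short human-readable verdict for the given version."""
    major, minor = tuple((version or sys.version_info)[:2])
    verdict = "ok" if supports_tensorflow_24((major, minor)) else "unsupported"
    return f"python{major}.{minor}: {verdict} for tensorflow~=2.4.4"
```

For example, `describe((3, 10))` reports the default interpreter used above as unsupported, while `describe((3, 8))` reports it as ok.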

so using deadsnakes, installing python3.8

sudo apt install python3.8 python3.8-dev python3.8-venv

deactivate
rm -rf ./venv

python3.8 -m venv venv
source venv/bin/activate
export PYSLOW5_ZSTD=1
pip install --upgrade pip
pip install -r requirements.txt

okay that worked! (7min so far)

tar -xvzf radian/models/rnamodel_12mer_pc.tar.gz

testing on test data

cd radian
mkdir out_dir
python3 basecall.py data out_dir

error:

Traceback (most recent call last):
  File "./basecall.py", line 8, in <module>
    import pyslow5
  File "python/pyslow5.pyx", line 1, in init pyslow5
ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 96 from C header, got 80 from PyObject

maybe something to do with numpy

removing and adding again

that doesn't work. The problem is that I need to purge the cached pyslow5 build to force it to build again

pip cache purge
pip install pyslow5

now that works

trying again

python3 basecall.py data out_dir

error:
FileNotFoundError: [Errno 2] No such file or directory: 'models/rnamodel_12mer_pc.json'

mv rnamodel_12mer_pc.json radian/models/
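
tar extracts relative to the current working directory, which is why the JSON ended up outside the models folder. One way to avoid the manual move is to extract next to the archive. The helper below is a sketch with a hypothetical name, using only the standard library, and is meant for trusted archives only:

```python
import tarfile
from pathlib import Path


def extract_next_to_archive(archive_path):
    """Extract a .tar.gz into the directory that contains the archive.

    Returns the paths of the extracted members. Only use this on archives
    you trust, since extractall() writes whatever paths the archive names.
    """
    archive = Path(archive_path)
    dest = archive.parent
    with tarfile.open(archive, "r:gz") as tar:
        names = tar.getnames()
        tar.extractall(path=dest)
    return [dest / name for name in names]
```

Calling it on `radian/models/rnamodel_12mer_pc.tar.gz` would place the extracted members under `radian/models/` regardless of the directory the script is run from.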

it can't use my GPU? The CUDA libraries aren't found, so it falls back to CPU.

(venv) jamfer@garvan-work:~/Dropbox/Bioinformatics/tools/repos/radian/radian$ python3 basecall.py data out_dir
2022-10-27 15:37:50.938231: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2022-10-27 15:37:50.938255: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2022-10-27 15:38:07.925487: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2022-10-27 15:38:07.926116: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2022-10-27 15:38:07.935911: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-10-27 15:38:07.936078: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 3050 Ti Laptop GPU computeCapability: 8.6
coreClock: 1.223GHz coreCount: 20 deviceMemorySize: 3.82GiB deviceMemoryBandwidth: 163.94GiB/s
2022-10-27 15:38:07.936166: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2022-10-27 15:38:07.936218: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcublas.so.11'; dlerror: libcublas.so.11: cannot open shared object file: No such file or directory
2022-10-27 15:38:07.936253: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcublasLt.so.11'; dlerror: libcublasLt.so.11: cannot open shared object file: No such file or directory
2022-10-27 15:38:07.936287: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcufft.so.10'; dlerror: libcufft.so.10: cannot open shared object file: No such file or directory
2022-10-27 15:38:07.936320: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcurand.so.10'; dlerror: libcurand.so.10: cannot open shared object file: No such file or directory
2022-10-27 15:38:07.936354: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcusolver.so.10'; dlerror: libcusolver.so.10: cannot open shared object file: No such file or directory
2022-10-27 15:38:07.936385: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcusparse.so.11'; dlerror: libcusparse.so.11: cannot open shared object file: No such file or directory
2022-10-27 15:38:07.936434: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory
2022-10-27 15:38:07.936440: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1757] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2022-10-27 15:38:07.936691: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-10-27 15:38:07.937207: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2022-10-27 15:38:07.937223: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-10-27 15:38:07.937226: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267]      
2022-10-27 15:38:08.320243: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2022-10-27 15:38:08.320551: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 2304000000 Hz
Basecalled read 00256416-5423-47a9-ad91-54a87a6be5e5 in 6.25 sec.
Basecalled read 018097e8-babe-4525-8426-e7cfe568219c in 1.59 sec.
Basecalled read 022542bc-00b3-4d6a-9226-9f5da14af8ea in 4.53 sec.
Basecalled read 04295bc9-d0af-4a85-a59f-fff2ef52ece6 in 5.83 sec.
Basecalled read 049f55ce-f95e-4712-95a7-44a5709134f8 in 3.93 sec.

anyway, works for CPU on a few reads. That's enough to test.
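
To see at a glance which CUDA libraries TensorFlow could not find, the warnings in a log like the one above can be summarised with a short script. This is a sketch with a hypothetical function name; it relies only on the "Could not load dynamic library '...'" wording that appears in the log.

```python
import re

# Matches the library name inside TensorFlow's dso_loader warning.
MISSING_LIB = re.compile(r"Could not load dynamic library '([^']+)'")


def missing_gpu_libraries(log_text):
    """Return the library names TensorFlow failed to load, deduplicated,
    in the order they first appear in the log."""
    seen = []
    for name in MISSING_LIB.findall(log_text):
        if name not in seen:
            seen.append(name)
    return seen
```

Run over the log above, this would list `libcudart.so.11.0` once even though it is reported twice, followed by the other missing libraries in order.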

So now testing slow5... let's convert the fast5 files (we are at 30min now)

cd data
mkdir fast5
cp *.fast5 fast5/
slow5tools f2s -o reads.blow5 fast5/
[list_all_items] Looking for '*.fast5' files in fast5/
[f2s_main] 1 fast5 files found - took 0.000s
[f2s_main] Just before forking, peak RAM = 0.000 GB
[f2s_iop] 1 proceses will be used.
[fast5_group_itr::ERROR] Bad fast5: A primary attribute is missing in the fast5//reads.fast5.
[read_fast5::ERROR] Bad fast5: Could not iterate over the read groups in the fast5 file fast5//reads.fast5.
[f2s_child_worker::ERROR] Could not read contents of the fast5 file 'fast5//reads.fast5'.

lol! okay that's weird...

opens up file in HDFView

oh...there are 2 signal entries. Let's delete the old_signal entry...

try again with slow5tools....and that works now with slow5tools.

Now let's try slow5 as the input.

python3 basecall.py ./data/reads.blow5 out_dir

yep, that works!

(venv) jamfer@garvan-work:~/Dropbox/Bioinformatics/tools/repos/radian/radian$ python3 basecall.py ./data/reads.blow5 out_dir
2022-10-27 17:12:43.659243: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2022-10-27 17:12:43.659266: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2022-10-27 17:12:59.056701: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2022-10-27 17:12:59.057282: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2022-10-27 17:12:59.066022: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-10-27 17:12:59.066126: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 3050 Ti Laptop GPU computeCapability: 8.6
coreClock: 1.223GHz coreCount: 20 deviceMemorySize: 3.82GiB deviceMemoryBandwidth: 163.94GiB/s
2022-10-27 17:12:59.066177: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2022-10-27 17:12:59.066204: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcublas.so.11'; dlerror: libcublas.so.11: cannot open shared object file: No such file or directory
2022-10-27 17:12:59.066226: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcublasLt.so.11'; dlerror: libcublasLt.so.11: cannot open shared object file: No such file or directory
2022-10-27 17:12:59.066249: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcufft.so.10'; dlerror: libcufft.so.10: cannot open shared object file: No such file or directory
2022-10-27 17:12:59.066271: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcurand.so.10'; dlerror: libcurand.so.10: cannot open shared object file: No such file or directory
2022-10-27 17:12:59.066291: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcusolver.so.10'; dlerror: libcusolver.so.10: cannot open shared object file: No such file or directory
2022-10-27 17:12:59.066312: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcusparse.so.11'; dlerror: libcusparse.so.11: cannot open shared object file: No such file or directory
2022-10-27 17:12:59.066333: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory
2022-10-27 17:12:59.066337: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1757] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2022-10-27 17:12:59.066507: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-10-27 17:12:59.067950: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2022-10-27 17:12:59.067967: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-10-27 17:12:59.067970: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267]      
2022-10-27 17:12:59.395246: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2022-10-27 17:12:59.395513: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 2304000000 Hz
Basecalled read 00256416-5423-47a9-ad91-54a87a6be5e5 in 6.49 sec.
Basecalled read 018097e8-babe-4525-8426-e7cfe568219c in 1.75 sec.
Basecalled read 022542bc-00b3-4d6a-9226-9f5da14af8ea in 4.58 sec.
Basecalled read 04295bc9-d0af-4a85-a59f-fff2ef52ece6 in 6.02 sec.
Basecalled read 049f55ce-f95e-4712-95a7-44a5709134f8 in 4.13 sec.

Winning!

Time: 52min.

Total time to implement and test: ~1h

add slow5 reading

a1d1e9a
