envs/ --> conda environments
cdk8_structures/ --> target structures
aligned_fragments.tgz (available for download at 10.5281/zenodo.7023191) --> output data of steps 1-3, input for step 4 onward
scripts/ --> scripts for library generation
output_files.tgz (available for download at 10.5281/zenodo.7023191) --> data obtained at each step, and dependencies
Conda environments with Python 3.6+.
sc-PDB subpockets and fragments
- input structures from the sc-PDB database
- fragmentation, cavity clouds and interactions computed with IChem
For more information on fragmentation, see sc-PDBFrag.
CDK8 pocket
<this_repo>/cdk8_structures/5hbh_cavityALL_p0-p1-p6.mol2
ProCare: https://github.com/kimeguida/ProCare (state of our conda env: envs/procare.yml)
source <path_to_your_conda>
conda activate procare
cd aligned_fragments/
python ../scripts/procare_launcher.py -s <subpocket> -t ../cdk8_structures/5hbh_cavityALL_p0-p1-p6.mol2 --transform --ligandtransform <fragment>
Outputs: aligned subpockets, fragments and procare_scores.tsv
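Later steps read the ProCare results from procare_scores.tsv. As a minimal sketch of inspecting that table, the snippet below ranks its rows with the standard csv module. The column names (subpocket, score), the example values and the assumption that a higher score means a closer match are all hypothetical; check the actual header of procare_scores.tsv before reusing it.

```python
import csv
import io


def top_subpockets(tsv_text, n=5, score_col="score"):
    """Return the n rows with the highest value in score_col.

    Hypothetical layout: tab-separated with a header row. The column
    names used here are assumptions, not ProCare's documented format.
    """
    rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
    # Assumption: a higher score means a more similar subpocket.
    rows.sort(key=lambda row: float(row[score_col]), reverse=True)
    return rows[:n]


# Made-up example table, for illustration only.
example = "subpocket\tscore\nA\t0.41\nB\t0.58\nC\t0.12\n"
for row in top_subpockets(example, n=2):
    print(row["subpocket"], row["score"])
```

With a real file, pass the text of procare_scores.tsv instead of the example string.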
We used the OpenEye Python toolkits (state of our conda env: envs/oepython.yml):
conda deactivate
conda activate oepython
../scripts/convert.py <fragment>.mol2 <fragment>.sdf
<path_to_your_ichem>/IChem ../cdk8_structures/5hbh_protein.mol2 <fragment>.mol2 > <fragment>.ifp
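Each .ifp file holds an IChem interaction fingerprint for one aligned fragment. Fingerprints of this kind are commonly compared with the Tanimoto coefficient (number of shared on-bits divided by the number of on-bits in either fingerprint); the sketch below computes it on 0/1 strings. Reading a .ifp file as a plain bit string is an assumption made for illustration, and this README does not state how the pipeline compares IFPs.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient between two equal-length 0/1 strings."""
    if len(bits_a) != len(bits_b):
        raise ValueError("fingerprints must have the same length")
    on_a = {i for i, bit in enumerate(bits_a) if bit == "1"}
    on_b = {i for i, bit in enumerate(bits_b) if bit == "1"}
    union = on_a | on_b
    if not union:
        return 0.0  # two empty fingerprints: define similarity as 0
    return len(on_a & on_b) / len(union)


print(tanimoto("110100", "100110"))  # 2 shared bits / 4 set bits
```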
To reproduce this step, the required output of steps 1-3 is available for download as aligned_fragments.tgz at 10.5281/zenodo.7023191.
Current directory: aligned_fragments/ (containing the data from steps 1-3)
assignment of CDK8 areas
conda deactivate
conda activate delinker
(state of our conda env: envs/delinker.yml)
python ../scripts/select_fragments_round1.py -f ../output_files/procare_scores.tsv -d . -p ../cdk8_structures/5hbh_protein.mol2 -c ../cdk8_structures/5hbh_cavityALL_p0-p1-p6.mol2
Outputs:
subpocket_p0_gate.list --> GA1
subpocket_p0_hinge.list --> H
subpocket_p0_solv_1.list --> SE2
subpocket_p0_solv_2.list --> SE1
subpocket_p6_alphaC.list --> AC
subpocket_p6_lys52.list --> GA2
available in <this_repo>/output_files/
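When scripting around these files, the correspondence between subpocket lists and CDK8 area labels can be kept in one place. The dictionary below copies the mapping given above; load_areas is a hypothetical helper that assumes one fragment identifier per non-empty line, which may not match the real list format.

```python
from pathlib import Path

# Subpocket list files and the CDK8 area each one corresponds to
# (mapping copied from this README).
AREA_LABELS = {
    "subpocket_p0_gate.list": "GA1",
    "subpocket_p0_hinge.list": "H",
    "subpocket_p0_solv_1.list": "SE2",
    "subpocket_p0_solv_2.list": "SE1",
    "subpocket_p6_alphaC.list": "AC",
    "subpocket_p6_lys52.list": "GA2",
}


def load_areas(directory="."):
    """Map each area label to the entries of its list file.

    Hypothetical helper. Assumption: one fragment identifier per
    non-empty line. Missing list files are skipped.
    """
    areas = {}
    for filename, label in AREA_LABELS.items():
        path = Path(directory) / filename
        if path.exists():
            lines = path.read_text().splitlines()
            areas[label] = [line.strip() for line in lines if line.strip()]
    return areas
```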
python ../scripts/linkable_fragments_round1_job.py --hinge subpocket_p0_hinge.list --gate subpocket_p0_gate.list --solv1 subpocket_p0_solv_1.list --solv2 subpocket_p0_solv_2.list --alphac subpocket_p6_alphaC.list --lys52 subpocket_p6_lys52.list
Outputs:
linkable_fragments_round1_<N>.list, with N in {0, 1, 2, 3, 4, 5, 6}, available in <this_repo>/output_files/
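linkable_fragments_round1_job.py takes the per-area lists and writes the candidates to be linked. Its exact criteria are not described here; the sketch below shows one plausible approach, enumerating cross-area fragment pairs with itertools.product and keeping those whose attachment atoms lie within a distance window. The function name, the coordinates and the 3-8 Å window are hypothetical.

```python
import itertools
import math


def candidate_pairs(frags_a, frags_b, min_dist=3.0, max_dist=8.0):
    """Yield (name_a, name_b, distance) for fragment pairs from two areas.

    frags_a and frags_b map a fragment name to the (x, y, z) coordinates
    of a hypothetical attachment atom; the distance window is illustrative.
    """
    for (name_a, xyz_a), (name_b, xyz_b) in itertools.product(
            frags_a.items(), frags_b.items()):
        dist = math.sqrt(sum((p - q) ** 2 for p, q in zip(xyz_a, xyz_b)))
        if min_dist <= dist <= max_dist:
            yield name_a, name_b, round(dist, 2)


# Made-up coordinates (in Å) for two areas.
hinge = {"fragH1": (0.0, 0.0, 0.0), "fragH2": (20.0, 0.0, 0.0)}
gate = {"fragG1": (4.0, 3.0, 0.0)}
print(list(candidate_pairs(hinge, gate)))
```

Only fragH1/fragG1 (5 Å apart) falls inside the window; fragH2 is about 16 Å from fragG1.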
DeLinker: https://github.com/oxpig/DeLinker
Feed DeLinker with the connectable candidates:
python ../scripts/delinker.py -f linkable_fragments_round1_<N>.list -p <your_path_to_DeLinker>/DeLinker/ > /dev/null
This step was distributed on computer clusters.
Output: generation.smi, renamed and gzipped as generation_<N>.smi.gz, with N in {0, 1, 2, 3, 4, 5, 6}, available in <this_repo>/output_files/
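The gzipped generation files can be streamed without unpacking them, using gzip.open in text mode. In this sketch the line layout (whitespace-separated fields with the generated molecule last) is an assumption about DeLinker's output, not a documented format, and both function names are hypothetical.

```python
import gzip


def generated_smiles(lines):
    """Yield the generated SMILES from DeLinker output lines.

    Assumption: fields are whitespace-separated and the generated
    molecule is the last field; adapt to the actual column layout.
    """
    for line in lines:
        fields = line.split()
        if fields:  # skip blank lines
            yield fields[-1]


def read_generation_file(path):
    """Stream generated SMILES from a gzipped generation_<N>.smi.gz file."""
    with gzip.open(path, "rt") as handle:
        yield from generated_smiles(handle)
```

Keeping the parsing separate from the file handling makes the line logic easy to check on a few in-memory lines.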
Sometimes no linker is generated, and DeLinker may return truncated attempts. Complete and incomplete molecules are separated with:
python ../scripts/get_linker.py --file generation_<N>.smi.gz --fragsdir . --pathdelinker <your_path_to_DeLinker>/DeLinker/
Outputs:
generation_complete.smi
generation_uncomplete.smi
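One simple way to flag truncated attempts is to look for SMILES that are still disconnected ('.') or still carry an unfilled attachment point ('*'). The sketch below applies that rule; it is a hypothetical heuristic shown for illustration, not necessarily the criterion get_linker.py uses.

```python
def split_complete(smiles_list):
    """Split SMILES into (complete, incomplete) lists.

    Hypothetical heuristic: a molecule whose SMILES still contains '.'
    (disconnected parts) or '*' (unfilled attachment point) is treated
    as a truncated attempt.
    """
    complete, incomplete = [], []
    for smi in smiles_list:
        if "." in smi or "*" in smi:
            incomplete.append(smi)
        else:
            complete.append(smi)
    return complete, incomplete


print(split_complete(["c1ccccc1CCNc1ncccc1", "c1ccccc1.c1ncccc1", "c1ccccc1C*"]))
```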
Each SMILES was assigned an ID to keep track of molecule information: Filter protonates the molecules and generates canonical SMILES that differ from RDKit's, so the SMILES themselves cannot be used to match molecules across steps.
python ../scripts/index_generated_molecules.py -i generation_complete.smi -o generation_complete_indexed.smi
Output: generation_complete_indexed.smi
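Indexing amounts to attaching a unique identifier to every SMILES so that molecules can be followed through tools that rewrite the SMILES. A minimal sketch, assuming a "SMILES ID" line layout and a hypothetical mol_000001-style ID format (index_generated_molecules.py may use a different one):

```python
def index_smiles(smiles_list, prefix="mol"):
    """Return 'SMILES ID' lines with a unique, zero-padded ID per molecule.

    The ID format (prefix plus a 6-digit counter) is hypothetical.
    """
    return [f"{smi} {prefix}_{i:06d}" for i, smi in enumerate(smiles_list, start=1)]


print(index_smiles(["CCO", "c1ccccc1"]))
```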
<path_to_openeye>/filter -in generation_complete_indexed.smi -out druglike_molecules.smi -fail druglike_failed_molecules.smi -filter ../output_files/filter_labo_cdk8.txt
Check that other annotations in the file did not affect how Filter processed the SMILES. In our case, we extracted the SMILES and indices to a separate file, molecules.smi.
Output: druglike_molecules.smi
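Because Filter may protonate and re-canonicalize the SMILES, molecules that pass the filter are best matched back to the indexed input by ID rather than by SMILES. A sketch, assuming both files use "SMILES ID" lines with the ID in the second field (the Filter output layout should be checked, as noted above); the function name is hypothetical.

```python
def ids_to_original(indexed_lines, filtered_lines):
    """Map each ID that passed Filter back to its original SMILES.

    Both inputs are 'SMILES ID' lines (assumed layout). Filter may
    rewrite the SMILES, so molecules are matched by ID only.
    """
    original = {}
    for line in indexed_lines:
        fields = line.split()
        if len(fields) >= 2:
            original[fields[1]] = fields[0]
    kept = {}
    for line in filtered_lines:
        fields = line.split()
        if len(fields) >= 2 and fields[1] in original:
            kept[fields[1]] = original[fields[1]]
    return kept


indexed = ["CC(=O)O mol_000001", "CCN mol_000002", "CCCl mol_000003"]
filtered = ["CC(=O)[O-] mol_000001", "CC[NH3+] mol_000002"]
print(ids_to_original(indexed, filtered))
```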
SAscore from https://github.com/rdkit/rdkit/blob/master/Contrib/SA_Score/sascorer.py
python ../scripts/get_sascore.py -i druglike_molecules.smi -o druglike_molecules_sascore.tsv
Output: druglike_molecules_sascore.tsv
RDKit descriptors
python ../scripts/get_druglike_descriptors.py -i druglike_molecules.smi -o druglike_molecules_descriptors.tsv
python ../scripts/get_linker_descriptors.py -i generation_complete_indexed.smi -o generation_linker_descriptors.tsv
Outputs:
druglike_molecules_descriptors.tsv
generation_linker_descriptors.tsv
Clean the generated linkers: remove those that are too flexible or too hydrophobic
python ../scripts/filter_linker.py -i generation_linker_descriptors.tsv -o linker_discarded.tsv
Output: linker_discarded.tsv
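The linker clean-up is a rule-based filter on the linker descriptors. As a sketch only: the keys (id, n_rot for rotatable bonds, logp) and the thresholds below are hypothetical placeholders, not the descriptors or cut-offs used in filter_linker.py. Values are given as strings, as csv.DictReader would return them from a TSV.

```python
def discard_linkers(rows, max_rot=4, max_logp=2.0):
    """Return IDs of linkers considered too flexible or too hydrophobic.

    rows: dicts with hypothetical keys 'id', 'n_rot' (rotatable bonds)
    and 'logp'. Both thresholds are illustrative, not the published ones.
    """
    discarded = []
    for row in rows:
        too_flexible = int(row["n_rot"]) > max_rot
        too_hydrophobic = float(row["logp"]) > max_logp
        if too_flexible or too_hydrophobic:
            discarded.append(row["id"])
    return discarded


rows = [
    {"id": "mol_000001", "n_rot": "2", "logp": "0.5"},
    {"id": "mol_000002", "n_rot": "6", "logp": "0.1"},
    {"id": "mol_000003", "n_rot": "1", "logp": "3.2"},
]
print(discard_linkers(rows))
```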
python ../scripts/library_round1.py --descriptor druglike_molecules_descriptors.tsv --sascore druglike_molecules_sascore.tsv --discarded linker_discarded.tsv -o libr1.txt
Output: libr1.txt
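library_round1.py combines the drug-like descriptors, the SA scores and the discarded linkers. Assuming all three are keyed by the molecule IDs assigned earlier, the selection can be sketched as a set difference plus an SA-score cut-off. SA scores range from about 1 (easy to synthesize) to 10 (hard to synthesize); the cut-off of 4.0 and the function name are hypothetical.

```python
def build_library(descriptor_ids, sascores, discarded, max_sa=4.0):
    """Select molecules for the round-1 library (illustrative sketch).

    Keeps IDs that have descriptors, have an SA score at or below
    max_sa (hypothetical cut-off) and whose linker was not discarded.
    """
    discarded = set(discarded)
    return sorted(
        mol_id for mol_id in descriptor_ids
        if mol_id not in discarded
        and mol_id in sascores
        and sascores[mol_id] <= max_sa
    )


print(build_library(["m1", "m2", "m3", "m4"], {"m1": 2.5, "m2": 3.1, "m3": 5.2}, ["m2"]))
```

Here m2 is dropped for its linker, m3 for its SA score and m4 for lacking a score.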
Example of hit compound 12
python ../scripts/round2_fuse_mols.py --dl druglike_molecules.smi --gen generation_complete_indexed.smi --origin ../output_files/frag_origin.tsv --discarded linker_discarded.tsv --procare ../output_files/procare_scores.tsv
Outputs:
hit12_round2_mols.tsv
hit12_round2_sascore_pass.tsv
RDKit descriptors
python ../scripts/get_round2_descriptors.py -i hit12_round2_mols.tsv -o hit12_round2_mols_descriptors.tsv
Output: hit12_round2_mols_descriptors.tsv
candidates for synthesis
python ../scripts/library_round2.py -i hit12_round2_mols_descriptors.tsv --sascore hit12_round2_sascore_pass.tsv -o libr2.txt
Output: libr2.txt