You need to install the Conda dependencies this script uses.
- Conda
- Pip
- OpenMM:
- Conda Packages via below where you can replace ENVIRONMENTNAME with your choice. Then, in your terminal in any directory that you reserve for projects (get to that directory via cd command), type in below. Replace PATHTOPACAKGELISTTXT with the path to the environment.yml file. The new environment will be named 'testenv'.
conda env create -f PATHTOPACAKGELISTTXT
To activate the environment:
conda activate testenv
This command adds executable permissions to the owner, group, and complementary:
cd path_to_scoresolver
cd wham
cd wham
chmod +x wham
To verify the command worked, run:
ls -l wham
If it says
chmod: wham: No such file or directory
Execute the command to compile wham.c
make
Example Output:
-rwxr-xr-x 1 user group 12345678 date wham
---
title: Bind Coefficient Calculation
---
graph LR;
pdb[("test.pdb")]
win["windows"]
mt["metafile.txt & cv data"]
pmf["pmf.txt"]
traj["smd_traj.dcd"]
wl["wham_logs.txt"]
kd["$$K_d$$"]
pdb -- "SMD Pulling Loop" --> traj & win
win -- "Umbrella Sampling" --> mt
mt -- "WHAM ANALYSIS" --> pmf & wl
pmf -- "Binding Coeff. Algorithm" --> kd
win -- "Binding Coeff. Algorithm" --> kd
If the histograms have poor overlap you will need to use more windows and/or reduce the spring constant.
The windows do not need to be linearly spaced and they can have different spring constants.
In all, the main theory is that we are pulling a complex or whatever at two pivotal important atoms. Then, it constructs the curve that describes the total amount of free energy used to pull the protein. If you don't know, free energy is usable energy that can be accessed by the system. Unusable energy at a moment in time would include the energy that is distributed with less of a probability to be gathered to be used as work. There is no concrete of what "usable work" is, but it is described as the most average and suitable approximation of how much energy can be used. In other words, its a statistical quantity.
-
$F$ : Free energy -
$k_B$ : Boltzmann constant -
$T$ : Temperature -
$Z$ : Canonical partition function -
$\xi$ : Collective Variable (CV)$\to$ represents the progress of a reaction into a single value of a state -
$\mathbf{r}$ : Atomic coordinates -
$P(\xi)$ : Probability distribution of the system as a function of CV -
$\beta$ : Inverse temperature ($\beta = 1 / k_B T $ ) -
$U(\mathbf{r})$ : Potential energy of the system -
$U_i^{\text{bias}}$ : Biasing potential for window$i $ -
$k$ : Force constant for the biasing potential -
$\xi_i$ : Target value of the CV in window$i $ -
$P_i^{\text{bias}}(\xi)$ : Biased probability distribution for window$i $ -
$n_i(\xi)$ : Number of samples in bin$\xi$ for window$i $ -
$F_i^{\text{bias}}(\xi)$ : Biased free energy in window$i $ -
$F(\xi)$ : Unbiased free energy (Potential of Mean Force, PMF) -
${\displaystyle \delta _{a}(x)}$ : Dirac delta function
- Reaction Coordinate
Note: different atomic configurations
- Free energy in the canonical ensemble:
In context,
- Dirac Delta Function: (
$\delta(x) = \delta_1(x)$ ):
In the case of this theory, it projects the space of
- Partition Function (discrete):
Encapsulates the total distribution of energy, which can later be used to normalize probability distributions over boltzmann factors.
- Probability Distribution in
$\textbf{r}$
Here higher values of
- Probability Distribution (continous):
Which is the integration over all possible degrees of freedom of a reaction coordinate
In all, its basically multiplying all the possible factors that affect the probability of having state
Theoretically, when sampling without restraints, the most probable configurations will be sampled, but most of these are at the lowest of the potential energy landscape. We need to sample the areas with higher levels of energy, which would be otherwise hard to reach. Thus, we add biasing potential of window
- Biasing Potential
It acts a spring force with strength $k $ that restrains the system simulation to state
So, each quantity discussed previously must have biased version indicated by an apostrophe superscript
- Energy at window
$i$ to unbiased version:
- Probability distribution at window
$i$ to unbiased version:
- Free Energy at window
$i$ to unbiased version:
Combining equations
Theoretically,
- Absolute Binding Free Energy
Where
$\mathrm{pocket} := {\xi:\textrm{ligand at state},\xi,\textrm{is bound to the receptor} }$ $\mathrm{bulk} := {\xi:\textrm{ligand at state},\xi,\textrm{is not interacting with the receptor} }$
So, what we have right now is the
The most important parameters are:
-
$L_i$ and$L_f$ : the starting and ending lengths of the complex you are trying to stretch -
$\textrm{index1}$ and$\textrm{index2}$ : the indexes of the two atoms we want to pivot as endpoints and stretch accordingly. -
$\textrm{fc}_\textrm{pull}$ : the strength of bias -
$\textrm{v}_\textrm{pulling}$ : the speed of exeucting the pulling loop to streth the two pivot atoms (the simulation will thus adjust the total time of the pulling loop based on$L_i$ and$L_f$ as well)
Virtually,
Additionally, since the simulation is purely discretized,
However, the pivot indexes can be a little tricky to determine. But, the best way to start off is by identifying your objective and looking at the starting PDB to get a glipmse at what you're working with.
Generally Pick the strongest parts of the Complex
- For proteins: C-alpha atoms are picked because they're usually the backbone atoms that represent the structure very well, so stretching that wont break the system entirely (they are strong enough to stretch).
- For nucleic acids: Phosphate atoms or specific base atoms (because again, they're the strongest backbones).
- For small molecules or ligands: Select atoms that are involved in key interactions or are structurally significant.
If you are trying to get the binding affinity of a docked complex (which is what I am trying to do in this case,) I would select the center of mass of the ligand, and the key atom in a resiude in the binding pocket of the receptor.
As stored in the
./gh_scoresolver/tests/pdb/ahl_dock_luxr_1.pdb
PDB file, we have a ligand protein (
https://openmm.github.io/openmm-cookbook/latest/notebooks/tutorials/umbrella_sampling.html, https://www.sciencedirect.com/topics/biochemistry-genetics-and-molecular-biology/umba, https://en.wikipedia.org/wiki/Umbrella_sampling, https://www.nature.com/articles/s43588-022-00389-9