Loading your data into ChemicalSystems#
One of the first tasks you’ll likely want to do is loading your various input files.
In openfe
the entire contents of a simulation volume, for example the ligand, protein and water is referred to
as the openfe.ChemicalSystem
.
A free energy difference is defined as being between two such ChemicalSystem
objects.
To make expressing free energy calculations easier,
this ChemicalSystem
is broken down into various Component
objects.
It is these Component
objects that are then transformed, added or removed when performing a free energy calculation.
Note
Once chemical models are loaded into Components they are read only and cannot be modified. This means that any modification/tweaking of the inputs must be done before the Component objects are created. This is done so that any data cannot be accidentally modified, ruining the provenance chain.
As these all behave slightly differently to accomodate their contents,
there are specialised versions of Component to handle the different items in your system.
We will walk through how different items can be loaded,
and then how these are assembled to form ChemicalSystem
objects.
Loading small molecules#
Small molecules, such as ligands, are handled using the {py:class}`openfe.SmallMoleculeComponent` class. These are lightweight wrappers around RDKit Molecules and can be created directly from an RDKit molecule:
from rdkit import Chem
import openfe
m = Chem.MolFromMol2File('myfile.mol2', removeHs=False)
smc = openfe.SmallMoleculeComponent(m, name='')
Warning
Remember to include the removeHs=False
keyword argument so that RDKit does not strip your hydrogens!
As these types of structures are typically stored inside sdf files, there is a from_sdf_file
convenience class method:
import openfe
smc = openfe.SmallMoleculeComponent.from_sdf_file('file.sdf')
Note
The from_sdf_file
method will only read the first molecule in a multi-molecule MolFile.
To load multiple molcules, use RDKit’s Chem.SDMolSupplier
to iterate over the contents,
and create a SmallMoleculeComponent
from each.
Loading proteins#
Proteins are handled using an openfe.ProteinComponent
.
Like SmallMoleculeComponent
, these are based upon RDKit Molecules,
however these are expected to have the MonomerInfo struct present on all atoms.
This struct contains the residue and chain information and is essential to apply many popular force fields.
A “protein” here is considered as the fully modelled entire biological assembly,
i.e. all chains and structural waters and ions etc.
To load a protein, use the openfe.ProteinComponent.from_pdb_file()
or openfe.ProteinComponent.from_pdbx_file()
classmethod
import openfe
p = openfe.ProteinComponent.from_pdb_file('file.pdb')
Defining solvents#
The bulk solvent phase is defined using a openfe.SolventComponent
object.
Unlike the previously detailed Components, this does not have any explicit molecules or coordinates,
but instead represents the way that the overall system will be solvated.
This information is then interpreted inside the Protocol
when solvating the system.
By default, this solvent is water with 0.15 M NaCl salt. All parameters; the positive and negative ion as well as the ion concentration (which must be specified along with the unit) can be freely defined.
import openfe
from openff.units import unit
solv = openfe.SolventComponent(ion_concentation=0.15 * unit.molar)
Assembling into ChemicalSystems#
With individual components defined, we can then proceed to assemble combinations of these into
a description of an entire system, called a openfe.ChemicalSystem
.
The end result of this is a chemical model
which describes the chemical topology (e.g. bonds, formal charges) and atoms’ positions
but does not describe the force field aspects, and therefore any energetic terms.
The input to the ChemicalSystem constructor is a dictionary mapping string labels (e.g. ‘ligand’ or ‘protein’) to individual Components. The nature of these labels must match the labels that a given Protocol expects. For free energy calculations we often want to describe two systems which feature many similar components but differ in one component, which is the subject of the free energy perturbation. For example we could define two ChemicalSystem objects which we could perform a relative binding free energy calculation between as:
from openfe import ChemicalSystem, ProteinComponent, SmallMoleculeComponent, SolventComponent
# define the solvent environment and protein structure, these are common across both systems
sol = SolventComponent()
p = ProteinComponent()
# define the two ligands we are interested in
m1 = SmallMoleculeComponent()
m2 = SmallMoleculeComponent()
# construct two systems, these only differ in the ligand input
cs1 = ChemicalSystem({'ligand': m1, 'solvent': sol, 'protein': p})
cs2 = ChemicalSystem({'ligand': m2, 'solvent': sol, 'protein': p})