Loading Molecules and Chemical Systems#

Loading Molecule Data into Components#

One of the first tasks you’ll likely want to do is loading your various input files. In openfe the entire contents of a simulation volume, for example the ligand, protein and water is referred to as the ChemicalSystem.

A free energy difference is defined as being between two such ChemicalSystem objects. To make expressing free energy calculations easier,this ChemicalSystem is broken down into various Component objects. It is these Component objects that are then transformed, added or removed when performing a free energy calculation.

Once a chemical model is loaded into a Component it is read only and cannot be modified. This means that any modification/tweaking of the inputs must be done before any Component objects are created. This is done so that any data cannot be accidentally modified, ruining the provenance chain.

As these all behave slightly differently to accomodate their contents, there are specialised versions of Component to handle the different items in your system. We will walk through how different items can be loaded, and then how these are assembled to form ChemicalSystem objects.

Loading small molecules#

Small molecules, such as ligands, are handled using the SmallMoleculeComponent class. These are lightweight wrappers around RDKit Molecules and can be created directly from an RDKit molecule:

[1]:
from rdkit import Chem
import openfe

m = Chem.MolFromMol2File('assets/benzene.mol2', removeHs=False)

smc = openfe.SmallMoleculeComponent(m, name='benzene')

Remember to include the removeHs=False keyword argument so that RDKit does not strip your hydrogens!

As these types of structures are typically stored inside sdf files, there is a from_sdf_file convenience class method:

[2]:
import openfe

smc = openfe.SmallMoleculeComponent.from_sdf_file('assets/benzene.sdf')

The from_sdf_file method will only read the first molecule in a multi-molecule file.

To load multiple molcules, use RDKit’s Chem.SDMolSupplier to iterate over the contents, and create a SmallMoleculeComponent from each.

[3]:
from rdkit import Chem
import openfe

molecules = [
    openfe.SmallMoleculeComponent(mol)
    for mol in Chem.SDMolSupplier(
        "assets/somebenzenes.sdf",
        removeHs=False,
    )
]

Loading proteins#

Proteins are handled using a ProteinComponent. Like SmallMoleculeComponent, these are based upon RDKit Molecules; however, they are expected to have the MonomerInfo struct present on all atoms. This struct contains the residue and chain information and is essential to apply many popular force fields. A “protein” here is considered as the fully modelled entire biological assembly, including all chains, structural waters, ions, and so forth.

To load a protein, use the ProteinComponent.from_pdb_file or ProteinComponent.from_pdbx_file class methods.

[4]:
import openfe

p = openfe.ProteinComponent.from_pdb_file('assets/t4_lysozyme.pdb')

Defining solvents#

The bulk solvent phase is defined using a SolventComponent object. Unlike the previously detailed Components, this does not have any explicit molecules or coordinates, but instead represents the way that the overall system will be solvated. This information is then interpreted inside the Protocol when solvating the system.

By default, this solvent is water with 0.15 M NaCl salt. All parameters; the positive and negative ion as well as the ion concentration (which must be specified along with the unit) can be freely defined.

[5]:
import openfe
from openff.units import unit

solv = openfe.SolventComponent(ion_concentration=0.15 * unit.molar)

Assembling into ChemicalSystems#

With individual components defined, we can then proceed to assemble combinations of these into a description of an entire system with the ChemicalSystem class. The end result of this is a chemical model which describes the chemical topology (e.g. bonds, formal charges) and atom positions, but does not describe the force field parameters or atom types or any other energetic terms.

The input to the ChemicalSystem constructor is a dictionary mapping string labels (e.g. ‘ligand’ or ‘protein’) to individual Component objects. The nature of these labels must match the labels that a given Protocol expects. For free energy calculations, we often want to describe two systems which feature many similar components but differ in one component, which is the subject of the free energy perturbation. For example we could define two ChemicalSystem objects which we could perform a relative binding free energy calculation between as:

[6]:
from openfe import ChemicalSystem, ProteinComponent, SmallMoleculeComponent, SolventComponent

# Define the solvent environment and protein structure, these are common across both systems
sol = SolventComponent(ion_concentration=0.15 * unit.molar)
p = ProteinComponent.from_pdb_file('assets/t4_lysozyme.pdb')

# Specify the dictionary of shared components
shared_components = {'solvent': sol, 'protein': p}

# Define the two ligands we are interested in
m1 = SmallMoleculeComponent.from_sdf_file('assets/benzene.sdf')
m2 = SmallMoleculeComponent.from_sdf_file('assets/toluene.sdf')

# create the systems
cs1 = ChemicalSystem({'ligand': m1, **shared_components})
cs2 = ChemicalSystem({'ligand': m2, **shared_components})