8ed0ecf165cf42e0834cc8f611e3b12a

OpenFE Showcase: Relative Binding Free Energies in the TYK2 System#

Intro#

Welcome to the Open Free Energy toolkit!

The OpenFE toolkit provides open-source frameworks for calculating alchemical free energies. This notebook showcases the methods that are available in OpenFE and their usage.

Throughout this showcase, we will introduce different interchangeable components that users can choose from during the setup of free energy calculations. OpenFE allows you to mix and match different components, such as:

Atom mappers
Scorers (for atom mappings)
Ligand networks

This showcase currently focuses on relative binding free energy (RBFE) calculations. However, OpenFE also provides protocols for running absolute hydration free energy calculations and Molecular Dynamics (MD) simulations. In the future, other methods will become available, such as absolute binding free energy calculations and RBFE calculations using a Separated Topologies approach.

If you are planning your own calculations, please also check out our tutorials which will walk you step-by-step through setup, execution and analysis of different protocols.

Outline#

Setup for Google Colab
Overview
Setup 2.1. Loading Ligands and Defining Ligand Atom Mappings 2.2. Creating a ligand network 2.3. Defining ChemicalSystems 2.4. Defining the RBFE simulation settings and protocol
Running a Relative Ligand Binding Free Energy Calculation 3.1. Using the Python API 3.2. Using the CLI
Analysis
Relative Free Energies with the OpenFE CLI
Useful References for Getting Started

1. Overview#

In this example we show how to set up a network of transformations using the OpenFE toolkit for small chemical modifications of ligands binding to tyrosine kinase 2 (TYK2).

For convenience, a prepared (capped and protonated) PDB structure of the TYK2 protein is provided under inputs/tyk2_protein.pdb.

265fc2fd608f42bebef5f4eb58544c47

1.1. The dataset: Alchemical transformations of TYK2 ligands#

Here we explore how OpenFE can be used to build a network of alchemical transformations between the TYK2 ligands.

First, we will use rdkit to visualize the TYK2 ligands.

[3]:

from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Chem import Draw

# Extract the contents of the sdf file and visualise it
ligands_rdmol = [mol for mol in
                 Chem.SDMolSupplier('inputs/tyk2_ligands.sdf', removeHs=False)]

for ligand in ligands_rdmol:
    AllChem.Compute2DCoords(ligand)

Chem.Draw.MolsToGridImage(ligands_rdmol)

[3]:

../_images/tutorials_showcase_notebook_9_0.png

1.2. The plan#

Here is what we will achieve in this notebook and what software toolchains are used along the way.

Actions	Software
Create OpenFE Molecules	OpenFE RDKit
Create Network	OpenFE Lomap, Networkx
Visualise Network	OpenFE NetworkX, RDKit, Matplotlib
Create ligand topologies	OpenFE interface - OpenFF tk
Create hybrid OpenMM topology	OpenFE interface - OpenMMTools (eventually - ex Perses)
Create Lambda Protocol	OpenFE interface - OpenMMTools (eventually - ex Perses)
Set up and run RBFE calculations	OpenFE interface - OpenMM + OpenMMTools
Analyze RBFE calculations	OpenFE interface - PyMBAR + OpenMMTools

2. Setup#

2.1. Loading Ligands and Defining Ligand Atom Mappings#

Creating OpenFE SmallMoleculeComponents#

To keep track of the various inputs being passed through the OpenFE toolkit, OpenFE implements a set of Components which define the proteins, small molecules, and solvent components which a system may contain. Here we use the SmallMoleculeComponent which takes in either RDKit molecules or OpenFF molecules.

In the backend, OpenFE treats the RDKit molecules as the central representation of the ligands, and uses the OpenFF toolkit to convert between objects from various toolchains (for example OpenEye’s OEMol).

Here we demonstrate how to load the ligands from inputs/tyk2_ligands.sdf into a list of OpenFE SmallMoleculeComponents for further processing.

Load ligands using RDKit:

[4]:

import locale
locale.getpreferredencoding = lambda _: 'UTF-8'  # hack for google colab, not needed for local execution
from openfe import SmallMoleculeComponent

# Load ligands using RDKit
ligands_sdf = Chem.SDMolSupplier('inputs/tyk2_ligands.sdf', removeHs=False)

# Now pass these to form a list of Molecules
ligand_mols = [SmallMoleculeComponent(sdf) for sdf in ligands_sdf]

Load ligands using the OpenFF toolkit:

[5]:

from openff.toolkit import Molecule
from openfe import SmallMoleculeComponent

# Load ligands using OpenFF toolkit
ligands_sdf = Molecule.from_file('inputs/tyk2_ligands.sdf')

# Now pass these to form a list of Molecules
ligand_mols = [SmallMoleculeComponent.from_openff(sdf) for sdf in ligands_sdf]

OpenFE SmallMoleculeComponents have some useful built-in attributes and methods.

For example, the molecule’s name (as defined by the SDF file) can be accessed:

[6]:

print("name: ", ligand_mols[0].name)

name:  lig_ejm_31

As previously stated, SmallMoleculeComponents also use the OpenFF backend to allow conversion between different object types. For example, it’s possible to obtain an OpenFF Molecule:

[7]:

type(ligand_mols[0].to_openff())

[7]:

openff.toolkit.topology.molecule.Molecule

Ligand Atom Mapping#

In the hybrid topology RBFE protocol , an atom mapping defines which atoms are mutated during the alchemical transformation. The user can choose between two different atom mappers:

LomapAtomMapper
- based on the maximum common substructure (MCS)
KartografAtomMapper
- based on the 3D geometries of the ligands

While we use the defaults here, please note that the various supported arguments of Lomap and Kartograf can be passed to the atom mapper.

1. ``LomapAtomMapper``

[8]:

from openfe.setup import LomapAtomMapper
mapper = LomapAtomMapper()
lomap_mapping = next(mapper.suggest_mappings(ligand_mols[0], ligand_mols[4]))

We can also visualize the atom mappings by invoking the individual OpenFE AtomMapping objects directly.

Unique atoms between each mapping are shown in red, and atoms which are mapped but undergo element changes are shown in blue. Bonds which either involve atoms that are unique or undergo element changes are highlighted in red.

[9]:

# We can display the atom mapping in 2D by calling it
lomap_mapping

../_images/tutorials_showcase_notebook_27_0.png

It is also possible to visualize the mapping in 3D using py3dmol:

Here, the view_3D method displays the two end state molecules (left and right), in addition to the hybrid molecule (middle).

Atoms that have the same sphere color in both end states are mapped (i.e. will be interpolated between each other), whilst those that do not have a coloured sphere are unmapped (i.e. will be transformed into dummy atoms in the opposite end state).

[10]:

# Visualize the mapping in 3D
lomap_mapping.view_3d(show_atomIDs=True)

3Dmol.js failed to load for some reason. Please check your browser console for error messages.

[10]:

<py3Dmol.view at 0x117bbef90>

2. ``KartografAtomMapper``

We can also use the KartografAtomMapper which is based on the 3D geometries of the ligands.

[11]:

from kartograf import KartografAtomMapper
# Build Kartograf Atom Mapper
mapper = KartografAtomMapper(atom_map_hydrogens=True)

# Get Mapping
kartograf_mapping = next(mapper.suggest_mappings(ligand_mols[0], ligand_mols[4]))

INFO:   #################################
INFO:   Map Heavy Atoms
INFO:   #################################
INFO:   Masking Atoms
INFO:   Build Distance Matrix
INFO:   Calculate Mapping
INFO:   Find Maximal overlapping connected sets of mapped atoms
INFO:   #################################
INFO:   Map Hydrogen Atoms:
INFO:   #################################
INFO:   Masking Atoms
INFO:   Build Distance Matrix
INFO:   Calculate Mapping
INFO:   Find Maximal overlapping connected sets of mapped atoms
INFO:   Filtering bond breaks

[12]:

# We can display the atom mapping in 2D by calling it
kartograf_mapping

../_images/tutorials_showcase_notebook_32_0.png

2.2. Creating a ligand network#

A LigandNetwork is a set of SmallMoleculeComponents that are connected by AtomMappings of two small molecules.

The user can choose between multiple different network topologies:

Minimial spanning tree (MST)
LOMAP network
Radial (star) network
Loading in networks from external software (FEP+ or Orion)
Loading in a user-defined network

In this section, we will create and visualize the MST, LOMAP, and radial networks for the TYK2 dataset.

Here, we will be using the LomapAtomMapper as the atom mapper for all networks.

[13]:

# Create network from the two molecules
import openfe
from openfe.setup.ligand_network_planning import generate_radial_network
from openfe.setup.ligand_network_planning import generate_minimal_spanning_network
from openfe.setup.ligand_network_planning import generate_lomap_network
from openfe.setup import LomapAtomMapper

# Create an MST network
mst_network = generate_minimal_spanning_network(
    ligands=ligand_mols,
    scorer=openfe.lomap_scorers.default_lomap_score,
    mappers=[LomapAtomMapper(),])

# Create a LOMAP network
lomap_network = generate_lomap_network(
    ligands=ligand_mols,
    scorer=openfe.lomap_scorers.default_lomap_score,
    mappers=[LomapAtomMapper(),])

# Create a radial, choosing the first ligand as central ligand
radial_network = generate_radial_network(
    ligands=ligand_mols[1:],
    central_ligand=ligand_mols[0],
    mappers=[LomapAtomMapper(),])

INFO:   Trying to remove edge 0-2 with similarity 0.818731
INFO:   Checking edge deletion on distance-to-actives 0 vs 0
INFO:   Removed edge 0-2
INFO:   Trying to remove edge 2-6 with similarity 0.860708
INFO:   Rejecting edge deletion on cycle covering
INFO:   Trying to remove edge 0-1 with similarity 0.904837
INFO:   Rejecting edge deletion on cycle covering
INFO:   Trying to remove edge 0-6 with similarity 0.904837
INFO:   Rejecting edge deletion on cycle covering
INFO:   Trying to remove edge 1-2 with similarity 0.904837
INFO:   Rejecting edge deletion on cycle covering
INFO:   Trying to remove edge 1-6 with similarity 0.951229
INFO:   Checking edge deletion on distance-to-actives 0 vs 0
INFO:   Removed edge 1-6
INFO:   Trying to remove edge 8-3 with similarity 0.904837
INFO:   Checking edge deletion on distance-to-actives 0 vs 0
INFO:   Removed edge 8-3
INFO:   Trying to remove edge 9-3 with similarity 0.904837
INFO:   Rejecting edge deletion on cycle covering
INFO:   Trying to remove edge 3-7 with similarity 0.904837
INFO:   Rejecting edge deletion on cycle covering
INFO:   Trying to remove edge 8-7 with similarity 0.951229
INFO:   Rejecting edge deletion on cycle covering
INFO:   Trying to remove edge 8-9 with similarity 0.951229
INFO:   Rejecting edge deletion on cycle covering
INFO:   Trying to remove edge 9-7 with similarity 0.951229
INFO:   Checking edge deletion on distance-to-actives 0 vs 0
INFO:   Removed edge 9-7

We can plot out the different networks to visualize their structure and to see how ligands are being tranformed.

[14]:

# Visualize the MST network
from openfe.utils.atommapping_network_plotting import plot_atommapping_network

plot_atommapping_network(mst_network)

[14]:

../_images/tutorials_showcase_notebook_36_0.png

../_images/tutorials_showcase_notebook_36_1.png

[15]:

# Visualize the LOMAP network
from openfe.utils.atommapping_network_plotting import plot_atommapping_network

plot_atommapping_network(lomap_network)

[15]:

../_images/tutorials_showcase_notebook_37_0.png

[16]:

# Visualize the radial network
from openfe.utils.atommapping_network_plotting import plot_atommapping_network

plot_atommapping_network(radial_network)

[16]:

../_images/tutorials_showcase_notebook_38_0.png

Edges along the network can be accessed to recover the individual molecules involved in a given alchemical tranformation and the atom mapping between the two ligands.

Note: as can be seen in the example below, transformations are defined within OpenFE as going from componentA to componentB

[17]:

mst_edges = [edge for edge in mst_network.edges]

# Pick an edge
edge = mst_edges[1]

# Print the smiles of the molecules and the mapping
print("molecule A smiles: ", edge.componentA.smiles)
print("molecule B smiles: ", edge.componentB.smiles)
print("map between molecule A and B: ", edge.componentA_to_componentB)

molecule A smiles:  CC(=O)Nc1cc(NC(=O)c2c(Cl)cccc2Cl)ccn1
molecule B smiles:  O=C(Nc1ccnc(NC(=O)C2CCCC2)c1)c1c(Cl)cccc1Cl
map between molecule A and B:  {0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9, 10: 10, 11: 11, 12: 12, 13: 13, 14: 14, 15: 15, 16: 16, 17: 17, 18: 18, 19: 19, 20: 20, 21: 21, 22: 22, 23: 23, 28: 38, 29: 39, 30: 40, 31: 41}

[18]:

# We can display the atom mapping of an edge by calling it
edge

../_images/tutorials_showcase_notebook_41_0.png

[19]:

from IPython.display import Image

# mappings can also be saved to file if required
edge.draw_to_file('tyk2_edge.png')

# load it back for visualisation
Image("tyk2_edge.png")

[19]:

../_images/tutorials_showcase_notebook_42_0.png

Storing the ligand network#

Created networks can easily be converted to (and also loaded from) a GraphML representation.

This can allow users of OpenFE to store the network to disk for later use.

[20]:

# Convert to graphml
with open("network_store.graphml", "w") as writer:
    writer.write(mst_network.to_graphml())

2.3. Defining the Chemical Systems#

ChemicalSystems are OpenFE containers which define the various components in a system of interest. You can consider ChemicalSystems to be the nodes along an alchemical network which are connected by edges which carry out calculations along Alchemical states to get free energies.

ChemicalSystems take in:

A dictionary of the chemical components (e.g. SmallMoleculeComponent, ProteinComponent, SolventComponent) defining the system.
An identifier name (optional), for the ChemicalSystem. This is used as part of the hash identifier of the ChemicalSystem, and can help distinguish between otherwise comparable systems.

In the case of a relative ligand binding free energy calculation for lig_ejm_31 -> lig_ejm_47, four ChemicalSystems must be defined:

lig_ejm_31 in complex with TYK2 in a box of water
lig_ejm_47 in complex with TYK2 in a box of water
lig_ejm_31 in a box of water
lig_ejm_47 in a box of water

Here we will be passing the previously defined SmallMoleculeComponents for lig_ejm_31 and lig_ejm_47. We will also pass a ProteinComponent generated from the PDB file present under inputs/tyk2_protein.pdb. Finally, instead of passing in a specific box of water, we will define a SolventComponent which will contain the necessary information for OpenMM’s Modeller class to add water and 0.15 M NaCl around the solute when creating the OpenMM simulation objects.

[21]:

# First let's define the Protein and Solvent Components which we will be using
from openfe import SolventComponent, ProteinComponent
from openff.units import unit

protein = ProteinComponent.from_pdb_file('inputs/tyk2_protein.pdb')

# Note: the distance from the solute to add water is not defined here but in the
# the relevant RBFE solver method
solvent = SolventComponent(positive_ion='Na', negative_ion='Cl',
                           neutralize=True, ion_concentration=0.15*unit.molar)

[22]:

# Extract the relevant edge for the lig_ejm_31 -> lig_ejm_47 transform in the radial graph
ejm_31_to_ejm_47 = [edge for edge in mst_network.edges if edge.componentB.name == "lig_ejm_47"][0]

ejm_31_to_ejm_47

../_images/tutorials_showcase_notebook_49_0.png

[23]:

# Let's create the four ChemicalSystems
from openfe import ChemicalSystem

ejm_31_complex = ChemicalSystem({'ligand': ejm_31_to_ejm_47.componentA,
                                  'solvent': solvent,
                                  'protein': protein,},
                               name=ejm_31_to_ejm_47.componentA.name)
ejm_31_solvent = ChemicalSystem({'ligand': ejm_31_to_ejm_47.componentA,
                                  'solvent': solvent,},
                               name=ejm_31_to_ejm_47.componentA.name)

ejm_47_complex = ChemicalSystem({'ligand': ejm_31_to_ejm_47.componentB,
                                 'solvent': solvent,
                                 'protein': protein,},
                               name=ejm_31_to_ejm_47.componentB.name)
ejm_47_solvent = ChemicalSystem({'ligand': ejm_31_to_ejm_47.componentB,
                                 'solvent': solvent,},
                               name=ejm_31_to_ejm_47.componentB.name)

2.4. Defining the RBFE simulation settings and protocol#

Now that we have a set of atom mappings defined, we know which atoms should undergo alchemical transformations to capture the free energy cost of transforming from one ligand to another.

To simulate this transformation, we use the equilibrium RBFE protocol implemented in OpenFE. This uses OpenMM to run a Perses-like relative ligand binding free energy calculation using a single topology approach.

To achieve this simulation, the following steps need to happen:

Create OpenMM systems of both end states
Create a hybrid topology based on these defined end states
Set an appropriate Lambda schedule
Set a MultiState reporter to write out appropriate coordinates and energies
Create an OpenMM sampler (in this case we will be using a replica exchange sampler)
Carry out the necessary simulation steps (minimization, equilibration, and production)

The RelativeHybridTopologyProtocol class in openfe.protocols.openmm_rfe implements a means to achieve all the above with minimal intervention.

Here we work through its usage for the lig_ejm_31 -> lig_ejm_47 binding free energy test case. As this involves both a relative binding free energy in solvent and complex phases, RelativeHybridTopologyProtocol will be used to build two separate ProtocolDAG (directed-acyclic-graph) classes, one for each phase. These DAGs (which contain the necessary individual simulations), are then executed to yield the desired free energy results.

Note: the underlying components used for the creation of OpenMM hybrid topologies and samplers is still in flux, originating mostly from Perses. Please consider these to be in beta.

Defining the RBFE simulation settings#

There are various parameters which can be set to determine how the RBFE simulation will take place. To allow for maximum user flexibility, these are defined as a series of settings objects which control the following:

protocol_repeats: The number of completely independent repeats of the entire sampling process.
simulation_settings: Parameters controling the simulation plan and the alchemical sampler, including the number of minimization steps, lengths of equilibration and production runs, the sampler method (e.g. Hamiltonian replica exchange, repex), and the time interval at which to perform an analysis of the free energies.
output_settings: Simulation output control settings, including the frequency to write a checkpoint file, the selection string for which part of the system to write coordinates for, and the paths to the trajectory and output structure storage files.
alchemical_settings: Parameters controlling the creation of the hybrid topology system. This includes various parameters ranging from softcore parameters, through to whether or not to apply an explicit charge correction for systems with net charge changes.
engine_settings: Parameters determining how the OpenMM engine will execute the simulation. This controls the compute platform which will be used to carry out the simulation.
integrator_settings: Parameters controlling the LangevinSplittingDynamicsMove integrator used for simulation.
lambda_settings: Lambda protocol settings, including number of lambda windows and lambda functions.
forcefield_settings: Parameters to set up the force field with OpenMM Force Fields, including the general force fields, the small molecule force field, the nonbonded method, and the nonbonded cutoff.
thermo_settings: Settings for thermodynamic parameters, such as the temperature and the pressure of the system.
solvation_settings: Settings for solvating the system, including the solvent model and the solvent padding.
partial_charge_settings: Settings for assigning partial charges to small molecules, including the partial charge method (e.g. am1bcc) and the OpenFF toolkit backend (e.g. ambertools or openeye).

The RelativeHybridTopologyProtocol class can directly populate the above set of default settings through its default_settings method. Parameters can be overriden after creation. In this case, we’ll reduce the equilibration length to 0.01 * nanosecond and the production to 0.05 * nanosecond in order to reduce the costs of running this notebook (in practice, values of 1 and 5 nanoseconds, respectively, would be most appropriate).

Note: by default the settings use a solvent padding of 1.5 nm, this is appropriate for solvent simulations, but for complex simulations this leads to excess water being in the box, slowing down your simulation. To deal with this, we will create two sets of settings, and two sets of protocols, one for each leg of the transformations.

[24]:

from openfe.protocols.openmm_rfe import RelativeHybridTopologyProtocol
from openff.units import unit

# Create the solvent settings
solvent_rbfe_settings = RelativeHybridTopologyProtocol.default_settings()
solvent_rbfe_settings.simulation_settings.equilibration_length = 10 * unit.picosecond # Reduce equilibration length to 10 picoseconds
solvent_rbfe_settings.simulation_settings.production_length = 50 * unit.picosecond # Reduce prodution length to 50 picoseconds
solvent_rbfe_settings.engine_settings.compute_platform = None  # default is to look for CUDA, none allows falling back to OpenCL or CPU

# Create the complex settings
complex_rbfe_settings = RelativeHybridTopologyProtocol.default_settings()
complex_rbfe_settings.simulation_settings.equilibration_length = 10 * unit.picosecond # Reduce equilibration length to 10 picoseconds
complex_rbfe_settings.simulation_settings.production_length = 50 * unit.picosecond # Reduce prodution length to 50 picoseconds
complex_rbfe_settings.solvation_settings.solvent_padding = 1 * unit.nanometer
complex_rbfe_settings.engine_settings.compute_platform = None  # default is to look for CUDA, none allows falling back to OpenCL or CPU

# For context, let's display the complex simulation settings!
complex_rbfe_settings.simulation_settings

{'early_termination_target_error': {'unit': 'kilocalorie_per_mole', 'val': 0.0},
 'equilibration_length': {'unit': 'nanosecond', 'val': 0.009999999999999998},
 'minimization_steps': 5000,
 'n_replicas': 11,
 'production_length': {'unit': 'nanosecond', 'val': 0.04999999999999999},
 'real_time_analysis_interval': {'unit': 'picosecond', 'val': 250.0},
 'real_time_analysis_minimum_time': {'unit': 'picosecond', 'val': 500.0},
 'sampler_method': 'repex',
 'sams_flatness_criteria': 'logZ-flatness',
 'sams_gamma0': 1.0,
 'time_per_iteration': {'unit': 'picosecond', 'val': 2.5}}

/Users/atravitz/micromamba/envs/openfe-conda/lib/python3.11/site-packages/gufe/settings/models.py:30: PydanticDeprecatedSince20: The `dict` method is deprecated; use `model_dump` instead. Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.11/migration/
  pprint.pprint(self.dict())

Creating the RFE Protocol#

With the Settings inspected and adjusted, we can provide these to a Protocol. This Protocol defines the procedure to estimate a free energy difference between two chemical systems, with the details of the two end states yet to be defined.

In this case, since we need to run both solvent and complex simulations with different settings, we will create two Protocols, one for each leg.

[25]:

# Create RBFE Protocol classes
solvent_rbfe_protocol = RelativeHybridTopologyProtocol(
    settings=solvent_rbfe_settings
)

complex_rbfe_protocol = RelativeHybridTopologyProtocol(
    settings=complex_rbfe_settings
)

3. Running a Relative Ligand Binding Free Energy Calculation#

3.0 Creating the Transformations#

Once we have the ChemicalSystems and the Protocol, we can create the Transformation.

The Transformation requires as input:

the two ChemicalSystem objects defining either end of the alchemical transformation (stateA and stateB)
a mapping between the two systems
the protocol
a name (optional)

As previously detailed, we create two sets of transformations for the complex and the solvent legs of the thermodynamic cycle.

[26]:

transformation_complex = openfe.Transformation(
            stateA=ejm_31_complex,
            stateB=ejm_47_complex,
            mapping=ejm_31_to_ejm_47,
            protocol=complex_rbfe_protocol,  # use complex protocol created above
            name=f"{ejm_31_complex.name}_{ejm_47_complex.name}_complex"
        )
transformation_solvent = openfe.Transformation(
            stateA=ejm_31_solvent,
            stateB=ejm_47_solvent,
            mapping=ejm_31_to_ejm_47,
            protocol=solvent_rbfe_protocol,  # use solvent protocol created above
            name=f"{ejm_31_solvent.name}_{ejm_47_solvent.name}_solvent"
        )

3.1. Using the Python API#

Creating the `ProtocolDAG`#

With the Transformation defined, we can move onto creating the ProtocolDAG.

The Transformation.create() method creates a directed-acyclic-graph (DAG) of computational tasks necessary for creating an estimate of the free energy difference between the two chemical systems.

[27]:

complex_dag = transformation_complex.create()

solvent_dag = transformation_solvent.create()

The individual pieces of computational work are called Units. In this particular Protocol, the Units defined are three independent repeats of the alchemical transformation.

For other Protocols, for example non-equilibrium sampling routines, there might be dependencies between the tasks.

Simulating the RelativeLigandTransforms#

Individual Units can then be executed by calling the .execute() method.

In the first instance we do a dry-run (which does everything but starting the simulation) to make sure that the hybrid OpenMM system can be constructed without any issues. Note: A successful call to .run() will return an empty Dictionary.

[28]:

# complex dry-run
complex_unit = list(complex_dag.protocol_units)[0]

complex_unit.run(dry=True, verbose=True)

INFO:   Preparing the hybrid topology simulation
INFO:   Parameterizing molecules
INFO:   Requested to generate parameters for residue <Residue 0 (UNK) of chain 0>
INFO:   Requested to generate parameters for residue <Residue 0 (UNK) of chain 0>
INFO:   Creating hybrid system
INFO:   Setting force field terms
INFO:   Adding forces
INFO:   Hybrid system created
WARNING:        Warning: The openmmtools.multistate API is experimental and may change in future releases
/Users/atravitz/micromamba/envs/openfe-conda/lib/python3.11/site-packages/openfe/protocols/openmm_rfe/equil_rfe_methods.py:1018: DeprecationWarning: `in1d` is deprecated. Use `np.isin` instead.
  bfactors[np.in1d(selection_indices, list(hybrid_factory._atom_classes['unique_old_atoms']))] = 0.25  # lig A
/Users/atravitz/micromamba/envs/openfe-conda/lib/python3.11/site-packages/openfe/protocols/openmm_rfe/equil_rfe_methods.py:1019: DeprecationWarning: `in1d` is deprecated. Use `np.isin` instead.
  bfactors[np.in1d(selection_indices, list(hybrid_factory._atom_classes['core_atoms']))] = 0.50  # core
/Users/atravitz/micromamba/envs/openfe-conda/lib/python3.11/site-packages/openfe/protocols/openmm_rfe/equil_rfe_methods.py:1020: DeprecationWarning: `in1d` is deprecated. Use `np.isin` instead.
  bfactors[np.in1d(selection_indices, list(hybrid_factory._atom_classes['unique_new_atoms']))] = 0.75  # lig B
/Users/atravitz/micromamba/envs/openfe-conda/lib/python3.11/site-packages/mdtraj/core/topology.py:97: UserWarning: atom_indices are not monotonically increasing
  warnings.warn("atom_indices are not monotonically increasing")
/Users/atravitz/micromamba/envs/openfe-conda/lib/python3.11/site-packages/openfe/protocols/openmm_utils/omm_compute.py:76: UserWarning: Non-CUDA platform selected: CPU, this may significantly impact simulation performance
  warnings.warn(wmsg)
WARNING:        Non-CUDA platform selected: CPU, this may significantly impact simulation performance
INFO:   Creating and setting up the sampler
WARNING:        Warning: The openmmtools.multistate API is experimental and may change in future releases
WARNING:        An online_analysis_interval that is not a multiple of the checkpoint_interval can lead to redundant information in the real time yaml file after recovering from checkpoints.

Please cite the following:

        Friedrichs MS, Eastman P, Vaidyanathan V, Houston M, LeGrand S, Beberg AL, Ensign DL, Bruns CM, and Pande VS. Accelerating molecular dynamic simulations on graphics processing unit. J. Comput. Chem. 30:864, 2009. DOI: 10.1002/jcc.21209
        Eastman P and Pande VS. OpenMM: A hardware-independent framework for molecular simulations. Comput. Sci. Eng. 12:34, 2010. DOI: 10.1109/MCSE.2010.27
        Eastman P and Pande VS. Efficient nonbonded interactions for molecular dynamics on a graphics processing unit. J. Comput. Chem. 31:1268, 2010. DOI: 10.1002/jcc.21413
        Eastman P and Pande VS. Constant constraint matrix approximation: A robust, parallelizable constraint method for molecular simulations. J. Chem. Theor. Comput. 6:434, 2010. DOI: 10.1021/ct900463w
        Chodera JD and Shirts MR. Replica exchange and expanded ensemble simulations as Gibbs multistate: Simple improvements for enhanced mixing. J. Chem. Phys., 135:194110, 2011. DOI:10.1063/1.3660669

[28]:

{'debug': {'sampler': <instance of HybridRepexSampler>}}

[29]:

# solvent dry-run
solvent_unit = list(solvent_dag.protocol_units)[0]

solvent_unit.run(dry=True, verbose=True)

INFO:   Preparing the hybrid topology simulation
INFO:   Parameterizing molecules
INFO:   Requested to generate parameters for residue <Residue 0 (UNK) of chain 0>
INFO:   Requested to generate parameters for residue <Residue 0 (UNK) of chain 0>
INFO:   Creating hybrid system
INFO:   Setting force field terms
INFO:   Adding forces
INFO:   Hybrid system created
WARNING:        Warning: The openmmtools.multistate API is experimental and may change in future releases
/Users/atravitz/micromamba/envs/openfe-conda/lib/python3.11/site-packages/openfe/protocols/openmm_rfe/equil_rfe_methods.py:1018: DeprecationWarning: `in1d` is deprecated. Use `np.isin` instead.
  bfactors[np.in1d(selection_indices, list(hybrid_factory._atom_classes['unique_old_atoms']))] = 0.25  # lig A
/Users/atravitz/micromamba/envs/openfe-conda/lib/python3.11/site-packages/openfe/protocols/openmm_rfe/equil_rfe_methods.py:1019: DeprecationWarning: `in1d` is deprecated. Use `np.isin` instead.
  bfactors[np.in1d(selection_indices, list(hybrid_factory._atom_classes['core_atoms']))] = 0.50  # core
/Users/atravitz/micromamba/envs/openfe-conda/lib/python3.11/site-packages/openfe/protocols/openmm_rfe/equil_rfe_methods.py:1020: DeprecationWarning: `in1d` is deprecated. Use `np.isin` instead.
  bfactors[np.in1d(selection_indices, list(hybrid_factory._atom_classes['unique_new_atoms']))] = 0.75  # lig B
/Users/atravitz/micromamba/envs/openfe-conda/lib/python3.11/site-packages/mdtraj/core/topology.py:97: UserWarning: atom_indices are not monotonically increasing
  warnings.warn("atom_indices are not monotonically increasing")
/Users/atravitz/micromamba/envs/openfe-conda/lib/python3.11/site-packages/openfe/protocols/openmm_utils/omm_compute.py:76: UserWarning: Non-CUDA platform selected: CPU, this may significantly impact simulation performance
  warnings.warn(wmsg)
WARNING:        Non-CUDA platform selected: CPU, this may significantly impact simulation performance
INFO:   Creating and setting up the sampler
WARNING:        Warning: The openmmtools.multistate API is experimental and may change in future releases
WARNING:        An online_analysis_interval that is not a multiple of the checkpoint_interval can lead to redundant information in the real time yaml file after recovering from checkpoints.

Please cite the following:

        Friedrichs MS, Eastman P, Vaidyanathan V, Houston M, LeGrand S, Beberg AL, Ensign DL, Bruns CM, and Pande VS. Accelerating molecular dynamic simulations on graphics processing unit. J. Comput. Chem. 30:864, 2009. DOI: 10.1002/jcc.21209
        Eastman P and Pande VS. OpenMM: A hardware-independent framework for molecular simulations. Comput. Sci. Eng. 12:34, 2010. DOI: 10.1109/MCSE.2010.27
        Eastman P and Pande VS. Efficient nonbonded interactions for molecular dynamics on a graphics processing unit. J. Comput. Chem. 31:1268, 2010. DOI: 10.1002/jcc.21413
        Eastman P and Pande VS. Constant constraint matrix approximation: A robust, parallelizable constraint method for molecular simulations. J. Chem. Theor. Comput. 6:434, 2010. DOI: 10.1021/ct900463w
        Chodera JD and Shirts MR. Replica exchange and expanded ensemble simulations as Gibbs multistate: Simple improvements for enhanced mixing. J. Chem. Phys., 135:194110, 2011. DOI:10.1063/1.3660669

[29]:

{'debug': {'sampler': <instance of HybridRepexSampler>}}

3.2. Using the CLI#

Even when using the Python API to set up the RBFE calculations, you can dump all Transformations to a JSON file and run the calculations using the openfe quickrun command. Here, we will show you how to save the Transformations to the JSON file.

We’ll write out the transformation to disk, so that it can be run using the openfe quickrun command:

[30]:

import pathlib
# first we create the directory
transformation_dir = pathlib.Path("tyk2_json")
transformation_dir.mkdir(exist_ok=True)

# then we write out the transformations
transformation_complex.to_json(transformation_dir / f"{transformation_complex.name}.json")
transformation_solvent.to_json(transformation_dir / f"{transformation_solvent.name}.json")

You can run the RBFE simulations from the CLI by using the openfe quickrun command, as described in Section 5. below.

4. Analysis#

Finally now that we’ve “run” our simulations, let’s go ahead and gather the free energies for both phases. First we will take a look at the way to do this using our python api, then we will show how to use our openfe gather command on pre-computed results.

4.1 Analysis - Python API#

We can use the python API to gather the free energies for both phases by passing the results of executing the DAGs and calling the gather() methods of RelativeLigandTransform. This takes a list of completed DAG results, catering for when simulations have been extended.

For production use we recommend saving the transformations to disk and using openfe quickrun to run them in an HPC environment, but for completeness, below is a python snippet that will run the transformations we defined earlier and then analyze the results.

# Finally we can run the simulations
complex_path = pathlib.Path('./complex')
complex_path.mkdir()

# First the complex transformation
complex_dag_results = execute_DAG(complex_dag, scratch_basedir=complex_path, shared_basedir=complex_path)

# Next the solvent state transformation
solvent_path = pathlib.Path('./solvent')
solvent_path.mkdir()

solvent_dag_results = execute_DAG(solvent_dag, scratch_basedir=solvent_path, shared_basedir=solvent_path)

# Get the complex and solvent results
complex_results = rbfe_protocol.gather([complex_dag_results])
solvent_results = rbfe_protocol.gather([solvent_dag_results])

print(f"Complex dG: {complex_results.get_estimate()}, err {complex_results.get_uncertainty()}")
print(f"Solvent dG: {solvent_results.get_estimate()}, err {solvent_results.get_uncertainty()}")

4.2 Analysis - `openfe gather`#

First we will download some TYK2 transformations we already ran. These results are from an entire TYK2 network and not a single edge.

[31]:

# Results from our cli tutorial
locale.getpreferredencoding = lambda: "UTF-8"  # hack for google colab, not needed for local execution
!openfe fetch rbfe-tutorial-results
# Extract results
!tar -xf rbfe_results.tar.gz

Fetching /Users/atravitz/micromamba/envs/openfe-conda/lib/python3.11/site-packages/openfecli/tests/data/rbfe_results.tar.gz

Now we can use the openfe gather command to look at the results (see section 5.3 for more details)

[32]:

!openfe gather results/ --report dg -o final_results.tsv
!cat final_results.tsv

writing dg output to 'final_results.tsv'
ligand  DG(MLE) (kcal/mol)      uncertainty (kcal/mol)
lig_ejm_31      -0.09   0.05
lig_ejm_42      0.7     0.1
lig_ejm_46      -0.98   0.05
lig_ejm_47      -0.1    0.1
lig_ejm_48      0.53    0.09
lig_ejm_50      0.91    0.06
lig_ejm_43      2.0     0.2
lig_jmc_23      -0.68   0.09
lig_jmc_27      -1.1    0.1
lig_jmc_28      -1.25   0.08

5. Relative Free Energies with the OpenFE CLI#

You can also do all the above using the OpenFE command line interface – with no Python at all!

The entire process of running the campaign of simulations is split into 3 stages; each of which corresponds to a CLI command:

Setting up the necessary files to describe each of the individual simulations to run.
Running the simulations.
Gathering the results of separate simulations into a single table.

For more details, please visit our tutorial: Relative binding free energies with the OpenFE CLI.

5.1. Setup#

The setup, as described above, can also be carried out using the CLI command openfe plan-rbfe-network.

openfe plan-rbfe-network -M inputs/tyk2_ligands.sdf -p inputs/tyk2_protein.pdb -o tyk2_json/

This command plans a relative binding free energy network and saves it as JSON files for the openfe quickrun command.

By default, this tool makes the following choices:

Atom mappings performed by LOMAP, with settings max3d=1.0 and element_change=False
Minimal spanning network as the network planner, with LOMAP default score as the weight function
Water as solvent, with NaCl counter ions at 0.15 M concentration.
Protocol is the OpenMM-based relative hybrid topology protocol, with default settings.

These choices can be customized by creating a settings yaml file, which is passed in via the -s settings.yaml option. For more details, please visit our user guide section about Customising CLI planning with yaml settings

5.2. Execution#

You can run each leg individually by using the openfe quickrun command. It takes a transformation JSON as input, and the flags -o to give the final output JSON file and -d for the directory where simulation results should be stored. For example,

openfe quickrun tyk2_json/lig_ejm_31_lig_ejm_47_complex.json -o results_complex.json -d working-directory

openfe quickrun tyk2_json/lig_ejm_31_lig_ejm_47_solvent.json -o results_solvent.json -d working-directory

5.3. Analysis#

To gather the estimates into a single file, use the openfe gather command from within the working directory used above:

openfe gather results/ --report dg -o final_results.tsv

This will write out a tab-separated table of results, where the results reported are controlled by the --report option:

dg (default) reports the ligand and the results are the maximum likelihood estimate of its absolute free, and the associated uncertainty from DDG replica averages and standard deviations.
ddg reports pairs of ligand_i and ligand_j, the calculated relative free energy DDG(i->j) = DG(j) - DG(i) and its uncertainty.
raw reports the raw results, giving the leg (vacuum, solvent, or complex), ligand_i, ligand_j, the raw DG(i->j) associated with it.

6. Useful References for Getting Started#

In our documentation, we provide tutorials for ever protocol to walk you through setup, execution and analysis step by step.

In addition to the tutorials, you can find cookbooks, written as How-to guides on how to utilize different components of the toolkit, as well as a User Guide that goes into the underlying concepts of the OpenFE toolkit.

For details about the toolkit’s core methods and classes, please visit our API Reference or our Github page.

To learn more about the project, our team and how you can get involved, please visit our Homepage or get in touch at OpenFreeEnergy@omsf.io.

[ ]: