Components, Chemical Systems and Thermodynamic Cycles#

This page describes the core building blocks used to define simulation states in openfe: Components, which describe what is physically present in a system; ChemicalSystems, which combine components into a complete end state; and thermodynamic cycles, which connect end states via alchemical transformations.

Components#

Components are the composable building blocks that define the chemical composition of a simulated system. Splitting a system into components serves three purposes:

  1. Alchemical transformations can be easily understood by comparing the differences in components.

  2. Components can be reused to compose different systems.

  3. Protocols can apply component-specific behaviour, e.g. different force fields per component.
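The first point can be made concrete with a small sketch. It does not use openfe: each end state is modelled as a plain dictionary from a label to a component name, and `component_diff` is a hypothetical helper, not part of the openfe API.

```python
def component_diff(state_a, state_b):
    """Return the labels whose component differs between two end states."""
    labels = sorted(set(state_a) | set(state_b))
    return {
        label: (state_a.get(label), state_b.get(label))
        for label in labels
        if state_a.get(label) != state_b.get(label)
    }


# Illustrative end states; the component names are placeholders.
state_a = {"ligand": "ligand_A", "protein": "protein", "solvent": "water"}
state_b = {"ligand": "ligand_B", "protein": "protein", "solvent": "water"}

diff = component_diff(state_a, state_b)
print(diff)
```

Only the `ligand` entry differs here, which is what marks this pair of states as a ligand transformation.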

Component types — overview#

| Component | Role | Key notes |
|---|---|---|
| ProteinComponent | Biological assembly | Typically the contents of a PDB file. May include crystallographic waters and ions (defined as HETATM entries), and disulfide bonds (defined via CONECT records). |
| SmallMoleculeComponent | Ligands and cofactors | Can optionally contain atomic partial charges. If present, those will be used in the simulation. |
| SolventComponent | Abstract solvent definition | Defines solvent conditions and ion concentration. Does not include coordinates or box vectors. Solvent is added by the protocol at runtime. |
| SolvatedPDBComponent | Explicitly solvated system | Includes atomic coordinates and box vectors. Solvent is already present, the protocol does not add any further solvation. |
| ProteinMembraneComponent | Protein-membrane complex | Subclass of SolvatedPDBComponent. Includes protein, membrane, solvent, and box vectors. Replaces ProteinComponent in membrane systems. |

Abstract vs explicit solvation#

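Abstract and explicit solvation are mutually exclusive within a single leg of a transformation.

Either define the solvent abstractly with a SolventComponent, or provide a fully solvated system (SolvatedPDBComponent or ProteinMembraneComponent). Do not mix both for the same leg.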

Note

Some protocols, such as SepTopProtocol and AbsoluteBindingProtocol, use a single ChemicalSystem to represent both the complex and solvent legs. In this case, a ChemicalSystem may contain both a SolventComponent and a ProteinMembraneComponent. However, these apply to different legs: the SolventComponent is used only for the solvent leg, and the ProteinMembraneComponent (which is already explicitly solvated) is used only for the complex leg. The mutual exclusivity rule still holds per leg.
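The per-leg rule can be sketched as a simple check. This is an illustration only, not how openfe validates its inputs: components are represented by their class names as strings, and `leg_solvation` is a hypothetical helper.

```python
# Component class names as used on this page; the check itself is illustrative.
ABSTRACT_SOLVENT = {"SolventComponent"}
EXPLICITLY_SOLVATED = {"SolvatedPDBComponent", "ProteinMembraneComponent"}


def leg_solvation(component_types):
    """Classify how one leg is solvated, rejecting a mix of both approaches."""
    types = set(component_types)
    has_abstract = bool(types & ABSTRACT_SOLVENT)
    has_explicit = bool(types & EXPLICITLY_SOLVATED)
    if has_abstract and has_explicit:
        raise ValueError("a leg cannot mix abstract and explicit solvation")
    if has_explicit:
        return "explicit"
    if has_abstract:
        return "abstract"
    return "none"


# A single ChemicalSystem may hold both kinds of component, but each leg
# only ever sees one of them.
complex_leg = ["SmallMoleculeComponent", "ProteinMembraneComponent"]
solvent_leg = ["SmallMoleculeComponent", "SolventComponent"]
print(leg_solvation(complex_leg), leg_solvation(solvent_leg))
```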

Box vectors for explicitly solvated systems#

The components SolvatedPDBComponent and ProteinMembraneComponent require periodic box vectors. These can be provided in three ways:

  1. CRYST1 record in the PDB file: OpenMM reads the box vectors automatically. No additional arguments are needed:

    membrane_protein = openfe.ProteinMembraneComponent.from_pdb_file('./protein_membrane.pdb')
    
  2. Manual specification — box vectors can be provided explicitly as numpy arrays with OpenFF units in OpenMM format via the box_vectors argument:

    import numpy as np
    import openfe
    from openff.units import unit as offunit
    
    box_vectors = np.array([
        [6.9587, 0.0, 0.0],
        [0.0, 5.9164, 0.0],
        [0.0, 0.0, 9.2692]
    ]) * offunit.nanometer
    
    membrane_protein = openfe.ProteinMembraneComponent.from_pdb_file(
        './protein_membrane.pdb', box_vectors=box_vectors
    )
    
  3. Inference from atomic coordinates — box vectors can be estimated from the atomic positions by passing infer_box_vectors=True:

    membrane_protein = openfe.ProteinMembraneComponent.from_pdb_file(
        './protein_membrane.pdb', infer_box_vectors=True
    )
    

    Warning

    Inferring box vectors from atomic positions can be inaccurate if the PDB originates from a previous simulation where atoms may be distributed across periodic images.
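To see why the warning matters, consider a naive bounding-box estimate. This sketch is an assumption about what coordinate-based inference might look like, not openfe's actual algorithm. `bounding_box_vectors` is a hypothetical helper, and all coordinates are invented values in nanometers.

```python
def bounding_box_vectors(positions):
    """Estimate orthorhombic box vectors (nm) from the extent of the coordinates."""
    lengths = [
        max(p[axis] for p in positions) - min(p[axis] for p in positions)
        for axis in range(3)
    ]
    return [
        [lengths[0], 0.0, 0.0],
        [0.0, lengths[1], 0.0],
        [0.0, 0.0, lengths[2]],
    ]


# Atoms that all sit inside one periodic image.
wrapped = [(0.0, 0.0, 0.0), (6.0, 5.0, 9.0), (3.0, 2.0, 4.0)]
# The same system, but one atom has drifted into a neighbouring image along x.
drifted = wrapped + [(11.0, 2.0, 4.0)]

print(bounding_box_vectors(wrapped)[0])
print(bounding_box_vectors(drifted)[0])
```

The single drifted atom inflates the estimated x length from 6 nm to 11 nm, so a CRYST1 record or an explicit box_vectors argument is the safer choice for files written by a previous simulation.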

ChemicalSystem#

A ChemicalSystem is composed of components that together describe a model of the system to be simulated. It represents an end state of an alchemical transformation and is the primary input a Protocol consumes to define a simulation state.

What a ChemicalSystem defines

  • Exact atomic information (including protonation state) of protein, ligands, cofactors, and any crystallographic waters.

  • Atomic positions of all explicitly defined components such as ligands or proteins.

  • The abstract or explicit definition of the solvent environment (SolventComponent).

What a ChemicalSystem does NOT define

Simulation parameters are not part of a ChemicalSystem; they are handled by the Protocol. These include:

  • The force field applied to any component, including the water model or virtual particles.

  • Thermodynamic conditions (e.g. temperature or pressure).

System composition examples#

The components that make up each ChemicalSystem depend on the protocol and the nature of the system. The table below summarises the composition for each combination.

Note

Protocol-specific behaviour: For SepTopProtocol and AbsoluteBindingProtocol, a single ChemicalSystem represents both legs of the thermodynamic cycle. The protocol determines internally what is the complex leg and what is the solvent leg. This differs from the RelativeHybridTopologyProtocol, where each leg (e.g. complex and solvent) is defined by separate ChemicalSystems. This behaviour is expected to change in future versions.

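| System | RBFE (RelativeHybridTopologyProtocol) | SepTop / ABFE (SepTopProtocol, AbsoluteBindingProtocol) |
|---|---|---|
| Standard protein–ligand | Complex leg: SmallMoleculeComponent, ProteinComponent, SolventComponent. Solvent leg: SmallMoleculeComponent, SolventComponent. | Single ChemicalSystem (both legs): SmallMoleculeComponent, ProteinComponent, SolventComponent. |
| Membrane system | Complex leg: SmallMoleculeComponent, ProteinMembraneComponent (no SolventComponent, already explicitly solvated). Solvent leg: SmallMoleculeComponent, SolventComponent. | Single ChemicalSystem (both legs): SmallMoleculeComponent, ProteinMembraneComponent, SolventComponent (protocol applies SolventComponent only in the solvent leg). |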

Thermodynamic Cycles#

A thermodynamic cycle can be described as a set of ChemicalSystems (nodes) connected by alchemical transformations (edges). The Protocol defines how the ChemicalSystems map onto the cycle and how they are used in practice. The same ChemicalSystem can be reused across multiple thermodynamic states depending on the protocol. For details of which end states to construct, consult the pages for each specific Protocol.
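The node-and-edge picture can be sketched with plain Python containers. This is not openfe code: the end state labels are hypothetical strings and `edge_usage` is an illustrative helper. It checks that every transformation joins two known, distinct end states and counts how often each end state is used, which shows that one end state can take part in more than one transformation.

```python
from collections import Counter


def edge_usage(nodes, edges):
    """Validate each edge and count how many edges touch each end state."""
    usage = Counter({name: 0 for name in nodes})
    for start, end in edges:
        for name in (start, end):
            if name not in nodes:
                raise KeyError(f"unknown end state: {name}")
        if start == end:
            raise ValueError(f"edge must join two different end states: {start}")
        usage[start] += 1
        usage[end] += 1
    return usage


# Hypothetical labels for three ligands in solvent.
nodes = {"ligand_A_solvent", "ligand_B_solvent", "ligand_C_solvent"}
edges = [
    ("ligand_A_solvent", "ligand_B_solvent"),
    ("ligand_A_solvent", "ligand_C_solvent"),
]

usage = edge_usage(nodes, edges)
print(usage["ligand_A_solvent"])
```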

Hybrid topology RBFE example#

As an example, the relative binding free energy cycle requires four ChemicalSystems — one for each node in the cycle:

[Figure: RBFE thermodynamic cycle. Illustration of the relative binding free energy thermodynamic cycle and the chemical systems at each end state.]

import openfe

# two small molecules defined in a molfile format
ligand_A = openfe.SmallMoleculeComponent.from_sdf_file('./ligand_A.sdf')
ligand_B = openfe.SmallMoleculeComponent.from_sdf_file('./ligand_B.sdf')
# a complete biological assembly
protein = openfe.ProteinComponent.from_pdb_file('./protein.pdb')
# defines an aqueous solvent environment, with a concentration of ions
solvent = openfe.SolventComponent(smiles='O')

# ligand_A + protein + solvent
ligand_A_complex = openfe.ChemicalSystem(components={'ligand': ligand_A, 'protein': protein, 'solvent': solvent})
# ligand_B + protein + solvent
ligand_B_complex = openfe.ChemicalSystem(components={'ligand': ligand_B, 'protein': protein, 'solvent': solvent})
# ligand_A + solvent
ligand_A_solvent = openfe.ChemicalSystem(components={'ligand': ligand_A, 'solvent': solvent})
# ligand_B + solvent
ligand_B_solvent = openfe.ChemicalSystem(components={'ligand': ligand_B, 'solvent': solvent})
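Once a free energy estimate is available for each alchemical leg, the cycle closes as ΔΔG_bind(A→B) = ΔG_complex(A→B) − ΔG_solvent(A→B). The sketch below shows only that bookkeeping. The function name is hypothetical, the numbers are invented for illustration and are not openfe output, and the error combination assumes the two legs are statistically independent.

```python
import math


def relative_binding_free_energy(dg_complex, dg_solvent,
                                 err_complex=0.0, err_solvent=0.0):
    """Combine the two alchemical legs of an RBFE cycle.

    All values share one energy unit (kcal/mol in this example).
    """
    ddg = dg_complex - dg_solvent
    # Assumption: uncorrelated errors, so they add in quadrature.
    err = math.sqrt(err_complex ** 2 + err_solvent ** 2)
    return ddg, err


# Invented leg estimates for the A -> B transformation.
ddg, err = relative_binding_free_energy(-3.5, -2.0, err_complex=0.3, err_solvent=0.4)
print(f"ddG_bind(A->B) = {ddg:.2f} +/- {err:.2f} kcal/mol")
```

A negative value under this sign convention means ligand B is estimated to bind more favourably than ligand A.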

Explicitly solvated variant#

A SolvatedPDBComponent or ProteinMembraneComponent replaces both the ProteinComponent and the SolventComponent in the complex leg, so no separate SolventComponent is required there:

# explicitly solvated protein-membrane complex (box vectors read from CRYST1 record)
protein_membrane = openfe.ProteinMembraneComponent.from_pdb_file('./protein_membrane.pdb')

# ligand_A + explicitly solvated protein-membrane — no SolventComponent needed
ligand_A_complex = openfe.ChemicalSystem(components={'ligand': ligand_A, 'protein_membrane': protein_membrane})

See Also#