Plotting OpenFE results against experiment using Cinnabar v0.4#

This notebook shows how one would go about creating a cinnabar plot of OpenFE results against known experimental values.

0. Setup for Google Colab#

If you are running this example in Google Colab, run the following cells to setup the environment. If you are running this notebook locally, skip down to 1. Overview

[ ]:

# NBVAL_SKIP
# Only run this cell if on google colab
import os
if "COLAB_RELEASE_TAG" in os.environ:
    # fix for colab's torchvision causing issues
    !rm -r /usr/local/lib/python3.12/dist-packages/torchvision

    !pip install -q condacolab
    import condacolab
    condacolab.install_from_url("https://github.com/OpenFreeEnergy/openfe/releases/download/v1.7.0/OpenFEforge-1.7.0-Linux-x86_64.sh")

[ ]:

# NBVAL_SKIP
# Only run this cell if on google colab
import os
if "COLAB_RELEASE_TAG" in os.environ:
    import condacolab
    import locale
    locale.getpreferredencoding = lambda: "UTF-8"
    !mkdir inputs && cd inputs && openfe fetch rbfe-tutorial
    # quick fix for https://github.com/conda-incubator/condacolab/issues/75
    # if deprecation warnings persist, rerun this cell
    import warnings
    warnings.filterwarnings(action="ignore", message=r"datetime.datetime.utcnow")
    for _ in range(3):
      # Sometimes we have to re-run the check
      try:
        condacolab.check()
      except:
        pass
      else:
        break

Loading experimental data#

First we load our known experimental data from a tab separated values (TSV) file.

The format of the TSV file is as follows:

ligand  estimate (kcal/mol)     uncertainty (kcal/mol)

Important note: Please note that for now you must have an experimental entry for every ligand involved in your free energy network. In future versions of cinnabar this will no longer be needed.

[1]:

# First we do a set of imports
import csv
from pprint import pprint
import cinnabar
from cinnabar import plotting as cinnabar_plotting
from cinnabar import femap, stats

[2]:

# read in the experimental data

experimental_data = {}
experimental_filename = 'experimental.tsv'

with open(experimental_filename, 'r') as fd:
    rd = csv.reader(fd, delimiter="\t", quotechar='"')
    headers = next(rd)
    for row in rd:
        experimental_data[row[0]] = {}
        experimental_data[row[0]]['dG'] = float(row[1])
        experimental_data[row[0]]['ddG'] = float(row[2])

pprint(experimental_data)

{'lig_ejm_31': {'dG': -9.57, 'ddG': 0.18},
 'lig_ejm_42': {'dG': -9.81, 'ddG': 0.18},
 'lig_ejm_43': {'dG': -8.29, 'ddG': 0.18},
 'lig_ejm_45': {'dG': -9.59, 'ddG': 0.18},
 'lig_ejm_46': {'dG': -11.35, 'ddG': 0.17},
 'lig_ejm_47': {'dG': -9.73, 'ddG': 0.18},
 'lig_ejm_48': {'dG': -9.03, 'ddG': 0.18},
 'lig_ejm_50': {'dG': -9.01, 'ddG': 0.18},
 'lig_ejm_54': {'dG': -10.57, 'ddG': 0.18},
 'lig_ejm_55': {'dG': -9.24, 'ddG': 0.18},
 'lig_jmc_23': {'dG': -11.74, 'ddG': 0.18},
 'lig_jmc_27': {'dG': -11.31, 'ddG': 0.17},
 'lig_jmc_28': {'dG': -11.01, 'ddG': 0.18}}

Loading free energy results#

Next we load in results from the TSV file created by openfe gather --report ddg.

Please see the following tutorial for more information on how to run the gather command: OpenFreeEnergy/ExampleNotebooks

[3]:

# Read in calculated results

calculated_data = {}
calculated_filename = 'final_results.tsv'

with open(calculated_filename, 'r') as fd:
    rd = csv.reader(fd, delimiter="\t", quotechar='"')
    headers = next(rd)
    for row in rd:
        tag = row[0] + "->" + row[1]
        calculated_data[tag] = {}
        calculated_data[tag]['ligand_i'] = row[0]
        calculated_data[tag]['ligand_j'] = row[1]
        calculated_data[tag]['dG'] = float(row[2])
        calculated_data[tag]['ddG'] = float(row[3])

pprint(calculated_data)

{'lig_ejm_31->lig_ejm_42': {'dG': 0.49,
                            'ddG': 0.09,
                            'ligand_i': 'lig_ejm_31',
                            'ligand_j': 'lig_ejm_42'},
 'lig_ejm_31->lig_ejm_46': {'dG': -0.73,
                            'ddG': 0.1,
                            'ligand_i': 'lig_ejm_31',
                            'ligand_j': 'lig_ejm_46'},
 'lig_ejm_31->lig_ejm_47': {'dG': 0.0,
                            'ddG': 0.2,
                            'ligand_i': 'lig_ejm_31',
                            'ligand_j': 'lig_ejm_47'},
 'lig_ejm_31->lig_ejm_48': {'dG': 0.5,
                            'ddG': 0.2,
                            'ligand_i': 'lig_ejm_31',
                            'ligand_j': 'lig_ejm_48'},
 'lig_ejm_31->lig_ejm_50': {'dG': 0.94,
                            'ddG': 0.07,
                            'ligand_i': 'lig_ejm_31',
                            'ligand_j': 'lig_ejm_50'},
 'lig_ejm_42->lig_ejm_43': {'dG': 1.2,
                            'ddG': 0.1,
                            'ligand_i': 'lig_ejm_42',
                            'ligand_j': 'lig_ejm_43'},
 'lig_ejm_46->lig_jmc_23': {'dG': -0.39,
                            'ddG': 0.05,
                            'ligand_i': 'lig_ejm_46',
                            'ligand_j': 'lig_jmc_23'},
 'lig_ejm_46->lig_jmc_27': {'dG': -0.6,
                            'ddG': 0.1,
                            'ligand_i': 'lig_ejm_46',
                            'ligand_j': 'lig_jmc_27'},
 'lig_ejm_46->lig_jmc_28': {'dG': -0.12,
                            'ddG': 0.04,
                            'ligand_i': 'lig_ejm_46',
                            'ligand_j': 'lig_jmc_28'}}

Creating a Cinnabar CSV file#

Next we take the data we’ve gathered above and write it out as cinnabar “csv” file.

[4]:

cinnabar_filename = 'cinnabar_input.csv'

with open(cinnabar_filename, 'w') as f:
    f.write("# Experimental block\n")
    f.write("# Ligand, expt_DDG, expt_dDDG\n")
    for entry in experimental_data:
        dG = experimental_data[entry]['dG']
        ddG = experimental_data[entry]['ddG']
        f.write(f"{entry},{dG:.2f},{ddG:.2f}\n")
    f.write('\n')
    f.write('# Calculated block\n')
    f.write('# Ligand1,Ligand2,calc_DDG,calc_dDDG(MBAR),calc_dDDG(additional)\n')
    for entry in calculated_data:
        dG = calculated_data[entry]['dG']
        ddG = calculated_data[entry]['ddG']
        molA = calculated_data[entry]['ligand_i']
        molB = calculated_data[entry]['ligand_j']
        f.write(f"{molA},{molB},{dG:.2f},0,{ddG:.2f}\n")

Plotting the results using Cinnabar#

Once we’ve created the appropriate Cinnabar input file we can go ahead and plot out our results.

Generating a Cinnabar FEMap and plotting out the network#

First let’s load the data into cinnabar and draw out the network of free energy results.

[5]:

fe = femap.FEMap.from_csv('cinnabar_input.csv')
fe.generate_absolute_values()  # Get MLE generated estimates of the absolute values
fe.draw_graph()

/Users/hannahbaumann/cinnabar/cinnabar/femap.py:21: UserWarning: Assuming kcal/mol units on measurements
  warnings.warn("Assuming kcal/mol units on measurements")

../_images/tutorials_plotting_with_cinnabar_13_1.png

Plotting out the relative free energy results#

Next we can go ahead and plot out the relative free energy results.

[6]:

# note you can pass the filename argument to write things out to file
cinnabar_plotting.plot_DDGs(fe.to_legacy_graph(), figsize=5)

../_images/tutorials_plotting_with_cinnabar_15_0.png

Plotting out the absolute free energy results#

Finally let’s go ahead and plot out the MLE derived absolute free energies

[7]:

# note you can pass the filename argument to write to file
cinnabar_plotting.plot_DGs(fe.to_legacy_graph(), figsize=5)

../_images/tutorials_plotting_with_cinnabar_17_0.png

We can also shift our free energies by the average experimental value to have DGs on the same scale as experiment.

[8]:

data = femap.read_csv('cinnabar_input.csv')
exp_DG_sum = sum([data['Experimental'][i].DG for i in data['Experimental'].keys()])
shift = exp_DG_sum / len(data['Experimental'].keys())

cinnabar_plotting.plot_DGs(fe.to_legacy_graph(), figsize=5,  shift=shift.m)

/Users/hannahbaumann/cinnabar/cinnabar/femap.py:21: UserWarning: Assuming kcal/mol units on measurements
  warnings.warn("Assuming kcal/mol units on measurements")

../_images/tutorials_plotting_with_cinnabar_19_1.png

Writing out the MLE derived absolute free energies#

Finally, we can also write out the MLE derived absolute free energies in the following manner.

Note: you can obtain this directly from your results by calling ``openfe gather –report dg``

[9]:

dG_results = {}
nodes = list(fe.to_legacy_graph().nodes.data())

for key in range(len(nodes)):
    dG_results[nodes[key][0]] = {
        'experimental_estimate': nodes[key][1]['exp_DG'],
        'experimental_error': nodes[key][1]['exp_dDG'],
        'calculated_estimate': round(nodes[key][1]['calc_DG'],2),
        'calculated_error': round(nodes[key][1]['calc_dDG'],2),
    }

# We can now print out the results
pprint(dG_results)

# write out the calculated results
with open('dG_calculated_results.dat', 'w') as f:
    writer = csv.writer(f, delimiter="\t", lineterminator="\n")
    writer.writerow(["ligand", "DG(MLE)", "uncertainty (kcal/mol)",])
    for ligand in dG_results:
        writer.writerow([
            ligand,
            dG_results[ligand]['calculated_estimate'],
            dG_results[ligand]['calculated_error'],
        ])

# write out the experimental results
with open('dG_experimental_results.dat', 'w') as f:
    writer = csv.writer(f, delimiter="\t", lineterminator="\n")
    writer.writerow(["ligand", "DG", "uncertainty (kcal/mol)",])
    for ligand in dG_results:
        writer.writerow([
            ligand,
            dG_results[ligand]['experimental_estimate'],
            dG_results[ligand]['experimental_error'],
        ])

{'lig_ejm_31': {'calculated_error': 0.05,
                'calculated_estimate': 0.04,
                'experimental_error': 0.18,
                'experimental_estimate': -9.57},
 'lig_ejm_42': {'calculated_error': 0.09,
                'calculated_estimate': 0.53,
                'experimental_error': 0.18,
                'experimental_estimate': -9.81},
 'lig_ejm_43': {'calculated_error': 0.13,
                'calculated_estimate': 1.73,
                'experimental_error': 0.18,
                'experimental_estimate': -8.29},
 'lig_ejm_46': {'calculated_error': 0.07,
                'calculated_estimate': -0.69,
                'experimental_error': 0.17,
                'experimental_estimate': -11.35},
 'lig_ejm_47': {'calculated_error': 0.19,
                'calculated_estimate': 0.04,
                'experimental_error': 0.18,
                'experimental_estimate': -9.73},
 'lig_ejm_48': {'calculated_error': 0.19,
                'calculated_estimate': 0.54,
                'experimental_error': 0.18,
                'experimental_estimate': -9.03},
 'lig_ejm_50': {'calculated_error': 0.08,
                'calculated_estimate': 0.98,
                'experimental_error': 0.18,
                'experimental_estimate': -9.01},
 'lig_jmc_23': {'calculated_error': 0.08,
                'calculated_estimate': -1.08,
                'experimental_error': 0.18,
                'experimental_estimate': -11.74},
 'lig_jmc_27': {'calculated_error': 0.11,
                'calculated_estimate': -1.29,
                'experimental_error': 0.17,
                'experimental_estimate': -11.31},
 'lig_jmc_28': {'calculated_error': 0.08,
                'calculated_estimate': -0.81,
                'experimental_error': 0.18,
                'experimental_estimate': -11.01}}