Installing Atomate2 from source with OpenMM

# setting up our conda environment
>>> conda create -n atomate2 python=3.11
>>> conda activate atomate2

# installing atomate2
>>> pip install "atomate2[openmm]"

Alternatively, if you anticipate regularly updating atomate2, you can clone the repository and install it from source.

# installing atomate2 from source
>>> git clone https://github.com/materialsproject/atomate2
>>> cd atomate2
>>> pip install -e '.[openmm]'

To test the OpenMM installation, you can run the following command. If you intend to run on a GPU, make sure that the CUDA tests pass.

>>> python -m openmm.testInstallation

Understanding Atomate2 OpenMM

Atomate2 is really just a collection of jobflow workflows relevant to materials science. In all the workflows, we pass our system of interest between different jobs to perform the desired simulation. Representing the intermediate state of a classical molecular dynamics simulation, however, is challenging. While the intermediate representation between stages of a periodic DFT simulation can include just the elements, xyz coordinates, and box vectors, classical molecular dynamics systems must also include velocities and forces. The latter is particularly challenging because all MD engines represent forces differently. Rather than implement our own representation, we use the openff.interchange.Interchange object, which catalogs the necessary system properties and interfaces with a variety of MD engines. This is the object that we pass between stages of a classical MD simulation and it is the starting point of our workflow.
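To make this concrete, here is a minimal sketch of building an Interchange by hand for a single water molecule. It assumes the openff-toolkit is installed and that the Sage force field file (openff-2.0.0.offxml) is available; in the workflows below, generate_interchange handles all of this for you.

from openff.interchange import Interchange
from openff.toolkit import ForceField, Molecule, Topology

# parameterize a single water molecule with a SMIRNOFF force field
water = Molecule.from_smiles("O")
water.generate_conformers(n_conformers=1)
topology = Topology.from_molecules([water])
sage = ForceField("openff-2.0.0.offxml")

# the Interchange bundles the topology, parameters, positions, and box
interchange = Interchange.from_smirnoff(sage, topology)
print(interchange.topology.n_atoms)  # 3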

Setting up the system

The first job we need to create generates the Interchange object. To specify the system of interest, we give it the SMILES strings, counts, and (optional) names of the molecules we want to include.

from atomate2.openff.core import generate_interchange

mol_specs_dicts = [
    {"smiles": "O", "count": 200, "name": "water"},
    {"smiles": "CCO", "count": 10, "name": "ethanol"},
    {"smiles": "C1=C(C=C(C(=C1O)O)O)C(=O)O", "count": 1, "name": "gallic_acid"},
]

gallic_interchange_job = generate_interchange(mol_specs_dicts, 1.3)  # 1.3 is the target mass density in g/cm^3

If you are wondering what arguments are allowed in the dictionaries, check out the create_mol_spec function in the atomate2.openff.utils module. Under the hood, it is called on each mol_spec dict, meaning the code below is functionally identical to the code above.

from atomate2.openff.utils import create_mol_spec

mols_specs = [create_mol_spec(**mol_spec_dict) for mol_spec_dict in mol_specs_dicts]

generate_interchange(mols_specs, 1.3)

In a more complex simulation, we might want to scale the ion charges and include custom partial charges. An example with an EC:EMC:LiPF6 electrolyte is shown below. This yields the elyte_interchange_job object, which we can pass to the next stage of the simulation.

NOTE: It’s actually mandatory to include partial charges for PF6- here, because the built-in partial charge method fails for it.

import numpy as np
from pymatgen.core.structure import Molecule


pf6 = Molecule(
    ["P", "F", "F", "F", "F", "F", "F"],
    [
        [0.0, 0.0, 0.0],
        [1.6, 0.0, 0.0],
        [-1.6, 0.0, 0.0],
        [0.0, 1.6, 0.0],
        [0.0, -1.6, 0.0],
        [0.0, 0.0, 1.6],
        [0.0, 0.0, -1.6],
    ],
)
pf6_charges = np.array([1.34, -0.39, -0.39, -0.39, -0.39, -0.39, -0.39])

mol_specs_dicts = [
    {"smiles": "C1COC(=O)O1", "count": 100, "name": "EC"},
    {"smiles": "CCOC(=O)OC", "count": 100, "name": "EMC"},
    {
        "smiles": "F[P-](F)(F)(F)(F)F",
        "count": 50,
        "name": "PF6",
        "partial_charges": pf6_charges,
        "geometry": pf6,
        "charge_scaling": 0.8,
        "charge_method": "RESP",
    },
    {"smiles": "[Li+]", "count": 50, "name": "Li", "charge_scaling": 0.8},
]

elyte_interchange_job = generate_interchange(mol_specs_dicts, 1.3)

Running a basic simulation

To run a production simulation, we will create a production flow, link it to our elyte_interchange_job, and then run both locally.

In jobflow, jobs and flows are created by Makers, which can then be linked into more complex flows. Here, OpenMMFlowMaker links together makers for energy minimization, pressure equilibration, annealing, and an NVT simulation. The annealing step is a subflow that saves us from manually instantiating three separate jobs.

Finally, we create our production flow and link to the generate_interchange job, yielding a production ready molecular dynamics workflow.

from atomate2.openmm.flows.core import OpenMMFlowMaker
from atomate2.openmm.jobs.core import (
    EnergyMinimizationMaker,
    NPTMaker,
    NVTMaker,
)
from jobflow import Flow, run_locally


production_maker = OpenMMFlowMaker(
    name="production_flow",
    makers=[
        EnergyMinimizationMaker(traj_interval=10, state_interval=10),
        NPTMaker(n_steps=100),
        OpenMMFlowMaker.anneal_flow(n_steps=150),
        NVTMaker(n_steps=100),
    ],
)

production_flow = production_maker.make(
    elyte_interchange_job.output.interchange,
    prev_dir=elyte_interchange_job.output.dir_name,
    output_dir="./tutorial_system",
)

run_locally(Flow([elyte_interchange_job, production_flow]))

Above, we are running a very short simulation (350 steps total) and reporting out the trajectory and state information very frequently. For a more realistic simulation, see the “Configuring the Simulation” section below.

When the above code is executed, you should expect to see this within the tutorial_system directory:

/tutorial_system
├── state.csv
├── state2.csv
├── state3.csv
├── state4.csv
├── state5.csv
├── taskdoc.json
├── trajectory.dcd
├── trajectory2.dcd
├── trajectory3.dcd
├── trajectory4.dcd
└── trajectory5.dcd

Each job saved a separate state and trajectory file. There are five of each because the anneal flow creates three sub-jobs and the EnergyMinimizationMaker does not report anything. The taskdoc.json file contains the metadata for the entire workflow.
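For a quick sanity check before any real analysis, you can load a state file directly; they are plain CSVs. A minimal sketch, assuming pandas is installed (the exact header and columns depend on your reporter settings):

import pandas as pd

# state.csv comes from the first reporting job (here, the NPT run)
state = pd.read_csv("tutorial_system/state.csv")
print(state.head())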

Awesome! At this point, we’ve run a workflow and could start analyzing our data. Before we get there though, let’s go through some of the other simulation options available.

Digging Deeper

Atomate2 OpenMM supports running a variety of workflows with different configurations. Below, we dig into some of the more advanced options.

Configuring the Simulation

Learn more about the configuration of OpenMM simulations

All OpenMM jobs, i.e. anything in atomate2.openmm.jobs, inherit from the BaseOpenMMMaker class. BaseOpenMMMaker is highly configurable: you can change the timestep, temperature, reporting frequencies, output types, and a range of other properties. See the docstring for the full list of options.

Note that when instantiating the OpenMMFlowMaker above, we only set the traj_interval and state_interval once, inside EnergyMinimizationMaker. This is a key feature: all makers will inherit attributes from the previous maker if they are not explicitly reset. This allows you to set the timestep once and have it apply to all stages of the simulation. The value inheritance is as follows: 1) any explicitly set value, 2) the value from the previous maker, 3) the default value (as shown below).

from atomate2.openmm.jobs.base import OPENMM_MAKER_DEFAULTS

print(OPENMM_MAKER_DEFAULTS)
{
    "step_size": 0.001,
    "temperature": 298,
    "friction_coefficient": 1,
    "platform_name": "CPU",
    "platform_properties": {},
    "state_interval": 1000,
    "state_file_name": "state",
    "traj_interval": 10000,
    "wrap_traj": False,
    "report_velocities": False,
    "traj_file_name": "trajectory",
    "traj_file_type": "dcd",
    "embed_traj": False,
}
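To make the inheritance concrete, here is a small sketch reusing the makers imported earlier: step_size is set once on the first maker, and because the later makers leave it unset, they inherit the same value.

flow_maker = OpenMMFlowMaker(
    name="inheritance_demo",
    makers=[
        EnergyMinimizationMaker(step_size=0.002),
        NPTMaker(n_steps=100),  # inherits step_size=0.002
        NVTMaker(n_steps=100),  # inherits step_size=0.002
    ],
)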

Perhaps we want to embed the trajectory in the taskdoc so that it can be saved to the database, but only for our final run, so we don’t waste space. We also want to add some tags so we can identify the simulation in our database more easily. Finally, we want to run for much longer, as befits a real production workflow.

production_maker = OpenMMFlowMaker(
    name="production_flow",
    tags=["tutorial_production_flow"],
    makers=[
        EnergyMinimizationMaker(traj_interval=0),
        NPTMaker(n_steps=1000000),
        OpenMMFlowMaker.anneal_flow(n_steps=1500000),
        NVTMaker(n_steps=5000000, traj_interval=10000, embed_traj=True),
    ],
)

production_flow = production_maker.make(
    elyte_interchange_job.output.interchange,
    prev_dir=elyte_interchange_job.output.dir_name,
    output_dir="./tutorial_system",
)

run_locally(Flow([elyte_interchange_job, production_flow]))

Running with Databases

Learn to upload your MD data to databases

Before trying this, you should have a basic understanding of JobFlow and Stores.

To log OpenMM results to a database, you’ll need to set up both a MongoStore, for taskdocs, and blob storage, for trajectories. Here, I’ll show you a jobflow.yaml file that uses the MongoDB and MinIO S3 storage provided by NERSC. To set this up, you’ll need to contact NERSC for accounts on their MongoDB and MinIO services; then you can follow the instructions in the Stores tutorial to link jobflow to your databases. Your jobflow.yaml should look like this:

JOB_STORE:
  docs_store:
    type: MongoStore
    database: DATABASE
    collection_name: atomate2_docs # suggested
    host: mongodb05.nersc.gov
    port: 27017
    username: USERNAME
    password: PASSWORD

  additional_stores:
    data:
      type: S3Store
      index:
        type: MongoStore
        database: DATABASE
        collection_name: atomate2_blobs_index # suggested
        host: mongodb05.nersc.gov
        port: 27017
        username: USERNAME
        password: PASSWORD
        key: blob_uuid
      bucket: oac
      s3_profile: oac
      s3_resource_kwargs:
        verify: false
      endpoint_url: https://next-gen-minio.materialsproject.org/
      key: blob_uuid

NOTE: This can work with any MongoDB and S3 storage, not just NERSC’s.

As shown in the production example above, you’ll need to set the embed_traj property to True in any makers where you want to save the trajectory to the database. Otherwise, the trajectory will only be saved locally.

Rather than use jobflow.yaml, you could also create the stores in Python and pass them to the run_locally function. This is a bit more code, so the former method is usually preferred.

from jobflow import run_locally, JobStore
from maggma.stores import MongoStore, S3Store

mongo_info = {
    "username": "USERNAME",
    "password": "PASSWORD",
    "database": "DATABASE",
    "host": "mongodb05.nersc.gov",
}

md_doc_store = MongoStore(**mongo_info, collection_name="atomate2_docs")

md_blob_index = MongoStore(
    **mongo_info,
    collection_name="atomate2_blobs_index",
    key="blob_uuid",
)

md_blob_store = S3Store(
    index=md_blob_index,
    bucket="BUCKET",
    s3_profile="PROFILE",
    endpoint_url="https://next-gen-minio.materialsproject.org",
    key="blob_uuid",
)

# run our previous flow with the new stores
run_locally(
    Flow([elyte_interchange_job, production_flow]),
    store=JobStore(md_doc_store, additional_stores={"data": md_blob_store}),
    ensure_success=True,
)

Running on GPUs

Learn to accelerate MD simulations with GPUs

Running on a GPU is nearly as simple as running on a CPU. You just need to set platform_name to "CUDA" and, on machines with multiple GPUs, select one with the DeviceIndex in platform_properties. Because makers inherit attributes from the previous maker, setting these on the EnergyMinimizationMaker applies them to all subsequent jobs.

production_maker = OpenMMFlowMaker(
    name="test_production",
    makers=[
        EnergyMinimizationMaker(
            platform_name="CUDA",
            platform_properties={"DeviceIndex": "0"},
        ),
        NPTMaker(),
        OpenMMFlowMaker.anneal_flow(),
        NVTMaker(),
    ],
)

Some systems (notably Perlmutter) have multiple GPUs available on a single node. To fully leverage the compute, you’ll need to distribute your simulations across the GPUs, e.g., four simulations across Perlmutter’s four GPUs. A simple way to do this is with MPI.

First you’ll need to install mpi4py.

>>> conda install mpi4py

Then you can modify and run the following script to distribute the work across the GPUs.

from jobflow import Flow, run_locally
from mpi4py import MPI

from atomate2.openff.core import generate_interchange
from atomate2.openmm.flows.core import OpenMMFlowMaker
from atomate2.openmm.jobs.core import (
    EnergyMinimizationMaker,
    NPTMaker,
    NVTMaker,
)

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

list_of_mol_spec_lists = []
# logic to add four mol_spec_lists to list_of_mol_spec_lists


flows = []
for i in range(4):
    device_index = i
    mol_specs = list_of_mol_spec_lists[i]

    setup = generate_interchange(mol_specs, 1.0)

    production_maker = OpenMMFlowMaker(
        name="test_production",
        makers=[
            EnergyMinimizationMaker(
                platform_name="CUDA",
                platform_properties={"DeviceIndex": str(device_index)},
            ),
            NPTMaker(),
            OpenMMFlowMaker.anneal_flow(),
            NVTMaker(),
        ],
    )

    production_flow = production_maker.make(
        setup.output.interchange,
        prev_dir=setup.output.dir_name,
        output_dir=f"/pscratch/sd/o/oac/openmm_runs/{i}",
    )
    flows.append(Flow([setup, production_flow]))

# when launched with MPI (e.g., mpirun -n 4), this script runs four times,
# once per rank, distributing the work across the four GPUs
run_locally(flows[rank], ensure_success=True)

Analysis with Emmet

For now, you’ll need to make sure you have a particular emmet branch installed; later, the builders will be integrated into the main branch.

>>> pip install git+https://github.com/orionarcher/emmet@md_builders

Analyzing Local Data

Learn to analyze your data without a database

Emmet will give us a solid head start on analyzing our data even without touching a database. Below, we use emmet to create a MDAnalysis Universe and a SolvationAnalysis Solute. From here, we can do all sorts of very cool analysis, but that’s beyond the scope of this tutorial. Consult the tutorials in SolvationAnalysis and MDAnalysis for more information.

from atomate2.openff.core import ClassicalMDTaskDocument
from emmet.builders.classical_md.utils import create_universe, create_solute
from openff.interchange import Interchange

ec_emc_taskdoc = ClassicalMDTaskDocument.parse_file("tutorial_system/taskdoc.json")
interchange = Interchange.parse_raw(ec_emc_taskdoc.interchange)
mol_specs = ec_emc_taskdoc.mol_specs

u = create_universe(
    interchange,
    mol_specs,
    str("tutorial_system/trajectory5.dcd"),
    traj_format="DCD",
)

solute = create_solute(u, solute_name="Li", networking_solvents=["PF6"])
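For a quick taste of what comes next, you can run the analysis and inspect coordination numbers (a minimal sketch based on the SolvationAnalysis API; see its tutorials for the full picture):

# run all registered analyses over the trajectory
solute.run()

# mean coordination number of each solvent around Li
print(solute.coordination.coordination_numbers)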

Setting up builders

Connect with your databases

If you followed the instructions above to set up a database, you can use the ElectrolyteBuilder to perform the same analysis as above.

First, we’ll need to create the stores where our data is located. These should match the stores you used when running your flow.

from maggma.stores import MongoStore, S3Store

mongo_info = {
    "username": "USERNAME",
    "password": "PASSWORD",
    "database": "DATABASE",
    "host": "mongodb05.nersc.gov",
}
s3_info = {
    "bucket": "BUCKET",
    "s3_profile": "PROFILE",
    "endpoint_url": "https://next-gen-minio.materialsproject.org",
}

md_docs = MongoStore(**mongo_info, collection_name="atomate2_docs")
md_blob_index = MongoStore(
    **mongo_info,
    collection_name="atomate2_blobs_index",
    key="blob_uuid",
)
md_blob_store = S3Store(
    **s3_info,
    index=md_blob_index,
    key="blob_uuid",
)

Now we create our Emmet builder and connect it to the stores. We include a query that selects only jobs with the “tutorial_production_flow” tag we used earlier.

from emmet.builders.classical_md.openmm.core import ElectrolyteBuilder

builder = ElectrolyteBuilder(
    md_docs, md_blob_store, query={"output.tags": "tutorial_production_flow"}
)
builder.connect()

Here are some more convenient queries we could use!

# query jobs from a specific day
april_16 = {"completed_at": {"$regex": "^2024-04-16"}}
may = {"completed_at": {"$regex": "^2024-05"}}


# query a particular set of jobs
job_uuids = [
    "3d7b4db4-85e5-48a5-9585-07b37910720f",
    "4202b18f-f156-4705-8ca6-ac2a08093174",
    "187d9466-c359-4013-9e25-8b4ece6e3ecf",
]
my_specific_jobs = {"uuid": {"$in": job_uuids}}

Analyzing systems individually

Download and explore systems one-by-one

To analyze a specific system, you’ll need the uuid of the taskdoc you want to analyze. We can find the uuids of all the taskdocs in our builder by retrieving the items and extracting the uuids.

items = builder.get_items()
uuids = [item["uuid"] for item in items]

This, however, can quickly get confusing once you have many jobs. At that point, I’d highly recommend an application that makes it easier to view and navigate MongoDB databases, such as Studio 3T or DataGrip.

Now we again use our builder to create a Universe and Solute. This time, instantiate_universe downloads the trajectory, saves it locally, and uses it to create a Universe.

# select one of the uuids gathered above
uuid = uuids[0]

u = builder.instantiate_universe(uuid, "directory/to/store/trajectory")

solute = create_solute(
    u,
    solute_name="Li",
    networking_solvents=["PF6"],
    fallback_radius=3,
)

Automated analysis with builders

Do it all for all the systems!

Finally, we’ll put the H in high-throughput molecular dynamics. Below, we create Stores to hold our SolvationDocs and CalculationDocs and execute the builder on all of our jobs!

Later, there will also be TransportDocs, EquilibrationDocs, and more, aggregating most of what you might want to know about an MD simulation.

solvation_docs = MongoStore(**mongo_info, collection_name="solvation_docs")
calculation_docs = MongoStore(**mongo_info, collection_name="calculation_docs")
builder = ElectrolyteBuilder(md_docs, md_blob_store, solvation_docs, calculation_docs)

builder.connect()
items = builder.get_items()
processed_docs = builder.process_items(items)
builder.update_targets(processed_docs)
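If you’d rather not orchestrate those steps yourself, maggma builders also expose a run() method that performs the same connect/get/process/update cycle in one call:

# equivalent to connect + get_items + process_items + update_targets
builder.run()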