OpenMM Tutorial¶
Installing Atomate2 From Source with OpenMM¶
# setting up our conda environment
>>> conda create -n atomate2 python=3.11
>>> conda activate atomate2
# installing atomate2
>>> pip install 'git+https://github.com/orionarcher/atomate2.git#egg=atomate2[classical_md]'
# installing classical_md dependencies
>>> conda install -c conda-forge --file .github/classical_md_requirements.txt
Alternatively, if you anticipate regularly updating atomate2 from source (which at this point, you should), you can clone the repository and install from source.
# installing atomate2
>>> git clone https://github.com/orionarcher/atomate2.git
>>> cd atomate2
>>> git branch openff
>>> git checkout openff
>>> git pull origin openff
>>> pip install -e '.[classical_md]'
To test the OpenMM installation, you can run the following command. If you intend to run on a GPU, make sure that the CUDA tests pass.
>>> python -m openmm.testInstallation
If you hit errors importing bson, note that the standalone bson package conflicts with the bson bundled inside pymongo; uninstalling both and reinstalling pymongo resolves it.
>>> pip uninstall pymongo
>>> pip uninstall bson
>>> pip install pymongo
Understanding Atomate2 OpenMM¶
Atomate2 is really just a collection of jobflow workflows relevant to materials science. In all the workflows, we pass our system of interest between different jobs to perform the desired simulation. Representing the intermediate state of a classical molecular dynamics simulation, however, is challenging. While the intermediate representation between stages of a periodic DFT simulation can include just the elements, xyz coordinates, and box vectors, classical molecular dynamics systems must also include velocities and forces. The latter is particularly challenging because all MD engines represent forces differently. Rather than implement our own representation, we use the openff.interchange.Interchange object, which catalogs the necessary system properties and interfaces with a variety of MD engines. This is the object that we pass between stages of a classical MD simulation and it is the starting point of our workflow.
Pouring a Glass of Wine¶
The first job we need to create generates the Interchange object. To specify the system of interest, we give it the SMILES strings, counts, and names (optional) of the molecules we want to include.
from atomate2.openff.core import generate_interchange
mol_specs_dicts = [
{"smiles": "O", "count": 200, "name": "water"},
{"smiles": "CCO", "count": 10, "name": "ethanol"},
{"smiles": "C1=C(C=C(C(=C1O)O)O)C(=O)O", "count": 1, "name": "gallic_acid"},
]
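# the second argument is the target mass density used to pack the simulation box (g/cm^3)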
gallic_interchange_job = generate_interchange(mol_specs_dicts, 1.3)
If you are wondering what arguments are allowed in the dictionaries, check out the create_mol_spec function in the atomate2.openff.utils module. Under the hood, it is called on each mol_spec dict, meaning the code below is functionally identical to the code above.
from atomate2.openff.utils import create_mol_spec
mols_specs = [create_mol_spec(**mol_spec_dict) for mol_spec_dict in mol_specs_dicts]
generate_interchange(mols_specs, 1.3)
In a more complex simulation we might want to scale the ion charges and include custom partial charges. An example with the Gen2 electrolyte is shown below. This yields the elyte_interchange_job object, which we can pass to the next stage of the simulation.
NOTE: Including partial charges for PF6- is actually mandatory here; the built-in partial charge method fails for it.
import numpy as np
from pymatgen.core.structure import Molecule
pf6 = Molecule(
["P", "F", "F", "F", "F", "F", "F"],
[
[0.0, 0.0, 0.0],
[1.6, 0.0, 0.0],
[-1.6, 0.0, 0.0],
[0.0, 1.6, 0.0],
[0.0, -1.6, 0.0],
[0.0, 0.0, 1.6],
[0.0, 0.0, -1.6],
],
)
pf6_charges = np.array([1.34, -0.39, -0.39, -0.39, -0.39, -0.39, -0.39])
mol_specs_dicts = [
{"smiles": "C1COC(=O)O1", "count": 100, "name": "EC"},
{"smiles": "CCOC(=O)OC", "count": 100, "name": "EMC"},
{
"smiles": "F[P-](F)(F)(F)(F)F",
"count": 50,
"name": "PF6",
"partial_charges": pf6_charges,
"geometry": pf6,
"charge_scaling": 0.8,
"charge_method": "RESP",
},
{"smiles": "[Li+]", "count": 50, "name": "Li", "charge_scaling": 0.8},
]
elyte_interchange_job = generate_interchange(mol_specs_dicts, 1.3)
The Basic Simulation¶
To run a production simulation, we will create a production flow, link it to our elyte_interchange_job, and then run both locally.
In jobflow, jobs and flows are created by Makers, which can then be linked into more complex flows. The production maker links together makers for energy minimization, pressure equilibration, annealing, and an NVT simulation. The annealing is itself a flow that links together NVT and temperature-change makers (it uses the anneal_flow method to save us from creating three more jobs manually). When linked up with the generate_interchange job, this yields a production-ready molecular dynamics workflow.
from jobflow import Flow, run_locally
from atomate2.openmm.flows.core import OpenMMFlowMaker
from atomate2.openmm.jobs.core import EnergyMinimizationMaker, NPTMaker, NVTMaker
production_maker = OpenMMFlowMaker(
name="production_flow",
makers=[
EnergyMinimizationMaker(traj_interval=10, state_interval=10),
NPTMaker(n_steps=100),
OpenMMFlowMaker.anneal_flow(n_steps=150),
NVTMaker(n_steps=100),
],
)
production_flow = production_maker.make(
elyte_interchange_job.output.interchange,
prev_dir=elyte_interchange_job.output.dir_name,
)
run_locally(
Flow([elyte_interchange_job, production_flow]), root_dir="./tutorial_system"
)
When the above code is executed, you should expect to see something like this:
/tutorial_system
├── state.csv
├── state2.csv
├── state3.csv
├── state4.csv
├── state5.csv
├── state6.csv
├── taskdoc.json
├── trajectory.dcd
├── trajectory2.dcd
├── trajectory3.dcd
├── trajectory4.dcd
├── trajectory5.dcd
└── trajectory6.dcd
We see that each job saved a separate state and trajectory file. There are six because the anneal flow creates three sub-jobs, so the flow runs six jobs in total: the energy minimization, the NPT equilibration, the three annealing stages, and the final NVT run. We also see a taskdoc.json file, which contains the metadata for the entire workflow. This is needed when we later want to do downstream analysis in emmet.
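If you just want a quick look at the results without setting up a database, you can load the task document straight from disk. This is a minimal sketch; the exact location of taskdoc.json and the fields it contains depend on your atomate2 and emmet versions.
from monty.serialization import loadfn

# load the serialized task document written at the end of the workflow
taskdoc = loadfn("./tutorial_system/taskdoc.json")

# peek at the top-level contents (a dict or an MSONable document, depending on version)
print(taskdoc.keys() if isinstance(taskdoc, dict) else taskdoc)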
Configuring the Simulation¶
All OpenMM jobs, i.e. anything in atomate2.openmm.jobs, inherit from the BaseOpenMMMaker class. BaseOpenMMMaker is highly configurable: you can change the timestep, temperature, reporting frequencies, output types, and a range of other properties. See the docstring for the full list of options.
Note that when instantiating the production maker above, we only set the traj_interval and state_interval once, inside EnergyMinimizationMaker. This is a key feature: all makers will inherit attributes from the previous maker if they are not explicitly reset. This allows you to set the timestep once and have it apply to all stages of the simulation. More explicitly, the value inheritance is as follows: 1) any explicitly set value, 2) the value from the previous maker, 3) the default value, shown below.
from atomate2.openmm.jobs.base import OPENMM_MAKER_DEFAULTS
OPENMM_MAKER_DEFAULTS
Perhaps we want to record a trajectory with velocities but only for the final NVT run.
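Here is a minimal sketch of how that could look, reusing the makers imported above and assuming the report_velocities and traj_file_type options of BaseOpenMMMaker (writing an HDF5 trajectory typically requires mdtraj to be installed):
production_maker = OpenMMFlowMaker(
    name="production_flow",
    makers=[
        EnergyMinimizationMaker(traj_interval=10, state_interval=10),
        NPTMaker(n_steps=100),
        OpenMMFlowMaker.anneal_flow(n_steps=150),
        # only the final NVT job reports velocities, written to an HDF5 trajectory
        NVTMaker(n_steps=100, traj_file_type="h5", report_velocities=True),
    ],
)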
Running with Databases¶
Before trying this, you should have a basic understanding of jobflow and Stores.
To log OpenMM results to a database, you'll need to set up both a MongoStore, for taskdocs, and blob storage, for trajectories. Here, I'll show you the correct jobflow.yaml file to use the MongoDB and MinIO S3 storage provided by NERSC. To set this up, you'll need to contact NERSC to get accounts on their MongoDB and MinIO services. Then you can follow the instructions in the Stores tutorial to link jobflow to your databases. Your jobflow.yaml should look like this:
JOB_STORE:
  docs_store:
    type: MongoStore
    database: DATABASE
    collection_name: atomate2_docs # suggested
    host: mongodb05.nersc.gov
    port: 27017
    username: USERNAME
    password: PASSWORD
  additional_stores:
    data:
      type: S3Store
      index:
        type: MongoStore
        database: DATABASE
        collection_name: atomate2_blobs_index # suggested
        host: mongodb05.nersc.gov
        port: 27017
        username: USERNAME
        password: PASSWORD
        key: blob_uuid
      bucket: oac
      s3_profile: oac
      s3_resource_kwargs:
        verify: false
      endpoint_url: https://next-gen-minio.materialsproject.org/
      key: blob_uuid
NOTE: This can work with any MongoDB and S3 storage, not just NERSC’s.
Rather than use jobflow.yaml, you could also create the stores in Python and pass them to the run_locally function. This is shown below for completeness, but the prior method is usually recommended.
from jobflow import run_locally, JobStore
from maggma.stores import MongoStore, S3Store, MemoryStore
md_doc_store = MongoStore(
username="USERNAME",
password="PASSWORD",
database="DATABASE",
collection_name="atomate2_docs", # suggested
host="mongodb05.nersc.gov",
port=27017,
)
md_blob_index = MongoStore(
username="USERNAME",
password="PASSWORD",
database="DATABASE",
collection_name="atomate2_blobs_index", # suggested
host="mongodb05.nersc.gov",
port=27017,
key="blob_uuid",
)
md_blob_store = S3Store(
index=md_blob_index,
bucket="BUCKET",
s3_profile="PROFILE",
endpoint_url="https://next-gen-minio.materialsproject.org",
key="blob_uuid",
)
wf = [] # set up whatever workflow you'd like to run
# run the flow with our custom store
run_locally(
wf,
store=JobStore(md_doc_store, additional_stores={"data": md_blob_store}),
ensure_success=True,
)
Running on GPU(s)¶
Running on a GPU is nearly as simple as running on a CPU. The only difference is that you need to set platform_name to "CUDA" and pass the DeviceIndex of the GPU you want to use via the platform_properties argument of the EnergyMinimizationMaker; the later makers inherit these settings.
production_maker = OpenMMFlowMaker(
name="production_flow",
makers=[
EnergyMinimizationMaker(
platform_name="CUDA",
platform_properties={"DeviceIndex": "0"},
),
NPTMaker(n_steps=100),
OpenMMFlowMaker.anneal_flow(n_steps=150),
NVTMaker(n_steps=100),
],
)
To run on a system with multiple GPUs, the ‘DeviceIndex’ can be changed to a different number for each job.
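For example, here is a sketch that assumes a machine with two GPUs and relies on the attribute inheritance described earlier: the CUDA platform set on the first maker carries through, and only the final NVT job is pointed at the second device.
production_maker = OpenMMFlowMaker(
    name="production_flow",
    makers=[
        EnergyMinimizationMaker(
            platform_name="CUDA",
            platform_properties={"DeviceIndex": "0"},  # first GPU
        ),
        NPTMaker(n_steps=100),  # inherits CUDA and DeviceIndex "0"
        OpenMMFlowMaker.anneal_flow(n_steps=150),
        # run the final NVT job on the second GPU
        NVTMaker(n_steps=100, platform_properties={"DeviceIndex": "1"}),
    ],
)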