from jobflow import JobStore, run_locally
from maggma.stores import MemoryStore
from mock_vasp import TEST_DIR, mock_vasp
from monty.json import MontyDecoder
from pymatgen.core import Structure
from pymatgen.io.vasp import Chgcar
from atomate2.vasp.jobs.core import StaticMaker
job_store = JobStore(MemoryStore(), additional_stores={"data": MemoryStore()})
si_structure = Structure.from_file(TEST_DIR / "structures" / "Si.cif")
ref_paths = {"static": "Si_band_structure/static"}
Using Blob Storage
While most of the output data from atomate2 is serialized and stored in a MongoDB database, some objects exceed the 16 MB limit for MongoDB documents and must be placed into blob storage. Objects like the electronic charge density (Chgcar) are routinely larger than this limit and require special treatment. The way jobflow deals with these objects is shown below:
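from jobflow import job
from pymatgen.io.vasp import Chgcar

@job(data=Chgcar)
def some_job():
    # return a document/dictionary that contains a Chgcar
    chgcar = Chgcar.from_file("CHGCAR")  # placeholder path to a VASP CHGCAR file
    return {"chgcar": chgcar}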
Here, the data=Chgcar argument to the @job decorator indicates that all Chgcar objects will be automatically dispatched to JOB_STORE.additional_stores["data"], which should already be configured in your jobflow.yaml file.
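The dispatch idea can be pictured with a small, standard-library toy model. It is not jobflow's implementation. It only assumes what the text describes: every object of the chosen type is moved to a separate store and replaced in the main document by a lightweight reference carrying a blob_uuid. The names FakeChgcar and dispatch_by_type, the exact shape of the reference dict, and the sample values are hypothetical.

```python
import uuid
from typing import Any


class FakeChgcar:
    """Hypothetical stand-in for a large object such as pymatgen's Chgcar."""

    def __init__(self, grid: list[float]) -> None:
        self.grid = grid


def dispatch_by_type(obj: Any, cls: type, blob_store: dict[str, Any]) -> Any:
    """Recursively move every instance of `cls` into `blob_store`.

    Each moved object is replaced by a reference dict holding its blob_uuid.
    This mimics the idea behind @job(data=Chgcar); it is not jobflow's code.
    """
    if isinstance(obj, cls):
        blob_uuid = str(uuid.uuid4())
        blob_store[blob_uuid] = obj
        return {"blob_uuid": blob_uuid}  # hypothetical reference layout
    if isinstance(obj, dict):
        return {key: dispatch_by_type(value, cls, blob_store) for key, value in obj.items()}
    if isinstance(obj, list):
        return [dispatch_by_type(item, cls, blob_store) for item in obj]
    return obj


# Hypothetical job output: a small scalar plus one large object
blob_store: dict[str, Any] = {}
output = {"energy": -10.8, "vasp_objects": {"chgcar": FakeChgcar([0.1, 0.2])}}
main_doc = dispatch_by_type(output, FakeChgcar, blob_store)
```

Only the small reference ends up in the main document, so the main document's size no longer depends on how large the dispatched object is.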
For more details on how additional_stores works, please check out this example.
atomate2 will automatically dispatch some well-known large objects to the data blob storage.
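A rough lower bound shows why charge densities in particular end up in blob storage. The sketch assumes each grid point is stored as an 8-byte float64 and uses hypothetical FFT grid sizes. MongoDB's document limit is 16 MiB (16,777,216 bytes).

```python
# MongoDB's maximum BSON document size: 16 MiB
MONGO_MAX_BSON_BYTES = 16 * 1024 * 1024


def min_grid_bytes(nx: int, ny: int, nz: int, bytes_per_value: int = 8) -> int:
    """Lower bound on the raw size of one real-space grid of float64 values.

    Serialized documents are larger: BSON adds a type byte and an index key
    to every array element, and a spin-polarized CHGCAR carries a second grid.
    """
    return nx * ny * nz * bytes_per_value


# Hypothetical grid sizes, chosen only for illustration
for shape in [(60, 60, 60), (200, 200, 200)]:
    size = min_grid_bytes(*shape)
    print(shape, size, size > MONGO_MAX_BSON_BYTES)
```

A 200×200×200 grid already needs at least 64,000,000 bytes, roughly 3.8 times the limit, before any serialization overhead is counted.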
A full list of the objects that will be automatically dispatched to blob storage can be found here:
A common use case for blob storage is storing volumetric data from VASP outputs. Storage of volumetric data is turned off by default, but it can be enabled for specific files by setting task_document_kwargs on any child class of BaseVaspMaker.
For example, to store the CHGCAR file, you would set task_document_kwargs in StaticMaker as follows:
static_maker = StaticMaker(task_document_kwargs={"store_volumetric_data": ("chgcar",)})
Note that store_volumetric_data must be given a sequence (such as the tuple above) of valid VaspObject enum values for the data to be stored. The valid values are defined in the VaspObject enum:
class VaspObject(ValueEnum):
    """Types of VASP data objects."""

    BANDSTRUCTURE = "bandstructure"
    DOS = "dos"
    CHGCAR = "chgcar"
    AECCAR0 = "aeccar0"
    AECCAR1 = "aeccar1"
    AECCAR2 = "aeccar2"
    TRAJECTORY = "trajectory"
    ELFCAR = "elfcar"
    WAVECAR = "wavecar"
    LOCPOT = "locpot"
    OPTIC = "optic"
    PROCAR = "procar"
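Because the enum is string-valued, plain strings such as "chgcar" can be matched to its members. The sketch below uses a hypothetical, trimmed-down str-based enum (VaspObjectSketch) in place of the real VaspObject, together with a hypothetical helper, validate_volumetric. It illustrates how a tuple like ("chgcar",) can be checked, with an unknown value rejected. atomate2's actual validation may work differently.

```python
from enum import Enum


class VaspObjectSketch(str, Enum):
    """Hypothetical, trimmed-down stand-in for atomate2's VaspObject."""

    CHGCAR = "chgcar"
    AECCAR0 = "aeccar0"
    ELFCAR = "elfcar"
    LOCPOT = "locpot"


def validate_volumetric(names: tuple[str, ...]) -> tuple[VaspObjectSketch, ...]:
    """Convert each name to an enum member; unknown names raise ValueError."""
    return tuple(VaspObjectSketch(name) for name in names)


requested = validate_volumetric(("chgcar", "locpot"))

try:
    validate_volumetric(("chgcarr",))  # typo: not a valid value
except ValueError as err:
    print(f"rejected: {err}")
```

Converting to enum members up front surfaces a misspelled name immediately, instead of letting it silently skip the file.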
Using the static_maker, we can create a job and execute it.
# create the job
job = static_maker.make(si_structure)

# run the job in a mock vasp environment
# make sure to send the results to the temporary job store
with mock_vasp(ref_paths=ref_paths) as mf:
    responses = run_locally(
        job,
        create_folders=True,
        ensure_success=True,
        store=job_store,
        raise_immediately=True,
    )
Once the job completes, you can retrieve the full task document along with the serialized Chgcar object from blob storage and reconstruct the Chgcar object using the load=True flag, as shown below.
with job_store as js:
    result = js.get_output(job.uuid, load=True)

chgcar = MontyDecoder().process_decoded(result["vasp_objects"]["chgcar"])
if not isinstance(chgcar, Chgcar):
    raise TypeError(f"{type(chgcar)=}")
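Conceptually, load=True swaps each reference back for the stored object. The toy function below sketches that substitution with plain dictionaries. It is not jobflow's code, and both the reference layout (a dict keyed by blob_uuid) and the sample data are assumptions for illustration.

```python
from typing import Any


def resolve_references(doc: Any, blob_store: dict[str, Any]) -> Any:
    """Replace every {"blob_uuid": ...} reference with the stored object.

    A toy version of what get_output(..., load=True) achieves; not jobflow's code.
    """
    if isinstance(doc, dict):
        if "blob_uuid" in doc:
            return blob_store[doc["blob_uuid"]]
        return {key: resolve_references(value, blob_store) for key, value in doc.items()}
    if isinstance(doc, list):
        return [resolve_references(item, blob_store) for item in doc]
    return doc


# Hypothetical data: a task document holding only a reference, plus the blob store
blob_store = {"1234-abcd": {"grid": [0.1, 0.2, 0.3]}}
task_doc = {"energy": -10.8, "vasp_objects": {"chgcar": {"blob_uuid": "1234-abcd"}}}
loaded = resolve_references(task_doc, blob_store)
```

Skipping this substitution leaves only the lightweight references in the task document, which corresponds to the load=False behavior.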
However, if the object is too big to keep around while you are exploring the data structure, you can use the default load=False flag and load only a reference to the object. This lets you explore the data structure without loading the object into memory.
with job_store as js:
    result_no_obj = js.get_output(job.uuid)

result_no_obj["vasp_objects"]
Then you can query for the object at any time using the blob_uuid.
search_data = result_no_obj["vasp_objects"]["chgcar"]
with job_store.additional_stores["data"] as js:
    blob_data = js.query_one(criteria={"blob_uuid": search_data["blob_uuid"]})
Then we can deserialize the object again from the data subfield of the blob query result.
chgcar2 = MontyDecoder().process_decoded(blob_data["data"])
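The last two steps, looking up a blob by its blob_uuid and rebuilding an object from a serialized dict, can be mimicked with the standard library. Everything here is a hypothetical stand-in. query_one imitates simple equality matching rather than maggma's query engine. decode uses a hand-written class registry keyed on "@class", whereas MontyDecoder resolves classes from the "@module" and "@class" fields.

```python
from typing import Any, Optional


class FakeChgcar:
    """Hypothetical class following the as_dict/from_dict (MSONable-style) pattern."""

    def __init__(self, grid: list[float]) -> None:
        self.grid = grid

    def as_dict(self) -> dict[str, Any]:
        return {"@class": "FakeChgcar", "grid": self.grid}

    @classmethod
    def from_dict(cls, d: dict[str, Any]) -> "FakeChgcar":
        return cls(d["grid"])


REGISTRY = {"FakeChgcar": FakeChgcar}  # hypothetical class lookup table


def query_one(docs: list[dict[str, Any]], criteria: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Return the first document whose fields equal all criteria, else None."""
    for doc in docs:
        if all(doc.get(key) == value for key, value in criteria.items()):
            return doc
    return None


def decode(obj: Any) -> Any:
    """Rebuild an object from a dict tagged with a known "@class"."""
    if isinstance(obj, dict) and obj.get("@class") in REGISTRY:
        return REGISTRY[obj["@class"]].from_dict(obj)
    return obj


# Hypothetical contents of the "data" store: one blob document per stored object
data_store = [{"blob_uuid": "1234-abcd", "data": FakeChgcar([0.1, 0.2]).as_dict()}]
reference = {"blob_uuid": "1234-abcd"}

blob_data = query_one(data_store, {"blob_uuid": reference["blob_uuid"]})
chgcar2 = decode(blob_data["data"]) if blob_data is not None else None
```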