from jobflow import JobStore, run_locally
from maggma.stores import MemoryStore
from mock_vasp import TEST_DIR, mock_vasp
from monty.json import MontyDecoder
from pymatgen.core import Structure
from pymatgen.io.vasp import Chgcar

from atomate2.vasp.flows.core import StaticMaker

job_store = JobStore(MemoryStore(), additional_stores={"data": MemoryStore()})
si_structure = Structure.from_file(TEST_DIR / "structures" / "Si.cif")
ref_paths = {"static": "Si_band_structure/static"}

Using Blob Storage

While most of the output data from atomate2 is serialized and stored in a MongoDB database, some objects exceed the 16 MB limit for MongoDB documents and must be placed in blob storage. Objects such as the electronic charge density (Chgcar) are routinely larger than this limit and require special treatment. jobflow's method of dealing with these objects is shown below:

from jobflow import job


@job(data=Chgcar)
def some_job(chgcar):
    # return a document/dictionary that contains a Chgcar
    return {"chgcar": chgcar}

where the argument to the @job decorator indicates that all Chgcar objects will be automatically dispatched to

JOB_STORE.additional_stores["data"]

which should already be configured in your jobflow.yaml file.

For more details on how additional_stores works, please check out this example.
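To make the dispatch idea concrete, here is a minimal sketch that uses only the standard library. It is not jobflow's implementation: the Volume class, the dispatch helper, the plain dictionary used as a blob store, and the "store" key in the reference are hypothetical stand-ins chosen for illustration. Only the blob_uuid and data field names mirror what this page uses later on.

```python
import uuid
from typing import Any


class Volume:
    """Hypothetical stand-in for a large object such as a Chgcar."""

    def __init__(self, values: list[float]) -> None:
        self.values = values


def dispatch(doc: Any, blob_type: type, blob_store: dict[str, dict]) -> Any:
    """Return a copy of doc with blob_type instances replaced by references."""
    if isinstance(doc, blob_type):
        blob_uuid = str(uuid.uuid4())
        # The payload goes to the blob store, keyed by its uuid.
        blob_store[blob_uuid] = {"blob_uuid": blob_uuid, "data": doc}
        # Only a small reference stays in the main document
        # (the "store" key is an assumption made for this sketch).
        return {"blob_uuid": blob_uuid, "store": "data"}
    if isinstance(doc, dict):
        return {
            key: dispatch(value, blob_type, blob_store)
            for key, value in doc.items()
        }
    if isinstance(doc, (list, tuple)):
        return [dispatch(item, blob_type, blob_store) for item in doc]
    return doc


blob_store: dict[str, dict] = {}
output = {"energy": -10.8, "vasp_objects": {"chgcar": Volume([0.1, 0.2, 0.3])}}
main_doc = dispatch(output, Volume, blob_store)
reference = main_doc["vasp_objects"]["chgcar"]
```

After the call, main_doc contains only small values and a reference, while the large payload sits in blob_store under the same blob_uuid. This split is what keeps the main document under the size limit.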

atomate2 will automatically dispatch some well-known large objects to the data blob storage.

A full list of the objects that will be automatically dispatched to blob storage can be found here:

A common use case for object storage is storing volumetric data from VASP outputs. The storage of volumetric data is turned off by default, but it can be enabled for specific files by setting the task_document_kwargs for any child class of BaseVaspMaker. For example, to store the CHGCAR file, you would set the task_document_kwargs in StaticMaker as follows:

static_maker = StaticMaker(task_document_kwargs={"store_volumetric_data": ("chgcar",)})

Note that store_volumetric_data must be given a sequence of valid VaspObject enum values for the data to be stored. The valid values are listed below:

class VaspObject(ValueEnum):
    """Types of VASP data objects."""

    BANDSTRUCTURE = "bandstructure"
    DOS = "dos"
    CHGCAR = "chgcar"
    AECCAR0 = "aeccar0"
    AECCAR1 = "aeccar1"
    AECCAR2 = "aeccar2"
    TRAJECTORY = "trajectory"
    ELFCAR = "elfcar"
    WAVECAR = "wavecar"
    LOCPOT = "locpot"
    OPTIC = "optic"
    PROCAR = "procar"

Using the static_maker, we can create a job and execute it.

# create the job
job = static_maker.make(si_structure)
# run the job in a mock vasp environment
# make sure to send the results to the temporary job store
with mock_vasp(ref_paths=ref_paths) as mf:
    responses = run_locally(
        job,
        create_folders=True,
        ensure_success=True,
        store=job_store,
        raise_immediately=True,
    )

Once the job completes, you can retrieve the full task document along with the serialized Chgcar object from the blob storage and reconstruct the Chgcar object using the load=True flag as shown below.

with job_store as js:
    result = js.get_output(job.uuid, load=True)

chgcar = MontyDecoder().process_decoded(result["vasp_objects"]["chgcar"])
if not isinstance(chgcar, Chgcar):
    raise TypeError(f"{type(chgcar)=}")

However, if the object is too big to keep in memory while you explore the data structure, you can use the default load=False flag and load only a reference to the object. This lets you explore the data structure without loading the object itself.

with job_store as js:
    result_no_obj = js.get_output(job.uuid)
result_no_obj["vasp_objects"]

Then you can query for the object at any time using the blob_uuid.

search_data = result_no_obj["vasp_objects"]["chgcar"]
with job_store.additional_stores["data"] as js:
    blob_data = js.query_one(criteria={"blob_uuid": search_data["blob_uuid"]})

Then we can deserialize the object again from the data subfield of the blob query result.

chgcar2 = MontyDecoder().process_decoded(blob_data["data"])
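When a task document holds several large objects, it can be useful to list every reference before deciding which ones to query. The following sketch walks a nested document and collects each blob_uuid together with its location. The find_blob_references helper and the example document are hypothetical. The sketch assumes that a reference is any dictionary with a blob_uuid key, as in the query above, and the "store" key is included only for illustration.

```python
from typing import Any, Iterator


def find_blob_references(doc: Any, path: tuple = ()) -> Iterator[tuple[tuple, str]]:
    """Yield (path, blob_uuid) for every reference dict nested inside doc."""
    if isinstance(doc, dict):
        if "blob_uuid" in doc:
            # A reference: report it and do not descend further.
            yield path, doc["blob_uuid"]
            return
        for key, value in doc.items():
            yield from find_blob_references(value, path + (key,))
    elif isinstance(doc, (list, tuple)):
        for index, item in enumerate(doc):
            yield from find_blob_references(item, path + (index,))


# Hypothetical document shaped like an output fetched with load=False.
example = {
    "output": {"energy": -10.8},
    "vasp_objects": {
        "chgcar": {"blob_uuid": "uuid-1", "store": "data"},
        "locpot": {"blob_uuid": "uuid-2", "store": "data"},
    },
}
references = dict(find_blob_references(example))
```

Each key in references is the path to a large object inside the document, and each value is the blob_uuid that can be passed as the criteria of a query_one call on the data store.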