fireworks.features package

Submodules

fireworks.features.background_task module

class fireworks.features.background_task.BackgroundTask(tasks, num_launches=0, sleep_time=60, run_on_finish=False)

Bases: fireworks.utilities.fw_serializers.FWSerializable, object

__init__(tasks, num_launches=0, sleep_time=60, run_on_finish=False)
Parameters
  • tasks ([Firetask]) – a list of Firetasks to perform

  • num_launches (int) – the total number of times to run the process (0=infinite)

  • sleep_time (int) – sleep time in seconds between background runs

  • run_on_finish (bool) – always run this task upon completion of Firework
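
The way num_launches and sleep_time interact can be pictured as a simple loop. The following is a minimal standard-library sketch of that idea, not the FireWorks implementation; `run_in_background`, `action`, and `stop_event` are hypothetical names introduced only for illustration.

```python
import threading


def run_in_background(action, num_launches=0, sleep_time=60, stop_event=None):
    """Call `action` repeatedly, pausing `sleep_time` seconds between runs.

    num_launches=0 means "keep going until stop_event is set".
    Hypothetical helper; not part of the FireWorks API.
    """
    stop_event = stop_event or threading.Event()
    runs = 0
    while not stop_event.is_set() and (num_launches == 0 or runs < num_launches):
        action()
        runs += 1
        # Event.wait doubles as an interruptible sleep.
        stop_event.wait(sleep_time)
    return runs


calls = []
completed = run_in_background(lambda: calls.append("tick"), num_launches=3, sleep_time=0)
```

With num_launches=0 the loop only ends when something else sets the stop event, which is why the sketch takes one as an argument.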

classmethod from_dict(*args, **kwargs)
to_dict(*args, **kwargs)

fireworks.features.dupefinder module

class fireworks.features.dupefinder.DupeFinderBase

Bases: fireworks.utilities.fw_serializers.FWSerializable

This serves as an abstract class for implementing duplicate finders.

__init__()

classmethod from_dict(m_dict)
query(spec)

Given a spec, returns a database query that gives potential candidates for duplicated Fireworks.

Parameters

spec (dict) – spec to check for duplicates

to_dict(*args, **kwargs)
verify(spec1, spec2)

Checks whether two specs are identical enough to be considered duplicates. Returns True if they are duplicates. Note that implementing this method might slow FireWorks performance somewhat, so it is best to do as much of the filtering as possible within the “query” method.

Parameters
  • spec1 (dict) –

  • spec2 (dict) –

Returns

bool
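
The division of labour between query and verify can be sketched as follows. This is an illustrative stand-alone class, not a real FireWorks duplicate finder: it deliberately does not subclass DupeFinderBase so that it runs without FireWorks installed. The spec keys "formula" and "energy", the class name, and the "spec.formula" query path are all assumptions made for the example.

```python
import math


class ToleranceDupeFinderSketch:
    """Illustrates the query/verify split (hypothetical; not a FireWorks class)."""

    def __init__(self, rel_tol=1e-6):
        self.rel_tol = rel_tol

    def query(self, spec):
        # Cheap, database-side filter: candidates must share the same formula.
        return {"spec.formula": spec["formula"]}

    def verify(self, spec1, spec2):
        # Finer, Python-side check applied only to the remaining candidates.
        return math.isclose(spec1["energy"], spec2["energy"], rel_tol=self.rel_tol)


finder = ToleranceDupeFinderSketch()
spec_a = {"formula": "Fe2O3", "energy": -10.0}
spec_b = {"formula": "Fe2O3", "energy": -10.0000001}
spec_c = {"formula": "Fe2O3", "energy": -9.5}
```

Here query narrows the candidate set with an exact match, and verify handles only the fuzzy comparison that a database query cannot easily express.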

fireworks.features.fw_report module

class fireworks.features.fw_report.FWReport(lpad)

Bases: object

__init__(lpad)

Parameters

lpad (LaunchPad) –

get_stats(coll='fireworks', interval='days', num_intervals=5, additional_query=None)

Compile statistics of completed Fireworks/Workflows for the past <num_intervals> <interval>, e.g. the past 5 days.

Parameters
  • coll (str) – collection, either “fireworks”, “workflows”, or “launches”

  • interval (str) – one of “minutes”, “hours”, “days”, “months”, “years”

  • num_intervals (int) – number of intervals to go back in time from present moment

  • additional_query (dict) – additional constraints on reporting

Returns

list, with each item being a dictionary of statistics for a given interval

static get_stats_str(decorated_stat_list)

Convert the list of stats from FWReport.get_stats() to a string representation for viewing.

Parameters

decorated_stat_list ([dict]) –

Returns

str

plot_stats(coll='fireworks', interval='days', num_intervals=5, states=None, style='bar', **kwargs)

Makes a chart with the summary data

Parameters
  • coll (str) – collection, either “fireworks”, “workflows”, or “launches”

  • interval (str) – one of “minutes”, “hours”, “days”, “months”, “years”

  • num_intervals (int) – number of intervals to go back in time from present moment

  • states ([str]) – states to include in the plot; defaults to all states. Note that this also specifies the order of stacking

  • style (str) – style of plot to generate, can either be ‘bar’ or ‘fill’

Returns

matplotlib plot module

fireworks.features.introspect module

class fireworks.features.introspect.Introspector(lpad)

Bases: object

__init__(lpad)
Parameters

lpad (LaunchPad) –

introspect_fizzled(coll='fws', rsort=True, threshold=10, limit=100)
static print_report(table, coll)
fireworks.features.introspect.collect_stats(list_keys, filter_truncated=True)

Turns a list of keys (from flatten_to_keys) into a dict of <str>:count, i.e. counts the number of times each key appears.

Parameters
  • list_keys

  • filter_truncated (bool) –

Returns

dict
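
The counting step can be sketched with the standard library alone. This is not the library code: `count_keys_sketch` is a hypothetical name, and both the meaning of filter_truncated (dropping keys that carry a truncation marker) and the marker string itself are assumptions.

```python
from collections import Counter


def count_keys_sketch(list_keys, filter_truncated=True, marker="<TRUNCATED_OBJECT>"):
    """Count how often each flattened key occurs (illustrative, not the library code)."""
    # Assumption: "truncated" keys are those containing the marker string.
    kept = (k for k in list_keys if not (filter_truncated and marker in k))
    return dict(Counter(kept))


keys = ["a.b:1", "c:x", "a.b:1", "d:<TRUNCATED_OBJECT>"]
counts = count_keys_sketch(keys)
counts_unfiltered = count_keys_sketch(keys, filter_truncated=False)
```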

fireworks.features.introspect.compare_stats(statsdict1, numsamples1, statsdict2, numsamples2, threshold=5)
fireworks.features.introspect.flatten_to_keys(curr_doc, curr_recurs=1, max_recurs=2)

Converts a dictionary into a list of keys, with string values “key1.key2:val”

Parameters
  • curr_doc

  • curr_recurs (int) –

  • max_recurs (int) –

Returns

[str]
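
To make the “key1.key2:val” format concrete, here is a simplified sketch of flattening a nested dictionary with a recursion limit. It is not the library implementation: `flatten_keys_sketch` is a hypothetical name, list values are not treated specially, and the truncation marker is an assumption.

```python
def flatten_keys_sketch(doc, prefix="", depth=1, max_depth=2):
    """Flatten a nested dict into 'key1.key2:val' strings (illustrative only)."""
    if isinstance(doc, dict):
        if depth > max_depth:
            # Too deep: record that something was here without descending further.
            return [f"{prefix}:<TRUNCATED_OBJECT>"]
        keys = []
        for k, v in doc.items():
            child = f"{prefix}.{k}" if prefix else str(k)
            keys.extend(flatten_keys_sketch(v, child, depth + 1, max_depth))
        return keys
    # Leaf value: join the dotted path and the value with a colon.
    return [f"{prefix}:{doc}"]


flat = flatten_keys_sketch({"a": {"b": 1}, "c": "x"})
deep = flatten_keys_sketch({"a": {"b": {"c": 1}}})
```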

fireworks.features.multi_launcher module

fireworks.features.multi_launcher.launch_multiprocess(launchpad, fworker, loglvl, nlaunches, num_jobs, sleep_time, total_node_list=None, ppn=1, timeout=None, exclude_current_node=False, local_redirect=False)

Launch the jobs in the job packing mode.

Parameters
  • launchpad (LaunchPad) –

  • fworker (FWorker) –

  • loglvl (str) – level at which to output logs

  • nlaunches (int) – 0 means ‘until completion’, -1 or “infinite” means to loop forever

  • num_jobs (int) – number of sub jobs

  • sleep_time (int) – secs to sleep between rapidfire loop iterations

  • total_node_list ([str]) – contents of NODEFILE (doesn’t affect execution)

  • ppn (int) – processors per node (doesn’t affect execution)

  • timeout (int) – # of seconds after which to stop the rapidfire process

  • exclude_current_node (bool) – don’t use the script launching node as a compute node

  • local_redirect (bool) – redirect standard input and output to local file

fireworks.features.multi_launcher.ping_multilaunch(port, stop_event)

A single manager to ping all launches during multiprocess launches

Parameters
  • port (int) – Listening port number of the DataServer

  • stop_event (threading.Event) – stop event

fireworks.features.multi_launcher.rapidfire_process(fworker, nlaunches, sleep, loglvl, port, node_list, sub_nproc, timeout, running_ids_dict, local_redirect)

Initializes shared data with multiprocessing parameters and starts a rapidfire.

Parameters
  • fworker (FWorker) – object

  • nlaunches (int) – 0 means ‘until completion’, -1 or “infinite” means to loop forever

  • sleep (int) – secs to sleep between rapidfire loop iterations

  • loglvl (str) – level at which to output logs to stdout

  • port (int) – Listening port number of the shared object manager

  • password (str) – security password to access the server

  • node_list ([str]) – computer node list

  • sub_nproc (int) – number of processors of the sub job

  • timeout (int) – # of seconds after which to stop the rapidfire process

  • local_redirect (bool) – redirect standard input and output to local file

fireworks.features.multi_launcher.split_node_lists(num_jobs, total_node_list=None, ppn=24)

Parse node list and processor list from nodefile contents

Parameters
  • num_jobs (int) – number of sub jobs

  • total_node_list (list of str) – the node list of the whole large job

  • ppn (int) – number of processors per node

Returns

(([int],[int])) the node list and processor list for each job
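
One plausible way to divide a node list among sub jobs is sketched below. It follows the documented signature only; the even-division requirement, the de-duplication of repeated host names, and the behaviour when no node list is given are assumptions, and `split_nodes_sketch` is a hypothetical name.

```python
def split_nodes_sketch(num_jobs, total_node_list=None, ppn=24):
    """Divide nodes evenly among sub jobs (hypothetical; not the library code)."""
    if not total_node_list:
        # No node file: assume each sub job simply gets `ppn` processors.
        return [None] * num_jobs, [ppn] * num_jobs
    # Node files often repeat a host once per core, so de-duplicate first.
    nodes = sorted(set(total_node_list))
    if len(nodes) % num_jobs != 0:
        raise ValueError(f"{len(nodes)} nodes cannot be divided evenly among {num_jobs} jobs")
    per_job = len(nodes) // num_jobs
    node_lists = [nodes[i:i + per_job] for i in range(0, len(nodes), per_job)]
    return node_lists, [per_job * ppn] * num_jobs


node_lists, nproc_list = split_nodes_sketch(2, ["n1", "n1", "n2", "n3", "n4"], ppn=4)
```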

fireworks.features.multi_launcher.start_rockets(fworker, nlaunches, sleep, loglvl, port, node_lists, sub_nproc_list, timeout=None, running_ids_dict=None, local_redirect=False)

Create each sub job and start a rocket launch in each one

Parameters
  • fworker (FWorker) – object

  • nlaunches (int) – 0 means ‘until completion’, -1 or “infinite” means to loop forever

  • sleep (int) – secs to sleep between rapidfire loop iterations

  • loglvl (str) – level at which to output logs to stdout

  • port (int) – Listening port number

  • node_lists ([str]) – computer node list

  • sub_nproc_list ([int]) – list of the number of processors for each sub job

  • timeout (int) – # of seconds after which to stop the rapidfire process

  • running_ids_dict (dict) – dict shared between processes to record IDs

  • local_redirect (bool) – redirect standard input and output to local file

Returns

([multiprocessing.Process]) all the created processes

fireworks.features.stats module

class fireworks.features.stats.FWStats(lpad)

Bases: object

__init__(lpad)

Object to get Fireworks running stats from a LaunchPad.

Parameters

lpad (LaunchPad) – A LaunchPad object that manages the Fireworks database

get_daily_completion_summary(query_start=None, query_end=None, query=None, time_field='time_end', **args)

Get daily summary of fireworks for a specified time range.

Parameters
  • query_start (str) – The start time (inclusive) to query in isoformat (YYYY-MM-DDTHH:MM:SS.mmmmmm). Default is 30 days before current time.

  • query_end (str) – The end time (exclusive) to query in isoformat (YYYY-MM-DDTHH:MM:SS.mmmmmm). Default is current time.

  • query (dict) – Additional Pymongo queries to filter entries for process.

  • time_field (str) – The field to query time range. Default is “time_end”.

  • args (dict) – Time difference to calculate query_start from query_end. Accepts arguments in python datetime.timedelta function. args and query_start can not be given at the same time. Default is 30 days.

Returns

(list) A summary of daily fireworks stats for the specified time range.

get_fireworks_summary(query_start=None, query_end=None, query=None, time_field='updated_on', **args)

Get fireworks summary for a specified time range.

Parameters
  • query_start (str) – The start time (inclusive) to query in isoformat (YYYY-MM-DDTHH:MM:SS.mmmmmm). Default is 30 days before current time.

  • query_end (str) – The end time (exclusive) to query in isoformat (YYYY-MM-DDTHH:MM:SS.mmmmmm). Default is current time.

  • query (dict) – Additional Pymongo queries to filter entries for process.

  • time_field (str) – The field to query time range. Default is “updated_on”.

  • args (dict) – Time difference to calculate query_start from query_end. Accepts arguments in python datetime.timedelta function. args and query_start can not be given at the same time. Default is 30 days.

Returns

(list) A summary of fireworks stats for the specified time range.
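
The time-range rules shared by the summary methods (inclusive start, exclusive end, a 30-day default, and timedelta keyword arguments as an alternative to query_start) can be sketched as below. This is not FWStats code: `resolve_time_window` is a hypothetical helper, and the use of naive UTC for "current time" and of ISO strings in a Mongo-style `$gte`/`$lt` filter are assumptions.

```python
from datetime import datetime, timedelta, timezone


def resolve_time_window(query_start=None, query_end=None, time_field="updated_on", **args):
    """Build a half-open [start, end) time filter (hypothetical helper, not FWStats code)."""
    if query_start and args:
        raise ValueError("query_start and timedelta arguments cannot both be given")
    if query_end:
        end = datetime.fromisoformat(query_end)
    else:
        # Assumption: "current time" means naive UTC.
        end = datetime.now(timezone.utc).replace(tzinfo=None)
    if query_start:
        start = datetime.fromisoformat(query_start)
    else:
        # Fall back to the documented 30-day default when no timedelta kwargs are given.
        start = end - (timedelta(**args) if args else timedelta(days=30))
    # $gte makes the start inclusive, $lt makes the end exclusive.
    return {time_field: {"$gte": start.isoformat(), "$lt": end.isoformat()}}


window = resolve_time_window(query_end="2024-01-31T00:00:00.000000", days=7)
default_window = resolve_time_window(query_end="2024-01-31T00:00:00")
```

Passing `days=7` here plays the role of the documented `**args`: any keyword accepted by datetime.timedelta would work the same way.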

get_launch_summary(query_start=None, query_end=None, time_field='time_end', query=None, runtime_stats=False, include_ids=False, **args)

Get launch summary for a specified time range.

Parameters
  • query_start (str) – The start time (inclusive) to query in isoformat (YYYY-MM-DDTHH:MM:SS.mmmmmm). Default is 30 days before current time.

  • query_end (str) – The end time (exclusive) to query in isoformat (YYYY-MM-DDTHH:MM:SS.mmmmmm). Default is current time.

  • time_field (str) – The field to query time range. Default is “time_end”.

  • query (dict) – Additional Pymongo queries to filter entries for process.

  • runtime_stats (bool) – Whether to return runtime stats. Default is False.

  • include_ids (bool) – Whether to return fw_ids. Default is False.

  • args (dict) – Time difference to calculate query_start from query_end. Accepts arguments in python datetime.timedelta function. args and query_start can not be given at the same time. Default is 30 days.

Returns

(list) A summary of launch stats for the specified time range.

get_workflow_summary(query_start=None, query_end=None, query=None, time_field='updated_on', **args)

Get workflow summary for a specified time range.

Parameters
  • query_start (str) – The start time (inclusive) to query in isoformat (YYYY-MM-DDTHH:MM:SS.mmmmmm). Default is 30 days before current time.

  • query_end (str) – The end time (exclusive) to query in isoformat (YYYY-MM-DDTHH:MM:SS.mmmmmm). Default is current time.

  • query (dict) – Additional Pymongo queries to filter entries for process.

  • time_field (str) – The field to query time range. Default is “updated_on”.

  • args (dict) – Time difference to calculate query_start from query_end. Accepts arguments in python datetime.timedelta function. args and query_start can not be given at the same time. Default is 30 days.

Returns

(list) A summary of workflow stats for the specified time range.

group_fizzled_fireworks(group_by, query_start=None, query_end=None, query=None, include_ids=False, **args)

Group fizzled fireworks for a specified time range by a specified key.

Parameters
  • group_by (str) – Database field used to group fireworks items.

  • query_start (str) – The start time (inclusive) to query in isoformat (YYYY-MM-DDTHH:MM:SS.mmmmmm). Default is 30 days before current time.

  • query_end (str) – The end time (exclusive) to query in isoformat (YYYY-MM-DDTHH:MM:SS.mmmmmm). Default is current time.

  • query (dict) – Additional Pymongo queries to filter entries for process.

  • include_ids (bool) – Whether to return fw_ids. Default is False.

  • args (dict) – Time difference to calculate query_start from query_end. Accepts arguments in python datetime.timedelta function. args and query_start can not be given at the same time. Default is 30 days.

Returns

(list) A summary of fizzled fireworks grouped by the specified key.

identify_catastrophes(error_ratio=0.01, query_start=None, query_end=None, query=None, time_field='time_end', include_ids=True, **args)

Get days with a higher failure ratio.

Parameters
  • error_ratio (float) – Threshold of error ratio to define as a catastrophic day

  • query_start (str) – The start time (inclusive) to query in isoformat (YYYY-MM-DDTHH:MM:SS.mmmmmm). Default is 30 days before current time.

  • query_end (str) – The end time (exclusive) to query in isoformat (YYYY-MM-DDTHH:MM:SS.mmmmmm). Default is current time.

  • query (dict) – Additional Pymongo queries to filter entries for process.

  • time_field (str) – The field to query time range. Default is “time_end”.

  • include_ids (bool) – Whether to return fw_ids. Default is True.

  • args (dict) – Time difference to calculate query_start from query_end. Accepts arguments in python datetime.timedelta function. args and query_start can not be given at the same time. Default is 30 days.

Returns

(list) Dates with a higher failure ratio, optionally with the failed fw_ids.
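
The thresholding idea behind error_ratio can be illustrated with a small stand-alone sketch. It is not the FWStats implementation: `flag_catastrophic_days` is a hypothetical name, the input shape (a date string mapped to completed and fizzled counts) is invented for the example, and defining the ratio as fizzled / (completed + fizzled) with a strict comparison is an assumption.

```python
def flag_catastrophic_days(daily_counts, error_ratio=0.01):
    """Return dates whose failure ratio exceeds `error_ratio` (illustrative only).

    daily_counts maps a date string to {"completed": int, "fizzled": int};
    this input shape is an assumption, not the FWStats return format.
    """
    flagged = []
    for day, counts in sorted(daily_counts.items()):
        total = counts["completed"] + counts["fizzled"]
        # Skip days without launches to avoid dividing by zero.
        if total and counts["fizzled"] / total > error_ratio:
            flagged.append(day)
    return flagged


days = {
    "2024-01-01": {"completed": 990, "fizzled": 10},   # ratio 0.01, not above the threshold
    "2024-01-02": {"completed": 900, "fizzled": 100},  # ratio 0.10
    "2024-01-03": {"completed": 0, "fizzled": 0},      # no launches
}
flagged = flag_catastrophic_days(days)
```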

Module contents