ema workbench


model_ensemble

Created on 23 dec. 2010

Code author: jhkwakkel <j.h.kwakkel (at) tudelft (dot) nl>

class model_ensemble.ModelEnsemble(sampler=<samplers.LHSSampler object>)

One of the two main classes for performing EMA. The ensemble class is responsible for running experiments on one or more model structures across one or more policies, and returning the results.

The sampling is delegated to a sampler instance. The storing of results is delegated to a callback instance.

The class has an attribute ‘parallel’ that specifies whether the experiments are to be run in parallel. By default, ‘parallel’ is False.

An illustration of use:

>>> model = UserSpecifiedModelInterface(r'working directory', 'name')
>>> ensemble = SimpleModelEnsemble()
>>> ensemble.set_model_structure(model)
>>> ensemble.parallel = True #parallel processing is turned on
>>> results = ensemble.perform_experiments(1000) #perform 1000 experiments

In this example, 1000 experiments will be carried out in parallel on the user-specified model interface. The uncertainties are retrieved from model.uncertainties and the outcomes are assumed to be specified in model.outcomes.

add_model_structure(ms)

Add a model structure to the list of model structures.

Parameters: ms – a ModelStructureInterface instance.
add_model_structures(mss)

Add a collection of model structures to the list of model structures.

Parameters: mss – a collection of ModelStructureInterface instances.
add_policies(policies)

Add policies, policies should be a collection of policies.

Parameters: policies – policies to be added; every policy should be a dict with at least a name.
add_policy(policy)

Add a policy.

Parameters: policy – policy to be added; the policy should be a dict with at least a name.
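The policy dicts themselves are plain Python dictionaries. A minimal sketch, in which the policy names and the extra ‘trigger_level’ key are hypothetical, model-specific choices:

```python
# Each policy is a plain dict that must contain at least a 'name' key;
# any additional keys here are hypothetical, model-specific parameters.
policies = [
    {'name': 'no policy'},
    {'name': 'adaptive policy', 'trigger_level': 0.8},
]

# Minimal validation mirroring what add_policies expects of its input
for policy in policies:
    assert 'name' in policy, "every policy dict needs a 'name' key"
```

These would then be registered with ensemble.add_policies(policies), or one at a time via ensemble.add_policy(policies[0]).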
continue_robust_optimization(cases=None, nr_of_generations=10, pop=None, stats_callback=None, policy_levers=None, obj_function=None, crossover_rate=0.5, mutation_rate=0.02, reporting_interval=100, **kwargs)

Continue the robust optimization from a previously saved state. To make this work, one should save the return value of perform_robust_optimization. The typical use case for this method is to manually track convergence of the optimization after a specified number of generations.

Parameters:
  • cases – In case of Latin Hypercube sampling and Monte Carlo sampling, cases specifies the number of cases to generate. In case of Full Factorial sampling, cases specifies the resolution to use for sampling continuous uncertainties. Alternatively, one can supply a list of dicts, where each dict contains a case; that is, uncertainty names as keys and their values.
  • nr_of_generations – the number of generations for which the GA will be run
  • pop – the last run population, returned by perform_robust_optimization
  • stats_callback – the NSGA2StatisticsCallback instance returned by perform_robust_optimization
  • reporting_interval – parameter for specifying the frequency with which the callback reports the progress. (Default is 100)
  • policy_levers – A dictionary with model parameter names as keys and a dict as value. The dict should have two fields: ‘type’ and ‘values’. Type is either list or range, and determines the appropriate allele type. Values are the parameters to be used for the specific allele.
  • obj_function – the objective function used by the optimization
  • crossover_rate – crossover rate for the GA
  • mutation_rate – mutation rate for the GA

Note

There is some tricky stuff involved in loading the stats_callback via cPickle. cPickle requires that the classes in the pickle file exist. The individual class used by deap is generated dynamically. Loading the cPickle should thus be preceded by re-instantiating the correct individual class.
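The pickling issue can be illustrated with the standard library alone. The sketch below uses a stand-in class created with type(), not deap's actual Individual, to show why the class must be re-created under the same name before the pickle can be loaded:

```python
import pickle
import sys

def make_individual_class():
    # Stand-in for deap's dynamically generated Individual class; deap's
    # creator.create() similarly builds the class at runtime and attaches
    # it to a module so that pickle can later find it by name.
    cls = type('Individual', (list,), {})
    cls.__module__ = __name__
    setattr(sys.modules[__name__], 'Individual', cls)
    return cls

# Create the class dynamically and pickle an instance of it.
Individual = make_individual_class()
data = pickle.dumps(Individual([1, 2, 3]))

# Simulate a fresh interpreter session: the dynamic class is gone.
delattr(sys.modules[__name__], 'Individual')
try:
    pickle.loads(data)  # fails: pickle cannot resolve the class by name
except AttributeError:
    pass

# Re-creating the class first makes loading succeed.
make_individual_class()
restored = pickle.loads(data)
```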

determine_uncertainties()

Helper method for determining the unique uncertainties and how the uncertainties are shared across multiple model structure interfaces.

Returns: An overview dictionary which shows which uncertainties are used by which model structure interface or interfaces, and a dictionary with the unique uncertainties across all the model structure interfaces, with names as keys.
parallel = False

Boolean for turning parallel processing on (default is False).

perform_experiments(cases, callback=<class 'callbacks.DefaultCallback'>, reporting_interval=100, model_kwargs={}, which_uncertainties='intersection', which_outcomes='intersection', **kwargs)

Method responsible for running the experiments on a structure. In case of multiple model structures, the outcomes are set to the intersection of the sets of outcomes of the various models.

Parameters:
  • cases – In case of Latin Hypercube sampling and Monte Carlo sampling, cases specifies the number of cases to generate. In case of Full Factorial sampling, cases specifies the resolution to use for sampling continuous uncertainties. Alternatively, one can supply a list of dicts, where each dicts contains a case. That is, an uncertainty name as key, and its value.
  • callback – class that will be called after finishing a single experiment (default is DefaultCallback)
  • reporting_interval – parameter for specifying the frequency with which the callback reports the progress. (Default is 100)
  • model_kwargs – dictionary of keyword arguments to be passed to model_init
  • which_uncertainties – keyword argument for controlling whether, in case of multiple model structure interfaces, the intersection or the union of uncertainties should be used. (Default is intersection).
  • which_outcomes – keyword argument for controlling whether, in case of multiple model structure interfaces, the intersection or the union of outcomes should be used. (Default is intersection).
  • kwargs – generic keyword arguments to pass on to callback
Returns:

a structured numpy array containing the experiments, and a dict with the names of the outcomes as keys and a numpy array as value.

Suggested use

In general, analysis scripts require both the structured array of the experiments and the dictionary of arrays containing the results. The recommended use is the following:

>>> results = ensemble.perform_experiments(10000) #recommended use
>>> experiments, output = ensemble.perform_experiments(10000) #will work fine

The latter option will work fine, but most analysis scripts require you to wrap the two back up into a tuple:

>>> data = (experiments, output)

Another reason for preferring the recommended use is that the returned results can be saved directly:

>>> import expWorkbench.util as util
>>> util.save_results(results, file)

Note

The current implementation has a hard-coded limit on the number of designs, set to 50,000 designs. If one wants to go beyond this, set self.max_designs to a higher value.
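As an alternative to passing an integer number of cases, the cases can be supplied directly as a list of dicts. A sketch, with hypothetical uncertainty names:

```python
# One dict per experiment, mapping uncertainty names to values; the
# names 'growth rate' and 'initial population' are purely illustrative.
cases = [
    {'growth rate': 0.01, 'initial population': 100},
    {'growth rate': 0.05, 'initial population': 250},
]
```

Passing this list instead of an integer, e.g. ensemble.perform_experiments(cases), runs exactly these two experiments.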

perform_outcome_optimization(reporting_interval=100, obj_function=None, weights=(), nr_of_generations=100, pop_size=100, crossover_rate=0.5, mutation_rate=0.02, **kwargs)

Method responsible for performing outcome optimization. The optimization will be performed over the intersection of the uncertainties in case of multiple model structures.

Parameters:
  • reporting_interval – parameter for specifying the frequency with which the callback reports the progress. (Default is 100)
  • obj_function – the objective function used by the optimization
  • weights – tuple of weights on the various outcomes of the objective function. Use the constants MINIMIZE and MAXIMIZE.
  • nr_of_generations – the number of generations for which the GA will be run
  • pop_size – the population size for the GA
  • crossover_rate – crossover rate for the GA
  • mutation_rate – mutation rate for the GA
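A sketch of the pieces one would assemble for a call. The values assumed for MINIMIZE and MAXIMIZE follow deap's fitness-weight convention (negative minimizes, positive maximizes); the outcome names and the objective function itself are hypothetical:

```python
# Assumed values for the workbench's constants (deap's fitness-weight
# convention: negative weights minimize, positive weights maximize).
MINIMIZE = -1.0
MAXIMIZE = 1.0

def obj_function(outcomes):
    # Hypothetical objective: collapse each outcome series to one scalar.
    return (min(outcomes['costs']), max(outcomes['utility']))

# One weight per value returned by obj_function, in the same order.
weights = (MINIMIZE, MAXIMIZE)  # minimize costs, maximize utility
```

These would then be passed as ensemble.perform_outcome_optimization(obj_function=obj_function, weights=weights).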
perform_robust_optimization(cases, reporting_interval=100, obj_function=None, policy_levers={}, weights=(), nr_of_generations=100, pop_size=100, crossover_rate=0.5, mutation_rate=0.02, **kwargs)

Method responsible for performing robust optimization.

Parameters:
  • cases – In case of Latin Hypercube sampling and Monte Carlo sampling, cases specifies the number of cases to generate. In case of Full Factorial sampling, cases specifies the resolution to use for sampling continuous uncertainties. Alternatively, one can supply a list of dicts, where each dict contains a case; that is, uncertainty names as keys and their values.
  • reporting_interval – parameter for specifying the frequency with which the callback reports the progress. (Default is 100)
  • obj_function – the objective function used by the optimization
  • policy_levers – A dictionary with model parameter names as keys and a dict as value. The dict should have two fields: ‘type’ and ‘values’. Type is either list or range, and determines the appropriate allele type. Values are the parameters to be used for the specific allele.
  • weights – tuple of weights on the various outcomes of the objective function. Use the constants MINIMIZE and MAXIMIZE.
  • nr_of_generations – the number of generations for which the GA will be run
  • pop_size – the population size for the GA
  • crossover_rate – crossover rate for the GA
  • mutation_rate – mutation rate for the GA
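The policy_levers argument follows the structure described above. A sketch, with hypothetical lever names and values:

```python
# Each lever maps to a dict with a 'type' ('list' or 'range') choosing
# the allele type, and 'values' holding the parameters for that allele.
policy_levers = {
    'pollution tax': {'type': 'range', 'values': (0.0, 1.0)},
    'subsidy scheme': {'type': 'list', 'values': ['none', 'flat', 'tiered']},
}

# Minimal structural check matching the description above
for lever in policy_levers.values():
    assert lever['type'] in ('list', 'range')
    assert 'values' in lever
```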
processes = None

In case of parallel computing, the number of processes to be spawned. Default is None, meaning that the number of processes will be equal to the number of available cores.

set_model_structure(modelStructure)

Set the model structure. This function wraps the model structure in a tuple, limiting the number of model structures to 1.

Parameters: modelStructure – a ModelStructureInterface instance.