ema workbench

scenario_discovery

Code author: jhkwakkel <j.h.kwakkel (at) tudelft (dot) nl>

This module contains helper functions related to scenario discovery. These functions can be used to transform a classification tree into a list of boxes, identical to those returned by the PRIM algorithm. There are also functions here for calculating the scenario discovery metrics coverage and density.

scenario_discovery.calculate_sd_metrics(boxes, y, threshold, threshold_type)

Function for calculating the coverage and density scenario discovery metrics.

Parameters:
  • boxes – A list of PRIM boxes.
  • y – The y vector used in generating the boxes. This is typically the return from a classify function.
  • threshold – The threshold in the output space that the boxes should meet.
  • threshold_type – If 1, the boxes should go above the threshold. If -1, the boxes should go below the threshold. If 0, the algorithm looks for both +1 and -1.
Returns:

The list of PRIM boxes, with coverage and density added to each box as additional attributes.

Coverage and density are given below:

coverage = \frac{\displaystyle\sum_{y_{i}\in{B}} y_{i}'}{\displaystyle\sum_{y_{i}\in{B^I}} y_{i}'}

where y_{i}' = 1 if x_{i} is a case of interest and y_{i}' = 0 otherwise.

Coverage is the ratio of the cases of interest in a box to the total number of cases of interest. It thus shows which fraction of the cases of interest falls inside a particular box.

density = \frac{\displaystyle\sum_{y_{i}\in{B}} y_{i}'}{\displaystyle\left|\{y_{i}\in{B}\}\right|}

where y_{i}' = 1 if x_{i} is a case of interest and y_{i}' = 0 otherwise, and \left|\{y_{i}\in{B}\}\right| is the number of cases in B.

Density is the ratio of the cases of interest in a box to the total number of cases in that box. For a binary classification, density is identical to the mean of y in the box. For more detail on these metrics, see Bryant and Lempert (2010).
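The two ratios can be illustrated with a minimal sketch. This is not the workbench implementation: the function name coverage_and_density and the two boolean lists are assumptions made for this example, and it handles a single box only.

```python
def coverage_and_density(in_box, of_interest):
    """Return (coverage, density) for a single box.

    in_box      -- list of bools, True if case i falls inside the box
    of_interest -- list of bools, True if case i is a case of interest
    Both arguments are hypothetical inputs chosen for this sketch.
    """
    if len(in_box) != len(of_interest):
        raise ValueError("in_box and of_interest must have the same length")

    # Cases of interest that fall inside the box (shared numerator).
    interest_in_box = sum(1 for b, y in zip(in_box, of_interest) if b and y)
    # All cases of interest (coverage denominator).
    total_interest = sum(1 for y in of_interest if y)
    # All cases inside the box (density denominator).
    total_in_box = sum(1 for b in in_box if b)

    # Guard against empty denominators by returning 0.0.
    coverage = interest_in_box / total_interest if total_interest else 0.0
    density = interest_in_box / total_in_box if total_in_box else 0.0
    return coverage, density


# Ten cases: five fall inside the box, four are of interest,
# and three of those four lie inside the box.
in_box = [True, True, True, True, True, False, False, False, False, False]
of_interest = [True, True, True, False, False, True, False, False, False, False]
coverage, density = coverage_and_density(in_box, of_interest)
print(coverage, density)
```

Here the box captures three of the four cases of interest (coverage 3/4), and three of its five cases are of interest (density 3/5), which is also the mean of the binary y values inside the box.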

scenario_discovery.make_boxes(tree, data, classify, threshold)

Function that turns a classification tree into PRIM boxes, including the scenario discovery metrics.

Parameters:
  • tree – the return from orangeFunctions.tree().
  • data – the return from perform_experiments().
  • classify – the classify function used in making the tree.
  • threshold – the minimum mean that the boxes should meet.
Returns:

A list of PRIM boxes.
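How a path through a tree maps onto box limits can be sketched as follows. This is only an illustration of the idea, not the code used by make_boxes: the function branch_to_box and the (variable, operator, value) tuple format are hypothetical, and the sketch assumes numeric variables with "<=" and ">" splits.

```python
import math


def branch_to_box(splits):
    """Collapse a sequence of splits into per-variable (low, high) limits.

    splits -- iterable of (variable, operator, value) tuples, where operator
              is "<=" or ">" (a hypothetical representation for this sketch)
    """
    box = {}
    for variable, operator, value in splits:
        # Start from an unbounded interval for variables not seen before.
        low, high = box.get(variable, (-math.inf, math.inf))
        if operator == "<=":
            high = min(high, value)  # tighten the upper limit
        elif operator == ">":
            low = max(low, value)    # tighten the lower limit
        else:
            raise ValueError("unsupported operator: %r" % (operator,))
        box[variable] = (low, high)
    return box


# Three splits along one path; x1 is restricted twice.
splits = [("x1", "<=", 0.8), ("x2", ">", 0.3), ("x1", ">", 0.2)]
box = branch_to_box(splits)
print(box)
```

Each variable that is split more than once along the path ends up with the tightest limits, and variables that are never split stay unrestricted.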

scenario_discovery.find_branches(node, vars)

Recursive function for finding branches in a tree.

Parameters:
  • node – The node from which you want to find branches.
  • vars – The variables found so far and their limits.
Returns:

A list of branches. Each branch is in turn a list that starts with the distribution in the leaf and runs back to the root of the tree, so a branch is given from the bottom up. For each split in the tree, the branch gives the name of the variable and the split condition. The split condition is given as a string.

Example of use:

>>> tree = analysis.orangeFunctions.tree(data, classify_function)
>>> startNode = tree.tree
>>> branches = find_branches(startNode, ["root", ""])
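The bottom-up branch layout described above can be sketched on a toy tree. The Node class and find_branches_sketch below are hypothetical stand-ins written for this example. They do not use the Orange tree API, do not reproduce the signature of find_branches, and assume that every internal node has exactly two children.

```python
class Node:
    """Hypothetical binary tree node; not the Orange tree API."""

    def __init__(self, variable=None, cutoff=None, left=None, right=None,
                 distribution=None):
        self.variable = variable          # name of the split variable
        self.cutoff = cutoff              # numeric split value
        self.left = left                  # subtree for variable <= cutoff
        self.right = right                # subtree for variable > cutoff
        self.distribution = distribution  # class counts, set on leaves only


def find_branches_sketch(node, path):
    """Return one branch per leaf, each ordered from the leaf back to the root.

    path holds the (variable, condition) pairs from the root to this node.
    """
    if node.left is None and node.right is None:
        # Leaf: start with its distribution, then the splits in reverse order.
        return [[node.distribution] + path[::-1]]
    left_step = (node.variable, "<=%s" % node.cutoff)
    right_step = (node.variable, ">%s" % node.cutoff)
    return (find_branches_sketch(node.left, path + [left_step]) +
            find_branches_sketch(node.right, path + [right_step]))


# Toy tree: the root splits on x1, its right child splits on x2.
root = Node("x1", 0.5,
            left=Node(distribution=[8, 2]),
            right=Node("x2", 0.3,
                       left=Node(distribution=[1, 4]),
                       right=Node(distribution=[0, 5])))

branches = find_branches_sketch(root, [])
for branch in branches:
    print(branch)
```

Passing the path down as a new list at every call keeps sibling branches from sharing state, and reversing it at the leaf gives the leaf-to-root order.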