ema workbench

orange_functions

Code author: jhkwakkel <j.h.kwakkel (at) tudelft (dot) nl>

This module contains convenience functions that wrap machine learning algorithms implemented in orange.

The current wrappers use default values for the various parameters that can be specified. Follow the provided links to the orange functions that are being wrapped for more details.

build_orange_data() can be used as a starting point if one wants to use other algorithms provided by orange.

Where appropriate, the relevant documentation from orange has been used.

orange_functions.build_orange_data(data, classify)

helper function for turning the data from perform_experiments() into a data object that can be used by the various orange functions.

For more details see orange domain

Parameters:
  • data – return from perform_experiments().
  • classify – function to be used for determining the class for each run.
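As a hedged illustration of the classify argument: the exact shape of the return from perform_experiments() depends on the workbench version, but assuming the outcomes are available as a dict of arrays keyed by outcome name (and using the hypothetical outcome name 'max_P'), a classify function might look like:

```python
def classify(outcomes):
    # hypothetical classify function: 'outcomes' is assumed to be a
    # dict mapping outcome names to arrays of results, one entry per run;
    # the outcome name 'max_P' and the 0.8 threshold are illustrative
    return ['high' if value > 0.8 else 'low' for value in outcomes['max_P']]
```

The function returns one class label per run, which is what the wrappers need in order to build a classified orange data table.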
orange_functions.random_forest(data, classify, nrOfTrees=100, attributes=None)

make a random forest using orange

For more details see orange ensemble

Parameters:
  • data – data from perform_experiments().
  • classify – function for classifying runs.
  • nrOfTrees – number of trees in the forest (default: 100).
  • attributes – Number of attributes used in a randomly drawn subset when searching for the best attribute to split the node in tree growing (default: None; if kept this way, this is turned into the square root of the number of attributes in the example set).
Return type:

an orange random forest.
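As a conceptual sketch of what the returned forest does when classifying (this is not the orange implementation, just an illustration of majority voting): each induced tree classifies a case independently, and the forest returns the most common label. Assuming each tree is any callable mapping a case to a class label:

```python
from collections import Counter

def forest_vote(trees, case):
    # conceptual sketch of random forest classification:
    # collect one vote per tree and return the majority label
    votes = [tree(case) for tree in trees]
    return Counter(votes).most_common(1)[0][0]

# three stub 'trees' standing in for orange-induced classification trees
trees = [lambda case: 'high', lambda case: 'low', lambda case: 'high']
```

In orange itself the trees are induced from the data and the forest object handles the voting internally; this sketch only shows the idea.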

orange_functions.feature_selection(data, classify, k=5, m=100)

perform feature selection using orange

For more details see orange feature selection and orange measure attribute

The default measure is ReliefF (MeasureAttribute_relief in Orange).

Parameters:
  • data – data from perform_experiments().
  • classify – function for classifying runs.
  • k – the number of neighbors for each example (default: 5).
  • m – number of examples to use; set to -1 to use all (default: 100).
Return type:

sorted list of tuples with uncertainty names and ReliefF attribute scores.

Orange provides other metrics for feature selection

  • Information Gain
  • Gain ratio
  • Gini index
  • Relevance of attributes
  • Costs

If you want to use any of these instead of ReliefF, use the code supplied here as a template, but modify the measure. That is, replace:

measure = orange.MeasureAttribute_relief(k=k, m=m)

with the measure of choice. See the above provided links for more details.
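Whatever measure is chosen, the wrapper returns the scores as a list of (uncertainty name, score) tuples sorted by score. Assuming the per-uncertainty scores have already been computed into a dict (the names and values below are illustrative), that final sorting step can be sketched as:

```python
def sort_scores(scores):
    # scores: dict mapping uncertainty name -> attribute score;
    # return (name, score) tuples, highest-scoring uncertainty first
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# illustrative scores, not real ReliefF output
scores = {'uncertainty a': 0.1, 'uncertainty b': 0.5, 'uncertainty c': 0.3}
```

This is also the shape of the return value documented above, so downstream code can treat all measures uniformly.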

orange_functions.random_forest_measure_attributes(data, classify)

performs feature selection using random forests in orange.

For more details see orange ensemble

Parameters:
  • data – data from perform_experiments().
  • classify – function for classifying runs.
  • nrOfTrees – number of trees in the forest (default: 100).
  • attributes – Number of attributes used in a randomly drawn subset when searching for the best attribute to split the node in tree growing (default: None; if kept this way, this is turned into the square root of the number of attributes in the example set).
Return type:

sorted list of tuples with uncertainty names and importance values.

orange_functions.tree(data, classify, sameMajorityPruning=False, mForPruning=0, maxMajority=1, minSubset=0, minExamples=0)

make a classification tree using orange

For more details see orange tree

Parameters:
  • data – data from perform_experiments()
  • classify – function for classifying runs
  • sameMajorityPruning – If true, invokes a bottom-up post-pruning by removing the subtrees of which all leaves classify to the same class (default: False).
  • mForPruning – If non-zero, invokes an error-based bottom-up post-pruning, where m-estimate is used to estimate class probabilities (default: 0).
  • maxMajority – Induction stops when the proportion of majority class in the node exceeds the value set by this parameter (default: 1.0).
  • minSubset – Minimal number of examples in non-null leaves (default: 0).
  • minExamples – Data subsets with fewer than minExamples examples are not split any further; that is, all leaves in the tree will contain at least that many examples (default: 0).
Return type:

a classification tree

In order to print the resulting tree, one can for example use graphviz.

>>> import orngTree
>>> tree = orange_functions.tree(data, classify)
>>> orngTree.printDot(tree, r'..\..\models\tree.dot', 
                  leafStr="%V (%M out of %N)") 

This generates a .dot file that can be opened and displayed using graphviz. The leafStr keyword argument specifies the format of the string for each leaf. See also the more detailed discussion on the orange web site.

In the future, a convenience function might be added for turning a tree into a networkx graph.