`clusterer`

Created on Sep 8, 2011

Code author: gyucel <g.yucel (at) tudelft (dot) nl>, jhkwakkel <j.h.kwakkel (at) tudelft (dot) nl>

A reworking of the cluster code. The distance metrics now have their own .py file. The metrics currently available are stored in the distance_functions dictionary.

class clusterer.Cluster(cluster_no, all_ds_indices, sample_ds_index, runLogs, dist_clust)

Contains information about a data-series cluster, along with some methods that help analyze it. The basic attributes of a Cluster object (e.g. c) are as follows:

c.no : Cluster number/index

c.indices : Original indices of the data series that are in cluster c

c.sample : Original index of the data series that is the representative of cluster c (i.e. the median element of the cluster)

c.size : Number of elements (i.e. data series) in cluster c

clusterer.cluster(data, outcome, distance='gonenc', interClusterDistance='complete', cMethod='inconsistent', cValue=2.5, plotDendrogram=True, plotClusters=True, groupPlot=False, **kwargs)

Clusters time-series data from the specified cPickle file according to the selected distance measure.

Parameters:

data – The return value of perform_experiments.
outcome – Name of the outcome/variable whose behavior is being analyzed.
distance – The distance metric to be used.
interClusterDistance – How to calculate the inter-cluster distance; see linkage for details.
cMethod – Cutoff method; see fcluster for details.
cValue – Cutoff value; see fcluster for details.
plotDendrogram – Boolean; if true, plot the dendrogram.
plotClusters – Boolean; if true, plot the clusters.
groupPlot – Boolean; if true, plot the clusters in a single window; otherwise plot each cluster in a separate window.

Return type:

A tuple containing the list of distances, the list of clusters (a Cluster object for each cluster), and a list of logged distance metrics for each time series.
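The roles of interClusterDistance, cMethod and cValue can be illustrated with a small sketch. It assumes that "linkage" and "fcluster" refer to the SciPy functions of those names in scipy.cluster.hierarchy; the toy distance matrix and the variable names are made up for illustration and are not taken from the module.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

# Toy symmetric distance matrix for four series: two tight pairs far apart.
# (Illustrative data only.)
dist_matrix = np.array([
    [0.0, 1.0, 9.0, 9.5],
    [1.0, 0.0, 8.5, 9.0],
    [9.0, 8.5, 0.0, 1.2],
    [9.5, 9.0, 1.2, 0.0],
])

# linkage expects the condensed (upper-triangle) form of the matrix.
condensed = squareform(dist_matrix)

# 'complete' is the default value of interClusterDistance.
Z = linkage(condensed, method="complete")

# A 'distance' cutoff is used here because it is easy to read on toy data;
# cluster() itself defaults to cMethod='inconsistent' with cValue=2.5.
labels = fcluster(Z, t=2.0, criterion="distance")
```

With this cutoff, the first two series share one label and the last two share another, since each pair merges well below a distance of 2.0 and the two pairs only join at 9.5.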

The remainder of the arguments are passed on to the specified distance function.

Gonenc Distance:

‘distance’: String that specifies the distance to be used. Options: bmd (default), mse, sse.

‘filter?’: Boolean that specifies whether the data series will be filtered (for bmd distance).

‘slope filter’: A float that specifies the filtering threshold for the slope. For every data point, if change_in_the_outcome / average_value_of_the_outcome < threshold, the slope is considered 0 (for bmd distance).

‘curvature filter’: A float that specifies the filtering threshold for the curvature. For every data point, if change_in_the_slope / average_value_of_the_slope < threshold, the curvature is considered 0 (for bmd distance).

‘no of sisters’: 50 (for bmd distance)
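The slope filter rule can be sketched as follows. This is not the module's implementation: the function name filtered_slopes is hypothetical, and the sketch assumes that the "change" is the difference between consecutive points, that the average is taken over the whole series, and that magnitudes (absolute values) are compared with the threshold.

```python
def filtered_slopes(series, threshold):
    """Return point-to-point changes, zeroing those that are relatively small.

    Assumption: a change counts as negligible when its magnitude divided by
    the magnitude of the series average is below the threshold.
    """
    average = sum(series) / len(series)
    changes = [b - a for a, b in zip(series, series[1:])]
    if average == 0:
        # The relative size is undefined; leave the changes untouched.
        return changes
    return [
        0.0 if abs(change) / abs(average) < threshold else change
        for change in changes
    ]


series = [10.0, 10.1, 12.0, 12.05, 9.0]
slopes = filtered_slopes(series, threshold=0.05)
```

Here the small steps (10.0 to 10.1 and 12.0 to 12.05) are flattened to zero, while the two larger moves are kept. The curvature filter follows the same pattern, applied to changes in the slope.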

clusterer.construct_distances(data, distance='gonenc', **kwargs)

Constructs an n-by-n matrix of distances for the n data series in data according to the specified distance.

The distance argument specifies the distance measure to be used. The options, which are defined in clusteringDistances.py, are as follows:

gonenc: a distance based on qualitative dynamic pattern features
willem: a distance mainly based on the presence of crisis periods and the overall trend of the data series
sse: regular sum of squared errors
mse: regular mean squared error

SSE and MSE are in clusteringDistances.py and do not work right now.

Others will be added over time.
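As a rough illustration of what an n-by-n distance matrix looks like, the sketch below fills one using a plain sum of squared errors. The helper names sse and pairwise_distances are hypothetical stand-ins written for this example; they are not the functions in clusteringDistances.py, and the sketch assumes all series have the same length.

```python
def sse(series_a, series_b):
    """Sum of squared errors between two equally long series."""
    if len(series_a) != len(series_b):
        raise ValueError("series must have the same length")
    return sum((a - b) ** 2 for a, b in zip(series_a, series_b))


def pairwise_distances(data, distance_function):
    """Build a symmetric n-by-n matrix (list of lists) of pairwise distances."""
    n = len(data)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Only compute the upper triangle and mirror it.
        for j in range(i + 1, n):
            d = distance_function(data[i], data[j])
            matrix[i][j] = d
            matrix[j][i] = d
    return matrix


data = [
    [0.0, 1.0, 2.0],
    [0.0, 1.0, 3.0],
    [1.0, 1.0, 1.0],
]
dist = pairwise_distances(data, sse)
```

Because the matrix is symmetric with a zero diagonal, only the n(n-1)/2 entries above the diagonal carry information.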

clusterer.get_drow_index(i, j, size)

Get the index in the distance row for the distance between i and j.

Parameters:

i – result i
j – result j
size – the number of results

Note: i > j
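One way such an index can be computed is sketched below. It assumes the distance row stores the upper triangle of the size-by-size matrix in row-major order, i.e. (0, 1), (0, 2), ..., (0, size - 1), (1, 2), and so on, which is the condensed layout used by SciPy. Whether get_drow_index uses exactly this layout is not stated in the documentation, and the name condensed_index is hypothetical.

```python
from itertools import combinations


def condensed_index(i, j, size):
    """Return the position of the (i, j) distance in a condensed distance row.

    Assumes the row lists the upper triangle of a size-by-size matrix in
    row-major order: (0, 1), (0, 2), ..., (0, size - 1), (1, 2), ...
    """
    if i == j:
        raise ValueError("no entry is stored for the distance of an item to itself")
    if i < j:
        # Normalise so that i > j, matching the note in the documentation.
        i, j = j, i
    # Rows 0 .. j-1 together hold j * size - j * (j + 1) / 2 entries.
    offset = j * size - j * (j + 1) // 2
    return offset + (i - j - 1)


size = 5
pairs = list(combinations(range(size), 2))  # (0, 1), (0, 2), ..., (3, 4)
indices = [condensed_index(b, a, size) for a, b in pairs]
```

Enumerating all pairs in order yields consecutive indices, which is a quick check that the formula matches the assumed layout.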
