Created on Sep 8, 2011
Code author: gyucel <g.yucel (at) tudelft (dot) nl>, jhkwakkel <j.h.kwakkel (at) tudelft (dot) nl>
A reworking of the cluster module. The distance metrics now have their own .py file. The available metrics are currently stored in the distance_functions dictionary.
Contains information about a data-series cluster, as well as some methods to help analyze a cluster. The basic attributes of a cluster object (e.g. c) are as follows:
- c.no : Cluster number/index
- c.indices : Original indices of the dataseries that are in cluster c
- c.sample : Original index of the dataseries that is the representative of cluster c (i.e. the median element of the cluster)
- c.size : Number of elements (i.e. dataseries) in the cluster c
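The attribute list above can be mirrored by a small container. This is a minimal sketch, not the module's own Cluster class: the from_members constructor and the medoid helper are hypothetical, and "median element" is interpreted here as the medoid (the member with the smallest summed distance to the other members), which is an assumption.

```python
from dataclasses import dataclass
from typing import List, Sequence


def medoid(indices: Sequence[int], dist: Sequence[Sequence[float]]) -> int:
    """Return the member of `indices` with the smallest summed distance to the others.

    Assumption: `dist` is a full n-by-n matrix indexed by original series index.
    Ties are resolved in favour of the first such member.
    """
    return min(indices, key=lambda a: sum(dist[a][b] for b in indices))


@dataclass
class Cluster:
    no: int             # cluster number/index
    indices: List[int]  # original indices of the data series in this cluster
    sample: int         # original index of the representative series
    size: int           # number of data series in the cluster

    @classmethod
    def from_members(cls, no: int, indices: Sequence[int],
                     dist: Sequence[Sequence[float]]) -> "Cluster":
        # Hypothetical constructor: derives size and representative from the members.
        members = list(indices)
        return cls(no=no, indices=members, sample=medoid(members, dist), size=len(members))
```

For example, with dist = [[0, 1, 4], [1, 0, 2], [4, 2, 0]], Cluster.from_members(0, [0, 1, 2], dist) picks series 1 as the sample, since its summed distance (3) is the smallest.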
Method that clusters time-series data from the specified cpickle file according to a selected distance measure.
Return type: A tuple containing the list of distances, the list of clusters (a Cluster object for each cluster), and a list of logged distance metrics for each time series.
The remaining arguments are passed on to the specified distance function.
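The text above does not state which clustering algorithm turns the distances into clusters. As one illustrative possibility only, the sketch below cuts a single-linkage hierarchy at a distance threshold, which is equivalent to taking connected components of the graph whose edges are pairs with distance <= threshold. The function name single_linkage_groups, the threshold parameter, and the condensed row ordering are all assumptions, not this module's API.

```python
from typing import Dict, List, Sequence


def single_linkage_groups(row: Sequence[float], size: int, threshold: float) -> List[List[int]]:
    """Group series whose single-linkage distance is <= threshold.

    Assumption: `row` is a condensed distance row storing the pairs
    (0,1), (0,2), ..., (0,size-1), (1,2), ... in that order.
    """
    parent = list(range(size))

    def find(x: int) -> int:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    k = 0
    for a in range(size):
        for b in range(a + 1, size):
            if row[k] <= threshold:
                parent[find(a)] = find(b)  # link the two groups
            k += 1

    groups: Dict[int, List[int]] = {}
    for i in range(size):
        groups.setdefault(find(i), []).append(i)
    # Order clusters by their smallest member so the numbering is deterministic.
    return sorted(groups.values(), key=lambda g: g[0])
```

A union-find pass keeps this linear in the number of pairs; a library routine such as SciPy's hierarchical clustering would be the usual choice when other linkage criteria are needed.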
Gonenc Distance:
Options: bmd (default), mse, sse
filtered (for bmd distance)
threshold for the slope (for every data point, if the change in the outcome divided by the average value of the outcome is below the threshold, the slope is taken as 0) (for bmd distance)
threshold for the curvature (for every data point, if the change in the slope divided by the average value of the slope is below the threshold, the curvature is taken as 0) (for bmd distance)
‘no of sisters’: 50 (for bmd distance)
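The two threshold rules above can be illustrated with a short sketch. It is not the module's implementation: the helper names are hypothetical, the comparison uses the absolute value of the ratio (so that falling series are treated like rising ones), and a zero average falls back to a scale of 1.0 to avoid division by zero. Both of those choices are assumptions.

```python
from typing import List, Sequence


def differences(values: Sequence[float]) -> List[float]:
    """Successive changes between neighbouring data points."""
    return [b - a for a, b in zip(values, values[1:])]


def thresholded_signs(values: Sequence[float], threshold: float) -> List[int]:
    """Sign (-1, 0, +1) of each change, set to 0 when |change| / |average| < threshold."""
    mean = sum(values) / len(values)
    scale = abs(mean) if mean != 0 else 1.0  # assumption: guard against a zero average
    signs = []
    for c in differences(values):
        if abs(c) / scale < threshold:
            signs.append(0)  # change is small relative to the average: treat as flat
        else:
            signs.append((c > 0) - (c < 0))
    return signs


series = [10, 10.5, 20, 30]
slope_signs = thresholded_signs(series, 0.1)                   # rule for the slope
curvature_signs = thresholded_signs(differences(series), 0.1)  # same rule applied to the slopes
```

Here the first step (0.5 against an average of 17.625) falls below the 0.1 threshold and is treated as flat, giving slope signs [0, 1, 1] and curvature signs [1, 0].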
Constructs an n-by-n matrix of distances for the n data series in data, according to the specified distance.
The distance argument specifies the distance measure to be used. The options, which are defined in clusteringDistances.py, are as follows:
gonenc: a distance based on qualitative dynamic pattern features
the overall trend of the data series
sse: regular sum of squared errors
mse: regular mean squared error
SSE and MSE are in clusteringDistances.py but do not work right now.
Others will be added over time.
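A minimal sketch of how a name-to-function dictionary can drive construction of the n-by-n matrix, using the two regular error measures listed above. The dictionary name echoes the distance_functions dictionary mentioned at the top of this page, but its contents and the distance_matrix function here are hypothetical; the sketch also assumes a symmetric distance so that each pair is computed only once.

```python
from typing import Callable, Dict, List, Sequence

Series = Sequence[float]


def sse(a: Series, b: Series) -> float:
    """Regular sum of squared errors between two equally long series."""
    if len(a) != len(b):
        raise ValueError("series must have equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mse(a: Series, b: Series) -> float:
    """Regular mean squared error: SSE divided by the series length."""
    return sse(a, b) / len(a)


distance_functions: Dict[str, Callable[[Series, Series], float]] = {"sse": sse, "mse": mse}


def distance_matrix(data: Sequence[Series], distance: str = "sse") -> List[List[float]]:
    """Build an n-by-n matrix for n series; the diagonal stays 0."""
    func = distance_functions[distance]  # KeyError for an unknown option
    n = len(data)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = func(data[i], data[j])
            matrix[i][j] = matrix[j][i] = d  # symmetric distance assumed
    return matrix
```

Adding a new measure then only requires registering another entry in the dictionary.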
Get the index in the distance row for the distance between i and j.
:param i: result i
:param j: result j
:param size: the number of results
.. note:: i > j
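The index lookup above can be written in closed form if one assumes the distance row stores the pairs in the order (0,1), (0,2), ..., (0,size-1), (1,2), ... (the layout SciPy uses for condensed distance matrices). The storage order is not stated on this page, so this is a sketch under that assumption, and the name get_index is hypothetical.

```python
def get_index(i: int, j: int, size: int) -> int:
    """Position of the distance between results i and j in a condensed row, for i > j.

    Rows 0..j-1 contribute (size-1) + (size-2) + ... + (size-j) entries,
    i.e. size*j - j*(j+1)/2; within row j, pair (j, i) sits at offset i - j - 1.
    """
    if not 0 <= j < i < size:
        raise ValueError("expected 0 <= j < i < size")
    return size * j - j * (j + 1) // 2 + (i - j - 1)
```

For size = 4, the pairs (1, 0), (3, 0), (2, 1) and (3, 2) map to positions 0, 2, 3 and 5.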