Created on Sep 8, 2011
Code author: gyucel <g.yucel (at) tudelft (dot) nl>, jhkwakkel <j.h.kwakkel (at) tudelft (dot) nl>
A reworking of the cluster. The distance metrics now have their own .py file. The available metrics are currently stored in the distance_functions dictionary.
Method that clusters time-series data from the specified cpickle file according to a selected distance measure.
Return type: A tuple containing the list of distances, the cluster allocation, and a list of logged distance metrics for each time series.
The remainder of the arguments are passed on to the specified distance function. See the distance functions for details on these parameters.
Constructs an n-by-n matrix of distances for the n data series in data according to the specified distance.
The distance argument specifies the distance measure to be used. The options, which are defined in clusteringDistances.py, are as follows.
gonenc: a distance based on qualitative dynamic pattern features
the overall trend of the data series
sse: regular sum of squared errors
mse: regular mean squared error
SSE and MSE are in clusteringDistances.py and do not currently work.
others will be added over time
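As a rough illustration of how such a matrix could be assembled, the sketch below looks up a distance function by name in a dictionary, forwards any extra keyword arguments to it, and fills a symmetric n-by-n matrix. The names construct_distance_matrix and _sse, and the contents of the distance_functions mapping shown here, are placeholders chosen for this example and are not the library's actual code; the sketch also assumes the chosen distance is symmetric.

```python
def _sse(ds1, ds2):
    # Placeholder distance (assumption): sum of squared point-wise differences.
    return sum((a - b) ** 2 for a, b in zip(ds1, ds2))


# Hypothetical registry mapping a distance name to a callable.
distance_functions = {"sse": _sse}


def construct_distance_matrix(data, distance="sse", **kwargs):
    """Return an n-by-n list of lists holding the pairwise distances."""
    try:
        dist = distance_functions[distance]
    except KeyError:
        raise ValueError(f"unknown distance: {distance!r}")
    n = len(data)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Extra keyword arguments are passed on to the distance function.
            d = dist(data[i], data[j], **kwargs)
            # Assumes a symmetric distance, so only the upper triangle is computed.
            matrix[i][j] = matrix[j][i] = d
    return matrix


series = [[0, 1, 2], [0, 1, 3], [1, 1, 1]]
dmatrix = construct_distance_matrix(series, distance="sse")
```

The diagonal stays at zero because each series is at distance zero from itself under this placeholder measure.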
This distance measures the proximity of data series in terms of their qualitative pattern features. In other words, it quantifies the proximity between two different dynamic behaviour modes.
It is designed to work mainly on non-stationary data. Its current version does not perform well in capturing the proximity of two cyclic/repetitive patterns with different numbers of cycles (e.g. an oscillation with 4 cycles versus an oscillation with 6 cycles).
The MSE (mean squared-error) distance is equal to the SSE distance divided by the number of data points in the data series.
The SSE distance between two data series is equal to the sum of squared errors between corresponding data points of the two series. Let the data series be of length N; then the SSE distance between ds1 and ds2 equals the sum of error_term(i)^2 for i = 1 to N, where error_term(i) = ds1(i) - ds2(i).
Given that SSE is calculated as above, MSE equals SSE divided by N.
Like the SSE distance, the MSE distance works only with data series of equal length.
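A minimal sketch of the MSE definition above, written in plain Python. The function name mse_distance and the choice to raise ValueError on unequal or empty input are assumptions made for this example, not the behaviour of the function in clusteringDistances.py.

```python
def mse_distance(ds1, ds2):
    """Mean squared error between two equal-length data series."""
    if len(ds1) != len(ds2):
        raise ValueError("data series must be of equal length")
    n = len(ds1)
    if n == 0:
        raise ValueError("data series must not be empty")
    # SSE: sum of squared point-wise error terms ds1(i) - ds2(i).
    sse = sum((a - b) ** 2 for a, b in zip(ds1, ds2))
    # MSE: SSE divided by the number of data points N.
    return sse / n


example_mse = mse_distance([1, 2, 3], [1, 2, 6])  # SSE is 9, N is 3
```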
The SSE (sum of squared-errors) distance between two data series is equal to the sum of squared errors between corresponding data points of the two series. Let the data series be of length N; then the SSE distance between ds1 and ds2 equals the sum of error_term(i)^2 for i = 1 to N, where error_term(i) = ds1(i) - ds2(i).
Since the SSE calculation is based on a pairwise comparison of individual data points, the data series should be of equal length.
The SSE distance equals the square of the Euclidean distance, which is a commonly used distance metric in time series comparisons.
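The relationship to the Euclidean distance can be checked with a short sketch. The function name sse_distance and its error handling are assumptions for illustration; math.dist from the Python standard library (3.8 and later) supplies the Euclidean distance for comparison.

```python
import math


def sse_distance(ds1, ds2):
    """Sum of squared errors between two equal-length data series."""
    if len(ds1) != len(ds2):
        raise ValueError("data series must be of equal length")
    # Sum over i of (ds1(i) - ds2(i))^2.
    return sum((a - b) ** 2 for a, b in zip(ds1, ds2))


ds1 = [0.0, 0.0]
ds2 = [3.0, 4.0]
example_sse = sse_distance(ds1, ds2)
# Squared Euclidean distance between the same two series, for comparison.
squared_euclidean = math.dist(ds1, ds2) ** 2
```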
Let ds1(.) and ds2(.) be two data series of length N. Then:
A equals the sum of ds1(i)*ds2(i) for i = 1 to N
B equals the square root of the sum of ds1(i)^2 for i = 1 to N
C equals the square root of the sum of ds2(i)^2 for i = 1 to N
distance_triangle = A/(B*C)
The triangle distance works only with data series of the same length.
In the literature, it is claimed that the triangle distance deals very well with noise and amplitude scaling, but may yield poor results in cases of offset translation and linear drift.
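The formula for the triangle distance can be sketched as follows, assuming that C is meant to be taken over ds2 (so that B and C are the norms of the two series). The function name triangle_distance and the choice to raise ValueError for unequal lengths or all-zero series are assumptions for this example. As written, the quantity A/(B*C) is the cosine of the angle between the two series, so a series and a positively scaled copy of it yield a value of 1, which is consistent with the insensitivity to amplitude scaling mentioned above.

```python
import math


def triangle_distance(ds1, ds2):
    """Compute A / (B * C) for two equal-length data series."""
    if len(ds1) != len(ds2):
        raise ValueError("data series must be of equal length")
    a = sum(x * y for x, y in zip(ds1, ds2))   # A: sum of ds1(i) * ds2(i)
    b = math.sqrt(sum(x * x for x in ds1))     # B: norm of ds1
    c = math.sqrt(sum(y * y for y in ds2))     # C: norm of ds2 (assumed)
    if b == 0 or c == 0:
        raise ValueError("undefined for an all-zero data series")
    return a / (b * c)


scaled = triangle_distance([1, 2, 3], [2, 4, 6])   # same shape, doubled amplitude
orthogonal = triangle_distance([1, 0], [0, 1])     # no overlap between the series
```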