PredictorX1 t1_ixu8zje wrote on November 26, 2022 at 11:52 AM

#683,283

It sounds like you want a classifier, not clustering. If the data includes the washing programs in use at any given time, then building a classifier is possible. If that information is not available, then the remaining variables ("water consumption, electricity consumption, noise and so on") can be clustered, though the clusters may not correspond exactly to the washing programs. Just determining how many clusters should be used can be a challenge.

[deleted] OP t1_ixuca02 wrote on November 26, 2022 at 12:34 PM

#683,719

Replying to PredictorX1 (#683,283)

>It sounds like you want a classifier, not clustering. If the data includes the washing programs in use at any given time, then building a classifier is possible. If that information is not available, then the remaining variables ("water consumption, electricity consumption, noise and so

Indeed, that's why I called it clustering: the washing programs in use at any given time aren't known.
However, I can determine the number of washing programs used in the time series (by looking at the washing machine manual, for example).
Say there are 3 washing programs; then 4 clusters are needed (?), since one cluster covers the time when no washing is being done and each of the other clusters should correspond to a different program.
But even so, I'm not sure how to start clustering on this type of problem.

PredictorX1 t1_ixucpv3 wrote on November 26, 2022 at 12:40 PM

#683,777

Replying to [deleted] (#683,719)

Usually, the assumption is made that the variables are equally "important", so they are standardized. Most often this is done, for each variable, by subtracting the mean and then dividing by the standard deviation. The standardized data is then clustered, for instance with k-means. Have you gathered and prepared the data? What kind of clustering algorithms do you have available?
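
The two steps described here (standardize each variable, then run k-means) could be sketched roughly as follows. This is a minimal plain-Python illustration, not a recommendation over a library implementation: the readings are made-up numbers, the column meanings (water, electricity, noise) are assumed from the thread, `k=4` follows the "3 programs plus idle" guess above, and the farthest-point initialization is just one simple deterministic choice.

```python
import statistics

def standardize(rows):
    """Z-score each column: subtract the mean, divide by the standard deviation."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    # A constant column has std 0; fall back to 1.0 to avoid dividing by zero.
    stds = [statistics.pstdev(c) or 1.0 for c in cols]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=100):
    """Lloyd's algorithm with a deterministic farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [statistics.fmean(c) for c in zip(*members)]
    return labels, centers

# Hypothetical readings: (water in L/min, electricity in W, noise in dB).
readings = [
    (0.0, 2, 30), (0.0, 3, 31), (0.1, 2, 30),              # machine idle
    (5.0, 500, 55), (5.2, 520, 56), (4.9, 510, 54),        # one program
    (2.0, 300, 45), (2.1, 310, 46), (1.9, 290, 44),        # another program
    (8.0, 2000, 70), (8.1, 2050, 71), (7.9, 1980, 69),     # a third program
]

labels, centers = kmeans(standardize(readings), k=4)
print(labels)
```

Without the standardization step, the electricity column (hundreds to thousands of watts) would dominate the distance and the other variables would barely matter.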

fcwick t1_ixud7ae wrote on November 26, 2022 at 12:46 PM

#683,855

Rather than the exact clustering algorithm, I think the main issue here is the feature extraction for the clustering. https://github.com/blue-yonder/tsfresh might be useful for that.

[deleted] OP t1_ixudbhw wrote on November 26, 2022 at 12:47 PM

#683,872

Replying to PredictorX1 (#683,777)

I think I can use k-means for that, as you mentioned, but wouldn't the time information be lost? Every timestamp would be treated independently of the others, whereas a program (cluster) runs for an uninterrupted period of time.
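
One simple way to bring the time ordering back in, sketched here only as an illustration, is to smooth the per-timestamp cluster labels with a sliding majority vote so that isolated flips are absorbed into the surrounding run. The label sequence and the window size of 5 below are made-up assumptions; a real window would depend on the sampling rate and on how long the shortest program runs.

```python
from collections import Counter

def smooth_labels(labels, window=5):
    """Replace each label with the most common label in a centered window."""
    half = window // 2
    smoothed = []
    for i in range(len(labels)):
        lo, hi = max(0, i - half), min(len(labels), i + half + 1)
        smoothed.append(Counter(labels[lo:hi]).most_common(1)[0][0])
    return smoothed

def segments(labels):
    """Collapse a label sequence into (label, start, end) runs; end is exclusive."""
    runs = []
    start = 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            runs.append((labels[start], start, i))
            start = i
    return runs

# Hypothetical per-timestamp labels from a clustering step:
# 0 = idle, 1 = some program, with two isolated misassignments.
raw = [0, 0, 2, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0]

smoothed = smooth_labels(raw, window=5)
runs = segments(smoothed)
print(runs)
```

Each resulting run can then be read as one uninterrupted stretch of a single program (or of idle time), which matches the assumption that a program is not interrupted once it starts.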

[deleted] OP t1_ixuegbd wrote on November 26, 2022 at 1:00 PM

#684,020

Replying to fcwick (#683,855)

Wow this is amazing!

[R] Approach to identify clusters on a time series