It sounds like you want a classifier, not clustering. If the data includes the washing programs in use at any given time, then building a classifier is possible. If that information is not available, then the remaining variables ("water consumption, electricity consumption, noise and so on") can be clustered, though the clusters may not correspond exactly to the washing programs. Just determining how many clusters should be used can be a challenge.
>It sounds like you want a classifier, not clustering. If the data includes the washing programs in use at any given time, then building a classifier is possible. If that information is not available, then the remaining variables ("water consumption, electricity consumption, noise and so
Indeed, that's why I called it clustering: the washing programs in use at any given time aren't known.
However, I can determine how many washing programs are used in the time series (by looking at the washing machine manual, for example).
Say there are 3 washing programs; then 4 clusters are needed (?), as one cluster covers the time when no washing is being done and each of the others should be specific to a different program.
But even so, I'm not sure how to start clustering on this type of problem.
Usually, the assumption is made that the variables are equally "important", so they are standardized. Most often this is done, for each variable, by subtracting the mean and then dividing by the standard deviation. The standardized data is then clustered, for instance with k-means. Have you gathered and prepared the data? What kind of clustering algorithms do you have available?
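To make the two steps concrete, here is a minimal sketch using only the standard library. The readings are made-up numbers (water, power, noise), the farthest-point initialisation is a simplification chosen so the toy example is deterministic, and the function names are hypothetical. In practice scikit-learn's `StandardScaler` and `KMeans` do the same job.

```python
from statistics import mean, pstdev

def standardize(rows):
    """Z-score each column: subtract the mean, divide by the standard deviation."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # constant column: avoid dividing by zero
    return [[(x - m) / s for x, m, s in zip(row, mus, sds)] for row in rows]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=100):
    """Plain k-means; returns (labels, centers)."""
    # Farthest-point initialisation (an assumption, not the usual random start):
    # begin with the first point, then keep adding the point farthest from all centers.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        # Update step: each center moves to the mean of its members.
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            updated.append([mean(col) for col in zip(*members)] if members else centers[j])
        if updated == centers:  # converged
            break
        centers = updated
    return labels, centers

# Hypothetical readings: [water, power, noise], three samples per regime.
readings = [
    [0.0, 5, 30], [0.1, 6, 31], [0.0, 4, 29],              # idle
    [10.0, 500, 60], [11.0, 520, 61], [9.5, 480, 59],      # program A
    [20.0, 2000, 65], [21.0, 2050, 66], [19.0, 1950, 64],  # program B
    [5.0, 1000, 75], [5.5, 1020, 76], [4.5, 980, 74],      # program C
]
labels, centers = kmeans(standardize(readings), k=4)
```

Without the standardization step, power (in the thousands) would dominate the distance and water or noise would barely influence the clusters.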
Rather than the exact clustering algorithm, I think the main issue here is the feature extraction for the clustering. https://github.com/blue-yonder/tsfresh might be useful for that.
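As a rough illustration of what feature extraction means here, the sketch below summarises fixed-length windows of one variable with a few hand-rolled statistics. This is not tsfresh's API; tsfresh computes a much larger feature set automatically. The window width, step, feature choice and the `window_features` name are all assumptions made for the example.

```python
from statistics import mean, pstdev

def window_features(series, width, step):
    """Summarise each window by mean, std, min, max and net change (last - first)."""
    feats = []
    for start in range(0, len(series) - width + 1, step):
        w = series[start:start + width]
        feats.append([mean(w), pstdev(w), min(w), max(w), w[-1] - w[0]])
    return feats

# Hypothetical power readings: idle, one program running, idle again.
power = [5, 5, 6, 5, 500, 510, 495, 505, 5, 6, 5, 5]
rows = window_features(power, width=4, step=4)  # one feature row per window
```

Each row describes a stretch of time rather than a single timestamp, so clustering these rows keeps some of the temporal context. For several variables, the per-variable features of the same window can be concatenated into one row.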
I think I can use k-means as you mentioned, but wouldn't the time information be lost then? Every timestamp would be treated independently of the others, while one program (cluster) is in use for an uninterrupted period of time.