coffeecoffeecoffeee t1_is1pv58 wrote on October 12, 2022 at 5:43 PM

I'm looking for advice on identifying clusters of people, each of whom has longitudinal data.

I have data structured as a multivariate time series of exactly 28 days for each of a large number of people. (The calendar dates differ from person to person, but each person's 28 days are always consecutive, and a given person's Day d is the same day for every feature in that person's multivariate time series.) Each person-day is associated with a set of nonnegative counts, many of which are 0.

To clarify, a given person's data looks something like this, where a placeholder such as obs27a stands for the observed count of Feature A on Day 27: "Feature A: [10, 9, 0, 2, 0, 0, ..., obs27a, 3], Feature B: [38, 12, 0, 3, 0, 0, ..., obs27b, 0], Feature C: [12, 6, 0, 10, 0, 0, ..., obs27c, 13]".
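In case it helps to have the layout in code, here is a rough sketch of how I think of one person's record. The counts are randomly generated stand-ins rather than my real data, and the names (`make_fake_person`, `validate`, `to_day_rows`, the zero probability, the three feature labels) are made up for illustration.

```python
import random

N_DAYS = 28
FEATURES = ("A", "B", "C")  # illustrative labels; the real data has more features


def make_fake_person(rng, p_zero=0.6, max_count=40):
    """Made-up stand-in for one person: feature -> list of 28 nonnegative counts."""
    return {
        feature: [
            0 if rng.random() < p_zero else rng.randint(1, max_count)
            for _ in range(N_DAYS)
        ]
        for feature in FEATURES
    }


def validate(person):
    """Check that every feature has exactly 28 nonnegative integer counts."""
    for feature in FEATURES:
        series = person[feature]
        if len(series) != N_DAYS:
            raise ValueError(f"feature {feature} has {len(series)} days, expected {N_DAYS}")
        if any(not isinstance(x, int) or x < 0 for x in series):
            raise ValueError(f"feature {feature} contains a value that is not a nonnegative count")


def to_day_rows(person):
    """Return 28 rows; row d holds every feature's count for the same day (Day d + 1)."""
    return [tuple(person[feature][d] for feature in FEATURES) for d in range(N_DAYS)]


rng = random.Random(0)
people = {f"person_{i}": make_fake_person(rng) for i in range(5)}
for person in people.values():
    validate(person)

rows = to_day_rows(people["person_0"])
```

Each person is then a 28 x (number of features) table of counts, and the question is how to group the people, not the individual series.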

What are some recommended approaches to identifying clusters of people when the data is structured like this? I've considered mixture modeling with a random effect on person, but it's not obvious how to fit one when there's no response variable. I've also looked into self-organizing maps, but they look like they're for clustering time series rather than individuals who have longitudinal data. I also recently discovered Croston's method for demand forecasting of intermittent time series, which is a modified EWMA. It sounds like it's more useful for smoothing, though, and I'd still have to figure out how to cluster the smoothed time series.