Submitted by spiritualquestions t3_10ha8gp in MachineLearning
I am beginning work on a problem similar to the one below.
There is a score from 1 to 4 (1 is bad, 4 is very good) for a person's back sprain recovery. The data we have are back sprain recovery scores recorded after 2 weeks, 3 months, and 6 months, along with information (features) about each person's behavior, such as sleep, medications, diet, and exercise.
We want to predict their 2-week, 3-month, and 6-month back sprain recovery scores based on their initial behavior inputs. For example, given that a user sleeps 8 hours a day, consumes x amount of sugar, does physical therapy 4 days a week, and takes x medication, what will their recovery scores be at 2 weeks, 3 months, and 6 months?
The training data would look like:
Sleep Average | Medication | Days of Physical Therapy | Diet | Week 2 recovery score | Month 3 recovery score | Month 6 recovery score |
---|---|---|---|---|---|---|
9 hours per night | Advil | 4 days/ week | Healthy | 2 | 3 | 4 |
5 hours per night | None | 0 days/week | Unhealthy | 1 | 2 | 2 |
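The sleep and therapy cells in this table are free text, so they need to be parsed into numbers before any model can use them, and the three score columns form a 3-column target. A minimal sketch, assuming every cell follows the format of the two example rows ("9 hours per night", "4 days/ week"); the regex and the helper names `parse_leading_number` and `split_row` are hypothetical:

```python
import re

# The two example rows from the table, transcribed as plain dicts.
ROWS = [
    {"sleep": "9 hours per night", "medication": "Advil", "pt_days": "4 days/ week",
     "diet": "Healthy", "week2": 2, "month3": 3, "month6": 4},
    {"sleep": "5 hours per night", "medication": "None", "pt_days": "0 days/week",
     "diet": "Unhealthy", "week2": 1, "month3": 2, "month6": 2},
]

def parse_leading_number(text):
    """Return the first number in a cell such as '9 hours per night' (assumed format)."""
    match = re.search(r"\d+(?:\.\d+)?", text)
    if match is None:
        raise ValueError(f"no number found in {text!r}")
    return float(match.group())

def split_row(row):
    """Split one record into its numeric inputs and the three recovery-score targets."""
    numeric_inputs = [parse_leading_number(row["sleep"]), parse_leading_number(row["pt_days"])]
    targets = [row["week2"], row["month3"], row["month6"]]
    return numeric_inputs, targets

numeric_X = []
Y = []
for row in ROWS:
    x, y = split_row(row)
    numeric_X.append(x)
    Y.append(y)

print(numeric_X)  # [[9.0, 4.0], [5.0, 0.0]]
print(Y)          # [[2, 3, 4], [1, 2, 2]]
```

Each person contributes one row of inputs and one row of three scores, which is why this can be framed as a single multi-output problem rather than a time series.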
I want a model (or multiple models) to predict 3 values: the 2-week, 3-month, and 6-month scores. I am not familiar with time series, but with only three scores per person, the data seems like it may be too sparse for that.
Should I be using time series here, or should I create 3 classification models?
suflaj t1_j57ce83 wrote
This looks like something for XGBoost. In that case you're looking at the `XGBRegressor` class. Your X are the first 4 features, and your Y are the 3 outputs. You will need to convert the medication to a one-hot vector representation, and the diet will presumably be enumerated into whole numbers sorted by healthiness.
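The encoding described in the reply can be sketched in plain Python. This assumes the sleep and therapy-day values are already numbers, that the medication vocabulary is only the values visible in the example table ("None", "Advil"), and that diet has just the two levels shown; the vocabularies and helper names (`one_hot`, `diet_rank`, `encode_features`, `to_score`) are hypothetical placeholders, not part of the original data:

```python
# Hypothetical vocabularies: only the values visible in the example table.
MEDICATIONS = ["None", "Advil"]          # one-hot columns, in this fixed order
DIET_LEVELS = ["Unhealthy", "Healthy"]   # ordered from least to most healthy; add levels in order

def one_hot(value, vocabulary):
    """Return a 0/1 list with a single 1 at the position of `value`."""
    if value not in vocabulary:
        raise ValueError(f"unknown category {value!r}")
    return [1 if value == item else 0 for item in vocabulary]

def diet_rank(value):
    """Map a diet label to a whole number that grows with healthiness."""
    return DIET_LEVELS.index(value)

def encode_features(sleep_hours, medication, pt_days, diet):
    """Build one numeric row: [sleep, pt_days, diet_rank, *medication_one_hot]."""
    return [sleep_hours, pt_days, diet_rank(diet)] + one_hot(medication, MEDICATIONS)

def to_score(prediction):
    """Round a continuous regressor output and clip it to the 1-4 score range."""
    return min(4, max(1, round(prediction)))

X = [
    encode_features(9, "Advil", 4, "Healthy"),
    encode_features(5, "None", 0, "Unhealthy"),
]
Y = [[2, 3, 4], [1, 2, 2]]

print(X)  # [[9, 4, 1, 0, 1], [5, 0, 0, 1, 0]]
```

With `X` and `Y` built this way, fitting would look something like `xgboost.XGBRegressor().fit(X, Y)`. Passing a 2-D `Y` relies on the multi-output support that XGBoost added (as experimental) in version 1.6, so check your installed version; with older versions, scikit-learn's `MultiOutputRegressor` fits one regressor per column instead, which is essentially the "3 separate models" option from the question. Because a regressor's outputs are continuous, `to_score` shows one way to map them back onto the 1-4 scale.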