
Zealousideal-Card637 t1_izy2kh1 wrote

Interesting comparison. I looked at the full experiments, and Amazon performs slightly better at the bottom level, i.e., the actual time series you are forecasting.

44

SherbertTiny2366 t1_izy50ew wrote

For hierarchical and sparse data, it is quite common to see models achieve good accuracy at the bottom levels but perform very poorly at higher aggregation levels. This happens because the models systematically under- or over-predict, and those errors accumulate when the forecasts are summed.

29

mangotheblackcat89 t1_izzighp wrote

IMO, this is an important consideration. Sure, the target level is SKU-store, but at what level are the purchase orders being made? The M5 Competition didn't say anything about this, but probably the SKU level is as important as the SKU-store, if not more.

For retail data in general, I think we need to see how well a method performs at different levels of the hierarchy. I've seen commercial and finance teams prefer a forecast that is more accurate at the top over one that is slightly more accurate at the bottom.

4

-Rizhiy- t1_j018jx5 wrote

Do you by any chance have a resource that explains that a bit more?

I can't get my head around how a collection of accurate forecasts can produce an inaccurate aggregate.

Is it related to class imbalances or perhaps something like Simpson's paradox?

2

SherbertTiny2366 t1_j01t4du wrote

Imagine this toy example. You have 5 series which are very sparse, as is often the case in retail. Say series 1 has sales on Mondays and 0s the rest of the week, series 2 on Tuesdays, series 3 on Wednesdays, and so on. For each individual series, a forecast close to 0 would be more or less accurate; however, when you add all those predictions up, the aggregate will be way below the true total.
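A quick numerical sketch of that toy example (the sales values and the flat-zero forecast are hypothetical, just to make the arithmetic concrete):

```python
import numpy as np

# 5 sparse series over one week: series i sells 10 units on day i, else 0.
actuals = np.zeros((5, 7))
for i in range(5):
    actuals[i, i] = 10.0

# A naive forecast of 0 everywhere looks reasonable per series...
forecasts = np.zeros((5, 7))
per_series_mae = np.abs(actuals - forecasts).mean(axis=1)
print(per_series_mae)  # each series is only ~1.43 units off per day (10/7)

# ...but the aggregate forecast is far below the true total.
total_actual = actuals.sum(axis=0)      # [10 10 10 10 10  0  0]
total_forecast = forecasts.sum(axis=0)  # all zeros
aggregate_mae = np.abs(total_actual - total_forecast).mean()
print(aggregate_mae)  # ~7.14 (50/7): five times worse at the top level
```

Every per-series error is small and points the same way (under-prediction), so instead of canceling out, the errors add up at the aggregate level.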

5

xgboostftw t1_j02hrl4 wrote

Where do you see the full experiments? I think only the results table from Amazon is published, no?

1

fedegarzar OP t1_j04jp9e wrote

2