fedegarzar OP t1_jaev47v wrote

That's an interesting question. Behind the scenes, BigQuery uses an auto-ARIMA model to extrapolate the trend of the time series after deseasonalizing them (https://cloud.google.com/bigquery-ml/docs/reference/standard-sql/bigqueryml-syntax-create-time-series). I would say the complexity of that pipeline makes it slower (our implementations also use numba, which speeds up fitting).
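To make the idea concrete, here is a minimal, hypothetical sketch of the deseasonalize-then-extrapolate pattern described above. It is not BigQuery's actual implementation: the seasonal component is estimated with simple per-position averages, and a linear fit stands in for the ARIMA step.

```python
import numpy as np

def seasonal_means(y, season_length):
    """Average value at each position within the seasonal cycle."""
    y = np.asarray(y, dtype=float)
    return np.array([y[i::season_length].mean() for i in range(season_length)])

def deseasonalized_trend_forecast(y, season_length, horizon):
    y = np.asarray(y, dtype=float)
    # 1. Estimate and remove a fixed seasonal pattern.
    seas = seasonal_means(y, season_length)
    seas = seas - seas.mean()  # center the seasonal component
    deseason = y - np.tile(seas, len(y) // season_length + 1)[: len(y)]
    # 2. Extrapolate the remaining trend (a linear fit here, where the
    #    BigQuery docs describe an ARIMA model).
    t = np.arange(len(y))
    slope, intercept = np.polyfit(t, deseason, 1)
    t_future = np.arange(len(y), len(y) + horizon)
    trend = intercept + slope * t_future
    # 3. Add the seasonal pattern back onto the extrapolated trend.
    return trend + seas[t_future % season_length]

# Synthetic monthly series: upward trend plus a 12-month cycle.
t = np.arange(48)
y = 0.5 * t + 10 * np.sin(2 * np.pi * t / 12)
fcst = deseasonalized_trend_forecast(y, season_length=12, horizon=12)
```

Each extra stage (seasonality detection, decomposition, candidate-order search in auto-ARIMA) adds fitting time, which is the overhead being discussed.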


fedegarzar OP t1_jaeue2g wrote

Yes, I agree with your intuitions. However, we used the datasets from the official BigQuery tutorial (https://cloud.google.com/bigquery-ml/docs/arima-speed-up-tutorial). That said, it is hard to generalize in time series forecasting because the field's datasets are so diverse. The central intuition of the experiment is that running less sophisticated methods and pipelines could be a better practice before using AutoML as is.
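As an example of what "less sophisticated method" can mean (my illustration, not the benchmark's exact code), a seasonal-naive baseline simply repeats the values observed one season earlier, and is a standard sanity check before reaching for AutoML:

```python
import numpy as np

def seasonal_naive(y, season_length, horizon):
    """Forecast by repeating the last observed seasonal cycle."""
    y = np.asarray(y, dtype=float)
    last_season = y[-season_length:]
    reps = int(np.ceil(horizon / season_length))
    return np.tile(last_season, reps)[:horizon]

y = np.array([10, 20, 30, 40, 12, 22, 32, 42])  # quarterly-like pattern
fcst = seasonal_naive(y, season_length=4, horizon=4)
# fcst is [12., 22., 32., 42.] — the last cycle, repeated
```

If an AutoML pipeline cannot beat a baseline like this on your data, its extra cost and complexity are hard to justify.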


fedegarzar OP t1_j048qe0 wrote

Here is the step-by-step guide to reproducing Amazon Forecast: https://nixtla.github.io/statsforecast/examples/aws/amazonforecast.html

As you can see, all the exogenous variables of M5 are included in Amazon Forecast.

Concretely, in the same link you posted, we even provide links to the static and temporal exogenous variables you mention.

From the ReadMe:

The data are ready for download at the following URLs:


fedegarzar OP t1_izycx10 wrote

  1. We did not run those experiments, but in our opinion it's easier to maintain a Python pipeline than to use the AWS UI or CLI.

  2. In terms of scalability, I think StatsForecast wins by far, given that it takes far less time to compute and supports integration with Spark and Ray.

  3. The point of the whole experiment is to show that the AutoML solution is far more expensive in the long run.
