MyActualUserName99 t1_jaes8gh wrote

My biggest concern with this assessment is the lack of dataset diversity. Sure, you can get one method to outperform another on one or two datasets, but doing so across many datasets of various sizes is much harder.

From what I can tell, the open-source StatsForecast was able to outperform BigQuery on one extremely small dataset (Citibike Trips) and one large dataset (Liquor Sales). Granted, outperforming on the much larger dataset is, to me, far more impressive than on the smaller one. But drawing such a definitive conclusion that open source is better than commercial would require testing across a plethora of datasets of different sizes, domains, etc.

fedegarzar OP t1_jaeue2g wrote

Yes, I agree with your intuitions. However, we used the datasets from the official BigQuery tutorial (https://cloud.google.com/bigquery-ml/docs/arima-speed-up-tutorial). Moreover, it isn't easy to generalize in time series forecasting because the datasets in the field are so diverse. The central intuition of the experiment is that running less sophisticated methods and pipelines first could be a better practice than using AutoML as is.