> TL;DR: We compared BigQuery ML's forecasting solution with two open-source tools, StatsForecast and Fugue. The experiment concludes that BigQuery is 13% less accurate, 8 times slower, and 3.2 times more expensive than running an open-source alternative in a simple cloud cluster. You can reproduce everything yourself in a couple of lines.

https://preview.redd.it/4b44550ezyka1.png?width=632&format=png&auto=webp&v=enabled&s=c46d0453f6df6aad52ebe53e5e81898220003995

For the experiment, we followed the same methodology that Google used to showcase its forecasting capabilities. We first tested the tools on a small dataset of approximately 400 time series, representing Citi Bike trips in New York City, before moving on to a larger dataset of over one million time series, representing liquor sales in Iowa.

Our experiment revealed that Nixtla's StatsForecast and Fugue outperformed BigQuery in accuracy, speed, and cost.

The cost savings from using an open-source alternative like StatsForecast or Fugue can be substantial. In our experiment, running StatsForecast and Fugue on a Databricks cluster of 16 e2-standard-32 virtual machines (GCP) cost only 12.94 USD, whereas BigQuery cost 41.96 USD.

Google's BigQuery:

- Achieved a Mean Absolute Error (MAE) of 24.13 on the new_york.citibike_trips dataset.
- Took 7.5 minutes to run the new_york.citibike_trips dataset (approximately 400 time series).
- Took 1 hour 16 minutes to run the iowa_liquor_sales.sales dataset (over a million time series).
- Cost 41.96 USD.

StatsForecast and Fugue, trained on a Databricks cluster of 16 e2-standard-32 virtual machines (GCP):

- Achieved a Mean Absolute Error (MAE) of 20.96 on the new_york.citibike_trips dataset.
- Took 2 minutes to run the new_york.citibike_trips dataset (approximately 400 time series).
- Took 9 minutes to run the iowa_liquor_sales.sales dataset (over a million time series).
- Cost only 12.94 USD.
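
The headline ratios in the TL;DR can be re-derived from the figures reported above. The snippet below is only a sanity check of that arithmetic. The variable names are made up for illustration, and the "13%" is read here as the MAE reduction relative to BigQuery's MAE, which is an assumption about how the number was computed.

```python
# Figures as reported in the post: MAE on the Citi Bike dataset,
# runtime in minutes on the Iowa liquor sales dataset, and total cost in USD.
bigquery = {"mae": 24.13, "minutes_large": 76.0, "cost_usd": 41.96}
open_source = {"mae": 20.96, "minutes_large": 9.0, "cost_usd": 12.94}

# Relative MAE reduction of the open-source run versus BigQuery.
mae_reduction = 1 - open_source["mae"] / bigquery["mae"]

# How many times longer and more expensive the BigQuery run was.
speedup = bigquery["minutes_large"] / open_source["minutes_large"]
cost_ratio = bigquery["cost_usd"] / open_source["cost_usd"]

print(f"MAE reduction: {mae_reduction:.1%}")   # MAE reduction: 13.1%
print(f"Speedup:       {speedup:.1f}x")        # Speedup:       8.4x
print(f"Cost ratio:    {cost_ratio:.1f}x")     # Cost ratio:    3.2x
```

On the small dataset the same calculation gives 7.5 / 2 = 3.75 times, so the "8 times slower" figure refers to the large dataset.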

Overall, our experiment shows that classical statistical methods, run with StatsForecast and Fugue, can outperform complex methods and pipelines like BigQuery in speed, accuracy, and cost. While using StatsForecast or Fugue may require some basic knowledge of Python and cloud computing, the results are simply better.

Reproduce the experiment here: https://github.com/Nixtla/statsforecast/tree/main/experiments/bigquery.

Statement of errors: Nick Akincilar pointed out that we did not include the correct DBU cost of Databricks. The corrected amounts are 12.94 USD (open source) vs. 41.96 USD (Google).

Comments

Kinferatu t1_jaedgwb wrote on February 28, 2023 at 8:52 PM

AutoML has a significant limitation when it comes to time series analysis - the inherent nature of time series data makes it challenging to obtain clean validation signals that can extrapolate to test results. This issue is often overlooked, and it can lead to inaccurate predictions and unreliable results.

More-Horse-3281 t1_jadyg0x wrote on February 28, 2023 at 7:16 PM

I have no experience with GCP AutoML, but I have experienced heavy overfitting when using FLAML and auto-sklearn. Did you experience the same (i.e., AutoML outperforming the open source algos on training data)? I have the feeling that a lot of AutoML solutions "cherry-pick" models that just happened to shine on the training data.

fedegarzar OP t1_jaevmj5 wrote on February 28, 2023 at 10:52 PM

I agree. Overfitting is a common problem in AutoML solutions. A proper validation strategy should improve performance on unseen data, but in our experience, most AutoML solutions lack this feature.
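
To make "a proper validation strategy" concrete, here is a minimal sketch of rolling-origin (time-ordered) validation, where every test window strictly follows its training data. It is not taken from the experiment repository: the helper names, the toy series, and the seasonal-naive baseline are all illustrative assumptions.

```python
from statistics import mean


def mae(actual, predicted):
    """Mean absolute error between two equally long sequences."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def seasonal_naive(history, horizon, season_length):
    """Forecast by repeating the last observed season."""
    last_season = history[-season_length:]
    return [last_season[i % season_length] for i in range(horizon)]


def rolling_origin_splits(n_obs, horizon, n_windows):
    """Yield (train_end, test_end) index pairs, oldest window first.

    Training data is series[:train_end]; the test window is
    series[train_end:test_end], so no future value leaks into training.
    """
    for w in range(n_windows, 0, -1):
        train_end = n_obs - w * horizon
        yield train_end, train_end + horizon


# Toy series with a season of length 4 and one irregular value (13).
series = [10, 12, 14, 11, 10, 12, 14, 11, 10, 12, 14, 11, 10, 13, 14, 11]

scores = []
for train_end, test_end in rolling_origin_splits(len(series), horizon=4, n_windows=2):
    forecast = seasonal_naive(series[:train_end], horizon=4, season_length=4)
    scores.append(mae(series[train_end:test_end], forecast))

print(scores)  # [0, 0.25]
```

Averaging the per-window scores gives a validation signal that is harder to overfit than a single in-sample fit, which is the concern raised in this thread.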

CyberPun-K t1_jae0m46 wrote on February 28, 2023 at 7:30 PM

While AutoML is a powerful tool for automated machine learning, it's not widely used by most people. Personally, I wouldn't pay thousands of dollars for fancy hyperparameter optimization. In most cases improvements are marginal.

One of the cool features of BigQuery is its seamless integration with SQL queries, which makes data analysis much easier.

SherbertTiny2366 t1_jaebzb0 wrote on February 28, 2023 at 8:43 PM

From what I understand, that is also the advantage of Fugue. From their webpage:
> FugueSQL is designed for heavy SQL users to extend the boundaries of traditional SQL workflows. FugueSQL allows the expression of logic for end-to-end distributed computing workflows. It can also be combined with Python code to use custom functions alongside the SQL commands. It provides a unified interface, allowing the same SQL code to run on Pandas, Dask, and Spark.

https://github.com/fugue-project/fugue

MyActualUserName99 t1_jaes8gh wrote on February 28, 2023 at 10:28 PM

My biggest concern with this assessment is the lack of dataset diversity. Sure, you can get one method to outperform another on one or two datasets, but doing so across many datasets of various sizes is much, much harder.

From what I can tell, the open source StatsForecast was able to outperform BigQuery on one extremely small dataset (Citibike Trips) and one large dataset (Liquor Sales). Granted, outperforming on the much larger dataset is, to me, much more impressive than on the smaller one. But to make such a definitive conclusion that open source is better than commercial would require testing across a plethora of datasets, all of different sizes, domains, etc.

fedegarzar OP t1_jaeue2g wrote on February 28, 2023 at 10:43 PM

Yes, I agree with your intuitions. However, we used the datasets from the official BigQuery tutorial (https://cloud.google.com/bigquery-ml/docs/arima-speed-up-tutorial). In particular, it isn't easy to generalize in time series forecasting because the datasets in the field are so diverse. The central intuition of the experiment is that running less sophisticated methods and pipelines first could be a better practice than using AutoML as is.

No_Yogurtcloset_5639 t1_jadxmg7 wrote on February 28, 2023 at 7:11 PM

What about Vertex AI? Is it any better?

More-Horse-3281 t1_jae4fj3 wrote on February 28, 2023 at 7:54 PM

GCP's AutoML is part of GCP Vertex AI.

mangotheblackcat89 t1_jaef0kf wrote on February 28, 2023 at 9:02 PM

Very interesting results. The reduction in time and cost is definitely worth checking out in more detail.

tblume1992 t1_jaf3zgc wrote on February 28, 2023 at 11:52 PM

Can you guys add model selection and the results of the chosen method to make it more like what we would do in production?

cristianic18 t1_jaec55g wrote on February 28, 2023 at 8:44 PM

Very interesting comparison. Do you know why BigQuery takes much longer to run if it is using an ARIMA?

fedegarzar OP t1_jaev47v wrote on February 28, 2023 at 10:48 PM

That's an interesting question. Behind the scenes, BigQuery uses an auto-ARIMA model to extrapolate the trend of the time series after deseasonalizing them (https://cloud.google.com/bigquery-ml/docs/reference/standard-sql/bigqueryml-syntax-create-time-series). I would say that the complexity of the pipeline makes it slower (also, our implementations use Numba, which speeds up the fitting time).
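
For readers unfamiliar with the "deseasonalize, then extrapolate the trend" idea, here is a toy sketch of that kind of pipeline. It is not BigQuery's implementation and not StatsForecast's: the function names are invented for illustration, and a least-squares line stands in for the auto-ARIMA step. The seasonal estimate is also crude, since it averages each seasonal position without detrending first.

```python
from statistics import mean


def seasonal_offsets(series, season_length):
    """Average deviation of each seasonal position from the overall mean."""
    overall = mean(series)
    return [mean(series[i::season_length]) - overall for i in range(season_length)]


def linear_trend(values):
    """Least-squares line through (0, v0), (1, v1), ...; returns (intercept, slope).

    Stand-in for the ARIMA step: any trend model could be plugged in here.
    """
    n = len(values)
    x_bar = (n - 1) / 2
    y_bar = mean(values)
    num = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(values))
    den = sum((x - x_bar) ** 2 for x in range(n))
    slope = num / den
    return y_bar - slope * x_bar, slope


def forecast(series, horizon, season_length):
    """Deseasonalize, extrapolate the trend, then add the seasonality back."""
    offsets = seasonal_offsets(series, season_length)
    deseasonalized = [y - offsets[t % season_length] for t, y in enumerate(series)]
    intercept, slope = linear_trend(deseasonalized)
    n = len(series)
    return [
        intercept + slope * t + offsets[t % season_length]
        for t in range(n, n + horizon)
    ]


# Purely seasonal toy series (no trend), three full seasons of length 4.
history = [10, 12, 14, 11] * 3
print(forecast(history, horizon=4, season_length=4))
```

Each stage adds work per series, which fits the point above that a longer pipeline tends to run slower across a million series.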
