Submitted by fedegarzar t3_11effj0 in MachineLearning
> TL;DR: We compared BigQuery ML's forecasting solution with two open-source tools, StatsForecast and Fugue. The experiment concludes that BigQuery is 13% less accurate, 8 times slower, and 3.2 times more expensive than running an open-source alternative in a simple cloud cluster. You can reproduce everything yourself in a couple of lines.
For the experiment, we followed the same methodology that Google used to showcase its forecasting capabilities. We first tested the tools on a small dataset of approximately 400 time series, representing Citi Bike trips in New York City, before moving on to a larger dataset of over one million time series, representing liquor sales in Iowa.
Our experiment showed that StatsForecast (Nixtla's library) and Fugue outperformed BigQuery in accuracy, speed, and cost.
The cost savings from using an open-source alternative like StatsForecast or Fugue can be substantial. In our experiment, running StatsForecast and Fugue on a Databricks cluster of 16 e2-standard-32 virtual machines (GCP) cost only 12.94 USD, whereas using BigQuery cost 41.96 USD.
Google's BigQuery:
- Achieved a mean absolute error (MAE) of 24.13 on the new_york.citibike_trips dataset.
- Took 7.5 minutes to run the new_york.citibike_trips dataset (approximately 400 time series).
- Took 1 hour 16 minutes to run the iowa_liquor_sales.sales dataset (over a million time series).
- Cost 41.96 USD.
StatsForecast and Fugue, trained on a Databricks cluster of 16 e2-standard-32 virtual machines (GCP):
- Achieved a mean absolute error (MAE) of 20.96 on the new_york.citibike_trips dataset.
- Took 2 minutes to run the new_york.citibike_trips dataset (approximately 400 time series).
- Took 9 minutes to run the iowa_liquor_sales.sales dataset (over a million time series).
- Cost only 12.94 USD.
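The headline ratios in the TL;DR can be checked from the figures listed above. Here is a minimal sketch of that arithmetic, using only the numbers reported in this post; the variable and function names are illustrative and do not come from the experiment repository. Note that the 13% figure matches when the MAE gap is measured relative to BigQuery's MAE.

```python
# Figures as reported in the post; dictionary keys are illustrative names.
bigquery = {"mae": 24.13, "minutes_iowa": 76.0, "cost_usd": 41.96}  # 1 h 16 min = 76 min
open_source = {"mae": 20.96, "minutes_iowa": 9.0, "cost_usd": 12.94}


def relative_gap(worse: float, better: float) -> float:
    """Gap between the two values as a fraction of the worse (larger) value."""
    return (worse - better) / worse


def ratio(larger: float, smaller: float) -> float:
    """How many times larger the first value is than the second."""
    return larger / smaller


accuracy_gap = relative_gap(bigquery["mae"], open_source["mae"])
slowdown = ratio(bigquery["minutes_iowa"], open_source["minutes_iowa"])
cost_ratio = ratio(bigquery["cost_usd"], open_source["cost_usd"])

print(f"MAE gap relative to BigQuery: {accuracy_gap:.1%}")  # 13.1%
print(f"Runtime ratio (Iowa dataset): {slowdown:.1f}x")     # 8.4x
print(f"Cost ratio: {cost_ratio:.1f}x")                     # 3.2x
```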
Overall, our experiment shows that classical statistical methods, run with open-source tools such as StatsForecast and Fugue, can outperform more complex pipelines like BigQuery in speed, accuracy, and cost. Using StatsForecast or Fugue requires some basic knowledge of Python and cloud computing, but in this experiment the results were better on all three measures.
Reproduce the experiment here: https://github.com/Nixtla/statsforecast/tree/main/experiments/bigquery.
Statement of errors: Nick Akincilar pointed out that we did not include the correct DBU cost of Databricks. The corrected amounts are 12.94 USD (open source) vs. 41.96 USD (Google).
Kinferatu t1_jaedgwb wrote
AutoML has a significant limitation when it comes to time series analysis - the inherent nature of time series data makes it challenging to obtain clean validation signals that can extrapolate to test results. This issue is often overlooked, and it can lead to inaccurate predictions and unreliable results.
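One common way to get a validation signal that respects time order is rolling-origin evaluation, where each validation window comes strictly after the data used for fitting. The sketch below is a plain-Python illustration of that idea only. It is not what BigQuery or StatsForecast do internally, the series is made up, and a naive last-value forecaster stands in for a real model.

```python
from typing import Callable, List, Sequence, Tuple


def rolling_origin_splits(n: int, horizon: int, n_windows: int) -> List[Tuple[range, range]]:
    """Build (train, validation) index ranges; each validation window follows its training data."""
    splits = []
    for w in range(n_windows, 0, -1):
        cutoff = n - w * horizon
        if cutoff <= 0:
            raise ValueError("series too short for the requested windows")
        splits.append((range(0, cutoff), range(cutoff, cutoff + horizon)))
    return splits


def naive_forecast(history: Sequence[float], horizon: int) -> List[float]:
    """Placeholder model: repeat the last observed value."""
    return [history[-1]] * horizon


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error over one validation window."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def cross_validate(
    series: Sequence[float],
    horizon: int,
    n_windows: int,
    model: Callable[[Sequence[float], int], List[float]] = naive_forecast,
) -> List[float]:
    """Score the model on each rolling-origin window, oldest window first."""
    scores = []
    for train_idx, val_idx in rolling_origin_splits(len(series), horizon, n_windows):
        train = [series[i] for i in train_idx]
        actual = [series[i] for i in val_idx]
        scores.append(mae(actual, model(train, horizon)))
    return scores


# Made-up series, for illustration only.
series = [10, 12, 13, 12, 15, 16, 18, 17, 19, 21, 22, 24]
scores = cross_validate(series, horizon=2, n_windows=3)
print(scores)  # [1.5, 3.0, 2.0]
```

Even on this toy series the per-window scores vary noticeably, which is the kind of noisy validation signal the comment describes.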