sweeetscience t1_iy0hh50 wrote

Get a workstation. We used GCP/Vertex AI to run batch prediction on a computer vision model, but on larger videos it inexplicably fails. Google has now spent six weeks trying to figure out why (everyone, including Google's engineers, agrees that the model container is not the problem), and they still don't have an answer.

We ended up building our own multi-GPU server. Not only are our prediction times better, but we can also see and diagnose issues immediately as they arise.

One often-overlooked aspect of using public clouds is the several layers of abstraction between you and what's happening under the hood. If something goes wrong behind the scenes that you can't readily diagnose and fix yourself, you're basically at the mercy of AWS et al. for an answer.

For 10-12k, you can get a handful of high-end consumer GPUs and a boatload of memory, and you have full control of the system.

Character-Ad9862 OP t1_iy0n2uw wrote

Really appreciate your insights. Having that extra dependency layer is something that has also worried me.
