sweeetscience t1_iy0hh50 wrote

Get a workstation. We used GCP Vertex AI to do batch prediction on a computer vision model, but for larger videos it inexplicably fails. Google has now spent six weeks trying to figure out why (everyone, including Google's own engineers, agrees the model container is not the problem), and they still don't have an answer.
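
For context, the workflow in question looks roughly like the sketch below. This is not our actual setup, just an illustration of the kind of Vertex AI batch prediction call involved; the project, model ID, bucket paths, and machine/accelerator choices are all placeholders:

```python
# Rough sketch of a Vertex AI batch prediction job submission.
# Project, model ID, and GCS paths below are placeholders.
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

model = aiplatform.Model(
    "projects/my-project/locations/us-central1/models/MODEL_ID"
)
job = model.batch_predict(
    job_display_name="video-cv-batch",
    gcs_source="gs://my-bucket/videos/input.jsonl",
    gcs_destination_prefix="gs://my-bucket/predictions/",
    machine_type="n1-standard-8",
    accelerator_type="NVIDIA_TESLA_T4",
    accelerator_count=1,
    sync=True,
)

# When a job like this fails, the failure happens inside Google's
# infrastructure; from the client side you get little more than a
# terminal state and an error string to work with.
print(job.state)
```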

We ended up investing in building our own multi-GPU server, and not only are our prediction times better, but we can instantly see and diagnose issues as they arise.
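
A minimal sketch of what that local approach can look like, assuming a PyTorch vision model (the ResNet here and the batch size are illustrative, not our actual model):

```python
# Minimal sketch of local multi-GPU batch prediction with PyTorch.
import torch
from torchvision.models import resnet50, ResNet50_Weights

def predict_batches(frames: torch.Tensor, batch_size: int = 64) -> torch.Tensor:
    """Run inference locally, spreading each batch across all visible GPUs."""
    model = resnet50(weights=ResNet50_Weights.DEFAULT).eval()
    if torch.cuda.device_count() > 1:
        # DataParallel splits each batch across GPUs. Simple, and any
        # failure surfaces as an ordinary Python traceback on this box,
        # not an opaque error buried in someone else's infrastructure.
        model = torch.nn.DataParallel(model)
    model = model.to("cuda" if torch.cuda.is_available() else "cpu")

    device = next(model.parameters()).device
    outputs = []
    with torch.inference_mode():
        for i in range(0, frames.size(0), batch_size):
            batch = frames[i : i + batch_size].to(device)
            outputs.append(model(batch).cpu())
    return torch.cat(outputs)

# Example: 256 decoded video frames, preprocessed to 224x224.
preds = predict_batches(torch.randn(256, 3, 224, 224))
```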

One of the often-overlooked aspects of using public clouds is that several layers of abstraction remove you from what's happening under the hood. If something goes wrong behind the scenes that you can't readily diagnose and fix yourself, you're basically at the mercy of AWS et al. to provide you with an answer.

For $10–12k, you can get a handful of high-end consumer cards and a boatload of memory, and you have full control over the system.


sweeetscience t1_iv0q0xr wrote

This should fail, since the original work is not being redistributed. To wholly recreate a repo that Codex was trained on, you'd have to literally start typing the original code, and even then the contextual suggestions would likely yield a different result from the original anyway. I could be mistaken, but I remember reading about litigation in this space concerning a model trained on copyrighted data; the court ruled in favor of the defendant because the resulting model couldn't plausibly reproduce the original work. It's trickier here because you technically could recreate the original work, but you'd have to know the original work very well to begin with in order to actually recreate it, and if that's the case, what's the point of using Copilot in the first place? I could be (and probably am) wrong.

Imagine trying to recreate PyTorch from scratch using Codex or Copilot. IF, and that's a big if, someone did so, the author of the recreation would still have to attribute it.

Not legal advice
