
zenpianist t1_j01jvp7 wrote

I think real-world ML dev and ops is much messier. Something as simple as a decoupled inference pipeline would mean a lot to us, instead of having to retrigger the workflow whenever something fails. At TB scale, even snapshotting the outputs of each stage became ridiculously expensive and downright impossible. Would love to see how you address those.
