Appropriate_Ant_4629 t1_jb1rhkh wrote on March 5, 2023 at 7:47 PM

Take a step back:

Start on a cloud -- renting GPUs or TPUs -- with non-sensitive data.

I know you said "but bottom line the data running through our platform is all back-office, highly sensitive business information, and many have agreements explicitly restricting the movement of data to or from any cloud services".

You shouldn't be touching such information during development anyway.

Make or find a non-sensitive dataset of similar scale for development.

Don't buy hardware up front; wait until you have almost the entire data pipeline working well on rented servers. Rent them hourly on any of the big cloud platforms, and you'll quickly be able to quantify most of your hardware requirements: how much memory you need on the GPUs/TPUs, how much RAM you need on the CPU side, and how fast a storage layer you'll need.
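As a rough illustration of what "quantify" could look like on a rented box, here is a minimal sketch using only the Python standard library. The helper names `measure_step` and `read_throughput_mb_s` are made up for this example, not part of any real tool. It only sees host-side Python allocations and sequential read speed; GPU/TPU memory would have to come from whatever counters your ML framework exposes, and the read number can be inflated by the OS page cache.

```python
import time
import tracemalloc


def measure_step(step, *args):
    """Run one pipeline step (hypothetical helper).

    Returns (result, wall-clock seconds, peak Python heap bytes).
    tracemalloc only tracks allocations made through Python's allocator,
    so memory held by native libraries or on the GPU is NOT included.
    """
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = step(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    elapsed = time.perf_counter() - start
    return result, elapsed, peak


def read_throughput_mb_s(path, chunk_size=1 << 20):
    """Sequentially read a file and return the observed MiB/s (hypothetical helper).

    A second read of the same file is often served from the page cache,
    so treat repeated measurements as an upper bound.
    """
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            total += len(chunk)
    elapsed = time.perf_counter() - start
    return total / (1024 * 1024) / max(elapsed, 1e-9)
```

You'd wrap each stage of the pipeline in `measure_step`, point `read_throughput_mb_s` at a representative shard of the non-sensitive dataset, and log the numbers across a few instance types before deciding what to buy.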

Only after you have an at-scale dev/QA environment working on a cloud will you have any idea what physical hardware you'd want to buy.

ChristmasInOct OP t1_jb2enle wrote on March 5, 2023 at 10:29 PM

I really appreciate this response.

I'm not planning on using any of our data or touching the infrastructure yet, but for some reason I never considered using the cloud to determine hardware configuration.

Thanks again. Exactly what I needed!