jakethesnake_
jakethesnake_ t1_j0h769x wrote
Reply to comment by rajatarya in [P] XetHub: We scaled Git to support 1 TB repos by rajatarya
To be honest, I very much doubt we'd ever let a 3rd party manage our data. We have non-sensitive data on S3, and some more sensitive data on-prem. My ideal would be a VCS that either leaves the data in place or moves it to a dedicated on-prem deployment. For commercial sensitivity and data governance reasons, transferring data to a 3rd party is a non-starter.
I doubt a 3rd party storing a Merkle tree of the data would be acceptable to our partners either. We work with sensitive information.
That being said, XetHub looks useful for me and my team. I particularly like the mounting feature. Our distributed computing system uses Docker images to run jobs, and I currently download the data as needed inside the image, which works but is not efficient. I'd much prefer to mount a data repo. I think this would solve some pain points in our experiments.
I'm off work for the next two weeks, but I'll probably experiment with XetHub in the new year - cool stuff!
jakethesnake_ t1_j0h1m7l wrote
Can I put all my data in S3, then use XetHub to manage it?
jakethesnake_ t1_j0hch5f wrote
Reply to comment by rajatarya in [P] XetHub: We scaled Git to support 1 TB repos by rajatarya
Sounds great, I'll scout out XetHub in more detail when I'm back and DM you. Thanks for the helpful answers :)
re: data governance, we have signed very strict agreements with our clients. They specify where the data resides, who has access to it, and a bunch of other stuff. I'm not involved in those types of talks with clients, but the negotiations took months. A lot of care has been taken to meet these requirements, and adding another site and an unvetted company into the mix is likely going to be tricky. This seems pretty standard for enterprise clients in my experience.