
rajatarya OP t1_j0gzh2r wrote

Oh, I forgot to mention - yes! Mapping a model to its training data is a key part of reproducibility. 100% agree!

Using XetHub you can _finally_ commit the data, features, models, and metadata all in one place (along with the code). Have full confidence everything is aligned & working.

4

Liorithiel t1_j0h19at wrote

> finally

I've been doing that with git annex for a long time, so it's a bit of a stretch to say it wasn't possible before. Kind of a Schmidhuber moment…

Still, nice work with the Merkle tree!

2

rajatarya OP t1_j0h7npz wrote

True :) I haven't used `git annex` myself, so for me it felt like _finally_ when I could put all parts of the project in one place with XetHub.

How do you like using git annex? Are you working with others on your projects - does git annex help support team collaboration?

Again, appreciate the comment!

3

Liorithiel t1_j0hehga wrote

> How do you like using git annex? Are you working with others on your projects - does git annex help support team collaboration?

Right now I've got one large 5 TB repository with general media and archives, and some smaller project-specific repos. It's slow with many small files (like, over 1 million), but very easy to set up. I haven't tried collaboration; I've mostly worked on projects where my collaborators were less technical. My main use case was working with the same dataset on different computers, and for that it was more than enough.

2