Sure, keep in mind that my data is a little old, but last time I peeked into the git LFS space it seemed like there were still a few gaps.
First, most of my background in this area comes from gamedev, so YMMV as to whether the same applies in your use cases.
For our usage we'd usually have a repo history size that crossed the 1TB mark, and even upwards of 2-3TB in some cases. The developer sync was 150-200GB, the art sync was closer to 500-600GB, and the teams were regularly churning through 50-100GB/week depending on where we were in production.
You need discipline-specific views into the repo. It just speeds everything up and means that only the teams that need to take the pain have to. From a performance perspective, Perforce blows the pants off anything else I've seen; SVN tries, but P4 was easily an order of magnitude faster to sync or do a clean fetch.
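For a sense of what I mean by views, a P4 client spec for a programmer workspace can map in code/config and exclude the art tree entirely (depot paths and client name here are made up for illustration):

    Client: dev-workspace
    View:
        //depot/project/code/...    //dev-workspace/code/...
        //depot/project/config/...  //dev-workspace/config/...
        -//depot/project/art/...    //dev-workspace/art/...

That way a programmer never syncs a byte of art, and the art team only carries its own weight.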
I've seen proxy servers done with git, but it's usually some really hacky thing scripted together with a ton of duct tape and client-specific host overrides. When you have a team split across East Coast/West Coast (or another country) you need that proxy so that history is cached locally and only gets pulled across the WAN once. Having a split push/pull model is asking for trouble, and last I checked it wasn't clear to me whether stuff like git LFS actually handles locking cleanly across it.
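The least-hacky version I've seen still boils down to a read-through cache box per site plus URL rewriting on every client, roughly like this (host names are made up):

    # remotes point at the local cache box for fetches...
    git remote set-url origin https://git-cache.west.example.com/game/mainline.git
    # ...while pushes get rewritten back to the real origin
    git config --global url."https://git.example.com/".pushInsteadOf "https://git-cache.west.example.com/"

Which is exactly the split push/pull setup I'm wary of, especially once LFS locks enter the picture.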
From an overhead perspective git just falls over at ~1GB (hence git LFS, which I've seen teams use with varying degrees of success based on project size). The need to do shallow history and sidestep resolving deltas is a ton of complexity that isn't buying you anything.
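The usual mitigation dance with plain git looks something like this (file patterns and URL are just examples):

    git clone --depth 50 https://git.example.com/game/mainline.git   # shallow history on clients
    git lfs install                    # set up the LFS filters in this clone
    git lfs track "*.psd" "*.fbx"      # route big binaries through LFS instead of the pack files
    git add .gitattributes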
With a lot of assets, merging just doesn't exist, and a DVCS totally falls over here. I've seen fights nearly break out in the hallway multiple times when two artists/animators both forgot to check out a file (usually because someone missed the metadata saying it's an exclusive-access file). With unmergeable binary files that don't get locked, your choice is who gets to drop 1-3 days of work on the floor when the other person blows away their changes to commit. If those changes span multiple interconnected packages/formats/etc., you have a hard fork that you can never bring back together.
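When teams do try to handle this with git LFS, the piece that usually gets missed is marking the formats lockable so files come down read-only until someone explicitly takes the lock (the extension is just an example):

    git lfs track "*.uasset" --lockable
    # appends to .gitattributes:
    # *.uasset filter=lfs diff=lfs merge=lfs -text lockable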
There are a couple of other details, but those are the large ones. Perforce worked incredibly well in this space, but it is not cheap, so I've seen teams try to go their own way with mixed success. I'll admit that you can't do a monorepo in P4 (and even tools like repo in the Android world have their problems), but if you segregate your large business/product lines across P4 repos it scales surprisingly well.
Anyway, you may or may not hit any or all of this, but I've yet to see git tackle a 1TB+ repo history well (and tools like repo that stitch together many mini-repos don't count in my book, due to the lack of atomicity across submissions that span multiple repos).
In my case it’s different, since Git isn’t accessed by users directly; rather, I’m working on some tools that run on top of Git (on the user’s machine). Data is primarily text-based, though binary assets sometimes come up (options for offloading them out of Git are being investigated).
So far there have been no major issues. I predict degradation over time as repos grow in size and history (Git is not unique in this regard, but it’ll probably be more rapid and easier to observe with Git), so we might start using partial cloning.
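For concreteness, partial cloning in Git looks roughly like this, assuming the server supports it (the URL is a placeholder):

    # blobless clone: full commit/tree history, file contents fetched on demand
    git clone --filter=blob:none https://git.example.com/our/repo.git
    # or only skip blobs above a size threshold
    git clone --filter=blob:limit=1m https://git.example.com/our/repo.git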
(I stand by the idea that using straight up Git for data is something to consider, but with an amendment that it’s predominantly text data, not binary assets.)
Yeah, my experience has been that you start seeing issues with long delta-decompression times around the 1-2GB mark. That climbs quicker if you have binary formats that push the delta compression algorithm into cases where it does poorly (which makes sense, since it was optimized for source code).
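There are a couple of knobs that soften this a bit if you're stuck with those formats in plain git (the extensions and threshold are just examples):

    # .gitattributes: don't even attempt delta compression for these formats
    *.png -delta
    *.pak -delta

    # and/or store anything over this size without deltification
    git config core.bigFileThreshold 64m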
If you have binary assets and they don't support merging or regeneration from source artifacts, that mandates locking (ideally built into the SCM, but I've seen wiki pages used in a pinch at small scale).
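With git LFS the flow is roughly this, assuming the server implements the file-locking API (the path is made up):

    git lfs lock art/characters/hero.fbx      # take the exclusive lock before editing
    # ...edit, commit, push...
    git lfs unlock art/characters/hero.fbx    # release it
    git lfs locks                             # see who currently holds what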