Sure, keep in mind that my data is a little old, but last time I peeked into the git LFS space it seemed like there were still a few gaps.
First, most of my background in this area comes from gamedev, so YMMV as to whether the same applies in your use cases.
For our usage we'd usually have a repo history size that crossed the 1TB mark, and even upwards of 2-3TB in some cases. The developer sync was 150-200GB, the art sync was closer to 500-600GB, and the teams were regularly churning through 50-100GB/week depending on where we were in production.
You need discipline-specific views into the repo. It just speeds everything up and means that only the teams that need to take the pain have to. From a performance perspective, Perforce blows the pants off anything else I've seen; SVN tries, but P4 was easily an order of magnitude faster to sync or do a clean fetch.
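For a sense of what I mean by views, a P4 client spec for a programmer workspace can map in code/config and exclude the art tree entirely (depot paths and client name here are made up for illustration):

    Client: dev-workspace
    View:
        //depot/project/code/...    //dev-workspace/code/...
        //depot/project/config/...  //dev-workspace/config/...
        -//depot/project/art/...    //dev-workspace/art/...

That way a programmer never syncs a byte of art, and the art team only carries its own weight.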
I've seen proxy servers done with git, but it's usually some really hacky thing scripted together with a ton of duct tape and client-specific host overrides. When you have a team split across East Coast/West Coast (or another country) you need that proxy so that history is cached locally and only gets pulled across the WAN once. Having a split push/pull model is asking for trouble, and last I checked it wasn't clear to me whether stuff like git LFS actually handles locking cleanly across it.
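The least-hacky version I've seen still boils down to a read-through cache box per site plus URL rewriting on every client, roughly like this (host names are made up):

    # remotes point at the local cache box for fetches...
    git remote set-url origin https://git-cache.west.example.com/game/mainline.git
    # ...while pushes get rewritten back to the real origin
    git config --global url."https://git.example.com/".pushInsteadOf "https://git-cache.west.example.com/"

Which is exactly the split push/pull setup I'm wary of, especially once LFS locks enter the picture.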
From an overhead perspective git just falls over at ~1GB (hence git LFS, which I've seen teams use with varying degrees of success based on project size). The need to do shallow history and sidestep resolving deltas is a ton of complexity that isn't buying you anything.
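The usual mitigation dance with plain git looks something like this (file patterns and URL are just examples):

    git clone --depth 50 https://git.example.com/game/mainline.git   # shallow history on clients
    git lfs install                    # set up the LFS filters in this clone
    git lfs track "*.psd" "*.fbx"      # route big binaries through LFS instead of the pack files
    git add .gitattributes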
With a lot of assets, merging just doesn't exist, and a DVCS totally falls over here. I've seen fights nearly break out in the hallway multiple times when two artists/animators both forgot to check out a file (usually because someone missed the metadata saying it's an exclusive-access file). With unmergeable binary files that don't get locked, your choice is who gets to drop 1-3 days of work on the floor when the other person blows away their changes to commit. If those changes span multiple interconnected packages/formats/etc., you have a hard fork that you can never bring back together.
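When teams do try to handle this with git LFS, the piece that usually gets missed is marking the formats lockable so files come down read-only until someone explicitly takes the lock (the extension is just an example):

    git lfs track "*.uasset" --lockable
    # appends to .gitattributes:
    # *.uasset filter=lfs diff=lfs merge=lfs -text lockable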
There are a couple of other details, but those are the large ones. Perforce worked incredibly well in this space, but it is not cheap, so I've seen teams try to go their own way with mixed success. I'll admit that you can't do a monorepo in P4 (and even tools like repo in the Android world have their problems), but if you segregate your large business/product lines across P4 repos it scales surprisingly well.
Anyway, you may or may not hit any or all of this, but I've yet to see git tackle a 1TB+ repo history well (and tools like repo that stitch together many mini-repos don't count in my book, due to the lack of atomicity across submissions that span multiple repos).
In my case it’s different, since Git isn’t accessed by users directly; rather, I’m working on some tools that run on top of Git (on the user’s machine). Data is primarily text-based, though binary assets sometimes come up (options for offloading them out of Git are being investigated).
So far there have been no major issues. I predict degradation over time as repos grow in size and history (Git is not unique in this regard, but it’ll probably be more rapid and easier to observe with Git), so we might start using partial cloning.
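For concreteness, partial cloning in Git looks roughly like this, assuming the server supports it (the URL is a placeholder):

    # blobless clone: full commit/tree history, file contents fetched on demand
    git clone --filter=blob:none https://git.example.com/our/repo.git
    # or only skip blobs above a size threshold
    git clone --filter=blob:limit=1m https://git.example.com/our/repo.git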
(I stand by the idea that using straight up Git for data is something to consider, but with an amendment that it’s predominantly text data, not binary assets.)
Yeah, my experience has been that you start seeing issues with long delta-decompression times around the 1-2GB mark. That climbs quicker if you have binary formats that push the delta compression algorithm into cases where it does poorly (which makes sense, since it was optimized for source code).
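There are a couple of knobs that soften this a bit if you're stuck with those formats in plain git (the extensions and threshold are just examples):

    # .gitattributes: don't even attempt delta compression for these formats
    *.png -delta
    *.pak -delta

    # and/or store anything over this size without deltification
    git config core.bigFileThreshold 64m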
If you have binary assets and they don't support merging or regeneration from source artifacts, that mandates locking (ideally built into the SCM, but I've seen wiki pages used in a pinch at small scale).
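With git LFS the flow is roughly this, assuming the server implements the file-locking API (the path is made up):

    git lfs lock art/characters/hero.fbx      # take the exclusive lock before editing
    # ...edit, commit, push...
    git lfs unlock art/characters/hero.fbx    # release it
    git lfs locks                             # see who currently holds what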