Git supports large files; it just can't track changes in binary files efficiently, so if they're large you check in a whole new blob with every modification.

If they're just sitting around it's fine, but then why would you have them in version control?




It tracks the changes fine. It’s just that it doesn’t make sense to track changes in a binary.


It does make sense, and there are forms of delta compression particularly suited to various binary formats, especially when combined with an unpacker for compressed container formats. However, git does not have an efficient binary diff implemented yet.

LRzip happens to have such a preprocessor; it would make for exceedingly efficient binary history, at the cost of the result being more like a git pack file than a series of incremental versions.
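To illustrate the idea (just the preprocessor, not git integration): lrzip's long-range stage finds redundancy across the whole input, so if you tar up several versions of the same binary, the later copies nearly vanish. File names below are placeholders:

    # two versions of the same large binary, mostly identical
    tar cf versions.tar firmware-v1.bin firmware-v2.bin
    lrzip versions.tar        # emits versions.tar.lrz
    ls -l versions.tar.lrz    # roughly one version plus the delta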

Then again, GitHub in particular sets a very low per-file size limit (100 MB) on anything you check in.


In embedded, almost everybody uses efficient binary delta diffing and patching for DFOTA (delta firmware-over-the-air updates). Jojodiff exists in GPL and MIT variants:

http://jojodiff.sourceforge.net/

https://github.com/janjongboom/janpatch

Rsync is also very popular, even if not that efficient. xdelta, bsdiff, BDelta, and bdiff are all crap.
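The typical DFOTA flow, sketched with Jojodiff's command-line tools (jdiff and jptch, going by the project page; file names are placeholders): compute the delta on the build server, apply it on the device:

    # on the build server: binary delta between two firmware images
    jdiff firmware-v1.bin firmware-v2.bin update.patch

    # on the device (or via janpatch in C): reconstruct v2 from v1 + patch
    jptch firmware-v1.bin update.patch firmware-v2.bin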


So I decided to check this out. I used dd if=/dev/random to create a 100 MB file and checked that in, then used dd again to modify 10 MB of that file and checked that in; the result was two 98 MB objects.

Tracking changes to binaries makes a lot of sense if you use that to store only the incremental changes to the file. Git stores each modification of a binary file as a separate blob, since it doesn't know how to track its changes.

This is mitigated in large part by the delta compression applied by git-gc: after packing, the objects went from 196 MB to 108 MB.
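If you want to reproduce this, something like the following should do it (using /dev/urandom so it doesn't block; exact sizes will vary):

    dd if=/dev/urandom of=big.bin bs=1M count=100   # 100 MB of random data
    git add big.bin && git commit -m 'v1'
    # overwrite the first 10 MB in place
    dd if=/dev/urandom of=big.bin bs=1M count=10 conv=notrunc
    git add big.bin && git commit -m 'v2'
    git count-objects -vH   # two loose blobs, ~100 MB each
    git gc                  # repack; delta compression finds the shared 90 MB
    git count-objects -vH   # one pack, roughly one version plus the delta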


Git LFS has the advantage of not pulling all versions of a large file, too. Instead, it only pulls the version it's checking out.

In our project it helped dramatically, as you only pull X MB instead of X * Y MB when CI or a developer clones the (already big) repo.
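For reference, moving a file type to LFS is just this (the pattern and path are examples):

    git lfs install              # once per machine
    git lfs track '*.psd'        # writes a rule to .gitattributes
    git add .gitattributes assets/hero.psd
    git commit -m 'Track PSDs with LFS'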


This is true. Git-LFS can dramatically increase the size of the repository on-disk (e.g. in our GitLab cluster), but dramatically decrease the size of the clone a user must perform to get to work.

Note that this can now be accomplished with Git directly, by using --filter=blob:none when you clone; this causes Git to basically lazy-load blobs (i.e. file contents), only downloading them from the server when necessary (e.g. when checking out, when doing a diff, etc.).
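A blobless clone looks like this (URL is a placeholder); there's also --filter=blob:limit=<size> if you only want to skip the big ones:

    # fetch commits and trees, lazily fetch file contents on demand
    git clone --filter=blob:none https://example.com/big-repo.git

    # or: fetch everything except blobs over 1 MB
    git clone --filter=blob:limit=1m https://example.com/big-repo.git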


I prefer keeping large files out of source control and thus far I've not encountered a problem where their introduction has been required.


While I share the sentiment of keeping large files out of source control, one use-case I believe warrants having large files in source control is game development.


That's about the only conceivable niche I can think of, and even then I'm skeptical about turning the VCS into an asset manager.

You can't diff them, and I'm not convinced the VCS should carry the burden of version-controlling assets. It seems better to have a separate dedicated system for such purposes.

Then again, I don't do game development, so I'm not familiar with the requirements of such projects.



