Git supports large files; it just can't track changes in binary files efficiently, so if they're large you check in a whole new blob with every modification.

If they're just sitting around it's fine, but then why would you have them in version control?




It tracks the changes fine. It’s just that it doesn’t make sense to track changes in a binary.


It does make sense, and there are forms of delta compression particularly suited to various binary formats, especially when combined with an unpacker for compressed container formats. However, git does not have an efficient binary diff implemented yet.

LRzip happens to have such a preprocessor; it would make for exceedingly efficient binary history, at the cost of the result being more like a git pack file than a series of incremental versions.
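To illustrate the idea (just the preprocessor, not git integration): lrzip's long-range stage finds redundancy across the whole input, so if you tar up several versions of the same binary, the later copies nearly vanish. File names below are placeholders:

    # two versions of the same large binary, mostly identical
    tar cf versions.tar firmware-v1.bin firmware-v2.bin
    lrzip versions.tar        # emits versions.tar.lrz
    ls -l versions.tar.lrz    # roughly one version plus the delta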

Then again, GitHub in particular sets a very low per-file size limit (100 MB) on anything you check in.


In embedded, almost everybody uses efficient binary delta diffing and patching for DFOTA (delta firmware-over-the-air updates). Jojodiff exists in GPL and MIT variants:

http://jojodiff.sourceforge.net/

https://github.com/janjongboom/janpatch

Rsync is also very popular, even if not that efficient. xdelta, bsdiff, BDelta, and bdiff are all crap.
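The typical DFOTA flow, sketched with Jojodiff's command-line tools (jdiff and jptch, going by the project page; file names are placeholders): compute the delta on the build server, apply it on the device:

    # on the build server: binary delta between two firmware images
    jdiff firmware-v1.bin firmware-v2.bin update.patch

    # on the device (or via janpatch in C): reconstruct v2 from v1 + patch
    jptch firmware-v1.bin update.patch firmware-v2.bin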


So I decided to check this out. I used dd if=/dev/random to create a 100 MB file and checked that in, then used dd again to modify 10 MB of that file and checked that in; the result was two 98 MB objects.

Tracking changes to binaries makes a lot of sense if you use that to store only the incremental changes to the file. Git stores each modification of a binary file as a separate blob, since it doesn't know how to track its changes.

This is mitigated in large part by the delta compression applied by git-gc: after packing, the objects went from 196 MB to 108 MB.
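If you want to reproduce this, something like the following should do it (using /dev/urandom so it doesn't block; exact sizes will vary):

    dd if=/dev/urandom of=big.bin bs=1M count=100   # 100 MB of random data
    git add big.bin && git commit -m 'v1'
    # overwrite the first 10 MB in place
    dd if=/dev/urandom of=big.bin bs=1M count=10 conv=notrunc
    git add big.bin && git commit -m 'v2'
    git count-objects -vH   # two loose blobs, ~100 MB each
    git gc                  # repack; delta compression finds the shared 90 MB
    git count-objects -vH   # one pack, roughly one version plus the delta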


Git LFS has the advantage of not pulling all versions of a large file, too. Instead, it only pulls the version it's checking out.

In our project it helped dramatically, as you only pull X MB instead of X * Y MB when CI or a developer clones the (already big) repo.
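For reference, moving a file type to LFS is just this (the pattern and path are examples):

    git lfs install              # once per machine
    git lfs track '*.psd'        # writes a rule to .gitattributes
    git add .gitattributes assets/hero.psd
    git commit -m 'Track PSDs with LFS'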


This is true. Git-LFS can dramatically increase the size of the repository on-disk (e.g. in our GitLab cluster), but dramatically decrease the size of the clone a user must perform to get to work.

Note that this can now be accomplished with Git directly, by using --filter=blob:none when you clone; this causes Git to basically lazy-load blobs (i.e. file contents), only downloading them from the server when necessary (e.g. when checking out, when doing a diff, etc.).
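A blobless clone looks like this (URL is a placeholder); there's also --filter=blob:limit=<size> if you only want to skip the big ones:

    # fetch commits and trees, lazily fetch file contents on demand
    git clone --filter=blob:none https://example.com/big-repo.git

    # or: fetch everything except blobs over 1 MB
    git clone --filter=blob:limit=1m https://example.com/big-repo.git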


I prefer keeping large files out of source control and thus far I've not encountered a problem where their introduction has been required.


While I share the sentiment of keeping large files out of source control, one use-case I believe warrants having large files in source control is game development.


That's about the only conceivable niche I can think of, and even then I'm skeptical about turning the VCS into an asset manager.

You can't diff them, and I'm not convinced the VCS should carry the burden of version-controlling assets. It seems better to have a separate dedicated system for such purposes.

Then again, I don't do game development, so I'm not familiar with the requirements of such projects.



