
facebook's solution allows you to work better with shallow copies. it doesn't magically solve the size issue. it's much faster because it doesn't do the real work. [1]

also, i distinctly remember that the linux kernel rewinds history every so often (but maybe my memory is just playing tricks on me).

git is not a backup tool. i always scold people for checking large binary files into their text file revision systems.

take a look at ori [2] for a git-like backup solution. git annex is also an interesting solution [3], although its author says specifically not to confuse it with a dropbox clone.

now to get back to your first point. as we established, facebook's speedup comes from a difference in how it deals with shallow-cloned repositories. it seems to me that git is making improvements in that regard; see the following:

> * Fetching from a shallowly-cloned repository used to be forbidden, primarily because the codepaths involved were not carefully vetted and we did not bother supporting such usage. This release attempts to allow object transfer out of a shallowly-cloned repository in a more controlled way (i.e. the receiver become a shallow repository with a truncated history).

[1] https://bitbucket.org/facebook/remotefilelog

[2] http://ori.scs.stanford.edu/

[3] http://git-annex.branchable.com/

EDIT: bup looks nice though, thanks for the info




You really should "scold" the folks at LibreOffice. They have a tool that relies on checking large binary files into git, and it is actually a neat development tool:

"bibisect stands for "binary bisect" and is intended to help LibreOffice QA dealing with regressions. Regressions are a most annoying artifact that unfortunately comes with software development and QA. However, regressions are a misfeature we want to deal with quick and early as they might get harder and harder to triage and fix as time passes.

Because the way git stores its stuff, one bibisect can contain several complete Linux 64-bit office installs in a very much compressed size.

And one does not need to install them in parallel as one can switch through all of them with a quick "git checkout source-hash-XXXXXX" -- one switch costs <1 second)."

https://wiki.documentfoundation.org/QA/HowToBibisect

http://cgit.freedesktop.org/libreoffice/contrib/dev-tools/
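For anyone unfamiliar with the "binary bisect" idea in the quote, here is a minimal sketch of the search itself. It assumes the builds are ordered oldest to newest and that once a build shows the regression, every later build does too. The build labels and the `is_bad` check are hypothetical stand-ins; with a real bibisect repository you would check out each candidate and test the office install by hand, and `git bisect` does this bookkeeping for you.

```python
from typing import Callable, Sequence


def first_bad_build(builds: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest build for which is_bad() is true.

    Assumptions (not checked): builds are ordered oldest to newest,
    the newest build is bad, and badness never disappears again.
    """
    if not builds:
        raise ValueError("need at least one build")
    lo, hi = 0, len(builds) - 1  # invariant: builds[hi] is assumed bad
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(builds[mid]):
            hi = mid        # regression was introduced at mid or earlier
        else:
            lo = mid + 1    # regression was introduced after mid
    return builds[lo]


# Hypothetical labels; a real bibisect repo uses commits/tags instead.
builds = [f"build-{i:03d}" for i in range(100)]
print(first_bad_build(builds, lambda b: b >= "build-042"))
```

With 100 builds this needs about 7 checks instead of up to 100, which is why a sub-second checkout per step makes the approach practical.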


yeah, well, that's entirely different from what we were talking about though. also, speed is obviously not their focus. they just chose a comfortable approach to binary searching for regressions in binary builds of libreoffice.

it's not a repo anyone will ever work with. some build bot builds it, and no one else ever commits things back. you just download the tar'd repo and never commit anything or push back. it's also clearly separated from any code repositories they have.

but maybe i misunderstood something so feel free to correct me.


Another thing that FB's solution did was to add inotify support. Git may get this feature soon -- there are already experimental patches on the mailing list for it.

http://thread.gmane.org/gmane.comp.version-control.git/24120...


I missed that part about fetching from shallow clones. Thanks for pointing that out.

I'll have to check out ori as well.



