git-annex is an interesting alternative the HTTP-first nature of Git LFS and the...

riedel · on May 13, 2021

dispite loving the idea of git annex and having tried it multiple time in my workflows, it was really to complex to wrap my head around for simple use cases. Also i never got it to run on cygwin which is essential to me because it heavily uses symlinks (havent checked if the Windows version finally supports native symlinks). The examples are all about nontech creative ppl but i never managed to explain it to anyone who wanted to check in just large amounts of graphics to a bit repo...

krupan · on May 13, 2021

I don't know about the windows problem, but git annex can be configured to be pretty simple to use. See:

https://bryan-murdock.blogspot.com/2020/03/git-annex-is-grea...

theamk · on May 13, 2021

Last time we used git-annex was a few years ago, and it was too decentralized: the "sync" command that we used to download the remote content would also upload the information about current state.

This means there are were no read-only operations: you just want some files.. and that throwaway clone and CI machine would get recorded into the global repo state. If you are not careful, and will be propagated forever and would appear in the various reports.

rakoo · on May 13, 2021

That's the whole premise of git-annex: not distributing content but distributing what machine has the content. If you just want to get the content you have to hack git-annex, probably by reading the manifest, to get the url and download content in a third party process

cesarb · on May 13, 2021

With git-annex you have the same "one-way door" behavior: it replaces large files with a pointer to the content (in git-annex, it's a relative symbolic link which by default encodes the real file's size and hash), which is stored in git-annex's own database.

dwohnitmok · on May 13, 2021

Sort of. The way that the author talks about Mercurial as not having this problem makes me think they're talking about something related but subtly different. In particular, AFAICT, Mercurial requires the exact same thing as what you're pointing out. If you want to completely disable use of largefiles then you still have to run `hg lfconvert` at some point. That also changes your revision history.

The "one-way door" as I understand the article to be describing is talking about the additional layer of centralization that Git LFS brings. In particular it's pretty annoying to have to always spin up a full HTTPS server just to be able to have access to your files. There is now always a source of truth that is inconvenient to work around when you might still have the files lying around on a bunch of different hard drives or USB drives.

Whereas with git-annex, it is true that without rewriting history, even if you disable git-annex moving forward, you'll still have symlinks in your git history. However, as long as you still have your exact binary files sitting around somewhere, you can always import them back on the fly, so e.g. to move away from git-annex you can just commit the binary files directly to your git directory and then just copy them out to a separate folder whenever you go back to an old commit and re-import them.

But perhaps I'm interpreting the author incorrectly, in which case it's hard for me to see how any solution for large files in git would allow you to move back without rewriting history to an ordinary git repository without large file support.

petertodd · on May 13, 2021

> so e.g. to move away from git-annex you can just commit the binary files directly to your git directory and then just copy them out to a separate folder whenever you go back to an old commit and re-import them.

Exactly. Here's an (anonymized) example of a git-annex symlink from one of my repos:

    ../../.git/annex/objects/AA/BB/SHA256-s123456--abcdf...1234/SHA256-s8968192--abcdf...1234

It's just a link to a file with a SHA256 hash in the name and path. The simplest way to reconstruct that in the future is to just check-in the whole `objects` directory into the repo, and copy/symlink it back to `.git/annex` when needed. You definitely don't need the git-annex software itself to view the data in the future.

I personally have hundreds of gigabytes of data in git-annex repos. It works great!

neandrake · on May 13, 2021

I don’t think it’s clear but mercurial has two solutions for large file support. The original “largefiles” which had all the same designs and issues as Git LFS they bring up in the blog post, and “lfs” which is newer.

I’ve used largefiles and ran into these issues and ended up having to turn it off after a few years because it’s so problematic with the tooling since it modifies the underlying mercurial commit structure like git lfs.

However it sounds like mercurial lfs is different in that it only modifies the transport layer, though I’m not totally clear on the details and have been meaning to look into it further.

dwohnitmok · on May 13, 2021

To preface: though I've read a fair amount about Mercurial, I can count on my fingers the number of times I've actually used a Mercurial repo and I've used largefiles only ever as a toy, so I am very much a Mercurial newbie. So there is a chance I may get something wrong here.

However, my impression is that in fact largefiles is basically the only game in town and Mercurial LFS if anything is meant to be even more like Git LFS to the point of being compatible with it.

The thing I'm more curious about is I don't immediately see how large file support in git (or mercurial), whether implemented as a separate tool or natively, could ever feasibly be "transparently erasable," that is rewindable back to be absolutely identical to a repository with no large files support without rewriting revision history.

It doesn't seem impossible (e.g. maybe you could somehow maintain a duplicate shadow revision history and transparently intercept syscalls?), but the approaches I can think of all have pretty hefty downsides and feel even more like hacks than the current crop of tools.

remram · on May 13, 2021

That content can easily be moved in bulk though. It is true that you have to use git-annex command to do so, but this is different from LFS where the complete set of historical files is only stored on the server and can't be moved at all.

edit: The article claims it's a "one-way door" because you can't move to an altogether different system without rewriting history, which is true of git-annex. My bad.