Rework git core for native submodules

niggler · on April 14, 2013

"This is going nowhere. You're stuck at making the current submodule system work, not answering my questions, diverting conversation, repeatedly asking the same stupid questions, labelling everything that I say "subjective", and refusing to look at the objective counterpart (aka, the code). It's clear to me that no matter how many more emails I write, you're not going to concede.

I'm not interested in wasting any more of my time with this nonsense.

I give up."

http://thread.gmane.org/gmane.comp.version-control.git/22051...

artagnon · on April 15, 2013

To clarify: "I give up" was referring to giving up on the argument in that subthread, not on the idea in general. It requires a lot of hard work and perseverance to get something this disruptive merged; I'm merely taking a break to do more groundwork before coming back with a v2 of the approach.

I have utmost respect for Junio, Linus and the others, but realize that they have some negative attributes like all human beings do. Junio can be especially defensive when it comes to something new, although it's not completely without reason. After all, we do have an ultra-stable and well-maintained piece of software because of him.

Tobu · on April 14, 2013

Hah, I thought you quoted a maintainer, but this is from the original submitter. Cheeky.

tinco · on April 14, 2013

It's not cheeky, it's desperate. He proposes an honest and well thought through idea that he spent a lot of time on, and someone he looks up to just behaves like a complete ass. Junio et al. do nothing but think against him, instead of with him. They'd do better just not responding at all.

jedbrown · on April 14, 2013

I followed the discussion and read it exactly opposite. Ram was putting the cart in front of the horse ("I really need you to start reviewing the code now." see replies [1,2]) and everyone else involved in the discussion wanted to understand the benefits first. Junio was never dismissive of the idea, he just requested a coherent argument of the benefits so that real issues could be discussed. It is understood that submodules are not a smooth workflow in many cases, but Ram's proposed change would be very disruptive and most stated "benefits" of his design are red herrings.

[1] http://permalink.gmane.org/gmane.comp.version-control.git/22... [2] http://permalink.gmane.org/gmane.comp.version-control.git/22...

kelnos · on April 15, 2013

I agree with you on the tone of the conversation, but from my -- admittedly biased -- view, submodules are an abomination, and any serious proposal to come up with an alternative should be welcomed with open arms.

Ramkumar may have taken the questions directed at him the wrong way, but IMO the questioner shares equal fault for that. Know (or learn) your audience, and tailor your responses so you achieve a good outcome. "I'm super frustrated and feel like I've wasted my time so I give up" is not a good outcome, for any of the parties concerned.

drtse4 · on April 14, 2013

If he can't defend his idea among project maintainers it's not worth implementing imho. While the first implementation could have been made in a rush, if this needs to be fixed let's give it some thought this time. While they were clearly not supportive, also clearly this guy is not the right one for this job.

artagnon · on April 15, 2013

Who's who, for those of you just joining in:

- Linus is the original author of Git, and he wrote it in April 2005. He doesn't contribute anymore, and is rarely seen on the Git mailing list these days (except when something like this happens). In number of patches, he's #4, after Junio, Jeff, and Shawn.

- Junio is the maintainer of the Git project. He took over maintainership of Git a few months after it was originally built, in July 2005.

- Jonathan is a very big contributor at #6. He doesn't focus on any one part of the codebase, and contributes to a wide spectrum.

- Jens primarily contributes to submodule.c/git-submodule.sh, the current submodule implementation. Along with Heiko, he's one of the authorities on the current submodule system.

- Ram is a small contributor. He started out in Jan 2010 with two GSoC projects: one in 2010, and another in 2011 (neither was in submodules).

qznc · on April 14, 2013

He wants to unify submodules and subtrees? Sounds fishy to me, since these are for completely different use cases.

Submodules are for tying project parts together, where you have control over all of them. For example, the clang compiler frontend could submodule the LLVM backend. Both are under the LLVM project, so people usually work on both of them at the same time. They should not be in the same repo, since LLVM also has other users unrelated to clang.

Subtrees are for integrating external projects, which are not really under your control, but you probably want to follow upstream developments. Since a subtree includes all the repo data, you can cleanly check out, even if the external origin repository vanishes.

jedbrown · on April 14, 2013

This is backwards. Subtree imports all the data from the sub-project. (There is no way to clone without getting the subtrees because they are a native part of the repository.) You interact with subtree as if you had one project, committing without needing to know that the subtree has its own upstream. You can split out the subtree history and send it upstream. Splitting it out changes the SHA1. You can merge from upstream back into the subtree. Subtree makes the most sense when you have a component that is completely dominated by its parent, but which you want to also release stand-alone.

Submodules provide weaker coupling and make the most sense when the submodule has its own healthy upstream and you want to track those versions. It's awkward if all submodule development is happening from within the parent.
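To illustrate the remark above that splitting out a subtree changes the SHA1: git names every object by hashing a small header plus the object's content, and a commit's content includes the ID of its root tree. The Python sketch below is only an illustration under stated assumptions, not git's own code. The author line, the commit message, and the two stand-in tree IDs are made up for the example.

```python
import hashlib


def git_object_id(obj_type: str, body: bytes) -> str:
    """Return the ID git gives an object: SHA-1 over '<type> <size>' + NUL + body."""
    header = f"{obj_type} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def commit_body(tree_id: str, message: str) -> bytes:
    """Build a parentless commit body. The author line is a made-up placeholder."""
    who = "A U Thor <author@example.com> 1365897600 +0000"
    return (
        f"tree {tree_id}\n"
        f"author {who}\n"
        f"committer {who}\n"
        f"\n{message}\n"
    ).encode()


# Stand-in 40-hex tree IDs (hypothetical): the parent project's root tree, and
# the subtree's directory promoted to root after the history is split out.
parent_root_tree = hashlib.sha1(b"parent project root tree").hexdigest()
split_root_tree = hashlib.sha1(b"subtree directory as root").hexdigest()

in_parent = git_object_id("commit", commit_body(parent_root_tree, "Fix typo"))
after_split = git_object_id("commit", commit_body(split_root_tree, "Fix typo"))

# Same author, date, and message, but a different root tree gives a new commit ID.
print(in_parent)
print(after_split)
```

The same reasoning applies to every later commit in the split history, since each commit also records its parent's ID.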

qznc · on April 15, 2013

I think we agree about the functionality, but maybe not quite about when each is appropriate. The subtree/submodule discussion is somewhat similar to merge vs rebase. There is a lot of personal/project-specific opinion in there.

scribu · on April 14, 2013

`git subtree` seems like the perfect tool to complement `git submodule`.

Too bad it's not enabled by default: http://engineeredweb.com/blog/how-to-install-git-subtree/

jedbrown · on April 14, 2013

Subtree only needs to be installed by the maintainer that interacts with the subtree's upstream. Everyone else just makes normal commits in the parent repo. They don't even need to know that the subtree has its own upstream (but they likely write better commit messages if they know).

dustingetz · on April 14, 2013

parent comment is out of date; git subtree is part of git since roughly git 1.8.

Tobu · on April 15, 2013

It's part of git/contrib. Depending on the packaging you still need to enable it manually.

jrochkind1 · on April 15, 2013

I don't think anyone would think external dependencies you have control over are "completely different use cases" from ones you don't have control over, if it weren't for having already adapted to the fact that they have to think of them as very different things with git's current toolset.

In fact, many people getting started with git get confused about whether subtree or submodule is appropriate, and end up wanting parts of both.

Tobu · on April 14, 2013

I've started the thread on Linus's first reply, and the guy is completely unconvincing. He was after a quick feature improvement (I don't really know what but Linus seemed to) and implemented it, but he gave little thought to the overall design (either his or Git's).

Big meh, and I'm normally interested in the evolution of Git.

tinco · on April 14, 2013

I don't know man, anyone who uses gitmodules knows that they are a pain and really unlike anything else in git. This guy had an idea on how to improve it, introduce a cool new basic object type to git and he even wrote a PoC, I'd say hats off to that.

Linus makes a rather unconvincing argument against the system, saying the current system allows for submodules to be different for local sites. As if the proposed system would not support that, and as if the current 'dirty submodule' system is a better solution. He's being an absolute moron.

And Junio is just being very unproductive, he seems fully incapable of inducing anything from the design Ramkumar proposes and fails to see implications that anyone could see, even though he is a core git guy. And frankly he's being an ass too.

What I see is someone enthusiastically trying to fix a core problem of git in an ambitious but well constructed way, and a bunch of old guys just bashing the life out of him.

I think he's better off just not asking Junio or Linus for advice and just keep on hacking on his fork. I know I would use it.

snprbob86 · on April 14, 2013

> just keep on hacking on his fork. I know I would use it.

Correct me if I'm wrong, but isn't the problem that Linus brought up this: If you introduce a new object type, you need to get it right. A new object type would create non-backwards-compatible repositories, so you'd have a new minimum Git version. If you were to use this fork, then everyone who checks out your code would have to use it. Also, it would preclude tooling support (eg GitHub). Once such important repository versioning decisions are made, they can't be unmade. Git, at its core, is basically just a well designed repository model.
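A rough sketch of why a new object type implies a new minimum Git version: a loose object is stored as zlib-compressed data whose header names the object's type, and a reader built before the new type existed has no way to interpret it. The Python below is a simplified model, not git's implementation. The "link" type name and its payload are assumptions made up for the illustration, and the reader's behavior is reduced to a single check.

```python
import zlib
from typing import Tuple

# Object types understood by released versions of git.
KNOWN_TYPES = {"blob", "tree", "commit", "tag"}


def encode_loose_object(obj_type: str, body: bytes) -> bytes:
    """Serialize like a loose object: zlib over '<type> <size>' + NUL + body."""
    header = f"{obj_type} {len(body)}".encode() + b"\x00"
    return zlib.compress(header + body)


def read_loose_object(raw: bytes) -> Tuple[str, bytes]:
    """Decode a loose object, rejecting types this reader was not built to know."""
    data = zlib.decompress(raw)
    header, _, body = data.partition(b"\x00")
    obj_type, size = header.decode().split(" ")
    if obj_type not in KNOWN_TYPES:
        raise ValueError(f"unknown object type: {obj_type}")
    if int(size) != len(body):
        raise ValueError("object size mismatch")
    return obj_type, body


# An existing type round-trips through the "old" reader.
print(read_loose_object(encode_loose_object("blob", b"hello\n")))

# A hypothetical new type is rejected by the same reader.
try:
    read_loose_object(encode_loose_object("link", b"hypothetical link payload\n"))
except ValueError as err:
    print(err)
```

Anyone cloning a repository that contains such objects would need a client whose set of known types includes the new one, which is the compatibility cost being weighed in the thread.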

tinco · on April 14, 2013

Yes, precisely :)

Tobu · on April 14, 2013

He's not the only one working on this. But he doesn't have the skills to defend his ideas (it might be just communication skills, it might not). As it is he won't be able to make the big revolutionary step the patch was promising.

If he had been making his own VCS he wouldn't need this kind of review, but Git is an agreed-upon format and protocol; it is absolutely necessary to start by considering the downsides when core changes will affect a large user base.

kelnos · on April 15, 2013

> Linus makes a rather unconvincing argument against the system, saying the current system allows for submodules to be different for local sites. As if the proposed system would not support that, and as if the current 'dirty submodule' system is a better solution.

Yes! And in the next breath, Linus flat-out admits he doesn't know if anyone uses/keeps a dirty local .gitmodules file. A great example of being out of touch with your users and still thinking you know best. Arguing to keep a (IMO minor & weird) feature around without knowing if anyone even uses it is folly.

> I think he's better off just not asking Junio or Linus for advice and just keep on hacking on his fork. I know I would use it.

If only it were that easy. His changes would create a git implementation incompatible with everyone else's.

kzrdude · on April 14, 2013

Linus' first reply just laid bare that the object format matters much more than the implementation around it. That's how git got as far as it did until now, by using a sound data model.

kelnos · on April 15, 2013

Then you missed his earlier emails that explicitly acknowledge that the code he wrote is not at all intended to be mergeable and is just a proof-of-concept.

He also, in the thread, explicitly acknowledges that he's not sure about the best design and asks for help.

alexchamberlain · on April 14, 2013

I would like to applaud this guy; he has got insightful and polite answers from Linus.

k3n · on April 14, 2013

I noticed that too; after getting my popcorn ready, I could find only mild technical disagreements.

I'd be honored to have an idea shot down so mercifully by Torvalds.

alexchamberlain · on April 14, 2013

As would I...

akkartik · on April 15, 2013

I think both of you are being unfair. Find me an example when some newcomer submits a patch and gets flamed by Linus. His flames tend to steer clear of actual code (at least at the start), and of outsiders.

drewcrawford · on April 14, 2013

I am not a git maintainer, but as someone interested in improving submodules I can try to summarize the thread.

Submodules are difficult to use in practice for a wide variety of reasons. There are serious, complex proposals that have made it into git-contrib to build a "better" submodule, but for various reasons these have produced systems that merely make the tradeoffs in a different way that some people prefer.

This is not like any of those proposals. His problem is that "git add", "git diff", etc., don't "understand" submodules. It would be as if ls, cd etc. didn't "follow" symlinks, so that you had to navigate to the correct directory yourself before you could use standard unix tools.

This is a serious problem, but his solution is essentially "we should use hardlinks instead of symlinks". That is, he wants to take the code that understands submodules out of the individual tools, and pop them in the filesystem somewhere where they are "shared" among more of the tools and don't have to exist in any of them.

There are many objections to this proposal. The chief one seems to be that this does not seem to directly address any particular problem. I think Ramkumar perceives the reason git add/diff/rm don't support submodules as a metaproblem: "it is too hard to add submodule support to an arbitrary tool". Whereas the git maintainers are saying "It is possible to add submodule support to an arbitrary tool." So that's the initial standoff.

Another problem is that this requires a filesystem change, and that is essentially the most stable part of git: changing it breaks compatibility with other versions. If you read Linus's rants, you know that he generally applies an enormous amount of scrutiny to breaking compatibility. And so from his desk, you would need not just one clear benefit, but an overwhelming number of them, to break the contract like this.

But what I suspect is the True Rejection here is that this will pan out like all the proposals before it: to be different, but not strictly better, than the current implementation. To return to the POSIX analogy: we have both symlinks and hardlinks, and which one is better depends on what you are doing, there is no "one true link". If you replace all the symlinks with hardlinks, I think you will run into trouble with the hardlinks too.

Finally, it is unfortunate that the flamewar is about the monolithic patch rather than about some of the principles that led to the patch. I think Ramkumar has had (at least) two very good insights: that "git add" and friends should understand submodules a lot better than they do, and also that they should have this understanding by way of consuming some API that understands them rather than incorporating separate code for submodules into every tool. These strike me as a concrete improvement over the existing system, and I wish that the energy that leads to huge unusable patches like this could be redirected into usable ones.

tinco · on April 14, 2013

> The chief one seems to be that this does not seem to directly address any particular problem.

Except that you later say:

> I think Ramkumar has had (at least) two very good insights: that "git add" and friends should understand submodules a lot better than they do, and also that they should have this understanding by way of consuming some API that understands them rather than incorporating separate code for submodules into every tool.

This is exactly the problem this solution solves. Instead of having a weird configuration file in the working tree for something that should be an integral part of the repository, there will be a generic system for adding links. With this generic system in place it is much easier to implement "git add" and friends support for submodules.

He repeatedly makes this clear but no one reacts to this point.

> But what I suspect is the True Rejection here is that this will pan out like all the proposals before it: to be different, but not strictly better, than the current implementation.

Implementing code in a different but not strictly better way that allows you to more easily understand and extend your library is called refactoring. This 'True Rejection' is essentially rejecting the merit of refactoring code.

I also don't think that the hardlinks/symlinks analogy holds very well. Hardlinks and symlinks are both features in their own rights. Having submodules be defined as a weird file instead of as a part of your repository's objects is a superficial change, he also states this. Everything the current submodules do could be achieved using the proposed solution. (As he repeatedly has to make clear to Linus and Junio)

drewcrawford · on April 14, 2013

There is a complicated set of problems that are preventing us from understanding each other. I am going to do my best.

> weird configuration file

One of the disputes here is that the maintainers are of the opinion that config files are actually good, on the face of them. They point to examples of well-settled uses like .gitignore to claim that config files are The Git Way.

It may very well be that configuration files are in fact weird, or are weird in this particular case, but since the convention is and has been for git's history that config-files-are-good it would require a well-reasoned essay to move the needle of discourse on this subject, not just to use "they are weird" as a claim to prove something else.

> This 'True Rejection' is essentially rejecting the merit of refactoring code.

I don't want to get into a big meta-meta flamewar here, but there are many people who do reject the merits of refactoring working code, for some definitions of "refactor", for some definitions of "working", and this has been the subject of many popular essays, most notably Spolsky et al. This is another place where moving the needle of discourse would require writing a well-reasoned essay that quotes the appropriate authorities, and it is not sufficient just to appeal to a particular view of the merits of refactoring as a claim to prove something else.

> Hardlinks and symlinks are both features in their own rights.. [this] is a superficial change.

This is another one of those thorny semantic problems that are preventing us from understanding each other. There is a sense in which it is superficial, and another sense in which it is a substantial change. If you are using "git add", or are implementing it, it is a superficial change. If you are writing subtree-merge or git-submodule or something that really needs to understand the storage of submodules, it is substantial.

And so they are both features in their own right, in the sense that: git-add-and-friends will want to access things with a certain pattern, and git-submodule-and-friends will want to access things in a very different pattern. This is why I suspect the solution here is to have two distinct APIs, that access the same underlying storage mechanism. And if it makes sense to continue to support something very much like the old API, it probably does not make sense to redesign the FS to look like the new API.

Of course, there is a lot of resistance in the git community to have two ways to do the same thing. So when I say "I suspect the solution is to have two APIs" I mean only that it would address most of the objections raised thus far, not that it would actually be implemented in mainline.

> Everything the current submodules do could be achieved using the proposed solution. (As he repeatedly has to make clear to Linus and Junio)

And as Linus and Junio have repeatedly made clear, merely doing everything the current implementation does is not within a few galaxies of meeting the burden for breaking FS compatibility. The compatibility-break burden is extremely high.

tinco · on April 14, 2013

> I am going to do my best.

Great :)

> One of the disputes here is that the maintainers are of the opinion that config files are actually good, on the face of them. They point to examples of well-settled uses like .gitignore to claim that config files are The Git Way.

Yes but .gitignore only configures your git client, the gitsubmodules say something about the repository instead. If that was the git way, wouldn't branch names be in a .gitbranches as well?

> I don't want to get into a big meta-meta flamewar here, but there are many people who do reject the merits of refactoring

I might be an extremist on this topic, so it's good to just leave it be.

> This is why I suspect the solution here is to have two distinct APIs, that access the same underlying storage mechanism.

I agree, but I think Ram. is correct in asserting that both ways could be achieved by having a link object with some configuration in it. (it could just be the .gitmodules file moved to the .git directory for all the end users care)

> The compatibility-break burden is extremely high.

I understand, and it should not be taken lightly. But no one was suggesting this feature would be added to the master and shipped in the next release of git. It could even be delayed until there is another compatibility breaking change. Ram. never pretended his current work would be the final way of doing it.

Thank you for elaborating your understanding of the discussion :)

drewcrawford · on April 14, 2013

This is one of the nicest disagreements I have ever had. If we don't already, we should compare notes and find something to work on together, because when two people can disagree but still understand each other, that is where you make progress on complex problems. :-)

> Yes but .gitignore only configures your git client, the gitsubmodules say something about the repository instead.

This feature is often used to configure the repository, and I in fact use it that way. By way of example, https://github.com/new operates under the assumption that you use .gitignore to configure a repository. Perhaps it is best to say that config files offer flexibility in this dimension, whereas a link file is more rigid.

> It could even be delayed until there is another compatibility breaking change.

I believe that perhaps the discussion on the point of backwards incompatibility has been framed in a way that is nonproductive. Of course, once one has decided on a course of action, it is proper to consider how to reduce the impact of that decision. I agree with you that there are a wide variety of harm reduction strategies available here.

But these inquiries only become relevant once one has decided that the patch is in general an improvement in some dimensions. As an outside observer, I do not see an improvement.

I can see the logic that if it is true that git-add-and-friends have omitted support for submodules on the basis that such support is difficult, this patch could solve that problem. But I have not been convinced of the premise; there is no citation of the people who maintain the UI tools making claims of difficulty. Furthermore, Junio seems to argue at least that add's behavior is by design. I do not know enough about it to know if that is a sensible design, but it does suggest to me that the problem with UI tooling is not a function of implementation difficulty, and that there is perhaps some design or ideological reason for the behavior of these tools that explains the state of them today.

The other problem that I have is as follows: if I accept the premise that the trouble with git-add is a matter of implementation difficulty, it seems to me that the trouble can be resolved at some other tool layer rather than in the FS proper. So if the hypothesis underlying the patch is correct, it seems to me that one should adopt the implementation that doesn't break compatibility over the implementation that does.

It is unfortunate that the matter of backwards compatibility was raised early and vociferously in the thread, because as you have pointed out there is a lot that can be done about backwards compatibility that doesn't address the real merits of whether the idea is good or bad. (Although I can understand why compatibility would be at the top of any maintainer's mind.) Perhaps this exchange between Junio and Ram. is an example of two people being far enough along their own lines of inquiry that they are having trouble making any sense of one another.

nthj · on April 15, 2013

> I don't want to get into a big meta-meta flamewar here, but there are many people who do reject the merits of refactoring working code, for some definitions of "refactor", for some definitions of "working", and this has been the subject of many popular essays, most notably Spolsky et al.

Spolsky wrote against rewriting your software from scratch [1], but I couldn't find anything against refactoring; the two are very different things.

[1] http://www.joelonsoftware.com/articles/fog0000000069.html

akkartik · on April 15, 2013

A 'refactoring' is a change that doesn't change behavior, so the word is a red herring in this context, shedding more heat than light. Redesigns can be valuable, but let's call a spade a spade.

drtse4 · on April 14, 2013

This thread is a mess... and I'm not sure statements like this one, "'git add' should not go past submodule boundaries. I should not be able to 'git add clayoven/' or 'git add clayoven/LICENSE'", are a good start. He gives a simplified description of what he wants to do without going too in-depth about why that path was chosen, and starts coding right away.

tinco · on April 14, 2013

Why would he need to go in-depth about why that path was chosen, isn't it obvious? The workflow he proposes is miles better than how gitmodules is working now.

drtse4 · on April 14, 2013

Why? Simply to discuss it and evaluate alternatives that could be better. I'm referring to the solution he proposed, not to the fact that git modules have a lot of room for improvement. Given that we are talking about git modules, "miles better" is not really that hard to achieve.

plorkyeran · on April 14, 2013

Mostly unrelated to the topic, but I'm always amused by things like "teach ce_compare_gitlink() about OBJ_LINK". I've never seen any other project that anthropomorphizes the code like that, and I sort of like how it makes the resulting changelog read.

davvid · on April 15, 2013

> I've never seen any other project that anthropomorphizes the code like that

Git's SubmittingPatches document says to use an imperative tone in commit messages. That's why it reads the way it does.

stormbrew · on April 15, 2013

So, I'm curious. In response to Linus' comment that "... .gitmodules was always a bit of a hack, but it's a working hack ...", does anyone who's actually used them really feel that they are indeed a 'working' hack? I find that whenever I interact with a git repo with submodules I spend an inordinate amount of time wrangling them to do things they clearly weren't meant to do. I find that most people I talk to about them have experienced the same. And then I go and do something like try to use bisect in concert with them and I basically want to shoot my computer.

Am I missing something?

richardwhiuk · on April 14, 2013

I'd really like something like this to happen, but I agree that this set of patches isn't likely to get included. Submodules are my biggest gripe with git usage, and what persuades me not to suggest people roll git out more widely. I've seen various strategies to avoid submodules (build scripts that clone sub repos instead is one example alternative) but it'd be much nicer if there was a One True Way which worked properly.

lnanek2 · on April 14, 2013

Sure would be nice. Sometimes I'm working on projects and get sent repos to work on with all the deps missing, because people just cloned the deps into subdirectories and git ignored them or something. Would be much better if they had a .git in every folder like Subversion does nowadays instead of trying to have a special root that includes and ignores certain children.

comex · on April 15, 2013

Just to comment on one of the issues in the thread: not everyone uses a command line editor or even an editor which can be easily invoked from the command line (though I do), so requiring a special command, "git edit-link", to edit some inherently textual data that seems to work perfectly well being stored as a normal text file in the repository, is a little gross.
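For context on the "normal text file" point, the current submodule mapping lives in a plain `.gitmodules` file written in git's config syntax, so any editor or a few lines of script can read it. The Python sketch below parses only the simple subset typically seen in such files (no quoting, escapes, or line continuations), so it is an illustration rather than a full git-config parser. The sample URL is a made-up placeholder, and the `clayoven` name is borrowed from the path quoted earlier in the thread.

```python
from typing import Dict, Optional

# A small .gitmodules-style sample. The URL is a hypothetical placeholder.
SAMPLE_GITMODULES = """\
[submodule "clayoven"]
    path = clayoven
    url = https://example.com/clayoven.git
"""


def parse_gitmodules(text: str) -> Dict[str, Dict[str, str]]:
    """Map each submodule name to its key/value settings (simple subset only)."""
    modules: Dict[str, Dict[str, str]] = {}
    current: Optional[Dict[str, str]] = None
    prefix, suffix = '[submodule "', '"]'
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        if line.startswith(prefix) and line.endswith(suffix):
            name = line[len(prefix):-len(suffix)]
            current = modules.setdefault(name, {})
        elif line.startswith("["):
            current = None  # some other section: ignore its keys
        elif "=" in line and current is not None:
            key, _, value = line.partition("=")
            current[key.strip()] = value.strip()
    return modules


print(parse_gitmodules(SAMPLE_GITMODULES))
```

Moving this data into an object inside the repository's store, as the proposal does, is what would make a dedicated editing command necessary.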