
I work at a large company; I used a central repository for six years and a distributed one for six years. I think a central repository is better. The benefits are:

1) Transparency. I can see what everybody else is doing, and if somebody has an interesting project I can find it quickly. You can also learn a lot from looking at other people's changes.

2) Faster. To check out the source code for the project I now work on takes an hour in the distributed system, while it only took 5 minutes in the centralized system.

3) Always backed up. All code that is checked into the central repository is backed up. It has happened twice that employees have left and code was lost because they only checked it in locally.

Many people have only used CVS or SVN, which are horrible. I'd rather use Git or Mercurial than those, but Perforce is really good.




> 1) Transparency. I can see what everybody else is doing, and if somebody has an interesting project I can find it quickly. You can also learn a lot from looking at other people's changes.

This doesn't require a single central repository, just that all repositories live in a common location.

> 2) Faster. To check out the source code for the project I now work on takes an hour in the distributed system, while it only took 5 minutes in the centralized system.

What distributed repository management system do you use, and what centralized system did you use?

> 3) Always backed up. All code that is checked into the central repository is backed up. It has happened twice that employees have left and code was lost because they only checked it in locally.

As with point 1, this doesn't require a single central repository, just that all repositories live in a common location.


>1 Single repo

It's more a matter of what tool you use to visualize change-set history.

>2 Faster

This again is an issue of tool quality. There need to be meta Git repos; groups in GitHub and GitLab attempt to create a shallow version of that.

>3

Always push. That's not an issue that is resolved by a single central repo.


> This doesn't require a single central repository, just that all repositories live in a common location.

Even better, if every project includes a DOAP file (or something similar) and/or you publish commit messages using ActivityStrea.ms or something, you could easily have an interface that shows project activity around the organization, regardless of how many repositories and/or servers you use. Of course it's probably easier if all the repositories live in a common location...


I use git-svn against a central repository. Let me list the advantages:

1) Faster

There is no comparison, but let me count the ways:

a) checking out stuff

Even a full Git clone is faster than just downloading a single directory with SVN.
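(And if clone time ever does hurt, a shallow clone is the usual workaround; the repo URL below is a placeholder:)

    # grab only the latest commit instead of the full history
    git clone --depth 1 https://example.com/big-repo.git
    # deepen the history later if you end up needing it
    git fetch --deepen 1000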

b) just trying something out (i.e. branching)

Creating a branch, making a few changes takes me seconds, and does not require me to change paths like it does for the svn victims I work with. Throwing it back out again takes seconds, and all operations are reversible for when I fuck up (which is often).
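(What that looks like in practice; the branch name is made up:)

    git checkout -b wild-idea      # new branch in seconds, same working paths
    # ...hack, commit as usual...
    git checkout master
    git branch -D wild-idea        # throw the experiment away
    # regret it? the commits survive in the reflog for a while:
    git reflog                     # find the abandoned commit's hash
    git branch wild-idea <hash>    # resurrect it (<hash> copied from the reflog)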

c) merging

Git's merging. Oh my God. In half the cases I just have to check stuff over, if that.

d) submitting

We use code review. Unlike most of the Subversion folks, I can easily have 5 co-dependent changes in flight (5 changes, each depending on the previous one) without going insane, and I have gone up to 13, not counting experimental branches. I observe around me that it takes a good developer to manage 2 with Subversion. 5 is considered insane; I bet if I showed them that 13 were in flight at the same time, they'd have me taken away as a danger to humanity.
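(One way to manage such a stack with plain Git; branch names are hypothetical:)

    # each change is its own branch, based on the previous one
    git checkout -b change-1 master   # ...commit...
    git checkout -b change-2          # ...commit... (and so on up to change-5)
    # reviewer wants a fix at the bottom of the stack:
    git checkout change-1
    # ...commit the fix...
    # then carry it up the stack, one hop at a time
    git checkout change-2 && git rebase change-1
    git checkout change-3 && git rebase change-2
    # ...etc. (newer Git can move the whole stack in one go
    # with `git rebase -i --update-refs`)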

2) always backed up

Subversion doesn't back anything up until you commit, and people don't commit anywhere near often enough ... The way people lose code around here, 99.9% of the time, is by accidentally overwriting their in-flight code contributions (the remaining 0.1% involves laptop upgrades and overenthusiastic developers. Even then, cp -rp will just copy my environment and keep working, and yet the same is absolutely not true for the Subversion guys).

Now with Git, I commit every spelling fix I make, every semicolon I had forgotten, sometimes separately, other times with "--amend". I make my share of stupid mistakes only after committing, something that's technically not impossible on Subversion but not practical, mostly because of code review: "just commit it" on Subversion takes ~5 minutes in the very fast case (which requires a colleague dropping everything that very second, AND can't involve any actual code changes, as those trip a CI run that takes 3 minutes assuming zero contention), and 20-30 minutes is more typical (measured from "hey, I'd like to commit this" to actually in the repository). Committing on Git takes me the time to type "<esc>! git commit % -m 'spellingfix'". The Subversion commit time means that developers often go for weeks without committing. Weeks, as in plural.
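(In shell terms, with hypothetical file names:)

    # the forgotten semicolon gets its own commit...
    git commit -m 'add missing semicolon' src/parser.c
    # ...or gets folded into the previous commit without a new entry
    git add src/parser.c
    git commit --amend --no-edit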

I get that a Git commit isn't the same thing as a Subversion commit. But it does allow me to use the functionality of source control, and that's exactly what I'm looking for in a source control system. A Subversion commit doesn't let me use source control without paying a large cost for it; that's what I'm getting at.

So I have backups guarding against the 99.9% problem (and an auto-backup script that does hourly incremental backups for the 0.1% case). The Subversion guys are probably better covered for the 0.1% problem. Good for them!
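(The auto-backup script doesn't need to be fancy; a sketch run from cron, with made-up paths:)

    #!/bin/sh
    # hourly-backup.sh -- hourly incremental snapshot of the whole environment
    SNAP="/backup/$(date +%F-%H)"
    # hard-link against the previous snapshot so unchanged files cost ~nothing
    rsync -a --link-dest=/backup/latest "$HOME/work/" "$SNAP/"
    ln -sfn "$SNAP" /backup/latest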

3) actual version control

Git's branches, rebase, merge, etc mean I can actually work on different things within short time periods in the same codebase.

The fact that the other developers are on Subversion while I'm on git-svn means I can have my own Git hooks for various automated stuff: some fix code layout, some warn me about style mistakes and bugs (you'd be surprised how much your reputation benefits from these), some update parts of the codebase when I modify other parts. You have to be careful, though, as checks like these are part of the reason Subversion workflows are so slow (especially the insistence on CI; I hear a CI run at big G, required before code review can even happen, takes upwards of an hour on many projects, with some taking 8-9 hours).
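(A personal hook is just an executable script in .git/hooks; a minimal sketch, with clang-format standing in for whatever checks you'd actually run:)

    #!/bin/sh
    # .git/hooks/pre-commit -- runs before every local commit
    # reformat the staged C files, then re-stage them
    for f in $(git diff --cached --name-only --diff-filter=ACM -- '*.c' '*.h'); do
        clang-format -i "$f"   # assumes clang-format is installed
        git add "$f"
    done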


The discussion wasn't really about SVN vs Git; you can have one or multiple repositories with either system.


You'd have the same problems with any other centralized versioning system the way companies use it these days (i.e. with CI and code review).


Not really. I work at Google. I work on a leaf, so my CI takes under a minute. I can also send out multiple chained changes, in a tree, to multiple reviewers, and have them reviewed independently.

Certainly, CI takes a long time for certain changes, but those are changes that affect everything. You'd have the same problem in a multi-repo approach if you updated a repo that everything else depended on. At some point, you have to run all of the tests on that change.


Cool. I've wondered about Google's CI a lot, but there are a lot of horror stories online. Most people are complaining about it taking an hour for simple changes (something called "tap", I wonder what that stands for).

Chained code review changes: I refuse to believe that chained changes are easy in Google's version control (which is Perforce, according to Linus' Git talk at Google). Branching in Perforce is literally worse than SVN; it's a bit more like the old CVS model, with the SVN copy-a-directory model sort of forced into the design after the fact. Also, the tool support (merges, ...) is bad compared to Subversion and stone-age compared to Git's tools.

The one reason I keep hearing for using perforce is that perforce allows the administrator to "lock off" parts of the repository to certain users.

I've done branches and merges in Git, Subversion, and CVS (and I've had someone talk me through one in Perforce, but I don't really know it). Google's branch/merge experience is very likely to be somewhere between SVN and CVS, and those can accurately be referred to as "disaster" and "crime against human dignity". It's certainly not impossible, but it's very hard, and you can't expect me to believe normal developers can reasonably do that in Perforce.

Also: what would happen if you sent out 20 chained commits: 10 spelling corrections, 5 trivial compile-fixing bugs (a forgotten semicolon, a "]" that should have been ")", etc.), 2 small changes to single expressions, and 3 that introduce a new function and some tests? Perforce, like Subversion and CVS, has no way of tracking stuff until you commit it, and you can almost never commit without CI and code review. So would you track changes like that, or just leave them in your client untracked until you're ready for code review?


>Cool. I've wondered about Google's CI a lot, but there are a lot of horror stories online. Most people are complaining about it taking an hour for simple changes (something called "tap", I wonder what that stands for).

Well, like I said, it's possible to modify things that have a lot of dependencies, at which point you run a lot of tests, but that would be true anyway. Consider the hypothetical situation where you're modifying the `malloc` implementation in `/company/core/malloc.c`. Everything depends on this, because everything uses malloc. If you have a monorepo, you make this change and run (basically) every unit and integration test, and it takes a while.

With the monorepo, if there's a rarely encountered issue that only certain tests exercise, you notice it immediately when you run all the tests, and you can be sure the malloc change is the breakage. Alternatively, if `core` is its own repo, you run the core unit tests, and only later, when you bump the version of `core` that everything else depends on, do you run the dependents' tests. Then you notice breakages only when you update `core`, or maybe you don't notice at all, because it's only one test failing per package and it could just be flakiness. So noticing there is a problem is harder, identifying the cause once you've decided there is one is harder, and you have to roll back instead of just not releasing.
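(This is also why monorepo tooling leans on a global build graph: "every affected test" is computable. A sketch in Bazel terms, the open-source cousin of this approach, with made-up paths:)

    # run every test that transitively depends on the malloc target
    bazel test $(bazel query 'kind(".*_test", rdeps(//..., //company/core:malloc))')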

>Chained code review changes, I refuse to believe that in Google version control (which is perforce according to Linus' git talk at Google) chained changes are easy. Branching in perforce is literally worse than SVN, it's a bit more like the old CVS model, and they've sort-of tried to get the SVN copy-directory model forced into the design afterwards. Also the tool support (merges ...) is bad compared to subversion and stone-age compared to Git's tools.

Google no longer uses Perforce; we use Piper (note that this is a Google-developed tool called Piper, not the Perforce frontend called Piper; yes, this is confusing; afaik, Google's Piper came first). Piper is inspired by Perforce but is not at all the same thing (see CitC in the article). The exact workflow I use isn't public (yet), but suffice it to say that while Piper is Perforce-inspired, Perforce is not the only interface to Piper. The article even mentions a Git-style frontend for Piper.

>Google's branch/merge experience is very likely to be somewhere between SVN and CVS, and those can accurately be referred to as "disaster" and "crime against human dignity". It's certainly not impossible, but it's very hard, and you can't expect me to believe normal developers can reasonably do that in Perforce.

Suffice it to say you're totally mistaken here.

>Also: what would happen if you sent out 20 chained commits: 10 spelling corrections, 5 trivial compile-fixing bugs (a forgotten semicolon, a "]" that should have been ")", etc.), 2 small changes to single expressions, and 3 that introduce a new function and some tests? Perforce, like Subversion and CVS, has no way of tracking stuff until you commit it, and you can almost never commit without CI and code review. So would you track changes like that, or just leave them in your client untracked until you're ready for code review?

So, Piper doesn't have a concept of "untracked". Well it does, in the sense that you have to stage files to a given change, but CitC snapshots every change in a workspace. Essentially, since CitC provides a FUSE filesystem, every write is tracked independently as a delta, and it's possible to return to any previous snapshot at any time. One way to think of this concept is that every "CL" is vaguely analogous to a squashed pull request, and every save is vaguely analogous to an anonymous commit.
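(There's no public equivalent, but as a loose local analogy you could snapshot a working tree on every save; this sketch assumes inotify-tools and is nothing like CitC's actual implementation:)

    # loose analogy only: auto-commit the tree every time a file is saved
    while inotifywait -qq -r -e close_write --exclude '\.git' .; do
        git add -A
        git commit -q -m "snapshot $(date -Is)" || true  # no-op if nothing changed
    done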

This means that in extreme cases, you can do something like "oh man, I was working on a feature 2 months ago, but stopped working on it and didn't really need it, but now I do", and instead of starting from scratch, you can, with a few incantations, jump to your now-deleted client and recover files at a specific timestamp (for example: you could jump to the time that you ran a successful build or test).

>Also: what would happen if you sent out 20 chained commits: 10 spelling corrections, 5 trivial compile-fixing bugs (a forgotten semicolon, a "]" that should have been ")", etc.), 2 small changes to single expressions, and 3 that introduce a new function and some tests?

I'd logically group them so that each resulting commit set builds successfully and is an isolated change. Then each of those becomes its own CL and is sent for independent review.
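(With Git, that grouping step would be an interactive rebase; a sketch:)

    # collapse 20 micro-commits into a few reviewable, buildable units
    git rebase -i master
    # in the todo list: reorder, mark each spelling/semicolon fix as
    # "fixup" into its logical parent, keep the feature commits separate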


I think you are confusing central/distributed with monorepo/multiple repos. Also distributed VCS doesn't imply that you don't have a central master somewhere.


Perforce has such a janky UI, though. Whenever I try to do anything significant with my company's codebase, the whole application locks up for hours. I guess I need to learn how to use the CLI.


This might not be just GUI vs. CLI; it can come down to the granularity of your client mapping. If the p4 server thinks it needs to lock across large regions of the depot, it can go into the weeds.

I always try to have the absolute minimum in my client specs, but sometimes you do need to operate over the world.
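(Concretely: map only what you touch in the client view, and exclude the heavy stuff. Depot paths below are invented:)

    # workspace spec (edit with `p4 client`); narrow view, explicit exclusion
    View:
        //depot/projA/src/...       //my-ws/projA/src/...
        -//depot/projA/assets/...   //my-ws/projA/assets/...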

The Perforce docs are generally well written; they're worth a look.


Yeah, the Perforce UI is super easy to crash.


The third problem can essentially be solved by doing all your production builds from code checked out of some central repository. If you follow that rule, you're guaranteed to have the source code for every binary in production.

That way, you can still have a distributed repository (Git, Mercurial, etc.) if you want. Even if some code exists only in some developer's local repository, it's presumably not that big of a deal since that code can never have made it to production.
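(In other words, the release pipeline only ever builds from the blessed remote; the URL and tag below are placeholders:)

    # CI release step: build only what the central repository has
    git clone --branch v1.4.2 --depth 1 https://git.example.com/app.git
    cd app && make release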



