How is this Go's fault? Just because you can pull code automatically from a remote repository doesn't mean you always should. It seems to me that keeping the entirety of the code necessary to compile a project in a local build directory is a very good idea if you don't want to compile against moving targets.
Maintain your own forks of the libraries you need. Commit your changes there, and also submit upstream. Pull upstream changes back to your fork when you are ready. You have this problem with any language or tool that uses github directly as the registry.
I've run into the problem (with some Emacs add-ons in particular) of changing something on one machine, only to miss that change on another. So, if I've got my own "central" repo, I can push changes to that and pull them down to other machines.
Since I host the repos on a server I already have online, it's also trivially easy to make them publicly "pullable" for others if need be.
It sounds like it's even worse than that - the developer was maintaining a fork of the upstream projects, but didn't put it in version control. That is, he/she used Go's github import mechanism to get a copy of a project onto their own machine, and edited the locally cached copy. So of course, when other people tried to build, the libraries didn't match up.
The real issue here seems to be that this was a half-finished project, at a very early stage of development, and the developer probably never intended their local scratch copy to be used in production [http://www.joystiq.com/2012/10/24/haunts-anatomy-of-a-kickst...]
You are correct. Go's remote imports are dangerous for long-term project maintenance, but the feature is still useful for quick, throwaway projects.
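For anyone who hasn't seen it, a remote import in Go puts the repository URL straight into the source, something like this (the repo path here is made up):

import (
    "github.com/someuser/somelib"
)

"go get" fetches whatever happens to be at that URL at the moment you run it, which is exactly what makes it handy for throwaway work and risky for anything long-lived.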
Go badly needs two things:
1. A best practice that dictates that you never import from remote repositories in production, long-term code; the feature is fine for one-offs and experimentation, but the article illustrates just one way this style of work can lead you into a maintenance world of pain. What happens if the repo you're importing from GitHub is deleted? What do you do for fresh clones? You're going to end up changing the URL anyway. I feel the Go community has kind of glossed over this (and I like Go).
2. An equivalent of CPAN or PyPI, which you could then import from in concert with a tool to manage those dependencies, a la:
import (
    "cgan/video/graphics/opengl"
)
This model works for CPAN, PyPI, and so on for a reason, and that reason is avoiding several of the dependency/merge hells that remote repos can create. CPAN gives Perl a lot, such as distributed testing in a variety of environments. I personally think such a thing is necessary for long-term maintenance of any software project that uses third-party libraries. This is one of Google's oversights in Go, because they have an (obviously) different take on third-party code. Here's a good case:
Developer A checks out the code clean. Five minutes later, developer B checks out the code clean. In both cases, your "go get" bootstrap script fetches two different commits, because in that five minutes, upstream committed a bug. Developer B cannot build or, worse, can build but has several tests fail for unknown reasons or, even worse, the program no longer functions properly. Developer A has none of those problems. In a world with a CPAN-like, developer B can see that he has 0.9.1 while developer A has 0.9.0, commit "foo/bar: =0.9.0" to the project's dependency file, and nobody else suffers the same fate. In the current world, you're either massaging your local fork to keep the new commit out, or taking some other troublesome, non-scalable approach.
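To be concrete, the dependency file I'm imagining would be nothing fancier than a list of pins, something like this (the format and entries are invented for illustration):

foo/bar: =0.9.0
cgan/video/graphics/opengl: =1.2.0

The tool reads that file and fetches exactly those versions for everyone, instead of whatever happens to be at HEAD.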
Building large software projects against a repository never works. You need tested, versioned, cut releases to build against, not master HEAD. It only takes one bad upstream commit to entirely torpedo your build, and you've now completely removed the ability to qualify a new library version against the rest of your code base. Other people are suggesting "well, maintain your own forks," so you're basically moving merge hell from one place to another. I, personally, have better things to do with my time; I've seen (Java) apps with dozens of dependencies before, and keeping dozens of repositories remotely stable for a team of people will rapidly turn into multiple full-time jobs. Do you want to hire two maintenance monkeys[0] to constantly keep your build green by massaging upstream repositories, or do you want to hire two feature developers? Exactly.
I've started writing a CPAN-like for Go a couple times but I'm always held back by these threads:
The second one highlights how difficult Go is to package as a language -- my personal opinion is to treat Go just like C and distribute binary libraries in libpackage, with the source in libpackage-src. If one message in that thread is true and binaries built with different compiler versions refuse to work together, I'm troubled about Go long-term.
[0]: I'm not calling all build engineers maintenance monkeys. I'm saying the hypothetical job we just created is a monkey job. I love you, build engineers, you keep me green.
You're right that syncing directly with the net is a problem. However, the problem is basically developer education. Go doesn't work like other languages and the consequences aren't well-documented. If you do it right, the process works fine; it's basically how Google works internally.
The basic idea is to commit everything under $GOPATH/src so that you never need to touch the net to reproduce your build.
After running "go install" you should check in the third-party code locally, just like any other commit.
Then updating to a new version of a third-party library, to sync with their trunk, is like any other change: run "go install", test your app, and then commit locally if it works. If it doesn't work, don't commit that version; either wait for them to fix it, sync to the last known good version, or patch it yourself.
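Concretely, the tree ends up looking roughly like this (project and library names are invented for illustration):

$GOPATH/src/
    myapp/                        your code, in your repository
    github.com/someone/somelib/   fetched by "go install", then committed
                                  into the same repository as your code

The import paths in your source don't change; the only difference is that the code behind them lives in your repository instead of being fetched from the network at build time.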
If you aren't committing your dependencies then you're doing it wrong.
> If you aren't committing your dependencies then you're doing it wrong.
Disagree strongly (and hate absolutes like "doing it wrong"). Their metadata, yes, by all means, commit that. There is absolutely no reason, however, to have the source code for a dependency in my tree, Go or not.
Give me a binary I can link against, or at least the temporary source in a .gitignored location, and let's call it a day. When I want to bump to a new version my commit should be a one-line version bump in a metadata file, not the entirety of the upstream changes as a commit. I've seen a sub-1MLoC project take 10 minutes just to clone. Internally! You're telling me you want to add all the LoC and flattened change history of your dependencies in your repo? Egads, no thanks! Where do you draw that line? Do you commit glibc?
There's just no reason to store that history unless you are in the business of actively debugging your dependencies and fixing the problems yourself, rather than identifying the issue and rolling back to a previous version after reporting the problem upstream. I guess it's paying your engineers to fix libopenal versus paying your engineers to work on your product; one's a broken shop, the other isn't. Some people will feel it's one, some the other.
Actually what is insane is to have your production code do anything else. "pip install foo" and similar schemes open your code up to the following problems:
- incompatibilities that were introduced in version 1.2.1 while you've only tested your code with 1.2
- the host is down so you can't compile your own code because your dependency is not available
- the host is hacked and "foo" was replaced with "malicious foo"
- an exponential increase in testing (you really should test with all versions of the dependencies you use)
Ultimately, I don't understand the doom and gloom point of view. C, C++, Java, C# etc. programmers have been pulling dependencies in their repos for ages. In my SumatraPDF I have 12 dependencies. I certainly prefer manually updating them from time to time to having builds that work on my machine but fail for other people, or the many other problems that result from blindly pulling third-party code.
None of the things you listed are problems. The other comment demonstrates solutions to all of them, and I do not understand your fourth bullet point in context at all.
> C, C++, Java, C# etc. programmers have been pulling dependencies in their repos for ages.
I'm not making this up: in my career, I have never worked on a project where this is the case, and I've worked for shops that write in three of those languages.
> I certainly prefer to manually update them from time to time than to have builds that work on my machine but fail for other people
That's your choice, and it's a little bit different because I'm assuming "other people" are end users -- those that want to recompile SumatraPDF from source for some bizarre reason -- not developers. Fixing a broken build is a skill that every developer should have, but not an end user. Once I learned how to write software, I never came across a situation as an end user compiling an open-source Unix package that I could not solve myself.
The opinion I'm sharing here is related to developing on a team, not distributing source for end-user consumption. It sounds like you don't develop SumatraPDF with many other people, either. Nothing like merge failures on a huge dependency that I didn't write to ruin a Monday.
Also, wait, SumatraPDF is built with dependencies in the codebase? What if a zero-day is discovered in one of your dependencies while you're on vacation for a month; what do distribution maintainers do? Sigh? Patch in the distribution and get to suffer through a merge when you return?
> C, C++, Java, C# etc. programmers have been pulling dependencies in their repos for ages.
The first time I worked on a C# project started in an age when NuGet was not widespread, I saw with dismay a "lib" directory with vendored DLLs. It does happen.
- Binary artifacts under version control are a no-no for me, unless we're talking about assets. Third-party libraries are not assets.
- Where do these DLLs come from? How do I know it's not some patched version built by a developer on his machine? I have no guarantee the library can be upgraded to fix a security issue.
- Will the DLLs work on another architecture?
- Which DLLs does my application need, and which ones are transitive dependencies?
Those are a lot of questions I shouldn't have to ask, because that's what a good package management system solves for you.
Spot on. We had this problem taking on some legacy code during a round of layoffs. They had checked in /their own/ DLLs from subprojects. It turned out that one DLL had unknown modifications not in the source code, and another had no source at all.
Another problem was that by building the way they had, they'd hidden the slowness and complexity of the build - including the same code from different branches via a web of dependencies, and with masses of unused code. They never felt this pain, so had no incentive to keep the code lean.
Sure. But at the same time, if you make it a policy to forbid nailguns at the workplace, you have fewer people shooting themselves in the foot while you're not looking.
Anyway, this analogy isn't helping anyone. You think libs in source control is a problem because some people might not do it properly. I'm contending that there's nothing wrong with libs in source control--there's something wrong with letting people who might not do it properly near your source control.
There are clear benefits from having a package manager (if anything, pesky things like version numbers, direct dependencies, etc. are self-documented). In addition, it does prevent people from taking shortcuts, and even good people take shortcuts when the deadline is short enough.
But if you didn't write the dependency (and thus presumably don't commit to it), why would there be a merge conflict?
As for upstream maintainers rebuilding your package, I don't see how having to update a submodule is vastly different from updating the relevant versions in a file. Both seem like they'd take mere seconds.
It's not like you're writing gotos straight into library code, it's merely a bookkeeping change. You're just importing everything into your repo instead of referring to it by ambiguous version numbers. In the end, the code written and the binary produced should be identical.
- you don't need the internet to install dependencies. There are many options, e.g., a locally cached tarball will do (no need to download the same file multiple times). Note: your source tree is not the place to put it (the same source can be built, tested, staged, and deployed using different dependency versions, e.g., to support different distributions where different versions are available by default)
- if your build infrastructure is compromised; you have bigger problems than just worrying about dependencies
- you don't need to pull dependencies, and dependencies of dependencies, etc. into your source tree to keep the size of the test matrix in check, even if you decide to support only a single version of each of your dependencies.
As usual, different requirements may lead to different trade-offs. There could be circumstances where vendoring dependencies is a valid choice, but not for the reasons you provided.
No really! Go isn't like other languages. You have to think differently. Please try it!
There's no such thing as, say, Maven binary dependencies for Java. If you don't check in the source code for your dependencies, your team members won't get the same version as you have and builds won't be reproducible. You won't be able to go back to a previous version of your app and rebuild it, because the trunk of your dependencies will have changed. By checking in the source code you avoid a whole lot of hurt.
Checking in source is okay because Go source files are small and the compiler is fast. There isn't a huge third-party ecosystem for Go yet.
> There's no such thing as, say, Maven binary dependencies for Java.
I'm saying there should be, but not necessarily the same thing. That's my entire point.
I'm also not a fan of the "you have a dissimilar opinion to mine, so obviously you've never used Go properly" attitude in this thread. One way to read your last comment is that I've never used Go at all, though I'm giving you the benefit of the doubt and assuming you meant I've never used Go properly. Either way, I don't get the condescension of assuming I'm unaware of everything you're explaining to me simply because I have an opinion different from yours. Especially since half of your comment repeats things I said earlier.
Maybe it sounds like condescension. I was in the same place at the beginning. No exceptions? No generics? Heresy. How dare you Go people ignore my many years of experience? I wrote a few rants to the mailing list, which were basically ignored.
The reason I assume you haven't used Go much is that your examples of problems with checking stuff in aren't examples of problems happening in Go. It's an analogy with other languages and other environments. Such arguments don't seem to get very far.
Maybe it won't scale and something will have to give. I expect the Go maintainers will find their own solution when it happens, and it won't look like Maven or traditional shared libraries. (If anything, it might be by replacing Git/hg with something that scales better.)
You already have that in Javaland. It's called maven, and it allows you to change one number in one file to upgrade the dependency version. Clojure also has that with Leiningen.
Almost. `go get` will not resolve your sub-moduled dependencies. It'll work well enough for developing your library, but it will break when people try to consume your library (unless they git clone it themselves).
I agree 100% with your first point. It should be spelled out.
Your second point I am less sold on. Transitive dependencies (pkgA imports pkgB@v1 but my code needs pkgB@v2, which is incompatible with v1) are the stuff of nightmares in large systems development, which is what Go is designed for... the lack of versioned imports wasn't an oversight, it's a feature.
Centralized repos are centralized points of failure, and only as good as they are well managed. NPM versus CPAN, if you will. Any serious project will localize dependencies; even if they are in CPAN, you never know when CPAN will be down or when other unforeseen things might happen.
Instead what we have is that pkgA needs pkgB@then (which happens to be when the author of pkgA last cached pkgB) but my code needs pkgB@now. That's worse in pretty much every way, mostly because there are no identifiers anywhere to clearly work around or even detect the problem. I'm all for "your build can only use a single version of pkgB" (linking two versions of pkgB into the same binary is insane) but I need to say what version that is, not leave it nondeterministic and dependent on uncontrolled, unrecorded state of the machine running the build.
No, you just mirror CPAN. This is already done in lots of shops I know of for PyPI. IME, I've only ever had PyPI down on me once, and there are mirrors (that are usually up) if that is ever the case[0]. I think localizing dependencies as you say is a waste of time.
I do understand the basics of probability. The likelihood of your serving infrastructure or application being compromised is an order of magnitude higher than that of the most popular repositories in software development. I'm not saying it doesn't happen, but I also don't walk around worried about having an asteroid land on me simply because I understand probability. If it happens, it happens, and we deal with it accordingly, but using a much more difficult software engineering process out of (arguable) paranoia is silly.
And that the package(s) you're trojaning aren't signed[1]
(I'm not immediately sure if new releases are automagically signed/digested when uploaded via PAUSE, or what fraction of current packages are signed)
Exactly! But trusting a remote repo to contain an essential part required to build the project is just extremely short-sighted, no matter what language is being used.
I think it is okay, as long as you have a backup plan. It can be advantageous to keep the code required to build the project small and have people pull dependencies as required.
> Transitive dependencies (pkgA imports pkgB@v1 but my code needs pkgB@v2, which is incompatible with v1) are the stuff of nightmares in large systems development.
This is a deliberate choice by the golang team, not an oversight or something beyond their technical capabilities.
Yeah, but you can use any number of systems to keep track of it if needed. I'm using git submodules in my current project; they keep track of which commit you're using.
Submodules are flawed in a lot of ways, but as a simple way to keep a pointer to the appropriate version of an external project they are great. It's a little extra work to set them up, but it's fairly marginal. Check them out with go get, add the repo after the fact with git submodule, and after that just use git submodule to always grab the correct versions.
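Roughly, assuming the directory that holds $GOPATH/src is itself your project's git repository (and with the repo path made up), the workflow looks like:

go get github.com/someone/somelib
git submodule add https://github.com/someone/somelib src/github.com/someone/somelib
git commit -m "Pin somelib at its current commit"

# on a fresh clone or another machine:
git submodule update --init

git submodule add should simply reuse the checkout that go get already created (git stages an existing repository at that path without re-cloning), and from then on the submodule pointer records exactly which commit of the library your project builds against.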