
Depends on whether you want to bother setting up Artifactory. The problem with having your dependencies outside of your project directory is that you're now relying on a network request and a build step to get your stuff up and running.

It's obviously not right for every project; I wouldn't classify it as default behavior or even standard behavior. But if you're already using Vagrant/Docker to standardize environments across your entire stack, there's an argument to be made that there's really no reason not to have your dependencies precompiled and local to the project.

If you can get rid of complexity, it's worth considering whether doing so is a good idea. Across standardized environments, fetching dependencies is extra complexity.




Afaik, most language communities with a package manager are fine with the network request, since it should really only occur on the initial pull and on library updates; not sure what they do with Vagrant, but I imagine they just keep the libs locally and copy them in on Vagrant build.

E.g. in pythonland, I'm pretty sure I've never seen a repo with its packages checked in.

So what happened in jsland that makes the difference?


Python installs its packages system-wide with pip, so you'd never be able to commit those. The default for Ruby gems is also system-wide (although it seems like members of the community are starting to shift away from that).

Node installs packages locally to the project itself. This was partially a direct response to languages like Ruby and Python; the early community felt like system-wide dependencies were usually bad practice. So you can install packages globally in Node, but it's not the default.
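
Roughly, the difference on the command line (paths here are the typical defaults, so treat them as illustrative):

  # Node: installs are local to the project by default
  npm install lodash        # lands in ./node_modules/lodash
  npm install -g lodash     # global install is the opt-in case, via -g

  # Python with plain pip (no virtualenv): goes into the interpreter's
  # shared site-packages, e.g. /usr/lib/python3.x/site-packages
  pip install requests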

When you move away from global dependencies to storing everything in a local folder, suddenly you have the ability to commit things. And at the time, there weren't a ton of resources for hashing a dependency; managers like Yarn didn't exist. So checking into source turns out to be an incredibly straightforward answer to the question of, "how do I guarantee that I will always get the same bytes out?"
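
"Checking into source" concretely just meant not ignoring the install directory, something like:

  npm install                        # populate ./node_modules from the registry
  # drop (or never add) the usual "node_modules/" line in .gitignore
  git add package.json node_modules
  git commit -m "vendor dependencies"
  # from here on, a plain git checkout hands you the exact bytes you tested with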

People are free to fight me on it, but I would claim that this was not particularly controversial when Node came out; it's a more recent trend that package managers advise orgs to just use lockfiles by default. Although to be fair, a lot of the community ignored that advice back then too, so it's never been exactly common practice in open-source JS code.
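
For what it's worth, the lockfile route covers the "same bytes" question reasonably well these days, assuming npm 5+ (the lockfile stores a version plus an integrity hash per package):

  npm install      # writes package-lock.json, including an "integrity": "sha512-..." per package
  git add package.json package-lock.json

  npm ci           # later / in CI: clean install straight from the lockfile;
                   # fails if package.json and the lockfile disagree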


>Python installs its packages system-wide with pip

Standard practice atm is to install packages locally to a project by using venv, or rather pipenv. Afaik, lockfiles remain sufficient. I assume Ruby is in a similar state, but I'm not familiar with its ecosystem.
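
For concreteness, the per-project setup looks something like this (paths/packages are just placeholders):

  # stdlib venv: a per-project site-packages under ./.venv
  python3 -m venv .venv
  source .venv/bin/activate
  pip install -r requirements.txt

  # or pipenv, which also maintains a lockfile (Pipfile.lock)
  pipenv install requests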

>And at the time, there weren't a ton of resources for hashing a dependency

I suppose that'd be a big reason, but isn't that basically equivalent to version pinning? (What's the point of versioning, if multiple different sources can be mapped to the same project-version in the npm repo?)
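
(Pinning alone isn't quite the same guarantee, although pip can get most of the way there too; the hash value below is shortened/made up:)

  # requirements.txt pinning both the version and the content hash:
  #   requests==2.31.0 --hash=sha256:...
  # then refuse anything whose bytes don't match:
  pip install --require-hashes -r requirements.txt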

It seems odd to me because it'd screw with all the tooling around VCS (e.g. GitHub statistics), it conflates your own versioning with that of other projects, and it's the behavior you'd expect when package management doesn't exist, like in a C/C++ codebase.

Rust/Python/Ruby/Haskell don't see this behavior commonly, specifically because using the package manager is generally sufficient. That JS projects would commonly only use npm for the initial fetch seems like a huge indictment of npm; it's apparently failing at half its job? It seems really weird to me that the JS community would accept a package manager that isn't managing packages, to the point that adding packages to your VCS becomes the norm, instead of getting fed up with npm.

Adding to it is that, afaik, package management is mostly a solved problem for the common case, and there are enough examples to copy from that I'd expect npm to be in a decent state... but apparently it's not trusted at all?


> Standard practice atm is to install packages locally to a project by using venv, or rather pipenv.

Thanks for letting me know; that makes me more likely to jump back into Python in the future.

I suppose it is, to a certain extent, an indictment of NPM; I certainly expected more people to start doing this after the left-pad fiasco. But it's also an indictment of package managers in general.

So let's assume you're using modern NPM or an equivalent. You have a good package manager with both version pinning and (importantly) integrity checks, so you're not worried about it getting compromised. You maintain a private mirror that you host yourself, so you're not worried that it'll go down 5-10 years from now or that the URLs will change. You know that your installation environment will have access to that URL, and you've done enough standardization to know that recompiling your dependencies won't produce code that differs from production. You also only ever install packages from your own mirror, so you don't need to worry about a package that's installed directly from a Github repo vanishing either.
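
The mirror half of that is mostly just pointing the client at your own registry, e.g. via .npmrc (URL made up):

  # project-level .npmrc
  cat > .npmrc <<'EOF'
  registry=https://npm.mirror.internal.example/
  EOF

  npm install   # all subsequent resolution goes through the mirror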

Even in that scenario, you are still going to have to make a network request when your dependencies change. No package manager will remove that requirement. If you're regularly offline, or if your dependencies change often, that's not a solved problem at all. A private mirror doesn't help with that, because your private mirror will still usually need to be accessed over a network (and in any case, how many people here actually have a private package mirror set up on their home network right now?) A cache sort of helps, except on new installs you still have the question of "how do I get the cache? Is it on a flash drive somewhere? How much of the cache do I need?"
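
To be fair, the cache helps more than it used to; recent npm and Yarn both have offline modes, but they only work if the cache already holds every tarball you need:

  npm install --prefer-offline   # use the cache first, hit the network only on misses
  npm install --offline          # fail outright rather than touch the network

  yarn install --offline         # same idea; needs yarn-offline-mirror configured beforehand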

If you're maintaining multiple versions of the same software, package install times add up. I've worked in environments where I might jump back and forth between a "new" branch and an "old" branch 10 or 15 times a day. And to avoid common bugs in that environment, you have to get into the habit of re-fetching dependencies on every checkout. When Yarn came out, faster install times were one of its biggest selling points.
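
The usual band-aid for that is automating the re-fetch, e.g. a post-checkout hook along these lines (untested sketch):

  #!/bin/sh
  # .git/hooks/post-checkout (must be executable)
  # $1 = previous HEAD, $2 = new HEAD, $3 = 1 for a branch checkout
  if [ "$3" = "1" ] && ! git diff --quiet "$1" "$2" -- package-lock.json; then
    npm ci    # reinstall only when the lockfile actually changed
  fi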

I don't think it's a black-and-white thing. All of the downsides you're talking about exist. It does bloat repo size, and it does mess with GitHub stats (if you care about those). It makes tools like this a bit harder to use. Version conflation doesn't seem like a real problem to me, but it could be, I suppose. If you're working across multiple environments or installing things into a system path, it's probably not a good idea.

But there are advantages to knowing:

A) 100% that when someone checks out a branch, they won't be running outdated dependencies, even if they forget to run a reinstall.

B) If you check out a branch while you're on a plane without Internet, it'll still work, even if you've never checked it out before or have cleared your package cache.

C) Your dependency will still be there 5 years from now, and you won't need to boot up a server or buy a domain name to make sure it stays available.

So it's benefits and tradeoffs, as is the case with most things.


I understand that the tradeoffs exist; my surprise is mainly that what would be an uncommon workaround in pythonland for workload-specific tasks (e.g. most projects don't have differing library versions across branches, at least not for very long) is common practice in jsland.

Although one factor I just realized is that pip also ships pre-compiled binaries (wheels) instead of the actual source, when available, which would generally be pretty dumb to want in your repo, since it's developer-platform-specific. Assuming JS only has text files, keeping dependencies in the repo would be a more viable strategy in that ecosystem as a common case.
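
(The platform-specificity is visible right in the wheel filenames; e.g., roughly:)

  pip download numpy
  # on Linux this might fetch something like
  #   numpy-1.26.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
  # while the same command on an Apple-silicon Mac fetches
  #   numpy-1.26.4-cp311-cp311-macosx_11_0_arm64.whl
  # i.e. a different, incompatible binary per platform, which you wouldn't want in the repo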

Regarding B and C, it's not like you're wiping out your libraries every commit; the common case is installing once on git clone, and again only on the uncommon library update. A and C are a bit of an obtuse concern for most projects; I can see them happening and being useful, but e.g. none of my public project repos in Python have the issue of A or B (they're not big enough to have dependency version upgrades last more than a day, on a single person, finished in a single go), and for C, it's much more likely my machine(s) will die long before all the PyPI mirrors do.

I'm pretty sure that's true of like 99% of packages on PyPI, and on npm, which makes the divergent common practice weird to me. It makes sense in a larger team environment, but if npm tutorials are also recommending it (or node_modules/ isn't in standard .gitignores), it's really weird.

And now that you've pointed it out, I'm pretty sure I've seen this behavior in most JS projects I've peeked at (where there'll be a commit with 20k lines randomly in the history), which makes me think this is recommended practice.



