
I would be interested in doing some sort of hybrid mirror/cache of a few repos, if I could do them just-in-time style. I don't need all of PyPI (nor do I want 13.5 TB of packages); I probably only ever use a few hundred packages at most.

I would like to point all my systems at my server, and if I `pip install pandas` and it's not on my server, the server grabs it, passes it through, and syncs that package locally from then on. Same with yum, npm, docker, or whatever.
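For the pip part, pointing clients at the server is just a config change; a minimal sketch, assuming the proxy exposes a PyPI-compatible simple index (mirror.local is a made-up hostname):

    # /etc/pip.conf (or ~/.config/pip/pip.conf) on each client
    [global]
    index-url = http://mirror.local/simple/
    # needed only if the mirror is served over plain HTTP
    trusted-host = mirror.local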

And I just realized I could use Artifactory as a caching proxy and at least save some time there. However, that doesn't mirror the package; it just caches that specific version. I would be very interested in something where the system sees that I use `pandas` now and will mirror it, or where I could give it a requirements.txt and it flags all those packages and dependencies for mirroring.

It looks like DevPi may just work as a caching proxy as well. I also found Bandersnatch (https://github.com/pypa/bandersnatch), which is a configurable mirror with allow/block lists.

Essentially I want something like DevPi, but where each package I request gets added to the Bandersnatch allow list and mirrored from then on, with some extra-large packages on the deny list.
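If I went the Bandersnatch route, the allow/block lists are filter plugins in its config; a rough sketch based on its docs (paths and package names here are just examples):

    # bandersnatch.conf
    [mirror]
    directory = /srv/pypi
    master = https://pypi.org
    workers = 3

    [plugins]
    enabled =
        allowlist_project
        blocklist_project

    [allowlist]
    packages =
        pandas
        numpy

    [blocklist]
    packages =
        tensorflow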

It's probably possible to wire all that up reasonably well, but just using a caching proxy is probably 90%+ of the improvement anyway, so I may just stick with that.
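For what it's worth, the requirements.txt-to-allow-list part of the wiring would be small; a rough sketch (the file layout and section names are assumptions matching the config sketch above, and it doesn't pull in transitive dependencies unless the requirements file is fully pinned, e.g. from pip freeze):

    #!/usr/bin/env python3
    # Merge package names from a requirements.txt into the [allowlist]
    # section of a bandersnatch.conf. Sketch only: ignores extras edge
    # cases and does not resolve transitive dependencies.
    import configparser
    import sys

    req_path, conf_path = sys.argv[1], sys.argv[2]

    # Pull bare project names out of requirements.txt.
    wanted = set()
    with open(req_path) as fh:
        for line in fh:
            line = line.split("#")[0].strip()
            if not line or line.startswith("-"):
                continue  # skip blanks, comments, and pip options like -r/-e
            name = line.split(";")[0]  # drop environment markers
            for sep in ("==", ">=", "<=", "~=", "!=", ">", "<", "["):
                name = name.split(sep)[0]
            wanted.add(name.strip().lower())

    # Merge with whatever is already on the allow list and rewrite the config.
    config = configparser.ConfigParser(interpolation=None)
    config.read(conf_path)
    if not config.has_section("allowlist"):
        config.add_section("allowlist")
    current = set(config.get("allowlist", "packages", fallback="").split())
    config["allowlist"]["packages"] = "\n" + "\n".join(sorted(current | wanted))
    with open(conf_path, "w") as fh:
        config.write(fh)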


I don't know of any particular solution for the Python ecosystem; I don't use it often enough to justify dealing with that can of worms.

But for npm there is Verdaccio, which does exactly what you want and is what I'm using.
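The cache-and-proxy behaviour is basically Verdaccio's stock setup; a trimmed config sketch (paths here are just the usual examples from its docs):

    # config.yaml
    storage: ./storage

    uplinks:
      npmjs:
        url: https://registry.npmjs.org/

    packages:
      '**':
        access: $all
        publish: $authenticated
        # anything not published locally is fetched from npmjs and cached
        proxy: npmjs

Then point npm at it with `npm config set registry http://localhost:4873/` (4873 being Verdaccio's default port).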
