Hacker News
Show HN: Jupyter kernel using Poetry for reproducible Python package management (github.com/pathbird)
95 points by travisd on Dec 23, 2021 | 26 comments



Hi HN!

Author/OP here. Wrote this small but mighty package to make it easy to create, run, and share Jupyter notebooks with reproducible environments. Was definitely inspired by Julia's Project.toml/Manifest.toml package management.

Ended up building it to support my startup Pathbird which lets instructors build engaging, interactive courses using computational lessons. A big value proposition there is that every student gets exactly the same environment (no dependency version issues!) so we want to make sure that the environment running on Pathbird exactly matches the environment on the instructor's computer.


Cool stuff! Have you considered using Nix for this? That way you get even more reproducibility, because all the OS packages (including Python and Poetry) can be easily pinned. If you only use packages which are already in nixpkgs you won't even need Poetry for easy repro.

Example[1]

[1] https://github.com/linz/stac/blob/6a82e92432945777fbd49631b4...


This example is too impractical if you need arbitrary, pinned PyPI packages. Trying to have Poetry work with Nix has been so painful to me that I'd just rather use a FHS shim.


Poetry2nix[1] is good for that. A lot of packages unfortunately require overrides to work, but poetry2nix ships with a bunch of these by default.

[1] https://github.com/nix-community/poetry2nix


I switched to Nixpkgs earlier this year, after getting annoyed with other options for working with Java and Python (in particular).

NixOS on servers, Nixpkgs-darwin instead of homebrew (mostly), and Nixpkgs on Fedora for a laptop. Solid results so far.


This is excellent work.

I worked on a similar project for my last company. However the goal was portable notebooks that could be executed anywhere, and rapidly deployed to various environments with minimal installation. I ended up using a combination of LinkedIn's shiv library with Netflix's papermill library.

The point was to turn a notebook into an executable runtime with all the dependencies embedded. I don't want to get into the specifics of why we were doing this, but it had to do with my previous employer's product, which is targeted at no-code and low-code folks and integrates Jupyter notebooks.

I think the use of poetry is very elegant here. And the fact that you can reuse the same kernel easily is a huge plus.


Are you going to present at Jupyter Community Day? https://discourse.jupyter.org/t/jupyter-community-calendar/2...


Ah noice.

I have been doing this with a hand-edited kernel.json for a year or two and it works perfectly. I hadn't been looking forward to demoing it to coworkers because of the hacky setup.

So you've solved the only problem, and perfectly!


Yep, previous version was exactly that! I decided to package it up to make it easy to give to instructors who wanted to use Pathbird.

Way more than half of the effort on this project went into figuring out how to distribute the kernel.json file (I ended up copying from ipython). The actual code that runs is little more than a Popen.
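
For readers curious what that might look like, here is a minimal sketch. It is not the project's actual code. It assumes the kernel spec points at a small launcher script (the file name `poetry_launcher.py` and all three helper functions are hypothetical) that starts ipykernel under `poetry run` from the notebook's project directory. `{connection_file}` is the placeholder Jupyter substitutes into `argv` when it starts a kernel.

```python
import json
import subprocess
import sys
from pathlib import Path


def build_kernel_spec(launcher: str) -> dict:
    """Return a kernel.json-style dict that starts the (hypothetical) launcher."""
    return {
        "argv": [sys.executable, launcher, "-f", "{connection_file}"],
        "display_name": "Python (Poetry sketch)",
        "language": "python",
    }


def build_kernel_command(connection_file: str) -> list:
    """Command that runs ipykernel inside the Poetry-managed environment."""
    return ["poetry", "run", "python", "-m", "ipykernel_launcher", "-f", connection_file]


def launch(connection_file: str, project_dir: Path) -> int:
    """Start the kernel from project_dir so Poetry can find pyproject.toml there."""
    proc = subprocess.Popen(build_kernel_command(connection_file), cwd=project_dir)
    return proc.wait()


if __name__ == "__main__":
    # Print the spec that would be written to kernel.json.
    print(json.dumps(build_kernel_spec("poetry_launcher.py"), indent=2))
```

Running the launcher from the project directory is an assumption of this sketch: it relies on Poetry picking the environment based on the `pyproject.toml` it finds in the working directory.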


How does it compare to conda/mamba + env.yaml?


Likely inferior, since it’s Python-only. You can get pretty far with a Python-first approach, but in most sciences that’s not going to get you all sorts of community-standard tools that are written in C/C++/Fortran.


Huh?

Have you ever used poetry before? Are you implying that it can't be used to install Python dependencies that have c/fortran like pandas?

If that's what you're saying that is not correct.


> Are you implying that it can't be used to install Python dependencies that have c/fortran like pandas?

No, they are implying that many scientific Python packages, and the dependencies of those packages, depend on non-Python libraries and packages written in C/C++/Fortran. If you want to manage and reproducibly pin these, they also need to be tracked, because otherwise you just end up with "whatever the underlying OS has installed" if you are lucky, or "compile failed" when installing the package if you aren't.
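
As a rough illustration of "whatever the underlying OS has installed", the sketch below asks the system which shared libraries it can locate, using the standard library's `ctypes.util.find_library`. The helper name `find_native_libs` and the library names in the example are only illustrative, and the result differs from machine to machine, which is exactly the reproducibility problem described above.

```python
import ctypes.util


def find_native_libs(names):
    """Map each library name to whatever the host system resolves it to (or None)."""
    return {name: ctypes.util.find_library(name) for name in names}


if __name__ == "__main__":
    # Nothing in a Python-level lock file controls these results.
    for name, path in find_native_libs(["z", "openblas", "hdf5"]).items():
        print(f"{name}: {path or 'not found on this system'}")
```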


I think he was saying that conda/mamba are much more language agnostic than poetry. They handle a lot of packages that have /nothing/ to do with python, and they handle them in a reasonable way.

Poetry does seem much more tied to python (for better or worse).

Keep in mind at least in mamba's case, not being tied to the python interpreter is very much a feature, not a bug: https://medium.com/@QuantStack/open-software-packaging-for-s...


I agree. I know people who avoid conda because of long install/update times, but I am more than willing to occasionally have to go outside for a walk while packages I rely on get properly managed and installed.

BTW, sorry for being off topic, but Lex Fridman has two great recent long form interviews with Anaconda founders Peter Wang and Travis Oliphant. Good discussions!


We have found that mamba (a fast alternative front end to conda) and micromamba entirely fix this issue for us. IMO “conda install” is entirely unfit for purpose and probably puts lots of people off the ecosystem (an unresolved dependency tree often means 15 minutes of waiting for megabytes of error messages). Our CI was often spending more time running conda than running our test suite.

I assume there is some political difference or acrimony which means the actual sources about core conda are entirely silent on the issue - I’ve privately replied to people expressing concern over this on the conda mailing list several times and people are always relieved to know there is a solution.

Seriously, it’s the only way we can tolerate using conda as an ecosystem any more, and we now only ever feel this pain of 15-20 minute feedback cycles when preparing packages with conda-build (for which boa is a promising new replacement we haven’t moved to yet).


A python library with C/fortran is different than a C-only library. eigen is a great example of this. Many python packages depend on eigen, and there even exists a pyeigen, but assuredly almost none of them depend on pyeigen.

In conda, you can declare this properly, and you can have one true eigen in your environment.


Likely even a little bit more narrow than Python-only since it seems to be Poetry-only. Most Python projects probably work okay with Poetry, but it's still a relatively new tool compared to the other options.


"Likely":

Translation: I've never used poetry, don't know anything about it, and I'm going to talk out of my rear end and assume that it doesn't use conventional, existing venv/pip under the hood.

There isn't a single python project out there that can't be worked on with poetry. It creates a venv. Poetry is doing a lot of things to manage automatically activating and deactivating it, etc, but it's just a venv.


False. Poetry implements its own dependency resolver. It does not use pip’s resolver. For most packages this is fine. For a couple of packages its resolver does not work. PyTorch is one major ML library that is incompatible with Poetry, as Poetry has a non-standard interpretation of local package version tags.

One such issue: https://github.com/python-poetry/poetry/issues/4231

If you even transitively depend on PyTorch, Poetry will break and you will need hacks to make it work, or you just give up.
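
For anyone unfamiliar with local version tags: PEP 440 calls the part after the `+` a local version label. The sketch below only shows what that label is. The helper `split_local_version` is hypothetical, the version strings are illustrative, and it says nothing about how Poetry or pip actually treat such versions.

```python
from typing import Optional, Tuple


def split_local_version(version: str) -> Tuple[str, Optional[str]]:
    """Split 'public+local' into its two parts; local is None when absent."""
    public, sep, local = version.partition("+")
    return public, (local if sep else None)


print(split_local_version("1.9.0+cu111"))  # ('1.9.0', 'cu111')
print(split_local_version("1.9.0"))        # ('1.9.0', None)
```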


yes, alas major exceptions like pytorch make this old xkcd still ring true today: https://xkcd.com/1987/

also, a lot of python packages don't seem to follow the idea of semver very well, especially when you compare them to other communities like Go.


The xkcd hover-over text sounds almost like it's describing containers:

> The Python environmental protection agency wants to seal it in a cement chamber [...]


ISTR poetry really didn't like the black "eternal beta" versioning scheme for a while and installing it didn't work properly.

Also, it really doesn't work super well if you need to work on multiple development projects simultaneously, and when "just change the way you structure your entire project" is poetry's answer to this, the solution is "okay we won't use poetry".


I don’t think this is much of a concern here. Poetry is still not super compatible with a bunch of tooling, but that only matters when consuming Poetry projects. When you just install a dependency into a poetry project, it gets installed into a venv - nothing particularly exotic about it.


It matters a lot, because C, C++ and Fortran dependencies make up a large number of Python package dependencies. It’s not really a Poetry problem though, but a PyPI problem. Wheels improved the situation but aren’t a panacea.


Correct. They are commenting on Poetry without knowing anything about it, other than it being new and unfamiliar. Always amazed at the arrogance behind that behavior.



