
matplotlib doesn't score highly on usability.

jupyter notebooks encourage disorganized, unprincipled programming; chaotic re-running of cells in the face of global mutable state; and prevent budding programmers from learning to use version control because the JSON format was designed without version control in mind.




For Jupyter, it depends on the workflow, especially in data science. You spend a lot of time playing with the data, testing things, drawing charts, computing, etc. When you do that, the cost of starting a Python interpreter, loading the imports, and loading the (usually big) data becomes a real pain because you iterate like hell. Working in a REPL becomes really important.

But even more, working with Jupyter allows you to work out a very detailed explanation of your thought process, describing your ideas and your experiments. Being able to mix code and explanations is really important (and reminiscent of literate programming). You get the same kind of flow with R.

As a data scientist, I'm concerned about data, statistics, maths, and understanding the problem (rather than the solution). I don't care about code. Once I get my data understanding right, then comes the time to turn all of that into software that can be used. Before that, Jupyter really gives a productivity boost.

For the code part, yep, you need other principles where Jupyter may not be suitable.


It's interesting, I never feel like I get these exploratory benefits from Jupyter notebooks. I just end up feeling like one hand and half my brain are tied behind my back. I'm most productive iterating in a similar way to what you describe, but in an IPython terminal, running a script and libraries that I'm iterating on in a real editor. If there are expensive computations that I want to checkpoint, I just save and load them as pickle files.
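
The checkpointing is nothing fancy, just the standard library; a minimal sketch of what I mean (file and variable names are made up):

    import pickle

    # stand-in for an expensive computation I only want to run once
    features = [x ** 2 for x in range(1_000_000)]

    # checkpoint the result to disk...
    with open("features.pkl", "wb") as f:
        pickle.dump(features, f)

    # ...then in a later IPython session, load it back instead of recomputing
    with open("features.pkl", "rb") as f:
        features = pickle.load(f)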


I have to say I think a jupyter notebook format is a 10x improvement in productivity over ipython. It's just so much easier to work with - and a step more reproducible too, my scribbles are all there saved in the notebook, at least!


Really interesting. I may have overlooked IPython a bit (I just thought Jupyter was its improved version). For the moment, maybe like you, I preprocess the data (which takes minutes) into numpy arrays which then take seconds to load. But once I add imports, everything takes about 5 or 6 seconds to load everything I need. So Jupyter remains a good idea. Moreover, I love (and actually need) to mix math and text, so markdown + LaTeX maths is really a great combo. I don't know if one can do that in IPython, I'll be sure to look!
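
For reference, the caching is just np.save / np.load, roughly like this (the paths and sizes are made up):

    import numpy as np

    # stand-in for the slow preprocessing step (parsing raw files, cleaning, etc.)
    data = np.random.rand(1_000_000, 10)

    # cache the arrays; np.load brings them back in seconds
    np.save("preprocessed.npy", data)
    data = np.load("preprocessed.npy")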


I've programmed in a number of languages over the past 40+ years, starting with BASIC, and every one of them encourages sloppy coding. The good discipline always has to be taught, learned, and willingly practiced. The closest I came to a language designed for teaching good practices was Pascal.

I find it easier to read and understand bad code written in Python, than good code written in the C family languages.


Yes, what I was saying was that writing Python code in files is a better and more educational way to program than writing Python code in a Jupyter Notebook. It wasn't a criticism of Python.


I use Jupyter a lot, but have a personal rule to do "restart kernel and run all cells" once in a while, to scare up any kind of hidden state or out-of-order execution problems. For instance, if I'm about to leave a notebook for a while, I'll make sure it runs without error from top to bottom.

In that sense, I'm making it work like Python code in a file. The advantage of code in files is that I can use all of the slick code analysis tools that will warn me about my mistakes. I wish there were something that would let those tools go through the code in a Python notebook from top to bottom.
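
One workaround I've seen is to flatten the notebook into an ordinary script first and point the linters at that, for example with jupytext plus flake8 (a sketch, assuming a notebook named analysis.ipynb; tools like nbqa can apparently do something similar directly on the .ipynb):

    import subprocess
    import jupytext

    # read the notebook and write its cells, top to bottom, as a plain .py file
    nb = jupytext.read("analysis.ipynb")
    jupytext.write(nb, "analysis.py", fmt="py:percent")

    # now the usual static analysis tools can go through the code in order
    subprocess.run(["flake8", "analysis.py"])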


papermill is good and ploomber is a thing to watch.

Ploomber makes it systematic: store notebooks as .py (py:percent files, for example), parameterize them with papermill, and execute them as a batch job. One can view the resulting Jupyter notebooks as .ipynb later and produce HTML reports if wanted. It's really good already, and will be better if Ploomber gets more development.

The whole reason it works is because it's easy to open the .py notebook and work on it, interactively, in jupyter.

The main idea - jupytext for .py notebooks and papermill for parameters & execution - that's already "stable" and easy for anyone to use for their own purposes.
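
To make that concrete, the jupytext + papermill side looks roughly like this (paths and parameter names invented for illustration):

    import jupytext
    import papermill as pm

    # the notebook lives in version control as a py:percent file;
    # convert it to .ipynb just before execution
    nb = jupytext.read("templates/train.py")
    jupytext.write(nb, "templates/train.ipynb")

    # papermill injects the parameters and runs the cells top to bottom;
    # the executed copy, with all outputs, goes to a separate .ipynb for review
    pm.execute_notebook(
        "templates/train.ipynb",
        "runs/train_2021_q1.ipynb",
        parameters={"data_path": "data/2021_q1.csv", "n_estimators": 200},
    )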


(ploomber maintainer here)

Any feedback for us? What can we do to improve Ploomber?


Maybe I haven't come far enough with my ploomber use to tell yet! It works nicely but I know I'll learn more and open my eyes more as I go.

As a first impression, I eventually found meta.extract_upstream = False which I think is important. Reason: The code for each step should be a lego piece, a black box with inputs and outputs. That code should not itself hardcode what its predecessor in the pipeline is - you connect the pieces in pipeline.yaml. (extract_upstream = False is not by itself enough to solve this, since you also need to be able to rename inputs/outputs for a notebook to be fully reusable as a lego piece, but it's good enough for now.)
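
To illustrate the lego-piece idea, this is roughly the shape of pipeline.yaml I mean (task names and paths are made up, and I may be misremembering some field names, so check the Ploomber docs):

    meta:
      extract_upstream: false

    tasks:
      - source: notebooks/load.py
        product: output/load.ipynb

      - source: notebooks/clean.py
        product: output/clean.ipynb
        # the wiring between steps lives here, not inside the notebook's code
        upstream: [load]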

I also, for my own sanity, need to know more about how the Jupyter extension part works, and how it decides whether or not to load injected-parameters. But maybe I could learn that from the docs.

In general I want components that are easy to understand and plug together and less magic (but the whole jupyter ecosystem's source code feels this way to me unfortunately, lots of hard to follow abstractions passing things around). But it's developing rapidly and already very useful, thank you so much!


This is great feedback, thanks a lot!

I'll make sure we display "extract_upstream" more prominently in the docs; we've been getting this feedback a couple of times now :)

Re: the Jupyter extension: it injects the cell when the file you're opening is declared in the pipeline.yaml file. You can turn off the extension if you prefer.

Feel free to join our community, this feedback helps us make Ploomber better!

https://ploomber.io/community


Being able to hack out code to explore and experiment with data while not having to reload and reprocess data (thanks to that global mutable state!) saves a hell of a lot of time in the long run.


.. and get more new users each year than four generations of PCs combined


> matplotlib doesn't score highly on usability.

The one place matplotlib sucks is any kind of interactivity. But other than that, matplotlib has the best, most intuitive interface of all the Python plotting libraries I've tried. It's also one of the few libraries that doesn't rely on generating HTML for a web browser, which makes for a miserable workflow.

I still think Matlab's plotting is unmatched by open source options.


Matplotlib hides a lot of complexity, if you ask me. As soon as you do something in a different way than intended, you're off searching Stack Overflow for a post that did something similar to what you want. Then you tweak it a little and hope it works.


The charts that R produces are typically better looking. But...R. :(



