Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
Ruby in Jupyter Notebook (nbviewer.org)
181 points by Alifatisk on Aug 27, 2024 | hide | past | favorite | 55 comments


I love the idea. It's similar to what the Elixir folks have been working on with Livebook https://livebook.dev which seems somewhat more refined on the UI side + the benefits of distributed erlang/elixir (e.g. a livebook can interface with a live system and interact with the remote application/gpu etc).


Jupyter notebooks have been around for almost a decade now. The interesting piece here is adding Ruby as a language that can be used instead of the original Python. (There have been many other languages integrated with Jupyter as well.)


Jupyter means "Julia, Python, R", the three original languages that it was created for. Nowadays you can add support for many languages by using other kernels.

If you want to see the available kernels for each language you can check it here: https://github.com/jupyter/jupyter/wiki/Jupyter-kernels


It wasn’t ´created’ for the three languages at all.

Originally the whole project was ‘IPython’ and then ‘IPython Notebook’. It came out of Fernando Perez’s attempt to make interactive Python a better experience.

They renamed it Jupyter to reflect that it was no longer a pure Python project as kernels were written in even before it rebranded as Jupyter. Those three were the most popular at the time but by no means the only ones.

I used to work with one of the core team about the time they changed this!


Thanks for the correction!


Didn't know that. I always thought of Jupyter as originally a Python thing. Thanks for sharing that :)


Jupyter is a rebrand of iPython, so you were right the first time!


JuPytR aaaah!

TIL!


Ruby fits in here nicely then...


What is being presented at ElixirConf: https://x.com/josevalim/status/1828791276587934164

"What about using @livebookdev to spawn 64 machines with GPU on @flydotio, each machine fine-tuning BERT with different hyperparameters and graphing in realtime in 2-3 minutes?"


The Livebook UI is lovely. I'd like to see Jupyter get some love in this area.


It heavily inspired us for https://srcbook.com (a TypeScript notebook)


There is also jupyter_on_rails [0] which integrates both. Using it feels so good, I love how an app can suddently become a playground or a sandbox.

[0]: https://github.com/Yuki-Inoue/jupyter_on_rails


That's so bizzare yet cool!


This is a dead project. Everything in the SciRuby/sciruby-notebooks Github repository hasn't been updated in 8,9 years.


This page is talking about a project called iRuby, which I assume is unrelated or successor. The repository this page links had a 0.8.0 release exactly one month ago.

The same org does also have a sciruby project that hasn't been touched in 8 or 9 years like you say.


While the example-notebooks repo you've mentioned is not actively maintained it looks like the IRuby kernel used by Jupyter still is maintained: https://github.com/SciRuby/iruby


Is any of this usable on hosted GPU notebooks (e.g. paperspace)?

I'm currently diving into Machine Learning using Python + Scikit-learn, and I'd love to one day replace Python with Ruby. But looking at the current ML ecosystem I don't see that happening. Does anyone have experience building (Supervised / Unsupervised) models using something other than Python (including deployment)?


Since you wrote "using something other than Python" and not necessarily only Ruby, definitely look into Livebook and Elixir, and the whole ecosystem around it, including:

- https://github.com/elixir-nx/axon Nx-powered Neural Networks

- https://github.com/elixir-nx/nx Multi-dimensional arrays (tensors) and numerical definitions for Elixir

- https://github.com/elixir-nx/scholar Traditional machine learning on top of Nx

- https://github.com/elixir-nx/bumblebee Pre-trained Neural Network models in Axon (+ Models integration)

- https://github.com/elixir-explorer/explorer Series (one-dimensional) and dataframes (two-dimensional) for fast and elegant data exploration in Elixir

- https://fly.io/blog/rethinking-serverless-with-flame/ (for offloading large work to remote containers)

- https://www.youtube.com/watch?v=RABXu7zqnT0 InstructorEx

And of course Livebook (https://livebook.dev)

Old but interesting video https://www.youtube.com/watch?v=g3oyh3g1AtQ (Bumblebee: GPT2, Stable Diffusion, and more in Elixir)

A talk on the upcoming Elixir conference (https://2024.elixirconf.com/schedule/#schedules) is actually titled "Livebook in the cloud: GPUs and clustered workflows in seconds".


Very interesting, thanks for sharing! I didn't know Elixir was so invested in Machine Learning. I have a background in Erlang so I'll definitely dive in. Do you know of any individuals / companies that have had success with Elixir+ML in production?


You welcome. Afaik it has been a long-time bet done by José Valim (former Rubyist & author of e.g. Devise) and others, this is not something that is going away anytime.

Each library has been building on top of the previous libraries & abstractions (including transpiling Elixir instructions into GPU code, see "defn" etc).

Since you mention Erlang, there is even a Machine Learning Working Group at https://erlef.org/wg/machine-learning.

The most iconic & advertised case I'm aware of is the work done at Amplified https://www.amplified.ai ; this has been the topic of a Keynote at ElixirConf EU this year, which you can find at https://www.youtube.com/watch?v=5FlZHkc4Mq4.

I am also starting to use ML + Elixir in production and I'm aware of other individuals doing so.

I do not have a registry of companies doing so, but we're seeing more and more experienced ML practitioners mentioning they are coming from Python and willing to try something different (e.g. https://elixirforum.com/t/data-science-and-machine-learning-... and other posts on Elixir Forum).

Hope this helps!


I had looked into Elixir/Erlang + ML before, and I keep revisiting it for Python alternatives.

How developed is Elixir/Erlang in this area? What are the key advantages?


Perhaps not quite what you're looking for but Ankane does (a lot) of great work, e.g. https://github.com/ankane/torch.rb.


It's certainly possible, but the provider would need to add support for the kernel. That is, I'm pretty sure you can't install a new kernel in these environments yourself.


Nbviewer is just a service that lets you host notebooks, there is no computation there.

> Does anyone have experience building (Supervised / Unsupervised) models using something other than Python (including deployment)?

XGBoost/LightGBM have a C API and can be used from pretty much anything, deployment is not a problem. Practically building models is more about dealing with data, the ecosystem tends to revolve around Python and R for that reason.


R and Julia have had integration with Jupyter for a long time.

Even so, trying to avoid Python in the world of Jupyter will put you in a very tough spot. In general, doesn't matter how much you dislike it, there's no real way around it. You'll have to face it in some capacity whether you like it or not.


This renders better for me straight from Github than it does on nbviewer.org:

https://github.com/SciRuby/sciruby-notebooks/blob/master/get...


nbviewer has nothing to do with ruby specifically, this example links to a notebook using existing iruby kernel

> nbviewer is an open source project under the larger Project Jupyter initiative > along with other projects like Jupyter Notebook, JupyterLab, and JupyterHub.

https://nbviewer.org/faq#what-is-nbviewer


Does anyone have a way to use generative AI to generate Jupyter notebooks? I've tried with prompts but it chokes on the markup, and also I'm wondering if the markup wastes too much context for that to be a good idea anyways. Right now I just use Cursor or Claude, copy replies, and chop them up into blocks manually.


We've made an open source fork of Jupyter - kind of like Cursor but for Jupyter.

See GH: https://github.com/pretzelai/pretzelai/

You can install it with pip install pretzelai (in a new environment preferably) - then run it with pretzel lab. You can bring your own keys or use the default free (for now) AI server.

We also have a hosted version to make it easy to try it out: https://pretzelai.app

Would love to get your feedback!


Can it reliably generate interspersed blocks of markup and code with a single prompt?


Hmm do you mean you want to create multiple cells from a single prompt - some code cells, then some markdown cells, then some code cells and so on?

The sidebar can certainly produce code mixed with markdown but right now, we process the markdown and show visually.

https://imgur.com/a/bpYu8yN

The cell level Cmd + K shortcut only works on a given cell to create or edit code and fix errors. Just tested it and it generates markdown well (just start your prompt with "this is a markdown cell")

https://imgur.com/VuDciQN

In the sidebar/chat window, it should be trivial to not parse the markdown and just show it raw. I'll work on it. In the main notebook, it's a bit harder but we are planning to allow multi-cell insertions but it will probably take 2-3 weeks.


Yeah the golden goose for me personally is the ability to say "create a jupyter notebook about x topic" and have an LLM spit out interspersed markdown (w/ inline latex) and python cells. It would be really cool if the LLM was good at segmenting those chunks and drawing stuff/evaluating output at interesting points. Quick example to illustrate the idea:

https://imgur.com/04FUp9s

I find Cursor to be extremely good right up to that point - I can work with Jupyter via the VS code extension and quickly get mixed markdown like how you're describing now - but it cannot do the multi-cell output or intelligent segmenting described above. I currently split it apart myself from the big 'ol block of markdown output.


This is something we've experimented with and I know some other tools out there claim to do this, I've just found that there's a very simple issue with this: if the AI gets any step wrong, every subsequent step is wrong and then you have to review every bit of code/markdown bit by bit, and it ends up turning into more work than just doing the analysis step by step while guiding the AI. I'm optimistic that this will change over time as the AI gets better, but it's still quite fragile (although it demos really well...)


So if you had 3 markdown cells and 3 python cells, I would design the tool to pull all the content out of those cells and present it (sans all that ipynb markup, just contents, probably in markdown) to the model as the full context for every edit you want to make. So the tool would need to know how to transform a given notebook into a collection of markdown/python cells which it would present to the model to make edits. The model would need to return updated cells in the same format, and the tool would update the cells in the document (or just replace them entire with new cells from the response). I would be fine with this just blowing away all previous evaluation results.

Do you think that approach would work? Not sure if I'm misunderstanding the issue you're describing and I recognize it is likely much messier than I imagine.


This is something we're planning on doing - just generate a large bit of text with markdown text and code in the middle. This is actually how the newer models already generate code - with the only difference being there's only one code block.

Via the use of <thinking></thinking> blocks, it's pretty straightforward to get the the model to evaluate it's own work and plan the next steps (basically chain of thought) but then you can filter out the <thinking> block in the final output.

The last trick to making this actually work is to give the AI model evaluation power - make it be able to run certain inspection code to evaluate its decisions so far and feel that evaluation to the next set of steps.

Combining all of this, it's very possible to convert an AI chat into a multi-step markdown + code notebook that actually works.


I see, interesting. Hadn't come upon this use-case before but makes sense.

I've made a GitHub issue for this feature: https://github.com/pretzelai/pretzelai/issues/142

If you'd like to be updated when we have this feature in, please leave a comment on the issue. Alternatively, my email is in my bio - feel free to email me so that when we have this feature, we can send you an update!


Needs (2016)


This looks great! I’m already running Node as well as python in Jupyter notebooks so to see more languages being added is always great


Do you like using Node in Jupyter? Are there any downsides / problems that you find with it?

We're trying to build a TypeScript notebook and I'm very interested in what people's current tooling for this looks like today.


I like being able to use Node in Jupyter, but I can't say that how I'm using it feels as natural as it does with Python. I'm way stronger at python than node so it was pretty hard to get set up with it.

There's also probably a way to do it, but I haven't figured out how to use ESModule imports in Jupyter, so that's been a bit annoying also. It also hasn't crossed my mind until literally now, that one could write typescript in jupyter notebooks given that typescript needs to be compiled.

Do you have a link to your current work? I think I'd be interested


Imports is the thing that makes it useless. You cannot import stuff normally in the kernel..

Yes, my current project is https://github.com/srcbookdev/srcbook


Deno has support for Jupyter notebooks which is quite nice.

https://docs.deno.com/runtime/manual/tools/jupyter/


Cool maybe I'll take a look at that too


i thought there already was a ruby kernel.

are you the original developer or a new maintainer of iruby?


Rubyter notebook =)


Finally!!


[flagged]


I was talking with a coworker recently who was asking the question of why Ruby wasn’t the data science language of choice. I am still not sure the answer. Why not? Ruby seems like it would be great for math.


Python was the first programming language whose primary focus was not scientific applications, i.e. not MATLAB,Fortran or R, to have support for numeric arrays (matrices) via ndarray and then Numpy, while still being fairly approachable for the scientific community. If Travis Oliphant had picked Ruby for implementing ndarray, maybe would be the defacto language for data science.


Well maybe it should be. I was thinking the same about languages which are better suited for data wrangling. But to me, the reason python is the DS language shows how good evolution is and survival of the fittest. You and I would probably argue that a DS language should have strong static typing but in nature what prevailed was the opposite. A nimble, easy, no fuss, comedy oriented named language.


From my experience, python won out the research crowd exactly for the reasons you mentioned, but also because it is a bit of a swiss army knife platform. Python has plenty of libraries for doing pretty much anything these days. Computer Vision? Can do. ML? Can do. Dashboards? Can do. And the list goes on. This means that a researcher can do their research in python and then also do the implementation in the same language and platform they are familiar with.

If someone told me to use rust for doing any sort of interactive/iterative numerical research, I'd send them for the hills. And I love Rust. But when I want to quickly learn about the properties of some data, or try to use dataset A to predict dataset B, I don't give a rats ass about types, or type safety, or proper programming idioms. I just want to look at my data.

Then, if I convince myself that I have some sort of model that I want to use, I could theoretically reach for Rust, but generally I don't need to. I can just take the output of the research - more or less verbatim - and start using it in production. Often with minimal changes.

Yes, sometimes circumstances require to use something more performant, but these days you also have Numba or Nuitka to optimize some hot loops.

For the longest time I wanted Julia to become a success, because I felt that it combined in one language and platform the possibility to write easy no-fuss, scalable, performant and robust code. A language which could evolve a codebase from initial research analysis to strongly typed production code. Unfortunately it never seems to have gained traction, so we are left with the next best thing: python. It's not really good at anything (as a language), but it's the least bad out of everything out there and the libraries are excellent.


My only USE for a notebook would be to use with my Ruby code. I have zero use for Jupyter with Python.


Jupyter is a decent REPL.

I use it when I want to try a new language. Or when solving adventofcode.

If I had the desire to learn Ruby, i would probably start by downloading a docker image of jupyter with a Ruby kernel.


Jupyter Notebooks can be useful for more than small snippets of code.

A Ruby Jupyter kernel would be useful if you've got a Ruby codebase, and want to run a Notebook using some of that code.


Zooming out, this is the case in a lot of the programming world in general. Massive, massive duplication of effort so that each language community can do the same stuff as everyone else, but in their preferred language. There's what, 20-30 different general-purpose languages out there with a sizeable library ecosystem?

Now don't get me wrong, I love programming languages. It's all fun and interesting. But it's a bit absurd too.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: