Debugging Lisp (2015)

phoe-krk · on Jan 7, 2021

Warning: a bit of a shameless plug.

I have recently given an interview[0] for Immutable Conversations where I show some of the techniques described in these blogposts. In the video, I inspect the state of the Lisp image and evaluate arbitrary code (redefining functions and variables) without leaving the debugger. Perhaps HN will find this interesting, as it is a livecoding demonstration of how a Lisp programmer might make use of these techniques in real-life scenarios.

The examples are perhaps a bit too trivial for people used to programming, but the secondary point of the video was to demonstrate the livecoding techniques to people who possibly do not know Lisp whatsoever, and I didn't want to burden them with complicated code examples. (The primary point was to describe the Common Lisp condition system, which I have written a book[1] about, and show the basics of control flow in Common Lisp that are the foundation of conditions.)

[0] https://www.youtube.com/watch?v=pkqQq2Hwt5o

[1] https://news.ycombinator.com/item?id=24867548

aidenn0 · on Jan 6, 2021

I now have a good place to link to when talking about lisp debugging being the killer feature that keeps me coming back to lisp as a language.

I usually get very confused responses when I say this, as people think "single step through the code" when I say "debugger" and I respond with something like "I'm pretty sure there's a way to step through the code in SLIME, but I never learned it because I've never had a need for that."

A quick skim through the series shows that the author feels the same way; there doesn't seem to be a mention of SLDB's stepper.

fiddlerwoaroof · on Jan 6, 2021

It's implementation dependent, but something like this works with sbcl:

    C-u C-c C-k ;; recompile with (debug 3)
    (step (some-function))

I also never do this

lokedhs · on Jan 7, 2021

I discovered the "s" key more or less by accident. Since I learned that you actually can single-step, I've used it a few times, but most of the time it's not necessary.

Compare this to another project I've been working on which is written in Kotlin. Almost every debugging session involves a lot of single-stepping.

There is no insightful point to this reply, except for agreeing that Lisp is powerful enough that single-stepping is mostly pointless. I do add a lot of (break) statements during debugging though.

bcrosby95 · on Jan 7, 2021

Realistically, how often do you need to redefine a class and care about having to restart your runtime?

lisper · on Jan 7, 2021

It can be incredibly handy if you are doing a computation with a lot of state. I'm currently working on a tool that does chip design, the state for which runs into hundreds of gigabytes. Being able to redefine a class without restarting can be an absolute life saver in a situation like that.

patrec · on Jan 7, 2021

Exactly. For exploratory programming with long running computations losing all your state because of something you could fix by hot-reloading a one line change can be rather frustrating -- I remember cursing at python years ago when I was running some ML experiments and more than once lost all data right at the end due to some silly bug at the serialization stage that would have been one line to fix.

Mind you, with python you can at least run things with ipython --pdb to get thrown into a debugger on failure so you can potentially serialize some state before losing it. And stacktraces are better than common lisp's. But having a robust way to redefine stuff or fixing up a failed computation is definitely very handy in some contexts.
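
To make the "serialize some state before losing it" idea concrete, here is a minimal sketch in plain Python, without a debugger. The names `flaky_step` and `run_experiment` and the checkpoint file name are made up for illustration; the only point is that partial results get pickled when a late step fails.

```python
import os
import pickle
import tempfile


def flaky_step(x):
    # Hypothetical stand-in for an expensive step that has a silly bug.
    if x == 3:
        raise ValueError("bug late in the run")
    return x * x


def run_experiment(inputs, checkpoint_path):
    """Run flaky_step over inputs; on failure, pickle what we have so far."""
    results = []
    try:
        for x in inputs:
            results.append(flaky_step(x))
    except Exception:
        # Save the partial state before the exception unwinds it away.
        with open(checkpoint_path, "wb") as f:
            pickle.dump(results, f)
        raise
    return results


if __name__ == "__main__":
    path = os.path.join(tempfile.gettempdir(), "partial_results.pkl")
    try:
        run_experiment([1, 2, 3, 4], path)
    except ValueError:
        with open(path, "rb") as f:
            print(pickle.load(f))  # [1, 4]
```

This only rescues data you thought to save in advance, which is the limitation compared with fixing the bug in the live image and carrying on.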

lisper · on Jan 7, 2021

> stacktraces are better than common lisp's

Stack traces are not standardized in Common Lisp, so this is a non-sensical statement. At worst you could say that Python's stack traces are better than some particular CL implementation, but not CL in general.

And almost certainly, if you don't like the way your CL presents stack traces, you can easily change it.

patrec · on Jan 7, 2021

> Stack traces are not standardized in Common Lisp, so this is a non-sensical statement.

Yes, someone could have secretly implemented a common lisp implementation that has the most ergonomic stacktraces in the whole wide world or, equally, a posix shell that runs numerical code a gazillion times faster than the equivalent C compiled with gcc -O3 -march=native -ffast-math because after all the relevant standards do not explicitly forbid it!

I'm not quite sure why lispers in particular are so in love with this argument.

There is nothing nonsensical in saying that shell is a truly terrible language to write high performance numerical code in, even if that is not true by some sort of logical necessity.

In practice most languages have a dominant implementation (even those with ISO standards) and a range of capabilities that existing (and likely future) implementations fall within. Both are far more important than what standards say (try compiling code with djb's standard conformant usage of errno some time).

p_l · on Jan 7, 2021

Can you give an example of Python stacktrace being better than CLs? It hasn't been true in my experience, so I'm wondering.

patrec · on Jan 7, 2021

A good example would take a fair amount of space, but let's try this bogus example:

This is ipython:

    In [2]: os.path.join(None)
    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    <ipython-input-2-ba05fdaae739> in <module>
    ----> 1 os.path.join(None)

    /opt/anaconda3/lib/python3.8/posixpath.py in join(a, *p)
         74     will be discarded.  An empty last part will result in a path that
         75     ends with a separator."""
    ---> 76     a = os.fspath(a)
         77     sep = _get_sep(a)
         78     path = a

    TypeError: expected str, bytes or os.PathLike object, not NoneType

This is SBCL:

    * (merge-pathnames nil)

    debugger invoked on a TYPE-ERROR in thread
    #<THREAD "main thread" RUNNING {1000560083}>:
      The value
        NIL
      is not of type
        (OR (VECTOR CHARACTER) (VECTOR NIL) BASE-STRING PATHNAME SYNONYM-STREAM
            FILE-STREAM)

      when binding PATHNAME

    Type HELP for debugger help, or (SB-EXT:EXIT) to exit from SBCL.

    restarts (invokable by number or by possibly-abbreviated name):
      0: [ABORT] Exit debugger, returning to top level.

    (MERGE-PATHNAMES NIL 70256781343884 MERGE-PATHNAMES) [external]
    0]

I find the first much quicker to read and parse (better layout, no SHOUTING, color coded, context info) and you can immediately see what file and location you'd need to "fix". What is an example where you prefer the lisp stacktrace to something you'd get in interactive development with ipython or in production with newrelic or anything else that captures python stacktraces?

_19qg · on Jan 7, 2021

The difference is that SBCL caught the error directly on entry to MERGE-PATHNAMES. On the call to MERGE-PATHNAMES, SBCL did a runtime type check. It knows the expected types for the arguments.

SBCL told you that the call to MERGE-PATHNAMES is already wrong. A backtrace will then only show higher-up code from the environment and the call to MERGE-PATHNAMES.

Your Python code went into the routine...

Often the Python backtrace will be easier to understand: it is source/line oriented, Python code usually does not go through extensive code transformations (-> Lisp macros), and an optimizing compiler like SBCL may make the code less debuggable (for example when using tail call optimization).

travv0 · on Jan 7, 2021

You didn't post CL's stacktrace.

patrec · on Jan 7, 2021

I did. There just isn't much of a callstack, because I call a single function with an invalid argument.

travv0 · on Jan 7, 2021

    * (merge-pathnames nil)

    debugger invoked on a TYPE-ERROR in thread
    #<THREAD "main thread" RUNNING {10010B0523}>:
    The value
        NIL
    is not of type
        (OR (VECTOR CHARACTER) (VECTOR NIL) BASE-STRING PATHNAME SYNONYM-STREAM
            FILE-STREAM)

    when binding PATHNAME

    Type HELP for debugger help, or (SB-EXT:EXIT) to exit from SBCL.

    restarts (invokable by number or by possibly-abbreviated name):
    0: [ABORT] Exit debugger, returning to top level.

    (MERGE-PATHNAMES NIL 4946604 MERGE-PATHNAMES) [external]
    0] :backtrace

    Backtrace for: #<SB-THREAD:THREAD "main thread" RUNNING {10010B0523}>
    0: (MERGE-PATHNAMES NIL 4946604 MERGE-PATHNAMES) [external]
    1: (SB-INT:SIMPLE-EVAL-IN-LEXENV (MERGE-PATHNAMES NIL) #<NULL-LEXENV>)
    2: (EVAL (MERGE-PATHNAMES NIL))
    3: (INTERACTIVE-EVAL (MERGE-PATHNAMES NIL) :EVAL NIL)
    4: (SB-IMPL::REPL-FUN NIL)
    5: ((FLET "LAMBDA0" :IN "SYS:SRC;CODE;TOPLEVEL.LISP"))
    6: (SB-IMPL::%WITH-REBOUND-IO-SYNTAX #<CLOSURE (FLET "LAMBDA0" :IN "SYS:SRC;CODE;TOPLEVEL.LISP") {96F7CB}>)
    7: (SB-IMPL::TOPLEVEL-REPL NIL)
    8: (SB-IMPL::TOPLEVEL-INIT)
    9: ((FLET SB-UNIX::BODY :IN SAVE-LISP-AND-DIE))
    10: ((FLET "WITHOUT-INTERRUPTS-BODY-7" :IN SAVE-LISP-AND-DIE))
    11: ((LABELS SB-IMPL::RESTART-LISP :IN SAVE-LISP-AND-DIE))
    12: ("foreign function: #x43270B")
    13: ("foreign function: #x403F08")

    0]

patrec · on Jan 7, 2021

Ugh o/c sorry, my bad, I should just have done it in slime to start with (esp since I've compared it to ipython); was just too lazy:

    The value
      NIL
    is not of type
      (OR (VECTOR CHARACTER) (VECTOR NIL) BASE-STRING PATHNAME
          SYNONYM-STREAM FILE-STREAM)
    
    when binding PATHNAME
       [Condition of type TYPE-ERROR]
    
    Restarts:
     0: [RETRY] Retry SLY mREPL evaluation request.
     1: [*ABORT] Return to SLY's top level.
     2: [ABORT] abort thread (#<THREAD "sly-channel-1-mrepl-remote-1" RUNNING {1004894CD3}>)
    
    Backtrace:
     0: (MERGE-PATHNAMES NIL 69988797066848 MERGE-PATHNAMES) [external]
     1: (SB-INT:SIMPLE-EVAL-IN-LEXENV (MERGE-PATHNAMES NIL) #<NULL-LEXENV>)
     2: (EVAL (MERGE-PATHNAMES NIL))
     3: ((LAMBDA NIL :IN SLYNK-MREPL::MREPL-EVAL-1))
     --more--

This is nicer than "raw" sbcl but I still have trouble seeing how anyone could prefer looking at common lisp backtraces (with the caveat that I only have used open source lisp implementations; I have no idea what allegro or lispworks are like).

However, as I wrote, common lisp is much nicer in some other respects (as you undoubtedly know). For a few other toy examples, let's say I do:

    (/ 1 (random 2))

This will cause DIVISION-BY-ZERO 50% of the time. But if that happens, one of the possible restarts (also seen above) is to just try the same thing again. I can try as many times as necessary to get (/ 1 1). Of course this is a silly example, but realistic cases are not hard to come up with: you forgot to copy a file to the right place or the disk is full and you need to make some space before retrying. Or you have a transient network failure etc. Similarly:

    (mapcar #'sine '(1 2 3))

The function sine does not exist, but one of the possible restarts allows me to supply something else instead:

    The function COMMON-LISP-USER::SINE is undefined.
       [Condition of type UNDEFINED-FUNCTION]

    Restarts:
     0: [CONTINUE] Retry using SINE.
     1: [USE-VALUE] Use specified function
    [...]

If I press 1 and then provide #'sin I'll get (0.84147096 0.9092974 0.14112). But the more fun thing to do is to just implement the missing function there and then. Whilst the debugger window stays active, I can just write my "sine" function in the editor or repl and then retry, e.g. writing (defun sine (x) (sin x)) will give the same result.

This is pretty cool, because it means you can start writing some top-down code, start running it, and incrementally fill in the missing functions you are calling but haven't yet defined, without ever losing your state.

travv0 · on Jan 7, 2021

One other nicety I'd add that you don't get from the Python stacktrace is the ability to inspect each frame to see the local bindings, and even restart evaluation from a previous frame. I agree that the Python stacktrace looks nicer on a surface level, but I'd argue that in practice SBCL's debugger is more helpful.

patrec · on Jan 8, 2021

> One other nicety I'd add that you don't get from the Python stacktrace is the ability to inspect each frame to see the local bindings

No, you do get that. Even in the plain python interpreter you can do import pdb; pdb.pm() after an error to do post mortem debugging and walk up and down the stackframes and inspect or manipulate local variables. In ipython that happens automatically (if you run with --pdb) or after you type `debug` after an exception. And tooling for production stacktraces normally also captures local variables.
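
As a rough illustration of what post-mortem tools have to work with, the sketch below walks a caught exception's traceback and collects each frame's local variables using only the standard `traceback` module. `parse_ratio` and `frame_locals` are made-up names for this example; pdb itself does considerably more than this.

```python
import traceback


def parse_ratio(text):
    # Hypothetical function that fails for inputs like "1/0".
    numerator, denominator = text.split("/")
    return int(numerator) / int(denominator)


def frame_locals(exc):
    """Return (function name, local variables) for each frame of exc's traceback."""
    return [
        (frame.f_code.co_name, dict(frame.f_locals))
        for frame, _lineno in traceback.walk_tb(exc.__traceback__)
    ]


if __name__ == "__main__":
    try:
        parse_ratio("1/0")
    except ZeroDivisionError as exc:
        # The last entry is the innermost frame, where the error was raised.
        name, local_vars = frame_locals(exc)[-1]
        print(name, local_vars)
```

The frames stay reachable only as long as the exception object is kept around, which is why the interactive route is to call pdb.pm() right after the failure.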

There are a bunch of additional niceties that Common Lisp has, such as turtles-most-of-the-way-down: you might eventually hit a foreign function call you cannot further inspect, but for most Common Lisp implementations almost everything is implemented in lisp. In python a significant proportion of what you call is implemented in C extensions, which are opaque to the built-in debugger, although you can make gdb work. Also remote debugging is much more natural in common lisp.

Generally the stuff that is better about the debugging experience in python is in a way more superficial and the stuff that's nicer in common lisp is much more fundamental, and yet, my experience is different than yours: the superficial stuff that python does well and common lisp does badly or simply less well matters more for overall productivity for most things I tend to do. This is even though the debugger-related stuff you can't do as well with common lisp amounts more or less to minor friction, whereas the stuff you can't do with python is really hard to work around if you need it.

I think this is an object lesson on focussing on the (right) low hanging fruit.

artemonster · on Jan 7, 2021

can you share what you're working on? is this open?

tgbugs · on Jan 7, 2021

Having done years of development in Python I can say, all the time. The thing is that in languages like Python you just learn that it cannot be done, or that if you do you are inviting in chaos and the creeping sanity destroying little unreproducible bugs. As a result the development workflows simply avoid ever doing that and pretend like they don't miss it. When you actually have the ability to do it, you do it all the time during development and on many other occasions. You don't need fancy multiple failover distributed systems in order to keep everything working while you take a node down, you just switch out the class like it was nothing.
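
A small illustration of why this is awkward in Python, using a made-up `Node` class: re-running a `class` statement creates a second class object, and instances that already exist keep pointing at the old one unless you patch the original class in place.

```python
class Node:
    def status(self):
        return "old"


node = Node()    # long-lived instance created before the edit
OldNode = Node   # keep a handle on the original class object


class Node:      # "redefining" really creates a second, unrelated class
    def status(self):
        return "new"


print(node.status())           # old: the instance still points at OldNode
print(isinstance(node, Node))  # False

# Patching the original class object in place does reach existing instances.
OldNode.status = Node.status
print(node.status())           # new
```

In Common Lisp, by contrast, re-evaluating a `defclass` form updates the existing class, and existing instances are brought up to date by the object system.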

fiddlerwoaroof · on Jan 7, 2021

In Common Lisp, classes are mostly specialization tags so you don't have to redefine them all that much. However, being able to redefine your functions and other code on the fly is amazing (i.e. make a request to a REST endpoint, get an exception in the debugger, modify and reload just a couple functions and then restart the request handler without ever going back to the browser/Postman and re-issuing the request)

fiddlerwoaroof · on Jan 7, 2021

Also, little story: once I had to migrate a large database from an old, schemaless version of a product to a new one that enforced schemas. Because the old version was schemaless, there was a whole host of minor schema issues. For various reasons the migration process already involved a Python script spawning an older version of Python and communicating with it over a pipe. I started running the migration and it failed 20 minutes in, so I fixed the issue and restarted the migration only for it to fail differently 20 minutes in. At this point I wrote some code to catch the exception, spawn a prompt that let me input the appropriate conversion code that got evaled and added to a dictionary of translators, along with a function to determine if the conversion applied. This way, every 20 minutes or so I’d come across a new form of corruption and add a translator to fix it without restarting the whole process.

When I learned CL, the condition/restart system and pervasive hot code reloading reminded me of my Python hack and how much easier it would have been in a language designed for this sort of programming.
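
The shape of that hack can be sketched roughly as follows. This is not the original script: the record layout, `convert_record`, `migrate`, and `scripted_prompt` are all invented for illustration, the translators live in a list rather than a dictionary, and the interactive prompt with eval is replaced by a callback so the example runs unattended.

```python
def convert_record(record, translators):
    # Apply every translator whose predicate matches, then the strict schema.
    for applies, convert in translators:
        if applies(record):
            record = convert(record)
    return {"id": int(record["id"]), "name": record["name"]}


def migrate(records, ask_for_translator):
    """Convert records, asking for a new translator whenever one fails.

    ask_for_translator(record, error) must return an (applies, convert) pair
    of functions. Note: a translator that does not fix the record would make
    this loop ask again forever.
    """
    translators = []  # (applies, convert) pairs learned so far
    migrated = []
    for record in records:
        while True:
            try:
                migrated.append(convert_record(record, translators))
                break
            except (KeyError, ValueError, TypeError) as error:
                translators.append(ask_for_translator(record, error))
    return migrated


def scripted_prompt(record, error):
    # Stand-in for the interactive prompt: fix records that use "ID" for "id".
    return (lambda r: "ID" in r,
            lambda r: {"id": r["ID"], "name": r["name"]})


if __name__ == "__main__":
    records = [{"id": "1", "name": "a"}, {"ID": "2", "name": "b"}]
    print(migrate(records, scripted_prompt))
    # [{'id': 1, 'name': 'a'}, {'id': 2, 'name': 'b'}]
```

The retry loop around a single record is what keeps the earlier work: only the failing record is re-attempted once a fix is registered.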

jonathankoren · on Jan 7, 2021

That was a killer feature for me. I had a long running job that would mysteriously crash after hitting an edge case after a couple of days of processing. I was able to set a breakpoint, fire the job off, come back in a couple of days to a stopped process, step through the busted function, patch it, then restart from one level up on the call stack, and continue the job. If this was another language, I would have spent much longer just figuring out what to log, and restarting the whole task while testing the job.

It was amazing.

Jach · on Jan 7, 2021

In Java, all the time, because your code is forced into classes, and restarting a program of any non-trivial size for every minor change sucks. The default experience isn't very good when it comes to hotswapping changes, you're basically limited to editing existing method bodies. JRebel eventually came on the scene to do a lot better and it's magical, it can even handle a bit of re-initialization for Spring beans or making changes in code based on changes in XML files, etc. It has by itself saved me days of waiting on app restarts, now multiply that by a couple thousand other devs in just one company and it's a no-brainer. (Or should be -- such logic taken seriously ought to have made interactive development championed by Lisp more popular decades ago instead of decades later having other languages with souped-up IDEs and plugins slowly rediscovering the benefits.)

But JRebel is still fundamentally an incomplete solution because of the way it handles (or fails to handle) existing instances. In practice you learn to live with it, designing for more static methods or short-lived objects so the next time a code path runs it'll be with new objects and new code, or you bite the restart bullet for the times when you have a long-lived object that you need to change and it's important that it uses the new code and data. Those times are typically rarer, admittedly, and so most of the benefit is captured by being able to redefine classes and methods (and functions in Lisp) more-so than updating existing instances, but it's great that Lisp supports the latter when you need it.

mikelevins · on Jan 7, 2021

All the time. Constantly. Often enough that when I was using a Scheme that compiles to the JVM, I wound up in extended conversations with the compiler's author about how to add such features to his compiler. It was possible to do, and he was interested in doing it, but it was really too much work to be practical.

The context matters, though. I've been working with Lisp systems (and occasionally with Smalltalk systems) for over thirty years, and the way I normally work is to start with some simple sketch of a model and some simple interactions with it, and build it up gradually, interactively. While my program runs, of course.

That means making a lot of incremental changes to the model as I go, which, in a class-based system, means a lot of changes to classes. And since I prefer to modify my work while it runs, that means a lot of cases of the environment saying, "hey, this definition changed; what do you want to do about it?"

If you typically work in an environment that doesn't support those kinds of incremental changes to a running system, then of course you're going to learn a way of working where such situations don't come up, because they can't.

EdwardCoffin · on Jan 7, 2021

I have done this to debug a difficult to reproduce bug, where restarting the runtime would have lost my hold on the occurrence.

yomly · on Jan 7, 2021

Isn't our entire telecom infrastructure powered by Erlang, which exploits hot reloading as a desirable property for that kind of mission critical infrastructure?

kleiba · on Jan 7, 2021

Defining functions and data structures is all that programming is, hence debugging is exactly redefining classes (and functions).

Your question about not needing to restart the runtime ("how often do you have to do this?") seems to imply that you expect the common answer to be "(almost) never" -- that expectation is not surprising since most languages don't even support this kind of debugging!

That's exactly the point of the author: to demonstrate a different way of debugging that's afforded by Lisp and to demonstrate that it can be incredibly powerful. It's just something that is perhaps unfamiliar to programmers who are more used to other programming languages where the debugging facilities are different.

pfdietz · on Jan 7, 2021

Every time I redefine a class, because I develop in Lisp by loading the existing version of something, then modifying it and recompiling/reloading the changed files.

cat199 · on Jan 7, 2021

as a counterpoint - why should you need to restart your runtime when your program encounters a bug and you could have fixed it dynamically?

phoe-krk · on Jan 7, 2021

For me, it's about ten to thirty times a day when I'm working on defining and redefining Lisp classes. And I lose absolutely no program state in the process.

scroot · on Jan 7, 2021

Smalltalk perspective: I work a lot with graphics (Morphic) and it is quite a regular activity to add or remove behaviors to custom Morph objects while they're out being displayed on my screen. It would be a tremendous pain in the ass to have to tear the world down and build it up again each time just to see a small change in my graphic element.