Show HN: “Did You Mean?” for Python (github.com/dutc)
66 points by jamesdutc on Oct 25, 2014 | 10 comments



It niggles me that the library is called "Did you mean?" but the error message is "Maybe you meant?". Why is the error message not "Did you mean X?"


This is probably not a good idea. (dutc = Don't Use This Code)

It's horrible that it even works.

It's surprising how well it works when it does work.

It was a great opportunity to solidify some knowledge about low-level (implementation) details.


Since you mention thinking there must be a better way in hook.c, you can override the traceback printer:

https://docs.python.org/3.4/library/sys.html#sys.excepthook

I haven't thought through how to access the state needed to make a suggestion though.
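
For what it's worth, here is a minimal sketch of the excepthook idea. It assumes a newer interpreter (Python 3.10+), where `AttributeError` carries `name` and `obj` attributes; on older versions the hook just prints the normal traceback. The `suggest` helper and the hint wording are made up for illustration, and this is not the code from the library.

```python
import difflib
import sys
import traceback


def suggest(exc):
    """Return the closest attribute name for an AttributeError, or None."""
    if not isinstance(exc, AttributeError):
        return None
    # Assumption: Python 3.10+ fills in .name and .obj; older versions do not.
    name = getattr(exc, "name", None)
    obj = getattr(exc, "obj", None)
    if name is None or obj is None:
        return None
    matches = difflib.get_close_matches(name, dir(obj), n=1)
    return matches[0] if matches else None


def excepthook(exc_type, exc, tb):
    # Print the usual traceback first, then append a hint if we have one.
    traceback.print_exception(exc_type, exc, tb)
    hint = suggest(exc)
    if hint is not None:
        print(f"Maybe you meant: .{hint}", file=sys.stderr)


sys.excepthook = excepthook
```

The hook only runs for uncaught exceptions, so it helps at the top level of a script but does nothing for errors that are caught and handled.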


Yeah, that was one of the (less-bad) approaches I tried first:

https://gist.github.com/dutc/3f2c79048d95287be138

But you can see it's somewhat limited. And I was curious to actually hook an internal Python call!

By the way, that hook.c comment was wondering whether there might be a nicer (more nearly portable?) way to set up these assembly sections. Some of this stuff was hacked together, but I have a couple of other games I want to play with this gimmick, and I want to make the hooking a bit better.

https://github.com/dutc/libhook


Someone on GitHub asked "why shouldn't this be used":

https://github.com/dutc/didyoumean/issues/1

Here's the answer I gave:

It's not a particularly useful feature in practice. Misspellings resulting in AttributeErrors are generally caught pretty quickly, and this addition to standard error reporting would mostly be useful in interactive settings where the mistake would be obvious.

(Note that linting tools like PyFlakes already do a fairly good job of picking up on NameErrors, which result from using variables that don't exist.)

I have two other approaches to supplement error reporting with spelling suggestions (https://gist.github.com/dutc/3f2c79048d95287be138) that are a little less ‘janky.’

Here is what makes this approach particularly offensive:

- I implement this by hooking into a C function in the Python interpreter itself, `PyObject_GetAttr`: https://github.com/dutc/didyoumean/blob/2c9f01d03c1574f93155...

- I find the function, unprotect its memory page, then clobber the first few assembly instructions with a jump: https://github.com/dutc/didyoumean/blob/2c9f01d03c1574f93155...

- I need to jump to an absolute address, since I don't want to (or can't?) calculate the relative addresses. I don't believe I can do this with a `push` and a `ret` or with a regular `call`, so I use a `jmp` instruction. The `jmp` instruction won't take an absolute address as an immediate value, so I have to use the `%rax` register: https://github.com/dutc/didyoumean/blob/2c9f01d03c1574f93155...

- Since I'm using this register, I have to save & restore its value. In the hooking code, I save its value with a `push %rax`. In order to restore its value, I have to patch the assembly for the hook function to stick a `pop %rax` before any other instructions: https://github.com/dutc/didyoumean/blob/2c9f01d03c1574f93155...

- In order to figure out the candidates for the spelling correction, I need to call `dir` on the object. But `dir` calls `PyObject_GetAttr` internally, and those calls can themselves trigger exceptions. In order to avoid this unbounded recursion, I have to implement a parallel code-path for `dir` by creating a `safe_PyObject_Dir`: https://github.com/dutc/didyoumean/blob/2c9f01d03c1574f93155...

- In some builds of Python, the internal CFunction which provides the Python builtin function `getattr()`, `builtin_getattr`, is compiled without explicit calls to `PyObject_GetAttr`. In order to hook into these `getattr` calls, I need to patch the builtin module. But because someone could have already gotten a handle on `getattr`, I need to directly patch the function: https://github.com/dutc/didyoumean/blob/2c9f01d03c1574f93155...
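
The "someone could have already gotten a handle on `getattr`" problem in the last bullet can be shown with a pure-Python analogue. This is only a sketch of why rebinding the name is not enough; it does not patch the C function the way the library does, and `noisy_getattr`, `early_handle`, and `calls` are hypothetical names.

```python
import builtins

original_getattr = builtins.getattr
early_handle = getattr  # a reference taken before any patching happens

calls = []


def noisy_getattr(*args):
    # Record the attribute name, then defer to the real builtin.
    calls.append(args[1])
    return original_getattr(*args)


builtins.getattr = noisy_getattr
try:
    builtins.getattr("abc", "upper")  # goes through the replacement
    early_handle("abc", "lower")      # still the original; never recorded
finally:
    builtins.getattr = original_getattr  # always undo the rebinding
```

After this runs, `calls` holds only `"upper"`: the earlier reference bypasses the replacement entirely, which is the gap that patching the function object itself closes.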

This approach is probably not portable, and it's definitely not a good idea.

However, figuring out all these small problems was a great way to put some low-level knowledge to use!


"In order to avoid this unbounded recursion, I have to implement a parallel code-path for `dir` by creating a `safe_PyObject_Dir`"

I thought this was intended to not be used. Why, then, don't you do the (in)sane thing and set a global flag "I am calling `dir`; ignore calls to `PyObject_GetAttr`, please", and check that flag in your patch? (If you aren't sure whether a single global flag will do, use a thread-local one, but I think/guess that is overkill, given Python's GIL.)

And, by the way, this is how extensions in Mac OS pre Mac OS X did their magic. It got really fun when multiple extensions tried to patch up the same OS call, or when the OS would unpatch your patches. For some OS calls, that would happen when the Finder launched, for others whenever an application launched.


Setting a flag is another approach. There's already an extension mechanism for storing arbitrary per-thread state in a dictionary object, accessible via `PyThreadState_GetDict`.

https://hg.python.org/cpython/file/1d708436831a/Python/pysta...

However, implementing the parallel code-path turned out to be easier (just a bit of cut & paste) and a bit saner to debug. `PyObject_Dir` tries to look up a few attributes on the object (`__dir__`, `__bases__`, &c.) which may not exist. There's a high likelihood of multiple AttributeErrors each time through.

Any documentation on Mac OS patches? Would be very curious to learn more about how these worked out in practice!


That old MacOS stuff is hard to find on the Internet. One thing I did find that gets a bit close is http://www.mactech.com/articles/develop/issue_16/Radcliffe_f.... Problem is that it likely is a bit hard to follow if you do not know about the stuff. Moreover, it doesn't really show patching itself.

I am fairly sure there is some article about patching in those archives of Develop, but I couldn't find it.

Wikipedia has little, too. http://en.m.wikipedia.org/wiki/INIT may provide some starting points.


Didn't I see a question on Stack Overflow yesterday about how to get the object when an AttributeError is thrown?




