Ghostwriter – use the reMarkable2 as an interface to vision-LLMs (github.com/awwaiid)
211 points by wonger_ 83 days ago | 80 comments



Project author here -- happy to elaborate on anything; a continuous WIP project. The biggest insight has been the limitations of vision models in spatial awareness -- see https://github.com/awwaiid/ghostwriter/blob/main/evaluation_... for some sketchy examples of my rudimentary eval.

Next top things:

* Continue to build/extract into a yaml+shellscript agentic framework/tool

* Continue exploring pre-segmenting or other methods of spatial awareness

* Write a reSvg backend that sends actual pen-strokes instead of lots of dots


Wow! This is really cool! Really really cool! I imagine some sort of use where it's even more collaborative and not just "unadorned turn-by-turn".

For example, maybe I'm taking notes involving words, simple math, and a diagram. Underline a key phrase and "the device" expands on the phrase in the margin. Maybe the device is diagramming, and I interrupt and correct it, crossing out some parts, and it understands and alters.

Sorry, I know this is vague, I don't know precisely what I mean, but I do think that the combination of text (via some sort of handwriting recognition), stroke gestures, and a small iconography language with things enabled by LLMs probably opens up all sorts of new user interaction paradigms that I (and others) might be too set in our ways to think of immediately.

I think there's a "mother of all demos" moment potentially coming soon with stuff like this, but I am NOT a UX designer and can't quite imagine it clearly enough. Maybe you can.


Yes! I have flashbacks to productive times standing in front of a whiteboard, alone or with others, doodling out thoughts and annotating them. When working with others I can usually talk to them, so we are also discussing as we are drawing and annotating. But also I've handed diagrams / equations to someone and then later they hand me back an annotated version -- that's interesting too.


This is a really cool effect. How do you envision this being used?

Thinking about it as a product, I’d want a way to easily slip in and out of “LLM please respond” so it wasn’t constantly trying to write back the moment I stopped the stylus - maybe I’d want a while to sketch and think, then restart a conversation. Or maybe for certain pages to be LLM-enabled, and others not.

Does it require any sort of jailbreak to get SSH access to the device?


The reMarkable comes with root SSH out of the box, so installation here is scp'ing a Rust-built binary over, and then ssh'ing in and running it. I haven't wrapped it in a startup-on-boot service yet.

It is triggered right now by finger-tapping in the upper-right corner, so you can ask it to respond to the current contents of the screen on-demand. I think it would be cool to have another out-of-band communication, like voice, but this device has no microphone.

Also right now it is one-shot, but on my long long TODO list is a second trigger that would _continue_ a back and forth multi-screenshot (like multi-page even) conversation.


Ah great, I will definitely give this a try later then, thanks!

I’m curious if this is becoming something that you are using in your own day-to-day, or if your focus right now is on building it?

The context for my question is just a general interest in the transition to AI-enabled workflows. I know that I could be much more productive if I figured out how to integrate AI assistance into my workflows better.


Only building so far.

The one use-case that is _close_ to ready-for-useful: I often take business meeting notes. In these notes I often write a T in a circle to indicate a TODO item. I am going to add a bit of config in there, basically "If you see a circle-T, then go add that to my todo list if it isn't already there. If you see a crossed-out circle-T then go mark it as done on the todo list."

I got slightly distracted implementing this, working instead toward a pluggable "if you see X call X.sh" interface. Almost there though :)
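The "if you see X call X.sh" interface isn't finished yet, but a minimal sketch of such a dispatcher might look like the following (the hooks/ directory layout, the script names, and passing the recognized text as $1 are all my assumptions for illustration, not the project's actual design):

```python
import subprocess
from pathlib import Path
from typing import Optional

# Hypothetical layout: hooks/circle-t.sh, hooks/circle-t-crossed.sh, ...
HOOKS_DIR = Path("hooks")

def dispatch(symbol: str, note_text: str) -> Optional[str]:
    """Run the hook script matching a symbol the vision model reported.

    Returns the hook's stdout, or None when no hook is installed for it.
    """
    hook = HOOKS_DIR / f"{symbol}.sh"
    if not hook.exists():
        return None
    result = subprocess.run(
        ["sh", str(hook), note_text],  # pass the recognized note text as $1
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```

So a `hooks/circle-t.sh` that calls your todo CLI would fire whenever the model reports a circle-T, and a `hooks/circle-t-crossed.sh` could mark the item done.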


This is so cool! I love to see people hacking together apps for the reMarkable tablet

I made a little app for reMarkable too and I shared it here some time back: https://digest.ferrucc.io/


It's so great seeing these, it always makes me want to play with developing apps for the Remarkable 2. Do you have any sources you can recommend? Thank you!

edit: found the official developer website https://developer.remarkable.com/documentation


https://github.com/reHackable/awesome-reMarkable is a great resource to get other resources, including getting onto the discord if you want some interactive conversations.


IMO the easiest way to play around is to use the reverse engineered APIs

https://github.com/erikbrinkman/rmapi-js


Much appreciated :+1:


That’s awesome! Love seeing the reMarkable get more functionality through creative hacks. Just checked out your app—what was the biggest challenge you faced while developing for the reMarkable?


I think the thing I really didn't like was the lack of an OAuth-like flow with fine-grained permissions.

Basically authentication with devices is "all-access" or "no-access". I would've liked it if a "write-only" or "add-only" API permission scope existed.


Blocked for AI reply @dang


Good catch, the last few pages of comment history are inhumanly insincere.

https://news.ycombinator.com/threads?id=memorydial

" @dang " isn't a thing, he doesn't watch for it - take credit and email him direct.


Do you have proof this is true?


I might be biased because memorydial was complimentary to me ... but they SEEM like a human! Also I'm not all that opposed to robot participation in the scheme of things. Especially if they are nice to me or give good ideas :)


Ha, thanks for having my back! I genuinely love your project. I have been toying with getting either a Boox or a reMarkable for ages.


Well you're human, you took the bait :-)

FWIW I mostly read HN at its deadest time (I'm GMT+8 local time) and I see a lot of mechanical turk comments, especially from new (green-coloured) accounts.

I always look for a response (eg: yours) before flagging them as spam bots . . .


Ha I guess when I stay up very late -8 overlaps with +8!


He has commented on this.

Retrieval is tricky as Algolia doesn't index '@' symbols:

https://hn.algolia.com/?query=%40dang%20by%3Adang&sort=byDat...


Most people don't correctly use an em-dash as distinct from a hyphen. That jumps out at me. :)


This is awkward—I use em-dash all the time on HN! I'm not an LLM (as far as I know); I just like to write neatly when I'm able to, and it's very low friction when you're familiar with your keyboard compose sequences[0]. It's a trivial four keypresses,

    AltR(hold) - - -
(The discoverability of these functions is way too low on GNOME/Linux; I really dislike the direction of modern UX, with its fake simplicity and infantilization of users. Way more people would be using —'s and friends if they were easily discoverable and prominently hinted in their UX. "It's documented in x.org man pages" is an unacceptable state of affairs for a core GUI workflow.)

[0] https://news.ycombinator.com/item?id=35118338#35118598 (On "Punctuation Matters: How to use the en dash, em dash and hyphen" (2023); 356 comments)


Never knew about the em dash thing, I was just using an AI writing assistant to help fix my shitty grammar and formatting. I think in the future I'll stick with bad formatting.


no, just l–AI–zy copy-pasta. your book looks great! putting on your chat with lex now.


no, just lazily and stupidly used an AI writing assistant


Me too! :)


I wish the remarkable tablets weren't so locked down.

It's one of my favorite pieces of hardware and wish there were more apps for it.


Locked down? You can get a shell by ssh'ing to it. Call me when an iPad lets you do that...


I agree, I definitely wouldn't call them "locked down." I do however think they could do a lot more to make it usable/hackable. This slightly undermines their cloud service ambitions, but I think the hackability is what makes the Remarkable so ... well ... remarkable. Certainly that's why I bought one!


Awesome.

I wanted to try to implement this for months. You did a really good job.


Thank you! Still a WIP, but a very fun learning / inspiration project. Got a bit of Rust jammed in there, a bit of device constraint dancing, a bit of multiple LLM API normalization, a bit of spatial vision LLM education, etc.


At some point I wanted to turn goMarkableStream into an MCP server (Model Context Protocol). I could get the screen, but without a “hack” I couldn’t write the response back.


The trick here is to inject events as if they came from the user. The virtual-keyboard works really reliably, you can see it over at https://github.com/awwaiid/ghostwriter/blob/main/src/keyboar... . It is the equivalent of plugging in the reMarkable type-folio.
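For the curious, injecting a keypress boils down to writing Linux `input_event` structs to the virtual keyboard device. A rough Python sketch of the packing step (the 16-byte 32-bit-ARM struct layout and the device node path are assumptions; the real implementation is the Rust in keyboard.rs):

```python
import struct
import time

# Linux input_event on 32-bit ARM: struct timeval (2 x 4-byte longs),
# then type (u16), code (u16), value (s32) -- 16 bytes per event.
EVENT_FORMAT = "<llHHi"  # assumption: 32-bit userland, as on the reMarkable 2
EV_SYN, EV_KEY, SYN_REPORT = 0x00, 0x01, 0

def key_events(keycode: int) -> bytes:
    """Pack a press+release for one key, each followed by a SYN_REPORT."""
    now = time.time()
    sec, usec = int(now), int((now % 1) * 1_000_000)
    out = b""
    for value in (1, 0):  # 1 = press, 0 = release
        out += struct.pack(EVENT_FORMAT, sec, usec, EV_KEY, keycode, value)
        out += struct.pack(EVENT_FORMAT, sec, usec, EV_SYN, SYN_REPORT, 0)
    return out

# On-device you would then write the bytes to the virtual keyboard node, e.g.:
#   with open("/dev/input/event9", "wb") as dev:  # hypothetical node
#       dev.write(key_events(30))                 # 30 == KEY_A
```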

The main limitation is that the reMarkable drawing app is very, very minimal: it doesn't let you place text at arbitrary screen locations, and instead offers sort of a weird overlay text area spanning the entire screen.


This is so cool. I’m going to try it this weekend.

I’ve been playing with the idea of auto creating tasks when I write todos by emailing the PDF and sending it to an LLM.

This just opened up a whole realm of better ways to accomplish that goal in realtime.


This worked pretty well when I did a proof of concept with Claude and the rMPP a couple of months ago. It even handles scheduling fuzzy times ("I want to do this sometime but I don't have any real time I want to do it, pick a time that doesn't conflict with my actually scheduled tasks"), all with minimal prompting. I just didn't have a decent workflow and did exactly what you considered: emailed the PDF. I should probably revisit this, but I haven't had the inclination since I just ignored the tasks anyway lol
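The "generate an ical" step is simple enough to sketch. Here's a minimal, hand-rolled VEVENT builder; the field choices, PRODID, and UID scheme are my own placeholders, not what the Claude workflow actually emitted:

```python
from datetime import datetime, timedelta

def todo_to_ics(summary: str, start: datetime, minutes: int = 30) -> str:
    """Render one extracted todo as a minimal iCalendar VEVENT."""
    fmt = "%Y%m%dT%H%M%S"
    end = start + timedelta(minutes=minutes)
    # iCalendar lines are CRLF-separated per RFC 5545
    return "\r\n".join([
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//handwritten-todo-sketch//EN",  # hypothetical identifier
        "BEGIN:VEVENT",
        f"UID:{start.strftime(fmt)}-todo@example.invalid",
        f"DTSTART:{start.strftime(fmt)}",
        f"DTEND:{end.strftime(fmt)}",
        f"SUMMARY:{summary}",
        "END:VEVENT",
        "END:VCALENDAR",
    ])
```

Most calendar apps will import a file like this directly, which is all the clunky email-the-PDF workflow needs on the far end.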


Ha, automating the doing of the task is the next step.


Let me know if you need any help, I think only one other person has tried to get this working. I'm over on the reMarkable discord server, https://discord.gg/u3P9sDW (linked from https://github.com/reHackable/awesome-reMarkable)

Rust binary so should be easy to install. In theory :)


Will do! My wife and I love Harry Potter so I’m motivated to show her my investment in the tablet actually got me Tom Riddle’s diary.

I don’t use discord much but I’ll find you somewhere around here!


I'm on at awwaiid@gmail.com and probably other places :)

"proof" to partner of tablet investment value based on interactive fiction conversation == excellent strategy and nothing could go wrong


How about this on android driven Onyx Boox ereaders? Would it be possible?


The limitations for the reMarkable made it so that I took a screenshot and then injected input events to interact with the proprietary drawing app. Cross-app screenshots with the right permission are probably possible on Android, I'm not sure about injecting the drawing events.

The other way to go would be to make a specific app. I just picked up an Apple Pencil and am thinking of porting the concepts to a web app which so far works surprisingly well ... but for a real solution it'd be better for this Agent to interact with existing apps.


This is a brilliant use case—handwriting input combined with LLMs makes for a much more natural workflow. I wonder how well it handles messy handwriting and if fine-tuning on personal notes would improve recognition over time.


I did this a few months ago with the Remarkable Paper Pro and Claude. It worked quite well even though my handwriting is pretty terrible, and I even had a clunky workflow where I could just write down stuff I wanted to do, and roughly (or specifically) when I wanted to do it, and it was able to generate an ical I could load into my calendar.


Generally if I can read my handwriting then it can! It has no issues with that. Really the problem is more in spatial awareness -- it can't reliably draw an X in a box, let alone play tic-tac-toe or dots-and-boxes.


Love this! There are some vector diffusion models out there; why not use tool calling to outsource to one of those if the model decides to draw something? Then it could specify coordinate range and the prompt.


Two reasons. One, because I haven't gotten to it yet. Two... er no just the one reason! Do you have a particular one, ideally with a hosted API, that you recommend?


I recall flux.ai had a couple models — a quick google search turned up these guys: https://github.com/ximinng/SVGDreamer

I’ve been working on a different angle - in-place updating of PDFs on the Remarkable - so it’s cool to see what you’re working on. Thanks for sharing it.


For PDF paper readers, is the Remarkable’s 11” size sufficient? I have the Sony DPT 2nd version at 13”, and it’s a perfect viewing experience. But projects like this keep drawing me to the Remarkable product.


I have used the Remarkable 2 for papers, but it is slightly too small to read text comfortably. I’m also an active reader, so I miss the color highlighting. Annotations are excellent. For now, I’m sticking to reviewing papers in the Zotero application on my iPad.


I got the reMarkable Pro tablet recently and as a result was able to move on from my Sony DPT-S1 and reMarkable 2. The latter was nice for its hackability, but the Pro's larger screen and color functionality have made it a great replacement.


It’s barely usable for PDFs


Depends mostly on the font size in the PDF. For dense PDFs I agree, it's barely usable. For most PDFs though I'd call it "acceptable." If you have control over the font size (such as when you're converting some source material to PDF) you can make it an excellent reading experience IMHO.


So close. The advertised diagonal screen for the reMarkable Pro is 11.8". The DPT-RP1 is advertised as 13.3" (my unit measures 13.125"). Hopefully in the future reMarkable will make a full-size unit. As mobile phone, tablet, laptop, and monitor sales indicate, a larger screen is an important buying factor.


I own a Boox tablet (a full-fledged Android tablet with an eink screen), and this sort of thing would be perfect for it. I wonder if in 5 years mobile hardware will support something like that locally!


Really cool. Would this run on the remarkable paper pro too?


Buy me one and I'll find out! hahahaha

But also -- the main thing that might be different is the screenshot algorithm. I'm over on the reMarkable discord; if you want to take up a bit of Rust and give it a go then I'd be happy to (slowly/async) help!


:) Thanks! Been looking into learning rust recently, so will keep that in mind if I get it off the ground.


Initially most of the Rust was written by Copilot or Sourcegraph's Cody; then I learned more and more Rust as I disagreed with the code-helper's taste and organization. Though I have a solid foundation in other programming languages, which accelerates the process ... it's still a weird way to learn a language that I'm getting used to and kinda like.

That said, I based the memory capture on https://github.com/cloudsftp/reSnap/tree/latest which is a shell script that slurps the framebuffer out of process-space device files. If you can find something like that which works on the rPP then I can blindly slap it in there and we can see what happens!
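For reference, the reSnap-style capture on the reMarkable 2 amounts to seeking to a known address in xochitl's memory and reading one raw frame. A hedged Python sketch (the panel geometry is the rM2's 1872x1404; the 1-byte-per-pixel depth and the offset-discovery details are my assumptions on top of what reSnap does):

```python
# reMarkable 2 panel geometry; depth is assumed 8-bit grayscale.
WIDTH, HEIGHT, BYTES_PER_PIXEL = 1872, 1404, 1
FRAME_SIZE = WIDTH * HEIGHT * BYTES_PER_PIXEL

def read_frame(mem_path: str, offset: int) -> bytes:
    """Read one raw frame from a memory-backed file at a known offset.

    On-device, mem_path would be /proc/<pid of xochitl>/mem and offset the
    framebuffer address recovered from that pid's /proc/<pid>/maps, which
    is the trick reSnap's shell script performs.
    """
    with open(mem_path, "rb") as mem:
        mem.seek(offset)
        data = mem.read(FRAME_SIZE)
    if len(data) != FRAME_SIZE:
        raise IOError("short read -- wrong offset or wrong pid?")
    return data
```

The raw bytes can then be wrapped as an image (e.g. a gray8 bitmap) and shipped to the vision model.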


> Things that worked at least once:

I like it.


Top quality modern AI Eval!!!


Kinda unrelated, but should I go for a Kobo or the reMarkable? I mostly want to read papers and maybe take notes. How do they compare in terms of hackability and freedom?


I wonder if this can be abstracted to accept interaction from a Daylight too.


@apple.com add to iPadOS Notes?


Harry Potter Half-Blood Prince vibes. Interesting just how much the medium changes the feeling of interacting with a chat model


erm, you mean harry potter tom riddle's horcrux diary, sure

you know, the diary that wrote back to you and possessed your soul? that cursed diary?


I wonder if it's better than the current version where my soul gets possessed by YouTube shorts for 40 minutes.


Now if only the LLM response font were some handwritten style.


This uses LLM Tools to pick between outputting an SVG or plugging in a virtual keyboard to type. The keyboard is much more reliable, and that's what you see in the screenshot.

If nothing else it could use an SVG font that has handwriting; you'd need to bundle that for rendering via reSVG or use some other technique.

But if I ever make a pen-backend to reSVG then it would be even cooler, you would be able to see it trace out the letters.


That's definitely pretty easy to achieve: just change the font settings to use a particular handwritten-style font [0].

[0] https://fonts.google.com/?categoryFilters=Calligraphy:%2FScr...


That would be next-level immersion! You could probably achieve this by rendering the LLM’s response using a handwritten font—maybe even train a model on your own handwriting to make it feel truly personal.


Script fonts don’t really look like handwriting - too regular.

But one of the early deep learning papers from Alex Graves does this really well with LSTMs - https://arxiv.org/abs/1308.0850

Implementation - https://www.calligrapher.ai/


ooo -- thanks for the link!


Like Apple Notes's Smart Script?


Actually if you figure that out please post it here!! I'd love to see that!


Exactly! There’s something about handwriting that makes it feel more personal—like scribbling notes in the margins of a spellbook. The shift from typing to pen input definitely changes the vibe of interacting with AI.


That's beside the point, but you are probably referring to Harry Potter and the Chamber of Secrets, not the Half-Blood Prince.


Not to distract from the project but if anyone is interested in eink tablets with LLMs, the ViWoods tablet might be of interest to you.


Is this a Remarkable rebrand? Even the UI looks the same!

edit: https://viwoods.com/ (based in Hong Kong)

edit 2:

It's a blatant copy of the Remarkable 2 for sure :/ LLM integration is interesting --> Remarkable are you listening?




