Project author here -- happy to elaborate on anything; this is a continuous WIP project. The biggest insight has been the limitations of vision models in spatial awareness -- see https://github.com/awwaiid/ghostwriter/blob/main/evaluation_... for some sketchy examples of my rudimentary eval.
Next top things:
* Continue to build/extract into a yaml+shellscript agentic framework/tool
* Continue exploring pre-segmenting or other methods of spatial awareness
* Write a reSvg backend that sends actual pen-strokes instead of lots of dots
Wow! This is really cool! Really really cool! I imagine some sort of use where it's even more collaborative and not just "unadorned turn-by-turn".
For example, maybe I'm taking notes involving words, simple math, and a diagram. Underline a key phrase and "the device" expands on the phrase in the margin. Maybe the device is diagramming, and I interrupt and correct it, crossing out some parts, and it understands and alters.
Sorry, I know this is vague, I don't know precisely what I mean, but I do think that the combination of text (via some sort of handwriting recognition), stroke gestures, and a small iconography language with things enabled by LLMs probably opens up all sorts of new user interaction paradigms that I (and others) might be too set in our ways to think of immediately.
I think there's a "mother of all demos" moment potentially coming soon with stuff like this, but I am NOT a UX designer and can't quite imagine it clearly enough. Maybe you can.
Yes! I have flashbacks to productive times standing in front of a whiteboard, alone or with others, doodling out thoughts and annotating them. When working with others I can usually talk to them, so we are also discussing as we are drawing and annotating. But also I've handed diagrams / equations to someone and then later they hand me back an annotated version -- that's interesting too.
This is a really cool effect. How do you envision this being used?
Thinking about it as a product, I’d want a way to easily slip in and out of “LLM please respond” so it wasn’t constantly trying to write back the moment I stopped the stylus - maybe I’d want a while to sketch and think, then restart a conversation. Or maybe for certain pages to be LLM-enabled, and others not.
Does it require any sort of jailbreak to get SSH access to the device?
The reMarkable comes with root-ssh out of the box, so installation here is scp'ing a rust-built binary over, and then ssh'ing and running it. I haven't wrapped it in a startup-on-boot service yet.
It is triggered right now by finger-tapping in the upper-right corner, so you can ask it to respond to the current contents of the screen on-demand. I think it would be cool to have another out-of-band communication, like voice, but this device has no microphone.
Also right now it is one-shot, but on my long long TODO list is a second trigger that would _continue_ a back and forth multi-screenshot (like multi-page even) conversation.
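For anyone curious, the trigger check itself is conceptually tiny -- something like the sketch below (the numbers are placeholders, not the device's real digitizer coordinates):

    // Sketch only -- width and corner size here are placeholders.
    const SCREEN_W: i32 = 1404;
    const CORNER: i32 = 100;

    fn is_trigger_tap(x: i32, y: i32) -> bool {
        // Upper-right corner: large x, small y
        x > SCREEN_W - CORNER && y < CORNER
    }

A second corner (or a different gesture) would be the natural home for that "continue the conversation" trigger.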
Ah great, I will definitely give this a try later then, thanks!
I’m curious if this is becoming something that you are using in your own day-to-day, or if your focus right now is on building it?
The context for my question is just a general interest in the transition to AI-enabled workflows. I know that I could be much more productive if I figured out how to integrate AI assistance into my workflows better.
The one use-case that is _close_ to ready-for-useful: I often take business meeting notes. In these notes I often write a T in a circle to indicate a TODO item. I am going to add a bit of config in there, basically "If you see a circle-T, then go add that to my todo list if it isn't already there. If you see a crossed-out circle-T then go mark it as done on the todo list".
I got slightly distracted implementing this, working instead toward a pluggable "if you see X call X.sh" interface. Almost there though :)
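To give a sense of the shape (purely a sketch -- the config format and names here are made up for illustration):

    use std::process::Command;

    // Hypothetical trigger entry: "when the vision model reports seeing
    // this symbol, run this shell script".
    struct Trigger {
        symbol: String, // e.g. "circle-T"
        script: String, // e.g. "todo-add.sh"
    }

    fn run_triggers(triggers: &[Trigger], seen: &[String]) -> std::io::Result<()> {
        for t in triggers {
            if seen.iter().any(|s| s == &t.symbol) {
                // Hand off to a plain shell script so new behaviors don't
                // require recompiling the Rust binary.
                Command::new("sh").arg(&t.script).status()?;
            }
        }
        Ok(())
    }

The yaml part would just be loading a list of those entries from a config file.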
It's so great seeing these; it always makes me want to play with developing apps for the Remarkable 2. Do you have any sources you can recommend? Thank you!
That’s awesome! Love seeing the reMarkable get more functionality through creative hacks. Just checked out your app—what was the biggest challenge you faced while developing for the reMarkable?
I might be biased because memorydial was complimentary to me ... but they SEEM like a human! Also I'm not all that opposed to robot participation in the scheme of things. Especially if they are nice to me or give good ideas :)
FWIW I mostly read HN at its deadest time (I'm GMT+8 local time) and I see a lot of mechanical turk comments, especially from new (green-coloured) accounts.
I always look for a response (eg: yours) before flagging them as spam bots . . .
This is awkward—I use em-dash all the time on HN! I'm not an LLM (as far as I know); I just like to write neatly when I'm able to, and it's very low friction when you're familiar with your keyboard compose sequences[0]. It's a trivial four keypresses,
AltR(hold) - - -
(The discoverability of these functions is way too low on GNOME/Linux; I really dislike the direction of modern UX, with its fake simplicity and infantilization of users. Way more people would be using —'s and friends if they were easily discoverable and prominently hinted in their UX. "It's documented in x.org man pages" is an unacceptable state of affairs for a core GUI workflow.)
Never knew about the em dash thing; I was just using an AI writing assistant to help fix my shitty grammar and formatting. I think in future I'll stick with bad formatting.
I agree I definitely wouldn't call them "locked down." I do however think they could do a lot more to make it usable/hackable. This slightly undermines their cloud service ambitions, but I think the hackability is what makes the Remarkable so ... well ... remarkable. Certainly that's why I bought one!
Thank you! Still a WIP, but a very fun learning / inspiration project. Got a bit of Rust jammed in there, a bit of device constraint dancing, a bit of multiple-LLM API normalization, a bit of spatial vision LLM education, etc.
At some point I wanted to turn goMarkableStream into an MCP server (Model Context Protocol).
I could get the screen, but without a “hack” I couldn’t write the response back.
The trick here is to inject events as if they came from the user. The virtual-keyboard works really reliably, you can see it over at https://github.com/awwaiid/ghostwriter/blob/main/src/keyboar... . It is the equivalent of plugging in the reMarkable type-folio.
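For the general shape of the technique, here's an illustrative sketch using the evdev crate's uinput support -- treat it as pseudocode, since crate APIs shift between versions and the real code is in src/keyboard.rs linked above:

    use evdev::uinput::VirtualDeviceBuilder;
    use evdev::{AttributeSet, EventType, InputEvent, Key};

    fn type_letter_a() -> std::io::Result<()> {
        let mut keys = AttributeSet::<Key>::new();
        keys.insert(Key::KEY_A);

        // To the OS (and to xochitl) this looks like plugging in a keyboard,
        // much like attaching the type-folio.
        let mut device = VirtualDeviceBuilder::new()?
            .name("virtual-keyboard-sketch")
            .with_keys(&keys)?
            .build()?;

        // Press and release 'a'.
        device.emit(&[
            InputEvent::new(EventType::KEY, Key::KEY_A.code(), 1),
            InputEvent::new(EventType::KEY, Key::KEY_A.code(), 0),
        ])?;
        Ok(())
    }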
The main limitation is that the reMarkable drawing app is very, very minimal: it doesn't let you place text at arbitrary screen locations; instead there's sort of a weird overlay text area spanning the entire screen.
This worked pretty well when I did a proof of concept with Claude and the rMPP a couple of months ago. It even handled scheduling fuzzy times ("I want to do this sometime but I don't have any real time I want to do it, pick a time that doesn't conflict with my actually scheduled tasks"). All with minimal prompting. I just didn't have a decent workflow and did exactly what you considered: emailed the PDF. I should probably revisit this, but I haven't had the inclination since I just ignored the tasks anyway lol
The limitations for the reMarkable made it so that I took a screenshot and then injected input events to interact with the proprietary drawing app. Cross-app screenshots with the right permission are probably possible on Android, I'm not sure about injecting the drawing events.
The other way to go would be to make a specific app. I just picked up an Apple Pencil and am thinking of porting the concepts to a web app which so far works surprisingly well ... but for a real solution it'd be better for this Agent to interact with existing apps.
This is a brilliant use case—handwriting input combined with LLMs makes for a much more natural workflow. I wonder how well it handles messy handwriting and if fine-tuning on personal notes would improve recognition over time.
I did this a few months ago with the Remarkable Paper Pro and Claude. It worked quite well despite my handwriting being pretty terrible, and I even had a clunky workflow where I could just write down stuff I wanted to do, and roughly (or specifically) when I wanted to do it, and it was able to generate an ical I could load into my calendar.
Generally if I can read my handwriting then it can! It has no issues with that. Really the problem is more in spatial awareness -- it can't reliably draw an X in a box, let alone play tic-tac-toe or dots-and-boxes.
Love this! There are some vector diffusion models out there; why not use tool calling to outsource to one of those if the model decides to draw something? Then it could specify coordinate range and the prompt.
Two reasons. One, because I haven't gotten to it yet. Two... er no just the one reason! Do you have a particular one, ideally with a hosted API, that you recommend?
I’ve been working on a different angle - in place updating of PDFs on the Remarkable, so it’s cool to see what you’re working on. Thanks for sharing it.
For PDF paper readers, is the Remarkable’s 11” size sufficient? I have the Sony DPT 2nd version at 13”, and it’s a perfect viewing experience. But projects like this keep drawing me to the Remarkable product.
I have used the Remarkable 2 for papers, but it is slightly too small to read text comfortably. I’m also an active reader, so I miss the color highlighting. Annotations are excellent. For now, I’m sticking to reviewing papers in the Zotero application on my iPad.
I got the reMarkable Pro tablet recently and as a result was able to move on from my Sony DPT-S1 and reMarkable 2. The latter was nice for its hackability, but the Pro's screen size, color functionality, and overall size have made it a great replacement.
Depends mostly on the font size in the PDF. For dense PDFs I agree, it's barely usable. For most PDFs though I'd call it "acceptable." If you have control over the font size (such as when you're converting some source material to PDF) you can make it an excellent reading experience IMHO.
So close. The advertised diagonal screen size for the reMarkable Pro is 11.8". The DPT-RP1 is advertised as 13.3" (my unit measures 13.125"). Hopefully in the future reMarkable will make a full-size unit. As mobile phone, tablet, laptop, and monitor sales indicate, a larger screen is an important buying factor.
I own a Boox tablet (a full-fledged Android tablet with an e-ink screen), and this sort of thing would be perfect for it. I wonder if in 5 years mobile hardware will support something like this locally!
But also -- the main thing that might be different is the screenshot algorithm. I'm over on the reMarkable discord; if you want to take up a bit of Rust and give it a go then I'd be happy to (slowly/async) help!
Initially most of the Rust was written by Copilot or Sourcegraph's Cody; then I learned more and more Rust as I disagreed with the code-helper's taste and organization. Though I have a solid foundation in other programming languages, which accelerates the process ... it's still a weird way to learn a language, one that I'm getting used to and kinda like.
That said, I based the memory capture on https://github.com/cloudsftp/reSnap/tree/latest which is a shell script that slurps out of process space device files. If you can find something like that which works on the rPP then I can blindly slap it in there and we can see what happens!
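In Rust terms, the core of that capture is roughly the sketch below -- finding the pid and the framebuffer address/length from /proc/<pid>/maps is the fiddly, device-specific part I'm glossing over:

    use std::fs::File;
    use std::io::{Read, Seek, SeekFrom};

    // Read raw framebuffer pixels out of the drawing app's address space.
    // fb_addr and fb_len come from parsing /proc/<pid>/maps and differ per
    // device and firmware -- which is exactly why an rPP equivalent is needed.
    fn grab_framebuffer(pid: u32, fb_addr: u64, fb_len: usize) -> std::io::Result<Vec<u8>> {
        let mut mem = File::open(format!("/proc/{}/mem", pid))?;
        mem.seek(SeekFrom::Start(fb_addr))?;
        let mut buf = vec![0u8; fb_len];
        mem.read_exact(&mut buf)?;
        Ok(buf) // still needs decoding/rotating/scaling before the LLM sees it
    }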
Kinda unrelated, but should I go for a Kobo or the reMarkable? I mostly want to read papers and maybe take notes. How do they compare in terms of hackability and freedom?
This uses LLM Tools to pick between outputting an SVG or plugging in a virtual keyboard to type. The keyboard is much more reliable, and that's what you see in the screenshot.
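Roughly, the model gets two tools to choose from -- something like this (illustrative OpenAI-style schema; the exact names and fields in ghostwriter may differ):

    use serde_json::json;

    fn tools() -> serde_json::Value {
        json!([
            { "type": "function", "function": {
                "name": "draw_svg",
                "description": "Draw the response on screen as an SVG",
                "parameters": { "type": "object",
                    "properties": { "svg": { "type": "string" } },
                    "required": ["svg"] } } },
            { "type": "function", "function": {
                "name": "type_text",
                "description": "Type the response with the virtual keyboard",
                "parameters": { "type": "object",
                    "properties": { "text": { "type": "string" } },
                    "required": ["text"] } } }
        ])
    }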
If nothing else it could use an SVG font that has handwriting; you'd need to bundle that for rendering via reSVG or use some other technique.
But if I ever make a pen-backend to reSVG then it would be even cooler, you would be able to see it trace out the letters.
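The stroke part itself isn't much code -- the idea is just to densify each SVG path into closely spaced pen positions so it reads as one continuous stroke (sketch below; hooking it into the event output is the real work):

    // Turn a polyline (points from one SVG path) into interpolated pen
    // positions spaced `step` units apart, so the injected pen events trace
    // a continuous stroke instead of scattered dots.
    fn densify(points: &[(f32, f32)], step: f32) -> Vec<(f32, f32)> {
        let mut out = Vec::new();
        for pair in points.windows(2) {
            let (x0, y0) = pair[0];
            let (x1, y1) = pair[1];
            let dist = ((x1 - x0).powi(2) + (y1 - y0).powi(2)).sqrt();
            let n = (dist / step).ceil().max(1.0) as usize;
            for i in 0..n {
                let t = i as f32 / n as f32;
                out.push((x0 + (x1 - x0) * t, y0 + (y1 - y0) * t));
            }
        }
        if let Some(&last) = points.last() {
            out.push(last);
        }
        out
    }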
That would be next-level immersion! You could probably achieve this by rendering the LLM’s response using a handwritten font—maybe even train a model on your own handwriting to make it feel truly personal.
Exactly! There’s something about handwriting that makes it feel more personal—like scribbling notes in the margins of a spellbook. The shift from typing to pen input definitely changes the vibe of interacting with AI.