Ghostwriter – use the reMarkable2 as an interface to vision-LLMs (github.com/awwaiid)
211 points by wonger_ 83 days ago | 80 comments



Project author here -- happy to elaborate on anything; a continuous WIP project. The biggest insight has been the limitations of vision models in spatial awareness -- see https://github.com/awwaiid/ghostwriter/blob/main/evaluation_... for some sketchy examples of my rudimentary eval.

Next top things:

* Continue to build/extract into a yaml+shellscript agentic framework/tool

* Continue exploring pre-segmenting or other methods of spatial awareness

* Write a reSvg backend that sends actual pen-strokes instead of lots of dots


Wow! This is really cool! Really really cool! I imagine some sort of use where it's even more collaborative and not just "unadorned turn-by-turn".

For example, maybe I'm taking notes involving words, simple math, and a diagram. Underline a key phrase and "the device" expands on the phrase in the margin. Maybe the device is diagramming, and I interrupt and correct it, crossing out some parts, and it understands and alters.

Sorry, I know this is vague, I don't know precisely what I mean, but I do think that the combination of text (via some sort of handwriting recognition), stroke gestures, and a small iconography language with things enabled by LLMs probably opens up all sorts of new user interaction paradigms that I (and others) might be too set in our ways to think of immediately.

I think there's a "mother of all demos" moment potentially coming soon with stuff like this, but I am NOT a UX designer and can't quite imagine it clearly enough. Maybe you can.


Yes! I have flashbacks to productive times standing in front of a whiteboard, alone or with others, doodling out thoughts and annotating them. When working with others I can usually talk to them, so we are also discussing as we are drawing and annotating. But also I've handed diagrams / equations to someone and then later they hand me back an annotated version -- that's interesting too.


This is a really cool effect. How do you envision this being used?

Thinking about it as a product, I’d want a way to easily slip in and out of “LLM please respond” so it wasn’t constantly trying to write back the moment I stopped the stylus - maybe I’d want a while to sketch and think, then restart a conversation. Or maybe for certain pages to be LLM-enabled, and others not.

Does it require any sort of jailbreak to get SSH access to the device?


The reMarkable comes with root SSH out of the box, so installation here is scp'ing a Rust-built binary over, and then ssh'ing in and running it. I haven't wrapped it in a startup-on-boot service yet.

It is triggered right now by finger-tapping in the upper-right corner, so you can ask it to respond to the current contents of the screen on-demand. I think it would be cool to have another out-of-band communication, like voice, but this device has no microphone.

Also right now it is one-shot, but on my long long TODO list is a second trigger that would _continue_ a back and forth multi-screenshot (like multi-page even) conversation.


Ah great, I will definitely give this a try later then, thanks!

I’m curious if this is becoming something that you are using in your own day-to-day, or if your focus right now is on building it?

The context for my question is just a general interest in the transition to AI-enabled workflows. I know that I could be much more productive if I figured out how to integrate AI assistance into my workflows better.


Only building so far.

The one use-case that is _close_ to ready-for-useful: I often take business meeting notes. In these notes I often write a T in a circle to indicate a TODO item. I am going to add a bit of config in there, basically "If you see a circle-T, then go add that to my todo list if it isn't already there. If you see a crossed-out circle-T then go mark it as done on the todo list."

I got slightly distracted implementing this, working instead toward a pluggable "if you see X call X.sh" interface. Almost there though :)
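The "if you see X call X.sh" interface isn't finished yet, but a minimal sketch of such a dispatcher might look like the following (the hooks/ directory layout, the script names, and passing the recognized text as $1 are all my assumptions for illustration, not the project's actual design):

```python
import subprocess
from pathlib import Path
from typing import Optional

# Hypothetical layout: hooks/circle-t.sh, hooks/circle-t-crossed.sh, ...
HOOKS_DIR = Path("hooks")

def dispatch(symbol: str, note_text: str) -> Optional[str]:
    """Run the hook script matching a symbol the vision model reported.

    Returns the hook's stdout, or None when no hook is installed for it.
    """
    hook = HOOKS_DIR / f"{symbol}.sh"
    if not hook.exists():
        return None
    result = subprocess.run(
        ["sh", str(hook), note_text],  # pass the recognized note text as $1
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```

So a `hooks/circle-t.sh` that calls your todo CLI would fire whenever the model reports a circle-T, and a `hooks/circle-t-crossed.sh` could mark the item done.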


This is so cool! I love to see people hacking together apps for the reMarkable tablet

I made a little app for reMarkable too and I shared it here some time back: https://digest.ferrucc.io/


It's so great seeing these, it always makes me want to play with developing apps for the Remarkable 2. Do you have any sources you can recommend? Thank you!

edit: found the official developer website https://developer.remarkable.com/documentation


https://github.com/reHackable/awesome-reMarkable is a great resource to get other resources, including getting onto the discord if you want some interactive conversations.


IMO the easiest way to play around is to use the reverse engineered APIs

https://github.com/erikbrinkman/rmapi-js


Much appreciated :+1:


That’s awesome! Love seeing the reMarkable get more functionality through creative hacks. Just checked out your app—what was the biggest challenge you faced while developing for the reMarkable?


I think the thing I really didn't like was the lack of an OAuth-like flow with fine-grained permissions.

Basically authentication with devices is "all-access" or "no-access". I would've liked it if a "write-only" or "add-only" API permission scope existed.


Blocked for AI reply @dang


Good catch, the last few pages of comment history are inhumanly insincere.

https://news.ycombinator.com/threads?id=memorydial

" @dang " isn't a thing, he doesn't watch for it - take credit and email him direct.


Do you have proof this is true?


I might be biased because memorydial was complimentary to me ... but they SEEM like a human! Also I'm not all that opposed to robot participation in the scheme of things. Especially if they are nice to me or give good ideas :)


Ha, thanks for having my back! I genuinely love your project. I have been toying with getting either a Boox or a reMarkable for ages.


Well you're human, you took the bait :-)

FWIW I mostly read HN at its deadest time (I'm GMT+8 local time) and I see a lot of mechanical turk comments, especially from new (green-coloured) accounts.

I always look for a response (eg: yours) before flagging them as spam bots . . .


Ha I guess when I stay up very late -8 overlaps with +8!


He has commented on this.

Retrieval is tricky as Algolia doesn't index '@' symbols:

https://hn.algolia.com/?query=%40dang%20by%3Adang&sort=byDat...


Most people don't correctly use an em-dash as distinct from a hyphen. That jumps out at me. :)


This is awkward—I use em-dash all the time on HN! I'm not an LLM (as far as I know); I just like to write neatly when I'm able to, and it's very low friction when you're familiar with your keyboard compose sequences[0]. It's a trivial four keypresses,

    AltR(hold) - - -
(The discoverability of these functions is way too low on GNOME/Linux; I really dislike the direction of modern UX, with its fake simplicity and infantilization of users. Way more people would be using —'s and friends if they were easily discoverable and prominently hinted in their UX. "It's documented in x.org man pages" is an unacceptable state of affairs for a core GUI workflow.)

[0] https://news.ycombinator.com/item?id=35118338#35118598 (On "Punctuation Matters: How to use the en dash, em dash and hyphen" (2023); 356 comments)


Never knew about the em dash thing, I was just using an AI writing assistant to help fix my shitty grammar and formatting. I think in the future I'll stick with bad formatting.


no, just l–AI–zy copy-pasta. your book looks great! putting on your chat with lex now.


no, just lazily and stupidly used an AI writing assistant


Me too! :)


I wish the remarkable tablets weren't so locked down.

It's one of my favorite pieces of hardware and wish there were more apps for it.


Locked down? You can get a shell by ssh'ing to it. Call me when an iPad lets you do that...


I agree, I definitely wouldn't call them "locked down." I do however think they could do a lot more to make it usable/hackable. This slightly undermines their cloud service ambitions, but I think the hackability is what makes the Remarkable so ... well ... remarkable. Certainly that's why I bought one!


Awesome.

I wanted to try to implement this for months. You did a really good job.


Thank you! Still a WIP, but a very fun learning / inspiration project. Got a bit of Rust jammed in there, a bit of device constraint dancing, a bit of multiple LLM API normalization, a bit of spatial vision LLM education, etc.


At some point I wanted to turn goMarkableStream into an MCP server (Model Context Protocol). I could get the screen, but without a “hack” I couldn’t write the response back.


The trick here is to inject events as if they came from the user. The virtual-keyboard works really reliably, you can see it over at https://github.com/awwaiid/ghostwriter/blob/main/src/keyboar... . It is the equivalent of plugging in the reMarkable type-folio.
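For the curious, injecting a keypress boils down to writing Linux `input_event` structs to the virtual keyboard device. A rough Python sketch of the packing step (the 16-byte 32-bit-ARM struct layout and the device node path are assumptions; the real implementation is the Rust in keyboard.rs):

```python
import struct
import time

# Linux input_event on 32-bit ARM: struct timeval (2 x 4-byte longs),
# then type (u16), code (u16), value (s32) -- 16 bytes per event.
EVENT_FORMAT = "<llHHi"  # assumption: 32-bit userland, as on the reMarkable 2
EV_SYN, EV_KEY, SYN_REPORT = 0x00, 0x01, 0

def key_events(keycode: int) -> bytes:
    """Pack a press+release for one key, each followed by a SYN_REPORT."""
    now = time.time()
    sec, usec = int(now), int((now % 1) * 1_000_000)
    out = b""
    for value in (1, 0):  # 1 = press, 0 = release
        out += struct.pack(EVENT_FORMAT, sec, usec, EV_KEY, keycode, value)
        out += struct.pack(EVENT_FORMAT, sec, usec, EV_SYN, SYN_REPORT, 0)
    return out

# On-device you would then write the bytes to the virtual keyboard node, e.g.:
#   with open("/dev/input/event9", "wb") as dev:  # hypothetical node
#       dev.write(key_events(30))                 # 30 == KEY_A
```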

The main limitation is that the reMarkable drawing app is very, very minimal: it doesn't let you place text at arbitrary screen locations, and instead offers sort of a weird overlay text area spanning the entire screen.


This is so cool. I’m going to try it this weekend.

I’ve been playing with the idea of auto creating tasks when I write todos by emailing the PDF and sending it to an LLM.

This just opened up a whole realm of better ways to accomplish that goal in realtime.


This worked pretty well when I did a proof of concept with Claude and the rMPP a couple of months ago. It even handles scheduling fuzzy times ("I want to do this sometime but I don't have any real time I want to do it, pick a time that doesn't conflict with my actually scheduled tasks"), all with minimal prompting. I just didn't have a decent workflow and did exactly what you considered: emailed the PDF. I should probably revisit this, but I haven't had the inclination since I just ignored the tasks anyway lol
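The "generate an ical" step is simple enough to sketch. Here's a minimal, hand-rolled VEVENT builder; the field choices, PRODID, and UID scheme are my own placeholders, not what the Claude workflow actually emitted:

```python
from datetime import datetime, timedelta

def todo_to_ics(summary: str, start: datetime, minutes: int = 30) -> str:
    """Render one extracted todo as a minimal iCalendar VEVENT."""
    fmt = "%Y%m%dT%H%M%S"
    end = start + timedelta(minutes=minutes)
    # iCalendar lines are CRLF-separated per RFC 5545
    return "\r\n".join([
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//handwritten-todo-sketch//EN",  # hypothetical identifier
        "BEGIN:VEVENT",
        f"UID:{start.strftime(fmt)}-todo@example.invalid",
        f"DTSTART:{start.strftime(fmt)}",
        f"DTEND:{end.strftime(fmt)}",
        f"SUMMARY:{summary}",
        "END:VEVENT",
        "END:VCALENDAR",
    ])
```

Most calendar apps will import a file like this directly, which is all the clunky email-the-PDF workflow needs on the far end.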


Ha, automating the doing of the task is the next step.


Let me know if you need any help, I think only one other person has tried to get this working. I'm over on the reMarkable discord server, https://discord.gg/u3P9sDW (linked from https://github.com/reHackable/awesome-reMarkable)

Rust binary so should be easy to install. In theory :)


Will do! My wife and I love Harry Potter so I’m motivated to show her my investment in the tablet actually got me Tom Riddle’s diary.

I don’t use discord much but I’ll find you somewhere around here!


I'm on at awwaiid@gmail.com and probably other places :)

"proof" to partner of tablet investment value based on interactive fiction conversation == excellent strategy and nothing could go wrong


How about this on android driven Onyx Boox ereaders? Would it be possible?


The limitations for the reMarkable made it so that I took a screenshot and then injected input events to interact with the proprietary drawing app. Cross-app screenshots with the right permission are probably possible on Android, I'm not sure about injecting the drawing events.

The other way to go would be to make a specific app. I just picked up an Apple Pencil and am thinking of porting the concepts to a web app which so far works surprisingly well ... but for a real solution it'd be better for this Agent to interact with existing apps.


This is a brilliant use case—handwriting input combined with LLMs makes for a much more natural workflow. I wonder how well it handles messy handwriting and if fine-tuning on personal notes would improve recognition over time.


I did this a few months ago with the Remarkable Paper Pro and Claude. It worked quite well even though my handwriting is pretty terrible, and I even had a clunky workflow where I could just write down stuff I wanted to do, and roughly (or specifically) when I wanted to do it, and it was able to generate an ical I could load into my calendar.


Generally if I can read my handwriting then it can! It has no issues with that. Really the problem is more in spatial awareness -- it can't reliably draw an X in a box, let alone play tic-tac-toe or dots-and-boxes.


Love this! There are some vector diffusion models out there; why not use tool calling to outsource to one of those if the model decides to draw something? Then it could specify coordinate range and the prompt.


Two reasons. One, because I haven't gotten to it yet. Two... er no just the one reason! Do you have a particular one, ideally with a hosted API, that you recommend?


I recall flux.ai had a couple models — a quick google search turned up these guys: https://github.com/ximinng/SVGDreamer

I’ve been working on a different angle - in-place updating of PDFs on the Remarkable - so it’s cool to see what you’re working on. Thanks for sharing it.


For PDF paper readers, is the Remarkable’s 11” size sufficient? I have the Sony DPT 2nd version at 13”, and it’s a perfect viewing experience. But projects like this keep drawing me to the Remarkable product.


I have used the Remarkable 2 for papers, but it is slightly too small to read text comfortably. I’m also an active reader, so I miss the color highlighting. Annotations are excellent. For now, I’m sticking to reviewing papers in the Zotero application on my iPad.


I got the reMarkable Pro tablet recently and as a result was able to move on from my Sony DPT-S1 and reMarkable 2. The latter was nice for its hackability, but the Pro's larger screen and color functionality have made it a great replacement.


It’s barely usable for PDFs


Depends mostly on the font size in the PDF. For dense PDFs I agree, it's barely usable. For most PDFs though I'd call it "acceptable." If you have control over the font size (such as when you're converting some source material to PDF) you can make it an excellent reading experience IMHO.


So close. The advertised diagonal screen for the reMarkable Pro is 11.8". The DPT-RP1 is advertised as 13.3" (my unit measures 13.125"). Hopefully in the future reMarkable will make a full-size unit. As mobile phone, tablet, laptop, and monitor sales indicate, a larger screen is an important buying factor.


I own a Boox tablet (a full-fledged Android tablet with an eink screen), and this sort of thing would be perfect for it. I wonder if in 5 years mobile hardware will support something like that locally!


Really cool. Would this run on the remarkable paper pro too?


Buy me one and I'll find out! hahahaha

But also -- the main thing that might be different is the screenshot algorithm. I'm over on the reMarkable discord; if you want to take up a bit of Rust and give it a go then I'd be happy to (slowly/async) help!


:) Thanks! Been looking into learning rust recently, so will keep that in mind if I get it off the ground.


Initially most of the Rust was written by Copilot or Sourcegraph's Cody; then I learned more and more Rust as I disagreed with the code-helper's taste and organization. Though I have a solid foundation in other programming languages, which accelerates the process ... it's still a weird way to learn a language that I'm getting used to and kinda like.

That said, I based the memory capture on https://github.com/cloudsftp/reSnap/tree/latest which is a shell script that slurps the framebuffer out of process-space device files. If you can find something like that which works on the rPP then I can blindly slap it in there and we can see what happens!
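For reference, the reSnap-style capture on the reMarkable 2 amounts to seeking to a known address in xochitl's memory and reading one raw frame. A hedged Python sketch (the panel geometry is the rM2's 1872x1404; the 1-byte-per-pixel depth and the offset-discovery details are my assumptions on top of what reSnap does):

```python
# reMarkable 2 panel geometry; depth is assumed 8-bit grayscale.
WIDTH, HEIGHT, BYTES_PER_PIXEL = 1872, 1404, 1
FRAME_SIZE = WIDTH * HEIGHT * BYTES_PER_PIXEL

def read_frame(mem_path: str, offset: int) -> bytes:
    """Read one raw frame from a memory-backed file at a known offset.

    On-device, mem_path would be /proc/<pid of xochitl>/mem and offset the
    framebuffer address recovered from that pid's /proc/<pid>/maps, which
    is the trick reSnap's shell script performs.
    """
    with open(mem_path, "rb") as mem:
        mem.seek(offset)
        data = mem.read(FRAME_SIZE)
    if len(data) != FRAME_SIZE:
        raise IOError("short read -- wrong offset or wrong pid?")
    return data
```

The raw bytes can then be wrapped as an image (e.g. a gray8 bitmap) and shipped to the vision model.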


> Things that worked at least once:

I like it.


Top quality modern AI Eval!!!


Kinda unrelated, but should I go for a Kobo or the reMarkable? I mostly want to read papers and maybe take notes. How do they compare in terms of hackability and freedom?


I wonder if this can be abstracted to accept interaction from a Daylight too.


@apple.com add to iPadOS Notes?


Harry Potter Half-Blood Prince vibes. Interesting just how much the medium changes the feeling of interacting with a chat model


erm, you mean harry potter tom riddle's horcrux diary, sure

you know, the diary that wrote back to you and possessed your soul? that cursed diary?


I wonder if it's better than the current version where my soul gets possessed by YouTube shorts for 40 minutes.


Now if only the LLM response font were some handwritten style.


This uses LLM Tools to pick between outputting an SVG or plugging in a virtual keyboard to type. The keyboard is much more reliable, and that's what you see in the screenshot.

If nothing else it could use an SVG font that has handwriting; you'd need to bundle that for rendering via reSVG or use some other technique.

But if I ever make a pen-backend to reSVG then it would be even cooler, you would be able to see it trace out the letters.


That's definitely pretty easy to achieve: just change the font settings to use a particular handwritten-style font [0].

[0] https://fonts.google.com/?categoryFilters=Calligraphy:%2FScr...


That would be next-level immersion! You could probably achieve this by rendering the LLM’s response using a handwritten font—maybe even train a model on your own handwriting to make it feel truly personal.


Script fonts don’t really look like handwriting - too regular.

But one of the early deep learning papers from Alex Graves does this really well with LSTMs - https://arxiv.org/abs/1308.0850

Implementation - https://www.calligrapher.ai/


ooo -- thanks for the link!


Like Apple Notes's Smart Script?


Actually if you figure that out please post it here!! I'd love to see that!


Exactly! There’s something about handwriting that makes it feel more personal—like scribbling notes in the margins of a spellbook. The shift from typing to pen input definitely changes the vibe of interacting with AI.


That's beside the point, but you are probably referring to Harry Potter and the Chamber of Secrets, not the Half-Blood Prince.


Not to distract from the project but if anyone is interested in eink tablets with LLMs, the ViWoods tablet might be of interest to you.


Is this a Remarkable rebrand? Even the UI looks the same!

edit: https://viwoods.com/ (based in Hong Kong)

edit 2:

It's a blatant copy of the Remarkable 2 for sure :/ LLM integration is interesting --> Remarkable are you listening?




