Hacker News | jbarrow's comments

I’m personally a huge fan of Modal, and have been using their serverless scale-to-zero GPUs for a while. We’ve seen some nice cost reductions from using them, while also being able to scale WAY UP when needed. All with minimal development effort.

Interesting to see a big provider entering this space. Originally swapped to Modal because big providers weren’t offering this (e.g. AWS lambdas can’t run on GPU instances). Assuming all providers are going to start moving towards offering this?


Modal is great, they even released a deep dive into their LP solver for how they're able to get GPUs so quickly (and cheaply).

Coiled is another option worth looking at if you're a Python developer. Not nearly as fast on cold start as Modal, but similarly easy to use and great for spinning up GPU-backed VMs for bursty workloads. Everything runs in your cloud account. The built-in package sync is also pretty nice: it auto-installs CUDA drivers and Python dependencies from your local dev context.

(Disclaimer: I work with Coiled, but genuinely think it's a good option for GPU serverless-ish workflows.)


I’m also a big fan.

Modal has the fastest cold-start I’ve seen for 10GB+ models.


Thanks for sharing! They even support running HIPAA-compliant workloads, which I didn't anticipate.


Modal documentation is also very good.


If you enjoyed this essay, you should check out the author’s current project, Dynamicland[1]. It is a wonderful expression of what computing and interaction could be. Even the project website — navigating a physical shelf, and every part is hyperlinked — is joyful.

1. https://dynamicland.org/


i wish i could say this looked interesting to me but it doesnt :(


Thanks, I'll pick out something else for your birthday then.


> i wish i could say this looked interesting to me but it doesnt :(

Then, not to be snarky, why say anything?


Editing text in PDFs is _really_ hard compared to other document formats because most PDFs don't really encode the "physics" of the document. I.e. there isn't a notion of a "text block with word wrapping," it's more "glyphs inserted at location X with font Y."

If the PDF hasn't been made accessible, you have to do a lot of inference from the layout about how things are grouped and how they should flow if you want to be able to make meaningful edits. Not impossible (Acrobat does it), but very challenging.

It's part of the legacy of PDF as a format for presentation and print jobs, rather than typesetting.
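
To make the "glyphs inserted at location X" point concrete, here is a minimal sketch of the kind of layout inference an editor has to do. The `Glyph` record, the tolerance values, and the gap heuristic are all hypothetical and chosen for illustration; real PDF content streams are considerably messier, and this is not how any particular editor works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Glyph:
    """One positioned glyph, roughly what a PDF page gives you (hypothetical shape)."""
    char: str
    x: float      # left edge, in points
    y: float      # baseline; PDF y grows upward
    width: float  # advance width, in points


def group_into_lines(glyphs, y_tolerance=2.0, gap_factor=0.3):
    """Guess text lines and word breaks from glyph positions alone."""
    lines = []  # list of (baseline_y, [glyphs on that line])
    # Visit glyphs top-to-bottom, then left-to-right.
    for g in sorted(glyphs, key=lambda g: (-g.y, g.x)):
        if lines and abs(lines[-1][0] - g.y) <= y_tolerance:
            lines[-1][1].append(g)
        else:
            lines.append((g.y, [g]))

    result = []
    for _, line in lines:
        line.sort(key=lambda g: g.x)
        text = line[0].char
        for prev, cur in zip(line, line[1:]):
            # No space character is stored; infer one from the horizontal gap.
            gap = cur.x - (prev.x + prev.width)
            if gap > gap_factor * prev.width:
                text += " "
            text += cur.char
        result.append(text)
    return result


page = [
    Glyph("b", 20, 700, 6),
    Glyph("a", 10, 700, 6),
    Glyph("c", 10, 686, 6),
]
print(group_into_lines(page))
```

Even this toy version has to guess at line membership and word spacing, and it says nothing about paragraphs, columns, or reflow, which is where most of the difficulty lives.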


Yes, and alongside formatting challenges, PDFs commonly only include the glyphs from the font that are actually used in the document.

So if you had a PDF with "Hello World" on it, you could feasibly change it to "Hello Hello", but wouldn't be able to change it to "Goodbye World" (as the glyphs for "G", "b", and "y" are not included in the PDF).

Sure, you could do a bit of detective work to figure out which font it was from the glyphs or something and lookup and insert new glyphs into the PDF, but I can't imagine a generic PDF editor being capable of doing this for you.

Some editors get around this by just straight up switching the font(s) for the whole PDF, so they'll look different after saving.
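
The subsetting problem can be sketched as a simple set difference. This assumes, for illustration only, that the embedded subset contains exactly the characters that appear in the original text; `missing_glyphs` is a hypothetical helper, and real fonts map characters to glyphs in more complicated ways (ligatures, custom encodings).

```python
def missing_glyphs(original_text: str, replacement_text: str) -> list[str]:
    """Characters the replacement needs that a subset built from the original lacks."""
    embedded = set(original_text)  # assumption: subset == characters actually used
    return sorted(set(replacement_text) - embedded)


print(missing_glyphs("Hello World", "Hello Hello"))    # nothing missing
print(missing_glyphs("Hello World", "Goodbye World"))  # some glyphs missing
```

An editor that finds a non-empty result here has to either locate the full font or substitute another one.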


It's still what a PDF editor, as it says in the title, would do. With a quick Google I found one that I hadn't heard about before, and it let me edit some text and save it for free.


Ask yourself, why would someone spend money on bandwidth for me to download something for free...


PDF editor is used as a broadly encompassing term. Yes, other tools can edit existing text, but they upload your PDF to their servers, so it's not private if that's something you care about.

There isn't anything off the shelf that enables editing existing text in the browser, but it's something I'll build from scratch. So you'll be able to edit existing PDF text without compromising privacy.


This one can, if I remember correctly (can't check now), but it's a POC and not a finished product: https://github.com/ShizukuIchi/pdf-editor


Sejda.com does it, though its free version is severely crippled.


Wonderful! Inserted form-fields show up in Preview and Acrobat, which is not a trivial task. I run a little AI-powered tool that automatically figures out where form fields should go (https://detect.penpusher.app) and robustly adding form fields to the PDF was the hardest part.

Fwiw, I do see the issue with being unable to scroll down across both Safari and Chrome.


Thanks! I fixed the scrolling issue


I'm sorry I missed this earlier, but I absolutely believe that it could do that. Do you have any pointers to PDF forms that work well or don't work well with screen readers? I'd be happy to take a look, and see if I can improve this tool based on that.

In addition, did you try the "enhanced" pipeline? It gives each field a meaningful name based on the label, which might help with accessibility.

PDF accessibility is a huge issue that _should_ be easily solved, but isn't, unfortunately.


> Unfortunately Gemini really seems to struggle on this, and no matter how we tried prompting it, it would generate wildly inaccurate bounding boxes

Qwen2.5 VL was trained on a special HTML format for doing OCR with bounding boxes. [1] The resulting boxes aren't quite as accurate as something like Textract/Surya, but I've found they're much more accurate than Gemini or any other LLM.

[1] https://qwenlm.github.io/blog/qwen2.5-vl/


I've been very impressed by Gemini 2.0 Flash for multimodal tasks, including object detection and localization[1], plus document tasks. But the 15 requests per minute limit was a severe limiter while it was experimental. I'm really excited to be able to actually _do_ things with the model.

In my experience, I'd reach for Gemini 2.0 Flash over 4o in a lot of multimodal/document use cases. Especially given the differences in price ($0.10/million input and $0.40/million output versus $2.50/million input and $10.00/million output).

That being said, Qwen2.5 VL 72B and 7B seem even better at document image tasks and localization.

[1] https://notes.penpusher.app/Misc/Google+Gemini+101+-+Object+...
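
For a rough sense of the price gap quoted above, here is a small sketch using only the per-million-token prices from this comment. The model keys and the example workload (10,000 pages at 1,500 input and 500 output tokens each) are made up for illustration.

```python
# USD per million tokens (input, output), as quoted in the comment above.
PRICES = {
    "gemini-2.0-flash": (0.10, 0.40),
    "gpt-4o": (2.50, 10.00),
}


def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Total cost in USD for a given token count."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


# Hypothetical workload: 10,000 pages, 1,500 input + 500 output tokens per page.
pages = 10_000
for model in PRICES:
    total = cost_usd(model, pages * 1_500, pages * 500)
    print(f"{model}: ${total:.2f}")
```

Because both the input and output prices differ by the same factor (2.50 / 0.10 = 10.00 / 0.40 = 25), the ratio comes out to 25x regardless of the input/output mix.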


> In my experience, I'd reach for Gemini 2.0 Flash over 4o

Why not use o1-mini?


Mostly because OpenAI's vision offerings aren't particularly compelling:

- 4o can't really do localization, and ime is worse than Gemini 2.0 and Qwen2.5 at document tasks

- 4o mini isn't cheaper than 4o for images because it uses a lot of tokens per image compared to 4o (~5600/tile vs 170/tile, where each tile is 512x512)

- o1 has support for vision but is wildly expensive and slow

- o3-mini doesn't yet have support for vision, and o1-mini never did
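
The per-tile point can be worked through in a few lines. The tokens-per-tile figures and the 4o input price come from this thread; the 4o mini input price of $0.15/million is my assumption, and the sketch ignores image resizing and any fixed per-image token charge, so treat the output as a rough comparison rather than a billing calculation.

```python
import math

TILE = 512  # tile edge in pixels, per the comment above

# Approximate tokens per tile, as quoted in the comment above.
TOKENS_PER_TILE = {"gpt-4o": 170, "gpt-4o-mini": 5600}

# USD per million input tokens. The 4o price is from the thread;
# the 4o mini price is an ASSUMPTION, not stated in the thread.
INPUT_PRICE = {"gpt-4o": 2.50, "gpt-4o-mini": 0.15}


def tiles(width: int, height: int) -> int:
    """Number of 512x512 tiles needed to cover the image (no resizing applied)."""
    return math.ceil(width / TILE) * math.ceil(height / TILE)


def image_cost_usd(model: str, width: int, height: int) -> float:
    """Rough input cost of one image, counting tile tokens only."""
    tokens = tiles(width, height) * TOKENS_PER_TILE[model]
    return tokens * INPUT_PRICE[model] / 1_000_000


for model in TOKENS_PER_TILE:
    print(model, f"${image_cost_usd(model, 1024, 1024):.5f}")
```

Under these assumptions the higher token count per tile more than cancels out 4o mini's lower per-token price.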


Loved the one in Kansas City! There are some great, thematically-similar museums in other countries as well, if you ever find yourself there:

- the Vasa in Stockholm, Sweden is a ship that sank in 1628 and was later salvaged from the harbor and stabilized

- the Mary Rose in Portsmouth, England is a Tudor ship that sank in 1545 and was later raised and stabilized

In both cases, the work done to stabilize and preserve the remains of the ship is, imo, almost more interesting than the ship itself.


The Vasa Museum, Stockholm, Sweden (Ultra 4K): https://www.youtube.com/watch?v=N9NQUULR-UE


Honestly I wouldn't rush to see the Mary Rose unless you are extremely interested. It's a little anticlimactic as a viewing experience.


> There will be more cloud-based products turned to bricks by manufacturers that go bankrupt or simply stop caring.

This one feels like a gimme. The recent Garmin outage that partially bricked the Connect app was a bit of a surprise; so much of what Garmin Connect does _should be_ local to the phone. Plus it's a free service (after you've paid for the device).

"You'll own nothing and you'll be happy" doesn't only apply to media/digital goods, but a lot of hardware at this point. :/


This trick is also the way to teach adults, if you're teaching or learning to ride!

For children, there are companies that sell progressive balance bikes[1][2], that start off as balance bikes but can be converted to pedal bikes later. In the US, I've seen tons of cheap Strider bikes on Craigslist, and then you can get the pedal conversion kit separately (you have to get the 14", not the 12").

[1] https://striderbikes.co.uk/collections/14x-balance-pedal-bik...

[2] https://www.littlebigbikes.com/shop/convertible-balance-bike...

