jumploops's comments | Hacker News

AND, OR, NOT - pick 2

NOR - pick 1

Or equivalently NAND, leading to the usual recommendation about NAND to Tetris - https://www.nand2tetris.org/

I'm surprised it took OpenAI this long to launch scheduled tasks, but as we've seen from our users[0], pure LLM-based responses are quite limited in utility.

For context: ~50% of our users use a time-triggered Loop, often with an LLM component.

Simple stuff I've used it for: baby name idea generator, reminder to pay housekeeper, pre-natal notifications, etc.

We're moving away from cron-esque automations as one of our core value props (most new users use us for spinning up APIs really quickly), but the base functionality of LLM+code+cron will still be available in the next version of our product (and existing Loops will be migrated!).
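If you're curious what LLM+code+cron looks like in practice, it's roughly the pattern below. This is a minimal sketch with hypothetical names (the prompt and the sendText() step are made up for illustration), assuming node-cron and the openai SDK rather than our actual implementation:

    // Rough sketch of an LLM+code+cron loop (illustrative only, not Magic Loops' real API).
    import cron from "node-cron";
    import OpenAI from "openai";

    const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

    // Hypothetical notification step; wire up SMS/email however you like.
    async function sendText(body: string): Promise<void> {
      console.log("SMS:", body);
    }

    // Trigger: every day at 9am. LLM step: generate ideas. Code step: deliver them.
    cron.schedule("0 9 * * *", async () => {
      const res = await client.chat.completions.create({
        model: "gpt-4o-mini",
        messages: [{ role: "user", content: "Suggest 10 baby names we haven't discussed yet." }],
      });
      await sendText(res.choices[0].message.content ?? "");
    });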

[0] https://magicloops.dev/


This was a weak citation.

> Simple stuff I've used it for: baby name idea generator, reminder to pay housekeeper, pre-natal notifications, etc.

None of these require an LLM. It seems like you own this service yet can't find any valuable use for it.

---

ChatGPT tasks will become a powerful tool once incorporated into GPTs.

I produce lots of data. Lots of it, and I'd like my clients to get daily updates on it, or even have content created based on it.


> None of these require an LLM. It seems like you own this service yet can't find any valuable use for it.

Sorry? My point was that these are the only overlapping features I've personally found useful that could be replaced with the new scheduled tasks from ChatGPT.

Even these shouldn't require an LLM. A simple cron+email would suffice.

The web scraping component is neat, but for my personal use cases (tide tracking) I've had to use LLM-generated code to get the proper results. Pure LLM calls struggled to follow the rules I wanted (tide less than 1 ft, between sunrise and sunset): sometimes the LLM would get it right, sometimes it would not.
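For a sense of what that generated code step does, it boils down to something like this (an illustrative sketch; the TidePrediction shape and the sunrise/sunset inputs are assumptions, not a real API):

    // Illustrative filter, assuming tide predictions and sunrise/sunset were already fetched.
    interface TidePrediction {
      time: Date;       // predicted time of the low tide
      heightFt: number; // predicted height in feet
    }

    function tidePoolWindows(
      tides: TidePrediction[],
      sunrise: Date,
      sunset: Date,
    ): TidePrediction[] {
      // Rule 1: tide below 1 ft. Rule 2: between sunrise and sunset.
      return tides.filter(
        (t) =>
          t.heightFt < 1 &&
          t.time.getTime() >= sunrise.getTime() &&
          t.time.getTime() <= sunset.getTime(),
      );
    }

The point being that the rules are enforced in code, instead of hoping the LLM applies them consistently.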

For our customers, purely scheduling an LLM call isn't that useful. They require pairing multiple LLM and code execution steps to get repeatable and reliable results.

> ChatGPT tasks will become a powerful tool once incorporated into GPTs.

Out of curiosity, do you use GPTs?


> Simple stuff I've used it for: baby name idea generator, reminder to pay housekeeper, pre-natal notifications, etc.

Baby name generator: why would this be a scheduled task? Surely you aren't having that many children... :)

Reminder to pay, notifications: what value does OpenAI bring to the table here over other apps which provide calendar / reminder functionality?


> Baby name generator: why would this be a scheduled task? Surely you aren't having that many children... :)

So far it's helped name two children :) -- my wife and I like to see the same 10 ideas each day (via text) so that we can discuss what we like/don't like daily. We tried the "sift through 1,000 names" approach and it didn't work well for us.

> Reminder to pay, notifications: what value does OpenAI bring to the table here over other apps which provide calendar / reminder functionality?

That's exactly my point. Without further utility (i.e. custom code execution), I don't think this provides a ton of value at present.


"ok Google, remind me to ____ every ____"

Am I missing something or is there exactly zero benefit here over native Apple/Google calendar/todo apps?


You're not missing anything, other than us using Siri :)

My point was that this new functionality, while neat at a surface level, doesn't provide much real utility.

Without custom code execution, you're limited to very surface-level tasks that should be doable with a cron+sms/email.


> Lot of great classes in the EECS department though.

Couldn’t agree more, Jack! Great times during 482… tranquil compared to the 470 slog that started immediately afterward each night :)


I totally remember 482 (Operating Systems, for those reading) being really interesting. One story I remember is from a final project dealing with locks in C++: I'd get close to a full solution but still have some errors from the locks, then I'd make a change and suddenly the previously failing tests passed while new ones failed. I didn't realize that could happen.

Great times. And I really liked how we did it all in C++ (other than computer vision, 442, which was in MATLAB) rather than Python, as some places do. Having that lower-level understanding of languages in school makes understanding code so much easier, and it's something I didn't have to learn on my own.


What do VMs mean in this context?

I did a pass of the codebase and it seems they’re just forking processes?

It’s unclear to me where the safety guarantees come from (compared to using e.g. KVM).

Edit: it appears the safety guarantees come from libriscv[0]. As far as I can tell, these sandboxes are essentially RISC-V programs running in an isolated context (“machine”) where all the Linux syscalls are emulated and thus “safe.” Still curious what potential attack vectors may exist?

[0] https://github.com/libriscv/libriscv/tree/dfb7c85d01f01cb38f...


The whole thing could really do with an explanation of how it works.


Well, the boundary between the host and the guest, the system call API, is always going to be the biggest attack vector no matter what solution is used. But if you find a problem and fix it, you're back to being safe again, unlike if you don't have any sandboxing at all. You can also put the whole solution in a jail, which is very common nowadays.


Can you expand on “put the whole solution in a jail”?



Ah, this is helpful, thanks!



libriscv sounds amazing on paper, I’d love to learn more about it


See also: Google's gVisor [1], which was used in Google App Engine.

[1]: https://en.wikipedia.org/wiki/GVisor


We do something similar internally[0], but specifically for security concerns.

We’ve found that by having the LLM provide a “severity” level (simply low, medium, or high), we’re able to filter out all the nitpicky feedback.

It’s important to note that this severity level should be specified at the end of the LLM’s response, not the beginning or middle.

There’s still an issue of context, where the LLM will provide a false positive due to unseen aspects of the larger system (e.g. make sure to sanitize X input).

We haven’t found the bot to be overbearing, but mostly because we auto-delete past comments when changes are pushed.

[0] https://magicloops.dev/loop/3f3781f3-f987-4672-8500-bacbeefc...


The severity needing to be at the end was an important insight. It made the results much better but not quite good enough.

We had it output JSON with fields {comment: string, severity: string}, in that order.
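For anyone replicating this, the schema itself is trivial; the one deliberate choice is keeping severity as the last field so the model writes the comment before committing to a rating (the types and names below are just illustrative):

    // The shape we ask the model to emit: severity deliberately last,
    // so the comment text is generated before the rating token is committed.
    interface ReviewFinding {
      comment: string;                      // the actual review feedback
      severity: "low" | "medium" | "high";  // rated only after the comment is written
    }

    // e.g. parsing the model's output and dropping the nitpicks
    const modelOutput = `[{"comment": "buildQuery() concatenates raw user input", "severity": "high"}]`;
    const findings: ReviewFinding[] = JSON.parse(modelOutput);
    const worthPosting = findings.filter((f) => f.severity !== "low");
    console.log(worthPosting);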


Another variation on this is to think about tokens and definitions. Numbers don’t have inherent meaning for your use case, so if you use numbers you need to provide an explicit definition of each rating number in the prompt. Similarly, and more effectively, you can use labels such as low-quality, medium-quality, and high-quality, again providing an explicit definition of each label; one step further is to use explicit, self-describing labels (along with detailed definitions) such as “trivial-observation-on-naming-convention” or “insightful-identification-on-missed-corner-case”.

Effectively, you are turning a somewhat arbitrary numeric “rating” task into a multi-label classification problem with well-defined labels.
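To make that concrete, a label set with explicit definitions handed to the judge might look like this (labels and definitions here are purely illustrative):

    // Illustrative label set for the judge prompt: each label carries an explicit definition,
    // so the task becomes classification against criteria rather than an arbitrary 1-5 rating.
    const LABELS: Record<string, string> = {
      "trivial-observation-on-naming-convention":
        "A stylistic nit about identifier names with no functional impact.",
      "insightful-identification-on-missed-corner-case":
        "Points out an input or state the change genuinely fails to handle.",
      "false-positive-from-missing-context":
        "Flags an issue that is already handled elsewhere in the system.",
    };

    // Rendered into the prompt as a definition list the model must choose from.
    const labelSection = Object.entries(LABELS)
      .map(([label, definition]) => `- ${label}: ${definition}`)
      .join("\n");
    console.log(labelSection);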

The natural evolution is to then train a BERT-based classifier (or similar) on the set of labels and comments, which gets you a model judge that is super fast and can achieve good accuracy.


Can you read the rules and explain them to others faster than a video?


Yes, because I know the other players' strengths and experience.


This is awesome.

My wife wanted a wooden engagement ring, and so I fashioned one (well ~10) out of a Pacific madrone burl.

Great material to work with, but wouldn’t recommend wooden bands unless your actual wedding is near!


Why not?


Unless you use an epoxy of some sort, they’re quite prone to breaking over time — I only used natural beeswax.

(Plus, quite a few broke while I was iterating on my technique…)

To be clear, this is one of the reasons my then-girlfriend wanted one, to ensure a speedy engagement!


If they are made by cutting a ring shape out of wood, the grain is too weak for long term wear.

A more common method for wooden rings is to cut a long, thin rip at 1/16th”. Soak it in water for 30 minutes, wrap it around something finger-sized, put a rubber band around it, and let it dry. You can get a good imitation of a glossy epoxy finish with CA/super glue. This gives a lot more strength than a cutout.


Why not just use epoxy? It’s pretty easy to work with.


> Why not just use epoxy? It’s pretty easy to work with.

CA glue is easier for me to work with than epoxy, and it has done a fine job.


Thin CA will wick into the grain of thin veneers acting as a stabilizer. Epoxy is thicker and doesn't penetrate as deeply.

There are methods to get epoxy deeper, but they require significant equipment. Search for "stabilized wood" if you're curious.


I don't think that is true. I build and restore both wooden and fiberglass boats with epoxy, and have used it in almost every possible way. There are different thicknesses of epoxy with different properties, but the ones specifically designed for penetrating deeply into wood, such as clear penetrating epoxy sealer, will indeed penetrate extremely deep into wood; the manufacturer claims 9-16". In practice, almost any epoxy will penetrate at least 1" into wood.

If anything, epoxy often has too much penetration, and I end up doing a first coat or two that disappear fully into the wood, and another thickened one so it actually stays on the surface or joint.


Fingers change size, but wooden rings can't be stretched.


They can be sanded, just get a thick ring!


Yes, but that's generally not something you want to be doing the week before a wedding. It's _very_ easy to forget to do, and hard for the best man to run around and fix while you panic.

I had enough trouble SHINING MY SHOES. :)


Curious how these numbers correlate to the estimates of the engineers behind the PRs?

For example, the first PR is correlated with ~15 "hours of work for an expert engineer".

Looking at the PR, it was opened on Sept 18th and merged on Oct 2nd. That's two weeks, or 10 working days, later.

Between the initial code, the follow up PR feedback, and merging with upstream (8 times), I would wager that this took longer than 15 hours of work on the part of the author.

It doesn't _really_ matter, as long as the metrics are proportional, but it may be better to refer to them as isolated complexity hours, as context-switching doesn't seem to be properly accounted for.


Yeah maybe "expert engineer" is the wrong framing and it should be "oracle engineer" instead - you're right that we're not accounting for context switching (which, to be fair, is not really productive right?)

However, ultimately what matters isn't the absolute number but rather the relative difference (e.g. from PR to PR, or from team to team) - that's why we show industry benchmarks and make it easy to compare across teams!


What a fantastic read, thanks for posting!

Also: this is a great reminder that “history” is oft in the eye of the beholder.


Indeed, a fantastically written article!


This reminds me of a couple of startups I knew running Node.js circa 2014, where they would just restart their servers every night due to memory issues.

IIRC it was mostly folks with websocket issues, but fixing the upstream problem was harder.

10 years later and specific software has gotten better, but this type of problem is certainly still prevalent!

