More

billmalarky · 2025-08-07T18:24:13 1754591053

Hi Swyx I always appreciate your insights, something you wrote really resonated with a personal theory I've been developing:

>"While I never use AI for personal writing (because I have a strong belief in writing to think)"

The optimal AI productivity process is starting to look like:

AI Generates > Human Validates > Loop

Yet cognitive generation is how humans learn and develop cognitive strength, as well as how they maintain such strength.

Similar to how physical activity is how muscles/bone density/etc grow, and how body tissues maintain.

Physical technology freed us from hard physical labor that kept our bodies in shape -- at a cost of physical atrophy.

AI seems to have a similar effect for our minds. AI will accelerate our cognitive productivity, and allow for cognitive convenience -- at a cost of cognitive atrophy.

At present we must be intentional about building/maintaining physical strength (dedicated strength training, cardio, etc).

Soon we will need to be intentional about building/maintaining cognitive strength.

I suspect the workday/week of the future will be split on AI-on-a-leash work for optimal productivity, with carve-outs for dedicated AI-enhanced-learning solely for building/maintaining cognitive health (where productivity is not the goal, building/maintaining cognition is). Similar to how we carve out time for working out.

What are your thoughts on this? Based on what you wrote above, it seems you have similar feelings?

Is there a name for this theory?

If not can you coin one? You're great at that :)

tailspin2019 · 2025-08-07T22:39:32 1754606372

This is very interesting - I like the way you’ve explained this.

The parallel with “intentionally working out to maintain physical strength” is extremely helpful as an analogy to communicate this concept.

That’s exactly what we might be faced with… cognitive atrophy…

It’s arguably already started, and is accelerating!

swyx · 2025-08-07T21:44:43 1754603083

thanks very much :)

problem with your theory is it bundles 2-3 steps which each could be their own theses

suggest you nail those down before building up to a general bundle (or mental model/framework)

billmalarky · 2025-08-07T23:52:48 1754610768

Ah, I probably should have listed some of the "assumptions" I'm developing it on top of:

1) Regarding the "generation is how learning occurs" claim, I'm going off of this:

https://www.learningscientists.org/blog/2024/3/7/how-does-re...

Granted, that article refers to retrieval specifically being one major way we learn, and of course learning incorporates many dimensions. But it seems a bit self-evident that retrieval occurs heavily during active problem solving (ie "generation"), and less so during passive learning (ie: just reading/consuming info).

From personal experience, I always noticed I learned much more by doing than by consuming documentation alone.

But yes, I admit this assumption and my own personal experience/bias is doing a lot of heavy lifting for me...

2) Regarding the "optimal AI productivity process" (AI Generates > Human Validates > Loop)

I'm using Karpathy's productivity loop described in his AI startup school talk last month here:

https://youtu.be/LCEmiRjPEtQ?t=1327

Does this help make it more concrete Swyx (name dropping you here since I'm pretty sure you've got a social listener set for your handle ;)? Love to hear your thoughts straight from the hip based on your own personal experiences.

Full disclosure: I'm not trying to get too academic about this. In all honestly I'm really trying to get to an informal theory that's useful and practical enough that it can be turned into a regular business process for rapid professional development.

billmalarky · 2025-07-31T18:47:19 1753987639

Search tool calling is RAG. Maybe we should call it a "RAG Agent" to be more en vogue heh. But RAG is not just similarity search on embeddings in vector DBs. RAG is any type of a retrieval + context injection step prior to inference.

Heck, the RAG Agent could run cosign diff on your vector db in addition to grep, FTS queries, KB api calls, whatever, to do wide recall (candidate generation) then rerank (relevance prioritization) all the results.

You are probably correct that for most use cases search tool calling makes more practical sense than embeddings similarity search to power RAG.

visarga · 2025-07-31T21:04:53 1753995893

> could run cosign diff on your vector db

or maybe even "cosine similarity"

billmalarky · 2025-08-01T18:49:35 1754074175

word ;)

billmalarky · 2025-04-20T18:05:18 1745172318

I built a distributed software engineering firm pre-covid, so all of our clients were onsite even though we were full-remote. My engineers plugged into the engineering teams of our clients, so it's not like we were building on the side and just handing over deliverables, we had to fully integrate into the client teams.

So we had to solve this problem pre-covid, and the solution remained the same during the pandemic when every org went full remote (at least temporarily).

There is no "one size fits all approach" because each engineer is different. We had dozens of engineers on our team, and you learn that people are very diverse in how they think/operate.

But we came up with a framework that was really successful.

1) Good faith is required: you mention personnel abusing time/trust, that's a different issue entirely, no framework will be successful if people refuse to comply. This system only works if teammates trust the person. Terminate someone who can't be trusted.

2) "Know thyself": Many engineers wouldn't necessarily even know how THEY operated best (if they needed large chunks of focus time, or were fine multi-tasking, etc). We'd have them make a best guess when onboarding and then iterate and update as they figured out how they worked best.

3) Proactively Propagate Communication Standard: Most engineers would want large chunks of uninterrupted focus time, so we would tell them to EXPLICITLY tell their teammates or any other stakeholders WHEN they would be focusing and unresponsive (standardize it via schedule), and WHY (ie sell the idea). Bad feelings or optics are ALWAYS simply a matter of miscommunication so long as good faith exists. We'd also have them explain "escalation patterns", ie "if something is truly urgent, DM me on slack a few times and finally, call my phone."

4) Set comms status: Really this is just slack/teams. but basically as a soft reminder to stakeholders, set your slack status to "heads down building" or something so people remember that you aren't available due to focus time. It's really easy to sync slack status to calendar blocks to automate this.

We also found that breaking the day into async task time and sync task time really helped optimize. Async tasks are tasks that can get completed in small chunks of time like code review, checking email, slack, etc. These might be large time sinks in aggregate, but generally you can break into small time blocks and still be successful. We would have people set up their day so all the async tasks would be done when they are already paying a context switching cost. IE, scheduled agile cadence meetings etc. If you're doing a standup meeting, you're already gonna be knocked out of flow so might as well use this time to also do PR review, async comms, etc. Naturally we had people stack their meetings when possible instead of pepper throughout the day (more on how this was accomplished below).

Anyways, sometimes when an engineer of ours joined a new team, there might be a political challenge in not fitting into the existing "mold" of how that team communicated (if that team's comm standard didn't jive with our engineer's). This quickly resolved every single time when our engineer was proven out to be much more productive/effective than the existing engineers (who were kneecapped by the terrible distracting existing standard of meetings, constant slack interruptions, etc). We would even go as far as to tell stakeholders our engineers would not be attending less important meetings (not immediately, once we had already proven ourselves a bit). The optics around this weren't great at first, but again, our engineers would start 1.5-2X'ing productivity of the in-house engineers, and political issues melt away very quickly.

TL;DR - Operate in good faith, decide your own best communication standard, propagate the standard out to your stakeholders explicitly, deliver and people will respect you and also your comms standard.

billmalarky · 2025-04-13T15:20:12 1744557612

I've been lucky enough to have a few conversations with Scott a month or so ago and he is doing some really compelling work around the AISDLC and creating a factory line approach to building software. Seriously folks, I recommend following this guy closely.

There's another guy in this space I know who's doing similar incredible things but he doesn't really speak about it publicly so don't want to discuss w/o his permission. I'm happy to make an introduction for those interested just hmu (check my profile for how).

Really excited to see you on the FP of HN Scott!

billmalarky · 2025-04-11T19:45:03 1744400703

^ this guy knows Jobs To Be Done theory ;)

For those who don't, reading "Competing Against Luck" by Clayton Christensen will dramatically improve your ability to create successful products/services.

billmalarky · 2025-02-25T20:28:22 1740515302

Yes. Absolutely it is. For different workloads it is an insanely effective tool.

billmalarky · 2025-02-25T20:25:37 1740515137

Hi Paul, been following the aider project for about a year now to develop an understanding of how to build SWE agents.

I was at the AI Engineering Summit in NYC last week and met an (extremely senior) staff ai engineer doing somewhat unbelievable things with aider. Shocking things tbh.

Is there a good way to share stories about real-world aider projects like this with you directly (if I can get approval from him)? Not sure posting on public forum is appropriate but I think you would be really interested to hear how people are using this tool at the edge.

tecleandor · 2025-02-26T10:46:26 1740566786

Hope it gets to be public, I love to learn "weird" (or unusual) ways of using tools

billmalarky · 2025-01-13T21:33:59 1736804039

Hey! this is exciting to see! Hi Earl and Oisin! (I've had the pleasure of meeting Earl and Oisin face to face a few times. Really friendly and smart guys, fwiw based on my convos they are very serious about building a compelling product, excited to see it on hn!)

SlyRacoon23 · 2025-01-13T22:09:09 1736806149

Thanks! I appreciated our face to face chats as well :)

billmalarky · 2024-11-14T17:34:04 1731605644

You find issues when they surface during your actual use case (and by "smoke testing" around your real-world use case). You can often "fix" issues in the base model with additional training (supervised fine-tuning, reinforcement learning w/ DPO, etc).

There's a lot of tooling out there making this accessible to someone with a solid full-stack engineering background.

Training an LLM from scratch is a different beast, but that knowledge honestly isn't too practical for everyday engineers given even if you had the knowledge you wouldn't necessarily have the resources necessary to train a competitive model. Of course you could command a high salary working for the orgs who do have these resources! One caveat is there are orgs doing serious post-training even with unsupervised techniques to take a base model and reeaaaaaally bake in domain-specific knowledge/context. Honestly I wonder if even that is unaccessible to pull off. You get a lot of wiggle-room and margin for error when post-training a well-built base model because of transfer learning.

billmalarky · 2024-10-28T20:08:28 1730146108

This post is using regression to build a reward model. The reward model will then be used (in a future post) to build the overall RL system.

Here's the relevant text from the article:

>In this post we’ll discuss how to build a reward model that can predict the upvote count that a specific HN story will get. And in follow-up posts in this series, we’ll use that reward model along with reinforcement learning to create a model that can write high-value HN stories!

jampekka · 2024-10-30T12:42:22 1730292142

The title is misleading. The $4.80 is spent for supervised learning to find the best post.

The post is interesting and I'll be sure to check out the next parts too. It's just that people, as evidenced by this thread, clearly misunderstood or were what was done.