singron's comments | Hacker News

"total comp" (salary+equity) is really hard to quantify for a private company. In order to qualify as ISOs, the stock options need to be priced at the Fair Market Value (FMV), which makes them essentially worth ~$0 on paper on the day they are granted. In order to value them differently, you need to guess if/how the company will increase in value in the future. If the gains were guaranteed, then that should be factored into the current FMV, so options always have significant uncertainty.

This is unlike an RSU from a public company, where you can sell your shares as they vest and count that value as income, with only minor risk from price volatility.


Are NASA employees even a significant part of SLS? Doesn't the bulk of the money go to Boeing and Northrop Grumman?

I have a similar philosophy for low priority tickets. Some people say it's not worth filing a low priority ticket since we can't even do the medium priority ones. I think it's still valuable since (1) it lets you write it down and get it out of your mind, and (2) you can track repeated instances or anything else that might cause you to increase the priority.

I tried using aider with Godot. Claude Code would probably be better. Aider with 4o/o3-mini wasn't very good at GDScript, and it was terrible at editing tres/tscn files (which are usually modified through the editor). If you have a very code-centric game, it could turn out OK, but if you have resources/assets that you normally edit with special programs, it is going to struggle.


You should be using the best models. Try o3-pro


I haven't used Claude Code a lot, but I was spending about $2-$5/hour, though it varied a lot. If I used it 6 hours/day and worked a normal 21-workday month (126 hours), then I would rack up $250-$630/month in API costs. I think I could be more efficient with practice (maybe $1-$3/hour?). If you think you are seriously going to use it, then the $100/month or $200/month subscriptions could definitely be worth it as long as you aren't getting rate limited.

If you aren't sure whether to pull the trigger on a subscription, I would put $5-$10 into an API console account and use CC with an API key.


I don't think there is any discussion of the snort-2 and snort-3 benchmarks, where the linear engine handily beats Python's re for once (70-80x faster). I'm guessing those are cases where backtracking is painfully quadratic in re, but it would have been nice to hear about those successes. [In the rest of the benchmarks, Python's re is 2-5x faster.]


You need to train new models to advance the knowledge cutoff. You don't necessarily need to R&D new architectures, and maybe you can infuse a model with new knowledge without completely training from scratch, but if you do nothing the model will become obsolete.

Also, the SemiAnalysis estimate is from Feb 2023, which is before the release of GPT-4, and it assumes 13 million DAU. ChatGPT now has 800 million WAU, so that's somewhere between 115 million and 800 million DAU. E.g. if we prorate the COGS estimate for 200 million DAU, then that's 15x higher, or $3.75B.


> You need to train new models to advance the knowledge cutoff

That's a great point, but I think it's less important now with MCP and RAG. If VC money dried up and the bubble burst, we'd still have broadly useful models that wouldn't be obsolete for years. Releasing a new model every year might be a lot cheaper if a company converts GPU opex to capex and accepts a long training time.

> Also the semianalysis estimate is from Feb 2023,

Oh! I missed the date. You're right, that's a lot more expensive. On the other hand, inference has likely gotten a lot cheaper (in terms of GPU TOPS) too. Still, I think there's a profitable business model there if VC funding dries up and most of the model companies collapse.


So the "multiply by 12" thing is a slight corruption of ARR, which should be based on recurring revenue (i.e. subscriptions). Subscriptions are harder to game by e.g. channel-stuffing and should be much more stable than non-recurring revenue.

To steelman the original concept, annual revenue isn't a great measure for a young, fast-growing company since you are averaging all the months of the last year, many of which aren't indicative of the trajectory of the company. E.g. if a company only had revenue for the last 3 months, annual revenue is a bad measure. So you use MRR to get a better notion of instantaneous revenue, but you need to annualize it to make it a useful comparison (e.g. to compute a P/E ratio), hence ARR.

Private investors will of course demand more detailed numbers like churn and an exact breakdown of "recurring" revenue. The real issue is that these aren't public companies, so they have no obligation to report anything to the public, and their PR teams carefully select a couple of nice-sounding numbers.


I don't think there is a 10th amendment violation or a question of federal authority. States can't be compelled to perform federal law enforcement because of the 10th amendment. States are accordingly allowed to prevent their own law enforcement from performing federal law enforcement. If state law enforcement aids the feds anyway, then they are just breaking state law.

A 10th amendment violation would be if the feds require the state to perform federal law enforcement.

Federal authority would be relevant if they had, e.g., raided state law enforcement offices to take the data without consent, but in this case they were simply given the data by state officers.


We don't know what degree of pressure was or was not exerted on state authorities to compel them to support ICE.

Also, I don't think sharing data would be considered enforcing federal law.


Polling is the way to go, but it's also very tricky to get right. In particular, it's non-trivial to make a reliable queue that's also fast when transactions are held open and vacuum isn't able to clean tuples. E.g. "get the first available tuple" might have to skip over 1000s of dead tuples.

Holding transactions open is an anti-pattern for sure, but it's occasionally useful. E.g. pg_repack keeps a transaction open while it runs, and I believe vacuum also holds an open transaction for part of its run. It's also nice if your database doesn't melt whenever this happens by accident.
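
For reference, here's a minimal sketch of the kind of polling query under discussion (the jobs table and its id/status/created_at/payload columns are hypothetical):

    -- Claim one pending job; SKIP LOCKED skips rows other workers hold,
    -- but the index scan may still wade through dead tuples from
    -- already-processed jobs until vacuum cleans them up.
    UPDATE jobs
       SET status = 'running'
     WHERE id = (
             SELECT id
               FROM jobs
              WHERE status = 'pending'
              ORDER BY created_at
              LIMIT 1
              FOR UPDATE SKIP LOCKED
           )
    RETURNING id, payload;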


An approach that has worked for me is to hash partition the table and have each worker look for work in one partition at a time. There are a number of strategies depending on how you manage workers. This allows you to only consider 1/Nth of the dead tuples, where N is the number of partitions, when looking for work. It does come at the cost of strict ordering, but there are many use cases where strict ordering is not required. The largest scale implementation of this strategy that I have done had 128 partitions with a worker per partition pumping through ~100 million tasks per day.

I also found LISTEN/NOTIFY to not work well at this scale and used a polling-based approach with a backoff when no work was found.

Quite an interesting problem and a bit challenging to get right at scale.
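
A rough sketch of the hash-partitioning layout described above (names, types, and the 128-way split are illustrative, not the original implementation):

    -- Queue table hash-partitioned so each worker only scans its slice.
    CREATE TABLE jobs (
        id      bigint NOT NULL,
        status  text   NOT NULL DEFAULT 'pending',
        payload jsonb
    ) PARTITION BY HASH (id);

    CREATE TABLE jobs_p0 PARTITION OF jobs
        FOR VALUES WITH (MODULUS 128, REMAINDER 0);
    -- ...repeat for remainders 1 through 127

    -- A worker assigned partition 0 polls only that partition, so it
    -- only ever skips over roughly 1/128th of the dead tuples.
    SELECT id FROM jobs_p0
     WHERE status = 'pending'
     ORDER BY id
     LIMIT 1
     FOR UPDATE SKIP LOCKED;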


Can't change the number of partitions dynamically.

Additional challenge if jobs come in funny sizes.


Depending on exactly what you need, you can often fake this with a functional index on mod(queue_value_id, 5000). You then query for mod(queue_value_id, 5000) between m and n, and you can dynamically adjust the gap between m and n based on how many partitions you want.
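
A minimal sketch of what that could look like (the queue table, queue_value_id column, and the 0..39 range are assumptions for illustration):

    -- Expression index over a synthetic "partition number".
    CREATE INDEX queue_mod_idx ON queue (mod(queue_value_id, 5000));

    -- A worker responsible for partitions m..n (here 0..39) polls with:
    SELECT queue_value_id
      FROM queue
     WHERE mod(queue_value_id, 5000) BETWEEN 0 AND 39
     LIMIT 1
     FOR UPDATE SKIP LOCKED;

    -- Widening or narrowing the 0..39 range effectively resizes a
    -- worker's share without repartitioning the table.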


If there were a toy or other public implementation of this, I would love to see it.


This is how Kafka does it. Kafka has spent years working on the rough edges (e.g. partition resizing), though I haven't used it recently.


Dead tuples are a real and significant problem, not just because queries have to skip over them, but because the statistics that drive the planner don't account for them.

I found this out the hard way when I had a simple query that suddenly got very, very slow on a table where the application would constantly do a `SELECT ... FOR UPDATE SKIP LOCKED` and then immediately delete the rows after a tiny bit of processing.

It turned out that with a nearly empty table containing about 10-20k dead tuples, the planner switched to a different index scan and would overfetch tons of pages just to discard them, as they only contained dead tuples. What I didn't realize is that the planner statistics don't care about dead tuples, and ANALYZE doesn't take them into account. So the planner started to think the table was much bigger than it actually was.

It's really important for these use cases to tweak the autovacuum settings (which can be set on a per-table basis) to be much more aggressive, so that under high load, the vacuum runs pretty much continuously.

Another option is to avoid deleting rows, but instead use a column to mark rows as complete, which together with a partial index can avoid dead tuples. There are both pros and cons; it requires doing the cleanup (and VACUUM) as a separate job.
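
A hedged sketch of both mitigations (the jobs table, column names, and the specific thresholds are assumptions, not recommendations):

    -- More aggressive autovacuum for just this table: vacuum after
    -- roughly 1000 dead tuples rather than a percentage of the table,
    -- and don't throttle the vacuum worker.
    ALTER TABLE jobs SET (
        autovacuum_vacuum_scale_factor = 0,
        autovacuum_vacuum_threshold    = 1000,
        autovacuum_vacuum_cost_delay   = 0
    );

    -- Mark-complete variant: keep the row, flip a flag, and let a
    -- partial index keep completed rows out of the polling scan.
    ALTER TABLE jobs ADD COLUMN done boolean NOT NULL DEFAULT false;
    CREATE INDEX jobs_pending_idx ON jobs (created_at) WHERE NOT done;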


Unfortunately, updating the row also creates dead tuples. It's very tricky!


It does, but because of how indexes work, I believe it won't be skewed by the presence of dead tuples (though the bloat can cause the live data to be spread across a lot more blocks and therefore generate more I/O), as long as you run autoanalyze semi-regularly.


It depends on if you are getting Heap Only Tuples (HOT) updates or not. https://www.postgresql.org/docs/current/storage-hot.html

In this case, you might have enough dead tuples across your heap to get a lot of HOT updates. If you are processing in insertion order, you will also probably process in heap order, and you can actually get zero HOT updates since the other tuples in the page aren't fully dead yet. You could try using a lower fillfactor to avoid this, but that's also bad for performance, so it might not help.
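
For reference, fillfactor is a per-table storage parameter (the jobs table and the value 70 are arbitrary here, and as noted above it may not help):

    -- Leave ~30% of each heap page free so updates can land on the same
    -- page, which HOT requires; this only affects newly written pages,
    -- and the extra space is itself a cost.
    ALTER TABLE jobs SET (fillfactor = 70);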


If you have a "done" column that you filter on using a partial index, then it would never use HOT updates anyway, since HOT requires that none of the modified columns have an index.


False.

As of PG16, HOT updates are tolerated against summarizing indexes, such as BRIN.

https://www.postgresql.org/docs/16/storage-hot.html

Besides, you probably don't want "done" jobs in the same table as pending or retriable jobs - as you scale up, you likely want to archive them as it provides various operational advantages, at no cost.


Not false. Nobody would ever use BRIN for this. I'm talking about regular indexes, which do prevent HOT.

If you read my earlier comment properly, you'll notice the "done" column is there to avoid deleting rows on the hot path and to keep dead tuples from messing up the planner. I agree that a table should not contain done jobs, but then you risk running into the dead tuple problem. Both approaches are a compromise.


> also fast when transactions are held open

In my linked example, on getting the item from the queue, you immediately set the status to something that you're not polling for - does Postgres still have to skip past these tuples (even in an index) until they're vacuumed up?



