The article does a great job of highlighting the core disconnect in the LLM API economy: linear pricing for a service with non-linear, quadratic compute costs. The traffic analogy is an excellent framing.
One addition: the O(n^2) compute cost bites hardest during the one-time prefill of the input prompt. The real bottleneck, however, is the KV cache during the decode phase.
For each new token generated, the model must access the intermediate state of all previous tokens. This state is held in the KV cache, which grows linearly with sequence length and consumes an enormous amount of expensive GPU VRAM. The speed of generating a response is therefore limited more by memory bandwidth than by raw compute.
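To put rough numbers on that footprint, here's a back-of-the-envelope sketch. All the shape parameters (layer count, KV heads, head dim, fp16 storage) are illustrative assumptions, not taken from any real model card:

```python
# Back-of-the-envelope KV cache size. All shape parameters below are
# assumptions (vaguely 70B-class with grouped-query attention), not real specs.
def kv_cache_bytes(seq_len, n_layers=80, n_kv_heads=8, head_dim=128, bytes_per=2):
    # 2x for keys and values, stored per layer, per KV head, per token
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per

print(f"{kv_cache_bytes(128_000) / 2**30:.1f} GiB per sequence at 128k tokens")
# -> 39.1 GiB of VRAM held for the entire generation, per concurrent request
```

Multiply that by concurrent requests and it's easy to see why long prompts with short outputs are a costly workload shape.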
Viewed this way, Google's 2x price hike on input tokens is probably related to the KV Cache, which supports the article’s “workload shape” hypothesis. A long input prompt creates a huge memory footprint that must be held for the entire generation, even if the output is short.
That obviously should and will be fixed architecturally.
>For each new token generated, the model must access the intermediate state of all previous tokens.
Not all previous tokens are equal; not all deserve the same attention, so to speak. The farther away the tokens, the more opportunity for many of them to be pruned and/or collapsed together with other similarly distant, less meaningful tokens in a given context. So instead of O(n^2) it would be more like O(n log n).
I mean, you'd expect that, for example, "knowledge worker" models (vs., say, "poetry" models) would possess some perturbative stability wrt. changes to, or pruning of, remote previous tokens, at least for those tokens that are less meaningful in the current context.
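A toy sketch of that intuition: keep a dense window of recent tokens plus exponentially sparser "landmark" tokens further back, so each step attends to O(log n) distant positions. Purely schematic, not any real sparse-attention scheme:

```python
# Schematic only: dense attention over the last `recent` tokens, plus one
# landmark per doubling of distance further back -> O(log n) remote positions.
def attended_positions(t, recent=8):
    keep = set(range(max(0, t - recent), t))  # dense recent window
    d = recent
    while t - d >= 0:
        keep.add(t - d)  # one surviving "landmark" per distance doubling
        d *= 2
    return sorted(keep)

print(attended_positions(1000))
# a handful of distant landmarks plus the 8 most recent positions
```

Summing O(log t) over all t steps gives the O(n log n) total the parent comment suggests.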
Personally, I feel the situation is good: performance-engineering work becomes genuinely valuable again as we reach the n where O(n^2) forces management to throw money at engineers instead of at hardware :)
Does anyone know how AI coding fits in with S174? If a person’s “coding” part of the job is primarily running prompts and checking code outputs (quality control and minor reprompting) with the remainder of the time used for other activities, does this count as software engineering?
It seems like an inevitable outcome of this is elaborate system-gaming to mitigate how much employees fall under S174…
I think the most interesting thing to me is they have multi-hop search & query refinement built in based on prior context/searches. I'm curious how well this works.
I've built a lot of LLM applications with web browsing in it. Allow/block lists are easy to implement with most web search APIs, but multi-hop gets really hairy (and expensive) to do well because it usually requires context from the URLs themselves.
The thing I'm still not seeing here, and what makes LLM web browsing particularly difficult, is the mismatch between search-result relevance and LLM relevance. Getting a diverse list of links is great when searching Google, because there is less context per query; but what I really need from an out-of-the-box LLM web-browsing API is reranking based on the richer context provided by a message thread/prompt.
For example, writing an article about the side effects of Accutane should err on the side of pulling in research articles first for higher quality information and not blog posts.
It's possible to do this reranking decently well with LLMs (I do it in my "agents" that I've written), but I haven't seen this highlighted from anyone thus far, including in this announcement.
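For what it's worth, the core of that reranking step can be sketched without an LLM at all: a hypothetical scorer that weighs each result against the whole thread context and boosts source types the context calls for. All names, weights, and URLs here are made up for illustration:

```python
# Minimal sketch of context-aware reranking: score each result against the
# entire message thread, not just the last query. Weights/domains are made up.
def rerank(results, thread_context, boost_domains=(".gov", ".edu", "nih.gov")):
    ctx = set(thread_context.lower().split())
    def score(r):
        overlap = len(ctx & set(r["snippet"].lower().split()))
        boost = 2.0 if any(d in r["url"] for d in boost_domains) else 1.0
        return overlap * boost
    return sorted(results, key=score, reverse=True)

results = [
    {"url": "https://someblog.example/accutane-story",
     "snippet": "my accutane diary"},
    {"url": "https://pubmed.ncbi.nlm.nih.gov/12345/",
     "snippet": "isotretinoin side effects study"},
]
ranked = rerank(results, "writing an article about side effects of Accutane research")
print(ranked[0]["url"])  # the research source outranks the blog here
```

In practice you'd swap the bag-of-words overlap for an LLM or cross-encoder score, but the shape of the pipeline is the same.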
That's been my experience as well. Web search built into the API is great for convenience, but it would be ideal to be able to provide detailed search and reranking params.
Would be interesting to see comparisons for custom web search RAG vs API. I'm assuming that many of the search "params" of the API could be controlled via prompting?
> For example, writing an article about the side effects of Accutane should err on the side of pulling in research articles first for higher quality information and not blog posts.
Interesting, I'm taking isotretinoin right now and I've found it's more interesting and useful to me to read "real" experiences (from reddit and blogs) than research papers.
I just want to hear about how other people have felt while taking the medicine. I don't care about aggregate statistics very much. Honestly what research do you read and for what purpose? All social science is basically junk and most medical research is about people whose bodies and lifestyles are very different than mine.
Wear lots of (mineral) sunscreen, and drink lots and lots of water. La Roche-Posay lotions are what I used, and continue to use with tretinoin. Sunscreen is the most important.
Great advice, already quite on top of it. I'd recommend checking out Stylevana and importing some of the Japanese/Korean sunscreens if you haven't tried them out yet!
Higher-weight polyphenols tend to taste less bitter than lower-weight ones. It's more correlation than causation, though, because I don't think we precisely know why this is.
I wonder if they're talking about the blurred lines between taste and smell. Most of what we 'taste' is happening in our nose.
Turns out the French and Italians, with all of their fancy wine glasses for different kinds of wine, are not insane. Glass shape affects the timing of scent versus taste, so lighter wines get a narrower glass to shorten the time; the heavier the red, the wider the glass.
That would be green tea, no? You pick the green tea, cook it to denature the enzymes and arrest the oxidation (called the "kill-green" step in Chinese), and voila, green tea! Lots of green teas can be quite smooth, and even more so with more careful brewing.
The Brit I mentioned elsewhere seemed to think it was the drying that arrested the chemical processes in the tea.
He was also adamant about storing it in well sealed containers out of direct sunlight. I ended up throwing away a couple of containers because of this (although I've kept a couple that are just too beautiful to part with - I store my daily drinker in there since it doesn't need to keep as long). Also explains why my dealer uses mylar vacuum packs for anything over an ounce. No oxygen, no light.
White tea does not undergo the "kill-green" step that green teas and oolong teas do, IIRC. The drying slows the oxidation but does not arrest it. "Aged" white tea is a thing: if you let it sit around long enough, it turns deep red, whereas green tea just turns into stale tea. They even compress white teas into something similar to those Pu'er cakes.
Yes, green tea needs lower temperature and controlled infusion time, but rewards that. The author definitely does not seem to be a fan and is not doing it justice.
Yeah! Green tea gets fried or steamed right away to halt oxidation. That kills off some of the undesirable bitterness that masks flavors which are present even in fresh leaves. It's not that green tea doesn't have any taste; it's that there are more guardrails over which flavors can appear and how distinct they can be.
The odd thing is that the range (and top end) of antioxidants in white tea is larger than in green.
"Total catechin content (TCC) for white teas ranged widely from 14.40 to 369.60 mg/g of dry plant material for water extracts and 47.16 to 163.94 mg/g for methanol extracts. TCC for green teas also ranged more than 10-fold, from 21.38 to 228.20 mg/g of dry plant material for water extracts and 32.23 to 141.24 mg/g for methanol extracts."
It’s hard to build a product that shows you how clothes fit on someone like you as a B2B service. Retailers don’t want to showcase their clothes on anyone who isn’t anatomically perfect. Plus, if you try to source a more diverse set of “models” from real people wearing clothes, you run into the problem that most people are uncomfortable sharing photos of themselves “modeling” clothes publicly.
You’re also right that fit is only part of the picture, and even the terms "fit" and "style" don’t quite capture what’s really going on: you really want to see what clothes are going to look like on someone who looks like you and dresses like you (same preferences for fit, style, etc.). Again, hard as B2B for sure.
I’ve been working on a B2C solution in this space for a while (fitfirst.app)…all too familiar with the nuances and intricacies in this space.
Fun fact, and totally tangential: if you have two garments with the exact same measurements and material that are dyed different colors, the darker-dyed version tends to feel tighter than the lighter-dyed one. It has to do with how the dye feels on the skin. It’s a nuance you can’t get from a photo or rendering.
Haha yeah, there's a lot of nuance about a fitting room that's hard for one product to solve. Our current product focuses more on styling / outfits / engagement, without claiming to nail exact fit. Hopefully it brings positive value to conversion and AOV, which would be enough to justify a B2B case. We've also built an app (Style Space), but we're not experts in running it.
My partner and I have been creating a database of over a thousand pant measurements that we've personally gathered over the past year or so. We've found it irritating that the only way to shop for apparel is pretty much trial and error (fitting rooms, or manually looking at size charts and hoping they're accurate). I have a super small waist-to-hip ratio (and a super small waist) for a man, so I run into two problems:
1. Usually stores don't carry the right waist size for pants I want to buy, so I can't try them on.
2. When I buy online, stuff that's in my size is usually too tight around the seat of my pants.
That's why I thought it'd be fun to create a way of browsing pants that let you see the differences between two pairs of pants so you could figure out if something is even worth trying to buy.
Over the past couple days, my partner and I put together a tool in D3 that overlays pants and compares their measurements everywhere.
Let me know what you think, and let me know if you have any technical questions about it.
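For the curious, the comparison underneath an overlay like that is conceptually simple: diff the two pairs' measurements field by field. A sketch with entirely made-up measurements and field names:

```python
# Hypothetical sketch of the comparison behind the overlay. The garment
# names and all numbers below are invented for illustration.
def compare(pants_a, pants_b):
    """Per-field delta (in inches) from pants_a to pants_b."""
    return {k: round(pants_b[k] - pants_a[k], 2) for k in pants_a}

pair_one = {"waist": 32.0, "hip": 40.0, "thigh": 22.5, "inseam": 32.0}
pair_two = {"waist": 31.5, "hip": 39.0, "thigh": 21.5, "inseam": 31.0}
print(compare(pair_one, pair_two))
# a negative delta means the second pair is smaller at that measurement
```

The D3 part is then just drawing each measurement as an outline and layering them.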
I’ve started picking the bottom result of Google on page ten for fun... you get some really wacky content that clearly isn’t optimized for SEO, or isn’t relevant at all.
Example: last night I searched “Robin Williams bipolar” and got a page ten result that was a conspiracy theory on Taylor Swift being a psychopath.
Eh, I think I found the one you're talking about, and it still looks like a somewhat legitimate news site, and according to other sources it still gets thousands of pageviews a day. Also, that's such a specific topic that by page 10 most of the results are just somewhat related news articles. Try searching something more general, and the results on page 10 are things that could easily be on the front page; they probably just don't have dedicated SEO.
I want to see what the other 90% of the internet looks like.
But also, I think they should break traffic down by domains, not pages. For example, does each article on HN count as a new page?
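Grouping by domain is a fairly mechanical transform if the raw data is per-page. A sketch, assuming rows of (url, visits) with invented numbers:

```python
# Sketch: collapse page-level traffic rows into domain-level totals, so
# every HN item page counts toward news.ycombinator.com. Numbers are made up.
from collections import Counter
from urllib.parse import urlparse

def by_domain(rows):
    totals = Counter()
    for url, visits in rows:
        totals[urlparse(url).netloc] += visits
    return totals

rows = [
    ("https://news.ycombinator.com/item?id=1", 120),
    ("https://news.ycombinator.com/item?id=2", 80),
    ("https://example.com/post", 50),
]
print(by_domain(rows))  # both HN pages collapse into one domain entry
```

Under that view, each HN article would stop counting as a separate "page" and just add to the site's total.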