ankit219's comments | Hacker News

Building with non-deterministic systems isn't new, and it doesn't take a scientist, though people with experience building them are fewer in number today. You saw the same thing with TCP/IP, where we ended up designing systems that assumed the underlying randomness and made sure it wasn't passed on to the next layer. And for any game, given the latency involved in older networks, there was no way networked play was deterministic.

Doesn't any kind of human in the loop make a system non-deterministic?

From the Incognito window's note:

> Others who use this device won't see your activity, so you can browse more privately. This won't change how data is collected by websites you visit and the services they use, including Google. Downloads, bookmarks and reading list items will be saved.

Incognito does not hide your activity from Google, especially when you search Google in incognito, and they likely use IP addresses as part of their targeting. I am also assuming it's different for different kinds of ads, given you won't see ads related to something personal you looked at. They in fact allow IP address targeting somehow. [1] Their privacy stance is more about third parties not having access to the data Google has collected.

[1]: https://www.shopifreaks.com/google-to-allow-the-use-of-ip-ad...


The bottleneck for automation is verification. With human work, verification was fast(er) because you knew where to look, on the assumption that your upstream tasker would not have made trivial mistakes. For automation, AI needs to verify its own work, review it, and self-correct in order to automate any given task. Where this works, it will also change the abstraction layer compared to what it is today. The problem is the same as with every automation promise: it needs to work reliably, say 95% or 99% of the time, and when it doesn't, there should be a human contingency with clear guidance on what to look for. Coding is the first example, and it's already underway: AI generates the code and the test cases, then verifies the code works as intended. Code has a built-in verification layer (both the compiler and unit tests), and there is a high probability other domains move towards something similar. I would also say the model needs to be intelligent enough to course-correct when the output isn't validated [1].
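A rough sketch of that loop in Python, purely for illustration (the `generate_patch` call is a hypothetical stand-in for whatever code-generation call you use; the verification step is just the interpreter plus the test suite):

    import subprocess

    def verify(workdir: str) -> tuple[bool, str]:
        # Syntax/compile check first, then the unit tests.
        compile_step = subprocess.run(
            ["python", "-m", "compileall", "-q", workdir],
            capture_output=True, text=True)
        if compile_step.returncode != 0:
            return False, compile_step.stderr
        tests = subprocess.run(
            ["python", "-m", "pytest", workdir, "-q"],
            capture_output=True, text=True)
        return tests.returncode == 0, tests.stdout + tests.stderr

    def automate(task: str, workdir: str, max_attempts: int = 3) -> bool:
        feedback = ""
        for _ in range(max_attempts):
            generate_patch(task, workdir, feedback)  # hypothetical model call
            ok, feedback = verify(workdir)
            if ok:
                return True   # output verified, no human needed
        return False          # human contingency kicks in here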

Verification removes the human-in-the-loop dependency for both AI and human tasks. All the places where we could automate in the past had clear quality checks that ensured the machinery was working as expected. The same thing will be replicated with AI.

Disclaimer: I have been working on building a universal verifier for AI tasks. The way it works is you give it a set of rules (a policy) plus an AI output (it could be human output too), and it returns a scalar score plus clause-level citations. So I have been thinking about this problem space and might be overrating it. Would welcome contrarian ideas. (No, it's not LLM-as-a-judge.)
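To make the shape concrete, here is the interface as a sketch in Python (illustrative names only, not the actual implementation):

    from dataclasses import dataclass

    @dataclass
    class ClauseCitation:
        clause_id: str   # which policy clause was checked
        satisfied: bool
        evidence: str    # the span of the output the judgement points at

    @dataclass
    class Verdict:
        score: float                     # single scalar, e.g. in [0, 1]
        citations: list[ClauseCitation]  # one entry per policy clause

    def verify(policy: list[str], output: str) -> Verdict:
        """Score `output` against every clause in `policy`."""
        ...  # scoring logic omitted; not an LLM-as-a-judge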

[1]: Some people may call it environment-based learning, but in ML terms I feel it's different. That would be another example of SV startups using technical terms to market themselves when they don't do what they say.


One thing that comes to mind: You still have to verify that the tests are exhaustive, and that the code isn't just gaming specific test scenarios.

I guess fuzzing and property-based testing could mitigate this to some extent.
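Something like this, with the hypothesis library (`sort_list` is a stand-in for model-generated code; the point is the asserted properties must hold for arbitrary generated inputs, not just the cases the model wrote tests for):

    from hypothesis import given, strategies as st

    def sort_list(xs):  # stand-in for model-generated code under test
        return sorted(xs)

    @given(st.lists(st.integers()))
    def test_output_is_an_ordered_permutation(xs):
        out = sort_list(xs)
        assert all(a <= b for a, b in zip(out, out[1:]))  # ordered
        assert sorted(out) == sorted(xs)                  # same elements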


Yes, we are getting there. I think the compiler is a bigger problem than unit tests, given most verticals don't even have an equivalent. With unit tests there will be some reward hacking, but it can be controlled at the model level plus the tests themselves. (This is one of the reasons I don't believe in a transformer-based LLM-as-a-judge as a verifier.)

This is super cool. You usually don't see effective models at 270M out in the wild. The architectural choices are new and interesting as well.

Would it be okay for you to divulge some more training information here? With 170M embedding parameters, how do you ensure there is no embedding collapse and keep the embedding matrix stable at training time?

(I know I am asking too much, but I'm just curious.) There is a clear trade-off for you between vocab and transformer layers. How did you arrive at the 170M/100M split? Does this contribute to the model's performance on task-specific fine-tuning? Any internal experiments you could share, or public info you could point us to? Anything would be amazing.
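For context, my back-of-the-envelope reading of where the 170M comes from, assuming a Gemma-style ~256k vocabulary and a hidden width of 640 (both assumptions on my part, not stated specs):

    vocab_size = 262_144   # assumed 256k-token Gemma vocabulary
    hidden_dim = 640       # assumed hidden width for the 270M model
    embedding_params = vocab_size * hidden_dim
    print(f"{embedding_params / 1e6:.0f}M")  # ~168M, i.e. roughly the 170M embedding side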

PS: I am sorry if this is rude, but this model has so many decisions I am curious about. Not intending to undermine anything; this is amazing work, and thank you for the whole Gemma series.


Not rude at all and I'll again share what I can.

We ran a bunch of experimental architectures at this size to get a sense of performance, in particular how well each was able to adapt to datasets across some loss measures.

For the embedding size, it comes from a mix of "hard technical" data, like the loss measures I mentioned above, and for this model it also comes from community considerations such as adaptability across input tokens and consistency with the Gemma ecosystem. At this size, you are right, it's a bit funny that the embedding is so large.

For more details, read the Gemma 3 technical report: https://arxiv.org/pdf/2503.19786. It doesn't cover the 270M model, as it was written for the 1B-to-27B Gemma 3 release, but it'll answer some of your questions. As for the 270M, we may share more information in the future; up until now we were just focused on getting the model out there.


There is no justification for a $100k number. At $100k a year, or about $8k a month, you would end up using roughly 1B tokens a month (and that assumes a generous blended $8 per million input/output tokens including caching, while the real blended rate is lower). Per person.
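The arithmetic, spelled out (the $8 per million blended rate is the generous assumption above, not a quoted price):

    annual_spend = 100_000             # $ per person per year
    monthly_spend = annual_spend / 12  # ~$8,333 per month
    blended_price = 8                  # assumed $ per million tokens, blended
    tokens_per_month = monthly_spend / blended_price * 1_000_000
    print(f"{tokens_per_month / 1e9:.2f}B tokens per month")  # ~1.04B per person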

I think there is a case that Claude did not reduce their pricing because they have the best coding models out there. Their recent fundraise had them disclose gross margins of 60% (and -30% on usage via Bedrock etc.). This way they can offer 2.5x more tokens at the same price as the vibe-coding companies and still break even. The market movement where the assumption did not work out was about how we still only have Claude, which made vibe coding work and is the most tasteful when it comes to what users want. There are probably models better at thinking and logic, especially o3, but this signals Claude's staying power: it has lock-in and popularity, and it challenges the more fundamental assumption that language models are commodities.

(Speculating) Many companies would want to move away from Claude but can't, because users love the models.


This is a highly speculative post, with conjectures presented as facts. Some things that irked me:

- Cursor did not hire Anthropic's "researchers". It hired the people who built Claude Code (a PM and a dev), who then promptly went back to Anthropic within 14 days. A researcher for Cursor need not come from Anthropic either; one high-profile recruit for them was Jack Gallagher (Midjourney), who is probably one of the best at RL.

- Google's deal with Windsurf is structured that way because they likely could not acquire it directly, or were not confident it would have gotten past antitrust review. A signal for that is that such deals have increased in the last few years, after the FTC refused to allow any deal over $100M or so. Microsoft has done such deals too, and Meta would have acquired Scale AI in older times. I'm not sure about OpenAI, but they aren't as scrutinized as Google for such deals. To imply this means Google did not care about ARR is not justified. And then Google licensed Windsurf's IP too.

- OpenAI's agreement with Microsoft is a more probable reason they did not complete the acquisition than negative gross margins.

- Plus, there's the old adage that a growing startup is worth more because of a stellar team. If you strip the team away and still get a 2x multiple, that is surely valuing the current ARR highly.

I thought the user base is what's valuable. A sale at this point made sense because they might not have been able to get that money if they waited a year. The reasons laid out in the article are not why I think so.


Every single article on this domain is just human hallucination. Little to no research or due diligence done.

By the way, the latest article before this one was "tokens are getting more expensive" (one week before the $1.25/$10 GPT-5 release; talk about aging like milk...).


There are tells that it’s not written by a human, but it’s harder to know how much it’s guided.

> Little to none research and due diligence done.

There has never been more drama in tech. 007-level drama with Chinese and Russian spies.

Can we just relax and have fun?


[flagged]


Thank you for reminding me words have meanings! I never knew that.

I'll keep using "human hallucination" to describe bullshit articles about AI though.


> Google's deal with Windsurf is structured that way because they likely could not acquire it directly, or were not confident it would have gotten past antitrust review. A signal for that is that such deals have increased in the last few years, after the FTC refused to allow any deal over $100M or so.

Any proof of this? It's quite speculative. Also, FTC scrutiny is not escaped by acquiring only a percentage of a company to avoid antitrust review (as you claim Meta did, speaking of which...).

> Meta would have acquired Scale AI in older times

According to reporting, Meta was solely interested in Wang and his inner circle, and did not want to acquire a significant stake in the company. Wang negotiated them UP. It's not as if they wanted to buy the whole thing at its previous valuation, let alone a higher valuation. (source: https://archive.is/ZPoNJ)


This is speculated to be the reason blockbuster acquihires have risen:

https://www.bloomberg.com/opinion/articles/2025-07-17/meta-g...

https://natlawreview.com/article/rise-acquihiring-post-layof...

https://bowoftheseus.substack.com/p/update-the-gut-and-licen...

Meta's case is interesting. In the past, for what they wanted, I still feel they would have just acquired the company and been done with it. Now they explored more paths and ended up negotiating for a stake in Scale AI.


Did you read the articles you linked? Here's one excerpt:

> The data shows that after four years of being frozen out of the acquihire market by former FTC chair Lina Khan, big tech companies are back with a vengeance

That (and the accompanying chart showing total acquihires over years) says acquihires existed before regulatory scrutiny, stopped while that scrutiny ramped up, and came back when the scrutiny went away. Not what you suggested, that it's a novel tool used to avoid scrutiny.

I appreciate how you feel, but it's ultimately based on just a feeling, not any statistics (the stats in your linked articles paint an entirely different picture of acquihires). Also, there is the basic fact that FTC scrutiny cannot be avoided by minority ownership or acquihiring alone; the agency has the right and ability to investigate even minority stake purchases. This is a good case study of that: https://www.faegredrinker.com/en/insights/publications/2018/...


> Google's deal with Windsurf is structured that way because they likely could not acquire it directly, or were not confident it would have gotten past antitrust review

I've been following OpenAI, Google, and Microsoft's acquisitions over the last five years, and the US government has given them the green light when it comes to AI. It makes sense since the FTC and DOJ directors are appointed by the government, and the government is concerned about China's advances in AI.

Also, Google pulled the same move Microsoft did with Inflection AI. They hired Windsurf's CEO, its co-founder, and other key people, and licensed Windsurf's codebase without acquiring the company. It was the smartest business move they could make.

So from a business and political point of view, your assumption doesn't hold up.

> OpenAI's agreement with Microsoft is a more probable reason they did not complete the acquisition than negative gross margins.

This is also incorrect, and it's the first time I've heard this reason. Executives have already told reporters that OpenAI and Microsoft have an agreement, and Microsoft doesn't want OpenAI entering the software development arena. They hold the keys to GitHub, and that's keeping everyone out for now, including Google.

> Plus, there's the old adage that a growing startup is worth more because of a stellar team. If you strip the team away and still get a 2x multiple, that is surely valuing the current ARR highly.

I don't think so. Investors back people first, and in AI, the people are everything. Just look at how much Meta is willing to pay top AI researchers. OpenAI, Microsoft, and Google are all chasing the same talent. Knowledge is extremely valuable when it comes to AI/ML.

Google learned this the hard way when it let Noam Shazeer leave. When they realised how valuable he was, they ended up paying $2.7 billion to bring him back.


Meta's case might be different, and hence I wrote that Scale would have been acquired in past times instead of the investment they made today. For other companies, the hiring route avoids M&A filings and balance-sheet consolidation while still giving the big tech companies access to IP and talent. It's an easier path to getting what you want than, say, the Wiz acquisition, which is still waiting to be completed. The stance may have recently changed, but it seems the big tech companies are not testing it. (It's also not just the US; other jurisdictions have a say too. Meta had to sell Giphy due to issues raised by the UK/Europe.)

Re: Microsoft and Windsurf, this article [1] claims Windsurf did not want MS accessing its intellectual property, a default condition of the Microsoft-OpenAI deal.

> Windsurf didn’t want Microsoft to have access to its intellectual property — a condition that OpenAI was unsuccessful in getting Microsoft’s agreement on, people familiar said. That was one of several sticking points in Microsoft and OpenAI’s ongoing talks about the AI company’s effort to restructure into a commercial entity. Microsoft’s existing agreement with OpenAI says the software giant is entitled to access the startup’s technology.

This is different from Microsoft not wanting OpenAI to compete with GitHub Copilot, given that's what they do with Codex anyway.

> Knowledge is extremely valuable when it comes to AI/ML

We agree on that, but even in the past most acquisitions would value teams at a certain cost too. Without the team that built it, any company would struggle to get a good multiple. OP's claim was that the lower multiple was because of margins only, and I disagreed there.

[1]: https://www.bloomberg.com/news/articles/2025-07-11/openai-s-...


Anthropic, in their latest fundraise, announced that their gross margins are around 60%. Typically in the industry now, gross margin is reported as revenue minus the cost of serving only paid users (free users etc. go into either CAC or R&D). The thing is, with this margin Anthropic can serve 2.5x more tokens than, say, Cursor at the same price without making a loss. This reality is likely going to hit the other startups built on the same assumption too. Anthropic has not reduced its pricing in almost a year now.
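The 2.5x is straight margin math (rounded, and assuming serving cost scales linearly with tokens served):

    gross_margin = 0.60
    cost_fraction = 1 - gross_margin   # serving cost is ~40% of the price charged
    breakeven_multiple = 1 / cost_fraction
    print(breakeven_multiple)          # 2.5x the tokens at the same price, at zero margin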


> cost of serving only paid users (free users etc. go into either CAC or R&D)

Then they should ideally be quoting margins both including and excluding free users.


They aren't public and they pitch to sophisticated investors who can probably tell the difference.

I think the CLI is a good idea for now. The next abstraction seems to be GitHub PRs, where someone (likely me) files an issue or feature request, then I click a button and the agent fixes it. GitHub has talked about something similar, but it was a pain to figure out whether it was GA and whether I had access to it, given the many different variations they have called GitHub Copilot. (PS: it exists, but it's not as smooth as I described: https://docs.github.com/en/copilot/how-tos/use-copilot-agent...)


You can already have that with Jules. It's quite impressive.

https://jules.google/


This was a version where they wanted to collect data. In the next version, the Gemini CLI/Jules harness is likely part of the RL training environment, and the model will work a lot better. That's the trajectory of improvement.


Likely the OP does not mean AI slop, but rather a signal of human carelessness, in that they could not write it in a proper manner.

