"Doesn't pass my sniff test" is not the purpose of the flag button. Furthermore, it passes my personal sniff test: hundreds of people upvoting it while the top comment is saying it's worthless. Usually the real alpha is in the comments under such things.
Lots of folks working on open-source reasoning models trained with reinforcement learning right now. The best one atm appears to be Alibaba's 32B-parameter QwQ: https://qwenlm.github.io/blog/qwq-32b-preview/
I also recently wrote a blog explaining how reinforcement fine-tuning works, which is likely at least part of the pipeline used to train o1: https://openpipe.ai/blog/openai-rft
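To make that concrete, here's a toy sketch of the grader-based idea in Python (my own illustration of the general loop, not OpenAI's actual pipeline): sample an answer, score it with a grader, and reinforce answers whose reward beats a baseline.

    import random

    # Toy reinforcement fine-tuning loop. The "policy" is a stand-in
    # for the model: a distribution over two canned answers.
    ANSWERS = ["three", "two"]
    policy = [0.5, 0.5]  # P("three"), P("two")
    lr, baseline = 0.05, 0.5

    def grade(answer):
        # Grader: 1.0 for the correct answer, 0.0 otherwise.
        return 1.0 if answer == "three" else 0.0

    for step in range(500):
        idx = random.choices([0, 1], weights=policy)[0]
        reward = grade(ANSWERS[idx])
        # REINFORCE-style update: scale the sampled answer's
        # probability by its advantage, then renormalize.
        policy[idx] *= 1 + lr * (reward - baseline)
        policy = [p / sum(policy) for p in policy]

    print(policy)  # P("three") drifts toward 1.0

A real setup replaces the canned answers with sampled chains of thought and the two-line grader with task-specific verifiers, but the shape of the loop is the same.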
I don't know if I would call it "the best one" when it has "How many r in strawberry" as one of its example questions and, when tried, it arrives at the answer "two".
> Purchasing a .open domain name isn't available to the general public. This particular extension is owned by American Express and currently isn't for sale or open to registration, limiting its use to only selected entities associated with American Express. It's primarily designed to serve the interests of the corporation and its customers' claims or needs.
This is a feature of arxiv that automatically converts text that looks like a link into "this http url". The submitter missed the space after the "." in "...strong reasoning ability. OpenAI has claimed...".
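You can reproduce the failure mode with a naive linkifier (my own sketch; I don't know arxiv's actual pattern). The kicker is that ".open" really is a gTLD, so with the space gone, "ability.OpenAI" contains a plausible-looking domain:

    import re

    # Naive "looks like a link" rule: word.word where the part after
    # the dot is on a TLD list. Since "open" is a real gTLD, the
    # missing space yields the match "ability.Open".
    TLDS = "(?:com|org|net|io|ai|open)"
    pattern = re.compile(r"\b[\w-]+\." + TLDS, re.IGNORECASE)

    text = "...strong reasoning ability.OpenAI has claimed..."
    print(pattern.findall(text))  # -> ['ability.Open']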
whois says no, and it seems there's a close one at https://tld-list.com/tld/open that's owned by (strangely enough) American Express Travel. It could be yet another typo for foo.open.ai, which would work today, no new gTLD required (I mean, they have damn near unlimited money, just buy out whoever owns it now)
In a mathematical conversation, someone suggested to Grothendieck that they should consider a particular prime number. “You mean an actual number?” Grothendieck asked. The other person replied, yes, an actual prime number. Grothendieck suggested, “All right, take 57.”
This paper has been available for a few weeks, and I wrote an article [1] exploring how to apply its inner workings to the design of multi-agent systems. If you can design "reasoning" at the model level, you can also design "reasoning" in larger, more complex systems using the same principles.
- creating more efficient models such as MoE based DeepSeek
- getting their hands on cutting edge GPUs all the same
I think it was Dylan Patel (from semianalysis) on Dwarkesh that mentioned one scam is for a Chinese source to arrange for a SOTA NVidia cluster to be bought/installed in some non-embargoed country, then dismantled and shipped to China.
"HarmonyOS NEXT (Chinese: 鸿蒙星河版; pinyin: Hóngméng Xīnghébǎn) is a proprietary distributed operating system and a major iteration of HarmonyOS, developed by Huawei to support only HarmonyOS native apps."
HarmonyOS NEXT is based on an open source core, OpenHarmony [0], with proprietary additions.
So, not hugely dissimilar from iOS (lots of bits of which are open source, most significantly the core of its XNU kernel) or Android (considering that the proprietary Google Mobile Services is de facto a mandatory component).
They spent a lot of money to find a lot of shallow gradients. Everyone else can climb those same gradients by putting in a little bit of money. Every single funded vertical and research org is proving this. Players in third place and below are incentivized to release their weights to develop an ecosystem around them. Meta and Tencent get to ensure the technology doesn't evolve beyond them by commoditizing their complement and releasing stuff like Llama and Hunyuan for free.
Furthermore, OpenAI hasn't stumbled across a defensible moat. There's zero switching cost to move to another product, and they don't control any major panes of glass to stay as a default.
If OpenAI doesn't find a moat soon, they're gonna be cooked. The value of foundation models will plummet.
They won’t, because they drove away all their actual talent to Anthropic and elsewhere in pursuit of the dumbest version of SV product dev, and are now forced to do benchmark hacking in a paper-thin ruse to convince the market they still have the talent to compete. The o1 series models are unusable in practice, while Claude and the new MCP protocol work are becoming the basis of a bunch of actually functional applications.
I wish HN would stop devolving into Reddit. This comment is the same boring "joke" that has been repeated 100 times on every platform, and keeps being posted for karma. It adds nothing to the conversation.
If you joined 8 months ago it might be hard to recognize. I've been on HN for more than a decade, and the quality of discourse has dropped drastically, especially in the last 3-4 years. This is a problem with the broader web, not just HN. Tech / startups is now a mainstream topic that attracts a lot of people who are not really in the weeds and are only able to write surface-level comments.
Regarding the name, open is just a word. Apple doesn't sell apples. The company never promised to open source every model, only to make them accessible to the public, so you're arguing semantics that lead to no improvement in the technical conversation.
100% agree. If you’ve been here for any length of time you’ve seen it, and nothing is added by the repetition.
Perhaps we should just string-sub to IAnepO or some such, so we can engage with the models and company as it is, without dealing with the (empty) semantics of the name.
I think this is both a harmful and irrational attitude. Why focus on some trivial mechanical errors and disparage the authors for it instead of the thing that is much more important, i.e., the substance of the work? And in dismissing work for such trivial reasons, you risk ignoring things you might have otherwise found interesting.
In an ideal world would second-language speakers of English proofread assiduously? Of course, yes. But time is finite, and in cases like this, so long as a threshold of comprehensibility is cleared, I always give the benefit of the doubt to the authors and surmise that they spent their limited resources focusing on what's more important. (I'd have a much different opinion if this were marketing copy instead of a research paper, of course.)
>in dismissing work for such trivial reasons, you risk ignoring things you might have otherwise found interesting
Not dismissing work for trivially avoidable mistakes risks wasting your precious, limited lifespan investing effort into nonsense. These signals are useful and important. If they couldn't be bothered to proofread, what else couldn't they be bothered to do?
>spent their limited resources focusing on what's more important
Showing that you give a crap is important, and it takes seconds to run through a spell checker.
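E.g., one pip install and a few lines (pyspellchecker is just the first tool to hand; aspell or Word would do the same job):

    # pip install pyspellchecker
    from spellchecker import SpellChecker

    spell = SpellChecker()
    # made-up sentence with two typos
    words = "We evaluate the modle on severl benchmark datasets".split()
    for word in spell.unknown(words):
        print(word, "->", spell.correction(word))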
Well, it's not exactly a research paper, more an overview of the problem and suggested techniques, but it'd still be interesting to hear some criticism based on the content rather than the (admittedly odd) failure to run it through a spell checker. I do wonder why it was written in English, apparently targeting a western audience.
Two of the authors are from "Shanghai AI Labs" rather than students, so one might hope it had at least been proofread and passed some sort of muster.
I guess the strategy of OpenAI now would be to keep a small edge at all times, integrate with businesses fast, possibly kickstart new businesses by supporting them, and try to be synonymous with the best in AI (maybe along with DeepMind). I cannot think of any other moat, unless somehow they have a lot of proprietary and useful data (like in-company data) that others cannot replicate.
But that edge will become more and more expensive, while the competition will cover more and more of the task space and make it less profitable for OAI.
Many people are dismissing this paper because it has errors in spelling and grammar.

This is a terrible heuristic for evaluating AI papers. If you use it, you will miss a lot of good work by very strong researchers with below-average English writing skills.
I have not read this paper carefully so claim nothing one way or the other about its quality. It superficially seems like a pleasant and timely survey although a little flag-planty.
I can't help feeling that it sounds like a bad strategy to claim this is a good reason "not to worry" about something.

If I were "OpenAI", I would rather read the content than evaluate the form of such articles to decide whether I should "worry" or not.

It seems like the most intelligent method.
Have they released anything on how it works other than "test time compute"? I wonder how similar it is to what's being proposed on this roadmap, that sounds close to what I imagine OpenAI are doing. I guess we'll see when they open source it.
It seems to me that for the most capable and useful models, openness almost exclusively benefits businesses, or maybe academic organizations with money for serious hardware. I know what I can run on my 4090 at home but the results pale in comparison to the commercial services. I see why people consider these matters important from a theoretical standpoint but from a practical standpoint it doesn’t seem particularly consequential. I self-host a few FOSS server applications that are primarily sold as SaaS subscriptions, and folks are often very critical of those businesses benefitting from the “open source” label because they’re often seemingly deliberately difficult to self-host. This seems to be an order of magnitude less open than that. Is there some use case for people with reasonable hardware that I’m just not aware of?
I look at this as being for the reasonable hardware of the future. This is starting to look like actual AGI, and I don't think actual AGI is going to run on a 4090. But an H100 starts to sound like a mass-market product even with the $50k price tag if it actually can run an AGI.
You can get cheaper hosting of open weights from a commercial provider than you can closed weights from the same company. So even if you’re not hosting yourself, openness is a major factor for price competitiveness.
"Now AI has made everything more complex!" "AI is embedded in everything we do"...
Sounds like marketing gibberish and obfuscation, combined with self promotion.
That's just my read at first sniff.