The title here buries the lede, IMO. Quoting from Epoch AI in the thread:
> We were restricted from disclosing the partnership until around the time o3 launched [...] Our contract specifically prevented us from disclosing information about the funding source and the fact that OpenAI has data access to much but not all of the dataset.
> Regarding training usage: We acknowledge that OpenAI does have access to a large fraction of FrontierMath problems and solutions, with the exception of a unseen-by-OpenAI hold-out set that enables us to independently verify model capabilities. However, we have a verbal agreement that these materials will not be used in model training.
Here, you can read "large fraction" as meaning "everything but the holdout set" - and my understanding is, they haven't disclosed performance on the holdout set.
I don't think this article, or the underlying claims, are (yet) credible. I'm disappointed that the Washington Post wrote about this at all, and that people I respect a lot have amplified it on Twitter.
The claims come from an MBA student graduating from Stanford this year, who plans on starting a developer tooling company. [1] The faculty member (a professor of psychology) who supported this work doesn't comment at all on the claims made by the student; their single quote in the article instead emphasizes how difficult it is to measure productivity.
The claims are based on, from their own description, unpublished ongoing research. There's no description of what data they gathered, what their methods were, or how they assessed accuracy anywhere. (For starters: how did they identify who a software engineer was in each of the companies in their dataset? How did they exclude data scientists, or technical writers, or other engineering-adjacent folks who in some companies need to commit infrequently? How did they identify remote workers, and handle workers transitioning between roles or working arrangements?)
Without this extremely basic information, I think it makes sense to just ignore it. It's possible that it'll eventually pan out as real! But definitely not yet. Wild that people are taking it seriously.
In the United States, physician salaries were 6.5 times GDP per capita for specialists and 4.1 times GDP per capita for generalists.
(Shrug) I'm OK with the notion that doctors contribute somewhere between 4 and 7 times more value than your average schlub driving a bus, and I'm OK with paying them accordingly.
Now, how much do the administrators, insurance-company execs, and other noncontributors make?
Their commitment to open source, however, might go.
Quite recently Google quietly unshipped an effort to make their protobuf build rules more useful to OSS users of Bazel (see the rules_proto repository). This wasted a huge amount of planning and work that'd gone into the migration.
And the fact that these tools are designed first and foremost for Google's use shows up everywhere. Stuff that is widely used but that Google fundamentally doesn't care about (e.g. Ruby) is stagnant.
In this state, it's totally reasonable to reconsider whether these tools are worth building on top of. I personally still believe! But I don't blame people who are skeptical.
> Their commitment to open source, however, might go
Google's OSS contributions are largely correlated with the fact that they could _afford_ to do OSS. When you have the best business model in the world, you can afford to spend X% of your engineering hours on OSS. Of course, it's not purely altruistic; they also get a lot back in return.
However, if Google's business model takes a hit due to AI or other advancements, I wouldn't be surprised if their OSS contributions are the first to go. We already saw Google Code Jam discontinued in the 2022 layoffs.
Though if your business outlives Google, gRPC going away might be the least of your problems.
There was an influential internal white paper about not being a "tech island" that drove open-sourcing. The point was that by having its own tech stack, Google would eventually be left behind and have a harder time recruiting.
Not sure if the message is still believed as strongly.
The message is pretty well understood - the only difference is that the monorepo (think of it as a service in and of itself) and its associated tooling do get seen as "Google-specific."
- In mid-February 2024, the polyfillpolyfill account was created on GitHub, and took ownership of the repo.
So I think sometime between October 2023 and February 2024, JakeChampion decided to sell the site to Funnull. I think the evidence is consistent with him having made a decision to sell the site to _somebody_ in December 2023, and the deal with Funnull closing sometime early February 2024.
As someone who attended before and after the change: there was certainly no immediate effect. The consensus at the time (at least amongst my little circle of friends) was that adopting the Common App was an obviously short-sighted metrics play.
It's definitely been superficially good for my social life, but I remain very sorry that the Uncommon App is no longer around. It played a nontrivial role in my decision to apply in the first place.
I agree with the general thrust of this, but it's worth noting that the author does _slightly_ better than is typical for LLM-based analyses: they released the dataset of book-labeled posts. You can at least estimate the false-positive rate from that, by sampling the results. (You can't estimate false-negative rate, though.)
Ideally authors attempt to do some sort of validation of the results at the LLM-labeling step and present that, but that rarely happens with these sorts of posts. I think that's pretty telling.
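To make the sampling idea concrete, here is a minimal sketch of how one might do it. It assumes you have the released labels as a plain list and that you review the sampled items by hand; the function names (`sample_for_review`, `wilson_interval`) and the example numbers are hypothetical, not anything from the author's release. Strictly speaking, this estimates the share of labeled posts that are wrong, which is what you can measure from the positives alone.

```python
import math
import random


def sample_for_review(labeled_posts, k, seed=0):
    """Draw a uniform random sample (without replacement) of labeled posts
    to check by hand. A fixed seed keeps the sample reproducible."""
    rng = random.Random(seed)
    return rng.sample(labeled_posts, min(k, len(labeled_posts)))


def wilson_interval(errors, n, z=1.96):
    """Wilson score interval for a binomial proportion.

    errors: number of sampled labels judged wrong on manual review
    n:      number of sampled labels reviewed
    z:      normal quantile (1.96 gives roughly a 95% interval)
    """
    if n <= 0:
        raise ValueError("need at least one reviewed item")
    p = errors / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


# Hypothetical usage: review 100 sampled posts, find 8 wrong labels.
low, high = wilson_interval(errors=8, n=100)
```

The interval matters more than the point estimate here: with only a hundred reviewed items the plausible range is wide, which is a decent argument for authors doing this validation themselves and reporting it.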
For whatever it's worth, the person you're replying to took the courtesy of pointing you to some research they're familiar with, and your "it's just one paper" point (or really, any of your points here) would be much more convincing if you responded in kind to that courtesy.
On topics like this you can argue from first principles endlessly and never get any closer to the truth. Better to show us some data, I think.
This is really interesting to hear; I work in a team & org that I think is pretty healthy, but I've found the yearly "brag doc" exercise useful for several years running, if only to go through and remind myself of everything I worked on. I consistently find that I've done a _lot_ more than I remember, and that's both a boost and also a healthy opportunity to reflect. The artifact is then useful over the next year as a reminder.
This is in fact the focus of Julia's post (she literally says this early on), and I think it's kind of unfortunate that folks are mostly talking about using the brag doc as advocacy in the performance review process.
When the organization is healthy there is no need for a catalog of achievements.
For mental boosts, you regularly get affirmation that you are moving the team/product/org in the right direction (or, in the opposite case, you recalibrate quickly if you aren’t).
Similarly, reflecting on what has been accomplished is a regular part of the holistic process, not a bespoke individual task.
If a brag doc is valuable to you personally, great! By all means feel free to build one. But if building one is necessary to excel in an organization, that is a very bad sign.
Sure - earlier you said "if this is useful that's a really bad sign", and now you're saying "if it's necessary it's a very bad sign", which are pretty different claims. I'm mostly interested in probing the former, so if you're not making that stronger claim then I think we're on the same page.
I was responding to this claim: “but its a red flag if you do this and it has no impact on your compensation or promos”
That implies (to me at least) that the brag doc is necessary to get appropriate recognition externally in the org. That’s a huge red flag. If it provides you personally some internal validation then whatever, that doesn’t say anything about your organization.
What's most shocking to me is how much malware there is in all of this. The fact that Google et al aren't constantly in trouble for directly forwarding unwitting users to malware distributors indicates to me just how far our standards have fallen for a "good" search engine. I feel like we'd be happier with search engines that adhered to "first, do no harm" principles.