Our metric is approximately "hours of work for an expert engineer." Here are some example open-source PRs and the output metrics our algorithm calculated for them:
Curious how these numbers correlate to the estimates of the engineers behind the PRs?
For example, the first PR is scored at ~15 "hours of work for an expert engineer."
Looking at the PR, it was opened on Sept 18th and merged on Oct 2nd. That's two weeks, or 10 working days, later.
Between the initial code, the follow-up PR feedback, and merging with upstream (8 times), I would wager that this took longer than 15 hours of work on the part of the author.
It doesn't _really_ matter, as long as the metrics are proportional, but it may be better to refer to them as isolated complexity hours, as context-switching doesn't seem to be properly accounted for.
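To make the arithmetic in this comment concrete, here is a minimal sketch that counts the working days between the two dates and spreads the quoted score across them. It assumes the year is 2024 (the thread only gives the month and day) and ignores public holidays; `working_days` is a hypothetical helper name, not part of the product being discussed.

```python
from datetime import date, timedelta


def working_days(start: date, end: date) -> int:
    """Count Monday-Friday days in the half-open range [start, end)."""
    total_days = (end - start).days
    return sum(
        1
        for offset in range(total_days)
        if (start + timedelta(days=offset)).weekday() < 5  # 0-4 are Mon-Fri
    )


# Assumption: the year is 2024; the comment only says "Sept 18th" and "Oct 2nd".
opened = date(2024, 9, 18)
merged = date(2024, 10, 2)
score_hours = 15.266  # metric value quoted for the first PR

elapsed = working_days(opened, merged)
hours_per_working_day = score_hours / elapsed

print(f"{elapsed} working days, {hours_per_working_day:.2f} metric hours per working day")
# prints "10 working days, 1.53 metric hours per working day"
```

Read this way, the score works out to roughly an hour and a half per working day over the life of the PR. That fits the suggestion that the number describes isolated effort rather than elapsed time.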
Yeah, maybe "expert engineer" is the wrong framing and it should be "oracle engineer" instead - you're right that we're not accounting for context switching (which, to be fair, is not really productive, right?)
Ultimately, though, what matters isn't the absolute number but the relative difference (e.g. from PR to PR, or from team to team). That's why we show industry benchmarks and make it easy to compare across teams!
That assumes all or almost all the work is writing the code, with no time allotted to actually using the app with that code written, benchmarking or other measurements, research about possible alternatives, etc.
Not at all! The algorithm is calibrated with real human effort. So find/replacing something 1000 times will have nowhere near the same value as adding 1000 lines of new code. And if you implement the same functionality in 100 lines instead of 1000, you'll get the same value.
What we don't capture is product or communication overhead. However, our platform has other metrics that can help you find out whether these are causing inefficiencies :)
In a complex, mature system, a high-impact bug could have a very small fix that is highly non-obvious. Your metric assumes that the person shitting out 1000 lines of a new feature no one wants is as productive as a distributed systems wizard who can fix bugs no one else can figure out and who adds a 3-line fix for an issue that customers have been complaining about for years. It is inherently biased towards adding new features and against maintenance and system quality improvement.
https://github.com/PostHog/posthog/pull/25056: 15.266 (Adds backend, frontend, and tests for a new feature)
https://github.com/microsoft/vscode/pull/222315: 8.401 (Refactors code to use a new service and adds new tests)
https://github.com/facebook/react/pull/27977: 5.787 (Small change with extensive, high-effort tests; approximately 1 day of work for an expert engineer)
https://github.com/microsoft/vscode/pull/213262: 1.06 (Mostly straightforward refactor; well under 1 day of work)
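
Since the thread argues that the relative difference between scores matters more than the absolute number, here is a small sketch of that comparison using only the four scores listed above. The `relative_to` helper and the short PR labels are illustrative names, not part of the product's API.

```python
# Scores copied from the four example PRs listed above (metric "hours").
scores = {
    "PostHog/posthog#25056": 15.266,
    "microsoft/vscode#222315": 8.401,
    "facebook/react#27977": 5.787,
    "microsoft/vscode#213262": 1.06,
}


def relative_to(baseline: str, values: dict[str, float]) -> dict[str, float]:
    """Express every score as a multiple of the baseline PR's score."""
    base = values[baseline]
    return {name: value / base for name, value in values.items()}


# Use the smallest PR (the straightforward refactor) as the baseline.
ratios = relative_to("microsoft/vscode#213262", scores)
for name, ratio in sorted(ratios.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {ratio:.1f}x")
```

With the straightforward refactor as the baseline, the PostHog feature PR comes out at about 14.4 times its score. That ratio stays the same whatever unit the metric is labelled with.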