A lesson for the ages: that cultured (or not) rich person over there isn't any more intelligent or prescient than your neighbour or colleague, and most certainly no more than your partner. They just have more money.
Seriously though, it’s a bit of an amusing coincidence that the Leibniz biscuit and the fig Newton were both independently invented in 1891 (at least according to Wikipedia).
It's also impossible for a new user of macOS to show hidden files without an online search. IIRC it's an undocumented (in the UI) keyboard shortcut. Very discoverable.
Only tangentially related to the post, but I don't see it mentioned there: what do people use to run benchmarks on CI? If I understand correctly, standard OSS GH Actions/Azure Pipelines runners aren't going to be uniform enough to provide useful benchmark results. What does the rust project use? What do other projects use?
Typically, you purchase/rent a server that does nothing but sequentially run queued benchmarks (the size/performance of this server doesn't really matter, as long as the performance is consistent) and then sends each report somewhere for hosting and processing. This could of course be triggered by something running in CI, and the CI job could wait for the results if benchmarking is an important part of your workflow. Or, if your CI setup allows it, you tag one of the nodes as a "benchmarking" node which only runs jobs tagged "benchmark", but I don't think many of the hosted setups allow this; I've mostly seen it in self-hosted CI setups.
But CI and benchmarks really shouldn't be run on the same host.
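A minimal sketch of that queueing pattern in Python (names hypothetical; a real setup would pull jobs from CI webhooks rather than an in-process queue): a single worker thread drains the queue strictly sequentially, so two benchmark runs never overlap on the machine.

```python
import queue
import threading

def run_benchmark(job):
    # Placeholder for the real benchmark invocation; a real worker would
    # check out the commit, run the suite, and upload the report somewhere.
    return {"job": job, "status": "done"}

jobs = queue.Queue()
reports = []

def worker():
    # One worker thread == strictly sequential runs, which is what keeps
    # results on a dedicated box comparable run-to-run.
    while True:
        job = jobs.get()
        if job is None:  # sentinel: shut down
            break
        reports.append(run_benchmark(job))
        jobs.task_done()

t = threading.Thread(target=worker)
t.start()
for job in ["pr-123", "pr-124"]:
    jobs.put(job)
jobs.put(None)
t.join()
print([r["job"] for r in reports])  # jobs processed in queue order
```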
Essentially what I described above, a dedicated machine that runs benchmarks. The Rust project seems to do it via GitHub comments (as I understand https://github.com/rust-lang/rustc-perf/tree/master/collecto...), others have API servers that respond to HTTP requests made from CI/chat, others have remote GUIs that trigger the runs. I don't think there is a single solution that everyone/most are using.
Dedicated hardware doesn't need to be expensive! Hetzner has dedicated servers for like 40 EUR/month, Vultr has it for 30 EUR/month.
VPSes kind of don't make sense because of noisy neighbors, and since neighbors come and go, the noise fluctuates a lot; I don't think there's a single measure you can take that applies everywhere.
For example, you could rent a VPS at AWS and start measuring variance, and it looks fine for two months, but then suddenly it doesn't, because that day you got a noisy neighbor. Then you try a VPS at Google Cloud and it's noisy from day one.
You really don't know until you allocate the VPS and leave it running, but that day could always come, and benchmark results are something you really need to be able to trust.
Is there something to be said for practicing how you play? If your real world builds are going to be on VPS’s with noisy neighbors (or indeed local machines with noisy users), I’d prefer a system that was built to optimize for that to one that works fantastically when there is 0 contention but falls on its face otherwise.
Different things for different purposes. Measuring how real software behaves under real production workloads in variable environments is useful, but inherently high-variance. It doesn't let you track <1% changes commit-by-commit.
> As I understand it, hardware counters would remain consistent in the face of the normal noisy CI runner.
With cloud CI runners you'd still have issues with hardware differences, e.g. different CPUs counting slightly differently. Even memcpy behavior is hardware-dependent! And if you're measuring multi-threaded programs, concurrent algorithms may be sensitive to timing. Microcode updates for the latest CPU vulnerabilities change the numbers too. And that's just instruction counts. Other metrics such as cycle counts, cache misses or wall time are far more sensitive.
To make sure we're not slowly accumulating <1% regressions hidden in the noise and to be able to attribute regressions to a specific commit we need really low noise levels.
So for reliable, comparable benchmarks, dedicated hardware is needed.
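A back-of-the-envelope illustration of why noise level dominates here (my own rule of thumb, with made-up numbers): averaging n runs only shrinks noise by sqrt(n), so the repetitions needed to resolve a small change grow with the square of the noise-to-change ratio.

```python
import math

def runs_needed(noise, change, z=2.0):
    # To separate a relative `change` from relative run-to-run `noise`,
    # the standard error noise/sqrt(n) must drop below change/z,
    # i.e. n > (z * noise / change) ** 2.
    return math.ceil((z * noise / change) ** 2)

# Resolving a 0.5% change under 2% wall-time noise
# vs. under 0.05% instruction-count noise:
print(runs_needed(0.02, 0.005))    # 64 runs
print(runs_needed(0.0005, 0.005))  # 1 run
```

That's why a low-noise metric on dedicated hardware can attribute a sub-1% regression to a single commit while wall time on a shared runner can't.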
> With cloud CI runners you'd still have issues with hardware differences
For my project it really is the diff of each commit that matters: I start from a parent commit that isn't part of the PR and re-measure that, then measure each new commit. This should cancel out changes in hardware as well as things like Rust versions (if those aren't locked in via rustup).
The rest of your points are valid of course, but this was a good compromise for my OSS project where I don’t wish to spend extra money.
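A toy model of why re-measuring the parent works (Python, invented numbers): if each runner applies some unknown speed factor, measuring parent and head on the same runner and reporting only the ratio cancels that factor.

```python
def measure(work, machine_speed):
    # Stand-in for one benchmark run: wall time = work done / machine speed.
    return work / machine_speed

def relative_delta(parent_work, head_work, machine_speed):
    # Measure the parent commit and the head commit on the SAME runner,
    # then report only head/parent so the runner's speed divides out.
    return measure(head_work, machine_speed) / measure(parent_work, machine_speed)

# The same ~3% regression is reported identically by a fast and a slow runner:
fast = relative_delta(100.0, 103.0, machine_speed=4.0)
slow = relative_delta(100.0, 103.0, machine_speed=1.5)
print(fast, slow)  # both close to 1.03
```

The caveat from upthread still applies: this cancels the runner's average speed, not noisy-neighbor variance during the run itself.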
The thing is, tools like Cachegrind are supposed to be used as complements to time-based profilers, not to replace them.
If you're getting ±20% differences between time-based benchmark runs, it might just be noisy neighbors, but it could also be some other problem that actually manifests for users too.
> used as complements to time-based profilers, not to replace them
Sure. I also use hyperfine to run a bigger test as a user would see the system. I cross reference that with the instruction counts. I use these hardware metrics in a free CI runner, and hyperfine locally.
I've looked into this before and there are very few tools for this. The only vaguely generic one I've found is Codespeed: https://github.com/tobami/codespeed
However it's not very good. It seems like most people just write their own custom performance-monitoring tooling.
As for how you actually run it, you can get fairly low noise runtimes by running on a dedicated machine on Linux. You have to do some tricks like pinning your program to dedicated CPU cores and making sure nothing else can run on them. You can get under 1% variance that way, but in general I found you can't really get low enough variance on wall time to be useful in most cases, so instruction count is a better metric.
I think you could do better than instruction count though but it would be a research project - take all the low noise performance metrics you can measure (instruction count, branch misses etc), measure a load of wall times for different programs and different systems (core count, RAM size etc.). Feed it into some kind of ML system and that should give you a decent model to get a low noise wall time estimate.
Surely it’s possible to build some benchmark to demonstrate the difference, right? Otherwise, what’s the point of making that improvement in the first place?
I think what you’re saying, though, is that having benchmarks/micro-benchmarks that are cheap to run is valuable, and in those, instruction counts may be the only way to measure a 5% improvement (you’d have to run the test a whole lot longer to prove that a 5% instruction-count improvement is a real 1% wall-clock improvement and not just noise). Even criterion gets real iffy about small improvements, and it tries to build a statistical model.
> Surely it’s possible to build some benchmark to demonstrate the difference right? Otherwise, what’s the point of making that improvement in the first place?
No, sometimes the improvement you made is only ~0.5% faster. It's very, very difficult to show that it is actually faster by real wall-clock measurements, so you have to use a more stable proxy.
What's the point of a 0.5% improvement? Well, not much on its own. But you don't do one, you do 20, and cumulatively your code is 10% faster.
I really recommend Nicholas Nethercote's blog posts. A good lesson in micro-optimisation (and some macro-optimisation).
> It's very very difficult to show that that is actually faster by real wall clock measurements so you have to use a more stable proxy.
That’s what I’m saying though. You don’t actually need a stable proxy; you should be able to quantify the wall-clock improvement, but it requires a very long measurement time. For example, a 0.5% improvement amounts to a benchmark that takes 1 day completing about 7 minutes earlier. The reason you use a stable proxy is that the benchmark can finish more quickly, which shortens the feedback loop. But relying too much on the proxy can also be harmful, because you can decrease the instruction count and slow down the wall clock (or vice versa). Wall-clock performance is more complex: branch prediction, data dependencies, and cache behavior also really matter.
So if you want to be really diligent with your benchmarks (and you should be, when micro-optimizing to this degree), you should validate your assumptions by confirming the impact with wall-clock time, as that’s “the thing” you’re actually optimizing, not cycle counts for cycle counts’ sake (same with power, if you’re optimizing the power consumption of your code, or memory usage). Never forget that a proxy measurement can stop being a good measurement once it becomes the target rather than the thing you actually want to measure.
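For the record, the 7-minute figure is just a straightforward percentage calculation:

```python
day_s = 24 * 60 * 60             # one day of benchmark runtime, in seconds
saving_min = day_s * 0.005 / 60  # a 0.5% improvement, expressed in minutes
print(round(saving_min, 1))      # about 7.2 minutes
```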
At my workplace we use self-hosted GitLab and GitLab CI. The CI allows you to allocate dedicated server instances to specific CI tasks. We run an e2e test battery on CI, and it's quite resource-heavy compared to normal tests, so we have some dedicated instances for it. I'd imagine the same strategy would work for benchmarks, but I'm not sure whether cloud instances fit the bill. I think the CI also allows you to bring your own hardware, although I don't have experience taking it that far.
> I'd imagine the same strategy would work for benchmarks, but I'm not sure whether cloud instances fit the bill. I think that the CI also allows you to bring your own hardware although I don't have experience taking it that far.
Typically you use the solution between cloud-hosted VPSes and your own hardware: dedicated servers :)
It's not base64 because of privacy, and using the snipping tool instead of just copying is not natural. Imagine opening a whole other program to copy some text you just highlighted.
> If you actually wanted to do something about the US murder rate, it's very obvious where to begin fixing things. Nobody wants to talk about it, it's not a national discussion at all, it just keeps getting swept under the rug (while the dead bodies pile up, year after year).
2021 so far, 6 months in: 378 homicides, 1,727 wounded, 2,082 shot
Until the gang problem gets solved, nothing will change. And politicians are too busy disarming law-abiding citizens and cutting police budgets instead of focusing on the illegal gang problem.
Texas gets an “F” from Giffords Law Center, yet Houston and Dallas have murder rates that are half of that in Chicago. The rates in Austin and El Paso are tiny when compared to Chicago. All this despite Texas having neighbours with cartels south of the border.
2. The age for purchasing handguns (pistols and revolvers) in Illinois is 21. The vast majority of the gang violence and shootings happen with handguns.
3. The state requires gun owners to obtain licenses and face background checks as well as imposing waiting periods on firearms purchases. They also have red flag laws.
> Bonus points of you can do it while blaming mental health AND justify our decreased spending on mental health. The mic is yours.
This along with rest of your snide comment indicates to me that you have some pre-conceived incorrect assumptions about me and therefore I am not entertaining you.
You expressed a rather simplistic view of a much larger issue, stating a single problem as the core. I didn't judge your opinion until you provided it, therefore it can't be preconceived.
As a gun owner, I'm fully aware of the complexity of the issue. Pointing at socioeconomics as the root is naive. It ignores the vast majority of incidents to focus very narrowly on one factor. The reality is if you only look at the one instance and ignore the whole, you cannot address the actual problems. Socioeconomics have nothing to do with almost all of our mass shootings because most of them aren't gang related. It doesn't account for most homicides by gun because they're individual issues.
We're a country divided politically, racially and sexually (gender, not intercourse), with pretty free access to firearms. Wanna guess what happens when people who don't like each other, or who have heated arguments, also have guns? Hint: shooting.
So sure, try and solve for the one thing and ignore all the rest. You can't solve for it because you're dealing with historically unrepresented populations of people that we've shoved into a corner together and we basically ignore. You wanna actually solve for all the killing we do? Make access to firearms require more than just a pulse.
It's not both. The root cause of most of these killings is honor culture, which existed in plenty of societies before guns and led to just as many killings.
Well certainly that's one required aspect of many.
I've yet to see a single serious national proposal for hoovering up the vast number of illegal guns from the inner cities however. The 117,000 gang members in Chicago do not care about gun permits or background checks.
The gun control measures being proposed - exclusively by the Democrats - won't do anything to stem the near-genocide rate of murder in US inner cities.
The actual big target is: opportunity, jobs, wealth. If you don't fix that in the US inner cities, you won't stem the murder, gang participation and rate of violence. There also isn't much being done about that equation in the US inner cities.