There are allegations that Satoshi may have accidentally slipped up and leaked an IP address that was not a Tor exit node or other anonymous proxy.
It could either be Satoshi fucking up and not using Tor all the time (this has happened to other 'anonymous' entities), or perhaps they needed a clearnet connection for some reason and managed to use another internet connection with no identifiers attached that would lead back to them.
This uses 600 'candidate' programmers. But I wonder how much harder it becomes when applied to, e.g., an arbitrary GitHub gist. As the number of candidates increases (with many of them writing in similar styles), I'd imagine the problem becomes enormously more difficult.
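To make the scaling intuition concrete, here is a toy simulation, not the method from the linked work. It assumes each candidate's "style" is a random point in a small feature space, each code sample is that point plus noise, and attribution picks the nearest known style. The function name, the two-feature space, and the noise level are all made up for illustration.

```python
import random


def attribution_accuracy(n_candidates, n_features=2, noise=0.3, trials=200, seed=0):
    """Fraction of noisy samples attributed to the right author by nearest style vector."""
    rng = random.Random(seed)
    # Hypothetical "style" of each candidate: a random point in feature space.
    styles = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n_candidates)]
    correct = 0
    for _ in range(trials):
        author = rng.randrange(n_candidates)
        # A code sample is modeled as the author's style plus per-sample noise.
        sample = [x + rng.gauss(0, noise) for x in styles[author]]
        # Attribute the sample to the candidate with the closest style vector.
        guess = min(
            range(n_candidates),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(styles[i], sample)),
        )
        correct += guess == author
    return correct / trials


if __name__ == "__main__":
    for n in (5, 50, 600):
        print(n, attribution_accuracy(n))
```

In this toy setup accuracy falls as candidates crowd the same feature space. A real classifier with far more features would degrade more slowly, so this only illustrates the direction of the effect, not its size.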
With Copilot, I'm sure the data exists and it's a matter of modeling it out. The fact that you can't configure Copilot to do celebrity coding by fine-tuning it on a particular person's or organization's repositories is actually surprising to me.
I am thinking of a future where every piece of code can be traced back to a common ancestor because everyone is using a tool like Copilot and there is no identifying signature. De-anonymization is only going to become more difficult.
That's a good point. In general there are probably a ton of easy ways to adversarially rewrite the binary against a deanonymizer like this. You could probably even make a program that rewrites your own code into someone else's style to frame them.
Their experiment with different optimization levels is with symbol information intact, and they did not mention whether they had debug information enabled. Stripping the binary with no optimization reduces accuracy by 24%, but they did not mention the accuracy for an O3 + stripped binary. That is such an obvious thing to try that I'd guess the result was probably not that good.
Basically, to outsmart this algorithm you can use a deniability attack.
You just say that someone imitated your style. It's not like the binary has a cryptographic signature of the person who compiled it, and even then you can say that someone stole your private key.
None. There is a difference between actually storing personal data and it being possible to forensically analyze data not eligible for protection and potentially correlate other similar data that in turn is tagged with metadata.
The second party has the obligation, not the first. Ultimately all risk of exposure of personal data derives from the second party. For example, if you mail an executable to a client and put some code on GitHub under your own real name, the holder of the exe has no obligation, because your identity cannot be exposed by that executable alone, or indeed by any number of similar executables.
It is only when combined with your GitHub profile, where you willingly shared a work sample and your real info, that you could possibly be exposed.
Technically, it means that any system holding binaries is capable of holding data identifying a user, which has crazy GDPR implications.
Practically, I don't expect this to have any impact beyond state surveillance, where obtaining a binary for a virus (or, you know, DRM-defeat code) can identify its creator by matching it against any public code they have posted elsewhere.
Most malware is packed and/or obfuscated. I'd imagine this defeats fingerprinting relatively handily since the binary is rewritten. I'm sure this technique is used to catch particularly dumb adversaries, but against anyone with a hint of operational security it wouldn't work at all. Moreover, what's stopping a determined adversary from rewriting the binary with a signature that matches another person? Using this as a targeting method would have a lot of collateral damage.
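A rough sketch of why packing hurts byte-level fingerprinting, under heavy assumptions: the "fingerprint" here is just a byte-bigram histogram compared by cosine similarity, and the "packer" is a single-byte XOR. Neither is what the linked work or real packers do; `bigram_profile`, `cosine`, `xor_pack`, and the two sample snippets are invented for illustration.

```python
from collections import Counter
from math import sqrt


def bigram_profile(data: bytes) -> Counter:
    """Count adjacent byte pairs; a crude stand-in for a style fingerprint."""
    return Counter(zip(data, data[1:]))


def cosine(p: Counter, q: Counter) -> float:
    """Cosine similarity between two bigram histograms."""
    dot = sum(p[k] * q[k] for k in p.keys() & q.keys())
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


def xor_pack(data: bytes, key: int = 0xA5) -> bytes:
    """Toy 'packer': XOR every byte with a fixed key (assumption, not a real packer)."""
    return bytes(b ^ key for b in data)


# Two made-up samples standing in for code by the same author.
known = b"for (int i = 0; i < n; ++i) { total += values[i]; }"
unknown = b"for (int j = 0; j < m; ++j) { sum += items[j]; }"

plain_similarity = cosine(bigram_profile(known), bigram_profile(unknown))
packed_similarity = cosine(bigram_profile(known), bigram_profile(xor_pack(unknown)))

print(f"plain:  {plain_similarity:.2f}")
print(f"packed: {packed_similarity:.2f}")
```

The packed sample shares no bigrams with the plain one, because XOR with `0xA5` sets the high bit of every ASCII byte. An analyst who unpacks the binary first would recover the original features, so this only shows what happens when the packed bytes are fingerprinted directly.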
Anyone whose job it is to evade detection will pull their output through a scrambler. What this will catch is small-time criminals, probably in minority groups (frequently categorised as "high risk" by police).
De-anonymizing programmers from executable binaries - https://news.ycombinator.com/item?id=16598962 - March 2018 (39 comments)
When coding style survives compilation: De-anonymizing programmers from binaries - https://news.ycombinator.com/item?id=10806956 - Dec 2015 (67 comments)