Someone else can answer better than me, but https://panopticlick.eff.org claims the canvas fingerprint provides 17 bits of identifying information (click detailed results after testing)
On mine, the System Fonts give 17 bits of info (1 in roughly 200,000 computers). User-Agent is next with only 8 bits.
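The "17 bits" and "1 in roughly 200,000" figures are the same number in two units: bits of identifying information are just log2 of the anonymity-set size (log2(200,000) ≈ 17.6, which Panopticlick rounds). A quick sketch of the conversion (helper names are mine, not Panopticlick's code):

```javascript
// Convert between "one in N browsers share this value" and bits of
// identifying information (surprisal). Illustrative helpers only.
function bitsFromOneInN(n) {
  return Math.log2(n); // 1-in-131072 -> exactly 17 bits
}

function oneInNFromBits(bits) {
  return 2 ** bits;
}

console.log(bitsFromOneInN(200000).toFixed(2)); // ~17.61 bits
console.log(oneInNFromBits(17)); // 131072
```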
Based on that alone, it seems that just replying back with either a blank font list or the minimal standard font list (e.g. only Times & Arial) would solve most of this problem.
A blank font list where? There's no way to get a direct list of fonts: you just try rendering text with a given font and compare its metrics against the fallback font's. Font enumeration is done via side channels (which also means you need a list of candidate fonts to sniff for in the first place).
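The side channel looks roughly like this (names are illustrative, not any real library's API). In a real page you'd render a probe string twice, once as `"CandidateFont, monospace"` and once as plain `"monospace"`, then read each span's `offsetWidth`/`offsetHeight`; the decision logic on the measured metrics is the only part shown runnable here:

```javascript
// If the candidate font's metrics differ from the fallback's, the
// candidate font is installed and overrode the fallback.
function fontLikelyInstalled(candidateMetrics, fallbackMetrics) {
  return candidateMetrics.width !== fallbackMetrics.width ||
         candidateMetrics.height !== fallbackMetrics.height;
}

// Note: you can only probe fonts you already have names for, e.g.:
const FONTS_TO_SNIFF = ['Calibri', 'Helvetica Neue', 'Ubuntu'];
```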
The only way to stop font-based side channels is to limit the web to a fixed set of fonts, and that would horribly break the web for some linguistic communities: there's a fair amount of web content that relies on specific fonts which map old Windows codepages to other characters to support a language, often dating from before Unicode covered those characters.
You also need identical fonts for a given user agent, and that's very hard to guarantee short of shipping your own fonts (e.g., consider an OS update that changes a font!), and that becomes expensive fast.
Take two block-level elements, render some text between them, and calculate how far apart the two elements are: you can already determine the height of the glyphs.
So, yeah, to disable that you'd have to entirely disable the CSSOM, which would cause ridiculous amounts of breakage.
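A sketch of that measurement (element IDs and names are made up for illustration). In the browser you'd do something like `below.getBoundingClientRect().top - above.getBoundingClientRect().bottom`, since the text rendered between the two blocks pushes the lower one down by exactly its rendered line height; the arithmetic itself is trivial:

```javascript
// Infer the rendered height of text sitting between two block-level
// elements from their CSSOM-reported positions (illustrative helper).
function inferredTextHeight(belowTop, aboveBottom) {
  return belowTop - aboveBottom;
}

// e.g. if the upper block ends at y=120 and the lower begins at y=140,
// the text between them rendered 20px tall:
inferredTextHeight(140, 120); // -> 20
```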
Unless every browser in the world adopts the same list, replying with a fixed list of fonts would make users of a given browser immediately recognizable (especially for low-marketshare browsers like Tor). Seems like you'd want a system where the response to a list-of-fonts query would be semi-random and likely to overlap with the lists that are naturally produced by other browsers.
Generally speaking, there are two approaches (that I'm aware of) for addressing fingerprinting: one is to "hide in the crowd", i.e., return values that are common across the browser population. The other is to create a unique value for each separate session (analogous to how incognito mode discards cookies).
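The "semi-random but overlapping" idea from a couple of comments up could combine both approaches; a sketch, where all names and font lists are made up for illustration: always admit to a common baseline, then report only a random subset of the rest, so the answer varies per session yet still resembles ordinary browsers.

```javascript
// Fonts every response admits to, to blend in with common lists:
const BASELINE = ['Arial', 'Times New Roman', 'Courier New'];

// Per-session randomized font-list response (illustrative only).
function reportedFonts(actuallyInstalled, rng = Math.random) {
  const extras = actuallyInstalled
    .filter(f => !BASELINE.includes(f))
    .filter(() => rng() < 0.5); // drop each extra font half the time
  return [...BASELINE, ...extras];
}
```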
See: https://www.microsoft.com/en-us/research/wp-content/uploads/... [PDF!]
But user agents already identify the browser, right?
I agree that implementing this first in Tor is probably not a good idea, but if Firefox were to do it first, then I don't see the problem. "They're a Firefox user" isn't nearly as specific information.
User agent gives the browser version and platform version. Two macs with the same OS version and the latest version of Chrome will have the same user agent.
It would reduce the variability. 1 in 200,000 is reasonably unique. But if all Firefox browsers reported the same result for fonts, then it would provide no more information than the spying website already has (i.e. the user is using Firefox).
I'd bet that Chrome would follow quickly, which would put pressure on Apple to do the same. If that happened, we'd have a minor victory.
All I'm trying to do is reduce information that is needlessly leaked out by a browser. True privacy still requires more.
This would also have the positive side effect of reducing rendering differences across browsers. At the moment there's a risk that the browser a webpage is viewed in doesn't have the right fonts.
There's no reason for browsers to make a large number of fonts available if websites aren't able to use them because not all browsers make them available.
However, there may be an issue with internationalisation.
I think this overestimates the amount of entropy. If canvas hardware acceleration is disabled, the only things that can really affect the output of the Panopticlick canvas fingerprint are the OS version, the user agent, and the CPU's available vectorization instructions.
> Presumably they are unprivileged instructions on x86?
They're all unprivileged; having to go to the kernel would defeat the purpose most of the time.
Also, trapping them wouldn't make a difference. Masking the CPUID feature flags, on the other hand (so that those code paths are never taken in the first place), would actually help.
The 2012 UCSD paper [1] claims they observed 5.73 bits of entropy in their admittedly non-representative population.
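For reference, an entropy figure like "5.73 bits" is the Shannon entropy of the distribution of fingerprint values observed in the study population. A sketch of the computation (the counts below are made up for illustration, not the paper's data):

```javascript
// Shannon entropy in bits of a distribution given as raw counts.
function shannonEntropyBits(counts) {
  const total = counts.reduce((a, b) => a + b, 0);
  return counts.reduce((h, c) => {
    const p = c / total;
    return c === 0 ? h : h - p * Math.log2(p);
  }, 0);
}

// A uniform distribution over 4 values carries exactly 2 bits:
shannonEntropyBits([10, 10, 10, 10]); // -> 2
```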
As with everything, it depends on the user's threat model. In a court setting, it'd depend on how individual pieces of evidence stack up against a user to make them look bad, and whether there is enough reasonable doubt.