"In our solution, developers annotate the functions perform-
ing sensitive computation(s) that they would like to protect."
So it assumes that cache timing is used to gather info on a secret function, and does not consider the use case of dumping everything a process can see?
Their performance results depend on isolating “protected” functions that access sensitive data, while keeping the rest of the code unprotected. That’s fine in a sane world, but with Spectre, just about any code can be tricked into speculatively accessing sensitive data in its address space. So trying to adapt this into a Spectre mitigation reduces to just flushing all CPU caches on every context switch (plus giving each core a separate chunk of the L3 cache, which is supposed to be shared). That would be… not pretty, from a performance standpoint.
Consider, though: similar logic has led us to not flush the cache, and here we are.
Until we have better ways of reasoning about and restricting the code we are running, it isn't going to get prettier. Conservative approaches like this may not be best for every purpose, but in hardened code like a hypervisor they might make sense.
Isn't the problem rather that we have instructions to control the cache at all? As far as I can tell, things like CLFLUSH were introduced along with SSE2, which was what, the Pentium 4 era? We were apparently doing fine without them before that, even with 4+ CPU systems.
I'm currently trying to get the spectre.c PoC to work without using CLFLUSH or similar, but it doesn't look too promising yet. Then again, I only started an hour ago and lack the ability to think in the creative and twisted ways that led to the discovery of this whole issue in the first place.
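Roughly what I'm poking at is eviction instead of flushing: walk a buffer bigger than the last-level cache so the interesting lines get pushed out by sheer cache pressure. A minimal sketch, where the buffer and line sizes are guesses about my machine rather than anything from spectre.c:

    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    #define EVICT_SIZE (64u * 1024 * 1024)  /* comfortably bigger than a typical LLC */
    #define LINE_SIZE  64                   /* common cache-line size */

    static uint8_t *evict_buf;

    /* Touch one byte per cache line across the whole buffer so the target
     * data gets evicted by pressure instead of an explicit CLFLUSH. */
    static void evict_all(void)
    {
        volatile uint8_t sink = 0;
        for (size_t i = 0; i < EVICT_SIZE; i += LINE_SIZE)
            sink ^= evict_buf[i];
        (void)sink;
    }

    int main(void)
    {
        evict_buf = malloc(EVICT_SIZE);
        if (!evict_buf)
            return 1;
        /* Write once so the kernel backs the buffer with real pages
         * instead of the shared zero page. */
        memset(evict_buf, 1, EVICT_SIZE);
        evict_all();   /* drop-in where the PoC would call _mm_clflush() */
        free(evict_buf);
        return 0;
    }

Even then, the prefetcher and the replacement policy can keep the interesting lines around, which may be exactly why it's not looking promising.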
Practical secret extraction from caches (practical as in the papers documented C programs the authors wrote to verify their claims) goes back to at least 2005.
It's the speculative execution side channel that's new. What we didn't fully grok was the idea that code that never really runs could leave predictable cache footprints.
And that you can poison prediction state (branch predictor, branch target buffer, etc.) and cache state to force longer windows of processor speculation, leaving more room for unintended side effects.
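For reference, the heart of the public spectre.c PoC is a victim function of roughly this shape (condensed; the array sizes and the 512-byte stride follow the published code):

    #include <stddef.h>
    #include <stdint.h>

    uint8_t  array1[16] = {1, 2, 3, 4};
    unsigned array1_size = 16;
    uint8_t  array2[256 * 512];
    uint8_t  temp;   /* keeps the compiler from optimizing the load away */

    /* Train the predictor with in-bounds x, then call with an out-of-bounds x:
     * the check holds architecturally, but the body still runs speculatively
     * and leaves array2[array1[x] * 512] warm in the cache. */
    void victim_function(size_t x)
    {
        if (x < array1_size)
            temp &= array2[array1[x] * 512];
    }

The attacker then times a load at each 512-byte offset of array2, and the fast one reveals array1[x]; that is the predictable cache footprint of code that never architecturally ran.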
The folks who invented high-assurance security (TCSEC A1-class) were eliminating storage and timing channels as best they could. An early reference is the VAX Security Kernel, which tried to assess and mitigate timing channels.
Mainstream security professionals have always ignored such work, dismissing it as red tape with no value or as too costly. The early work preempted most of these problems back in the 1980s-1990s, though, by focusing on root causes. The same crowd that ignores it keeps rediscovering its lessons in new forms while continuing to ignore the original work. (There are exceptions who pay attention.) As far as covert channels go, the first technique I saw for systematically examining them was Kemmerer's shared resource matrix in 1983. It made it into the first criteria for security certification; as in, it was mandatory to look for covert/side channels as best you could.
So, the basic concepts have been known for decades under covert channel analysis. The analysis technique established long ago was to identify every shared resource between two things in the system. If both could read/write it, it was a potential storage channel; if one could manipulate its activity and the other had a way to time that, it was a timing channel. Those are the basic ones. So you can just list everything, categorize it, and do the analysis by brute force. They'd have found the caching stuff quickly. Here's an example that should've happened a long time ago:
What's really wild is that they even got the rough security cost of losing it down to about 5%, in line with the numbers we're seeing with today's security patches.
I submitted this because I think their technique is preferable for a lot of the stuff distributed-systems developers do with crypto primitives (as opposed to retpoline).
There are lots of well-known things no one seems to know outside of academia, like how generalized sorting can be O(n), or compilers that can prove the correctness of real-world code when it's written a specific way.
What's interesting to me about this paper is how many different things they had to worry about: disabling scheduling, pinning to a single core, page coloring, no huge pages, disabling hyperthreading, disabling frequency scaling, and lastly the lazy cleanup (I think that's all?). It's a pretty long list of kernel modifications and configuration, and I doubt these will ever be defaults.
They probably mean the "postman sort", or whatever catchy name it goes by (radix sort, IIRC). That doesn't involve any comparisons between the elements being sorted. Think of how a postman sorts mail into P.O. boxes... you just algorithmically map a number to a row and column (bin) and deliver. This truly is O(N), but there's a catch: it doesn't work when you don't have bins to sort into, or when you don't even know how many bins you'll need. If you try to generalize this, it becomes O(N log N).
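To make that concrete, here's the usual shape of the postman idea in code, sketched as an LSD radix sort over fixed-width 32-bit keys (a generic illustration, not taken from any particular library):

    #include <stdint.h>
    #include <string.h>

    /* Four counting passes over 8-bit digits, least significant first.
     * No comparisons between elements; linear only because the key width
     * (and thus the number of bins) is fixed. */
    void radix_sort_u32(uint32_t *a, uint32_t *tmp, size_t n)
    {
        for (int shift = 0; shift < 32; shift += 8) {
            size_t count[257] = {0};

            /* Count how many keys land in each of the 256 bins. */
            for (size_t i = 0; i < n; i++)
                count[((a[i] >> shift) & 0xFF) + 1]++;

            /* Prefix sums turn counts into starting offsets per bin. */
            for (int b = 0; b < 256; b++)
                count[b + 1] += count[b];

            /* Deliver each key to its bin; the pass is stable. */
            for (size_t i = 0; i < n; i++)
                tmp[count[(a[i] >> shift) & 0xFF]++] = a[i];

            memcpy(a, tmp, n * sizeof *a);
        }
    }

The 256-entry count array is the fixed set of bins; once you no longer know the key domain up front, that trick stops applying directly, which is the intuition behind the O(N log N) claim.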
This is the technique. It's an imposing 80-page paper that even dedicated educators like Edward Kmett have trouble explaining trivially, but there is a talk here by the author to help sum it up: https://www.youtube.com/watch?v=sz9ZlZIRDAg
We've had some of Henglein's associates here to talk about it, too.
There is a Haskell implementation and it can really speed up certain types of operations. It's tricky to get the constants low in Haskell, but Kmett seems to have done a pretty good job.
Agda and Idris are languages with more "real world" applications; they can check totality and offer proofs.
It's pretty cool because you can essentially extend the type system ad hoc to understand all the valid transitions for your types, and then prove them.
Idris is different from Agda in that it has great introductory literature (the Idris book is just fantastic), and it also has F#-style type providers that let you do things like query the AWS API with a given account's creds and then extend the type system with a series of descriptive types for all valid operations on every asset.
It means that you can (and I find this really surprising) assert code totality for very real-world use cases, like code that orchestrates containers, binary parsers, or network protocols.
I spent my Christmas break studying Idris, and what delighted me most was that in many cases systems like that end up being simpler to use, even though their "Verified" constructions often involve writing simple inductive proofs for the compiler. You don't have to prove everything, but the things that are proven are safe to use.
I'm familiar with dependently typed languages; I thought you might mean something different from semi-manual theorem proving. For instance, writing your program in a particular style that maximally takes advantage of automated SMT solving, as in LiquidHaskell or something.
"In our solution, developers annotate the functions perform- ing sensitive computation(s) that they would like to protect."
and assumes that the cache timing is used to gather info on a secret function, so it does not consider the usecase of dumping everything a process can see?