Live patching for Linux 3.20 (iu.edu)
200 points by meskio on Feb 10, 2015 | 30 comments



And it was officially merged into Linux this evening:

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux....


I am very happy about this. It's been years in coming!

We might finally get live updates on distros. (No thanks to Oracle, of course.)

Wonder if we can also do it that well in userspace? (How does systemd behave with patching, actually?)


The original KSplice project was actually fully open source, funded by a Dutch charity called NLnet that funds lots of interesting projects like that. I've been running it ever since.

After this was all done the team got acquihired by Oracle. I was actually amazed that the team was allowed to keep the service up for some non-Oracle distros.

But very happy to see a broader adoption of this kind of technology, it is essential for all these unmanned systems out there in the cloud that they can be patched whilst running.


Any reason why no one forked KSplice when the original team went to Oracle?


Yes: KSplice had software patents - Oracle bought them. And everyone knows what Oracle is like with software patents: aggressive!

I'm not clear what they actually cover, and can't look them up right now, but I'd thought they were specific on how KSplice in particular operates, both applying hotpatches and analysing the source to create them. I don't know whether they'd apply to anything else, or whether there is prior art, but they're an obvious landmine to be aware of and to avoid. So a simple fork wouldn't do unless it'd change the way it actually worked. A fresh approach was needed, and we seem to have two fresh approaches here.

I'm trusting they've been avoided here. They probably have, as this is much more general? The concept of hot patches is of course fine; people have been doing that for decades, and you can't patent concepts.

The lesson here: please don't patent stuff in your open-source software, in case you wake up one day and find you've been acquihired by Evil™.


OK, I've now looked up the Ksplice patents that I know of. (I may not have found them all, but I think I probably have?) Here be dragons! (Those who are ordered not to read patents: Don't click on the links in this post.)

Of course, by the time they were granted (to Oracle, after Ksplice was bought), the applications had become nigh-impenetrable patentese that really needs a US-qualified patent attorney to interpret, so I'm absolutely not going to try; I'm just going to post what I found here.

Application: https://www.google.co.uk/patents/US20100269105 became patent https://www.google.co.uk/patents/US8612951 (B2) "Method of determining which computer program functions are changed by an arbitrary source code modification". (They've also cited a patent for a… coffeepot. OK, I'm pretty sure that bit's a typo. <g>)

Application: https://www.google.co.uk/patents/US20100083224 seems to have become patent https://www.google.co.uk/patents/US8261247 (B2) "Method of modifying code of a running computer program based on symbol values discovered from comparison of running code to corresponding object code".

Application: https://www.google.co.uk/patents/US20100269106 does not seem to have been granted directly, but then there's patent https://www.google.co.uk/patents/US8607208 "System and methods for object code hot updates" which I think is a continuation-in-part of it and oh I've gone cross-eyed, get a professional.


> The lesson here: please don't patent stuff in your open-source software, in case you wake up one day and find you've been acquihired by Evil™.

Or use a license with a patent grant, like Apache 2.0 or GPL 3.


Here's some more discussion from LWN: https://lwn.net/Articles/597407/

(Note: from May 2014)


And for those who prefer the official lkml.org link:

https://lkml.org/lkml/2015/2/9/534


lkml.org explicitly states that they are unofficial.

From https://lkml.org , "In case you haven't read the titlebar of your web browser's window: this site is the (unofficial) Linux Kernel Mailing List archive."


Firefox Nightly displays that neither in the tab name nor in the X window title. I'd never have seen this ...


That's because the <title> element only contains "LKML: " on the thread, and "LKML.ORG - the Linux Kernel Mailing List Archive" on the main page.

Maybe they forgot to keep the "unofficial" in the title?


This is cool as fuck, I didn't know about kpatch or kGraft.

Does anyone know if any other OSs have live kernel patching?


FreeBSD has not gone quite that far, but there was a PoC for loading new kernels without rebooting.

https://www.bsdcan.org/2012/schedule/events/325.en.html


Yup, Linux has had a similar feature, kexec, for several years.


Everyone I talked to about it since has said "nobody seems to care enough to bother completing it"

It is kind of a niche feature, really.


Linux has this great feature of being able to include the operating system in the initrd. Combine this with a nice PXE infrastructure where you PXE-boot the OS and download the initrd with the OS in it. Then you simply kexec to upgrade or downgrade the entire operating system in ~30 seconds. I used to manage a production environment that worked exactly this way for several thousand nodes.

Not sure I'd call it niche, but yes, somewhat specialized.


Not that it matters too much for the current ecosystem, but Lisp Machines had this 25 years+ ago.


Windows Server 2003 had a live patching feature [1] and it was also applicable to kernel patching according to [2]. For some reason it was quickly abandoned, though.

[1] https://technet.microsoft.com/en-us/library/cc781109%28v=ws....

[2] http://jpassing.com/2011/05/01/windows-hotpatching/


I was told by Andrew Tanenbaum some time ago that Minix has had this kind of feature for a number of years now.


And lkml discussion of the follow-up patches: http://lkml.iu.edu/hypermail/linux/kernel/1502.1/00694.html


Does anyone have a simple example of how this would work? I can't wrap my head around code evolving during runtime for any arbitrary binary change.


I took a quick look at the accepted patch. While I can't guarantee I know what's actually going on, my understanding is that patching individual functions works by sticking the replacement function's code somewhere new in memory, getting a pointer to it, and then overwriting the code in the old function to jump to the new one. (Kinda like short-circuiting the old function: all the old code still calls the old function location, but that location simply says 'jump to this new location over here'.)

It looks like, however, kernel modules are in ELF format (don't quote me on that, I'm just going from the code), and ELF includes a 'relocation table', which is basically a table that says "this function is located here, and this next function is located here, and ..." for every function in the module. Ignoring why that is actually there, they can take advantage of the relocation table and replace a function's location with the location of the replacement function, effectively overriding the old one. Even if the old code is still in memory (I can't tell if it gets removed or not), it will never be called again.

From there, the discussion mostly seems to be around how to 'stop' the kernel enough to be able to replace the function without resulting in a mess because something was trying to use that function at the same time that you replaced it.
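
To make the 'jump to this new location' idea concrete, here is a toy model in Python. It is only a sketch of the redirection idea described above, not kernel code and not how the merged patch is implemented; the names `patchable`, `apply_patch`, `revert_patch` and `check_length` are all made up for illustration.

```python
# Toy model only: a Python stand-in for entry-point redirection, not kernel code.
import functools

_replacements = {}  # hypothetical registry: function name -> replacement callable


def patchable(func):
    """Wrap func so every call first checks for a registered replacement."""
    @functools.wraps(func)
    def entry(*args, **kwargs):
        # This check at entry plays the role of the jump written over the old code.
        target = _replacements.get(func.__name__, func)
        return target(*args, **kwargs)
    return entry


def apply_patch(name, new_func):
    """Redirect all future calls of `name` to new_func."""
    _replacements[name] = new_func


def revert_patch(name):
    """Remove the redirection so the original body runs again."""
    _replacements.pop(name, None)


@patchable
def check_length(n):
    # "Buggy" original: accepts negative lengths.
    return n < 4096


def check_length_fixed(n):
    # Replacement with the missing lower-bound guard.
    return 0 <= n < 4096


caller = check_length            # an existing caller holding the old location
before = caller(-1)              # True: the bug is live
apply_patch("check_length", check_length_fixed)
after = caller(-1)               # False: same call site, new code
```

The caller never changes: it keeps calling the old location, and only the entry point decides which body runs. Reverting is just removing the redirection.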


Basically yes. Check out the kprobes docs for a nice description of how these frameworks work: https://www.kernel.org/doc/Documentation/kprobes.txt. Being able to intercept (and mangle) kernel function calls is awesome. With uprobes the same techniques work in userland as well.


(just what I gathered from a bit of mailing-list reading, if any of this is wrong please correct me)

I don't think it can do arbitrary changes; it "only" applies specially prepared changes to the running system by replacing function call targets while tasks are sleeping. The biggest limitation is probably that old and new code run in parallel, so you can only make changes to data structures that won't confuse the old code. The "simplest" use case might be adding guards against exploits to syscalls.

I can't tell how viable it would be for the new code to build an entire parallel structure and to only switch to this after everything is migrated, or how deep into the kernel these changes could go. Could one fix a file system driver while using the file system?


Yes, it's just that the fix won't happen for you when you're halfway through a patched function.
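
A toy illustration of that "halfway through" point, again in Python and purely as a sketch: the dispatch table `current`, the `call` helper and the `handle_v1`/`handle_v2` functions are hypothetical names, and the `midway` callback is just a stand-in for a patch landing while a call is in progress.

```python
# Toy model only: shows why a call already in progress keeps running old code.
current = {}   # hypothetical dispatch table: name -> implementation
log = []


def call(name, *args):
    """Look up the implementation at call time, as a patched entry point would."""
    return current[name](*args)


def handle_v1(request, midway=None):
    log.append(("v1 start", request))
    if midway:
        midway()  # stand-in for a patch landing while we are mid-function
    # Still v1: this call was entered before the patch was applied.
    log.append(("v1 end", request))
    return "v1"


def handle_v2(request, midway=None):
    log.append(("v2 start", request))
    log.append(("v2 end", request))
    return "v2"


def apply_patch():
    current["handle"] = handle_v2


current["handle"] = handle_v1
first = call("handle", "a", apply_patch)   # patch lands halfway through this call
second = call("handle", "b")               # entered after the patch
```

The first call finishes as v1 even though the table already points at v2, which is also why old and new code have to agree on any data they share.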


The original academic paper from MIT describing how it works:

http://www.ksplice.com/doc/ksplice.pdf


This paper is very readable - thanks for the link.


An interesting case of the Cathedral causing siloed duplication, and the Bazaar creating a better solution without duplication.


Ignoring the fact that both SuSE and Red Hat are paying the involved engineers their paychecks in order to get the feature in their silo first.



