Ideas for getting started in the Linux kernel (labbott.name)
207 points by ashitlerferad on Aug 16, 2016 | 45 comments



I feel like this underestimates how hard it is to get started. I've started playing around with the kernel and making small modifications, and it's not straightforward. For me, just getting it compiled in a VM was an ordeal. Most books are pretty old, and if you google for whatever you want, you end up with a lot of very dense resources. Then you need to figure out whether the article you found actually applies to the current kernel you're using and what's changed since then. I've been doing hours of reading for a few-line change here and there.

And somewhat off topic: I think it will be interesting to see what happens to Linux in the long term. Are that many new/young programmers getting into kernel development? I'd imagine there aren't too many people getting into programming now who are writing C or working on the kernel. Who is going to replace the kernel developers of today? Or am I just totally off?


That's basically the experience everyone has to go through with any new codebase.

For example, when I worked on a couple of Firefox bugs a few years ago, I hadn't touched C++ for several years. I did have to spend several hours just to get the hang of things. Luckily, what I needed to change was not too challenging: several dozen lines were added, and overall the necessary functions were already in place. hg grep helped a lot to find what I wanted. If I were to implement a new driver from scratch, that would be really tough.

Though since C and C++ aren't something I am very comfortable with (I haven't written a single line of C or C++ in a long time), it can be really tough, especially during debugging.


> That's basically the experience everyone has to go through with any new codebase.

Yes; for a codebase of comparable size and widespread usage, the Linux kernel is very, very easy to get into and very, very well documented. Go to Documentation/[your subsystem] and you'll get something useful, usually a reference on a driver you can base yours on (which is how practically all drivers are created).

Nice ways to get started:

- simple character device

- SPI slave and SPI protocol driver
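A minimal character device really can be tiny. Here's a hedged sketch (the names are illustrative; assumes you build it as an out-of-tree module against your running kernel's headers) using the misc framework, which handles most of the boilerplate and gives you /dev/hello that returns "hello\n" on read:

```c
#include <linux/module.h>
#include <linux/miscdevice.h>
#include <linux/fs.h>
#include <linux/uaccess.h>

static ssize_t hello_read(struct file *f, char __user *buf,
			  size_t len, loff_t *off)
{
	static const char msg[] = "hello\n";
	/* copies as much of msg as fits, honouring the file offset */
	return simple_read_from_buffer(buf, len, off, msg, sizeof(msg) - 1);
}

static const struct file_operations hello_fops = {
	.owner = THIS_MODULE,
	.read  = hello_read,
};

static struct miscdevice hello_dev = {
	.minor = MISC_DYNAMIC_MINOR,	/* let the kernel pick a minor number */
	.name  = "hello",		/* appears as /dev/hello */
	.fops  = &hello_fops,
};

static int __init hello_init(void)
{
	return misc_register(&hello_dev);
}

static void __exit hello_exit(void)
{
	misc_deregister(&hello_dev);
}

module_init(hello_init);
module_exit(hello_exit);
MODULE_LICENSE("GPL");
```

Build it with the usual one-line `obj-m += hello.o` Makefile, insmod it, and `cat /dev/hello`.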


> For me just getting it complied in a vm was an ordeal

You're right - the process of building and installing a custom kernel is much harder than it was 10-15 years ago.

And I believe this is a problem that impacts a certain kind of contribution, because even for the most experienced and competent of programmers, the bar for entry is too high.

The distributions are partly to blame here. The complexity of initrd setups has meant that testing even a small or single-line change is an ordeal, and the wrapping in package managers like RPM makes incremental builds of a kernel impossible.

My personal setup is based on Slackware without an initrd; a kernel build resembles a 'make; make install', like it has been for the past 15+ years. I've been unable to achieve anything similar in reasonable time on a CentOS system.
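For reference, the classic loop the parent alludes to looks something like this (a sketch assuming a configured tree in linux.git; distro packaging adds layers on top of these steps):

```
make olddefconfig            # refresh .config against the current tree
make -j"$(nproc)"            # build the kernel image and modules
sudo make modules_install    # install modules under /lib/modules/<version>
sudo make install            # install the image; may invoke a bootloader hook
```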


I dunno, I've been a Linux kernel guy since Linus posted to the minix-list, and I have to say that things seem easier than ever. I mean, pick your distro, figure out how it boots the kernel, use it to build a new kernel, boot it. Repeat, ad infinitum.

The problem seems to be a rather hefty "NIMBY" factor, imho. If you want to get into the kernel these days, you have to realize: it's a done thing. Nobody cares if you're a rockstar.

Learn the tooling and methodology, or gtfo...


I've been using Gentoo for 10 years and have been configuring, building, and installing (and occasionally patching) kernels for that entire time. For me it has gotten much easier now that you can install the kernel as an EFI stub. That said, I agree that it can be too much, because most distros don't make it easy, and 'hello world' is often the hardest program you will ever write.


And it will get worse.

Just about every idea that is coming out of Gnome/Freedesktop these days assumes a massive initramfs.

To bootstrap systemd they even have a copy of dbus crammed in there.

And the (ab)use of initramfs is what has led to the push for just cramming everything into /usr.


Young folks come through the GSoC/Outreachy programs:

https://wiki.linuxfoundation.org/en/Google_Summer_of_Code https://kernelnewbies.org/OutreachyIntro

Lots of new folks come through companies hiring people to work on Linux.


As for books about the Linux kernel, I'd recommend https://github.com/0xAX/linux-insides (which, oddly, didn't seem to be linked)


I would recommend Robert Love's Quora answers as an additional low barrier starting point

http://qr.ae/TNK9UD


One thing I hadn't seen said here yet:

Do we really need that many kernel developers? I'm getting the impression that a lot of dev power is wasted just to maintain drivers that should be produced by hardware vendors. The vendors won't do that because there's no stable ABI, which means that people need to reinvent the wheel.

One could argue that it adds better quality control, and that all in all we get open source implementations of what would otherwise be proprietary drivers, but in the end, complex drivers aren't good enough anyway (see Nouveau). As for quality control... many say that Linus's view on security is questionable. For one, I don't like the concept of pluggable/stackable security mechanisms (SELinux, AppArmor, etc.) where one consistent system would do. It just feels cluttered.


Stable API nonsense! :)

> I'm getting the impression that a lot of dev power is wasted just to maintain drivers

No. Maintenance of in-tree drivers is done mostly by subsystem maintainers, in an automated fashion (see http://coccinelle.lip6.fr/), including API/ABI-breaking changes. But nobody notices, because it all goes in at once, in sync.
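To give a flavour of that automation: Coccinelle applies "semantic patches" across the whole tree at once. A classic example (shown here as an illustrative sketch, not one of the actual tree-wide migrations) drops the redundant NULL check before kfree(), since kfree(NULL) is a no-op:

```
@@
expression E;
@@
- if (E != NULL)
-     kfree(E);
+ kfree(E);
```

One such semantic patch can rewrite hundreds of drivers in a single sweep, which is why API churn is far cheaper in-tree than out.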

> that should be produced by hardware vendors

Yep, vendors should. But only a few of them do it properly. Most vendors fall into one of these shameful categories:

- those who don't publish driver sources at all, violating the GPL;

- those who don't care about quality and are just fine with their utter crap of code, based on an ages-old kernel, with buggy drivers full of braindead things; also, no updates and no forward-ports;

- those who, in addition to the above, distribute only obfuscated sources.

A stable ABI would help only those bad actors who are enemies of user freedom and don't share sources. Nobody else has a problem with the unstable ABI.

> I don't like the concept of pluggable/stackable security mechanisms (SELinux, AppArmor, etc.) where one consistent system would do. It just feels cluttered.

There was, and still is, a lack of coherence in community views on what is best. I believe pluggable security modules are the best possible way to give freedom to everybody.


> The vendors won't do that because there's no stable ABI, which means that people need to reinvent the wheel.

This is utter nonsense; I suggest you have a good look at the "git log" of some driver source files in linux.git. Here's just an example:

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux....

Edit: In particular, there not being a stable ABI prevents vendors from re-inventing various wheels because they can just argue for and submit enhancements and changes to various core kernel subsystems so that they better match the requirements of their hardware / driver, instead of putting extra code into every driver like they have to do on other systems where they have to live with whatever ABI was declared stable by the OS vendor a decade ago.


Concerning the argument:

> In particular, there not being a stable ABI prevents vendors from re-inventing various wheels

Here it depends on what you consider a "wheel": if a hardware vendor has to write two versions of a driver (one using the existing wheel in the Linux kernel, and one with its own wheel on other OSes like Windows and OS X), that clearly leads to reinventing various wheels. On the other hand, inside the Linux kernel there is only one subsystem wheel shared by all drivers that use it, instead of that wheel being reinvented in every driver of a given type.


Yes, it always depends...

NVIDIA's huge blob is probably a case where they re-invented lots of wheels in order to share the same driver between different OSes. For something as complicated as a modern GPU driver, this could well make sense from an engineering point of view (legal objections notwithstanding).

On the other hand, 10 years ago there were various out-of-tree Linux WLAN drivers, and each of them contained a full independently developed 802.11 stack, perhaps one shared across drivers for multiple OSes by the same vendor - surely having this much duplication of generic infrastructure that dwarfs the actual model-specific part of the drivers is quite pointless.

IMHO only GPUs have an inherent complexity that is high enough that one could think about alternatives to just using the shared infrastructure that's already in Linux. But even in this area you see companies like AMD moving towards a Linux-specific kernel driver approach with their new "amdgpu" driver.


@the_why_of_y: what should I note here? That some vendors do that within the source tree? Yes, some do, but many don't, including a fair share of smaller ones.


This is sadly true, but I don't see how it's caused by the lack of a stable ABI. More likely you'll see justifications such as "we need to protect our intellectual properteh", management at hardware vendors not understanding the development process, or simply not caring about a niche market (in the case of desktop/laptop hardware).


The vendors "should" be shipping open source drivers. That's why there's no stable ABI. They're quite at liberty to submit drivers - and then kernel devs can maintain them!

(Obviously not everyone agrees with this, but that's the point of GPL/copyleft licensing)


I'm a driver developer at IBM. We're the hardware vendor. We maintain the drivers. Simple.


>The vendors won't do that because there's no stable ABI

That only causes problems for vendors who supply only proprietary drivers, of which there are very few these days (the only ones I've personally come across are NVidia and Broadcom), and it's not as if the unstable ABI prevents them from supporting Linux either, since they do.

Beyond that, the reasons for wanting open source drivers are many: the kernel devs can figure out and fix problems without relying on a third party (the vendor), upgrade drivers to use new kernel functionality without waiting for the vendor to comply/agree, audit them for security problems, and port them to every platform Linux runs on, including platforms the hardware vendor might not think worthwhile to support, and of course other systems altogether, like the BSDs (a stable ABI sure didn't get them more proprietary drivers) and fringe OSes like Haiku, Minix, etc.

So no, the answer is not to lose the flexibility of a changeable ABI, but to have vendors either supply fully open source drivers or the documentation necessary for creating open source drivers, which is pretty much the situation we have today, again with a few aforementioned holdouts.


I second the recommendation for the Eudyptula Challenge, although the "rule" about not asking/posting for help is probably why I have been on challenge 5 for about a year.


Lots of good suggestions on the list. I would add a few (maybe advanced ones) that I don't think I saw:

* write a device driver for a real or virtual device; hook into the various kernel virtual filesystems that allow userspace interaction.

* write a small network stack (for example a raw protocol on top of ethernet).

* add a scheduling algorithm or mess with an existing one; analyze its performance impact.

Lots of the tools in that list were new to me and interesting. Back in my day, we had to use a serial port console, and if we were fancy we would set it up so that you could remotely cycle the computer's power when it hung (uphill both ways in the snow).


Everything old is new again - serial port debugging was invaluable for me when I was messing with the Dreamplug, and then the Raspberry Pi.

I think I'd add using a serial-based terminal to that list too. Embedded hardware can be infuriating, but fun.


Is this still running?

I sent a few emails about joining the challenge around 2 months ago (after a multi-year break) but have yet to receive even the automated response.


Did you make sure to send non-HTML email?


yeah - i'd gotten up to level 9 before. i just wanted to start again but didn't receive any response.


An approach that's worked well for me was to first hack qemu to add some new (emulated) hardware, then write the kernel drivers using qemu to test (followed by running on the hardware). That does leave you with two problems instead of one, but is quite flexible when you are trying to track down bugs and can increase your hack/compile/test velocity.


Which kinds of drivers have you developed this way? I guess it's not possible to test driver code unless probe() succeeds, which requires actual hardware to be in place. You say you had "emulated" hardware - what does this mean, exactly?

Thanks in advance for explanations.


> You say you had "emulated" hardware - what does this mean, exactly?

Software pretending it is hardware, for example QEMU: http://wiki.qemu.org/Main_Page

> Which kinds of drivers have you developed this way?

As an example, this approach helped me develop a pin-controller driver[1]. I first added a new bare-bones SoC and machine to QEMU[2] to give me enough of an environment to start adding other models, such as the SoC's System Control Unit[3], which in turn contains the registers the pin-controller driver pokes at. Using QEMU I could control the initial values in the registers and exercise the pin-controller driver's implementation.

> I guess it's not possible to test driver code unless probe() succeeds, which requires actual hardware to be in place.

Not entirely - if you are writing the driver you can always pretend it succeeds! Ideally you want to get some gratification as soon as possible - if that means making some assumptions and nasty hacks, that's fine (as long as it won't smoke something!). Then start iterating to iron them out. At least, that's the kind of approach that works for me.

[1] https://lkml.org/lkml/2016/7/20/69

[2] https://lists.nongnu.org/archive/html/qemu-devel/2016-03/msg...

[3] https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg070...
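The "pretend probe() succeeds" trick can be as blunt as a compile-time switch. A hypothetical sketch (FOO_NO_HARDWARE and the foo_* names are made up for illustration, not from the patches above):

```c
#include <linux/module.h>
#include <linux/platform_device.h>
#include <linux/io.h>

static int foo_probe(struct platform_device *pdev)
{
#ifdef FOO_NO_HARDWARE
	/* Nasty-but-honest hack: skip the register pokes, claim the device
	 * exists, and exercise the rest of the driver (sysfs, char dev, ...). */
	dev_warn(&pdev->dev, "probing with stubbed-out hardware\n");
	return 0;
#else
	struct resource *res;
	void __iomem *regs;

	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
	regs = devm_ioremap_resource(&pdev->dev, res);
	if (IS_ERR(regs))
		return PTR_ERR(regs);
	/* real probe: verify an ID register, wire up the subsystem, ... */
	return 0;
#endif
}
```

Once the QEMU model (or real silicon) exists, the stub branch gets deleted and the real path takes over.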


> I guess it's not possible to test driver code unless probe() succeeds, which requires actual hardware to be in place.

Not the OP, but I think qemu and VirtualBox have APIs for implementing custom virtual hardware (similar to how the rest of the machine is emulated). To your kernel running in the VM, the emulated device appears as a real device.


As you say, now you have two million-line codebases to understand instead of one :-) I think there are things we (the QEMU community) could do to make the learning curve a bit less steep but it is still definitely there...


> As you say, now you have two million-line codebases to understand instead of one :-)

Yes, I guess part of the problem is figuring out how much of that you (don't) need to know.

> I think there are things we (the QEMU community) could do to make the learning curve a bit less steep but it is still definitely there...

I found the reviews clear and insightful: in my opinion that goes a long way toward making it more approachable, though I guess only if you've gotten as far as having a patch to be reviewed. That said, even if you don't yet have a patch, reading feedback on relevant patches from others can be just as good.



It's not that getting started with the Linux kernel is inherently hard. It's just that what a lot of people want as "getting started with the Linux kernel" (see the motivation part of this article) is hard, and that's not about the Linux kernel itself. Some people get this wrong and start thinking that the Linux kernel or its community is the problem. (Maybe kernel devs and Linus himself are aggressive, picky bullies unfriendly to newcomers? Maybe you should ditch C and rewrite the kernel in a better language? Maybe the Linux development process is wrong and they should change it? Maybe the Linux maintenance organization structure is bad? Maybe...)

There is also common frustration about the friction of getting your patch accepted. But I believe in most cases the maintainers are right and there are issues to fix in any patch, so the author has more work to do on it. Maintainers have reasons to care: they are the ones who improve subsystem APIs and modify ALL drivers in the tree, so they'd better have all drivers as close to perfect as possible.

> Are you interested in knowing how operating systems work in general? Do you want to know how parts of Linux specifically work?

An undertaking which a lot of novices take on for reasons unknown to me. I was always fine without this.

I follow the #kernelnewbies IRC channel (a good source of help, BTW) and see which questions are being asked. Surprisingly, many novices decide to start by understanding the kernel codebase in its entirety. Such people should understand that their aim is inherently hard: any large codebase has non-obvious and/or complex things, a lot of indirection, etc.

  - I don't understand how this internal kernal mechanism works, please explain!
  - It's too complicated to explain outright. If in trouble, read the code. And just trust that it works. What are you trying to achieve?
  - I'm just studying how kernel works.

> Is your hardware broken?

Regarding device drivers, a lot of opportunities are missed because most devices don't have datasheets good enough to allow a third-party developer to write a driver. So you just have no way to know how to give commands to your piece of hardware. It's a shame, but the shame is not on Linux, it is on the manufacturers. Even if you are stubborn enough to get in touch with the manufacturer, you may end up with something like:

  - The development team was dismissed N years ago and people don't work here anymore, so nobody to ask, but if you want NNNNNNN pieces order, we can make new research team for this.
You can reverse-engineer things, of course, but as somebody with experience reverse-engineering heavily obfuscated _sources_ of a Linux v2.6 driver, I'd say the amount of wasted time and frustration is tremendous.

> Do you just want to make an Open Source contribution? Do you want a high five?

Oh yeah, the kernel is a fantastically reputable place to have an open source contribution. Your bros will respect you as the coolest guy ever, your girlfriend/boyfriend will praise you as a superhero, hiring managers will want you. Because they don't know that contributing to the kernel is not THAT hard.


>> Are you interested in knowing how operating systems work in general? Do you want to know how parts of Linux specifically work?

> An undertaking which a lot of novices take for reasons unknown to me. I was always fine without this.

I'm probably atypical, but my motivation for wanting to get involved is that many of the jobs I really want demand Linux kernel development experience. I have good systems-level experience at all levels except the kernel, and I want to fill that hole.


Good luck, but in my opinion there's very little room for improvement in core code that could be found and taken on by a newcomer. It is more realistic to get your first kernel-related experience by working on drivers - they tend to be much less polished. For driver work, all you need is use of the provided APIs and communication with maintainers and mentors. That's my point.


Is there any kernel that was written and maintained in Python?

I assume that C and C++ are a must to learn for this, but is that really so, or can we work on a kernel using Python?


Besides performance and safety concerns, even a kernel written in C needs to jump down to native assembly for certain parts, so I don't believe this would/could be done.


As much as I love using Python for my day job, this is one place where it really doesn't fit. Learning OS development is actually a great reason to practice C.


I suppose there are systems that run "micropython" on the bare metal. But note that to work on Python itself often means working on or with CPython in C.


Technically there's nothing that forbids it. It would essentially boil down to porting the Python interpreter / VM / Cython (pick your flavour) to run on the naked metal (hard, but not impossible; probably easier done with Lua), which would mostly involve memory management and all the low-level stuff needed to get it running in the first place. Besides all the really hard problems one would have to solve, ironically, mapping a pointer-less / array-less language like Python onto a page-wise memory model would be rather easy; as soon as you don't have to care about things like lists and dicts being contiguous in address space, a lot of the constraints you'd have in the Linux kernel suddenly vanish.


[flagged]


Where does OpenBSD's kernel fit into all of this? Isn't that, in turn, lightyears ahead of illumos?


I'm not sure in what way they would be light years ahead.

But I will say that it's probably much easier to hack on the BSD kernels than on Linux.

Taking FreeBSD as an example, if for no other reason than the absurdly great documentation, tooling, and developer community.


For one, Theo De Raadt Says FreeBSD Is Just Catching Up On Security:

https://tech.slashdot.org/story/13/12/16/0121213/theo-de-raa...

OpenBSD's documentation is also known to be a tad better than FreeBSD's.


In some things, yes. OpenBSD was always, and remains the industry leader in security innovations.



