Demystifying 'The Kernel'
‘The Kernel’ is an ominous-sounding term which sometimes comes up when you talk about containers. To me, it felt a bit hand-wavy at times. Whenever it came up, it was as a side-note, with the intention to move on to the actual topic right after. Understanding was assumed. Or at least there was a silent agreement that nobody had any intention of getting closer to the topic.
- “Containers share ‘The Kernel’ of the host you see.”
- “The isolation is not as good because the workloads interact with the same Kernel.”
- “There is less performance loss, because containerized applications talk directly to ‘The Kernel’.”
It took me a while to sit down and look into what ‘The Kernel’ actually was. And it wasn’t without a certain amount of guilt, and a feeling that I should have known all along.
Me: Hello, my name is Vladislav and I’m an impostor. Impostor syndrome support group: Hi Vladislav.
By now I think I have found a way to convey a good understanding of what ‘The Kernel’ actually is! We are going to talk around it a lot, and you should still bring a bit of time (and patience). It won’t be complete, but by the end of this I am quite confident that you will have a better grasp of the term and what it really is.
Note: I am going to focus on the Linux kernel. Also, I am no expert! I have acquired a bit more understanding than was needed for my day-to-day work, but there is so much depth you can get into! There might be errors or simplifications in there. I still think it will be helpful though!
Let’s get started. Let’s take the mystery, the scary parts and those capital letters out of ‘The Kernel’.
It Starts With A Boot
One way to look at the kernel is in terms of the boot sequence of a boring ol’ desktop PC.
Note: there is a lot of variety in setups! We will look at a somewhat more old-school system with a BIOS, an MBR and GRUB2 as the bootloader, instead of something like UEFI. The older stuff is usually easier to get into and more fun to learn about, at least for me :)
Here’s what usually happens (you can read another version over here on Wikipedia or (!) the boot man-page):
- You press the power button
- It’s firmware time. Your machine has a motherboard, there is a chip on it which has had BIOS (Basic Input/Output System) flashed on it.
- The BIOS starts doing its thing, enumerates the available hardware, does some checks, finds a boot device and “executes the boot code from there”.
Intermission. What does it mean to execute code? You read data from somewhere (usually a disk), put that data (which happens to be valid machine-readable instructions) into RAM (random access memory), tell the processor to go to that part of the memory and start executing stuff from there.
So the BIOS read data from the boot device, and gave it control. The bootloader has control now. If it’s GRUB2, it does a fascinating little dance at this point.
Intermission again. It’s too fun not to get into (once again, you can read another version of this on Wikipedia).
You see, the BIOS is only supposed to load a tiny bit of data. The MBR is only 512 bytes (!). That’s not a lot of space for anything. This is why GRUB2 is structured in stages. Stage 1 lives in those 512 bytes, taking up (according to Wikipedia) 440 of them. Its job is to make it possible to run Stage 1.5.
Stage 1.5 “lives” in the empty space between the MBR sector and the first partition. That empty space is there for technical reasons, and is not used otherwise. Here we have quite a bit more space, and can fit more code. For example, it can handle a limited number of file systems - ext4 among them, the by-now usual file system choice for Linux distributions.
Stage 1.5 finally loads Stage 2, which is located wherever you chose to create your /boot/grub/ folder. Phew! We are at the bootloader screen, and you can choose which kernel version to boot (or the default is picked automatically).
Note: all of these stages are basically files, being read and loaded into memory. Stage 1 corresponds to a really small boot.img file. Stage 1.5 is called core.img. (No idea how exactly Stage 2 looks, but I bet it’s yet another file being read into RAM). No magic here.
Note: this stackexchange comment has got even more details and pretty graphics, if you’re interested.
At this point I had to close a bunch of tabs. Okay, what happens next?
“GRUB loads the linux kernel into memory and passes control to it” -Wikipedia
How exactly does that happen you may ask?
Well, we read a file, load it into RAM and start running the contents, i.e. hand over control to the loaded code.
The kernel is in a file with vmlinux in its name (there may be multiple ones, and you chose one during the previous stage, with all that GRUB business), located in the /boot directory.
But there are a few more hidden steps until everything “just starts”.
Memory May Be Limited
This is the reason why the kernel file is kept as small as possible. If memory were plentiful on every system which needed to run a Linux kernel, we would just jam everything in there and put it into memory.
But alas, once again, memory is limited. There are a lot of different file systems which may be needed. We want to pick and choose, and be selective about what to load. We may not be able to get to / (the root partition) without that extra knowledge!
This is where initramfs comes into play (you can read more on Wikipedia). It makes it possible to have a temporary root partition, with all the stuff the kernel needs to get to the real root partition.
Note: Here I am completely out of my depth, and don’t intend to dive deeper for now. Let’s just assume this works.
Any necessary parts are loaded as kernel modules from that temporary file system, and we are able to read the actual system files. Neat-o!
The Kernel Is Just Code
I’m not sure whether initramfs is mounted first, or the kernel takes care of hooking it in. Anyway, eventually the kernel starts. Which means that the content of the chosen vmlinux file is loaded into RAM and handed control. Yeii.
Of course it’s compiled and in machine code, but essentially the kernel is just a piece of code. What mostly constitutes its main function (as I learned from this cool stackoverflow reply) can be found here.
How cool is that!? Look at that code! Sure, it does a lot of stuff, and I’m sure all of it is pretty well thought out (as suggested by the illustration over at the Linux kernel Wikipedia page).
But in the end it’s just compiled code, doing what it’s supposed to.
The Kernel Gets Special Treatment
The kernel code is pretty important for the machine to run well. We need to make extra sure that it’s really hard to mess with it unintentionally. This is why there is a distinction between kernel-space and user-space.
The kernel-space is basically a protected part of the memory, which can only be messed with by the running kernel itself (more about it here). This way normal programs can’t easily interfere with it.
Okay, So How Do They Interact?
System calls. Think of the kernel as the only way any user-space dwelling code can do ANYTHING with the hardware. They talk to the kernel.
- Want to read a file? There’s a system call for that.
- Need to claim some memory? There’s a system call for that.
- Want to do something with the network interface? There’s a bunch of system calls for that.
There are over 300 system calls, to interact with various parts of the System Call Interface. Terms and conditions apply.
CPU registers and interrupts are involved; I haven’t gone much deeper than this yet.
You don’t read a file. You ask the kernel nicely to read a file. The kernel goes off, does the busy work (talking to drivers to have them talk to the hardware, but that’s another topic I know almost nothing about), and returns a result as agreed upon through the system call interface.
Note: the “Various layers within Linux” table over here (you have to scroll down a bit) is a great overview.
The Kernel Keeps Stuff Running
We talked about system calls, but the fact that other code gets to run at all is entirely up to the kernel as well.
By the way, we were not done with the boot process. What happens after the kernel has control?
Let’s look at this real quick, to complete the overview.
The kernel goes ahead, once everything else is taken care of, and runs the init system, also known as the system manager. This may be systemd, for example.
This one goes ahead and takes care that everything which needs to run is started in the right order, so you end up with a functional system.
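For a feel of what the system manager works with, here is a minimal, hypothetical systemd unit file (the service name and binary path are made up for illustration). The After= line is exactly the kind of ordering information systemd uses to start things in the right sequence.

```ini
# Hypothetical unit file, e.g. /etc/systemd/system/example.service
[Unit]
Description=Example service (illustrative only)
After=network.target        ; don't start until the network is up

[Service]
ExecStart=/usr/bin/example-daemon   ; made-up binary path

[Install]
WantedBy=multi-user.target  ; start as part of the normal boot target
```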
Everything the system manager, or any process started by it, does will talk to the kernel and be scheduled onto the CPU by the kernel.
Think of it like a tennis match, where the user space code and the kernel space code interact. Well, at least this is the image I have in my mind right now.
By now it’s a bit less mysterious to talk about containerized applications. They are plain-old applications! They talk to the kernel (I hope this sounds way less mysterious by now)! They are most likely limited in what they can see (by the kernel) and what they can access (by the kernel).
I hope this writeup has been as engrossing to read for you as it has been to write for me! This is an early version, and I will be sure to revisit it in the near future, to correct mistakes and add any learnings (or suggestions) to it.
Understanding the kernel a bit better has helped me a lot. Not because I actually needed to know about it, but because I feel like I understand what’s going on inside all of those computing machines a bit better.
Who knows, maybe I’ll actually dive deeper into the topic one day. By now, I am quite sure that understanding what’s going on inside the kernel might be a great thing to do. Not because it’s strictly necessary, but because it might open up new vistas and insights for my work where I hadn’t expected to find them.
Thanks for reading! And all the best on your own journey to feel at home among sometimes-confusing but essentially-not-that-hard-to-grasp technical terms.