It’s been quite a while since my last blog post. I’ve dived deep into the project since then, so before I go any deeper, this seemed like a good time to write about my progress so far.
To recap, my research project breaks down into 3 parts:
1. Flashing Coreboot on the Gizmosphere machine provided to me.
2. Getting Coreboot to boot the Nautilus kernel, a tiny kernel written by Kyle Hale, a Ph.D. student in our lab.
3. Porting the Palacios virtual machine monitor to the Nautilus kernel.
The first part was relatively easy, because Gizmosphere provides an interface for flashing its ROM. For debugging, however, I decided to use QEMU, an open-source machine emulator. Debugging Coreboot under QEMU is much easier than on real hardware, where it is hard to see what is going on, especially in code that runs before the OS is even loaded.
For the past week, I’ve been struggling with Coreboot. Coreboot, previously known as the LinuxBIOS project, is an open-source project that aims to replace the BIOS completely. As mentioned in previous posts, BIOS firmware is often troublesome: it can contain strange bugs, and vendors are not open about them at all. In short, it’s an ugly duckling that no one wants to take care of, mainly because of its closed nature.
Coreboot supports many different CPU architectures, including the most popular ones such as x86.
Compiling Coreboot
The most difficult part about dealing with Coreboot was compiling it. Coreboot aims to eventually replace the BIOS, which means it has to set up the hardware on many different architectures. Cross compilation is the term for this situation: a program targeting one architecture is compiled on a machine with a different architecture. Cross compilation requires very specific toolchains, which is quite common for low-level code that deals directly with hardware behavior. Even so, Coreboot required by far the most specific toolchain I have ever seen. Not only did it require particular versions of compilers such as gcc, it also applied its own patches to specific commits of its library dependencies and compilers. In addition, for some reason Coreboot wouldn’t compile on Ubuntu, probably because of these toolchain issues. I tried compiling Coreboot on multiple server machines in our lab, and there was one machine where it compiled successfully (an x86 machine running the ancient Fedora 15, but whatever, it works). I decided not to struggle with installing it on my local Ubuntu or Mac machine and moved forward.
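One way to see what "targeting a different architecture" means in practice: GCC and Clang predefine architecture macros such as `__x86_64__` and `__i386__` according to the target they generate code for, not the machine they run on. The small sketch below (the function names `target_arch` and `target_pointer_bits` are hypothetical, chosen just for illustration) reports different answers depending on which compiler built it, even on the same host.

```c
/* Reports the architecture this file was compiled FOR.
 * These macros are predefined by GCC/Clang based on the target,
 * so a cross compiler running on an x86-64 host but targeting
 * 32-bit x86 takes the __i386__ branch. */
const char *target_arch(void)
{
#if defined(__x86_64__)
    return "x86_64";
#elif defined(__i386__)
    return "i386";
#elif defined(__aarch64__)
    return "aarch64";
#elif defined(__arm__)
    return "arm";
#elif defined(__riscv)
    return "riscv";
#elif defined(__powerpc64__)
    return "ppc64";
#else
    return "unknown";
#endif
}

/* Pointer width also follows the target, not the build host. */
unsigned target_pointer_bits(void)
{
    return (unsigned)(sizeof(void *) * 8);
}
```

Printing `target_arch()` from a binary built by the host gcc and from one built by a cross gcc makes the host/target split visible without touching any firmware.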
Initial Attempt – Coreboot + SeaBIOS + GRUB2 + Nautilus
Coreboot uses something called a payload, which is just an ELF-format program that Coreboot jumps to after doing basic setup of the machine. Coreboot can also be configured to use an open-source BIOS like SeaBIOS and bootloaders like GRUB. To see whether Nautilus would boot in a more normal scenario, I configured Coreboot to use SeaBIOS as its BIOS and GRUB2 as the bootloader.
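Because a payload is just an ELF file, the first thing any loader has to do is confirm that it really is one and find the entry point to jump to. The following is a minimal sketch, not Coreboot's actual loader: it assumes a 64-bit little-endian x86-64 executable and uses the standard `<elf.h>` definitions available on glibc systems. The function name `payload_entry` is hypothetical.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 0 and stores the entry point in *entry if buf holds a
 * plausible 64-bit little-endian x86-64 executable ELF header;
 * returns -1 otherwise. */
int payload_entry(const unsigned char *buf, size_t len, uint64_t *entry)
{
    Elf64_Ehdr eh;

    if (len < sizeof(eh))
        return -1;
    memcpy(&eh, buf, sizeof(eh)); /* copy to avoid unaligned access */

    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0)
        return -1;                /* missing "\x7fELF" magic */
    if (eh.e_ident[EI_CLASS] != ELFCLASS64 ||
        eh.e_ident[EI_DATA] != ELFDATA2LSB)
        return -1;                /* not 64-bit little-endian */
    if (eh.e_type != ET_EXEC || eh.e_machine != EM_X86_64)
        return -1;                /* not an x86-64 executable */

    *entry = eh.e_entry;
    /* A real loader would now copy the PT_LOAD segments into place
     * and transfer control, e.g. ((void (*)(void))(uintptr_t)*entry)(); */
    return 0;
}
```

The actual jump is left as a comment because it only makes sense in a firmware or kernel context, not in a normal user-space process.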
As expected, this worked without any problems, so I moved on.
Next: Coreboot + GRUB2 + Nautilus
I then configured Coreboot to skip the BIOS entirely and use only GRUB2 as the bootloader for the Nautilus kernel. This also worked without any problems, and I was quite surprised that Coreboot set everything up correctly for Nautilus. Booting was also extremely fast, because Nautilus itself boots so quickly: it takes Nautilus about as long to boot as it takes a Linux kernel to fork a process (that is, to create a copy of an existing process).
The real trouble: Coreboot + Nautilus
This didn’t go so well. I configured Coreboot to use the Nautilus kernel directly as its payload, and the machine went into an infinite loop of boot -> kernel panic -> reboot. This indicated a double fault, which happens when the CPU encounters a fault while trying to invoke the handler for a previous fault. But it was unclear whether the fault originated in Coreboot or in Nautilus.
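On x86, whether a second exception becomes a double fault (#DF, vector 8) depends on the classes of the two exceptions, and a further contributory fault or page fault while delivering the #DF escalates to a triple fault, which shuts the CPU down and typically shows up as a reset. The sketch below models a simplified subset of those rules as described in Intel's Software Developer's Manual; it deliberately leaves out newer vectors such as #VE and #CP, and the function `second_exception` is a hypothetical helper, not part of any real debugger.

```c
/* Simplified x86 exception classes (newer vectors omitted). */
enum exc_class { EXC_BENIGN, EXC_CONTRIBUTORY, EXC_PAGE_FAULT, EXC_DOUBLE_FAULT };

/* What the CPU does when `second` is raised while delivering `first`. */
enum outcome { OUT_SERIAL, OUT_DOUBLE_FAULT, OUT_TRIPLE_FAULT };

static enum exc_class classify(int vector)
{
    switch (vector) {
    case 0: case 10: case 11: case 12: case 13:
        return EXC_CONTRIBUTORY;  /* #DE, #TS, #NP, #SS, #GP */
    case 14:
        return EXC_PAGE_FAULT;    /* #PF */
    case 8:
        return EXC_DOUBLE_FAULT;  /* #DF */
    default:
        return EXC_BENIGN;        /* e.g. #DB, NMI, #BP, #UD, interrupts */
    }
}

enum outcome second_exception(int first, int second)
{
    enum exc_class a = classify(first);
    enum exc_class b = classify(second);

    /* Faulting while delivering #DF: the CPU gives up (shutdown). */
    if (a == EXC_DOUBLE_FAULT)
        return (b == EXC_CONTRIBUTORY || b == EXC_PAGE_FAULT)
                   ? OUT_TRIPLE_FAULT : OUT_SERIAL;
    if (a == EXC_CONTRIBUTORY && b == EXC_CONTRIBUTORY)
        return OUT_DOUBLE_FAULT;
    if (a == EXC_PAGE_FAULT &&
        (b == EXC_CONTRIBUTORY || b == EXC_PAGE_FAULT))
        return OUT_DOUBLE_FAULT;
    /* Everything else (including any benign first exception, or a
     * page fault after a contributory one) is handled serially. */
    return OUT_SERIAL;
}
```

For example, a general protection fault (#GP, 13) raised while delivering another #GP yields a double fault, which is exactly the situation a broken IDT or stack during early boot tends to produce.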
Since then, it has been a long debugging process that still hasn’t ended. At first I wasn’t even sure whether Coreboot had loaded the Nautilus kernel into memory correctly, so I looked at its segments. Using readelf, I compared the physical addresses of the segments in the Nautilus executable with Coreboot’s output, and they matched. I then looked into the Coreboot code to see what was happening, and it simply executed a jmp instruction to the location of Nautilus in memory. Next, I debugged by modifying the Nautilus kernel’s main() function (where it starts executing) and observing what changed.
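The segment comparison can also be done programmatically. This sketch walks the program header table of an in-memory 64-bit ELF image and prints each PT_LOAD segment in a readelf-like format, assuming (as is usual for kernels) that the loader places each segment at its `p_paddr`. The function name `dump_load_segments` and the output format are hypothetical; it uses only the standard `<elf.h>` types.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Prints the PT_LOAD segments of an in-memory 64-bit ELF image and
 * returns how many were found, or -1 if the header or program
 * header table does not fit in the buffer. */
int dump_load_segments(const unsigned char *buf, size_t len, FILE *out)
{
    Elf64_Ehdr eh;
    int count = 0;

    if (len < sizeof(eh) || memcmp(buf, ELFMAG, SELFMAG) != 0)
        return -1;
    memcpy(&eh, buf, sizeof(eh));
    if (eh.e_phnum == 0)
        return 0;
    if (eh.e_phentsize != sizeof(Elf64_Phdr) || eh.e_phoff > len ||
        (size_t)eh.e_phnum * sizeof(Elf64_Phdr) > len - eh.e_phoff)
        return -1;

    for (unsigned i = 0; i < eh.e_phnum; i++) {
        Elf64_Phdr ph;
        memcpy(&ph, buf + eh.e_phoff + (size_t)i * sizeof(ph), sizeof(ph));
        if (ph.p_type != PT_LOAD)
            continue;
        /* p_paddr is the physical load address worth comparing with
         * what the firmware reports in its log. */
        fprintf(out, "LOAD off=0x%llx paddr=0x%llx filesz=0x%llx memsz=0x%llx\n",
                (unsigned long long)ph.p_offset,
                (unsigned long long)ph.p_paddr,
                (unsigned long long)ph.p_filesz,
                (unsigned long long)ph.p_memsz);
        count++;
    }
    return count;
}
```

Reading the kernel file into memory and calling `dump_load_segments(image, size, stdout)` gives one line per loadable segment to line up against the loader's output.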
Interestingly enough, the kernel stopped panicking when I commented out the code that prints the cool “Nautilus” sign at the start of the kernel setup. I started to wonder what could be causing this, and discussed the problem with Professor Dinda and Kyle. While we aren’t entirely sure, the double fault may have been caused by the multiboot tables not being set up correctly. Even with the print removed, problems remained: the kernel got stuck while setting up a serial output device.
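One way for a kernel to catch this kind of problem early, before it parses the multiboot tables or initializes devices, is to check the magic value the bootloader leaves in EAX. The two constants below come from the Multiboot and Multiboot2 specifications; everything else is an assumption for illustration: the sketch assumes the assembly entry stub passes EAX and EBX to C, and `check_boot_magic` is a hypothetical name, not Nautilus's actual code.

```c
#include <stdint.h>

/* Values a compliant bootloader leaves in EAX when it jumps to the
 * kernel (Multiboot 0.6.96 and Multiboot2 specifications). */
#define MULTIBOOT1_BOOTLOADER_MAGIC 0x2BADB002u
#define MULTIBOOT2_BOOTLOADER_MAGIC 0x36D76289u

enum boot_proto { BOOT_UNKNOWN, BOOT_MULTIBOOT1, BOOT_MULTIBOOT2 };

/* Hypothetical early check: which boot protocol (if any) handed us
 * control? mbi_addr is the physical address the bootloader put in EBX. */
enum boot_proto check_boot_magic(uint32_t eax, uint32_t mbi_addr)
{
    if (eax == MULTIBOOT2_BOOTLOADER_MAGIC) {
        /* Multiboot2 requires the info structure to be 8-byte aligned;
         * the non-zero test is just an extra sanity check. */
        return (mbi_addr != 0 && (mbi_addr & 7u) == 0) ? BOOT_MULTIBOOT2
                                                        : BOOT_UNKNOWN;
    }
    if (eax == MULTIBOOT1_BOOTLOADER_MAGIC)
        return mbi_addr != 0 ? BOOT_MULTIBOOT1 : BOOT_UNKNOWN;
    /* Anything else: the tables cannot be trusted, so it is safer to
     * halt with a clear message than to parse them and fault. */
    return BOOT_UNKNOWN;
}
```

A kernel that expects Multiboot2 but gets `BOOT_UNKNOWN` or `BOOT_MULTIBOOT1` would know immediately that the loader, not its own code, is the likely culprit.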
The root of the problem is that Nautilus is a multiboot-compliant kernel, and Coreboot does not support the version of multiboot that Nautilus uses (which is quite strange, since adding that support isn’t much work). I have started modifying the Coreboot code to perform the setup required by the multiboot 2 specification.
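Most of that setup consists of building a Multiboot2 boot information structure in memory: an 8-byte fixed header (`total_size` and a reserved field) followed by a sequence of tags, each starting on an 8-byte boundary and terminated by an end tag (type 0, size 8). The sketch below builds only a basic memory information tag (type 4); a real implementation would also have to honor the tags the kernel requests in its Multiboot2 header and provide things like the memory map (type 6). `build_mb2_info` and the struct names are hypothetical, not Coreboot code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Tag type numbers from the Multiboot2 specification. */
#define MB2_TAG_END       0u
#define MB2_TAG_BASIC_MEM 4u

struct mb2_tag_basic_mem {
    uint32_t type;      /* MB2_TAG_BASIC_MEM */
    uint32_t size;      /* 16 */
    uint32_t mem_lower; /* KiB of lower memory (below 1 MiB) */
    uint32_t mem_upper; /* KiB of upper memory (starting at 1 MiB) */
};

struct mb2_tag_end {
    uint32_t type;      /* MB2_TAG_END */
    uint32_t size;      /* 8 */
};

/* Writes a minimal Multiboot2 info structure into buf: the fixed
 * header, one basic memory tag and the mandatory end tag. Returns
 * total_size, or 0 if buf is too small. The loader would then jump
 * to the kernel with EAX = 0x36D76289 and EBX = address of buf. */
uint32_t build_mb2_info(void *buf, size_t cap,
                        uint32_t mem_lower_kb, uint32_t mem_upper_kb)
{
    unsigned char *p = buf;
    uint32_t off = 8; /* skip total_size + reserved */
    struct mb2_tag_basic_mem mem = {
        MB2_TAG_BASIC_MEM, (uint32_t)sizeof(struct mb2_tag_basic_mem),
        mem_lower_kb, mem_upper_kb };
    struct mb2_tag_end end = {
        MB2_TAG_END, (uint32_t)sizeof(struct mb2_tag_end) };
    uint32_t total = (uint32_t)(off + sizeof(mem) + sizeof(end));
    uint32_t reserved = 0;

    if (cap < total)
        return 0;
    memcpy(p + off, &mem, sizeof(mem));
    off += (uint32_t)((sizeof(mem) + 7) & ~(size_t)7); /* 8-byte align */
    memcpy(p + off, &end, sizeof(end));
    memcpy(p, &total, sizeof(total));
    memcpy(p + 4, &reserved, sizeof(reserved));
    return total;
}
```

Keeping each tag as a fixed-layout struct and copying it with `memcpy` makes the byte layout easy to compare against a memory dump when something goes wrong.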
My experience with research so far
So far, I’m going through the pain that comes with any systems research project: debugging. I can’t use tools like GDB, so I have to rely on reading memory dumps and register values. Still, I’m learning a lot from this project, since I get to look into many different parts of the system, such as virtual machines, the OS, and bootloaders.
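Reading memory dumps is much less painful with a consistent format. Here is a small sketch of a formatter for one 16-byte row in the familiar "address: hex bytes |ascii|" layout; it only depends on `sprintf` and `isprint`, so something like it can be ported to a kernel's own printing routines. The name `hexdump_row` and the buffer size constant are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Enough for a 16-digit address plus one full row and the NUL. */
#define HEXDUMP_ROW_MAX 96

/* Formats up to 16 bytes as "addr: hex... |ascii|" into out.
 * base is the address to show for bytes[0], e.g. the physical
 * address the bytes were read from. Returns the string length. */
int hexdump_row(char out[HEXDUMP_ROW_MAX], uint64_t base,
                const unsigned char *bytes, size_t n)
{
    int w;
    size_t i;

    if (n > 16)
        n = 16;
    w = sprintf(out, "%08llx: ", (unsigned long long)base);
    for (i = 0; i < 16; i++)               /* pad short rows */
        w += (i < n) ? sprintf(out + w, "%02x ", bytes[i])
                     : sprintf(out + w, "   ");
    out[w++] = '|';
    for (i = 0; i < n; i++)                /* non-printables as '.' */
        out[w++] = isprint(bytes[i]) ? (char)bytes[i] : '.';
    out[w++] = '|';
    out[w] = '\0';
    return w;
}
```

Padding short rows keeps the ASCII column aligned, which makes strings such as a kernel banner easy to spot in a long dump.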
In the near future, I will post about my work on porting Palacios to Nautilus.