Monday, May 15, 2006

Microkernel fracas drones on

I had the pleasure of meeting Andrew Tanenbaum at SOSP 2005. He's a smart man, and anybody who distributes their OS as a VMware VM gets props on that account alone. He's also a refreshingly unreconstructed microkernelist. To hear Andy tell it, the advantages of microkernels in 2006 are a lot like their supposed advantages in 1994: stability, minimality, easy-to-reason-about code, user-level drivers that automagically survive nuclear armageddon, etc. For the most part, I'm willing to stand on the sidelines of the microkernel/megakernel holy war. However, since none of the actual principals in this debate will pay any attention to anything I say, I can attempt the following drive-by cheap shots:

  • Dr. Tanenbaum is fond of mentioning "self-healing" as a goal of Minix 3. I understand this to mean that, since microkernels enable user-level drivers, failures in hardware and software can be routed around with the same sorts of logging, checkpoint-restore facilities, etc., used to provide high availability in other user-level applications.

    This has always struck me as a bit too far-fetched to be a short-term goal of a research project. For example, if your graphics card driver just went down in flames, there's no way in hell a reinstantiated graphics card driver will be able to recover your system seamlessly, because the hardware is in an unknown and, more importantly, unknowable state. When I asked one of Andy's grad students about Minix 3's plans for tackling this problem, I got the cryptic admission that "that's a hard problem." "That's a hard problem" can very occasionally mean, "We have a solution that's entirely too clever to share before we cash some checks from VCs," but it usually means, "We have no freaking clue on earth." Regardless, if you do solve this "incoherent hardware after driver crash" problem, your solution is likely to be just as applicable to a monolithic kernel as to Minix 3, since the problem isn't inherently related to software structure.

    If you don't solve this problem, I don't see in what sense your system is self-healing. While a box might continue to "run" with a crashed display driver (or filesystem, storage driver, networking driver, etc.), the system is "running" in name only; i.e., it's unlikely to be usable for non-trivial work until it's power-cycled. If we're counting that as "self-healing," then all the Linux crashes where I'm still able to ping the machine should count, too.

  • Xen has been carefully, and somewhat silently, transmogrifying itself into a microkernel. At some level, this is inherent to the paravirtualization project. Once you liberate the interface between the hypervisor and a "guest" from hardware constraints, you might as well call the "hypervisor" a microkernel and the "guest" a process and be done with it. After all, what is UNIX but a very high-level virtual machine definition, consisting of a set of system calls?

    Xen 3.0 is drinking another serving of the microkernel kool-aid by moving all of its drivers into user-level processes (oops, I mean "guests"). I predict that they'll hit the same bumps in the road as the microkernel folks did, and have just as much trouble reaping the supposed benefits.

1 Comment:

Blogger Mark Williamson said...

Actually, device drivers have been running in domains since Xen 2.0. 3.0 moves some more hardware initialisation into domain 0, but basically the data path is the same as before.

It does increase the CPU load a bit in some circumstances (high load network traffic is particularly nasty to virtualise in this way).

Doing the device driver recovery stuff properly is harder. Whilst 2.0 was in development, we got device driver restartability to work for block and network devices. The contentious part is likely to be how to decide driver restart policy - nobody's really tackled this issue. In the case of a proper driver crash it's fairly straightforward, although it's important that the restart mechanism doesn't increase the latency. In the case of a deadlock / infinite loop, detecting the failure is a bit tougher, and you wouldn't want false positives!

9:31 AM  
