Saturday, April 05, 2008

Herlihy & Shavit: The Art of Multiprocessor Programming

(Disclaimer: I've taken courses from Maurice Herlihy at Brown.)

So! Concurrency's the hot thing. Intel says so. The technical bookshelves are groaning with books about concurrent programming. Most of these books are occult thread package tutorials; they walk you through using Java's monitors, or pthreads' conditions and mutexes, or Windows' Events or what have you, often with the same shopworn collection of toy problems: philosophers producing elves who dine on ... eh.

The Art of Multiprocessor Programming is different. It is, in part, a theory book. Don't panic! The use of formal reasoning is tasteful; a teaspoon or so of actual math is necessary when designing concurrent programs, because our intuitions can fool us. However, the authors don't overwhelm the reader; an undergraduate level background in CS theory is not necessary to appreciate the book. If you can read the word "tuple" without giggling, you'll be fine.

More importantly, though, the actual data structures and algorithms that are highlighted in this book are not the same-old. Sure, mutual exclusion is discussed, but it's with a focus on implementing mutual exclusion, which is a topic dear to me. Most strikingly, the majority of the examples in the book are of lock-free data structures. You'll find lock-free linked lists, hash tables, trees, etc.

In my day job, I treat lock-free concurrent data structures as, well, not quite black magic, but certainly Powerful Medicine. They take a lot of thought, bugs can hide in them for years, reasoning about their performance is non-trivial, even compared to locking, and frankly, not every programmer is up to maintaining such code; but when you need them, you need them. If the multi-core popular press is to be believed, more and more of us will be needing them in the near future. In as much as this is the first off-the-shelf collection of useful, debugged lock-free data structures, it's an invaluable contribution. Go buy it. Even if you never use a counting network, it will make you a better programmer. It's also the most writerly book about programming you'll read all year; Herlihy's deadpan humor is a welcome break from the super-serious tone programming books usual adopt.

However. ...

Reading between the lines a bit, I see this book as part of a larger project, which I refer to as "pop concurrency*". "PC" is a loose confederation of researchers in programming languages, compilers, architecture and operating systems whose implicit goal is to make concurrent programming, well, "easy." Much, much easier, anyway. Of course, sequential programming isn't all that easy. Before the current fashion in concurrency, the "software engineering" community was haranguing us about the expense, difficulty, and intractability of programming. We've made only incremental progress in making sequential programming easier; garbage collectors and safe languages are probably the most important contributions so far, ad they are not order-of-magnitude revolutions over traditional tools.

It's cheap and easy to make negative predictions about ambitious research agendas. E.g., making fun of strong AI people stopped being fun for me about 10 years ago. But at the risk of seeming to make a cheap shot, I am really, really skeptical about how easy we can make concurrent programming. Even if the thicket of hardware and software problems around transactional memory** resolve, I suspect TM only seems easy because it's the devil we don't know. After all, locks sound easy too: just identify the shared data and wrap mutexes around it! Sounds simple enough, doesn't it? It took years, and lots of big systems being constructed this way before problems like priority inversion were framed, and many years thereafter before they were solved. Paraphrasing Fred Brooks, there's no silver bullet; concurrency is hard not because we're Western linear Cartesian dualist squares, or because of bad notation, but because it's actually hard. Like all software, the easy parts are quickly discovered and factored, and the code that's left to be written represents real problems to solve.

* Warning: I made this term up. Do not use it when trying to sound intelligent.
** E.g.: how do you debug a TM program? When you're stepping through a transaction, won't it basically always fail? System calls during transactions? I/O? What if you've DMA'ed into or out of a piece of physical memory that's being transacted-upon? Smarter people than me are thinking about these problems, but man. They are non-trivial.

Saturday, March 15, 2008

dtrace.conf(08)

DTrace's (first?) unconference in San Francisco yesterday was a blast. I attempted a VProbes demo with Thursday morning's Fusion bits from our development branch, which I would rate a qualified success. The VMware Fusion UI crashed right after plugging into the projector, but the VProbe stuff worked. In case it was unclear to anybody in attendance, yes, there really was a Windows VM struggling on in the background; sorry you couldn't see its screen. I'm told video will be up shortly, so stay tuned. In the meantime, I let slip that we're attempting to ship VProbes in Workstation 6.5; all of you ESX users who asked me about VProbes on ESX, I have no official news for you, but yes, we feel your pain.

I enjoyed meeting like-minded folks, putting faces to names, and getting a glimpse of the DTrace community's roadmap. Having slept on it a bit, both of Bryan's demos stand out.

As Bryan admits, distributed DTrace as implemented thus far is nothing that you couldn't do with ssh and a lot of elbow grease. But I'm excited about the implications of this project. A wire format for generalized aggregation could provide a lingua franca between DTrace and other sources of aggregation data (*cough* VProbes). You can actually imagine "aggregation" in this VProbes/DTrace sense overlapping with "aggregation" in the RSS sense: administrators "publish" streams of worthwhile data from individual systems, and trees of aggregators pool together broader and broader summaries of these aggregations by talking the same protocol upstream.

Which brings me to dtrace.conf's well-planned climax, Bryan's graphical DTrace demo.

VProbes' textual UI feels like a throwback to me; when showing it to a skeptic, you can almost read the thought balloon floating over his head: "OK, you just typed something weird into an xterm, and it spit something out. It's 2008. I wonder why he expects me to care. Actually, I wonder what the stock market's doing. Oh, wait, this VProbe guy is still talking." But what can I do? I'm a programmer, it's a programming language; how else do I demonstrate its power and more importantly, its flexibility? But people (quite correctly) expect data to be presented visually, because our visual systems are the most reliable, highest bandwidth way of getting quantitative data our brains.

Most graphical tools, though, have only one trick: plotting some scalar data against time. It's amazing how little we've really improved past xload in this regard. Bryan's demo was the first graphical tool I've seen that captures some of the combinatorial and iterativepower of a dynamic instrumentation system: finding an anomaly, then breaking it down by foo, then by bar, backtracking a bit because foo isn't as important as you initially thought, then keeping only the purple bars, etc. Sure, there will always have to be something like a programming language underneath for full generality, but a tool like this has the potential to capture much of the benefit with much less investment of intellectual energy. We professional programmers tend to, err, overestimate others' passions for learning new languages.

It took courage and generosity of spirit on the part of my hosts to provide a platform for what could be viewed as a competing technology at their conference. So, thanks guys. I would (and did) argue that VProbes and DTrace are complements. Users need visibility into the virtualization stack, and we're going to try to give them as much insight into that layer as possible with VProbes, but we're happy to cede higher levels to OS tools, including DTrace. Consider for example TCP. In the grand scheme of a real application, TCP is a very low layer, but from the virtual hardware's point of view, TCP is a modestly high-level protocol to reconstruct from what's sitting on the wire and in guest memory. DTrace will always provide superior visibility into what is really an OS level of abstraction. In my mind, the interesting question is how to weave together these different instrumentation engines for different levels in the stack. Hopefully we'll have something to show for our efforts at dtrace.conf(09).

Wednesday, January 23, 2008

MacOS X/DTrace tryst ends in tears.

See Adam's break-up note. This also got picked up on slashdot.

My thoughts, in no particular order:

  1. Shame on Apple. What is the point of building system-wide infrastructure that any random process can opt out of? Forget about DRM; what about legitimate uses of DTrace for system health monitoring? Won't all malware be sure to set the "don't dtrace me" bit now?
  2. This highlights the value of having system-wide instrumentation at the hardware level. Systems like VProbes do not allow a process, or indeed even a kernel, to "opt out." If the VM's user wants something traced, he gets it traced, by golly.
  3. On the other hand: if code really wants to evade VProbes, DTrace, debuggers, etc., the arms race is heavily stacked in favor of the sneaky code. E.g., suppose you've got some super-s3kr3t, double-plus DRM code that you absolutely don't want instrumented with DTrace's pid provider; on x86's, you could easily checksum the text to find smashed-in int3's. More elaborate evasion/detection schemes are possible too, similar in spirit to the VMM detection techniques summarized in our HOTOS paper about VMM detectors from last year. We concluded that creating a completely invisible VMM is an ill-posed problem, and trying too hard is a waste of time. I believe a similar dynamic is at work with dynamic tracing systems.

Thursday, October 25, 2007

Ouch, Theo!

Slashdot, by way of kerneltrap, picked up a hilarious flamewar over on the OpenBSD mailing list. The short version is that some guy thinks virtualization involves code, and code has bugs, and therefore virtualization is only ever bad. Or something. I'll leave the histrionics to those who think there's a substantial argument lurking in there somewhere, but I just couldn't help myself when I saw the following appeal to authority:

While x86 hardware has the same page-protection hardware that an IBM 390 architecture machine has, modern PC machines are a mess. They are architecturally so dirty, that parts of the video, keyboard, and other IO devices are interfaced with even to do simple things like context switching processes and handling interrupts. Those of us who have experience with the gory bits of the x86 architecture can clearly say that we know what would be involved in virtualizing it, and if it was so simple, we would not still be fixing bugs in the exact same area in our operating system going on 12 years.


I'd like to call "BS" on the above claim of inscrutability. No, you do not dork around with, e.g., the video card, to switch contexts. You may indeed mess about with an I/O device in "handling interrupts," but only because, gee, there's an interrupt to handle. In my seven years of debugging every goofball OS written for the PC on our VMM, I haven't encountered a single task switch that wasn't basically some variation of disabling interrupts, taking a couple spin locks, switching stacks, and changing the page table pointer. You know, kind of like it is on every architecture on the planet.

Wednesday, September 05, 2007

Denier's Dilemma

Shankar Vendatam at the Washington Post writes:

The federal Centers for Disease Control and Prevention recently issued a flier to combat myths about the flu vaccine. It recited various commonly held views and labeled them either "true" or "false." Among those identified as false were statements such as "The side effects are worse than the flu" and "Only older people need flu vaccine."

When University of Michigan social psychologist Norbert Schwarz had volunteers read the CDC flier, however, he found that within 30 minutes, older people misremembered 28 percent of the false statements as true. Three days later, they remembered 40 percent of the myths as factual.

...

(By way of the often excellent, and often nutty, Overcoming Bias.) I'm reminded of the difficulty I've had explaining that VMs and security don't really have much to do with one another. Is it possible that, in attempting to explain something really convoluted to a crowd that barely cares, we sometimes do more harm than good? If you find yourself in a situation where you're in possession of a counter-intuitive result, that's really hard to explain, but important none the less, what's the ethical thing to do, given that saying "X" will register in a lot of folks' minds as "not X"?

Saturday, September 01, 2007

Presenting VProbes

VMworld 2007 is coming up, and I'm more excited than usual this time around. Alex Mirgorodskiy, Robert Benson and I will be demonstrating a piece of technology called VProbes.

VProbes was born from an experience that many of you have probably shared: I was wondering why my computer was so slow. I have whiled away many hours over the years investigating this question, and I have rarely gotten a solid answer. Sometimes the slowness is intermittent, and by the time I've developed a theory, it has already disappeared. Other times, the diagnostic tool I want to run (top, or strace, or ltrace, or what have you) isn't installed in the VM, because it's a purpose-built VM with a constrained set of user-land tools. Or, the tools are there, but the system is not in a state to allow me to use them; e.g., it's hung early in boot, and I can't log in to the system. Still other times, I never really scratch the surface of the problem because it's Windows and I don't know my way around.

VProbes attempts to provide a set of tools for answering the question, "What the heck is this computer doing?" It's an open-ended question, so vprobes is accordingly open-ended, as well. In its current form, it provides an interactive, safe way of instrumenting a running VM at any level: from user-level processes down to the kernel, and even into VMware's VMM and hypervisor, if need be.

We also attempt to solve the "unfamiliar tools" problem, by papering over (to the degree possible) the differences among operating systems. There is a broad consensus among Windows and the UNIX flavors about the abstractions that operating systems provide: all major operating systems provide threads, a tree of named processes, named files in a hierarchical namespace with byte contents, sockets, etc. VProbes tries to get by on enough guest-specific knowledge to "wrap" those differences. Thus, for Windows, Linux, and any other operating system that VProbes can be "taught" to understand, the following vprobe script provides a rough and ready approximation of the top UNIX utility:

Guest_TimerIRQ ticks[curprocname()] <- 1;


"Big deal," you might be thinking. "I had top already." Well, you didn't have top on Windows. And you didn't have it while the machine was booting, or shutting down. And you certainly didn't have it if the machine in question was a virtual appliance that doesn't even allow logins.

Anyway, if this sounds interesting to you, please consider attending our VMworld breakout session on the 13th, from 3:30 to 4:30, numbered TA70. I should warn that this is basically a research prototype: we haven't committed to putting VProbes in any upcoming VMware product, and indeed, might never ship it at all. We're putting what we've got in front of the public now because we are at the point where we need feedback from real users to improve VProbes.

In the "credit where it's due" department: we owe an enormous debt in our thinking about this problem to our colleagues at Sun. I've never hidden my admiration for DTrace. We hope to improve on it. First, we are aiming to provide a Dtrace-like tool for other commercially important operating systems than Solaris. Second, VProbes can combine with other virtualization-based techniques in powerful ways. For example, VProbes and deterministic replay combine to make the most potent tool that I'm aware of for debugging intermittent performance anomalies. I'll leave applications to VMotion, snapshots, etc., as tantalizing hints...

Wednesday, August 01, 2007

Hmm. That's not what I think I wrote...

I'm perplexed by this peculiar interpretation of our HotOS paper:

Security has been touted as one of the benign by-products of virtualisation – but according to a recent study, that’s no longer the case.

Huh? "Security" is a big concept, covering many possible applications. Our paper is just a tiny footnote in a complicated discusssion about the role of VMMs in providing security; in fact, to the extent we challenge conventional wisdom at all, we suggest that the purported security threat posed by VMM-based rootkits is non-existent. To get out the sock puppets: VMMs are always detectable, which is a good thing.

I think I see where the author got a bit tripped up reading our paper, though. Currently, there is a trend for malware to disable itself in the presence of VMMs. We argue that this trend cannot continue, not because VMMs are becoming undetectable, but because VMMs are becoming too ubiquitous for malware authors to ignore. Note that this current fad among malware authors of refusing to do dirty business in a VM is not an inherent security benefit of VMs!

A nice thought experiment when thinking about security applications of virtual machines is to replace "VM" with "laptop." Suppose, for the sake of argument, security researchers started building honeynets out of laptops, because it was more economical to do so. For a while, malware authors might decide to detect that they're running on a laptop, and refuse to do so, in order to thwart the security researchers. (Note, of course, that it's completely trivial for user-level software to figure out what sort of hardware platform it's running on; this information is intentionally, usefully exposed by, e.g., /proc nodes in Linux, or the \Device directory on NT, etc.). The answer to this situation is not to attempt to cloak the laptop-iness of the hardware platform, though; attempting to do so is a bottomless pit of wasted effort. Instead, we simply observe that there are a lot of laptops out there, and figure that, in the long run, malware that refuses to run on laptops will be giving up too many targets to represent a Nash equilibrium for the black hats.

Honestly, though, I'm not that happy with the paper. We're telling a complicated story, and I'm not sure we tell it clearly enough. Even if Manek Dubash's misinterpretation were intentional*, the fact that it the paper can even be plausibly distorted to read the way he suggests means that, at some basic level, we've failed. Good technical writing is hard.

* Purely speculating, here. I'm not casting aspersions on anyone's intentions; an honest misunderstanding is, sadly, possible.