Saturday, March 15, 2008

dtrace.conf(08)

DTrace's (first?) unconference in San Francisco yesterday was a blast. I attempted a VProbes demo with Thursday morning's Fusion bits from our development branch, which I would rate a qualified success. The VMware Fusion UI crashed right after plugging into the projector, but the VProbe stuff worked. In case it was unclear to anybody in attendance, yes, there really was a Windows VM struggling on in the background; sorry you couldn't see its screen. I'm told video will be up shortly, so stay tuned. In the meantime, I let slip that we're attempting to ship VProbes in Workstation 6.5; all of you ESX users who asked me about VProbes on ESX, I have no official news for you, but yes, we feel your pain.

I enjoyed meeting like-minded folks, putting faces to names, and getting a glimpse of the DTrace community's roadmap. Having slept on it a bit, both of Bryan's demos stand out.

As Bryan admits, distributed DTrace as implemented thus far is nothing that you couldn't do with ssh and a lot of elbow grease. But I'm excited about the implications of this project. A wire format for generalized aggregation could provide a lingua franca between DTrace and other sources of aggregation data (*cough* VProbes). You can actually imagine "aggregation" in this VProbes/DTrace sense overlapping with "aggregation" in the RSS sense: administrators "publish" streams of worthwhile data from individual systems, and trees of aggregators pool together broader and broader summaries of these aggregations by talking the same protocol upstream.
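To make the "trees of aggregators" idea a bit more concrete, here is a minimal sketch in Python. It is not DTrace's or VProbes' actual wire format; the JSON encoding and the `publish` and `aggregate` names are assumptions made purely for illustration. The point is only that if every node speaks the same format, an aggregator's output can itself be published upstream.

```python
import json
from collections import Counter


def publish(counts):
    """Serialize one node's per-key counts (hypothetical JSON wire format)."""
    return json.dumps(counts, sort_keys=True)


def aggregate(messages):
    """Merge several published streams into one broader summary."""
    merged = Counter()
    for message in messages:
        # Counter.update adds counts for keys that are already present.
        merged.update(json.loads(message))
    return dict(merged)


# Three individual systems publish their own aggregations.
host_a = publish({"read": 3, "write": 1})
host_b = publish({"read": 2, "open": 5})
host_c = publish({"write": 4})

# A mid-level aggregator pools two hosts and republishes in the same format,
# so the next level up cannot tell a host from an aggregator.
rack = publish(aggregate([host_a, host_b]))
site = aggregate([rack, host_c])
```

Because summing counts is associative, pooling through the intermediate `rack` aggregator gives the same totals as pooling all three hosts directly.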

Which brings me to dtrace.conf's well-planned climax, Bryan's graphical DTrace demo.

VProbes' textual UI feels like a throwback to me; when showing it to a skeptic, you can almost read the thought balloon floating over his head: "OK, you just typed something weird into an xterm, and it spit something out. It's 2008. I wonder why he expects me to care. Actually, I wonder what the stock market's doing. Oh, wait, this VProbe guy is still talking." But what can I do? I'm a programmer, it's a programming language; how else do I demonstrate its power and, more importantly, its flexibility? But people (quite correctly) expect data to be presented visually, because our visual systems are the most reliable, highest-bandwidth way of getting quantitative data into our brains.

Most graphical tools, though, have only one trick: plotting some scalar data against time. It's amazing how little we've really improved past xload in this regard. Bryan's demo was the first graphical tool I've seen that captures some of the combinatorial and iterative power of a dynamic instrumentation system: finding an anomaly, then breaking it down by foo, then by bar, backtracking a bit because foo isn't as important as you initially thought, then keeping only the purple bars, etc. Sure, there will always have to be something like a programming language underneath for full generality, but a tool like this has the potential to capture much of the benefit with much less investment of intellectual energy. We professional programmers tend to, err, overestimate others' passions for learning new languages.

It took courage and generosity of spirit on the part of my hosts to provide a platform for what could be viewed as a competing technology at their conference. So, thanks guys. I would (and did) argue that VProbes and DTrace are complements. Users need visibility into the virtualization stack, and we're going to try to give them as much insight into that layer as possible with VProbes, but we're happy to cede higher levels to OS tools, including DTrace. Consider for example TCP. In the grand scheme of a real application, TCP is a very low layer, but from the virtual hardware's point of view, TCP is a modestly high-level protocol to reconstruct from what's sitting on the wire and in guest memory. DTrace will always provide superior visibility into what is really an OS level of abstraction. In my mind, the interesting question is how to weave together these different instrumentation engines for different levels in the stack. Hopefully we'll have something to show for our efforts at dtrace.conf(09).

2 Comments:

Blogger fche said...

Question regarding a little vprobes detail. For your aggregations, how do you track the parameter called "latest"? The others can be done without immediate synchronization across cpus, but can this one?

4:33 PM  
Blogger Keith Adams said...

Excellent point. "Latest" is a vcpu- (or thread-) local hint; so "latest" is only meaningful at all for the thread that you call logaggr from. In fact, it might not even be meaningful there, because some other vcpu might have snuck in and dumped its aggregation buffer between you dumping yours and calling logaggr.

So, it doesn't necessarily mean the "latest" value in any concrete sense at all. So, what is this actually good for, you might ask? Well, "median" isn't a very aggregation-friendly statistic to keep around, but you can get some of its utility by having a concrete, un-toyed-with example of a sample value; e.g., if you've got a bimodal distribution, min, max, and avg can all be somewhat deceptive.

You could get some of this yourself, without building on the aggregations, by, e.g., remembering every thousandth (or whatever) sample in a local variable. In our internal use, this pattern seemed common enough to support it explicitly.
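A rough sketch of the idea, in Python rather than the VProbes language: each vcpu keeps its own buffer and merges it into a shared aggregate when it dumps. The `Aggr` class and its field names are made up for illustration, and this is not how logaggr is actually implemented.

```python
class Aggr:
    """Toy aggregate: count, total, min, max, plus a 'latest' sample hint."""

    def __init__(self):
        self.count = 0
        self.total = 0
        self.min = None
        self.max = None
        self.latest = None

    def record(self, value):
        self.count += 1
        self.total += value
        self.min = value if self.min is None else min(self.min, value)
        self.max = value if self.max is None else max(self.max, value)
        self.latest = value  # only "latest" for this one buffer

    def merge(self, other):
        """Fold another buffer in; count/total/min/max are order-independent."""
        if other.count == 0:
            return
        self.count += other.count
        self.total += other.total
        self.min = other.min if self.min is None else min(self.min, other.min)
        self.max = other.max if self.max is None else max(self.max, other.max)
        self.latest = other.latest  # whoever dumps last wins


def dump(buffers):
    """Merge per-vcpu buffers into one aggregate, in the order given."""
    merged = Aggr()
    for buf in buffers:
        merged.merge(buf)
    return merged


# A bimodal workload: vcpu0 sees small values, vcpu1 sees large ones.
vcpu0, vcpu1 = Aggr(), Aggr()
for v in (1, 2, 1):
    vcpu0.record(v)
for v in (100, 99):
    vcpu1.record(v)

a = dump([vcpu0, vcpu1])  # vcpu1 dumped last
b = dump([vcpu1, vcpu0])  # vcpu0 dumped last
```

Both merge orders agree on count, min, max, and avg, but `a.latest` and `b.latest` differ. The average lands far from anything actually observed, while either `latest` is at least a real sample from one of the two modes.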

10:37 AM  
