Tuesday, June 12, 2012

VisPerf -- Visualize The Output of Visual Studio's Stand Alone Profiler

At work we've recently been making pretty extensive use of the Visual Studio Stand Alone Profiler. We use the professional edition of Visual Studio, so we don't have any of the nice integrated profiling features with pretty GUIs built into more expensive editions of Visual Studio. Instead, with the free command line tools, we can get a set of csv performance reports, including most importantly --

  • A "Function Report" with each function listed with time listed including the functions it calls (inclusive time) and time excluding the functions it calls (exclusive time). 
  • A "Caller-Callee" report with, for each function F in the program (1) a root entry for F (2) A list of callers of F that function with inclusive/exclusive time when that caller calls F (3) A list of callees -- functions that F calls and inclusive/exclusive time when these functions are called by F. 
  • A "Call Tree" report. Similar to the caller-callee report, but represented as a tree starting from the top-level function, a list of all the functions called the next layer down, etc.
The report we've found most useful is the caller-callee report. With it, you can get a nice set of statistics on program performance. You can answer most of your basic questions. Which functions are taking the most time? Which called functions dominate those functions performance? When are those functions hurting performance -- when called by a single specific heavy-hitting caller? Or when called by a variety of different callers?

Unfortunately, as you can imagine, the csv files are very hard to use. Imagine, you want to know why function G is dominating function F's performance. You need to (1) save off/bookmark the root record for function F then (2) find the root record for function G and examine it. Once in function G, you need to repeat the process... say for function H. While keeping these breadcrumbs for the whole F->G->H investigation you've lost any ability to step back and look at the big picture. Maybe F->G->H isn't worth pursuing, but a lot can be gained by optimizing, say, the F->I call stack.

Wouldn't it be nice if you could easily navigate the performance output keeping the big picture in view? If we could only, in one glance, see F, all the functions F calls, and maybe even some levels deeper into the functions those functions call.

Enter VisPerf:


VisPerf is a visualizer for the caller-callee csv reports. The main view shows the performance of one "parent" function, showing rectangles for each function called by the parent.. If you've ever used windirstat, the following description will be second nature to you. Each function's rectangle is sized proportional to how much it contributes to parent's total execution time. So in the above view, we can see that PigLatinize dominates the performance of PigLatin simply by seeing how much area PigLatinize takes in PigLatin's square. Numbers prepended to the function name give more information, with the number in the <> angle brackets giving that function's percentage of the immediate parent and the () parenthesis giving that function's percentage of the overall program (not just when called here, but when called everywhere in the program).

The view doesn't just stop at one level, we can view in one glance at more than one level of depth. In the above view, we see how PigLatinize calls PigLatin which in turn calls PigLatinizeBeginsWithConsonant. We can see in one glance not just how PigLatinize dominates the performance of PigLatin, but that PigLatinizeBeginsWithConsonant is whats causing PigLatinize to be so slow and ultimately causing PigLatin to be slow. We can see right away obvious places for optimization and ignore the harder optimization problems. The depth can be controlled by the spinbox with the number in the upper right.

The current, top-level function can be changed by either selecting from the drop-down or by clicking the one of the rectangles. Each rectangle has a nice mouse-over tool-tip summary with some detailed info which you can copy to the clipboard using the context menu.. Also in the context menu, you can browse to other callers of the selected function to see if this function might matter in performance scenarios elsewhere in your program. The context menu allows you to browse there instantly.

To use VisPerf, download the latest binaries from here. Right click on a caller-callee csv file you've generated from the profiler and select "Open With". Browse to the VisPerf executable. From that point on, VisPerf will be in your context menu for csv files under "Open With" and you'll be able to open caller-callee reports with ease.

This tool has been a game changer for us. My colleagues and I have been using VisPerf to analyze the performance of our projects. We've been able to solve and prevent many performance problems in our everyday development. With an easy-to-use tool to see what's going on, bad performance decisions have be avoided early on, before it becomes an emergency. Its also helping us learn what constructs to use and what to avoid when performance matters (I'm looking at you iostreams and boost mutex). The whole team has become a lot more performance aware and confident because of our new ability to be strongly backed up by powerful, demonstrable data that can be unequivocally seen by everyone.

I'm releasing this because I feel there's a sliver of a chance that it might be helpful to someone else. You too can get the awesome that we've acquired! Please let me know if you have any feedback or find any bugs, I'd love to hear from you!

Odds-and-ends

The source code is available under the Boost license at github for anyone to do whatever with. The code is written in Python using PySide to bind to QT. There's also a python module in the source code "callercallee" that you can use to parse and navigate caller-callee reports programatically.

Its been a very fun project, giving me a lot of fun python code to write and challenging problems to solve. Its also been fun to figure out how to map a 1-dimensional value (percentage of time) to a 2-dimensional value (area) and have it work every time. This "windirstat" method seems to be a novel way to represent performance data and get an understanding of performance data at a quick glance. This continues to be an experiment along these lines. Most profiler GUIs, like Jprofiler for example, seem to use a tree-view to display this kind of data. So I'd love to hear any feedback you might have should you use this tool. Hopefully I'll have some time to blog about some of the technical details and interesting challenges that came up with this tool.

PS for a traditional tree-view of the caller-callee report, check out this CodeProject article.