Ohio Caverns, West Liberty

One of the problems I have in my field is that it is sometimes difficult to produce clear, unassailable facts. For example, a problem can occur once a year in a production environment and be difficult to replicate or account for in a development setting. We are generally expressing a set of truths that exist in a particular space and time (that is, OS, memory constraints, CPU load, network load, etc.). Let me quote from The Last Crusade to help make my point:

Indiana Jones: Archaeology is the search for fact, not truth. If it's truth you're interested in, Dr. Tyree's Philosophy class is right down the hall. So forget any ideas you've got about lost cities, exotic travel, and digging up the world. We do not follow maps to buried treasure, and "X" never, ever marks the spot.

Just a few minutes after this memorable statement, Indy is doing what action heroes do: finding buried and unrecoverable information. For a moment our protagonist is standing directly on the spot he needs but does not realize it. It is not that he is missing information or looking in the wrong place; it is his perspective that keeps him in the dark. Everyone had missed that “X” had indeed marked the spot. One of the things lost on me when I first watched Indiana Jones, as his quote suggests, was that this particular action hero was really a scholar. Throughout the movies these two distinct parts of Jones' character were both complementary and in conflict.

There is on occasion a precarious relationship between theory and practice, and this makes space for what looks like magical intuition. In reality, our engineering discipline is honed by experience and perspective.

Detecting a Leak

One of the first tasks when designing a large platform or service is to ensure that you can measure load. For the type of systems I design, this usually takes the form of an authenticated session. Once you understand what an authenticated session is, you need to ensure you can monitor the session count. There are many ways to do this, but the most popular is to share data via Performance Counters.

Suppose we set up performance monitoring for the following:

  • Authenticated sessions: A custom performance counter that you create to measure active sessions.
  • Private bytes: The total bytes allocated to the process, that is, the sum of Managed and Unmanaged bytes.
  • GC Heap: The bytes associated with the Managed heap.

Just to state the obvious, the difference between the Private bytes and the GC Heap is the amount of Unmanaged bytes.
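As a minimal sketch of that subtraction, assuming the three counters have already been sampled into plain numbers: the `Sample` type and its field names below are hypothetical, made up for illustration, and are not part of any performance counter API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One reading of the three counters, taken at the same moment."""
    sessions: int        # custom "Authenticated sessions" counter
    private_bytes: int   # Private bytes of the process
    gc_heap_bytes: int   # bytes in the Managed (GC) heap

    @property
    def unmanaged_bytes(self) -> int:
        # Whatever is not on the Managed heap is treated as Unmanaged.
        return self.private_bytes - self.gc_heap_bytes


# Made-up numbers, purely for illustration.
sample = Sample(sessions=120, private_bytes=900_000_000, gc_heap_bytes=650_000_000)
print(sample.unmanaged_bytes)  # 250000000
```

Keeping the derived value as a property means it can never drift out of step with the two counters it is computed from.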

Depending on what we see, we can come to one or more of the following conclusions.

  1. In a healthy environment we expect the number of Authenticated sessions to be roughly proportional to Private bytes and GC Heap. That is, as the Authenticated session count goes up, so will the memory allocations necessary to deal with the increase. Conversely, when the Authenticated sessions go down, we expect some recovery in Private bytes and GC Heap (depending on when the Garbage Collector runs).
  2. If over the course of time Authenticated sessions increase and decrease, but Private bytes continues to trend upward (whatever the GC Heap is doing), we can say, in a very general sense, that we have a leak in managed or unmanaged memory (or both).
  3. If over the course of time Authenticated sessions increase and decrease, but the GC Heap continues to trend upward and does not recover, we can say, in a very general sense, that we have a leak in managed memory.
  4. If over the course of time Authenticated sessions increase and decrease, but Private bytes continues to trend upward while the GC Heap does indeed recover, we can say, in a very general sense, that we have a leak in unmanaged memory.
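The reasoning in this list can be sketched as a rough classifier. This is only an illustration under stated assumptions: the counters are assumed to be sampled at regular intervals into plain lists, "trending upward" is approximated by comparing the average of the later half of the window with the earlier half, and the 10% tolerance, the function names and the labels are all hypothetical choices rather than anything prescribed by the counters themselves.

```python
from statistics import mean
from typing import Sequence


def trending_up(values: Sequence[float], tolerance: float = 0.10) -> bool:
    """Crude trend test: is the later half of the window noticeably
    higher, on average, than the earlier half?"""
    if len(values) < 2:
        raise ValueError("need at least two samples")
    half = len(values) // 2
    early = mean(values[:half])
    late = mean(values[-half:])
    if early <= 0:
        return late > early
    return (late - early) / early > tolerance


def diagnose(sessions: Sequence[float],
             private_bytes: Sequence[float],
             gc_heap_bytes: Sequence[float]) -> str:
    """Map the three series onto the cases in the list above."""
    if trending_up(sessions):
        # Memory is expected to grow while load grows (case 1).
        return "inconclusive: load is still rising"
    if trending_up(gc_heap_bytes):
        return "managed leak suspected"      # case 3
    if trending_up(private_bytes):
        return "unmanaged leak suspected"    # case 4
    return "healthy"                         # case 1


# Made-up samples (arbitrary units): sessions rise and fall repeatedly.
sessions = [10, 50, 10, 50, 10, 50, 10, 50]
healthy_private = [100, 300, 110, 310, 100, 305, 105, 300]
healthy_heap = [60, 200, 65, 205, 60, 200, 62, 198]
leaky_private = [100, 300, 150, 350, 200, 400, 250, 450]

print(diagnose(sessions, healthy_private, healthy_heap))  # healthy
print(diagnose(sessions, leaky_private, healthy_heap))    # unmanaged leak suspected
```

A real graph needs a human eye as well. This sketch only captures the shape of the argument: hold load roughly constant over the window, then look at which memory counter fails to come back down.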

Get to know performance counters: they add almost imperceptible overhead but offer clues that you would otherwise miss. See enough of these performance graphs and it eventually becomes easier to detect the facts buried in the truth. In this way, X (and Y) always mark the spot.