Monday 10 June 2013

How not to manually manage Garbage Collection

Sometimes code tells you something about the person that wrote it.  I've had the pleasure of working with a codebase that contains the following code*:
while (MainApplicationWork())
{
 GC.Collect();
 GC.WaitForPendingFinalizers();
 GC.Collect();
 GC.WaitForPendingFinalizers();
}

NullMyObjectGraph();
GC.Collect();
GC.WaitForPendingFinalizers();
GC.Collect();
GC.WaitForPendingFinalizers(); 

Which I think bears some scrutiny - whenever you see a call to GC.Collect your .Net spidey sense should be going bonkers.

Code like this is dotted throughout the suite of applications this snippet comes from, specifically this:

GC.Collect();
GC.WaitForPendingFinalizers();
GC.Collect();
GC.WaitForPendingFinalizers();

which implies that this is standard practice for the dev that wrote it, rather than the response to a problem.  It's conceivable that this person has done a lot of performance tests and could demonstrate that all the calls to GC.Collect are justified.  Or more formally, it's a logical possibility, like the sky being green or everyone from Birmingham having six fingers on their right hands.

Unfortunately because the code is proprietary I can't share the full horror and give the performance metrics to demonstrate how truly awful this is - for instance, the application that contains the top snippet calls GC.Collect() at least four times for every iteration of the while loop.  However what I can do is examine this code, look at why you might do something like this and try to show that the code tells you that the person who wrote it doesn't really understand it.

There are many sources that tell devs to not call GC.Collect() [1, 2], however there are some circumstances when it might be the right thing to do.  According to Microsoft you should "Use the Collect method when there is a significant reduction in the amount of memory being used at a defined point in your application's code."[3]  Rico Mariani also offers this piece of advice for when you might call GC.Collect, "if some non-recurring event has just happened and this event is highly likely to have caused a lot of old objects to die."[4]

At first glance this code doesn't seem to fit that last piece of advice, since it sits in a while loop, however the method that is used for the loop termination criterion (MainApplicationWork()) effectively returns false when there are no more records for the application to process.  Those records do not arrive in a regular fashion, so it is arguable that this is an event that occurs repeatedly, but is irregular (perhaps akin to the canonical example of a user closing a windows form - it might happen multiple times, but it's not a predictable event and so the GC algorithm self-tuning will not work).

When the call to GC.Collect() is made it "Forces an immediate garbage collection of all generations."[5]  Part of this process involves moving pointers from the finalization queue to the freachable queue for all objects with Finalizer methods that have been found to be garbage (i.e. are unreachable).  Interestingly this resurrects the objects you are trying to get rid of because there is now a reachable reference to them (on the freachable queue, which is considered a root) and they will only become unreachable again once their finalizers have been run by the Finalization thread (which runs automatically whenever there are objects in the freachable queue).

Because of this it is recommended[6] to call GC.WaitForPendingFinalizers after GC.Collect as this forces the current thread to wait until all the finalizers in the objects you are trying to clear up have been executed and thus avoids the objects getting promoted to Gen1.  Given that we've just resurrected some objects, it is recommended to call GC.Collect again to finally release the memory of all the objects that have just been finalized. So you should end up with:
GC.Collect();
GC.WaitForPendingFinalizers();
GC.Collect();

But in this codebase there is a further call to GC.WaitForPendingFinalizers which does, ummmm, precisely nothing since the only compaction/garbage collection that takes place as part of the second call to GC.Collect() is of the objects on the heap that have just been finalized.  Nothing else has changed in the application in the meantime since both GC.Collect() and GC.WaitForPendingFinalizers() block the current thread.

Does it do any harm?  Probably not (although that requires some actual performance metrics to be confident on), but what it does shout is that this snippet is a piece of cargo-cult programming, repeated through a codebase, no doubt with the best of intentions, but almost certainly for no good reason and very likely damaging application performance.


* more or less - the application method names have been changed to protect the innocent
1. http://blogs.msdn.com/b/ricom/archive/2003/12/02/40780.aspx - Rico Mariani "Two things to avoid for better memory usage"
2. http://msdn.microsoft.com/en-us/library/ff647790.aspx - MS Patterns & Practices "Improving Managed Code Performance"
3. http://msdn.microsoft.com/en-us/library/bb384155.aspx - MSDN "Induced Collections"
4. http://blogs.msdn.com/b/ricom/archive/2004/11/29/271829.aspx - Rico Mariani "When to call GC.Collect()"
5. http://msdn.microsoft.com/en-us/library/xe0c2357.aspx - MSDN Library GC.Collect definition
6. http://msdn.microsoft.com/en-us/library/ff647790.aspx - MS Patterns & Practices "Improving Managed Code Performance"

No comments:

Post a Comment