Dispose and IDisposable in .net classes
Developers writing .net applications have a lot of work done for them when it comes to memory management. In some other languages memory has to be explicitly allocated and deallocated, which can be both laborious and error prone. Errors can in fact lead to memory leaks, which are a very bad thing. By a memory leak I mean that a process allocates some memory for something but then doesn't free it afterwards, so that memory cannot be reused for anything else while the process is running. If the part of the program containing the memory leak is run again and again then the amount of unusable memory creeps up and up and can eventually crash the process or starve the rest of the system.
So within .net there is something called the Garbage Collector which does a lot of this for us. At a very basic level it runs at undetermined times, i.e. every now and then, typically when an allocation needs more room or the system is getting low on memory. It looks for all objects which have memory allocated but which no longer have any references to them anywhere. That basically means it is impossible for the object to be used from any managed code, and so the memory it is using can safely be deallocated.
This all works very well for intrinsic .net objects. However there are many times when our classes have to deal with objects which are not intrinsic to .net. These are things like database connections, files, instances of office applications, COM objects and so on.
For these types of things we have to deal with memory management ourselves. Furthermore we must consider not just memory: we must also make sure, for example, that files are closed so that locks are released and other programs can access them. The key point is that whatever is involved in cleaning up unmanaged objects, we have to do it ourselves.
Consider a C# program which opens up an Excel spreadsheet and reads some data from it. We must make sure that we clean up this resource when we are done with it. If this all happened in one method such as public void ProcessSpreadsheet(string spreadsheetPath) then this method would clean up the spreadsheet at the end of the method, i.e. close the file, release the memory etc. This has to be done by hand as the garbage collector only knows how to clean up .net objects.
Ok so this is all fine but sometimes we have a class which has a reference to an unmanaged resource which doesn't just exist for the lifetime of a method execution. Some resources exist for the lifetime of an object. For example we may be working with an Excel spreadsheet and we write a class which encapsulates a spreadsheet. We pass in a spreadsheet path in the class constructor and this creates an instance of a spreadsheet and reads in the file. We then do stuff with that for as long as the object exists, and when we are done with the object we have to make sure we close the spreadsheet, release memory etc.
Such a class might look something like this:
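As a minimal sketch of such a class (not the original listing), assuming a block of unmanaged memory from Marshal.AllocHGlobal stands in for the real Excel instance so that the example runs without Office installed. The IsOpen and Visible members are hypothetical and are there purely for illustration:

```csharp
using System;
using System.Runtime.InteropServices;

// Sketch only: unmanaged memory stands in for the Excel instance.
public class Spreadsheet
{
    private readonly string _path;
    private IntPtr _excelHandle; // hypothetical stand-in for Excel.exe

    public Spreadsheet(string path, bool visible)
    {
        _path = path;
        Visible = visible;
        // Stands in for "start Excel and open the file at path".
        _excelHandle = Marshal.AllocHGlobal(64);
    }

    public bool Visible { get; private set; }

    // True while the unmanaged stand-in has not been released.
    public bool IsOpen
    {
        get { return _excelHandle != IntPtr.Zero; }
    }

    // Simulates doing some work with the spreadsheet.
    public string DoSomeWork()
    {
        return "Working with " + _path;
    }

    // Note: nothing in this class ever releases _excelHandle.
}
```

Nothing here frees the unmanaged stand-in, which mirrors the situation where Excel.exe keeps running after the object is no longer used.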
So let's get a little Winforms app going which will use this class.
Create a form which looks like this:
If you want to cheat then feel free to use my designer generated code which is as follows:
So we now need to write some code so that when the browse button is clicked we can use an OpenFileDialog object to set the spreadsheet path. We then need to write code which will run when the Open button is clicked. This will simply create a new instance of our spreadsheet class and call the DoSomeWork method which simulates doing some work.
The rest of the form code should then look like this:
Now hopefully the problem here should be obvious. After we call the DoSomeWork method our sheet object will go out of scope and there will be no further references to it. So it will be eligible for garbage collection and will hopefully be garbage collected at some point. That point could be in 1 second, 1 hour, 1 day or even never at all if there are some system problems.
Let's be optimistic and say that the object is garbage collected in 1 second. All of the memory used up by .net for this object will then be freed up. However the garbage collector doesn't know about and isn't responsible for cleaning up Excel files, so we will be left with an instance of Excel.exe running but not belonging to anything.
Did you notice that our Spreadsheet class constructor has a boolean parameter? This determines whether the spreadsheet will be visible or invisible. It is quite common to create instances of Excel spreadsheets from .net applications and have them invisible, perhaps because we just want to read data from them for an import process and we don't want them cluttering up the screen, which would look unprofessional.
Try running the above code and opening any old spreadsheet. First try running it with the boolean parameter set to true and you will see that a visible instance of Excel is spawned. Then after you close the WinForms application you can just manually close the spreadsheet and everything is fine.
If you change the code though and then try this again with the boolean parameter set to false then this time you will not see a visible instance of Excel. However it will still definitely be running in memory and will be using up resources, and these resources will not be released as we haven't written any code to do so yet.
We can run this code and see that the Excel instance is running by using Task Manager which should reveal the Excel.exe instance as shown below:
You will now have to manually get rid of this Excel.exe process by right clicking on it in Task Manager and then choosing kill process.
Ok so the problem then is if we have a non intrinsic resource used by a class and that resource exists for the lifetime of the class then how do we clean up those resources? As mentioned earlier if this resource only exists for the execution of 1 method then that is easy - we just create the resource at the start of our method, do whatever work we need to do, and then clean up the resource.
The problem with a class level resource is that we don't really have an event like Class_NotUsedAnyMore or anything like that do we? Well actually we do: there is a way we can write code which will get executed just prior to the object being garbage collected. So whilst we could use this to trigger clean up code, we don't know when or even if this will ever run, and consequently our Excel.exe could be sat there using resources for long and unnecessary amounts of time.
So what is the answer to this problem?
Well we could write a method which does our clean up code and call that something like CleanResources for example. Ok so let's write this method and add it to our Spreadsheet class as follows:
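A sketch of the class with such a method added, assuming the same hypothetical stand-in as before (unmanaged memory from Marshal.AllocHGlobal in place of a real Excel instance; the visible parameter is ignored by the stand-in):

```csharp
using System;
using System.Runtime.InteropServices;

// Sketch only: unmanaged memory stands in for the Excel instance.
public class Spreadsheet
{
    private readonly string _path;
    private IntPtr _excelHandle; // hypothetical stand-in for Excel.exe

    public Spreadsheet(string path, bool visible)
    {
        _path = path;
        _excelHandle = Marshal.AllocHGlobal(64);
    }

    // True while the unmanaged stand-in has not been released.
    public bool IsOpen
    {
        get { return _excelHandle != IntPtr.Zero; }
    }

    public string DoSomeWork()
    {
        return "Working with " + _path;
    }

    // Releases the unmanaged stand-in; client code must remember to call this.
    public void CleanResources()
    {
        Marshal.FreeHGlobal(_excelHandle);
        _excelHandle = IntPtr.Zero;
    }
}
```

Client code would then call sheet.DoSomeWork() followed by sheet.CleanResources() once it has finished with the object.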
Now the more observant of you may have noticed the indiscriminate brutality of this code. This is deliberate as Excel can be one nasty object when it comes to killing it off. My code is overly simple as I don't want to get side tracked looking at the problems you may encounter when trying to clean up running instances of Excel.exe. As long as my code does clean up, that is all that is needed for what we are looking at here. This article is about how and when we trigger clean up code, not how to write the actual clean up code.
However if you are interested in reading about the adventures of Excel.exe then the following article is pretty good:
Ok so now we have this clean up method, we need to make sure our client code calls it so we will need to change our button click event handler code as follows:
Ok so that's great but what if the person writing the client code forgot to call this method, or suppose the Spreadsheet class is not a class we have written but is an intrinsic .net class or some third party class or component? That would be very problematic. The problem is not just that coders might forget to call this method but they might not even know of its existence. What we need is some kind of standard, right? Yes, and the standard is realised by using an interface: IDisposable to the rescue. Any class which has unmanaged resources which need cleaning up when the class is finished with should implement the IDisposable interface.
If you right click on IDisposable in Visual Studio and choose Go To Definition then you will see the following:
So we can see that the comments confirm our understanding of what this interface should be used for and that the only method implementors need to provide is a Dispose method.
So in essence the convention is as follows:
Ok that's all great so let's build that into our Spreadsheet class. All we need to do is implement IDisposable and change the name of our CleanResources method to Dispose. The new and improved Spreadsheet class should look like this:
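A sketch of that change, again assuming unmanaged memory as a hypothetical stand-in for the Excel instance. The only differences from a class with a hand-named clean up method are the interface on the class declaration and the method name:

```csharp
using System;
using System.Runtime.InteropServices;

// Sketch only: unmanaged memory stands in for the Excel instance.
public class Spreadsheet : IDisposable
{
    private readonly string _path;
    private IntPtr _excelHandle; // hypothetical stand-in for Excel.exe

    public Spreadsheet(string path, bool visible)
    {
        _path = path;
        _excelHandle = Marshal.AllocHGlobal(64);
    }

    // True while the unmanaged stand-in has not been released.
    public bool IsOpen
    {
        get { return _excelHandle != IntPtr.Zero; }
    }

    public string DoSomeWork()
    {
        return "Working with " + _path;
    }

    // The single method required by IDisposable.
    public void Dispose()
    {
        Marshal.FreeHGlobal(_excelHandle);
        _excelHandle = IntPtr.Zero;
    }
}
```

Because the class now implements the standard interface, any client code (or tool) that knows about IDisposable knows how to release it.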
Now all we need to do is change our client code to make sure we call our Dispose method. So our new client code will look like this:
Another way of calling Dispose and a way which you have probably seen before is to use a using statement. This would look like this:
So all you need to do is enclose the line which creates your object in parentheses after the using keyword and then follow this with curly braces. Once the code in the curly braces has finished executing, even if it exits because of an exception, Dispose will automatically be called on the object.
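To make the behaviour of using concrete, here is a small sketch with a hypothetical TrackedResource class whose only job is to record that Dispose was called. The second method shows roughly the try/finally that the using statement stands for:

```csharp
using System;

// Hypothetical class for illustration: it only records that Dispose ran.
public class TrackedResource : IDisposable
{
    public bool Disposed { get; private set; }

    public void Dispose()
    {
        Disposed = true;
    }
}

public static class UsingDemo
{
    // Returns the resource so the caller can inspect it after the using block.
    public static TrackedResource WithUsing()
    {
        TrackedResource seen;
        using (var resource = new TrackedResource())
        {
            seen = resource; // do work with the resource here
        } // Dispose() is called here, even if the body throws
        return seen;
    }

    // Roughly what the using statement above expands to.
    public static TrackedResource WithTryFinally()
    {
        var resource = new TrackedResource();
        try
        {
            // do work with the resource here
        }
        finally
        {
            if (resource != null) resource.Dispose();
        }
        return resource;
    }
}
```

Both methods hand back an object whose Disposed property is true, which is the point: the using form is just the shorter and harder to forget way of writing the try/finally.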
Ok so this is all looking a lot better. If you come across any class which has a Dispose method then you should always call it when you are done with the object, and the using statement is a nice way to do that so you won't forget. Common objects which implement IDisposable include the following:
And the list goes on, the common theme being that they all work with unmanaged resources.
Ok so now we still have a problem, and that is that this is just a convention. There is no way to force client code to call Dispose on our objects, so how can we tackle the case where it doesn't?
Well this is where things start to get a little bit more difficult to follow but let's give it a go.
Classes in C# can have a constructor in which code is placed which needs executing when a new instance of the class is created. This is well known and understood. Constructors have no return type and have the same name as the class. In fact look back at our earlier code and you will see that our Spreadsheet class already has a constructor into which we pass the path of the spreadsheet we wish to open up.
C# classes can also have a method called a destructor. You write a destructor by creating a method which has the same name as the class, prefixed with a tilde (~). So if we create a destructor for our Spreadsheet class it will end up looking like this:
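A destructor along those lines might be sketched as follows, assuming unmanaged memory as a hypothetical stand-in for the Excel instance. The destructor has no access modifier, no parameters and no return type:

```csharp
using System;
using System.Runtime.InteropServices;

// Sketch only: unmanaged memory stands in for the Excel instance.
public class Spreadsheet : IDisposable
{
    private IntPtr _excelHandle; // hypothetical stand-in for Excel.exe

    public Spreadsheet(string path, bool visible)
    {
        _excelHandle = Marshal.AllocHGlobal(64);
    }

    // True while the unmanaged stand-in has not been released.
    public bool IsOpen
    {
        get { return _excelHandle != IntPtr.Zero; }
    }

    public void Dispose()
    {
        Marshal.FreeHGlobal(_excelHandle);
        _excelHandle = IntPtr.Zero;
    }

    // Destructor: run by the garbage collector, if and when it collects
    // this object. Here it acts only as a safety net for forgotten Dispose calls.
    ~Spreadsheet()
    {
        Dispose();
    }
}
```

In this stand-in the handle is zeroed after it is freed, so a second run of Dispose happens to be harmless. Real clean up code is rarely that forgiving.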
Note that in the destructor all I have put is a call to the Dispose method. So when does this method get executed? Well when and if the garbage collector gets around to garbage collecting our object then just prior to doing so it will execute any code in the object's destructor, if it has one. So all we do is place in there a call to Dispose so that if client code forgets to call Dispose then at least it will get called when/if the object is garbage collected. Not an ideal solution but it's just a safety mechanism really.
So now we have a new problem. What happens if our client code does call Dispose on our object? In that case then when the object is garbage collected then it will get called again. This could cause unexpected behaviour and is at the very least bad as it will cause the code to run twice which is unnecessary.
So how can we get round this? Well the standard way is to have a private bool field in the class to keep track of whether we have run Dispose or not.
This would be declared like this:
Now this can be set to true when we run Dispose so we can make sure we don't run the code twice. Our new Dispose method will then look like this:
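A sketch of Dispose guarded by such a flag, assuming the same hypothetical unmanaged memory stand-in. The CleanupRuns counter is not part of the pattern; it is only there so the effect of the guard can be observed:

```csharp
using System;
using System.Runtime.InteropServices;

// Sketch only: unmanaged memory stands in for the Excel instance.
public class Spreadsheet : IDisposable
{
    private IntPtr _excelHandle;    // hypothetical stand-in for Excel.exe
    private bool _disposed = false; // has Dispose already run?

    public Spreadsheet(string path, bool visible)
    {
        _excelHandle = Marshal.AllocHGlobal(64);
    }

    // Illustration only: how many times the clean up code actually ran.
    public int CleanupRuns { get; private set; }

    public void Dispose()
    {
        if (_disposed)
        {
            return; // clean up has already happened, do nothing
        }

        Marshal.FreeHGlobal(_excelHandle);
        _excelHandle = IntPtr.Zero;
        CleanupRuns++;

        _disposed = true;
    }

    ~Spreadsheet()
    {
        Dispose();
    }
}
```

Calling Dispose any number of times now runs the clean up code exactly once.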
Now that we have a boolean private field which gives us an easy way to see if our object has been disposed, we should really use this and check when any methods are run - in our case the DoSomeWork method. This needs to throw an exception if this method is called after Dispose has been called.
This should be done as follows:
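A sketch of that check, assuming the same hypothetical stand-in. ObjectDisposedException is the framework's standard exception for this situation and takes the name of the disposed object:

```csharp
using System;
using System.Runtime.InteropServices;

// Sketch only: unmanaged memory stands in for the Excel instance.
public class Spreadsheet : IDisposable
{
    private readonly string _path;
    private IntPtr _excelHandle; // hypothetical stand-in for Excel.exe
    private bool _disposed;

    public Spreadsheet(string path, bool visible)
    {
        _path = path;
        _excelHandle = Marshal.AllocHGlobal(64);
    }

    public string DoSomeWork()
    {
        // Refuse to work once the underlying resource has been released.
        if (_disposed)
        {
            throw new ObjectDisposedException(GetType().Name);
        }
        return "Working with " + _path;
    }

    public void Dispose()
    {
        if (_disposed) return;
        Marshal.FreeHGlobal(_excelHandle);
        _excelHandle = IntPtr.Zero;
        _disposed = true;
    }

    ~Spreadsheet()
    {
        Dispose();
    }
}
```

The same guard would go at the top of every public method that touches the resource.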
Ok so hopefully this all makes sense and isn't too complicated. Unfortunately though, things now need to get a little bit more complicated. For reasons that will soon become apparent we need to call Dispose in a way which allows us to know whether it was called by the client calling Dispose or by the destructor calling it. For this reason we shall introduce an overload of the Dispose method which will have the following signature:
This new overload will then be called by the Dispose method which will call Dispose(true) but when the destructor runs this will call Dispose(false) and so we will know how the method was invoked.
So to recap, client code will still call the parameterless Dispose but that will then call Dispose(true) like this:
Meanwhile if client code neglects to call Dispose, then the destructor will directly call the overloaded Dispose with Dispose(false) as follows:
Ok so now we will be moving the clean up code into the overloaded Dispose and this will now look like this:
So the new bit of code we have in there is run only if the overloaded Dispose has been called from the client via the parameterless Dispose method. What this code does is clean up any managed resources by just setting them to null. We would also use this section to call Dispose on any other objects we had references to which themselves implemented IDisposable. For example in this class we might be using the spreadsheet to read in some data but then writing that data somewhere else, perhaps to a stream or to a disk file via a stream. In such a case we might have a private class field of type FileStream which exists for the lifetime of the Spreadsheet class. In that case when we call Dispose via the client, then we should in turn call Dispose on these types of objects. The code would then look something like this:
So this new code now only cleans up managed resources (setting to null, calling Dispose) if the Dispose overload was called with the disposing parameter set to true. That explains why this parameter is called disposing: because the client called Dispose. Not a great reason for choice of parameter name but that's how the convention has evolved. Anyway, if we were to clean up managed resources when the garbage collector kicked in and called Dispose(false) then this would be pointless, as timely disposal of resources is not an issue anymore. In addition, the order in which objects are finalized is not guaranteed, so we could end up touching managed objects which have already been finalized. That could cause exceptions, so it should be avoided.
Now there is a problem with this new overloaded Dispose method. We have declared it public, but it should never be called directly by client code: it is always called either via the parameterless Dispose method or via the destructor, both of which are inside this class. So we need to make it protected rather than public, and in fact we also need to make it virtual so that any sub classes can override it if necessary. Of course we wouldn't need to make it virtual if for any reason we were going to make this a sealed class.
So assuming this isn't going to be sealed then this overload should in fact now look like this:
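A sketch of the class at this stage with the overload declared protected virtual. As assumptions for illustration, unmanaged memory stands in for the Excel instance and a MemoryStream stands in for the FileStream mentioned earlier, so no file is needed. The IsOpen and HasOutput properties are hypothetical and only make the result observable:

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

// Sketch only: unmanaged memory stands in for Excel, MemoryStream for a FileStream.
public class Spreadsheet : IDisposable
{
    private IntPtr _excelHandle;                 // unmanaged stand-in
    private Stream _output = new MemoryStream(); // managed, itself IDisposable
    private bool _disposed;

    public Spreadsheet(string path, bool visible)
    {
        _excelHandle = Marshal.AllocHGlobal(64);
    }

    public bool IsOpen { get { return _excelHandle != IntPtr.Zero; } }
    public bool HasOutput { get { return _output != null; } }

    // Called by client code.
    public void Dispose()
    {
        Dispose(true);
    }

    // Called by the garbage collector.
    ~Spreadsheet()
    {
        Dispose(false);
    }

    protected virtual void Dispose(bool disposing)
    {
        if (_disposed) return;

        if (disposing)
        {
            // Only safe when called from client code: dispose managed objects.
            if (_output != null)
            {
                _output.Dispose();
                _output = null;
            }
        }

        // Always release the unmanaged resource.
        Marshal.FreeHGlobal(_excelHandle);
        _excelHandle = IntPtr.Zero;
        _disposed = true;
    }
}
```

A sub class with resources of its own would override Dispose(bool), release its own resources, and then call base.Dispose(disposing).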
Ok so now we need to make a slight modification to our initial parameterless Dispose method. We need to add a line to it so that it looks like this:
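A sketch of the modified parameterless Dispose, assuming the same hypothetical unmanaged memory stand-in. The static FinalizerRuns counter is not part of the pattern; it is only there so that it is possible to see whether the destructor ever ran:

```csharp
using System;
using System.Runtime.InteropServices;
using System.Threading;

// Sketch only: unmanaged memory stands in for the Excel instance.
public class Spreadsheet : IDisposable
{
    // Illustration only: counts how many destructors have run.
    public static int FinalizerRuns;

    private IntPtr _excelHandle; // hypothetical stand-in for Excel.exe
    private bool _disposed;

    public Spreadsheet(string path, bool visible)
    {
        _excelHandle = Marshal.AllocHGlobal(64);
    }

    public void Dispose()
    {
        Dispose(true);
        // The new line: clean up is done, so the destructor need not run.
        GC.SuppressFinalize(this);
    }

    ~Spreadsheet()
    {
        Interlocked.Increment(ref FinalizerRuns);
        Dispose(false);
    }

    protected virtual void Dispose(bool disposing)
    {
        if (_disposed) return;
        Marshal.FreeHGlobal(_excelHandle);
        _excelHandle = IntPtr.Zero;
        _disposed = true;
    }
}
```

For an object that has been disposed this way, FinalizerRuns stays at zero however many collections happen afterwards.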
Ok so what is this new line doing? Well what actually happens is that when a new object is created, if its class has a destructor then the runtime registers the object in a finalization queue. When the garbage collector later finds such an object unreferenced it cannot reclaim it straight away. Instead the object is moved to a list of objects ready for finalization, and a dedicated finalizer thread calls the destructor on each of them and removes them from that list. These objects will then have their memory reclaimed the next time the garbage collector runs. So it seems that this functionality exists purely to give us a way to hook into that process via our destructor. Once we have called Dispose via the client there is no need to have the destructor run later on, and so we remove this object from the finalization queue. This is what GC.SuppressFinalize(this) does. It also gives a performance benefit, because the object can be reclaimed in a single collection rather than surviving until the next one.
I think we are about there now but one point that I really don't know the answer to is whether there is one garbage collector running somewhere, or whether there is one instance of it per .net process. Either way I am led to believe that despite garbage collection supposedly being non deterministic, all objects for a process will be collected when that process terminates. My personal theory is that each process has its own garbage collector running on a separate thread with a low priority but then that priority is raised to high when the garbage collector kicks in. If anyone can shed any light on this then please feel free to post a comment via the form below.
So in summary let's look at the complete code for our Spreadsheet class:
And to wrap things up, let's have a look at the two process flows via a couple of flowcharts.
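Pulling the pieces together, a complete class might be sketched as below. This is not the original listing: as assumptions for illustration, unmanaged memory from Marshal.AllocHGlobal stands in for the Excel instance, a MemoryStream stands in for a FileStream, and the IsOpen property is hypothetical.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

// Sketch only: unmanaged memory stands in for Excel, MemoryStream for a FileStream.
public class Spreadsheet : IDisposable
{
    private readonly string _path;
    private IntPtr _excelHandle;                 // unmanaged stand-in
    private Stream _output = new MemoryStream(); // managed, itself IDisposable
    private bool _disposed;

    public Spreadsheet(string path, bool visible)
    {
        _path = path;
        _excelHandle = Marshal.AllocHGlobal(64);
    }

    public bool IsOpen { get { return _excelHandle != IntPtr.Zero; } }

    public string DoSomeWork()
    {
        if (_disposed) throw new ObjectDisposedException(GetType().Name);
        return "Working with " + _path;
    }

    // Route 1: client code calls Dispose, directly or via a using statement.
    public void Dispose()
    {
        Dispose(true);
        GC.SuppressFinalize(this);
    }

    // Route 2: the garbage collector runs the destructor as a safety net.
    ~Spreadsheet()
    {
        Dispose(false);
    }

    protected virtual void Dispose(bool disposing)
    {
        if (_disposed) return;

        if (disposing)
        {
            // Managed clean up: only when called from client code.
            if (_output != null)
            {
                _output.Dispose();
                _output = null;
            }
        }

        // Unmanaged clean up: on both routes.
        Marshal.FreeHGlobal(_excelHandle);
        _excelHandle = IntPtr.Zero;
        _disposed = true;
    }
}
```

Client code would normally wrap an instance in a using statement so that route 1 is always taken and the destructor never has to do anything.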
Dispose via the Client
Dispose via the Destructor
So that's about it. A very difficult topic to follow. I know, I had to write the article!
These articles are good:
Also note that GC.SuppressFinalize just stops the GC from running the destructor on an object which has already been disposed. The object's memory is still garbage collected of course.
These points are also useful:
Garbage collection: the GC reclaims the memory used by an object once the object is no longer referenced.
Dispose: a method from the IDisposable interface that is to release all managed and unmanaged resources when the programmer calls it (either directly or indirectly via a using block).
Finalizer: a method to release all unmanaged resources. Called by the GC before reclaiming the memory.
Managed resource: any .NET class that implements the IDisposable interface, like Streams and DbConnections.
Unmanaged resource: the stuffing wrapped in the Managed Resource classes. Windows handles are the most trivial examples.
Any comments or corrections greatly appreciated.