Immutable classes in C# - String and Delegate
This article takes a brief look at immutable types, in particular strings and delegates. Before doing so we need some background theory on value and reference types, after which we can get to the main point of the article. Within .NET we have two kinds of types - value types and reference types.
Local variables of value types are typically stored in an area of memory called the stack. If we try to visualise the stack in a very simple form it might look like this:
The above would be how data is stored in memory for a simple program such as the one below:
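As a minimal sketch of such a program: the variable names a, b, c and d come from the text, while the types, the values and the class name StackDemo are assumptions made purely for illustration.

```csharp
using System;

public static class StackDemo
{
    public static void Main()
    {
        // Four local variables of value types. Names are from the article;
        // the types and values are assumed for illustration only.
        int a = 1;
        char b = 'x';
        bool c = true;
        long d = 4L;

        // Each variable holds its own data directly.
        Console.WriteLine($"{a} {b} {c} {d}"); // prints "1 x True 4"
    }
}
```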
So we declare the variables a, b, c and d, and each of them holds a piece of data in its own little area of memory. This is one way in which memory is used. The types of variables which store their data this way are called value types, and the memory they use is called the stack.
Note that the variables declared in my program are in the reverse order to how they are shown in the diagram. The reason for this is that memory in the stack is used just like a stack of plates. You start with one plate, and if you need another you stack it on top of that, and if you need another you stack it on top of that again. Not surprisingly, this is why it is called the stack. The last thing to be placed on the stack will be the first thing removed when that variable is no longer used in our program and the memory needs freeing up. In computing this kind of structure is often referred to as LIFO - Last In First Out. The stack is most commonly used for variables which are simple in nature such as int, char, double, DateTime etc.
If we have more complex variables such as say a Customer object then this would not be stored this way but visually might look something like this:
So we have a second area of memory coming into play here called the heap. The variable itself is stored on the stack, but what it holds is a memory address which points to an area of the heap where the real, complex data is stored. Access to the heap is random, meaning that you don't have to read through it bit by bit until you get to the bit you want - you can jump straight there once you know what the address is.
A program to represent this in action could look something like this:
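A minimal sketch of such a program follows. The Customer class is mentioned in the text, but its Name and Balance properties, the values assigned to them and the class name HeapDemo are assumptions made for illustration.

```csharp
using System;

// A simple reference type. The property names are assumed for illustration.
public class Customer
{
    public string Name { get; set; } = "";
    public decimal Balance { get; set; }
}

public static class HeapDemo
{
    public static void Main()
    {
        // "new Customer()" allocates the object on the heap; the variable
        // fred holds a reference to that object.
        Customer fred = new Customer();
        fred.Name = "Fred";
        fred.Balance = 100m;

        Console.WriteLine(fred.Name + " has a balance of " + fred.Balance);
        // prints "Fred has a balance of 100"
    }
}
```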
These types of variables are called reference types and use the Heap as their main storage. They are called reference types since their entry in the stack contains a reference to the area in the heap where the real data is stored.
It is important to understand what happens when we use value types and reference types in certain ways.
Consider the following small program:
When we run this we get the following output:
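A minimal sketch of such a program, with the output it produces shown as comments. The values 2 and 3 come from the explanation in the text; the class name ValueCopyDemo is an assumption made for illustration.

```csharp
using System;

public static class ValueCopyDemo
{
    public static void Main()
    {
        int a = 2;
        int b = a;   // b receives its own copy of the value 2

        Console.WriteLine($"a = {a}, b = {b}"); // a = 2, b = 2

        b = 3;       // only b's copy changes

        Console.WriteLine($"a = {a}, b = {b}"); // a = 2, b = 3
    }
}
```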
So what's happening here is that when we declare b and say that it is equal to a, b is created as a new variable on the stack and given the value which a has, which in this case is 2. Then we change b to equal 3 and output a and b again to see that a is still equal to 2 and b is now equal to 3. This should hopefully all seem natural and expected.
When we go through a similar process with reference types we get results which might not be what you expect.
Consider the following program:
The output when we run this is not what you might first think. It looks like this:
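A minimal sketch of such a program, with its output shown as comments. The names fred and dave and the idea of changing the balance through dave come from the text; the property names, the balance values and the class name ReferenceCopyDemo are assumptions made for illustration.

```csharp
using System;

public class Customer
{
    public string Name { get; set; } = "";
    public decimal Balance { get; set; }

    public override string ToString()
    {
        return Name + ": " + Balance;
    }
}

public static class ReferenceCopyDemo
{
    public static void Main()
    {
        Customer fred = new Customer();
        fred.Name = "Fred";
        fred.Balance = 100m;
        Console.WriteLine(fred);   // Fred: 100

        // No "new" here: dave is a second reference to the same object.
        Customer dave = fred;
        dave.Balance = 500m;

        Console.WriteLine(fred);   // Fred: 500
        Console.WriteLine(dave);   // Fred: 500
    }
}
```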
OK, so what's happening?
Well first of all we create an instance of the Customer class and call it fred. We then set the properties of the fred object and then we output the fred object to the console.
Next we declare another Customer variable but call this dave, and we set it equal to fred. Now this process of setting dave equal to fred works differently. Declaring dave does create a new variable on the stack, but dave never gets its own portion of the heap allocated. To see why, remember that we typically declare value types like this:
And so on. When we do this then a variable is created on the stack.
With reference types though we typically declare the variable and create the object in one statement:
Or we can sometimes split this up, e.g.:
So when we say Customer fred then this causes a variable to be created on the stack and when we say fred = new Customer() then this is what causes memory to be allocated on the heap and that heap address to be placed into the stack area for the fred reference to point to.
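The declaration forms described above can be sketched as follows. The variable name fred is from the text; the other names, the values and the class name DeclarationDemo are assumptions made for illustration.

```csharp
using System;

public class Customer
{
    public string Name { get; set; } = "";
}

public static class DeclarationDemo
{
    public static void Main()
    {
        // Value types: declaring the variable is all that is needed.
        int a = 2;
        char initial = 'F';

        // Reference type, declared and created in one statement.
        Customer fred = new Customer();

        // The same thing split in two: first the variable on the stack...
        Customer dave;
        // ...then the heap allocation, whose address is stored in dave.
        dave = new Customer();

        Console.WriteLine($"{a} {initial}");                   // 2 F
        // Two separate "new" calls mean two separate heap objects.
        Console.WriteLine(Object.ReferenceEquals(fred, dave)); // False
    }
}
```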
Now in our code we never said dave = new Customer() - all we said was dave = fred. What this does is make our dave variable on the stack point to the same heap area as the fred variable. Diagrammatically this might look like this:
And so when we change the balance of the dave object we are in effect changing the value in the fred object since they both point to the same thing!
We can further prove this by using a static method of the System.Object class - the God of all classes. This method is called ReferenceEquals; it takes two objects as parameters and returns a boolean value which is true if the two references point to the same instance.
Since this method is declared as public static we can refer to it with either of the following two lines of code (or through any other class for that matter, since all classes ultimately derive from System.Object):
And so we can build this into our little Customer program to prove that dave and fred are the same instance. The new code will look like this:
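A minimal sketch of what this might look like. Object.ReferenceEquals is the real framework method; calling it through Customer is one possible second form, and the property names, values and class name ReferenceEqualsDemo are assumptions made for illustration.

```csharp
using System;

public class Customer
{
    public string Name { get; set; } = "";
    public decimal Balance { get; set; }
}

public static class ReferenceEqualsDemo
{
    public static void Main()
    {
        Customer fred = new Customer();
        fred.Name = "Fred";
        fred.Balance = 100m;

        Customer dave = fred;   // same object, second reference

        // Static members are inherited, so both spellings call the same method.
        Console.WriteLine(Object.ReferenceEquals(fred, dave));   // True
        Console.WriteLine(Customer.ReferenceEquals(fred, dave)); // True

        // For contrast, a separately created object is a different instance.
        Customer other = new Customer();
        Console.WriteLine(Object.ReferenceEquals(fred, other));  // False
    }
}
```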
We now need to have a quick look at how different types, that is value and reference types, are dealt with when they are passed into methods as arguments.
To help us study this, consider the following program:
When we run this program, we should get the following output:
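A minimal sketch of such a program, with its output shown as comments. The method name Double and the names x and a come from the text; the starting value 5 and the class name PassByValueDemo are assumptions made for illustration.

```csharp
using System;

public static class PassByValueDemo
{
    // a is a copy of whatever value is passed in.
    public static void Double(int a)
    {
        a = a * 2;
        Console.WriteLine("Inside Double: a = " + a);
    }

    public static void Main()
    {
        int x = 5;
        Console.WriteLine("Before: x = " + x); // Before: x = 5
        Double(x);                             // Inside Double: a = 10
        Console.WriteLine("After: x = " + x);  // After: x = 5
    }
}
```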
This should hopefully be as expected. When x gets passed into the method as the parameter called a, then a takes a copy of x and it is this which is doubled inside the method and so when we output the value of x after we have made the call to the Double method, then the original value of x is unchanged.
Now let us try a similar experiment but using our Customer class. Consider the following program:
When we run this program, we should see the following results:
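A minimal sketch of such a program, with its output shown as comments. The method name DoubleBalance and the names fred and c come from the text; the property names, the starting balance and the class name PassReferenceDemo are assumptions made for illustration.

```csharp
using System;

public class Customer
{
    public string Name { get; set; } = "";
    public decimal Balance { get; set; }

    public override string ToString()
    {
        return Name + ": " + Balance;
    }
}

public static class PassReferenceDemo
{
    // c is a copy of the reference, so it points at the caller's object.
    public static void DoubleBalance(Customer c)
    {
        c.Balance = c.Balance * 2;
        Console.WriteLine("Inside DoubleBalance: " + c);
    }

    public static void Main()
    {
        Customer fred = new Customer();
        fred.Name = "Fred";
        fred.Balance = 100m;

        Console.WriteLine("Before: " + fred); // Before: Fred: 100
        DoubleBalance(fred);                  // Inside DoubleBalance: Fred: 200
        Console.WriteLine("After: " + fred);  // After: Fred: 200
    }
}
```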
So in this program we create an instance of a Customer class and call the variable fred. We set some properties of the fred object and then output these to screen. We then call a DoubleBalance method which takes a Customer object as a parameter. Into this we pass our fred object and inside the method we double the balance of the passed in object and then output that to screen. We then output fred to screen again to see if that has been affected and we see that it has. Hopefully this should be as expected since when we passed fred into the method then c within the method was now pointing at the same heap memory which fred uses and so any changes to c are in effect changes to fred!
So this is really just a logical extension of what we established earlier about reference and value types.
There is one more point to mention concerning passing value types into methods as parameters. Sometimes, because of the design of our methods, we may wish to pass a value type into a method but have it behave as if it were a reference type - i.e. have any changes made to that variable within the method directly change the variable which was passed into the method. If you really want to do this then there is a way we can make it happen.
To do this we have to use the ref keyword as follows:
Doing this passes the variable by reference rather than by value, so a becomes another name for x and any changes to a within the method are in fact changes to x itself.
We can confirm this by examining the output which now looks like this:
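A minimal sketch of the ref version, with its output shown as comments. As before, the starting value 5 and the class name PassByRefDemo are assumptions made for illustration.

```csharp
using System;

public static class PassByRefDemo
{
    // "ref" makes a refer to the caller's variable rather than to a copy.
    public static void Double(ref int a)
    {
        a = a * 2;
        Console.WriteLine("Inside Double: a = " + a);
    }

    public static void Main()
    {
        int x = 5;
        Console.WriteLine("Before: x = " + x); // Before: x = 5
        Double(ref x);                         // Inside Double: a = 10
        Console.WriteLine("After: x = " + x);  // After: x = 10
    }
}
```

Note that ref has to appear both in the method signature and at the call site.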
We can use another keyword called out instead of ref. It works the same way, except that the variable being passed in does not need to have been initialised beforehand (with ref it does), and the method must assign a value to the parameter before it returns.
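A minimal sketch of out in use. The method name GetDoubled, its parameters and the class name OutDemo are hypothetical and chosen only for illustration.

```csharp
using System;

public static class OutDemo
{
    // Hypothetical helper: the method must assign result before returning.
    public static void GetDoubled(int input, out int result)
    {
        result = input * 2;
    }

    public static void Main()
    {
        int y;                  // declared but deliberately not initialised
        GetDoubled(5, out y);   // allowed with out, would not compile with ref
        Console.WriteLine("y = " + y); // y = 10
    }
}
```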
Ok so all of this has been a look at the background theory of value and reference types and now we are happy with that we can get to the main point of the article which is Immutable types, in particular strings and delegates.
So what is an immutable class?
Well if we lookup immutable in a dictionary we get:
im⋅mu⋅ta⋅ble
not mutable; unchangeable; changeless.
So basically an immutable class is a class whose instances cannot be changed once they have been created. Immutable classes are still reference types, in that their data is stored on the heap, but when the value appears to change, the existing memory on the heap is not altered. Instead a whole new object is created with the new values in it, and the variable on the stack is updated to point to this new area of memory on the heap.
The result of this is that when you use these in ways similar to our experiments they will behave more like value types, which can be very confusing if you don't know about it!
Two famous immutable classes are strings and delegates, so we will look at some experimental code with these to see how they behave. Strings in particular can be very confusing, particularly to new programmers, who are never sure whether they are classes (reference types) or structs (value types).
Classes are reference types and structs are value types. Things like int and DateTime are structs.
If you are unsure about structs or delegates then it may be a good idea to have a look at the following articles first:
Structs - http://www.audacs.co.uk/ViewPage.aspx?PageID=486
Delegates - http://www.audacs.co.uk/ViewPage.aspx?PageID=474 & http://www.audacs.co.uk/ViewPage.aspx?PageID=476
OK, so let's have a look at a little experiment program with strings. Consider the following:
Running this should give the following results:
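A minimal sketch of such a program, with its output shown as comments. The names a and b and the values "ABC" and "DEF" come from the text; the ToLowerInvariant call at the end and the class name StringDemo are additions made for illustration.

```csharp
using System;

public static class StringDemo
{
    public static void Main()
    {
        string a = "ABC";
        string b = a;   // b refers to the same string object as a

        Console.WriteLine($"a = {a}, b = {b}");           // a = ABC, b = ABC
        Console.WriteLine(Object.ReferenceEquals(a, b));  // True

        b = "DEF";      // b now refers to a different string object

        Console.WriteLine($"a = {a}, b = {b}");           // a = ABC, b = DEF
        Console.WriteLine(Object.ReferenceEquals(a, b));  // False

        // Methods that look as if they modify a string return a new one
        // and leave the original untouched.
        string lower = a.ToLowerInvariant();
        Console.WriteLine($"a = {a}, lower = {lower}");   // a = ABC, lower = abc
    }
}
```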
So first of all we declare a string variable called a and assign this a value of "ABC". We then create another string variable called b and assign this equal to a.
We then output the values of a and b to see that they are the same, and also use ReferenceEquals to see whether they really do point to the same area of memory on the heap. They do, so string really is a class with its main memory on the heap. You can even prove this is a class by right-clicking on the word string in Visual Studio and clicking Go To Definition.
Next we set the value of b to "DEF". With our Customer class, a change made through dave showed up in fred because we altered the one shared object on the heap. Strings give us no way of doing that: because they are immutable, anything that appears to change a string, whether assigning a new value as we do here or using something like += or ToLower, leaves the original object alone and gives us a different string. To prove this we use ReferenceEquals again and also output the values of a and b, which shows us that a and b are now not the same instance. The variable b on the stack now points to a separate area of heap memory holding the "DEF" data. Meanwhile a is still pointing at its own original heap memory and still contains "ABC".
OK, so now let's look at a similar experiment but this time with delegates, as shown below:
Output from this program should look like this:
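A minimal sketch of such a program, with its output shown as comments. The names MyDelegate, d, e, Test and AnotherTest come from the text; the helper PrintInvocationList, the message format and the class name DelegateDemo are assumptions made for illustration.

```csharp
using System;

// Returns void and takes no parameters, as described in the text.
public delegate void MyDelegate();

public static class DelegateDemo
{
    public static void Test() { Console.WriteLine("Test called"); }
    public static void AnotherTest() { Console.WriteLine("AnotherTest called"); }

    // Hypothetical helper: lists the methods a delegate will invoke.
    private static void PrintInvocationList(string name, MyDelegate del)
    {
        foreach (Delegate item in del.GetInvocationList())
        {
            Console.WriteLine(name + " -> " + item.Method.Name);
        }
    }

    public static void Main()
    {
        MyDelegate d = new MyDelegate(Test);
        MyDelegate e = d;   // same delegate object, second reference

        Console.WriteLine(Object.ReferenceEquals(d, e)); // True

        // += does not modify the existing delegate; it builds a new
        // combined delegate and assigns it to e.
        e += AnotherTest;

        Console.WriteLine(Object.ReferenceEquals(d, e)); // False

        PrintInvocationList("d", d); // d -> Test
        PrintInvocationList("e", e); // e -> Test
                                     // e -> AnotherTest
    }
}
```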
OK, so what's happening in this program? Well, we declare a delegate called MyDelegate which has a signature of returning void and taking no parameters.
We then create an instance of this delegate called d and give it a reference to a method called Test. We then declare another MyDelegate variable called e and assign this as equal to d. We then use ReferenceEquals to see if they are the same instance - i.e. point to the same area of memory on the heap - and in fact they do. This of course would be the case for all classes at this point in the program regardless of whether they are immutable. But now we add another method to the invocation list of e, so if we were to invoke e now it would run both the Test method and the AnotherTest method.
We then use ReferenceEquals again to see if d and e are still the same instances and now they are not and this is for exactly the same reason as the strings behaviour in our last example. The delegate e has been given a new area of the heap memory and had its variable on the stack point to the new heap address. Meanwhile d still points to the old memory address.
We then use some code to output the contents of the invocation list of both d and e as a further test and as you would suspect d just contains a reference to the Test method whereas e has a reference to the Test method and the AnotherTest method so we can deduce that delegates are immutable.
A few final things to mention are, firstly, a list of some of the common value types within .NET as follows:
- sbyte
- short
- int
- long
- byte
- ushort
- uint
- ulong
- float
- double
- decimal
- char
- bool
Also, what we haven't mentioned here is why we would want an immutable type.
The reasons are, I think, mainly to do with simplifying multi-threaded operations: an object which can never change can be shared between threads without any locking, and can be handed to other code without the risk of it being altered behind your back.