Recently I was talking with somebody about the actual difference between the C++ type system and the managed C# type system. In fact, the CLR type system differs from the former because any object (not a value type) carries a baggage of extra information when it is laid out in memory. This makes CLR objects considerably different from objects in traditional C++ programs.
Classification of Types
In .NET there are mainly two kinds of types.
- Value Types (derived from System.ValueType)
- Reference Types (derived directly from System.Object)
Even though value types ultimately inherit from System.Object, the CLR treats them very differently. From your own point of view, value types are often (though not always) allocated on the stack, while reference types are allocated on the heap. This reduces contention on the GC heap: fewer heap allocations, fewer GC cycles, fewer calls to the OS for additional memory, and so on. An object allocated on the managed heap is called a Managed Object, and the pointer allocated on the stack to refer to the actual object on the heap is called an Object Reference (sometimes also called a Managed Pointer).
Beyond this basic difference, a value type is treated completely differently from the CLR's point of view. The CLR handles any object derived from System.ValueType differently from any object derived directly from System.Object. The memory of a value type contains just the values of its fields, and the size of the value type is simply the sum of its contents, while for reference types the size is quite different. Let us look at the memory layout of both kinds of type.
In the case of value types, the managed pointer holds a reference to the initial location of the actual memory. In this case the managed pointer refers to 0x0000, which is the address of Field 1. The CLR therefore needs to do pointer arithmetic to find Fields 2 ... N. This is also why we can simply use the sizeof operator on value types to get the actual size of the object.
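To make the "size is just the sum of the fields" point concrete, here is a minimal sketch. The Point struct and the SizeDemo class are hypothetical names introduced only for illustration, and Marshal.SizeOf reports the size of the struct as laid out for interop, which for a simple struct of two ints matches the sum of its fields.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical value type used only for illustration.
[StructLayout(LayoutKind.Sequential)]
public struct Point
{
    public int X;   // Field 1, at the start of the value
    public int Y;   // Field 2, found by offsetting past Field 1
}

public static class SizeDemo
{
    // Size of the struct as laid out for interop: two 4-byte ints and
    // no object header, sync block, or type pointer.
    public static int PointSize()
    {
        return Marshal.SizeOf<Point>();
    }

    public static void Main()
    {
        Console.WriteLine(sizeof(int));   // 4
        Console.WriteLine(PointSize());   // 8
    }
}
```

Using sizeof directly on a user-defined struct requires an unsafe context, which is why the sketch goes through Marshal.SizeOf instead.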
Reference types, on the other hand, hold some more complex information in their header. Let's define the individual blocks that make up one object using a diagram.
The diagram depicts the entire memory layout of a reference type. For reference types, the managed pointer holds the address of the reference to the RTTI (Run Time Type Information) address. The initial 4 bytes of the object's memory (on a 32-bit runtime) are allocated for the Synchronization Block; in the CLR every object holds its own lock information in this storage. The other important point is that every CLR object holds its type information inside itself, so every object can report its own type without any outside dependency. Reference types are therefore self-describing, and programs use this information for casting, polymorphism, dynamic binding, reflection and so on. Even though the Method Table structure resides outside the actual object, the RTTI address holds the starting address of the Method Table runtime object, which holds all the information about the type of the object. We query the runtime type information by calling the GetType method on any reference type. The .NET runtime creates a special object of type Type which helps to find the actual type information.
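The self-describing nature of reference types can be observed from C# with GetType. The sketch below uses two hypothetical classes, Animal and Dog, and a hypothetical TypeInfoDemo helper; the variable is declared as Animal, yet the object still reports what it really is.

```csharp
using System;

// Hypothetical types used only for illustration.
public class Animal { }
public class Dog : Animal { }

public static class TypeInfoDemo
{
    // The parameter's static type is Animal, but GetType() reads the
    // type information that the object itself carries.
    public static string RuntimeTypeName(Animal a)
    {
        return a.GetType().Name;
    }

    public static void Main()
    {
        Animal pet = new Dog();
        Console.WriteLine(RuntimeTypeName(pet));                        // Dog
        Console.WriteLine(pet is Dog);                                  // True
        Console.WriteLine(ReferenceEquals(pet.GetType(), typeof(Dog))); // True
    }
}
```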
Value types, on the other hand, are simply a chunk of memory without any clue about what it actually contains. This is the major difference between the two kinds of type.
During instantiation, a value type gets its default initialization automatically when it is declared, which leaves its fields zeroed. Before C# 10 you could not define your own default (parameterless) constructor for a value type. A language such as C# adds a further restriction: a local value type must be definitely initialized before it is used, which saves additional constructor calls.
A reference type, on the other hand, must have an object assigned to it using the new operator. The new operator first allocates memory for its fields with default values and then calls the constructor.
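A small sketch of these initialization rules, using hypothetical Celsius and Sensor types: a value type is usable with zeroed fields without any call to new, a reference-type slot starts out as null, and new zeroes the fields before the constructor body runs.

```csharp
using System;

// Hypothetical types used only for illustration.
public struct Celsius { public double Degrees; }
public class Sensor { public int Id; public string Label; }

public static class InitDemo
{
    // A default value type has all of its fields zeroed.
    public static double DefaultDegrees()
    {
        Celsius c = default(Celsius);
        return c.Degrees;
    }

    // Array slots of a reference type hold null until an object is assigned.
    public static bool ReferenceStartsNull()
    {
        Sensor[] sensors = new Sensor[1];
        return sensors[0] == null;
    }

    // 'new' allocates the object with default field values, then runs
    // the (here empty, compiler-generated) constructor.
    public static Sensor Create()
    {
        return new Sensor();
    }

    public static void Main()
    {
        Console.WriteLine(DefaultDegrees());        // 0
        Console.WriteLine(ReferenceStartsNull());   // True
        Console.WriteLine(Create().Id);             // 0
        Console.WriteLine(Create().Label == null);  // True
    }
}
```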
Caveats of CLR Variables
- A value type variable directly represents memory on the stack.
- A reference type variable represents a pointer (or more properly a reference) to the start location of the object created on the heap. The reference points to the RTTI address location.
- CPU registers can hold managed pointers as well as values. Hence in certain cases either a value type or a reference to an object may be kept in a CPU register.
- An AppDomain-wide managed table contains all the references to managed objects that are tracked by the GC. It also holds static value types and static reference types.
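The first two caveats have a visible consequence for assignment, which the following sketch tries to show. PointValue and PointRef are hypothetical types: assigning a value type copies the memory itself, while assigning a reference type copies only the reference, so both variables end up pointing at the same heap object.

```csharp
using System;

// Hypothetical types used only for illustration.
public struct PointValue { public int X; }
public class PointRef { public int X; }

public static class CopyDemo
{
    // Assignment copies the whole value; changing the copy leaves the original alone.
    public static int ValueCopy()
    {
        PointValue a = new PointValue { X = 1 };
        PointValue b = a;
        b.X = 42;
        return a.X;
    }

    // Assignment copies the reference; both names refer to one heap object.
    public static int ReferenceCopy()
    {
        PointRef a = new PointRef { X = 1 };
        PointRef b = a;
        b.X = 42;
        return a.X;
    }

    public static void Main()
    {
        Console.WriteLine(ValueCopy());      // 1
        Console.WriteLine(ReferenceCopy());  // 42
    }
}
```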
Implicit Object Reference (this)
There is nothing new about "this". Any instance member has access to the this pointer, whether it belongs to a value type or a reference type. A method call generally passes the object on which it is invoked as "this", as the first argument of the call. Hence it is available inside any instance method. The "this" pointer for a value type points to the address of the first instance field, while the "this" pointer for a reference type points to the address of the Method Table information.
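One way to picture "this" as a hidden first argument is to write the same operation twice: once as an instance method, and once as a static method that takes the object explicitly. The Counter struct and ThisDemo class below are hypothetical, and the static version is only an approximation of what the runtime does for a value type, where "this" behaves like a by-reference parameter.

```csharp
using System;

// Hypothetical value type used only for illustration.
public struct Counter
{
    public int Count;

    // Instance method: 'this' refers to the caller's own storage.
    public void Increment()
    {
        this.Count++;
    }
}

public static class ThisDemo
{
    // Roughly the same method with the hidden first argument made explicit.
    public static void Increment(ref Counter self)
    {
        self.Count++;
    }

    public static int Run()
    {
        Counter c = new Counter();
        c.Increment();      // implicit 'this'
        Increment(ref c);   // explicit 'this'
        return c.Count;
    }

    public static void Main()
    {
        Console.WriteLine(Run());   // 2
    }
}
```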
How CLR Methods are Called?
There are two kinds of method call in the CLR. One uses the call IL instruction, which needs the current object to be loaded on the stack before any other argument passed as a parameter; the other uses callvirt, which is almost the same as call but produces an additional instruction to validate the object reference. The call instruction does not itself produce a NullReferenceException when the reference is null; it simply passes null as the "this" pointer. If the method later executes any instruction that accesses a field or calls another method on the object, that instruction produces the NullReferenceException. callvirt produces the NullReferenceException before the method is entered when nothing has been assigned to the object reference.
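The null check performed by callvirt is easy to observe from C#, because the Microsoft C# compiler generally emits callvirt for instance method calls on reference types even when the method is not virtual. In the sketch below (Greeter and NullCallDemo are hypothetical names) the method never touches "this", yet the call still fails on a null reference.

```csharp
using System;

// Hypothetical class used only for illustration.
public class Greeter
{
    // Non-virtual, and it never reads any instance state.
    public string Hello()
    {
        return "hi";
    }
}

public static class NullCallDemo
{
    // Returns true if calling Hello() on a null reference throws
    // before the method body gets a chance to run.
    public static bool ThrowsOnNull()
    {
        Greeter g = null;
        try
        {
            g.Hello();
            return false;
        }
        catch (NullReferenceException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(ThrowsOnNull());   // True
    }
}
```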
When calling interface-based methods, a value type generally needs to be boxed to produce Method Table information so that the CLR can call the virtual methods. A reference type needs no such step: callvirt dispatches based on the type of the runtime object rather than the type of the reference it is called through (the interface reference).
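Boxing through an interface can be made visible with a mutating method. In this sketch the IIncrementable interface and the Tally struct are hypothetical: converting the struct to the interface type copies it into a heap object, so the interface call changes the boxed copy and not the original variable.

```csharp
using System;

// Hypothetical interface and value type used only for illustration.
public interface IIncrementable
{
    void Increment();
}

public struct Tally : IIncrementable
{
    public int Value;

    public void Increment()
    {
        Value++;
    }
}

public static class BoxingDemo
{
    // The original struct is untouched by a call made through the interface.
    public static int OriginalAfterInterfaceCall()
    {
        Tally t = new Tally();
        IIncrementable boxed = t;   // boxing: t is copied to the heap
        boxed.Increment();
        return t.Value;
    }

    // The boxed copy is the one that was incremented.
    public static int BoxedAfterInterfaceCall()
    {
        Tally t = new Tally();
        IIncrementable boxed = t;
        boxed.Increment();
        return ((Tally)boxed).Value;   // unboxing copies the value back out
    }

    public static void Main()
    {
        Console.WriteLine(OriginalAfterInterfaceCall());   // 0
        Console.WriteLine(BoxedAfterInterfaceCall());      // 1
    }
}
```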
Delegates, on the other hand, are a special kind of type that holds references to methods. MSIL has two opcodes to deal with them: ldvirtftn, which loads the address of a virtual method, and ldftn. The ldftn instruction loads the method address onto the stack. The method token loaded by this IL instruction is looked up in the MethodTable of the type to get the actual address of the method. A delegate generally stores the object on which the member needs to be executed as its Target, together with the method address, and uses both to invoke the method. In the case of a static method call the target is passed as null. The process of retrieving and storing the address in a delegate is expensive and is called delegate binding.
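The Target and method stored in a delegate are exposed through the Target and Method properties of System.Delegate, so the description above can be checked directly. Printer and DelegateTargetDemo are hypothetical names used for this sketch.

```csharp
using System;

// Hypothetical class used only for illustration.
public class Printer
{
    public void Print() { }
    public static void PrintStatic() { }
}

public static class DelegateTargetDemo
{
    // For an instance method, Target is the object the method will run on.
    public static bool InstanceTargetIsObject()
    {
        Printer p = new Printer();
        Action a = p.Print;
        return ReferenceEquals(a.Target, p);
    }

    // For a static method there is no object, so Target is null.
    public static bool StaticTargetIsNull()
    {
        Action a = Printer.PrintStatic;
        return a.Target == null;
    }

    // Method describes which method the delegate is bound to.
    public static string BoundMethodName()
    {
        Action a = Printer.PrintStatic;
        return a.Method.Name;
    }

    public static void Main()
    {
        Console.WriteLine(InstanceTargetIsObject());   // True
        Console.WriteLine(StaticTargetIsNull());       // True
        Console.WriteLine(BoundMethodName());          // PrintStatic
    }
}
```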
Delegates are derived from a special type called System.Delegate in the .NET CLR (delegates declared in C# derive from it through System.MulticastDelegate). A delegate can hold multiple methods in a chain. You can add more methods of the same signature to a delegate, and they are invoked sequentially in the order in which they were added, so the first method added is called first and the last method added is called last. The return value of that final method call is what gets passed back to the caller.
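The invocation order and the return value of a delegate chain can be checked with a short sketch. MulticastDemo, First and Second are hypothetical names; each method records that it ran and returns a distinct number, so both the order and the value handed back to the caller are observable.

```csharp
using System;
using System.Collections.Generic;

public static class MulticastDemo
{
    // Records the order in which the chained methods actually run.
    public static readonly List<string> Log = new List<string>();

    private static int First()
    {
        Log.Add("First");
        return 1;
    }

    private static int Second()
    {
        Log.Add("Second");
        return 2;
    }

    // Builds a two-method chain, invokes it, and returns what the caller sees.
    public static int InvokeChain()
    {
        Log.Clear();
        Func<int> chain = First;
        chain += Second;
        return chain();
    }

    public static void Main()
    {
        Console.WriteLine(InvokeChain());           // 2
        Console.WriteLine(string.Join(",", Log));   // First,Second
    }
}
```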
Trying out some basic debugging with Son of Strike (SOS)
Before we start SOS debugging with Visual Studio, recall that there are three important data structures that you need to keep in mind.
- MethodTable: Stores all the information about a type. It holds information regarding static data, the table of method descriptors, pointers to the EEClass, pointers to methods from other VTables and pointers to constructors.
- EEClass: Almost the same structure as the Method Table, but it holds more static data information.
- MethodDesc: Information regarding a particular method, such as its IL or JIT'ed code information.
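Some of these structures can be glimpsed from managed code through runtime handles. The sketch below is an assumption-laden illustration: on the Microsoft CLR, RuntimeTypeHandle.Value is commonly understood to correspond to the MethodTable address that SOS prints, and RuntimeMethodHandle.Value to the MethodDesc address, but this is a runtime implementation detail rather than a documented guarantee. Sample and HandleDemo are hypothetical names.

```csharp
using System;

// Hypothetical class used only for illustration.
public class Sample
{
    public void Work() { }
}

public static class HandleDemo
{
    // Handle to the runtime's per-type structure for the object's type.
    public static IntPtr TypeHandleOf(object o)
    {
        return o.GetType().TypeHandle.Value;
    }

    // Handle to the runtime's per-method structure for a public method.
    public static IntPtr MethodHandleOf(Type type, string methodName)
    {
        return type.GetMethod(methodName).MethodHandle.Value;
    }

    public static void Main()
    {
        // Every instance of a type shares the same type handle.
        Console.WriteLine(TypeHandleOf(new Sample()) == TypeHandleOf(new Sample()));   // True
        // The actual addresses differ from run to run.
        Console.WriteLine("Type handle:   0x" + TypeHandleOf(new Sample()).ToString("X"));
        Console.WriteLine("Method handle: 0x" + MethodHandleOf(typeof(Sample), "Work").ToString("X"));
    }
}
```

If the assumption holds for your runtime, the type handle printed here should match the MethodTable value that SOS reports for an instance of the same type.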
Now let us consider a dummy class to start debugging.
    using System;

    public class MyClass
    {
        // Shared by all instances; incremented once per constructed object.
        public static int RefCounter;

        private int age;
        private string name;

        public MyClass(string name, int age)
        {
            this.Name = name;
            this.Age = age;
            MyClass.RefCounter++;
        }

        public string Name
        {
            get { return name; }
            set { name = value; }
        }

        public int Age
        {
            get { return age; }
            set { age = value; }
        }

        public void GetNext(int age)
        {
            Console.WriteLine("Getting next at age" + age);
        }
    }
Here I have defined one static field, two instance fields with their properties, a parameterized constructor and one instance method to start testing. We will use SOS to examine an instance of this class.
As we already know, when an object is created the stack holds an object reference to the actual object placed on the heap. So let's create an object of this class in Main.
    static void Main(string[] args)
    {
        MyClass obj = new MyClass("Abhishek", 28);
        obj.GetNext(20);
    }
To start debugging you need to enable native debugging. To do this, right-click on the project, open Properties => Debug, and check Enable unmanaged code debugging. Now put a breakpoint on the first line and step into the constructor. Let's say I go on until one instance field is loaded. Now open the Immediate Window and type .load C:\WINDOWS\Microsoft.NET\Framework\v4.0.30319\sos.dll. Please substitute the exact location if your folder structure differs.
Once the extension is loaded we can examine the instance of MyClass.
We call !DumpStackObjects to dump the objects that are loaded in memory. We see something like below:
These are the managed objects that are loaded into memory. Our concern is the instance of MyClass, so copy the object reference of the corresponding MyClass object (in our case it is 00c2c1d0). Now let's use !DumpObj 00c2c1d0
The command dumps the entire object with the addresses of its MethodTable and EEClass and the size of the object in bytes. You can see that the size of the object on the heap is 16 bytes. This is because objects in the CLR hold more than their fields and members (sync headers, method pointers and so on). It also lists the fields currently in memory. You can see that the value for name already has an address, because we executed this after the line that sets name in the constructor but before the one that sets age. If I step through all the lines of the constructor, it will show the values of all the members. You can use !DumpObj to explore the address of Name (00c2c1b0), or age, or any other instance member from here onwards.
Now let's try a few more commands to see some interesting behavior. !CLRStack dumps the managed code on the stack of the current thread. !DumpStack, on the other hand, dumps both the managed and the native stack. Let's try !CLRStack now; the output will be like this:
Here I have moved on to the call to the GetNext method, so you can see GetNext and Main on the stack in the output of !CLRStack. Passing additional arguments such as -p -l produces more detailed output that lists all the parameters passed to each method. You can see that this appears as the first parameter of GetNext, as I mentioned earlier, and that it holds the object obj (in our case).
Similarly you can use !DumpStack to dump both the native and the managed frames, and !EEStack to run !DumpStack on all threads.
To conclude,
This post is just the starting point of your understanding. There is a lot of depth to this topic. You can try an excellent article in MSDN Magazine here, which talks more about CLR object creation. If you want to know more about the internals of .NET, you can also try my Internals series here.
Thank you for reading, and do give your feedback.
Stay tuned for more.