Saturday, October 16, 2010

Hidden Facts of C# Structures in terms of MSIL

Have you ever thought of writing a program without a single structure declared in it? Never thought about it, right? I tried, and found it practically impossible; at least it makes little sense, because internally the whole program runs on messages passed from one module to another in the form of structures.

Well, hold on. If you think I am talking about custom structures, you have gone too far. I am just talking about declaring ValueTypes. If you want to do a calculation, you need an integer, and System.Int32 is actually a structure. So my own program that runs without a single ValueType declared in it turns out to be a sequence of print statements. Doesn't make sense, huh? I thought so too. A program that just calls a few library methods (ignoring the fact that library methods can also create ValueTypes internally) and prints arbitrary strings to the output window is meaningless.

Please note: a struct declaration in C# defines a ValueType, so I use the two terms synonymously in this post.

So what makes us use Structures so often? 

The main reasons why we use structures are:
  • They are simple and fast to access.
  • A struct declared as a local variable typically lives on the current thread's stack, so it is available only in the current scope and goes away automatically when the call returns; the garbage collector does not have to clean it up. (A struct that is a field of a class, or that has been boxed, lives on the heap inside that object.)
  • Most of the primitives are already declared as structs.
  • With a struct we deal with the value itself rather than with a reference to it.
  • Implicit and explicit conversion operators work great with structs, and many of them are already included in the API.
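
To make the "value itself rather than a reference" point concrete, here is a minimal sketch. The types PointStruct and PointClass are hypothetical names made up for this illustration; they are not part of the sample download.

```csharp
using System;

// Hypothetical types used only for this illustration.
public struct PointStruct { public int X; }
public class PointClass { public int X; }

public static class CopySemanticsDemo
{
    // Returns { s1.X, c1.X } after modifying the copies s2 and c2.
    public static int[] Run()
    {
        PointStruct s1 = new PointStruct();
        s1.X = 1;
        PointStruct s2 = s1;   // copies the whole value
        s2.X = 99;             // does not affect s1

        PointClass c1 = new PointClass();
        c1.X = 1;
        PointClass c2 = c1;    // copies only the reference
        c2.X = 99;             // visible through c1 as well

        return new[] { s1.X, c1.X };
    }

    public static void Main()
    {
        int[] r = Run();
        Console.WriteLine("struct: {0}, class: {1}", r[0], r[1]); // struct: 1, class: 99
    }
}
```

Assigning a struct copies its fields, so the original keeps its value; assigning a class variable only copies the reference, so both variables see the change.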
Download Sample - 24KB

I agree, I could name 100 reasons, but let's focus on the topic I started with. Let's get practical and create a struct.

public struct DemoStruct
{
    int DemoIntItem = 0;
    string DemoStringItem = "Abhishek";
    public DemoStruct()
    {
        this.DemoIntItem = 20;
    }
    public DemoStruct(int loadDemoInt)
    {
        this.DemoIntItem = loadDemoInt;
    }

    public override string ToString()
    {
        return string.Format("DemoIntItem : {0} , DemoStringItem : {1}", this.DemoIntItem, this.DemoStringItem);
    }
}

Hmm. Just after I wrote this I compiled it, and the compiler came back with a nasty error message.


What is it? Where did I go wrong? After reading the message, I found the culprit: you cannot have field initializers in a struct as you can in a class. This restriction comes from the C# language (in the compiler version I am using) rather than from the CLR itself. Then how would you initialize a member in a struct? Should I declare a constructor? As the code above shows, I already tried to declare a parameterless constructor for DemoStruct, but alas: "Structs cannot contain explicit parameterless constructors". This is weird. I need to write a constructor with at least one parameter to initialize the members with my own default values.

Does this mean structs already have an implicit default constructor?

Yes. According to MSDN, every struct implicitly has a default parameterless constructor that initializes each member to its default value. The catch is that a struct does not have to be created with the new operator; if you declare a struct variable without new, its fields are simply left unassigned until you assign them yourself. So the compiler does not need to emit an actual parameterless constructor method for the struct: new DemoStruct() just sets every field of the value to its default in place.
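
One way to check this for yourself is through reflection. The sketch below uses a hypothetical PlainStruct (not part of the sample download) and assumes the struct declares no constructor and no field initializers of its own.

```csharp
using System;

// Hypothetical struct with no explicit constructors, for illustration only.
public struct PlainStruct
{
    public int Number;
    public string Text;
}

public static class ImplicitCtorDemo
{
    // True when the struct's metadata contains no public parameterless constructor.
    public static bool HasNoParameterlessCtor()
    {
        return typeof(PlainStruct).GetConstructor(Type.EmptyTypes) == null;
    }

    public static void Main()
    {
        PlainStruct a = new PlainStruct();      // every field set to its default
        PlainStruct b = default(PlainStruct);   // same result, no "new" involved

        Console.WriteLine(a.Number == b.Number && a.Text == b.Text); // True
        Console.WriteLine(a.Text == null);                            // True
        Console.WriteLine(HasNoParameterlessCtor());                  // True
    }
}
```

Even though new PlainStruct() compiles and runs, reflection finds no parameterless constructor on the type, which matches the idea that the "implicit constructor" is only a zero-initialization.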
Let's redeclare the same structure:

public struct DemoStruct
{
    int DemoIntItem;
    string DemoStringItem;
    //public DemoStruct()
    //{
    //    this.DemoIntItem = 20;
    //}
    public DemoStruct(int loadDemoInt)
    {
        this.DemoIntItem = loadDemoInt;     
    }

    public override string ToString()
    {
        return string.Format("DemoIntItem : {0} , DemoStringItem : {1}", this.DemoIntItem, this.DemoStringItem);
    }
}

So after commenting out a few things it looks good to me. Now if I compile..... holy s***.. it gives me error again....


So it says you need to assign all the fields before control returns from the constructor. Hmm, that means if you declare a constructor, you have to assign everything yourself. Well, it makes sense to call the implicit default constructor to do this for me.

Let's redesign the structure once again:

public struct DemoStruct
{
    int DemoIntItem;
    string DemoStringItem;
    //public DemoStruct()
    //{
    //    this.DemoIntItem = 20;
    //}
    public DemoStruct(int loadDemoInt)
        :this()
    {
        this.DemoIntItem = loadDemoInt;
    }

    public override string ToString()
    {
        return string.Format("DemoIntItem : {0} , DemoStringItem : {1}", this.DemoIntItem, this.DemoStringItem);
    }
}

Now it finally works. this() initializes all the members of the object to their defaults. We need it because structures do not have field initializers the way classes do.
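
Calling this() is not the only way to satisfy the compiler. As a sketch of the alternative, the constructor can simply assign every field itself. DemoStructExplicit is a hypothetical variant of DemoStruct written for this illustration; the string value is the one I tried to use as an initializer earlier.

```csharp
using System;

// Hypothetical variant of DemoStruct that assigns every field explicitly.
public struct DemoStructExplicit
{
    int DemoIntItem;
    string DemoStringItem;

    public DemoStructExplicit(int loadDemoInt)
    {
        // Every field is assigned here, so no ": this()" call is needed.
        this.DemoIntItem = loadDemoInt;
        this.DemoStringItem = "Abhishek";
    }

    public override string ToString()
    {
        return string.Format("DemoIntItem : {0} , DemoStringItem : {1}",
            this.DemoIntItem, this.DemoStringItem);
    }
}

public static class ExplicitAssignmentDemo
{
    public static void Main()
    {
        Console.WriteLine(new DemoStructExplicit(30));
        // DemoIntItem : 30 , DemoStringItem : Abhishek
    }
}
```

The trade-off is maintenance: with this() you cannot forget a newly added field, while explicit assignment lets you pick non-default values for each one.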

Let's use the structure:
static void Main(string[] args)
{
    DemoStruct mystruct = new DemoStruct(30);
    Console.WriteLine(mystruct);

    Console.ReadKey();
}

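So clearly we call new DemoStruct to create a value of the structure DemoStruct, and in this case our own parameterized constructor is called. We could also have done something like this:
static void Main(string[] args)
{
    // new DemoStruct() (or default(DemoStruct)) sets every field to its default
    DemoStruct mystruct = new DemoStruct();
    Console.WriteLine(mystruct);

    Console.ReadKey();
}

In this case the implicit default constructor is used and every field gets its default value. Note that declaring just "DemoStruct mystruct;" and passing it straight to Console.WriteLine does not compile: the compiler reports the use of an unassigned local variable, because nothing has assigned the fields yet.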
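
A struct local really can be used without new, as long as the compiler can see that every field has been assigned first. The following sketch uses a hypothetical Pair struct with public fields (DemoStruct's fields are private, so it cannot be filled this way from outside).

```csharp
using System;

// Hypothetical struct with public fields, for illustration only.
public struct Pair
{
    public int First;
    public int Second;
}

public static class DefiniteAssignmentDemo
{
    public static int Sum()
    {
        Pair p;          // no constructor call, fields start out unassigned
        p.First = 10;    // assigning every field...
        p.Second = 20;   // ...makes p definitely assigned
        return Add(p);   // now p can be passed around as a whole value
    }

    static int Add(Pair pair)
    {
        return pair.First + pair.Second;
    }

    public static void Main()
    {
        Console.WriteLine(Sum()); // 30
    }
}
```

If either of the two field assignments is removed, the call to Add(p) is rejected at compile time.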

Difference between a class and a structure in terms of IL 

Let us not go over the general differences between a class and a structure; they are well known and have been discussed many times. I will look at the differences in terms of IL. To see the IL, I am going to use ILDASM, the IL disassembler that is installed along with Visual Studio. Let's declare a DemoClass with the same members as DemoStruct and see what the generated MSIL looks like.


In the ILDASM tree view you can see the layout of the two types: DemoClass, a class with exactly the same members, and DemoStruct, the structure.

Now let's compare the two types one by one.

IL for DemoClass

.class public auto ansi beforefieldinit TestConsoleApps.DemoClass
    extends [mscorlib]System.Object
{
    .field private int32 DemoIntItem
    .field private string DemoStringItem

    .method public hidebysig specialname rtspecialname instance void  .ctor(int32 loadDemoInt) cil managed
    {
        // Code size       17 (0x11)
        .maxstack  8
        IL_0000:  ldarg.0
        IL_0001:  call       instance void [mscorlib]System.Object::.ctor()
        IL_0006:  nop
        IL_0007:  nop
        IL_0008:  ldarg.0
        IL_0009:  ldarg.1
        IL_000a:  stfld      int32 TestConsoleApps.DemoClass::DemoIntItem
        IL_000f:  nop
        IL_0010:  ret
    }
}

IL for DemoStruct
.class public sequential ansi sealed beforefieldinit TestConsoleApps.DemoStruct
       extends [mscorlib]System.ValueType
{
    .field private int32 DemoIntItem
    .field private string DemoStringItem
    .method public hidebysig specialname rtspecialname 
        instance void  .ctor(int32 loadDemoInt) cil managed
    {
        // Code size       17 (0x11)
        .maxstack  8
        IL_0000:  ldarg.0
        IL_0001:  initobj    TestConsoleApps.DemoStruct
        IL_0007:  nop
        IL_0008:  ldarg.0
        IL_0009:  ldarg.1
        IL_000a:  stfld      int32 TestConsoleApps.DemoStruct::DemoIntItem
        IL_000f:  nop
        IL_0010:  ret
    }
} 

Based on the two IL listings above, you can see that both declarations produce a public ansi type, where DemoStruct extends System.ValueType while DemoClass directly extends System.Object. The differences we can see in the class header are:

  1. DemoClass is declared as auto while DemoStruct is declared as sequential. auto has nothing to do with garbage collection: it tells the loader that it is free to lay out the fields of the type in any way it sees fit, so the declared order of the members is not guaranteed to be kept, and any layout information that might have been specified is ignored. This is the default. sequential guides the loader to preserve the field order as emitted; the specific offsets are still calculated based on the CLI type of each field and can be shifted by explicit offset, padding, and/or alignment information (see ECMA-335).
  2. DemoStruct is declared as sealed (Hence this disallows the struct from being inherited)
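
The same header flags can be read back at run time through reflection, without opening ILDASM. This is only a sketch: SampleStruct and SampleClass are hypothetical stand-ins for DemoStruct and DemoClass, and the output comments assume the C# compiler's default layout for each kind of type.

```csharp
using System;

// Hypothetical stand-ins for DemoStruct and DemoClass.
public struct SampleStruct { public int A; public string B; }
public class SampleClass { public int A; public string B; }

public static class TypeFlagsDemo
{
    // Formats the layout, sealed flag and base type of the given type.
    public static string Describe(Type t)
    {
        return string.Format("{0}: sequential={1}, auto={2}, sealed={3}, base={4}",
            t.Name, t.IsLayoutSequential, t.IsAutoLayout, t.IsSealed, t.BaseType.Name);
    }

    public static void Main()
    {
        Console.WriteLine(Describe(typeof(SampleStruct)));
        // SampleStruct: sequential=True, auto=False, sealed=True, base=ValueType
        Console.WriteLine(Describe(typeof(SampleClass)));
        // SampleClass: sequential=False, auto=True, sealed=False, base=Object
    }
}
```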
 On the other hand, if you compare the IL of the two constructors, you can see only one difference:
The DemoClass constructor uses
call       instance void [mscorlib]System.Object::.ctor()
which means the base constructor of System.Object runs on the already allocated object before the fields are assigned.

DemoStruct, on the other hand, uses initobj, which sets every field of the value at the given address to its default (null or zero) without calling any constructor method.

Quick look at the calls

Finally, if you look at the IL for the calls made from a Main method that creates both a DemoStruct and a DemoClass, it looks like this:

.method private hidebysig static void  Main(string[] args) cil managed
{
  .entrypoint
  // Code size       45 (0x2d)
  .maxstack  2
  .locals init ([0] valuetype TestConsoleApps.DemoStruct mystruct,
           [1] class TestConsoleApps.DemoClass myclass)
  IL_0000:  nop
  IL_0001:  ldloca.s   mystruct
  IL_0003:  ldc.i4.s   30
  IL_0005:  call       instance void TestConsoleApps.DemoStruct::.ctor(int32)
  IL_000a:  nop
  IL_000b:  ldc.i4.s   30
  IL_000d:  newobj     instance void TestConsoleApps.DemoClass::.ctor(int32)
  IL_0012:  stloc.1
  IL_0013:  ldloc.0
  IL_0014:  box        TestConsoleApps.DemoStruct
  IL_0019:  call       void [mscorlib]System.Console::WriteLine(object)
  IL_001e:  nop
  IL_001f:  ldloc.1
  IL_0020:  call       void [mscorlib]System.Console::WriteLine(object)
  IL_0025:  nop
  IL_0026:  call       valuetype [mscorlib]System.ConsoleKeyInfo [mscorlib]System.Console::ReadKey()
  IL_002b:  pop
  IL_002c:  ret
}

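Here .entrypoint marks Main as the entry point. Main declares local slots for both mystruct and myclass, but constructs them differently. For the structure, ldloca.s loads the address of the local and call runs .ctor directly on that memory, so no heap allocation is involved. For the class, newobj allocates a new object on the managed heap, runs its constructor and pushes the reference onto the evaluation stack, from where stloc.1 stores it in the local myclass. Also note the box instruction: Console.WriteLine(object) expects a reference, so the struct is first copied into a heap object before the call.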
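
The box instruction in the listing has a visible effect in C# as well: the boxed object is a copy of the struct, not a view of it. A minimal sketch, using a hypothetical Counter struct that is not part of the sample download:

```csharp
using System;

// Hypothetical struct used only to show the effect of boxing.
public struct Counter
{
    public int Value;
    public override string ToString() { return "Counter " + Value; }
}

public static class BoxingDemo
{
    public static string Run()
    {
        Counter c = new Counter();
        c.Value = 1;
        object boxed = c;   // compiles to "box": copies c into a new heap object
        c.Value = 2;        // changes only the local, not the boxed copy
        return boxed + " / " + c;
    }

    public static void Main()
    {
        Console.WriteLine(Run()); // Counter 1 / Counter 2
    }
}
```

This is also why passing a struct to a method that takes object, such as Console.WriteLine(object), costs an allocation and a copy each time.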


Conclusion

To conclude, a structure in C# is basically similar to a class, but with restrictions imposed on it so that it works best as a ValueType. Features like inheritance, explicit parameterless constructors and field initializers are intentionally left out of structs so that they work well as ValueTypes and with P/Invoke calls. It was fun writing this post. I hope you liked it too.

Thanks for reading

4 comments:

  1. > Auto is used to impose the object to have full Garbage collection and also allows the object to allow reducing the size of it when not in use.

    auto has nothing to do with garbage collection, and the rest of the description is not very relevant.

    CIL is defined in ISO/IEC 23271 and ECMA-335.

    ECMA-335, page 61
    autolayout:: A class marked autolayout indicates that the loader is free to lay out the class in any way it sees fit; any layout information that might have been specified is ignored. This is the default.

    > Sequential objects are aligned with the object memory boundary. Each of them points individually to the memory allowed for it.

    Sort of, but it still seems to be missing the point.

    ECMA-335, page 61
    sequentiallayout: A class marked sequentiallayout guides the loader to preserve field order as emitted, but otherwise the specific offsets are calculated based on the CLI type of the field; these can be shifted by explicit offset, padding, and/or alignment information.

  2. @Annonymous,

    Thank you for your nice comment. I was also in a confusion with these. Thanks for clearing me out.
    :)

  3. You are the mastering of English language!

  4. @Andy,

    Well Good joke buddy... :D :D

    Glad you liked my language, but you wont believe English is not my mother tongue.

    Cheers
