Hope you are all doing well. If you are new to this blog, I recommend reading my other posts from the Internals series too. In this series I am trying to cover basic C# programming and relate it to the compiled MSIL. In my previous post, while going through the internals of the foreach loop, I promised to cover more on iterators in my next post. It is now time to cover the basis on which the C# IEnumerable stands, and the iterators.
If you ask me why I like to work in C# rather than VB.NET or other languages, I would point to some of the flexibility that I get with C#. Iterators are coming to VB.NET in its next version (vNext), but C# is still the language that introduced yield first.
In this post I am going to demonstrate the basic feature behind C# iterators and also introduce you to the secret behind the yield keyword of C#.
The Basics
Before we start with C# iterators, let me explain what an iterator means exactly. Surprisingly, there are many who know nothing about IEnumerable; the next section is for them. If you already know about IEnumerable and IEnumerator, please skip the next section and read ahead.
IEnumerable and IEnumerator
C# comes with two basic interfaces, IEnumerable and IEnumerator, which form the base for any collection. IEnumerable is an interface that defines a single method, GetEnumerator, which returns an IEnumerator. An IEnumerator, on the other hand, provides simple iteration over a collection. Implementing these interfaces ensures that you can use the collection in the foreach loop of C# or For Each in VB.NET. If you look at MSDN, it says:
IEnumerator is the base interface for all enumerators. Enumerators only allow reading the data in the collection. Enumerators cannot be used to modify the underlying collection.
Initially, the enumerator is positioned before the first element in the collection. Reset also brings the enumerator back to this position. At this position, calling Current throws an exception. Therefore, you must call MoveNext to advance the enumerator to the first element of the collection before reading the value of Current.
Almost all the collections in the .NET class library implement IEnumerable, and hence you can iterate through the elements they hold internally. To know more about this, please go through my previous post on loops and move to the foreach section.
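To make the two interfaces concrete, here is a minimal sketch of a collection that implements them by hand. The names NumberRange, RangeEnumerator and NumberRangeDemo are hypothetical and exist only for this illustration; the point is to show how much plumbing GetEnumerator, MoveNext and Current need when written manually.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical collection: exposes the integers from start to end (inclusive).
public sealed class NumberRange : IEnumerable<int>
{
    private readonly int _start;
    private readonly int _end;

    public NumberRange(int start, int end)
    {
        _start = start;
        _end = end;
    }

    // foreach calls this to obtain a fresh enumerator.
    public IEnumerator<int> GetEnumerator()
    {
        return new RangeEnumerator(_start, _end);
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }

    private sealed class RangeEnumerator : IEnumerator<int>
    {
        private readonly int _start;
        private readonly long _count;
        private long _position = -1; // -1 means "before the first element"

        public RangeEnumerator(int start, int end)
        {
            _start = start;
            _count = end >= start ? (long)end - start + 1 : 0;
        }

        public int Current
        {
            get
            {
                if (_position < 0)
                    throw new InvalidOperationException("Call MoveNext first.");
                return (int)(_start + _position);
            }
        }

        object IEnumerator.Current
        {
            get { return Current; }
        }

        public bool MoveNext()
        {
            if (_position + 1 >= _count)
                return false; // no more elements
            _position++;
            return true;
        }

        public void Reset()
        {
            _position = -1;
        }

        public void Dispose()
        {
            // Nothing to release in this sketch.
        }
    }
}

public static class NumberRangeDemo
{
    public static void Main()
    {
        foreach (int n in new NumberRange(1, 3))
            Console.WriteLine(n); // prints 1, 2 and 3 on separate lines
    }
}
```

Every piece of bookkeeping here (the position, the bounds check, the two Current properties) has to be written and maintained by hand for each new collection.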
Iterator
Iterators are one of the best features of C# of all time. C# 2.0 came with a new contextual keyword called yield, which lets you generate an IEnumerable instantly. An iterator in C# is a method, or the get accessor of a property, which returns an IEnumerable without making you write the whole enumerable and enumerator yourself. The iterator block uses yield return to return each individual element and yield break to end the enumeration. The return type of the iterator must be IEnumerable or IEnumerator (or their generic versions); the compiler supplies the actual implementation.
Let me show a sample iterator implementation:
    using System;
    using System.Collections.Generic;
    using System.Diagnostics;
    using System.Threading;

    class Program
    {
        static void Main(string[] args)
        {
            // Calling the iterator method does not run its body yet.
            var enumerable = new Program().GetEnumerated(10, 20);
            Console.WriteLine("After I got the Enumerable");
            foreach (int i in enumerable)
            {
                Console.WriteLine("Got i = {0}", i);
                Thread.Sleep(10);
            }
            Console.Read();
        }

        public IEnumerable<int> GetEnumerated(int start, int end)
        {
            Console.WriteLine("Starting Enumerating!!!");
            Stopwatch watch = new Stopwatch();
            watch.Start();
            for (int i = start; i <= end; i++)
            {
                Console.WriteLine("Value of watch = {0} before yield", watch.ElapsedTicks);
                yield return i; // execution pauses here until the next MoveNext
                Console.WriteLine("Value of watch = {0} after yield", watch.ElapsedTicks);
            }
            watch.Stop();
        }
    }
In this implementation I am using a stopwatch to see what happens in the background. Let's see the output of the code above:
In the output console you can see that "After I got the Enumerable" is printed before "Starting Enumerating!!!". Does that mean the function returns immediately after the call is made? Yes, you are right. So to get an enumerable, the method does not need to enumerate the whole collection. Now, looking at the next lines, you can see that after the caller gets the value of i, the value of watch increases considerably. That means the method stops executing when it reaches the yield and waits for the enumerator to be asked for its next value. Hence the iterator runs as the program goes, and you can store the IEnumerator to fetch the data whenever it is required.
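The same pause-and-resume behaviour can be observed without a stopwatch by driving the enumerator by hand. The following is only a small sketch: LazyDemo, CountUpTo and Log are hypothetical names made up for this illustration, and the log simply records the order in which things happen. It also uses yield break to end the enumeration early.

```csharp
using System;
using System.Collections.Generic;

public static class LazyDemo
{
    // Records the order of events so the sequence is easy to inspect.
    public static readonly List<string> Log = new List<string>();

    public static IEnumerable<int> CountUpTo(int limit)
    {
        Log.Add("body started");
        for (int i = 1; ; i++)
        {
            if (i > limit)
                yield break; // ends the enumeration
            Log.Add("yielding " + i);
            yield return i;  // pauses here until the next MoveNext
        }
    }

    public static void Main()
    {
        IEnumerable<int> numbers = CountUpTo(2);
        Log.Add("got enumerable"); // the method body has not run yet

        using (IEnumerator<int> e = numbers.GetEnumerator())
        {
            Log.Add("got enumerator"); // still not run
            while (e.MoveNext())       // each call runs the body up to the next yield
                Log.Add("received " + e.Current);
        }

        foreach (string line in Log)
            Console.WriteLine(line);
        // Order printed: got enumerable, got enumerator, body started,
        // yielding 1, received 1, yielding 2, received 2
    }
}
```

Storing the IEnumerator like this lets you fetch one value at a time, but remember to dispose it, which foreach otherwise does for you.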
The Internals
In fact, the C# compiler builds a state machine for each iterator. The state machine is a CompilerGenerated class which stores the local variables as fields of the class and the execution point as an integer state. Thus the state machine allows you to pause and resume execution of the block as and when required.
This is a very cool concept. Let me demonstrate it with an example:
    public IEnumerable<int> GetFirst10Nos()
    {
        for (int i = 0; i < 10; i++)
            yield return i;
    }
This is the simplest method, which returns the first 10 numbers starting from 0. Now let's see what it looks like after compilation:
    // Methods
    public IEnumerable<int> GetFirst10Nos()
    {
        <GetFirst10Nos>d__0 d__ = new <GetFirst10Nos>d__0(-2);
        d__.<>4__this = this;
        return d__;
    }

    // Nested Types
    [CompilerGenerated]
    private sealed class <GetFirst10Nos>d__0 : IEnumerable<int>, IEnumerable, IEnumerator<int>, IEnumerator, IDisposable
    {
        // Fields
        private bool $__disposing;
        private bool $__doFinallyBodies;
        private int <>1__state;
        private int <>2__current;
        public Iteratordemo <>4__this;
        private int <>l__initialThreadId;
        public int <i>5__1;

        // Methods
        [DebuggerHidden]
        public <GetFirst10Nos>d__0(int <>1__state)
        {
            this.<>1__state = <>1__state;
            this.<>l__initialThreadId = Thread.CurrentThread.ManagedThreadId;
        }

        private bool MoveNext()
        {
            bool CS$1$0000;
            try
            {
                this.$__doFinallyBodies = true;
                if (this.<>1__state == 1)
                {
                    goto Label_0068;
                }
                if (this.<>1__state == -1)
                {
                    return false;
                }
                if (this.$__disposing)
                {
                    return false;
                }
                this.<i>5__1 = 0;
                while (this.<i>5__1 < 10)
                {
                    this.<>2__current = this.<i>5__1;
                    this.<>1__state = 1;
                    this.$__doFinallyBodies = false;
                    return true;
                Label_0068:
                    if (this.$__disposing)
                    {
                        return false;
                    }
                    this.<>1__state = 0;
                    this.<i>5__1++;
                }
                this.<>1__state = -1;
                CS$1$0000 = false;
            }
            catch (Exception)
            {
                this.<>1__state = -1;
                throw;
            }
            return CS$1$0000;
        }

        [DebuggerHidden]
        IEnumerator<int> IEnumerable<int>.GetEnumerator()
        {
            if ((Thread.CurrentThread.ManagedThreadId == this.<>l__initialThreadId) && (this.<>1__state == -2))
            {
                this.<>1__state = 0;
                return this;
            }
            Iteratordemo.<GetFirst10Nos>d__0 d__ = new Iteratordemo.<GetFirst10Nos>d__0(0);
            d__.<>4__this = this.<>4__this;
            return d__;
        }

        [DebuggerHidden]
        IEnumerator IEnumerable.GetEnumerator()
        {
            return this.System.Collections.Generic.IEnumerable<System.Int32>.GetEnumerator();
        }

        [DebuggerHidden]
        void IEnumerator.Reset()
        {
            throw new NotSupportedException();
        }

        [DebuggerHidden]
        void IDisposable.Dispose()
        {
            this.$__disposing = true;
            this.MoveNext();
            this.<>1__state = -1;
        }

        // Properties
        int IEnumerator<int>.Current
        {
            [DebuggerHidden]
            get
            {
                return this.<>2__current;
            }
        }

        object IEnumerator.Current
        {
            [DebuggerHidden]
            get
            {
                return this.<>2__current;
            }
        }
    }
Well, basically the compiler generates a type that holds the state machine for you. The type is generated in such a way that it implements IEnumerator, so that it can produce the elements and hold the state of the method within itself. Let me explain a few of its members:
- The compiler generates a nested class, <GetFirst10Nos>d__0, which holds the state machine and implements both IEnumerable and IEnumerator. When our method is called, it creates a new object of this class and returns it. As the class implements IEnumerable, this satisfies the return type of the method. I should remind you that no code from our method body has been executed yet.
- When we use the IEnumerable in a foreach loop, the loop internally calls GetEnumerator. If you look closely, this method checks whether the call is made from the thread that created the object and whether the state is -2. You can see that our method passes -2 as the state while creating the object. To conclude: on the first call from the owning thread, GetEnumerator sets the state to 0 and returns the same object; on any later call, or on a call from a different thread, it creates a new enumerator object with state 0. The state 0 indicates that the enumerator is initialized.
- MoveNext, being the most important part of the object, checks the value of the state, which indicates the various stages of the object:
  - 0 means the enumerator has been fetched, before the call to MoveNext.
  - -1 means the end of the enumeration; MoveNext returns false.
  - -2 means no enumerator has been fetched yet (before the call to GetEnumerator).
  - 1 means the enumeration is running and is paused at the yield return; before returning true, MoveNext stores the value in this.<>2__current.
- On each later request to MoveNext the state is checked, and when it is 1 the initial goto statement moves control to Label_0068, just after the yield return, so the object resumes running our code and produces the next number.
- Finally, when the while loop condition fails, the state is set to -1 and the enumeration ends.
So the state machine object is capable of producing the numbers and also of pausing and resuming the method.
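To make the idea easier to follow than the decompiled output, here is a simplified, hand-written stand-in for the state machine of GetFirst10Nos. This is a sketch of my own, not what the compiler emits: the class name First10Enumerator and the field names are hypothetical, the thread check and the disposing flags are left out, and a switch replaces the goto. The states 0, 1 and -1 follow the meanings listed above.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Simplified, hand-written stand-in for the compiler-generated class.
public sealed class First10Enumerator : IEnumerator<int>
{
    private int _state;   // 0 = not started, 1 = paused at the yield, -1 = finished
    private int _current; // plays the role of <>2__current
    private int _i;       // the local variable i, hoisted into a field

    public int Current
    {
        get { return _current; }
    }

    object IEnumerator.Current
    {
        get { return _current; }
    }

    public bool MoveNext()
    {
        switch (_state)
        {
            case 0:       // first call: run the loop initializer
                _i = 0;
                break;
            case 1:       // resume just after the yield return: run i++
                _i++;
                break;
            default:      // finished
                return false;
        }

        if (_i < 10)
        {
            _current = _i; // the value handed to the caller
            _state = 1;    // remember that we are paused at the yield
            return true;
        }

        _state = -1;       // loop condition failed: enumeration is over
        return false;
    }

    public void Reset()
    {
        throw new NotSupportedException();
    }

    public void Dispose()
    {
        _state = -1;
    }
}

public static class First10Demo
{
    // Collects everything the hand-written enumerator produces.
    public static List<int> Collect()
    {
        var result = new List<int>();
        using (var e = new First10Enumerator())
        {
            while (e.MoveNext())
                result.Add(e.Current);
        }
        return result;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Collect()));
        // prints 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
    }
}
```

Notice that nothing is kept on the call stack between two calls to MoveNext; everything needed to resume lives in the three fields.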
The member variables of the state machine represent:
- Locals, parameters etc. are created as member variables; for example, the local variable i is represented as <i>5__1.
- Two boolean variables, $__disposing and $__doFinallyBodies, hold the state of disposing and of finally-block execution.
- The current value of the object is held in <>2__current.
- The state the object is in is held in <>1__state (even though the states are not given any enumerated names).
- The object which invoked the iterator is stored in <>4__this.
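Because all of this state lives in the fields of the enumerator object, two enumerators obtained from the same enumerable advance independently of each other. Here is a short sketch to show that; the class IndependentEnumerators and the method Probe are hypothetical names used only for this demonstration. It also shows that Reset is not supported on an iterator, as seen in the generated code.

```csharp
using System;
using System.Collections.Generic;

public static class IndependentEnumerators
{
    public static IEnumerable<int> GetFirst10Nos()
    {
        for (int i = 0; i < 10; i++)
            yield return i;
    }

    // Advances two enumerators by different amounts and reports where each one is.
    public static int[] Probe()
    {
        IEnumerable<int> numbers = GetFirst10Nos();
        using (IEnumerator<int> first = numbers.GetEnumerator())
        using (IEnumerator<int> second = numbers.GetEnumerator())
        {
            first.MoveNext();  // first is now at 0
            first.MoveNext();  // first is now at 1
            second.MoveNext(); // second has its own state, so it is at 0
            return new[] { first.Current, second.Current };
        }
    }

    // Returns true when calling Reset on the iterator throws NotSupportedException.
    public static bool ResetThrows()
    {
        using (IEnumerator<int> e = GetFirst10Nos().GetEnumerator())
        {
            try
            {
                e.Reset();
                return false;
            }
            catch (NotSupportedException)
            {
                return true;
            }
        }
    }

    public static void Main()
    {
        int[] positions = Probe();
        Console.WriteLine("first = {0}, second = {1}", positions[0], positions[1]);
        // prints: first = 1, second = 0
        Console.WriteLine("Reset throws: {0}", ResetThrows());
        // prints: Reset throws: True
    }
}
```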
Conclusion
Well, C# iterators are by far one of the best things in the .NET languages. It is really tedious to build each enumerator by hand. LINQ and other language features use iterators extensively to make C# more reliable yet simple to write. I have tried to demonstrate what actually happens behind the scenes for iterators. I hope you like this post and also read my other posts on the internals of .NET.
Thanks for reading.
I too agree. The iterator is a nice design pattern, but I observed some concerns. I've documented them here:
http://www.codekicks.com/2011/01/try-to-avoid-foreachfor-loops.html
@Dutt
Yes, but a for loop is just a normal jump statement, and it differs completely from foreach.
Foreach actually works on a state machine. It is bound to be slower than a normal for loop.
You can see the exact code of foreach in this post:
http://www.abhisheksur.com/2011/01/internals-of-loops-while-for-and.html
thanks :)
@Dutt
Telling people to avoid the foreach loop is terrible advice. As noted by many people in your post's comments, your test is flawed. You didn't actually fetch the values of the array in your for loop, for a start.
I don't see why it is "the best" thing.
What would happen if two functions or two distinct threads run the loop over the iterator?
What would it give me in terms of speed?
@dimitris
Hmm dimitris,
you have a point. Foreach actually handles cross-thread calls. If you want to store the IEnumerator and use it more than once without disposing it, you need to take care of these scenarios.
For those who don't want to deal with the complexities, it's better to use foreach instead.
This is helpful for understanding this concept.