Sunday, December 19, 2010

ADO.NET - Some Internals Uncovered

During the early phaze of your career, when you just have started learning about .NET technology, you might have came across the sentence "ADO.NET is a disconnected database architecture". Well, its true, but how? How can ADO is transformed to such an architecture which no other database architecture supports. Every other database programming supports only RecorsSet which you need to use to get Data from the database.  Did you ever thought of it ? If you don't, it is time to rethink of it now.

In this post, I will uncover few internals of ADO.NET architecture, and also refer to my favorite tool Reflector to see the code that is written inside the .NET classes. My focus in the post is only on the internals of the architecture, so if you  are new to ADO.NET, it would not be a good idea to read it over and confuse yourself more, rather read thorough Overview of ADO.NET and come back later.


Introduction

Well, In fact, I didn't ever thought of writing such a post. Strangely, while talking with Devs I found out people commenting that ADO.NET is a new way of accessing data missing out the point where ADO.NET differs with ADO. It is true, ADO.NET is disconnected. There are large performance gain of an application which releases the Connection after fetching data from the database. ADO.NET have inherent capability to release the database connection, after the so called DataSet / DataTable is filled with data which you might be looking for. There are few Adapters provided with .NET library which lets you Fill these DataSet or DataTable when you want and later you can use these cached data to your application.  Yes every line here is a fact. But internally there is still a Reader associated with every call. Let me detail the fact a little.


Details of Loading Data using Select Query

Say you invoke a Selection operation on the Database using Select Query. Lets jot down the steps what happens internally :



  1. We create an instance of DataAdapter and call Fill method to fill in the data. 
  2. Fill method is defined within DbDataAdapeter is called.
  3. It sends the Sql script to the Database process and DataBase loads the data to a memory.
  4. ADO.NET gets the pointer to the first row of Data being loaded from the SQL server process. 
  5. DbDataAdapter.Fill internally creates a DataReader to read the Data from the Memory location.
  6. The DataReader reads Data sequentially and creates DataRow for each Record. 
  7. DataRow are enumerated into a DataTable and the final Result is returned back to you. 
  8. After it returns the DataTable ( A cached version of RecordSet ) you can eventually release the connection and can work with the Data.
Thus from the above steps, you must be clear about the fact that there is nothing magical happening inside the architecture of ADO.NET, rather it uses the same way of loading data into .NET objects which eventually lets you manipulate the data without letting the connection to remain active for long.  So as you now know about the steps, let us take you deep into Reflector to see how the code looks like :



The Code looks simple enough. As we pass the select Query as SelectCommand in DbDataAdapter, it in turn create a Native Scope to run your code and place the same into the DataTable. The code Bid.ScopeEnter actually creates a scope for the Command API. The FillCommandBehavior is an internal enumeration which identifies how the data to be fetched. Now let me move ahead to this.Fill.


Hmm, The Fill seems to be bit fishy. It does nothing, but checks the parameter it receives, does some manual checks and finally calls FillInternal to get the result. So basically, there is nothing to look at this method, and let me quickly jump to FillInternal now.


There are quite a number of things that is going on in FillInternal. First of all, it is an internal method and is not exposed to the outer world, and it actually creates the Data in a DataTable and returns the result. In between, it opens the Connection to DataBase and finally closes it.

If you see the code above, it first Gets the connection from command object and tries to Open it. QuietOpen is a private static method which just set the ConnectionState to Open. The QuiteClose on the other hand closes the connection (as marked in blue).

The most interesting part of the code is the DataReader. Basically, after the connection is opened, ADO.NET actively creates a Reader object and Execute the Reader to create the DataTable. The command.ExecuteReader actually invokes the command we pass to the database, creates a DataStream which points to the base location of the loaded memory (in which the resultant data is stored), creates an object to wrap each and every information and return the Reader object. So now after this line, the reader will load the DataTable using this.Fill method. If you look into this Fill overload, you will find it actually calls FillFromReader which again call FillLoadDataChunk and eventually we come to FillLoadDataRow. This method is one which we find interest. Lets look at the code first :


So basically ExecuteReader loas each DataRow from the DataStream for each Record, and adds the same to the contextual DataTable.

Hence, coming to the conclusion, ADO.NET is actually a wrapper to existing technology available to allow the developer to get Cached Managed Object(DataTable) after loading the data loaded from SQL server leaving the Connection closed after DataTable is totally loaded.

Choice between DataReader and DataTable

There is always a dilemma between the choice of DataTable and DataReader amongst Developers. Some thinks DataReader is faster as you are getting the data immediately, while some argue for DataTable as being a total managed data object. Well, I must say each of them have its own merits and demerits.
  • You should choose DataReader over DataTable when your requirement to a data is somewhat limited to 1 source. Say for instance, while generating an ASP.NET Grid. Thus once the Reader reads the whole data the html is produced immediately, hence you eventually removed the loops and checks that is imposed (as shown above) during the creation of DataTable.
  • You should choose DataTable over DataReader when you have to do random analysis on the Data retrieved from the database and the Data representation is not straight forward. Even if you want to feed the same Data into multiple objects or want to show aggregation of data, it would be a good idea to use DataTable (and we can neglect initial data load)
Conclusion

Even though there are lot of things left out from the post, the main idea to let you know how ADO.NET releases connection and the difference between DataReader and DataTable, which I think is clear enough. I would also like to notify that internally everything runs the same way as we think of and there is nothing magical happening in between. Even though it is very old topic, I hope most of you like it.

Thank you for reading.

No comments:

Post a Comment

Please make sure that the question you ask is somehow related to the post you choose. Otherwise you post your general question in Forum section.

Author's new book

Abhishek authored one of the best selling book of .NET. It covers ASP.NET, WPF, Windows 8, Threading, Memory Management, Internals, Visual Studio, HTML5, JQuery and many more...
Grab it now !!!