Sunday, October 26, 2008

Best Practices of Memory Usage

Let's talk about memory management in a practical sense. While programming, we often use more memory than we actually need. Memory is generally cheap when you are working on desktop applications, but when you are building an ASP.NET application that consumes a lot of server memory, excessive use of memory and Session state can bring a lot of pain. So let us discuss some best practices of memory management that help us reduce memory wastage.

Programmers often have the odd habit of allocating memory to member variables inside a class and then allocating it again in a constructor. This can waste memory unnecessarily. Just take a look at the code below:

public class BadUse
{
    private SqlConnection con = new SqlConnection(); // allocated by the field initializer
    private DataSet ds = new DataSet("MyData");      // allocated by the field initializer

    public BadUse() {}
    public BadUse(string connectionString)
    {
        this.con = new SqlConnection(connectionString); // the object created above is thrown away
    }
    public BadUse(SqlConnection con)
    {
        this.con = con; // the object created above is thrown away
    }
}
If you look at the code above, we are definitely wasting memory. For every object, before any call is made, even before the constructor runs, the field initializers are executed, and these allocate memory to all the member variables. In the class demonstrated above, we create a SqlConnection object during field initialization. After that, we either call the default constructor or create another object inside a constructor. Thus, without ever using the object that was already created, we create it again and waste memory.
Best Practice:

public class GoodUse
{
    private SqlConnection con = null;
    private DataSet ds = null;

    public SqlConnection Connection // Better to use properties
    {
        get
        {
            if (this.con == null) // Create the object only when it is first needed
                this.con = new SqlConnection();
            return this.con;
        }
        set
        {
            if (this.con != null) // Clear out the existing object before replacing it
            {
                this.con.Dispose();
                this.con = null; // Always better to assign null to the member variable
            }
            if (value != null) this.con = value;
        }
    }
    public GoodUse() {}
    public GoodUse(string connectionString)
    {
        this.Connection = new SqlConnection(connectionString); // Assigns a new object to the null member
    }
    public GoodUse(SqlConnection con)
    {
        this.con = con;
    }
}

From the above code it is clear that it is always better to expose members through properties rather than accessing the objects directly; the property gives you a single place to modify every call later. Similarly, it is better to use event accessors when exposing events:
private MyDelegate MyEvent;
public event MyDelegate CheckEvent
{
    add
    {
        lock (this) // Better to take a lock before adding a handler to the event
        {
            MyEvent += value;
        }
    }
    remove
    {
        lock (this) // Use a lock before removing a handler as well
        {
            MyEvent -= value;
        }
    }
}
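As a rough sketch of how these accessors are used: the delegate type MyDelegate is not declared in the snippet above, so the signature below is only an assumption, and Publisher stands for whatever class holds the event. Subscribing with += runs the add block and unsubscribing with -= runs the remove block.

// Assumed delegate signature; any delegate type would work the same way.
public delegate void MyDelegate(object sender, EventArgs e);

public class Subscriber
{
    public void Attach(Publisher publisher)
    {
        publisher.CheckEvent += new MyDelegate(OnSomethingHappened); // goes through the add accessor
    }

    public void Detach(Publisher publisher)
    {
        publisher.CheckEvent -= new MyDelegate(OnSomethingHappened); // goes through the remove accessor
    }

    private void OnSomethingHappened(object sender, EventArgs e)
    {
        // react to the event here
    }
}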
In VB.NET we have a third accessor too, RaiseEvent, which is invoked whenever the event is raised from within the code.

Use using and try/finally Blocks for Resource Cleanup
Continuing in this direction, it is always better to use a using block whenever you work with disposable objects. The using block automatically calls Dispose() on any object that implements IDisposable as soon as control leaves the block; with a try/finally block you achieve the same effect by calling Dispose() yourself in the finally section. Either way, the resource is released deterministically. See the example below:

public void Execute(string connectionstring, string sql)
{
    SqlConnection con = new SqlConnection(connectionstring);
    SqlCommand cmd = new SqlCommand(sql, con);
    con.Open();
    cmd.ExecuteNonQuery();
    cmd.Dispose(); // never reached if ExecuteNonQuery throws
    con.Dispose();
}


In the above code snippet we simply create a SqlConnection and a SqlCommand. Both objects implement IDisposable, but if ExecuteNonQuery throws, neither is disposed. It is therefore better to rewrite the code as below:
public void Execute(string connectionstring, string sql)
{
    using (SqlConnection con = new SqlConnection(connectionstring))
    {
        using (SqlCommand cmd = new SqlCommand(sql, con))
        {
            con.Open();
            cmd.ExecuteNonQuery();
        }
    }
}

Rewritten like this, Dispose is called automatically and we do not need to call it ourselves, even if an exception is thrown. Therefore it is better to make use of the using statement for quick resource deallocation.

You can also get the same effect with a try/finally block, which is essentially what the using statement compiles into:
SqlConnection con = new SqlConnection(connectionstring);
try
{
    SqlCommand cmd = new SqlCommand(sql, con);
    try
    {
        con.Open();
        cmd.ExecuteNonQuery();
    }
    finally
    {
        cmd.Dispose(); // runs even if ExecuteNonQuery throws
    }
}
finally
{
    con.Dispose(); // runs even if creating the command or opening the connection throws
}


Next, it is often better to use "as" or "is" rather than a direct cast. That is, when we want to convert between reference types, we should prefer the "as" keyword over an explicit cast.

object o = new SqlConnection();
SqlConnection con1 = o as SqlConnection;   // Better to use this
SqlConnection con2 = (SqlConnection)o;     // Not always better

In the two statements above, if you use the direct cast instead of "as", it throws an InvalidCastException when o cannot be converted to that type. The "as" operator never throws; if the conversion fails it simply assigns null to the variable, so all you need afterwards is a null check.
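Here is a small sketch of the two patterns side by side; GetConnectionObject is just a hypothetical method that hands back an object, and SqlConnection assumes the System.Data.SqlClient namespace.

public void Convert()
{
    object o = GetConnectionObject(); // hypothetical source of an object

    // Pattern 1: test with "is" first, then cast. The cast is now safe.
    if (o is SqlConnection)
    {
        SqlConnection con = (SqlConnection)o;
        con.Open();
    }

    // Pattern 2: convert with "as" and check for null. No exception is ever thrown.
    SqlConnection con2 = o as SqlConnection;
    if (con2 != null)
    {
        con2.Open();
    }
}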

Use a Structure when Calling a Function

It is good to call functions with a small number of arguments. Pushing many separate arguments onto the stack generally takes more time than passing one object that groups them. Try creating a structure for all the arguments you want to send, and pass the structure directly. Since a structure is a value type, it is copied on the stack and, as long as the parameter is typed as the structure itself, no boxing takes place.
public void CallMe(int x, int y, string zy)   // three separate arguments pushed individually
public void CallMe(ArgumentStruct st)         // better for performance: one grouped value

Thus it is often better to send a structure rather than a long list of discrete arguments.
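A minimal sketch of the idea; the field names here are illustrative assumptions that simply mirror the parameters above.

public struct ArgumentStruct
{
    public int X;
    public int Y;
    public string Zy;
}

public void CallMe(ArgumentStruct st)
{
    // work with st.X, st.Y and st.Zy instead of three separate parameters
}

public void Caller()
{
    ArgumentStruct args = new ArgumentStruct();
    args.X = 1;
    args.Y = 2;
    args.Zy = "hello";
    CallMe(args); // one value-type argument instead of three
}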

Better to have one Large Assembly rather than having a number of Small Assemblies

Similar to what I mentioned earlier, it is good practice to have one large assembly containing many namespaces rather than a number of small class libraries, one for each namespace. Even Microsoft does this by packing many namespaces into a single assembly, mscorlib.dll, which reduces the metadata overhead, JIT compilation time, security checks and so on.

Better to avoid Threading unless it is unavoidable

Generally, using many threads can hurt performance, because each thread takes its own chunk of memory from the process (for its stack) in order to run independently. Does that seem strange to you? It is true. When you need work done in parallel you can use threading, but it will increase memory consumption.
When you do create threads, prefer the ThreadPool.
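A minimal sketch of handing work to the thread pool instead of spinning up a dedicated Thread; DoWork is a hypothetical method and ThreadPool lives in the System.Threading namespace. The pool reuses a small set of threads, so you do not pay for a new thread stack on every piece of work.

public class Worker
{
    public void Start()
    {
        // Queues DoWork onto a pooled thread; no new Thread object (and stack) is created here.
        ThreadPool.QueueUserWorkItem(new WaitCallback(DoWork), "some state");
    }

    private void DoWork(object state)
    {
        // long-running work goes here; "state" carries the argument passed above
    }
}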

Avoid ArrayList and Hashtable; go for linked structures when you need to insert data at random positions

You may be as surprised by this as I was. If you look at the internal structure of an ArrayList or a Hashtable, each is essentially a wrapper around an array: an ArrayList is an array of objects, while a Hashtable is an array of bucket structures. Whenever you insert an object into the middle of such a structure, every element after the insertion point has to be shifted to make room.
Another thing worth knowing is how these collections grow: when the internal array is full, a new, larger array is allocated (starting from a small default capacity and roughly doubling each time) and the existing elements are copied across. LinkedList<T> and the generic collections generally perform better than the old collection classes when you need random insertion, as sketched below; the non-generic collections are fine when you only need to append data and read it back in sequence.
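As a rough illustration of the difference (System.Collections and System.Collections.Generic assumed imported): inserting into the middle of an ArrayList shifts every element after the insertion point, while a LinkedList<T> just relinks a couple of nodes.

public void CompareInsertion()
{
    // ArrayList: Insert(1, ...) must shift every following element one slot to the right.
    ArrayList arrayList = new ArrayList();
    arrayList.Add("a");
    arrayList.Add("c");
    arrayList.Insert(1, "b"); // O(n) because of the shifting

    // LinkedList<T>: inserting after a known node only updates a couple of references.
    LinkedList<string> linkedList = new LinkedList<string>();
    LinkedListNode<string> first = linkedList.AddLast("a");
    linkedList.AddLast("c");
    linkedList.AddAfter(first, "b"); // O(1), no shifting or reallocation
}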

I will write more about memory management once I have gathered some more experience worth sharing. Thanks for reading.

Thursday, October 9, 2008

Memory Management in .NET

In .NET, memory is managed through managed heaps. In most other languages, memory is managed through the operating system more directly: the program is given a specific amount of raw memory by the operating system and then uses it itself. In the .NET environment, memory is managed by the CLR (Common Language Runtime), which is why we call .NET memory management managed memory management.


Allocation of Memory

Generally a .NET program runs inside a host process. During debugging, Visual Studio creates a process using vshost.exe, which gives the programmer the basic debugging facilities of the IDE together with the CLR's managed memory management. After you deploy your application, the CLR creates a process named after your executable and allocates memory for it directly through the managed heaps.

When the CLR is loaded, two managed heaps are generally allocated: one for small objects and one for large objects, usually called the SOH (Small Object Heap) and the LOH (Large Object Heap). When code requests memory, the request goes to the CLR, which assigns memory from one of these managed heaps based on the object's size: objects smaller than 85,000 bytes (roughly 83 KB) are allocated on the SOH, and anything at or above that threshold goes to the LOH. As more and more memory is requested, .NET commits memory in smaller chunks.
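A small sketch to make the threshold visible; GC.GetGeneration reports 2 for large objects because the LOH is collected together with generation 2.

public void ShowHeapPlacement()
{
    byte[] small = new byte[1000];   // well under 85,000 bytes: allocated on the SOH
    byte[] large = new byte[100000]; // at or above 85,000 bytes: allocated on the LOH

    Console.WriteLine(GC.GetGeneration(small)); // 0: a new small object starts in generation 0
    Console.WriteLine(GC.GetGeneration(large)); // 2: LOH objects are reported as generation 2
}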

Now let's come to processes. A process can spawn multiple threads, as multi-threading is supported directly in .NET. When a process creates a new thread, that thread gets its own stack; for the main thread too, .NET creates a stack which keeps track of all the information associated with that particular thread, such as its current state and the depth of nested calls. Every thread, however, uses the same heap for memory: the heaps are shared across all threads.

When a thread requests memory, .NET allocates it from the shared heap and simply moves the allocation pointer forward to the next free address. This is in contrast to languages like C++, where memory is allocated from free lists managed by the operating system and each allocation requires searching for a big enough free block. A 32-bit .NET Windows application is still limited to a maximum of 2 GB of memory for a single process.

A 32-bit processor has 32 bits of address space for locating a single byte of data. That gives 2^32 unique addresses, about 4.29 billion, or 4 GB. By default this 4 GB of address space is split evenly into two parts: 2 GB for the kernel and 2 GB for application usage.


De-allocation of Memory

De-allocation of memory is also different from normal Win32 applications. .NET has a sophisticated mechanism for reclaiming memory called the Garbage Collector. The garbage collector runs inside the runtime environment and traces the code running under .NET. It keeps track of all the accessible paths to the objects in your code through a graph of objects: the relationships between objects and the roots that reference them are maintained in this graph. When a garbage collection is triggered, the collector initially treats every object as garbage, then walks recursively from the roots through all the paths in the graph looking for reachable objects; every object it reaches is marked as reachable. When this marking is finished, the collector knows which objects are reachable and which are not; the unreachable ones are garbage. It then reclaims the unreachable objects and compacts the heap by moving the surviving reachable objects down over the freed space, and the dead objects are removed from the graph. Garbage collection is generally triggered when the heap is close to exhaustion, when the application exits, or when a process running under the managed environment is killed.

The garbage collector does not immediately treat an object as garbage if it implements a Finalize method. During collection, it first checks the object's metadata for a finalizer. If the object has implemented Finalize(), the garbage collector does not reclaim it right away; instead the object is kept alive and a reference to it is placed on the finalization queue. Finalization is handled by a separate thread, the finalizer thread, which walks the finalization queue, calls Finalize on each of those objects and only then marks them for collection. So if an object holds an expensive resource, Finalize can be used to release it. But there is also a problem with this: an object with a finalizer may remain in memory for a long time even after it has become unreachable, and because Finalize is called from a separate thread, there is no way to invoke it deterministically when the object's life cycle ends.
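In C#, Finalize is written using destructor syntax. Here is a minimal sketch (ResourceHolder is just an illustrative name) of a class whose instances therefore end up on the finalization queue:

public class ResourceHolder
{
    // The C# compiler turns this destructor into a Finalize override. Instances of this
    // class are placed on the finalization queue, survive at least one extra collection,
    // and are finalized by the dedicated finalizer thread before their memory is reclaimed.
    ~ResourceHolder()
    {
        // release expensive unmanaged resources here (handles, native memory, ...)
    }
}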

Because of this, .NET provides a more deterministic mechanism called Dispose, which can be invoked manually when you are done with an object. The idea is to put the resource-release code in Dispose and call it yourself, rather than relying on Finalize, since Finalize() delays the garbage collection of the object.

Cost of Finalize in your Program:

Now let us talk about the cost you pay if you take the non-deterministic route and implement Finalize in your class. To understand it, you need to know how the GC works in the CLR:

Generation 0 objects are the objects allocated since the last garbage collection ran. Generation 1 objects are those that have survived one GC cycle; likewise for generation 2 objects and so on. Roughly speaking, the GC examines generation 1 only about once for every ten generation 0 collections, and generation 2 only about once for every hundred, so the higher the generation an object reaches, the less often it is even considered for collection.

Now think about Finalize: an object that implements Finalize survives at least one collection beyond the point where it would otherwise have been reclaimed, so it can linger for many extra cycles. If it still has not been finalized, it is promoted to generation 2 and has to wait through the far less frequent generation 2 collections before it is finally reclaimed. This makes Finalize generally very expensive in your program.
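A quick sketch you can run to watch promotion happen: GC.GetGeneration reports which generation an object currently belongs to, and forcing a collection with GC.Collect promotes the surviving object.

public void ShowPromotion()
{
    object o = new object();
    Console.WriteLine(GC.GetGeneration(o)); // 0: freshly allocated

    GC.Collect();                           // o survives because it is still referenced below
    Console.WriteLine(GC.GetGeneration(o)); // 1: promoted after surviving one collection

    GC.Collect();
    Console.WriteLine(GC.GetGeneration(o)); // 2: promoted again
}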

IDisposable implementation:


For a deterministic approach to resource deallocation, Microsoft introduced the IDisposable interface so that expensive resources can be cleared up explicitly.

Let us take an example:

private bool IsDisposed = false; // tracks whether Dispose has already run

public void Dispose()
{
    Dispose(true);
}

protected virtual void Dispose(bool isDisposing)
{
    if (IsDisposed) return; // Dispose may be called more than once; only run the cleanup once
    if (isDisposing)
    {
        // Dispose all managed resources here
    }
    IsDisposed = true;
    GC.SuppressFinalize(this); // the finalizer is no longer needed once we have been disposed
}

Now let us explain.
The check at the top of Dispose(bool) tests whether the object has already been disposed. This is essential because code may call Dispose multiple times, so we always need to verify whether the cleanup has already run. We then dispose the managed resources and set IsDisposed to true.
GC.SuppressFinalize suppresses the call to Finalize if one exists. If the user has already disposed the object and released all the expensive resources deterministically, there is no reason for the GC to keep the object around just to call its non-deterministic Finalize method during garbage collection.

For local objects, we can call Dispose directly once we are done with the object, or use a using block or a try/finally block so that disposal happens automatically.

Note: the using statement works only with objects whose type implements IDisposable. If you put an object that does not implement IDisposable in a using block, the compiler will report an error.
