Friday, October 21, 2011

Regular Expressions with Timeout in .NET 4.5

The .NET 4.5 Developer Preview is out with Visual Studio 11. I had been meaning to try out what's new in .NET 4.5 myself and share exactly what has changed.

Let's start with the new Regex API introduced with the framework. The improvement is minor, yet handy in certain cases: the Regex class in .NET 4.5 supports a timeout. Let's take a look at how to work with it.



Let's write a very simple Regex validator to try it out.
try
{
    Regex regexpr = new Regex("[A-Z ]{10}", RegexOptions.Singleline, TimeSpan.FromMilliseconds(1));
    Match mch = regexpr.Match("ABHISHEK SUR");
    if (mch.Success)
        Console.WriteLine("Match found");
    else
        Console.WriteLine("Not matched");

}
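catch (RegexMatchTimeoutException ex)
{
    // ex.Input is the text that was being scanned; ex.MatchTimeout is the limit that was exceeded.
    Console.WriteLine("Regex Timeout for {1} after {2} elapsed. Tried pattern {0}", ex.Pattern, ex.Input, ex.MatchTimeout);
}
catch (ArgumentOutOfRangeException ex)
{
    // Thrown by the constructor when the timeout value is not valid (for example zero).
    Console.WriteLine(ex.ToString());
}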
finally
{
    Console.ReadKey(true);
}

In this code I simply check a string against a regular expression. The match succeeds because the pattern matches the first ten characters of the string. The code differs slightly from what we have been writing for the last few years: a new constructor overload of Regex accepts a TimeSpan argument, the timeout interval. If a matching operation such as Match runs longer than that interval, it throws a RegexMatchTimeoutException.
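The pattern in the example is too cheap to ever hit the limit, so here is a minimal sketch of a case where the timeout should actually fire. The class name TimeoutDemo, the helper TimesOut, and the pattern "^(a+)+$" are my own illustration, not part of the framework; the assumption is that the nested quantifiers make the backtracking engine try an exponential number of combinations on this input.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TimeoutDemo
{
    // Hypothetical helper: returns true if matching 'input' against
    // 'pattern' runs longer than 'timeout'.
    public static bool TimesOut(string pattern, string input, TimeSpan timeout)
    {
        try
        {
            new Regex(pattern, RegexOptions.None, timeout).IsMatch(input);
            return false;
        }
        catch (RegexMatchTimeoutException)
        {
            return true;
        }
    }

    public static void Main()
    {
        // A long run of 'a' followed by a character that can never match:
        // the engine keeps retrying different ways to split the run.
        string input = new string('a', 40) + "!";
        Console.WriteLine(TimesOut("^(a+)+$", input, TimeSpan.FromMilliseconds(100)));

        // The pattern used in this post finishes well within its limit.
        Console.WriteLine(TimesOut("[A-Z ]{10}", "ABHISHEK SUR", TimeSpan.FromSeconds(1)));
    }
}
```

On the runtimes I would expect this to be tried on, the first call should report a timeout and the second should not, but the exact behavior depends on how the engine handles that pattern.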

You can pass Regex.InfiniteMatchTimeout to specify that a match never times out. Its value is -1 ms internally, so TimeSpan.FromMilliseconds(-1) has the same effect; never timing out is also the default behavior of the Regex class when no timeout is configured. Regex can also take its default timeout from the AppDomain: store a TimeSpan under the name "REGEX_DEFAULT_MATCH_TIMEOUT" in the AppDomain data and it applies to every regular expression in that AppDomain that does not specify its own timeout. Let's take a look at how it works.

try
{
    // Register the AppDomain-wide default timeout; do this before the Regex class is first used.
    AppDomain.CurrentDomain.SetData("REGEX_DEFAULT_MATCH_TIMEOUT", TimeSpan.FromMilliseconds(2));

    Regex regexpr = new Regex("[A-Z ]{10}", RegexOptions.Singleline);
    Match mch = regexpr.Match("ABHISHEK SUR");
    if (mch.Success)
        Console.WriteLine("Match found");
    else
        Console.WriteLine("Not matched");

}
catch (RegexMatchTimeoutException ex)
{
    // ex.Input is the text that was being scanned; ex.MatchTimeout is the limit that was exceeded.
    Console.WriteLine("Regex Timeout for {1} after {2} elapsed. Tried pattern {0}", ex.Pattern, ex.Input, ex.MatchTimeout);
}
catch (ArgumentOutOfRangeException ex)
{
    Console.WriteLine(ex.ToString());
}
finally
{
    Console.ReadKey(true);
}

This works exactly like the previous example. Here the Regex class reads the AppDomain value and applies it as the default timeout. If the value is not present, the default is -1 ms, which means an infinite timeout. If a timeout is passed explicitly to a constructor, Regex is smart enough to use that value instead of the AppDomain default, and only for the object it was specified on. If the AppDomain value is not a valid TimeSpan, using Regex throws a TypeInitializationException. Let's check the internal structure.

Internally, the timeout is enforced while the string is scanned against the pattern: the scanning code calls CheckTimeout, which checks whether the interval specified for the object has elapsed. CheckTimeout throws the exception itself.

The constructor takes its timeout from DefaultMatchTimeout, which is populated from the AppDomain data elements, unless a timeout is passed explicitly.
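The precedence described above can be observed through the public MatchTimeout property. The following is only a sketch: the class name DefaultTimeoutDemo and the method ObserveTimeouts are made up for illustration, and it assumes that nothing in the process has touched the Regex class before SetData runs, since the default appears to be read only once.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DefaultTimeoutDemo
{
    // Hypothetical helper: returns the effective timeout of three Regex objects.
    public static TimeSpan[] ObserveTimeouts()
    {
        // Assumption: this runs before the Regex type is first used in the AppDomain.
        AppDomain.CurrentDomain.SetData("REGEX_DEFAULT_MATCH_TIMEOUT", TimeSpan.FromSeconds(2));

        // No timeout argument: falls back to the AppDomain default.
        Regex usesDefault = new Regex("[A-Z ]{10}");
        // Explicit timeout: overrides the default for this object only.
        Regex usesExplicit = new Regex("[A-Z ]{10}", RegexOptions.None, TimeSpan.FromMilliseconds(500));
        // Explicit opt-out: this object never times out.
        Regex neverTimesOut = new Regex("[A-Z ]{10}", RegexOptions.None, Regex.InfiniteMatchTimeout);

        return new[] { usesDefault.MatchTimeout, usesExplicit.MatchTimeout, neverTimesOut.MatchTimeout };
    }

    public static void Main()
    {
        foreach (TimeSpan timeout in ObserveTimeouts())
            Console.WriteLine(timeout);
    }
}
```

Under that assumption the three values should be two seconds, 500 milliseconds, and -1 millisecond (Regex.InfiniteMatchTimeout) respectively.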


If you read MSDN thoroughly, it recommends setting a timeout whenever you use regular expressions. This matters most when the pattern is supplied from an external source or you are not sure which pattern will be applied to the string. You should also set a reasonable AppDomain-wide default so that no regular expression can ever hang your application.
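For patterns that come from outside, that advice could be wrapped in a small helper like the one below. This is a sketch rather than a recommended design: SafeRegex and TryIsMatch are hypothetical names of my own, and returning null for "could not decide" is just one possible convention. It relies on the static Regex.IsMatch overload that takes a timeout.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SafeRegex
{
    // Hypothetical helper: true or false when the match completes in time,
    // null when the pattern is unusable or the match takes too long.
    public static bool? TryIsMatch(string input, string pattern, TimeSpan timeout)
    {
        try
        {
            return Regex.IsMatch(input, pattern, RegexOptions.None, timeout);
        }
        catch (RegexMatchTimeoutException)
        {
            // The match ran longer than 'timeout'.
            return null;
        }
        catch (ArgumentException)
        {
            // Invalid pattern, null argument, or out-of-range timeout.
            return null;
        }
    }

    public static void Main()
    {
        TimeSpan limit = TimeSpan.FromSeconds(1);
        Console.WriteLine(TryIsMatch("ABHISHEK SUR", "[A-Z ]{10}", limit));
        Console.WriteLine(TryIsMatch("abc", "(", limit) == null);
    }
}
```

A caller can then treat null as a rejected pattern instead of letting one bad expression stall the whole request.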

This is a small tip on the new Regex enhancements introduced with .NET 4.5. I hope you like it. More to come shortly, stay tuned.

Thank you for reading
