Friday, October 28, 2011

Classes, Structs and LLBLGen Pro

I just found an interesting bug that has trickled into some of our code.

Can you identify what is wrong with the following code (don’t cheat by reading below):
Guid? enrollmentId = (from e in metaData.Enrollment
                      where e.LearnerId == learnerId 
                      select e.Id).FirstOrDefault();

The problem comes from the way that FirstOrDefault() operates. In LLBLGen's case, that method will either take the first value, or it will set the default value if it does not find a match. This is where it is important to know what you are selecting. In this case, the query is selecting the e.Id, which is a Guid. A Guid can never be null because it is a struct and it is not a class.
Other things that are structs include: bool, int, short, long, byte, DateTime, KeyValuePair, etc...
If you are selecting a value that is a class, the default value will be NULL because classes are reference types.
If you are selecting a value that is a struct, the default value will never be NULL because structs are value types.
Side Note: The default value for a struct can be determined generically using the following (Where T is a type of struct):
T defaultValueForType = default(T);
So how do we select a struct using FirstOrDefault()?
You can work around this by selecting a class instead. The best way to do it is to use an Anonymous Type as shown below.
var enrollmentId = (from e in metaData.Enrollment
                    where e.LearnerId == learnerId 
                    select new { e.Id }).FirstOrDefault();

Anonymous Types are classes, therefore they can be null. By using an anonymous type in our query, the enrollmentId variable will be NULL if there are no records found.

Links

Sunday, May 1, 2011

.NET 4 Concurrent Dictionary Gotchas

While the .NET ConcurrentDictionary<T, V> is thread-safe, not all of its methods are atomic. Microsoft points out (http://msdn.microsoft.com/en-us/library/dd997369.aspx) that the GetOrAdd and AddOrUpdate methods that take in a delegate invoke the delegate outside of the locking mechanism used by the dictionary.

Because the locking is not employed when invoking the delegate, it could get executed more than once. I wrote some sample code and posted it onto a new StackOverflow community called CodeReview to get some feedback from the community on an approach to use the delegate overloads while maintaining atomicity. I found that by writing an extension method for the delegate methods I could use the .NET Lazy<T> class as a wrapper for the value in the dictionary. Because the Lazy<T> class is only evaluated once, the new methods are atomic.

Pinned below is the CodeReview post.

Tuesday, January 4, 2011

.NET 4 Concurrent Dictionary Collection

In ASP.NET we use a few static generic dictionaries that store data which needs to be accessed very quickly. Generic dictionaries are not thread-safe when adding items so we had to provide locks around all access to the collection.

Example using locks to add thread-safety when manipulating a generic dictionary.
private static readonly object cachedAnonymousIdentitiesLock = new object();
private static readonly Dictionary<Guid, Identity> cachedAnonymousIdentities = 
    new Dictionary<Guid, Identity>();

private Identity GetAnonymousIdentity(Guid orgId)
{
    Identity id = null;
    lock (cachedAnonymousIdentitiesLock)
    {
        if (cachedAnonymousIdentities.ContainsKey(orgId))
            id = cachedAnonymousIdentities[orgId];
        else
        {
            id = new Identity(orgId);
            cachedAnonymousIdentities.Add(orgId, id);
        }
    }
    return id;
}

The code above uses a mutual exclusion lock to provide thread-safe access to the generic dictionary which contains an immutable Identity object. While this code works fine, it does take a performance hit because of the blocking mutual exclusion lock. If a thread becomes blocked because of the lock, extra overhead is encountered because a context switch in the OS will occur. We really only need to maintain a lock for a very small amount of time while a new item is added to the collection (and truthfully a spin lock would be more efficient here).

Now in case you haven't heard, there is a new namespace in .NET 4 for concurrent collections. When I found out about this I immediately got the refactor itch. The thread-safe collections are discussed on MSDN. Looking around I found the System.Collections.Concurrent.ConcurrentDictionary<TKey, TValue> class. This class uses a spin lock instead of a blocking lock for thread-safe access. Additionally the class provides a few friendly methods that make adding a new item to the dictionary easier.  The GetOrAdd method is a welcomed method that allows you to quickly either get the item or add it to the dictionary.  Below is the same code above refactored using the ConcurrentDictionary.

private static readonly ConcurrentDictionary<Guid, Identity> cachedAnonymousIdentities =
    new ConcurrentDictionary<Guid, Identity>();

private Identity GetAnonymousIdentity(Guid orgId)
{
    return cachedAnonymousIdentities.GetOrAdd(orgId, key => new Identity(orgId));
}
The code is easy to follow and the locking is done automatically! I ran a quick unit test to double check the efficiency of the ConcurrentDictionary over our previous blocking method. The results show that in our scenario we do achieve a performance boost. Running the example below on a multi-core machine results in the blocking example executing in about 450 milliseconds while the concurrent example executes in about 350 milliseconds.

private readonly static ConcurrentDictionary<Guid, string> myThreadsafeObjects = new ConcurrentDictionary<Guid, string>();
private readonly static Dictionary<Guid, string> myObjects = new Dictionary<Guid, string>();
private readonly static object lockObj = new object();

[TestMethod]
public void LockingTest()
{
    int iterations = 1000000;
    string value = "hello world";
    int keyCount = 10000;
    List<Guid> keys = new List<Guid>(keyCount);
    for (int i = 0; i < keyCount; i++)
        keys.Add(Guid.NewGuid());
   
    Stopwatch lockSw = new Stopwatch();
    lockSw.Start();
    Thread t1 = new Thread(() =>
    {
        for (int i = 0; i < iterations; i++)
        {
            lock (lockObj)
            {
                if (!myObjects.ContainsKey(keys[i % keyCount]))
                    myObjects.Add(keys[i % keyCount], value);
                string itemValue = myObjects[keys[i % keyCount]];
            }
        }
    });
    t1.Start();
    for (int i = 0; i < iterations; i++)
    {
        lock (lockObj)
        {
            if (!myObjects.ContainsKey(keys[i % keyCount]))
                myObjects.Add(keys[i % keyCount], value);
            string itemValue = myObjects[keys[i % keyCount]];
        }
    }
    t1.Join();
    lockSw.Stop();
    Trace.WriteLine(string.Format("Blocking Dictionary lock test: {0} milliseconds", lockSw.ElapsedMilliseconds));

    Stopwatch concurrentSw = new Stopwatch();
    concurrentSw.Start();
    Thread t2 = new Thread(() =>
    {
        for (int i = 0; i < iterations; i++)
        {
            lock (lockObj)
            {
                string itemValue = myThreadsafeObjects.GetOrAdd(keys[i % keyCount], value);
            }
        }
    });
    t2.Start();
    for (int i = 0; i < iterations; i++)
    {
        lock (lockObj)
        {
            string itemValue = myThreadsafeObjects.GetOrAdd(keys[i % keyCount], value);
        }
    }
    t2.Join();
    concurrentSw.Stop();
    Trace.WriteLine(string.Format("Concurrent Dictionary lock test: {0} milliseconds", concurrentSw.ElapsedMilliseconds));
}

I'll take the simpler code and the performance gain. Great job MS on this new .NET class!

In an upcoming post I'll look into the use of the new BlockingCollection class in the Concurrent namespace. The class implements a new Producer-Consumer interface which is a pattern we reply upon in a few scenarios and where we have implemented the best practice ourselves. I am hoping this new collection will again simplify our code while improving efficiency.

Links