Tuesday, January 4, 2011

.NET 4 Concurrent Dictionary Collection

In ASP.NET we use a few static generic dictionaries that store data which needs to be accessed very quickly. Generic dictionaries are not thread-safe when adding items so we had to provide locks around all access to the collection.

Example using locks to add thread-safety when manipulating a generic dictionary.
private static readonly object cachedAnonymousIdentitiesLock = new object();
private static readonly Dictionary<Guid, Identity> cachedAnonymousIdentities = 
    new Dictionary<Guid, Identity>();

private Identity GetAnonymousIdentity(Guid orgId)
{
    Identity id = null;
    lock (cachedAnonymousIdentitiesLock)
    {
        if (cachedAnonymousIdentities.ContainsKey(orgId))
            id = cachedAnonymousIdentities[orgId];
        else
        {
            id = new Identity(orgId);
            cachedAnonymousIdentities.Add(orgId, id);
        }
    }
    return id;
}

The code above uses a mutual exclusion lock to provide thread-safe access to the generic dictionary which contains an immutable Identity object. While this code works fine, it does take a performance hit because of the blocking mutual exclusion lock. If a thread becomes blocked because of the lock, extra overhead is encountered because a context switch in the OS will occur. We really only need to maintain a lock for a very small amount of time while a new item is added to the collection (and truthfully a spin lock would be more efficient here).

Now in case you haven't heard, there is a new namespace in .NET 4 for concurrent collections. When I found out about this I immediately got the refactor itch. The thread-safe collections are discussed on MSDN. Looking around I found the System.Collections.Concurrent.ConcurrentDictionary<TKey, TValue> class. This class uses a spin lock instead of a blocking lock for thread-safe access. Additionally the class provides a few friendly methods that make adding a new item to the dictionary easier.  The GetOrAdd method is a welcomed method that allows you to quickly either get the item or add it to the dictionary.  Below is the same code above refactored using the ConcurrentDictionary.

private static readonly ConcurrentDictionary<Guid, Identity> cachedAnonymousIdentities =
    new ConcurrentDictionary<Guid, Identity>();

private Identity GetAnonymousIdentity(Guid orgId)
{
    return cachedAnonymousIdentities.GetOrAdd(orgId, key => new Identity(orgId));
}
The code is easy to follow and the locking is done automatically! I ran a quick unit test to double check the efficiency of the ConcurrentDictionary over our previous blocking method. The results show that in our scenario we do achieve a performance boost. Running the example below on a multi-core machine results in the blocking example executing in about 450 milliseconds while the concurrent example executes in about 350 milliseconds.

private readonly static ConcurrentDictionary<Guid, string> myThreadsafeObjects = new ConcurrentDictionary<Guid, string>();
private readonly static Dictionary<Guid, string> myObjects = new Dictionary<Guid, string>();
private readonly static object lockObj = new object();

[TestMethod]
public void LockingTest()
{
    int iterations = 1000000;
    string value = "hello world";
    int keyCount = 10000;
    List<Guid> keys = new List<Guid>(keyCount);
    for (int i = 0; i < keyCount; i++)
        keys.Add(Guid.NewGuid());
   
    Stopwatch lockSw = new Stopwatch();
    lockSw.Start();
    Thread t1 = new Thread(() =>
    {
        for (int i = 0; i < iterations; i++)
        {
            lock (lockObj)
            {
                if (!myObjects.ContainsKey(keys[i % keyCount]))
                    myObjects.Add(keys[i % keyCount], value);
                string itemValue = myObjects[keys[i % keyCount]];
            }
        }
    });
    t1.Start();
    for (int i = 0; i < iterations; i++)
    {
        lock (lockObj)
        {
            if (!myObjects.ContainsKey(keys[i % keyCount]))
                myObjects.Add(keys[i % keyCount], value);
            string itemValue = myObjects[keys[i % keyCount]];
        }
    }
    t1.Join();
    lockSw.Stop();
    Trace.WriteLine(string.Format("Blocking Dictionary lock test: {0} milliseconds", lockSw.ElapsedMilliseconds));

    Stopwatch concurrentSw = new Stopwatch();
    concurrentSw.Start();
    Thread t2 = new Thread(() =>
    {
        for (int i = 0; i < iterations; i++)
        {
            lock (lockObj)
            {
                string itemValue = myThreadsafeObjects.GetOrAdd(keys[i % keyCount], value);
            }
        }
    });
    t2.Start();
    for (int i = 0; i < iterations; i++)
    {
        lock (lockObj)
        {
            string itemValue = myThreadsafeObjects.GetOrAdd(keys[i % keyCount], value);
        }
    }
    t2.Join();
    concurrentSw.Stop();
    Trace.WriteLine(string.Format("Concurrent Dictionary lock test: {0} milliseconds", concurrentSw.ElapsedMilliseconds));
}

I'll take the simpler code and the performance gain. Great job MS on this new .NET class!

In an upcoming post I'll look into the use of the new BlockingCollection class in the Concurrent namespace. The class implements a new Producer-Consumer interface which is a pattern we reply upon in a few scenarios and where we have implemented the best practice ourselves. I am hoping this new collection will again simplify our code while improving efficiency.

Links