System.Linq – Unleash the power!

I’ve been poking about with the System.Linq namespace in C# 3.0 (.NET Framework 3.5) this week.  Like most “new” areas of a language, it can be difficult to find the time or motivation to delve into the capabilities it provides, but I’ve just had it proven to me that I should have taken the plunge months ago.

In one small project, I have to compare two generic lists and pull out the records that are equal across the sets, as well as those that are different from one set to another (additions and deletions).  Luckily, the sets are fairly small, so the implementation I wrote last week is good enough to get the job done.  This week, however, I decided to do a quick benchmark to see how much faster it would be to use the new Linq-y methods.

First up, I created a small test class – MyType.

public class MyType
{
	public int MyId { get; set; }
	public string GivenName { get; set; }
	public string FamilyName { get; set; }
	public DateTime DateOfBirth { get; set; }

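	// Two records are equal when name and date of birth match; MyId is ignored.
	public override bool Equals(object obj)
	{
		MyType compareTo = obj as MyType;
		if (null == compareTo) return false;
		return (FamilyName == compareTo.FamilyName && GivenName == compareTo.GivenName && DateOfBirth == compareTo.DateOfBirth);
	}

	// Equals and GetHashCode must agree, so hash the same fields Equals compares.
	public override int GetHashCode()
	{
		return GivenName.GetHashCode() ^ FamilyName.GetHashCode() ^ DateOfBirth.GetHashCode();
	}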
}

Next up, I created a comparer class that implements IEqualityComparer<MyType>:

public class MyTypeComparer : IEqualityComparer<MyType>
{
	public bool Equals(MyType x, MyType y)
	{
		return (x.FamilyName == y.FamilyName  && x.GivenName == y.GivenName  && x.DateOfBirth == y.DateOfBirth);
	}

	public int GetHashCode(MyType obj)
	{
		return obj.GivenName.GetHashCode() ^ obj.FamilyName.GetHashCode() ^ obj.DateOfBirth.GetHashCode();
	}
}

Next we create two collections of 10,000 members each.  I did that using the following snippet.

private void SetupDataForTest(int numberOfRecords)
{
	recordSetOne = new List<MyType>();
	recordSetTwo = new List<MyType>();
	matches = new List<MyType>();
	matches2 = new List<MyType>();

	for (int i = 0; i < numberOfRecords; i++)
	{
		recordSetOne.Add(new MyType {MyId = i, GivenName="Trevor", FamilyName="Test" + i.ToString(), DateOfBirth = new DateTime(1970,1,1).AddDays(i)});

		if(i % 3 == 0)
		{
			recordSetTwo.Add(new MyType { MyId = i, GivenName = "Trevor", FamilyName = "Test" + i.ToString(), DateOfBirth = new DateTime(1970, 1, 1).AddDays(i) } );
		}
		else
		{
			recordSetTwo.Add(new MyType { MyId = i, GivenName="Ian", FamilyName="Wilson" + (i*2).ToString(), DateOfBirth = new DateTime(1971,1,1).AddDays(i*2) } );
		}
	}
}

So now we can get to the meat of the thing and find all the records that exist in both sets.  The old way of doing this was to loop through the first set and, for each member, loop through the second until we found a match (I’m not going to provide code – it’s too darned ugly).  Doing it this way took around 8 seconds on my dev machine.

The Linqy way, though, is to use the Enumerable.Intersect method:

private void RunLinqTest()
{
	var intersection = recordSetOne.Intersect(recordSetTwo, new MyTypeComparer());
	matches.AddRange(intersection);
}

Guess what?  Adding all the matches to a pre-existing collection returned the expected 3,334 results, and took 7 milliseconds.  Yes, milliseconds.  That means I could run this method about 1,142 times in the time it took to run the old way.
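
Why the huge gap?  Intersect doesn't compare every record with every other record; it works from hash codes, which is why the comparer needs a GetHashCode that agrees with Equals.  Here's a minimal sketch of that idea.  HashIntersect is a name I made up for illustration, and this is not the framework's actual source, just the same general technique built on HashSet<T>.

```csharp
using System;
using System.Collections.Generic;

public static class IntersectSketch
{
    // Hypothetical helper (not the framework's code): a hash-based
    // intersection that walks each sequence exactly once.
    public static IEnumerable<T> HashIntersect<T>(
        IEnumerable<T> first,
        IEnumerable<T> second,
        IEqualityComparer<T> comparer)
    {
        // Index the second sequence by the comparer's hash code.
        var lookup = new HashSet<T>(second, comparer);

        foreach (T item in first)
        {
            // Remove returns true only the first time a match is found,
            // so a value repeated in 'first' is yielded just once.
            if (lookup.Remove(item))
            {
                yield return item;
            }
        }
    }
}
```

With 10,000 records in each list, the nested loops can need up to 10,000 × 10,000 = 100,000,000 comparisons, whereas the hash-based version does roughly one hash lookup per record in each list.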

I’m not going to go into the specifics of how to pull out the added/deleted records (but I’ll give you a clue -> Enumerable.Except), but the performance gains were enormous there, too.
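
If you want to follow that clue, here's a rough sketch of how it might look.  The ExceptSketch class and its Added/Deleted method names are mine, purely for illustration, and I'm assuming one list is the "before" set and the other the "after" set.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ExceptSketch
{
    // Records present in 'current' that have no equal in 'previous'.
    public static List<T> Added<T>(
        IEnumerable<T> previous, IEnumerable<T> current, IEqualityComparer<T> comparer)
    {
        return current.Except(previous, comparer).ToList();
    }

    // Records present in 'previous' that have no equal in 'current'.
    public static List<T> Deleted<T>(
        IEnumerable<T> previous, IEnumerable<T> current, IEqualityComparer<T> comparer)
    {
        return previous.Except(current, comparer).ToList();
    }
}
```

With the test data above, that would be something like ExceptSketch.Added(recordSetOne, recordSetTwo, new MyTypeComparer()).  Note that Except, like Intersect, returns each distinct value only once, so duplicates within a list are collapsed.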