Friday, August 17, 2012

Performance Tips for Asynchronous Development in C#

In a recent online C# Corner column, "Exceptional Async Handling with Visual Studio Async CTP 3", I showed how the Visual Studio Async CTP (version 3), which extends Visual Studio 2010 SP1, handles aggregating exceptions that happen in background, asynchronous methods. In this column, I'm going to cover the mechanics of the Async framework and offer some tips on maximizing its performance.
Breaking Up Is Hard to Do
A deep dive into exactly how the C# compiler implements Async is beyond the scope of this article. Instead, I'll highlight how the compiler breaks up and rearranges my code, so that I can write it in a synchronous fashion but still have the runtime execute it asynchronously.

Here's a simple example in a Windows Forms application (project "BreakingUpAsync" in the sample code). I have a single button on my form and when I click it, the form's caption will display the current time for the next 15 seconds:
private async void button1_Click(object sender, EventArgs e)
{
  var now = DateTime.Now;
  button1.Enabled = false;
  for (var x = 0; x < 15; x++)
  {
    this.Text = now.AddSeconds(x).ToString("HH:mm:ss");
    await TaskEx.Delay(1000);
  }
  button1.Enabled = true;
}
Nothing fancy here. I disable the button at the start of the loop. Inside the loop, I update the form's caption and wait for one second. Finally, I re-enable the button.
If I removed the "async" and "await" keywords and changed TaskEx.Delay(1000) to Thread.Sleep(1000), I'd lock up the UI for the full 15 seconds. (See my previous column, "Multithreading in WinForms," for more details.) However, thanks to Async support, this code runs just fine with a fully responsive UI. How?
First, I pull out ILSpy, an open source .NET assembly browser and decompiler. ILSpy makes inspecting the IL generated by the C# compiler much easier. If you're a fan of IL, just use the MSIL Disassembler (Ildasm.exe).
Here's what my button1_Click event handler looks like after it's compiled (I've massaged the names a bit because the type names generated by the compiler can be pretty ugly to read):
private void button1_Click(object sender, EventArgs e)
{
  Form1.button1ClickCode clickInstance = new Form1.button1ClickCode(0);
  clickInstance.<>4__this = this;
  clickInstance.sender = sender;
  clickInstance.e = e;
  clickInstance.<>t__MoveNextDelegate = new Action(clickInstance.MoveNext);
  clickInstance.$builder = AsyncVoidMethodBuilder.Create();
  clickInstance.MoveNext();
}
No loop code. No enabling or disabling of the button. Where's the code I wrote? Notice the first thing this code does is create an instance of a class called button1ClickCode. This is a compiler-generated class that contains the code I originally put in the event handler, along with a bunch of state-based mechanics to handle asynchrony.
It's important to notice a few key things here. First off, this code is creating a new object. The Microsoft .NET Framework is pretty quick at allocating objects, but not without cost. This doesn't mean you should avoid Async. Quite the opposite: Writing code to handle this asynchronously without the Async framework might require even more objects to be created. Just be aware that this happens, and try not to make a bunch of fine-grained Async methods. Instead, opt for larger Async methods.
The next thing to notice is that the arguments of the event handler ("sender" and "e") are passed along to the button1ClickCode instance. Every local variable is "lifted" to this class. This is necessary because the code I wrote (which gets manipulated and placed in the special button1ClickCode class) probably uses those locals and, therefore, needs access to them. If I look at the generated code for the button1ClickCode class, I'll see:
  • A Form field, which has a reference to my form.
  • An object field, which has a reference to my "sender" argument.
  • An EventArgs field, which has a reference to my "e" argument.
  • A DateTime field that represents the "now" variable.
  • A field-level int to hold onto my "x" loop counter.
The compiler is creating a whole new object for this Async method (as I noted earlier). Now I see that this object's size can be affected by how I write my Async method. A bigger object means more pressure on memory, which leads to more garbage collections and decreased performance.
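To make the field list above concrete, here is a hand-written sketch of the data such a class carries. It is an illustration only: the class name, the field names and the State field are assumptions, and the real compiler-generated type has compiler-chosen names and additional members (such as the builder and the MoveNext method).

```csharp
using System;

// Illustration only: NOT the actual compiler output.
public sealed class Button1ClickStateSketch
{
    public int State;        // assumed: tracks which await the method is paused at
    public object Owner;     // stands in for the reference to the form
    public object Sender;    // lifted "sender" argument
    public EventArgs E;      // lifted "e" argument
    public DateTime Now;     // lifted "now" local
    public int X;            // lifted "x" loop counter
}
```

Every field here is paid for on each call to the Async method, which is why unused arguments and long-lived locals are worth a second look.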
I can limit the size of that generated class by how I write my Async methods. In the previous example, I'm not using "sender" or "e" and I really don't need to store the current DateTime -- I can grab it each time I need it in the loop with DateTime.Now. So I rearrange my Click event handler as shown in Listing 1.
Now when I use ILSpy to check out the generated class with my event handler code, there's no more reference to "sender," "e" or "now." I've trimmed three fields and, therefore, the resulting class has a smaller memory footprint. Granted, this is just a small example, but knowing this is happening can help you write better Async code.
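Listing 1 is not reproduced in this text. A minimal sketch of what such a rearranged handler might look like follows, assuming .NET 4.5's Task.Delay in place of the CTP's TaskEx.Delay. The surrounding Form1 scaffolding is hypothetical and is there only so the example stands alone.

```csharp
using System;
using System.Threading.Tasks;
using System.Windows.Forms;

public class Form1 : Form
{
    // Hypothetical scaffolding: a single button docked at the top of the form.
    private readonly Button button1 = new Button { Text = "Show time", Dock = DockStyle.Top };

    public Form1()
    {
        Controls.Add(button1);
        button1.Click += button1_Click;
    }

    private async void button1_Click(object sender, EventArgs e)
    {
        button1.Enabled = false;
        for (var x = 0; x < 15; x++)
        {
            // Read the clock on each pass instead of keeping a "now" local
            // alive across the await.
            this.Text = DateTime.Now.ToString("HH:mm:ss");
            await Task.Delay(1000);
        }
        button1.Enabled = true;
    }

    [STAThread]
    public static void Main()
    {
        Application.Run(new Form1());
    }
}
```

The only local that still has to survive across the await is the loop counter "x".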
The compiler-generated class that runs my code in the background (and thus, asynchronously) has to handle exceptions. That means it's wrapped in a try/catch block and has to handle storing and re-throwing the exception back on my UI thread should an exception happen. Again, not super-expensive in terms of memory/clock cycles, but it's important to know what you're getting into and be aware of it.
Finally, note the call to AsyncVoidMethodBuilder.Create inside the Click event handler. This is more setup for Async support. It also has a cost. Take a look at the StateMatchingBuilding project in the sample code. I have two empty methods: one I call synchronously and another I call asynchronously. If I sit in a loop and call each method about 10 million times, my laptop takes about 11 percent to 15 percent longer for the Async calls. Don't write Async methods just because you can -- write them because they make sense for your solution.
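A rough way to see this overhead for yourself is sketched below. This is not the code from the sample project; the class and method names are made up for illustration, and the numbers will vary by machine, runtime and build configuration.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class AsyncOverheadDemo
{
    private static void EmptySync() { }

#pragma warning disable 1998 // async method with no await: intentional here
    private static async Task EmptyAsync() { }
#pragma warning restore 1998

    // Runs the action the given number of times and returns the elapsed time.
    public static TimeSpan Time(Action action, int iterations)
    {
        var watch = Stopwatch.StartNew();
        for (var i = 0; i < iterations; i++)
        {
            action();
        }
        watch.Stop();
        return watch.Elapsed;
    }

    public static void Main()
    {
        const int iterations = 10000000;

        // Both calls go through a delegate so the comparison is like for like.
        TimeSpan syncTime = Time(() => { EmptySync(); }, iterations);
        // EmptyAsync finishes synchronously, so the returned Task is already complete.
        TimeSpan asyncTime = Time(() => { EmptyAsync(); }, iterations);

        Console.WriteLine("Sync:  " + syncTime.TotalMilliseconds + " ms");
        Console.WriteLine("Async: " + asyncTime.TotalMilliseconds + " ms");
    }
}
```

Run it in a Release build without the debugger attached; otherwise the measurement mostly reflects debugging overhead.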
Be Careful How You Wait
Another "gotcha" to watch out for is how you wait for an Async process to complete. Suppose I have the following method that does something and returns a Task:
public Task DoSomething()
{
  // Create and return Task that does something intensive
}
This method returns a Task, so there are two ways I can wait for it to finish. The best way would be to use the C# "await" keyword that I've been using:
public async void GoodWait()
{
  await DoSomething();
}
However, because DoSomething returns a Task, I could also just as easily use the Task Wait method:

public void BadWait()
{
  DoSomething().Wait();
}
The problem with the Wait method is that it's synchronous. The Task might be off doing something, but by calling Wait, my code sits right there inside the BadWait method until the Task completes. Imagine if this were in a Windows Forms app inside of a button click event. My UI would be locked waiting for the Task to complete.
On the other hand, when I use the "await" keyword, the compiler builds a state machine that moves my code into another class and runs it asynchronously -- so the waiting actually happens asynchronously. No UI lockups, and it avoids the deadlock that can occur when the caller blocks while the Async code is waiting to resume on that same caller's thread.
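The difference can be sketched in a small console program. The DoSomething body below is a hypothetical stand-in (a 200 ms delay using .NET 4.5's Task.Delay), and GoodWaitAsync returns a Task rather than void so that the caller can observe it; both are assumptions made for the demo.

```csharp
using System;
using System.Threading.Tasks;

public static class WaitDemo
{
    // Hypothetical stand-in for a method that starts some intensive work.
    public static Task DoSomething()
    {
        return Task.Delay(200);
    }

    // Returns to the caller at the await; the rest runs when the task finishes.
    public static async Task GoodWaitAsync()
    {
        await DoSomething();
    }

    // Blocks the calling thread until the task finishes.
    public static void BadWait()
    {
        DoSomething().Wait();
    }

    public static void Main()
    {
        Task pending = GoodWaitAsync();
        Console.WriteLine("After GoodWaitAsync, completed: " + pending.IsCompleted);

        // Blocking is tolerable here only because a console app has no UI thread.
        pending.Wait();

        BadWait();
        Console.WriteLine("BadWait returned only after its task finished.");
    }
}
```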
Cache Task Results When Possible
As I noted earlier, the C# compiler creates additional objects to handle the asynchronous implementation. More objects mean more pressure on the garbage collector. That, in turn, can have a negative impact on my application's performance. Here's another case where a few tweaks give me more performance from my code.
Let's say I have an application that has to check about 100 Web sites to see if they're up and running. Network calls and possible timeouts could negatively affect my application's responsiveness, so I'm going to do the site checks asynchronously.
For this example, I don't want to actually make 100 network calls, so I have a simple way to return a consistent set of data (see the project "CacheResults" in the sample code):
public static async Task<bool> SiteIsUpAsync(string url)
{
  return url.Length % 2 == 0;
}
The issue with this sample code is that every call to this method will result in either a true or a false result, but I'm creating a new Task for every call. This approach is going to create a lot of extra objects and put more pressure on the garbage collector.
Instead, I could cache an instance of Task for the "true" result and another Task for the "false" result. This approach only adds two objects and greatly reduces the amount of work the garbage collector has to do. The code is a little more involved, but the impact is huge, as shown in Listing 2.
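Listing 2 is not reproduced in this text. A minimal sketch of the caching idea follows, assuming .NET 4.5's Task.FromResult; the class name, field names and URLs are made up for illustration.

```csharp
using System;
using System.Threading.Tasks;

public static class SiteChecker
{
    // One completed task per possible result, created once and reused.
    private static readonly Task<bool> UpTask = Task.FromResult(true);
    private static readonly Task<bool> DownTask = Task.FromResult(false);

    // Not marked async: it hands back a cached task instead of having
    // the compiler build a new one on every call.
    public static Task<bool> SiteIsUpAsync(string url)
    {
        return url.Length % 2 == 0 ? UpTask : DownTask;
    }

    public static void Main()
    {
        Task<bool> first = SiteIsUpAsync("http://example.com/a");
        Task<bool> second = SiteIsUpAsync("http://example.org/b");

        Console.WriteLine("First result: " + first.Result);
        Console.WriteLine("Same task instance: " + ReferenceEquals(first, second));
    }
}
```

Callers can still await the returned task exactly as before; only the allocation pattern changes.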
When the Listing 2 code runs in a loop that checks 100 sites 100,000 times, my laptop gives me about a 55 percent to 60 percent increase in performance by caching the results (instead of returning a new result each time). Anytime you have results from an Async method that may be repeated from call to call, consider caching the results instead of creating a new result for each invocation.
The Microsoft Visual Studio Async framework is a great tool for your tool belt. Just make sure you understand some of the inner workings of the technology -- then you'll really see the benefits that asynchronous programming can bring to your applications.

The New Read-Only Collections in .NET 4.5

Eric Vogel covers some practical uses for the long-awaited interfaces IReadOnlyList and IReadOnlyDictionary in .NET Framework 4.5.
The Microsoft .NET Framework 4.5 includes the IReadOnlyList<T>, IReadOnlyDictionary<TKey, TValue> and IReadOnlyCollection<T> generic interfaces. The main benefit is that the new interfaces, except for IReadOnlyDictionary<TKey, TValue>, are covariant. This means that you can use a derived type as the generic parameter when passing a collection into a method that's defined for a base type. If you have a Dog class, for example, that derives from Animal, you can have a method that accepts an IReadOnlyList<Animal> and pass it an IReadOnlyList<Dog>.
The IReadOnlyCollection<T> interface, which forms the base of the IReadOnlyList<T> and IReadOnlyDictionary<TKey, TValue> interfaces, is defined as IReadOnlyCollection<out T>. The out modifier was added in .NET 4 to denote covariance, whereas the in modifier marks a type parameter as contravariant. A contravariant type parameter accepts a less derived type argument, such as a base class, in place of the one specified.
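The Dog and Animal case can be sketched as follows. The class bodies and the Describe method are hypothetical, written only to show the covariant conversion compiling and running.

```csharp
using System;
using System.Collections.Generic;

public class Animal
{
    public string Name { get; set; }
}

public class Dog : Animal
{
}

public static class CovarianceDemo
{
    // Defined for the base type only.
    public static string Describe(IReadOnlyList<Animal> animals)
    {
        return animals.Count + " animal(s), first: " + animals[0].Name;
    }

    public static void Main()
    {
        IReadOnlyList<Dog> dogs = new List<Dog> { new Dog { Name = "Rex" } };

        // Allowed because IReadOnlyList<out T> is covariant in T.
        Console.WriteLine(Describe(dogs));
    }
}
```

The same call would not compile with a parameter typed as List<Animal> or IList<Animal>, because those types are not covariant.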

Prior to .NET 4.5, the primary covariant collection interface was IEnumerable<T>. If you wanted a read-only view of a List<T> or a Dictionary<TKey, TValue>, you had to roll your own custom interface or class to get the full feature set. Enough theory; let's look at some real applications of these new interfaces.
A common scenario you may run into is storing a list of people or employees. The application may be a case management or customer relationship management system. Either way, you're dealing with similar class representations. For example, you might have a Person class that contains FirstName and LastName properties (Listing 1), and an Employee subclass that adds EIN and Salary properties (Listing 2). This is a very simplified view of a business domain, but it gets the picture across.
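The two listings are not reproduced in this text. A minimal sketch of the classes as described follows; the property types (int for EIN, decimal for Salary) are inferred from how the properties are initialized in the article's code, and the DomainDemo class is hypothetical.

```csharp
using System;

public class Person
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

public class Employee : Person
{
    public int EIN { get; set; }
    public decimal Salary { get; set; }
}

public static class DomainDemo
{
    public static void Main()
    {
        var hire = new Employee { EIN = 1, FirstName = "John", LastName = "Doe", Salary = 55000M };

        // An Employee can be used anywhere a Person is expected.
        Person asPerson = hire;
        Console.WriteLine(asPerson.FirstName + " " + asPerson.LastName);
    }
}
```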
You could then create a typed list of Employee objects and access them as a read-only collection using the new interfaces. In a real-world application, your employee list is likely to be quite large and retrieved from a database.
List<Employee> employees = new List<Employee>()
{
    new Employee() { EIN = 1, FirstName = "John", LastName = "Doe", Salary = 55000M },
    new Employee() { EIN = 2, FirstName = "Jane", LastName = "Doe", Salary = 55000M },
    new Employee() { EIN = 3, FirstName = "Don", LastName = "DeLuth", Salary = 55000M },
};
IReadOnlyCollection<T> is the most basic read-only collection interface and provides a Count property on top of its inherited IEnumerable<T> members. For example, you could store a read-only view of employees for a directory listing and easily retrieve the number of people.
IReadOnlyCollection<Person> directory = employees;
int numStaff = directory.Count;
The IReadOnlyList<T> interface is the same as IReadOnlyCollection<T> with the addition of an item indexer.
IReadOnlyList<Person> staff = employees;
Person firstHire = staff[0];
IReadOnlyList<T> would be well-suited for a read-only grid display of the needed items.
The IReadOnlyDictionary<TKey, TValue> interface, as its name suggests, provides a read-only view of the Dictionary<TKey, TValue> class. The accessible members include the Keys, Values and key indexer properties, in addition to the ContainsKey and TryGetValue methods.
Dictionary<int, Employee> einLookUp = employees.ToDictionary(x => x.EIN);
IReadOnlyDictionary<int, Employee> readOnlyStaff = einLookUp;
var eins = readOnlyStaff.Keys;
var allEmployees = readOnlyStaff.Values;
var secondStaff = readOnlyStaff[2];
bool haveThirdEin = readOnlyStaff.ContainsKey(3);
Employee test;
bool fourthExists = readOnlyStaff.TryGetValue(4, out test);
The IReadOnlyDictionary interface could prove useful for validation, as you would not need to modify the items but may want to quickly access them via a key, such as a control identifier.
As you can see, there are many uses for the new read-only collection interfaces. Primarily, they can be used to clean up your application's API to indicate that a method or class should not modify the contents of a collection that it is accessing. One caveat to note is that the interfaces do not provide an immutable copy of the collection but rather a read-only view of the source mutable collection.
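The caveat about views versus copies can be sketched as follows. The names are made up for illustration, and the snapshot shown is only one simple way to decouple a reader from later changes.

```csharp
using System;
using System.Collections.Generic;

public static class ReadOnlyViewDemo
{
    public static void Main()
    {
        var names = new List<string> { "John", "Jane" };

        // A read-only view: same underlying list, narrower interface.
        IReadOnlyList<string> view = names;

        // A snapshot: a separate copy wrapped in a read-only collection.
        IReadOnlyList<string> snapshot = new List<string>(names).AsReadOnly();

        names.Add("Don");

        Console.WriteLine("View count: " + view.Count);         // follows the source list
        Console.WriteLine("Snapshot count: " + snapshot.Count); // unaffected by the Add
    }
}
```

A caller holding the view can also cast it back to List<string> and modify it, so the interfaces communicate intent rather than enforce immutability.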

About the Author
Eric Vogel is a Software Developer at Red Cedar Solutions Group in Okemos, MI. He is the president of the Greater Lansing User Group for .NET. Eric enjoys learning about software architecture and craftsmanship and is always looking for ways to create more robust and testable applications. Contact him at eric.vogel@rcsg.net.
