When developing software, there often comes a point where you have to make the typical "generic versus performance" decision. Generic code usually comes with a tradeoff: performance loss. For example, using Reflection adds extra overhead. Sometimes you don't need to make it generic, sometimes you have to (plugins are the most obvious example), and other times you can choose either way. In my case, the tradeoff is simple: building it generic simplifies maintenance and future development but comes with a performance overhead. Is that overhead relevant to overall performance? How much does it weigh? Will the code be "usage intensive", or is the overhead irrelevant considering all the constraints? Of course, if you get to this point of "tuning", you have probably already fine-tuned a thousand other places before reaching it, but nevertheless...
Enough talking. Let's check it out!
What we will be testing
Before we start: I deleted code comments on purpose! This is the class that will be used.
Just that simple!
Now let's take a look at 5 ways to create "myClass" objects:
– Using the common "new" keyword;
– Activator.CreateInstance (using the full name of the type);
– Activator.CreateInstance (using type information directly);
– Using the ILGenerator;
– Using Lambda Expressions.
The first one is resolved at compile time, as opposed to the others, which do their work at runtime. Our goal here is to measure the performance of each of these ways to create new objects. The first three are quite simple:
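As a rough sketch of these three approaches, assume a hypothetical `MyClass` standing in for the article's "myClass" (any class with a public parameterless constructor will do). `BasicFactories` is an illustrative name, not the author's code.

```csharp
using System;

// Hypothetical stand-in for "myClass": a class with a public parameterless constructor.
public class MyClass
{
    public MyClass() { }
}

public static class BasicFactories
{
    // 1. "new": the constructor call is bound at compile time.
    public static MyClass ByNew() => new MyClass();

    // 2. Activator by name: assembly and type are resolved from strings at runtime;
    //    the result comes back wrapped in an ObjectHandle, so it must be unwrapped.
    public static MyClass ByName() =>
        (MyClass)Activator.CreateInstance(
            typeof(MyClass).Assembly.FullName,
            typeof(MyClass).FullName).Unwrap();

    // 3. Activator by Type: no name lookup, but the constructor is still
    //    found and invoked through reflection.
    public static MyClass ByType() =>
        (MyClass)Activator.CreateInstance(typeof(MyClass));
}
```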
Nothing to explain here, just pure and simple Framework stuff. The DynamicInitializer is an improved version (using a "CachedStyle") of the code provided by Dean Oliver on CodeProject, which uses an ILGenerator:
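A minimal sketch of the same idea, assuming a hypothetical `IlFactory<T>` class (this is not Dean Oliver's DynamicInitializer): emit `newobj` followed by `ret` into a `DynamicMethod` once and cache the resulting delegate in a static field, so the IL generation cost is paid only on first use per type.

```csharp
using System;
using System.Reflection.Emit;

// Hypothetical stand-in for "myClass".
public class MyClass
{
    public MyClass() { }
}

// Hypothetical sketch: emit "newobj; ret" once per T and cache the delegate.
public static class IlFactory<T> where T : class
{
    // Initialized the first time IlFactory<T> is used; later calls reuse the delegate.
    public static readonly Func<T> Create = Build();

    private static Func<T> Build()
    {
        var ctor = typeof(T).GetConstructor(Type.EmptyTypes)
            ?? throw new InvalidOperationException(
                typeof(T) + " has no public parameterless constructor.");

        var method = new DynamicMethod(
            "Create_" + typeof(T).Name, typeof(T), Type.EmptyTypes, typeof(T).Module);
        ILGenerator il = method.GetILGenerator();
        il.Emit(OpCodes.Newobj, ctor); // allocate a new T and run its constructor
        il.Emit(OpCodes.Ret);          // return the new instance
        return (Func<T>)method.CreateDelegate(typeof(Func<T>));
    }
}
```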
The ObjectGenerator is "heavy work", so we used a cached version. The lambda expression version is adapted from Roger Alsing, and it's pretty cool:
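For comparison, here is a minimal expression-tree sketch; it is not Roger Alsing's code, and `LambdaFactory<T>` is a hypothetical name. It builds `() => new T()` with `Expression.New`, compiles it once, and keeps the delegate in a static field.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical stand-in for "myClass".
public class MyClass
{
    public MyClass() { }
}

// Hypothetical sketch: build "() => new T()" as an expression tree and compile it once.
public static class LambdaFactory<T>
{
    // Compile() is the expensive step; caching the result in a static field
    // means it runs once per T instead of once per object created.
    public static readonly Func<T> Create =
        Expression.Lambda<Func<T>>(Expression.New(typeof(T))).Compile();
}
```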
However, just like the ILGenerator version, it's only useful in scenarios where you compile the lambda expression once and then create many objects. The first run of this version is also heavy work!
Three machines were used:
– Intel Core 2 Duo @ 3 GHz running Windows XP. CPU usage averaging 5%;
– Six-Core AMD Opteron Processor 2427 @ 2.20 GHz (4 processors) running Windows Server 2008 R2 Enterprise. CPU usage averaging 3%;
– Intel Core 2 Duo @ 2 GHz running Windows 8. CPU usage averaging 16%.
(Machine uptime was ignored. The tests ran locally, so no network issues were involved. Beware of 32-bit versus 64-bit architectures.)
Note that for the purpose of these tests, we're using only parameterless constructors. The performance test consisted of 100 tests of 1 million iterations each, run on each of 3 different machines. Meaning:
100 tests × 1 million iterations × 3 machines = 300 million objects per method
So, each of the 5 methods shown above was invoked 300 million times.
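The measurement loop itself can be sketched as follows. `Bench.AverageMs` is a hypothetical helper, not the author's harness: it runs `tests` timed batches of `iterations` calls each using `System.Diagnostics.Stopwatch` and returns the average milliseconds per batch, so `tests = 100` and `iterations = 1_000_000` correspond to one machine's share of the runs described above.

```csharp
using System;
using System.Diagnostics;

// Hypothetical stand-in for "myClass".
public class MyClass
{
    public MyClass() { }
}

// Hypothetical harness: average milliseconds per batch of `iterations` factory calls.
public static class Bench
{
    public static double AverageMs(Func<object> factory, int tests, int iterations)
    {
        factory(); // warm-up call so JIT and first-run costs are not measured
        long totalTicks = 0;
        object sink = null; // holds the last result so the loop does real work
        for (int t = 0; t < tests; t++)
        {
            var sw = Stopwatch.StartNew();
            for (int i = 0; i < iterations; i++)
                sink = factory();
            sw.Stop();
            totalTicks += sw.ElapsedTicks;
        }
        GC.KeepAlive(sink);
        // Convert Stopwatch ticks to milliseconds and average over all batches.
        return totalTicks * 1000.0 / Stopwatch.Frequency / tests;
    }
}
```

Calling `Bench.AverageMs(() => new MyClass(), 100, 1_000_000)` times the "new" case. Routing every approach through a `Func<object>` adds one delegate call per iteration to all five, so a harness like this measures relative cost rather than the absolute cost of `new` alone.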
Here are the results. The "Average Time Wasted" is per 1 million invocations! In the graph, I omitted "ActivatorByName" since it would make the graph less readable.
So, it's a no-brainer: the direct and simple new operator is the quickest. The main reason is obvious: most of the work is done at compile time rather than at runtime. If you look at the generated IL, the constructor call is emitted in place (a newobj instruction) and no reflection is used:
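One way to check this yourself is to read back the IL of a method that only calls `new`. The sketch below (`IlPeek` is a hypothetical name) uses `MethodBase.GetMethodBody().GetILAsByteArray()`; the comments show the IL a Release build typically produces, where `0x73` is the `newobj` opcode and `0x2A` is `ret`.

```csharp
using System;
using System.Reflection;

// Hypothetical stand-in for "myClass".
public class MyClass
{
    public MyClass() { }
}

public static class IlPeek
{
    // In a Release build the C# compiler typically emits just:
    //   newobj instance void MyClass::.ctor()   (opcode 0x73)
    //   ret                                     (opcode 0x2A)
    public static MyClass CreateWithNew() => new MyClass();

    // Returns the raw IL bytes of CreateWithNew so the opcodes can be inspected.
    public static byte[] CreateWithNewIl() =>
        typeof(IlPeek).GetMethod(nameof(CreateWithNew))
            .GetMethodBody()
            .GetILAsByteArray();
}
```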
Now, the second-best option is the ILGenerator. However, this only holds because we use the "Cached" version. You might have noticed that the "for" loop uses the "Cached" version, meaning a previous call to the ObjectGenerator method was made to actually emit the IL. If we called ObjectGenerator on every loop iteration, the performance would be the worst by far! This means the implementation only pays off in scenarios like this one, where many objects are created in the same scope. The lambda expression version suffers from the same issue. It's the next-best option but, just like the ILGenerator version, we have to ensure the lambda expression is compiled only once and then used for many object creations in the same scope. You can see more info here.
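When the concrete type is only known at runtime (the plugin case from the introduction), the same "compile once, reuse many times" rule can be applied with a per-`Type` cache. This is a minimal sketch under that assumption: `FactoryCache` and its members are hypothetical names, and expression trees are used to build the delegate. `CreateUncached` shows the anti-pattern of compiling on every call, while `CreateCached` compiles once per type via `ConcurrentDictionary.GetOrAdd`.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq.Expressions;

// Hypothetical stand-in for "myClass".
public class MyClass
{
    public MyClass() { }
}

public static class FactoryCache
{
    private static readonly ConcurrentDictionary<Type, Func<object>> Cache =
        new ConcurrentDictionary<Type, Func<object>>();

    // Anti-pattern: builds and compiles a new delegate on every call.
    public static object CreateUncached(Type type) => Compile(type)();

    // Compiles once per Type, then reuses the cached delegate.
    public static object CreateCached(Type type) => Cache.GetOrAdd(type, t => Compile(t))();

    public static int CachedCount => Cache.Count;

    // Builds "() => (object)new type()"; Convert also boxes value types correctly.
    private static Func<object> Compile(Type type) =>
        Expression.Lambda<Func<object>>(
            Expression.Convert(Expression.New(type), typeof(object))).Compile();
}
```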
Now, ActivatorByType was approximately 10 times slower than the basic "new". This is because most of the work is now done at runtime using Reflection, not at compile time as with the "new" keyword. If you look at an implementation of Activator.CreateInstance(), you can see this more clearly:
This is an implementation from the Shared Source Common Language Infrastructure (aka SSCLI 2.0, or "Rotor"). Finally, there's CreateInstance by name. It's the worst because it has even more work to do under the hood (see Activator.cs and RtType.cs in the SSCLI 2.0 source mentioned above).
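To make the extra runtime work more concrete, here is a simplified illustration of the steps involved. It is not the SSCLI or .NET source, and `ReflectionCreate` is a hypothetical name: creating by `Type` means a constructor lookup plus a late-bound `Invoke`, and creating by name adds an assembly load and a type lookup from strings before that.

```csharp
using System;
using System.Reflection;

// Hypothetical stand-in for "myClass".
public class MyClass
{
    public MyClass() { }
}

// Simplified illustration of runtime work that "new" resolves at compile time.
public static class ReflectionCreate
{
    // By Type: find the parameterless constructor in metadata, then invoke it late-bound.
    public static object ByType(Type type)
    {
        ConstructorInfo ctor = type.GetConstructor(Type.EmptyTypes)
            ?? throw new MissingMethodException(type.FullName, ".ctor");
        return ctor.Invoke(null); // no constructor arguments
    }

    // By name: an extra step first, resolving the assembly and Type from strings.
    public static object ByName(string assemblyName, string typeName)
    {
        Type type = Assembly.Load(assemblyName).GetType(typeName, throwOnError: true);
        return ByType(type);
    }
}
```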
The good news!
First of all, keep in mind that our worst-case scenario here is 11,173 ms. That's roughly eleven seconds per 1 million objects, or about 11 µs per object! Wouldn't it be awesome if you could earn 1 million euros in 11 seconds? Second, if you've reached this point of fine-tuning, you're either building an extremely high-performance, multithreaded, extensible, expandable and scalable piece of software, or you should definitely be looking for performance improvements somewhere else.