Tuesday, May 28, 2013

String Pooling – are you immutable?

Last night a co-worker and I were working on performance improvements to an already ultra-light application. The focus was to reduce the memory footprint as much as possible. In the middle of this work, we came across code like this:

[Screenshot: code sample]

OK, the real code was more string-intensive, but let's keep it simple and look at this example. What's "wrong" here? Well, just a detail:

[Screenshot]

The two strings hold the same text, BUT they actually reference two different objects. Strings are immutable, so far so good. Here's the output:

[Screenshot: output]

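A minimal C# sketch of this first situation follows. It assumes C# 9 or later for top-level statements. The helper name CompareLiteralWithRuntimeString and the "Hello World" text are illustrative choices, not the code from the screenshot. A string assembled at run time compares equal by value to an identical literal, but it is a separate object.

```csharp
using System;

// Compares a string literal with a string that has the same text but is
// assembled at run time. Returns (same text?, same object?).
// CompareLiteralWithRuntimeString is a hypothetical helper for illustration.
static (bool SameText, bool SameObject) CompareLiteralWithRuntimeString()
{
    string literal = "Hello World";
    string part = "Hello";            // an ordinary local, not a const
    string built = part + " World";   // String.Concat runs at run time: a new object
    return (literal == built, object.ReferenceEquals(literal, built));
}

var (sameText, sameObject) = CompareLiteralWithRuntimeString();
Console.WriteLine($"Same text:   {sameText}");    // Same text:   True
Console.WriteLine($"Same object: {sameObject}");  // Same object: False
```

The == operator on strings compares characters, while object.ReferenceEquals checks identity. That is why the two results differ.
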
BUT what happens if you have something like this:

[Screenshot: code sample]

Here's the output:

[Screenshot: output]

So firstString and secondString are not only equal in value but also reference the same object. How is this possible? This isn't actually a .NET-specific trick; lots of unmanaged compilers have been doing it for ages. Quoting Jeffrey Richter:

"When compiling source code, your compiler must process each literal string and emit the string into the managed module's metadata. If the same literal string appears several times in your source code, emitting all of these strings into the metadata will bloat the size of the resulting file. To remove this bloat, many compilers (including the Microsoft C# compiler) write the literal string into the module's metadata only once. All code that references the string will be modified to refer to the one string in the metadata. This ability of a compiler to merge multiple occurrences of a single string into a single instance can reduce the size of a module substantially."
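
A small C# sketch of the pooling Richter describes follows, assuming C# 9 or later for top-level statements. The helper names are hypothetical. Two identical literals end up as one object. A const concatenated with a literal is a compile-time constant expression, so the compiler folds it into that same literal.

```csharp
using System;

// Two separate literals with identical text. The compiler stores the text once
// in the module's metadata, and every ldstr of that text yields the same
// pooled object at run time. (Hypothetical helper name.)
static bool LiteralsShareOneObject()
{
    string firstString = "Hello World";
    string secondString = "Hello World";
    return object.ReferenceEquals(firstString, secondString);
}

// Concatenating a const with a literal is a constant expression, so the
// compiler folds it into the single literal "Hello World" at compile time.
// (Hypothetical helper name.)
static bool ConstantConcatIsPooled()
{
    const string hello = "Hello";
    string folded = hello + " World";
    return object.ReferenceEquals(folded, "Hello World");
}

Console.WriteLine(LiteralsShareOneObject());  // True
Console.WriteLine(ConstantConcatIsPooled());  // True
```

Compare this with the earlier case, where one operand was an ordinary (non-const) local. There the concatenation only happens at run time and produces a new object.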

Just another detail. But a cool one!

Final note: some people refer to this as String Interning. However, be careful when calling the String.Intern method directly. It can have unwanted performance side effects (see the "Performance Considerations" section of its MSDN documentation).
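
For reference, here is a minimal C# sketch of what String.Intern does, assuming C# 9 or later for top-level statements. BuildAtRunTime is a hypothetical helper. Intern takes a runtime-built string and hands back the pooled instance with the same text. Per the .NET documentation, memory held by interned strings is not likely to be released until the CLR terminates, which is one reason to use it sparingly.

```csharp
using System;
using System.Text;

// Builds "Hello World" piece by piece, so the result is a fresh heap
// object rather than the pooled literal. (Hypothetical helper name.)
static string BuildAtRunTime()
{
    return new StringBuilder("Hello").Append(' ').Append("World").ToString();
}

string literal = "Hello World";    // literal: lives in the intern pool
string runtime = BuildAtRunTime(); // same text, different object

// String.Intern returns the pooled instance for this text, adding the
// argument to the pool first if the text is not there yet.
string interned = string.Intern(runtime);

Console.WriteLine(object.ReferenceEquals(literal, runtime));   // False
Console.WriteLine(object.ReferenceEquals(literal, interned));  // True
```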