C# performance tips for Unity, part 2: structs and enums

by robert, August 7th, 2015

This is part two of a previous post on C# performance tips for Unity. If you haven’t seen Part 1, check it out for more background and context!

In the last post, we looked at a number of surprises around memory allocation – such as heap trash being generated by foreach loops, array properties, or variadic functions. Throwaway objects on the heap can easily cause performance hiccups, so the less garbage we generate at runtime, the less we have to worry about memory pressure or garbage collection cost.

After the last post, several commenters on Gamasutra and on Reddit pointed out additional surprising memory allocations in Unity and Mono, when using structs or enums inside generic collections. I benchmarked them in a similar way as last time, and here are some of the interesting results.

In short, there are three more areas to watch out for, where we can observe unexpected garbage generation:

  1. lists of structs (including built-ins like Vector4)
  2. dictionaries keyed by structs
  3. dictionaries keyed by enums

We’ll talk about these in detail. But just to front-load the conclusions, here’s how to fix them:

  • Make sure your structs implement IEquatable<T>
  • Make sure your structs override Equals() and GetHashCode()
  • Add a custom comparer to dictionaries keyed by enums

Pretty simple, huh? More details await. But first, a word or two about autoboxing.


A Kind of Magic

Before we get into the weeds with structs and memory allocation, let’s very quickly refresh our memory about boxing and autoboxing. If you’re already familiar with this part of .NET, feel free to skip this section and jump straight to benchmarks below.

.NET draws a distinction between value types and reference types. Value types are things like primitives (int, float, etc) and structs, there are passed by value and can live on the stack, which greatly speeds up their creation and removal. Reference types (like class instances or strings) get passed by reference, always live on the heap, and get garbage-collected when not needed. Finally, when we try to pass a value that lives on the stack, into a place that expects a reference type such as the base “Object” class, the value will automatically get copied to the heap (boxed) for our convenience, and the reference will be passed around.

This last part – automatic, silent boxing of value types – is where most of our problems will be coming from.

For an example of boxing, consider functions like Object.Equals(object). It accepts a reference type as its parameter, so if we pass in a value object from the stack, it’s going to get automatically boxed first. Similar boxing happens when up-casting a value object to the base “Object” class, like in this snippet:

object obj = 1;
bool flag = obj.Equals(2);

That turns into this unfortunate snippet of IL code (as seen in ILSpy):

IL_0000: ldc.i4.1
IL_0001: box [mscorlib]System.Int32
IL_0006: stloc.0
IL_0007: ldloc.0
IL_0008: ldc.i4.2
IL_0009: box [mscorlib]System.Int32
IL_000e: callvirt instance bool [mscorlib]System.Object::Equals(object)
IL_0013: stloc.1

This shows both of the boxing examples mentioned above: first when take a reference to the integer “1” as an object, and second when we pass the integer “2” into System.Object.Equals(object).

While autoboxing is convenient and usually very cheap, the downside is that very often the copied, boxed reference object is just going to get thrown away, right after it’s created and used. This trash accumulates on the heap and puts pressure on the garbage collector, which will have to stop the world and do some cleanup. It would be better if we could just avoid trash generation altogether.

Ok, let’s take a look at some numbers. 🙂

(As in the previous installment, a caveat: the following measurements are taken using Unity’s performance profiler, using a standalone build under Unity 5.1 on an Intel i7 desktop machine. Yes, I’m aware of the limitations of synthetic benchmarks – use these as rough guidelines, always profile your own production code, etc. Also, these kinds of effects only start to matter when performing thousands of operations per frame. The standard rule of optimization applies: we should write readable and maintainable code first, then profile, and only then optimize actual observed hot spots, because each optimization reduces flexibility.)


I Want It All

First, let’s look at searching over generic lists of structs. Let’s say we create some structs like this:

public struct SmallStruct {
  //  2 int fields. total size:  2 * 4 B + 8 B = 16 B
  public int a, b;         
}

public struct LargeStruct {
  // 20 int fields. total size: 20 * 4 B + 8 B = 88 B
  public int a, b, /* ... more here */; 
}

Then we create a List<SmallStruct>, and start using and querying it, for example using List<T>.Contains(T value).

If you come from a C++ background and familiarity with the STL, you might think that given all this type information, and since we’re using a strongly typed collection, the compiler might be smart enough to avoid boxing.

Unfortunately, when we try this, we get performance numbers like this:

// List<T>.Contains(T) over a list with 128K struct elements:

                              mem alloc          time
SmallStruct ...................  4.0 MB ....... 28 ms
LargeStruct ................... 22.0 MB ....... 70 ms

This is an unpleasant surprise. It looks like Contains() creates not one, but two temporary copies of each element in the list! (For example: 2 * 128K * 16B per small struct = 4MB)

What’s going on? Contains() should be a simple function that just iterates over all elements and compares them. But there are two problems:

  1. Without any hints from us, the compiler will call the equality function on the base class: ValueType.Equals(object) which will box the struct during the function call
  2. The base ValueType.Equals() knows nothing about the struct it’s working on, so it has to use reflection to get the fields and compare them one by one. This is implementation-specific, but also seems to cause allocations proportional to the size of the struct.

So if we just wrote our own version of Equals() that compares structs our own way, could we get some speedup? Let’s see:

// List<T>.Contains() over a list with 128K struct elements
// structs override Equals(), GetHashCode()

SmallStruct ...................  2.0 MB ....... 12 ms
LargeStruct ................... 11.0 MB ....... 26 ms
UnityEngine.Vector4 ...........  3.0 MB ....... 15 ms

(By the way, as you can see I also threw in the built-in UnityEngine.Vector4 struct, because it also overrides Equals(). Lists of built-in Vectors suffer from spurious allocations as well.)

So far so good, these numbers look better, and now we merely duplicate the elements once during each call to Contains(). Progress! 🙂   But what we really want is for the collection to use a type-specific Equals(T) function, instead of Equals(object), so that we can avoid boxing completely.

We do that by having our structs implement IEquatable<T>, which is the “magic” interface used by generic collections to instantiate type-specific equalizers.

And the results are:

// List<T>.Contains() over a list with 128K struct elements
// structs implement IEquatable<T>

SmallStruct ...................    0  B .......  2 ms
LargeStruct ...................    0  B .......  8 ms

That’s much better. By implementing this interface we got rid of boxing and sped up iteration time. High fives all around.

The main takeaway here is knowing that the IEquatable<T> interface is there, and should be implemented by all structs that will be used in generic collections (or, in other words, all structs).


Another One Bites the Dust

Generic dictionaries also present a challenge when using structs as keys.

In this following test we’ll stress key comparisons, by continually calling Dictionary<K,V>.ContainsKey().

Performance of just a plain struct used as a key can be surprisingly bad:

// dict is a tiny Dictionary<K,V> with just one entry
// 128K calls to dict.ContainsKey()

SmallStruct ...................  2.0 MB ....... 20 ms
LargeStruct ................... 11.0 MB ....... 52 ms

Once again, we’re getting memory thrashing consistent with spurious boxing on each call (128K * 16B = 2MB).

Also, since we’re using structs as keys, the dictionary will use the default ValueType.GetHashCode() hashing function, which does hashing in some implementation-specific way that figures out the right hash regardless of what struct it’s operating on (so probably not the fastest).

Let’s try the same tactics that we applied to lists, and see how they work:

// 128K calls to dict.ContainsKey(SmallStruct)

plain ................................... 2.0 MB ....... 20 ms
IEquatable<T> ........................... 2.0 MB ....... 20 ms
Equals(), GetHashCode() ................  2.0 MB ....... 14 ms
IEquatable<T>, Equals(), GetHashCode() ..   0  B .......  2 ms 
// 128K calls to dict.ContainsKey(LargeStruct)

plain .................................. 11.0 MB ....... 52 ms
IEquatable<T> .......................... 11.0 MB ....... 52 ms
Equals(), GetHashCode() ................ 11.0 MB ....... 38 ms
IEquatable<T>, Equals(), GetHashCode() ..   0  B ....... 13 ms 

Just as before, we get the best performance if we implement our own better, faster tests for equality, and our own hashing routine.

In fact, when implementing custom structs, the best practice is to do all of the following:

  1. provide a type-specific equality comparison function via IEquatable<T>
  2. override GetHashCode() and Equals() with faster (custom) versions
  3. likewise, override operators == and != to use strongly typed equality checks

Alternatively, you could make your own implementation of IEqualityComparer<T>, and pass it in to the dictionary, to avoid boxing in a similar way, but that needs to happen on a per-collection basis instead of once for the whole struct.

Unfortunately, built-in structs like Vector3 follow only some of these guidelines, and so they might not have the best performance when stored inside generic collections.

By the way, you might have noticed that we always override Equals() and GetHashCode() together. Why is this?

These two functions must be compatible with each other, for collections like Dictionaries to work correctly. Specifically, whenever two objects are equal, they must return the same hash code. (Imagine if they didn’t – then a value might never be found inside a collection.)

There are some specific rules that need to be followed when implementing those two functions, which I’m going to describe quickly, and refer you to Bill Wagner’s Effective C# for all the sordid, checkered details:

  • Custom Equals() predicate must be:
    • Reflexive (a == a is always true)
    • Symmetrical (if a == b then b == a)
    • Transitive (if a == b and b == c then a == c)
  • Custom GetHashCode() must follow:
    • If a.Equals(b) then both must generate the same hash code
    • If object is modified but the change doesn’t affect the output of Equals(), the hash code must remain the same
      • In practice this is important for reference types, but not much of an issue for structs used as keys in collections, because it’s not possible to get a reference to a struct used as a key inside a generic collection and modify it.

Also, hash codes should be uniformly distributed over the space of signed integers, for optimal performance.


I’m Going Slightly Mad

Finally, let’s talk about enums.

This is another interesting departure from C or C++. In those languages, enum is essentially interconvertible with one of the int types.

In .NET, enum is also backed by int (or some other integral type of your choosing), but because of that it’s a value type, inheriting various functionality from their ValueType parent class.

Given this preamble, you probably won’t be very surprised to see these kinds of performance numbers:

// loop of 128K iterations, a and b are enums
// enum does not implement IEquatable<T>

1. a == b .......................   -  B ......  0.1 ms
2. (int)a == (int)b .............   -  B ......  0.1 ms
3. a.Equals(b) .................. 3.0 MB ...... 24.0 ms

In fact, both cases 1 and 2 result in the same IL code:

// result = (a == b);
   IL_0007: ldsfld valuetype TestEnum BaseEnumTest::a
   IL_000c: ldsfld valuetype TestEnum BaseEnumTest::b
   IL_0011: ceq

… while the third case causes double boxing:

// result = a.Equals(b);
   IL_0007: ldsfld valuetype TestEnum BaseEnumTest::a
   IL_000c: box TestEnum
   IL_0011: ldsfld valuetype TestEnum BaseEnumTest::b
   IL_0016: box TestEnum
   IL_001b: callvirt instance bool [mscorlib]System.Enum::Equals(object)

And once we start using them as keys in dictionaries, we get behavior like this:

// loop of 128k lookups keyed by enum
//
// edict is of type Dictionary<TestEnum, int>
// idict is of type Dictionary<int, int>

1. result = edict[enum] .......... 4.5 MB ...... 40 ms
2. result = idict[int] ...........   -  B ......  2 ms

Confusingly enough, this first behavior is not due to boxing on our end, but there is heap allocation going on inside the dictionary when it encounters an enum key. This is a bit unfortunate, since dictionaries keyed by enums are very, very convenient, eg. for converting from enums to human readable strings. (Thanks to Lucas Meijer for more info on this.)

Fortunately, we can get around that, in two ways. First, we can implement our own IEqualityComparer<T> for the enum type, and pass that into the Dictionary constructor, and that will successfully avoid boxing as well. (Thanks to Borut Pfeifer for this tip!) The second options is to cast enums into ints, and use an int-keyed dictionary.

Here’s a comparison:

// loop of 128k lookups 
// edict is of type Dictionary<TestEnum, int>
// newedict is same as edict, but with a custom IEqualityComparer<T>
// idict is of type Dictionary<int, int>

1. result = edict[enum] ..................... 4.5 MB ...... 40 ms
2. result = newedict[enum] ..................   -  B ......  2 ms
3. result = idict[(int)enum] ................   -  B ......  2 ms

Once again, the trick is in knowing that something like this is happening, and needs action on our part. Either of these is a relatively small price to pay in exchange for not generating 36 bytes of garbage on each invocation.


Play the Game

Hopefully this was an interesting and useful jog through the far away lands where value types and generic collections intersect.

These areas are not well known and explored, but they’re interesting – and good to know, especially for those of us coming from languages like C or C++, where enums, structs, and data structures look quite similar, but behave very, very differently.

.

Special thanks to Ben Perez for feedback on Part 1.


Follow me on Twitter at @rzubek to hear about future installments of this series.