Making equality comparisons work

One of the benefits of inheritance is that you can treat a derived object as though it is an object of the inherited class: a Fooby is-a Foo. But as you saw in my previous post, treating a Fooby as a Foo when you’re comparing two objects for equality can result in inconsistencies.

This is a problem because there are times when you really do want a collection of objects that are derived from class Foo, but limit that collection so that no two objects, regardless of their type, have identical Bar values. For example, you might want the code below to keep only the first and third items.

var foos = new HashSet<Foo>();
foos.Add(new Foo(0));
foos.Add(new Fooby(0, "Hello"));
foos.Add(new Fooby(1, "Jim"));
foos.Add(new Foo(1));
Console.WriteLine(foos.Count);

If you run that code, you’ll see that it outputs ‘4’, meaning that all four items were added to the collection.

Many programmers will attempt to solve this problem by implementing IEquatable<T> on the classes in question. My previous article showed why that approach can’t work.

There’s simply no good way to supply default behavior that works well in all circumstances. The solution is to stop trying to make the classes understand that you’re changing the meaning of equality. If you want to compare all Foo-derived instances as though they’re actually Foo instances, then supply a new equality comparer that does exactly what you want.

The easiest way to create a new equality comparer is to derive from the generic EqualityComparer<T> class, like this:

public class FooEqualityComparer : EqualityComparer<Foo>
{
    public override bool Equals(Foo x, Foo y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (ReferenceEquals(x, null) || ReferenceEquals(y, null)) return false;
        return x.Bar == y.Bar;
    }

    public override int GetHashCode(Foo obj)
    {
        if (ReferenceEquals(obj, null)) return -1;
        return obj.Bar.GetHashCode();
    }
}

EqualityComparer<T> is a base class implementation of the IEqualityComparer<T> generic interface. You could create your own class that implements IEqualityComparer<T> directly, but the documentation recommends deriving a class from EqualityComparer<T>.

By now you should be familiar with the two methods that IEqualityComparer<T> requires: Equals and GetHashCode. Implementations of these two methods are subject to all of the rules I mentioned in my previous article, including those rules about handling null values.

We can then change the example code above so that the collection uses an instance of this class to do its equality comparisons, rather than depending on whatever possibly incorrect default behavior exists in the classes themselves.

var foos = new HashSet<Foo>(new FooEqualityComparer());
foos.Add(new Foo(0));
foos.Add(new Fooby(0, "Hello"));
foos.Add(new Fooby(1, "Jim"));
foos.Add(new Foo(1));
Console.WriteLine(foos.Count);

Now, when the collection wants to compute a hash code, it always calls the GetHashCode method in the supplied equality comparer. And when it needs to compare two items, it always calls that Equals method. The collection is no longer at the mercy of objects’ individual GetHashCode and Equals methods.

You don’t need to limit this to collections, by the way. You can create an equality comparer and call its Equals method directly:

var myEquals = new FooEqualityComparer();
var f1 = new Foo(0);
var fb1 = new Fooby(0, "Hello");
var areEqual = myEquals.Equal(f1, fb1);

By supplying your own equality comparer, you control exactly how hash codes are computed and how items are compared. You are no longer hampered by the default meaning of equality as implemented by the individual types.

It’s interesting to note that much of what I’ve said about equality comparisons in this and the previous post applies equally to comparisons in general (is object a less than, equal to, or greater than object b). Perhaps I’ll talk about that soon.

Inheritance and IEquatable do not mix

Do not, under any circumstances, implement the IEquatable<T> interface on a non-sealed class.

I’m serious. If you do it, you will be sorry. You cannot make it work.

Think I’m crazy? People who know a lot more about this stuff than I do, agree.

Let me explain what they apparently knew some time back, and I just recently discovered. Here’s the short version:

A class that implements IEquatable<T> is saying, “I know how to compare two instances of type T or any type derived from T for equality.” After all, if type B derives from A, then B is-a A. That’s what inheritance means! The problem with implementing IEquatable<T> on a non-sealed type is that a base class cannot know how to compare instances of derived types. Trying to implement IEquatable<T> on a non-sealed class ends up producing inconsistent results when comparing instances of derived types.

Now the long version.

I assume here that you understand the difference between reference equality and value equality, know how to properly override the Object.Equals(Object) method on a class, and you’re at least passingly familiar with IEquatable<T>.

To review, you implement the IEquatable<T> generic interface on your class to provide a type-safe method for determining equality of instances. The type-safe Equals(T)method is called by certain methods of the generic collection classes, and is in general slightly faster (according to the documentation) than an overridden Object.Equals method. The documentation recommends that you implement the interface on any class that you expect will be stored in an array or a generic collection, “so that the object can be easily identified and manipulated.” It also recommends that if you implement IEquatable<T>, you should override the base class implementations of Equals(Object) and GetHashCode.

To implement IEquatable<T>, a class merely has to define a public Equals method that accepts a single parameter of type T, and returns a Boolean value indicating whether the object passed is considered equal. IEquatable<T>.Equals is subject to all of the same rules as Object.Equals(Object). It’s really just a type-safe version of that method, and in fact the documentation recommends that if you implement IEquatable<T>, then you should implement the overridden Object.Equals(object) method in terms of IEquatable<T>.Equals.

In the code below, I’ve created a base class, Foo, and a derived class, Fooby. Both override the Equals method as recommended, and both conditionally implement IEquatable<T>. The conditional implementation is there so that I can illustrate why implementing IEquatable<T> on non-sealed classes is a bad idea.

//#define UseIEquatableFoo
//#define UseIEquatableFooby

using System;

namespace EqualsFoo
{
    public class Foo
#if UseIEquatableFoo
        : IEquatable<Foo>
#endif
    {
        public readonly int Bar;

        public Foo(int bar)
        {
            Bar = bar;
        }
#if UseIEquatableFoo
        // IEquatable<Foo>.Equals
        public
#else
        private 
#endif
        bool Equals(Foo other)
        {
            // If the other is null, it can't be equal
            if (ReferenceEquals(other, null)) return false;

            // If they're the same object, then they're equal
            if (ReferenceEquals(this, other)) return true;

            // If they're not the same type, they can't be equal
            if (this.GetType() != other.GetType()) return false;

            // and do the comparison
            return Bar == other.Bar;
        }

        public override bool Equals(object obj)
        {
            return Equals(obj as Foo);
        }

        public override int GetHashCode()
        {
            return Bar;
        }
    }

    public class Fooby: Foo
#if UseIEquatableFooby
        , IEquatable<Fooby>
#endif
    {
        public readonly string Barby;

        public Fooby(int bar, string by)
            : base(bar)
        {
            Barby = by;
        }

#if UseIEquatableFooby
        // IEquatable<Fooby>.Equals
        public
#else
        private
#endif
        bool Equals(Fooby other)
        {
            // If the other is null, it can't be equal
            if (ReferenceEquals(other, null)) return false;

            // If they're the same object, then they're equal
            if (ReferenceEquals(this, other)) return true;

            // If they're not the same type, they can't be equal
            if (this.GetType() != other.GetType()) return false;

            // and do the comparison
            return base.Equals(other) && (Barby == other.Barby);
        }

        public override bool Equals(object obj)
        {
            return Equals(obj as Fooby);
        }

        public override int GetHashCode()
        {
            return base.GetHashCode() ^ Barby.GetHashCode();
        }
    }
}

The conditional code for the Equals(Foo) method looks a little weird, I’ll admit. I wrote it this way to eliminate some code duplication. If UseIEquatableFoo is defined, then Equals(Foo) is public, as required for interface implementations. Otherwise it’s private so that all equality checks go through the overridden Equals(Object) method. The Fooby.Equals(Fooby) method is similarly decorated.

Leave those two symbols undefined for now.

The rules for Object.Equals(Object) say that x.Equals(y) == y.Equals(x). Similarly, the rules for Object.Equals(Object, Object) say that Object.Equals(x, y) == Object.Equals(y, x). I can’t point you to anything that specifically says Object.Equals(x, y) == x.Equals(y), but it seems like a reasonable thing to expect.

Given the above, the following code should output false four times.

var fb1 = new Fooby(0, "hello");
var fb2 = new Fooby(0, "goodbye");

Console.WriteLine(fb1.Equals(fb2));
Console.WriteLine(fb2.Equals(fb1));
Console.WriteLine(Object.Equals(fb1, fb2));
Console.WriteLine(Object.Equals(fb2, fb1));

If you run that code, you’ll see that it does indeed work as expected.

Now implement IEquatable<Foo>. Uncomment the conditional symbol UseIEquatableFoo, and run the program again. The output might surprise you:

True
True
False
False

To borrow a line from one of my favorite movies: “What the schnitzel?”

The problem is overload resolution. When the compiler sees the expression, fb1.Equals(fb2), it has to decide what method call to generate code for. It has the choice of four methods:

  • Object.Equals(Object)
  • Foo.Equals(Object)
  • Fooby.Equals(Object)
  • Foo.Equals(Foo) (Implementation of IEnumerable<Foo>.Equals)

The overload resolution rules pick the “best” function to call based on the types of the arguments. The compiler decided that the best overload is the last one, Foo.Equals(Foo). After all, fb1 is-a Foo, and so is fb2. (If you’re interested in the exact rules used to make this determination, see the specification linked above.)

The two instances of Fooby are compared as though they are of type Foo. The Barby fields aren’t ever compared, so the result of the comparison is true. This doesn’t happen when we call Object.Equals(fb1, fb2), because Object.Equals(Object, Object) doesn’t know the static types. All it can do is call the virtual Object.Equals(Object) method on fb1. That call is resolved at runtime.

If fb1.Equals(fb2) and Object.Equals(fb1, fb2)don’t agree, you have a problem. Clients using your class will rightfully suspect your competence and the wisdom of continuing to trust anything else you’ve written.

If you’re not yet convinced that implementing IEquatable<T> on a non-sealed type is a bad idea, then your only solution to this problem is to make class Fooby implement IEquatable<Fooby>. That will give the compiler a better overload to bind. Just uncomment the UseIEquatableFooby conditional, and re-run the test. Once again, all is right with the world.

Or so you think. Let’s move the equality test to a separate method:

static void TestEquality(Foo f1, Foo f2)
{
    Console.WriteLine(f1.Equals(f2));
    Console.WriteLine(f2.Equals(f1));
    Console.WriteLine(Object.Equals(f1, f2));
    Console.WriteLine(Object.Equals(f2, f1));
}

And add a call to that method after the original tests:

var fb1 = new Fooby(0, "hello");
var fb2 = new Fooby(0, "goodbye");

Console.WriteLine(fb1.Equals(fb2));
Console.WriteLine(fb2.Equals(fb1));
Console.WriteLine(Object.Equals(fb1, fb2));
Console.WriteLine(Object.Equals(fb2, fb1));

TestEquality(fb1, fb2);

Now what happens when you run the test?

False
False
False
False
True
True
False
False

The problem, once again, is overload resolution. But there’s nothing you can do about it this time.

Implementing IEquatable<T> makes the definition of equality dependent on the compile time static type. In this particular case, the compiler sees that the f1 and f2 parameters in the TestEquality method have a static type Foo, so it generates code to call Foo.Equals(Foo), even though their runtime types are Fooby.

The problem occurs in collections, too, although it’s more difficult to show. To illustrate the problem, I have to generate two Fooby instances that produce the same hash code. The code below generates 8-character hexadecimal strings until it finds two that produce the same hash code. It then creates two Fooby instances and tries to add them to a HashSet<Foo>.

// Find two strings that have identical hash codes
string clash1 = null;
string clash2 = null;

var d = new Dictionary<int, string>();
for (var i = 0; i < int.MaxValue; ++i)
{
    string s = i.ToString("X8");
    int hash = s.GetHashCode();
    string val;
    if (d.TryGetValue(hash, out val))
    {
        Console.WriteLine("string '{0}' clashes with '{1}'", s, val);
        clash1 = s;
        clash2 = val;
        break;
    }
    d.Add(hash, s);
}

// Add the two items to a HashSet<Foo>
var foos = new HashSet<Foo>();
foos.Add(new Fooby(0, clash1));
foos.Add(new Fooby(0, clash2));
Console.WriteLine("foos.Count={0}", foos.Count);

On my system, it tells me, "string '000B020A' clashes with '00070101'".

If I run the program without implementing IEquatable<T> on the classes, foos.Count returns 2, which is correct. If I enable implementation of IEquatable<T>, foos.Count returns 1. It thinks that Fooby(0, "000B020A") is equal to Fooby(0, "00070101"), which obviously is not true. You can take issue with my GetHashCode implementation if you like, but the nature of hash codes is that you’re mapping an essentially infinite number of objects (how many possible strings are there?) onto a code that can hold a finite (2^32) number of unique values. No matter how good your hash code generation is, there will be duplicates!

There is no way to make this work correctly. If you implement IEquatable<T> on a non-sealed class, equality comparisons of derived types will produce inconsistent results. Don’t do it!

Most likely, the behavior you’re looking for is properly implemented by deriving a class from EqualityComparer<T>. I’ll give you a closer look at that next time.

The Fiscal Cliff

cartoon_640

I think this is the first cartoon I’ve ever attempted to draw. It’s certainly the first one I’ve ever published in any form. And, yes, I do need to improve my drawing ability. I’m working on it . . .

Categories

A sample text widget

Etiam pulvinar consectetur dolor sed malesuada. Ut convallis euismod dolor nec pretium. Nunc ut tristique massa.

Nam sodales mi vitae dolor ullamcorper et vulputate enim accumsan. Morbi orci magna, tincidunt vitae molestie nec, molestie at mi. Nulla nulla lorem, suscipit in posuere in, interdum non magna.