Know How to Implement Hashcode and Equals

Summary

Implementing hashCode and equals is not straightforward. Do not implement them unless it is necessary to do so. If you do implement them, make sure you know what you are doing.

Details

It is well known that if you override equals then you must also override the hashCode method (see Effective Java item 9).

If logically equal objects do not have the same hashCode, they will behave in a surprising manner when placed in a hash-based collection such as HashMap.

By “surprising”, we mean your program will behave incorrectly in a fashion that is very difficult to debug.
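To make the failure concrete, here is a minimal sketch. BrokenPoint, FixedPoint and HashContractDemo are hypothetical names invented for this illustration: the first class overrides equals only, the second overrides both methods.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical example class: overrides equals but NOT hashCode.
final class BrokenPoint {
    private final int x;
    private final int y;

    BrokenPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (!(obj instanceof BrokenPoint))
            return false;
        BrokenPoint other = (BrokenPoint) obj;
        return x == other.x && y == other.y;
    }
    // hashCode is inherited from Object, so two equal points
    // will almost always have different hash codes.
}

// Hypothetical example class: overrides both methods consistently.
final class FixedPoint {
    private final int x;
    private final int y;

    FixedPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (!(obj instanceof FixedPoint))
            return false;
        FixedPoint other = (FixedPoint) obj;
        return x == other.x && y == other.y;
    }

    @Override
    public int hashCode() {
        return Objects.hash(x, y);
    }
}

final class HashContractDemo {
    // Stores one object in a HashSet and looks it up using another.
    static boolean foundInSet(Object stored, Object probe) {
        Set<Object> set = new HashSet<>();
        set.add(stored);
        return set.contains(probe);
    }

    public static void main(String[] args) {
        // The two broken points are equal according to equals...
        System.out.println(new BrokenPoint(1, 2).equals(new BrokenPoint(1, 2)));
        // ...but the lookup almost always fails, because the set
        // compares hash codes before it ever calls equals.
        System.out.println(foundInSet(new BrokenPoint(1, 2), new BrokenPoint(1, 2)));
        // With a consistent hashCode the lookup succeeds.
        System.out.println(foundInSet(new FixedPoint(1, 2), new FixedPoint(1, 2)));
    }
}
```

Nothing throws and nothing is logged when the broken lookup fails, which is why this kind of bug is so hard to trace back to its cause.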

Unfortunately, implementing equals is surprisingly hard to do correctly. Effective Java item 8 spends about 12 pages discussing the topic.

The contract for equals is handily stated in the Javadoc of java.lang.Object. We will not repeat it here, nor the discussion of what it means; both can be found in Effective Java and large swathes of the internet. Instead we will look at strategies for implementing it.

Whichever strategy you adopt, it is important that you first write tests for your implementation.

It is easy for an equals method to cause hard-to-diagnose bugs if the code changes (e.g. if fields are added or their type changes). Writing tests for equals methods used to be a painful and time-consuming procedure, but libraries now exist that make it trivial to specify the common cases (see Testing FAQs).
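The kind of checks such tests perform can be sketched in plain Java. This is a hand-written illustration, not the API of any particular testing library; Sku, EqualsContractCheck and check are hypothetical names, and a real library will cover many more cases (transitivity, subclasses, mutated fields).

```java
import java.util.Objects;

// Hypothetical value class under test.
final class Sku {
    private final String code;

    Sku(String code) {
        this.code = code;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (obj == null || getClass() != obj.getClass())
            return false;
        return Objects.equals(code, ((Sku) obj).code);
    }

    @Override
    public int hashCode() {
        return Objects.hash(code);
    }
}

final class EqualsContractCheck {
    private EqualsContractCheck() {
    }

    // a and b are expected to be equal; different is expected to equal neither.
    // Throws AssertionError naming the first rule that is broken.
    static void check(Object a, Object b, Object different) {
        require(a.equals(a), "reflexive");
        require(a.equals(b) && b.equals(a), "symmetric for equal objects");
        require(a.hashCode() == b.hashCode(), "equal objects share a hash code");
        require(!a.equals(different) && !different.equals(a), "symmetric for unequal objects");
        require(!a.equals(null), "never equal to null");
        require(!a.equals("some other type"), "never equal to an unrelated type");
    }

    private static void require(boolean condition, String rule) {
        if (!condition)
            throw new AssertionError("equals contract broken: " + rule);
    }

    public static void main(String[] args) {
        check(new Sku("A1"), new Sku("A1"), new Sku("B2"));
        System.out.println("Sku satisfies the checked rules");
    }
}
```

Because the check takes example instances rather than inspecting fields, it needs updating whenever a field is added; this is one reason a dedicated library is usually the better choice.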

Don’t

This is the simplest strategy and the one you should adopt by default in the interests of keeping your codebase small.

Most classes do not need an equals method. Unless your class represents some sort of value it makes little sense to compare it with another so stick with the inherited implementation from Object.

An irritating gray area is the class that production code never needs to compare for equality but test code does. The dilemma here is whether to implement the methods purely for the benefit of the tests or to complicate the test code with custom equality checks.

There is, of course, no right answer here; we would suggest first trying the compare-it-in-the-test approach before falling back to providing a custom equals method.

The custom equality checks can be cleanly shared by implementing a custom assertion using a library such as AssertJ or Hamcrest.
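As a library-free sketch of the idea, the comparison can live in a single test helper. With AssertJ or Hamcrest the same logic would normally sit in a custom assertion or matcher class instead. Order, OrderAssertions and assertSameOrder are hypothetical names used only for this illustration.

```java
// Hypothetical production class: deliberately has no equals or hashCode.
final class Order {
    private final String customer;
    private final int quantity;

    Order(String customer, int quantity) {
        this.customer = customer;
        this.quantity = quantity;
    }

    String customer() {
        return customer;
    }

    int quantity() {
        return quantity;
    }
}

// Test-side helper that holds the only definition of "same order".
final class OrderAssertions {
    private OrderAssertions() {
    }

    static void assertSameOrder(Order expected, Order actual) {
        if (!expected.customer().equals(actual.customer()))
            throw new AssertionError("customer: expected <" + expected.customer()
                    + "> but was <" + actual.customer() + ">");
        if (expected.quantity() != actual.quantity())
            throw new AssertionError("quantity: expected <" + expected.quantity()
                    + "> but was <" + actual.quantity() + ">");
    }

    public static void main(String[] args) {
        // Passes: the two orders carry the same data.
        assertSameOrder(new Order("alice", 3), new Order("alice", 3));
        System.out.println("orders match");
    }
}
```

The production class stays free of equality code, and a failing comparison reports which field differed.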

Effective Java tentatively suggests having your class throw an error if equals is unexpectedly called:

    @Override public boolean equals(Object o) {
        throw new AssertionError(); // Method is never called
    }

This seems like a good idea but, unfortunately, it will confuse most static analysis tools. On balance, it probably creates more problems than it solves.

Auto-Generate With an IDE

Most IDEs provide some method of auto-generating hashCode and equals methods. This is an easily accessible approach, but the resulting methods are (depending on the IDE and its settings) often ugly and complex, such as the ones generated by Eclipse shown below:

    @Override
    public int hashCode() {
        final int prime = 31;
        int result = 1;
        result = prime * result + ((field1 == null) ? 0 : field1.hashCode());
        result = prime * result + ((field2 == null) ? 0 : field2.hashCode());
        return result;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (obj == null)
            return false;
        if (getClass() != obj.getClass())
            return false;
        MyClass other = (MyClass) obj;
        if (field1 == null) {
            if (other.field1 != null)
                return false;
        } else if (!field1.equals(other.field1))
            return false;
        if (field2 == null) {
            if (other.field2 != null)
                return false;
        } else if (!field2.equals(other.field2))
            return false;
        return true;
    }

Unless your IDE can be configured to produce clean methods (as discussed below) we do not generally recommend this approach. It is easy for bugs to be introduced into this code by hand editing over time.

Hand Roll Clean Methods

Java 7 introduced the java.util.Objects class that makes implementing hashCode trivial. Guava provides the similar com.google.common.base.Objects class which may be used with earlier versions of Java.

    @Override
    public int hashCode() {
        return Objects.hash(field1, field2);
    }
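As an aside, java.util.Objects.hash produces the same value as the Eclipse-generated hashCode shown earlier when it is given the same fields in the same order, because it applies the same multiply-by-31 scheme to its arguments. The sketch below checks this; HashEquivalenceDemo and its two methods are hypothetical names used only for illustration.

```java
import java.util.Objects;

final class HashEquivalenceDemo {
    private HashEquivalenceDemo() {
    }

    // The long-hand calculation, as generated by the IDE.
    static int eclipseStyle(Object field1, Object field2) {
        final int prime = 31;
        int result = 1;
        result = prime * result + ((field1 == null) ? 0 : field1.hashCode());
        result = prime * result + ((field2 == null) ? 0 : field2.hashCode());
        return result;
    }

    // The one-line equivalent; null fields contribute 0 here too.
    static int objectsStyle(Object field1, Object field2) {
        return Objects.hash(field1, field2);
    }

    public static void main(String[] args) {
        System.out.println(eclipseStyle("a", 42) == objectsStyle("a", 42));
        System.out.println(eclipseStyle(null, "x") == objectsStyle(null, "x"));
    }
}
```

Switching from the generated form to Objects.hash therefore shortens the code without changing the hash values it produces.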

The Objects class also simplifies implementing equals a little by pushing most null checks into the Objects.equals method.

    @Override
    public boolean equals(Object obj) {
        if (this == obj) // <- performance optimisation
            return true;
        if (obj == null)
            return false;
        if (getClass() != obj.getClass()) // <- see note on inheritance
            return false;
        MyClass other = (MyClass) obj;
        return Objects.equals(field1, other.field1) &&
               Objects.equals(field2, other.field2);
    }

The first if statement is not logically required and could be safely omitted; it may, however, provide performance benefits.

Usually, we would recommend that such micro-optimizations are not included unless they have been proven to provide a benefit. In the case of equals methods, we suggest that the optimization is left in place. It is likely to justify itself in at least some of your classes and there is value in having all methods follow an identical template.

The example above uses getClass to check that objects are of the same type. An alternative is to use instanceof as follows:

    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (obj == null)
            return false;
        if (!(obj instanceof MyClass)) // <- compare with instanceof
            return false;
        MyClass other = (MyClass) obj;
        return Objects.equals(field1, other.field1) &&
               Objects.equals(field2, other.field2);
    }

This results in a behavioral difference: comparing an instance of MyClass with an instance of one of its subclasses can return true with instanceof but always returns false with getClass.

In Effective Java, Josh Bloch argues in favor of instanceof, as the getClass implementation violates a strict interpretation of the Liskov substitution principle.

However, if instanceof is used, it is easy for the symmetric property of the equals contract to be violated when a subclass overrides equals. For example:

    MyClass a = new MyClass();
    ExtendsMyClassWithCustomEqual b = new ExtendsMyClassWithCustomEqual();
    a.equals(b); // true
    b.equals(a); // false, a violation of symmetry
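A runnable version of this scenario might look like the sketch below. Label, ColoredLabel and SymmetryDemo are hypothetical classes invented for the illustration; the subclass adds a field and, reasonably enough, includes it in its own equals.

```java
import java.util.Objects;

// Hypothetical parent class using instanceof in equals.
class Label {
    private final String text;

    Label(String text) {
        this.text = text;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (!(obj instanceof Label))
            return false;
        return Objects.equals(text, ((Label) obj).text);
    }

    @Override
    public int hashCode() {
        return Objects.hash(text);
    }
}

// Hypothetical subclass that overrides equals to include its extra field.
class ColoredLabel extends Label {
    private final String color;

    ColoredLabel(String text, String color) {
        super(text);
        this.color = color;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (!(obj instanceof ColoredLabel))
            return false;
        return super.equals(obj) && Objects.equals(color, ((ColoredLabel) obj).color);
    }

    @Override
    public int hashCode() {
        return Objects.hash(super.hashCode(), color);
    }
}

final class SymmetryDemo {
    public static void main(String[] args) {
        Label a = new Label("x");
        ColoredLabel b = new ColoredLabel("x", "red");
        // The parent only looks at text, and b is a Label.
        System.out.println(a.equals(b)); // true
        // The subclass rejects anything that is not a ColoredLabel.
        System.out.println(b.equals(a)); // false
    }
}
```

Each equals method looks sensible in isolation; the contract is only broken by the combination of the two.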

If you find yourself in a situation where you need to consider the nuances of whether subclasses are equal to their parents then we strongly suggest you reconsider your design.

Having to think about maintaining the equals contract in a class hierarchy is painful and you shouldn’t need to put yourself, or your team, through this for normal server-side coding tasks.

In the majority of cases, if you think it makes sense for your class to implement hashCode and equals, we strongly suggest you make your class final so hierarchies do not need to be considered.

If you believe you have a case where it makes sense for subclasses to be treated as equivalent to their parent, use instanceof but ensure that the parent equals method is made final.
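A minimal sketch of that arrangement follows. Account and AuditedAccount are hypothetical classes; making hashCode final as well is our own addition, on the assumption that a subclass should not be able to break the pairing of the two methods.

```java
import java.util.Objects;

// Hypothetical parent class: equality is defined once, here, for the
// whole hierarchy and cannot be changed by subclasses.
class Account {
    private final String id;

    Account(String id) {
        this.id = id;
    }

    @Override
    public final boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (!(obj instanceof Account))
            return false;
        return Objects.equals(id, ((Account) obj).id);
    }

    // Assumption: also final, so it always stays consistent with equals.
    @Override
    public final int hashCode() {
        return Objects.hash(id);
    }
}

// Hypothetical subclass: adds behavior but cannot redefine equality.
class AuditedAccount extends Account {
    AuditedAccount(String id) {
        super(id);
    }
}

final class FinalEqualsDemo {
    public static void main(String[] args) {
        Account parent = new Account("42");
        Account child = new AuditedAccount("42");
        System.out.println(parent.equals(child)); // true
        System.out.println(child.equals(parent)); // true
    }
}
```

Because no subclass can override equals, the comparison gives the same answer in both directions and symmetry is preserved.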

Avoid relationships that are more complex than this.

Commons EqualsBuilder and HashCodeBuilder

The Apache Commons HashCodeBuilder and EqualsBuilder classes were once a popular way of generating these methods. We do not recommend their use in new code, as most of what they achieved is now provided by java.util.Objects without bringing in a third-party library, or by the Guava equivalent.

These classes do provide the option of a single-line, reflection-based implementation.

    public boolean equals(Object obj) {
        return EqualsBuilder.reflectionEquals(this, obj);
    }

    public int hashCode() {
        return HashCodeBuilder.reflectionHashCode(this);
    }

The brevity of these implementations is attractive, but their performance is measurably poorer than others discussed so far. Good performance tests and regular profiling can help determine whether a poorly performing method genuinely leads to performance bottlenecks in your application. If you are confident that you would detect such adverse impacts then using these methods as initial placeholder implementations may be a reasonable approach. But in general we suggest you avoid them.

Code Generators

A number of projects exist that can auto-generate value objects at build-time. Two of the better-known options are Google Auto and Lombok, but many others are available.

Google Auto

Google Auto will create a subclass with the obvious implementation of an abstract class annotated with @AutoValue. This implementation will include functioning hashCode and equals methods.

    import com.google.auto.value.AutoValue;

    @AutoValue
    abstract class Animal {
        static Animal create(String name, int numberOfLegs) {
            return new AutoValue_Animal(name, numberOfLegs);
        }

        Animal() {}

        abstract String name();
        abstract int numberOfLegs();
    }

This is clearly far less effort than hand crafting a complete Animal class, but there are some downsides.

Some of the issues with code generators are discussed in “Consider Code Generators Carefully”, which categorized them into friction and surprise.

Here, Google Auto introduces some friction as the code shown above will not compile within an IDE until the generator has run to produce the AutoValue_Animal class.

There is also some surprise.

Because it is a value, Animal would normally be implemented as a final class - but we have been forced to make it abstract. The team behind Auto recommend you add a package-private constructor to prevent other child classes being created.

Unlike normal Java, the order in which accessors are declared is important because it is used by the generator to define the order of the constructor parameters. Re-ordering the accessors can, therefore, have the surprising effect of introducing a bug.

Lombok

Lombok can also (amongst other things) generate full implementations of value objects.

It takes a different approach to Google Auto.

Given an annotated class such as:

    import lombok.Value;
    import lombok.experimental.NonFinal;

    @Value
    public class ValueExample {
        String name;
        @NonFinal int age;
        double score;
    }

It will alter the class at build-time to produce an implementation along the lines of:

    public final class ValueExample {
        private final String name;
        private int age;
        private final double score;

        public ValueExample(String name, int age, double score) {
            this.name = name;
            this.age = age;
            this.score = score;
        }

        public String getName() {
            return this.name;
        }

        public int getAge() {
            return this.age;
        }

        public double getScore() {
            return this.score;
        }

        public boolean equals(Object o) {
            // valid implementation of equality based on all fields
        }

        public int hashCode() {
            // valid hashCode implementation based on all fields
        }
    }

While Google Auto asks the programmer to provide a valid public API for a class, Lombok creates the public API based on a description of its internal state. The description is valid Java syntax but has a different meaning when interpreted by Lombok.

Lombok causes some friction. It is not practical to use Lombok without an IDE that understands it: code using the autogenerated API will appear to be invalid, so an IDE plugin must be installed.

While it (arguably) introduces less friction than Auto once the IDE plugin is installed, the behavior of Lombok is much more surprising. It is easy to explain what Auto does: it generates a class at build-time that implements the abstract class you define. It is much harder to predict or explain what Lombok will do.

Although Lombok requires the programmer to write less code than solutions such as Auto, it deviates further from normal Java.

If you consider using a code generator for value classes, we would recommend you consider approaches such as Auto before Lombok.

To its credit, Lombok does provide an escape route (see “Prefer reversible decisions”) in the form of delombok, which allows you to output the generated classes. These can then be used to replace the annotated originals.

Removing Auto is similarly straightforward - the generated classes can be checked into the source tree. The artificial abstract class/implementation split can then be removed via simple refactorings.