Friday, July 21, 2017

The need for immutability

Immutable objects are entities whose state, in the eyes of the external observer, doesn't change. This doesn't strictly mean that the object never changes internally. Rather, it means that, as far as the API consumer is aware, any exposed method can be called without affecting the outcome of any future call to any exposed method of the same instance.

This might be a confusing definition at first, but let's take it bit by bit. First, let's look at the seemingly heretical statement that an immutable object can change internally.

Consider the String class in Java. It is without doubt an immutable object. It encapsulates a char array, which it protects by copying it every time the API consumer requests it. Any instance method of the String class that performs a string operation returns a new String. If you look closely at the source code though, there is one field (in Java 8) that changes: the hash.
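To make the copying concrete, here is a minimal sketch of the same idea. Word is a hypothetical class of my own, not the real String source; it only assumes that the array is copied on the way in and on the way out.

```java
import java.util.Arrays;

// Hypothetical class, loosely modelled on the way String protects its array.
final class Word {
    private final char[] letters;

    Word(char[] letters) {
        // Copy on the way in, so later changes to the caller's array cannot leak in.
        this.letters = Arrays.copyOf(letters, letters.length);
    }

    char[] toCharArray() {
        // Copy on the way out, so the caller cannot touch the internal array.
        return Arrays.copyOf(letters, letters.length);
    }

    Word toUpperCase() {
        // Operations build a new Word instead of modifying this one.
        char[] upper = new char[letters.length];
        for (int i = 0; i < letters.length; i++) {
            upper[i] = Character.toUpperCase(letters[i]);
        }
        return new Word(upper);
    }

    @Override
    public String toString() {
        return new String(letters);
    }
}
```

Whatever the caller does with the array it passed in, or with the array it got back, the Word keeps its original letters.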

Calling the hashCode method of a string has a side effect. It caches the hashCode result so it won't have to compute it again. This is invisible to the external observer as all the future method calls will return the same result both before and after calling the hashCode method. Even the memory usage remains constant as the 4 bytes of the primitive int are already reserved.

This is not even a problem for multi-threading. The hash doesn't strictly need to be volatile: it will either be 0, and thus be recomputed (it is harmless if it's computed several times in parallel, or before other threads see the updated value), or it will already hold the final value. There is no intermediate state, since writing to an int is atomic.
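A minimal sketch of that caching pattern, using a hypothetical Point class of my own rather than the actual String source. The only assumptions are that 0 means "not computed yet" and that the hash depends solely on final fields.

```java
// Hypothetical immutable value class with a lazily cached hash.
final class Point {
    private final int x;
    private final int y;
    // Neither final nor volatile: 0 means "not computed yet".
    private int hash;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public int hashCode() {
        int h = hash;          // read the field once into a local
        if (h == 0) {
            h = 31 * x + y;    // depends only on final fields, so it is always the same
            hash = h;          // racing threads would all write the same value
        }
        return h;              // a hash that really is 0 is simply recomputed each time
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof Point)) {
            return false;
        }
        Point p = (Point) other;
        return x == p.x && y == p.y;
    }
}
```

The caller cannot tell whether the value came from the cache or from a fresh computation, which is exactly why the mutation stays invisible.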

Immutable objects enjoy such optimisation delights precisely because they are immutable: you can call hashCode a million times and you'll get the same result, cached or not.

Strings can also be interned (kept in a pool of reusable strings), and boxed Integers (Integer.class) are cached for values from -128 to 127. Yes, immutability can also enable easy memory optimisations.
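Both forms of sharing can be observed with reference comparisons. The sketch below is only an illustration; PoolingDemo and its method names are made up, and it deliberately sticks to 127 because the upper bound of the Integer cache can be raised by JVM configuration.

```java
// Hypothetical demo class; the method names are made up for illustration.
final class PoolingDemo {

    // True when a string built at runtime, once interned, is the very same
    // object as the literal "dog".
    static boolean internedStringIsShared() {
        String built = new String(new char[] {'d', 'o', 'g'}); // a fresh object
        return built != "dog" && built.intern() == "dog";
    }

    // True when two boxings of a small value yield the same cached object.
    static boolean smallIntegerIsShared() {
        Integer a = Integer.valueOf(127);
        Integer b = Integer.valueOf(127);
        return a == b; // reference comparison on purpose
    }
}
```

Sharing instances like this is only safe because nobody can change them after creation.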

API considerations


Who is the consumer of the object? There are three types of consumers: the developer who interacts with an object via its API; the object itself, including internally defined classes; and a special case of consumers that break the immutability contract because "they know what they are doing".


The developer as a user


The first consumer is the one you need to worry about the most. Every public method you define in your class is a contract between you and the API user. If the public methods change the state, you not only need to maintain them but also to handle every possible state error at every entry point.

To demonstrate this, let's have a look at my favourite example:

final class Dog {
    public final int barkLevel;
    public final String name;
    public Dog(int barkLevel, String name) {
        if (barkLevel < 0)
            throw new IllegalArgumentException("Bark level cannot be a negative number");
        if (name == null || name.isEmpty())
            throw new IllegalArgumentException("A dog needs a name");
        this.name = name;
        this.barkLevel = barkLevel;
    }
}

This is an immutable dog. Once it's created neither its name nor its barkLevel can change. Of course this is not true in real life and we'll come back to that.

The benefit here is that any Dog instance, once you have it, can be safely used forever. As far as the external observer is aware, there is no way to end up with a Dog that has no name or a negative bark level. So the rest of the codebase need not worry about validating any Dog property.

In real life, the bark level can change (even the name, in rare cases). But although you can easily model real life in OOP, a computer program remains a different world with its own domain and semantics. Here the semantics clash a bit: with no setter, the model says this dog will have this bark level forever. In the software world we can model the change with a "with" method that returns a new dog, preserving the real-life semantics.

Also remember that this is not a real dog but it's the idea of what our program thinks of a dog. Thus, you can think of getting a new idea of what the dog is, instead of thinking in terms of physically getting a new dog because the bark level changed.
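Here is a minimal sketch of such a with method on the immutable Dog. The name withBarkLevel is my own choice for illustration; the point is only that it goes through the same constructor, so the new dog is validated just like the first one.

```java
final class Dog {
    public final int barkLevel;
    public final String name;

    public Dog(int barkLevel, String name) {
        if (barkLevel < 0)
            throw new IllegalArgumentException("Bark level cannot be a negative number");
        if (name == null || name.isEmpty())
            throw new IllegalArgumentException("A dog needs a name");
        this.name = name;
        this.barkLevel = barkLevel;
    }

    // Hypothetical "with" method: returns a new, fully validated Dog and
    // leaves this instance untouched.
    public Dog withBarkLevel(int newBarkLevel) {
        return new Dog(newBarkLevel, name);
    }
}
```

Anyone still holding the old reference keeps seeing the old bark level, and an invalid level is rejected before a new dog ever exists.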

Now consider having this type of implementation, which is the commonest among Java codebases:

class Dog {
    private int barkLevel;
    private String name;
 
    public int getBarkLevel() { return barkLevel; }
    public String getName() { return name; }
    public void setBarkLevel(int barkLevel) { this.barkLevel = barkLevel; }
    public void setName(String name) { this.name = name; }
}

You can argue that you can add validation in every setter method. Even so, your dog instance can break at any point, and you'll need a new dog or a way back to the previous one. How do you manage failures? You need to remember previous states and recover the dog, or add logic that prevents the state from ever changing to an erroneous one. All of this just adds more technical debt.
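To see how a validated setter can still leave you with a broken dog, consider this sketch. MutableDog and DogUpdater are hypothetical names, and the update is assumed to be meant as all-or-nothing.

```java
// Hypothetical mutable dog with validation in every setter.
final class MutableDog {
    private int barkLevel;
    private String name;

    MutableDog(int barkLevel, String name) {
        setBarkLevel(barkLevel);
        setName(name);
    }

    int getBarkLevel() { return barkLevel; }
    String getName() { return name; }

    void setBarkLevel(int barkLevel) {
        if (barkLevel < 0)
            throw new IllegalArgumentException("Bark level cannot be a negative number");
        this.barkLevel = barkLevel;
    }

    void setName(String name) {
        if (name == null || name.isEmpty())
            throw new IllegalArgumentException("A dog needs a name");
        this.name = name;
    }
}

final class DogUpdater {
    // Hypothetical update that is supposed to apply both changes or neither.
    static void update(MutableDog dog, String name, int barkLevel) {
        dog.setName(name);           // succeeds and mutates the dog
        dog.setBarkLevel(barkLevel); // may throw, leaving only the name changed
    }
}
```

If the second setter throws, the dog carries the new name with the old bark level, a combination nobody asked for. With the immutable version the constructor either gives you a complete dog or nothing at all.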

Builders

The second consumer, which is also important, is the object's class definition itself. It can incorporate all the mutability that is needed in order to create the pre-defined immutable object.

For instance, consider that you have quite a few object properties. Calling a constructor with more than 3 parameters (or any method in fact) is inconvenient. What you need is a builder.

The builder manages the mutability: you call method after method to define the desired state of the object. This is the concept behind StringBuilder as well. That way you also address some performance considerations, since you don't need to create and discard N objects for N properties.
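A hand-written builder might look like the sketch below. DetailedDog and its breed and age properties are hypothetical additions for illustration; the assumption is that all validation happens once, when build() is called.

```java
// Hypothetical immutable class with more properties than a constructor call
// can comfortably take.
final class DetailedDog {
    public final String name;
    public final String breed;
    public final int age;
    public final int barkLevel;

    private DetailedDog(Builder builder) {
        if (builder.name == null || builder.name.isEmpty())
            throw new IllegalArgumentException("A dog needs a name");
        if (builder.age < 0 || builder.barkLevel < 0)
            throw new IllegalArgumentException("Age and bark level cannot be negative");
        this.name = builder.name;
        this.breed = builder.breed;
        this.age = builder.age;
        this.barkLevel = builder.barkLevel;
    }

    static Builder builder() {
        return new Builder();
    }

    // All the mutability lives here and never escapes into the built object.
    static final class Builder {
        private String name;
        private String breed = "unknown";
        private int age;
        private int barkLevel;

        Builder name(String name) { this.name = name; return this; }
        Builder breed(String breed) { this.breed = breed; return this; }
        Builder age(int age) { this.age = age; return this; }
        Builder barkLevel(int barkLevel) { this.barkLevel = barkLevel; return this; }

        DetailedDog build() {
            return new DetailedDog(this); // validation happens once, here
        }
    }
}
```

A call then reads as DetailedDog.builder().name("Rex").age(4).build(), and an incomplete builder fails at build() instead of producing a half-initialised dog.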

Creating a builder, though, is often a burden, since you need to write a lot of boilerplate code. Fear not: there are libraries that generate that code for you at compile time (e.g. Lombok for Java).

What about changing a single property? For POJOs it's tempting to have setter methods, because they seem cheap and they change the state of a single object with no copies involved. Writing a method that gives you the same object with a single property changed again involves a lot of boilerplate code, and it doesn't seem efficient either.

All of this can be automated though, either by macros or by using libraries such as Lombok. In Lombok's case you can just use @Wither (renamed @With in later Lombok releases), which gives you a with method for every field that you want to be changeable. The performance overhead is minimal, since the copy of the properties is shallow.

Deserialising objects


Finally, we have the third consumer: the one that breaks our immutability contract because it knows what it's doing. There is even a way to avoid that, but we'll discuss it at the end.

Such consumers are normally serialisation libraries, such as Gson for JSON or Hibernate for database entities. What they do in the most common configuration is instantiate an object and, for each field, try to find a setter method whose name is the field name prefixed with "set" in camel case. Configured appropriately for immutable objects, they will instantiate the object and reflectively assign a value to each of its fields. Even final fields can be altered at runtime, in the case of Java at least.

Now the assumption here is that immutability is broken only within a limited scope: the method that does the deserialisation. At the end you get a reference to a deserialised object, not a reference to an object that is still being deserialised.
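The reflective assignment can be sketched with plain java.lang.reflect. FrozenDog and ReflectiveWriter are hypothetical names, and this is not how any particular library is implemented. It works on the JDKs I know of for ordinary classes, but newer releases may warn about or restrict writes to final fields.

```java
import java.lang.reflect.Field;

// Hypothetical immutable class with public final fields.
final class FrozenDog {
    public final int barkLevel;
    public final String name;

    FrozenDog(int barkLevel, String name) {
        this.barkLevel = barkLevel;
        this.name = name;
    }
}

final class ReflectiveWriter {
    // Hypothetical helper: overwrites a final instance field, roughly the way
    // a deserialiser populates an object it has just instantiated.
    static void overwrite(Object target, String fieldName, Object value) {
        try {
            Field field = target.getClass().getDeclaredField(fieldName);
            field.setAccessible(true); // needed before a final field can be written
            field.set(target, value);  // boxed values are unwrapped for primitive fields
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not set " + fieldName, e);
        }
    }
}
```

Note that nothing here goes through the constructor, so any validation it performs is bypassed. That is tolerable only as long as the half-populated object never leaves the deserialising method.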

Given the right configuration, some libraries can call a constructor directly with all the arguments it needs. For example, Jackson has a set of annotations that you can use to map each JSON field to a constructor parameter.

Semantics


We've seen the three consumers; now we need to go back to the most important one: the human developer. So far we've seen that the benefit we give developers is that they need not worry about an object being in an invalid state.

Immutable objects give great semantics to the external observer as well. Consider the following definitions of an Exception:

final class JsonTypeCastingDecodingFailure extends Exception {
    public JsonTypeCastingDecodingFailure(String fieldName, Class<?> expectedType, Class<?> found) {
        super(String.format("%s cannot be cast to %s from %s", fieldName, expectedType.getName(), found.getName()));
    }
}

class JsonTypeCastingDecodingFailure extends Exception {
    private String message;
    public JsonTypeCastingDecodingFailure(String fieldName, Class expectedType, Class found) {
        this.message = String.format("%s cannot be cast to %s from %s", fieldName, expectedType.getName(), found.getName());
    }
    public void setMessage(String message) {
        this.message = message;
    }

    public String getMessage() {
        return message;
    }
}


The second definition makes no sense. What are the semantics of an error whose message anyone is allowed to alter?

Semantics are often ignored during programming. Most developers have been doing the same wrong things over and over again until they have become the norm: adding getters and setters is one of them, and suffixing every exception class name with Exception is another (but that is a story for another time).


Summary


Every public method you provide is a contract: it has a purpose and a meaning, and it allows interpretations, some of which are the wrong ones. Semantics are the thing that everyone cares about only once they have inherited legacy code.

State changes need to be managed in a restricted area where a Facade provides the minimum API to do one thing and the internal state is encapsulated and protected.

My final advice is an old but often neglected one: Make something public if and only if it needs to be public. A setter method will never pass that condition.