Wednesday, February 27, 2013

Entity Identities

Entity Identities - say that ten times really fast! Entity Identities; Entity Identities; Eddity Iddedy; Eddy Idy; Idddy Eddy; Identity Entity. Ah foobar!

A recent Twitter conversation and Stack Overflow question sparked some latent thoughts I've been having about identities for entities and where the responsibilities lie.

If you're in a hurry: as a convention, use the string datatype for all identities. Don't worry about Guids, ints, bigints, Uris, etc. A string is all you'll ever need, and it provides the most flexibility. Let the data store dictate the storage data type and the identity creation strategy, not your code.

I'm referring in particular to surrogate identities, not natural domain identities or composite identities. Think CustomerId or something less boring like, say, GalaxyId.

The Identity

Let's explore the requirements for the identity of an entity for a moment. These can be broken down into two categories: requirements imposed by code that uses the identity, and requirements imposed by the persistence layer. Sure... if you want to be complete, include a third category for requirements imposed by humans, but who cares about them, right?

Requirements imposed by code that uses the identity:
  • The Id must be serializable so it can be transported across boundaries
  • The Id must be comparable for equality against another Id
  • The data type for the Id must have enough of a range to provide a unique value for every possible instance of an entity.
Note the complete absence of any need to perform any mutable operations on the identity. In fact, it's probably a good thing if the identity is immutable. We don't really even care about the data type used as long as it meets our needs. The most we need to do is transport it and compare it.
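To make the list above concrete, here is a minimal Python sketch of a string-backed identity. The `GalaxyId` class is hypothetical, as are its `to_json`/`from_json` helpers. A frozen dataclass gives immutability, value equality and a hash for free, and the id crosses a boundary as a plain JSON string. A bare `str` would satisfy the same requirements; the wrapper only adds some type safety.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class GalaxyId:
    """Hypothetical surrogate identity: an immutable wrapper around an opaque string."""

    value: str

    def __post_init__(self) -> None:
        # The content is opaque to the code; we only insist it is a non-empty string.
        if not isinstance(self.value, str) or not self.value:
            raise ValueError("GalaxyId must be a non-empty string")

    def __str__(self) -> str:
        return self.value

    def to_json(self) -> str:
        # Serializable: transported across boundaries as a plain JSON string.
        return json.dumps(self.value)

    @classmethod
    def from_json(cls, payload: str) -> "GalaxyId":
        return cls(json.loads(payload))


# Usage: the only operations we need are transport and comparison.
a = GalaxyId("andromeda-1")
b = GalaxyId.from_json(a.to_json())
print(a == b)  # True
```

Note that there is no way to mutate a `GalaxyId` after construction; assigning to `value` raises `dataclasses.FrozenInstanceError`.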

Requirements imposed by the persistence layer:
Performant and scalable persistence layers typically index entities by key to make look-ups fast. Think key/value stores, which hash the key to get O(1) look-ups, or clustered indexes in RDBMSs, which keep rows ordered by key. Based on this, we can derive some more requirements:
  • A hashcode must be easily computable from the Id (or, for ordered indexes, Ids must be sortable)
  • The Id must be comparable for equality against another Id
Some persistence stores impose limitations on the data types that can be used as an identity, while others suggest best practices for how identities are generated (for example, randomized GUIDs are as well suited to clustered indexes as I am to being a potted plant). Given this reality, and the fact that the code generally places no restriction on the datatype of the Id, it's probably best to let the data store dictate the data type as well as the Id creation strategy (sequential, random, magic, etc.).
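Here is a minimal Python sketch of letting the store, rather than the calling code, own the creation strategy. Everything in it is hypothetical: `InMemoryStore` stands in for a real data store, and `sequential_ids`/`random_ids` stand in for whatever strategy that store prefers. Whichever strategy is plugged in, the caller only ever sees a string, and the underlying dict hashes that string for O(1) average look-ups.

```python
import itertools
import uuid
from typing import Callable, Dict, Iterator


def sequential_ids(start: int = 1) -> Callable[[], str]:
    """Hypothetical strategy: monotonically increasing integers, rendered as strings."""
    counter: Iterator[int] = itertools.count(start)
    return lambda: str(next(counter))


def random_ids() -> Callable[[], str]:
    """Hypothetical strategy: random (version 4) UUIDs, rendered as strings."""
    return lambda: str(uuid.uuid4())


class InMemoryStore:
    """Toy key/value store: it decides how ids are created, not the calling code."""

    def __init__(self, new_id: Callable[[], str]) -> None:
        self._new_id = new_id
        # A dict hashes the string key, giving O(1) average look-ups.
        self._rows: Dict[str, dict] = {}

    def add(self, entity: dict) -> str:
        entity_id = self._new_id()
        self._rows[entity_id] = entity
        return entity_id

    def get(self, entity_id: str) -> dict:
        return self._rows[entity_id]


store = InMemoryStore(sequential_ids())
galaxy_id = store.add({"name": "Andromeda"})
print(galaxy_id)                    # 1
print(store.get(galaxy_id)["name"])  # Andromeda
```

Swapping `sequential_ids()` for `random_ids()` changes the shape of the ids but not a single line of the code that calls `add` and `get`.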

If the data store dictates the datatype of the identity, what happens if I need to move from a data store that uses GUIDs to one that uses strings or integers? Don't I need to go and update all my code? Not if you used the widest possible datatype, which in this case is a string. (By the way, you probably have bigger problems if you're changing your data store technology!)
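As a rough illustration of why the swap is painless, here is a Python sketch in which each storage adapter converts string ids to its native key type only at its own boundary. The `GalaxyRepository` protocol and both adapters are hypothetical stand-ins for a GUID-keyed store and an integer-keyed store; the application function `round_trip` depends only on strings.

```python
import uuid
from typing import Dict, Protocol


class GalaxyRepository(Protocol):
    """What application code depends on: string ids in, string ids out."""

    def save(self, name: str) -> str: ...

    def name_of(self, galaxy_id: str) -> str: ...


class GuidBackedRepository:
    """Hypothetical adapter for a store whose native key type is a UUID."""

    def __init__(self) -> None:
        self._rows: Dict[uuid.UUID, str] = {}

    def save(self, name: str) -> str:
        key = uuid.uuid4()
        self._rows[key] = name
        return str(key)  # the native UUID never leaves the adapter

    def name_of(self, galaxy_id: str) -> str:
        return self._rows[uuid.UUID(galaxy_id)]  # convert only at the boundary


class IntBackedRepository:
    """Hypothetical adapter for a store whose native key type is an integer."""

    def __init__(self) -> None:
        self._rows: Dict[int, str] = {}

    def save(self, name: str) -> str:
        key = len(self._rows) + 1  # toy sequence; fine because nothing is ever deleted
        self._rows[key] = name
        return str(key)

    def name_of(self, galaxy_id: str) -> str:
        return self._rows[int(galaxy_id)]


def round_trip(repo: GalaxyRepository) -> str:
    # Application code: never knows or cares what the native key type is.
    galaxy_id = repo.save("Andromeda")
    return repo.name_of(galaxy_id)
```

Both `round_trip(GuidBackedRepository())` and `round_trip(IntBackedRepository())` return `"Andromeda"`; only the adapter changes when the data store does.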

Why do you need to represent a GalaxyId as a Guid in code, even if the datastore actually does store it as a UniqueIdentifier? What ya gonna do? Get the bytes of the Id?

If a string is good enough for URLs and ETags, it's good enough for your GalaxyId too.

There are probably some caveats with this strategy (mostly where ORM frameworks are concerned), but it has served me well in my quest to make code immutable.
