Wednesday, February 27, 2013

Entity Identities

Entity Identities - say that ten times really fast! Entity Identities; Entity Identities; Eddity Iddedy; Eddy Idy; Idddy Eddy; Identity Entity. Ah foobar!

A recent twitter conversation and stackoverflow question sparked some latent thoughts I've been having about identities for entities and where responsibilities lie.

If you're in a hurry: As a convention, use the string datatype for all identities. Don't worry about Guid's, ints, bigints, Uri's etc. String is all you'll ever need and provides the most flexibility. Let the data store dictate the storage data type and the identity creation strategy - not your code.

I'm referring in particular to surrogate identities, not natural domain identities or composite identitites. Think CustomerId or something less boring like, say, GalaxyId.

The Identity

Let's explore requirements for the identity of an entity for a moment. These can be broken down into two categories: requirements imposed by code that uses the identity and requirements imposed by persistence layer. Sure... if you want to be complete, include a third category for requirements imposed by humans, but who cares about them, right?

Requirements imposed by code that uses the identity:
  • The Id must be serializable so it can be transported across boundaries
  • The Id must be comparable for equality against another Id
  • The data type for the Id must have enough of a range to provide a unique value for every possible instance of an entity.
Note the complete absence of any need to perform any mutable operations on the identity. In fact, it's probably a good thing if the identity is immutable. We don't really even care about the data type used as long as it meets our needs. The most we need to do is transport it and compare it.

Requirements imposed by the persistence layer
Performant and scalable persistence layers typically use some form of key hashing to get O(1) look-up performance on entities. Think key/value stores or clustered indexes in RDBMS's. Based on this, we can derive some more requirements:
  • A hashcode must be easily computable from the Id
  • The Id must be comparable for equality against another Id
Some persistence stores impose limitations on the data types that can be used as an identity while others suggest best practices on how identities are generated (for example, randomized guid's are as well suited to clustered indexes as I am to being a potted plant). Given this reality, combined with the fact that the code generally has no restriction on the datatype of the Id, it's probably best to let the datastore dictate the data type as well as the Id creation strategy (sequential, random, magic etc.).

If the data store dictates the datatype of the identity, what happens if I need to change my data store from one that uses guids to one that uses strings, or integers? Don't I need to go and update all my code? Not if you used the widest possible datatype. In this case, strings. (By the way, you probably have bigger problems if you're changing your datastore technology!). 

Why do you need to represent a GalaxyId as a Guid in code, even if the datastore actually does store it as a UniqueIdentifier? What ya gonna do? Get the bytes of the Id?

If a string is good enough for URL's and ETag, it's good enough for your GalaxyId too.

There's probably some caveats with this strategy (mostly where ORM frameworks are concerned), but it's served me well in my quest to make code immutable.

Friday, February 22, 2013

Reviews and feedback


As I work to establish a functioning enterprise and system architecture practice at my company, I came up with some of the below guidance for reviews - code, design, or otherwise.

Anybody can do a review at any time. This includes System Admins, DBAs, Product Owners, UX, Architects, Developers, Leprechauns and Fairies. Everybody looks for something different and brings a fresh perspective, but all in the name of making a better product for our users and keeping Mr. Murphy at bay. In general, everybody, no matter how smart they think they are, can learn from anybody else (no matter how smart or stupid you may think they are).

Feedback must be well articulated, shared, constructive, and actionable. If you don't like something, but can't tell me why you don't like it or what's better, then that's not feedback... that's complaining.

Feedback must be tracked and actioned upon. You just wasted a valuable way to learn if you don't accept feedback. Put it in the backlog as a type of bug, track it, close it. Do your reviewer justice.

We're all human (architects barely so). Humans have feelings. You may hurt my feelings if you criticize my code masterpiece. Okayyyy then. I've written code that’s less than stellar and ignore my own principles from time to time. At no point should reviews be personal or derogatory to the author. The flip side is that no developer should be so attached to their code that if they receive a negative review they become offended. We need to build a culture of continuous improvement and learning, not big brother with a big stick.

There's probably others, but I feel the above is a good start.

Tuesday, February 19, 2013

The compiler is your friend

The compiler is your friend, but only if you’re constantly aware of its presence and leverage it to its full potential. Get to know its warnings and errors – it’s trying to tell you something about your code.

A simple example is how the design of a method can be changed around to get the compiler to tell you if you’ve forgotten something.

Friday, February 15, 2013

Security as a functional concern

As we embark on a pretty large system rewrite, one of my goals is to bring some traditionally just-in-time concerns to the forefront as first class citizens in design and implementation. One of these concerns is security and one of the ways I want to make it a first class citizen is to make sure it's injected at the source (via business requirements) so that it flows all the way through the development and QA pipelines and is verifiable at different stages. This isn't the only way I'd like to tackle security, but it's probably the most visible.

Given recent examples of security breaches that aren't a result of a technical bug in the system, ensuring business stakeholders and product owners take notice of security is an important goal.