Wednesday, February 27, 2013

Entity Identities

Entity Identities - say that ten times really fast! Entity Identities; Entity Identities; Eddity Iddedy; Eddy Idy; Idddy Eddy; Identity Entity. Ah foobar!

A recent Twitter conversation and Stack Overflow question sparked some latent thoughts I've been having about identities for entities and where the responsibilities lie.

If you're in a hurry: as a convention, use the string data type for all identities. Don't worry about Guids, ints, bigints, Uris, etc. String is all you'll ever need and it provides the most flexibility. Let the data store dictate the storage data type and the identity creation strategy, not your code.

I'm referring in particular to surrogate identities, not natural domain identities or composite identities. Think CustomerId, or something less boring like, say, GalaxyId.

The Identity

Let's explore the requirements for the identity of an entity for a moment. These can be broken down into two categories: requirements imposed by code that uses the identity and requirements imposed by the persistence layer. Sure... if you want to be complete, include a third category for requirements imposed by humans, but who cares about them, right?

Requirements imposed by code that uses the identity:
  • The Id must be serializable so it can be transported across boundaries
  • The Id must be comparable for equality against another Id
  • The data type for the Id must have enough of a range to provide a unique value for every possible instance of an entity.
Note the complete absence of any need to perform any mutable operations on the identity. In fact, it's probably a good thing if the identity is immutable. We don't really even care about the data type used as long as it meets our needs. The most we need to do is transport it and compare it.
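The requirements above can be sketched with a "branded" string type, which keeps the identity immutable, comparable, and trivially serializable while still preventing one kind of Id from being mixed up with another. The names here are illustrative, not a prescribed implementation:

```typescript
// A minimal identity value: an immutable, comparable, serializable string.
// The brand prevents accidentally passing, say, a CustomerId where a
// GalaxyId is expected, while the runtime representation stays a plain string.
type GalaxyId = string & { readonly __brand: "GalaxyId" };

function galaxyId(value: string): GalaxyId {
  return value as GalaxyId;
}

const a = galaxyId("7f9c-milky-way");
const b = galaxyId("7f9c-milky-way");

// Equality comparison and serialization come for free with string.
const equal = a === b;
const wire = JSON.stringify({ id: a });
```

Note that nothing here depends on how the value was generated or what the data store calls the column.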

Requirements imposed by the persistence layer
Performant and scalable persistence layers typically use some form of key hashing to get O(1) look-up performance on entities. Think key/value stores or clustered indexes in RDBMSs. Based on this, we can derive some more requirements:
  • A hashcode must be easily computable from the Id
  • The Id must be comparable for equality against another Id
Some persistence stores impose limitations on the data types that can be used as an identity, while others suggest best practices on how identities are generated (for example, randomized Guids are as well suited to clustered indexes as I am to being a potted plant). Given this reality, combined with the fact that the code generally has no restriction on the data type of the Id, it's probably best to let the data store dictate the data type as well as the Id creation strategy (sequential, random, magic, etc.).
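One way to keep the creation strategy out of application code is to hide it behind a small contract that each data store implements its own way. A minimal sketch, with entirely hypothetical names and strategies:

```typescript
// The code only depends on this contract; each data store plugs in its
// preferred strategy (sequential, random, etc.).
interface IdGenerator {
  next(): string;
}

// A store that favors sequential keys (friendly to clustered indexes).
class SequentialIdGenerator implements IdGenerator {
  private counter = 0;
  next(): string {
    return (++this.counter).toString().padStart(12, "0");
  }
}

// A store that favors random keys (e.g. for even partition distribution).
class RandomIdGenerator implements IdGenerator {
  next(): string {
    return Math.random().toString(36).slice(2, 14);
  }
}

const ids = new SequentialIdGenerator();
const first = ids.next(); // "000000000001"
```

Swapping stores then means swapping generators, not chasing Guid-vs-int types through the codebase.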

If the data store dictates the datatype of the identity, what happens if I need to change my data store from one that uses guids to one that uses strings, or integers? Don't I need to go and update all my code? Not if you used the widest possible datatype. In this case, strings. (By the way, you probably have bigger problems if you're changing your datastore technology!). 

Why do you need to represent a GalaxyId as a Guid in code, even if the datastore actually does store it as a UniqueIdentifier? What ya gonna do? Get the bytes of the Id?

If a string is good enough for URLs and ETags, it's good enough for your GalaxyId too.

There are probably some caveats to this strategy (mostly where ORM frameworks are concerned), but it's served me well in my quest to make code immutable.

Friday, February 22, 2013

Reviews and feedback

As I work to establish a functioning enterprise and system architecture practice at my company, I came up with some of the below guidance for reviews - code, design, or otherwise.

Anybody can do a review at any time. This includes System Admins, DBAs, Product Owners, UX, Architects, Developers, Leprechauns and Fairies. Everybody looks for something different and brings a fresh perspective, but all in the name of making a better product for our users and keeping Mr. Murphy at bay. In general, everybody, no matter how smart they think they are, can learn from anybody else (no matter how smart or stupid you may think they are).

Feedback must be well articulated, shared, constructive, and actionable. If you don't like something, but can't tell me why you don't like it or what's better, then that's not feedback... that's complaining.

Feedback must be tracked and acted upon. If you don't act on feedback, you've wasted a valuable way to learn. Put it in the backlog as a type of bug, track it, close it. Do your reviewer justice.

We're all human (architects barely so). Humans have feelings. You may hurt my feelings if you criticize my code masterpiece. Okayyyy then. I've written code that’s less than stellar and ignore my own principles from time to time. At no point should reviews be personal or derogatory to the author. The flip side is that no developer should be so attached to their code that if they receive a negative review they become offended. We need to build a culture of continuous improvement and learning, not big brother with a big stick.

There are probably others, but I feel the above is a good start.

Tuesday, February 19, 2013

The compiler is your friend

The compiler is your friend, but only if you’re constantly aware of its presence and leverage it to its full potential. Get to know its warnings and errors – it’s trying to tell you something about your code.

A simple example is how the design of a method can be changed around to get the compiler to tell you if you’ve forgotten something.

Friday, February 15, 2013

Security as a functional concern

As we embark on a pretty large system rewrite, one of my goals is to bring some traditionally just-in-time concerns to the forefront as first class citizens in design and implementation. One of these concerns is security and one of the ways I want to make it a first class citizen is to make sure it's injected at the source (via business requirements) so that it flows all the way through the development and QA pipelines and is verifiable at different stages. This isn't the only way I'd like to tackle security, but it's probably the most visible.

Given recent examples of security breaches that aren't a result of a technical bug in the system, ensuring business stakeholders and product owners take notice of security is an important goal.

Wednesday, January 30, 2013

Code should be immutable

Change by addition, not mutation. Putting the open-closed principle into practice.

If I write a unit test and some code that satisfies that unit test, I should be reasonably confident that I will never have to change that unit of code again. I may add more units of code and I may delete old units as soon as they become redundant, but I shouldn't put myself in a position to have to mutate an existing unit of code.

Within reason of course. Units of code aren't always atomic enough for this guidance to be considered a standard. The law of diminishing returns applies - be pragmatic in anticipating the ways the program will be required to change and pay particular attention to the probable extension points and axes of change.

I've realized a few benefits of this approach:
  • New code has fewer dependencies. I have much less risk if I introduce something new than if I modify something that exists.
  • I can have more control over when I replace an existing unit with a new unit. For example, I can run both side-by-side and compare their performance.
  • It helps me maintain a consistent velocity. Since the side effects of new code are less than changing existing code, I run into fewer unexpected problems and edge cases.
  • It gives me feedback on my code organization. If I cannot take this approach, it tends to be a good indicator of a violation of the single responsibility principle or proper encapsulation. It also means that in the future, I'll be spending all my time changing existing code to add new features and my velocity will tank.
  • I feel more creative when coding. Creating new code is more fulfilling than modifying existing code.
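Change by addition tends to look like this in practice: a stable contract, the old unit left untouched, and a new unit added beside it so both can be compared before the old one is retired. All the names below are illustrative, not a recipe:

```typescript
// The contract both units satisfy.
interface PriceCalculator {
  total(amounts: number[]): number;
}

// Existing unit: once its tests pass, it is never mutated again.
class SimplePriceCalculator implements PriceCalculator {
  total(amounts: number[]): number {
    return amounts.reduce((sum, a) => sum + a, 0);
  }
}

// New unit added alongside, here with a hypothetical bulk discount.
class DiscountingPriceCalculator implements PriceCalculator {
  total(amounts: number[]): number {
    const raw = amounts.reduce((sum, a) => sum + a, 0);
    return amounts.length >= 10 ? raw * 0.9 : raw;
  }
}

// Run both side by side and compare before flagging the old unit redundant.
const cart = [5, 10, 15];
const oldTotal = new SimplePriceCalculator().total(cart);
const newTotal = new DiscountingPriceCalculator().total(cart);
```

Once the new unit has proven itself, the old class is deleted whole rather than edited in place.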

Some things need to be in place for this to happen successfully:
  • As mentioned above, strong and consistent application of the single responsibility principle and open-closed principle.
  • Good discipline in ensuring the units of code are of the right size. Typically it's smaller than a class, but bigger than a method. Perhaps a nested class.
  • Ability to swap an old unit with a new unit with minimal mutation (doesn't have to mean IoC - just reduce the proliferation of the unit).
  • Ability to flag redundant units for safe removal and the discipline to actually remove them.
  • Ability to measure the ratio of mutation to addition/removal and see it trend in the right direction.

Go ahead, try it on a small scale and see how you feel. Does the artificial restriction of not being able to modify your existing code cause you to think more about your design? Does it hurt your velocity or help maintain or improve it?

Keen observers will notice that this is just a spin on well established OOP principles (SRP and OCP). What I find fascinating is that code and the systems it's contained in are like fractals... the principles and hence the way they look is the same from any level of zoom.

Wednesday, January 23, 2013

What does the web mean to me?

The web to me is not the browser. It's not REST, it's not JavaScript. It's not Facebook, nor is it Twitter. The web, to me, is the fact that I'm writing this blog post offline and on my mobile phone while on the subway home*.

Two concepts embody the web to me: seamless computing and graceful handling of occasionally connected scenarios.

Seamless computing (sometimes called "the cloud" in nontechnical contexts), is the notion that my profile and my data are accessible and understood no matter what device I'm on. My email appears on my computer and phone. My music appears on my home theater and my portable music player. My eBooks appear on my eInk device and my phone. Hell, my thermostat is even on my wall and my phone!

Dealing with occasionally connected scenarios is a bit trickier, but oh so necessary. A contrived example is that I don't need to be online to read my eBooks. A slightly more thoughtful example is how I'm authoring this post offline. An often overlooked scenario is: do I need to be online to search?

With many of today's "instant search" implementations, the answer is "yes", and there is no graceful fallback for offline scenarios. What would a graceful fallback for offline search look like?
  • Capture and save my search criteria
  • Wait until there is a connection to execute the search
  • Notify me asynchronously that you have results for me
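The three steps above can be sketched as a small queue. Everything here is hypothetical, including the shape of the `searchOnline` function standing in for a real search API:

```typescript
type SearchResult = { query: string; hits: string[] };

class OfflineSearchQueue {
  private pending: string[] = [];

  constructor(
    private searchOnline: (query: string) => Promise<string[]>,
    private notify: (result: SearchResult) => void,
  ) {}

  // Step 1: capture and save the search criteria.
  search(query: string): void {
    this.pending.push(query);
  }

  // Steps 2 & 3: when connectivity returns, run the saved searches
  // and notify the user asynchronously.
  async onConnected(): Promise<void> {
    const queries = this.pending.splice(0);
    for (const query of queries) {
      const hits = await this.searchOnline(query);
      this.notify({ query, hits });
    }
  }
}

// Usage sketch with a stand-in search function.
const results: SearchResult[] = [];
const queue = new OfflineSearchQueue(
  async (query) => [`${query} (result)`],
  (result) => results.push(result),
);
queue.search("andromeda");
// ...later, when the device regains connectivity:
await queue.onConnected();
```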

Is that a useful use case? It's better than nothing. Maybe I'm out and about in a land of roaming charges and I see something I'm interested in that I want to research later.

I hereby redefine the term "world wide web" to mean any set of loosely coupled applications that enable seamless computing and rich occasionally connected experiences. The browser is just another application in that ecosystem.

* Yep, all text was written on my Galaxy Nexus during my 1h commute home. In true tradition of seamless computing, formatting and linking was performed on my laptop via the blogger web interface while connected. The right tools for the right job at the right time all working together. This is my web.

Sunday, January 13, 2013

Quick comprehension and information density

I've become a big fan of the PechaKucha format for almost all types of presentations. I'm also a fan of bite-sized information from other sources like the Khan Academy and MinutePhysics - the barrier to entry for complex topics is low and I highly recommend that you check them out. I find that my attention rarely wavers from the topic being discussed - primarily due to the time constraints and talent of the presenters.

I've been inspired to incorporate some similar principles into other formats (technical documents for example). It's a fun challenge for the author to work within these types of constraints and the benefits of a focused and understanding audience should be obvious.

Here is what I set out to achieve, whether I'm giving a presentation in 20x20 format or authoring a high level design document:
  • Keep the audience focused
  • Ensure high information density with no filler
  • Ensure a high amount of cohesiveness within the topic being discussed
  • Aim to gain comprehension in as short an amount of time as possible

The 20x20 format (20 images, 20 seconds per image) provides a really good framework for guiding the author to the goals outlined above: You're forced to focus on the topic, you have a limited amount of time to discuss the topic (6 minutes and 40 seconds), and you have a limited area to visualize your concepts (20 images).

For technical documentation (like high level software architecture documents), here are some constraints I'm going to try imposing on myself and my team to gauge how effective it is. A document must:
  • Be able to be read and understood in under 15 minutes. This constrains the length of the document.
  • Be highly cohesive - all parts of the document must relate to the central concept being discussed.
  • Be independent - while it's OK for the document to reference other documents, reading those references must not be a requirement to the understanding of the topic of the document being read.
There are parallels here to good code design (e.g. the single responsibility principle, or highly cohesive, loosely coupled classes). I'm not sure if that's a coincidence or a sign of some universal guideline!

Like the 20x20 format, there will be challenges for authors to trim complex concepts down to these formats, but it's a worthwhile quest as the benefits are very large indeed.

I can't wait to attend my first organized PechaKucha event here in Toronto later this month.

Tuesday, January 8, 2013

Structured decision making

Making decisions that involve multiple possible options and multiple stakeholders is hard. It's hard because it's usually done in an unstructured way, typically involving a never-ending cycle of "pros & cons" that gets you into analysis paralysis until somebody makes an "executive decision" just because you've run out of time. As a consultant caught in the middle of a number of key decisions, I've come to rely on a process, derived from some structured-thinking principles, that reduces the time needed to make an informed and consensus-building decision.
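One common structured-decision technique is a weighted scoring matrix: agree on criteria and weights with the stakeholders first, then score each option against them. A minimal sketch, with entirely made-up criteria, weights, and scores:

```typescript
type Scores = Record<string, number>;

// Total score for one option: sum of (criterion weight x option score).
function weightedScore(weights: Scores, scores: Scores): number {
  return Object.keys(weights).reduce(
    (total, criterion) => total + weights[criterion] * (scores[criterion] ?? 0),
    0,
  );
}

// Hypothetical criteria agreed with stakeholders up front.
const weights = { cost: 0.5, risk: 0.3, fit: 0.2 };

const optionA = weightedScore(weights, { cost: 8, risk: 6, fit: 9 }); // ~7.6
const optionB = weightedScore(weights, { cost: 6, risk: 9, fit: 7 }); // ~7.1
```

Because the weights are negotiated before the options are scored, the conversation shifts from endless pros-and-cons to a bounded debate about what actually matters.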

Friday, January 4, 2013

Enterprise Information Management

Information management is difficult in any form, but there's no shortage of tooling that tries to address the need. My own needs are probably a subset of true ECM, but here are some of the more basic ones I would like to see. The main categories I care about are the authoring experience, the content features, the reading experience, and the organization and discoverability of information.