Friday, February 27, 2015


Estimates are a valuable tool that nobody really knows how to use properly or consistently. But with a bit of critical thinking we can fix that.

What's up with estimates?

Estimates suffer from unvalidated or uncommunicated assumptions, language differences, measurement differences (effort vs. calendar), and deliberate abstraction (points vs. hours, buffer, hidden budgets), and they fail to account for interruptions. We also rarely do the most basic thing that would help us improve our estimates over time – measure actuals and learn from them. Hell, we barely notice the time passing as we work. We’re not building estimation experience. Learning from actuals does assume a certain level of repeatability in what's being estimated: repeatable work gets a reliable estimate based on historical data, while new work gets a speculative one.
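
As a minimal sketch of what measuring actuals could look like, assume every finished task is logged with its estimated and actual hours. The names `TaskRecord`, `calibration_factor` and `adjusted_estimate` are hypothetical, the numbers are made up, and a single ratio is only one simple way of learning from history.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskRecord:
    """One finished task: what we guessed versus what it really took."""
    name: str
    estimated_hours: float
    actual_hours: float


def calibration_factor(history: list[TaskRecord]) -> float:
    """Total actual hours divided by total estimated hours.

    A value above 1.0 means we tend to underestimate.
    """
    total_estimated = sum(r.estimated_hours for r in history)
    total_actual = sum(r.actual_hours for r in history)
    if total_estimated <= 0:
        raise ValueError("need at least one task with a positive estimate")
    return total_actual / total_estimated


def adjusted_estimate(raw_hours: float, history: list[TaskRecord]) -> float:
    """Scale a new raw estimate by how far off we have been historically."""
    return raw_hours * calibration_factor(history)


# Illustrative numbers only, not real project data.
history = [
    TaskRecord("login page", estimated_hours=8, actual_hours=12),
    TaskRecord("report export", estimated_hours=4, actual_hours=6),
]
print(adjusted_estimate(10, history))
```

This only makes sense for the repeatable kind of work; for genuinely new work there is no history to scale by.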

Can we fix this? Absolutely.

We estimate every day and we're great at it. How long will it take me to get to work each morning? What time do I need to leave to pick up the kids? We're generally good at this type of thing - we've done it many times. We even know the threats to the estimate - traffic jams, transit delays, weather. Surely this scales to more complex systems? Not so much.

Situational Awareness

There are four situations for a project and recognizing which one applies most is important:
  1. The work just needs to get done as quickly and efficiently as possible.
  2. The work must be coordinated with other work.
  3. There is a limit on budget: an external constraint that the work must fit into.
  4. A continuous development: there is no limit on budget or time as long as there is an ongoing positive return on investment. 

The situations aren't really mutually exclusive, but one of them usually dominates at any given time. Projects can also morph from one situation into another, so it's important to recognize which situation applies and use the appropriate strategy for it.

How does estimation fit within these situations?

Situation #1: As quickly and efficiently as possible

It's not uncommon for a situation to arise where the work just needs to get done. Estimates provide low value in this type of situation and it's rarely worth going to the effort of generating them - it sucks up valuable time and produces inaccurate results anyway. This seems to be the foundation of the #NoEstimates movement as I understand it.

Far more important in this situation is a solid definition of "done" (a goal to reach) and a measurement of progress to that goal. These are the tools you use to define and manage the work, not estimation. 

Too many times, this situation is not recognized and taken advantage of. We go through the motions of providing estimates, which ultimately and unfortunately turn into an artificially created deadline. That morphs the situation into one with a limit on time and budget, often an inaccurate one, leading to undue stress, cut corners, and sacrificed quality.

Situation #2: The work must be coordinated with other work

Life has dependencies. Again with the travel analogy, flight arrival times are estimates and your family depends on them to determine when they should leave home to come pick you up from the airport. If the flight is delayed, things fall apart (sometimes with knock-on effects to other flights). To counteract this, updates to the estimate are provided via airline websites and a lot of work goes into perfecting the estimate in the first place.

Accurate up front estimates as well as constant readjustment are needed to keep the system as a whole from descending into chaos. If you look at the ultimate goal of a project, the delivery of the software is only one of many components - there's marketing, PR, training, financial forecasts - efforts that all go into making a product successful. Making and sticking to commitments is key to the success of the network of projects.

Some inaccuracy should be expected. The earlier the estimate is made the more inaccurate it will be. Refinement (both expansion and shrinkage) should be expected. Be flexible in what you expect from your dependencies, but be rigid in sticking to the commitments you make to your dependents. This simple rule will give the right amount of flex and predictability.

Situation #3: Constrained Budget, Time, or Resources

Infinite is not a concept that applies to money or time and almost all projects have a real limit placed on the amount that can be spent on them. There may be an available budget to achieve something or a timeslot to do it within. 

The question in this situation should be "How much can you do within this timeframe?", not "How long will it take you to do x?". Realize that the budget or timeframe is as much a goal or requirement as any of the features being proposed and design within it. 

We run into this all the time - do I have enough time to eat lunch today? Can I grab a coffee before I have to leave to catch the train? In project world it's usually a target date to coordinate with some event or it's a fixed budget you can spend. 

The only purpose an estimate serves in this situation is to determine if the work is worth starting at all. If the scope roughly matches up with the available timeframe, then treat it as Situation #1 (as quickly and efficiently as possible). If the scope and budget don't match at all, do something else or go back to the drawing board.

Situation #4: Continuous Improvement

In a long-lived agile product cycle, the budget or timescale spans many years and the goals change throughout that timeframe. In this case, estimates serve no purpose on the macro scale of the product. Instead the focus is on providing continuous returns to match the continuous investment.  

The important factor here is to ensure that value is being measurably delivered at a consistent pace that is in line with the investment being spent. Usually this is measured by introducing an artificial constraint on time (usually a "sprint") and executing many of these in sequence. Within the sprint, you switch over to Situation #3 (constrained time).

From time to time, a product lifecycle may call for a large effort towards a goal (the initial launch of V1 or a new major feature). This then morphs away from continuous improvement and into the territory of Situation #2: the work must be coordinated with other work.

The estimate producers and consumers

Typically in development, engineers produce estimates based on a set of information and previous experience and stakeholders consume estimates so they can make plans to align different activities or allocate budget. The "business" is commonly called out as a stakeholder of estimates and is accused of taking an estimate and turning it into a commitment.

A lot of developers don't realize that the business also makes estimates - be they financial (revenue, margin etc.) or non-financial (#customers, etc.). The business also runs in "sprints" of different lengths (yearly, quarterly, monthly, weekly, daily) with commitments usually being made on a yearly or quarterly basis and measured monthly for progress.  

One key difference seems to be a defined point of turning the "estimate" into a "commitment". In the business world, this is usually tied into the forecasting process for the next financial period (year or quarter - the "sprint" of the business) - the time when the estimates turn into a target to reach by any means possible. I've rarely seen this explicit transition from estimate to commitment in development and perhaps it's time it was formalized. 

The estimation funnel

In projects that match the situation of needing an estimate, how can we avoid the hell of a stakeholder latching on to an early guess at timelines and complaining when the timelines aren't met? Smart, explicit commitment, that's how. Developers tend to hide behind "it's just an estimate, and it'll change" or "it should take this as long as these arbitrary unvalidated assumptions are true". We're scared of commitment. This isn't tolerated in business - a solid commitment is expected when it comes to targets. How do we bridge the gap?

The more you know about the goal and environment the more accurate the estimate will be, but it takes time to gain knowledge of the task. Ask me today, I'll tell you it'll take 6-12 months to do. Ask me 2 weeks from now and I'll tell you it'll take 7-8 months to do. Ask me in the middle of the project and I'll tell you our target release date.

Various levels of estimation maturity need to be taken into account as illustrated in the following diagram:

The red line in the diagram above illustrates a potential lifecycle of an estimate that gets turned into a commitment. Notice how it varies widely from 50% underestimated in the early stages of discovery, but tends to solidify as the project is defined and designed?

The line representing the commitment point is rarely called out explicitly, but it should be. If it's not, the stakeholder will assume the estimate given on day 1 (the one typically 50% underestimated) is the commitment. There will be a constant push to make this commitment point earlier in the life cycle and that should be a goal to achieve, but not recklessly.
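
One way to make the commitment point explicit is to model it, so an estimate cannot silently become a commitment. The sketch below is only an illustration: the `Estimate` class and the `max_spread_months` threshold are my own assumptions, and the month ranges are the ones from the example above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Estimate:
    """A range in months that stays an estimate until someone commits to it."""
    low_months: float
    high_months: float
    committed: bool = False

    def spread(self) -> float:
        """Width of the range: the wider it is, the less we know."""
        return self.high_months - self.low_months

    def refine(self, low_months: float, high_months: float) -> "Estimate":
        """Return a new, still uncommitted estimate as knowledge improves."""
        return Estimate(low_months, high_months)

    def commit(self, max_spread_months: float) -> "Estimate":
        """Turn the estimate into a commitment, but only once it is narrow enough.

        The threshold is a hypothetical rule, not something the funnel dictates.
        """
        if self.spread() > max_spread_months:
            raise ValueError("range is still too wide to commit to")
        return replace(self, committed=True)


day_one = Estimate(low_months=6, high_months=12)
two_weeks_later = day_one.refine(low_months=7, high_months=8)
commitment = two_weeks_later.commit(max_spread_months=1)
print(commitment)
```

Calling `day_one.commit(max_spread_months=1)` raises, which is the point: the day 1 guess is visibly not a commitment.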

Budgets and Car Sales

For projects that have a constraint in budget or time, consider this analogy:

You want a new car so you go to a car dealership. The first question you're likely to be asked is "what is your budget?". You're coy and don't want to tell the dealer your budget: you want to see what they've got on offer and what their best price will be, and you're afraid that if you tell them your budget, they'll use 100% of it and won't give you any discount. You end up getting shown every car and trim level in the showroom, wasting time on options that you're not going to be able to afford. Only after you've made your choice, and the car sales person has taken 3 trips to their manager, do you agree on a price.

In that analogy, the stakeholder with the money is the person buying the car and the dealer is the developer trying to figure out the needs of their client without a critical piece of information: their target budget. A lot of time and effort could be saved with a little bit of trust and openness: Share the budget constraint with the supplier and help them arrive at the best option sooner. 

Developers and stakeholders are not in a buyer and seller relationship, so there's absolutely no need for the coy behavior of hiding the true budget. Provide the budget to the developer and ask them how much car they can give you for $X instead of leading them down the path of designing the world just to have to descope it to meet your constraints.

If you don't have a budget, create one based on an ROI in the business case. It's going to deliver $x margin this year, so allocate a % of that to developing it. Be realistic and open to feedback if your budget is not matching up with cost. 
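
The arithmetic is trivial, but writing it down keeps the budget honest. A rough sketch, where `development_budget`, `fits_budget` and all of the figures are hypothetical:

```python
def development_budget(expected_margin: float, share: float) -> float:
    """Allocate a share (between 0 and 1) of the expected yearly margin to development."""
    if not 0 <= share <= 1:
        raise ValueError("share must be between 0 and 1")
    return expected_margin * share


def fits_budget(estimated_cost: float, budget: float) -> bool:
    """True when the rough cost fits; if not, revisit the scope or the budget."""
    return estimated_cost <= budget


# Made-up figures: $400,000 expected margin, 25% of it allocated to development.
budget = development_budget(expected_margin=400_000, share=0.25)
print(budget, fits_budget(estimated_cost=120_000, budget=budget))
```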

From the developer point of view, treat the budget as a goal to fit within and design accordingly. It may not be possible, but work collaboratively to find a solution.

Project Estimation Cheat Sheet

Identify the situation of the project and employ the correct strategy.
  1. As quickly and efficiently as possible to reach a goal:
    • No estimates.
    • Unambiguous goal (definition of done)
    • Measured incremental progress towards the goal.
    • Corrective action if progress is not being made or goal moves.
  2. The work must be coordinated with other work:
    • Up front estimate, break into smaller atomic units if possible. 
    • Defined point of turning estimate into commitment.
    • Rigid in sticking to commitments
    • Flexible in dealing with missed commitments
    • Constant readjustment and update of target
  3. There is a constraint in budget or time
    • Make the constraint known and treat it as a requirement and input to the design
    • How much can be done within that constraint?
    • Estimate whether it is worth starting at all (is there enough time or budget to do an MVP?).
    • Turn it into #1 (as fast and as efficiently as possible) to push what MVP means in a positive way.
  4. Continuous Improvement
    • Establish a way of measuring ROI on a continuous basis. Maybe opex vs margin.
    • Split into sprints that resemble #3 (there is a constraint in budget or time) and iterate.
    • Recognize when you should switch to #1 or #2 temporarily. 

You may notice that estimates (as in how long will it take to do xyz?) are only ever required for projects in category #2 and the only legitimate reason for this type of estimate is to coordinate between activities. #3 calls for a different type of estimate (as in how much can you do in xyz time?), whereas #1 and #4 don't estimate at all.
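
For those who prefer a lookup table to a paragraph, the same summary can be sketched as data. The `Situation` enum and `estimate_question` function are names I made up for this illustration:

```python
from enum import Enum
from typing import Optional


class Situation(Enum):
    AS_FAST_AS_POSSIBLE = 1
    COORDINATED_WITH_OTHER_WORK = 2
    CONSTRAINED_BUDGET_OR_TIME = 3
    CONTINUOUS_IMPROVEMENT = 4


# Which estimation question, if any, each situation calls for.
ESTIMATE_QUESTIONS = {
    Situation.AS_FAST_AS_POSSIBLE: None,
    Situation.COORDINATED_WITH_OTHER_WORK: "How long will it take to do xyz?",
    Situation.CONSTRAINED_BUDGET_OR_TIME: "How much can you do in xyz time?",
    Situation.CONTINUOUS_IMPROVEMENT: None,
}


def estimate_question(situation: Situation) -> Optional[str]:
    """Return the question to ask, or None when no estimate is needed."""
    return ESTIMATE_QUESTIONS[situation]


for situation in Situation:
    print(situation.name, "->", estimate_question(situation))
```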


Thursday, March 20, 2014

Topics of Interest & Keeping Up To Date

Today, a friend asked me to recommend topics to help him get back into things with regards to technology and architecture. In line with the philosophy of conserving keystrokes, here's my braindump of interesting topics of personal research and interest.

Wednesday, February 27, 2013

Entity Identities

Entity Identities - say that ten times really fast! Entity Identities; Entity Identities; Eddity Iddedy; Eddy Idy; Idddy Eddy; Identity Entity. Ah foobar!

A recent twitter conversation and stackoverflow question sparked some latent thoughts I've been having about identities for entities and where responsibilities lie.

If you're in a hurry: As a convention, use the string datatype for all identities. Don't worry about Guid's, ints, bigints, Uri's etc. String is all you'll ever need and provides the most flexibility. Let the data store dictate the storage data type and the identity creation strategy - not your code.

I'm referring in particular to surrogate identities, not natural domain identities or composite identities. Think CustomerId or something less boring like, say, GalaxyId.

The Identity

Let's explore requirements for the identity of an entity for a moment. These can be broken down into two categories: requirements imposed by code that uses the identity and requirements imposed by the persistence layer. Sure... if you want to be complete, include a third category for requirements imposed by humans, but who cares about them, right?

Requirements imposed by code that uses the identity:
  • The Id must be serializable so it can be transported across boundaries
  • The Id must be comparable for equality against another Id
  • The data type for the Id must have enough of a range to provide a unique value for every possible instance of an entity.
Note the complete absence of any need to perform any mutable operations on the identity. In fact, it's probably a good thing if the identity is immutable. We don't really even care about the data type used as long as it meets our needs. The most we need to do is transport it and compare it.

Requirements imposed by the persistence layer:
Performant and scalable persistence layers typically get fast entity look-ups either by hashing the key (O(1) in key/value stores) or by keeping keys in an ordered index (clustered indexes in RDBMSs, which are typically B-trees rather than hashes). Based on this, we can derive some more requirements:
  • A hashcode must be easily computable from the Id
  • The Id must be comparable for equality against another Id
Some persistence stores impose limitations on the data types that can be used as an identity while others suggest best practices on how identities are generated (for example, randomized guid's are as well suited to clustered indexes as I am to being a potted plant). Given this reality, combined with the fact that the code generally has no restriction on the datatype of the Id, it's probably best to let the datastore dictate the data type as well as the Id creation strategy (sequential, random, magic etc.).

If the data store dictates the datatype of the identity, what happens if I need to change my data store from one that uses guids to one that uses strings, or integers? Don't I need to go and update all my code? Not if you used the widest possible datatype. In this case, strings. (By the way, you probably have bigger problems if you're changing your datastore technology!). 

Why do you need to represent a GalaxyId as a Guid in code, even if the datastore actually does store it as a UniqueIdentifier? What ya gonna do? Get the bytes of the Id?

If a string is good enough for URLs and ETags, it's good enough for your GalaxyId too.
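
To make the convention concrete, here is a minimal sketch in Python (not the language I'd normally write this in). `GalaxyId` wraps a string and offers nothing but equality, hashing and serialization. The two factory functions stand in for hypothetical data stores with different identity strategies; they are assumptions for illustration only.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class GalaxyId:
    """Opaque, immutable identity: transport it and compare it, nothing more."""
    value: str

    def __post_init__(self) -> None:
        if not isinstance(self.value, str) or not self.value:
            raise ValueError("identity must be a non-empty string")

    def __str__(self) -> str:
        # Serializing is just handing back the string.
        return self.value


def id_from_guid_store() -> GalaxyId:
    """Hypothetical store that generates random GUIDs."""
    return GalaxyId(str(uuid.uuid4()))


def id_from_integer_store(next_value: int) -> GalaxyId:
    """Hypothetical store that generates sequential integers."""
    return GalaxyId(str(next_value))


# The calling code is identical whichever store produced the identity.
galaxies = {id_from_integer_store(42): "Andromeda"}
print(galaxies[GalaxyId("42")])
```

Because the dataclass is frozen, equality and hashing come for free and the identity cannot be mutated after creation.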

There are probably some caveats with this strategy (mostly where ORM frameworks are concerned), but it's served me well in my quest to make code immutable.

Friday, February 22, 2013

Reviews and feedback

As I work to establish a functioning enterprise and system architecture practice at my company, I came up with the guidance below for reviews - code, design, or otherwise.

Anybody can do a review at any time. This includes System Admins, DBAs, Product Owners, UX, Architects, Developers, Leprechauns and Fairies. Everybody looks for something different and brings a fresh perspective, but all in the name of making a better product for our users and keeping Mr. Murphy at bay. In general, everybody, no matter how smart they think they are, can learn from anybody else (no matter how smart or stupid you may think they are).

Feedback must be well articulated, shared, constructive, and actionable. If you don't like something, but can't tell me why you don't like it or what's better, then that's not feedback... that's complaining.

Feedback must be tracked and acted upon. You've wasted a valuable chance to learn if you don't accept feedback. Put it in the backlog as a type of bug, track it, close it. Do your reviewer justice.

We're all human (architects barely so). Humans have feelings. You may hurt my feelings if you criticize my code masterpiece. Okayyyy then. I've written code that’s less than stellar and ignored my own principles from time to time. At no point should reviews be personal or derogatory to the author. The flip side is that no developer should be so attached to their code that they become offended when they receive a negative review. We need to build a culture of continuous improvement and learning, not big brother with a big stick.

There are probably others, but I feel the above is a good start.

Tuesday, February 19, 2013

The compiler is your friend

The compiler is your friend, but only if you’re constantly aware of its presence and leverage it to its full potential. Get to know its warnings and errors – it’s trying to tell you something about your code.

A simple example is how the design of a method can be changed around to get the compiler to tell you if you’ve forgotten something.

Friday, February 15, 2013

Security as a functional concern

As we embark on a pretty large system rewrite, one of my goals is to bring some traditionally just-in-time concerns to the forefront as first class citizens in design and implementation. One of these concerns is security and one of the ways I want to make it a first class citizen is to make sure it's injected at the source (via business requirements) so that it flows all the way through the development and QA pipelines and is verifiable at different stages. This isn't the only way I'd like to tackle security, but it's probably the most visible.

Given recent examples of security breaches that aren't a result of a technical bug in the system, ensuring business stakeholders and product owners take notice of security is an important goal.

Wednesday, January 30, 2013

Code should be immutable

Change by addition, not mutation. Putting the open-closed principle into practice.

If I write a unit test and some code that satisfies that unit test, I should be reasonably confident that I will never have to change that unit of code again. I may add more units of code and I may delete old units as soon as they become redundant, but I shouldn't put myself in a position to have to mutate an existing unit of code.

Within reason of course. Units of code aren't always atomic enough for this guidance to be considered a standard. The law of diminishing returns applies - be pragmatic in anticipating the ways the program will be required to change and pay particular attention to the probable extension points and axes of change.
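
A small sketch of what change by addition can look like in practice. The registry, the `exporter` decorator and both export formats are hypothetical; the point is only that the second format arrives as a new unit and `export_csv` is never touched.

```python
from typing import Callable

# Registry mapping a format name to the unit of code that handles it.
EXPORTERS: dict[str, Callable[[dict], str]] = {}


def exporter(fmt: str):
    """Register a new unit for a format; refuse to silently replace an old one."""
    def register(func: Callable[[dict], str]) -> Callable[[dict], str]:
        if fmt in EXPORTERS:
            raise ValueError(f"an exporter for {fmt!r} already exists")
        EXPORTERS[fmt] = func
        return func
    return register


@exporter("csv")
def export_csv(record: dict) -> str:
    return ",".join(str(value) for value in record.values())


# A later requirement is met by adding a unit, not by editing export_csv.
@exporter("kv")
def export_kv(record: dict) -> str:
    return ";".join(f"{key}={value}" for key, value in record.items())


def export(fmt: str, record: dict) -> str:
    """The extension point: look up the unit, never branch on the format."""
    return EXPORTERS[fmt](record)


print(export("csv", {"a": 1, "b": 2}))
print(export("kv", {"a": 1, "b": 2}))
```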

I've realized a few benefits of this approach:
  • New code has fewer dependencies. I have much less risk if I introduce something new than if I modify something that exists.
  • I can have more control over when I replace an existing unit with a new unit. For example, I can run both side-by-side and compare their performance.
  • It helps me maintain a consistent velocity. Since new code has fewer side effects than changes to existing code, I run into fewer unexpected problems and edge cases.
  • It gives me feedback on my code organization. If I cannot take this approach, it tends to be a good indicator of a violation of the single responsibility principle or proper encapsulation. It also means that in the future, I'll be spending all my time changing existing code to add new features and my velocity will tank.
  • I feel more creative when coding. Creating new code is more fulfilling than modifying existing code.

Some things need to be in place for this to happen successfully:
  • As mentioned above, strong and consistent application of the single responsibility principle and open-closed principle.
  • Good discipline in ensuring the units of code are of the right size. Typically it's smaller than a class, but bigger than a method. Perhaps a nested class.
  • Ability to swap an old unit with a new unit with minimal mutation (doesn't have to mean IoC - just reduce the proliferation of the unit).
  • Ability to flag redundant units for safe removal and the discipline to actually remove them.
  • Ability to measure the ratio of mutation to addition/removal and see it trend in the right direction.
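
The last point can be sketched as follows, assuming you can already count units added, removed and mutated per iteration (how you count them is left open). `ChangeStats`, `mutation_share` and `trending_down` are hypothetical names, and expressing the ratio as a share of all changes is just one possible definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeStats:
    """Counts of units of code touched in one iteration."""
    units_added: int
    units_removed: int
    units_mutated: int


def mutation_share(stats: ChangeStats) -> float:
    """Fraction of all changed units that were mutations of existing code."""
    total = stats.units_added + stats.units_removed + stats.units_mutated
    if total == 0:
        return 0.0
    return stats.units_mutated / total


def trending_down(history: list[ChangeStats]) -> bool:
    """True when the mutation share never increases from one iteration to the next."""
    shares = [mutation_share(stats) for stats in history]
    return all(later <= earlier for earlier, later in zip(shares, shares[1:]))


# Made-up counts for two iterations.
history = [
    ChangeStats(units_added=2, units_removed=0, units_mutated=2),
    ChangeStats(units_added=5, units_removed=1, units_mutated=2),
]
print([mutation_share(stats) for stats in history], trending_down(history))
```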

Go ahead, try it on a small scale and see how you feel. Does the artificial restriction of not being able to modify your existing code cause you to think more about your design? Does it hurt your velocity or help maintain or improve it?

Keen observers will notice that this is just a spin on well established OOP principles (SRP and OCP). What I find fascinating is that code and the systems it's contained in are like fractals... the principles, and hence the way they look, are the same from any level of zoom.