
Save Time And Money From The Coin Toss Accuracy Of Delivery Time Estimation Sessions

Have you ever felt the pressure of not knowing when a project will be finished? Well, instead of sitting through long meetings, we used past project info in a smart way to estimate future deadlines. It showed the estimations could actually be more reliable and less of a wild guess.

This was a “for fun only” POC project that I did in a previous role.

The sales team at this company pushed for an accurate time frame for product delivery. We were expected to run product delivery the same way we ran projects with Gantt charts… not saying Gantt charts are not good. However, tension was high, as we often felt forced to commit to deadlines we weren’t so sure of.

We (product/tech) often tried to avoid committing to a timeframe for a couple of reasons. First, estimating the delivery time cost us some serious man-hours that could have gone into work that brings actual value to the customers. Second, especially when dealing with legacy systems, it’s hard to give an accurate estimate because of all the unknown variables.

So I thought I’d have some fun with the historical delivery data in JIRA, to see if I could reduce some of the time wasted on planning meetings by estimating delivery dates from historical records. If this approach worked, great! I’d have helped improve efficiency; if not… well… at least I had some fun 😉

Photo by charlesdeluvio on Unsplash

Project Estimation: A Problem Exists In All Companies

Product managers in IT are (very frequently) asked to estimate a project’s delivery time. A lot of product managers and techies tend to avoid giving a specific timeframe, because with a complex or legacy system it’s near impossible to give an accurate estimate. There’s nothing worse than making a promise that you can’t keep.

If I put on my product manager hat, I’d be thinking:

“The system is so messy! And we end up introducing new bugs every time we release an update. How can I possibly give you an estimate?”

“Arrgh! The roadmap is changing every minute! When are we going to start doing instead of just estimating?”

Now, to put on my founder’s hat for a moment. Of course, I fully understand the reasoning and value behind having an estimate for a product delivery date. Founders, sales, marketing, and people in similar roles have to tell customers up front what they can expect, and that includes the timeframe. And we’d slowly lose the trust of our customers by committing to things that in the end couldn’t be delivered on time.

So what did we do to avoid the loss of confidence? We turned to our product managers and asked them to manage the team the way a project manager would (sort of), i.e. give me the estimate and make sure you deliver on time.

Team Setup

Of course I ran this experiment before I was ever a founder, so I just hoped to come up with something to shut their mouths 😉 Again, this was a just-for-fun project. Right, now that you have the background, here’s our team setup:

  • Four squads
  • Each squad had at least two projects running in parallel, i.e. at least eight deliverables at any given time
  • The setup of each squad was very similar. The majority of team members remained unchanged for about a year, so the velocities were in a reasonably well-calibrated range

The problems

  1. Between closing new deals with clients, new product ideas, new C-level executives, our 1-year roadmap got shuffled around on a monthly basis
  2. ROI played a big part in our roadmap prioritization. To get ROI, we needed to estimate the cost, i.e. (development time) x (avg. hourly rate of the squad)
  3. Between switching from one deliverable to another and fixing the issues from the legacy systems, we ended up late on delivery way too often (see the data below). The actual time spent on a feature turned out to be a million miles away from the initial estimate.

I found out that the accuracy of our delivery time estimation was like… randomly calling heads or tails throughout a series of coin tosses (just a figure of speech). The distribution, of course, was not like a coin toss but a long-tail distribution: about 25% of tickets took significantly longer to complete than initially estimated.

  • In 50% of tickets the time spent was within 20% over or under the estimate
  • About 25% fell between no work done (-100%) and significantly under the estimate (-100% → -80%)
  • The other 25% made up the long tail of overruns

Estimation Meetings Can Be More Costly Than You Think

Now, let’s talk about the cost:

  1. Typically we spent two hours in an estimation meeting, and sometimes it took more than two meetings to estimate one deliverable
  2. The people who typically attended the estimation meetings were: head of dev, head of product, team lead, product manager, program manager, and (sometimes) the CTO. Based on the averages in a 2020 IT salary survey, their combined annual salary is about £580k, or roughly £285 per hour before tax
  3. We normally ended up changing half of the deliverables in the coming six months. Some projects lasted longer, others shorter (about a month). There were normally eight projects running in parallel. Let’s say each squad had one three-month project and one one-month project running at all times. That’s 12 small projects and four big projects, per year, per squad

For every deliverable to be on the roadmap, we needed a time estimate to work out ROI, which meant at least one estimation meeting for each. So we had (12 small projects + 4 big projects) * 4 (squads) = 64, i.e. at least 64 project time estimation meetings per year. Since we spent around 2 hours per meeting, we were looking at 2 hours/meeting * £285 combined hourly rate of all participants * 64 meetings per year = £36,480. That is the equivalent of one junior full-time hire per year.
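For anyone who wants to plug in their own numbers, the arithmetic above can be written as a tiny Python sketch. The figures are the ones quoted in this section; the variable names are only illustrative.

```python
# Figures taken from the text above; swap in your own.
SMALL_PROJECTS_PER_SQUAD = 12   # one-month projects per squad per year
BIG_PROJECTS_PER_SQUAD = 4      # three-month projects per squad per year
SQUADS = 4
HOURS_PER_MEETING = 2
HOURLY_RATE_GBP = 285           # combined hourly rate of all participants

# At least one estimation meeting per deliverable.
meetings_per_year = (SMALL_PROJECTS_PER_SQUAD + BIG_PROJECTS_PER_SQUAD) * SQUADS
annual_cost_gbp = meetings_per_year * HOURS_PER_MEETING * HOURLY_RATE_GBP

print(f"{meetings_per_year} meetings per year, costing GBP {annual_cost_gbp:,}")
```

This is a lower bound, since it assumes exactly one two-hour meeting per deliverable.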

Task

To be fair, £36k is nothing for a lot of companies. What was the point of having the meetings, if the result was pretty much a known distribution? Even if we used those hours for people to just socialize and create bonds within the team it would have been a better way to spend the time.

Since we had a year without many changes to personnel or process, I assumed that the estimation and delivery data would follow some sort of stable distribution. If so, then we could use it to predict the timeframe of our next deliverables. Then we could confidently go to the C-level stakeholders or sales and say “we estimated this deliverable can be done in around x (time) with y confidence level, so we are looking at a cost of z” WITHOUT having to spend time on the estimation meetings anymore!!

Sounds lovely, doesn’t it?

To achieve this, I needed the following information:

  • Delivery velocity
  • Size of each ticket
  • Correlation between the ticket size and the actual time spent
  • An estimate of how many tickets a project needs. It’d be an educated guess from the BAs and tech leads

Feel free to skip and go directly to the conclusion if you aren’t interested in the data crunching steps in between.

Raw data from JIRA

Alright, here’s the boring part. Below is the data I exported from JIRA:

  • Issue key: this was just the internal key we used, masked in the screenshot, as it is an index for me and not relevant to the result
  • Issue Type: story, bug, hotfix
  • Sum Original Estimate (unit: seconds)
  • Sum Time Spent (unit: seconds)

The data covered one year, starting a couple of months after we restructured the team. In other words, the range I selected for this experiment began once the squads had started to gel as a team.

Clean up and categorize data

I removed the tickets that had no record of either original estimation or the time spent, and categorized the raw data based on ticket types.
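As a rough illustration, the clean-up step could look like this in Python. The rows and field names below are hypothetical stand-ins for the JIRA export, and treating an empty cell as `None` is an assumption.

```python
from collections import defaultdict

# Hypothetical rows shaped like the JIRA export; times are in seconds.
raw_tickets = [
    {"key": "T-1", "type": "Story", "estimate": 14400, "spent": 18000},
    {"key": "T-2", "type": "Bug", "estimate": None, "spent": 3600},
    {"key": "T-3", "type": "Bug", "estimate": 7200, "spent": 5400},
    {"key": "T-4", "type": "Hotfix", "estimate": 3600, "spent": None},
    {"key": "T-5", "type": "Story", "estimate": 28800, "spent": 30600},
]

def clean_and_group(tickets):
    """Drop tickets missing an estimate or a time spent, then group by type."""
    grouped = defaultdict(list)
    for ticket in tickets:
        if ticket["estimate"] is None or ticket["spent"] is None:
            continue  # no record to compare against, so skip it
        grouped[ticket["type"]].append(ticket)
    return dict(grouped)

tickets_by_type = clean_and_group(raw_tickets)
for ticket_type, rows in tickets_by_type.items():
    print(ticket_type, [row["key"] for row in rows])
```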

Generate Data

Here’s a list of the data I generated from the raw data. Everything was calculated per ticket type, i.e. story, bug, and hotfix. Within each type I had:

  • Time difference between estimation and actual time spent (per ticket) = Sum time spent - Sum original estimate
  • Prediction accuracy (per ticket) = (Sum time spent - Sum original estimate) / Sum original estimate
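The two per-ticket metrics can be sketched in Python as below. The sign convention is an assumption chosen to match the histogram described later: time spent minus estimate, so -100% means no time was logged and positive values mean an overrun. The function names are illustrative.

```python
def time_difference(estimate_s, spent_s):
    """Seconds over (+) or under (-) the original estimate."""
    return spent_s - estimate_s

def prediction_accuracy(estimate_s, spent_s):
    """Relative miss: 0.0 is spot on, -1.0 is no time logged, 1.0 is double the estimate."""
    return (spent_s - estimate_s) / estimate_s

# Hypothetical ticket: estimated at 4 hours, took 5 hours.
estimate_s, spent_s = 4 * 3600, 5 * 3600
print(time_difference(estimate_s, spent_s))      # 3600
print(prediction_accuracy(estimate_s, spent_s))  # 0.25
```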

Since I was not interested in counting time to the minute, I also created time intervals, so that tickets with a similar actual time spent ended up in the same bin or an adjacent one.
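A minimal sketch of that binning, assuming fixed-width intervals. The four-hour width and the sample values are hypothetical, not the ones used in the spreadsheet.

```python
from collections import Counter

BIN_WIDTH_S = 4 * 3600  # hypothetical four-hour interval, in seconds

def bin_start(spent_s, width_s=BIN_WIDTH_S):
    """Return the lower edge of the interval that contains spent_s."""
    return (spent_s // width_s) * width_s

spent_seconds = [3000, 5400, 14400, 16200, 30600]  # hypothetical tickets
histogram = Counter(bin_start(s) for s in spent_seconds)

for start in sorted(histogram):
    end = start + BIN_WIDTH_S
    print(f"{start // 3600}h to {end // 3600}h: {histogram[start]} ticket(s)")
```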

How Accurate Is Our Project Time Estimation?

The histogram above is the result of the data manipulation. The y-axis is the ticket count and the x-axis is the prediction accuracy. A ticket in the 0% bin means the estimate equalled the actual time spent; a ticket anywhere to the left of 0% means the actual time spent was less than the estimate, and vice versa. The bar height indicates how many tickets fall within the bin. On the far right-hand side is a bin named “Over”, which collects the tickets with time spent 2X more than the original estimate.

You can see the following trends from my initial analysis:

  • For 50% of tickets, the time spent was within 20% over or under the original estimate
  • About 25% of the work was done well under the estimate
  • And another 25% was part of that long tail in the diagram
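To reproduce this kind of split from a list of per-ticket accuracies, a sketch like the following would do. The ±20% tolerance comes from the text; the sample values are made up purely to show the mechanics.

```python
def summarise(accuracies, tolerance=0.20):
    """Share of tickets within, under, and over the tolerance band."""
    total = len(accuracies)
    within = sum(1 for a in accuracies if abs(a) <= tolerance)
    under = sum(1 for a in accuracies if a < -tolerance)
    over = sum(1 for a in accuracies if a > tolerance)
    return {"within": within / total, "under": under / total, "over": over / total}

# Hypothetical accuracies: -1.0 means no time logged, 2.5 means 3.5x the estimate.
sample = [-1.0, -0.9, -0.15, -0.05, 0.0, 0.1, 0.6, 2.5]
print(summarise(sample))
```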

Time spent distribution of each ticket type

Although teams may categorize their tickets very differently, the principle here is to capture the fundamental differences between ticket types. Take the squad I worked with for example: a story is a piece of work neatly defined by acceptance criteria and a user story, whereas a bug is a piece of work where the resulting error is known but further investigation may take extra time.

A project is made up of some number of tickets of each type, so combining the probability distributions of the ticket types, in proportion to the ticket counts, gives the probability distribution of the final project delivery time.

Below you can see the screenshots (distribution included) of the story and bug ticket types in one squad. The y-axis represents the ticket count and the x-axis represents the time spent (in seconds; I kept JIRA’s raw data format).

I didn’t include a screenshot and distribution for hotfixes, simply because a hotfix is something that should never happen… in theory. I also didn’t have enough hotfix ticket data to make sense of its pattern.

You can view the distributions here as the probability of a story or a bug ticket taking a given number of hours. And now we have everything we need to build a quick and straightforward reusable model. From there, I am able to explain the statistical meaning of the model, and how it helps with project time estimation.

Result

Building model

Assuming you enjoy playing with numbers, then instinctively you’d know the next step is to build a model and run a simulation.

I built this model from the information derived above, plus an estimate of how many tickets of each type a project might have. I started with an imaginary project (a simulation) with five story tickets, three bug tickets, and one hotfix ticket. I then used randomisation + vlookup (as the formula below shows) to get random hours based on the probability distribution of each ticket type:
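The randomisation + vlookup idea can be sketched in Python roughly like this. It is not the original spreadsheet formula, only an assumed equivalent: build a cumulative table from the histogram, draw a uniform random number, and look up the bin it falls into. The bin counts are hypothetical.

```python
import bisect
import random
from itertools import accumulate

def build_lookup(bin_counts):
    """Turn {time_spent_seconds: ticket_count} into a cumulative lookup table."""
    values = sorted(bin_counts)
    total = sum(bin_counts.values())
    cumulative = [c / total for c in accumulate(bin_counts[v] for v in values)]
    return values, cumulative

def draw(values, cumulative, rng):
    """Pick a time spent with probability proportional to its ticket count."""
    index = bisect.bisect_right(cumulative, rng.random())
    return values[min(index, len(values) - 1)]  # guard against rounding at the top

# Hypothetical story histogram: time spent in seconds -> number of tickets.
story_bins = {7200: 2, 14400: 5, 28800: 3}
story_values, story_cumulative = build_lookup(story_bins)

rng = random.Random(7)  # fixed seed so the sketch is repeatable
print([draw(story_values, story_cumulative, rng) for _ in range(5)])
```

The `bisect` lookup plays the role of an approximate-match VLOOKUP against the cumulative column.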

The first column shows the ticket type and the number of tickets required in the project. The second column is the time spent on each ticket (in seconds), generated by the formula mentioned above. The ‘Total’ is the sum of the time spent on the imaginary project.

To get a distribution of the possible delivery time for a project of this size, I ran 200 simulations like the example set here, and got the distribution shown in the diagram below:
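Here is a minimal Python sketch of the same experiment, assuming each ticket’s time is drawn at random from the historical times of its type. The historical values are hypothetical; only the ticket counts (five stories, three bugs, one hotfix) and the 200 runs come from the text.

```python
import random

# Hypothetical historical time spent per ticket type, in seconds.
history = {
    "story": [7200, 14400, 14400, 21600, 28800, 57600],
    "bug": [1800, 3600, 7200, 10800, 43200],
    "hotfix": [3600, 7200, 14400],
}
project = {"story": 5, "bug": 3, "hotfix": 1}  # ticket counts from the text

def simulate_project(project, history, rng):
    """Draw one historical time per planned ticket and return the total in seconds."""
    return sum(
        rng.choice(history[ticket_type])
        for ticket_type, count in project.items()
        for _ in range(count)
    )

rng = random.Random(42)  # fixed seed so the sketch is repeatable
totals = [simulate_project(project, history, rng) for _ in range(200)]
print(f"shortest run {min(totals) / 3600:.1f}h, longest run {max(totals) / 3600:.1f}h")
```

Binning `totals` then gives a histogram of possible project delivery times.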

Result Interpretation

The Y-axis (counts) is the frequency produced by the simulation, and the X-axis (standard interval) shows the intervals to the left and right of 0 standard deviations (STD) from the mean. The blue histogram shows the number of simulations in each accumulated-hours bin, and the gray line represents the cumulative probability that the project could be completed within a certain STD.

It’d be easier to discuss the result in time instead of in STD. This table shows the possible time spent.

The data from the simulation shows there is a 19.5% chance of completing this project in between 55 hr (~200,000 seconds) and 69 hr (~250,000 seconds). Although it is great to know this number, the probability of completing the project within one specific time range is usually not that helpful. A better way to interpret this is via the cumulative distribution (gray line). The probability of the project being completed in 69 hours or less is 28.5%.
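Both readings, the chance of landing in a range and the chance of finishing within a limit, are simple counts over the simulated totals. A sketch with hypothetical totals (not the simulation results quoted above):

```python
def probability_within(totals_s, hours):
    """Share of simulated totals that finish in `hours` or less."""
    limit_s = hours * 3600
    return sum(1 for t in totals_s if t <= limit_s) / len(totals_s)

def probability_between(totals_s, low_hours, high_hours):
    """Share of simulated totals above low_hours and up to high_hours."""
    low_s, high_s = low_hours * 3600, high_hours * 3600
    return sum(1 for t in totals_s if low_s < t <= high_s) / len(totals_s)

# Hypothetical simulated totals in seconds (50h, 60h, 70h, 80h, 100h).
totals_s = [180000, 216000, 252000, 288000, 360000]
print(probability_between(totals_s, 55, 69))  # 0.2
print(probability_within(totals_s, 69))       # 0.4
```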

I could say that I am more than 99% confident that this project can be completed within 208 hours, but that statement completely misses the point of estimating in the first place. It is always true that the longer the time you allow, the higher the probability of delivering the project on time. Let’s put it this way: you can say that you are 100% confident that this project will be completed some time in the future. This, however, doesn’t help your team make an ROI estimate; the prediction contains no commercial value and means nothing in the roadmap prioritization meeting.

A more sensible way is either to pick +1 STD, which gives you a percentage, or to look for a percentage that you are comfortable risking. From there you can easily convert hours into cost for further prioritization.
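Either option is easy to compute from the simulated totals. The sketch below assumes a population standard deviation and a made-up squad hourly rate; the totals are hypothetical too.

```python
import math
import statistics

def hours_at_confidence(totals_s, confidence):
    """Smallest simulated total (in hours) that covers `confidence` of the runs."""
    ordered = sorted(totals_s)
    index = max(0, math.ceil(confidence * len(ordered)) - 1)
    return ordered[index] / 3600

def hours_at_plus_one_std(totals_s):
    """Mean plus one standard deviation of the simulated totals, in hours."""
    return (statistics.mean(totals_s) + statistics.pstdev(totals_s)) / 3600

SQUAD_HOURLY_RATE_GBP = 100  # hypothetical average hourly rate of the squad

# Hypothetical simulated totals in seconds (50h, 60h, 70h, 80h, 100h).
totals_s = [180000, 216000, 252000, 288000, 360000]

for label, hours in [
    ("50% confidence", hours_at_confidence(totals_s, 0.5)),
    ("+1 STD", hours_at_plus_one_std(totals_s)),
]:
    print(f"{label}: {hours:.1f}h, about GBP {hours * SQUAD_HOURLY_RATE_GBP:,.0f}")
```

The cost line is just (development time) x (avg. hourly rate of the squad), the same formula used for ROI earlier.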

Premises Of The Proposed Process

As you have probably already sensed, there are two premises required for this prediction to work. Firstly, the delivery team’s personnel has to be somewhat stable; in other words, the rate and quality of delivery need to be consistent.

Secondly, whoever breaks down an epic into tickets (the product owner, BA, or team lead) needs to have a relatively good and accurate sense of which tickets are needed to reach a milestone, so that you have a rough ticket count for the estimation.

Conclusion: Do Your Estimation Wisely

This is a very flexible model that you can play with whichever way you like. For example, you can split the tickets between FE, BE, and DevOps, or even between individual contributors, to take into account the different velocities of different development functions. Just keep in mind that the more detailed factors you add, the less accurate a model tends to become, a pattern seen in many well-known prediction models.

You can download the model from here. Feel free to play around with it and let me know your thoughts.