More and more I am of the opinion that putting points against stories is a waste of time. I’ve spent many hours, as I’m sure you have, sitting in meetings of various shapes and sizes guessing numbers, and looking back I’m starting to question whether it was really worth it. I’ll say upfront that I’m going to be fairly critical of story pointing here; I’m not just being a grumpy old Yorkshireman! I think if we want to be truly agile, we need to keep questioning the way we work, looking for the value and actively making changes where we think there is room for improvement. Allen Holub once said “You don’t change things by kicking the tires or working round the edges, you have to open the bonnet and get into the engine” (or words to that effect). Story pointing is a fundamental part of many teams’ ways of working, and as such it is often overlooked or ring-fenced when we look at improving our process. Without further ado then, here’s why I think story pointing is probably wasting your time.
Let’s talk about story points: what are they and where did they come from? In a typical Scrum-like team we often have a ceremony called Refinement or something similar. This meeting usually aims to achieve two things:
- To bring a story, often written by one person, in front of the team to make sure it’s sound, i.e. to spot any gaps and make sure it’s well written (for some definition of “well written”).
- To put a value on the story to indicate its size. Sometimes this task is split out into a separate estimation meeting.
On the face of it that’s not a big brief but there’s a lot to unpack here.
Origin story
Story points as a concept were originally developed as part of Extreme Programming (XP). According to their inventor, Ron Jeffries, they were an attempt to let the team put idealised estimates on stories without confusing power holders¹ or risking being held accountable to these numbers. As we all know, if you write “3 days” on a story, however much you emphasise that it’s just a guess, you are almost certainly going to be pulled up at the end of those three days and asked where the work is. So the concept of points was created to logically separate the number from a time estimate: “it’s not 3 days, it’s 3 points”. This number could then, at least in theory, be used to inform how much work to bring into scope without giving the impression of committing to a time frame².
Fast forward to the present day and story pointing is standard practice for many teams. While it’s not explicitly mentioned in the Scrum Guide, “estimation”, or more recently “sizing”, of stories is. A common misunderstanding, and a hot topic for discussion in agile communities, is confusing story points with time. Story points are not time (at least not explicitly). We can’t estimate in time because we can’t say how long things will take: we’ve probably never had to solve this particular problem before, we probably don’t know the exact requirements or they are subject to change, and even if we do, we’re not great at predicting the future, which is why we call it estimation in the first place. Instead we use story points to tell us how much work we are likely to get through in a given period. So story points do represent time then? Well, sort of, but not really. It’s confusing at best, but more importantly, we are still putting numbers on things, which attracts the attention of those looking for ways to track progress and build charts.
Who needs a story point?
What are story points useful for then, and more importantly, who uses them? I think it’s safe to say that the team doesn’t use them when developing software. If as a developer I pick up a story to implement, I don’t particularly care about the number that’s been put on it. It doesn’t help me complete the work, and for the most part I know how much work a story is (if I don’t, I’ll soon find out!); besides, I probably helped come up with the number in the first place. Most likely it doesn’t affect whether I pick it up or not either: either the work needs to be done or it doesn’t. How big it is shouldn’t make much difference unless it’s huge, in which case we should probably split it up a bit. In fact, I’ve only ever heard two arguments for why we need story points that I think hold any weight at all:
- It helps us decide how much to bring into a sprint.
- It helps us highlight where a story does not have enough detail, i.e. if we all give different numbers as an estimate, it’s an indication the story isn’t clear enough.
I want to address those two subjects in more depth so let’s put a pin in that for now and come back to it later.
If the dev team doesn’t need the story points, then the only other people who might need them are the power holders. Why would the power holders need story points? The answer, I think, is pretty simple: they don’t, and worse, leaving them lying around on stories just encourages their abuse. Let’s think about it for a moment. We know that we can’t estimate, or even know, the entire backlog of work. If we could, we wouldn’t need to be agile at all and could just develop software the same way we build houses. As a result, we can’t use story points to tell us when the project will be completed; anything more than a sprint or two ahead is likely to change. If we need to know what can be done in a given iteration, or get a high-level estimate for a piece of work, then we have to ask the team. That’s one of the rules, right? Business people are in charge of deciding what needs to be done, and the development team decides how long it will take, because that’s their area of expertise. So if power holders can’t use story points to track the future or the present, what about the past? To see what we’ve done we just need to look at the working software we produced. We no longer care about the estimates because we know exactly how long it really took. If we got our predictions right, that’s no guarantee we’ll get them right next time, as the stories will be different; likewise, if they were wrong, there’s nothing (as we’ll see) we can realistically do to make sure they’re right next time, so looking back is not helpful either. If we accept that a story point is merely a hand-wavy guess at the size of a story, then there’s no reason why anyone outside the development team would need to care about it, and by using these values to make predictions we just line ourselves up for failure. On that note, let’s revisit the development use cases we highlighted earlier.
Refinement
In agile development we often write user stories to describe the work that needs to be done. These are ideally written from a user’s perspective to help us identify the business value the story represents, but even if we write technical stories, the story should contain just enough information to get started on the work. Why? Because things change; that’s why we need to be agile in the first place, right? If we document in detail what needs to be done for a story, it’s likely to change and will need to be updated. Avoiding too much detail encourages face-to-face conversations and reduces waste, because we don’t write a load of detail that either changes or isn’t used.
This brings us on to refinement. One defence of story pointing I hear is that it helps with refining stories, i.e. if the team all have to assign a number to a story using a system such as planning poker, it can highlight where there is uncertainty around the scope of the work and facilitate a conversation. The argument goes that refinement is very valuable (which I think it often is), and story pointing enhances refinement while adding little time on top, so it’s worth doing.
The first and perhaps most obvious thing to call out here is that by forcing each team member to guess a number, we risk simply embarrassing individuals who guess an outlying number. Team members who are unsure, perhaps because they are junior or new to the codebase, will just try to guess the number they think the rest of the team is going to put, rather than risk being the only one with a different estimate and being forced to explain it. Where half the team estimates, say, a 1 and the other half an 8, it may facilitate a useful conversation that highlights things that have been missed, but I’ll come back to that. The rest of the time we end up discussing the difference between, say, a 3 and a 5, or else the poor junior dev who estimated an 8 where everyone else said 3 has to be asked why they put an 8 and whether they would be happy to change their estimate, because we can’t be sure they haven’t spotted an issue no one else has.
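To make the mechanism concrete, here is a minimal Python sketch of the kind of check planning poker relies on. It flags a story when the votes spread too far apart and picks out whoever disagrees with the most common vote. The deck values, the 2x threshold and the function names are illustrative assumptions, not part of any standard.

```python
from collections import Counter

# Hypothetical Fibonacci-style deck; teams use many different decks.
DECK = (1, 2, 3, 5, 8, 13, 21)


def needs_discussion(votes: dict[str, int], max_ratio: float = 2.0) -> bool:
    """Flag a story when the highest vote exceeds max_ratio times the lowest.

    max_ratio is an arbitrary, assumed threshold for "we disagree".
    """
    for name, vote in votes.items():
        if vote not in DECK:
            raise ValueError(f"{name} played {vote}, which is not in the deck")
    return max(votes.values()) / min(votes.values()) > max_ratio


def outliers(votes: dict[str, int]) -> list[str]:
    """Return the names of voters who differ from the most common vote."""
    consensus, _ = Counter(votes.values()).most_common(1)[0]
    return [name for name, vote in votes.items() if vote != consensus]


votes = {"Ana": 3, "Ben": 3, "Cat": 3, "Dev": 8}
print(needs_discussion(votes))  # True: 8 / 3 is more than 2
print(outliers(votes))          # ['Dev']
```

`outliers` returns exactly the person who ends up in the spotlight. Nothing in the numbers tells us whether Dev has spotted a real problem or is simply unsure.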
Implementation details
Let’s go back to our hypothetical example where half the team has estimated a 1 and the other half an 8. Maybe this is because half the team is experienced enough in the codebase to understand that there is some complexity to the task that will need to be dealt with as part of this work. Adding this information to the story could save someone time, but we didn’t need the entire team to guess a number to uncover it; the dev who picks up the story can discuss it with one or two others who have the appropriate knowledge.
What if, in the same scenario, rather than half the team knowing something the other half don’t, half the team imagine one implementation and the other half a different one? This time the estimate depends on how we implement it, and the only way to agree on an estimate across the team at this point is to decide on the implementation upfront. This is a problem: we either have to start getting into the weeds on how to implement the story, or else take it away, discuss it elsewhere and bring it back. As before, that might be useful, but again, do we need the entire team involved to agree on an implementation approach for a story? There are other options here. Arguably we should have looked at the implementation options before we ever got to estimation, or else this is again a discussion to be had when we pick up the story, between the dev doing the work and the relevant parties.
The common problem in both scenarios is that, rather than aiding refinement, estimation drags us back into discussions, and even arguments, about implementation that are best had with a smaller, focused audience. When we refine a story, we talk about it, raise any concerns and ensure there’s enough detail for someone to get started and to know when it’s done. Then we estimate, and at this point we either all (broadly) agree or we disagree. If we disagree, the only choices are to keep discussing the story until we all agree, or to give up and come back to it once someone has added more detail. The problem is that the reasons we disagree are many: maybe we forgot the testing effort, maybe we’re not as familiar with the code, maybe we just think it sounds complex, maybe we’re factoring in dependencies, and so on. We can spend lots of time trying to reach consensus, but does it matter? Maybe we save the person picking up the story some time, or uncover something we didn’t think of when talking about the story, but if that comes at the expense of the entire team’s time it seems like a poor trade-off. I would argue that refinement doesn’t require story points. We don’t really need this extra step, and worse, it encourages over-refining stories. Even if we decide that estimating stories really does help refinement and is worth the cost, why add that number to the story once the team agrees? By putting the number on the story we are saying we expect to use it again, and we encourage its abuse. Once the team agrees and the story is refined, why not just throw the number away and move on?
Planning
Let’s look now at the argument that story points help us decide how much work to bring into a sprint or iteration. Sounds reasonable, right? We work in iterative cycles, and each iteration someone with business knowledge, often a product owner, decides what is most important to accomplish in that cycle. We then take away those stories and commit to getting them done in the time frame. How does our business person know how much stuff they can have? Well, we point each story to guess its size and track how many points we complete each cycle; this becomes our velocity. We can then use the average to give our business person an idea of how much stuff they can have based on our previous performance. Or can we? We still don’t really know how much we can do; we’re using guesses to inform a guess. The problem is that we have now implied we can do a certain amount of work in a given period, however much we might stress that we’re not promising to complete it all. We’re lining ourselves up for failure again, even if we are fortunate enough to work in an environment where people are not held to account over it.
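The velocity mechanism just described fits in a few lines of Python. This is a minimal sketch that assumes a plain mean over all past sprints and a greedy “fill until the next story doesn’t fit” planning rule. The history, the backlog, its point values and the function names are all hypothetical.

```python
from statistics import mean

# Hypothetical data: story points completed in each past sprint.
completed_points = [34, 41, 36, 40]

# Hypothetical prioritised backlog of (title, estimated points).
backlog = [("login", 8), ("search", 13), ("export", 5), ("audit", 13), ("theme", 3)]


def velocity(history: list[int]) -> float:
    """Average story points completed per sprint."""
    if not history:
        raise ValueError("need at least one completed sprint")
    return mean(history)


def plan_sprint(stories: list[tuple[str, int]], budget: float) -> list[str]:
    """Take stories in priority order until the next one would exceed the budget."""
    planned: list[str] = []
    total = 0
    for title, points in stories:
        if total + points > budget:
            break  # assumed rule: stop at the first story that doesn't fit
        planned.append(title)
        total += points
    return planned


v = velocity(completed_points)
print(v, plan_sprint(backlog, v))  # 37.75 ['login', 'search', 'export']
```

Every input here is itself an estimate, so the output is a guess built on guesses, which is exactly the problem described above.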
Crunching the numbers
The real danger here is that guessing how much we can do in a sprint becomes an endless quest, where more and more variables are needed to chase an elusive number we can never truly know. As an example, I recently watched someone present on sprint velocity and capacity planning. The speaker first explained what is meant by each term: velocity as a measure of the pace the team works at, and capacity as a measure of available effort. It’s worth noting that we only know velocity if every story taken into the sprint has points assigned to it. Figuring out capacity involves working out how much time the team can spend working on stories. We do this by taking the hours a day a person is at work, minus the hours spent in scheduled ceremonies (stand-up, refinement etc.) and of course minus a small allowance for other things such as breaks and ad-hoc meetings (it works out at about six and a half hours a day, if you’re interested). We then multiply that figure by the number of people in the team and the number of days in the sprint to get our max capacity, i.e. the maximum hours of work we can expect from the team over the course of a sprint in the best-case scenario. Now I know what you’re thinking: what if someone has the day off? Not to worry, we’ve got that covered in the form of total capacity, which is of course the 6.5 hours a dev works multiplied by the number of days each person is going to be in during the sprint, summed across the team. What we end up with is a table that looks something like this:
You could at this point be forgiven for thinking that this all sounds a bit mental. There are some glaring problems here, the most obvious being that by turning people into “Resources” we have lost a whole load of context, and indeed humanity. Resource A might actually prefer to go by the name of Bob, and Bob may well have recently had a new baby, for example, so won’t be operating at full capacity. Maybe Resource B (or Jane) has a doctor’s appointment, or was pulled into a call about something else for a bit. We gloss over all those details with a neat little table that reduces everything to a nice numeric value we can use to build, you guessed it, a chart. In the second half of the talk, our speaker explained how we tie this into velocity. For each sprint we complete, we note the story points we completed and the capacity we had. We work out our velocity at full capacity by averaging the story points completed in full-capacity sprints. Now we can use that to cleverly figure out how much work the team ought to be doing next sprint. We know from our table that the team has a capacity of 265 hours next sprint, as some team members are on leave. So we take our velocity (38), divide it by our max capacity (325) and multiply by our total capacity for the sprint (265) to get our predicted output (31).
(38 / 325) * 265 ≈ 31
Tada! We now know what the team can do next sprint and have some good solid equations we can point to if they don’t get there (presumably because they are slacking off).
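The arithmetic from the talk is simple enough to write down. The Python sketch below is a minimal, hypothetical version of it. The 6.5 hours a day, 38 points, 325 hours and 265 hours are the figures described above. The five-person team and ten-day sprint are an assumption, chosen only because they reproduce the 325-hour max capacity. The function names are invented for illustration.

```python
# Sketch of the capacity-planning arithmetic described in the talk.
# HOURS_PER_DAY is the speaker's figure; team and sprint sizes are assumptions.
HOURS_PER_DAY = 6.5


def max_capacity(team_size: int, sprint_days: int, hours_per_day: float = HOURS_PER_DAY) -> float:
    """Best-case team hours if everyone is in for every day of the sprint."""
    return hours_per_day * team_size * sprint_days


def total_capacity(days_in: dict[str, int], hours_per_day: float = HOURS_PER_DAY) -> float:
    """Team hours given how many days each person will actually be in."""
    return sum(hours_per_day * days for days in days_in.values())


def predicted_output(full_capacity_velocity: float, max_cap: float, total_cap: float) -> float:
    """Scale full-capacity velocity by the share of max capacity available."""
    return full_capacity_velocity / max_cap * total_cap


# Assumed: five people and a ten-day sprint, which gives 325 hours.
print(max_capacity(5, 10))                    # 325.0
print(round(predicted_output(38, 325, 265)))  # 31, using the figures from the talk

# Only days present are captured: nothing here records Bob's new baby
# or Jane's doctor's appointment.
print(total_capacity({"Bob": 10, "Jane": 9}))  # 123.5
```

Written out like this, the problem is plain. The only inputs are a day count per person and a single hours figure, so all the human context described above has nowhere to go.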
This is of course a bit of an extreme example, and I’ve no doubt it’s done with nothing but the best intentions. If we’re lucky enough to work in a blameless culture, then we don’t hold people to account; however, it’s easy to see how this becomes a slippery slope at best. We add numbers to stories so we can figure out what to bring into a sprint, but what we can get done in a sprint varies with team size, so we need capacity planning to make the adjustments. Now we have more data to work out the number, but wait: this only works if our working-hours figure is correct, which means we need to make sure everyone is doing their 6.5 hours of solid work per day. When the team fails to knock off the expected number of story points, that failure is more pronounced if we’ve sunk a lot of effort into working out the number in the first place. It’s only natural that those who generated that number will start to ask why we didn’t get there; that’s the only way to improve, right? We need to find out why the calculation was wrong and fix it! What’s missed is that this calculation will almost always be wrong, because we’ve ignored the context and the human factor, and therefore every second we spend trying to improve it is simply wasted time.
#NoEstimates
If we point stories, then we have to base this on the best-case scenario, because anything else is too unknown to be even slightly useful. We at least have a chance of saying how much work something is in an ideal setting; if we’re asked to include a whole load of unknown risks in that estimate, we might as well just use a random number generator. How much work will it be if we come across a problem? How much work if the server goes down, or if we need to wait on another team? How long is a piece of string? Most teams end up with something like this:
- A value that represents small
- A value for medium
- A value for large
- Everything larger, which is often broken down if possible.
The unknowns between the best and worst case for any given story mean that, in practical terms, this list reduces to stuff that’s small (a day or two) and stuff that’s bigger but can’t be broken down further. As we’ve seen, no matter how good we get at estimation, what we bring into a sprint is a best guess. Even if we magically get it exactly right, we face a new problem: developers have nothing to do at the end of the sprint. If the last of the dev work is done on the final day of the sprint, that leaves no time for testing. Even if the developers lend a hand with testing, it’s almost impossible to keep everyone occupied without bringing in more work from the next sprint, at which point we’re messing up our calculations again.
Why does it matter, though? If we accept that what we bring into a sprint is our best guess, we know we might get through it too fast or not complete the work, and we’re fine with that. Let me flip that around: if that’s the case, why does it matter how good our guess is? One approach might be to use a Kanban technique and simply pick up the next most important thing from the backlog each time, without worrying about chunking the stories into sprints. It may be, however, that the team finds it valuable to have a bucket of work to use as a short-term goal, or that you need to give power holders an indication of how much you can get done in the next iteration so they can plan. As I said, even in the best case, where our power holders understand and accept all this, it only ever gives us a rough idea of what to aim for, so do we gain anything from using story points over just counting stories? There’s certainly some evidence to suggest not as much as you might think.
Enter the #NoEstimates movement. The argument is that we don’t actually need to bother with estimates, and even where we do, we can get away with just counting stories, which turns out to be almost as good without all the problems of estimation.
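As a rough illustration of the difference, the Python sketch below forecasts the next sprint in two ways from the same history: by averaging points, and by simply averaging the number of stories finished. The history and function names are made up. The sketch shows what each approach requires, not which one is more accurate.

```python
from statistics import mean

# Hypothetical history: point values of the stories finished in each past sprint.
finished = [
    [3, 5, 8, 2, 3],
    [5, 5, 3, 8],
    [2, 3, 3, 5, 8, 1],
]


def points_forecast(history: list[list[int]]) -> float:
    """Average points completed per sprint (needs every story estimated)."""
    return mean(sum(sprint) for sprint in history)


def throughput_forecast(history: list[list[int]]) -> float:
    """Average number of stories completed per sprint (ignores the points)."""
    return mean(len(sprint) for sprint in history)


print(points_forecast(finished))      # about 21.33 points per sprint
print(throughput_forecast(finished))  # 5 stories per sprint
```

The second function needs nothing more than knowing that a story was finished, so no estimation meeting is required to feed it. My assumption is that it works best when stories have been split to a broadly similar size.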
Final thoughts
One of the most important things to remember is that each team is different and almost nothing is true for all teams, so it’s not true to say that there is never a case where story points work for a team. What I will say is this: as soon as we put a number on something, it attracts attention. We talk about how big it is and when it will be done, rather than making sure we’re doing the most important thing as quickly as we can.
- If we put a number on a story we are setting ourselves up to be held to that number and risking its abuse.
- Estimating stories takes up the team’s time, even if you’re great at it.
- If we’re honest with ourselves we probably don’t need to point stories in order to decide how much work to bring into an iteration.
- Putting an estimate against work requires knowing what the work is. The finer-grained the number, the more detail about the work we need to know upfront. If we’re not careful, this drags us into over-refining stories.
What I would encourage you to do here holds true for most things we do in software as agile teams. Question what the point is behind the process. What do you hope to gain? Does that match up with reality? What would realistically happen if you stopped doing it? Try something else and see what improves. Always keep a sharp lookout for where you are wasting time and effort following the plan over delivering software.
Certain things in our process are often seen as untouchable: we are open to changing the things around the edges but baulk at the idea of questioning our core ceremonies and process. To make change we need to get into the engine and pull out the bits that are not adding value. Better still, if we pull something out and it makes things worse, we can just put it back in; but if we’re too afraid to make real changes, we’ll never see real improvements.
1. Power holders: Ron Jeffries uses the term power holders to describe all the people who have and exert power over the developers. I quite like this term as a broad umbrella for project managers, product owners, stakeholders etc. My only niggle is that the term “developers” implies that developers are the only people who matter on the development team. For the purposes of this post, you can interpret a “power holder” as anyone outside the core development team who has the ability to apply pressure on the team.
2. Story points: For more on the origins of story points, see “Story Points Revisited”.