Losing my Scrum virginity...what not to do the first time.

02/10/2010

At the end of Sprint 4, things started to get ugly.

It was a warm Friday afternoon in London and my colleague, Simon, and I were walking along the banks of the river Thames towards our client’s office. We were chatting about who was doing what during the Inspect & Adapt and Sprint Review meeting we were about to have. Simon was going to demonstrate a few features and then go through the status of all the features in Sprint 4. Then it was over to me; I was going to walk through the plan for Sprint 5, which was all I could contribute as I had only joined the project the previous week.

At the start of the meeting, there were the usual introductions. Simon knew most of the people in the room quite well, and there was light-hearted banter before things got under way. Before launching into the first part of the meeting, the walkthrough of completed features, Simon apologized that not all of the features were complete. The rest of the room seemed to collectively shrug, not really caring.

The demonstration went relatively smoothly, as smoothly as a live demo can! The occasional glitch, the odd unexpected result, “it was working at the office”, the usual story. At various points someone around the room would pipe up, suggest a change, seek clarification or ask why a particular feature wasn’t done differently. Each time there was a reasonable answer, and the changes and enhancements were noted. All in all, the demonstration went fine. No major stumbling blocks, no major changes. That’s when we reached the point of reviewing the status of the features in Sprint 4, and things took a severe turn for the worse.

Simon opened up the shared spreadsheet showing the status of each sprint. The initial plan had been to use Rally to plan and manage user stories and sprints. For reasons that are still unclear to me, the team stopped using Rally after the user stories were entered. From that point onwards, the team reverted to a series of Word documents and spreadsheets. Simon opened the tab for Sprint 4. Looking across at the final column, “percent complete”, there wasn’t a single feature at 100%. Not only that, a number of features hadn’t even been started.

The mood in the room turned stony cold. Jason, the Project Manager, asked what everyone else was thinking, “Why is there not a single feature complete?” Simple question, not so simple to answer. Simon struggled to explain. The reality was that the sprint had been overloaded, and there was no way the team was going to get it all done. Taking on so many features divided the attention of Simon and Sally, the product owner. They were trying to cover too much, and so nothing got finished.

The bigger picture was even more dire. Despite a month of upfront analysis, there were still details that needed to be fleshed out during each sprint. Too many details. So much so that Simon was way behind in getting the analysis done and was putting features into sprints when he knew the analysis was incomplete. To add to that, Simon was playing the role of both analyst and project manager, never having managed a project of this scale. Torn between getting the analysis done, monitoring progress and planning ahead, Simon struggled to keep things together, and at the end of Sprint 4 the true state of affairs came to light. It wasn’t pretty. Not that Jason, who had replaced the previous Project Manager at the start of Sprint 3, cared, nor did the rest of the room. The reality was that at the end of Sprint 4 not a single feature had been completed, and completing features is the entire purpose of a sprint. The issue was that this was the first time anybody, other than Simon and myself, had any insight into the fact that the project was in serious trouble.

Simon tried to explain that Sprint 4 had been deliberately overloaded in an attempt to get through many of the features that needed detailed analysis, not to mention that we had been held up by a third party on several features. It fell on deaf ears. As Jason put it bluntly, the sprint had been poorly planned; it should never have had so many features. He was right. Simon, with the best of intentions, had dug himself a hole that he couldn’t talk himself out of. Even though the project was using Scrum terminology, it wasn’t actually following the Scrum approach, and sprint planning was not the only departure from Scrum. Mind you, this was not exactly surprising, as neither Simon nor any of the dev team had any experience with Scrum.

There was a long, uncomfortable silence in the room. No one truly accepted Simon’s explanation, and there wasn’t a lot of confidence that things were going to get any better. That’s when Simon handed over to me to outline the plan for Sprint 5. At the time, I only had a superficial understanding of Scrum, so I relied on my previous experience in putting the plan together. I kept it simple and, after speaking to each of the developers, came up with a plan that I thought was achievable, allowing a buffer for the usual issues that surface. This, as I was to learn, was not exactly how sprint planning works in Scrum. I didn’t talk to Sally, Jason or anyone other than the dev team.

I presented my plan for Sprint 5 and was greeted with scepticism. “Why do you think this plan will work, when no features were completed in Sprint 4?” asked Jason. I did my best to explain the logic. First, finish the features that were mostly done. Second, start features where the analysis was complete and which could safely be completed within the sprint. Third, allow time to deal with the changes that arose in the Inspect & Adapt. And finally, allow a margin for error. Common sense, at least to me. Jason asked a few more questions on areas I had already considered, so I was able to address them easily. Still, there was only begrudging acceptance. The reality was that the proof would be in the pudding. Confidence was low; we needed to get runs on the board and actually deliver what we said we would before we could win back any trust.

As part of reluctantly accepting the plan for Sprint 5, there were some caveats: Jason wanted better communication and better visibility of progress during the sprint. He didn’t want to get to the sprint review only to find out the true state of affairs. That was fine by me; my job was to let Simon get on with completing the analysis on the features that still needed details worked out. It wasn’t hard for me to provide updates on progress, even if I still didn’t understand Scrum or what the project was about.
The meeting finally ended. It had been a torturous two hours.

The next week went relatively smoothly. The developers made good progress on completing features and getting started on new ones. After the daily scrum, I would catch up with each developer individually to confirm how far along they were with their assigned features and whether they were on track to complete them as per the plan. During these short one-on-one catch-ups, I found out the key pain points for each developer, not just the blockers.

By the end of the week, I had to face the music at a mid-sprint review. Fortunately, I was able to report that we were making good progress and were slightly ahead of plan. We were even hoping to bring forward a couple of features and get a head start on Sprint 6. I didn’t realize at the time that this wasn’t really in the spirit of Scrum, but for me, what mattered most was proving to Jason that we could deliver, and the best way to do that was to deliver. It wasn’t until the second week that I started to understand the underlying problems that had made it nigh impossible for Simon to succeed in previous sprints.

Although there’d been some analysis done before Sprint 0, there were still a lot of details to be worked out. Rather than using Rally and adding the tasks to each user story, a Word document was used as the product backlog. The initial estimates done before Sprint 0 were assumed to be correct. As the details of each feature were fleshed out, changes, enhancements, adjustments and additions crept into the backlog. What didn’t happen was for those changes to be reflected in the effort required. Each change or adjustment on its own was minor, but added together over two months, they amounted to a significant increase.

The problem was not that the team wasn’t following Scrum to the letter; the team was responding to change over following a plan. The problem was that the changes weren’t being reflected in the overall effort. Each time a change was identified, it would be done during the sprint if possible; if not, the feature would roll over to the next sprint. The flow-on effect wasn’t realized until the fateful Sprint 4 review, by which point the damage was done. There was no way to go back and add up all the little amendments that had turned a small snowfall into an avalanche.

Despite the damage done, Sprint 5 went relatively well. The team completed the majority of the features. There were a few that didn’t make the cut, but Jason had been pre-warned at the mid-sprint review. The Inspect & Adapt and Sprint Review for Sprint 5 was the opposite of the last one. It started tense, but by the end everyone was relaxed and joking. There was more to demonstrate, more features complete, and most of the time was spent discussing refinements rather than recriminations. My plan for Sprint 6 followed that of Sprint 5, once again without consultation with Sally or Jason. I was still missing the spirit of sprint planning. Nonetheless, the plan was accepted at face value given the previous plan had worked. Things were looking up, or so it seemed.

Sprint 6 kicked off well; shielding Simon from reporting and planning meant he could focus on analysis and make sure features were ready for developers to start work. Even though I had made it clear at the start of Sprint 6 that we wouldn’t be able to finish all the features within 2 weeks, Jason was trying to avoid a Sprint 7 and the overhead that came with it. Not wanting to rock the boat after only just winning back some trust, I went along with the plan, and in the second week of Sprint 6 we decided to make it a 3-week sprint. Things progressed relatively well until the middle of week 3. Jason was pushing me to commit to when we would be code complete. I pushed the developers for an answer: we would be code complete, bar one feature, by the end of the week. All appeared well, “appeared” being the operative word.

The end of Sprint 6 arrived, and we were code complete except for two features. Before the Inspect & Adapt, Simon was frantically preparing, going through various user journeys to make sure all was in order. It wasn’t; it wasn’t even close. Individual features were code complete and worked individually, but not necessarily with each other. Cracks in the surface emerged and soon turned into a yawning gap between the concept of code complete and an operational site that we could demonstrate.

It was too late to fix the situation; there were too many scenarios that had never been considered, because some features had never been brought together. It wasn’t for lack of analysis or direction, it was simply that some situations hadn’t been foreseen. Simon did his best to avoid these scenarios, but it was only a matter of time before testing started in earnest and the truth came out. Naturally, testing was the next topic of discussion. We had 80 features but had failed to make much progress with testing, except for the stand-alone features, which didn’t represent the true complexity of the system. Basically, we had only just started. Once again, not the way Scrum is supposed to work, but neither Simon nor I knew that. I had a gut feeling that testing would take around 4 weeks. Jason wanted it done in 2. Neither of us was even in the ballpark.

A week later, we had only 10 features properly tested. I met with Jason to plan out the rest of the testing. He wanted half of the features tested by the end of the second week and the rest by the end of the third, thinking we’d increase velocity by adding another tester. I wasn’t convinced, but had little choice.

With a huge effort, we reached the target of testing half the features by the end of the second week, but I knew there was no way we’d get the next 40 done in the following week. Jason was constantly badgering me for an indication of velocity and asking me about code quality. The issue wasn’t the quality of the code; it was that scenarios kept arising that no one had considered, and each time this happened, we had to go back to the drawing board and work out how to solve it. At the end of the third week, even though we were getting through bug fixing quickly, there were still around 20 features to be fully tested end-to-end. That’s when things got ugly, again.

Sprint 6 had gone over by a week, and testing had gone over by 2 weeks. A budget overrun had been flagged as early as Sprint 4, but optimism had got in the way of reality. Not any more. The budget had finally run out. That was it, there was nothing left. The project came to a grinding halt. Assumptions around code complete, code quality and the concept of done had caught us all with our pants down.

A crisis meeting was held with the key stakeholders. It took several days to nut it out, but a solution was found. There would be a workshop with the key stakeholders to go through the site and identify all P1 issues. We would then fix them. Nothing else would be added, changed or amended. It was a once-only workshop. Scheduled to take 6 hours, the workshop kicked off with everyone in a surly mood. After 10 hours, only half of the original attendees were left. Pragmatic decisions were made, and everyone finally went home spent but clear on what the path to completion was, in theory.

Like phrases such as “never say never”, “it can’t be done” and “I promise not to...”, the “once-only” workshop wasn’t. New P1 issues crept in, a symptom of the project from its very inception. The two weeks allowed to fix all the issues stretched to three. Additional P1 issues were found and added to the list. Starting at 65 P1 issues, the number after three weeks had crept up to 107. It was no surprise that everyone was tired, tense and strung out. P1 fatigue had hit everyone. Deployments were rushed out, with new fixes breaking previous fixes. Everyone was getting sick and tired of the whole thing and just wanted it to end. Finally it did; finally enough compromises were made on both sides, and the site was in a state to be launched. Not perfect, not what Sally wanted, but close enough. Late on a Thursday evening, the DNS was finally switched over and the site was live. The celebrations were limp and washed out. We were simply glad it was finally over.

It wasn’t until a few weeks later, after I’d recovered, done some reading about Scrum and reflected on the previous few months, that I realized the project was never a Scrum project. It used all the terminology of a Scrum project, but without the spirit of it. During the project, when things started to go wrong, the dev team’s inclination was to revert to what they knew and were comfortable with, a waterfall approach. On the other hand, Jason was pushing to follow the Scrum approach more closely, as he felt that would help rectify matters. Either approach could have worked; the problem was that as things got tense, we were moving in different directions, which made things even worse.

There were 3 main issues.

The first was embarking on a project using a new approach and assuming that it was going to be fine. When does anyone try something new and get it right the first time? Jason had plenty of experience with Scrum, but Simon and the dev team didn’t. Each time a developer was handed the details of a feature, they expected it to be fully specified. Sally wasn’t used to that approach; she thought she’d have the opportunity to inspect and adapt. Change was normal; once again, who gets a spec right the first time? Neither Simon nor the developers had that mindset. Each change was greeted by a developer wondering why the client hadn’t thought it through upfront, a waterfall mentality. So on one hand there was Sally expecting to be able to change and adapt, and on the other hand, Simon and the developers expecting to build it once with little or no change.

The second issue was that of managing change. In waterfall, you have a list of features that are done in an order decided by the project manager; the team puts their collective heads down, works hard for a couple of months and then surfaces to test and fine-tune. Any changes are handled with a change request, and the impact on time and budget is understood. That’s what Simon and the dev team had in mind, except in this case, rather than a couple of months, it would be a couple of weeks, repeated 6 times. That’s not what Sally and Jason were thinking; they were thinking the Scrum approach: pick the key features, get them done right and then get onto the next set. At the end of each sprint, the velocity is understood and the next sprint is planned. That didn’t happen. Changes were squeezed into the next sprint without an understanding of the impact, in the hope that they could be done alongside the features already in that sprint. No allowance was made for change and its inevitable flow-on effect. It took 4 sprints before that hit home. Change is fine; assuming it will have no flow-on effect isn’t.

The final issue was the concept of “done”. Different people had different definitions. The developers thought they were done when they checked their code in. I thought a feature was done once it had been tested on the staging server. Jason thought it was done when all P1 issues were closed. Sally thought it was done when it matched her vision of what she wanted, even if that meant fine-tuning a feature 10 times. The key stakeholders thought it was done when the budget ran out. All of these views were valid from each individual’s perspective. The problem was that we, as a team, didn’t have a single view. The inevitable tension and frustration of the back and forth that ensued did great damage to team morale and progress.

All of these issues can be traced back to a single root cause: a lack of common understanding. We didn’t have a common view on how things were to be done. When things went astray, people started pulling in different directions, reverting to what they knew best and making things even worse. It would be easy to blame the problems on it being the first time the dev team had used Scrum, but that’s a symptom, not the cause. They didn’t understand that things would work differently, that change would be managed differently, that the client wasn’t expecting it to be right the first time; nor did Sally understand that the developers were expecting just that. Whether it was Scrum, waterfall or a hybrid approach, the problem was that the team was not on the same page. Without a common understanding, a team can’t truly form or perform to the best of its ability. The single most important thing I learned about using Scrum for the first time was not about Scrum itself; it was that if the team members aren’t clear on the approach, whatever it might be, there will be problems. The method is secondary; a common understanding is first and foremost.