March 21, 2012 · Blog, Improve Performance

Shuttle’s software quality head favours pragmatic approach to process improvement

by Prof Barry Dwolatzky

You’re reversing your car when, suddenly, you hear a loud crunch as you drive over something. You stop, jump out and run to see what you’ve hit. It’s your 5-year-old daughter’s tricycle crushed under your back wheel. Someone must have left it in the driveway yesterday. What do you do next?

a) Feel extremely relieved that your daughter wasn’t sitting on the tricycle when it got crushed under your car … and then forget all about the incident;

b) Feel irritated that you’ll now have to buy another tricycle … and then forget all about the incident;

c) Feel angry that the tricycle was left in the driveway … and then forget all about the incident;

d) Spend the rest of the day thinking about how you came to reverse over a child’s tricycle (with or without a child on it). You analyse your routine, think about precautions you will take in future, and get your family together to discuss how a similar incident can be avoided.

To be honest, I probably fall into category a), b) or c). I would probably think, “thank heavens no one got hurt!” and then forget about it. Life is busy, and who has time to worry about what might have happened?

I met someone recently who definitely falls into category d). His name is Ted Keller and he worked for IBM as the software quality manager on NASA’s Space Shuttle programme. Before each Shuttle launch a number of senior managers were required to sign a form certifying that the launch could proceed. In signing this form Ted Keller was making a very specific and terrifying statement. He was formally certifying that the Shuttle’s software was 100% correct and free of defects and errors. As he says, “Anyone who knows anything about large and complex software systems would probably have to be an idiot to make such a statement.” Not only was Ted Keller’s professional reputation on the line every time he signed the “Flight Readiness” certificate – he was also accepting responsibility for a $4 billion space vehicle and the lives of 7 astronauts, many of whom were his personal friends and neighbours.

How did Keller feel confident enough to sign the “Flight Readiness” form time after time? He attributes his faith in the Shuttle software to two things: (i) the capability and dedication of the team of software engineers who developed it, and (ii) the processes they followed to make certain that everything humanly possible had been done to ensure quality. Every time a defect was detected in software testing, its causes were analysed and processes were changed so that the same error would never occur again: category d) in the tricycle example. And it paid off. The Space Shuttle flew for 30 years and did not experience a single mission-critical software defect. Keller’s confidence in his software was justified.

I met Ted Keller last week at the SEPG North America Conference in Albuquerque, USA, where he presented a really interesting paper. He spoke about his frustrations and successes in bringing the lessons learnt on the Shuttle programme to other domains. On leaving the Shuttle programme, Keller began working as a consultant, trying to convince software product companies, banks and others engaged in software development that they should follow NASA’s lead in developing high-quality software. To his surprise, managers and developers showed very little enthusiasm for his message. In their domains, quality just had to be “good enough”.

Over time, Keller has changed his approach. He now begins by understanding the “pain points” experienced by software developers and their managers. While defect-free software was the key business goal in the case of the Shuttle, other domains have their own business drivers. These may include getting a product to market quickly, keeping a project within budget, or eliminating the need for team members to work overtime. In each case Keller has been able to draw on the lessons he learnt working on the Shuttle and come up with a process improvement strategy that helps an organization eliminate its specific pain points.

He listed the following as some of the lessons he has learnt over his long and distinguished career:

Don’t treat process improvement as a “textbook” activity. Textbooks provide guidance but DO NOT teach you how to improve processes in any specific real-world situation.
“Process tailoring” has evolved into “process crafting”: creating a process improvement method that is appropriate to a specific organization.
It is important to understand which aspects of the organization’s performance can be changed. “Traditional” parameters such as cost, quality and even the skills of workers may not be alterable. Restrictions on education, population, time, materials, workable hours, travel, schedules, etc. can limit or prevent many “textbook” process improvement activities.
Before starting on a process improvement journey, one should make certain of the support of senior executives. It is not always true, or obvious, that there will be a positive ROI, or that any financial benefit can be realized soon enough to justify the impact on, and perceived risk to, the business.
One of the most successful ways to achieve process improvement buy-in and stakeholder participation (with or without executive endorsement) is to build and advertise the initiative as a remedy for serious “pain points” that the stakeholders are experiencing.

In summary, Ted Keller is not an advocate of textbook process improvement; he recommends an agile, flexible and pragmatic approach. This is interesting and somewhat surprising coming from the man who successfully shaped the software development processes at the heart of the Shuttle programme.

3 Replies to “Shuttle’s software quality head favours pragmatic approach to process improvement”

Stefan
March 23, 2012 at 9:54 am

Closely related to Barry’s concerns, my research group is organizing two fine workshops in the near future (both in October 2012) that readers of Barry’s blog should not miss:

1. At the first South African Workshop on Software Architecture (SAWOSA’12) in Centurion (South Africa), we want to address the question of how software quality (in general) is related to software architecture (in particular). We are convinced that all the typical small-scale “tinkering” is of little value if the software architecture, the backbone of a software system, is already crooked. For all the details about this upcoming event, see http://ssfm.cs.up.ac.za/workshop/SAWOSA12.htm

2. Related to Ted Keller’s (and Barry’s) question about what quality is non-negotiable and what is “just good enough”, we have the third International Workshop on Formal Methods and Agile Methods (FM+AM’12) in Thessaloniki (Greece). In this workshop we want to explore new ways of making formally sound software development methods faster, and rapid software development approaches more formally sound. In this context we are convinced that CASE tool support will play a crucial role. For all the details about this upcoming event, see http://ssfm.cs.up.ac.za/workshop/FMAM12.htm

So readers with some spare money in their pockets may want to book their tickets to Centurion or Thessaloniki now! Unfortunately, both events are scheduled for almost the same day, so you will have to choose one or the other, unless you have access to some extraordinary means of miraculous transportation 😉

Ernest Mnkandla
April 5, 2012 at 6:48 pm

Quality in software development requires dedication from the entire organisation in order to enable effective implementation of the quality management process by the development team. I have, however, found that people tend to think of quality only when they develop software they perceive to be critical, by whatever standard of criticality they use. My question is: shouldn’t quality be the focus of every nontrivial software project? There is an interesting conference on quality coming up later this year; see details at: http://2012.quatic.org:9000/tracks/thematic-tracks/quality-in-agile-methods/

Stefan
April 10, 2012 at 2:05 pm

These days mark the 100th anniversary of the sinking of the famous ship “Titanic” after a night-time collision with an iceberg in the North Atlantic. To mark the occasion, a technical non-fiction book has recently been published that analyses a large web of rather small mistakes which together contributed to the sinking after the collision. It is surely interesting to read that nautical engineering book from a software engineering perspective! In hindsight, the biggest mistake was the proud presumption that the new ship would in any case be unsinkable, because of its very strong outer hull of very thick steel plates. From that false initial presumption of unsinkability, construction mistakes inside the belly of the ship followed. For example, watertight doors were built into the belly of the ship only up to floor level 4, whereas the higher floors had no such doors. Equipping the ship with too few lifeboats (which were themselves, however, very well constructed) also followed from the initial presumption of unsinkability. Moreover, NO well-drilled emergency PROCEDURES were in place among the crew, again because of the false presumption that there would never be an emergency. As a consequence of the lack of such drills and procedures, the few existing lifeboats were not even filled to capacity; some of them were launched half-empty. Thus many altruistic passengers who had volunteered to stay behind, and drowned, could still have found a place in a lifeboat had proper procedures been in place. Last but not least, another nearby ship could not come to the rescue because its radio operator was off duty during those hours; in 1912 it was not compulsory for ships to keep a 24-hour radio watch. Had that nearby ship been listening, many more lives could have been saved. And so on.
I think you can immediately see the point I want to make. In software engineering, too, many later mistakes follow from false initial presumptions and wishful thinking. Sinking comes from wishful thinking.
