Sunday, June 20, 2010

4 Absence of Evidence is Not Evidence of Absence – The System Testing Conundrum

Jerry’s discussion of the pitfalls and psychological traps surrounding testing struck home, especially bringing to mind one episode I have always referred to as the “No Show Stopper Case.”

I had recently been promoted to Vice President of Quality Management in GTE’s Telenet division.  The division had recently launched the first commercial e-mail system, TeleMail, and was poised to release the full-featured version 2, with much fanfare.  Some of the maintenance releases of version 1 had not gone well, leaving much bad will among early customers and a spreading reputation for poor quality.

It was in this atmosphere that the CEO directed me to conduct an independent review of the recommendation to immediately release the new version into production.  This was not a product that could be rolled out to isolated beta customer sites, but rather a network-based service, so any problems would affect all customers and further erode our fragile industry reputation.

The director of the development shop and his staff were apparently unaware that I had spent three years of my GTE career heading the corporate-wide effort at standardization of methods and tools for large system development.  I suspect that they thought an hour or so of boring technical slides would make the “Quality Control” guy glaze over in boredom and retreat, in awe of their brilliance.  Then they could get on with their release.

The meeting started to deteriorate (at least from the project team’s perspective) when I started probing for such practices as test case generation and test coverage metrics.  By the time I got around to asking to see their development process standards, the room had developed a noticeable chill.  When the director stated that the project was under too tight a deadline for such “overhead”, I asked what criterion, then, he was using to recommend going into production.  His answer was, “We tested until there were no more show stoppers.”

It was at this point that I channeled my high school Jesuit teachers and applied the principle of “reductio ad absurdum” to state, “Well, then, I plan to recommend that we hold the release until your staff delivers to me the exhaustive list of ‘show stoppers’ for which you tested, thereby proving they are all absent.”  The director began sputtering at the absurdity of my request, at which point I suggested that maybe the absurdity was lodged in the claim that there were none in the program, given that there was no definition and no test to which he could point to prove the assertion.

After things settled down, we agreed that the delicacy of the situation with our customers and our historically poor industry reputation probably did justify a reasonably short delay to apply some additional rigor to the testing and test results analysis.  When we did, it turned out that the system functioned as specified, but the performance was dismally inadequate for full production use, so performance optimization was undertaken in parallel with the upgrading of the testing discipline.
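For readers who think in code, the difference between “no more show stoppers” and a defensible release criterion can be sketched roughly as follows.  This is only an illustration, not what the Telenet team built: the `Criterion` class, the `release_decision` function, and every name and number in it are hypothetical.  The point it tries to capture is that each criterion is written down with a threshold before testing, and that having no criteria at all counts as a refusal rather than a pass.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    """One release criterion, agreed on before testing starts (hypothetical)."""
    name: str
    measured: float                # value observed during testing
    required: float                # threshold that must be met
    higher_is_better: bool = True  # False for things like response time

    def met(self) -> bool:
        if self.higher_is_better:
            return self.measured >= self.required
        return self.measured <= self.required


def release_decision(criteria):
    """Return (ok, failures). An empty criteria list is a refusal, not a pass."""
    if not criteria:
        return False, ["no release criteria defined"]
    failures = [c.name for c in criteria if not c.met()]
    return not failures, failures


# Made-up numbers: functionally correct, but far too slow for production.
criteria = [
    Criterion("fraction of specified functions passing", 1.0, 1.0),
    Criterion("median response time in seconds", 9.0, 2.0, higher_is_better=False),
]
print(release_decision(criteria))
# prints (False, ['median response time in seconds'])
```

With criteria stated this way, a system that works as specified but performs poorly fails visibly on the performance line instead of slipping through on an undefined notion of “show stopper.”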

The blind spot exposed in this case was hardly unique to this team, but seems to be rather common, as continuing industry experience with software project failures indicates.

By the way, over three years’ time, including this incident, Telenet went from last place to first place in industry ratings for our products and services.

Saturday, June 5, 2010

3 If You Don’t Think Documentation is a Critical Component of a Product, Think Again

Jerry’s Silver Anniversary edition comments on documentation rang true when I recalled an incident that came up during a review of the maintenance manual for a telecommunications operations support system.  At the time, I was serving as VP of Engineering in a private company producing a business-critical billing teleprocessing system for telephone companies.

I was not always present for the review of every component of documentation, but there were a lot of programmers on the staff with no particular telecommunications systems experience, so I was very interested to see how the maintenance manual turned out.  The system was conceived to operate continually throughout its decade or more of deployment, carrying all of the usage detail records for every phone call in all but one of the major telephone companies in the United States (as well as several in other countries), from which all of their usage-based billing revenues were derived.  Obviously, routine maintenance and fault repair needed to be handled very carefully.

We were not 30 minutes into the presentation by the documentation team when a procedure was presented that started out: “First, power down both sides of the unit.”  The unit in question was a central office-based real-time teleprocessing node with 100% redundancy and fault tolerance designed into the hardware and, presumably, the software.  I probably scared the meeting attendees half to death as my 6+ foot, 250+ pound frame nearly levitated with agitation.  I explained that the last time both sides of this system were ever intended to be powered down at the same time was the moment just before both sides were first powered up for the rest of their service life.  As I did, I could tell from the looks on the faces of the programmers in the meeting (on top of the looks they had worn when they entered the room, at having to attend a documentation review at all) that there was a serious issue.

The senior programmer attending tried, as delicately as he could under the circumstances of my mood, to explain why this was the way to perform the procedure.  I was calm enough to explain to him and the rest of those present that any design or implementation that compelled such a step in any procedure was flawed, and that they had better go back to the drawing board and rethink their solution.  Luckily, as it turned out, the design and implementation of the software could be changed to adapt to the reality that had somehow escaped the programmers’ understanding.  The hardware, which I had also influenced greatly during its design, was totally capable of supporting the proper approach.
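The rule at stake here, that a fully redundant unit never has both sides powered down at once, can be sketched in a few lines.  This is a minimal illustration under assumed names, not the actual system: `RedundantUnit`, the side labels "A" and "B", and `service_unit` are all hypothetical.  It shows a maintenance procedure that works on one side at a time, with the unit itself refusing any step that would take the last working side out of service.

```python
class RedundantUnit:
    """Hypothetical two-sided node; at least one side must always be in service."""

    def __init__(self):
        self.in_service = {"A": True, "B": True}

    def power_down(self, side):
        if side not in self.in_service:
            raise ValueError(f"unknown side: {side!r}")
        other = "B" if side == "A" else "A"
        # Refuse the step that the draft manual started with.
        if not self.in_service[other]:
            raise RuntimeError("refusing to power down both sides of the unit")
        self.in_service[side] = False

    def power_up(self, side):
        if side not in self.in_service:
            raise ValueError(f"unknown side: {side!r}")
        self.in_service[side] = True


def service_unit(unit, work):
    """Run work(side) on each side in turn, restoring it before touching the other."""
    for side in ("A", "B"):
        unit.power_down(side)
        try:
            work(side)
        finally:
            unit.power_up(side)


unit = RedundantUnit()
service_unit(unit, lambda side: print("servicing side", side))
```

A procedure written against a rule like this cannot begin with “power down both sides,” which is the kind of mismatch between design and manual that the review brought to light.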

I offer this example of the critical role documentation should play in the total packaging of a software-centered product.

In retrospect, I’d go so far as to say that today I’d take the approach of teaming systems designers and documentation specialists to co-produce the manuals before the detailed design.   I’d make approval of the design and user documentation components a gating event for the commencement of detailed design and implementation.