Sunday, June 20, 2010

4 Absence of Evidence is Not Evidence of Absence – The System Testing Conundrum

Jerry’s discussion of the pitfalls and psychological traps surrounding testing struck home, bringing to mind in particular one episode I have always referred to as the “No Show Stopper Case.”

I had recently been promoted to Vice President of Quality Management in GTE’s Telenet division.  The division had just launched the first commercial e-mail system, TeleMail, and was poised to release, with much fanfare, the full-featured version 2.  Some of the maintenance releases of version 1 had not gone well, leaving much ill will among early customers and a spreading reputation for poor quality.

It was in this atmosphere that the CEO directed me to conduct an independent review of the recommendation to immediately release the new version into production.  This was not a product that could be rolled out to isolated beta customer sites, but rather a network-based service, so any problems would affect all customers and further erode our fragile industry reputation.

The director of the development shop and his staff were apparently unaware that I had spent three years of my GTE career heading the corporate-wide effort to standardize methods and tools for large-system development.  I suspect they thought an hour or so of dense technical slides would make the “Quality Control” guy’s eyes glaze over and send him into retreat, awed by their brilliance.  Then they could get on with their release.

The meeting started to deteriorate (at least from the project team’s perspective) when I began probing for such practices as test case generation and test coverage metrics.  By the time I got around to asking to see their development process standards, the room had developed a noticeable chill.  When the director stated that the project was under too tight a deadline for such “overhead,” I asked what criterion, then, he was using to recommend going into production.  His answer was, “We tested until there were no more show stoppers.”

It was at this point that I channeled my high school Jesuit teachers and applied the principle of “reductio ad absurdum,” stating, “Well, then, I plan to recommend that we hold the release until your staff delivers to me the exhaustive list of ‘show stoppers’ for which you tested, thereby proving they are all absent.”  The director began sputtering at the absurdity of my request, at which point I suggested that perhaps the absurdity lay in the claim that there were no show stoppers in the program, given that he could point to no definition and no test to prove the assertion.

After things settled down, we agreed that the delicacy of the situation with our customers, and our historically poor industry reputation, probably did justify a reasonably short delay to apply some additional rigor to the testing and to the analysis of the results.  When we did, it turned out that the system functioned as specified, but its performance was dismally inadequate for full production use, so performance optimization was undertaken in parallel with the upgrading of the testing discipline.

The blind spot exposed in this case was hardly unique to this team; continuing industry experience with software project failures suggests it is rather common.

By the way, over three years’ time, including this incident, Telenet went from last place to first place in industry ratings for our products and services.

Saturday, June 5, 2010

3 If You Don’t Think Documentation is a Critical Component of a Product, Think Again

Jerry’s comments on documentation in the Silver Anniversary edition rang true as I recalled an incident that came up during a review of the maintenance manual for a telecommunications operations support system.  At the time, I was serving as VP of Engineering in a private company producing a business-critical billing teleprocessing system for telephone companies.

I was not always present for the review of every component of documentation, but there were a lot of programmers on the staff with no particular telecommunications systems experience, so I was very interested to see how the maintenance manual had turned out.  The system was conceived to operate continuously throughout its decade or more of deployment, carrying all of the usage detail records for every phone call in all but one of the major telephone companies in the United States (as well as several in other countries), records from which all of their usage-based billing revenues were derived.  Obviously, routine maintenance and fault repair needed to be handled very carefully.

We were not 30 minutes into the presentation by the documentation team when a procedure was presented that began: “First, power down both sides of the unit.”  The unit in question was a central-office-based real-time teleprocessing node with 100% redundancy and fault tolerance designed into the hardware and, presumably, the software.  I probably scared the meeting attendees half to death as my 6+ foot, 250+ pound frame nearly levitated with agitation.  As I explained that the only time both sides of this system were ever intended to be powered down at the same time was the moment just before they were first powered up for the rest of their service life, I could tell from the looks on the faces of the programmers in the meeting (over and above the looks they had worn on entering the room at merely having to attend a documentation review) that there was a serious issue.

The senior programmer attending tried, as delicately as he could given my mood, to explain why this was the way to perform the procedure.  I was calm enough to explain to him and the rest of those present that any design or implementation that compelled such a step in any procedure was flawed, and that they had better go back to the drawing board and rethink their solution.  Luckily, as it turned out, the design and implementation of the software could be adapted to the reality that had somehow escaped the programmers’ understanding.  The hardware, which I had also influenced greatly during its design, was fully capable of supporting the proper approach.

I offer this as an example of the critical role documentation should play in the total packaging of a software-centered product.

In retrospect, I’d go so far as to say that today I’d take the approach of teaming systems designers and documentation specialists to co-produce the manuals before the detailed design.   I’d make approval of the design and user documentation components a gating event for the commencement of detailed design and implementation.

Thursday, May 27, 2010

2 Jerry’s Comments on Comments Start a Fight

Soon after I was put in charge of the Tools and Standards group on my first big project at GTE, I discovered and enthusiastically read Jerry Weinberg’s Psychology of Computer Programming.  One thing that particularly caught my attention was the experiment on comments in code and their effect on a debugging exercise.  I had made, and found, a few such errors myself, so I recommended to my boss that our coding standard prohibit line-by-line comments.

This meshed nicely with another standard I was recommending, based on my discovery of a method for producing and documenting detailed designs called HIPO (Hierarchy plus Input-Process-Output), wherein the entire discussion of the code was encapsulated in a block of text we called the Prologue.  The design documentation also included sections for internal storage variables, as well as flow charts for the logic of the component.
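To give a feel for what that standard asked of a module, here is a rough sketch of the pattern.  The routine, its fields, and the C rendering are all invented for illustration (our own prologues lived in the HIPO design documents, not in C source), but the spirit is the same: the whole discussion sits in one prologue block, and the code beneath it carries no line-by-line commentary.

    /*------------------------------------------------------------------
     * PROLOGUE (illustrative only; the routine and its fields are
     * hypothetical, not taken from the actual project)
     *
     * Name:     apply_late_fee
     * Purpose:  Add a late-payment fee to an overdue account balance.
     * Inputs:   balance_cents - current balance, in cents
     *           days_overdue  - whole days past the due date
     * Process:  If the account is more than 30 days overdue, add a
     *           flat 500-cent fee; otherwise leave the balance alone.
     * Outputs:  Returns the (possibly adjusted) balance, in cents.
     * Storage:  No static or global storage is used.
     *-----------------------------------------------------------------*/
    long apply_late_fee(long balance_cents, int days_overdue)
    {
        if (days_overdue > 30) {
            balance_cents += 500;
        }
        return balance_cents;
    }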

Much to my surprise, virtually the entire programmer workforce on the project rose in opposition to this standard, even when presented with the evidence of Jerry’s experimental results.  Supremely confident in my powers of persuasion, I even constructed a sample bit of code, with and without comments and with an injected bug, but no one was convinced, mainly because I had let the cat out of the bag by showing them the section from the book first.  To further complicate matters, the project included programmers of 13 different nationalities, so language was an additional barrier to communication.
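That sample is long gone, but its flavor was something like the sketch below, again a modern C rendering with invented names.  The inline comments assert what the code was meant to do rather than what it does, and a reader who trusts them tends to sail right past the injected bug, which is exactly the effect Jerry’s experiment pointed to.

    /* Demonstration routine with an injected bug.  The line-by-line
     * comments below are deliberately written to describe the intent,
     * not the actual behavior: the loop starts at 1 and silently skips
     * the first element of the array. */
    double average(const double *values, int count)
    {
        double sum = 0.0;
        int i;
        for (i = 1; i < count; i++) {   /* add up every element */
            sum += values[i];
        }
        return sum / count;             /* divide the total by the count */
    }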

Finally, the project director declared the standard approved and, luckily, everyone soon got used to producing documentation and code to the new pattern, and project cohesion was not destroyed.

Even so, to this day I am struck by how attached to the status quo programmers can become, perhaps because of their apparent tendency toward conservative rather than radical approaches.

1 Our COBOL Program Owns the Machine – Give us the Feature

One of the recurring attributes I have found in the programmer’s psychological makeup is a strong reaction to being told something “can’t be done.” One memory from my personal experience comes to mind from my time on the BLMPS project while serving in the Air Force. After having first been fully trained on DOS for the IBM System 360 and then retrained on the Burroughs 3500 mainframe (that’s another story that I’ll get to in another installment), my oldest friend and programmer extraordinaire Ron Field and I were trying to get some application code to do something or other when we decided that the Burroughs operating system, the MCP (Master Control Program), should have a feature for what we were trying to do.

We went through the proper channels and documented the request, going so far as to specify the form of the system routine call that would be used in an application program, whether in assembly language or COBOL. In no time at all, an official response came back from Burroughs, summarily declaring that what we asked for couldn’t be done and that we should find another way to accomplish what we were trying to do.

Well, that’s all it took. Because of the gigantic nature of the contract between the Air Force and Burroughs (150+ mainframe computers, plus all the extras that can be charged for in a plum DOD contract), the project programming team had access to the full assembly source of the MCP. Ron and I proceeded to pore over it until we found an unused executive interrupt vector location. Then we exploited a neat (albeit dangerous) feature that the COBOL compiler guys had put into their implementation of the language: one could place a console command into a working-storage variable and, by referencing it from a form of the COBOL PERFORM verb, send operator console commands and receive responses as if you were the operator typing at the physical console of the machine. Through a series of these steps, we discovered the relocatable memory location of our COBOL application, poked an entry point address into the unused interrupt vector slot, and triggered an executive-mode interrupt to that address, at which point the entire machine was under the control of our COBOL application program, running in executive mode.
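I would not try to reconstruct B 3500 assembly or Burroughs COBOL from memory, but the shape of the trick can be sketched in modern terms as a dispatch table with a spare slot. Everything below (the table, its size, the function names) is hypothetical and merely stands in for the MCP’s interrupt vectors and the entry point we poked into one of them.

    #include <stdio.h>

    /* Hypothetical stand-in for the MCP's interrupt vectors: a table of
     * handler addresses, with unused slots left at NULL. */
    typedef void (*handler_fn)(void);

    #define VECTOR_COUNT 16
    static handler_fn vector_table[VECTOR_COUNT];   /* static => all NULL */

    static void our_application_entry(void)
    {
        /* In the real stunt this was our COBOL program, suddenly running
         * in executive mode; here it just announces itself. */
        printf("application code entered via the spare vector\n");
    }

    static void trigger_interrupt(int n)
    {
        /* Stand-in for raising an executive-mode interrupt. */
        if (n >= 0 && n < VECTOR_COUNT && vector_table[n] != NULL) {
            vector_table[n]();
        }
    }

    int main(void)
    {
        int spare = -1;
        int i;

        /* Step 1: find an unused vector slot, as we did by reading the
         * MCP source listing. */
        for (i = 0; i < VECTOR_COUNT; i++) {
            if (vector_table[i] == NULL) {
                spare = i;
                break;
            }
        }

        /* Step 2: poke our entry point into the spare slot. */
        vector_table[spare] = our_application_entry;

        /* Step 3: trigger that interrupt and land in our own code. */
        trigger_interrupt(spare);
        return 0;
    }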

Then, just to be cute and to prove that we “owned” the machine, we caused the master console lamps to flash in some silly little pattern before executing the very code that we needed and that the gurus at Burroughs had told us was impossible. Needless to say, when we demonstrated our little trick program, the Burroughs guys demanded to know how we had pulled it off. Feeling mighty proud of ourselves, we dragged a concession out of them: they would indeed add the feature to the MCP, and in return we would disclose the secret of our trick.

There are many times I have seen the tightened jaw and steel in the eye of a programmer who is told something can’t be done. I must admit that it is difficult to harness this energy for sustained productive results, but it is real and must reveal something about the inner workings of the programmer’s mind. I’m sure Ron and I could have gotten our program done without this fun little excursion, but our minds didn’t work that way.