Saturday, September 8, 2012

On Testing

More than a decade ago, I spent a year doing quality assurance at a big and successful smart card company. It was one of the more intellectually stimulating jobs I've had in the software industry, and I ended up developing a scripting language, sadly long lost, to do test automation. Doing testing right can be much harder than writing the software being tested. This is especially true when you're after 100% quality, as with software burned onto the chips of millions of smart cards that would have to be thrown away if a serious bug were discovered post-production. Companies aiming at this level of quality get ISO 900x certified to show off to clients. To get certified, they have to show a QA process that guarantees, to an acceptable degree of confidence, that the products delivered work, that the organization is solid, that knowledge is preserved, etc. etc. The interesting part that I'd like to share with you is the specific software QA approach. It did involve an obscene amount of documentation and software artifacts that had to be produced in a very rigid, formal setting. But the philosophy behind spec-ing out the tests was sound, practical and better than anything I've seen since.

Dijkstra famously said that testing can only prove the presence of errors, never their absence. True that. To have a 100% guarantee that a program works, one would need to produce a mathematical proof of its correctness. Even that may not be sufficient. Knuth, no less famously, wrote when sending a program to a colleague: "Beware of bugs in the above code; I have only proved it correct, not tried it." In either case, the ultimate goal is to gain confidence in the quality of a piece of software. We do that by presenting a strong, very, very convincing argument that the program works.

When discussing testing methodology, people generally talk about automated vs. manual testing. They also talk about test-driven development, where test cases are developed before the code, vs. "classic" testing, where they are done after development. But rarely do I see people mindful of how tests should be specified. The term "test" itself is used rather ambiguously to mean the action of testing, the specification, the process or the development phase. In some contexts a test means a test case, which refers to an input data set. In others it refers to an automated program (e.g. a JUnit test case). So let's agree on a definition that applies whether you code it up or do it manually. A test case consists of a sequence of steps taken to interact with the software being tested and to verify its output, with the goal of ensuring that said software behaves as expected. So how do you go about deciding what this sequence of steps should be? In other words, how do you gather test requirements?

Think of it in dialectical terms: you're trying to convince a skeptic that a program works correctly. First you'd have to agree on what it means for that program to work correctly. Well, they say, it must match the requirements. So you start by reading all the requirements. For each one of them, you translate that last statement ("it must match the requirements") into a corresponding set of test criteria. Naturally, the more detailed the requirements are, the easier that process is. In an agile setting, you might be translating user stories into test criteria. Let's have a simple running example:

Requirement:

Login form should include a captcha protection

Test Criteria:

  • C1 - the login form should display a string as an image that's hard for a program to recognize.
  • C2 - the login form should include an input field that must match the string in the captcha image for login to succeed. 

Notice how the test criteria explicitly state under what conditions one can say that a program works. One can list more criteria with further detail, stating what happens if the captcha doesn't match, what happens after n failed attempts, etc. Also, this should make it clear that test criteria are not actual tests. They are not something that can be executed (manually or automatically). In fact, they are to a test program what requirements are to the software being QA-ed. And as with conventional requirements, the clearer you are on your test criteria, the better your chance of developing adequate tests.

The crucial point is that when you write a test case, you want to make sure that it has a well-defined purpose: it should serve as a demonstration that an actual test criterion has been met. And this is what's missing in 90% of testing efforts (well, to be sure, that figure is just anecdotal). People write or perform tests simply for the sake of trying things out. Tests accumulate, and if you have a lot of them, it looks like you're in good shape. But that's not necessarily the case, because tests can only prove the presence of errors, not their absence. To convince your dialectical opponent of the absence of errors, given the agreed-upon list of criteria, you'd have to show how your tests prove that the criteria have been met. In other words, you want to ensure that all your test criteria have been covered by appropriate test cases. For each test criterion there should be at least one test case that, when successful, shows that this criterion is satisfied. A convenient way to do that is to create a matrix with all your criteria in the rows and all your test cases in the columns. You checkmark a given cell whenever the test case covers the corresponding criterion. Here "covers" means that if the test case succeeds, one can be confident that the criterion is met. This implies that the test case itself will have to include all necessary verification steps. Continuing with our simple example, suppose you've developed a few test cases:

  • T1 - test successful login
  • T2 - test failed login for each of the 3 fields: bad user, bad password, bad captcha
  • T3 - test captcha quality by running image recognition algorithms on the captcha

      T1  T2  T3  T4  T5  ...
  C1          X
  C2  X   X
  ...

A given test case may exercise a certain aspect of the program, but you'd put a checkmark only if it actually verifies the criterion in question. For instance, T3 would be loading a login page, but it won't be testing actual login. Similarly, T1 and T2 can observe the captcha, but they won't evaluate its quality. It may appear a bit laborious as an approach. In the aforementioned company, this was all documented ad nauseam. Criteria were classified as "normal", "abnormal", "stress" and what not, reflecting different types of expected behaviors and possible execution contexts. Now, I did warn you: this was a QA process aimed at living up to ISO standards. And it did. But think about the information this matrix provides you. It is a full, detailed spec of your software. It is a full inventory of your test suite. It tells you what part of the program is being tested by what test. It shows you immediately if some criteria are not being covered by a test, or not covered enough. It shows you immediately if some criteria are being covered too much, i.e. if some tests are superfluous. When tests fail, it tells you exactly what behaviors of the software are not functioning properly. Recall that one of the main problems with automated testing is the explosion of code that needs to be written to achieve decent coverage. This matrix can go a long way toward controlling that code explosion by keeping each test case with a relatively unique purpose. Most importantly, the matrix presents a pretty good argument for the program's correctness. You can see at a glance both how correctness has been defined (the list of criteria) and how it is demonstrated (the list of tests cross-referenced with criteria).
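To make the bookkeeping concrete, here is a minimal Java sketch of such a Criteria x Tests matrix. It is an illustration, not the tooling used at that company. The class and method names are hypothetical, and criteria and tests are plain string identifiers. Criterion C3 (a lockout after n failed attempts) is an invented row, added only to show an uncovered criterion. The queries mirror what the matrix is said to tell you: uncovered criteria, criteria checked by many tests, tests with no stated purpose, and the behaviors implicated by failed tests.

```java
import java.util.*;

/**
 * Sketch of a Criteria x Tests coverage matrix. Rows are test criteria,
 * columns are test cases; a checkmark means "if this test succeeds,
 * the criterion is met".
 */
class CoverageMatrix {
    // criterion -> test cases checked against it (insertion order kept for readable output)
    private final Map<String, Set<String>> rows = new LinkedHashMap<>();

    void addCriterion(String criterion) {
        rows.putIfAbsent(criterion, new LinkedHashSet<>());
    }

    /** Put a checkmark in the cell (criterion, testCase). */
    void check(String criterion, String testCase) {
        addCriterion(criterion);
        rows.get(criterion).add(testCase);
    }

    /** Criteria no test demonstrates: holes in the correctness argument. */
    List<String> uncovered() {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Set<String>> row : rows.entrySet())
            if (row.getValue().isEmpty()) result.add(row.getKey());
        return result;
    }

    /** Criteria checked by more than maxTests tests: candidates for review, not automatically superfluous. */
    List<String> overcovered(int maxTests) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Set<String>> row : rows.entrySet())
            if (row.getValue().size() > maxTests) result.add(row.getKey());
        return result;
    }

    /** Tests that check no criterion at all, i.e. tests without a stated purpose. */
    Set<String> withoutPurpose(Collection<String> allTests) {
        Set<String> result = new LinkedHashSet<>(allTests);
        for (Set<String> checked : rows.values()) result.removeAll(checked);
        return result;
    }

    /** Criteria (behaviors) implicated by a set of failed tests. */
    Set<String> implicatedBy(Collection<String> failedTests) {
        Set<String> result = new LinkedHashSet<>();
        for (Map.Entry<String, Set<String>> row : rows.entrySet())
            if (!Collections.disjoint(row.getValue(), failedTests)) result.add(row.getKey());
        return result;
    }
}

class CoverageMatrixDemo {
    public static void main(String[] args) {
        CoverageMatrix m = new CoverageMatrix();
        m.check("C1", "T3");   // captcha is hard for a program to recognize
        m.check("C2", "T1");   // captcha input must match for login to succeed
        m.check("C2", "T2");
        m.addCriterion("C3");  // hypothetical criterion with no test yet
        System.out.println(m.uncovered());                 // prints [C3]
        System.out.println(m.implicatedBy(List.of("T2"))); // prints [C2]
    }
}
```

Whether a criterion checked by two tests is "too much" is a judgment call. In the running example, T1 and T2 check C2 from the success and failure sides. That is why overcovered takes a threshold rather than deciding on its own which tests are superfluous.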

Reading about testing, even from big industry names, I have frequently been disappointed at the lack of a systematic approach to test requirements. In practice it's even worse. Developers, testers and business people in general have no idea what they are doing when testing. This includes agile teams where tests are sometimes supposed to constitute the specification of the program. That's plain wrong, for two reasons. First, tests are code, and code is way too low-level to be understood by all stakeholders. Hence it can't be a specification that different parties agree upon. Second, the same people usually write both the tests and the program being tested. So the same bugs sneak into both places and never get discovered, and the same, possibly wrong, understanding of the desired behavior is found in both places. So expressing the quality argument (i.e. the one with the imaginary dialectical adversary) simply in the form of test cases can't cut it.

That said, I wouldn't advocate following the approach outlined above verbatim and in full detail. But I would recommend keeping the mental picture of that Criteria x Tests matrix as a guide to what you're doing. And if you're building a regression test suite, especially if some of the tests are manual, it might be worth your while to spell it out in the corporate wiki somewhere.

Boris

PS This is original content from kobrix.blogspot.com

Wednesday, July 18, 2012

Dealing With Change - Events

Events are a great way to manage change in complex software made up of many components. Decoupled software entities may need to be notified about changes. In that case it's easier to represent the change itself explicitly, as an event entity, so that producers (originators) and consumers (receivers) of the event don't have to know about each other. This leads to fewer connections in the graph of dependencies between the software components comprising the system. This blog post documents the event framework in the Sharegov CiRM platform. This is a first draft, and the framework is of course expected to evolve.

Overview

Within the context of software, events essentially model data changes at various locations. So an event framework needs to define several things: how events are represented, what kinds of data changes are supported and what information an event entity contains. It also needs the glue infrastructure that allows some components to publish events and others to consume them.
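The glue infrastructure can be very small. The following Java sketch represents an event as an entity with a type and a map of change data. Components subscribe to and publish events by type only, so producers and consumers never reference each other. The class and method names are hypothetical and are not the CiRM API.

```java
import java.util.*;
import java.util.function.Consumer;

/** An event entity: which kind of change happened, plus data describing it. */
final class Event {
    final String type;               // e.g. an event IRI from the ontology
    final Map<String, Object> data;  // details of the change

    Event(String type, Map<String, Object> data) {
        this.type = type;
        this.data = data;
    }
}

/** Minimal glue between publishers and consumers, who only share event type names. */
final class EventBus {
    private final Map<String, List<Consumer<Event>>> listeners = new HashMap<>();

    void subscribe(String type, Consumer<Event> listener) {
        listeners.computeIfAbsent(type, k -> new ArrayList<>()).add(listener);
    }

    void unsubscribe(String type, Consumer<Event> listener) {
        List<Consumer<Event>> list = listeners.get(type);
        if (list != null) list.remove(listener);
    }

    void publish(Event event) {
        // Iterate over a copy so a listener may unsubscribe while being notified.
        List<Consumer<Event>> current = new ArrayList<>(listeners.getOrDefault(event.type, List.of()));
        for (Consumer<Event> listener : current) listener.accept(event);
    }
}
```

The only shared knowledge is the event type name and the shape of its data. That is the reduction in dependencies the overview describes.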

Events Ontology Model

The various types of events are modelled in the ontology under Event->SoftwareEventType. A seemingly natural way to model events in OWL is for each event occurrence to be an individual and for the various event types to be described via punned classes. However, since we don't record event occurrences anywhere, we don't really need to represent them as OWL individuals. Instead, we model the types of events that can occur as OWL individuals, with properties that govern their behavior to an extent. We then categorize those event types into a few broad categories. On one hand we have events processed entirely at the client. On the other we have entity-related events that can be processed on the server or result in server<->client communication. The client-only events help in connecting otherwise decoupled client-side components, and they are described last. The entity (i.e. OWL individual) events are more thoroughly formalized, and they are described next.

Server-Side Event Management

Event handling on the server-side is implemented by the classes in the org.sharegov.cirm.event package. The most common types of events are those that reflect a change in an entity. Such events are modelled with the SoftwareEventType->EntityChangeEvent class. Each individual belonging to that class models how a change of some kind of entity is dealt with. The "kind" of entity is specified through a DL query expression. The following properties comprise that model:

  1. hasChangeType: any suitable individual that represents the type of change, normally an instance of Activity.
  2. hasImplementation: the fully qualified class name of an org.sharegov.cirm.event.EventTrigger implementation that is invoked to process this event occurrence. There can be multiple such properties, and they will be invoked in an unspecified order.
  3. hasQueryExpression: a Description Logics (DL) query expression that specifies for what types of individuals this event will be triggered. The query expression is evaluated to obtain the set of all its sub-classes. Then, whenever a change to an individual is submitted for processing, it is checked whether the individual belongs to one of the sub-classes defined by that expression. Multiple hasQueryExpression properties are allowed.

Events are processed on the server by an org.sharegov.cirm.EventDispatcher singleton. All events defined in the ontology are loaded upon startup and the DL query expressions evaluated to create a map of OWLClass->EventTrigger. That singleton is accessed by the various services to explicitly publish events via one of the overloaded EventDispatcher.dispatch methods. 
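The startup map and the dispatch step can be sketched as follows. This is a hypothetical, simplified Java rendering with several assumptions. OWL classes are plain IRI strings rather than OWL API objects, and each hasQueryExpression is assumed to have already been evaluated into a set of class IRIs. The Trigger interface is a stand-in whose real signature may differ. The singleton aspect is left out. Matching on hasChangeType is an assumption about how that property is used.

```java
import java.util.*;

/** Hypothetical stand-in for org.sharegov.cirm.event.EventTrigger; the real interface may differ. */
interface Trigger {
    void apply(String eventIri, String individualIri, String changeType);
}

/** Hypothetical in-memory form of one EntityChangeEvent individual. */
final class EntityChangeEventDescriptor {
    final String eventIri;
    final String changeType;            // hasChangeType
    final List<Trigger> triggers;       // hasImplementation, already instantiated
    final Set<String> matchingClasses;  // hasQueryExpression, pre-evaluated to class IRIs

    EntityChangeEventDescriptor(String eventIri, String changeType,
                                List<Trigger> triggers, Set<String> matchingClasses) {
        this.eventIri = eventIri;
        this.changeType = changeType;
        this.triggers = triggers;
        this.matchingClasses = matchingClasses;
    }
}

/** Sketch of the dispatch step; not the actual EventDispatcher. */
final class SimpleEventDispatcher {
    // Built once at startup: class IRI -> events whose query expression covers that class.
    private final Map<String, List<EntityChangeEventDescriptor>> byClass = new HashMap<>();

    SimpleEventDispatcher(Collection<EntityChangeEventDescriptor> events) {
        for (EntityChangeEventDescriptor event : events)
            for (String cls : event.matchingClasses)
                byClass.computeIfAbsent(cls, k -> new ArrayList<>()).add(event);
    }

    /** Runs the triggers of every event matching the individual's classes and change type; returns how many ran. */
    int dispatch(String individualIri, Set<String> individualClasses, String changeType) {
        // A set, so an event reachable through several classes fires only once.
        Set<EntityChangeEventDescriptor> matched = new LinkedHashSet<>();
        for (String cls : individualClasses)
            matched.addAll(byClass.getOrDefault(cls, List.of()));
        int invoked = 0;
        for (EntityChangeEventDescriptor event : matched) {
            if (!event.changeType.equals(changeType)) continue; // assumption: match on hasChangeType
            for (Trigger trigger : event.triggers) {
                trigger.apply(event.eventIri, individualIri, changeType);
                invoked++;
            }
        }
        return invoked;
    }
}
```

Precomputing the class-to-event map at startup means the DL reasoning happens once. Each dispatch then only needs the changed individual's classes.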

Server to Client Events

As a lot of application logic resides in the browser, it is wise to load the relevant data beforehand in order to minimize network traffic and improve response time. This of course poses the problem of updates on the server which invalidate the data at the client. Synchronization of such updates happens through server->client events, the so-called "server push". The most efficient way to implement a server push is for the client to do what is referred to as long polling (see http://en.wikipedia.org/wiki/Push_technology). The client opens a connection with the server and lets it time out if the server has nothing to say, then immediately opens a new connection. However, the Restlet framework we are currently using doesn't support this mode. So we had to fall back to the traditional style of polling: the server returns right away if there are no events to deliver, and the client polls again after a certain interval. In order to decide which events the server should send to a particular client, the client sends the timestamp of the last time it polled. The server then responds with all events that have a later timestamp. Because the comparison is only relative to the client, there aren't any clock synchronization issues to worry about.

The queue of events sent to clients is implemented by the org.sharegov.cirm.event.ClientPushQueue class. Events are added to that queue by an org.sharegov.cirm.event.PushToClientEventTrigger, associated with the event via the hasImplementation property in the event descriptor.
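The timestamp protocol can be sketched in Java as below. The names are hypothetical, and this is not the actual ClientPushQueue. One assumption matters for the point about clock synchronization: the timestamp the client sends back is one the server issued in its previous response, so both sides of the comparison come from the server. A strictly increasing counter stands in for the server clock, so two events stamped within the same millisecond cannot be missed. Pruning of old events is omitted.

```java
import java.util.*;

/** Hypothetical sketch of a timestamp-based push queue; not the actual ClientPushQueue API. */
final class PushQueueSketch {
    static final class PushedEvent {
        final long timestamp;
        final String eventIri;
        final Object data;

        PushedEvent(long timestamp, String eventIri, Object data) {
            this.timestamp = timestamp;
            this.eventIri = eventIri;
            this.data = data;
        }
    }

    /** What a poll returns: new events plus the server time the client must echo next time. */
    static final class PollResult {
        final long serverTime;
        final List<PushedEvent> events;

        PollResult(long serverTime, List<PushedEvent> events) {
            this.serverTime = serverTime;
            this.events = events;
        }
    }

    private final List<PushedEvent> queue = new ArrayList<>();
    private long clock = 0; // server-side logical clock: strictly increasing, so no ties

    /** Called by the push trigger when an event must reach clients. */
    synchronized void add(String eventIri, Object data) {
        queue.add(new PushedEvent(++clock, eventIri, data));
    }

    /** Returns every event stamped after lastSeen, a timestamp the server itself issued earlier. */
    synchronized PollResult poll(long lastSeen) {
        List<PushedEvent> newer = new ArrayList<>();
        for (PushedEvent e : queue)
            if (e.timestamp > lastSeen) newer.add(e);
        return new PollResult(clock, newer);
    }
}
```

A new client would start by polling with 0. It then passes each response's serverTime into the next poll.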

At the client, polling and event dispatching are managed by the cirm.events object (see the EventManager function inside the cirm.js library). To register for an event coming from the server, call:

cirm.events.bind(eventIri, listenerFunction)

Call cirm.events.unbind to unregister a listener. The cirm.events object also exposes startPolling and stopPolling methods, as well as the ability to explicitly trigger an event via cirm.trigger.

Client-side events

Such events happen entirely on the client (browser). They are triggered by a change of some value on the client and processed by some other component on the same client. Such events are categorized under the ClientSideEventType class. One use of client-side events is connecting model changes of otherwise disconnected and independent components. Two components may be completely decoupled, yet parts of their models may represent the same underlying real-world entity. In that case we want a change in one model to be reflected in the other. When we have such a model that can receive its value from another model through events, we express this declaratively in the following way:

  1. Declare the event individual under ClientSideEventType class.
  2. Declare a data source individual under the EventBasedDataSource class with two properties:
    • providedBy pointing to the event created in step (1)
    • hasPropertyName specifying the name of the property in the runtime event data object that contains the model value.
  3. Add a hasDataSource property to the model individual that must be automatically updated when that event is triggered.
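The effect of these declarations can be sketched in Java for consistency with the other examples, even though the real implementation is JavaScript on top of jQuery events. All names here are hypothetical. ClientEvents stands in for the client-side event mechanism. Its connect method shows what a hasDataSource link amounts to at runtime: each occurrence of the providing event copies the value named by hasPropertyName into the model.

```java
import java.util.*;
import java.util.function.Consumer;

/** A hypothetical client-side model holding a single value. */
final class Model {
    Object value;
}

/** In-memory form of an EventBasedDataSource individual. */
final class EventBasedDataSource {
    final String providedBy;    // IRI of the ClientSideEventType individual
    final String propertyName;  // hasPropertyName: key of the value in the event data

    EventBasedDataSource(String providedBy, String propertyName) {
        this.providedBy = providedBy;
        this.propertyName = propertyName;
    }
}

/** Stand-in for the client event mechanism (jQuery events in the real system). */
final class ClientEvents {
    private final Map<String, List<Consumer<Map<String, Object>>>> listeners = new HashMap<>();

    void bind(String eventIri, Consumer<Map<String, Object>> listener) {
        listeners.computeIfAbsent(eventIri, k -> new ArrayList<>()).add(listener);
    }

    void trigger(String eventIri, Map<String, Object> data) {
        for (Consumer<Map<String, Object>> l : listeners.getOrDefault(eventIri, List.of()))
            l.accept(data);
    }

    /** Applies a hasDataSource link: the model takes its value from each event occurrence. */
    void connect(Model model, EventBasedDataSource source) {
        bind(source.providedBy, data -> {
            // Ignore occurrences that don't carry the expected property.
            if (data.containsKey(source.propertyName))
                model.value = data.get(source.propertyName);
        });
    }
}
```

Neither component refers to the other. The link lives entirely in the declared data source, which is what lets the two models stay decoupled.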

Pure client-side events as described in this section are not processed on the server at all. They just define the model used by the JavaScript libraries on the client to communicate between decoupled components. The event dispatching is implemented by the jQuery events mechanism rather than by our cirm.events object. Perhaps we should go through the cirm.events object here as well. However, jQuery has the advantage of scoping listeners and events to DOM elements. This can be important if we have multiple instances of the same component at different places on a web page.