
> Using your own words, describe the difference between verification and validation. Do both make use of test-case design methods and testing strategies?
> What is integration testing? What is a good integration test case? Who should perform integration testing?
> Describe the steps associated with user experience testing for a mobile app.
> What is the objective of security testing? Who performs this testing activity?

Need 2-3 pages with peer-reviewed citations. No introduction or conclusion needed.


CHAPTER 19

Software Testing—Component Level

Quick Look

What is it? Software is tested to uncover errors that were made inadvertently as it was designed and constructed. A software component testing strategy considers testing of individual components and integrating them into a working system.

Who does it? A software component testing strategy is developed by the project manager, software engineers, and testing specialists.

Why is it important? Testing often accounts for more project effort than any other software engineering action. If it is conducted haphazardly, time is wasted, unnecessary effort is expended, and even worse, errors sneak through undetected.

What are the steps? Testing begins “in the small” and progresses “to the large.” By this we mean that early testing focuses on a single component or on a small group of related components and applies tests to uncover errors in the data and processing logic that have been encapsulated by the component(s). After components are tested, they must be integrated until the complete system is constructed.

What is the work product? A test specification documents the software team’s approach to testing by defining a plan that describes an overall strategy and a procedure that defines specific testing steps and the types of test cases that will be conducted.

How do I ensure that I’ve done it right? An effective test plan and procedure will lead to the orderly construction of the software and the discovery of errors at each stage in the construction process.


Key Concepts

basis path testing (384); black-box testing (388); boundary value analysis (389); class testing (390); control structure testing (386); cyclomatic complexity (385); debugging (373); equivalence partitioning (389); independent test group (375); integration testing (375); interface testing (388); object-oriented testing (390); scaffolding (379); system testing (376); testing methods (373); testing strategies (373); unit testing (376); validation testing (376); verification (373); white-box testing (383)


Software component testing incorporates a strategy that describes the steps to be conducted as part of testing, when these steps are planned and then undertaken, and how much effort, time, and resources will be required. Within the testing strategy, software component testing implements a collection of component testing tactics that address test planning, test-case design, test execution, and resultant data collection and evaluation. Both component testing strategy and tactics are considered in this chapter.

CHAPTER 19 SOFTWARE TESTING—COMPONENT LEVEL 373

To be effective, a component testing strategy should be flexible enough to promote a customized testing approach but rigid enough to encourage reasonable planning and management tracking as the project progresses. Component testing remains a responsibility of individual software engineers. Who does the testing, how engineers communicate their results with one another, and when testing is done is determined by the software integration approach and design philosophy adopted by the development team.

These “approaches and philosophies” are what we call strategy and tactics—topics to be discussed in this chapter. In Chapter 20 we discuss the integration testing techniques that often end up defining the team development strategy.

19.1 A Strategic Approach to Software Testing

Testing is a set of activities that can be planned in advance and conducted systematically. For this reason, a template for software testing—a set of steps into which we can place specific test-case design techniques and testing methods—should be defined for the software process.

A number of software testing strategies have been proposed in the literature [Jan16] [Dak14] [Gut15]. All provide you with a template for testing, and all have the following generic characteristics:

∙ To perform effective testing, you should conduct technical reviews (Chapter 16). By doing this, many errors will be eliminated before testing commences.

∙ Testing begins at the component level and works “outward” toward the integration of the entire computer-based system.

∙ Different testing techniques are appropriate for different software engineering approaches and at different points in time.

∙ Testing is conducted by the developer of the software and (for large projects) an independent test group.

∙ Testing and debugging are different activities, but debugging must be accommodated in any testing strategy.

A strategy for software testing incorporates a set of tactics that accommodate the low-level tests necessary to verify that a small source code segment has been correctly implemented as well as high-level tests that validate major system functions against customer requirements. A strategy should provide guidance for the practitioner and a set of milestones for the manager. Because the steps of the test strategy occur at a time when deadline pressure begins to rise, progress must be measurable and problems should surface as early as possible.

374 PART THREE QUALITY AND SECURITY

19.1.1 Verification and Validation

Software testing is one element of a broader topic that is often referred to as verification and validation (V&V). Verification refers to the set of tasks that ensure that software correctly implements a specific function. Validation refers to a different set of tasks that ensure that the software that has been built is traceable to customer requirements. Boehm [Boe81] states this another way:

Verification: “Are we building the product right?”
Validation: “Are we building the right product?”

The definition of V&V encompasses many software quality assurance activities (Chapter 17).1

Verification and validation include a wide array of SQA activities: technical reviews, quality and configuration audits, performance monitoring, simulation, feasibility study, documentation review, database review, algorithm analysis, development testing, usability testing, qualification testing, acceptance testing, and installation testing. Although testing plays an extremely important role in V&V, many other activities are also necessary.

Testing does provide the last bastion from which quality can be assessed and, more pragmatically, errors can be uncovered. But testing should not be viewed as a safety net. As they say, “You can’t test in quality. If it’s not there before you begin testing, it won’t be there when you’re finished testing.” Quality is incorporated into software throughout the process of software engineering, and testing cannot be applied as a fix at the end of the process. Proper application of methods and tools, effective technical reviews, and solid management and measurement all lead to quality that is confirmed during testing.

19.1.2 Organizing for Software Testing

For every software project, there is an inherent conflict of interest that occurs as testing begins. The people who have built the software are now asked to test the software. This seems harmless in itself; after all, who knows the program better than its developers? Unfortunately, these same developers have a vested interest in demonstrating that the program is error-free, that it works according to customer requirements, and that it will be completed on schedule and within budget. Each of these interests works against thorough testing.

From a psychological point of view, software analysis and design (along with coding) are constructive tasks. The software engineer analyzes, models, and then creates a computer program and its documentation. Like any builder, the software engineer is proud of the edifice that has been built and looks askance at anyone who attempts to tear it down. When testing commences, there is a subtle, yet definite, attempt to “break” the thing that the software engineer has built. From the point of view of the builder, testing can be considered to be (psychologically) destructive. So the builder treads lightly, designing and executing tests that will demonstrate that the program works, rather than tests that uncover errors. Unfortunately, errors will nevertheless be present. And, if the software engineer doesn’t find them, the customer will!

1 It should be noted that there is a strong divergence of opinion about what types of testing constitute “validation.” Some people believe that all testing is verification and that validation is conducted when requirements are reviewed and approved, and later, by the user when the system is operational. Other people view unit and integration testing (Chapters 19 and 20) as verification and higher-order testing (Chapter 21) as validation.


There are often a number of misconceptions that you might infer from the preceding discussion: (1) that the developer of software should do no testing at all, (2) that the software should be “tossed over the wall” to strangers who will test it mercilessly, and (3) that testers get involved with the project only when the testing steps are about to begin. Each of these statements is incorrect.

The software developer is always responsible for testing the individual units (components) of the program, ensuring that each performs the function or exhibits the behavior for which it was designed. In many cases, the developer also conducts integration testing—a testing step that leads to the construction (and test) of the complete software architecture. Only after the software architecture is complete does an independent test group become involved.

The role of an independent test group (ITG) is to remove the inherent problems associated with letting the builder test the thing that has been built. Independent testing removes the conflict of interest that may otherwise be present. After all, ITG personnel are paid to find errors.

However, you don’t turn the program over to ITG and walk away. The developer and the ITG work closely throughout a software project to ensure that thorough tests will be conducted. While testing is conducted, the developer must be available to correct errors that are uncovered.

The ITG is part of the software development project team in the sense that it becomes involved during analysis and design and stays involved (planning and specifying test procedures) throughout a large project. However, in many cases the ITG reports to the software quality assurance organization, thereby achieving a degree of independence that might not be possible if it were a part of the software engineering team.

19.1.3 The Big Picture

The software process may be viewed as the spiral illustrated in Figure 19.1. Initially, system engineering defines the role of software and leads to software requirements analysis, where the information domain, function, behavior, performance,

Figure 19.1 Testing strategy (a spiral: development moves inward through system engineering, requirements, design, and code, while testing moves outward through unit testing, integration testing, validation testing, and system testing)


constraints, and validation criteria for software are established. Moving inward along the spiral, you come to design and finally to coding. To develop computer software, you spiral inward along streamlines that decrease the level of abstraction on each turn.

A strategy for software testing may also be viewed in the context of the spiral (Figure 19.1). Unit testing begins at the vortex of the spiral and concentrates on each unit (e.g., component, class, or WebApp content object) of the software as implemented in source code. Testing progresses by moving outward along the spiral to integration testing, where the focus is on design and the construction of the software architecture. Taking another turn outward on the spiral, you encounter validation testing, where requirements established as part of requirements modeling are validated against the software that has been constructed. Finally, you arrive at system testing, where the software and other system elements are tested as a whole. To test computer software, you spiral out along streamlines that broaden the scope of testing with each turn.

Considering the process from a procedural point of view, testing within the context of software engineering is actually a series of four steps that are implemented sequentially. The steps are shown in Figure 19.2. Initially, tests focus on each component individually, ensuring that it functions properly as a unit. Hence, the name unit testing. Unit testing makes heavy use of testing techniques that exercise specific paths in a component’s control structure to ensure complete coverage and maximum error detection. Next, components must be assembled or integrated to form the complete software package. Integration testing addresses the issues associated with the dual problems of verification and program construction. Test-case design techniques that focus on inputs and outputs are more prevalent during integration, although techniques that exercise specific program paths may be used to ensure coverage of major control paths.

Figure 19.2 Software testing steps (testing “direction” runs from unit testing of the code, through integration testing against the design, to high-order tests against the requirements)


After the software has been integrated (constructed), a set of high-order tests is conducted. Validation criteria (established during requirements analysis) must be evaluated. Validation testing provides final assurance that software meets all functional, behavioral, and performance requirements.

The last high-order testing step falls outside the boundary of software engineering and into the broader context of computer system engineering (discussed in Chapter 21). Software, once validated, must be combined with other system elements (e.g., hardware, people, databases). System testing verifies that all elements mesh properly and that overall system function and performance is achieved.

SafeHome: Preparing for Testing

The scene: Doug Miller’s office, as component-level design continues and construction of certain components begins.

The players: Doug Miller, software engineering manager; Vinod, Jamie, Ed, and Shakira, members of the SafeHome software engineering team.

The conversation:

Doug: It seems to me that we haven’t spent enough time talking about testing.

Vinod: True, but we’ve all been just a little busy. And besides, we have been thinking about it . . . in fact, more than thinking.

Doug (smiling): I know . . . we’re all overloaded, but we’ve still got to think down the line.

Shakira: I like the idea of designing unit tests before I begin coding any of my components, so that’s what I’ve been trying to do. I have a pretty big file of tests to run once code for my components is complete.

Doug: That’s an Extreme Programming [an agile software development process, see Chapter 3] concept, no?

Ed: It is. Even though we’re not using Extreme Programming per se, we decided that it’d be a good idea to design unit tests before we build the component—the design gives us all of the information we need.

Jamie: I’ve been doing the same thing.

Vinod: And I’ve taken on the role of the integrator, so every time one of the guys passes a component to me, I’ll integrate it and run a series of regression tests [see Section 20.3 for a discussion on regression testing] on the partially integrated program. I’ve been working to design a set of appropriate tests for each function in the system.

Doug (to Vinod): How often will you run the tests?

Vinod: Every day . . . until the system is integrated . . . well, I mean until the software increment we plan to deliver is integrated.

Doug: You guys are way ahead of me!

Vinod (laughing): Anticipation is everything in the software biz, Boss.


19.1.4 Criteria for “Done”

A classic question arises every time software testing is discussed: “When are we done testing—how do we know that we’ve tested enough?” Sadly, there is no definitive answer to this question, but there are a few pragmatic responses and early attempts at empirical guidance.


One response to the question is: “You’re never done testing; the burden simply shifts from you (the software engineer) to the end user.” Every time the user executes a computer program, the program is being tested. This sobering fact underlines the importance of other software quality assurance activities. Another response (somewhat cynical but nonetheless accurate) is: “You’re done testing when you run out of time or you run out of money.”

Although few practitioners would argue with these responses, you need more rigorous criteria for determining when sufficient testing has been conducted. The statistical quality assurance approach (Section 17.6) suggests statistical use techniques [Rya11] that execute a series of tests derived from a statistical sample of all possible program executions by all users from a targeted population. By collecting metrics during software testing and making use of existing statistical models, it is possible to develop meaningful guidelines for answering the question: “When are we done testing?”

19.2 Planning and Recordkeeping

Many strategies can be used to test software. At one extreme, you can wait until the system is fully constructed and then conduct tests on the overall system in the hope of finding errors. This approach, although appealing, simply does not work. It will result in buggy software that disappoints all stakeholders. At the other extreme, you could conduct tests on a daily basis, whenever any part of the system is constructed.

A testing strategy that is chosen by many software teams (and the one we recommend) falls between the two extremes. It takes an incremental view of testing, beginning with the testing of individual program units, moving to tests designed to facilitate the integration of the units (sometimes on a daily basis), and culminating with tests that exercise the constructed system as it evolves. The remainder of this chapter will focus on component-level testing and test-case design.

Unit testing focuses verification effort on the smallest unit of software design—the software component or module. Using the component-level design description as a guide, important control paths are tested to uncover errors within the boundary of the module. The relative complexity of tests and the errors those tests uncover is limited by the constrained scope established for unit testing. The unit test focuses on the internal processing logic and data structures within the boundaries of a component. This type of testing can be conducted in parallel for multiple components.
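As a concrete sketch of what this looks like in practice, the Python example below unit-tests the internal processing logic of a single component. The component (a triangle classifier) and its behavior are hypothetical, chosen only so that each test exercises one path through the component's control structure:

```python
import unittest

def classify_triangle(a, b, c):
    """Hypothetical component under test: classify a triangle by side lengths."""
    if a <= 0 or b <= 0 or c <= 0:
        raise ValueError("sides must be positive")
    if a + b <= c or a + c <= b or b + c <= a:
        return "not a triangle"
    if a == b == c:
        return "equilateral"
    if a == b or b == c or a == c:
        return "isosceles"
    return "scalene"

class ClassifyTriangleTest(unittest.TestCase):
    """One test per path through the component's control structure."""

    def test_equilateral_path(self):
        self.assertEqual(classify_triangle(3, 3, 3), "equilateral")

    def test_isosceles_path(self):
        self.assertEqual(classify_triangle(3, 3, 5), "isosceles")

    def test_scalene_path(self):
        self.assertEqual(classify_triangle(3, 4, 5), "scalene")

    def test_degenerate_path(self):
        self.assertEqual(classify_triangle(1, 2, 3), "not a triangle")

    def test_invalid_input_path(self):
        # The error-handling path is a test target in its own right.
        with self.assertRaises(ValueError):
            classify_triangle(0, 1, 1)
```

Run with `python -m unittest`; note that the tests can be written from the design description alone, before the component is coded, and that several components can be tested this way in parallel.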

The best strategy will fail if a series of overriding issues are not addressed. Tom Gilb [Gil95] argues that a software testing strategy will succeed only when software testers: (1) specify product requirements in a quantifiable manner long before testing commences, (2) state testing objectives explicitly, (3) understand the users of the software and develop a profile for each user category, (4) develop a testing plan that emphasizes “rapid cycle testing,”2 (5) build “robust” software that is designed to test

2 Gilb [Gil95] recommends that a software team “learn to test in rapid cycles (2 percent of project effort) of customer-useful, at least field ‘trialable,’ increments of functionality and/ or quality improvement.” The feedback generated from these rapid cycle tests can be used to control quality levels and the corresponding test strategies.


itself (the concept of antibugging is discussed briefly in Section 19.3), (6) use effective technical reviews as a filter prior to testing, (7) conduct technical reviews to assess the test strategy and test cases themselves, and (8) develop a continuous improvement approach (Chapter 28) for the testing process.

These principles are reflected in agile software testing as well. In agile development, the test plan needs to be established before the first sprint meeting and reviewed by stakeholders. This plan merely lays out the rough time line, standards, and tools to be used. The test cases and directions for their use are developed and reviewed by the stakeholders as the code needed to implement each user story is created. Testing results are shared with all team members as soon as practical to allow changes in both existing and future code development. For this reason, many teams choose to keep their test recordkeeping in online documents.

Test recordkeeping does not need to be burdensome. The test cases can be recorded in a Google Docs spreadsheet that briefly describes the test case, contains a pointer to the requirement being tested, contains expected output from the test case data or the criteria for success, allows testers to indicate whether the test was passed or failed and the dates the test case was run, and should have room for comments about why a test may have failed to aid in debugging. This type of online form can be viewed as needed for analysis, and it is easy to summarize at team meetings. Test-case design issues are discussed in Section 19.3.
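A minimal sketch of such a record, here in Python rather than a spreadsheet, might look like the following. The field names, requirement IDs, and sample entries are all invented for illustration:

```python
import csv, io

# Hypothetical column layout mirroring the fields described in the text.
FIELDS = ["test_id", "description", "requirement", "expected",
          "status", "date_run", "comments"]

records = [
    {"test_id": "TC-01", "description": "valid login accepted",
     "requirement": "REQ-07", "expected": "session created",
     "status": "pass", "date_run": "2025-03-01", "comments": ""},
    {"test_id": "TC-02", "description": "wrong password rejected",
     "requirement": "REQ-07", "expected": "error message shown",
     "status": "fail", "date_run": "2025-03-01",
     "comments": "error text missing; see hypothetical defect D-12"},
]

def summarize(rows):
    """Pass/fail tally, easy to read out at a team meeting."""
    passed = sum(1 for r in rows if r["status"] == "pass")
    return {"pass": passed, "fail": len(rows) - passed}

# Export in a spreadsheet-friendly form (CSV) for the shared online document.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=FIELDS)
writer.writeheader()
writer.writerows(records)
```

The point is not the tooling but the fields: each record ties a test to a requirement, an expected result, an outcome, a date, and room for debugging notes.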

19.2.1 Role of Scaffolding

Component testing is normally considered as an adjunct to the coding step. The design of unit tests can occur before coding begins or after source code has been generated. A review of design information provides guidance for establishing test cases that are likely to uncover errors. Each test case should be coupled with a set of expected results.

Because a component is not a stand-alone program, some type of scaffolding is required to create a testing framework. As part of this framework, driver and/or stub software must often be developed for each unit test. The unit-test environment is illustrated in Figure 19.3. In most applications a driver is nothing more than a “main program” that accepts test-case data, passes such data to the component (to be tested), and prints relevant results. Stubs serve to replace modules that are subordinate to (invoked by) the component to be tested. A stub or “dummy subprogram” uses the subordinate module’s interface, may do minimal data manipulation, prints verification of entry, and returns control to the module undergoing testing.

Drivers and stubs represent testing “overhead.” That is, both are software that must be coded (formal design is not commonly applied) but that is not delivered with the final software product. If drivers and stubs are kept simple, actual overhead is relatively low. Unfortunately, many components cannot be adequately unit-tested with “simple” scaffolding software. In such cases, complete testing can be postponed until the integration test step (where drivers or stubs are also used).
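The following Python sketch shows the shape of this scaffolding. The component, the subordinate module's interface, and the test data are all hypothetical:

```python
class AlertStub:
    """Stub: presents the same interface as the real subordinate alert
    module, does minimal data manipulation (records entry), and returns
    control to the module under test."""

    def __init__(self):
        self.messages = []

    def send(self, message):
        self.messages.append(message)  # verification that we were entered

def check_sensor(reading, alert):
    """Component under test: flags out-of-range readings by invoking the
    subordinate alert module."""
    if reading < 0 or reading > 100:
        alert.send(f"reading out of range: {reading}")
        return "alerted"
    return "ok"

def driver():
    """Driver: a 'main program' that feeds test-case data to the component
    and collects relevant results."""
    stub = AlertStub()
    results = [(r, check_sensor(r, stub)) for r in (50, 150, -1)]
    return results, stub.messages

if __name__ == "__main__":
    print(driver())
```

Neither `driver` nor `AlertStub` ships with the product; they exist only so the component can be executed and observed in isolation.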

19.2.2 Cost-Effective Testing

Exhaustive testing requires every possible combination of input values and test-case orderings be processed by the component being tested (e.g., consider the move generator in a computer chess game). In some cases, this would require the creation of


a near-infinite number of data sets. The return on exhaustive testing is often not worth the effort, since testing alone cannot be used to prove a component is correctly implemented. There are some situations in which you will not have the resources to do comprehensive unit testing. In these cases, testers should select modules crucial to the success of the project, along with those suspected to be error-prone because of high complexity metrics, as the focus for unit testing. Some techniques for minimizing the number of test cases required to do a good job testing are discussed in Sections 19.4 through 19.6.

Figure 19.3 Unit-test environment (a driver invokes the module under test, which calls stubs; the tests target the interface, local data structures, boundary conditions, independent paths, and error-handling paths, and results are collected)

Info: Exhaustive Testing

Consider a 100-line program in the language C. After some basic data declaration, the program contains two nested loops that execute from 1 to 20 times each, depending on conditions specified at input. Inside the interior loop, four if-then-else constructs are required. There are approximately 10^14 possible paths that may be executed in this program!

To put this number in perspective, we assume that a magic test processor (“magic” because no such processor exists) has been developed for exhaustive testing. The processor can develop a test case, execute it, and evaluate the results in one millisecond. Working 24 hours a day, 365 days a year, the processor would work for 3170 years to test the program. This would, undeniably, cause havoc in most development schedules. Therefore, it is reasonable to assert that exhaustive testing is impossible for large software systems.
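The arithmetic behind the box's estimate is easy to verify:

```python
paths = 10 ** 14                        # approximate number of execution paths
seconds = paths * 0.001                 # the "magic" processor: one test per millisecond
years = seconds / (60 * 60 * 24 * 365)  # nonstop, 24 hours a day, 365 days a year
print(f"{years:.0f}")                   # about 3170 years of continuous testing
```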


19.3 Test-Case Design

It is a good idea to design unit test cases before you develop code for a component. This helps ensure that the code you write will pass the tests, or at least the tests you have already thought of.

Unit tests are illustrated schematically in Figure 19.4. The module interface is tested to ensure that information properly flows into and out of the program unit under test (Section 19.5.1). Local data structures are examined to ensure that data stored temporarily maintains its integrity during all steps in an algorithm’s execution. All independent paths through the control structure are exercised to ensure that all statements in a module have been executed at least once (Section 19.4.2). Boundary conditions are tested to ensure that the module operates properly at boundaries established to limit or restrict processing (Section 19.5.3). And finally, all error-handling paths are tested.

Data flow across a component interface is tested before any other testing is initiated. If data do not enter and exit properly, all other tests are moot. In addition, local data structures should be exercised and the local impact on global data should be ascertained (if possible) during unit testing.

Selective testing of execution paths is an essential task during the unit test. Test cases should be designed to uncover errors due to erroneous computations, incorrect comparisons, or improper control flow.

Boundary testing is one of the most important unit-testing tasks. Software often fails at its boundaries. That is, errors often occur when the nth element of an n-dimensional array is processed, when the ith repetition of a loop with i passes is invoked, or when the maximum or minimum allowable value is encountered. Test cases that exercise

Figure 19.4 Unit test (targets: interface, local data structures, boundary conditions, independent paths, error-handling paths)


data structure, control flow, and data values just below, at, and just above maxima and minima are very likely to uncover errors.
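A boundary value analysis sketch in Python, using a hypothetical component that accepts order quantities from 1 to 100, probes just below, at, and just above each boundary:

```python
def accept_quantity(qty):
    """Hypothetical component: valid order quantities run from 1 to 100 inclusive."""
    return 1 <= qty <= 100

# Boundary value cases: just below, at, and just above each boundary,
# where off-by-one and comparison errors most often hide.
boundary_cases = {
    0: False, 1: True, 2: True,        # around the minimum
    99: True, 100: True, 101: False,   # around the maximum
}

for qty, expected in boundary_cases.items():
    assert accept_quantity(qty) == expected, f"boundary failure at qty={qty}"
print("all boundary cases passed")
```

A mistakenly strict comparison (`<` instead of `<=`) would fail exactly at the `100: True` case, which is the kind of error mid-range values never expose.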

A good design anticipates error conditions and establishes error-handling paths to reroute or cleanly terminate processing when an error does occur. Yourdon [You75] calls this approach antibugging. Unfortunately, there is a tendency to incorporate error handling into software and then never test the error handling. Be sure that you design tests to execute every error-handling path. If you don’t, the path may fail when it is invoked, exacerbating an already dicey situation.

Among the potential errors that should be tested when error handling is evaluated are: (1) error description is unintelligible, (2) error noted does not correspond to error encountered, (3) error condition causes system intervention prior to error handling, (4) exception-condition processing is incorrect, or (5) error description does not provide enough information to assist in the location of the cause of the error.
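Tests aimed squarely at error-handling paths can check not only that the path is taken, but that the resulting message is intelligible and corresponds to the error encountered. A Python sketch, with a hypothetical `parse_age` component:

```python
def parse_age(text):
    """Hypothetical component with explicit error-handling paths."""
    try:
        age = int(text)
    except ValueError:
        # Message names the bad value and suggests the fix.
        raise ValueError(f"'{text}' is not a number; enter a whole number such as 42")
    if age < 0:
        raise ValueError(f"age cannot be negative (got {age})")
    return age

def test_error_paths():
    """Exercise each error path and check the message quality, not just the raise."""
    for bad_input, expected_fragment in [("abc", "not a number"), ("-5", "negative")]:
        try:
            parse_age(bad_input)
        except ValueError as err:
            # The description must correspond to the error encountered and carry
            # enough information to locate its cause.
            assert expected_fragment in str(err), f"unhelpful message: {err}"
        else:
            raise AssertionError(f"error path not taken for {bad_input!r}")
    return "error paths verified"
```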

SafeHome: Designing Unique Tests

The scene: Vinod’s cubicle.

The players: Vinod and Ed, members of the SafeHome software engineering team.

The conversation:

Vinod: So these are the test cases you intend to run for the passwordValidation operation.

Ed: Yeah, they should cover pretty much all possibilities for the kinds of passwords a user might enter.

Vinod: So let’s see . . . you note that the correct password will be 8080, right?

Ed: Uh-huh.

Vinod: And you specify passwords 1234 and 6789 to test for error in recognizing invalid passwords?

Ed: Right, and I also test passwords that are close to the correct password, see . . . 8081 and 8180.

Vinod: Those are okay, but I don’t see much point in running both the 1234 and 6789 inputs. They’re redundant . . . test the same thing, don’t they?

Ed: Well, they’re different values.

Vinod: That’s true, but if 1234 doesn’t uncover an error . . . in other words . . . the passwordValidation operation notes that it’s an invalid password, it’s not likely that 6789 will show us anything new.

Ed: I see what you mean.

Vinod: I’m not trying to be picky here . . . it’s just that we have limited time to do testing, so it’s a good idea to run tests that have a high likelihood of finding new errors.

Ed: Not a problem . . . I’ll give this a bit more thought.

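Vinod's point, that 1234 and 6789 exercise the same behavior, is the idea behind equivalence partitioning: run one representative value per class of inputs. A Python sketch using the passwordValidation scenario from the dialogue (the operation itself is a hypothetical stand-in):

```python
CORRECT_PASSWORD = "8080"  # the value from the dialogue; illustrative only

def password_valid(entered):
    """Hypothetical stand-in for the passwordValidation operation."""
    return entered == CORRECT_PASSWORD

# One representative value per equivalence class of inputs:
cases = {
    "8080": True,    # the valid password
    "1234": False,   # arbitrary invalid value (6789 would exercise the same class)
    "8081": False,   # near miss: last digit differs
    "8180": False,   # near miss: transposed digits
    "808": False,    # too short
    "80800": False,  # too long
}

for entered, expected in cases.items():
    assert password_valid(entered) == expected, f"unexpected result for {entered!r}"
```

Six cases cover six distinct classes of behavior; adding 6789 would spend limited testing time without raising the likelihood of finding a new error.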

19.3.1 Requirements and Use Cases

In requirements engineering (Chapter 7) we suggested starting the requirements gathering process by working with the customers to generate user stories that developers can refine into formal use cases and analysis models. These use cases and models can be used to guide the systematic creation of test cases that do a good job of testing the functional requirements of each software component and provide good test coverage overall [Gut15].


The analysis artifacts do not provide much insight into the creation of test cases for many nonfunctional requirements (e.g., usability or reliability). This is where the customer’s acceptance statements included in the user stories can form the basis for writing test cases for the nonfunctional requirements associated with components. Test-case developers make use of additional information, based on their professional experience, to quantify acceptance criteria and make them testable. Testing nonfunctional requirements may require the use of integration testing methods (Chapter 20) or other specialized testing techniques (Chapter 21).

The primary purpose of testing is to help developers discover defects that were previously unknown. Executing test cases that demonstrate the component is running correctly is often not good enough. As we mentioned earlier (Section 19.3), it is important to write test cases that exercise the error-handling capabilities of a component. But if we are to uncover new defects, it is also important to write test cases to test that a component does not do things it is not supposed to do (e.g., accessing privileged data sources without proper permissions). These may be stated formally as anti-requirements3 and may require specialized security testing techniques (Section 21.7) [Ale17]. These so-called negative test cases should be included to make sure the component behaves according to the customer’s expectations.
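A negative test case of this kind can be sketched as follows; the component, its roles, and its privileged resource are hypothetical:

```python
class DataStore:
    """Hypothetical component guarding a privileged resource."""

    def __init__(self):
        self._records = "privileged patient records"

    def read(self, role):
        if role != "admin":
            raise PermissionError("access denied: role lacks permission")
        return self._records

def test_unprivileged_access_rejected():
    """Negative test: verify the component does NOT serve data it must not.
    A passing result is the refusal, not the read."""
    store = DataStore()
    try:
        store.read("guest")
    except PermissionError:
        return "rejected as expected"
    raise AssertionError("anti-requirement violated: guest read privileged data")
```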

19.3.2 Traceability

To ensure that the testing process is auditable, each test case needs to be traceable back to specific functional or nonfunctional requirements or anti-requirements. Often nonfunctional requirements need to be traceable to specific business or architectural requirements. Many agile developers resist the concept of traceability as an unnecessary burden on developers. But many test process failures can be traced to missing traceability paths, inconsistent test data, or incomplete test coverage [Rem14]. Regression testing (discussed in Section 20.3) requires retesting selected components that may be affected by changes made to other components with which they collaborate. Although this is more often considered an issue in integration testing (Chapter 20), making sure that test cases are traceable to requirements is an important first step and needs to be done during component testing.

19.4 White-Box Testing

White-box testing, sometimes called glass-box testing or structural testing, is a test-case design philosophy that uses the control structure described as part of component-level design to derive test cases. Using white-box testing methods, you can derive test cases that (1) guarantee that all independent paths within a module have been exercised at least once, (2) exercise all logical decisions on their true and false sides, (3) execute all loops at their boundaries and within their operational bounds, and (4) exercise internal data structures to ensure their validity.

3 Anti-requirements are sometimes described during the creation of abuse cases that describe a user story from the perspective of a malicious user and are part of threat analysis (discussed in Chapter 18).

384 PART THREE QUALITY AND SECURITY

19.4.1 Basis Path Testing

Basis path testing is a white-box testing technique first proposed by Tom McCabe [McC76]. The basis path method enables the test-case designer to derive a logical complexity measure of a procedural design and use this measure as a guide for defining a basis set of execution paths. Test cases derived to exercise the basis set are guaranteed to execute every statement in the program at least one time during testing.

Before the basis path method can be introduced, a simple notation for the representation of control flow, called a flow graph (or program graph), must be introduced.4

A flow graph should be drawn only when the logical structure of a component is complex. The flow graph allows you to trace program paths more readily.

To illustrate the use of a flow graph, consider the procedural design representation in Figure 19.5a. Here, a flowchart is used to depict program control structure. Figure 19.5b maps the flowchart into a corresponding flow graph (assuming that no compound conditions are contained in the decision diamonds of the flowchart). Referring to Figure 19.5b, each circle, called a flow graph node, represents one or more procedural statements. A sequence of process boxes and a decision diamond can map into a single node. The arrows on the flow graph, called edges or links, represent flow of control and are analogous to flowchart arrows. An edge must terminate at a node, even if the node does not represent any procedural statements (e.g., see the flow graph symbol for the if-then-else construct). Areas bounded by edges and nodes are called regions. When counting regions, we include the area outside the graph as a region.

4 In actuality, the basis path method can be conducted without the use of flow graphs. However, they serve as a useful notation for understanding control flow and illustrating the approach.

Figure 19.5 (a) Flowchart and (b) flow graph. [Figure: a flowchart (a) with nodes numbered 1 through 11 and the corresponding flow graph (b), whose nodes, edges, and regions R1 through R4 are referenced in the text.]

An independent path is any path through the program that introduces at least one new set of processing statements or a new condition. When stated in terms of a flow graph, an independent path must move along at least one edge that has not been traversed before the path is defined. For example, a set of independent paths for the flow graph illustrated in Figure 19.5b is

Path 1: 1-11
Path 2: 1-2-3-4-5-10-1-11
Path 3: 1-2-3-6-8-9-10-1-11
Path 4: 1-2-3-6-7-9-10-1-11

Note that each new path introduces a new edge. The path

1-2-3-4-5-10-1-2-3-6-8-9-10-1-11

is not considered to be an independent path because it is simply a combination of already specified paths and does not traverse any new edges.

Paths 1 through 4 constitute a basis set for the flow graph in Figure 19.5b. That is, if you can design tests to force execution of these paths (a basis set), every statement in the program will have been guaranteed to be executed at least one time and every condition will have been executed on its true and false sides. It should be noted that the basis set is not unique. In fact, a number of different basis sets can be derived for a given procedural design.

How do you know how many paths to look for? The computation of cyclomatic complexity provides the answer. Cyclomatic complexity is a software metric that provides a quantitative measure of the logical complexity of a program. When used in the context of the basis path testing method, the value computed for cyclomatic complexity defines the number of independent paths in the basis set of a program and provides you with an upper bound for the number of tests that must be conducted to ensure that all statements have been executed at least once.

Cyclomatic complexity has a foundation in graph theory and provides you with an extremely useful software metric. Complexity is computed in one of three ways:

1. The number of regions of the flow graph corresponds to the cyclomatic complexity.

2. Cyclomatic complexity V(G) for a flow graph G is defined as

V(G) = E − N + 2

where E is the number of flow graph edges and N is the number of flow graph nodes.

3. Cyclomatic complexity V(G) for a flow graph G is also defined as

V(G) = P + 1

where P is the number of predicate nodes contained in the flow graph G.

Referring once more to the flow graph in Figure 19.5b, the cyclomatic complexity can be computed using each of the algorithms just noted:

1. The flow graph has four regions.

2. V(G) = 11 edges − 9 nodes + 2 = 4.

3. V(G) = 3 predicate nodes + 1 = 4.


Therefore, the cyclomatic complexity of the flow graph in Figure 19.5b is 4. More important, the value for V(G) provides you with an upper bound for the number of independent paths that form the basis set and, by implication, an upper bound on the number of tests that must be designed and executed to guarantee coverage of all program statements. So in this case we would need to define at most four test cases to exercise each independent logic path.
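The agreement between the formulas can be checked mechanically. The sketch below encodes the flow graph of Figure 19.5b (the merged nodes "2,3" and "4,5" are written as single identifiers; the edge list and predicate-node set are transcribed as an illustration and assume the figure's structure) and computes V(G) two ways:

```python
# Flow graph of Figure 19.5b as an edge list of (from, to) pairs.
EDGES = [
    (1, 11), (1, "2,3"), ("2,3", "4,5"), ("2,3", 6),
    ("4,5", 10), (6, 7), (6, 8), (7, 9), (8, 9), (9, 10), (10, 1),
]

# Predicate nodes: nodes with more than one outgoing edge.
PREDICATE_NODES = {1, "2,3", 6}

def cyclomatic_complexity(edges, predicates):
    """Compute V(G) with both formulas; they must agree for a flow graph."""
    nodes = {n for edge in edges for n in edge}
    v_en = len(edges) - len(nodes) + 2   # V(G) = E - N + 2
    v_p = len(predicates) + 1            # V(G) = P + 1
    assert v_en == v_p, "formulas disagree: the graph is not well formed"
    return v_en

print(cyclomatic_complexity(EDGES, PREDICATE_NODES))  # prints 4
```

With 11 edges, 9 nodes, and 3 predicate nodes, both formulas give V(G) = 4, matching the computation above.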

SafeHome: Using Cyclomatic Complexity

The scene: Shakira’s cubicle.

The players: Vinod and Shakira—members of the SafeHome software engineering team who are working on test planning for the security function.

The conversation:

Shakira: Look . . . I know that we should unit-test all the components for the security function, but there are a lot of 'em and if you consider the number of operations that have to be exercised, I don't know . . . maybe we should forget white-box testing, integrate everything, and start running black-box tests.

Vinod: You figure we don't have enough time to do component tests, exercise the operations, and then integrate?

Shakira: The deadline for the first increment is getting closer than I’d like . . . yeah, I’m concerned.

Vinod: Why don’t you at least run white-box tests on the operations that are likely to be the most error-prone?

Shakira (exasperated): And exactly how do I know which are the most error-prone?

Vinod: V of G.

Shakira: Huh?

Vinod: Cyclomatic complexity—V of G. Just compute V(G) for each of the operations within each of the components and see which have the highest values for V(G). They’re the ones that are most likely to be error-prone.

Shakira: And how do I compute V of G?

Vinod: It’s really easy. Here’s a book that describes how to do it.

Shakira (leafing through the pages): Okay, it doesn’t look hard. I’ll give it a try. The ops with the highest V(G) will be the candidates for white-box tests.

Vinod: Just remember that there are no guarantees. A component with a low V(G) can still be error-prone.

Shakira: Alright. But at least this’ll help me to narrow down the number of components that have to undergo white-box testing.


19.4.2 Control Structure Testing

The basis path testing technique described in Section 19.4.1 is one of a number of techniques for control structure testing. Although basis path testing is simple and highly effective, it is not sufficient in itself. In this section, other variations on control structure testing are discussed. These broaden testing coverage and improve the quality of white-box testing.

Condition testing [Tai89] is a test-case design method that exercises the logical conditions contained in a program module. Data flow testing [Fra93] selects test paths of a program according to the locations of definitions and uses of variables in the program.


Loop testing is a white-box testing technique that focuses exclusively on the validity of loop constructs. Two different classes of loops [Bei90] can be defined: simple loops and nested loops (Figure 19.6).

Simple Loops. The following set of tests can be applied to simple loops, where n is the maximum number of allowable passes through the loop.

1. Skip the loop entirely.

2. Only one pass through the loop.

3. Two passes through the loop.

4. m passes through the loop where m < n.

5. n − 1, n, n + 1 passes through the loop.
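These pass counts can be generated mechanically. The following sketch (the function name and the default choice of m are illustrative, not from the book) produces the seven counts for a given n:

```python
def simple_loop_pass_counts(n, m=None):
    """Pass counts for the simple-loop tests above, where n is the maximum
    number of allowable passes and m is a 'typical' count with m < n."""
    if m is None:
        m = max(3, n // 2)   # arbitrary typical value for illustration
    assert 2 < m < n - 1, "m must fall strictly between the small counts and n - 1"
    # skip, one pass, two passes, m passes, then n-1, n, n+1 passes
    return [0, 1, 2, m, n - 1, n, n + 1]

# The loop under test is then driven once per count, e.g.:
for passes in simple_loop_pass_counts(20):
    total = sum(range(passes))   # stand-in for the loop body being exercised
```

Note that the n + 1 case deliberately attempts one more pass than the loop should allow, probing the loop's exit condition.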

Nested Loops. If we were to extend the test approach for simple loops to nested loops, the number of possible tests would grow geometrically as the level of nesting increases. This would result in an impractical number of tests. Beizer [Bei90] suggests an approach that will help to reduce the number of tests:

1. Start at the innermost loop. Set all other loops to minimum values.

2. Conduct simple loop tests for the innermost loop while holding the outer loops at their minimum iteration parameter (e.g., loop counter) values. Add other tests for out-of-range or excluded values.

3. Work outward, conducting tests for the next loop, but keeping all other outer loops at minimum values and other nested loops to “typical” values.

4. Continue until all loops have been tested.

Figure 19.6 Classes of loops. [Figure: diagrams of the two loop classes, simple loops and nested loops.]


19.5 Black-Box Testing

Black-box testing, also called behavioral testing or functional testing, focuses on the functional requirements of the software. That is, black-box testing techniques enable you to derive sets of input conditions that will fully exercise all functional requirements for a program. Black-box testing is not an alternative to white-box techniques. Rather, it is a complementary approach that is likely to uncover a different class of errors than white-box methods.

Black-box testing attempts to find errors in the following categories: (1) incorrect or missing functions, (2) interface errors, (3) errors in data structures or external database access, (4) behavior or performance errors, and (5) initialization and termination errors.

Unlike white-box testing, which is performed early in the testing process, black-box testing tends to be applied during later stages of testing. Because black-box testing purposely disregards control structure, attention is focused on the information domain. Tests are designed to answer the following questions:

∙ How is functional validity tested?
∙ How are system behavior and performance tested?
∙ What classes of input will make good test cases?
∙ Is the system particularly sensitive to certain input values?
∙ How are the boundaries of a data class isolated?
∙ What data rates and data volume can the system tolerate?
∙ What effect will specific combinations of data have on system operation?

By applying black-box techniques, you derive a set of test cases that satisfy the following criteria [Mye79]: test cases that reduce, by a count that is greater than one, the number of additional test cases that must be designed to achieve reasonable testing, and test cases that tell you something about the presence or absence of classes of errors, rather than an error associated only with the specific test at hand.

19.5.1 Interface Testing

Interface testing is used to check that the program component accepts information passed to it in the proper order and data types and returns information in proper order and data format [Jan16]. Interface testing is often considered part of integration testing. Because most components are not stand-alone programs, it is important to make sure that when the component is integrated into the evolving program it will not break the build. This is where the use of stubs and drivers (Section 19.2.1) becomes important to component testers.

Stubs and drivers sometimes incorporate test cases to be passed to the component or accessed by the component. In other cases, debugging code may need to be inserted inside the component to check that data passed was received correctly (Section 19.3). In still other cases, the testing framework should contain code to check that data returned from the component is received correctly. Some agile developers prefer to do interface testing using a copy of the production version of the evolving program with some of this debugging code added.
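A minimal illustration of a driver exercising a component through a stub, assuming a hypothetical average_reading component and a SensorStub collaborator (both names and interfaces are invented for this sketch):

```python
class SensorStub:
    """Stub standing in for a not-yet-integrated collaborator; it serves
    the component canned readings through the expected read() interface."""
    def __init__(self, readings):
        self._readings = list(readings)

    def read(self):
        return self._readings.pop(0)

def average_reading(sensor, count):
    """Component under test: expects any collaborator exposing read()."""
    total = 0.0
    for _ in range(count):
        total += sensor.read()
    return total / count

# Driver: passes known data across the interface and checks both the
# type and the value of what the component returns.
stub = SensorStub([10.0, 20.0, 30.0])
result = average_reading(stub, 3)
assert isinstance(result, float) and result == 20.0
```

Because the stub records exactly what the component consumed, the driver can confirm the order and format of data crossing the interface without the real collaborator being present.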


19.5.2 Equivalence Partitioning

Equivalence partitioning is a black-box testing method that divides the input domain of a program into classes of data from which test cases can be derived. An ideal test case single-handedly uncovers a class of errors (e.g., incorrect processing of all character data) that might otherwise require many test cases to be executed before the general error is observed.

Test-case design for equivalence partitioning is based on an evaluation of equivalence classes for an input condition. Using concepts introduced in the preceding section, if a set of objects can be linked by relationships that are symmetric, transitive, and reflexive, an equivalence class is present [Bei95]. An equivalence class represents a set of valid or invalid states for input conditions. Typically, an input condition is either a specific numeric value, a range of values, a set of related values, or a Boolean condition. Equivalence classes may be defined according to the following guidelines:

1. If an input condition specifies a range, one valid and two invalid equivalence classes are defined.

2. If an input condition requires a specific value, one valid and two invalid equivalence classes are defined.

3. If an input condition specifies a member of a set, one valid and one invalid equivalence class are defined.

4. If an input condition is Boolean, one valid and one invalid class are defined.

By applying the guidelines for the derivation of equivalence classes, test cases for each input domain data item can be developed and executed. Test cases are selected so that the largest number of attributes of an equivalence class are exercised at once.
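As a small illustration, assume a hypothetical accept_deposit component whose input condition is the range 1 to 10,000 (the function and the range are invented for this sketch). Guideline 1 yields one valid and two invalid equivalence classes, each represented by a single test value:

```python
def accept_deposit(amount):
    """Component under test (hypothetical): valid deposits are 1..10000."""
    return 1 <= amount <= 10000

# A range condition yields one valid and two invalid equivalence classes
# (guideline 1); one representative value is drawn from each class.
partitions = [
    ("valid: 1 <= amount <= 10000", 500,   True),
    ("invalid: amount < 1",         -3,    False),
    ("invalid: amount > 10000",     20000, False),
]

for name, representative, expected in partitions:
    assert accept_deposit(representative) is expected, name
```

Three test cases stand in for the entire input domain: any other value in a class is assumed to be processed the same way as that class's representative.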

19.5.3 Boundary Value Analysis

A greater number of errors occurs at the boundaries of the input domain rather than in the "center." It is for this reason that boundary value analysis (BVA) has been developed as a testing technique. Boundary value analysis leads to a selection of test cases that exercise bounding values.

Boundary value analysis is a test-case design technique that complements equivalence partitioning. Rather than selecting any element of an equivalence class, BVA leads to the selection of test cases at the "edges" of the class. Rather than focusing solely on input conditions, BVA derives test cases from the output domain as well [Mye79].

Guidelines for BVA are similar in many respects to those provided for equivalence partitioning:

1. If an input condition specifies a range bounded by values a and b, test cases should be designed with values a and b and just above and just below a and b.

2. If an input condition specifies a number of values, test cases should be developed that exercise the minimum and maximum numbers. Values just above and below minimum and maximum are also tested.

3. Apply guidelines 1 and 2 to output conditions. For example, assume that a temperature versus pressure table is required as output from an engineering analysis program. Test cases should be designed to create an output report that produces the maximum (and minimum) allowable number of table entries.

4. If internal program data structures have prescribed boundaries (e.g., a table has a defined limit of 100 entries), be certain to design a test case to exercise the data structure at its boundary.

Most software engineers intuitively perform BVA to some degree. By applying these guidelines, boundary testing will be more complete, thereby having a higher likelihood for error detection.
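A sketch of guideline 1, assuming a hypothetical accept_temperature component with a valid integer range of -10 to 50 (the function and range are invented for this illustration). Six boundary values are derived from the range's edges:

```python
def boundary_values(a, b):
    """Integer test inputs at and just beyond the edges of the range [a, b],
    per BVA guideline 1: a and b themselves, plus just-below and just-above."""
    return [a - 1, a, a + 1, b - 1, b, b + 1]

def accept_temperature(t):
    """Component under test (hypothetical): valid range is -10..50."""
    return -10 <= t <= 50

# Exercise the component at each boundary value and compare against the
# expected verdict computed directly from the specification.
for t in boundary_values(-10, 50):
    expected = -10 <= t <= 50
    assert accept_temperature(t) is expected
```

The just-outside values (-11 and 51 here) are the ones most likely to expose an off-by-one error such as `<` written where `<=` was intended.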

19.6 Object-Oriented Testing

When object-oriented software is considered, the concept of the unit changes. Encapsulation drives the definition of classes and objects. This means that each class and each instance of a class packages attributes (data) and the operations that manipulate these data. An encapsulated class is usually the focus of unit testing. However, operations (methods) within the class are the smallest testable units. Because a class can contain a number of different operations, and a particular operation may exist as part of a number of different classes, the tactics applied to unit testing must change.

You can no longer test a single operation in isolation (the conventional view of unit testing) but rather as part of a class. To illustrate, consider a class hierarchy in which an operation X is defined for the superclass and is inherited by a number of subclasses. Each subclass uses operation X, but it is applied within the context of the private attributes and operations that have been defined for the subclass. Because the context in which operation X is used varies in subtle ways, it is necessary to test operation X in the context of each of the subclasses. This means that testing operation X in a stand-alone fashion (the conventional unit-testing approach) is usually ineffective in the object-oriented context.

19.6.1 Class Testing

Class testing for object-oriented (OO) software is the equivalent of unit testing for conventional software. Unlike unit testing of conventional software, which tends to focus on the algorithmic detail of a module and the data that flow across the module interface, class testing for OO software is driven by the operations encapsulated by the class and the state behavior of the class.

To provide brief illustrations of these methods, consider a banking application in which an Account class has the following operations: open(), setup(), deposit(), withdraw(), balance(), summarize(), creditLimit(), and close() [Kir94]. Each of these operations may be applied for Account, but certain constraints (e.g., the account must be opened before other operations can be applied and closed after all operations are completed) are implied by the nature of the problem. Even with these constraints, there are many permutations of the operations. The minimum behavioral life history of an instance of Account includes the following operations:

open•setup•deposit•withdraw•close


This represents the minimum test sequence for Account. However, a wide variety of other behaviors may occur within this sequence:

open•setup•deposit•[deposit|withdraw|balance|summarize|creditLimit]n•withdraw•close

A variety of different operation sequences can be generated randomly. For example:

Test case r1:

open•setup•deposit•deposit•balance•summarize•withdraw•close

Test case r2:

open•setup•deposit•withdraw•deposit•balance•creditLimit•withdraw•close

These and other random order tests are conducted to exercise different class instance life histories. Use of test equivalence partitioning (Section 19.5.2) can reduce the number of test cases required.
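One way to generate such random life histories is sketched below, assuming a minimal hypothetical implementation of the Account operations (the internal attributes are invented; withdraw is kept out of the randomized middle segment so the balance cannot go negative in this sketch):

```python
import random

class Account:
    """Minimal hypothetical implementation of the Account operations."""
    def __init__(self):
        self._is_open = False
        self._balance = 0

    def open(self):
        self._is_open = True

    def setup(self):
        assert self._is_open

    def deposit(self, amount):
        assert self._is_open
        self._balance += amount

    def withdraw(self, amount):
        assert self._is_open and amount <= self._balance
        self._balance -= amount

    def balance(self):
        assert self._is_open
        return self._balance

    def summarize(self):
        assert self._is_open

    def creditLimit(self):
        assert self._is_open

    def close(self):
        self._is_open = False

def random_life_history(seed, n=6):
    """open.setup.deposit.[middle ops]^n.withdraw.close, middle ops random."""
    rng = random.Random(seed)   # seeded, so a failing sequence is repeatable
    acct = Account()
    acct.open()
    acct.setup()
    acct.deposit(100)
    middle = [lambda: acct.deposit(10), acct.balance,
              acct.summarize, acct.creditLimit]
    for _ in range(n):
        rng.choice(middle)()
    acct.withdraw(50)
    acct.close()
    return acct
```

Seeding the random generator is the important design choice here: each generated sequence is a distinct life-history test, yet any failure can be reproduced exactly from its seed.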

SafeHome: Class Testing

The scene: Shakira’s cubicle.

The players: Jamie and Shakira, members of the SafeHome software engineering team who are working on test-case design for the security function.

The conversation:

Shakira: I've developed some tests for the Detector class [Figure 11.4]—you know, the one that allows access to all of the Sensor objects for the security function. You familiar with it?

Jamie (laughing): Sure, it’s the one that allowed you to add the “doggie angst” sensor.

Shakira: The one and only. Anyway, it has an interface with four ops: read(), enable(), disable(), and test(). Before a sensor can be read, it must be enabled. Once it's enabled, it can be read and tested. It can be disabled at any time, except if an alarm condition is being processed. So I defined a simple test sequence that will exercise its behavioral life history. (She shows Jamie the following sequence.)

#1: enable•test•read•disable

Jamie: That’ll work, but you’ve got to do more testing than that!

Shakira: I know, I know. Here are some other sequences I’ve come up with. (She shows Jamie the following sequences.)

#2: enable•test•[read]n•test•disable

#3: [read]n

#4: enable•disable•[test | read]

Jamie: So let me see if I understand the intent of these. #1 goes through a normal life history, sort of a conventional usage. #2 repeats the read operation n times, and that's a likely scenario. #3 tries to read the sensor before it's been enabled . . . that should produce an error message of some kind, right? #4 enables and disables the sensor and then tries to read it. Isn't that the same as test #2?

Shakira: Actually no. In #4, the sensor has been enabled. What #4 really tests is whether the disable op works as it should. A read() or test() after disable() should generate the error message. If it doesn’t, then we have an error in the disable op.

Jamie: Cool. Just remember that the four tests have to be applied for every sensor type since all the ops may be subtly different depending on the type of sensor.

Shakira: Not to worry. That’s the plan.



19.6.2 Behavioral Testing

The use of the state diagram as a model that represents the dynamic behavior of a class is discussed in Chapter 8. The state diagram for a class can be used to help derive a sequence of tests that will exercise the dynamic behavior of the class (and those classes that collaborate with it). Figure 19.7 [Kir94] illustrates a state diagram for the Account class discussed earlier. Referring to the figure, initial transitions move through the empty acct and setup acct states. The majority of all behavior for instances of the class occurs while in the working acct state. A final withdrawal and account closure cause the account class to make transitions to the nonworking acct and dead acct states, respectively.

The tests to be designed should achieve coverage of every state. That is, the operation sequences should cause the Account class to transition through all allowable states:

Test case s1: open•setupAccnt•deposit (initial)•withdraw (final)•close

Adding additional test sequences to the minimum sequence:

Test case s2: open•setupAccnt•deposit (initial)•deposit•balance•credit•withdraw (final)•close

Test case s3: open•setupAccnt•deposit (initial)•deposit•withdraw•accntInfo•withdraw (final)•close

Figure 19.7 State diagram for the Account class. [Figure: the states empty acct, set up acct, working acct, nonworking acct, and dead acct, linked by the transitions open, setup acct, deposit (initial), deposit, withdraw, balance, credit, accntInfo, withdrawal (final), and close.] Source: Kirani, Shekhar and Tsai, W. T., "Specification and Verification of Object-Oriented Programs," Technical Report TR 94-64, University of Minnesota, December 1994, 79.


Still more test cases could be derived to ensure that all behaviors for the class have been adequately exercised. In situations in which the class behavior results in a collaboration with one or more classes, multiple state diagrams are used to track the behavioral flow of the system.

The state model can be traversed in a “breadth-first” [McG94] manner. In this context, breadth-first implies that a test case exercises a single transition and that when a new transition is to be tested, only previously tested transitions are used.

Consider a CreditCard object that is part of the banking system. The initial state of CreditCard is undefined (i.e., no credit card number has been provided). Upon reading the credit card during a sale, the object takes on a defined state; that is, the attributes card number and expiration date, along with bank-specific identifiers, are defined. The credit card is submitted when it is sent for authorization, and it is approved when authorization is received. The transition of CreditCard from one state to another can be tested by deriving test cases that cause the transition to occur. A breadth-first approach to this type of testing would not exercise submitted before it exercised undefined and defined. If it did, it would make use of transitions that had not been previously tested and would therefore violate the breadth-first criterion.
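The breadth-first idea can be sketched as follows, assuming a minimal hypothetical CreditCard class (the operation names read_card, submit, and approve are invented for this sketch). Each test exercises exactly one untested transition and reaches it only through transitions that earlier tests already covered:

```python
class CreditCard:
    """Hypothetical CreditCard exhibiting the states described in the text."""
    def __init__(self):
        self.state = "undefined"

    def read_card(self, number, expiry):
        assert self.state == "undefined"
        self.number, self.expiry = number, expiry
        self.state = "defined"

    def submit(self):
        assert self.state == "defined"
        self.state = "submitted"

    def approve(self):
        assert self.state == "submitted"
        self.state = "approved"

def test_undefined_to_defined():
    c = CreditCard()
    c.read_card("0000", "12/29")   # the one new transition
    assert c.state == "defined"

def test_defined_to_submitted():
    c = CreditCard()
    c.read_card("0000", "12/29")   # previously tested transition
    c.submit()                     # the one new transition
    assert c.state == "submitted"

def test_submitted_to_approved():
    c = CreditCard()
    c.read_card("0000", "12/29")   # previously tested
    c.submit()                     # previously tested
    c.approve()                    # the one new transition
    assert c.state == "approved"

test_undefined_to_defined()
test_defined_to_submitted()
test_submitted_to_approved()
```

Running the tests in this order ensures that a failure in test_submitted_to_approved can only implicate the approve transition, since every earlier transition in its sequence has already passed its own test.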

19.7 Summary

Software testing accounts for the largest percentage of technical effort in the software process. Regardless of the type of software you build, a strategy for systematic test planning, execution, and control begins by considering small elements of the software and moves outward toward the program as a whole.

The objective of software testing is to uncover errors. For conventional software, this objective is achieved through a series of test steps. Unit and integration tests (discussed in Chapter 20) concentrate on functional verification of a component and incorporation of components into the software architecture. The strategy for testing object-oriented software begins with tests that exercise the operations within a class and then moves to thread-based testing for integration (discussed in Section 20.4.1). Threads are sets of classes that respond to an input or event.

Test cases should be traceable to software requirements. Each test step is accomplished through a series of systematic test techniques that assist in the design of test cases. With each testing step, the level of abstraction with which software is considered is broadened. The primary objective for test-case design is to derive a set of tests that have the highest likelihood for uncovering errors in software. To accomplish this objective, two different categories of test-case design techniques are used: white-box testing and black-box testing.

White-box tests focus on the program control structure. Test cases are derived to ensure that all statements in the program have been executed at least once during testing and that all logical conditions have been exercised. Basis path testing, a white-box technique, makes use of program graphs (or graph matrices) to derive the set of linearly independent tests that will ensure coverage. Condition and data flow testing further exercise program logic, and loop testing complements other white-box techniques by providing a procedure for exercising loops of varying degrees of complexity.


Black-box tests are designed to validate functional requirements without regard to the internal workings of a program. Black-box testing techniques focus on the information domain of the software, deriving test cases by partitioning the input and output domain of a program in a manner that provides thorough test coverage. Equivalence partitioning divides the input domain into classes of data that are likely to exercise a specific software function. Boundary value analysis probes the program's ability to handle data at the limits of acceptability.

Unlike testing (a systematic, planned activity), debugging can be viewed as an art. Beginning with a symptomatic indication of a problem, the debugging activity must track down the cause of an error. Testing can sometimes help find the root cause of the error. But often, the most valuable resource is the counsel of other members of the software engineering staff.

Problems and Points to Ponder

19.1. Using your own words, describe the difference between verification and validation. Do both make use of test-case design methods and testing strategies?

19.2. List some problems that might be associated with the creation of an independent test group. Are an ITG and an SQA group made up of the same people?

19.3. Why is a highly coupled module difficult to unit test?

19.4. Is unit testing possible or even desirable in all circumstances? Provide examples to justify your answer.

19.5. Can you think of any additional testing objectives that are not discussed in Section 19.1.1?

19.6. Select a software component that you have designed and implemented recently. Design a set of test cases that will ensure that all statements have been executed using basis path testing.

19.7. Myers [Mye79] uses the following program as a self-assessment for your ability to specify adequate testing: A program reads three integer values. The three values are interpreted as representing the lengths of the sides of a triangle. The program prints a message that states whether the triangle is scalene, isosceles, or equilateral. Develop a set of test cases that you feel will adequately test this program.

19.8. Design and implement the program (with error handling where appropriate) specified in Problem 19.7. Derive a flow graph for the program and apply basis path testing to develop test cases that will guarantee that all statements in the program have been tested. Execute the cases and show your results.

19.9. Give at least three examples in which black-box testing might give the impression that “everything’s OK,” while white-box tests might uncover an error. Give at least three examples in which white-box testing might give the impression that “everything’s OK,” while black-box tests might uncover an error.

19.10. In your own words, describe why the class is the smallest reasonable unit for testing within an OO system.



CHAPTER 20

Software Testing—Integration Level

Key Concepts
artificial intelligence 403
black-box testing 397
bottom-up integration 399
continuous integration 400
cluster testing 404
fault-based testing 405
integration testing 398
multiple-class partition testing 405
patterns 409
regression testing 402
scenario-based testing 407
smoke testing 400
thread-based testing 404
top-down integration 398
validation testing 407
white-box testing 397

Quick Look

What is it? Integration testing assembles components in a manner that allows the testing of increasingly larger software functions with the intent of finding errors as the software is assembled.

Who does it? During early stages of testing, a software engineer performs all tests. However, as the testing process progresses, testing specialists may become involved in addition to other stakeholders.

Why is it important? Test cases must be designed using disciplined techniques to ensure that the components have been integrated properly into the complete software product.

What are the steps? Internal program logic is exercised using "white-box" test-case design techniques and software requirements are exercised using "black-box" test-case design techniques.

What is the work product? A set of test cases designed to exercise internal logic, interfaces, component collaborations, and external requirements is designed and documented, expected results are defined, and actual results are recorded.

How do I ensure that I've done it right? When you begin testing, change your point of view. Try hard to "break" the software! Design test cases in a disciplined fashion, and review the test cases you do create for thoroughness.

A single developer may be able to test software components without involving other team members. This is not true for integration testing, where a component must interact properly with components developed by other team members. Integration testing exposes many weaknesses of software development groups who have not gelled as a team. Integration testing presents an interesting dilemma for software engineers, who by their nature are constructive people. In fact, all testing requires that developers discard preconceived notions of the "correctness" of software just developed and instead work hard to design test cases to "break" the software. This means that team members need to be able to accept suggestions from other team members that their code is not behaving properly when it is tested as part of the latest software increment.

396 PART THREE QUALITY AND SECURITY

Beizer [Bei90] describes a "software myth" that all testers face. He writes: "There's a myth that if we were really good at programming, there would be no bugs to catch. . . . There are bugs, the myth says, because we are bad at what we do; and if we are bad at it, we should feel guilty about it."

Should testing instill guilt? Is testing really destructive? The answer to these questions is, No!

At the beginning of this book, we stressed the fact that software is only one element of a larger computer-based system. Ultimately, software is incorporated with other system elements (e.g., hardware, people, information), and systems testing (a series of system integration and validation tests) is conducted. These tests fall outside the scope of the software process and are not conducted solely by software engineers. However, steps taken during software design and testing can greatly improve the probability of successful software integration in the larger system.

In this chapter, we discuss software integration testing strategies applicable to most software applications. Specialized software testing strategies are discussed in Chapter 21.

20.1 Software Testing Fundamentals

The goal of testing is to find errors, and a good test is one that has a high probability of finding an error. Kaner, Falk, and Nguyen [Kan93] suggest the following attributes of a "good" test:

A good test has a high probability of finding an error. To achieve this goal, the tester must understand the software and attempt to develop a mental picture of how the software might fail.

A good test is not redundant. Testing time and resources are limited. There is no point in conducting a test that has the same purpose as another test. Every test should have a different purpose (even if it is subtly different).

A good test should be “best of breed” [Kan93]. In a group of tests that have a similar intent, time and resource limitations may dictate the execution of only those tests that have the highest likelihood of uncovering a whole class of errors.

A good test should be neither too simple nor too complex. Although it is sometimes possible to combine a series of tests into one test case, the possible side effects associated with this approach may mask errors. In general, each test should be executed separately.

Any engineered product (and most other things) can be tested in one of two ways: (1) Knowing the specified function that a product has been designed to perform, tests can be conducted that demonstrate each function is fully operational while at the same time searching for errors in each function. (2) Knowing the internal workings of a product, tests can be conducted to ensure that "all gears mesh," that is, internal operations are performed according to specifications and all internal components have been adequately exercised. The first test approach takes an external view of testing and is called black-box testing. The second requires an internal view of testing and is termed white-box testing.1 Both are useful in integration testing [Jan16].

20.1.1 Black-Box Testing

Black-box testing alludes to integration testing that is conducted by exercising the component interfaces with other components and with other systems. It examines some fundamental aspect of a system with little regard for the internal logical structure of the software. Instead, the focus is on ensuring that the component executes correctly in the larger software build when the input data and software context specified by its preconditions are correct, and that it behaves in the ways specified by its postconditions. It is, of course, important to make sure that the component behaves correctly when its preconditions are not satisfied (e.g., it can handle bad inputs without crashing).

Black-box testing is based on the requirements specified in user stories (Chapter 7). Test-case authors do not need to wait for the component implementation code to be written once the component interface is defined. Several cooperating components may need to be written to implement the functionality defined by a single user story. Validation testing (Section 20.5) often defines black-box test cases in terms of the end-user visible input actions and observable output behaviors, without any knowledge of how the components themselves were implemented.
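As a small sketch of the idea, a black-box integration test exercises only the component's interface contract (its preconditions and postconditions), never its internals. The component below is invented for illustration; it is not taken from the text.

```python
# Hypothetical component under test: a sensor-event dispatcher whose
# contract is all the test relies on (precondition: a well-formed
# event dict; postcondition: a routed disposition string).

def dispatch_sensor_event(event):
    """Illustrative stand-in for a real integrated component."""
    if not isinstance(event, dict) or "type" not in event:
        return "rejected"          # graceful handling of bad input
    if event["type"] == "smoke":
        return "alarm"
    return "logged"

# Black-box test cases: observable inputs and outputs only,
# with no knowledge of the internal branching.
assert dispatch_sensor_event({"type": "smoke"}) == "alarm"
assert dispatch_sensor_event({"type": "motion"}) == "logged"

# Precondition violated: the component must not crash.
assert dispatch_sensor_event(None) == "rejected"
print("black-box cases passed")
```

Note that the last case deliberately violates the precondition, reflecting the point above that a component should behave sensibly even when its preconditions are not satisfied.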

20.1.2 White-Box Testing

White-box testing, sometimes called glass-box testing or structural testing, is an integration testing philosophy that uses implementation knowledge of the control structures described as part of component-level design to derive test cases. White-box testing of software is predicated on close examination of procedural implementation details and data structure implementation details. White-box tests can be designed only after component-level design (or source code) exists. The logical details of the program must be available. Logical paths through the software and collaborations between components are the focus of white-box integration testing.

At first glance it would seem that very thorough white-box testing would lead to "100 percent correct programs." All we need do is define all logical paths, develop test cases to exercise them, and evaluate results, that is, generate test cases to exercise program logic exhaustively. Unfortunately, exhaustive testing presents certain logistical problems. For even small programs, the number of possible logical paths can be very large. White-box testing should not, however, be dismissed as impractical. Testers should select a reasonable number of important logical paths to exercise once component integration occurs. Important data structures should also be tested for validity after component integration.
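The "reasonable number of important paths" idea can be sketched as one test case per logical path rather than an exhaustive sweep of inputs. The function and its branch structure here are invented for illustration:

```python
# Illustrative function with a small number of logical paths.
def apply_discount(total, is_member):
    if is_member and total > 100:      # path 1: discount branch
        return round(total * 0.9, 2)
    return total                       # path 2: fall-through

# White-box tests chosen from knowledge of the branch structure:
# one case per path, including the short-circuit on the condition.
assert apply_discount(200.0, True) == 180.0    # takes the discount branch
assert apply_discount(200.0, False) == 200.0   # member test fails
assert apply_discount(50.0, True) == 50.0      # total test fails
print("all logical paths exercised")
```

Three cases cover every path through this function; an exhaustive black-box sweep of all possible totals would add nothing but cost.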

1 The terms functional testing and structural testing are sometimes used in place of black-box and white-box testing, respectively.


20.2 Integration Testing

A neophyte in the software world might ask a seemingly legitimate question once all modules have been unit-tested: "If they all work individually, why do you doubt that they'll work when we put them together?" The problem, of course, is "putting them together"—interfacing. Data can be lost across an interface; one component can have an inadvertent, adverse effect on another; subfunctions, when combined, may not produce the desired major function; individually acceptable imprecision may be magnified to unacceptable levels; and global data structures can present problems. Sadly, the list goes on and on.

Integration testing is a systematic technique for constructing the software architecture while at the same time conducting tests to uncover errors associated with interfacing. The objective is to take unit-tested components and build a program structure that has been dictated by design.

There is often a tendency to attempt nonincremental integration, that is, to construct the program using a “big bang” approach. In the big bang approach, all components are combined in advance and the entire program is tested as a whole. Chaos usually results! Errors are encountered, but correction is difficult because isolation of causes is complicated by the vast expanse of the entire program. Taking the big bang approach to integration is a lazy strategy that is doomed to failure.

Incremental integration is the antithesis of the big bang approach. The program is constructed and tested in small increments, where errors are easier to isolate and correct; interfaces are more likely to be tested completely; and a systematic test approach may be applied. Integrating incrementally and testing as you go is a more cost-effective strategy. We discuss several common incremental integration testing strategies in the remainder of this chapter.

20.2.1 Top-Down Integration

Top-down integration testing is an incremental approach to construction of the software architecture. Modules (also referred to as components in this book) are integrated by moving downward through the control hierarchy, beginning with the main control module (main program). Modules subordinate (and ultimately subordinate) to the main control module are incorporated into the structure in either a depth-first or breadth-first manner.

Referring to Figure 20.1, depth-first integration integrates all components on a major control path of the program structure. Selection of a major path is somewhat arbitrary and depends on application-specific characteristics (e.g., components needed to implement one use case). For example, selecting the left-hand path, components M1, M2, M5 would be integrated first. Next, M8 or (if necessary for proper functioning of M2) M6 would be integrated. Then, the central and right-hand control paths are built. Breadth-first integration incorporates all components directly subordinate at each level, moving across the structure horizontally. From the figure, components M2, M3, and M4 would be integrated first. The next control level, M5, M6, and so on, follows. The integration process is performed in a series of five steps:

1. The main control module is used as a test driver, and stubs are substituted for all components directly subordinate to the main control module.

2. Depending on the integration approach selected (i.e., depth or breadth first), subordinate stubs are replaced one at a time with actual components.

3. Tests are conducted as each component is integrated.

4. On completion of each set of tests, another stub is replaced with the real component.

5. Regression testing (discussed later in this section) may be conducted to ensure that new errors have not been introduced.

The process continues from step 2 until the entire program structure is built.

The top-down integration strategy verifies major control or decision points early in the test process. In a "well-factored" program structure, decision making occurs at upper levels in the hierarchy and is therefore encountered first. If major control problems do exist, early recognition is essential. If depth-first integration is selected, a complete function of the software may be implemented and demonstrated. Early demonstration of functional capability is a confidence builder for all stakeholders.
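Step 1 of the process above can be sketched in code: the real top-level module is driven while its unfinished subordinates are stubs. All module names here are invented for illustration; in practice each stub is later replaced, one at a time, by the real component and the tests are rerun.

```python
# Stubs standing in for subordinate modules that are not yet integrated.
def stub_read_sensor():              # stand-in for a subordinate module
    return {"type": "smoke"}

def stub_raise_alarm(event):         # stand-in that records how it was called
    stub_raise_alarm.calls.append(event)
stub_raise_alarm.calls = []

def main_control(read_sensor, raise_alarm):
    """The real main control module under test."""
    event = read_sensor()
    if event["type"] == "smoke":
        raise_alarm(event)
        return "alarm raised"
    return "idle"

# Drive the main module against the stubs and verify its control logic.
assert main_control(stub_read_sensor, stub_raise_alarm) == "alarm raised"
assert stub_raise_alarm.calls == [{"type": "smoke"}]
```

The value of this arrangement is exactly the point made above: the top-level decision logic is verified early, before any of the lower-level components exist.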

20.2.2 Bottom-Up Integration

Bottom-up integration testing, as its name implies, begins construction and testing with atomic modules (i.e., components at the lowest levels in the program structure). Because components are integrated from the bottom up, the functionality provided by components subordinate to a given level is always available, so the need for stubs is eliminated. A bottom-up integration strategy may be implemented with the following steps:

1. Low-level components are combined into clusters (sometimes called builds) that perform a specific software subfunction.

2. A driver (a control program for testing) is written to coordinate test-case input and output.

3. The cluster is tested.

4. Drivers are removed and clusters are combined, moving upward in the program structure.

[Figure 20.1 Top-down integration: a control hierarchy with M1 at the top; M2, M3, and M4 at the second level; M5, M6, and M7 below them; and M8 at the lowest level.]


Integration follows the pattern illustrated in Figure 20.2. Components are combined to form clusters 1, 2, and 3. Each of the clusters is tested using a driver (shown as a dashed block). Components in clusters 1 and 2 are subordinate to Ma. Drivers D1 and D2 are removed, and the clusters are interfaced directly to Ma. Similarly, driver D3 for cluster 3 is removed prior to integration with module Mb. Both Ma and Mb will ultimately be integrated with component Mc, and so forth.

As integration moves upward, the need for separate test drivers lessens. In fact, if the top two levels of program structure are integrated top down, the number of drivers can be reduced substantially and integration of clusters is greatly simplified.
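Steps 1 through 3 of the bottom-up strategy can be sketched as follows: two low-level components are combined into a cluster and exercised by a throwaway driver. The component names are invented for illustration; the driver is discarded once a real parent module exists to call the cluster.

```python
# Atomic component 1: parse a raw sensor reading string.
def parse_reading(raw):
    sensor, value = raw.split(":")
    return sensor, int(value)

# Atomic component 2: evaluate a reading against a threshold.
def over_threshold(value, limit=50):
    return value > limit

def cluster_driver():
    """Driver: coordinates test-case input and output for the cluster.
    It is removed when the cluster is combined upward (step 4)."""
    sensor, value = parse_reading("temp:72")
    return sensor, over_threshold(value)

assert cluster_driver() == ("temp", True)
```

Because both components already exist and are fully functional, no stubs are needed; only the short-lived driver is scaffolding.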

20.2.3 Continuous Integration

Continuous integration is the practice of merging components into the evolving software increment once or more each day. This is a common practice for teams following agile development practices such as XP (Section 3.5.1) or DevOps (Section 3.5.2). Integration testing must take place quickly and efficiently if a team is attempting to always have a working program in place as part of continuous delivery. It is sometimes hard to maintain systems with the use of continuous integration tools [Ste18]. Maintenance and continuous integration issues are discussed in more detail in Section 22.4.

[Figure 20.2 Bottom-up integration: components are combined into clusters 1, 2, and 3, each exercised by a driver (D1, D2, D3, shown as dashed blocks); the drivers are removed, clusters 1 and 2 are interfaced to Ma and cluster 3 to Mb, and Ma and Mb are ultimately integrated with Mc.]

Smoke testing is an integration testing approach that can be used when product software is developed by an agile team using short increment build times. Smoke testing might be characterized as a rolling or continuous integration strategy. The software is rebuilt (with new components added) and smoke tested every day. It is designed as a pacing mechanism for time-critical projects, allowing the software team to assess the project on a frequent basis. In essence, the smoke-testing approach encompasses the following activities:

1. Software components that have been translated into code are integrated into a build. A build includes all data files, libraries, reusable modules, and engineered components that are required to implement one or more product functions.

2. A series of tests is designed to expose errors that will keep the build from properly performing its function. The intent should be to uncover “show-stopper” errors that have the highest likelihood of throwing the software project behind schedule.

3. The build is integrated with other builds, and the entire product (in its current form) is smoke tested daily. The integration approach may be top down or bottom up.

The daily frequency of testing gives both managers and practitioners a realistic assessment of integration testing progress. McConnell [McC96] describes the smoke test in the following manner:

The smoke test should exercise the entire system from end to end. It does not have to be exhaustive, but it should be capable of exposing major problems. The smoke test should be thorough enough that if the build passes, you can assume that it is stable enough to be tested more thoroughly.

Smoke testing provides a number of benefits when it is applied on complex, time-critical software projects:

∙ Integration risk is minimized. Because smoke tests are conducted daily, incompatibilities and other showstopper errors are uncovered early, thereby reducing the likelihood of serious schedule impact when errors are uncovered.

∙ The quality of the end product is improved. Because the approach is construction (integration) oriented, smoke testing is likely to uncover functional errors as well as architectural and component-level design errors. If these errors are corrected early, better product quality will result.

∙ Error diagnosis and correction are simplified. Like all integration testing approaches, errors uncovered during smoke testing are likely to be associated with “new software increments”—that is, the software that has just been added to the build(s) is a probable cause of a newly discovered error.

∙ Progress is easier to assess. With each passing day, more of the software has been integrated and more has been demonstrated to work. This improves team morale and gives managers a good indication that progress is being made.
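A daily smoke test can be as simple as a short script of show-stopper checks run against each build, in the spirit of McConnell's description above: end to end, not exhaustive. The "system" and its commands below are stand-ins invented for illustration.

```python
# Minimal daily smoke-test sketch: a handful of end-to-end checks
# that must pass before the build is considered stable enough for
# deeper testing.

def system_under_test(command):
    """Stand-in for the current daily build of the product."""
    routes = {"arm": "armed", "disarm": "disarmed", "status": "ok"}
    return routes.get(command, "error")

SMOKE_CASES = [                 # show-stopper checks only, not exhaustive
    ("arm", "armed"),
    ("disarm", "disarmed"),
    ("status", "ok"),
]

def run_smoke_tests():
    failures = [(cmd, expected) for cmd, expected in SMOKE_CASES
                if system_under_test(cmd) != expected]
    return "build is stable" if not failures else f"show-stoppers: {failures}"

print(run_smoke_tests())        # rerun against every daily build
```

Because the same small suite runs every day, any failure points directly at the components integrated since the last passing build, which is what makes diagnosis cheap.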

In some ways smoke testing resembles regression testing (discussed in Section 20.3), which helps to ensure that newly added components do not interfere with the behaviors of existing components that were previously tested. To do this, it is a good idea to rerun a subset of the test cases that executed with the existing software components before the new components were added. The effort required to rerun test cases is not trivial, and automated testing can be used to reduce the time and effort required to rerun them [Net18]. A complete discussion of automated testing is beyond the scope of this chapter, but links to representative tools can be found on the Web pages that supplement this book.2

20.2.4 Integration Test Work Products

An overall plan for integration of the software and a description of specific tests are documented in a test specification. This work product incorporates a test plan and a test procedure and becomes part of the software configuration. Testing is divided into phases and incremental builds that address specific functional and behavioral characteristics of the software. For example, integration testing for the SafeHome security system might be divided into the following test phases: user interaction, sensor processing, communications functions, and alarm processing.

Each integration test phase delineates a broad functional category within the software and generally can be related to a specific domain within the software architecture. Therefore, software increments are created to correspond to each phase.

A schedule for integration, the development of scaffolding software (Section 19.2.1), and related topics are also discussed as part of the test plan. Start and end dates for each phase are established, and "availability windows" for unit-tested modules are defined. When developing a project schedule, you'll have to consider the manner in which integration occurs so that components will be available when needed. A brief description of scaffolding software (stubs and drivers) concentrates on characteristics that might require special effort. Finally, the test environment and resources are described. Unusual hardware configurations, exotic simulators, and special test tools or techniques are a few of many topics that may also be discussed.

The detailed testing procedure that is required to accomplish the test plan is described next. The order of integration and corresponding tests at each integration step are described. A listing of all test cases (annotated for subsequent reference) and expected results are also included. In the agile world, this level of test-case description occurs when code to implement the user story is being developed so the code can be tested as soon as it is ready for integration.

A history of actual test results, problems, or peculiarities is recorded in a test report that can be appended to the test specification. It is often best to implement the test report as a shared Web document to allow all stakeholders access to the latest test results and the current state of the software increment. Information contained in this online document can be vital to developers during software maintenance (Section 4.9).

20.3 Artificial Intelligence and Regression Testing

2 See the SEPA 9e website.

Each time a new module is added as part of integration testing, the software changes. New data flow paths are established, new input/output (I/O) may occur, and new control logic is invoked. Side effects associated with these changes may cause problems with functions that previously worked flawlessly. In the context of an integration test strategy, regression testing is the reexecution of some subset of tests that have already been conducted to ensure that changes have not propagated unintended side effects. Regression tests should be executed every time a major change is made to the software (including the integration of new components). Regression testing helps to ensure that changes (due to testing or for other reasons) do not introduce unintended behavior or additional errors.

Regression testing may be conducted manually, by reexecuting a subset of all test cases, or by using automated capture/playback tools. Capture/playback tools enable the software engineer to capture test cases and results for subsequent playback and comparison. The regression test suite (the subset of tests to be executed) contains three different classes of test cases:

∙ A representative sample of tests that will exercise all software functions

∙ Additional tests that focus on software functions that are likely to be affected by the change

∙ Tests that focus on the software components that have been changed

As integration testing proceeds, the number of regression tests can grow quite large. Therefore, the regression test suite should be designed to include only those tests that address one or more classes of errors in each of the major program functions.
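The three classes of regression test cases above can be sketched as a simple selection rule over a tagged test catalog. The test names, tags, and catalog structure are invented for illustration; a real project would draw these from its test-management tooling.

```python
# Catalog entries: (test name, functions it exercises, part of the
# broad representative sample?). All names are hypothetical.
TEST_CATALOG = [
    ("test_login_happy_path", {"auth"},  True),
    ("test_arm_system",       {"alarm"}, True),
    ("test_pin_lockout",      {"auth"},  False),
    ("test_alarm_timeout",    {"alarm"}, False),
]

def select_regression_suite(changed_functions):
    """Pick the regression subset for a given change."""
    suite = []
    for name, funcs, representative in TEST_CATALOG:
        if representative:                      # class 1: broad sample
            suite.append(name)
        elif funcs & changed_functions:         # classes 2 and 3: affected areas
            suite.append(name)
    return suite

# A change to the auth component pulls in the broad sample plus the
# auth-focused tests, but skips the unrelated deep alarm test.
assert select_regression_suite({"auth"}) == [
    "test_login_happy_path", "test_arm_system", "test_pin_lockout"]
```

Keeping selection rule-based like this is what holds the suite to a manageable size as integration proceeds, rather than rerunning every test ever written.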

Yoo and Harman [Yoo13] write about potential uses of artificial intelligence (AI) in identifying test cases for use in regression test suites. A software tool could examine the dependencies among the components in the software increment after the new components have been added and generate test cases automatically to use for regression testing. Another possibility would be using machine learning techniques to select sets of test cases that will optimize the discovery of component collaboration errors. This work is promising, but it still requires significant human interaction to review the test cases and the recommended order for executing them.

SafeHome

Regression Testing

The scene: Doug Miller's office, as integration testing is under way.

The players: Doug Miller, software engineering manager; Vinod, Jamie, Ed, and Shakira, members of the SafeHome software engineering team.

The conversation:

Doug: It seems to me that we are not spending enough time retesting software components after new components are integrated.

Vinod: I guess that is true, but isn't it good enough that we are testing the new components' interactions with the components they are supposed to collaborate with?

Doug: Not always. Sometimes components make unintended changes to data used by other components. I know we are busy, but it is important to discover these problems early.

Shakira: We do have a test-case repository we have been drawing from. Perhaps we can randomly select several test cases to run using our automated testing framework.

Doug: That's a start. But maybe we should be more strategic in how we select our test cases.

Ed: I suppose we could use our test-case/requirement traceability table and check our CRC card model.

Vinod: I have been using continuous integration, meaning I integrate each component as soon as one of the developers passes it to me. I try to run a series of regression tests on the partially integrated program.

Jamie: I've been trying to design a set of appropriate tests for each function in the system. Maybe I should tag some of the more important ones for Vinod to use for regression testing.

Doug (to Vinod): How often will you run the regression test cases?

Vinod: Every day I integrate a new component, I will use the regression test cases . . . until we decide the software increment is done.

Doug: Let's try using Jamie's regression test cases as they are created and see how things go.

20.4 Integration Testing in the OO Context

Object-oriented software does not have an obvious hierarchical control structure, so traditional top-down and bottom-up integration strategies (Section 20.2) have little meaning. In addition, integrating operations one at a time into a class (the conventional incremental integration approach) is often impossible because of the "direct and indirect interactions of the components that make up the class" [Ber93].

There are two different strategies for integration testing of OO systems: thread-based testing and use-based testing [Bin99]. The first, thread-based testing, integrates the set of classes required to respond to one input or event for the system. Each thread is integrated and tested individually. Regression testing is applied to ensure that no side effects occur.

The second integration approach, use-based testing, begins the construction of the system by testing those classes (called independent classes) that use very few (if any) server classes. After the independent classes are tested, the next layer of classes, called dependent classes, that use the independent classes are tested. This sequence of testing layers of dependent classes continues until the entire system is constructed. Use-based tests focus on classes that do not collaborate heavily with other classes.

The use of scaffolding software also changes when integration testing of OO systems is conducted. Drivers can be used to test operations at the lowest level and for the testing of whole groups of classes. A driver can also be used to replace the user interface so that tests of system functionality can be conducted prior to implementation of the interface. Stubs can be used in situations in which collaboration between classes is required but one or more of the collaborating classes has not yet been fully implemented.

Cluster testing is one step in the integration testing of OO software. Here, a cluster of collaborating classes (determined by examining the CRC and object-relationship models) is exercised by designing test cases that attempt to uncover errors in the collaborations.

20.4.1 Fault-Based Test-Case Design³

The object of fault-based testing within an OO system is to design tests that have a high likelihood of uncovering plausible faults. Because the product or system must conform to customer requirements, the preliminary planning required to perform fault-based testing begins with the analysis model. The strategy for fault-based testing is to hypothesize a set of plausible faults and then derive tests to prove each hypothesis. The tester looks for plausible faults (i.e., aspects of the implementation of the system that may result in defects). To determine whether these faults exist, test cases are designed to exercise the design or code.

Of course, the effectiveness of these techniques depends on how testers perceive a plausible fault. If real faults in an OO system are perceived to be implausible, then this approach is really no better than any random testing technique. However, if the analysis and design models can provide insight into what is likely to go wrong, then fault-based testing can find significant numbers of errors with relatively low expenditures of effort.

Integration testing looks for plausible faults in operation calls or message connections. Three types of faults are encountered in this context: unexpected result, wrong operation/message used, and incorrect invocation. To determine plausible faults as functions (operations) are invoked, the behavior of the operation must be examined.

Integration testing applies to attributes as well as to operations. The “behaviors” of an object are defined by the values that its attributes are assigned. Testing should exercise the attributes to determine whether proper values occur for distinct types of object behavior.

It is important to note that integration testing attempts to find errors in the client object, not the server. Stated in conventional terms, the focus of integration testing is to determine whether errors exist in the calling code, not the called code. The operation call is used as a clue, a way to find test requirements that exercise the calling code.

The approach for multiple-class partition testing is similar to the approach used for partition testing of individual classes. A single class is partitioned as discussed in Section 19.6.1. However, the test sequence is expanded to include those operations that are invoked via messages to collaborating classes. An alternative approach partitions tests based on the interfaces to a particular class. Referring to Figure 20.3, the Bank class receives messages from the ATM and Cashier classes. The methods within Bank can therefore be tested by partitioning them into those that serve ATM and those that serve Cashier.

Kirani and Tsai [Kir94] suggest the following sequence of steps to generate multiple-class random test cases:

1. For each client class, use the list of class operations to generate a series of random test sequences. The operations will send messages to other server classes.

2. For each message that is generated, determine the collaborator class and the corresponding operation in the server object.

3. For each operation in the server object (that has been invoked by messages sent from the client object), determine the messages that it transmits.

4. For each of the messages, determine the next level of operations that are invoked and incorporate these into the test sequence.

3 Sections 20.4.1 and 20.4.2 have been adapted from an article by Brian Marick originally posted on the Internet newsgroup comp.testing. This adaptation is included with the permission of the author. For further information on these topics, see [Mar94]. It should be noted that the techniques discussed in Sections 20.4.1 and 20.4.2 are also applicable for conventional software.

To illustrate [Kir94], consider a sequence of operations for the Bank class relative to an ATM class (Figure 20.3):

verifyAcct • verifyPIN • [[verifyPolicy • withdrawReq] | depositReq | acctInfoREQ]ⁿ

A random test case for the Bank class might be

Test case r3 = verifyAcct•verifyPIN•depositReq

To consider the collaborators involved in this test, the messages associated with each of the operations noted in test case r3 are considered. Bank must collaborate with ValidationInfo to execute verifyAcct() and verifyPIN(). Bank must collaborate with Account to execute depositReq(). Hence, a new test case that exercises these collaborations is

Test case r4 = verifyAcct [Bank: validAcct → ValidationInfo] • verifyPIN [Bank: validPin → ValidationInfo] • depositReq [Bank: deposit → Account]
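The expansion from r3 to r4 can be sketched programmatically. The following Python sketch is illustrative only: the class and operation names follow Figure 20.3, but the message map and the sequence generator are assumptions, not part of [Kir94].

```python
import random

# Hypothetical message map for the banking example of Figure 20.3:
# each Bank operation maps to the (collaborator, operation) messages it sends.
COLLABORATIONS = {
    "verifyAcct":  [("ValidationInfo", "validAcct")],
    "verifyPIN":   [("ValidationInfo", "validPin")],
    "depositReq":  [("Account", "deposit")],
    "withdrawReq": [("Account", "withdraw")],
}

def random_sequence(rng):
    """Step 1: generate a random operation sequence for the client class."""
    tail = rng.choice([["verifyPolicy", "withdrawReq"], ["depositReq"]])
    return ["verifyAcct", "verifyPIN"] + tail

def expand(sequence):
    """Steps 2-4: annotate each operation with the collaborator messages it triggers."""
    expanded = []
    for op in sequence:
        messages = COLLABORATIONS.get(op, [])
        note = ", ".join(f"Bank: {m} -> {cls}" for cls, m in messages)
        expanded.append(f"{op} [{note}]" if note else op)
    return expanded

rng = random.Random(3)
sample = random_sequence(rng)          # a randomly chosen operation sequence
r3 = ["verifyAcct", "verifyPIN", "depositReq"]
r4 = expand(r3)                        # collaboration-annotated version of r3
print(" . ".join(r4))
```

Running `expand` on r3 yields the collaboration-annotated sequence shown above as r4; repeating the process one level deeper (the messages sent by validAcct, validPin, and deposit themselves) would complete step 4.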

20.4.2 Scenario-Based Test-Case Design

Fault-based testing misses two main types of errors: (1) incorrect specifications and (2) interactions among subsystems. When errors associated with an incorrect specification occur, the product doesn’t do what the customer wants. It might do the wrong thing or omit important functionality. But in either circumstance, quality (conformance to requirements) suffers. Errors associated with subsystem interaction occur when the behavior of one subsystem creates circumstances (e.g., events, data flow) that cause another subsystem to fail.

Figure 20.3 Class collaboration diagram for banking application. [The diagram shows the ATM User Interface, ATM, Bank, Cashier, ValidationInfo, and Account classes, with arrows labeled by the operations (e.g., verifyAcct, verifyPIN, withdrawReq, depositReq) invoked across each collaboration.]

Source: Kirani, Shekhar, and Tsai, W. T., “Specification and Verification of Object-Oriented Programs,” Technical Report TR 94-64, University of Minnesota, December 4, 1994, p. 72.

Scenario-based testing will uncover errors that occur when any actor interacts with the software. Scenario-based testing concentrates on what the user does, not what the product does. This means capturing the tasks (via use cases) that the user has to perform and then applying them and their variants as tests. This is very similar to thread testing.

Scenario testing uncovers interaction errors. But to accomplish this, test cases must be more complex and more realistic than fault-based tests. Scenario-based testing tends to exercise multiple subsystems in a single test (users do not limit themselves to the use of one subsystem at a time).
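A scenario-based test of this kind can be sketched with stubbed subsystems. The classes below are hypothetical stand-ins (they are not the book's design); the point is that one user scenario exercises the validation and account subsystems together.

```python
# Minimal stubs for an ATM withdrawal scenario: one scenario-based test
# drives PIN validation and the account subsystem in a single flow.
class ValidationInfo:
    def __init__(self, pins): self._pins = pins
    def valid_pin(self, acct, pin): return self._pins.get(acct) == pin

class Account:
    def __init__(self, balance): self.balance = balance
    def withdraw(self, amount):
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount

class Bank:
    def __init__(self, validation, accounts):
        self._validation, self._accounts = validation, accounts
    def withdraw_req(self, acct, pin, amount):
        if not self._validation.valid_pin(acct, pin):
            return "rejected: bad PIN"
        try:
            self._accounts[acct].withdraw(amount)
        except ValueError:
            return "rejected: insufficient funds"
        return "dispensed"

# Scenario: "user mistypes the PIN, retries, then attempts to overdraw"
bank = Bank(ValidationInfo({"1001": "4321"}), {"1001": Account(50)})
assert bank.withdraw_req("1001", "9999", 20) == "rejected: bad PIN"
assert bank.withdraw_req("1001", "4321", 80) == "rejected: insufficient funds"
assert bank.withdraw_req("1001", "4321", 30) == "dispensed"
```

Note that a fault-based test of Account alone would never reveal an error in how Bank sequences validation before withdrawal; the scenario does.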

Test-case design becomes more complicated as integration of the object-oriented system begins. It is at this stage that testing of collaborations between classes must begin. To illustrate “interclass test-case generation” [Kir94], we expand the banking example introduced in Section 19.6 to include the classes and collaborations noted in Figure 20.3. The direction of the arrows in the figure indicates the direction of messages, and the labeling indicates the operations that are invoked as a consequence of the collaborations implied by the messages.

Like the testing of individual classes, class collaboration testing can be accomplished by applying random and partitioning methods, as well as scenario-based testing and behavioral testing.

20.5 Validation Testing

Like all testing steps, validation tries to uncover errors, but the focus is at the requirements level—on things that will be immediately apparent to the end user. Validation testing begins at the culmination of integration testing, when individual components have been exercised, the software is completely assembled as a package, and interfacing errors have been uncovered and corrected. At the validation or system level, the distinction between different software categories disappears. Testing focuses on user-visible actions and user-recognizable output from the system.

Validation can be defined in many ways, but a simple (albeit harsh) definition is that validation succeeds when software functions in a manner that can be reasonably expected by the customer. At this point, a battle-hardened software developer might protest: “Who or what is the arbiter of reasonable expectations?” If a software requirements specification has been developed, it describes each user story, all user-visible attributes, and the customer’s acceptance criteria for each. The customer’s acceptance criteria form the basis for a validation-testing approach.

Software validation is achieved through a series of tests that demonstrate conformity with requirements. A test plan outlines the classes of tests to be conducted, and a test procedure defines specific test cases designed to ensure that all functional requirements are satisfied, all behavioral characteristics are achieved, all content is accurate and properly presented, all performance requirements are attained, documentation is correct, and usability and other requirements are met (e.g., transportability, compatibility, error recovery, maintainability). If a deviation from specification is uncovered, a deficiency list is created, and a method for resolving deficiencies (acceptable to stakeholders) must be established. Specialized testing methods for these nonfunctional requirements are discussed in Chapter 21.
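The deficiency-list mechanism can be sketched as a small harness that records each unmet acceptance criterion. The criteria, measured values, and threshold below are invented for illustration; a real validation pass would draw them from the requirements specification.

```python
# Sketch of a validation pass: each test case pairs an acceptance criterion
# with a check; failures accumulate on a deficiency list for stakeholder review.
def run_validation(test_cases):
    deficiencies = []
    for criterion, check in test_cases:
        if not check():
            deficiencies.append(criterion)
    return deficiencies

response_ms = 180                 # measured value (hypothetical)
supported_locales = {"en", "fr"}  # what the build actually ships (hypothetical)

test_cases = [
    ("Alarm screen responds within 250 ms", lambda: response_ms <= 250),
    ("UI available in en, fr, and es",
     lambda: {"en", "fr", "es"} <= supported_locales),
]
deficiencies = run_validation(test_cases)
print(deficiencies)  # only the unmet criterion appears on the deficiency list
```

Each entry on the resulting list is then negotiated with stakeholders, exactly as the text describes.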

An important element of the validation process is a configuration review. The intent of the review is to ensure that all elements of the software configuration have been properly developed, are cataloged, and have the necessary detail to bolster the support activities. The configuration review, sometimes called an audit, is discussed in more detail in Chapter 22.

Preparing for Validation

The scene: Doug Miller’s office, as component-level design continues and construction of certain components continues.

The players: Doug Miller, software engineering manager; Vinod, Jamie, Ed, and Shakira, members of the SafeHome software engineering team.

The conversation: Doug: The first increment will be ready for validation in what . . . about three weeks?

Vinod: That’s about right. Integration is going well. We’re smoke testing daily, finding some bugs, but nothing we can’t handle. So far, so good.

Doug: Talk to me about validation.

Shakira: Well, we’ll use all of the use cases as the basis for our test design. I haven’t started yet, but I’ll be developing tests for all of the use cases that I’ve been responsible for.

Ed: Same here.

Jamie: Me too, but we’ve got to get our act together for acceptance testing and also for alpha and beta testing, no?

Doug: Yes. In fact I’ve been thinking; we could bring in an outside contractor to help us with validation. I have the money in the budget . . . and it’d give us a new point of view.

Vinod: I think we’ve got it under control.

Doug: I’m sure you do, but an ITG gives us an independent look at the software.

Jamie: We’re tight on time here, Doug. I for one don’t have the time to babysit anybody you bring in to do the job.

Doug: I know, I know. But if an ITG works from requirements and use cases, not too much babysitting will be required.

Vinod: I still think we’ve got it under control.

Doug: I hear you, Vinod, but I am going to overrule on this one. Let’s plan to meet with the ITG rep later this week. Get ‘em started and see what they come up with.

Vinod: Okay, maybe it’ll lighten the load a bit.

SafeHome

At the validation or system level, the details of class connections disappear. The validation of OO software focuses on user-visible actions and user-recognizable outputs from the system. To assist in the derivation of validation tests, the tester should draw upon use cases (Chapters 7 and 8) that are part of the requirements model. The use case provides a scenario that has a high likelihood of uncovering errors in user-interaction requirements. Conventional black-box testing methods (Chapter 19) can be used to drive validation tests. In addition, you may choose to derive test cases from the object-behavior model created as part of object-oriented analysis (OOA).

CHAPTER 20 SOFTWARE TESTING—INTEGRATION LEVEL 409

20.6 Testing Patterns

The use of patterns as a mechanism for describing solutions to specific design problems was discussed in Chapter 15. But patterns can also be used to propose solutions to other software engineering situations—in this case, software testing. Testing patterns describe common testing problems and solutions that can assist you in dealing with them.

Much of software testing, even during the past decade, has been an ad hoc activity. If testing patterns can help a software team to communicate about testing more effectively, to understand the motivating forces that lead to a specific approach to testing, and to approach the design of tests as an evolutionary activity in which each iteration results in a more complete suite of test cases, then patterns have accomplished much.

Testing patterns are described in much the same way as design patterns (Chapter 15). Dozens of testing patterns have been proposed in the literature (e.g., [Mar02]). The following three testing patterns (presented in abstract form only) provide representative examples:

Pattern name: PairTesting

Abstract: A process-oriented pattern, pair testing describes a technique that is analogous to pair programming (Chapter 3), in which two testers work together to design and execute a series of tests that can be applied to unit, integration, or validation testing activities.

Pattern name: SeparateTestInterface

Abstract: There is a need to test every class in an object-oriented system, including “internal classes” (i.e., classes that do not expose any interface outside of the component that uses them). The SeparateTestInterface pattern describes how to create “a test interface that can be used to describe specific tests on classes that are visible only internally to a component” [Lan01].

Pattern name: ScenarioTesting

Abstract: Once unit and integration tests have been conducted, there is a need to determine whether the software will perform in a manner that satisfies users. The ScenarioTesting pattern describes a technique for exercising the software from the user’s point of view. A failure at this level indicates that the software has failed to meet a user-visible requirement [Kan01].
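The SeparateTestInterface pattern can be illustrated with a short sketch. The component, class, and method names below are hypothetical; the structure (an internal class reached through a narrow, test-only interface) is the point.

```python
# Illustrative sketch of the SeparateTestInterface pattern: _RateTable is
# internal to the component; PricingTestInterface exposes just enough to
# test it without widening the production API.
class _RateTable:                       # internal class: no public interface
    def __init__(self): self._rates = {"standard": 1.0, "premium": 1.5}
    def rate_for(self, tier): return self._rates[tier]

class PricingComponent:                 # the component's production API
    def __init__(self): self._table = _RateTable()
    def price(self, base, tier): return base * self._table.rate_for(tier)

class PricingTestInterface:
    """Separate interface used only by tests to reach the internal class."""
    def __init__(self, component): self._table = component._table
    def probe_rate(self, tier): return self._table.rate_for(tier)

component = PricingComponent()
probe = PricingTestInterface(component)
assert probe.probe_rate("premium") == 1.5      # internal class tested directly
assert component.price(10, "premium") == 15.0  # public behavior still verified
```

The production code never imports the test interface, so the internal class stays hidden from ordinary clients while remaining fully testable.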

A comprehensive discussion of testing patterns is beyond the scope of this book. If you have further interest, see [Bin99], [Mar02], [Tho04], [Mac10], and [Gon17] for additional information on this important topic.

20.7 Summary

Integration testing builds the software architecture while at the same time conducting tests to uncover errors associated with interfacing between software components. The objective is to take unit-tested components and build a program structure that has been dictated by design.

Experienced software developers often say, “Testing never ends; it just gets transferred from you [the software engineer] to your customer. Every time your customer uses the program, a test is being conducted.” By applying test-case design, you can achieve more complete testing and thereby uncover and correct the highest number of errors before the “customer’s tests” begin.

Hetzel [Het84] describes white-box testing as “testing in the small.” His implication is that the white-box tests that have been considered in this chapter are typically applied to small program components (e.g., modules or small groups of modules). Black-box testing, on the other hand, broadens your focus and might be called “testing in the large.”

Black-box integration testing is based on the requirements specified in the user stories or some other analysis modeling representation. Test-case authors do not need to wait for the component implementation code to be written, as long as they understand the required functionality of the components undergoing testing. Validation testing is often accomplished with black-box test cases that produce end-user-visible input actions and observable output behaviors.

White-box testing requires a close examination of procedural implementation details and data structure implementation details for the components undergoing testing. White-box tests can be designed only after component-level design (or source code) exists. Logical paths through the software and collaborations between components are the focus of white-box integration testing.

Integration testing of OO software can be accomplished using a thread-based or use-based strategy. Thread-based testing integrates the set of classes that collaborate to respond to one input or event. Use-based testing constructs the system in layers, beginning with those classes that do not make use of server classes. Integration test-case design methods can also make use of random and partition tests. In addition, scenario-based testing and tests derived from behavioral models can be used to test a class and its collaborators. A test sequence tracks the flow of operations across class collaborations.

OO system validation testing is black-box oriented and can be accomplished by applying the same black-box methods discussed for conventional software. However, scenario-based testing dominates the validation of OO systems, making the use case a primary driver for validation testing.

Regression testing is the process of reexecuting a selected test case following any change made to a software system. Regression tests should be executed whenever new components or changes are added to a software increment. Regression testing helps to ensure that changes do not introduce unintended behavior or additional errors.

Testing patterns describe common testing problems and solutions that can assist you in dealing with them. If testing patterns can help a software team to communicate about testing more effectively, to understand the motivating forces that lead to a specific approach to testing, and to approach the design of test cases as an evolutionary activity in which each iteration results in a more complete suite of test cases, then patterns have accomplished much.

Problems and Points to Ponder

20.1. How can project scheduling affect integration testing?

20.2. Who should perform validation testing—the software developer or the software user? Justify your answer.


20.3. Will exhaustive testing (even if it is possible for very small programs) guarantee that a program is 100 percent correct?

20.4. Why should “testing” begin with requirements analysis and design?

20.5. Should nonfunctional requirements (e.g., security or performance) be tested as part of integration testing?

20.6. Why do we have to retest subclasses that are instantiated from an existing class, if the existing class has already been thoroughly tested?

20.7. What is the difference between thread-based and use-based strategies for integration testing?

20.8. Develop a complete test strategy for the SafeHome system discussed earlier in this book. Document it in a test specification.

20.9. Pick one of the SafeHome system user stories to use as the basis of scenario-based testing, and construct a set of integration test cases needed to do integration testing for that user story.

20.10. For the test cases you wrote in Problem 20.9, identify a subset of test cases you will use for regression testing software components that are added to the program.



C H A P T E R

21 Software Testing—Specialized Testing for Mobility

What is it? Mobility testing is a collection of related activities with a single goal: to uncover errors in MobileApp content, function, usability, navigability, performance, capacity, and security.

Who does it? Software engineers and other project stakeholders (managers, customers, end users) all participate in mobility testing.

Why is it important? If end users encounter errors or difficulties within the MobileApp, they will go elsewhere for the personalized content and function they need.

What are the steps? The mobility testing process begins by focusing on user-visible aspects of the MobileApp and proceeds to tests that exercise technology and infrastructure.

What is the work product? A MobileApp test plan is often produced. A suite of test cases is developed for each testing step, and an archive of test results is maintained for future use.

How do I ensure that I’ve done it right? Although you can never be sure that you’ve performed every test that is needed, you can be certain that testing has uncovered errors (and that those errors have been corrected). In addition, if you’ve established a test plan, you can check to ensure that all planned tests have been conducted.

Key Concepts: accessibility testing, alpha test, beta test, compatibility testing, content testing, documentation testing, internationalization, load testing, model-based testing, navigation testing, performance testing, real-time testing, recovery testing, security testing, stress testing, testing AI systems, testing guidelines, test strategies for MobileApps, test strategies for WebApps, usability testing

The same sense of urgency that drives MobileApp projects also pervades all mobility projects. Stakeholders are worried that they will miss a market window and press to introduce the MobileApp to its intended market. Technical activities that often occur late in the process, such as performance and security testing, are sometimes given short shrift. Usability testing that should occur during the design phase may end up being deferred until just before delivery. These can be catastrophic mistakes. To avoid this situation, you and other team members must ensure that each work product exhibits high quality, or users will move to a competing product [Soa11].


MobileApp requirements and design models cannot be tested solely with executable test cases. You and your team should conduct technical reviews (Chapter 16) that examine usability (Chapter 12) as well as MobileApp performance and security.

There are several important questions to ask when creating a mobility testing strategy [Sch09]:

∙ Do you have to build a fully functional prototype before you test with users?
∙ Should you test with the user’s device or provide a device for testing?
∙ What devices and user groups should you include in testing?
∙ What are the trade-offs associated with lab testing versus remote testing?

We address each of these questions throughout this chapter.

21.1 Mobile Testing Guidelines

MobileApps that run entirely on a mobile device can be tested using traditional software testing methods (Chapters 19 and 20). Alternatively, they can be tested using emulators running on personal computers. Things become more complicated when thin-client MobileApps1 are to be tested. They exhibit many of the same testing challenges found in WebApps (Section 20.2), but thin-client MobileApps have the additional concerns associated with transmission of data through Internet gateways and telephone networks [Was10].

In general, users expect MobileApps to be context aware and deliver personalized user experiences based on the physical location of a device in relation to available network features. Testing MobileApps in dynamic ad hoc networks for every possible device and network configuration is difficult, if not impossible.

MobileApps are expected to deliver much of the complex functionality and reliability found in desktop applications, but they are resident on mobile platforms with relatively limited resources. The following guidelines provide a basis for mobile application testing [Kea07]:

∙ Understand the network and device landscape before testing to identify bottlenecks (Section 21.6).

∙ Conduct tests in uncontrolled real-world test conditions (field-based testing, Section 21.8).

∙ Select the right automation test tool (Section 21.11).
∙ Use the Weighted Device Platform Matrix method to identify the most critical hardware/platform combination to test (Section 21.8).
∙ Check the end-to-end functional flow in all possible platforms at least once (Section 21.10).

Footnote 1: Thin-client apps typically have software for the user interface running on the mobile device (or Web browser software) and use a network interface to an Internet-based application or cloud-based data storage.


∙ Conduct performance testing, GUI testing, and compatibility testing using actual devices (Sections 21.8 and 21.11).

∙ Measure performance only in realistic conditions of wireless traffic and user load (Section 21.8).
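The Weighted Device Platform Matrix method mentioned in the guidelines can be sketched as a simple scoring calculation over device/OS combinations. The devices, weights, and scoring formula below are illustrative assumptions, not a published calibration.

```python
# Hypothetical Weighted Device Platform Matrix: score each device/OS
# combination by market share and business importance, then test the
# highest-scoring combinations first. All data and weights are invented.
combinations = [
    # (device, os, market share %, importance 1-5)
    ("Phone A",  "OS 13", 32, 5),
    ("Phone B",  "OS 12", 21, 4),
    ("Tablet C", "OS 13",  9, 3),
]

def score(market_share, importance, w_share=0.7, w_importance=0.3):
    # Importance is rescaled to a 0-100 range so the two factors are comparable.
    return w_share * market_share + w_importance * (importance * 20)

ranked = sorted(combinations, key=lambda c: score(c[2], c[3]), reverse=True)
for device, os_name, share, importance in ranked:
    print(f"{device}/{os_name}: {score(share, importance):.1f}")
```

Test effort is then allocated from the top of the ranked list downward, which is how the method identifies the "most critical" combinations.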

21.2 The Testing Strategies

The strategy for testing mobile applications adopts the basic principles for all software testing. However, the unique nature of MobileApps demands the consideration of a number of specialized issues:

∙ User-experience testing. Users are involved early in the development process to ensure that the MobileApp lives up to the usability and accessibility expectations of the stakeholders on all supported devices (Section 21.3).

∙ Device compatibility testing. Testers verify that the MobileApp works correctly on all required hardware and software combinations (Section 21.9).

∙ Performance testing. Testers check nonfunctional requirements unique to mobile devices (e.g., download times, processor speed, storage capacity, power availability) (Section 21.8).

∙ Connectivity testing. Testers ensure that the MobileApp can access any needed networks or Web services and can tolerate weak or interrupted network access (Section 21.6).

∙ Security testing. Testers ensure that the MobileApp does not compromise the privacy or security requirements of its users (Section 21.7).

∙ Testing in the wild. The app is tested under realistic conditions on actual user devices in a variety of networking environments around the globe (Section 21.9).

∙ Certification testing. Testers ensure that the MobileApp meets the standards established by the app stores that will distribute it.

Technology alone is not sufficient to guarantee commercial success of a MobileApp. Users abandon MobileApps quickly if they do not work well or fail to meet expectations. It is important to recall that testing has two important goals: (1) to create test cases that uncover defects early in the development cycle and (2) to verify the presence of important quality attributes. The quality attributes for MobileApps are based on those set forth in ISO 25010:2011 [ISO17] and encompass functionality, reliability, usability, efficiency, maintainability, and portability (Chapter 17).

Developing a MobileApp testing strategy requires an understanding of both software testing and the challenges that make mobile devices and their network infrastructure unique [Kho12a]. In addition to a thorough knowledge of conventional software testing approaches (Chapters 19 and 20), a MobileApp tester should have a good understanding of telecommunications principles and an awareness of the differences and capabilities of mobile operating system platforms. This basic knowledge must be complemented with a thorough understanding of the different types of mobile testing (e.g., MobileApp testing, mobile handset testing, mobile website testing), the use of simulators, test automation tools, and remote data access (RDA) services. Each of these topics is discussed later in this chapter.

21.3 User Experience Testing Issues

In a crowded marketplace in which products provide the same functionality, users will choose the MobileApp that is easiest to use. The user interface and its interaction mechanisms are visible to the MobileApp users. It is important to test the quality of the user experience provided by the MobileApp to ensure that it meets the expectations of its users.

What characteristics of MobileApp usability become the focus of testing, and what specific objectives are addressed? Many of the procedures for assessing the usability of software user interfaces discussed in Chapters 12 and 13 can be used to assess MobileApps. Similarly, many of the strategies used to assess the quality of WebApps (Section 21.5) may be used to test the user interface portion of the MobileApp. There is more to building a good MobileApp user interface than simply shrinking the size of a user interface from an existing desktop application.

21.3.1 Gesture Testing

Touch screens are ubiquitous on mobile devices, and, as a consequence, developers have added multitouch gestures (e.g., swiping, zooming, scrolling, selection) as a means of augmenting the user interaction possibilities without losing screen real estate. Figure 21.1 shows several gestures commonly found in MobileApps. Unfortunately, gesture-intensive interfaces present a number of review and testing challenges.

Paper prototypes, sometimes developed as part of the design, cannot be used to adequately review the efficacy of gestures. When testing is initiated, it’s difficult to use automated tools to test touch or gesture interface actions. The location of screen objects is affected by screen size and resolution, as well as previous user actions, making accurate gesture testing difficult. And even as testing is conducted, gestures are hard to log accurately for replay.

Figure 21.1 Mobile app gestures. [The figure illustrates eight common gestures: tap, double tap, drag, flick, pinch, spread, press, and press + tap.]

Instead, testers need to create test framework programs that make calls to functions that simulate gesture events. All of this is expensive and time consuming.
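A gesture-simulation harness of the kind just described might be sketched as follows. The PhotoView class and its gesture semantics are invented for illustration; a real harness would call into the platform's event-injection API.

```python
# Sketch of a gesture-simulation harness: instead of replaying raw touch
# coordinates, tests call a function that synthesizes gesture events and
# then assert on the resulting app state. All names are hypothetical.
class PhotoView:
    def __init__(self):
        self.zoom = 1.0
        self.log = []            # gesture log, kept for later replay
    def handle(self, gesture):
        self.log.append(gesture)
        if gesture == "pinch":
            self.zoom = max(0.5, self.zoom / 2)   # zoom out, clamped
        elif gesture == "spread":
            self.zoom = min(4.0, self.zoom * 2)   # zoom in, clamped

def simulate(view, gestures):
    """Drive the view with synthesized gesture events."""
    for gesture in gestures:
        view.handle(gesture)
    return view.log

view = PhotoView()
simulate(view, ["spread", "spread", "pinch"])
assert view.zoom == 2.0                            # zoomed in twice, out once
assert view.log == ["spread", "spread", "pinch"]   # log supports accurate replay
```

Keeping an explicit gesture log addresses the replay problem noted above, even though the simulated events cannot capture every screen-size and resolution effect.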

Accessibility testing for visually impaired users is challenging because gesture interfaces typically do not provide either tactile or auditory feedback. Usability and accessibility testing for gestures become very important for ubiquitous devices like smartphones. It may be important to test the operation of the device when gesture operations are not available.

Ideally, user stories or use cases are written in sufficient detail to allow their use as the basis for test scripts. It is important to recruit representative users and include all targeted devices to take screen differences into account when testing gestures with a MobileApp. Finally, testers should ensure that the gestures conform to the standards and contexts set for the mobile device or platform.

21.3.2 Virtual Keyboard Input

Because a virtual keyboard may obscure part of the display screen when activated, it is important to test the MobileApp to ensure that important screen information is not hidden from the user while typing. If the screen information must be hidden, it is important to test the ability of the MobileApp to allow page flipping by the user without losing typed information [Sch09].

Virtual keyboards are typically smaller than personal computer keyboards, and therefore, it is difficult to type with 10 fingers. Because the keys themselves are smaller and harder to hit and provide no tactile feedback, the MobileApp must be tested to ensure that it allows easy error correction and can manage mistyped words without crashing.

Predictive technologies (i.e., autocompletion of partially typed words) are often used with virtual keyboards to help expedite user input. It is important to test the correctness of the word completions for the natural language chosen by the user, if the MobileApp is designed for a global market. It is also important to test the usability of any mechanism that allows the user to override a suggested completion.

Virtual keyboard testing is often conducted in the usability laboratory, but some should be conducted in the wild. If virtual keyboard tests uncover significant problems, the only alternative may be to ensure that the MobileApp can accept input from devices other than a virtual keyboard (e.g., a physical keyboard or voice input).

21.3.3 Voice Input and Recognition

Voice input has become an increasingly common method for providing input and commands in hands-busy, eyes-busy situations. Voice input may take several forms, with different levels of programming complexity required to process each. Voice-mail input occurs when a message is simply recorded for playback later. Discrete word recognition can be used to allow users to verbally select items from a menu with a small number of choices. Continuous speech recognition translates dictated speech into meaningful text strings. Each type of voice input has its own testing challenges.


According to Shneiderman [Shn09], all forms of voice input and processing are hindered by interference from noisy environments. Using voice commands to control a device imposes a greater cognitive load on the user than pointing to a screen object or pressing a key. The user must think of the correct word or words to get the MobileApp to perform the desired action. However, the breadth and accuracy of speech recognition systems are evolving rapidly, and it is likely that voice recognition will become the dominant form of communication in many MobileApps.

Testing the quality and reliability of voice input and recognition should take environmental conditions and individual voice variation into account. Errors will be made by users of the MobileApp and by the portions of the system processing the input. The MobileApp should be tested to ensure that bad input does not crash the MobileApp or the device. Large numbers of users and environments should be involved to be sure the MobileApp is working with an acceptable error rate. It is important to log errors to help developers improve the ability of the MobileApp to process speech input.
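One way to quantify an "acceptable error rate" across environments is a word-error calculation per test environment. The transcripts and threshold below are invented test data, and the mismatch count is a simplified stand-in for a full alignment-based word error rate.

```python
# Hedged sketch: estimate a recognition error rate per environment and
# check it against an acceptance threshold, logging failures for developers.
def word_error_count(expected, recognized):
    """Count simple word mismatches (a stand-in for a full WER alignment)."""
    exp, rec = expected.split(), recognized.split()
    errors = sum(1 for e, r in zip(exp, rec) if e != r)
    return errors + abs(len(exp) - len(rec))   # count missing/extra words too

samples = {
    # environment -> list of (expected utterance, recognized text); invented
    "quiet room":   [("turn on lights", "turn on lights")],
    "street noise": [("turn on lights", "turn off flights")],
}
threshold = 0.25   # maximum acceptable error rate (assumed)
error_log = {}
for env, pairs in samples.items():
    total_words = sum(len(expected.split()) for expected, _ in pairs)
    errors = sum(word_error_count(e, r) for e, r in pairs)
    rate = errors / total_words
    if rate > threshold:
        error_log[env] = rate

print(error_log)  # environments that fail the acceptance threshold
```

The logged environments and rates give developers exactly the error data the text recommends capturing.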

21.3.4 Alerts and Extraordinary Conditions

When a MobileApp runs in a real-time environment, there are factors that may impact its behavior. For example, a Wi-Fi signal may be lost, or an incoming text message, phone call, or calendar alert may be received while the user is working with the MobileApp.

These factors can disrupt the MobileApp user’s work flow, yet most users opt to allow alerts and other interruptions as they work. A MobileApp test environment must be able to simulate these alerts and conditions. In addition, you should test the MobileApp’s ability to handle alerts and conditions in a production environment on actual devices (Section 21.9).

Part of MobileApp testing should focus on the usability issues relating to alerts and pop-up messages. Testing should examine the clarity and context of alerts, the appropriateness of their location on the device display screen, and, when foreign languages are involved, verification that the translation from one language to another is correct.

Many alerts and conditions may be triggered differently on various mobile devices or by network or context changes. Although many of the exception-handling processes can be simulated with a software test harness, you should not rely solely on testing in the development environment. This again emphasizes the importance of testing the MobileApp in the wild on actual devices.

Many computer-based systems must recover from faults and resume processing with little or no downtime. In some cases, a system must be fault tolerant; that is, processing faults must not cause overall system function to cease. In other cases, a system failure must be corrected within a specified period of time or severe economic damage will occur.

Recovery testing is a system test that forces the software to fail in a variety of ways and verifies that recovery is properly performed. If recovery is automatic (performed by the system itself), reinitialization, checkpointing mechanisms, data recovery, and restart are evaluated for correctness. If recovery requires human intervention, the mean time to repair (MTTR) is evaluated to determine whether it is within acceptable limits.
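Evaluating MTTR from a recovery-test log is simple arithmetic; the event-pair format below is just one way a test harness might record forced failures (an assumption for illustration):

```python
def mean_time_to_repair(events):
    """events: list of (failure_time, restore_time) pairs, in seconds,
    collected while recovery testing forces the system to fail.
    Returns the mean downtime, i.e., MTTR."""
    downtimes = [restore - fail for fail, restore in events]
    return sum(downtimes) / len(downtimes)
```

For example, two forced failures repaired in 120 and 60 seconds yield an MTTR of 90 seconds, which is then compared against the acceptable limit in the requirements.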

418 PART THREE QUALITY AND SECURITY

21.4 WebApp Testing

Many Web testing practices are also appropriate for testing thin-client MobileApps and interactive simulations. The strategy for WebApp testing adopts the basic principles for all software testing and applies a strategy and tactics that are used for object-oriented systems. The following steps summarize the approach:

1. The content model for the WebApp is reviewed to uncover errors.
2. The interface model is reviewed to ensure that all use cases can be accommodated.
3. The design model for the WebApp is reviewed to uncover navigation errors.
4. The user interface is tested to uncover errors in presentation and/or navigation mechanics.
5. Each functional component is unit tested.
6. Navigation throughout the architecture is tested.
7. The WebApp is implemented in a variety of different environmental configurations and is tested for compatibility with each configuration.
8. Security tests are conducted in an attempt to exploit vulnerabilities in the WebApp or within its environment.
9. Performance tests are conducted.
10. The WebApp is tested by a controlled and monitored population of end users. The results of their interaction with the system are evaluated for errors.

Because many WebApps evolve continuously, the testing process is an ongoing activity, conducted by support staff who use regression tests derived from the tests developed when the WebApp was first engineered. Methods for WebApp testing are considered in Section 21.5.

21.5 Web Testing Strategies

Testing is the process of exercising software with the intent of finding (and ultimately correcting) errors. This fundamental philosophy, first presented in Chapter 20, does not change for WebApps. In fact, because Web-based systems and applications reside on a network and interoperate with many different operating systems, browsers (or other personal communication devices), hardware platforms, communications protocols, and “backroom” applications, the search for errors represents a significant challenge.

Figure 21.2 juxtaposes the mobility testing process with the design pyramid for WebApps (Chapter 13). Note that as the testing flow proceeds from left to right and top to bottom, user-visible elements of the WebApp design (top elements of the pyramid) are tested first, followed by infrastructure design elements.

Because many apps evolve continuously, the testing process is an ongoing activity, conducted by app support staff who use regression tests derived from the tests developed when the app was first engineered.

CHAPTER 21 SOFTWARE TESTING—SPECIALIZED TESTING FOR MOBILITY 419

Figure 21.2 The testing process (figure: the testing flow mapped onto the WebApp design pyramid, from user-visible elements down to technology infrastructure)

SafeHome: WebApp Testing

The scene: Doug Miller's office.

The players: Doug Miller, manager of the SafeHome software engineering group, and Vinod Raman, a member of the product software engineering team.

The conversation:

Doug: What do you think of the SafeHomeAssured.com e-commerce WebApp V0.0?

Vinod: The outsourcing vendor has done a good job. Sharon [development manager for the vendor] tells me they're testing as we speak.

Doug: I'd like you and the rest of the team to do a little informal testing on the e-commerce site.

Vinod (grimacing): I thought we were going to hire a third-party testing company to validate the WebApp. We're still killing ourselves trying to get the product software out the door.

Doug: We're going to hire a testing vendor for performance and security testing, and our outsourcing vendor is already testing. Just thought another point of view would be helpful, and besides, we'd like to keep costs in line, so . . .

Vinod (sighs): What are you looking for?

Doug: I want to be sure that the interface and all navigation are solid.

Vinod: I suppose we can start with the use cases for each of the major interface functions:

Learn about SafeHome.
Specify the SafeHome system you need.
Purchase a SafeHome system.
Get technical support.

Doug: Good. But take the navigation paths all the way to their conclusion.

Vinod (looking through a notebook of use cases): Yeah, when you select Specify the SafeHome system you need, that'll take you to:

Select SafeHome components.
Get SafeHome component recommendations.

We can exercise the semantics of each path.

Doug: While you're there, check out the content that appears at each navigation node.

Vinod: Of course . . . and the functional elements as well. Who's testing usability?

Doug: Oh . . . the testing vendor will coordinate usability testing. We've hired a market research firm to line up 20 typical users for the usability study, but if you guys uncover any usability issues . . .

Vinod: I know, pass them along.

Doug: Thanks, Vinod.

21.5.1 Content Testing

Errors in WebApp content can be as trivial as minor typographical mistakes or as significant as incorrect information, improper organization, or violation of intellectual property laws. Content testing attempts to uncover these and many other problems before the user encounters them.

Content testing has three important objectives: (1) to uncover syntactic errors (e.g., typos, grammar mistakes) in text-based documents, graphical representations, and other media; (2) to uncover semantic errors (i.e., errors in the accuracy or completeness of information) in any content object presented as navigation occurs; and (3) to find errors in the organization or structure of content that is presented to the end user.

Content testing combines both reviews and the generation of executable test cases. Although technical reviews are not a part of testing, content review should be performed to ensure that content has quality and to uncover semantic errors. Executable testing is used to uncover content errors that can be traced to dynamically derived content that is driven by data acquired from one or more databases.

To accomplish the first objective, automated spelling and grammar checkers may be used. However, many syntactic errors evade detection by such tools and must be discovered by a human reviewer (tester). In fact, a large website might enlist the services of a professional copy editor to uncover typographical errors, grammatical mistakes, errors in content consistency, errors in graphical representations, and cross-referencing errors.

Semantic testing focuses on the information presented within each content object. The reviewer (tester) must answer the following questions:

∙ Is the information factually accurate?
∙ Is the information concise and to the point?
∙ Is the layout of the content object easy for the user to understand?
∙ Can information embedded within a content object be found easily?
∙ Have proper references been provided for all information derived from other sources?
∙ Is the information presented consistent internally and consistent with information presented in other content objects?
∙ Is the content offensive, misleading, or does it open the door to litigation?
∙ Does the content infringe on existing copyrights or trademarks?
∙ Does the content contain internal links that supplement existing content? Are the links correct?
∙ Does the aesthetic style of the content conflict with the aesthetic style of the interface?

Obtaining answers to each of these questions for a large WebApp (containing hundreds of content objects) can be a daunting task. However, failure to uncover semantic errors will shake the user’s faith in the WebApp and can lead to failure of the Web-based application.
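The first, syntactic objective is the one that automates most readily. A minimal sketch of a vocabulary-based checker (the function name and approved-word-list approach are illustrative assumptions; a production checker would layer a full dictionary and grammar rules on top):

```python
import re

def flag_unknown_words(text, known_words):
    """First-pass syntactic content check: report tokens that do not
    appear in the site's approved vocabulary.  This does not replace a
    human reviewer; it only narrows the list a reviewer must read."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in known_words]
```

Running such a check over every content object on each build catches regressions in dynamically generated text that a one-time copyedit would miss.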

21.5.2 Interface Testing

Interface testing exercises interaction mechanisms and validates aesthetic aspects of the user interface. The overall strategy for interface testing is to (1) uncover errors related to specific interface mechanisms (e.g., errors in the proper execution of a menu link or the way data are entered in a form) and (2) uncover errors in the way the interface implements the semantics of navigation, WebApp functionality, or content display. With the exception of WebApp-oriented specifics, the interface strategy noted here is applicable to all types of client-server software. To accomplish this strategy, a number of tactical steps are initiated:

∙ Interface features are tested to ensure that design rules, aesthetics, and related visual content are available for the user without error.

∙ Individual interface mechanisms are tested in a manner that is analogous to unit testing. For example, tests are designed to exercise all forms, client-side scripting, dynamic HTML, scripts, streaming content, and application-specific interface mechanisms (e.g., a shopping cart for an e-commerce application).

∙ Each interface mechanism is tested within the context of a use case or network semantic unit (NSU) (Chapter 13) for a specific user category.

∙ The complete interface is tested against selected use cases and NSUs to uncover errors in the semantics of the interface. It is at this stage that a series of usability tests are conducted.

∙ The interface is tested within a variety of environments (e.g., browsers) to ensure that it will be compatible.

21.5.3 Navigation Testing

A user travels through a WebApp in much the same way as a visitor walks through a store or museum. There are many pathways to take, stops to make, things to learn and look at, activities to initiate, and decisions to make. This navigation process is predictable in the sense that every visitor has a set of objectives when he arrives. At the same time, the navigation process can be unpredictable because the visitor, influenced by something she sees or learns, may choose a path or initiate an action that is not typical for the original objective. The job of navigation testing is (1) to ensure that the mechanisms that allow the WebApp user to travel through the WebApp are all functional and (2) to validate that each NSU can be achieved by the appropriate user category.

The first phase of navigation testing actually begins during interface testing. Navigation mechanisms (links and anchors of all types, redirects,2 bookmarks, frames and frame sets, site maps, and the accuracy of internal search facilities) are tested to ensure that each performs its intended function. Some of the tests noted can be performed by automated tools (e.g., link checking), while others are designed and executed manually. The intent throughout is to ensure that errors in navigation mechanics are found before the WebApp goes online.
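The automated link checking mentioned above can be sketched with the standard-library HTML parser. Given a site snapshot as a path-to-HTML map (that map format is an assumption for illustration), the checker reports internal hrefs that point at no known page:

```python
from html.parser import HTMLParser

class _LinkExtractor(HTMLParser):
    """Collects the href value of every anchor tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

def broken_internal_links(pages):
    """pages: {path: html_source}.  Returns (page, href) pairs where an
    internal link targets a path missing from the snapshot.  External
    links (http...) are out of scope for this offline check."""
    broken = []
    for path, html in pages.items():
        parser = _LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            if not href.startswith("http") and href not in pages:
                broken.append((path, href))
    return broken
```

A real link checker would also issue live requests to external URLs; this offline pass catches the dead internal anchors that regress most often.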

Each NSU (Chapter 13) is defined by a set of navigation paths (called "the user journey") that connect navigation nodes (e.g., Web pages, content objects, or functionality). Taken as a whole, each NSU allows a user to achieve specific requirements defined by one or more use cases for a user category. Navigation testing exercises each NSU to ensure that these requirements can be achieved. If NSUs have not been created as part of WebApp analysis or design, you can apply use cases for the design of navigation test cases. You should answer the following questions as each NSU or use case is tested:

∙ Is the NSU achieved in its entirety without error?
∙ Is every navigation node (defined for an NSU) reachable within the context of the navigation paths defined for the NSU?
∙ If the NSU can be achieved using more than one navigation path, has every relevant path been tested?
∙ If guidance is provided by the user interface to assist in navigation, are directions correct and understandable as navigation proceeds?
∙ Is there a mechanism (other than the browser "back" arrow) for returning to the preceding navigation node and to the beginning of the navigation path?
∙ Do mechanisms for navigation within a large navigation node (i.e., a long Web page) work properly?
∙ If a function is to be executed at a node and the user chooses not to provide input, can the remainder of the NSU be completed?
∙ If a function is executed at a node and an error in function processing occurs, can the NSU be completed?
∙ Is there a way to discontinue the navigation before all nodes have been reached, but then return to where the navigation was discontinued and proceed from there?
∙ Is every node reachable from the site map? Are node names meaningful to end users?
∙ If a node within an NSU is reached from some external source, is it possible to proceed to the next node on the navigation path? Is it possible to return to the previous node on the navigation path?
∙ Does the user understand his location within the content architecture as the NSU is executed?

2 When a server request is forwarded to a nonexistent URL.


Navigation testing, like interface and usability testing, should be conducted by as many different constituencies as possible. You have responsibility for early stages of navigation testing, but later stages should be conducted by other project stakeholders, an independent testing team, and ultimately, by nontechnical users. The intent is to exercise WebApp navigation thoroughly.

21.6 Internationalization

Internationalization is the process of creating a software product so that it can be used in several countries and with various languages without requiring any engineering changes. Localization is the process of adapting a software application for use in targeted global regions by adding locale-specific requirements and translating text elements to appropriate languages. Localization effort may involve taking each country's currency, culture, taxes, and standards (both technical and legal) into account in addition to differences in languages [Sla12]. Launching a MobileApp in many parts of the world without testing it there would be very foolish.
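One localization check that automates easily is catalog completeness: every user-visible string key in the base language must have a translation in each target locale. A sketch (the catalog layout is a generic assumption, not tied to any particular i18n framework):

```python
def missing_translations(base_catalog, locale_catalogs):
    """base_catalog: {key: source_text}.
    locale_catalogs: {locale_code: {key: translated_text}}.
    Returns, per locale, the sorted keys still lacking a translation."""
    return {
        locale: sorted(set(base_catalog) - set(catalog))
        for locale, catalog in locale_catalogs.items()
    }
```

Wiring this into the build keeps untranslated strings from reaching locale testers; the harder questions (cultural fit, legal standards, layout after translation) still require humans in each region.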

Because it can be very costly to build an in-house testing facility in each country for which localization is planned, outsourcing testing to local vendors in each country is often more cost effective [Reu12]. However, using an outsourcing approach risks a degradation of communication between the MobileApp development team and those who are performing localization tests.

Crowdsourcing has become popular in many online communities.3 Reuveni [Reu12] suggests that crowdsourcing could be used to engage localization testers dispersed around the globe outside of the development environment. To accomplish this, it is important to find a community that prides itself on its reputation and has a track record of successes. An easy-to-use real-time platform allows community members to communicate with the project decision makers. To protect intellectual property, only trustworthy community members who are willing to sign nondisclosure agreements are allowed to participate.

21.7 Security Testing

Any computer-based system that manages sensitive information or causes actions that can improperly harm (or benefit) individuals is a target for improper or illegal penetration. Penetration spans a broad range of activities: hackers who attempt to penetrate systems for sport, disgruntled employees who attempt to penetrate for revenge, dishonest individuals who attempt to penetrate for illicit personal gain.

Security testing attempts to verify that protection mechanisms built into a system will, in fact, protect it from improper penetration. Given enough time and resources, thorough security testing will ultimately penetrate a system. The role of the system designer is to make penetration cost more than the value of the information that will be obtained. Security assurance and security engineering are discussed in more detail in Chapter 18.

3 Crowdsourcing is a distributed problem-solving model where community members work on solutions to problems posted to the group.

Mobile security is a complex subject that must be fully understood before effective security testing can be accomplished.4 MobileApps and the client-side and server-side environments in which they are housed represent an attractive target for external hackers, disgruntled employees, dishonest competitors, and anyone else who wishes to steal sensitive information, maliciously modify content, degrade performance, disable functionality, or embarrass a person, organization, or business.

Security tests are designed to probe vulnerabilities of the client-side environment, the network communications that occur as data are passed from client to server and back again, and the server-side environment. Each of these domains can be attacked, and it is the job of the security tester to uncover weaknesses that can be exploited by those with the intent to do so.

On the client side, vulnerabilities can often be traced to preexisting bugs in browsers, e-mail programs, or communication software. On the server side, vulnerabilities include denial-of-service attacks and malicious scripts that can be passed along to the client side or used to disable server operations. In addition, server-side databases can be accessed without authorization (data theft).

To protect against these (and many other) vulnerabilities, firewalls, authentication, encryption, and authorization techniques can be used. Security tests should be designed to probe each of these security technologies in an effort to uncover security holes.

The actual design of security tests requires in-depth knowledge of the inner workings of each security element and a comprehensive understanding of a full range of networking technologies. If the MobileApp or WebApp is business critical, maintains sensitive data, or is a likely target of hackers, it's a good idea to outsource security testing to a vendor who specializes in it.
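At the simplest level, input handling on both client and server can be probed with a library of attack-shaped payloads. The patterns below are a tiny illustrative sample, not a complete scanner, and the function is a sketch of one building block a security test suite might use:

```python
import re

# A few classic probe shapes: script injection, SQL tautology,
# directory traversal.  Real security test suites use far larger corpora.
_PROBE_PATTERNS = [
    re.compile(r"<\s*script", re.IGNORECASE),             # reflected/stored XSS
    re.compile(r"'\s*or\s*'1'\s*=\s*'1", re.IGNORECASE),  # SQL tautology
    re.compile(r"\.\./"),                                 # path traversal
]

def looks_malicious(value):
    """True if the input matches a known attack-shaped pattern and
    should therefore be rejected or escaped by the application."""
    return any(p.search(value) for p in _PROBE_PATTERNS)
```

A security test would submit each probe through every input field and API parameter and assert that the application rejects or neutralizes it; pattern matching alone proves nothing about the firewall, authentication, or encryption layers, which need their own targeted tests.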

21.8 Performance Testing

For real-time and embedded systems, software that provides required functionality but does not conform to performance requirements is unacceptable. Performance testing is designed to test the run-time performance of software within the context of an integrated system. Performance testing occurs throughout all steps in the testing process. Even at the unit level, the performance of an individual module may be assessed as tests are conducted. However, it is not until all system elements are fully integrated that the true performance of a system can be ascertained.

Nothing is more frustrating than a MobileApp that takes minutes to load content when competitive apps download similar content in seconds. Nothing is more aggravating than trying to log on to a WebApp and receiving a "server-busy" message, with the suggestion that you try again later. Nothing is more disconcerting than a MobileApp or WebApp that responds instantly in some situations and then seems to go into an infinite wait state in other situations. All of these occurrences happen on the Web every day, and all of them are performance related.

4 Books by Bell et al. [Bel17], Sullivan and Liu [Sul11], and Cross [Cro07] provide useful information about the subject.


Performance testing is used to uncover performance problems that can result from a lack of server-side resources, inappropriate network bandwidth, inadequate database capabilities, faulty or weak operating system capabilities, poorly designed WebApp functionality, and other hardware or software issues that can lead to degraded client-server performance. The intent is twofold: (1) to understand how the system responds as loading increases (i.e., number of users, number of transactions, or overall data volume), and (2) to collect metrics that will lead to design modifications to improve performance.

Performance tests are often coupled with stress testing and usually require both hardware and software instrumentation. That is, it is often necessary to measure resource utilization (e.g., processor cycles) in an exacting fashion. External instrumentation can monitor execution intervals, log events (e.g., interrupts) as they occur, and sample machine states on a regular basis. By instrumenting a system, the tester can uncover situations that lead to degradation and possible system failure.

Some aspects of MobileApp performance, at least as the end user perceives it, are difficult to test. Network loading, the vagaries of network interfacing hardware, and similar issues are not easily tested at the client or browser level. Mobile performance tests are designed to simulate real-world loading situations. As the number of simultaneous app users grows, or the number of online transactions increases, or the amount of data (downloaded or uploaded) increases, performance testing will help answer the following questions:

∙ Does the server response time degrade to a point where it is noticeable and unacceptable?
∙ At what point (in terms of users, transactions, or data loading) does performance become unacceptable?
∙ What system components are responsible for performance degradation?
∙ What is the average response time for users under a variety of loading conditions?
∙ Does performance degradation have an impact on system security?
∙ Is app reliability or accuracy affected as the load on the system grows?
∙ What happens when loads that are greater than maximum server capacity are applied?
∙ Does performance degradation have an impact on company revenues?

To develop answers to these questions, two different performance tests are conducted: (1) load testing examines real-world loading at a variety of load levels and in a variety of combinations, and (2) stress testing forces loading to be increased to the breaking point to determine how much capacity the app environment can handle.

The intent of load testing is to determine how the WebApp and its server-side environment will respond to various loading conditions. As testing proceeds, permutations to the following variables define a set of test conditions:


N, number of concurrent users
T, number of online transactions per unit of time
D, data load processed by the server per transaction


In every case, these variables are defined within normal operating bounds of the system. As each test condition is run, one or more of the following measures are collected: average user response, average time to download a standardized unit of data, or average time to process a transaction. You should examine these measures to determine whether a precipitous decrease in performance can be traced to a specific combination of N, T, and D.

Load testing can also be used to assess recommended connection speeds for users of the WebApp. Overall throughput, P, is computed in the following manner:

P = N × T × D

As an example, consider a popular sports news site. At a given moment, 20,000 concurrent users submit a request (a transaction, T) once every 2 minutes on average. Each transaction requires the WebApp to download a new article that averages 3 Kbytes in length. Therefore, throughput can be calculated as:

P = (20,000 × 0.5 × 3 Kbytes) / 60 = 500 Kbytes/sec = 4 megabits per second

The network connection for the server would therefore have to support this data rate and should be tested to ensure that it does.
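The throughput computation generalizes directly; a helper like this (the function names are illustrative) lets a load-test plan tabulate P across the permutations of N, T, and D described above:

```python
def throughput_kbytes_per_sec(n_users, transactions_per_min, kbytes_per_transaction):
    """P = N x T x D, normalized to per-second throughput.
    T is expressed per minute here, hence the division by 60."""
    return n_users * transactions_per_min * kbytes_per_transaction / 60

def throughput_megabits_per_sec(n_users, transactions_per_min, kbytes_per_transaction):
    # 1 Kbyte = 8 kilobits; 1,000 kilobits = 1 megabit, as in the text's example.
    kb = throughput_kbytes_per_sec(n_users, transactions_per_min, kbytes_per_transaction)
    return kb * 8 / 1000
```

Plugging in the sports-site numbers (20,000 users, one request per 2 minutes, 3-Kbyte articles) reproduces the 500 Kbytes/sec (4 megabits per second) figure computed above.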

Stress testing for mobile apps attempts to find errors that will occur under extreme operating conditions. In addition, it provides a mechanism for determining whether the MobileApp will degrade gracefully without compromising security. Among the many actions that might create extreme conditions are: (1) running several mobile apps on the same device, (2) infecting system software with viruses or malware, (3) attempting to take over a device and use it to spread spam, (4) forcing the mobile app to process inordinately large numbers of transactions, and (5) storing inordinately large quantities of data on the device. As these conditions are encountered, the MobileApp is checked to ensure that resource-intensive services (e.g., streaming media) are handled properly.

21.9 Real-Time Testing

The time-dependent, asynchronous nature of many mobile and real-time applications adds a new and potentially difficult element to the testing mix—time. The test-case designer must consider not only conventional test cases but also event handling (i.e., interrupt processing), the timing of the data, and the parallelism of the tasks (processes) that handle the data. In many situations, test data provided when a real-time system is in one state will result in proper processing, while the same data provided when the system is in a different state may lead to error. In addition, the intimate relationship that exists between real-time software and its hardware environment can also cause testing problems. Software tests must consider the impact of hardware faults on software processing. Such faults can be extremely difficult to simulate realistically.

Many MobileApp developers advocate testing in the wild, or testing in the users' native environments with the production release versions of the MobileApp resources [Soa11]. Testing in the wild is designed to be agile and respond to changes as the MobileApp evolves [Ute12].

Some of the characteristics of testing in the wild include adverse and unpredictable environments, outdated browsers and plug-ins, unique hardware, and imperfect connectivity (both Wi-Fi and mobile carrier). To mirror real-world conditions, the demographic characteristics of testers should match those of targeted users, as well as those of their devices. In addition, you should include use cases involving small numbers of users, less-popular browsers, and a diverse set of mobile devices. Testing in the wild is always somewhat unpredictable, and test plans must be adapted as testing progresses. For further information, Rooksby and his colleagues have identified themes that are present in successful strategies for testing in the wild [Roo09].

Because MobileApps are often developed for multiple devices and designed to be used in many different contexts and locations, a weighted device platform matrix (WDPM) helps ensure that test coverage includes each combination of mobile device and context variables. The WDPM can also be used to help prioritize the device/ context combinations so that the most important are tested first.

The steps to build the WDPM (Table 21.1) for several devices and operating systems are: (1) list the important operating system variants as the matrix column labels, (2) list the targeted devices as the matrix row labels, (3) assign a ranking (e.g., 0 to 10) to indicate the relative importance of each operating system and each device, and (4) compute the product of each pair of rankings and enter each product as the cell entry in the matrix (use NA for combinations that are not available).

Testing effort should be adjusted so that the device/platform combinations with the highest ratings receive the most attention for each context variable under consideration.5 In Table 21.1, Device4 and OS3 have the highest rating and would receive high-priority attention during testing.

Actual mobile devices have inherent limitations precipitated by the combination of hardware and firmware delivered in the device. If the range of potential device platforms is large, it is expensive and time consuming to perform MobileApp testing.

5 Context variables are variables that are associated with either the current connection or the current transaction that the MobileApp will use to direct its visible-user behavior.

Table 21.1 Weighted device platform matrix

             Ranking   OS1 (3)   OS2 (4)   OS3 (7)
  Device1       7        N/A       28        49
  Device2       3         9        N/A       N/A
  Device3       4        12        N/A       N/A
  Device4       9        N/A       36        63


Mobile devices are not designed with testing in mind. Limited processing power and storage capacity may not allow loading of the diagnostic software needed to record the test-case performance. Emulated devices are often easier to manage and allow easier acquisition of test data. Each mobile network (there are hundreds, worldwide) uses its own unique infrastructure to support the mobile Web. Emulators often cannot emulate the effects and timing of network services, and you may not see problems that users will have when the MobileApp runs on an actual device.

Creating test environments in-house is an expensive and error-prone process. Cloud-based testing can offer a standardized infrastructure and preconfigured software images, freeing the MobileApp team from the need to worry about finding servers or purchasing their own licenses for software and testing tools [Goa14]. Cloud service providers give testers access to scalable, ready-to-use virtual laboratories with a library of operating systems, test and execution management tools, and storage necessary for creating a test environment that closely mirrors the real world [Tao17].

Cloud-based testing is not without potential problems: lack of standards, potential security issues, data location and integrity issues, incomplete infrastructure support, improper usages of services, and performance issues are only some of the common challenges that face development teams that use the cloud approach.

Last, it is important to monitor power consumption specifically associated with the use of the MobileApp on a mobile device. Transmitting information from mobile devices consumes more power than monitoring a network for a signal. Processing streaming media consumes more power than loading a Web page or sending a text message. Assessing power consumption accurately must be done in real time on the actual device and in the wild.

21.10 Testing AI Systems

As we discussed in Chapter 13, mobile users expect products like MobileApps, VR systems, and video games to be context aware. Whether the software product is reacting to the user's environment [Abd16], automatically adapting the user interface based on past user behaviors [Par15], or providing a realistic nonplaying character (NPC) in a game situation [Ste16], artificial intelligence (AI) techniques are involved. Often these techniques make use of things like machine learning, data mining, statistics, heuristic programming, or rule-based systems that are outside the scope of this book. There are several problems common to testing these systems that can be addressed with the techniques we have discussed.

AI techniques make use of information that has been obtained from human experts or summarized from large numbers of observations saved in a data store of some kind. These data need to be organized in some way so that they can be accessed and updated efficiently if the software product is to be context aware or self-adaptive. The heuristics for making use of these data to assist decision making in the software are usually described by humans in use cases or formulas obtained from statistical data analysis. Part of what makes these systems hard to test is the large number of data interactions that need to be accounted for by the software, but whose occurrence is hard to predict. Software engineers often need to rely on simulation and model-based techniques to test AI systems.


21.10.1 Static and Dynamic Testing

Static testing is a software verification technique that focuses on review rather than executable testing. It is important to ensure that human experts (stakeholders who understand the application domain) agree with the ways in which the developers have represented the information and its use in the AI system. Like all software verification techniques, it is important to ensure that the program code represents the AI specifications, which means the mapping between use case inputs and outputs is reflected in the code.

Dynamic testing for AI systems is a validation technique that exercises the source code with test cases. The intent is to show that the AI system conforms to the behaviors specified by the human experts. In the case of knowledge discovery or data mining, the program may have been designed to discover new relationships unknown to human experts. Human experts must validate these new relationships before they are used in safety-critical software products [Abd16] [Par15].

Many of the real-time testing issues discussed in Section 21.9 apply in dynamic testing of AI systems. Even if automatically generated simulated test cases are used, it is not possible to test every combination of events the software will encounter in the wild. It is often desirable to build in mechanisms that allow users to specify when they are not happy with the decisions made by the program and to collect information on the program state for future corrective action by developers.
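One such mechanism can be sketched as a simple feedback recorder. This is a hypothetical illustration, not a prescribed design: the class name, record fields, and JSON export format are all assumptions.

```python
import json
import time

class DecisionFeedbackRecorder:
    """Capture a snapshot of program state whenever a user flags an
    AI decision as unsatisfactory, for later review by developers."""

    def __init__(self):
        self.reports = []

    def flag_decision(self, decision, program_state, user_comment=""):
        # Record everything a developer would need to analyze the decision.
        self.reports.append({
            "timestamp": time.time(),
            "decision": decision,
            "state": dict(program_state),  # copy, so later mutation is not recorded
            "comment": user_comment,
        })

    def export(self):
        # Serialized reports could be sent back to the development team.
        return json.dumps(self.reports, indent=2)

recorder = DecisionFeedbackRecorder()
recorder.flag_decision(
    decision="route_via_highway",
    program_state={"location": "downtown", "traffic_model": "v2"},
    user_comment="Suggested route ignored a road closure.",
)
print(len(recorder.reports))  # 1
```

In a deployed system, the exported reports would feed the corrective-action loop described above.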

21.10.2 Model-Based Testing

Model-based testing (MBT) is a black-box testing technique that uses information contained in the requirements model (in particular the user stories) as the basis for the generation of test cases [DAC03]. In many cases, the model-based testing technique uses a formalism like UML state diagrams, an element of the behavioral model (Chapter 8), as the basis for the design of test cases.6 The MBT technique requires five steps:

1. Analyze an existing behavioral model for the software, or create one. Recall that a behavioral model indicates how software will respond to external events or stimuli. To create the model, you should perform the steps discussed in Chapter 8: (1) evaluate all use cases to fully understand the sequence of interaction within the system, (2) identify events that drive the interaction sequence and understand how these events relate to specific objects, (3) create a sequence for each use case, (4) build a UML state diagram for the system (e.g., see Figure 8.8), and (5) review the behavioral model to verify accuracy and consistency.

2. Traverse the behavioral model and specify the inputs that will force the software to make the transition from state to state. The inputs will trigger events that will cause the transition to occur.

3. Review the behavioral model, and note the expected outputs as the software makes the transition from state to state. Recall that each state transition is triggered by an event and that as a consequence of the transition some function is invoked and outputs are created. For each set of inputs (test cases) you specified in step 2, specify the expected outputs as they are characterized in the behavioral model.

6 Model-based testing can also be used when software requirements are represented with decision tables, grammars, or Markov chains [DAC03].

4. Execute the test cases. Test cases can be executed manually, or a test script can be created for use by an automated testing tool.

5. Compare actual and expected results and take corrective action as required.

MBT helps to uncover errors in software behavior, and as a consequence, it is extremely useful when testing event-driven applications such as context-aware MobileApps.
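The five steps can be sketched in miniature. The vending-machine state model, event names, and toy implementation below are invented for illustration; in real MBT, the transition table would be derived from the UML state diagram, and the implementation under test would be developed independently of it.

```python
# (state, event) -> (next_state, expected_output), derived from the behavioral model.
TRANSITIONS = {
    ("idle", "coin_inserted"): ("ready", "display_credit"),
    ("ready", "item_selected"): ("dispensing", "dispense_item"),
    ("dispensing", "item_taken"): ("idle", "display_welcome"),
}

class VendingMachine:
    """Toy implementation under test. Here it consults the same table;
    a real implementation would be independent code."""
    def __init__(self):
        self.state = "idle"

    def handle(self, event):
        self.state, output = TRANSITIONS[(self.state, event)]
        return output

def run_mbt(events):
    """Steps 2-5: feed the inputs that force state transitions, then
    compare actual outputs and states with those the model predicts."""
    machine = VendingMachine()
    model_state = "idle"
    failures = []
    for event in events:
        expected_state, expected_output = TRANSITIONS[(model_state, event)]
        actual_output = machine.handle(event)
        if actual_output != expected_output or machine.state != expected_state:
            failures.append((model_state, event, actual_output))
        model_state = expected_state
    return failures

print(run_mbt(["coin_inserted", "item_selected", "item_taken"]))  # []
```

An empty result means actual behavior matched the model for this event sequence; any tuple in the list would pinpoint the state and event where the implementation diverged.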

21.11 Testing Virtual Environments

It is virtually impossible for a software developer to foresee how the customer will actually use a program. Instructions for use may be misinterpreted; strange combinations of input actions may be used; feedback that seemed clear to the tester may be unintelligible to a user in the field. User experience designers are very aware of the importance of getting feedback from actual users early in the prototyping process to avoid creating software that users dislike.

Acceptance tests are a series of specific tests conducted by the customer in an attempt to uncover product errors before accepting the software from the developer. Conducted by the end user rather than software engineers, acceptance testing can range from an informal “test drive” to a planned and systematically executed series of scripted tests.

When a software product is built for one customer, it is reasonable for that customer to conduct a series of acceptance tests to validate all requirements. If software is a virtual simulation or game developed as a product to be used by many customers, it is impractical to allow each user to perform formal acceptance tests. Most software product builders use a process called alpha and beta testing to uncover errors that only end users seem able to find.

The alpha test is conducted at the developer's site by a representative group of end users. The software is used in a natural setting with the developer "looking over the shoulder" of the users and recording errors and usage problems. Alpha tests are conducted in a controlled environment.

The beta test is conducted at one or more end-user sites. Unlike alpha testing, the developer generally is not present. Therefore, the beta test is a "live" application of the software in an environment that cannot be controlled by the developer. The customer records all problems (real or imagined) that are encountered during beta testing and reports these at regular intervals. As a result of problems reported during beta tests, the developer makes modifications and then prepares for release of the software product to the entire customer base.

21.11.1 Usability Testing

Usability testing evaluates the degree to which users can interact effectively with the app and the degree to which the app guides users' actions, provides meaningful


feedback, and enforces a consistent interaction approach. Rather than focusing intently on the semantics of some interactive objective, usability reviews and tests are designed to determine the degree to which the app interface makes the user’s life easy.7

Developers contribute to the design of usability tests, but in general the tests are conducted by end users. Usability testing can occur at a variety of different levels of abstraction: (1) the usability of a specific interface mechanism (e.g., a form) can be assessed, (2) the usability of a complete virtual interface (encompassing interface mechanisms, data objects, and related functions) can be evaluated, or (3) the usability of the complete virtual-world application can be considered.

The first step in usability testing is to identify a set of usability categories and establish testing objectives for each category. The following test categories and objectives (written in the form of a question) illustrate this approach:8

Interactivity. Are interaction mechanisms (e.g., pull-down menus, buttons, widgets, inputs) easy to understand and use?

Layout. Are navigation mechanisms, content, and functions placed in a manner that allows the user to find them quickly?

Readability. Is text well written and understandable?9 Are graphic representations easy to understand?

Aesthetics. Do layout, color, typeface, and related characteristics lead to ease of use? Do users “feel comfortable” with the look and feel of the app?

Display characteristics. Does the app make optimal use of screen size and resolution?

Time sensitivity. Can important features, functions, and content be used or acquired in a timely manner?

Feedback. Do users receive meaningful feedback to their actions? Is the user’s work interruptible and recoverable when a system message is displayed?

Personalization. Does the app tailor itself to the specific needs of different user categories or individual users?

Help. Is it easy for users to access help and other support options?

Accessibility. Is the app accessible to people who have disabilities?

Trustworthiness. Are users able to control how personal information is shared? Does the app make use of personal information without user permission?

A series of tests is designed within each of these categories. In some cases, the "test" may be a visual review of the app screen displays. In other cases, interface semantics tests may be executed again, but in this instance usability concerns are paramount.

7 The term user-friendliness has been used in this context. The problem, of course, is that one user’s perception of a “friendly” interface may be radically different from another’s.

8 For additional information on usability, see Chapter 12.

9 The FOG Readability Index and others may be used to provide a quantitative assessment of readability.


As an example, we consider usability assessment for interaction and interface mechanisms. The following is a list of interface features that might be reviewed and tested for usability: animations, buttons, color, control, graphics, labels, menus, messages, navigation, selection mechanisms, text, and HUDs10 (heads-up user displays). As each feature is assessed, it is graded on a qualitative scale by the users who are doing the testing. Figure 21.3 shows a possible set of assessment "grades" that can be selected by users. These grades are applied to each feature individually, to a complete app screen display, or to the app as a whole.
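Grades collected this way can be tallied quite simply. The sketch below assumes a 1-to-5 numeric mapping for a few of the grade labels; the mapping, the feature names, and the session data are illustrative, not part of the method described in the text.

```python
# Map qualitative grade labels onto an assumed 1 (worst) to 5 (best) scale.
GRADE_SCALE = {
    "confusing": 1, "somewhat ambiguous": 2, "clear": 5,
    "awkward": 1, "simple": 5,
    "difficult to learn": 1, "easy to learn": 5,
}

def average_grade(responses):
    """Average the numeric scores for one feature across test users."""
    scores = [GRADE_SCALE[g.lower()] for g in responses]
    return sum(scores) / len(scores)

# Hypothetical grades gathered from three users in one usability session.
session = {
    "menus":   ["clear", "clear", "somewhat ambiguous"],
    "buttons": ["simple", "awkward", "simple"],
}
for feature, grades in sorted(session.items()):
    print(f"{feature}: {average_grade(grades):.1f}")
# buttons: 3.7
# menus: 4.0
```

Aggregated scores like these make it easy to spot which interface features users consistently grade toward the negative end of the scale.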

21.11.2 Accessibility Testing

Accessibility testing is the process of verifying the degree to which all people can use a computer system, regardless of any special needs. The special needs most commonly considered for computer system accessibility are visual, hearing, movement, and cognitive impairments [Zan18]. Many of these special needs evolve as people get older. As a profession, virtual environment development has not done a good job of providing access to systems with rich graphical interfaces that rely heavily on touch interactions [Dia14]. The problems merely shift with a switch to voice-activated

10 Mobile apps and games frequently provide graphical displays containing user status, system messages, navigation data, and menu choices as part of the device screen display or HUD.

Figure 21.3 Qualitative assessment of usability. The figure shows rating scales, applied to a specific usability feature, for dimensions such as Ease of Use, Ease of Understanding, and Predictability. Grade anchors range from negative (Confusing, Awkward, Misleading, Inconsistent, Somewhat Ambiguous, Difficult to Learn, Lacking Uniformity) to positive (Clear, Simple, Informative, Predictable, Effective, Easy to Learn, Generally Uniform).


personal assistants like Alexa® or Siri®. Just imagine trying to operate your smartphone without one of these senses: sight, hearing, touch, or speech.

We discussed guidelines11 for designing accessible software products in Chapter 13. An effective design strategy should ensure that all important interactions with the user be presented using more than one information channel. A few examples of the areas of focus for accessibility testing follow [Zan18] [Dia14]:

∙ Ensure that all nontext screen objects are also represented by a text-based description.

∙ Verify that color is not used exclusively to convey information to the user.

∙ Demonstrate that high contrast and magnification options are available for elderly or visually challenged users.

∙ Ensure that speech input alternatives have been implemented to accommodate users who may not be able to manipulate a keyboard, keypad, or mouse.

∙ Demonstrate that blinking, scrolling, or auto content updating is avoided to accommodate users with reading difficulties.
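The first two checks in this list lend themselves to automation. The sketch below runs them over a hypothetical declarative screen description; the widget attributes (`alt_text`, `conveys_status`, `uses_color_only`) are invented for illustration and do not correspond to a real UI toolkit.

```python
# A screen described as a list of widget records (invented format).
screen = [
    {"id": "logo", "type": "image", "alt_text": "Pharmacy logo"},
    {"id": "warn_icon", "type": "image", "alt_text": ""},
    {"id": "error_msg", "type": "text",
     "conveys_status": True, "uses_color_only": True},
]

def missing_text_alternatives(widgets):
    """Nontext objects should carry a text-based description."""
    return [w["id"] for w in widgets
            if w["type"] != "text" and not w.get("alt_text")]

def color_only_status(widgets):
    """Information should never be conveyed by color alone."""
    return [w["id"] for w in widgets
            if w.get("conveys_status") and w.get("uses_color_only")]

print(missing_text_alternatives(screen))  # ['warn_icon']
print(color_only_status(screen))          # ['error_msg']
```

Automated checks like these complement, but do not replace, review by users with the relevant special needs.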

It is likely that mobile, cloud-based software will come to dominate many things that users need to accomplish on a day-to-day basis (e.g., banking, tax preparation, restaurant reservations, trip planning) and as a consequence, the need for accessible software products will only grow. Along with expert review and automated tools to assess accessibility, a thorough accessibility testing strategy will help to ensure that every user, no matter their challenges, will be accommodated.

21.11.3 Playability Testing

Playability is the degree to which a game or simulation is fun to play and usable by the user/player; it was originally conceived as part of the development of video games. Game playability is affected by the quality of the game: usability, storyline, strategy, mechanics, realism, graphics, and sound. With the advent of virtual/augmented reality simulations whose intention is to provide entertainment or learning opportunities (e.g., simulated troubleshooting), it makes sense to use playability testing as part of the usability testing for a virtual environment created by a MobileApp [Vel16].

Expert review can be used as part of the playability testing, but unless expert users are your target user group, you may not get the feedback you need to have your MobileApp succeed in the marketplace. Expert review should be supplemented by playability tests conducted by representative end users, as you might do for a beta or acceptance test. In a typical play test, the user might be given general instructions on using the app, and the developers would then step back and observe the players' use of the game without interruption. The players may be asked to complete a survey on their experience once they are done with the play test [Hus15].

The developers might record the play session or simply take notes on what they observe. The developers are looking for places in the play session where the player does not know what to do next (this is usually marked by a sudden halt in the player's actions). Developers should note where the player is in the app work flow when this event happens. When the play test has ended, the developers might discuss why the player got stuck and how the player got herself unstuck (if she did). This suggests that playability testing might be helpful in assessing the accessibility of a virtual environment as well.

11 Here is an example of a software accessibility checklist used by the United States Department of Justice: https://www.justice.gov/crt/software-accessibility-checklist.
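Because a sudden halt in player actions marks the moments of interest, a recorded session log can be scanned for them automatically. The sketch below assumes a simple list of action timestamps in seconds; the 30-second threshold is an arbitrary choice.

```python
def find_stalls(action_times, threshold=30.0):
    """Return (start, gap) pairs where the player took no action for
    longer than the threshold -- candidate points of confusion."""
    stalls = []
    for earlier, later in zip(action_times, action_times[1:]):
        gap = later - earlier
        if gap > threshold:
            stalls.append((earlier, gap))
    return stalls

# Player acts steadily, then halts twice late in the session.
log = [0, 4, 9, 15, 22, 30, 120, 215, 220, 226]
print(find_stalls(log))  # [(30, 90), (120, 95)]
```

Each flagged timestamp tells the developers where in the app work flow to look for the source of the player's confusion.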

21.12 Testing Documentation and Help Facilities

The term software testing conjures up images of large numbers of test cases prepared to exercise computer programs and the data that they manipulate. But errors in help facilities or online program documentation can be as devastating to the acceptance of the program as errors in data or source code. Nothing is more frustrating than following a user guide or help facility exactly and getting results or behaviors that do not coincide with those predicted by the documentation. It is for this reason that documentation testing should be an important part of every software test plan.

Documentation testing can be approached in two phases. The first phase, technical review (Chapter 16), examines the document for editorial clarity. The second phase, live test, uses the documentation in conjunction with the actual program.

Surprisingly, a live test for documentation can be approached using techniques that are analogous to many of the black-box testing methods discussed earlier. Graph-based testing can be used to describe the use of the program; equivalence partitioning and boundary value analysis can be used to define various classes of input and associated interactions. MBT can be used to ensure that documented behavior and actual behavior coincide. Program usage is then tracked through the documentation.

Documentation Testing

The following questions should be answered during documentation and/or help facility testing:

∙ Does the documentation accurately describe how to accomplish each mode of use?

∙ Is the description of each interaction sequence accurate?

∙ Are examples accurate?

∙ Are terminology, menu descriptions, and system responses consistent with the actual program?

∙ Is it relatively easy to locate guidance within the documentation?

∙ Can troubleshooting be accomplished easily with the documentation?

∙ Are the document's table of contents and index robust, accurate, and complete?

∙ Is the design of the document (layout, typefaces, indentation, graphics) conducive to understanding and quick assimilation of information?

∙ Are all software error messages displayed for the user described in more detail in the document? Are actions to be taken as a consequence of an error message clearly delineated?

∙ If hypertext links are used, are they accurate and complete?

∙ If hypertext is used, is the navigation design appropriate for the information required?

The only viable way to answer these questions is to have an independent third party (e.g., selected users) test the documentation in the context of program usage. All discrepancies are noted, and areas of document ambiguity or weakness are defined for potential rewrite.
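One item on the checklist, matching the program's error messages against the documentation, is also easy to automate. The message identifiers below are invented; in practice, the two sets would be extracted from the source code and from the documentation index.

```python
# Messages the program can actually display (hypothetically extracted from source).
program_messages = {"ERR_NETWORK", "ERR_LOGIN", "ERR_TIMEOUT"}

# Messages described in the user documentation (extracted from its index).
documented_messages = {"ERR_NETWORK", "ERR_LOGIN"}

# Messages the program can show but the documentation never explains.
undocumented = sorted(program_messages - documented_messages)

# Documented messages the program no longer produces.
obsolete = sorted(documented_messages - program_messages)

print("Not in docs:", undocumented)  # Not in docs: ['ERR_TIMEOUT']
print("Obsolete docs:", obsolete)    # Obsolete docs: []
```

A check like this flags gaps for the third-party testers to investigate; it does not replace their judgment about whether the descriptions themselves are clear.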



21.13 Summary

The goal of MobileApp testing is to exercise each of the many dimensions of software quality for mobile applications with the intent of finding errors or uncovering issues that may lead to quality failures. Testing focuses on quality elements such as content, function, structure, usability, use of context, navigability, performance, power management, compatibility, interoperability, capacity, and security. It incorporates reviews and usability assessments that occur as the MobileApp is designed, and tests that are conducted once the MobileApp has been implemented and deployed on an actual device.

The MobileApp testing strategy exercises each quality dimension by initially examining "units" of content, functionality, or navigation. Once individual units have been validated, the focus shifts to tests that exercise the MobileApp as a whole. To accomplish this, many tests are derived from the user's perspective and are driven by information contained in use cases. A MobileApp test plan is developed and identifies testing steps, work products (e.g., test cases), and mechanisms for the evaluation of test results. The testing process encompasses several different types of testing.

Content testing (and reviews) focus on various categories of content. The intent is to examine errors that affect the presentation of the content to the end user. The content needs to be examined for performance issues imposed by the mobile device constraints. Interface testing exercises the interaction mechanisms that define the user experience provided by the MobileApp. The intent is to uncover errors that result when the MobileApp does not take device, user, or location context into account.

Navigation testing is based on use cases, derived as part of the modeling activity. The test cases are designed to exercise each usage scenario against the navigation design within the architectural framework used to deploy the MobileApp. Component testing exercises content and functional units within the MobileApp.

Performance testing encompasses a series of tests that are designed to assess MobileApp response time and reliability as demands on server-side resource capacity increase.

Security testing incorporates a series of tests designed to exploit vulnerabilities in the MobileApp or its environment. The intent is to find security holes in either the device operating environment or the Web services being accessed.

Finally, MobileApp testing should address performance issues such as power usage, processing speed, memory limitations, ability to recover from failures, and connectivity issues.


Problems and Points to Ponder

21.1. Are there any situations in which MobileApp testing on actual devices can be disregarded?

21.2. Is it fair to say that the overall mobility testing strategy begins with user-visible elements and moves toward technology elements? Are there exceptions to this strategy?


21.3. Describe the steps associated with user experience testing for an app.

21.4. What is the objective of security testing? Who performs this testing activity?

21.5. Assume that you are developing a MobileApp to access an online pharmacy (YourCornerPharmacy.com) that caters to senior citizens. The pharmacy provides typical functions but also maintains a database for each customer so that it can provide drug information and warn of potential drug interactions. Discuss any special usability or accessibility tests for this MobileApp.

21.6. Assume that you have implemented a Web service that provides a drug interaction–checking function for YourCornerPharmacy.com (see Problem 21.5). Discuss the types of component-level tests that would have to be conducted on the mobile device to ensure that the MobileApp accesses this function properly.

21.7. Is it possible to test every configuration that a MobileApp is likely to encounter in the production environment? If it is not, how do you select a meaningful set of configuration tests?

21.8. Describe a security test that might need to be conducted for the YourCornerPharmacy MobileApp (Problem 21.5). Who should perform this test?

21.9. What is the difference between testing that is associated with interface mechanisms and testing that addresses interface semantics?

21.10. What is the difference between testing for navigation syntax and navigation semantics?


