FUJ00119614 - ICL Pathway Acceptance Incident 298 - Resolution Plan v0.8

ICL Pathway
Acceptance Incident 298 - Resolution Plan
Ref: CR/ACD/298
Version: 0.8
Date: 23/9/99
Document Title: Acceptance Incident 298 — Resolution Plan
Document Type: Acceptance Resolution Plan
Abstract: This document contains ICL Pathway’s updated resolution plan for Acceptance Incident 298.

Status: Draft
Distribution:
Expert: Peter Copping
ICL Pathway: Terry Austin, David Hollingsworth, Library
POCL: John Meagher, Min Burdett, Jeff Austin

Author: JCC Dicks & D.C.Hollingsworth

Comments to: Pathway list

Comments by:

© 1999 ICL Pathway Ltd COMMERCIAL IN CONFIDENCE Page I of 29

0 Document control

0.1 Document history

Version  Date      Reason
0.1      20/8/99   Initial draft for comments
0.2      24/8/99   Version for the Expert and workshop 26/8
0.3      2/9/99    Redrafted as a resolution plan
0.4      9/9/99    Material added on longer term incidence rates and defect prevention for future releases; distributed as a draft at Acceptance Workshop 9/9/99
0.5      10/9/99   Statistics updated to CAP 24; amendments to show statistics by counter volumes as a result of Acceptance Workshop 9/9/99
0.6      16/9/99   Summary & outline forward projections added to Section 5.2.4; additional material incorporated into Section 5.5, following review with POCL
0.7      22/9/99   Section 5.4.4 updated to reflect agreement on monitoring process during Oct/Nov. [DN: Partial results for CAP26 have been included in this draft and should be disregarded.]
0.8      23/9/99   Further updates arising from drafting of Schedule 2 Part A of the second supplementary agreement

0.2 Approval authorities

Name         Position                          Signature   Date
JH Bennett   Managing Director
JCC Dicks    Customer Requirements Director
T P Austin   Development Director

0.3 Associated documents

Reference   Vers   Title   Source

0.4 Abbreviations

0.5 Table of contents

1 PURPOSE
2 SUMMARY
3 CRITERIA
4 POCL POSITION
5 PATHWAY POSITION
5.1 PATHWAY WORK PROGRAMME
5.1.1 Short-Medium Term Activities
5.1.2 Medium-Long Term Activities
5.2 STATISTICS FOR THE PERIOD SINCE 29 JULY
5.2.1 High level analysis
5.2.2 System Load Events & Unauthorised Reboots
5.2.3 System Incident Metrics
5.2.4 Summary Position (CAP 25) & Future Projections
5.3 DETAILED INCIDENT ANALYSIS, CATEGORISATION & RESOLUTION
5.3.1 Button No Entry Signs
5.3.2 Suspense Account Print
5.3.3 Virtual Memory Problems
5.3.4 Printer Hanging
5.3.5 Freezing during/after log-on
5.3.6 F1 Twice during log-on
5.3.7 System Busy Message
5.3.8 Query Logged-on Users Message
5.3.9 Miscellaneous Freezing / Usage
5.3.10 Counter Printer problems
5.3.11 APS Problems
5.3.12 OBCS Problems
5.3.13 Counter Printer Busy Problems
5.4 RESOLUTION OF INCIDENT METRICS
5.4.1 Contractual Requirements
5.4.2 Comparison against Industry Norms
5.4.3 Acceptance Position
5.4.4 Resolution Proposal
5.5 IMPROVED DEFECT REMOVAL FOR FUTURE RELEASES
5.5.1 PINICL Analysis
5.5.2 Implications for C3

1 Purpose

This paper seeks agreed ways forward to resolve the system instability issues.

2 Summary

Pathway presents for review the relevant statistics for the period since 29 July, with
particular reference to System Load Events; the progress to date at a detailed level;
and the approach to future measurement, which it is proposed will involve POCL.

3 Criteria

The Criterion cited is 536/1.

“peripheral and input devices supplied as part of the elements of the Service
Infrastructure on which OPS is provided shall be reliable, robust and easy to use”.

4 POCL position

Based upon the minutes of the Acceptance Board Meeting of 18 August 1999, POCL
contended that:

“the proposed rectification plan does not provide an understanding of how the
problems will be resolved by the proposed fixes. It is also unclear when fixes will be
implemented”.

“POCL would need to see the outturn of [the fixes] as this was the only way to
confirm the impact of the changes”.

“evidence from ringarounds suggested the problem could be 50% higher than reported
at the help desk and that there was no clear evidence from Pathway to confirm or deny
this”

At the Acceptance Workshop on 6th September POCL introduced a proposed metric
of 1 system “lock-up” or “crash” (requiring reboot) per counter PC per annum. This is
based upon the achievement of a 95% reduction in stability incidents reported against
week 19 and is said to be broadly in line with system stability statistics from ECCO
and ALPS.


5 Pathway position

5.1 Pathway work programme

5.1.1 Short- Medium Term Activities

The ICL Pathway programme of work to stabilise the current level of system
comprises root cause analysis and resolution of system incidents:

• detailed examination of Horizon System Help Desk call records

• direct telephone contact with post offices to more fully understand the detailed
  nature of the problem as seen by the users

• reconstruction and analysis of problems within Pathway test systems

• testing and automated distribution of fixes as described in the Acceptance Incident
  Analysis of 17 August

The details of this work programme are provided in Section 5.3, which gives an
analysis of the various system stability faults by category, along with details of fixes
applied and associated incident levels pre- and post-fix.

5.1.2 Medium-Long Term Activities

In parallel with this short term activity, a thorough review of the detected faults is
underway to ascertain their nature and to identify what changes may be appropriate to
the ongoing Pathway development and testing approach. Section 5.5 of this document
provides details of the analysis already undertaken in this respect, the initial
conclusions and suggestions for improved defect removal for future releases.

5.2 Statistics for the period since 29 July

5.2.1 High level analysis

The principal measure of systems instability has been the calls made to the Horizon
Systems Help Desk by outlet staff reporting a problem with the functioning of the
system at the outlet.

For a proportion of such calls the incident is resolved by a system unit reboot (a Help
Desk “authorised reboot”). In other cases the Help Desk staff may recommend an
avoidance action that provides a simple workaround to the problem without rebooting
the system unit. In certain cases the Help Desk may also receive a call from an outlet
advising that outlet staff have locally initiated a reboot; such calls are recorded by the
Help Desk and normally provide some additional information relating to the
circumstances of the incident.


5.2.2 System Load Events & “Unauthorised” Reboots

POCL expressed concern over the potential occurrence at outlets of locally initiated
system unit reboots that had not been reported to the Help Desk. ICL Pathway
subsequently mounted an exercise to obtain this information by extracting and
analysing the Windows NT System Event Logs at each outlet. This provides precise
statistics for all System Load Events (SLEs) whatever their cause. By correlating these
load events with reboot instructions issued at the Help Desk it has been possible to
produce metrics for both authorised (via HSH) reboots and unauthorised (via local
office action) reboots. This analysis is continuing on a day by day basis.
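
The correlation described above can be illustrated with a minimal sketch. It is not the procedure Pathway actually used: the record layout, the function name and the 30-minute matching window are all assumptions made for illustration. A load event counts as authorised if it follows a Help Desk reboot instruction for the same counter within the window, and as unauthorised otherwise.

```python
from datetime import datetime, timedelta


def classify_load_events(load_events, reboot_instructions,
                         window=timedelta(minutes=30)):
    """Split System Load Events into authorised and unauthorised reboots.

    load_events: list of (counter_id, datetime) taken from the event logs.
    reboot_instructions: list of (counter_id, datetime) logged by the Help Desk.
    window: assumed maximum delay between an instruction and the reboot.
    """
    unused = sorted(reboot_instructions, key=lambda r: r[1])
    authorised, unauthorised = [], []
    for counter, loaded_at in sorted(load_events, key=lambda e: e[1]):
        # Look for an unmatched instruction for this counter shortly before the load.
        match = next(
            (r for r in unused
             if r[0] == counter
             and timedelta(0) <= loaded_at - r[1] <= window),
            None,
        )
        if match is not None:
            unused.remove(match)  # each instruction explains at most one reboot
            authorised.append((counter, loaded_at))
        else:
            unauthorised.append((counter, loaded_at))
    return authorised, unauthorised
```

A load event with no matching instruction says nothing about its cause, which is why the text that follows treats such events separately from system stability incidents.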

Such unauthorised reboots may occur for a variety of reasons, including:

1. in response to a perceived systems malfunction of some kind, where the clerk
does not contact the Help Desk and initiates such action of his own volition

2. in response to an environmental incident such as a power cut or through
disconnection of the power supply

3. through failure to leave the machines switched on during periods of unattended
operation (e.g. overnight or weekends) with corresponding reboots when
operation restarts, e.g. on a Monday morning

Since the circumstances relating to such incidents are unknown, the incidents cannot
be directly attributed as systems stability incidents and must be excluded from the
detailed analysis in the following section. Both POCL and ICL Pathway are working
to reduce the incidence of such reboots to the core unavoidable events (category 2)
through improved user education and discipline.

5.2.3 System Incident Metrics

The high level analysis of system instability incidents thus includes three categories:

• Authorised reboots (correlated with Help Desk instructions)
• Unauthorised reboots
• Total Help Desk system incidents (including authorised reboots and other calls
  closed via avoidance actions)

Summary totals for the Cash Account Periods 19-26 are shown in the following charts.

(Note that the total for CAP 26 is provisional and the final figure may be subject to
minor variation once all incidents from the 22nd September have been fully analysed.)


[Chart: Reconciled Totals by Cash Account Period, CAP19 to CAP26. Series: HSH System Incident Calls; HSH Authorised Reboots; Unauthorised Reboots.]

Note that (i) CAP23 included a Bank Holiday and a planned (authorised) reboot of all
counters, by request to outlets; (ii) “unauthorised” reboots have increased in CAP24
and CAP 25 due to the installation of new outlets showing up in the total. (This trend
is expected to continue.)

A more detailed scheme of incident analysis was instigated by Pathway from CAP23,
to facilitate focused incident analysis and resolution. This places emphasis on that
class of incidents which requires a system reboot. From week 24 an individual
reconciliation of incident totals between Pathway and POCL has been carried out, with
inclusion of a category for “disputed” items which involve an HSH call but not a
reboot. For week 23 a retrospective adjustment has been added to the weekly total to
support comparison between the two weeks. However, direct comparison with earlier
weeks is not valid since the totals were not reconciled in this way.

The following chart shows the same data, with the planned reboot data removed
(31/8/99), CAP 23 adjusted for the bank holiday in terms of incident rate per day, and
the numbers for each week adjusted for the volume of counters installed. This shows
the incidence of the same measurements expressed as a rate of occurrence per counter
per week.
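
The adjustments described above amount to a simple normalisation, sketched below. The sketch assumes a standard trading week of six days (five in a bank holiday week); that figure and the function name are assumptions for illustration and are not taken from the source.

```python
def rate_per_counter_per_week(incidents, counters, trading_days=6,
                              planned_reboots=0, standard_days=6):
    """Normalise a weekly incident total to a rate per counter per week.

    incidents: raw total for the Cash Account Period.
    counters: number of counters installed in that period.
    trading_days: days actually traded (assumed 5 in a bank holiday week).
    planned_reboots: planned reboots to exclude from the total.
    standard_days: assumed length of a normal trading week.
    """
    adjusted = incidents - planned_reboots          # remove planned reboot data
    per_day = adjusted / trading_days               # incident rate per day
    return per_day * standard_days / counters       # scale to a standard week
```

With hypothetical figures, 350 incidents including 100 planned reboots in a five-day week across 1,000 counters gives the same rate (0.3) as 300 incidents in a normal six-day week.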

[Chart: Reconciled Totals expressed as a rate per counter per week, CAP19 to CAP26. Series: HSH System Incident Calls; HSH Authorised Reboots; Unauthorised Reboots.]

From the above analysis it can be seen that there is a reducing trend, particularly
towards the end of the current period. The chart following shows the incidence of
HSH calls per counter per week relating to systems incidents. The level of HSH
(authorised) reboots is now at the level of approximately 0.5 per month per counter,
below the first Pathway target (1.0) and the proposed threshold for classification as
medium severity.
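
The rates quoted in this document mix per-week, per-month and per-annum bases. As a reading aid, the conversions can be sketched as follows. The sketch assumes a 52-week year, with each Cash Account Period treated as one week, and 12 equal months; neither assumption is stated in the source, and the function names are illustrative only.

```python
WEEKS_PER_YEAR = 52    # assumption: one Cash Account Period per week
MONTHS_PER_YEAR = 12


def per_week_to_per_month(rate):
    """Convert incidents per counter per week to per counter per month."""
    return rate * WEEKS_PER_YEAR / MONTHS_PER_YEAR


def per_month_to_per_annum(rate):
    """Convert incidents per counter per month to per counter per annum."""
    return rate * MONTHS_PER_YEAR


def per_annum_to_per_week(rate):
    """Convert incidents per counter per annum to per counter per week."""
    return rate / WEEKS_PER_YEAR
```

On these assumptions, 0.5 per month per counter corresponds to 6 per counter per annum, and 1 per counter per annum corresponds to roughly 0.019 per counter per week.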

The increase in incidents in CAP 23 is attributable to the introduction of the “System
Busy” indicator and a one-off fault introduced into OBCS which both resulted in a
significant number of new calls.

The following chart shows the HSH system incident calls separated into those
requiring a system reboot and those dealt with by advice or other action.

[Chart: HSH system incident calls per counter per week, CAP19 to CAP26. Series: HSH Authorised Reboots; HSH Advice or Other Action.]

A number of analysed HSH calls have not been resolved between ICL Pathway and
POCL and are listed as disputed incidents. Such calls include simple workarounds to
known (predictable and stable) operational problems, and a few other incident types,
such as handling of printer jams and related printing conditions, which have not been
accepted by ICL Pathway as indicative of a system stability problem. The following
chart includes the above totals with disputed items shown as a separate category.

The principal incident in the disputed category is that of button locking (see section
5.3.1). Section 5.4.4 makes it clear that, for the period of monitoring, those button
locking incidents that result in reboots or authorised workarounds will be counted.
HSH calls other than those which result in reboots or workarounds due to button
locking will not be monitored after CAP26.


[Chart: HSH system incident calls per counter per week, CAP19 to CAP26. Series: HSH Authorised Reboots; HSH Advice or Other Action; Disputed Items.]

5.2.4 Summary Position (CAP 25) & Future Projections

The incident types have been grouped into 13 categories and a detailed analysis is
provided in section 5.3.

Problems Eliminated

The following incident types have been eliminated with no noted recurrences:

• Back office printer hanging on final cash account production (Section 5.3.4)
• The “one-off” OBCS problem (Section 5.3.12)
• Querying logged-on users problem (Section 5.3.8)
• Improvements in performance of the suspense account print (Section 5.3.2)

Problems Significantly Improved

The following incident types have seen significant improvement but have not been
totally eliminated:

• Button locking problems (Section 5.3.1) have been reduced to a small number of
  incidents, which can be avoided by workaround.
• Virtual memory and other error messages (Section 5.3.3)
• APS application problems - associated with printing and recovery (Section 5.3.11)

Key Outstanding Problems

The following problem areas have seen only minor improvements and are the
principal subjects of current analysis and fixes:

• Freezes during / after log-on and occasionally in other circumstances (Sections
  5.3.5, 5.3.6 & 5.3.9)

• Counter printing issues (Sections 5.3.10 & 5.3.13)

• System Busy incidents (Section 5.3.7), although these are being re-categorised into
  the underlying problems

The history of the main types of incident is shown in the chart below, shown on the
same scale as individual incident rates in section 5.3 (incidents occurring per counter
per annum). This indicates the history of when significant systems problems have
been experienced and eliminated.

[Chart: History of the main incident types, in incidents per counter per annum. Categories: Others; System Busy; Application Problems; Button Locking; Freezing; Back Office Printing; Counter Printing.]

This chart shows that the three significant incident categories as of CAPs 24-26 are
counter printing, system freeze conditions and system busy. (It is also apparent that
various AP problems in CAPs 19-23 were related to counter printer incidents, giving
rise to an understatement of such printer conditions during these weeks.)

Future Incident Rates

These are based upon an assessment of the current known problem areas with fixes
either in preparation or distribution, plus an expectation of smaller “second phase”
improvements in the longer term to tackle residual incident types in significant
categories.

The main short term fixes assessed include:

1. The “Double F1” fix to relieve system freezes during log-on (week 25) plus
diagnostic to provide more details of other freeze conditions (should reduce 40-
50% of freeze conditions)

2. The fix to the Riposte Peripheral Server and related incidents involving 2nd page
GIRO reports

3. Elimination of print contention by locking “Previous” during print format
operations

(2 & 3 should eliminate up to 50% of counter printing incidents and are expected in
CAP 26/27)


4. Alleviation of button locking problems when an EPOSS receipt is printed after an
APS receipt (should avoid workarounds for a substantial subset of incidents) —
issued in CAP 25

5. A revised version of System Busy based upon Riposte Desktop processor time
rather than total system time (this should eliminate some instances where system
busy is active because of background tasks rather than mainstream counter
operations) — issued in CAP 25

The future projections are separated into near term values for CAP 26 and 27, based
upon extrapolation of the HSH authorised reboot levels, and a set of medium term
values based upon system incident rates. All projections were made during CAP 25
based upon the actual field data obtained up to and including CAP 24. The
(provisional) actual figures for CAP 25 are also shown.

Near Term Projections

[Chart: Projected Reboot Incidence, CAP 24 to CAP 31, with a marked reference level (label partly illegible, apparently "1 per 3 months level").]

Medium Term Projections

A similar chart for systems incident rate and an extrapolation for actual HSH call
volumes are shown below.

[Chart: Projected Incident Rate (Per Counter Per Annum), CAP 24 to CAP 34.]

The single most significant drop is associated with counter printing fixes expected for
issue during CAP 26/27 and showing in the projection for CAP 28.

Taking account of the projected increase in counter population this leads to the
following outline profile of incidents at the HSH.

[Chart: Projected Incident Volumes (Per Week), CAP 24 to CAP 34.]

The projected incident volume remains essentially stable over the near term, with
reductions in incident rate matched by increased counter volume; during October and
November there is a steady increase as the rate of counter build-up exceeds the
incident reduction rate.
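
The relationship between incident rate, counter population and call volume can be made concrete with a short sketch. The figures in the usage note are hypothetical and are not the projections in this plan; the function name and the 52-week year are likewise assumptions.

```python
def projected_weekly_volume(rate_per_counter_per_annum, counters):
    """Expected HSH incidents per week for a given rate and counter population.

    Assumes a 52-week year; both arguments are inputs to the projection,
    not values taken from this document.
    """
    return rate_per_counter_per_annum * counters / 52
```

For example, a rate of 5.2 per counter per annum across 1,000 counters gives about 100 incidents per week. If the rate falls to 4.0 while the population doubles to 2,000 counters, the weekly volume rises to about 154 even though each counter is more stable.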

5.3 Detailed Incident Analysis, Categorisation & Resolution

To facilitate analysis and resolution, system incidents have been filtered into
individual categories, each typically associated with one particular problem area of
system operation. To provide confidence in the improving stability of the system,
incidents are recorded as daily totals within each category, to allow correlation against
the dates at which particular fixes were issued to resolve specific problems. This
analysis includes all system stability incidents whether resolved by a system reboot or
by procedural workaround.

As detailed investigation of incidents proceeds, certain faults may be grouped together
into a new category. Initially 12 categories were identified. At week CAP24 a number
of system busy incidents (category 7) have been categorised differently as the detail of
the fault has been understood. Certain incidents previously recorded under “system
busy” have been identified as hang during/after log-on (category 5) and a specific
problem associated with the counter printer during busy conditions has been created
(category 13).

From version 0.5 of this document, the incident count has been based against the
number of counters installed and quoted as average incidents per counter per annum.
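
The comparison of daily category totals against fix dates can be sketched as below. This is an illustrative reading of the approach rather than Pathway's actual tooling; the data layout, the function name and the sample figures in the usage note are hypothetical.

```python
from datetime import date


def fix_effect(daily_totals, fix_date):
    """Return (mean daily incidents before the fix, mean from the fix date on).

    daily_totals: dict mapping a date to the incident count for one category.
    fix_date: the date on which the fix was issued to the live estate.
    """
    before = [n for d, n in daily_totals.items() if d < fix_date]
    after = [n for d, n in daily_totals.items() if d >= fix_date]

    def mean(values):
        # An empty period is reported as zero rather than raising an error.
        return sum(values) / len(values) if values else 0.0

    return mean(before), mean(after)
```

With hypothetical daily totals of 6, 4 and 5 before a fix and 1, 0 and 2 afterwards, the function returns (5.0, 1.0). A drop of this kind that coincides with the fix date is the evidence the charts in this section are intended to show.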

5.3.1 Button No Entry Signs

From time to time under normal system operation Horizon buttons are “locked” to
prevent user entry to the particular function at that point in the menu navigation. Such
locked buttons are represented to the user by a “no entry” sign across the button.
Examples of legitimate usage of locked functions include:

• prevention of more than one user selecting cash account functions or producing
  certain types of daily printed report

• prevention of logout or entry to training mode when a suspended session exists

At LT2 substantial changes were made to button locking particularly to prevent access
to conflicting functions during cash account and printing functions. The logic
associated with button locking is complex and typically requires combinatorial
analysis of multiple conditions.

Fixes were issued to correct the majority of incidents recorded within this category, by
correcting the complex logic associated with button locking. A minor residual usage
problem has been identified, which results in button locking if the printer goes offline
immediately following a SU balance report. This problem has a simple workaround
and does not require a reboot.

The history of button locking incidents is shown below.

[Chart: Button No Entry Signs, CAP19 to CAP26, annotated with fixes WP 5138 & 5298 (issue date illegible).]

Note that reported incidents tend to be higher on cash account days because of a
higher incidence of legitimate button locking associated with cash account and office
printing functions. A number of disputed items (incidents which do not require
reboot) are excluded from week 23/24. With these included the average incident rate
is running at approximately 1 — 1.5 per counter p.a.

5.3.2 Suspense Account Print

The suspense account was taking an excessive time to print under certain
circumstances, giving the appearance of a system hang. A fix to improve the
performance was issued in two parts. The history of such incidents is provided below.


[Chart: Suspense A/C Print Hang, CAP19 to CAP26.]

5.3.3 Virtual Memory Problems

Two problems have been observed which result in progressive memory leakage. (In
these circumstances application routines are obtaining virtual memory from Windows
but not freeing it correctly after use, leading to eventual virtual memory exhaustion.)
The reported symptoms include very slow system operation, virtual memory messages
being displayed and, occasionally a Windows shutdown and reboot. The principal
problem was memory leakage associated with the Print Monitor routine, which
resulted in a substantial loss of virtual memory during print operations. This was fixed
in WP 5408. A further residual, but relatively minor, problem associated with the cash
account reprint function has been diagnosed. A fix (lower priority) will be issued for
this in the future.

[Chart: Virtual Memory Incidents (including Slow M/C & Dr Watson), CAP19 to CAP26. Fix annotations partly illegible.]

5.3.4 Printer Hanging

Several problems were detected which result in back office printer hang-ups under
various specific circumstances. A fix for one class of problem, associated with
memory leakage, has already been distributed as part of WP 5408. This has reduced
the average incidence of such hang-ups. A second problem associated with printing
the 2 final copies of the cash account was identified, using results obtained from a
diagnostic fix distributed to the live estate. The fix to the cash account print routine
was issued on 7 September. There have been no occurrences during the following
two weeks (CAPs 24 & 25).

[Chart: Cash A/C Print Hang, CAP19 to CAP26, annotated with fix WP 5408 (remaining annotations illegible).]

The residual count shown under CAP 24 relates to incidents from Thursday 2nd
September.

5.3.5 Freezing during /after log-on

A number of incidents were observed in which the system froze after user log-on to
Riposte. On detailed investigation these were all connected with the Riposte (35 day)
message archiving procedure. After log-on various Riposte checks are called to trace
message sequences for integrity and (potential) recovery requirements. It was found
that certain of these routines were attempting to check message sequences which lay
beyond the message archiving window, resulting in system lock-up when the
messages could not be accessed. Three fixes were issued covering APS recovery
routines and Stock Unit checking.

[Chart: Freezing during/after Log-on, CAP19 to CAP26, annotated with fix WP 5406 (issue date partly illegible).]

An occasional occurrence of freezing during log-in (prior to entering Riposte) has also
been detected and this residual error is under investigation. Some instances of System
Busy incidents have been discovered to relate to freezing after log-in, which accounts
for the significantly higher incident rate in CAP 24. Note that the “Double F1”
problem (immediately following section) is also related and has had a significant
effect on incidents during week 25.

5.3.6 F1 Twice during log-on

This was a specific condition associated with incorrect handling of double keystroke
“F1” during log-on (to navigate directly to “Serve Customer”) which could result in a
system hang. A fix was issued for this (WP5406), which left a residual problem with
certain OBCS book operations. A re-implemented fix was issued to cure this - see
Section 5.3.12. A second fix to eliminate a small residual occurrence of the “F1”
condition is under test at the time of writing and will be issued to the live estate
during week commencing 13th September.

[Chart: Freeze following F1 twice during Log-on, CAP19 to CAP26.]

5.3.7 System Busy Message

This was introduced following discussion (via CR & Pathway CP2134) to provide
visible indication to the user when the system is busy, particularly during longer,
complex operations such as processing the cash account. This was distributed in WP
5407. The introduction of this message has itself resulted in a number of Help Desk
calls, which have also been tracked and analysed. An improved version of the busy
monitor routine was distributed (week commencing 6th September); this monitors only
resource usage associated with the Riposte desktop and invoked applications. (The
original utility monitored the total processor usage and could display the hourglass
when background routines such as NT or Tivoli functions were consuming resource.)
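
The difference between the two versions of the busy monitor can be illustrated with a minimal sketch. It is not the Pathway implementation: the process names, the 80% threshold and the function names are all hypothetical, and the input is simply a mapping from process name to its share of processor time.

```python
def is_busy_total(cpu_by_process, threshold=0.8):
    """Original behaviour as described: busy if total processor usage is high."""
    return sum(cpu_by_process.values()) >= threshold


def is_busy_desktop_only(cpu_by_process, desktop_processes, threshold=0.8):
    """Revised behaviour as described: count only the desktop and its applications."""
    desktop_share = sum(
        share for name, share in cpu_by_process.items()
        if name in desktop_processes
    )
    return desktop_share >= threshold
```

For a hypothetical sample in which a background task takes 75% of the processor and the desktop 10%, the first check reports busy and the second does not, which matches the class of spurious indications the revised monitor was intended to remove.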

[Chart: System Busy Message (exc printer/log-on), CAP19 to CAP26. Fix annotations partly illegible; one reads "WP 5446 - 26/8".]

A minor problem has been detected with the operation of the Busy Monitor, in that
after a few seconds it can partially obliterate a system message displayed on the screen
if there is a printer problem when printing a Giro transaction. (This can occur when
EPOSS is cycling awaiting the user response before continuing.) The touch panel is
not disabled under these circumstances and the Help Desk will advise users to
complete the response to the printer prompt, thereby allowing normal operation to
continue without reboot. A “fix” to provide reworking of the Giro printer dialogue
will be issued in due course. From CAP 24 specific problems associated with printer
busy and log-on freezes have been separated into their own categories.

Note that the clerk may legitimately return from a screen to a previous one, having set
off a print or transaction log query, and then undertake a second or third intensive
transaction. A number of occurrences of the system busy condition are believed to
result from such clerk-initiated sequences. A block on the “previous” button is being
investigated to preclude such behaviour.

5.3.8 Query Logged-on Users Message

This was a specific problem that occurred during various operations when a user
incorrectly received a message querying details of logged-on users. This was fixed in
WP 5406, which has eliminated the problem.

[Chart: Query Logged on Users Message, CAP19 to CAP26.]

5.3.9 Miscellaneous Freezing / Usage

There have been a few occurrences of miscellaneous screen freezing during usage,
mostly within Stock Unit declaration and balancing operations. A few reported
occurrences were associated with virtual memory problems and are resolved with the
fix identified in section 5.3.3. Several occurrences resulted from attempts to access
message sequences beyond the 35-day archiving period and other occurrences are
associated with multiple button pressing.

[Chart: Miscellaneous freezing not in other categories, CAP19 to CAP26.]

Diagnosis continues on these and appropriate fixes will be issued in due course.

5.3.10 Counter Printer problems

Two specific problems have been identified with counter printer operations. One was
associated with the failure to print a second APS receipt, resulting in a subsequent
system hang; this was fixed as part of WP 5406.


[Chart: Counter Printer Problems exc. System busy, CAP19 to CAP26, annotated with fix WP 5406 (issue date partly illegible).]

A second problem, associated with incorrect handling of printer failure conditions
within the Giro transaction printing routine, has been identified and work is
progressing on detailed diagnosis and resolution.

5.3.11 APS Problems

A number of APS application problems associated with receipt issue were identified
(including the second receipt problem identified above).

In certain circumstances a failure in the APS receipting routines could leave buttons
locked and a transaction on the stack. This was also fixed as part of WP 5406. A
further fix was issued as part of the system freezing work (WP 5208) to specifically
identify to the user the presence of APS recovery operations since this could give the
appearance of a system freeze.

[Chart: APS issues; x-axis CAP19 to CAP26; annotated with WP 5406]

As can be seen, the overwhelming majority of APS related problems have now been
eliminated.

5.3.12 OBCS Problems

The “Double F1” fix (see section 5.3.6) introduced a further problem, which resulted
in jumping screens during OBCS transactions (rather than normal screen navigation).
This showed up in Help Desk call analysis as a significant problem following the
“Double F1” fix. The majority of the problems were addressed by WP 5490; a fix
relating to one further circumstance was included in WP 5405.

[Chart: OBCS problems; x-axis CAP19 to CAP26; annotated with WP 5490 and the “Double F1” fix]

There have been no further recurrences of the problem.

5.3.13 Counter Printer Busy Problems

One particular class of problem shown up by the “system busy” indicator relates to
a continuing counter printer busy condition returned to the application. These have
now been classified as a particular incident type in their own right (from CAP 24).

[Chart: Counter Printer Busy; x-axis CAP19 to CAP26]

A fix for the Riposte Peripheral Server is currently under test and is expected to be
issued to the live estate during CAP 26.

5.4 Resolution of Incident Metrics

Pathway notes the POCL proposed metric of 1 system “lock-up” or “crash” (requiring
reboot) per counter PC per annum.

The Pathway position is that this is an unrealistic and unwarranted requirement to be
placed on the Pathway Solution.

5.4.1. Contractual Requirements

There is no contracted Service Level which Pathway is required to meet relating to
lost time associated with OPS system stability incidents. (Lost time at the counter may
contribute to an increase in the volume of fall-back transactions which may fall within
the service reporting requirements of individual services - EPOSS, APS and OBCS.)

5.4.2. Comparison against Industry Norms

The POCL proposed level is unrealistically high when compared against normal
operational usage of complex distributed systems based upon Windows NT. Typical
industry norms of 1 event per month are reported. It is noted by both parties that a
periodic planned “preventative” reboot, outside prime usage time, may be a sensible
measure to help reduce the incidence of unplanned reboots.

5.4.3. Acceptance Position

AI 298 was raised against Requirement 536, on the basis of Live Trial usage
experience.

The planned acceptance testing associated with this Requirement was fully completed
with no outstanding issues. This comprised a combination of detailed technical test
and a review of the technical specifications of the relevant equipment.

ICL Pathway has accepted that there have been some incidents at outlets, which have
affected certain aspects of system operation. As detailed within Sections 5.2 and 5.3
there has already been a significant reduction in such incidents from the earlier levels
in June and July when this AI was raised. Pathway set an internal target of one
(authorised) reboot per month per counter and proposes that achievement of this level
reduces the incident to a medium severity. The levels of lost time associated with the
current incident rate fall well within this yardstick.

5.4.4. Resolution Proposal

POCL has indicated a desire to associate this incident with a further metric which
would represent an “acceptable” level of operation with respect to the occurrence of
system incidents prior to the full outlet rollout.

ICL Pathway will use all reasonable endeavours to reduce the incidence of
interruptions to normal counter operations resulting from the use of the OPS platform
and the Riposte desktop functions. Pathway has set a longer term (6 months) internal
target of 1 Help Desk authorised reboot incident per counter per 4 months measured
over the actual population of rolled out counters. Workarounds taking longer than
four minutes will count as reboots. This represents a fourfold improvement beyond
the initial target.

Monitoring during Oct/Nov

The success criteria in relation to this AI to be evaluated in November in relation to
the continuation of national roll-out in January 2000 should have the following
characteristics:

• The number of outlets installed within the live estate at 1st October, providing this
number is at least 750, or if less than 750, the number at the end of the week
during which 750 outlets is achieved.

• Incidents to be quantified in “units” where:

  • Help Desk authorised re-boots and Office Snapshot Print Previews to count as
  one unit;

  • Other workarounds to remove invalid no-entry signs to count as half a unit;

  • New workarounds to remove the need for re-boots (such workarounds to take
  less than 10% of the combined reboot and recovery time) to count as a half
  unit (those exceeding 10% to count as one unit).

• The rate of occurrence measured over the 4-week period to mid-November 1999
(CAP 31-34) should average no more than one unit per counter position per 3
months.
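
The unit weighting and rate test above can be sketched as a short calculation. This is a minimal illustration only and not part of the agreed monitoring process: the function names, the incident labels, the treatment of a workaround taking exactly 10% as one unit, the conversion of 3 months to 13 weeks, and the sample figures are all assumptions made for the example.

```python
# Sketch of the Oct/Nov "units" calculation. All names are hypothetical.

WEEKS_PER_THREE_MONTHS = 13  # assumption: 3 months taken as 13 weeks


def incident_units(kind: str, workaround_fraction: float = 0.0) -> float:
    """Return the unit weighting for a single incident.

    workaround_fraction is the time taken by a new workaround, expressed as a
    fraction of the combined reboot and recovery time.
    """
    if kind in ("reboot", "snapshot_print_preview"):
        return 1.0
    if kind == "no_entry_workaround":
        return 0.5
    if kind == "new_workaround":
        # Assumption: exactly 10% is treated as one unit (the plan does not say).
        return 0.5 if workaround_fraction < 0.10 else 1.0
    raise ValueError(f"unknown incident kind: {kind}")


def allowed_units(counter_positions: int, weeks_measured: int = 4) -> float:
    """Units permitted over the measured period at one unit per counter per 3 months."""
    return counter_positions * weeks_measured / WEEKS_PER_THREE_MONTHS


def meets_criterion(total_units: float, counter_positions: int,
                    weeks_measured: int = 4) -> bool:
    return total_units <= allowed_units(counter_positions, weeks_measured)


# Illustrative figures only; they are not taken from the plan.
incidents = (
    [("reboot", 0.0)] * 300
    + [("no_entry_workaround", 0.0)] * 100
    + [("new_workaround", 0.05)] * 40
)
total = sum(incident_units(kind, frac) for kind, frac in incidents)
print(total, allowed_units(1300), meets_criterion(total, 1300))
```

With these sample figures, 1300 counter positions measured over 4 weeks would be allowed 400 units, against 370 units recorded.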

In addition, ICL Pathway will be entitled to continue the good business practice of
planned reboots outside working hours not to exceed one per month per counter
position.

Ongoing improvement and longer term

It is important to recognise that ICL Pathway is strongly motivated to reduce such
incidents as they directly affect its own costs through staffing levels required at the
Help Desk. The Pathway Help Desk model and projected staffing levels are consistent
with this approach. For ICL Pathway this equates to a requirement to deal with up to
700 such calls per week as the outlet population increases over the next six months
(and the incident rate falls). Clearly Pathway will be strongly motivated to seek any
further possible reductions in incidents to reduce the corresponding call rate applied to
a full estate.

For POCL the achievement of the ongoing target of 1 reboot per 4 months would
result in a predicted loss of service of the order of 6.25 minutes per counter per month.
For a typical outlet operational period of 42 hours per week this equates to a loss of
service of < 0.06% per counter. In reality lost customer service time is likely to be
significantly less than this since the above calculation:

(i) makes no allowance for the possibility of directing customers to other counters
during an incident

(ii) makes no allowance for that proportion of incidents which occur during back
office processing and have no direct impact on customer service.
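
The loss-of-service figure quoted above can be reproduced with a few lines of arithmetic. The sketch below is illustrative: the function name is hypothetical, a month is assumed to be 52/12 weeks, and the 25 minutes per incident is inferred from the quoted figures (6.25 minutes per month at 1 reboot per 4 months) rather than stated in the plan.

```python
# Reproduce the "< 0.06% per counter" loss-of-service estimate.

WEEKS_PER_YEAR = 52
MONTHS_PER_YEAR = 12


def lost_service_fraction(lost_minutes_per_month: float,
                          hours_open_per_week: float) -> float:
    """Fraction of counter opening time lost per month."""
    weeks_per_month = WEEKS_PER_YEAR / MONTHS_PER_YEAR  # assumption: average month
    open_minutes_per_month = hours_open_per_week * 60 * weeks_per_month
    return lost_minutes_per_month / open_minutes_per_month


# Figures quoted in the plan: 6.25 minutes per counter per month, 42 hours per week.
fraction = lost_service_fraction(6.25, 42)

# Inferred, not stated: 1 reboot per 4 months at 6.25 minutes per month
# implies about 25 minutes of lost service per incident.
implied_minutes_per_incident = 6.25 * 4

print(f"{fraction:.4%}")  # prints 0.0572%
```

On these assumptions the result is consistent with the stated bound of less than 0.06% per counter.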

The incident analysis which has been jointly undertaken to date and the improved
level of understanding of system usage within the live outlets both suggest that the
target will be met within the projected 6 months. The most recent rate of authorised
reboot incidents is approaching half the initial target level, leaving a further required
halving to reach the final target. Pathway has undertaken analysis of several
outstanding incidents and diagnosed the detail of the problem. Software fixes will be
progressively released following regression testing which will see a further reduction
on the current incident rates towards the target. Hence the progression towards the
target is already substantially underpinned by known, diagnosed problems which are
awaiting fix issue.

5.5 Improved Defect Removal for Future Releases

The level of testing conducted on the Pathway solution has by any standard been
exceptionally high (over 100 dedicated testers, a staggering array of test environments,
at a cost of tens of millions of pounds). The large, complex and distributed nature of
the system demands a sophisticated multi-layered approach to testing and integration.
The strategy was developed and agreed in conjunction with the sponsor organisations at
the outset, and was independently assessed during the Treasury review as being
‘leading edge’. It has been maintained in the light of experience of Release 1, and is
currently again under review in respect of Release 2 (CSR). Of particular importance
here is the experience of the Live Trial period, and the lessons that may be learned to
further improve the Defect Removal rate for future releases, and so reduce the number
of incidents encountered in the Live Estate.

5.5.1 PINICL Analysis

A review is underway of all the PinICL fixes applied across the whole of the Counter
systems for the Live Trial Period. This period was split into 3, known as LT1, LT2,
and CSR. Initial findings, measuring up to 31/08/99, indicate that a total of 133
PinICLs were involved. Of these, 2 were data related (including 1 on POCL Reference
Data), 1 was build related, and 2 were purely administrative, to introduce the
decommissioning of BPS, leaving 128 software faults to be considered in all. (It may
be of interest to note here that about 30 of these were for BPS, although this does not
have a material bearing on the analysis.)

Of these 128 faults, just 50 were actually raised from activity in the Live Trial. The
other 78 were all in fact raised during the course of testing. (Most of these were found
long before the Live Trial in Pathway’s System Test and Integration Test stages or in
the MOT/E2E test stages immediately before the Live Trial. These were the subjects
of agreed deferral via the KPR process, to allow for their controlled introduction
during the course of the Live Trial, to avoid destabilisation. A small number were
raised after the KPR, as a result of Pathway’s ongoing regression testing.)

The records for these PinICLs have been analysed to determine the nature of the
defects concerned. As a result they have been categorised accordingly, to help assess
how best the Development Lifecycle, and in particular the testing and integration
approach, may be revised to best detect such defects earlier, and so better protect the
Live service. A large number of low level classifications were used, which can be
summarised into the following high level categories:

1. Usability/Robustness:

MMI, Menus, Button locking, No-Entry signs, Double key strokes, Cosmetics,
Enforcement of correct practice, Operational usability, Correct error handling, etc.

2. Stability/Performance:

Screen freezes, Printer hangs, Memory leaks, Blue screens, NT messages,
Archiving anomalies, Function performance.

3. Application Logic:

Plain software bugs.

Initial findings indicate that the 128 fixes applied during the Live Trial (78 faults
found in Testing and 50 faults found in Live) can be categorised as follows:

Category                 Testing Faults   Live Faults
Usability/Robustness           38              38
Stability/Performance          14               5
Application Logic              26               7
Total                          78              50

(To set these figures in context, overall testing has trapped several thousand defects,
commensurate with the great size and complexity of this system.)
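
The figures in this section can be cross-checked with a short tally. The sketch below is only a consistency check on the numbers quoted above; the variable names are hypothetical and the percentage shares are derived here rather than stated in the plan.

```python
# Consistency check on the PinICL figures quoted in section 5.5.1.

faults = {
    "Usability/Robustness": {"testing": 38, "live": 38},
    "Stability/Performance": {"testing": 14, "live": 5},
    "Application Logic": {"testing": 26, "live": 7},
}

total_pinicls = 133
non_software = 2 + 1 + 2  # data related, build related, administrative
software_faults = total_pinicls - non_software

testing_total = sum(counts["testing"] for counts in faults.values())
live_total = sum(counts["live"] for counts in faults.values())

# Share of the Live faults falling into each category (derived, not quoted).
live_share = {cat: counts["live"] / live_total for cat, counts in faults.items()}

print(software_faults, testing_total, live_total)
for category, share in live_share.items():
    print(f"{category}: {share:.0%} of Live faults")
```

The tally confirms that the category counts sum to the 78 Testing faults and 50 Live faults, and that together they account for the 128 software faults.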

The following conclusions can be drawn:

• The overall approach has been extremely successful in reducing the exposure of
the Live Estate to a very small residue of defects remaining in the system (which
the industry recognises can never be entirely eliminated, although there is always
room to improve).

• The incidence of defects discovered is demonstrably reducing over time,
indicating a steady improvement in overall system stability.

• There is clear evidence that the majority of defects in the Usability/Robustness
category have been trapped during testing, despite this being a notoriously
difficult and expensive problem domain to address exhaustively through testing.

• Nonetheless, the majority of defects escaping capture during test are in the
category Usability/Robustness, suggesting that there really is no substitute for
genuine Live exposure to flush out these types of defect (as per generally
accepted industry wisdom). It also suggests that this is the main area to target for
future improvement, offering more scope. Further to this, the report from the
EPOSS Defensive Test exercise was encouraging. It indicates that such short
focussed test activities, concentrating on particular aspects of system usage, can
have considerable success in removing defects of both the Usability/Robustness
and Stability/Performance categories.

  (The EPOSS Defensive Test exercise was an additional test initiative introduced
  by Pathway to satisfy test objectives relating to Usability/Robustness, which it
  was recognised had not been fully met by the Model Office exercise and the
  EPOSS Usability Trial.)

• Testing has eradicated all but a very few remaining Stability/Performance
defects, albeit that these can have a disproportionate effect on the Live Estate,
further suggesting the importance of a Live Trial or equivalent period, where the
impact on the business can be limited and controlled. The fact that a significant
number of such defects were still being discovered in these late testing stages
indicates that there is potential for improvement here also. It suggests that a
more detailed analysis of the precise circumstances of these defects should be
conducted to determine any common factors and to assess whether any benefit is
to be had from specific testing actions earlier in the lifecycle.

• Testing has eradicated all but a very few remaining Application Logic defects.
There is little scope for improvement in this area, other than the perpetual goal
of earlier discovery.

A further observation arising from the analysis would be that many of the PinICLs
arising in the Live Trial system had in fact been the subjects of earlier PinICLs raised
during the course of Testing. This is a common phenomenon. Typically it comes
about because for certain classes of defect (particularly where it is related to timing, or
multiple streams of activity in combination) the symptoms revealing the defect cannot
easily be reproduced until the underlying defect is properly understood. Because it
cannot be reproduced the underlying defect cannot be properly diagnosed. The faults
are then often put down to some flaw in the test environment, or the wrong code
versions being used, and the PinICL is closed ‘unable to reproduce’. There is no easy
remedy.

5.5.2 Implications for CSR+

A full review of the testing conducted for Release 2 (CSR) has already been
conducted and a proposal document, “Revisions to the Testing & Integration
Approach for Pathway Release CSR+”, has been drafted. Based on the findings above,
ICL Pathway and POCL will jointly consider the following proposals and, as agreed,
include them within a definitive version of that document. This review will take
place by 30/10/99 and the definitive version of that document will be published by
24/11/99 and brought into effect from that date.

a) Analyse the precise circumstances of the defects in the Stability/Performance
category. Identify any common factors.

b) Analyse the precise circumstances of the defects in the Usability/Robustness
category. Identify any common factors.

c) From (a) and (b) above, establish any potential test points for existing testing
stages, and, as reasonably necessary, extend their respective
objectives/review-checklists accordingly. (Include Unit Test, System Test, and
Conformance Test.)

d) As reasonably necessary, extend Code Review checklists to cover the specifics
from (a) & (b) above, with particular emphasis on the handling of exception
conditions.

e) Adopt the principles of the EPOSS Defensive Test exercise for wider application,
and in particular to mount earlier exercises specifically targeting those attributes
identified in (a) and (b) above. It is important that such test activities include the
involvement of design-aware ‘experts’ having intimate knowledge of system
areas subject to test and capable of targeting potential areas of weakness.
Involvement of Users should also be considered to address usability related
aspects.

f) Work with POCL in determining appropriate and agreeable alternative(s) to the
Live Trial for future releases, to allow each new product to be exposed to
substantial Live use, but with limited business impact, for an appropriate period
of time prior to general (national) release.

It should be noted that CSR+ has already benefited from the revisions included in the
Testing Strategy and will, in due course, from the additions listed above. A lifecycle-
wide review was also conducted earlier in the year, which resulted in a major
reorganisation of the Systems and Programmes Directorates into the new
Development Directorate. Amongst the initiatives introduced at that time were many
which addressed lessons learned from earlier releases to improve the Design and
Development stages for CSR+ and beyond. Of particular relevance here are:

a) The formation of Delivery Units, focussing on particular Business and
Infrastructure areas, made up of mixed discipline teams combining Design,
Development, Unit Test, and System Test, and so promoting higher product
quality levels and greater lifecycle awareness. As each Delivery Unit spans all the
platforms supporting the end to end business applications within their respective
areas, this will also help to address the risks previously associated with
cross-paradigm boundaries.

b) The formation of the Technical Design Authority, providing central support to the
Delivery Units, with particular responsibility to oversee the end to end design and
ensure the overall technical integrity of the solution as a whole. One activity
currently under way is a series of systematic retrospective reviews of the end to end
Designs across the whole solution for CSR+. It should be noted here that these
reviews are not restricted to targeted reviews of the changes at CSR+ but also
encompass those areas of CSR+ inherited from CSR. (For example, it is planned
to review the EPOSS, TPS and RDMC/RDDS systems on an end to end basis,
not just the minor changes made in these areas at CSR+.) Pathway will also
consider seeking the involvement in these reviews of other expert areas within
ICL, external to Pathway, to bring an independent view for certain critical areas.

c) General improvements in the areas of Design Review, Code review, Module Test
Review and Link Test Review.

d) Strengthening of Product Acceptance Test (on entry to Pathway) for 3rd Party
developments.

e) Closer working relationship between the Delivery Units and the Technical
Integration area to promote rapid environment stabilisation.

Pathway already has in train a set of initiatives to improve the defensive measures
deployed within the system in key risk areas. Much has already been done to introduce
interlocks within the counter applications to preserve and protect the serial
dependencies inherent in the VB runtime environment. This in large measure has
eliminated the specific ‘double entry’ and parallel process ‘hanging’ issues underlying
many of the Usability/Robustness and Stability/Performance problems. The future
strategic goal here is the gradual introduction of more generic defensive measures,
including a full cross-phase locking mechanism.

Following on from the recent investigations into residual memory problems under
certain complex scenarios, Pathway has decided to deploy the BUSY.EXE toolset
more widely in testing for CSR+, using it in a pre-emptive fashion rather than purely
as a debug tool.

ICL Pathway believes that introducing further changes to the Design and
Development stages (other than ensuring that good practice is maintained) would
result in only a marginal reduction of the defects in question. The majority of the
CSR+ functionality has now entered Link Test or System Test, so it would be sensible
at this stage to focus on these and later stages of the lifecycle.
