POL00129915 - Post Office Ltd Board - Outputs of the Fujitsu Services Review of Horizon Online

Post Office Ltd — Strictly Confidential
POL()
POST OFFICE LTD BOARD
Outputs of the Fujitsu Services Review of Horizon Online

1. Purpose

The purpose of this paper is to:

1.1 Inform the Post Office Ltd Board of the progress, outputs and action being
taken as a result of the Fujitsu review of Horizon Online following a series of
significant incidents in the branch network.

2. Background

2.1 As a result of the four major service outages experienced in an eight-month
period, Fujitsu were requested to undertake a review of the Horizon Online
service to identify measures to prevent similar issues from occurring again.

2.2 For each of the incidents, analysis was performed to identify the root cause.
In response, a number of specific process improvement actions and technical
fixes were implemented through the standard service management
procedures. A process to establish lessons to be learnt from the incidents
was then applied. From this we were able to attribute these incidents to the
following generic causes:

• Hardware Failures
• Reference Data Distribution Issues
• Release Process Failures

Date        Incident                                                Cause
27/07/2011  PIN Pad transactions were unavailable between 08:00     Reference Data¹
            and 14:30.                                              delivery
12/12/2011  Banking transactions were unavailable between 12:54     Hardware failure
            and 14:30.
01/02/2012  Post Office Card Account (POCA) transactions were       Release Management
            unable to complete. In some branches Automated          process failure
            Payments (e.g. utility bill payments), E Top Up and a
            small number of Banking transactions were also
            affected. The service was impacted between 08:00 and
            11:15.
01/03/2012  95% of transactions were unable to complete between     Hardware failure
            11:00 and 14:30.
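The length of each outage is not stated in the paper, but it can be derived from the start and end times in the table. The following is a minimal sketch of that arithmetic; the names `OUTAGES` and `outage_duration` are illustrative and do not come from the paper.

```python
from datetime import datetime, timedelta

# Outage windows as listed in the incident table: (date, start, end).
OUTAGES = [
    ("27/07/2011", "08:00", "14:30"),
    ("12/12/2011", "12:54", "14:30"),
    ("01/02/2012", "08:00", "11:15"),
    ("01/03/2012", "11:00", "14:30"),
]

def outage_duration(date: str, start: str, end: str) -> timedelta:
    """Return the length of a same-day outage window."""
    fmt = "%d/%m/%Y %H:%M"
    began = datetime.strptime(f"{date} {start}", fmt)
    ended = datetime.strptime(f"{date} {end}", fmt)
    return ended - began

durations = [outage_duration(*row) for row in OUTAGES]
total = sum(durations, timedelta())

for (date, _, _), length in zip(OUTAGES, durations):
    print(date, length)
print("Total:", total)
```

On these figures the four windows last 6h30, 1h36, 3h15 and 3h30, which gives 14 hours 51 minutes of impacted service in total.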

The initial reviews undertaken by Fujitsu have focused on these generic
areas. These reviews have now been completed and a series of
short/medium term improvement actions have been agreed with Post Office.

The recommended improvements to the service will be delivered by Fujitsu at
their cost, and will be monitored through the Service Improvement Plan,
which is a process undertaken by Fujitsu and Post Office to deliver
continuous improvement.

¹ Reference Data: data that is stored on the counter terminals and controls which transactions are available at
each branch, and how they function.

3. Improvement Actions Summary

For each of the areas which caused the service outages the following key
improvement activities have been identified. These include process
improvements, which address weaknesses that had not been exposed until
the incidents, and additional functionality for performance and monitoring
delivered through software upgrades.

Hardware Failures

Service monitoring and fault diagnostics will be improved and upgraded. This
will improve the ability to identify issues before the live service is impacted
and reduce the response times when incidents occur. This will not only
provide monitoring that would have helped resolve the specific incidents that
caused the service outages, but will improve monitoring and response times
across several areas of the Horizon system.

• Improvements will be made to the server hardware monitoring and
failover processes.

• Performance Analysis Tools will be upgraded and will include
additional functionality to analyse the performance of the network
server hardware (the cause of the service outage on 01/03/2012).

• Transaction flow simulation modelling will be enhanced to include
additional key data paths. This enables support teams to use servers
that simulate the live traffic across the Post Office network. They can
then quickly determine which data paths are working and focus on
problem areas, allowing network engineers to locate faults more
quickly.

• The operating system software which supports the network routers will
be upgraded to improve the robustness of the platforms. These
upgrades will improve elements of the network, how we switch across
the network devices and how the transaction routers integrate with the
network server hardware.

Scheduling of the upgrade activities by Fujitsu Release Management will be
completed by the end of May.

Reference Data Delivery

Testing of reference data has been improved to ensure data is applied
correctly to the counters in the branch network. The following actions have
been implemented:

• Additional automated data traps have been introduced to intercept
data errors before they are released to our branch network counters.

• When data is released it will be enabled on one counter 24 hours
ahead of the rest of the Network. This provides a window to correct
any issues and re-deliver data before branches are impacted.

• The processes for releasing data have been re-structured to allow
additional time for data proving. This has resulted in an additional 5-6
hours for validation on the day of release.

• New automated validation reporting has been introduced to check
data before it is released.

• Risk assessments have been introduced to ensure adequate testing
and validation have been undertaken before data is released,
minimising the risk of incorrectly keyed data impacting the live service.
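Two of the actions above, the automated data traps and the 24-hour single-counter pilot, can be illustrated with a short sketch. This is not Fujitsu's implementation. The record fields, the `price_is_positive` trap, the counter names and the release date are all hypothetical. Only the 24-hour window comes from the paper.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Dict, List, Optional

# From the paper: data is enabled on one counter 24 hours ahead of the rest.
PILOT_WINDOW = timedelta(hours=24)

# A trap inspects one record and returns an error message, or None if it is fine.
Trap = Callable[[Dict], Optional[str]]

@dataclass
class ReleasePlan:
    pilot_counter: str
    pilot_enable_at: datetime
    network_enable_at: datetime

def run_data_traps(records: List[Dict], traps: List[Trap]) -> List[str]:
    """Return every error found; an empty list means the data may be released."""
    errors = []
    for record in records:
        for trap in traps:
            problem = trap(record)
            if problem is not None:
                errors.append(problem)
    return errors

def plan_release(counters: List[str], network_enable_at: datetime) -> ReleasePlan:
    """Enable the first counter one pilot window before the rest of the network."""
    if not counters:
        raise ValueError("at least one counter is required")
    return ReleasePlan(counters[0], network_enable_at - PILOT_WINDOW, network_enable_at)

def price_is_positive(record: Dict) -> Optional[str]:
    """Hypothetical trap: reject records with a missing or non-positive price."""
    if record.get("price_pence", 0) <= 0:
        return f"{record.get('id')}: non-positive price"
    return None

# Hypothetical reference data records and release date, for illustration only.
records = [{"id": "P1", "price_pence": 60}, {"id": "P2", "price_pence": -5}]
errors = run_data_traps(records, [price_is_positive])
plan = plan_release(["counter-01", "counter-02"], datetime(2012, 6, 2, 8, 0))
print(errors)
print(plan.pilot_counter, plan.pilot_enable_at, plan.network_enable_at)
```

In this sketch a release would only proceed when `run_data_traps` returns no errors, and the gap between `pilot_enable_at` and `network_enable_at` is the window in which data can be corrected and re-delivered.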

Managing Releases

Improvements will be introduced to the governance processes that control
project developments and their release into the live environment. These are
targeted to be completed by the end of July.

• Additional levels of documentation for release handover to
business-as-usual teams will be introduced to ensure Fujitsu Service Delivery
fully understand the changes that are going live. This will improve
their capacity to respond to any service issues.

• Communications processes and process checks will be introduced to
ensure test and reference data environments are maintained to the
same baseline as live (version control of the test and reference data
environments was the cause of the service outage on 01/02/2012).

• Specific owners will be assigned to manage maintenance releases.
This follows the process for major releases and will ensure
accountabilities are clear.

• A review and re-definition of the roles and responsibilities within the
project delivery processes will be undertaken to ensure
accountabilities and responsibilities are clear.

• Additional governance will be put in place to ensure planning reflects
achievable milestones and has contingent timelines. Conflicts and
contentions will be resolved through improved ways of working with
POL.

4. Residual Risk

4.1 There is a residual risk inherent in running a system as complex as Horizon
Online and it is impossible to eradicate service incidents entirely. However,
there has been a thorough review of the incidents that have occurred and
lessons have been learnt and remedial actions agreed.

4.2 To ensure the robustness of the reviews, Fujitsu involved Subject Matter
Experts from both the Post Office account and other client accounts within
their business. They have also engaged independent reviewers from Cisco²
and EMC³, whose recommendations have been shared with POL.

4.3 The review activities and outputs were subject to assurance and acceptance
by Post Office IT & Change teams.

4.4 In the areas that have been the focus of the reviews, no issues have been
identified with the core architectural design, but improvements have been
identified in the way that change is implemented and the live service
monitored and maintained. There is, however, a strategic review of the Post
Office end-to-end IT architecture currently being completed by an external
consultancy, KPMG. This may expose further issues or areas of risk, e.g.
limitations associated with the active/passive data centre configuration of
Horizon Online. This review will be complete by the end of June.

5. Recommendations
The POL Board is asked to:
5.1 Note the contents of this paper.

Lesley Sewell
Chief Information Officer
May 2012

² Cisco: a leading supplier of network solutions, including hardware, software and support services.
³ EMC: an upper-quartile provider of network storage solutions.