Attendees:
Fujitsu: James Davidson, Mark Gordon, Graham Welsh, Stephen Long
Post Office: Dave Hulbert, Andrew Jacques, Kevin Lenihan,
Background
Key challenges arising from the meeting with Mike, Lesley & Stephen
POL00002031
The following notes and actions were recorded as a response to the major incident on Wednesday 1st February. A review meeting with Mike Young
Long preceded this meeting within which a number of challenges were issued.
The incident on 1st February: - Until approximately 11:15, the incident impacted Card Account across the entire estate and 1% of AP transactions a
was caused by a reference data change which was successfully tested, but between the data being tested and it being enlivened, the Horizon prodt
due to Release 5.5,
In July 2011 a reference data change (unrelated) impacted the pin pads across all Post Office branches. Two other major service impacting incident:
September & 12 December 2011,
Ensure that testing is rigorous and robust.
Introduce business services checks prior to the start of each business day
Improve the clarity of the incident communications between Fujitsu and Post Office service teams
Hold off doing any more reference data changes until corrective actions have been taken
Reduce the volume of reference data changes
Stop issuing reference data in advance so that it sits dormant for a period of time
Actions    Update W/C 13-Feb    Update W/C 20-Feb    Update W/C 27-Feb
1 Determine which forthcoming reference data changes can be held back (short change freeze) and which can’t
Update: Fujitsu Update - Validation undertaken and no issues uncovered on the initial Snapshot. Subsequent validation check did uncover an issue similar to that experienced on the 1st Feb for Data that was to be enabled 1st April.
Update: Current position - assessment undertaken and requirement for tabulation of the data types and impacts to be created.
Update: Two code updates have been applied to live to correct the identified issue of System enablement time stamping (MDM - Seconds & RDDS - Days)
2 Produce or review a set of reference data rules/protocols that we should be working to. Owner: Graham [illegible]
Update: Defined in approved - joint documentation.
Update: Request to challenge the current approach and review against best practice in retail or Banking
Update: Ongoing - focus has been applied to other identified actions this week.
[illegible]
4 Change the test processes to ensure that any ref data changes [illegible] are retested, allowing [illegible] release [illegible] the test environment [illegible] the live environment will be at the point of enlivenment. Revisit the roles / tasks of Verification vs Regression Testing
Update: snapshot of current Ref Data and Counter [illegible]
Update: [illegible] have been updated to reflect this.
5 Obtain additional POL testing resources (as outlined in [illegible] paper) to provide verification tests independent of the reference data team. Owner: Dave H
Update: Testing Resources and Rigs update to follow - [illegible]
Update: (May relate to Pinpad July 11)
6 Conduct an independent review of the test environments [illegible]
Update: No Update
7 Bundle changes together. Owner: Kevin [illegible]
8 Propose options for changing the way in which reference data is sent to the live estate (e.g. Bundling). Owner: Graham & Kevin
Update: [illegible] Outcome of 3 progress achieved [illegible]
9 [illegible] sample to check & Regression Transactions (embedded) and "Key" [illegible] transaction types to be [illegible]
10 Establish what needs to be done to get the model office set up to perform daily checks (early am, 7 days a week). Include test [illegible]. Owner: Andy
Update: In Progress - [illegible]
11 Determine what daily check/routines could be [illegible] and what information could be readily shared with the Post Office teams. Owner: Graham
Update: [illegible] validation of a RDT "Branch" being run 24hrs in advance [illegible]
Update: [illegible] TransactionsAssuran... ReferenceDataRevi...
12 Improved communications to Post Office. Owner: James / Mark G
Update: [illegible] thereof.
13 [illegible] the Post Office input & attendance to the [illegible]. Owner: Dave H
Update: Ongoing with Dave
[illegible]
POL 1 FEB MI Action Tracker
F/891/2
1 Why was POCA impacted by an AP change?
Because of the masking error in the AP tokens, POCA cards were being incorrectly associated to the highlands council acco:
2 Why did a change that successfully exited testing still cause a service issue?
It is now understood that the Monthly Refdata Token update was validated on pre 5.5 release environments. To ensure
the Data is deployed to all counters prior to the enablement, it has been usual practice to commence the delivery in
advance of the go live date.
In this case the deployment of this Ref Data update commenced on the 26th Jan with an enablement date of the 1st Feb.
Live data centre migration to release 5.5 took place on 29th Jan. It is now acknowledged that the 5.5 Release introduced
3 Why did it take so long to determine the service impact, i.e. confusion about whether other banks' services were impacted
and whether/which AP clients were affected?
In our view the initial service impact to the POCA service was very quickly identified. The secondary impact to AP
transactions is acknowledged to have taken longer to understand due to the nature of the incident, and the low volume
of AP transactions impacted (<1%)
The nature of the error resulted in detailed analysis being required to identify the transactions affected and the
associated Clients both at a detail and summary level.
We did communicate that we had reports that AP transactions were being impacted. At the time, this initially generated
a similar level of confusion, given that the linkage between POCA & general AP transactions could not easily be
identified - as they should be different.
4 Why wasn’t the communication process between IMT and POL Service Desk effective (my perception)?
It is acknowledged that communications were not as effective as we would like, given your perception. We believed that
we did follow the lessons previously identified in the joint communications workshop, chaired by Gary Blackburn prior to
his move to the NTP, and subsequent reviews held with Tony Jamasb.
In future the Service Bridge will be attended by Mark Gordon; in his absence, either Peter Thompson or Graham Welsh
will stand in.
5 What pre-start of day checks can be introduced, and what improvements in monitoring can be made to avoid us being told
about issues by branches (e.g. yesterday again it was branches calling HSD that triggered action)? As a result of previous MI
actions & processes had been put in place, however we are concerned that the process was not followed Point 2 Point
[illegible]
could be achieved via Counter tooling. Initial view is that Ref Data is the easier route and that Counter would require a
more substantive investigation and validation study.
6 What can we do within Fujitsu and across our teams to better protect the live environment from service failures; particularly
Currently, as outlined in separate documents, we are evaluating tactical options to protect live service.
The nature of this incident and the link to Reference Data limit how effective these will be: the
operational business needs of Post Office push the existing windows for updating the live environment, which
increases the element of risk.
The containment activity against the risk needs to be agreed jointly, and as such we would wish to propose a joint risk
F/891/3