FUJ00086553 - Peak Incident Management System Report


FUJ00086553

Peak Incident Management System

Call Reference: PC0033128    Call Logger:

Release: Targeted At Unknown    Top Ref:

Call Type: null    Priority:

Contact: Deleted Contact    Call Status:

Target Date: 30/10/2000    Effort (Man Days):

Summary: PM - Stock Unit Intergration - Dugannon 43K

All References
Type    Value
Customer reference    10000035
Call reference    PC0032801
Other    £9911040163
Customer reference    BSM19991110001

Progress Narrative

POA Deleted User -- Deleted Team
10000035

Closed -- Initial
0

Date:10-Nov-1999 15:43:00 User:Ann-Marie Dick

CALL PC0033128 opened

References entered are:-

Other : £9911040163

Call reference : PC0032801

Product EPOSS & DeskTop Balancing added

Target Release entered: Unknown

PM - Dugannon PO £43k discrepancy

Outlet has a discrepancy of £43,000 after balancing SUs and doing the office
snapshot.

Phil Turnock, POCL BSM, has advised the outlet on this week's balance. Steve
Warwick (Development) is investigating why this mis-balance occurred.

The immediate impact on this week's balance has been addressed, but POCL are
concerned that the cause is still unknown and that it will affect this and
other outlets.

CALL PC0033128:Priority C:Calltype 2 - Target 20/12/99 20:00:00

The Call record has been assigned to the Team Member: Paul Curley

Defect cause updated to 99:General - Unknown

Hours spent since call received: 0.5 hours

Date:10-Nov-1999 17:16:00 User:Ann-Marie Dick

Steve Warwick believes that this may be an isolated incident, as we have no
similar reports from anywhere else. The branch manager and POCL agreed to
amend the WK32 CA balance figures manually and send them into POCL. We
anticipate that Week 33 will balance as normal. PO to be monitored until the
Week 33 balance is completed.

Date:12-Nov-1999 11:32:00 User:Ann-Marie Dick

Rang Dungannon PO this morning (12/11/99) and confirmed that the Week 33
balance went OK. Spoke with John Rovinson, Post Master, who confirmed that
"everything went ok".

I will monitor progress of the incident, which is now with the EPOSS
Development teams for investigation. Paul Curley 12/11/99

Date: 23-Nov-1999 15:05:00 User:Julie Welsh

The call references have been updated. They are now:-
Other : £9911040163

Call reference : PC0032801

Customer reference : BSM19991110001

Date:03-Dec-1999 11:10:00 User:Janet Reynolds

29/11/99 15:29 - By Dave Fletcher

I have talked with Development regarding this problem. It is seen as a
one-off. No fault can be found, and Development do not expect to be able to
find a fault with the evidence available. There is no additional information
available as evidence. I suggest this call be placed on monitor for 1 month.

Date:23-Dec-1999 09:56:00 User:Janet Reynolds

Update by Paul Curley:

This problem was discussed at the HSRF on 22/12/99, where it was agreed to
keep the problem open.

The problem will remain on monitor.

To date, no further occurrences have been reported from Dungannon, and no
similar instances have been detected elsewhere.

Paul Curley 23/12/99


Date:13-Jan-2000 14:53:00 User:Janet Reynolds
Update by Paul Curley

No further reports of this type of problem have been detected from other
outlets. Dungannon has not reported any further instances.

Investigations have continued into the initial incident, but so far the
findings have been inconclusive.

Call will remain on monitor until the end of January.

Paul Curley, 13th Jan 2000

Date:18-Feb-2000 17:07:00 User:Janet Reynolds

Update by Paul Curley:

Update - 3rd Feb 2000

Support rechecked the message store for 03-Nov-1999: SU 1A was rolled on
counter 5 and the office was rolled on counter 1. This outlet is a six-counter
office.

Further examination of the event logs for these two counters indicates that
counter 5 looks suspect (C drive nearly full and a big gap with no messages).
Calls from the PO into HSH for the period between 30-Oct and 10-Nov indicate a
reboot (counter not specified, but this would tie in with the counter 5 event
log) on Saturday 30-Oct-1999.

The evidence in the message store was that messages continued to be written
to the message store, but all the 'Payment' transactions which should have
been recorded in the rollover trailer messages failed to appear (although
others did, such as the Rem OUT and Transfer OUT totals).

This indicates that the problem was not one of running out of disk space but
of failing either to retrieve, or to write out, transaction totals for one
particular node in the node hierarchy.

Given that there were known problems with corrupted Persistent Object indexes
at about this time, it is possible that an update to an EPOSSNodes object
failed to be registered correctly at the outlet, causing the node
accumulation to fail.

It was decided to prove this out by deleting the 'Payments' node in the node
hierarchy and then running the SU balance, to attempt to identify the root
cause of the problem. Call passed to testing to be scheduled.

Update 18th Feb 2000

The test was carried out on 16th February as follows: delete the Payments
EPOSSNodes object before producing an SU balance, on a version of the current
live system (CI2.2R). When trying to print the Payments part of the SU
balance, the missing node is detected by the system and an error tablet with
the message "A system error has occurred whilst printing. Please ring the
helpdesk. Error at 67640." is generated. So the balance could not be finished.

This type of error trapping was introduced at the end of last year when
resolving AI298 issues, and we are investigating whether the outlet lacked
such error handling when the problem occurred.

Certainly, with the current system, a missing Payments node would no longer
go undetected.

The problem is currently back with Development for further investigation.

Date:09-Mar-2000 11:20:00 User:Janet Reynolds
The call summary has been changed from:-
PM - Dugannon PO £43k discrepancy

The call summary is now:-

PM - Stock Unit Intergration - Dugannon 43K

The call references have been updated. They are now:-
Other : £9911040163

Call reference : PC0032801

Customer reference : BSM19991110001

Customer reference : 10000035

Date:17-Mar-2000 15:01:00 User:Janet Reynolds
Update by Paul Curley:

Monday 13/03 -

Email chaser to Steve Warwick/Les Ong to determine progress.

Thursday 16/03 -

Response from Steve Warwick

"Although there is nothing more we can do about this incident, and we have
exhausted all lines of investigation with the evidence available, we had a
further occurrence of a very similar nature at another office in CAP 48.

This has been documented in PinICL 39313 and is under investigation at the
moment".

I propose this problem is discussed at the XDMF on Monday.

The similar occurrence is currently an incident, and I am investigating
whether the similarities are such that we can add it into this problem.

paul 17/03/00

The call summary has been changed from:-
PM - Stock Unit Intergration - Dugannon 43K
The call summary is now:-

PM - Stock Unit Intergration - Dugannon 43K

Date: 07-Apr-2000 09:40:00 User:Janet Reynolds

Update by Paul Curley 04/04/2000
21/03/00
Discussed this issue at XDMF, and it is thought that a similar incident has
occurred at FAD 025511 Yate Sodbury to the value of £52,814.29. The problem
will now remain open and be appended with this information for investigation.
23/03/00

A further occurrence has arisen at FAD 158410 Appleby in Westmorland, value
£9,368.40.
Chased call with Development and spoke to Martin McConnell. Martin has made
extensive investigations into the issue and, using the message stores from the
outlet, has been unable to recreate the fault. Martin will recommend that
a diagnostic patch is developed and issued into the estate to trap any
future occurrences.
Escalated the issue to Chris Wannell (Development), who will discuss options
with Martin and Steve Warwick.
30/03/00

Chris Wannell reported back that a diagnostic fix was being prepared and was
to be submitted to the next Release Management Forum to be authorised for
release into the live estate. Development have also identified the following
activities that are in the area of stock unit integration and are therefore
being tracked as relevant to this problem.

3/04/00
An incident has identified a stock unit that attempts to commit discrepancies
to the message store via EPOSSCore and fails. This fails because of unit
price checks on the discrepancy. A fix has been developed and is currently in
testing.
Further checking of the progress of the diagnostic fix for installation onto
the dataserver shows it has been developed but not yet released for testing.


Date: 28-Apr-2000 14:51:00 User:Janet Reynolds
Update by Paul Curley:

The fix is still in testing. Paul Curley has agreed with Phil Turnock that
once the diagnostic fix has been deployed this problem will be put on monitor
status for 12 weeks.

Date:18-May-2000 13:13:00 User:Janet Reynolds

Update by Paul Curley:

As of Tuesday 16th May, the software diagnostic patch has been distributed to
99% of the estate (the stragglers are non-polled offices, closed offices,
etc., and are being worked through as they come online). Therefore, as agreed
with Phil Turnock, this problem can now be placed into "monitor" mode to await
the next recurrence.

Paul 18/05/99

Date:16-Jun-2000 13:52:00 User:Paul Curley
Two additional call numbers have been identified with similar problems; they
are PC0045847 & PC0043811. Wrote to Development to progress them on Friday
09/06/2000.

16/06/00 - No progress reported from Development; calls are still being
investigated - Paul Curley

Response :

Update 16/06/00

[END OF REFERENCE 19206800]

New target date set 30/06/00 21:00:00

Responded to call type 2 as Category 5 - Monitoring

The response was delivered on the system

Date: 04-Jul-2000 10:40:00 User:Paul Curley
Root cause of stock unit integration problem:
Data trees have been failing to build fully, and the system has not been
detecting this. Consequently, discrepancies in the balancing have been
occurring. In the case of Dungannon, a whole Payments node was missing. There
have been a number of calls relating to this kind of issue.

A fix has been put in at CI4 which will prevent this happening.
The root cause identified under PC43811 is as follows:-
"Dataserver trees have failed to build. This has now been fixed in CI4 and, in
conjunction with CP2587 (where the data tree rebuild is minimised to 2
attempts instead of 4), should return an abort right back up to the user to
retry the balancing process. Instances where this can potentially occur are,
for example, if the Riposte service has stopped/failed/been unable to complete
an IO request etc".
Response :
Progress text relating to root cause identification added by Paul Curley
4/07/20. It is suggested that this problem is closed as part of the CI4
upgrade analysis agreed between PON and ICLP. Paul Curley
[END OF REFERENCE 19717055]

New target date set 30/09/00 18:00:00
Responded to call type 2 as Category 5 - Monitoring
The response was delivered on the system

Date:09-Oct-2000 12:39:00 User:Paul Curley
