POL00030523
POL00030523
Peak Incident Management System
Call Reference PC0123319 Call Logger Deleted User -- ITU SV&I
Release Targeted At-- BI 3S81R Top Ref. FSTK_2 0 WP22731
Call Type Cloned call Priority C -- Progress restricted
Contact Deleted Contact Call Status Closed -- S/W Fix Available to Call Logger
Target Date 20/08/2005 Effort (Man Days) 0
Summary S80 E2E 2 : Shared SU Cash dec Gain after mig > TP
All References Type : Value
Customer reference 2783
Work Package IPWY_WP_ 22576
Clone Call PCO123647
Customer reference 3005
Work Package PWY_WP_ 22731
Clone Master PCO121925
Fast track fix FSTK_2_0_WP22576
Release PinICL PC0124725
Fast track fix FSTK_2 0 WP22731
Release PinICL
Progress Narrative
te:19-dul-2005 10:34:39 User:clifford Sawdy
CALL PC0123319 opened
Details entered are:~
summary:S80 E2E 2 : Shared SU Cash dec Gain after mig > TP
all Type:¢
Fall Priority:¢
ffarget Release: Future Unspecified
outed to:ITU SVEI - C1if£ Sawdy
JDate:13-gun-2005 09:04:56 User:Clifford Saway
ALL PCO121925 opened
Details entered are:-
ummary:S80 E2E 2: Shared SU Cash dec Gain after mig > TP
call Type:B
call Priority:B
ffarget Release: BI_3580
outed to:QFP - _Unassigned_
Isrc7 cycle 2 Day 6
POL have raised the following Incident:-
MA cash declaration was made in "Stock Balancing" for the amount diplayed on the Snapshot. When the Cash Variance was checked
fterwards a Gain of £45.05 was displayed.
Ihe dicrepency was the cash value of the transactions performed on the stock unit after rolling into Branch Trading.”
PoLs test was performed in a shared SU on the slave after earlier rolling from CAP to TP mode. The gateway SU had rolled to TP the
previous day.
laving rolled into TP mode a few txns were performed. Finally, a snapshot was printed and the cash was declared correctly with
HD:01, overwriting an the existng ID:01 declaration. The gain (shown on the Event Log) was the value of the SC txns performed in
Ee.
store to follow.
[Date:i3-dun-2005 09:41:52 Uscr:idonel Higman
he Call record has been assigned to the Team Member: Mark Scardifield
jDate:43-dun-2005 09:48:02
levidence Added = mstore
jser:Clifford Sawdy
501680 dual counter office
ate:13-Jun-2005 10:55:56 Uscr:Mark Scardifield
the Call record has been transferred to the team:
fthe Call record has been assigned to the Team Membe
S-Dev
Walter Wright
F/288.1/1,
7
POL00030523
POL00030523
Pateri4-dun-2005 15:15:36 Uscr:Lionel Higman
[the call Target Release has been moved to Targeted At -- BI_3S80
Te:i4-dun-2005 15:35:42 User:Walter Wright
the Call record has been assigned to the Team Member: Martin McConnell
jate?15-Jun-2005 09:50:17 User:Ray Donohoe
evidence Added - audit log 501680 for 11/6/05 at counter 2
‘e:i5-Jun-2005 14:02:32 User:Martin McConnell
[Start of Response]
Irrom the audit log it is clear that :
. There is no messageport notify event recorded in the audit.log when there should have been. (This Comes from Riposte). Without
this update, Dataserver will not have the figures ‘in its books' to be able to see past the figures that have been accumulated for
kash since the data tree was first built
b. At, or near the time of the 1st transaction, there is a VB trapped error indication of an ‘overflow’. However, because this has
been trapped, the system should still be able to continue (I've simulated the condition) and normal operations should still be ableI
to continue.
Still looking.
te
nd of Response]
sponse code to call type B as Category 40 -- Pending -- Incident Under Investigation
Hours spent since call received: 0 hours
Date:l6-Jun-2005 15:59:45 User:Martin Mcconnell
i've attempted to recreate the situation which occurred over a CA Rollover in this instance and the injecting a transaction like
the mails txns (<Groupid:501680><Id:2><Num: 6635, 6636>). Difference being that up here the trace has turned up a transaction
frecevied (16:24:15 DS:C/I 1,7505) and in the supplied audit log nothing at all appears for 6635 etc..
{e:29-Jun-2005 12:59:34 User:Martin Mcconnell
fter several days of attempting to recreate this problem with a keyboard controlling test program which has generated tens of
thousands of EPOSS transacttions, thousands of print preview cut off reports I have only seen one instance of this problem whereby
message does not get passed to EPOSS counter code via the message port interface.
m attempting a fresh run as of this morning (29th June) and will leave for a further 24 hours or so to see if I can see the
roblem at least once more.
in essence the problem is out of our control (within Counter development) and even if we pass this onto Escher, they will want some
ort of recreatable scenario. Therefore unless someone can come up with a sequence of actions at the counter that is able to
Ieproduce the failure ton demand all that we can do here is to give some statistical predications on what might happen in live
sombined with a workaround should the problem manifest itself.
11 update this peak tomorrow, with ‘final findings’...
‘€e701-dul-2005 09:30:54 Uscr:Martin McConnell
nly the one failure found in the several thousands of transactions generated thus far. Cliff has informed me that a repeat run of
this cycle is being repeated this coming weekend, and I'll be hoping to get them set up with performance diagnostics to run at the
ritical point from which the clerk logon is made.
$6:04-Jul-2005 10:28:53 User:Ray Donohoe
[Start of Response]
et up a Performance Process Monitor on BTCT on Saturday morning to capture the processes that led to this error first appearing
luring POL's counter testing on Day 6(w/e). Log file is attached.
POL were not able to recreate the error on Saturday.
[End of Response]
esponse code to call type B as Category 40 -- Pending -- Incident Under Investigation
fours spent since call received: 2 hours
[Date:04-sul-2005 1
Evidence Added - Proces
5 User!Ray Donohoe
2rformance Monitor
Te:04—Jul-2005 10:44:59 Uscr:Martin McConnell
l:vidence Added - Audit log for 1 rigs soak test
jDate:04-dul-2005 10:45:39 Uscr:Martin McConnell
evidence Added - Log file generated by soak te:
program
pate:04-Jul-2005 10:58:32 UscriMartin MeConnell
I've added a sample trace file for one of the test rigs where over 17000 transactions were applied? if we multiply this up
actor of 3 (number of days) times 3 number of rigs ~ 193000 transactions generated. Out of thie I have seen ONE failure, F/288.1/2
POL00030523
POL00030523
m this statistic there is, obviously a1 in 153000 chance of hitting this problem. Alas this problem 1s Still related to Some
temporary resourcing problem (the one I wittnessed seemed to be down to a print preview lockup?), but the system did go onto
ecover but leaving the figures stagnant in memory.
After spending more than a week on this I don't know there is anything more than can be done because:
h. The code is dependent on Escher's Notify mechanism, which MAY itself be beholdant of perhaps some NT or other system failure.
- We cannot reproduce this at will with any defined or prescribed pattern.
We can recommend a course of action to allow the clerk to get a fresh rebuild of the tree if he/she finds anything odd, simply
y logging off and on again, the dataserver tree will automatically refresh
[the only other course of action which T can recommend is for us to set up a ‘heartbeat’ to self test the state of the listener to
ee if the tree is reliable or not and if it is NOT then we reset the flags on the tree to start from scratch instead of relying onI
the accumulative effects of these 'notifications' since the last tree build. This is do-able but not something I would recommend at
this late stage in the test cycle.
[Date:04-dul-2005 10:59:24 User:Martin McConnell
passing to Mark for him to see my comments and to decide on the next course of action....
Jate:04-dul-2005 11:00:17 User:Martin McConnell
he Call record has been assigned to the Team Member: Mark $
cardifield
Date:04-Jul-2005 11:15:51 User:Mark Scardifield
[Start of Response]
Hhere is nothing further we can do at this stage. It has been agreed (at Morning Prayers) that this incidentr can be closed.
[Bnd of Response]
esponse code to call type B as Category 63 -- Final -- Programme approved - No fix required
Routing to Call Logger following Final Progress update.
jours spent since call received: 0 hours
[Date:04-dul-2005 18:32:09 User:clifford Saway
Hhis has occurred on a second office during Day 8 Cycle 3. POL have raised a new defect to track this, the text of which is copied
below. We have extracted the test steps from the two offices where this has appeared and will provide a further update tomorrow an:
commonalities.
POL update in Defec'
tA variance check was made at Office G (007 261). The stock unit snapshot showed a cash figure of £100,000.00. The cash figure
keclared was £100,050.00 and therefore a variance check should have produced a gain of £50.00. However, the system showed a gain of
£100,050.00. If the user logs out and then logs back in again, the variance check shows the correct value."
[Date:04-dul-2005 18:32:26 Uscr:clifford Sawdy
[fhe Call record has been transferred to the team: EPOSS-Dev
Jpate:04-dul-2005 18:32:52 User:Glifford Sawdy
lkeference Added: Customer reference 3005
€e:05-Jul-2005 11:21:52 Uscr:Mark Scardifield
lartin - reuniting you with this PEAK.
jpate:05-dul-2005 11:22:02 User:Mark Scardifield
[fhe Call record has been assigned to the Team Member: Martin McConnell
Te:05-dul-2005 13:58:48 User:clifford Saway
evidence Added - 1 7261 = © }_occurrence
Te:06-dul-2005 09:37:34 User:Martin Mcconnell
fhe latest messagestore atta ion totals (as before) but NOT to the
nessageport. notification mechanism because the value of (opening figures) cash was written at
GroupId:7261><Id: 2><Num:2499><Date :01-Jul-2005><Time:10:35:43>. Therefore this failure has occurred as the result of dataloss
between EPOSS and the Riposte query mechanism. The audit log shows that 54 records have been detected from a ‘TALOP QUERY STATUS'
call. Alas there is no trace or other information returned about the success or otherwise about the record(s) returned from
hTALOP_ACCESS QUERY’. Technically therefore this second occurrence is a different category and perhaps MORE disturbing. I am not
ble to tell whether
h. A silent failure has occurred within Riposte
b. A failure has occurred within Riposte and notified to us
b. Riposte is behaving OK but EPOSS has failed within the loop
ed is not quite the same in that Dataserver has lost transai
there is room within the main 'CompletePopulatetree' function to extend error handling in that we know in advance how many records
should be returned and should a null string be returned, but this fuction is literally the ‘engine’ room of Dataserver and not
something which I would like to 'tinker' with at this late stage.
jbate:06-dul-2005 16:02:44 User:Ray Donohoe F/288.1/3
POL00030523
POL00030523
fvidence Added ~ Svent log fo
Date:07-dul-2005 10:01:32 User:Martin McConnell
it of a smoking gun presented in the event viewer log for the 'busy proc
(04/07/05 12:31:01) The description for Event ID ( 1.) in Source ( Busy ) could not be found. It contains the following insertion
ctring(s): VM Usage @Mon Jul 04 12:13:01 2005 :0%, 93336KbPhys, 396996KBPage, 60540KbDesktop.
iled which process
system has
jould have been so much better had perfmon.exe been running on this machine becuase obviously we could have
might have been ‘draining’ the system near the point of failure, but all that we can categorically state is that the
lsuffered a big jump in system resource usage which has most likely led to this failure.
joate:07-dul-2005 10:07:19 User:Martin McConnell
It should add the bracketted date/time string in the event I inserted earlier was of my doing and it should have been (04/07/05
2:13:01), ie a few minutes prior to the point of failure.
pate:13-Jul-2005 15:16:16 Uscr:Clifford Sawdy
fhe call Priority has been changed from B
fhe call Priority is now c
Dater13-Jul-2008 15:20:38 Usor:Clifford Sawdy
educing to 'C' priority because although it is still thought to be a live problem, investigation also points to Riposte/build
anomolies on BTCT which may explain its prevalence in E2E.
Date:i3-dul-2005 17:15:56 User:Lionel Higman
[the call Target Release has been moved to Targeted At -- Future Unspecified
Date:15-Jul-2005 08:19:10 User:Lionel Higman
OP Reference set to: Customer reference 3005
Jate:18-Jul-2005 15:45:17 User:Martin McConnell
ave looked at the parameters on the latest E2E builds and the Message port upper and lower limits look fine to me,
of knowing (and th utching at straws) of what they were, guess we'll never know.
re is no way}
sis
[ate:49-dul-2005 10:34:39 Uscr:clifford Saway
call cloned from original call:PC0121925 by User:Cliff Sawdy
jbate:19-dul-2005 10:35:32 Uscr:Clifford Sawdy
call cloned as per request of Martin McC.
Date:19-dul-2005 10:35:43 Uscr:Clifford Sawdy
Ihe Call record has been transferred to the team: EPOS:
S-Dev
2:38 User:Martin McConnell
signed to the Team Member: Martin McConnell
[pate:i9-dul-2005 1
fhe Call record has been as
Jate:21-Jul-2005 10:24:04 User:Lionel Higman
[fhe call Target Release has been moved to Targeted At ~~ BI 3890
jate:25-dul-2005 14:11:51 User:iionel Higman
Whe call Target Release has been moved to Targeted At -- BI_3S81R
[Date:25-dul-2005 14:12:20 User:idonel Higman
ccepted by prayers as required for S81R.
[pate:27-dul-2005 1:
evidence Added - msto:
User:Clifford Saway
261 Cycle 4
ning
ate:27-Jul-2005 12:00:51 User:Glifford Sawdy
tall has been cloned to Call:PC0123647 by User
Clift Sawdy
jate:27-Jul-2005 14:36:23 Uscr:Martin McConnell
iagnostic ready for build at $80.
F/288.1/4
7
POL00030523
POL00030523
[atei27-sul-2005 14:36:35 UscriMartin Mcconnell
[the Call record has been transferred to the team: EPOSS-Rel
Te:27-dul-2005 15:27:28 User:Geoff Thomas
ference Added: Work Package PWY_WP 22576
Date:27-dul-2005 1.
TOP Reference set to
7:30 User:Geoft Thomas
Work Package PWY WP 22576
[Date:27-dul-2005 15:27:54 Uscr:Geoff Thomas
"ix released in WP22576.
Date:27-dul-2005 15:28:06 Uscr:Geoff® Thomas
Ihe Call record has been transferred to the team: Dev-Int-Rel
[Date:28-Jul-2005 08:25:26 Uscr:Tim Canniffe
Reference Added: Fast track fix FSTK_2_0_WP22576
Te:28-Jul-2005 08:25:28 User: Tim Canniffe
HOP Reference set to: Fast track fix FSTK 2 0 WP22576
Jate?28-dul-2005 0}
[Start of Response]
Is13S81R_WP22576 now released.
[End of Response]
esponse code to call type C as Category 46 -- Pending -~ Product Error Fixed
jours spent since call received: 0 hours
6:01 User: Tim Canniffe
Fe:28-Jul-2005 08:26:16 User:Tim Canniffe
Ihe Call record has been transferred to the team: Live Supp.Test
Date:28-Jul-2005 08:53:09 User:Sheila Bamber
[the Call record has been transferred to the team: ITU Svat
[fhe Call record has been assigned to the Team Member: Annie Holmes
I5-Aug-2005 12:17:59 Uscr:Clifford Saway
efect cause updated to 14: Development — Code
Jbate:15-Aug-2005 12:22:42 User:Clifford Sawdy
[Start of Response]
fycle 5: POL have run the tests from Day 6 and Day 8 that uncovered this issue without any further problems and have now closed
their Incident accordingly. Closing....
[End of Response]
Response code to call type C as Category 60 ~~ Final -- S/W Fix Released to Call Logger
jouting to Call Logger following Final Progress update.
fours spent since call received: 0 hours
jDate:15-Aug-2005 12:22:48 User:Clifford Sawdy
CALL PC0123319 closed: Category 60 Type C
Joate:15-Aug-2005 16:23:15 User:Clifford Sawdy
FALL PC0123319 reopened by Cliff Sawdy
Jate:iS-Aug-2005 16:25:41 Uscr:clifford Saway
further declaration made at the end of the day has after all exhbited the same Gain discrpancy iss
lessagestore for FAD:7261 attached for evidence. The declaration was performed on the slave counter.
OL have updated the Incident as follows:~
POL <Martin Rolfe>, 15/08/2005: When a subsequent cash declaration was made at this office later in the day, the amount entered asI
p cash declaration was again reported as a discrepancy. Therefore status amended to REOPEN and assigned back to Cliff."
I will update the PEAK with the exact test steps once the info has arrived from POL.
[Date:45-Aug-2005 16:26:01 User:Clifford Saway
evidence Added - ystore 726
F/288.1/5
7
POL00030523
POL00030523
PateriS-Aug-2005 1e:a7:16 Usericlifford Sawdy
[the Call record has been transferred to the team: EPOSS-Dev
te:16-Aug-2005 10:05:05 User:Clifford Sawdy
steps taken during this latest manifestation as per e-mail extract from Jon Thompson below:-
as I remeber It happened like thi
(ash declaration originally made by a user with "Clerk" authority from Stock Balancing screen.
fter attempting to roll over the SU but being unable to as the local suspense is not set to zero(Clerk insufficient authority to
resolve) the Clerk logs off and a Manager logs on.
nager goes to declaration screen in Counter Daily and PREVs from the screen before declaring again and this is where it all goes
ear shaped.
[his is on a shared stock unit always using ID01."
[Date:6-Aug-2005 10:58:42 Uscr:Mark Scardifield
Ihe Call record has been assigned to the Team Member: Mi
Date:16-Aug-2005 11:44:51 User:Clifford Sawdy
lbvidence Added ~ A: og FAD:7261
Jbate:17-Aug-2005 16:15:25 User:Martin McConnell
[Start of Response]
[fhe last update by Cliff was a major breakthrough and I've been able to consistently been able to reproduce the problem now as
presented here. Although I haven't got to the bottom of it the ‘backing out' of the local suspense commit phase is the significant
piece of information. I know ~I had to put in some code which tinkered with frerzing/unfreezing of the data tree and when next
logging on a property some where seems to be set which indicates the system is STILL in a freeze state (even though a reset of the
Hata tree SHOULD have been set on logon) .
setting there .. albeit slowly (but this is Dataserver we're talking about here).
(End of Response]
lesponse code to call type as Category 40 —- Pending -- Incident Under Investigation
Hours spent since call received: 0 hours
Date:17-Aug-2005 17:15:19 User:Martin McConnell
[Start of Response]
round it at last. Walter's fix for PC0117320 did not (as contrasted as part of my checks for PC0120824 where the tree was left
frozen) unfreeze the dataserver tree as part of the ‘role back out'; alas this was NOT happening.
so what I intend to do is to unfreeze the tree on the callback confirmation after the ‘clerk’ has cancelled. Then when logging on
the next time as whoever (Manaager OR Clerk) the figures should not ever be viewed as a ‘gain’. EPOSSStockunit.dll to be fixed.
Whis is an ESSENTIAL fix in my opinion for S81R to be in place by the time TP mode comes into operation, and it also explains why
fe are not seeing any occurrences at the moment.
For the testers the scenario (refined from above) :
I Have a single Stockunit with values posted to ‘local’ suspense to clear at the office level.
F Attempt to rollover this last active SU by a clerk, and let the system back out of the balancing process.
Logon as the manager role , simply visit the stock balancing menu and hit the discrepancies button and a gain will be flagged
immediately.
(OE for this to fail, when logging off as the clerk, no other balancing actions must be performed otherwise this will unfreeze thel
tree and all will appear ‘normal’.
[End of Response]
esponse code to call type C as Category 42 -- Pending -- Product Error Diagnosed
Hours spent since call received: 0 hours
[Date:48-Aug-2005 16:24:57 Uscr:Martin McConnell
[the Call record has been transferred to the team: EPOSS-Rel
[Date:18-Aug-2005 16:27:09 Uscr:Geoff Thomas
Reference Added: Work Package PWY WP 22731
[Date:i8-Aug-2005 16:27:12 Uscr:Geoff Thomas
OP Reference set to: Work Package PWY WP 22731
[Date:18-Aug-2005 16:27:30 Uscr:Geoff Thomas
Fix released in WP22731.
F/288.1/6
POL00030523
POL00030523
[lateriS-Aug-2005 16:27:40 UscriGeott Thomas
the Call record has been transferred to the team: Dev-Int-Rel
Te:19-Aug-2005 08:57:28 User:Tim Canniffe
eference Added: Fast track fix FSTK_2 0 WP22731
Date:19-Aug-2005 08:57:30 User:Tim Canniffe
ffOP Reference set to: Fast track fix FSTK_2_0 WP22731
[Date:19-Aug-2005 09:00:43 Uscr:Tim Canniffe
[Start of Response]
13890 _WP22731 now released.
[End of Response]
> code to call type C as Category 46 -- Pending -- Product Error Fixed
Hours spent since call received: 0 hours
spon:
[Date:19-Aug-2005 09:00:55 Uscr:Tim Canniffe
[the Call record has been transferred to the team: ITU sVeI
Date:19-Aug-2005 10:54:47 User:Eric Jennings
[fhe Call record has been assigned to the Team Member: Cliff Sawdy
Jate:19-Aug-2005 12:49
03 User: Tyrone Cozens
eference Added: Relea. 5
2 PinICh PCOl 5
‘e:19-Aug-2005 12:50:43 User:Tyrone Cozens
Reference Added: Release PinTCL PC012472
ate:19-Aug-2005 12:50:51 User: Tyrone Cozens
eference Deleted: Release PinICL PC0124725
HOP Reference automatically set to:Fast track fix FSTK 2 0 WP22731
jbate:05-Sep-2005 15:17:43 User:Clifford Saway
[Start of Response]
OL have retested and closed their cortresponding Incident. Closing....
[End of Response]
esponse code to call type C as Category 60 -- Final -- S/W Fix Released to Call Logger
outing to Call Logger following Final Progress update.
jours spent since call received: 0 hours
te:05-Sep-2005 15:17:47 User:Clifford Sawdy
CALL PC0123319 : Category 60 Type C
Jate:08-Sep-2005 10:42:07 User: Tyrone Cozens
eference Added: @ PinICh PCO125616
le:
Root Cause Development - Code
Logger Deleted User -- ITU SV&I
Subject Product EPOSS & DeskTop -- EPOSS (version unspecified)
Assignee Deleted User -- ITU SV&I
Last Progress 08-Sep-2005 10:42 -- Tyrone Cozens
F/288.1/7