FUJ00117483 - Email from Richard Watt to Allen Graham re Service Request Report- Fujitsu Services


FUJ00117483

To: Allen Graham (BRA01)
Cc: Singh Jim
From: Richard Watt GRO

Sent: Sun 5/30/2010 8:29:10 PM (UTC)
Subject: RE: Service Request Report - Fujitsu Services

Hi Graham,

Yes, I’ve just spoken with Amit (he’s the manager of the Engineer implementing the code changes). His team are making good
progress with the fix and the current ETA for an official fix is end of Friday 4th June (US Time). This will be for 10.2.0.4.3 and
10.2.0.4.4. This ETA is based upon the information we have at the moment and is subject to all tests being passed and no further
complications.

We will keep you updated with any changes as we progress.

Kind Regards

Richard

From: Allen Graham (BRA01)
Sent: 30 May 2010 21:05
To: Richard Watt

Cc: Singh Jim

Subject: RE: Service Request Report - Fujitsu Services

GRO

Hi Richard, any news?

Graham Allen
Application Services — Post Office Account

FUJITSU

Please consider the environment - do you really need to print this email?
Fujitsu Services Limited, Registered in England no 96056, Registered Office: 22 Baker Street, London, W1U 3BW.

This e-mail is only for the use of its intended recipient. Its contents are subject to a duty of confidence and may be privileged. Fujitsu Services does not guarantee
that this e-mail has not been intercepted and amended or that it is virus-free.

From: Richard Watt GRO
Sent: 30 May 2010 11:
To: Allen Graham (BRA01)

Cc: Singh Jim

Subject: RE: Service Request Report - Fujitsu Services

Hi Graham,

No problem, there has been further work yesterday on the bug and development are working on a new version of the patch
applied last Thursday on 20th May (8528171). This new version will include the additional fix to resolve the behaviour seen in our
testcase. I'll ask our development team for further clarification re: ETA as it was something Amit promised on Friday’s 4pm call.
Will feedback later today.

Regards

Richard

From: Allen Graham (BRA01)
Sent: 30 May 2010 09:55
To: Richard Watt

Subject: RE: Service Request Report - Fujitsu Services

Hi Richard,

Apparently we were expecting an update by close of play yesterday from the US as to ETA for the fix you were working on after having
been able to reproduce the LCK problem on your development environment. I don’t have access to the SR system and so am unsure if
there is anything. Please could you let me know. In the meantime I'll get someone to check the SR.

Sorry to bother you and many thanks,

Graham Allen
Application Services — Post Office Account

FUJITSU
Lovelace Road, Bracknell, Berkshire, RG12 8SN.

Web: [illegible]

Please consider the environment - do you really need to print this email?

Fujitsu Services Limited, Registered in England no 96056, Registered Office: 22 Baker Street, London, W1U 3BW.

This e-mail is only for the use of its intended recipient. Its contents are subject to a duty of confidence and may be privileged. Fujitsu Services does not guarantee
that this e-mail has not been intercepted and amended or that it is virus-free.

From: richard.watt@
Sent: 28 May 2010 20:50
To: Andrew Woodward; Carter Neil D; Stott Martin; Iqbal Zaf; Singh Jim; Goucher Sean; Kostuch Maz; D'Alvarez Alan; Allen Graham
(BRA01); Adrian Turner; Beardmore Andy; Calvert Wayne; McCann Aidan; Gibson Andrew R; Hanley John (LON22); Cochrane Vince;
Keeling David; Apte Amit; Salt Michael; Butts Geoff

Subject: Service Request Report - Fujitsu Services

Service Request Report - Fujitsu Services

Fujitsu Services - Post office - DB/Streams Issues

EMEA, United Kingdom
Enterprise Support

14830841

Richard Watt

Fri 28-MAY-2010 07:49 PM GMT (Fri 28-MAY-2010 12:49 PM US/Pacific Time)

This report: Business Problem, Exit Criteria, Conference Schedule, Current Status, Heads-up, Critical SR/Bug Summary, Action Plan.

Distribution List (Oracle)

Adrian Turner, Andrew Woodward, Richard Watt

Distribution List (Fujitsu Services and 3rd parties)

Aidan McCann, Alan D'Alvarez, Amit Apte, Andrew Gibson, Andy Beardmore, David Keeling, Geoff Butts, Graham Allen, Jim
Singh, John Hanley, Martin Stott, Maz Kostuch, Mike Salt, Neil Carter, Sean Goucher, Vince Cochrane, Wayne Calvert, Zaf Iqbal

Customer Contact: Jim Singh, Office:
Customer Contact: Sean Goucher
Engagement Manager: Richard Watt, Office:

System and Product Information:
APPS:
RDBMS: 10.2.0.4
IAS: N/A
Platform: Intel
O/S: RedHat
Project: Production

Escalation Open/Close: 10-MAY-2010/

Milestones:
Critical Production 15 May 10: DB hang/lock must be resolved by this deadline
Critical Production 25 June 10: Roll out to 12000 Post Offices due to complete

Product List:
- Oracle Server - Enterprise Edition 10.2.0.4.3
- Red Hat Enterprise 4 (64-bit)

Business Problem:

Fujitsu Services (FS) are contracted with the Post Office to replace the current infrastructure used in 12000 branches across the UK.
The project involves rolling out a (non Oracle) financial application using Oracle database and tools products. Oracle Streams is
used to provide a reporting database environment. The migration onto the new platform started at the end of 2009 and is currently
rolled out live to 600 branches.

Post Offices taking part in the pilot have been complaining about intermittent hangs which are having huge impact on over the
counter services as the system hangs for 5 minutes or longer and users are unable to complete their financial transactions. Past
system hangs have lasted for up to 45 minutes — causing queues of people to form in the post offices as they are unable to process
transactions. Customers have become highly agitated causing unrest in some cases where they are unable to complete transactions
such as pension payments and withdrawal of cash. Post Masters have lodged written complaints as they are unable to operate
during these incidents.

FS have been on "red alert" for 6 weeks. Live rollout - up to 11500 branches (approx 30000 counters) due to finish 25/06/2010
across the whole of UK Post Office network is at risk and currently on hold until we can improve confidence in the reliability and
scalability of the solution.

If not resolved quickly the situation could be reported in the press causing huge damage to the reputation of both FS and Oracle.
Exit Criteria:

Resolution of locking issues on RAC and problems using streams

Current Status as of Fri 28-MAY-2010 07:48 PM GMT (Fri 28-MAY-2010 12:48 PM US/Pacific):
Changes since last update:

• Oracle Engineering and Bug Diagnostics Engineer have reproduced a similar behaviour to LCK issue but not to the same extent
as experienced by customer. They're working on a fix which is targeted to resolve the LCK issue causing the service outage. Further
information re: ETA for fix will be provided end of Saturday (US Time).

• Customer has been requested to flush shared memory pool prior to recycling instances and provide Oracle support with further
data. Customer will not recycle instances so will be able to provide data after 36 hours of uptime to Oracle on Monday.

• Flushing the shared memory may provide a better workaround and alternative to recycling the database instances. This is under
evaluation by customer.

• Customer has escalated requirement for 10.2.0.4.4 merge patch via SR3-1782093091 to meet deadline of implementation into
production on 6th June. Oracle have provided a patch but an overlay patch failed. An acceptable workaround has been provided
and customer testing can continue. A corrected patch is in progress.

• Out of hours contacts have been provided by customer for weekend coverage.

• Richard Watt (Critical Accounts Manager) will be available on :
process fails to meet customer's requirements.

Current Milestone:

PRODUCTION - resolution required ASAP

Current Status:

• No further LCK issues seen in production but DB instances are being restarted every night to clear shared memory pool

• Patch 9733457 to address memory leak issue on schedule to be deployed Sunday 30 May.

• Application changes in relation to truncation/partition DDL and use of bind variables to be deployed 6th June

• Fujitsu have highlighted plan to implement PSU 10.2.0.4.4 on 6th June.

• Customer advised to run additional queries to gather additional diagnostic information around memory allocation in subpools.
Diagnosis shows correct behaviour re: subpools.

• Adrian Turner onsite Tuesday 1st June for next 3 weeks.

• Issue is escalated to highest level and Critical Accounts manager engaged. Managers from Bug Diagnostic and Engineering attend
daily calls.

Action Plans:

• Customer completing remaining action plan for LCK issues including application of patch 9734573 and making application
changes to avoid partition DDL and make use of bind variables (due 6th June)

• Oracle to provide guidance re: ETA of potential fix for LCK issue. This will be tested by Oracle in our reproduction environment
before providing to customer.

• Customer restarting DB instance every night to mitigate business impact until complete action plan in place. To consider whether
flushing shared memory is a better alternative with less impact.

• Oracle recommended customer plan to upgrade to latest PSU (currently 10.2.0.4.4) as part of service release cycle. Onsite
consultant confirms that PO may benefit from some of the fixes it contains, particularly around long log file syncs and partition
operations. Fujitsu currently plan to install on production 6th June.

• Oracle to provide merge patch for 10.2.0.4.4 via SR3-1782093091. This was delivered and a fault found with an overlay patch.
A workaround has been provided and a corrected patch is in progress.

• Daily review calls with Oracle Critical Accounts / Support and Development management at 11am UK to review progress.
Technical review calls at 4pm UK as required. Executive calls at 6pm. Next scheduled conference call on 1st June at 11am.

• See SR Report below for current status and priority of technical issues.

Next Conf Call

Daily calls 11am, 4pm UK

Critical SR/Bug Summary

# | Cus Pty | SR Next Action | Product | SR | Bug | Open Date | Date | SR Sev | SR Description | SR Status

Critical Production 15 May 10: DB hang/lock must be resolved by this deadline

#: 1
Cus Pty: 1
SR Next Action: (Customer Working)
Product: Oracle Server - Enterprise Edition
SR: 3-1610271271
Esc To: QKAUTTO
Bug, dates and severity: [illegible]
SR Description: SHOWSTOPPER: [illegible] 9670959: Receiver is waiting for a latch dumping latch state for receiver
SR Status: Capture Diagnostics

Status

Customer experiencing intermittent DB hangs. Patch 8666117 already applied. DB appears to be overloaded at time - nearing 100% CPU - due to LCK0 housekeeping operations. May be a problem with shared pool issues.
11/05/10-> Dev suggest setting parameter _object_statistics=false to reduce LCK0 load, working on one off patch for bug 8528171. Cus to continue running ProcWatcher and OS Watcher to collect OS and network information for further incidents. Support to phone Aidan McCann with updates on GRO
12/05/10-> Patch 9668554 provided for customer to test, cus to set _object_statistics=false and enable event '14532 trace name context forever, level 1', support have confirmed that incident at 5:30 today looks very similar to previous incidents
13/05/10-> Another hang at 8:50 this morning - same as previous - shared pool under pressure - recommendations remain same.
13/05/10-> No further hangs / outages. OCS and Support have communicated Critical action plan and medium / long term recommendations to customer. All Critical actions will be implemented by end of Tuesday 18th May.
17/05/10-> Cus has set _object_statistics=false. Minor blip at 9:50 today - impact unknown, procwatcher not running. Patch and event 14532 to be set night of Tues 18 May.
18/05/10-> Patch and event implementation delayed until Thurs 20 May. Support confirm that setting _object_statistics=false has lessened load on LCK as ‘obj stat del’ messages are gone in error stacks
20/05/10-> Patch and event enabling implemented
21/05/10-> No further outages as of 18:14. System under monitoring. Data has been provided to development for further review of system behavior.


22/05/10-> Further occurrence of LCK issue on Sat 16:00. Escalated to support and Dev engaged. Impact to PO unclear as very few counters would have been open (only 3rd party such as at WHSmiths).
23/05/10-> Analysis of issue on 22nd May showed improvement in memory usage, further recommendation to install patch from SR 3-1777735201. This will further improve shared memory usage.
24/5/10: Oracle continue to investigate information from latest hang including active session history
25/5/10: Oracle have suggested running queries to check for skewness in allocation of memory between subpools - need to know if one pool has lots of free memory whilst other one is starved. Latest hang occurred with 1.5Gb of free memory so need to check subpool allocation.
26/5/10: Initial investigation has shown correct behaviour for subpools. Further data has been requested to confirm this.
27/5/10: Customer has been recommended to flush shared memory prior to recycling instances. This is being evaluated as a workaround to recycling by customer and will also provide Oracle Engineering with more data re: behavior. Oracle Engineering have been able to reproduce similar behavior though not to same extent.
28/5/10: Oracle Engineering are working on a fix based on behaviour found in testcase. Further details re: ETA will be provided end of Saturday US Time.

Actions
Customer / onsite Oracle team to implement following suggestions:-
1. Set _object_statistics=false (completed)
2. Set event 14532 at level 1 to enable the fix for bug 5618049. (completed)
3. Apply fix for bug 8528171 (provided in patch 9668554). (completed)
4. Install merge patch 9734573 (includes bug 7306915). (production 30th May)


5. Shared pool/library cache tuning activity - application fixes RA1005 6a (partition DDL/truncate fix), RA1005.7 and .9 (bind variable fixes) (due for implementation weekend of 6th June)

To assist with investigation:-
1. Run queries to check for skewness in allocation of memory between subpools
2. Provide Oracle with script used to attempt to reproduce issue, this will be useful so Oracle can assist with reproduction.
3. Flush shared pool prior to daily recycle and provide stats back to Oracle.
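
The subpool skew check requested above (is one pool holding lots of free memory whilst another is starved?) can be illustrated offline with a short Python sketch. It is not the query Oracle supplied, which is not reproduced in this report. The function name, the threshold and the per-subpool figures are all hypothetical; only the 1.5Gb total of free memory comes from the report.

```python
MB = 1024 * 1024


def find_starved_subpools(free_bytes, min_share_of_even=0.25):
    """Return (total_free_bytes, starved_subpool_ids).

    free_bytes maps a subpool number to its free memory in bytes.
    A subpool is flagged as starved when its free memory is below
    min_share_of_even times an even split of the total free memory.
    The 0.25 default is an arbitrary illustrative threshold.
    """
    if not free_bytes:
        raise ValueError("no subpools supplied")
    total = sum(free_bytes.values())
    if total == 0:
        # No free memory anywhere: every subpool is starved.
        return 0, sorted(free_bytes)
    even_share = total / len(free_bytes)
    starved = sorted(
        pool for pool, free in free_bytes.items()
        if free < min_share_of_even * even_share
    )
    return total, starved


# Hypothetical figures: 1.5Gb free in total, unevenly spread over 4 subpools.
sample = {1: 1200 * MB, 2: 250 * MB, 3: 40 * MB, 4: 46 * MB}
total, starved = find_starved_subpools(sample)
print(f"free: {total // MB} MB, starved subpools: {starved}")
# prints "free: 1536 MB, starved subpools: [3, 4]"
```

In this made-up case a hang could occur with 1.5Gb free overall because subpools 3 and 4 each hold well under a quarter of an even share.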

SR Next Action: Customer (Customer Working)
Product: Oracle Server - Enterprise Edition
SR: 3-1777735201
Bug: 9734782
Dates: 20-MAY-10, 20-MAY-10
SR Sev: 1-Critical
SR Description: Hang on May 19th, May 23rd - ORA7445 errors
SR Status: Implement Solution ->No Change

Status
Customer experienced hang around 4:15 on May 19 - client screens hung for around 4 mins. Further hang on Sun 23 May 5:30 am.
Fujitsu highlighted this had no impact to post office branches. Issue appears to match bug:7306915 "EXCHANGE PARTITION leaks "KGL handles" shared pool memory" fixed in 10.2.0.4.4 - support to request backport to 10.2.0.4.3 which is compatible with previously supplied patches. Customer has run script to collect log file sync diagnostic information - support reviewing.
20/05/10 -> Awaiting patch and further findings from diagnostic information provided
21/05/10 -> New version of patch provided to customer (contains patch from SR3-1610271271 plus bug 7306915)
24/5/10: Further hang experienced 5:30 am May 23

Actions
Customer to test and apply patch 7306915 - plan to put into ‘OL tonight, production on weekend of 29/30th May

SR Next Action: (Review Defect)
Product: Oracle Universal Installer
SR: 3-1782093091
Esc To: FEMARTIN
Bug: 8528171, 9749642
Dates: 21-MAY-10, 23-MAY-10
SR Sev: 2-Significant ESC
SR Description: Problem applying PSU 10.2.0.4.4
SR Status: Prepare Solution

Status
Overlay patch to be provided which will contain Patch 9668554
10.2.0.4.4 does contain several fixes which could be relevant including partition and performance related fixes such as Long "log file sync" wait in RAC
24 May 10: Customer to consider 10.2.0.4.4 as part of longer term maintenance plan
25 May 10: With Oracle to deliver overlay patch
26/05/10: Escalated to Critical Accounts Manager, customer requires merge patch to continue with testing 10.2.0.4.4. Customer plans to implement into production on 6th June.
27/05/10: Merge patch is complex and is unlikely to be delivered prior to 6th June. Engineering are reviewing extent of code changes required.
28/5/10: Patch failed due to merge conflict caused by overlay patch 7189722. Customer has workaround to avoid application of patch and can continue testing. A corrected official patch will be provided.

Actions
Oracle to provide corrected official merge patch.

Critical Production 25 June 10: Roll out to 12000 Post Offices due to complete

#: 4
Cus Pty: 4
SR Next Action: Customer / Oracle (Development Working)
Product: Oracle Server - Enterprise Edition
SR: 2-5924536
Esc To: PROLLAND
Bug: 7345904, 19367294, 9662623
Dates: 17-AUG, 10-
SR Sev: 2-Significant ESC
SR Description: Streams Apply performance degrades over time
SR Status: Identify Problem ->No Change

Status
Desire to be up to date at target 1.5 hours after batch. Now varies from 4-18 hours. Customer to upload an export of sys.streams$_apply_spill_txn by doing a copy of the table to the schema of an ordinary user via create table copy_streams$_apply_spill_txn as select * from sys.streams$_apply_spill_txn. Oracle consultant has made tuning suggestions and advised to re instantiate Streams as current streams is far behind
12/05/10-> Information provided to Dev for analysis - WITH DEV
14/05/10-> Dev investigating further. Customer is re-enabling streams on adhoc basis as business requirements demand. They are aware of impact to performance. It is not enabled at peak periods.
17/05/10-> Dev reviewing uploaded files, awaiting feedback
19/05/10-> Assigned to development for further investigation.

Actions
Oracle dev to investigate further.
Customer to upload opatch inventory to SR.

SR Next Action: Customer (Customer Working)
Product: Oracle Server - Enterprise Edition
SR: 3-1742179041
Dates: 07-MAY-10, 13-MAY-10
SR Sev: 2-Significant
SR Description: Node Evictions on a 4 Node RAC (PRODUCTION)
SR Status: Implement Solution ->No Change

Status
Nodes are evicted from cluster due to fatal heartbeat failures. This is intermittent and customer is unable to reproduce issue. Current analysis indicates potential network issue may be root cause and support have requested further data. Customer has installed oswatcher to assist with analysis of more critical issues.
13/05/10-> Customer to provide oswatcher data on next occurrence.
14/05/10-> Status unchanged
19/05/10-> Customer has made some changes to settings and no node evictions seen since, would like Oracle to make recommendations as to settings to tune the interconnect
21/05/10-> tcp tuning recommendations provided

Actions
Customer to implement operating system tuning recommendations.

SR Next Action: Customer (Customer Working)
Product: Oracle Server - Enterprise Edition
SR: 3-1721140441
Esc To: BLEVE
Dates: 29-APR-10, 10-MAY-10
SR Sev: 2-Significant ESC
SR Description: ASM instance crashed when diskspace added
SR Status: Implement Solution ->No Change

Status
This issue may be related to SR3-1610271271 as read/write accesses are performed by the database instance directly and so a lock from the database instance could impact the ASM instance which needs access to the same location.
13/05/10-> Will review this SR once SR3-1610271271 has been progressed. Support to monitor.
14/05/10-> Status unchanged
19/05/10-> No update

Actions
Customer to implement solution for SR3-1610271271 then review this issue.
