FUJ00173153
Peak Incident Management System
Call Reference PC0261282 Call Logger Deleted User -- Live Supp.Test
Release Targeted At -- HNG-X 15.31 Top Ref. WIN_ITM_OS_AGENT_CFG_1520_D001
Call Type Cloned call Priority C -- Progress restricted
Contact Deleted Contact Call Status Closed -- Build Fix Available to Call Logger
Target Date No Forcast Effort (Man Days) 0
Summary: ‘The Monitoring Agent for Windows OS - Primary’ pid is using 4.7gb of memory (C:\IBM\ITM\TMAITM~1\kn
All References Type Value
DevIntRel-Director Live Supp.Test
Clone Master PC0261026
Release PEAK PC0269263
MSC 04350457573
TRIOLE for Service A16497108
Release PEAK PC0261780
DevIntRel-Director Live Supp.Test
Release PEAK PC0264301
Product Baseline WIN_ITM_OS_AGENT_CFG_1520_D001
Release PEAK PC0264293
Product Baseline WIN_ITM_OS_AGENT_CFG_1520_V001
Impact
St see User Date
Gerald Barnes 04-Aug-2017 17:49:26
SMG have suspended the saving of events because of this bug. This is a security issue.
The problem has uncovered an inefficiency in the sealer. It is repeatedly checking folders to see whether
anything needs to be done in a hard loop. It is always good practice to put a sleep of some duration if there
is nothing that needs to be done so resources will be freed to do other things. This fix should make, for
example, prosecution queries quicker than before.
Progress Narrative
Date:10-Aug-2017 15:03:19 User:David Bower
CALL PC0261282 opened
Details entered are:-
Summary: ‘The Monitoring Agent for Windows OS - Primary’ pid is using 4.7gb of memory (C:\IBM\ITM\TMAITM~1\kn
Call Type:
Call Priority:C
Target Release:HNG-X Rel. Ind.
Routed to:Live Supp.Test - David Bower
Date:02-Aug-2017 13:42:50 User: Customer Call_
CALL PC0261026 opened
Details entered are:-
Summary: ‘The Monitoring Agent for Windows OS - Primary’ pid is using 4.7gb of memory (C:\IBM\ITM\TMAITM~1\kn
Call Type:L
Call Priority:
Target Release:HNG-X Rel. Ind.
Routed to:E
Unassigned
Date/Time Raised: Aug 2 2017 12:32PM
Priority: C
Contact Name:
Contact Phone:
Originator: X
Originator's
Product Serial No:
Product Site:
Reference: A16497108
Transfer Note: Please pass to Tivoli-Dev via PEAK, thanks
Below mail was received from Michael Greene
From: Greene, Michael
Sent: Wednesday, August 02, 2017 1:
To: FC.IN.POA_SMC
Subject: S Call
Hi SMC, please raise a call for the following,
Priority : P(3)
Description : Service Name - ‘KNTCMA Primary’, ‘The Monitoring Agent for Windows OS - Primary’ pid is using 4.7gb of
memory (C:\IBM\ITM\TMAITM~1\kntcma.exe)
Please pass call to ‘POA-HNG NT Support’
Thanks
Michael Greene
FUJITSU
2017-08-02 12:32:36 [ Sahanir, Rajkumar ]
INIT : Create a new request/incident /problem/change/issue
2017-08-02 12:35:13 [ Sahanir, Rajkumar ]
zneut_en_poa : Transfer Notification
2017-08-02 12:35:13 [ Sahanir, Rajkumar ]
zneun_en_poa : Open Notification
2017-08-02 12:35:43 [ Sahanir, Rajkumar ]
zneut_en_poa : Transfer Notification
2017-08-02 12:38:28 [ Greene, Michael ]
LOG : Noticed that the pid for ‘Monitoring Agent for Windows OS - Primary’ service on LPRPARC201 was using 4.7gb of memory, (pid
C:\IBM\ITM\TMAITM~1\kntcma.exe), server has 8gb and memory was over 80% utilized. Service was stopped and started and memory has
been freed up.
C:\IBM\ITM\TMAITM~1\kntcma.exe details
File Version : 6.3.0.0
Product Version : 6.3.0.0
Size : 2.28mb
Date Modified : 14/07/2017 10:55
I'll attach log files from ‘C:\IBM\ITM\TMAITM6_x64\logs’ to PEAK.
The pid is using 623mb memory on [IRRELEVANT]
Please pass to Tivoli-Dev via PEAK to investigate, thanks.
2017-08-02 12:42:12 [ Greene, Michael ]
zneut_en_poa : Transfer Notification
Date:02-Aug-2017 13:57:01 User:Joe Harrison
Product Infrastructure -- Tivoli (version unspecified) added.
Date:02-Aug-2017 13:57:48 User:Joe Harrison
The Call record has been transferred to the team: Tivoli-Dev
Progress was delivered to Consumer
Date:02-Aug-2017 1.
1:12 User:Michael Greene
Evidence Added -
Date:02-Aug-2017 14:14:36 User:Shaun Wood
The Call record has been assigned to the Team Member: Shaun Wood
Progress was delivered to Consumer
Date:02-Aug-2017 14:51:05 User:Shaun Wood
Target Date/Time updated: new value is 31/12/9999 00:00
[Start of Response]
I have checked platforms on LST HDCR; the ARC201 has similar issues to live. This is a 4GB machine which is at 88% memory and 55%
cpu; the kntcma.exe was using 1.9gb of memory, so nearly half the memory.
I have checked other Windows 2012 platforms on LST, as all Windows 2012 are running ITM OS Agent 6.3.0.6, but none have memory
usage as high as the ARC201 platform.
TEM201 193,516k
C201 50,184k
SSC201 40,400k
I have checked IBM; there is a Fix Pack 7 available, i.e. 6.3.0.7, but nothing documented about memory leaks. I did find APAR
IV62549 for a memory leak issue on the Windows OS Agent but this was fixed in 6.3.0.5. It may be that 6.3.0.6 re-introduced the
issue?
I will stop/start the ITM OS Agent on [IRRELEVANT] on LST and monitor, plus I will get a PMR logged with IBM if over the next few
days we see an increase in memory usage on the Live and LST ARC201 platforms.
[End of Response]
Response code to call type L as Category 40 -- Pending -- Incident Under Investigation
Response was delivered to Consumer
Date:02-Aug-2017 14:58:53 User:Shaun Wood
ITM OS Agent restarted on @ 02/08/17 14:55
Checked memory usage after this which was
40,904
40,968
40,984
40,968
40,916
40,900
This shows a low memory usage which goes up/down as we'd expect. I will check again tomorrow.
Date:02-Aug-2017 15:22:56 User:Shaun Wood
I have asked Michael to keep an eye on the live ARC201.
Date:03-Aug-2017 15:40:43 User:Shaun Wood
We've just hit another issue with [IRRELEVANT] as ITM OS Agent is using 5.5GB. There have been a large number of security events
generated, 1.8million since 15:04, and the log has been overwritten.
I suspect the ITM OS agent is grabbing memory to read all of the events as it does provide details of OS Log Files.
According to Michael:
[03/08/2017 15:33] Greene, Michael:
I can see a lot of audit type security events against the sealer.exe - An attempt was made to access an object. then The handle
to an object was closed - @ 15:04:25, thousands of them
thats whats filled the sec event log up
probably need to relax the audit settings
Date:03-Aug-2017 15:54:49 User:Shaun Wood
Looking at the Security log there are vast amounts of Audit Success Events around 15:05 of ID 4663 and 4658 for auditsvrcomp.
The Security log goes from 15:04 to 15:48, so there are 1.8 million records in 44 minutes. Of these 1,833,927 records, 1,825,830 are
Audit Success.
So based on this rate of 2.4 million security events per hour this server alone will rack up 59 million security events per day.
This is being done due to a new security measure for auditing. I would question what is running which is creating so many of these
events, as these are Success so this looks to be normal running, which I'm guessing will only increase as we move into R16 & R17
as most systems will be audited.
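The extrapolation in the entry above can be reproduced as a quick check. The following is an editorial sketch in Python, not part of the original call record; the function names are hypothetical, and it assumes the rate seen in the 44-minute window stays constant for a full day. With the rounded 1.8 million figure it gives about 2.45 million events per hour and about 58.9 million per day, in line with the 2.4 million and 59 million quoted; the exact count of 1,833,927 gives slightly higher values.

```python
# Figures quoted in the log entry (Security log from 15:04 to 15:48).
TOTAL_EVENTS = 1_833_927    # all records in the Security log
AUDIT_SUCCESS = 1_825_830   # records that are Audit Success
WINDOW_MINUTES = 44

def per_hour(events: int, minutes: int) -> float:
    """Scale an event count observed over `minutes` to one hour."""
    return events * 60 / minutes

def per_day(events: int, minutes: int) -> float:
    """Scale an event count observed over `minutes` to 24 hours."""
    return per_hour(events, minutes) * 24

if __name__ == "__main__":
    for label, count in (("rounded", 1_800_000), ("exact", TOTAL_EVENTS)):
        print(f"{label}: {per_hour(count, WINDOW_MINUTES):,.0f}/hour, "
              f"{per_day(count, WINDOW_MINUTES):,.0f}/day")
    print(f"Audit Success share: {AUDIT_SUCCESS / TOTAL_EVENTS:.2%}")
```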
Date:03-Aug-2017 15:59:08 User:Shaun Wood
I have now stopped and disabled the ITM OS Agent so that we don't hit this issue. In order to progress this issue I need Gerald
Barnes to check the platform to explain why we are getting so many security events for Audit. Is this to be expected? If so then
we may need to consider relaxing the security auditing, as this will also be creating millions of events to go into audit, which
I'm sure will be 100 times more or higher than the current system. I won't raise a call with IBM at this moment as I suspect
they will just advise us to reduce event loads, as we don't have any issues on other platforms.
I will pass this over to the audit team.
Date:03-Aug-2017 15:59:26 User:Shaun Wood
The Call record has been transferred to the team: Audit-Dev
The Call record has been assigned to the Team Member: Gerald Barnes
Progress was delivered to Consumer
Date:03-Aug-2017 11
8:15 User:Gerald Barnes
[Start of Response]
I have sent an email to Dave Haywood asking whether we can stop generating these success events.
I have no reason to believe it is anything other than BAU.
[End of Response]
Response code to call type L as Category 40 -- Pending -- Incident Under Investigation
Response was delivered to Consumer
Hours spent since call received: 4 hours
Date:04-Aug-2017 10:24:58 User:Dave Haywood
Before I agree to considering relaxing event logging on the ARC servers, I would like to understand why the auditsvrcomp userid
is (I presume) opening and closing so many files over such a short period of time. The evidence doesn't seem to contain details
of which files are being accessed and why. I would like to rule out a software issue that is causing a large number of events to
be logged. Please provide some analysis of which files are being opened / closed, at what rate and why.
The events in question are:
An attempt was made to access an object - Event ID 4663
The handle to an object was closed - Event ID 4658
Please supply further analysis / evidence as requested above.
Date:04-Aug-2017 17:44:10 User:Gerald Barnes
Product HNG-X Platforms -- Audit Server (ARC) (version:2) added.
Date:04-Aug-2017 17:49:26 User:Gerald Barnes
A new Business Impact has been added:
SMG have suspended the saving of events because of this bug. This is a security issue.
The problem has uncovered an inefficiency in the sealer. It is repeatedly checking folders to see whether anything needs to be
done in a hard loop. It is always good practice to put a sleep of some duration if there is nothing that needs to be done so
resources will be freed to do other things. This fix should make, for example, prosecution queries quicker than before.
Date:04-Aug-2017 18:08:52 User:Gerald Barnes
Development Cost updated: new cost is 2 (Man Days)
[Start of Response]
DEVELOPMENT IMPACT OF FIX:
SPECIFY THE HNG-X PLATFORMS IMPACTED:
The platform has been specified and it is the audit server.
TECHNICAL SUMMARY:
In routine RGSchedule of SealContol.c it gets into a hard loop of checking:
D:\Archiveserver\CONTROL\SEALER_MODULE
D:\Archiveserver\INTERFACES\IMPORT_CAT\Data
D:\Archiveserver\CONTROL\SEALER_2_MODULE
D:\Archiveserver\INTERFACES\IMPORT_CAT\Md5
waiting for something extra to do.
This is not efficient.
It will be wasting a lot of machine resources doing this.
The code does sleep for a second or so in the loop when there is absolutely nothing to do.
It has multiple threads and the problem occurs when some threads are doing things and it is trying to decide whether to start
another one or not.
So in conclusion a sealer fix is required.
This fix will greatly reduce the number of events and make processing much more efficient at the same time!
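The behaviour described in the technical summary can be illustrated with a minimal sketch. This is an editorial illustration in Python, not the sealer's actual C code, which does not appear in this record; every name below (poll_folders, has_work, start_worker, should_stop, idle_sleep_s) is hypothetical. It assumes the fix amounts to sleeping after any scan that starts no new work, including scans made while other worker threads are still busy, rather than only when the sealer is completely idle.

```python
import time
from typing import Callable, Iterable

def poll_folders(
    folders: Iterable[str],
    has_work: Callable[[str], bool],       # hypothetical: does this folder need processing?
    start_worker: Callable[[str], None],   # hypothetical: hand the folder to a worker thread
    should_stop: Callable[[], bool],       # hypothetical: shutdown check
    idle_sleep_s: float = 1.0,             # assumed interval; the log suggests it may be made configurable
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Scan the folders until asked to stop; return the number of scans made."""
    folder_list = list(folders)
    scans = 0
    while not should_stop():
        started_any = False
        for folder in folder_list:
            if has_work(folder):
                start_worker(folder)
                started_any = True
        scans += 1
        # Back off whenever this scan started nothing new, even if other
        # workers are still running. Without this the loop re-opens the same
        # folders as fast as it can, and each open/close is audited.
        if not started_any:
            sleep(idle_sleep_s)
    return scans
```

With a one-second back-off, an idle scanner touches each folder about once a second instead of hundreds of times a second, which is the reduction the risk assessment further down describes.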
LIST OF KNOWN DIMENSIONS DESIGN PARTS AFFECTED BY THE CHANGE:
AUDIT_SERVER_APP_V2
DEPENDENCIES:
There are no dependencies.
DEPLOYMENT DETAIL:
Replacement files to be supplied during the evening backup.
DEV EFFORT IN MANDAYS:
2 man days. I have another fix to work on which may need to be done first for 16.21. We may decide to schedule this first in
which case I can start immediately.
IMPACT ON USER:
It will speed things up for SecOps though I am not sure by how much.
IMPACT ON OPERATIONS:
They will be able to harvest events again.
HAVE RELEVANT KELS BEEN CREATED OR UPDATED?
No KEL is needed from the audit team.
IMPACT ON TEST:
They need to check that gathering, ARQs and the evening robocopy work as before without filling up the event log.
RISKS (of releasing and of not releasing proposed fix):
Releasing
I cannot see any disadvantage.
Not releasing
We will continue to get flooded with these audit success events.
We will continue to needlessly keep checking the same folders hundreds of times a second when it would be sufficient to do it
once a second.
LIST OF LIKELY DELIVERABLES:
sealer.exe definitely
We may decide to make the sleep configurable so as to fine tune the fix later.
In this case additionally -
archive.exe
configDLL.dll
Deleter.exe
Gatherer.exe
Messages.dll
Retriever.exe
sealer.exe
Bootle\ConfigurationFile.txt
Wigan\ConfigurationFile.txt
Wigan\ConfigurationFile_DR.txt
[End of Response]
Response code to call type L as Category 55 -- Pending -- Live Fix Impact Supplied
Response was delivered to Consumer
Hours spent since call received: 7 hours
Date:04-Aug-2017 18:10:17 User:Gerald Barnes
The call Target Release has been moved to Proposed For -- HNG-X 15.21
Date:04-Aug-2017 18:10:46 User:Gerald Barnes
Action placed on Team:BIF
Date:07-Aug-2017 10:37:33 User:Jubita Gurung
The call Target Release has been moved to Targeted At -- HNG-X 15.20
Date:07-Aug-2017 10:58:57 User:Jubita Gurung
BIF approved and targeted at 15.20
Date:07-Aug-2017 10:59:01 User:Jubita Gurung
Action has been removed from the call
Date:07-Aug-2017 11:28:12 User:Shaun Wood
After discussing this issue with John Bradley I have raised PMR Ref 16388,019,866 with IBM as we don't think their agents should
be utilising so much memory and need to know if there is a way of disabling checking event logs as we have Netcool monitoring the
Windows event logs.
Date:07-Aug-2017 11:54:30 User:Gerald Barnes
Reference Added: Jira POA-2216
Date:08-Aug-2017
0:01 User:Dimensions Automated User
Reference Added: Product Baseline AUDIT_SERVER_APP_V2_1520_V019
Reference Added: Product Baseline AUDIT_SERVER_APP_V2_1520_V019-V009
Date:08-Aug-2017 19:18:30 User:Gerald Barnes
[Start of Response]
Partially fixed by version 15.20.0.5 of sealer.exe.
If the sealer is not busy then you will get 259,200 of these success events per day.
This would increase to a maximum of 10 times this number if the sealer was very busy all the time, which would never be the case.
So even in the very worst case there will be far less than 59 million security events a day.
[End of Response]
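The figures in the response above can be checked with simple arithmetic. The sketch below is an editorial illustration in Python, not part of the original record. It shows that 259,200 events per day is exactly 3 events per second over 24 hours; reading the figure that way is an inference from the arithmetic, not something the log states. It also compares the idle and worst-case volumes with the 59 million per day estimated earlier in the call.

```python
SECONDS_PER_DAY = 24 * 60 * 60           # 86,400

idle_per_day = 259_200                   # quoted: sealer not busy
worst_case_per_day = idle_per_day * 10   # quoted: at most 10 times the idle figure
previous_per_day = 59_000_000            # estimate made before the fix

# 259,200 / 86,400 is exactly 3 events per second.
idle_per_second = idle_per_day / SECONDS_PER_DAY

if __name__ == "__main__":
    print(f"idle: {idle_per_second:g} events/second, {idle_per_day:,} per day")
    print(f"worst case: {worst_case_per_day:,} per day")
    print(f"reduction vs 59 million/day: about {previous_per_day / worst_case_per_day:.0f}x "
          f"(worst case) to {previous_per_day / idle_per_day:.0f}x (idle)")
```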
Response code to call type L as Category 46 -- Pending -- Product Error Fixed
Response was delivered to Consumer
Hours spent since call received: 15 hours
Date:08-Aug-2017 19:18:35 User:Gerald Barnes
Defect cause updated to 14: Development - Code
Date:08-Aug-2017 19:18:48 User:Gerald Barnes
The Call record has been transferred to the team: Dev-Int-Rel
Progress was delivered to Consumer
Date:09-Aug-2017 08:30:01 User:Dimensions Automated User
Reference Added: Product Baseline AUDIT_SERVER_APP_V2_1520_D019-D009
Date:09-Aug-2017 12:03:20 User:PIT Automated User
[Start of Response]
Peak 0261026 handled by integration auto handler
The following baselines attached to this peak have the targeting flags set:
AUDIT_SERVER_APP_V2_1520_D019-D009 FOR (LIVE:YES TEST:YES RDT:YES) Integrator: Geoff Inglis
These baselines have completed integration testing, moving to holding stack awaiting peak ejection.
[End of Response]
Response code to call type L as Category 47 (Fix Processed by PIT)
The incident has been transferred to the Team: Int-Rel
Progress was delivered to Consumer
Date:09-Aug-2017 12:05:53 User:PIT Automated User
[Start of Response]
## AUTOMATED UPDATE - INTEGRATION PEAK BOT ##
Fix processed by integration, routing to dev-int-rel director...
PLEASE NOTE: If this fix has failed, to send this peak back to integration it MUST have the response code Fix Failed or Response
Rejected on it, otherwise the peak will bounce.
[End of Response]
Response code to call type L as Category 49 (Fix Available for IndependentTest)
The incident has been transferred to the Team: Live Supp.Test
Progress was delivered to Consumer
Date:09-Aug-2017 15:34:57 User:Victoria Griffin
Reference Added: Release PEAK PC0261232
Date:10-Aug-2017 14:07:37 User:Shaun Wood
I need to get this call cloned so that I can test / change the ITM OS Agent as per advice from IBM.
Date:10-Aug-2017 15:03:19 User:David Bower
Call cloned from original call:PC0261026 by User:David Bower
Date:10-Aug-2017 15:04:30 User:David Bower
The Call record has been assigned to the Team Member: David Bower
Date:10-Aug-2017 15:05:05 User:David Bower
The Call record has been transferred to the team: Tivoli-Dev
The Call record has been assigned to the Team Member: Shaun Wood
Date:10-Aug-2017 16:55:16 User:Shaun Wood
This call will be used to progress the changes provided by IBM. I will raise an Emergency MSC to make the changes on [IRRELEVANT]
tomorrow to test, as this platform has the issue and so will prove if the IBM changes are successful.
Date:11-Aug-2017
Reference Added:
Date:11-Aug-2017
9:54 User:Shaun Wood
Target Date/Time updated: new value is 31/12/9999 00:00
[Start of Response]
MSC raised to update KNTENV file on [IRRELEVANT] to address memory issues. Once this has been implemented we will then need to
monitor for a few days to confirm this has addressed the issue. A formal fix will then be delivered.
[End of Response]
Response code to call type C as Category 41 -- Pending -- Product Error Diagnosed
Date:11-Aug-2017 15:22:21 User:Shaun Wood
MSC has been implemented, the ITM OS Agent has started and is running fine. I have monitored memory for 5 mins; this has stayed
fairly static around 41,000k. I will inform NT and then check the server again next week.
Date:14-Aug-2017 10:22:16 User:Shaun Wood
I have just checked the ITM OS Agent on [IRRELEVANT]; the memory usage is at 42,656k, which looks fine as there have been millions
of events, so the agent is no longer consuming memory like it did prior to the changes. I will continue to monitor for the rest
of this week; if all still looks fine I will get a formal release sorted.
Date:14-Aug-2017 10:29:20 User:Shaun Wood
[Start of Response]
I will action QFP as I'm not sure what target release I should use for this Peak. Gerald has delivered his fix at R15.20, which I
guess now needs to go through LST as a hot fix; this ITM OS Agent change also needs to do the same, so R15.20 also? The
WIN_ITM_OSAGENT_V001 was delivered at R15.20 so I'd just need a V002-V001 incremental.
[End of Response]
Response code to call type C as Category 40 -- Pending -- Incident Under Investigation
Date:14-Aug-2017 10:29:38 User:Shaun Wood
Action placed on Team:QFP Forum
Date:17-Aug-2017 17:44:30 User:Shaun Wood
[Start of Response]
I have just checked the ITM OS Agent on [IRRELEVANT]; the memory usage is at 44,156k, which confirms that we no longer have an issue,
so I now need to deliver a formal fix. QFP will need to sanction this and target; I'll propose R15.20 as Gerald has delivered his
fix at this release.
[End of Response]
[End of Response]
Response code to call type C as Category 41 -- Pending -- Product Error Diagnosed
Date:17-Aug-2017 17:44:45 User:Shaun Wood
The call Target Release has been moved to Proposed For -- HNG-X 15.20
Date:18-Aug-2017 09:03:52 User:Nick Lawman
The call Target Release has been moved to Targeted At -- HNG-X 15.20
Date:21-Aug-2017 12:37:08 User:Shaun Wood
Action has been removed from the call
Date:23-Aug-2017 13:50:02 User:Dimensions Automated User
Reference Added: Product Baseline WIN_ITM_OS_AGENT_CFG_1520_V001
Date:23-Aug-2017 13:
2:13 User:Shaun Wood
[Start of Response]
New ITM OS Agent config product released to amend agent values to address memory issues. This now needs to be installed onto all
Windows 2012 Servers as a top-up to address this issue, which has been tested on the live platform.
[End of Response]
Response code to call type C as Category 48 -- Pending -- Fix Released to PIT
Date:23-Aug-2017 1:
2:19 User:Shaun Wood
The Call record has been transferred to the team: Dev-Int-Rel
Date:23-Aug-2017 14:25:01 User:Dimensions Automated User
Reference Added: Product Baseline WIN_ITM_OS_AGENT_CFG_1520_D001
Date:24-Aug-2017 14:41:59 User:Sarah Payne
The call Target Release has been moved to Targeted At -- HNG-X 15.31
Date:24-Aug-2017 14:42:30 User:Sarah Payne
Peak re-targeted to R15.31 as LST have signed off R15.21.
Date:24-Aug-2017
11 User:Karen Cooper
Reference Added:
< PC a0
Date:30-Aug-2017 11:48:46 User:Vijesh Pandya
The Call record has been transferred to the team: Live Supp.Test
Date:04-Sep-2017 14:16:05 User:Mark Ascott
The Call record has been assigned to the Team Member: David Bower
Date:26-Oct-2017 15:53:03 User:David Bower
[Start of Response]
Baseline installed on all LST Win 2012 servers and no issues encountered. This is a top up for changes that were tested by Shaun
Wood on [IRRELEVANT]. This has passed LST testing.
[End of Response]
Response code to call type C as Category 61 -- Final -- Build Fix Available to Call Logger
Routing to Call Logger following Final Progress update.
Date:26-Oct-2017 15:5:
1 User:David Bower
CALL PC0261282 closed: Category 61 Type C
Date:14-Nov-2017 15:36:48 User:Victoria Griffin
Reference Added: Release BE02 6425:
Date:14-Nov-2017 16:58:50 User:Victoria Griffin
Reference Added:
Date:17-Apr-2018 16:02:05 User:Jubita Gurung
Reference Added: Release PEAK PC0269263
Root Cause Development - Code
Logger Deleted User -- Live Supp.Test
Subject Product General/Other/Misc -- Unknown (version unspecified)
Assignee Deleted User -- Live Supp.Test
Last Progress 17-Apr-2018 16:02 -- Jubita Gurung