
Transaction Based Sizing for Create Incident using New REST API


Objective:

To determine the achievable throughput per unit of hardware for ITSM REST API integration, using the Create Incident operation.

Deployment Environment:

The environment for this benchmark was container-based. On virtual or physical hardware, the results are similar or negligibly better. The hardware specifications below were used to deploy the containers required for this benchmark. For virtual or physical hardware, the container controller can be ignored and the hardware specification can be read from the "Pod Configuration" column. The logical deployment architecture of BMC Helix ITSM is shown below (the container layer is not shown):


[Figure: Logical deployment architecture of BMC Helix ITSM for Create Incident using the new REST API]

Note:

The diagram above shows the standard architecture in our lab environment. For the REST API workload, all REST calls were serviced by the Platform-User AR Server. Related admin operations triggered by each incident creation were serviced by Platform-Admin; FTS indexing of newly created incidents was performed by Platform-FTS. The Mid Tiers in the diagram were not invoked during the REST API workload.

Hardware Specifications:

The hardware specifications used in the diagrammed architecture are as follows:

Container Environment

| Sno. | VM/Node | Pod Role | Node CPU (Cores) | Node Memory (GB) | Pod CPU Limit (Cores) | Pod Memory Limit (GB) | JVM Xms (GB) | JVM Xmx (GB) |
|------|---------|----------|------------------|------------------|-----------------------|-----------------------|--------------|--------------|
| 1 | VM01 | Master Node | 8 | 16 | - | - | - | - |
| 2 | VM02 | AR_Admin | 8 | 20 | 4 | 16 | 12 | 12 |
| 3 | VM03 | AR_FTS | 8 | 20 | 4 | 19 | 12 | 12 |
| 4 | VM04 | RSSO | 4 | 8 | 2 | 3 | 2 | 2 |
| 5 | VM05 | MT User | 6 | 10 | 4 | 8 | 6 | 6 |
| 6 | VM06 | AR_User | 6 | 14 | 4 | 12 | 8 | 8 |
| 7 | VM07 | Grafana | 8 | 12 | - | - | - | - |
| 8 | DB01 | Database (Physical box) | 20 | 256 | - | - | - | - |

Methodology:

Throughput is measured for the "Create Incident" use case using the REST API against the HPD:IncidentInterface_Create form.
The REST API template used is as follows:


   "values":{ 
      "First_Name":"${customer}",
      "Last_Name":"User",
      "Description":"REST API: Incident Creation by SILKPerformer",
      "Impact":"1-Extensive/Widespread",
      "Urgency":"1-Critical",
      "Status":"New",
      "Reported Source":"Direct Input",
      "Service_Type":"User Service Restoration",
      "Categorization Tier 1":""      +Tier1+      "",
      "Categorization Tier 2":""      +Tier2+      "",
      "Categorization Tier 3":""      +Tier3+      "",
      "Product Categorization Tier 1":""      +OpTier1+      "",
      "Product Categorization Tier 2":""      +OpTier2+      "",
      "Product Categorization Tier 3":""      +OpTier3+      "",
      "z1D_Action":"CREATE"
   }


The getMenu() and expandMenu() REST API calls are used to obtain the values for Product Categorization Tiers 1-3 and Operational Categorization Tiers 1-3.
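For illustration, the request sequence a load-generation thread issues can be sketched as below. This is a minimal sketch assuming the usual AR System REST API conventions (JWT login at /api/jwt/login, entry creation at /api/arsys/v1/entry/&lt;form&gt;, and an Authorization: AR-JWT header); the server URL and credentials are placeholders, and the categorization fields are omitted for brevity:

```python
# Sketch of the Create Incident REST call, assuming standard AR System REST
# API endpoints. AR_SERVER, username, and password are placeholders.
import json

AR_SERVER = "https://itsm.example.com"  # placeholder host


def build_login_request(username: str, password: str) -> dict:
    """POST form data to /api/jwt/login; the response body is an AR-JWT token."""
    return {
        "method": "POST",
        "url": f"{AR_SERVER}/api/jwt/login",
        "data": {"username": username, "password": password},
    }


def build_create_incident_request(token: str, customer: str) -> dict:
    """POST the 'values' payload to the HPD:IncidentInterface_Create form."""
    values = {
        "First_Name": customer,
        "Last_Name": "User",
        "Description": "REST API: Incident Creation by SILKPerformer",
        "Impact": "1-Extensive/Widespread",
        "Urgency": "1-Critical",
        "Status": "New",
        "Reported Source": "Direct Input",
        "Service_Type": "User Service Restoration",
        "z1D_Action": "CREATE",
    }
    return {
        "method": "POST",
        "url": f"{AR_SERVER}/api/arsys/v1/entry/HPD:IncidentInterface_Create",
        "headers": {
            "Authorization": f"AR-JWT {token}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({"values": values}),
    }
```

A driver would send the login request once per thread, then loop on the create-incident request with the issued token.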

With the specified hardware, various transaction rates are simulated for each workload. During each test run, throughput, average response time, and hardware resource utilization are monitored and measured.

Workload details:

The following table shows the planned workload. Each thread was targeted to create about 1,000 transactions per hour. Each transaction starts only after the previous one completes, so the targeted rate is approximate:

| REST API Threads | Expected Transactions/hr | Expected Transactions/min |
|------------------|--------------------------|---------------------------|
| 15 | 15000 | 250 |
| 20 | 20000 | 333 |
| 25 | 25000 | 417 |
| 30 | 30000 | 500 |
| 35 | 35000 | 583 |
| 40 | 40000 | 667 |
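The expected rates in the table follow directly from the per-thread target; a quick sketch of the arithmetic (per-minute values rounded to the nearest whole transaction):

```python
# Each worker thread targets ~1,000 transactions/hour, so:
#   tx/hr  = threads * 1000
#   tx/min = tx/hr / 60 (rounded)
def expected_rates(threads: int) -> tuple[int, int]:
    per_hour = threads * 1000
    per_minute = round(per_hour / 60)
    return per_hour, per_minute

for t in (15, 20, 25, 30, 35, 40):
    per_hour, per_minute = expected_rates(t)
    print(f"{t} threads -> {per_hour}/hr, {per_minute}/min")
```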

Note that each incident creation triggers related admin operations on the associated AR Admin Server. These operations are asynchronous and were not measured as part of this benchmark; however, the CPU and memory usage of the AR Admin Server was measured.
For reference, the related asynchronous operations are:

  • AR System Email Message
  • NTE:Notifier
  • NTE:Notifier Log
  • NTE:SYS-NT Process Control
  • HPD:Help Desk Assignment Log
  • HPD:Help Desk
  • SLM:Measurement

Load automation:

The workload was driven by Silk Performer, which simulates multiple threads concurrently interacting with the system by generating HTTP requests and processing HTTP responses, mirroring real-life usage. Silk Performer was also used to measure server response times from the client (user) perspective. It was deployed on a dedicated physical machine (20 CPU cores, 192 GB RAM) connected to the F5 load balancer in the test environment, which eliminates any potential resource constraint on the tool while executing the workload.

Test Data Volume:

The test data comprised foundation and application data for BMC Helix IT Service Management, BMC Service Request Management, and BMC Knowledge Management. This volume of data was preloaded (generated) into the database to simulate a real-life deployment.

Table 1 summarizes the foundation data inserted into the AR System database prior to starting the tests.

Table 1. IT Service Management foundation data

 

| Type | Count |
|------|-------|
| Companies (multi-tenancy) | 200 |
| Sites | 1,410 |
| People | 28,002 |
| People Organizations/Dept | 406 |
| People Application Permission Groups | 132,054 |
| Support Organizations | 18 |
| Support Groups | 106 |
| Support Group Functional Role | 48,006 |
| Assignments | 1,407 |

 

Table 2 summarizes the application data inserted into the AR System database prior to starting the tests.

Table 2. IT Service Management and Knowledge Management application data

| Type | Volume |
|------|--------|
| Incident | 502,949 |
| Change | 102,126 |
| Problem | 100,000 |
| Service Target | 965 |
| CI + relationships in two datasets | 9,981,281 |
| Incidents with CI Associations | 52,000 |
| Service CI | 5,026 |
| Knowledge Base Large Documents | 10,000 |
| Knowledge Base Small Documents | 80,000 |
| Knowledge Base Articles | 20,000 |

 

Table 3 summarizes the foundation data for Service Request Management.

Table 3. BMC Service Request Management foundation data

| Type | Count |
|------|-------|
| AOT | 138 |
| PDT | 239 |
| SRD | 138 |
| Navigational Categories | 629 |
| Service Requests | 504,270 |
| Entitlement Rules | 94 |

Results:

The following tests were carried out. There were no errors during the test runs.

Response times are shown as Avg / 90th percentile, in seconds.

| Use case | 15 Threads | 20 Threads | 25 Threads | 30 Threads | 35 Threads | 40 Threads |
|----------|------------|------------|------------|------------|------------|------------|
| Create Incident Using REST API | 1.17 / 1.30 | 1.37 / 1.65 | 1.64 / 2.0 | 1.78 / 2.54 | 1.99 / 2.92 | 2.34 / 3.13 |
| Expand Menu - OC Tier1 | 0.25 / 0.32 | 0.28 / 0.41 | 0.34 / 0.51 | 0.33 / 0.51 | 0.35 / 0.52 | 0.36 / 0.61 |
| Expand Menu - OC Tier2 | 0.28 / 0.33 | 0.35 / 0.44 | 0.39 / 0.53 | 0.38 / 0.52 | 0.41 / 0.54 | 0.41 / 0.64 |
| Expand Menu - OC Tier3 | 0.27 / 0.33 | 0.36 / 0.44 | 0.40 / 0.52 | 0.39 / 0.53 | 0.43 / 0.55 | 0.44 / 0.65 |
| Expand Menu - PCT Tier1 | 0.27 / 0.32 | 0.35 / 0.42 | 0.41 / 0.51 | 0.40 / 0.53 | 0.42 / 0.54 | 0.43 / 0.64 |
| Expand Menu - PCT Tier2 | 0.25 / 0.30 | 0.32 / 0.39 | 0.39 / 0.49 | 0.39 / 0.54 | 0.40 / 0.55 | 0.41 / 0.62 |
| Expand Menu - PCT Tier3 | 0.23 / 0.28 | 0.29 / 0.36 | 0.36 / 0.45 | 0.37 / 0.50 | 0.39 / 0.51 | 0.40 / 0.61 |
| Get Menu - OC Tier1 | 0.01 / 0.01 | 0.01 / 0.02 | 0.01 / 0.02 | 0.02 / 0.05 | 0.04 / 0.06 | 0.05 / 0.07 |
| Get Menu - OC Tier2 | 0.01 / 0.02 | 0.01 / 0.02 | 0.01 / 0.03 | 0.03 / 0.04 | 0.04 / 0.07 | 0.06 / 0.08 |
| Get Menu - OC Tier3 | 0.01 / 0.02 | 0.01 / 0.03 | 0.02 / 0.04 | 0.04 / 0.04 | 0.05 / 0.05 | 0.07 / 0.09 |
| Get Menu - PCT Tier1 | 0.01 / 0.02 | 0.01 / 0.02 | 0.02 / 0.04 | 0.02 / 0.04 | 0.05 / 0.06 | 0.06 / 0.07 |
| Get Menu - PCT Tier2 | 0.01 / 0.01 | 0.01 / 0.02 | 0.02 / 0.03 | 0.03 / 0.05 | 0.06 / 0.07 | 0.07 / 0.09 |
| Get Menu - PCT Tier3 | 0.01 / 0.01 | 0.01 / 0.02 | 0.02 / 0.04 | 0.02 / 0.04 | 0.04 / 0.06 | 0.05 / 0.06 |
| Total Incidents Created | 11262 | 13311 | 14784 | 17098 | 20411 | 21268 |
| Incidents/min | 188 | 222 | 246 | 285 | 340 | 354 |
| Incidents/sec | 3.1 | 3.7 | 4.1 | 4.7 | 5.7 | 5.9 |
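The Incidents/min and Incidents/sec rows follow from the totals, assuming a one-hour measurement window (consistent with Incidents/min = total / 60). A small sketch that also compares the achieved totals against the planned targets:

```python
# Totals achieved per thread count (from the results table above).
# Assumption: each test ran for a one-hour measurement window.
totals = {15: 11262, 20: 13311, 25: 14784, 30: 17098, 35: 20411, 40: 21268}

def rates(total: int) -> tuple[int, float]:
    """Return (incidents/min, incidents/sec) for a one-hour run."""
    return round(total / 60), round(total / 3600, 1)

for threads, total in totals.items():
    per_min, per_sec = rates(total)
    expected = threads * 1000  # planned target: ~1,000 tx/hr per thread
    print(f"{threads} threads: {per_min}/min, {per_sec}/s "
          f"({100 * total / expected:.0f}% of target)")
```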


| Threads | AR Admin Pod Avg CPU | AR Admin Pod Memory (GB) | AR User Pod Avg CPU | AR User Pod Memory (GB) |
|---------|----------------------|--------------------------|---------------------|-------------------------|
| 15 | 19.4% | 15.6 | 35.7% | 10.4 |
| 20 | 33.4% | 14.5 | 43.1% | 10.2 |
| 25 | 36.8% | 15.2 | 48.4% | 10.6 |
| 30 | 35.6% | 15.3 | 63.2% | 10.1 |
| 35 | 34.3% | 14.6 | 66.1% | 9.5 |
| 40 | 36.8% | 15.2 | 70.2% | 10.8 |

Analysis

A higher transaction rate can be observed with more worker threads, but AR Server CPU utilization went beyond the set threshold of 70%, and the response time per transaction became less than optimal. With a resource allocation of 4 vCPU/16 GB, approximately 21,268 transactions per hour can be achieved with an average response time of about 2.34 seconds.

When the transaction rate was increased for the same CPU/RAM allocation, the actual number of transactions fell below the expected number and the average transaction time increased.
This means that, for the given CPU/RAM allocation, the 40-worker-thread row in the table above represents the optimal throughput with respect to transaction time. The results also indicate that a higher transaction rate is possible if a slower response time is within tolerable limits.
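The per-core figure quoted in the Summary follows from the 40-thread result and the Platform-User pod's 4-core limit:

```python
# Per-core throughput at the best operating point (40 worker threads).
POD_CPU_CORES = 4              # Platform-User pod CPU limit
best_throughput_per_hour = 21268

per_core = best_throughput_per_hour // POD_CPU_CORES
print(per_core)  # 5317 transactions per hour per CPU core
```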

Scalability Tests

For horizontal scale testing, two user pods were used, each with a resource allocation of 4 vCPU/16 GB RAM. Correspondingly, the AR Admin pod resource allocation was increased to 8 vCPU/20 GB RAM to service the additional work.
The workload was executed with 60 worker threads (targeting 60,000 transactions per hour) to validate the horizontal scaling architecture.

Results:

The following two tables summarize the results and the corresponding resource usage.


60 Threads:

| Use case | Avg (s) | 90th % (s) |
|----------|---------|------------|
| Create Incident Using REST API | 2.81 | 3.67 |
| Expand Menu - OC Tier1 | 0.41 | 0.59 |
| Expand Menu - OC Tier2 | 0.42 | 0.58 |
| Expand Menu - OC Tier3 | 0.42 | 0.58 |
| Expand Menu - PCT Tier1 | 0.43 | 0.58 |
| Expand Menu - PCT Tier2 | 0.42 | 0.59 |
| Expand Menu - PCT Tier3 | 0.41 | 0.58 |
| Get Menu - OC Tier1 | 0.06 | 0.07 |
| Get Menu - OC Tier2 | 0.05 | 0.06 |
| Get Menu - OC Tier3 | 0.05 | 0.06 |
| Get Menu - PCT Tier1 | 0.05 | 0.06 |
| Get Menu - PCT Tier2 | 0.05 | 0.06 |
| Get Menu - PCT Tier3 | 0.05 | 0.06 |
| Total Incidents Created | 38374 | |
| Incidents/min | 640 | |
| Incidents/sec | 10.7 | |
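As a rough check of how close the horizontal scaling is to linear (comparing the two-pod, 60-thread run against the one-pod, 40-thread run; the thread counts differ, so this is only indicative):

```python
# Back-of-envelope scaling check: two user pods vs. one user pod.
single_pod = 21268   # tx/hr, 40 threads, 1 user pod (best single-pod point)
two_pods = 38374     # tx/hr, 60 threads, 2 user pods

speedup = two_pods / single_pod          # observed speedup from doubling pods
efficiency = speedup / 2                 # fraction of perfectly linear (2x)
print(f"speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

This works out to roughly a 1.8x speedup, or about 90% of perfectly linear scaling, consistent with the "nearly linear" claim in the Summary.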


| Threads | AR Admin Pod Avg CPU | AR Admin Pod Memory (GB) | AR User0 Pod Avg CPU | AR User0 Pod Memory (GB) | AR User1 Pod Avg CPU | AR User1 Pod Memory (GB) |
|---------|----------------------|--------------------------|----------------------|--------------------------|----------------------|--------------------------|
| 60 | 33.4% | 16.4 | 64.1% | 10.45 | 66.8% | 10.6 |

Configuration Settings:

For the entire benchmark, the Fast queue (RPC queue 390620) was set to a minimum of 16 and a maximum of 20 threads in ar.conf.
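For reference, min/max thread counts for an AR Server RPC queue are typically set with the Private-RPC-Socket option in ar.conf (ar.cfg on Windows). A plausible fragment for the setting above is shown below; treat the exact syntax as an assumption and verify it against the configuration reference for your AR System version:

```
# Fast queue (RPC program number 390620): min 16, max 20 threads
Private-RPC-Socket: 390620 16 20
```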

The following indexes were created on the database after an initial analysis of results:

USE [<Database_Instance>]
GO
CREATE NONCLUSTERED INDEX [IDX_<Name>]
ON [dbo].[T2318] ([C300364200],[C490008000])
INCLUDE ([C179],[C490009000])
GO



USE [<Database_Instance>]
GO
CREATE NONCLUSTERED INDEX [IDX_<Name>]
ON [dbo].[T2173] ([C301494900],[C490008000],[C490009000])
GO


Note:

Table T2318 backs the SLM:Measurement form.

Table T2173 backs the SLM:SLAComplianceHistory form.

Summary:

For pragmatic usage, the tests above can be summarized as follows:
For a single user-facing VM/node with 4 vCPU/12 GB, the throughput is 21,268 transactions per hour, at a rate of approximately 5.9 transactions per second, i.e., a throughput of 5,317 transactions per hour per CPU core.

The response times of the Expand Menu and Get Menu REST APIs are approximately 0.41 seconds and 0.05 seconds, respectively, to obtain the required menu lists. These response times may vary depending on the menu options.
Throughput gains diminish as more worker threads are added: with 35 worker threads the expected throughput was 35,000 transactions per hour, but the achieved rate of approximately 20,411 transactions per hour represents the maximum throughput on the given hardware, with average CPU utilization on the Platform-User AR Server at about 66%.
The REST API load was scalable and the throughput obtained was nearly linear (up to the limited horizontal scaling done for this benchmark): with two user-facing AR Servers of 4 vCPU/12 GB each, the throughput is 38,374 transactions per hour, at a rate of approximately 10.7 transactions per second.

Notes:
• The above results will vary depending on factors such as the environment, hardware, the specific REST API, other AR application customizations, and so on.
• The Platform-Admin AR Server must be sized correspondingly to handle the related transactions.


 


Remedy Deployment 20.02