
This topic provides answers to some frequently asked questions about performance and scalability.

What are the factors that I must consider for deployment sizing?

For information about the various factors that affect deployment sizing, see Variables that affect sizing and scalability.

How much storage do I require for my deployment?

The storage required for your deployment depends on the compression ratio, volume of data, and the data retention period.

Storage required = Compression ratio * Volume of data (per day) * Retention period (in days)

The compression ratio is the amount of compression you can achieve while storing the data.

This ratio can vary because of many factors, such as:

  • Type of data (with or without timestamp)
  • Size of data
  • Number of extracted fields
  • Number of unique field values (cardinality)

The following table shows the average storage compression ratio observed in the laboratory tests:

Data pattern                 Compression ratio    Comments
Apache Access                1.00 times           Raw text + timestamp + extracted fields
FreeText with Timestamp      0.78 times           Raw text + timestamp
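
As a hedged worked example of the storage formula above (the daily volume and retention values are illustrative, not recommendations):

    # Worked example of: Storage required = Compression ratio * Volume per day * Retention days
    compression_ratio = 1.00           # Apache Access pattern, from the table above
    data_volume_gb_per_day = 50        # illustrative daily indexed volume
    retention_days = 30                # illustrative retention period
    storage_gb = compression_ratio * data_volume_gb_per_day * retention_days
    print(storage_gb)                  # 1500.0 GB, or about 1.5 TB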

For more information, see Sizing drivers and their impact.

What is the impact of increasing the retention period?

For information about the impact of increasing the retention period, see Sizing drivers and their impact.

How many data collectors can I use with a single Collection Station or Collection Agent?

There is no theoretical limit to the number of data collectors that can be configured on a single Collection Station or Collection Agent.

However, we have tested up to 3,000 data collectors on a single Collection Station (or Collection Agent) running on an independent server that does not host the application being monitored. For optimal performance, BMC recommends that you stay within this limit, subject to the following scenarios:

  • If your collection mechanism (Collection Station or Collection Agent) is on the same server as the application being monitored: configure data collectors only for collecting data related to that application.
  • If your collection mechanism is on a server separate from the server where the application is installed: you can configure up to 3,000 data collectors.

What is the overhead of the Collection Agent on the target host (that also hosts the PATROL Agent)?

The overhead of the Collection Agent on the target host is minimal.

During the performance tests, the observed overhead was around 2% to 5% of CPU with 256 MB of RAM.

The following table provides some indicators from the performance tests carried out on a virtual setup using an Intel® Xeon® E5-2660 CPU @ 2.20 GHz and storage rated at 300 IOPS.

Polling frequency (in minutes)    Average CPU utilization    Number of data collectors    Data collection rate
1                                 2%                         1                            67 MB per day
1                                 2%                         5                            338 MB per day
1                                 2%                         10                           677 MB per day
1                                 4%                         10                           1.4 GB per day
1                                 6%                         10                           2.7 GB per day

Why do I need to allocate more memory if I increase the retention period?

By default, the product stores the collected data in time-based indices of 6 hours each. For each index, metadata is held in memory to optimize searches. The longer the retention period, the more indices are retained, and therefore the more metadata is held in the Indexer memory; the additional RAM requirement arises from this metadata.
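
As a hedged sketch of this relationship (the retention value is illustrative):

    # Number of 6-hour indices whose metadata is held in Indexer memory
    retention_days = 30                 # illustrative retention period
    index_interval_hours = 6            # default index interval, per this topic
    num_indices = retention_days * 24 // index_interval_hours
    print(num_indices)                  # 120 indices; longer retention means more metadata in RAM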

For more information, see Variables that affect sizing and scalability.

Why is my browser response time slow while browsing through the different tabs or pages in the product?

The browser response time can be slow due to the following reasons:

  • High CPU utilization on the BMC TrueSight IT Data Analytics server
  • A Java heap size for the Console Server that is set too low

For more information, see Component configuration recommendations and Troubleshooting performance issues.

Why does my Configuration Database process (or any other component process) end without any warnings or errors? 

The Java heap sizes of the component processes (including the Configuration Database) might be over-allocating the available RAM, in which case the operating system can terminate a process abruptly.

For more information about configuration recommendations for various components, see Component configuration recommendations.

How can I improve search performance, especially the Filters panel calculations?

Search performance can be affected by various factors, as described in the following table:

Reason                                                                                        Solution
High number of fields in the Filters panel                                                    Reduce the number of fields in the Filters panel
High-cardinality fields (fields with a large number of unique values) in the Filters panel   Remove the high-cardinality fields from the Filters panel
High time range of the search query                                                           Reduce the time range of the search query

The data collectors are set to a polling interval of 1 minute, but data collection is falling behind the scheduled poll time. What can I do?

Navigate to the agent.properties file of the Collection Station and increase the collection.thread.pool.size property value to 160. For more information, see Modifying the configuration files.

If the collection.thread.pool.size property is already set to 160 and the system seems to be running at full capacity, you can try increasing the polling interval.
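
For reference, here is a minimal sketch of the relevant entry in agent.properties (the property name and value come from this topic; your file will contain other entries as well):

    # agent.properties (Collection Station)
    collection.thread.pool.size=160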

A search command did not complete, and now the product user interface is very slow or unresponsive. What should I do?

You can increase the maximum Java heap size in the server.conf file. For more information, see Component configuration recommendations.

I found an OutOfMemoryError for the Java heap size in the Indexer logs. What can I do to prevent this error from recurring?

You can perform the following steps:

  • Increase the maximum Java heap size allocated to the Indexer in the indexer.conf file.
  • Modify the customIndexStrategies.yml file so that in the name: "data" section, the intervalInHrs property is set to either 2 or 1 (instead of the default 6); see the sketch after this list.
  • Remove high-cardinality fields (fields with a large number of unique values) from the Filters panel, on the left.
  • Reduce the number of concurrent searches.
  • Reduce the data retention period.
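
For reference, a minimal sketch of the customIndexStrategies.yml change from the second step (only the name and intervalInHrs keys come from this topic; the surrounding file structure is an assumption):

    # customIndexStrategies.yml (sketch; only "name" and "intervalInHrs" are from this topic)
    - name: "data"
      intervalInHrs: 2    # default is 6; 2 or 1 creates smaller indices and reduces metadata held in memory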

For more information, see Component configuration recommendations.

Why is the overall product slow, and how can I investigate the problem?

For information about the possible steps you can take, see Troubleshooting performance issues.

How many events can the product index per hour?

The number of events that can be indexed per hour depends on the average size of each event. In our performance tests, we focus on the size of the data indexed, because data volume is the measure most commonly used for the data that applications generate; for example, 100 GB of data per day on a reference hardware server.

In which situations can a log message take longer than expected to become searchable in the product?

The data might take longer than expected if the system is heavily loaded due to a large number of collectors. Also, if the system is stabilizing after a downtime, it attempts to index old data that remained pending in the Collection Station; in that case, the data takes longer than expected to become available for searching.

Is it mandatory to create a data pattern before I index any data?

No, it is not mandatory to create a data pattern before indexing data.

The product offers a list of default data patterns for most of the common log formats that you can use while creating a data collector. For more information, see Default data patterns.

If there is no matching data pattern, the product tries to identify a matching timestamp format (and treats all other data as raw data). Alternatively, you can index data as free text, in which case no timestamp is extracted from the log file and the time of indexing is associated with the entries. For more information, see Managing data patterns.

What is the network bandwidth required for using the product?

The network bandwidth requirements vary depending on the data generation rate and the type of data collectors used. The calculation is the same irrespective of the volume of data generated; bandwidth scales linearly with the data transfer rate.

For example, for data generated at 100 Kb per second, the network bandwidth required is approximately 60 Kbps (for data collected using a Collection Agent).

The following formulas illustrate the network bandwidth calculation; a short worked sketch follows the list:

  • Local file collection using Collection Agent: 0.6 * Data Transfer Rate (or Data Volume)

    Data flow
    Target host → Collection Station
    Target host → Collection Agent → Collection Station
  • SSH / Windows Share collection: 1.6 * Data Transfer Rate (or Data Volume)

    Data flow
    Target host → Collection Station
    Target host → Collection Agent → Collection Station
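
As a hedged sketch of these formulas (the 0.6 and 1.6 factors come from this topic; the function name and mechanism labels are illustrative):

    # Sketch of the bandwidth formulas above; names are illustrative
    def network_bandwidth_kbps(data_rate_kbps: float, mechanism: str) -> float:
        factors = {"local_file_collection": 0.6, "ssh_or_windows_share": 1.6}
        return factors[mechanism] * data_rate_kbps

    print(network_bandwidth_kbps(100, "local_file_collection"))   # ~60 Kbps, matching the example above
    print(network_bandwidth_kbps(100, "ssh_or_windows_share"))    # ~160 Kbps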

Which components should I scale?

The IT Data Analytics product helps you perform the following main functions:

  • Data collection (handled by the Collection Station)
  • Indexing (handled by the Indexer)
  • Search (handled by the Search component)

Based on your needs, you can split these functions across multiple servers and scale each of them independently.

Thus, you can consider scaling the Collection Station, Indexer, and Search components.

The following topics provide the recommended deployment scenarios for scaling.

When should I scale?

The following factors indicate that you might need to scale the Collection Station, Indexer, or Search components:

  • Product performance is deteriorating.
  • Hardware resources such as the processor, memory, storage, and disk I/O start exceeding acceptable limits.

For more information, see Indicators for scaling.

Which variables impact sizing and performance?

How well your system's capacity meets your business needs plays an important role in determining the performance of the system. This means that overall product performance is largely influenced by the hardware capacity available to support those needs. The accuracy of your hardware sizing estimates therefore acts as the base for ensuring a smooth deployment.

The primary drivers that affect sizing are:

  • Volume of data indexed per day
  • Retention period (the duration for which you want to store the indexed data)
  • Number of concurrent users likely to access or search the indexed data

For more information, see Sizing drivers and their impact.

Additionally, other factors impact product performance, for example, the number of fields defined in the data patterns, the number of tags specified in the data collectors, the number of notifications set, and so on. These factors impact the resources that support the product's functioning (such as processor, memory, and storage) and thereby affect product performance. The extent of the impact depends on the manner in which you use the product. For more information about the list of factors and the level at which they impact performance, see Variables that impact product performance.