This topic provides answers to some frequently asked questions about performance and scalability.
For information about the various factors that affect deployment sizing, see Variables that affect sizing and scalability.
The storage required for your deployment depends on the compression ratio, volume of data, and the data retention period.
Storage required = Compression ratio * Volume of data (per day) * Retention period (in days)
The compression ratio is the amount of compression you can achieve while storing the data. This ratio can vary with factors such as the data pattern in use and the number of fields extracted.
The following table shows the average storage compression ratio observed in the laboratory tests:
Data pattern | Compression ratio | Comments |
---|---|---|
Apache Access | 1.3 times | Raw text + timestamp + extracted fields |
FreeText with Timestamp | 0.83 times | Raw text + timestamp |
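As an illustration of the formula above, the following Python sketch (with hypothetical input values) estimates the storage needed for 100 GB per day of Apache Access data retained for 30 days:

```python
def storage_required_gb(compression_ratio, daily_volume_gb, retention_days):
    """Storage required = compression ratio * daily data volume * retention period."""
    return compression_ratio * daily_volume_gb * retention_days

# Hypothetical example: Apache Access data (ratio 1.3 from the table above),
# 100 GB generated per day, retained for 30 days.
print(storage_required_gb(1.3, 100, 30))  # 3900.0 GB, roughly 3.9 TB
```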
For more information, see Sizing drivers and their impact.
For information about the impact of increasing the retention period, see Sizing drivers and their impact.
There is no theoretical limit to the number of data collectors that can be configured on a single Collection Station or Collection Agent.
However, we have tested up to 3,000 data collectors on a single Collection Station (or Collection Agent) running on an independent server that does not host the application being monitored. For optimum performance, BMC recommends that you do not exceed this limit.
The overhead of the Collection Agent on the target host is minimal. During the performance tests, the observed CPU overhead was around 2% to 5%, with 256 MB of RAM.
The following table provides some indicators from the performance tests carried out on a virtual setup using an Intel® Xeon® E5-2660 CPU @ 2.20 GHz and storage rated at 300 IOPS.
Tip: IOPS is calculated based on the Bonnie++ metric, Random Seek/s.
Polling frequency (in minutes) | Average CPU utilization | Number of data collectors | Data collection rate |
---|---|---|---|
1 | 2% | 1 | 67 MB per day |
1 | 2% | 5 | 338 MB per day |
1 | 2% | 10 | 677 MB per day |
1 | 4% | 10 | 1.4 GB per day |
1 | 6% | 10 | 2.7 GB per day |
By default, the product stores the collected data in time-based indices of 6 hours each. For each index, the metadata is kept in memory to optimize searches; the additional RAM requirement arises from this metadata held in the Indexer memory.
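Because each index spans 6 hours, a longer retention period means more indices, and therefore more index metadata held in Indexer memory. The following Python sketch shows that relationship; the per-index metadata size is deployment-specific and is not estimated here:

```python
INDEX_SPAN_HOURS = 6  # default index span, as stated above

def index_count(retention_days):
    """Number of 6-hour indices needed to cover the retention period."""
    return retention_days * 24 // INDEX_SPAN_HOURS

# A 30-day retention period implies in-memory metadata for 120 indices.
print(index_count(30))  # 120
```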
For more information, see Variables that affect sizing and scalability.
The browser response time can be slow for a number of reasons. As a workaround, you can restart the browser and then access IT Data Analytics.
The Java heap sizes of the component processes (including the Configuration Database) might be over-allocating the available RAM.
For more information about configuration recommendations for various components, see Component configuration recommendations.
Search performance can be affected for various reasons, as described in the following table:
Reason | Solution |
---|---|
High number of fields in the Filters panel | Reduce the number of fields in the Filters panel |
High-cardinality fields (fields with a large number of unique values) in the Filters panel | Remove the high-cardinality fields from the Filters panel
Large time range in the search query | Reduce the time range of the search query
You can increase the maximum Java heap size in the server.conf file. For more information, see Component configuration recommendations.
For information about the possible steps you can take, see Troubleshooting performance issues.
The number of events that can be indexed is a function of the average size of each event. In our performance tests, we focus on the size of the data indexed, because data size is a widely understood measure of the volume that applications generate; for example, 100 GB of data per day on a reference hardware server.
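To relate the two measures, the following sketch converts a daily data volume into an approximate event count; the 300-byte average event size is an assumption chosen purely for illustration:

```python
def events_per_day(daily_volume_gb, avg_event_bytes=300):
    """Approximate daily event count for a given indexed data volume."""
    return daily_volume_gb * 1024**3 // avg_event_bytes

# Assuming ~300-byte events, 100 GB per day is roughly 358 million events.
print(events_per_day(100))  # 357913941
```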
The data might take longer than expected to become searchable if the system is heavily loaded due to a large number of collectors. Also, if the system is stabilizing after a downtime, it attempts to index old data that remained pending on the Collection Station. In such cases, the data takes longer than expected to become available for searching.
No, it is not mandatory to create a data pattern before indexing data.
The product offers a list of default data patterns for most of the common log formats that you can use while creating a data collector. For more information, see Default data patterns.
If there is no matching data pattern, the product tries to identify a matching timestamp format (and treats all other data as raw text). Alternatively, you can index the data as free text, in which case no timestamp is extracted from the log file and the time of indexing is associated with the entries. For more information, see Setting up data patterns to extract fields.
The network bandwidth requirements can vary depending on the data generation rate and the type of data collector used. The calculation itself scales linearly and is the same irrespective of the volume of data generated.
For example, for data generated at 100 Kbps, the network bandwidth required is approximately 60 Kbps (for data collected using a Collection Agent).
The following table illustrates the network bandwidth calculation:

Collection method | Network bandwidth required | Data flow |
---|---|---|
Local file collection using a Collection Agent | 0.6 * Data Transfer Rate (or Data Volume) | Target host → Collection Station, or Target host → Collection Agent → Collection Station |
SSH / Windows Share collection | 1.6 * Data Transfer Rate (or Data Volume) | Target host → Collection Station, or Target host → Collection Agent → Collection Station |
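As a minimal sketch of the calculation above (the method names in the dictionary are ours, not product terms):

```python
# Bandwidth factors taken from the formulas above.
BANDWIDTH_FACTOR = {
    "local_file_via_agent": 0.6,   # local file collection using a Collection Agent
    "ssh_or_windows_share": 1.6,   # SSH / Windows Share collection
}

def required_bandwidth_kbps(data_rate_kbps, method):
    """Approximate network bandwidth for a given data generation rate."""
    return BANDWIDTH_FACTOR[method] * data_rate_kbps

# Matches the example above: 100 Kbps generated, collected by a Collection Agent.
print(required_bandwidth_kbps(100, "local_file_via_agent"))  # 60.0
```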
The IT Data Analytics product performs the following main functions: data collection, indexing, and search. Based on your needs, you can split these functions across multiple servers to handle them separately. Thus, you can consider scaling the Collection Station, Indexer, and Search components.
The following topics provide the recommended deployment scenarios for scaling.
Two main factors indicate that you might need to scale the Collection Station, Indexer, or Search components.
For more information, see Indicators for scaling.
How well your system's capacity meets your business needs plays an important role in determining system performance; overall product performance is largely influenced by the hardware capacity available to support those needs. Accurate hardware sizing estimates therefore form the basis of a smooth deployment.
For more information about the primary drivers that affect sizing, see Sizing drivers and their impact.
Additionally, other factors impact product performance, for example, the number of fields defined in the data patterns, the number of tags specified in the data collectors, the number of notifications set, and so on. These factors consume the resources that support the product's functioning (such as processor, memory, and storage) and thereby affect performance. How much they affect performance depends on the manner in which you use the product. For more information about these factors and the level at which they impact performance, see Variables that impact product performance.