Indicators for scaling
This topic provides guidelines that you can use for deciding when to scale. Note that this information is indicative and is meant for reference purposes only.
The following factors are an indication that you might need to scale the Collection Station, Indexer, or Search components:
- Product performance is deteriorating.
- Hardware resources such as the processor, memory, storage, and disk I/O start exceeding acceptable limits.
- The rate of indexing is falling behind the rate of data collection.
- The Anomaly baseline job takes more than 7 minutes to complete.
Note
If the Anomaly baseline job takes more than 7 minutes to complete, perform one or more of the following actions to decrease the baseline job completion time:
- Scale the Indexer.
- If the Search server has a lot of idle CPU, increase the baseline.cluster.search.concurrency property in the custom/conf/server/olaengineCustomConfig.properties file.
- Scale the Search server.
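As an illustration, the concurrency property can be raised by editing the custom configuration file. The value shown here is only an assumed example, not a recommended setting; choose a value based on the idle CPU available on the Search server.

```properties
# custom/conf/server/olaengineCustomConfig.properties
# Illustrative value only; tune based on idle CPU on the Search server.
baseline.cluster.search.concurrency=4
```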
When you scale a component, you need to keep in mind that the issue observed for scaling that component can often shift to another component. For example, you notice that the rate of indexing is not keeping up with the incoming volume of data. After some analysis, you determine that there is an issue with the Collection Station and you scale the Collection Station to address the issue. At this time, the indexing rate is able to keep up with the incoming data volume. Now, you add a few hundred more data collectors to the system and notice that once again the indexing rate is not keeping up with the incoming volume of data. You may erroneously conclude that you need to scale the Collection Station to fix the issue. In this scenario, it is possible that the problem has shifted from the Collection Station component to the Indexer component, and that the Indexer needs to be scaled.
Indicators for scaling Collection Station
The Collection Station needs to be scaled when the volume of data to be indexed exceeds the capacity of the currently available hardware.
The following table lists the factors that you can use as indicators for scaling up the Collection Station component, along with the functions supported by each factor.
| Factor | Indicators for scaling | Functions supported |
|---|---|---|
| Processor | Average CPU required exceeds 75%. | |
| Memory | Maximum Java heap size required is greater than 80% of the physical memory available. This is controlled by the | |
| Disk I/O | Disk I/O transfer rate exceeds 75% of the disk I/O available. | |
Indicators for scaling Indexer
The Indexer needs to be scaled when the overall product usage exceeds the hardware capacity and starts negatively impacting the Collection Station (indexing) or Search components.
The following table lists the factors that you can use as indicators for scaling up the Indexer component, along with the functions supported by each factor.
| Factor | Indicators for scaling | Functions supported |
|---|---|---|
| Processor | Average CPU required exceeds 75%. | |
| Memory | Maximum Java heap size required is greater than 60% of the physical memory available. This is controlled by the | |
| Disk I/O | Disk I/O transfer rate exceeds 75% of the disk I/O available. | |
| Disk space | Disk space required exceeds 75% of the available disk space. Disk space required = (Rate of disk space growth per day) * (Number of days of data retention) | |
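The disk-space indicator can be checked with a short calculation. The following sketch applies the formula and the 75% threshold from the table; the growth rate, retention period, and available capacity are hypothetical example values.

```python
# Estimate whether the Indexer's disk-space requirement exceeds the 75% threshold.
# The growth rate, retention period, and available capacity are example values.

def disk_space_required(growth_gb_per_day: float, retention_days: int) -> float:
    """Disk space required = (rate of disk space growth per day) * (days of retention)."""
    return growth_gb_per_day * retention_days

def needs_scaling(required_gb: float, available_gb: float, threshold: float = 0.75) -> bool:
    """Scaling is indicated when required space exceeds 75% of available space."""
    return required_gb > threshold * available_gb

required = disk_space_required(growth_gb_per_day=20, retention_days=14)
print(required, needs_scaling(required, available_gb=350))  # 280 True
```

With 20 GB of growth per day and 14 days of retention, 280 GB is required, which exceeds 75% of a 350 GB disk (262.5 GB), so scaling is indicated.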
Indicators for scaling Search
The Search component needs to be scaled when the overall product usage exceeds the hardware capacity and starts negatively impacting the Search component.
Use the following factors as indicators for scaling up the Search component:
| Factor | Indicators for scaling | Functions supported |
|---|---|---|
| Processor | Average CPU required exceeds 75%. | |
| Memory | Maximum Java heap size required is greater than 75% of the physical memory available. This is controlled by the | |
| Anomaly baseline job | Anomaly baseline job completion time is greater than 7 minutes and the Indexer is not CPU bottlenecked. This can be monitored through the itda.log file. | Anomaly detection |
Variables that impact hardware resources
Overall, the variables described in the following table determine the amount by which the resources required by the Collection Station, Indexer, and Search components are impacted. Your decision to scale depends on the amount by which the resources in your environment are affected by each of these usage factors.
For example, if you have a large volume of data to index, you need to scale the Collection Station and Indexer components. Because the level of impact for this usage factor is high, you should strongly consider scaling.
The following table provides information about variables that impact your hardware resources and thereby affect your decision to scale the Collection Station, Indexer, and Search components. The level of impact indicates the effect on the Java heap size and CPU, and is described as High, Medium, Low, or None.
Variables that determine impact to components that can be scaled
Variable | Collection Station | Indexer | Search |
---|---|---|---|
Large volume of data to index | High | High | None |
Large number of concurrent users actively using the product | None | High | High |
Large number of email notifications (containing reports) | None | None | High |
Large number of notifications | None | Medium | None |
Large number of fields added to the Filters panel | None | High | None |
Concurrent searches with large time ranges | None | High | None |
Large number of concurrent searches (without search commands) | None | High | Low |
Large number of concurrent simple search commands (dedup, group, filter, head, tail) | None | High | Medium |
Large number of concurrent advanced search commands (timechart, stats, top, rare) | None | High | Medium (Java heap size), Low (CPU) |
Higher data retention period (for Java heap size) | None | High | None |
Large number of data collectors | Medium | Medium | None |
Large number of dashboard (or dashlet) executions | None | High | Medium |
Large number of hosts | High | High | High |
Assessing the load handled by Collection Station or Indexer
You can assess the amount of load handled by the Collection Station or Indexer by finding out the indexing lag. The indexing lag is the time lag between the time at which data was collected and the time at which data was indexed, for each poll (made by the data collector).
This can be done by performing searches on the data available in the Collection_metrics.log file. For example, you can run the following search query:
_index=metrics * && engine=COLLECTION_STATION && indexing-lag=* |
head 100000 | filter indexing-lag > 300000
This search query displays results that have an indexing lag greater than five minutes. The indexing lag is indicated by the indexing-lag field in the search results. The value of this field is provided in milliseconds. The results displayed by running this search query indicate that the indexing activity is not able to keep pace with the collection activity.
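The underlying lag computation is simple to sketch outside the product: subtract the collection timestamp from the indexing timestamp for each poll and flag lags above the five-minute (300000 ms) threshold used in the query. The poll records here are hypothetical example data.

```python
# Compute indexing lag (indexed time minus collection time) per poll and flag
# polls whose lag exceeds five minutes (300000 ms).
# The poll records below are hypothetical example data.
polls = [
    {"collected_ms": 1_000_000, "indexed_ms": 1_450_000},  # lag 450000 ms
    {"collected_ms": 2_000_000, "indexed_ms": 2_120_000},  # lag 120000 ms
]

THRESHOLD_MS = 300_000  # five minutes, as in the search query above

lagging = [
    p["indexed_ms"] - p["collected_ms"]
    for p in polls
    if p["indexed_ms"] - p["collected_ms"] > THRESHOLD_MS
]
print(lagging)  # lags that indicate indexing is falling behind
```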
Examples
You can modify the search query in various ways depending on your goal.
Example 1: If you want to see results with an indexing lag greater than two minutes, change the string 300000 to 120000 in the search query, as follows:
_index=metrics * && engine=COLLECTION_STATION && indexing-lag=* |
head 100000 | filter indexing-lag > 120000
Example 2: If you want to use the search query to see the pattern in which the indexing lag occurred for the last seven days, then you can run the following search query with the time range set to Last 7 days.
_index=metrics * && engine=COLLECTION_STATION && indexing-lag=* |
head 100000 | filter indexing-lag > 300000 | timechart span=1d count(indexing-lag), avg(indexing-lag) by collectorid
In the preceding search query, collectorid represents the Collection Station identifier. For each collectorid, the search query displays the following information:
- Total number of polls where the indexing lag is greater than five minutes (represented by the count function).
- Average indexing lag obtained from the polls where the indexing lag is greater than five minutes (represented by the avg function).
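The count and avg aggregation per collectorid can be illustrated with a short sketch; the records here are hypothetical polls that already exceed the five-minute lag threshold.

```python
# Reproduce the count(indexing-lag) and avg(indexing-lag) aggregation per
# collectorid, as in the timechart query above.
# The records below are hypothetical example polls over the lag threshold.
from collections import defaultdict

records = [
    {"collectorid": "cs-1", "indexing_lag_ms": 400_000},
    {"collectorid": "cs-1", "indexing_lag_ms": 600_000},
    {"collectorid": "cs-2", "indexing_lag_ms": 350_000},
]

lags = defaultdict(list)
for r in records:
    lags[r["collectorid"]].append(r["indexing_lag_ms"])

# (count, average) of indexing lag per Collection Station
summary = {cid: (len(v), sum(v) / len(v)) for cid, v in lags.items()}
print(summary)  # {'cs-1': (2, 500000.0), 'cs-2': (1, 350000.0)}
```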
You can also monitor the search results obtained by running the preceding search queries by adding dashboards or notifications. To do this, you need to first save the search query. For more information, see Saving and sharing searches for analytics and monitoring.