
The following sections illustrate the kinds of issues that you might want to consider while troubleshooting a problem.

System becoming slow

The following table lists scenarios in which your system might become slow:

Scenario

Probable causes with solutions (if any)

The Collection Station went down. After the Station was restarted, the system became very slow

When you restart the Collection Station, all of the data collectors try to catch up and send the old pending data into the Collection Station for indexing.

Based on the number of data collectors, this process can take some time (a few minutes to a few hours) to complete.

The Indexer remained down for two days over the weekend, and after the IT Data Analytics server was restarted, the system became very slow

This issue can occur if the Collection Station cached a lot of data over the weekend. When you restart the IT Data Analytics server, the Collection Station pushes all cached data together into the Indexer.

Solution: Perform one of the following actions:

  • Stop the Collection Station and clean up the %BMC_ITDA_HOME%\station\collection\data folder.

    Note, however, that when you clean up the folder, you can lose the cached data.

  • Leave the system in its current state until all of the data is sent to the Indexer.


Product components and Collection Agents showing red status

The following table lists scenarios in which one or more product components or Collection Agents displays a red status:

Scenario

Probable causes with solutions (if any)

Some of the Collection Agents are showing up as red on the Administration > Hosts page and do not change to green, even though all of the servers are running.

This issue might occur if the Collection Station is down at the time when the Collection Agents start.

Solution: Restart the Collection Agents after the Collection Station is up and running.

The Configuration Database and Indexer are showing as green on the Administration > Components page, but some of the other components are down (on the Linux operating system).

The components might not have been started in the recommended order.

Solution: Restart the services in the correct order. For more information, see Starting or stopping product services.

Status of the Collection Station appears red on the Administration > Components tab.

This issue might occur in two scenarios:

  1. The computer on which the Collection Station is installed has multiple IP addresses, and during installation you provided an IP address (bind address) to which the Console Server cannot connect.
    Solution: To resolve this issue, you must add the following properties in the Collection Station's agent.properties file (custom file):
    • httpBindAddress=0.0.0.0
    • payload.bindaddress=0.0.0.0
    For more information, see Modifying the configuration files.
  2. The host name specified while registering the self-signed certificate does not match the host name of the computer where the Collection Station is installed. You can find the correct host name by opening the %BMC_ITDA_HOME%\logs\itda.log file and searching for the following line:

    com.sun.jersey.api.client.ClientHandlerException: javax.net.ssl.SSLHandshakeException: java.security.cert.CertificateException: No name matching <Host-Name> found

    where <Host-Name> refers to the host name of the Collection Station.
Status of the Search component appears red on the Administration > Components tab.

The host name specified while registering the self-signed certificate does not match the host name of the computer where the Search component is installed. You can find the correct host name by opening the %BMC_ITDA_HOME%\logs\itda.log file and searching for the following line:

com.sun.jersey.api.client.ClientHandlerException: javax.net.ssl.SSLHandshakeException: java.security.cert.CertificateException: No name matching <Host-Name> found

where <Host-Name> refers to the host name of the Search component.

Issues accessing the product

The following table lists issues related to accessing the TrueSight IT Data Analytics product:

Scenario

Probable causes with solutions (if any)

Unable to access the product from the Start menu

You might not be able to start the product if:

  • During installation, you chose not to start the product services immediately after installation.
  • The ports that you specified during installation are already in use.

Solutions: Start the product services manually (see Starting or stopping product services), and ensure that the ports that you specified during installation are not in use by other applications.

Unable to access the product after cross-launching from ProactiveNet.

This might occur if the product URL uses the internal host name of the Console Server. This can happen if the Console Server is installed on a computer that uses both an external and an internal host name, and the internal host name is sent to ProactiveNet for the cross-launch.

Solution: Edit the olaengineCustomConfig.properties file, locate and uncomment the consoleserver.host property, and then change the value of the property to the correct host name. For more information, see Modifying the configuration files.
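For illustration only, assuming the externally reachable host name of the Console Server is consoleserver01.example.com (a placeholder), the uncommented entry in olaengineCustomConfig.properties would look similar to the following:

  # Externally reachable host name sent to ProactiveNet for cross-launch (placeholder value)
  consoleserver.host=consoleserver01.example.com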

After installing the product, you cannot log on to the product with the default admin credentials.

This scenario might occur if you are using Atrium Single Sign-On for user authentication and the default admin user, the default Administrators user group, or both already exist on Atrium Single Sign-On.

Solution: In this scenario, a new user itdaadmin is automatically created. As an Administrator, you can use the following default credentials to log on to TrueSight IT Data Analytics.

  • User name: itdaadmin
  • Password: admin12345

When you log on to the Console Server, you see the following error:

Console Server did not start
correctly. Contact your system
administrator for details or
see the IT Data Analytics log files.

In addition, the itda.log file located at %BMC_ITDA_HOME%\logs contains the following message:

ERROR: Error in initializing DB. 
Please check Database.

This issue can occur if the Console Server was started before the Configuration Database.

Solution: Restart the Console Server. For more information, see Starting or stopping product services.


Search-related issues

The following table lists issues related to data that cannot be found even though it was indexed, issues faced while searching, and issues with the search results obtained:

Scenario

Probable causes with solutions (if any)

Unable to search for indexed data.

This can happen in two scenarios:

  • Search component is unable to connect to the Indexer: In this scenario, you might not be able to search data.
    You might get the following errors:
    • Error on the Search tab:

      Could not connect to the Indexer. Go to Administration > Components to see if the Indexer is up and running or contact your Administrator for support.

    • Error in the itda.log file located at %BMC_ITDA_HOME%\logs:

      org.elasticsearch.transport.ConnectTransportException: [Blackout][inet[ipaddress]] connect_timeout[30s]

  • Collection Station is unable to connect to the Indexer: In this scenario, you might not be able to collect data.
    You can see the following error in the collection.log file located at %BMC_ITDA_HOME%\station\collection\logs:

    org.elasticsearch.transport.ConnectTransportException: [Blackout][inet[ipaddress]] connect_timeout[30s]

Solution (sample property values follow this list):

  • If the Search component is unable to connect to the Indexer: Perform the following steps:
    1. Add the following properties in the searchserviceCustomConfig.properties file located at %BMC_ITDA_HOME%\custom\conf\server:
      • indexing.network.bind_host: Specifies the host name or IP address of the Search component that is accessible to the Indexers.
      • indexing.network.publish_host: Specifies the fully qualified host name of the computer where the Search component is installed.
    2. Restart the Search component. For more information, see Starting or stopping product services.
  • If the Collection Station is unable to connect to the Indexer: Perform the following steps:
    1. Add the following properties in the agent.properties file located at %BMC_ITDA_HOME%\station\collection\custom\conf:
      • indexing.network.bind_host: Specifies the host name or IP address of the Collection Station that is accessible to the Indexers.
      • indexing.network.publish_host: Specifies the fully qualified host name of the computer where the Collection Station is installed.
    2. Restart the Collection Station. For more information, see Starting or stopping product services.
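As an illustration only, with placeholder host names, the properties described in the preceding steps would look similar to the following (in the searchserviceCustomConfig.properties file for the Search component, or in the agent.properties file for the Collection Station):

  # Placeholder values; replace with the host names or IP addresses used in your environment
  indexing.network.bind_host=searchhost01.example.com
  indexing.network.publish_host=searchhost01.example.com
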
The data indexed time on the Search tab is ahead of the time at which the notification was generated

By default, there is a delay of 90 seconds between data collection and reporting of search results to the product. Therefore, during notification creation, when you select one of the search duration options and apply a condition related to the number of results, you can expect a delay of 90 seconds.

Solution: You can change the 90-second time lag by modifying the value of one of the following properties available in the searchserviceCustomConfig.properties file; a sample setting follows the list. For more information, see Modifying the configuration files.

  • notification.search.exec.offset.sec
    Change the value of this property if you set the search duration to Last execution to current execution.

  • notification.search.relative.exec.offset.sec
    Change the value of this property if you set the search duration to any option other than Last execution to current execution, for example, Last 60 minutes or Last 6 hours.
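For example, to increase the offset from 90 seconds to 120 seconds (an illustrative value) for the Last execution to current execution option, the setting in searchserviceCustomConfig.properties would look similar to the following:

  # Offset, in seconds, between data collection and reporting of search results (illustrative value)
  notification.search.exec.offset.sec=120
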
Data is being generated in the monitored files, but no data can be seen when performing a search

This issue might occur if the time zone specified during data-collector creation is set incorrectly.

Solution: Ensure that the time zone is set correctly when you create data collectors.

While searching for data, you see the following error:

The search string is too complex or the time range specified is too large or both.

This error might occur if the number of results returned for the search string is too large, which can happen if the search string is too complex, the time range is too large, or both.

Solution: Reduce the time range for which you are searching data or provide specific search strings that are likely to occur in the data that you are searching.

While running the timechart search command, you see the following error:

The dataset that you are trying to plot is too large.  The span value specified or the time range selected is too large for the current search query.

This can occur when there are too many data points or bars displayed in the chart.

Solution: Either reduce the time range or reduce the span value in the search string, or both.

The search times out.

OR

The search results are displayed, but after you export them, the output file does not contain data.

OR

The search times out and if you export the search results, the output file does not contain data.

This issue can occur if the search query is too complex or if you are searching over a large time duration.

Workarounds: Perform one of the following actions:

  • Change the query to search over a smaller time duration. For more information, see the Searching with a time context section in Getting started with search.
  • Increase the search timeout value in the searchserviceCustomConfig.properties file as follows:
    1. Open the searchserviceCustomConfig.properties file from the following location in a text editor such as Notepad or gedit. 
      For Windows: C:\Program Files\BMC Software\TrueSight\ITDA\custom\conf\server\searchserviceCustomConfig.properties

      For Linux: /opt/bmc/TrueSight/ITDA/custom/conf/server/searchserviceCustomConfig.properties
    2. Search for the following lines in the file:
      # indexing.psJobTimeoutInmsec=600000
      # indexing.psJobGetMoreTimeoutInmsec=60000
      These lines represent timeouts in milliseconds.
    3. Uncomment these lines by removing the # prefix.
    4. Increase the timeout values. For example:
      indexing.psJobTimeoutInmsec=60000000
      indexing.psJobGetMoreTimeoutInmsec=600000
    5. Restart the ITDA Server component.

Even after specifying the correct data pattern or date format while creating the data collector, the timestamp is not correctly extracted in the search results.

OR

While creating a data collector, the Auto-Detect option used for filtering data patterns and date formats does not display appropriate results.

This scenario might occur if the data that you are trying to collect uses a character set encoding (file encoding) other than the default UTF-8 encoding.

Solution: While creating the data collector, ensure that the character set encoding (file encoding) is set to the correct value.

If you are unable to correctly determine this value, you can use the filtering option to find a list of relevant character set encodings matching your data. To apply the filter, navigate to Advanced Options > File Encoding and click Filter the relevant character set encodings.

While searching for data, you see the following error:

Could not connect to the Indexer. Go to Administration > Components to see if the Indexer is up and running or contact your Administrator for support.

This issue can occur if the Indexer and Search components are unable to communicate with each other.

This can happen in the following scenarios:

  • Host name is not set up correctly in the hosts file:
    • Windows: %windir%\System32\drivers\etc\hosts
    • Linux: /etc/hosts
  • Host name is not set up correctly in the DNS records.
  • Any other network issues.

Solution: Ensure that the Indexer is reachable from the Search node and vice versa.
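For illustration, a hosts file entry that maps the Indexer host's IP address to its fully qualified and short host names looks similar to the following (the address and names are placeholders):

  192.0.2.25   indexer01.example.com   indexer01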

Sometimes when you run particular search commands, you see the following error:

Partial results available. The search string is too complex or the time range specified is too large or both.

To provide optimum search performance, by default, the search results list is limited to a maximum count of 1,000,000 records.

Running particular search commands on large volumes of indexed data can result in searching the entire set of indexed data before returning results. Such search scenarios can cause the default limit to be reached more frequently.

This can happen only with particular search commands, such as the dedup, group, timechart, and stats search commands.

Solution: Apply one of the following measures to avoid getting this error:

  • Consider narrowing down the time range or simplifying the search string (if it is too long or complex).

  • Change the default limit used for obtaining the maximum number of search results for a search query.

    To do this, navigate to the searchserviceCustomConfig.properties file, and add the search.events.fetch.limit property (with a higher value). A sample line follows this list.

    For more information about the file location, see Modifying the configuration files.

    Best practice

    Changing the search.events.fetch.limit property value can result in slow searches.

    Therefore, it is recommended that you do not change this property value.
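If you do decide to change the limit, an entry similar to the following raises it to 2,000,000 records (an illustrative value) in the searchserviceCustomConfig.properties file:

  # Maximum number of records returned for a search query (illustrative value)
  search.events.fetch.limit=2000000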



Data collection-related issues

The following table lists issues related to the data collectors, data loss occurrences, and other scenarios associated with data collection:

Scenario

Probable causes with solutions (if any)

The Collection Station on a Windows computer does not start or work properly after it stopped abruptly, and you see the following exceptions in the collection.log file:

  • BadCheckpointException
  • IllegalStateException

This issue is rare and might occur when the Collection Station stops abruptly, which can cause the %BMC_ITDA_HOME%\station\collection\data\c*\flume-checkpoint(1)\checkpoint file to become corrupted.

You can find the exact name of the corrupted checkpoint file in the collection.log file located at %BMC_ITDA_HOME%\station\collection\logs. In this file, search for the line containing the IllegalStateException error.

Error example:

java.lang.IllegalStateException: Destination file: C:\Tasks\HA\MultipleServers\station1\data\c1\flume-checkpoint1\checkpoint unexpectedly exists

Workarounds:

  • If data loss is acceptable: Delete the data directory located at %BMC_ITDA_HOME%\station\collection\ and restart the Collection Station.
  • If data loss is unacceptable:
    1. Stop the Collection Station. For more information, see Starting or stopping product services.
    2. Perform a backup of the %BMC_ITDA_HOME%\station\collection\data\c*\flume-checkpoint(2)\checkpoint file if the error occurs for the flume-checkpoint(1) file.
      (On the other hand, if the error occurs for the flume-checkpoint(2) file, then perform a backup of the %BMC_ITDA_HOME%\station\collection\data\c*\flume-checkpoint(1)\checkpoint file).
    3. Replace the corrupted checkpoint file with the backup file.
    4. Restart the Collection Station.
Data collector has been created, but the results cannot be seen

After the data collector is created, it might take some time (approximately 1 minute) for the first poll to happen. The first poll is used to make the data collector ready for data collection. The data is fetched only from the second poll.

Expected time delay (to see the first set of data for a search) = (Time for first poll) + (Poll interval set for the data collector)
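For example, if the first poll takes approximately 1 minute and the poll interval for the data collector is set to 5 minutes, you can expect to see the first search results approximately 6 minutes after the data collector is created.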

Some data got lost when the Collection Station (or Collection Agent) went down briefly or when the data collector was stopped.

By default, the number of days for which data must be collected and indexed (the Read from Past (#days) function) is set to zero. As a result, during data collection, if one of the following occurs, you can experience data loss:

  • The Collection Station (or Collection Agent) goes down.
  • The data collector was stopped.

Data collection resumes from the point at which the Collection Station (or Collection Agent) is up again or the data collector is restarted.

Data added while the Collection Station (or Collection Agent) was down or the data collector was stopped is ignored.

You can change the Read from Past (#days) default value for the Monitor Local Windows Events and Monitor using external configuration type of data collectors.

The polling status for data collectors that collect data over SSH shows red (unsuccessful polling).

You see the Invalid value error when you select one of these data collectors and click Last 10 Polls Status of Data Collector.

Example: 
collection-station_Host1.bmc.com:Invalid value: -473379094

This issue occurs because the number of concurrent SSH connections allowed to the target host is less than the number of data collectors that you want to create. The number of concurrent SSH connections determines the number of data collectors that you can create for collecting data from the same target host.

Solution: Edit the /etc/ssh/sshd_config file and increase the value of the MaxSessions parameter. This solution is applicable only to OpenSSH version 5.1 and later.
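For example (assuming OpenSSH 5.1 or later), you might raise the limit to 20 concurrent sessions; the value 20 is used here only for illustration:

  # /etc/ssh/sshd_config
  MaxSessions 20

Restart the SSH daemon (for example, with systemctl restart sshd or service sshd restart, depending on your distribution) for the change to take effect.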

At the time of creating a data collector, when you try to filter the relevant data patterns, you might see the following error:

Collection Station not
reachable.

This issue might occur if the server on which you are trying to create the data collector is slow to respond.

Solution: Edit the olaengineCustomConfig.properties file located at %BMC_ITDA_HOME%\custom\conf\server\ and then add the station.response.timeout property with a value greater than 120.

Example: station.response.timeout=180

This property determines the duration of time (in seconds) for which the Console Server waits to receive a response from the Collection Station.

You are experiencing some data loss and you see the following error in the collection.log file located at %BMC_ITDA_HOME%\station\collection\logs.

ElasticsearchTimeout
Exception: Timeout
waiting for task

Even when all the Indexers in your environment are up and functioning normally, this error might occur for various reasons. For example, the network connection might be poor, or the system on which the Indexers (or the Collection Station) reside might have become slow.

Workaround: Increase the value of the indexing.request.timeoutmillis property. For example, the default value of this property is 5000; you can double it to 10000. For more information, see Component configuration recommendations for horizontal scaling.
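For example, doubling the default timeout would look similar to the following in the configuration file that holds this property:

  # Indexer request timeout, in milliseconds (default is 5000)
  indexing.request.timeoutmillis=10000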

Some data collected by the Receive over TCP/UDP data collector is not getting indexed and occasionally you find the following message in the collection.log file.

Buffer is full, write cannot proceed

This might occur when the rate at which the sender sends data via the TCP port is greater than the rate at which the Receive over TCP/UDP data collector indexes data. This indicates that the data collector is dropping records and needs to be tuned.

Solution: To allow for indexing higher volumes of data per day on a single data collector, you must add the following properties with appropriate values in the agent.properties file. For more information, see Modifying the configuration files.

  • collection.reader.batch.size: The total batch size (number of messages) that will be indexed by a single data collector.

  • collection.reader.portreader.eventbuffer.maxsize: The maximum number of messages that will be waiting to be indexed by a single data collector.

Note that the following property values were used for indexing up to 100 GB of data in the lab environment.

  • collection.reader.batch.size=8000
  • collection.reader.portreader.eventbuffer.maxsize=2048000

The polling status for the Monitor Remote Windows Events data collector shows red (unsuccessful polling).

OR

While creating the Monitor Remote Windows Events data collector, if you provide all the necessary inputs and click Test Connection next to the Domain field, you see the following error:

Error in establishing connection with host.

This can happen in the following two scenarios:

  • If you do not perform the steps necessary for enabling the target host for Windows event collection. For more information, see Enabling Windows event collection (Linux collection host).
  • If the product cannot connect with the selected target host. This can happen if the target host carries multiple host names and the selected target host uses a host name with which the product is unable to connect.

Solution: Provide the correct host name so that the product can communicate with the target host. To do so, perform one of the following steps:

  • Modify the target host details on the Hosts tab and then create the data collector with the correct target host selected.
  • Click the cross icon in the Target Host field and then manually provide the correct host name in the Server Name field.

For the Monitor Remote Windows Events data collector: You get the following error message in the Collection Status History page:

WBEM events exceeded

This issue can occur if the data collector has a large number of events in one poll.

Workaround: Add the following property in the agent.properties file that is located at %BMC_ITDA_HOME%\station\collection\custom\conf.

collection.reader.wbem.executor.query.time.slice:  Specifies the time period (in minutes) in which the events must be fetched in a batch within a poll.
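For example, to fetch events in 10-minute batches within a poll (the value 10 is illustrative), the entry in the agent.properties file would look similar to the following:

  # Time slice, in minutes, per batch of events fetched within a poll (illustrative value)
  collection.reader.wbem.executor.query.time.slice=10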

While creating a data collector, the Auto-Detect option used for filtering data patterns and date formats does not display appropriate results.

OR

Even after specifying the correct data pattern or date format while creating the data collector, the timestamp is not correctly extracted in the search results.

This scenario might occur if the data that you are trying to collect uses a character set encoding (file encoding) other than the default UTF-8 encoding.

Solution: While creating the data collector, ensure that the character set encoding (file encoding) is set to the correct value.

If you are unable to correctly determine this value, you can use the filtering option to find a list of relevant character set encodings matching your data. To apply the filter, navigate to Advanced Options > File Encoding and click Filter the relevant character set encodings.

Some data got lost when a new Collection Station was added to the environment.

When you scale the Collection Stations in your environment after installation (or upgrade), the Collection Agents in your environment are automatically restarted.

The restart of Collection Agents results in the restart of the existing data collectors. The time taken for the data collectors to start can result in a minor break in the data collected.

If the rolling time for logs is shorter than the polling interval specified in the data collector, data is lost.

This loss happens because the old file is deleted before it gets indexed completely.

Solution: Ensure that the polling interval is shorter than the rolling interval.
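For example, if the monitored log files roll over every 15 minutes, set the polling interval to a value smaller than 15 minutes (such as 5 or 10 minutes) so that each file is fully indexed before it is deleted.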


Integration-related issues

The following table lists issues that you might face while integrating with other products:

Scenario

Probable causes with solutions (if any)
Unable to create an external configuration for collecting change management (or incident management) data.

This can happen in the following scenarios:

  • You have not configured the REST API for an HTTP or HTTPS connection.
  • The connection type for which you configured the REST API does not match the connection type that you selected while creating the external configuration.

Solution: Ensure that you have correctly configured the REST API necessary for creating the external configuration. For more information, see Configuring the REST API.

To check whether the REST API is already configured for the Remedy AR server that you want to integrate with, run the Remedy login API to see if you can log in to that server. A successful login indicates that the REST API is already configured. For more information about the REST API, see Login information.

The data collector created for collecting data from TrueSight Infrastructure Management (or ProactiveNet) shows a red status.

In addition, you see the following messages:

  • In the Collection Status History, you see the following message under the Description column:
    Error in BPPM server connection
  • In the Notification Alerts, you see the following message under the Status column:

    An error occurred while processing sendEvent Host[hostName] cellname[cellName].
    In the preceding message, hostName refers to the Infrastructure Management (or ProactiveNet) server and cellName refers to the Infrastructure Management (or ProactiveNet) cell to which the alert was sent.

This issue can occur if the following conditions apply:

  • TLS is enabled for TrueSight Infrastructure Management (or ProactiveNet) on TrueSight Operations Management 10.7.00.
  • You upgraded from a previous version of the product and had existing external configurations set up for TrueSight Infrastructure Management (or ProactiveNet).

Solution: Change the encryption key used in the external configuration for the TrueSight Infrastructure Management (or ProactiveNet) cell. Also, import the certificate generated for TrueSight Infrastructure Management (or ProactiveNet) into TrueSight IT Data Analytics.

For more information, see Enabling security for communication with Infrastructure Management server.

 


Other issues

The following table lists other issues that you might face while using the product:

Scenario

Probable causes with solutions (if any)

Unable to edit a saved search, data pattern, dashboard, or collection profile.

You cannot edit components that are imported via a content pack. Additionally, public saved searches that are created and shared by another user are not editable.

Solution: To edit a component that was initially imported by using a content pack, clone the component and then modify it as per your requirements. The same solution applies for public saved searches.

You see the following error on the product user interface:

Error fetching data
from backend

This can happen if the Configuration Database service is down.

Solution: Perform the following steps:

  1. Ensure that the Configuration Database service is up and running.
  2. If you continue to get this error, then navigate to %BMC_ITDA_HOME%\logs\services\configdb.log and check for the occurrence of OutOfMemoryError. If this error is present in the log, open the configdb.conf file located at %BMC_ITDA_HOME%\custom\conf\services, and increase the Java memory heap size (wrapper.java.maxmemory property value).
    A sample setting follows this step.
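    The sample assumes a maximum heap size of 2048 MB; this value is a placeholder, so size the heap according to your environment:

      # Maximum Java heap size for the Configuration Database service, in MB (placeholder value)
      wrapper.java.maxmemory=2048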
An existing notification stopped working after the saved search used in the notification was modified.

This issue can occur if you modify one or more saved searches used in the notification and update the search query to include a tabular command.

Tabular commands are not supported for notifications. When you update the saved search query with a tabular command, the notification using that saved search becomes invalid.

To avoid this issue, you need to be careful before updating a saved search that is already in use in a notification. To see whether a saved search is already in use, navigate to the Saved Searches page, select the saved search, and click List Notifications and Dashlets.

Even after upgrading to version 2.7 of the product, searches continue to work in a case-sensitive way.

This issue can occur if, after the upgrade, an earlier index (which by default spans 6 hours) was created at around the same time that you created a new data collector. In this case, the data collected by the new data collector is added to the existing index, and search continues to work in a case-sensitive way until a new index is created.

While creating data collectors, hosts, or collection profiles, user groups cannot be selected (in other words, user groups are not displayed).

This issue can occur if the password corresponding to the BMC Atrium Single Sign-On administrator has changed and this change is not updated in the TrueSight IT Data Analytics records.

Solution: Run the ssoadminpassword CLI command to update the password in the product records, and then try creating data collectors, hosts, or collection profiles.