Troubleshooting EFK logging issues


Use the information in this topic to troubleshoot Elasticsearch, Fluent Bit, and Kibana (EFK) logging issues.


EFK pods restart

This occurs because the Fluent Bit DaemonSet checks the health of the nodes. The pods keep restarting until the Fluent Bit DaemonSet receives a healthy status from the nodes.

If the installer displays the following message, Fluent Bit needed more time than the default timeout allows to receive the health status of the nodes:

ERROR: Failed to install helm chart: fluent bit.
ERROR: Failed to install EFK-Fluent bit.

Workaround

  • Wait until the Fluent Bit pods start.
  • If the pods still do not start, manually restart the nodes or restart the Docker service.
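The first workaround can be sketched in shell as follows. This is a minimal sketch, assuming that the Fluent Bit pods belong to a DaemonSet named efk-fluent-bit (inferred from the efk-fluent-bit release and pod names; confirm it with kubectl get daemonset -n <namespace>) and that kubectl is configured for the cluster. The wait_for_fluent_bit helper name and the 10-minute default timeout are illustrative choices, not part of the product.

```shell
# Hypothetical helper: wait until every pod of the Fluent Bit DaemonSet is
# ready, or return non-zero once the timeout expires.
# Assumption: the DaemonSet is named efk-fluent-bit.
wait_for_fluent_bit() {
  local ns="$1"             # BMC Helix namespace, for example itom
  local timeout="${2:-10m}" # how long to wait before giving up
  kubectl rollout status "daemonset/efk-fluent-bit" -n "$ns" --timeout="$timeout"
}

# Usage:
#   wait_for_fluent_bit <namespace> 15m
```

If the command times out, restart the Docker service on the affected nodes (for example, sudo systemctl restart docker on systemd-based hosts) or restart the nodes.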


EFK-Elasticsearch-coordinating pod runs into the CrashLoopBackOff error

The efk-elasticsearch-coordinating pod runs into the CrashLoopBackOff error when it is not given enough time to start up.

Workaround

  • Open the helix-on-prem-deployment-manager/bmc-helix-logging/efk/elasticsearch/chart_values.yaml file.
  • Increase the values of coordinating.livenessProbe.failureThreshold and coordinating.readinessProbe.failureThreshold from 5 to 10. The failureThreshold value is the number of consecutive probe failures that Kubernetes tolerates, not a duration in seconds.
  • Rerun the deployment manager by running the following command:

    ./deployment-manager.sh
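After the deployment manager finishes, you can confirm that the new thresholds reached the running pod. The following shell sketch reads both failureThreshold values from the pod spec by using kubectl JSONPath output. It assumes that Elasticsearch is the first container in the pod, and the show_probe_thresholds helper name is hypothetical. Look up the actual coordinating pod name with kubectl get pods -n <namespace>.

```shell
# Hypothetical helper: print "<liveness> <readiness>" failureThreshold values
# from the spec of the given pod. Assumes Elasticsearch is container index 0.
show_probe_thresholds() {
  local ns="$1" pod="$2"
  kubectl get pod "$pod" -n "$ns" -o \
    jsonpath='{.spec.containers[0].livenessProbe.failureThreshold}{" "}{.spec.containers[0].readinessProbe.failureThreshold}{"\n"}'
}

# Usage (after the change, both values should be 10):
#   show_probe_thresholds <namespace> <coordinating-pod-name>
```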

Installation of BMC Helix Logging fails if you are not allowed to create roles and role bindings in the namespace

Installation of BMC Helix Logging fails with the following errors:

Release "efk-fluent-bit" does not exist. Installing it now.
Error: rendered manifests contain a resource that already exists. Unable to continue with install: could not get information about the resource: roles.rbac.authorization.k8s.io "efk-fluent-bit" is forbidden: User "u-mfc3wbd3rb" cannot get resource "roles" in API group "rbac.authorization.k8s.io" in the namespace "itom"

Workaround

Perform the following steps:

  1. In the helix-on-prem-deployment-manager/bmc-helix-logging/bmc-helix-logging-deployer.sh file, set the value of the FLUENTBIT_RBAC_ENABLE parameter to false.
  2. Rerun the deployment manager.
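The following shell sketch covers both sides of this workaround. It checks whether your account can get and create roles and role bindings in the namespace (kubectl auth can-i exits with 0 when the action is allowed), and it sets FLUENTBIT_RBAC_ENABLE to false. It assumes that the deployer script assigns the parameter at the start of a line, in the form FLUENTBIT_RBAC_ENABLE=<value>; check the file before you run it. Both helper names are hypothetical.

```shell
# Hypothetical helper: succeed only if the current user can get and create
# roles and role bindings in the namespace.
can_manage_rbac() {
  local ns="$1" verb resource
  for resource in roles rolebindings; do
    for verb in get create; do
      kubectl auth can-i "$verb" "$resource" -n "$ns" --quiet || return 1
    done
  done
}

# Hypothetical helper: set FLUENTBIT_RBAC_ENABLE=false in the deployer script.
# Assumes the parameter is assigned at the start of a line; sed keeps a .bak copy.
disable_fluentbit_rbac() {
  local file="$1"
  sed -i.bak 's/^FLUENTBIT_RBAC_ENABLE=.*/FLUENTBIT_RBAC_ENABLE=false/' "$file" &&
    grep -qx 'FLUENTBIT_RBAC_ENABLE=false' "$file"
}

# Usage:
#   can_manage_rbac <namespace> ||
#     disable_fluentbit_rbac helix-on-prem-deployment-manager/bmc-helix-logging/bmc-helix-logging-deployer.sh
```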



Logs are not visible in Kibana after successful deployment of BMC Helix Logging

Run the following command to view the logs of the efk-fluent-bit pod:

kubectl logs efk-fluent-bit-xxxxxxxxx -n <namespace>

The following error is shown after running the command:

skip (invalid) entry=/var/log/containers/<pod name>
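To check every Fluent Bit pod at once rather than a single pod, you can select the pods by label. This shell sketch assumes that the pods carry the app.kubernetes.io/name=fluent-bit label, which is the default in the upstream Fluent Bit Helm chart; the BMC chart may use a different label, so confirm it with kubectl get pods -n <namespace> --show-labels. The --tail=-1 flag is needed because kubectl logs returns only the last 10 lines per pod when a selector is used, and --prefix adds the log source to each line. The find_skipped_entries helper name is hypothetical.

```shell
# Hypothetical helper: print the "skip (invalid) entry" lines from all pods that
# match the selector. Returns non-zero (from grep) if no such line is found.
find_skipped_entries() {
  local ns="$1" selector="${2:-app.kubernetes.io/name=fluent-bit}"
  kubectl logs -n "$ns" -l "$selector" --tail=-1 --prefix | grep -F 'skip (invalid) entry'
}

# Usage:
#   find_skipped_entries <namespace>
```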

Workaround

  1. Run the following command to check whether the group has execute permission on the host path folder /var/lib/docker/containers:

    ls -ltr /var/lib/docker/containers

    For example, the following image shows a permission issue on the /var/lib/docker/containers folder.

    image-2023-6-19_20-35-52.png

  2. On each worker node, run the following command as the root user to grant the group execute permission recursively:

    chmod -R g+x /var/lib/docker/containers

    Run the following command again to verify that the group now has execute permission on the host path folder /var/lib/docker/containers:

    ls -ltr /var/lib/docker/containers

    image-2023-6-19_20-32-34.png

  3. Restart the Fluent Bit pods. 
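The steps above can be scripted as in the following shell sketch. Instead of reading ls output, it uses find to list directories under the host path that lack the group execute bit (octal 010); an empty result means that the chmod command in step 2 took effect. Run the first helper on each worker node. The restart helper assumes that the DaemonSet is named efk-fluent-bit; kubectl rollout restart recreates its pods. Both helper names are hypothetical.

```shell
# Hypothetical helper: list directories under the given path (default:
# /var/lib/docker/containers) that are missing the group execute bit.
dirs_missing_group_exec() {
  local dir="${1:-/var/lib/docker/containers}"
  find "$dir" -type d ! -perm -010 -print
}

# Hypothetical helper: recreate the Fluent Bit pods (step 3).
# Assumes the DaemonSet is named efk-fluent-bit.
restart_fluent_bit() {
  kubectl rollout restart daemonset/efk-fluent-bit -n "$1"
}

# Usage on a worker node (as root):
#   dirs_missing_group_exec            # prints nothing once step 2 is done
# Usage from a machine with cluster access:
#   restart_fluent_bit <namespace>
```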


EFK logging indices are not deleted

The EFK logging indices are not deleted even though the retention period is set to 7 days. As a result, the EFK logging data PVC keeps growing until it is full.

Workaround

  1. Run the following command to remove the existing Elasticsearch curator:

    helm delete -n <namespace> elasticsearch-curator

    In this command, <namespace> is the name of the BMC Helix ITOM namespace that you use in version 23.1.02.

  2. Set up an Index Lifecycle Management (ILM) policy in the Kibana UI.
    For more information, see ILM: Manage the index lifecycle.
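If you prefer the Elasticsearch REST API to the Kibana UI, the following shell sketch creates an ILM policy whose only phase deletes an index once it is older than the given number of days, and then attaches the policy to existing indices. The Elasticsearch URL, the policy name, and the index pattern are placeholders that you must supply. The sketch also assumes that Elasticsearch is reachable without authentication or TLS options; add them to the curl calls if your cluster needs them. Indices created later also need the policy, typically through the index.lifecycle.name setting in an index template. Both helper names are hypothetical.

```shell
# Hypothetical helper: create or update an ILM policy with a single delete
# phase. min_age is measured from index creation when rollover is not used.
put_delete_policy() {
  local es_url="$1" policy="$2" days="${3:-7}"
  curl -sS -f -X PUT "$es_url/_ilm/policy/$policy" \
    -H 'Content-Type: application/json' \
    -d "{\"policy\":{\"phases\":{\"delete\":{\"min_age\":\"${days}d\",\"actions\":{\"delete\":{}}}}}}"
}

# Hypothetical helper: attach the policy to existing indices matching a pattern.
attach_policy() {
  local es_url="$1" policy="$2" pattern="$3"
  curl -sS -f -X PUT "$es_url/$pattern/_settings" \
    -H 'Content-Type: application/json' \
    -d "{\"index.lifecycle.name\":\"$policy\"}"
}

# Usage (placeholders in angle brackets):
#   put_delete_policy http://<elasticsearch-host>:9200 <policy-name> 7
#   attach_policy http://<elasticsearch-host>:9200 <policy-name> '<index-pattern>*'
```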

