Troubleshooting EFK logging issues
Use the information in this topic to troubleshoot Elasticsearch, Fluent Bit, and Kibana (EFK) logging issues.
EFK pods restart
This occurs because the Fluent Bit DaemonSet checks the health of the nodes. The pods keep restarting until the Fluent Bit DaemonSet receives a healthy status from the nodes.
If the installer displays the following message, Fluent Bit needs more time than the default timeout allows to receive the health status of the nodes:
Workaround
- Wait until the Fluent Bit pods start.
- Manually restart the nodes or restart the Docker service.
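Instead of checking the pods by hand, you can wait on the Fluent Bit DaemonSet rollout. The following is a minimal sketch, assuming the DaemonSet is named efk-fluent-bit (after the efk-fluent-bit Helm release) and that systemd manages Docker on the nodes; both are assumptions, so confirm the name with kubectl get daemonsets -n <namespace> first. The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: block until the Fluent Bit DaemonSet has finished rolling out.
# Assumption: the DaemonSet is named "efk-fluent-bit"; pass another name as $2 if not.
wait_for_fluent_bit() {
  local namespace="$1"
  local daemonset="${2:-efk-fluent-bit}"
  local timeout="${3:-10m}"

  if [ -z "$namespace" ]; then
    echo "usage: wait_for_fluent_bit <namespace> [daemonset] [timeout]" >&2
    return 2
  fi

  # Returns once every scheduled node runs an updated, available Fluent Bit pod,
  # or fails when the timeout expires.
  kubectl rollout status "daemonset/${daemonset}" -n "$namespace" --timeout="$timeout"
}

# Fallback from the workaround: restart the Docker service.
# Run this on the affected node itself (assumes systemd manages Docker).
restart_docker_service() {
  sudo systemctl restart docker
}
```

For example, wait_for_fluent_bit <namespace> 15m waits up to 15 minutes; a non-zero exit status after the timeout is the signal to fall back to restarting the nodes or the Docker service.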
efk-elasticsearch-coordinating pod runs into the CrashLoopBackOff error
The efk-elasticsearch-coordinating pod runs into the CrashLoopBackOff error when it is not given enough time to start before its health probes fail.
Workaround
- Go to helix-on-prem-deployment-manager/bmc-helix-logging/efk/elasticsearch/chart_values.yaml.
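- Increase the values of coordinating.livenessProbe.failureThreshold and coordinating.readinessProbe.failureThreshold from 5 to 10. The failureThreshold value is the number of consecutive probe failures that are tolerated, not a number of seconds; the time a probe tolerates is roughly failureThreshold multiplied by the probe's periodSeconds.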
Restart the deployment manager by running the following command:
./deployment-manager.sh
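Before rerunning the deployment manager, it can help to confirm the edited values and estimate how long the probes now tolerate failures. This is a minimal sketch: the helper names are hypothetical, the default path assumes you run it from the directory that contains helix-on-prem-deployment-manager, and the periodSeconds value you pass must come from your chart (Kubernetes defaults it to 10 seconds when unset).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every failureThreshold line (with its line number)
# in the Elasticsearch chart values file so the new value can be confirmed.
show_failure_thresholds() {
  local values_file="${1:-helix-on-prem-deployment-manager/bmc-helix-logging/efk/elasticsearch/chart_values.yaml}"
  if [ ! -f "$values_file" ]; then
    echo "not found: $values_file" >&2
    return 1
  fi
  grep -n 'failureThreshold' "$values_file"
}

# Approximate seconds of consecutive failures a probe tolerates before acting:
# failureThreshold x periodSeconds.
probe_tolerance_seconds() {
  local failure_threshold="$1"
  local period_seconds="$2"
  echo $(( failure_threshold * period_seconds ))
}
```

For example, with a periodSeconds of 10, raising failureThreshold from 5 to 10 raises the tolerated window from about 50 to about 100 seconds.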
Installation of BMC Helix Logging fails if there is a restriction on creating roles and role bindings in the namespace
Installation of BMC Helix Logging fails with the following errors:
Release "efk-fluent-bit" does not exist. Installing it now.
Error: rendered manifests contain a resource that already exists. Unable to continue with install: could not get information about the resource: roles.rbac.authorization.k8s.io "efk-fluent-bit" is forbidden: User "u-mfc3wbd3rb" cannot get resource "roles" in API group "rbac.authorization.k8s.io" in the namespace "itom"
Workaround
Perform the following steps:
- In the helix-on-prem-deployment-manager/bmc-helix-logging/bmc-helix-logging-deployer.sh file, set the value of the FLUENTBIT_RBAC_ENABLE parameter to false.
- Rerun the deployment manager.
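The following sketch first confirms the RBAC restriction with kubectl auth can-i and then sets the flag. It assumes, without having checked the deployer script, that the flag is assigned on its own line in the form FLUENTBIT_RBAC_ENABLE=<value>; inspect the file before relying on the sed expression. The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show whether the current user may read and create
# Roles and RoleBindings in the namespace (kubectl prints "yes" or "no").
check_rbac_permissions() {
  local namespace="$1"
  kubectl auth can-i get roles.rbac.authorization.k8s.io -n "$namespace"
  kubectl auth can-i create roles.rbac.authorization.k8s.io -n "$namespace"
  kubectl auth can-i create rolebindings.rbac.authorization.k8s.io -n "$namespace"
}

# Hypothetical helper: set FLUENTBIT_RBAC_ENABLE=false in the deployer script.
# Assumption: the flag is assigned on its own line (leading spaces allowed).
disable_fluentbit_rbac() {
  local deployer="${1:-helix-on-prem-deployment-manager/bmc-helix-logging/bmc-helix-logging-deployer.sh}"
  if [ ! -f "$deployer" ]; then
    echo "not found: $deployer" >&2
    return 1
  fi
  # Edit in place and keep the original as <file>.bak.
  sed -i.bak 's/^\([[:space:]]*\)FLUENTBIT_RBAC_ENABLE=.*/\1FLUENTBIT_RBAC_ENABLE=false/' "$deployer"
  # Print the resulting line(s) so the change can be confirmed.
  grep -n 'FLUENTBIT_RBAC_ENABLE' "$deployer"
}
```

If check_rbac_permissions prints "no" for roles in the target namespace, the error above is expected, and disabling Fluent Bit RBAC before rerunning the deployment manager is the documented workaround.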
Logs are not visible in Kibana after successful deployment of BMC Helix Logging
Run the following command to view the logs of the efk-fluent-bit pod:
The following error is shown after running the command:
Workaround
Run the following command to verify that the group has execute permission on the host path folder /var/lib/docker/containers:
ls -ltr /var/lib/docker/containers
If the output shows that the group does not have execute (x) permission, there is an issue with the permission on the /var/lib/docker/containers folder.
Run the following command as root on each worker node to give the group execute permission:
chmod -R g+x "/var/lib/docker/containers"
Run the following command again to verify that the group now has execute permission on /var/lib/docker/containers:
ls -ltr /var/lib/docker/containers
Restart the Fluent Bit pods.
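Scanning a long ls listing by eye is error-prone, so the following sketch uses find to list only the directories that still lack the group execute bit (execute on a directory is what allows traversing into it), and restarts the Fluent Bit pods through their DaemonSet. The helper names are hypothetical, and the DaemonSet name efk-fluent-bit is an assumption based on the Helm release name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list directories under the Docker containers path
# that do not have the group execute bit set. Empty output means none.
dirs_missing_group_exec() {
  local root="${1:-/var/lib/docker/containers}"
  find "$root" -type d ! -perm -g=x
}

# Hypothetical helper: restart the Fluent Bit pods by restarting their DaemonSet.
# Assumption: the DaemonSet is named "efk-fluent-bit".
restart_fluent_bit() {
  local namespace="$1"
  kubectl rollout restart daemonset/efk-fluent-bit -n "$namespace"
}
```

Run dirs_missing_group_exec on each worker node before and after the chmod command; once it prints nothing, run restart_fluent_bit <namespace> from a machine with cluster access.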
EFK logging indices are not getting deleted
The EFK logging indices are not deleted even though the retention period is set to 7 days. As a result, the PVC that stores EFK logging data keeps growing until it is full.
Workaround
Run the following command to remove the existing Elasticsearch curator:
helm delete -n <namespace> elasticsearch-curator
where <namespace> is the name of the BMC Helix ITOM namespace that you use in version 23.1.02.
Set up an Index Lifecycle Management (ILM) policy in the Kibana UI.
For more information, see ILM: Manage the index lifecycle.
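The Kibana UI is the documented route. If you prefer the command line, a comparable policy can be created with the Elasticsearch create lifecycle policy API (PUT _ilm/policy/<name>). This is a minimal sketch, assuming Elasticsearch is reachable at ES_URL (for example through kubectl port-forward) without authentication; the helper names and the policy name efk-logs-retention are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print an ILM policy body with a single delete phase
# that removes indices once they reach the given age (for example 7d).
ilm_delete_policy_json() {
  local retention="${1:-7d}"
  cat <<EOF
{
  "policy": {
    "phases": {
      "delete": {
        "min_age": "${retention}",
        "actions": { "delete": {} }
      }
    }
  }
}
EOF
}

# Hypothetical helper: create or update the policy through the ILM API.
# Assumptions: ES_URL points at Elasticsearch and no authentication is required.
put_ilm_policy() {
  local es_url="${ES_URL:-http://localhost:9200}"
  local policy_name="${1:-efk-logs-retention}"
  local retention="${2:-7d}"
  ilm_delete_policy_json "$retention" |
    curl -sS -X PUT "${es_url}/_ilm/policy/${policy_name}" \
      -H 'Content-Type: application/json' \
      --data-binary @-
}
```

A policy only affects indices it is attached to, typically through the index.lifecycle.name setting in the index template that matches the EFK logging indices. Without rollover, min_age is measured from index creation.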