Troubleshooting EFK logging issues


Use the information in this topic to troubleshoot Elasticsearch, Fluentd, and Kibana (EFK) logging issues.


The Kibana pod is in CrashLoopBackOff

This issue may occur if the cluster network configuration (IPv4 or IPv6) prevents Kibana from binding to its configured host address.
Workaround

Perform the following steps to specify the host of the back-end server:

  1. Edit the Kibana config map and set the following value:

    kubectl edit cm -n ade-logging elasticsearch-logging-kibana-conf

    server.host: "0.0.0.0"
  2. Delete the Kibana pod by using the following command:

    kubectl delete pod <podname> -n <namespace>
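
After you delete the pod, Kubernetes recreates it with the updated configuration. The following commands are a minimal sketch for confirming the fix; the config map and namespace names come from step 1, and the grep patterns are examples:

    # Confirm that the config map now contains the updated host setting
    kubectl get cm elasticsearch-logging-kibana-conf -n ade-logging -o yaml | grep "server.host"

    # Check that the recreated Kibana pod reaches the Running state
    kubectl get pods -n ade-logging | grep kibana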


The Fluentd daemonset pods are not visible

This issue occurs if the rbac or psp values are not set correctly in the chart_value.yaml file.
Workaround

Perform the following steps to resolve the issue:

  • Make sure that the helix-on-prem-deployment-manager/bmc-helix-logging/efk/fluentd/chart_value.yaml file has the following settings:
    • (For Rancher Kubernetes) rbac=true, psp=true
    • (For OpenShift Kubernetes) rbac=true, psp=false
  • Make sure that the fluentd-privileged-binding role binding is present in the logging namespace.
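
As a quick check, the following commands are a sketch of how to confirm both conditions; replace <namespace> with your logging namespace, and if the binding is cluster-scoped in your environment, query clusterrolebinding instead:

    # List daemonsets in the logging namespace; the Fluentd daemonset should appear here
    kubectl get daemonset -n <namespace>

    # Verify that the fluentd-privileged-binding role binding is present
    kubectl get rolebinding fluentd-privileged-binding -n <namespace>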


Logs are not displayed in Kibana

This issue occurs when the Fluentd forwarder container runs as a non-root user.
Workaround

In the helix-on-prem-deployment-manager/bmc-helix-logging/efk/fluentd/chart_value.yaml file, verify that the securityContext of the forwarder has the following values:
securityContext:
  enabled: true
  runAsUser: 0
  runAsGroup: 0
  fsGroup: 0
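
To confirm that a running forwarder container actually uses these values, the following command is a sketch; the pod name and namespace are placeholders for your environment:

    # Print the effective security context of the first container in the forwarder pod
    kubectl get pod <fluentd-forwarder-pod> -n <namespace> -o jsonpath='{.spec.containers[0].securityContext}'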


EFK pods restart

This issue occurs because the Fluentd daemonset checks the health of the nodes. The pods restart until the Fluentd daemonset receives the healthy status of the nodes.
If the installer displays the following message, Fluentd needs more time than the default timeout to receive the health status of the nodes:
ERROR: Failed to install helm chart: fluentd.
ERROR: Failed to install EFK-Fluentd.
Workaround

  • Wait until the Fluentd pods start.
  • If the pods do not start, manually restart the nodes or restart the Docker service (see the sketch after this list).
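
A typical sequence for the Docker restart option looks like this sketch, assuming a systemd-based node; run the first command on the affected node and the second from a host with kubectl access:

    # On the affected node: restart the container runtime
    sudo systemctl restart docker

    # From a control host: watch the Fluentd pods until they are Running
    kubectl get pods -n <namespace> -w | grep fluentd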


EFK-Elasticsearch-coordinating pod runs into the CrashLoopBackOff error

The efk-elasticsearch-coordinating pod runs into the CrashLoopBackOff error when it is not provided with sufficient time to restart.
Workaround

  • Open the helix-on-prem-deployment-manager/bmc-helix-logging/efk/elasticsearch/chart_values.yaml file.
  • Increase the values of coordinating.livenessProbe.failureThreshold and coordinating.readinessProbe.failureThreshold from 5 to 10.
  • Update the values of coordinating.livenessProbe.initialDelaySeconds and coordinating.readinessProbe.initialDelaySeconds to 120 seconds. (See the sketch after this list.)
  • Restart the deployment manager by running the following command:

    ./deployment-manager.sh
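
With the values above, the probe settings in chart_values.yaml would look roughly like this sketch; the nesting follows the dotted key paths listed earlier, and all other fields are omitted:

    coordinating:
      livenessProbe:
        failureThreshold: 10        # allow more consecutive probe failures before a restart
        initialDelaySeconds: 120    # wait 120 seconds before the first probe
      readinessProbe:
        failureThreshold: 10
        initialDelaySeconds: 120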

 
