Troubleshooting log collection from Kubernetes


Unable to upgrade artifacts by using the Helm upgrade command

Issue symptoms

You cannot upgrade artifacts by using the Helm upgrade command. This issue might occur in the following cases:

  • The Helm template has changed.
  • The CRD has changed.
  • Multiple YAML files exist in the Helm directory.

Resolution

You must uninstall and reinstall the Helm chart.

Perform the following steps to uninstall and reinstall the chart:

  1. Run the following command to uninstall the Helm chart.

    helm uninstall fluent-operator -n bmc-k8s-logs


  2. Run the following command to verify artifacts that exist in the namespace.

    kubectl get all -n bmc-k8s-logs


  3. Run the following command to delete the daemonset, statefulset, and services.

    kubectl delete daemonset.apps/fluent-bit statefulset.apps/fluentd service/fluentd service/fluent-bit -n bmc-k8s-logs


  4. If some pods remain in the terminating state because the daemonset or statefulset is not deleted, delete the pods by using the following command:

    kubectl delete pod <pod_ID> -n bmc-k8s-logs --force
  5. Reinstall the Helm chart. For more information, see Install Helm.
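
For example, assuming the same chart directory and values.yaml file that you used for the original installation, the reinstall command is similar to the following:

    helm install fluent-operator . -n bmc-k8s-logs -f values.yaml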

The Kubernetes pods fail

Issue symptoms

This issue might occur because autoscaling for the Kubernetes connector pod is not supported.

Resolution

If you want to add an additional connector node, perform the following actions:

  1. In the values.yaml file, go to the fluentd: section.
  2. Change the value of the replicas: property.
    The default value is 1. Change it to the number of nodes that you want to add.
  3. Run the following command:

    helm upgrade fluent-operator . -n bmc-k8s-logs -f values.yaml
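
The values.yaml change described in step 2 looks similar to the following sketch; the property names reflect the default chart layout and might differ in your environment:

    fluentd:
      # Number of Fluentd connector replicas; the default value is 1.
      replicas: 2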

Unable to run Helm commands

Issue symptoms

You cannot run Helm commands. This issue might occur because an unsupported version of Helm is installed in the controller.

Resolution

Ensure that Helm v3.2.1 or later is installed in the controller and then run the commands again.
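
To check the installed version, run the following command:

    helm version --short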

The aggregator pod is in the Pending state

Issue symptoms

The aggregator pod is stuck in the Pending state. This issue occurs if the correct storage class is not provided in the values.yaml file for Fluentd.

Resolution

Perform the following actions:

  1. Run the following command to check the available storage class:

    kubectl get sc
  2. If the storage class is incorrect, add the correct class in the values.yaml file.
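
If it is not clear why the pod is Pending, you can also describe the aggregator pod and review its events; <fluentd_pod_name> is a placeholder for the actual pod name:

    kubectl describe pod <fluentd_pod_name> -n bmc-k8s-logs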

The Kubernetes pods do not come up

Issue symptoms

The agent, aggregator, or operator pods do not come up.

Resolution

Perform the following actions:

  • Verify that the image pull secret values are correct.
  • If you have a token, verify that the secret value is created by using a valid token.
  • Verify that the secret name in the values.yaml file is valid.
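
For example, you can review an existing image pull secret or re-create it from a token by using commands similar to the following; the secret name, registry URL, user name, and token are placeholders:

    kubectl get secret <secret_name> -n bmc-k8s-logs -o yaml

    kubectl create secret docker-registry <secret_name> --docker-server=<registry_url> --docker-username=<user_name> --docker-password=<token> -n bmc-k8s-logs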

The Helm installation fails

Issue symptoms

The Helm installation fails. This issue might occur because old and invalid Custom Resource Definitions (CRDs) exist in the controller.

Resolution

Perform the following actions:

  1. Run the following command to check if old CRDs exist in the controller:

    kubectl get crd | grep fluent
  2. If old CRDs exist, delete only the Fluent-specific CRDs, as shown in the sample command after these steps.
  3. Install Helm again.
    For information on installing Helm, see Collecting-Kubernetes-logs.
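
For step 2, delete each Fluent-specific CRD that the previous command returns; <fluent_crd_name> is a placeholder for one of the returned names:

    kubectl delete crd <fluent_crd_name>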

Bad Request: Mandatory parameters missing error is displayed when you click Create & Download

Issue symptoms

You get the following error when you click the Create & Download button.

Something went wrong

This issue might occur when there is an error in saving the integration.

Issue scope

This issue might affect all integrations.

Resolution

Check the pod logs of the tdc-controller-service and search for exceptions.
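
For example, you can check the logs with a command similar to the following; the pod name and namespace are placeholders:

    kubectl logs <tdc-controller-service_pod_name> -n <namespace> | grep -i exception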

An error occurs when you run the command to create the bmc-logging namespace and other configurations

Issue symptoms

You get an error after running one of the following commands:

kubectl apply -f <downloaded_configuration_yaml_file>

kubectl create -f <downloaded_configuration_yaml_file>

This issue might occur if you run the command from the wrong directory or if you do not have the appropriate permissions to run the command.

Issue scope

This issue might affect all integrations.

Resolution

Confirm the following conditions:

  • The downloaded file is present in the directory from where you are running the command.
  • The command is run from the controller and not from a node.
  • You have the appropriate privileges to create Kubernetes entities.
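
For example, you can verify your privileges with kubectl auth can-i checks similar to the following:

    kubectl auth can-i create namespaces

    kubectl auth can-i create daemonsets -n bmc-logging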

An invalid image error occurs after you apply the .yaml file configurations

Issue symptoms

The connector is not running.

You get the Invalid image error after running the following command:

kubectl apply -f <downloaded_configuration_yaml_file>

Issue scope

This issue might affect all integrations.

Resolution

Confirm the following conditions:

  • There is a valid image registry URL value present at the containers : env : image path.
  • The PARAMETERS.docker_container_registry_path string is not present in the daemonset definition in the .yaml file.
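
A minimal sketch of a valid image reference in the daemonset container definition; the container name, registry URL, image name, and tag are placeholders, and the exact location in your downloaded .yaml file might differ:

    containers:
      - name: <connector_container_name>
        image: <registry_url>/<connector_image>:<tag>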

An error occurs after you apply the .yaml file configurations

Issue symptoms

The connector is not running.

You get the Unable to pull image error after running the following command:

kubectl apply -f <downloaded_configuration_yaml_file>

Issue scope

This issue might affect all integrations.

Resolution

  • If you have not provided a value in the Docker Registry Path field, replace the PARAMETERS.docker_container_registry_path string at the containers : env : image path with the Docker registry URL.
  • Ensure that a valid image can be pulled by using the connector image repository URL and that the URL is entered in the .yaml file at the containers : env : image path.
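
To confirm that the image can be pulled, you can test the URL directly from the controller (or with the equivalent command for your container runtime); the registry URL, image name, and tag are placeholders:

    docker pull <registry_url>/<connector_image>:<tag>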

All pods are in the error status

Issue symptoms

No logs are collected, and the integration status is Configured.

All pods are in the Error status after you run the following command:

kubectl get pod -n bmc-logging

Issue scope

This issue might affect all integrations with incorrect configurations.

Resolution

  • Validate the .yaml file for the ConfigMap (name: bmc-config-map) definition. Make sure that it is valid YAML.
  • Validate that the value of the ConfigMap definition follows the fluentd configuration syntax.
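
For example, you can validate the YAML syntax without applying it and inspect the deployed ConfigMap by using the following commands:

    kubectl apply --dry-run=client -f <downloaded_configuration_yaml_file>

    kubectl get configmap bmc-config-map -n bmc-logging -o yaml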

Error or CrashLoopBackOff status for some pods

Issue symptoms

Not all logs are collected.

Multiple pods are in the Error or CrashLoopBackOff status after you run the following command:

kubectl get pod -n bmc-logging

Issue scope

This issue might affect all integrations configured for a cluster.

Resolution

  1. Run the following command:
    kubectl get pod -n bmc-logging -o wide
  2. Find the nodes to which the Error/CrashLoopBackOff pods are assigned.
  3. Check the health of those nodes.
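
For step 3, you can check the node status and events with the following commands; <node_name> is a placeholder:

    kubectl get nodes

    kubectl describe node <node_name>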

Collected logs are unavailable in Explorer

Issue symptom

No logs are collected or shown in Explorer.

Issue scope

This issue might affect all integrations.

Resolution

  1. Find the nodes to which the service pods are assigned by running the following command:
    kubectl get pod -o wide | grep <service name> 
  2. For each node, get the connector pod by running the following command: 
    kubectl get pod -n bmc-logging -o wide
  3. Note the connector pods running on the node where service pods are also running. 
  4. For each of these connector pods, run the following command:
    kubectl exec -it <daemonset pod name> -n bmc-logging -- bash
  5. Go to the /fluentd/log path, and run the following command:
    cat fluent.log
  6. In the log file, search for the string "following tail of /var/log/containers/*<name of the service pod>".
    If the string is not present, the connector cannot find the logs of the service from the node.
  7. Ensure that the service pod logs are stored on the node.
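
For steps 5 and 6, you can combine the check into a single command inside the connector pod; <service_pod_name> is a placeholder:

    grep "following tail of /var/log/containers" /fluentd/log/fluent.log | grep <service_pod_name>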
