Moviri – K8s (Kubernetes) Prometheus Extractor


“Moviri Integrator for BMC Helix Continuous Optimization – k8s (Kubernetes) Prometheus” is an additional component of the BMC Helix Continuous Optimization product. It extracts data from the Kubernetes cluster management system, a leading solution for managing cloud-native containerized environments. Relevant capacity metrics are loaded into BMC Helix Continuous Optimization, which provides advanced analytics over the extracted data in the form of an interactive dashboard, the Kubernetes View.

The integration supports the extraction of both performance and configuration data across the different components of the Kubernetes system and can be configured via parameters that allow entity filtering and many other settings. Furthermore, the connector can replicate relationships and logical dependencies among entities such as clusters, nodes, namespaces, deployments, and pods.

The documentation is targeted at BMC Helix Continuous Optimization administrators in charge of configuring and monitoring the integration between BMC Helix Continuous Optimization and Kubernetes.

The latest version of the k8s (Kubernetes) Prometheus integrator is available on EPD. Click the Moviri Integrator for Helix Continuous Optimization link. In the Patches tab, select the latest version of TrueSight Capacity Optimization. This version of the connector is compatible with BMC Helix Continuous Optimization 11.5 and onward.


If you used the “Moviri Integrator for BMC Helix Continuous Optimization – k8s Prometheus” before, expect the following changes from version 20.02.01:

  1. A new entity type, "pod workload", is imported to replace pods. If you imported pods before, all Kubernetes pods will be removed and left in All systems and business drivers → "Unassigned".
  2. A new naming convention is implemented for Kubernetes Controller replica sets: the suffix is removed from the name (for example, replicaset-abcd will be named replicaset). All old controllers will be replaced by new controllers with the new names, and the old ones will be left in All systems and business drivers → "Unassigned".


Step I. Complete the preconfiguration tasks

Step II. Configure the ETL

Step III. Run the ETL

Step IV. Verify the data collection

k8s Heapster to k8s Prometheus Migration

Pod Optimization - Pod workloads replace pods

Common issues


Step I. Complete the preconfiguration tasks

Step II. Configure the ETL

A. Configuring the basic properties

Some of the basic properties display default values. You can modify these values if required.

To configure the basic properties:

  1. In the console, navigate to Administration > ETL & System Tasks, and select ETL tasks.
  2. On the ETL tasks page, click Add > Add ETL. The Add ETL page displays the configuration properties. You must configure properties in the following tabs: Run configuration, Entity catalog, Connection, and Prometheus Extraction.
  3. On the Run Configuration tab, select Moviri - k8s Prometheus Extractor from the ETL Module list. The name of the ETL is displayed in the ETL task name field. You can edit this field to customize the name.

    image2019-6-4_14-33-26.png

  4. Click the Entity catalog tab, and select one of the following options:
    • Shared Entity Catalog:
      • From the Sharing with Entity Catalog list, select the entity catalog name that is shared between ETLs.
    • Private Entity Catalog: Select if this is the only ETL that extracts data from the k8s Prometheus resources.
  5. Click the Connection tab, and configure the following properties:

6. Click the Prometheus Extraction tab, and configure the following properties:

The following image shows a Run Configuration example for the “Moviri Integrator for BMC Helix Continuous Optimization – k8s Prometheus":

image2020-8-18_14-26-1.png

7. (Optional) Enable TLS v1.2 for Kubernetes 1.16 and above

If you are using Kubernetes 1.16, or OpenShift 4 and above, there is an incompatibility between Java and TLS v1.3. As a workaround, the connector can use TLS v1.2 for the connection. To add the hidden property:

a. At the very bottom of the ETL configuration page, there is a link for manually modifying the ETL properties:

image2020-8-18_14-32-54.png

b. On the manual property editing page, add the property "prometheus.use.tlsv12":

image2020-8-18_14-35-35.png

c. Set the value to "true", and save the change.
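For context, this property forces the connector's HTTP client down to TLS v1.2. The connector itself is Java-based, so the following Python snippet is only an illustrative sketch of the same idea: pinning a client-side TLS context to version 1.2.

```python
import ssl

# Illustrative only: pin a client TLS context to v1.2, mirroring what the
# "prometheus.use.tlsv12" property does for the connector's own HTTP client.
ctx = ssl.create_default_context()
ctx.minimum_version = ssl.TLSVersion.TLSv1_2
ctx.maximum_version = ssl.TLSVersion.TLSv1_2
```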

8. (Optional) Override the default values of the properties:


Run configuration
Object relationships
ETL task properties
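Under the hood, the extractor retrieves metrics through the standard Prometheus HTTP API. As a minimal sketch (the helper below is hypothetical, and the host, metric, and step values are illustrative; the exact queries the connector issues are internal to the product), a range query against the public /api/v1/query_range endpoint can be assembled like this:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical helper: assembles the endpoint and parameters of a Prometheus
# range query. The /api/v1/query_range path and its parameter names are part
# of the public Prometheus HTTP API.
def build_range_query(base_url, promql, start, end, step="5m"):
    return (
        f"{base_url}/api/v1/query_range",
        {
            "query": promql,
            "start": start.timestamp(),
            "end": end.timestamp(),
            "step": step,
        },
    )

end = datetime(2020, 8, 18, 14, 0, tzinfo=timezone.utc)
start = end - timedelta(hours=1)
url, params = build_range_query(
    "https://prometheus.example.com", "kube_pod_container_info", start, end
)
```

The same request can be issued manually (for example, with curl) to confirm that the Prometheus endpoint configured in the Connection tab is reachable.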


(Optional) B. Configuring the advanced properties


You can configure the advanced properties to change the way the ETL works or to collect additional metrics.


To configure the advanced properties:


  1. On the Add ETL page, click Advanced.
  2. Configure the following properties:


    Loader configuration

    • Empty dataset behavior: Specify the action for the loader if it encounters an empty dataset:
      • Warn: Generate a warning about loading an empty dataset.
      • Ignore: Ignore the empty dataset and continue parsing.
    • ETL log file name: The name of the file that contains the ETL run log. The default value is: %BASE/log/%AYEAR%AMONTH%ADAY%AHOUR%MINUTE%TASKID
    • Maximum number of rows for CSV output: A numeric value to limit the size of the output files.
    • CSV loader output file name: The name of the file that is generated by the CSV loader. The default value is: %BASE/output/%DSNAME%AYEAR%AMONTH%ADAY%AHOUR%ZPROG%DSID%TASKID
    • Capacity Optimization loader output file name: The name of the file that is generated by the BMC Helix Continuous Optimization loader. The default value is: %BASE/output/%DSNAME%AYEAR%AMONTH%ADAY%AHOUR%ZPROG%DSID%TASKID
    • Detail mode: Specify whether you want to collect raw data in addition to the standard data. Select one of the following options:
      • Standard: Data will be stored in the database in different tables at the following time granularities: Detail (configurable, by default: 5 minutes), Hourly, Daily, and Monthly.
      • Raw also: Data will be stored in the database in different tables at the following time granularities: Raw (as available from the original data source), Detail (configurable, by default: 5 minutes), Hourly, Daily, and Monthly.
      • Raw only: Data will be stored in the database in a table only at Raw granularity (as available from the original data source).


    • Remove domain suffix from datasource name (Only for systems): Select True to remove the domain from the data source name. For example, server.domain.com will be saved as server. The default selection is False.
    • Leave domain suffix to system name (Only for systems): Select True to keep the domain in the system name. For example, server.domain.com will be saved as is. The default selection is False.
    • Update grouping object definition (Only for systems): Select True if you want the ETL to update the grouping object definition for a metric that is loaded by the ETL. The default selection is False.
    • Skip entity creation (Only for ETL tasks sharing lookup with other tasks): Select True if you do not want this ETL to create an entity and discard data from its data source for entities not found in Capacity Optimization. It uses one of the other ETLs that share a lookup to create a new entity. The default selection is False.

    Scheduling options

    • Hour mask: Specify a value to run the task only during particular hours within a day. For example, 0 – 23 or 1, 3, 5 – 12.
    • Day of week mask: Select the days so that the task can be run only on the selected days of the week. To avoid setting this filter, do not select any option for this field.
    • Day of month mask: Specify a value to run the task only on the selected days of a month. For example, 5, 9, 18, 27 – 31.
    • Apply mask validation: Select False to temporarily turn off the mask validation without removing any values. The default selection is True.
    • Execute after time: Specify a value in the hours:minutes format (for example, 05:00 or 16:00) to wait before the task is run. The task run begins only after the specified time has elapsed.
    • Enqueueable: Specify whether you want to ignore the next run command or run it after the current task. Select one of the following options:
      • False: Ignores the next run command when a particular task is already running. This is the default selection.
      • True: Starts the next run command immediately after the current running task is completed.
  3. Click Save.

    The ETL tasks page shows the details of the newly configured Prometheus ETL:
    image2019-6-10_8-26-42.png

Step III. Run the ETL

After you configure the ETL, you can run it to collect data. You can run the ETL in the following modes:

A. Simulation mode: Only validates the connection to the data source; it does not collect data. Use this mode when you run the ETL for the first time or after you make any changes to the ETL configuration.

B. Production mode: Collects data from the data source.

A. Running the ETL in the simulation mode

To run the ETL in the simulation mode:

  1. In the console, navigate to Administration > ETL & System Tasks, and select ETL tasks.
  2. On the ETL tasks page, click the ETL. The ETL details are displayed.

    image2019-6-10_8-26-42.png

  3. In the Run configurations table, click Edit image2019-6-3_11-28-46.png to modify the ETL configuration settings.
  4. On the Run configuration tab, ensure that the Execute in simulation mode option is set to Yes, and click Save.
  5. Click Run active configuration. A confirmation message about the ETL run job submission is displayed.
  6. On the ETL tasks page, check the ETL run status in the Last exit column.
    OK: Indicates that the ETL ran without any error. You are ready to run the ETL in the production mode.
  7. If the ETL run status is Warning, Error, or Failed:
    1. On the ETL tasks page, click image2019-6-3_11-29-45.png in the last column of the ETL name row.
    2. Check the log and reconfigure the ETL if required.
    3. Run the ETL again.
    4. Repeat these steps until the ETL run status changes to OK.

B. Running the ETL in the production mode

You can run the ETL manually when required or schedule it to run at a specified time.

Running the ETL manually

  1. On the ETL tasks page, click the ETL. The ETL details are displayed.
  2. In the Run configurations table, click Edit image2019-6-3_11-30-9.png to modify the ETL configuration settings. The Edit run configuration page is displayed.
  3. On the Run configuration tab, select No for the Execute in simulation mode option, and click Save.
  4. To run the ETL immediately, click Run active configuration. A confirmation message about the ETL run job submission is displayed.
    When the ETL is run, it collects data from the source and transfers it to the database.

Scheduling the ETL run

By default, the ETL is scheduled to run daily. You can customize this schedule by changing the frequency and period of running the ETL.

To configure the ETL run schedule:

  1. On the ETL tasks page, click the ETL, and click Edit Task image2019-6-3_11-32-42.png. The ETL details are displayed.

    image2019-6-10_8-29-15.png
  2. On the Edit task page, do the following, and click Save:
    • Specify a unique name and description for the ETL task.
    • In the Maximum execution time before warning field, specify the duration for which the ETL must run before generating warnings or alerts, if any.
    • Select a predefined or custom frequency for starting the ETL run. The default selection is Predefined.
    • Select the task group and the scheduler to which you want to assign the ETL task.
  3. Click Schedule. A message confirming the scheduling job submission is displayed.
    When the ETL runs as scheduled, it collects data from the source and transfers it to the database.

Step IV. Verify data collection

Verify that the ETL ran successfully and check whether the k8s Prometheus data is refreshed in the Workspace.

To verify whether the ETL ran successfully:

  1. In the console, click Administration > ETL and System Tasks > ETL tasks.
  2. In the Last exec time column corresponding to the ETL name, verify that the current date and time are displayed.

To verify that the k8s Prometheus data is refreshed:

  1. In the console, click Workspace.
  2. Expand (Domain name) > Systems > k8s Prometheus > Instances.
  3. In the left pane, verify that the hierarchy displays the new and updated Prometheus instances.
  4. Click a k8s Prometheus entity, and click the Metrics tab in the right pane.
  5. Check if the Last Activity column in the Configuration metrics and Performance metrics tables displays the current date.



k8s Heapster to k8s Prometheus Migration

The “Moviri Integrator for BMC Helix Continuous Optimization – k8s Prometheus” supports a seamless transition from entities and metrics imported by the “Moviri Integrator for BMC Helix Continuous Optimization – k8s Heapster”. Follow these steps to migrate between the two integrators:

Kubernetes ETL Integration
  1. Stop the “Moviri Integrator for BMC Helix Continuous Optimization – k8s Heapster” ETL task.
  2. Install and configure the “Moviri Integrator for BMC Helix Continuous Optimization – k8s Prometheus”, ensuring that the lookup is shared with the “Moviri Integrator for BMC Helix Continuous Optimization – k8s Heapster” ETL task.
  3. Start the “Moviri Integrator for BMC Helix Continuous Optimization – k8s Prometheus” ETL task.


Pod Optimization - Pod Workloads replace Pods


The “Moviri Integrator for BMC Helix Continuous Optimization – k8s Prometheus” introduces a new entity, "pod workload", as of v20.02.01. A pod workload is an aggregated entity that represents a group of pods running on the same controller. The pod workload is a direct child of the controller that the pods run on and uses the same name as the parent controller. Pods, at the same time, are dropped.


Kubernetes ETL Integration

If you used the “Moviri Integrator for BMC Helix Continuous Optimization – k8s Prometheus” and imported pods before, once you upgrade to this version, all Kubernetes pods will be removed and left in All systems and business drivers → "Unassigned".

Pod Workload

Entity

Pod workload is an aggregated entity that represents all the pods running on the same controller. The pod workload uses the same name as the controller that the pods run on.

Hierarchy

A pod workload is a direct child of the controller that the aggregated pods run on. For standalone pods, the pod workload is identical to the standalone pod itself and is a child of the namespace.

Metrics Meaning

Global metrics: provide multiple statistic values (avg, max, min, sum) for the last hour, representing the max, min, sum, and average of the average value across all pods running on that controller.

BYCONT metrics: provide multiple statistic values (avg, max, min, sum) for the last hour, representing the max, min, sum, and average of the average value per container running on that controller.

BYCONT highmark counters: BYCONT_CPU_USED_NUM_HM and BYCONT_MEM_USED_HM provide multiple statistic values as highmark counters based on 1-minute resolution data. The Max value is the 95th percentile of the max container's 1-minute resolution data; the Avg value is the 90th percentile of the max container's 1-minute resolution data; the Min value is the 75th percentile of the max container's 1-minute resolution data; the Sum value is the 95th percentile of the 1-minute resolution data summed across all containers on that controller.
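The highmark statistics above can be sketched as follows. This is a hedged illustration assuming one hour of 1-minute samples for the busiest container; the connector's exact percentile method is not documented, so a simple nearest-rank definition is used here.

```python
# Hedged illustration of the highmark statistics described above. The
# percentile() helper and the sample data are hypothetical.
def percentile(samples, pct):
    ordered = sorted(samples)
    rank = max(0, int(round(pct / 100 * len(ordered))) - 1)  # nearest rank
    return ordered[rank]

one_minute_samples = list(range(1, 61))  # e.g. CPU used, 60 one-minute samples
highmark = {
    "max": percentile(one_minute_samples, 95),  # 95th percentile
    "avg": percentile(one_minute_samples, 90),  # 90th percentile
    "min": percentile(one_minute_samples, 75),  # 75th percentile
}
```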

BYCONT_IMAGE metrics: provide multiple statistic values (avg, max, min, sum) for the last hour, representing the max, min, sum, and average of the average value per container per image running on that controller. The subentity name combines the container name and the image name with "::" (for example, containername::imagename).

Here are some screenshots of the hierarchy and metrics:

image2020-8-18_14-57-56.png

image2020-8-18_14-58-29.png

image2020-8-18_15-0-56.png

Common Issues


  • Query errors: HTTP 422 Unprocessable Entity
    • Behavior: You may see these errors occasionally; their number can vary considerably between runs.
    • Cause: This is usually caused by Prometheus rebuilding or restarting. For a couple of days after a Prometheus rebuild or reload, you may see this error.
    • Solution: The errors usually go away on their own as Prometheus becomes more stable.


  • Prometheus is running fine but no data is pulled
    • Cause: This is usually caused by the last counter being set too far from today's date. Prometheus has a data retention period (default: 15 days), which can be configured. If the ETL is set to extract data from before the retention period, no data is returned. The Prometheus status page shows the retention value in the "storage retention" field.
    • Solution: Modify the default last counter to a more recent date.
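The retention check can be sketched programmatically. This is a hedged illustration: the 15-day default comes from Prometheus, and the helper function below is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical helper: check whether a requested extraction start ("last
# counter") still falls inside the Prometheus retention window.
def within_retention(last_counter, retention_days=15, now=None):
    now = now or datetime.now(timezone.utc)
    return last_counter >= now - timedelta(days=retention_days)

now = datetime(2020, 8, 18, tzinfo=timezone.utc)
ok = within_retention(now - timedelta(days=3), now=now)        # inside window
too_old = within_retention(now - timedelta(days=30), now=now)  # outside window
```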


  • 504 Gateway Timeouts
    • Cause: These 504 timeout query errors (the server did not respond in time) are related to the route timeout used on OpenShift. The timeout can be configured on a route-by-route basis; for example, the Prometheus route can be increased to the 2-minute timeout that is also configured on the Prometheus backend. See the following link to understand what the configured timeout is and how it can be increased:
      https://docs.openshift.com/container-platform/4.6/networking/routes/route-configuration.html
    • Solution: Increase the timeout period on the OpenShift side.
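As a hedged example of raising that timeout, the standard OpenShift router timeout annotation can be set on the Prometheus route. The route name and namespace below are illustrative and may differ in your cluster; only the annotation is the relevant part.

```yaml
# Illustrative route fragment; name and namespace are hypothetical.
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: prometheus-k8s
  namespace: openshift-monitoring
  annotations:
    haproxy.router.openshift.io/timeout: 2m
```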


Data Verification

The following sections provide some indications on how to verify on Prometheus that all the prerequisites are in place before starting to collect data.

Verify Prometheus Build information

Verify the Prometheus "Last successful configuration reload" (from the Prometheus UI, check "Status > Runtime & Build Information").
If the "Last successful configuration reload" reports less than 3 days, ask the customer to evaluate the status of the integration over the next 2-3 days.

Verify the Status of Prometheus Target Services

Verify the status of the Prometheus targets (from the Prometheus UI, check "Status > Targets"):

  • Check the status of "node-exporter" (there should be 1 instance running for each node in the cluster)
  • Check the status of "kube-state-metrics" (there should be at least 1 instance running)
  • Check the status of "kubelet" (there should be at least 1 instance running for each node in the cluster)

Verify data availability in Prometheus Tables

Verify that the following Prometheus tables contain data (from the Prometheus UI):

  • "kube_pod_container_info" when missing Pod Workload, Controller, or Namespace metrics (but also Cluster and Node for Requests and Limits metrics)
  • "kube_node_info" when missing Node and Cluster metrics.
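A hedged sketch of such a check: given the JSON body that Prometheus's instant-query endpoint (/api/v1/query) returns for one of these tables, test whether any series came back. The response shape is the documented Prometheus HTTP API format; the sample payloads are illustrative.

```python
import json

# Hypothetical helper: returns True when a Prometheus instant-query response
# contains at least one result series.
def has_data(response_body):
    doc = json.loads(response_body)
    return doc.get("status") == "success" and bool(doc["data"]["result"])

# Illustrative payloads mimicking /api/v1/query responses.
empty = json.dumps({"status": "success",
                    "data": {"resultType": "vector", "result": []}})
populated = json.dumps({"status": "success",
                        "data": {"resultType": "vector", "result": [
                            {"metric": {"pod": "example"}, "value": [0, "1"]}]}})
```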