Kubernetes view


The Kubernetes view enables you to manage the capacity and efficiency of containerized environments managed by the Kubernetes platform, and presents key capacity metrics and charts for Kubernetes clusters, nodes, namespaces, controllers and pods.

You can use the Kubernetes view to complete tasks such as:

  • Understand resource bottlenecks and aggregate residual capacity of Kubernetes clusters as well as individual nodes
  • Detect current or imminent resource saturation conditions and days before the resource is saturated for every major Kubernetes resource (e.g. cluster, nodes, controllers, pods)
  • Assess the level of infrastructure efficiency by comparing allocated vs. actually used resources, and identify the most wasteful controllers or pods
  • Identify application resource usage patterns and detect resource shortage conditions
  • Characterize the footprint of infrastructure resources on a per-container-image basis
  • Understand resource utilization of namespaces, including the level of usage of resource quotas

Moviri Integrator for BMC Helix Continuous Optimization - Kubernetes View is compatible with BMC Helix Continuous Optimization 19.11 and onward.


Videos

The following video (5:35) provides a brief introduction to the Kubernetes views.

https://youtu.be/2LlBvw_zzDk



The following video (9:53) provides information about how to access and use the Kubernetes views.

https://youtu.be/yyAIi8_DMkM


Requirements

Conventions

The Kubernetes view provides summarized, high-level capacity KPIs designed for capacity management. The following common conventions for naming and metric aggregation apply to all data presented in the view:

  • Standard table column names: Metric name [unit of measurement] (e.g. "Memory [GB]")
  • Unit of measurement is omitted when implicitly evident
  • The metric value is the aggregation of the last 30 days
  • The metric value is computed as follows: for each day, the daily peak is considered (at hourly resolution). Then, the mean value of the daily peak over the last 30 days is shown.

These conventions apply only to the summary metrics presented in tables and on the Overview page. The charts presented in the details pages follow regular time-series semantics; their time frame and time resolution are available as filters at the top of the page.
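The aggregation rule above (daily peak at hourly resolution, then the mean of the daily peaks over the last 30 days) can be sketched as follows. This is a minimal illustration, not the product's implementation: the function name `mean_of_daily_peaks` and the `(timestamp, value)` input format are assumptions made for the example.

```python
from datetime import datetime, timedelta
from statistics import mean


def mean_of_daily_peaks(samples, days=30, now=None):
    """Return the mean of the daily peaks over the last `days` calendar days.

    `samples` is a list of (timestamp, value) pairs at hourly resolution.
    The input shape is an assumption made for this sketch.
    """
    if now is None:
        now = max(ts for ts, _ in samples)
    peaks = {}
    for ts, value in samples:
        age_in_days = (now.date() - ts.date()).days
        if 0 <= age_in_days < days:
            day = ts.date()
            # Keep the highest hourly value seen for each day.
            peaks[day] = max(value, peaks.get(day, value))
    return mean(peaks.values())
```

For example, two days of hourly data whose peaks are 24 and 10 yield a summary value of 17.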

View Structure

The Kubernetes view is composed of the following first-level pages:

  • Overview: it presents a summary of Kubernetes components states
  • Clusters: it shows capacity metrics for Kubernetes clusters
  • Nodes: it shows capacity metrics for Kubernetes nodes
  • Namespaces: it shows capacity metrics for Kubernetes namespaces
  • Controllers: it shows capacity metrics for Kubernetes controllers
  • Pods: it shows capacity metrics for Kubernetes pod workload

For each page except Overview, a set of second-level tabs covers information about some or all of the following parts:

  • Capacity: summary of the most important capacity indicators
  • CPU: the most relevant CPU configuration and performance metrics
  • Memory: the most relevant memory configuration and performance metrics
  • Storage: the most relevant storage configuration and performance metrics (this tab is defined only in the Clusters page)

From all of the second-level tabs it is possible to drill-down to an entity detail page, which presents the most relevant performance metrics as time charts and tables for the most important configuration properties.

Overview

The goal of the overview page is to provide at-a-glance aggregated capacity visibility over all of the main Kubernetes components.

For each component, three doughnut charts are displayed, each showing the number of entities of that component with the corresponding capacity status:

  • Ok: the component is healthy from a capacity management perspective 
  • Warning: the component has breached (or will be breaching in the near future) a warning utilization threshold for one or more metrics
  • Alert: the component has breached (or will be breaching in the near future) a critical utilization threshold for one or more metrics

Refer to the section Thresholds, Status & Bottleneck below for more details on how the capacity status is calculated for each entity.

As shown in the picture below, each doughnut:

  • is scaled with the number of components taken into account;
  • has a color that depends on the component capacity status;
  • can be added to the "Favorites" tab by clicking the associated star.

worddavebd24aaba9f673faeac0260e33270ed0.png

Entity Pages

These first-level pages (Clusters, Nodes, Namespaces, Controllers and Pods) are designed to provide capacity management insights for the corresponding Kubernetes entity. Each page contains up to four second-level tabs (Capacity, CPU, Memory and Storage, the last available only on the Clusters page), each of which presents capacity and efficiency KPIs for the resources relevant to the analyzed entity.

Capacity

The first tab "Capacity" is designed to provide a summary of the most important KPIs for managing capacity of Kubernetes environments.

The table below summarizes the information provided by the tab. Depending on the entity being analyzed (e.g. Cluster or Pod), only the relevant set of columns is shown. For example, the "Controller" column is only shown for Pods, while quota-related columns such as "Mem Request vs Quota" are only shown for Namespaces.

| Column name | Meaning |
|---|---|
| Cluster | Name of the cluster |
| Node | Name of the node |
| Namespace | Name of the namespace |
| Controller | Name of the controller |
| Pod | Name of the pod |
| Pod # | Number of pods (KPOD_NUM) |
| Status | Indicator of the resource's status (*) |
| CPU Used vs Cap [%] | Percentage of CPUs actually used with respect to CPU capacity (number of CPUs), calculated as CPU_USED_NUM/CPU_NUM |
| CPU Request vs Cap [%] | Percentage of CPUs requested with respect to CPU capacity (number of CPUs), calculated as CPU_REQUEST/CPU_NUM |
| CPU Request vs Quota [%] | Percentage of CPU request with respect to the namespace CPU request quota, calculated as CPU_REQUEST/CPU_REQUEST_MAX |
| CPU Limit vs Quota [%] | Percentage of CPU limit with respect to the namespace CPU limit quota, calculated as CPU_LIMIT/CPU_LIMIT_MAX |
| CPU Used vs Limit [%] | Percentage of CPU used with respect to the CPU limit, calculated as CPU_USED_NUM/CPU_LIMIT |
| Mem Used vs Cap [%] | Percentage of memory actually used with respect to memory capacity (total memory), calculated as MEM_USED/TOTAL_REAL_MEM |
| Mem Real Used Vs Cap [%] | Percentage of memory actually used, excluding file system cache and buffers, with respect to memory capacity (total memory), calculated as MEM_REAL_USED/TOTAL_REAL_MEM |
| Mem Request vs Cap [%] | Percentage of memory requested with respect to memory capacity (total memory), calculated as MEM_REQUEST/TOTAL_REAL_MEM |
| Mem Request vs Quota [%] | Percentage of memory request with respect to the namespace memory request quota, calculated as MEM_REQUEST/MEM_REQUEST_MAX |
| Mem Limit vs Quota [%] | Percentage of memory limit with respect to the namespace memory limit quota, calculated as MEM_LIMIT/MEM_LIMIT_MAX |
| Mem Used vs Limit [%] | Percentage of memory used with respect to the memory limit, calculated as MEM_USED/MEM_LIMIT |
| Pod # vs Pod Max [%] | Percentage of pods already created with respect to the maximum number of pods |
| Spare Pods | Estimated residual capacity, expressed as the number of additional pods that can be scheduled. The estimate considers the average size of existing pods in the cluster |
| Bottleneck Resource | First resource to saturate |
| CPU # | Number of CPUs (CPU_NUM) |
| CPU USED | Number of cores in use (CPU_USED_NUM) |
| CPU Request | Amount of CPU (cores) that is guaranteed (CPU_REQUEST) |
| CPU Limit | Maximum amount of CPU (cores) that can be used (CPU_LIMIT) |
| Memory | The total amount of memory installed on the system (TOTAL_REAL_MEM) |
| MEM USED | The amount of memory used in bytes (MEM_USED) |
| MEM REAL USED | The total memory in use on UNIX systems during the interval, excluding file system cache and buffers (MEM_REAL_USED) |
| MEM Request | Amount of memory that is guaranteed (MEM_REQUEST) |
| MEM Limit | The maximum amount of physical memory that can be allocated (MEM_LIMIT) |

The picture below is an example of the Capacity tab.

worddav49a70e8ea8c2bac8f1d4df113e20b5d2.png
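The ratio columns of the Capacity tab are simple quotients of the underlying metrics, so they can be reproduced with a few lines of code. The sketch below uses the metric names listed in the table as dictionary keys. The `spare_pods` function is only a hypothetical reading of the "Spare Pods" description: it assumes pod size is measured by the average CPU and memory request, and that the tightest resource determines how many more pods fit. The actual estimate used by the view may differ.

```python
import math


def pct(numerator, denominator):
    """Return numerator/denominator as a percentage, or None if undefined."""
    if not denominator:
        return None
    return 100.0 * numerator / denominator


def capacity_kpis(m):
    """Compute some Capacity tab ratios from a dict keyed by metric name."""
    return {
        "CPU Used vs Cap [%]": pct(m["CPU_USED_NUM"], m["CPU_NUM"]),
        "CPU Request vs Cap [%]": pct(m["CPU_REQUEST"], m["CPU_NUM"]),
        "Mem Used vs Cap [%]": pct(m["MEM_USED"], m["TOTAL_REAL_MEM"]),
        "Mem Request vs Cap [%]": pct(m["MEM_REQUEST"], m["TOTAL_REAL_MEM"]),
    }


def spare_pods(m):
    """Hypothetical estimate of how many average-sized pods still fit."""
    pods = m["KPOD_NUM"]
    if pods == 0:
        return None
    fits = []
    for requested, capacity in (("CPU_REQUEST", "CPU_NUM"),
                                ("MEM_REQUEST", "TOTAL_REAL_MEM")):
        average_request = m[requested] / pods
        if average_request > 0:
            residual = m[capacity] - m[requested]
            fits.append(math.floor(residual / average_request))
    # The most constrained resource limits the number of additional pods.
    return max(0, min(fits)) if fits else None
```

With 16 CPUs (8 requested), 64 GB of memory (48 requested) and 16 pods, the average pod requests 0.5 cores and 3 GB, so memory is the tighter resource and the estimate is 5 additional pods.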

CPU

The CPU tab presents the most important metrics related to the CPU. The columns are defined in the table below.

| Column name | Meaning |
|---|---|
| Cluster | Name of the cluster |
| Node | Name of the node |
| Namespace | Name of the namespace |
| Controller | Name of the controller |
| Pod | Name of the pod |
| Status | Resource capacity status |
| Days To Saturation | Days before the resource is saturated |
| CPU # | Number of CPUs (CPU_NUM) |
| CPU USED | Number of cores in use (CPU_USED_NUM) |
| CPU Request | Amount of CPU (cores) that is guaranteed (CPU_REQUEST) |
| CPU Limit | Maximum amount of CPU (cores) that can be used (CPU_LIMIT) |
| Quota Request | Quota set for the CPU request resource (CPU_REQUEST_MAX) |
| Quota Limit | Quota set for the CPU limit resource (CPU_LIMIT_MAX) |
| CPU USED (mCores) | Number of millicores in use (CPU_USED_NUM) |
| CPU Request (mCores) | Amount of CPU (millicores) that is guaranteed (CPU_REQUEST) |
| CPU Limit (mCores) | Maximum amount of CPU (millicores) that can be used (CPU_LIMIT) |
| Quota Request (mCores) | Quota set for the CPU request resource, in millicores (CPU_REQUEST_MAX) |
| Quota Limit (mCores) | Quota set for the CPU limit resource, in millicores (CPU_LIMIT_MAX) |
| CPU Used vs Request [%] | Percentage of CPU used with respect to the CPU request |
| CPU Used vs Cap [%] | Percentage of CPUs actually used with respect to CPU capacity (number of CPUs), calculated as CPU_USED_NUM/CPU_NUM |
| CPU Request vs Cap [%] | Percentage of CPUs requested with respect to CPU capacity (number of CPUs), calculated as CPU_REQUEST/CPU_NUM |
| CPU Request vs Quota [%] | Percentage of CPU request with respect to the namespace CPU request quota, calculated as CPU_REQUEST/CPU_REQUEST_MAX |
| CPU Limit vs Quota [%] | Percentage of CPU limit with respect to the namespace CPU limit quota, calculated as CPU_LIMIT/CPU_LIMIT_MAX |
| CPU Used vs Limit [%] | Percentage of CPU used with respect to the CPU limit, calculated as CPU_USED_NUM/CPU_LIMIT |
| CPU Overcommitment [%] | Percentage of CPU limit with respect to the CPU capacity (number of CPUs), calculated as CPU_LIMIT/CPU_NUM |
| Bottleneck | First resource to saturate |

worddav1ab322995cfc9299fcfbc0c9720f1f0b.png
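As a small worked example of the CPU columns, the sketch below converts cores to millicores (1 core = 1000 millicores in Kubernetes) and computes CPU overcommitment as CPU_LIMIT/CPU_NUM. The function names are illustrative and not part of the product.

```python
def cores_to_millicores(cores):
    """Convert a CPU amount in cores to millicores (1 core = 1000 mCores)."""
    return round(cores * 1000)


def cpu_overcommitment_pct(cpu_limit, cpu_num):
    """CPU Overcommitment [%]: CPU_LIMIT over CPU_NUM, as a percentage."""
    if not cpu_num:
        return None
    return 100.0 * cpu_limit / cpu_num


def cpu_used_vs_request_pct(cpu_used_num, cpu_request):
    """CPU Used vs Request [%]; values above 100 are possible."""
    if not cpu_request:
        return None
    return 100.0 * cpu_used_num / cpu_request
```

A node with 16 CPUs whose pods have limits totaling 24 cores is overcommitted at 150%, which means the pods cannot all reach their CPU limit at the same time.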

Memory

The Memory tab presents the most important metrics related to the memory. The columns are defined in the table below.

| Column name | Meaning |
|---|---|
| Cluster | Name of the cluster |
| Node | Name of the node |
| Namespace | Name of the namespace |
| Controller | Name of the controller |
| Pod | Name of the pod |
| Status | Indicator of the resource's status |
| Days To Saturation | Days before the saturation of the physical resources |
| Memory | The total amount of memory installed on the system (TOTAL_REAL_MEM) |
| MEM USED | The amount of memory used in bytes (MEM_USED) |
| MEM REAL USED | The total memory in use on UNIX systems during the interval, excluding file system cache and buffers (MEM_REAL_USED) |
| MEM Request | Amount of memory that is guaranteed (MEM_REQUEST) |
| MEM Limit | The maximum amount of physical memory that can be allocated (MEM_LIMIT) |
| Quota Request | Amount of memory quota request (MEM_REQUEST_MAX) |
| Quota Limit | Amount of memory quota limit (MEM_LIMIT_MAX) |
| Mem Used vs Request [%] | Percentage of memory used with respect to memory request, calculated as MEM_USED/MEM_REQUEST |
| Mem Used vs Cap [%] | Percentage of memory actually used with respect to memory capacity (total memory), calculated as MEM_USED/TOTAL_REAL_MEM |
| Mem Request vs Cap [%] | Percentage of memory requested with respect to memory capacity (total memory), calculated as MEM_REQUEST/TOTAL_REAL_MEM |
| Mem Request vs Quota [%] | Percentage of memory request with respect to the namespace memory request quota, calculated as MEM_REQUEST/MEM_REQUEST_MAX |
| Mem Limit vs Quota [%] | Percentage of memory limit with respect to the namespace memory limit quota, calculated as MEM_LIMIT/MEM_LIMIT_MAX |
| Mem Used vs Limit [%] | Percentage of memory used with respect to the memory limit, calculated as MEM_USED/MEM_LIMIT |
| Mem Overcommitment [%] | Percentage of memory limit with respect to memory capacity (total memory), calculated as MEM_LIMIT/TOTAL_REAL_MEM |
| Mem Real Used Vs Cap [%] | Percentage of memory actually used, excluding file system cache and buffers, with respect to memory capacity (total memory), calculated as MEM_REAL_USED/TOTAL_REAL_MEM |
| Mem Real Used Vs Request [%] | Percentage of memory used, excluding file system cache and buffers, with respect to memory request, calculated as MEM_REAL_USED/MEM_REQUEST |
| Bottleneck | First resource to saturate |

worddav780ee4a96733c2aa6a50328a9b559f14.png
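The "Days To Saturation" column of the CPU and Memory tabs is a forecast. This page does not describe how the product computes it, so the sketch below only illustrates the general idea with a plain linear trend: a least-squares line is fitted to daily usage values and extrapolated until it reaches capacity. Both the linear model and the function name are assumptions made for the example.

```python
def days_to_saturation(daily_values, capacity):
    """Estimate the days until a linear usage trend reaches `capacity`.

    Returns 0.0 if the last value already meets capacity, and None when
    the trend is flat or decreasing (no saturation is projected).
    """
    n = len(daily_values)
    if n < 2:
        return None
    if daily_values[-1] >= capacity:
        return 0.0
    x_mean = (n - 1) / 2
    y_mean = sum(daily_values) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean)
              for x, y in enumerate(daily_values))
    slope = sxy / sxx
    if slope <= 0:
        return None
    intercept = y_mean - slope * x_mean
    # Day index where the fitted line hits capacity, relative to the last day.
    crossing = (capacity - intercept) / slope
    return max(0.0, crossing - (n - 1))
```

For instance, usage growing by 2 GB per day from 10 GB to 18 GB reaches a 28 GB capacity 5 days after the last observation.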

Storage

Because Kubernetes persistent volumes are provisioned at the cluster level, the Clusters page is the only one with this tab. Since persistent volume management is still at an early stage of development in Kubernetes, the only relevant information currently available is the following.

| Column name | Meaning |
|---|---|
| Cluster | Name of the cluster |
| Number of PV | Number of persistent volumes |
| PV Capacity [GB] | Storage capacity aggregated across all of the configured persistent volumes |
| PV Allocated [GB] | Storage allocated space aggregated across all of the configured persistent volumes |
| PV Free [GB] | Storage free space aggregated across all of the configured persistent volumes |
| PV Allocated vs Capacity [%] | Percentage of storage allocated with respect to capacity |

worddavf644e24fdc398cd56917f0462da422cd.png
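The Storage columns are aggregations across persistent volumes. The sketch below shows one way to derive them from a list of per-volume `(capacity_gb, allocated_gb)` pairs. The input format is hypothetical, and computing free space as capacity minus allocated space is an assumption of this example.

```python
def pv_summary(volumes):
    """Aggregate (capacity_gb, allocated_gb) pairs across persistent volumes."""
    capacity = sum(cap for cap, _ in volumes)
    allocated = sum(alloc for _, alloc in volumes)
    return {
        "Number of PV": len(volumes),
        "PV Capacity [GB]": capacity,
        "PV Allocated [GB]": allocated,
        # Assumption: free space is whatever capacity is not allocated.
        "PV Free [GB]": capacity - allocated,
        "PV Allocated vs Capacity [%]":
            100.0 * allocated / capacity if capacity else None,
    }
```

Three volumes of 100, 50 and 50 GB with 40, 50 and 0 GB allocated give 200 GB of capacity, 90 GB allocated, 110 GB free and an allocation ratio of 45%.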

Details page

Clicking the name of a resource opens its Details page. As shown in the pictures below, this dashboard presents time charts of the most important metrics for the selected entity, together with a table of its configuration metrics.
worddavc36048f39c7db7867cd49795dd2257b6.png

worddavd2867b0334e297c248662399cce7570c.png

For more details on how the BMC Helix Continuous Optimization metrics map to the Kubernetes ones and how the derived metrics are computed, please refer to the Kubernetes integrator documentation.


Thresholds, Status & Bottleneck

The table below reports the thresholds used in the view.
The column Resource is defined as "Resource taken into account: status". The defined resources are:

  • CPU: all the metrics related to the CPU (e.g. CPU utilization or CPU request over number of CPUs)
  • MEM: all the metrics related to the memory (e.g. memory utilization or memory request over memory capacity)
  • SAT: days before saturation of the resource
  • POD: percentage of pods required over the pod creation limit (this limit depends on the installed Kubernetes version)

| Resource | Cluster | Node | Namespace | Controller | Pod (Pod Workload) |
|---|---|---|---|---|---|
| CPU: OK | < 80% | < 80% | < 80% | < 80% | < 80% |
| CPU: ALERT | > 90% | > 90% | > 90% | > 90% | > 90% |
| MEM: OK | < 75% | < 75% | < 75% | < 75% | < 75% |
| MEM: ALERT | > 85% | > 85% | > 85% | > 85% | > 85% |
| SAT: OK | > 90 days | > 90 days | > 90 days | > 90 days | > 90 days |
| SAT: ALERT | < 30 days | < 30 days | < 30 days | < 30 days | < 30 days |
| POD: OK | < 70% | < 70% | < 70% | < 70% | < 70% |
| POD: ALERT | > 80% | > 80% | > 80% | > 80% | > 80% |
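The thresholds above can be expressed as a small classification function. The table lists only the OK and ALERT boundaries; the sketch below assumes that any value between the two corresponds to the Warning status described for the Overview page. That mapping, like the function names, is an assumption of this example.

```python
# (OK below, ALERT above) utilization thresholds, in percent.
THRESHOLDS = {
    "CPU": (80.0, 90.0),
    "MEM": (75.0, 85.0),
    "POD": (70.0, 80.0),
}


def utilization_status(resource, value_pct):
    """Classify a CPU, MEM or POD percentage as OK, WARNING or ALERT."""
    ok_below, alert_above = THRESHOLDS[resource]
    if value_pct < ok_below:
        return "OK"
    if value_pct > alert_above:
        return "ALERT"
    return "WARNING"  # assumed: anything between the two thresholds


def saturation_status(days):
    """Classify days to saturation (SAT): more days left is better."""
    if days > 90:
        return "OK"
    if days < 30:
        return "ALERT"
    return "WARNING"  # assumed: between 30 and 90 days
```

For example, a CPU utilization of 85% falls between the two CPU thresholds and is classified as WARNING under this assumption, while 10 days to saturation is an ALERT.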

The status is evaluated as the worst status among the following resources, and the bottleneck is the worst resource associated with that status.

| Resource | Bottleneck |
|---|---|
| CPU/Memory Utilization | CPU/MEM:USED |
| CPU/Memory Request on Capacity | CPU/MEM:REQUEST |
| CPU/Memory Limit on Quota | CPU/MEM:QUOTA |
| CPU/Memory Request on Quota | CPU/MEM:QUOTA |
| CPU/Memory Used on Limit | CPU/MEM:QUOTA |
| Pod number | POD:NUM |
| Days to saturation | CPU/MEM:USED |


Note that the CPU/Memory Used on Request metrics are not considered in the status and bottleneck evaluation: usage is allowed to be greater than the request, so these metrics can go above 100%.
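The "worst status wins" rule can be sketched as follows. Each check is a pair of a bottleneck label and a status that has already been evaluated. The severity ordering and the tie-breaking rule (the first check in list order wins) are assumptions of this example, since the page does not say how ties are resolved.

```python
SEVERITY = {"OK": 0, "WARNING": 1, "ALERT": 2}


def overall_status(checks):
    """Return (status, bottleneck) for a list of (label, status) pairs.

    The overall status is the most severe one found. The bottleneck is
    the label of the first check that has that status (assumed tie-break).
    """
    label, status = max(checks, key=lambda check: SEVERITY[check[1]])
    return status, label
```

For example, given the checks `[("CPU:USED", "OK"), ("MEM:REQUEST", "WARNING"), ("POD:NUM", "ALERT")]`, the entity status is ALERT and the bottleneck is POD:NUM.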

Materializer Task

The Kubernetes view takes advantage of BMC Helix Continuous Optimization materialized data marts to enable faster page loading times during user browsing, by pre-computing the underlying data. To achieve this, a dedicated DataMartMaterializer task is deployed as part of the view installation process. By default, the task is scheduled to run every day at midnight. Modify the scheduling parameters to fit the needs of your environment, taking into account factors such as:

  • Data loading schedule and warehouse latency
  • Multiple refreshes in the same day
  • Kubernetes view user needs

The Kubernetes View materializer is shown in the next figure and can be found under System Tasks.

worddavd96ad9574e9d48b4a523350d8744c970.png


 
