Kubernetes view

The Kubernetes view enables you to manage the capacity and efficiency of containerized environments managed by the Kubernetes platform, and presents key capacity metrics and charts for Kubernetes clusters, nodes, namespaces, controllers and pods.

You can use the Kubernetes view to complete tasks such as:

Understand resource bottlenecks and aggregate residual capacity of Kubernetes clusters as well as individual nodes
Detect current or imminent resource saturation conditions, and the days remaining before saturation, for every major Kubernetes resource (e.g., clusters, nodes, controllers, pods)
Assess the level of infrastructure efficiency by comparing allocated vs. actually used resources, and identify the most wasteful controllers or pods
Identify application resource usage patterns and detect resource shortage conditions
Characterize the footprint of infrastructure resources on a per-container-image basis
Understand resource utilization of namespaces, including the level of usage of resource quotas

Moviri Integrator for BMC Helix Continuous Optimization - Kubernetes View is compatible with BMC Helix Continuous Optimization 19.11 and later.

Videos

The following video (5:35) provides a brief introduction to the Kubernetes view.

https://youtu.be/2LlBvw_zzDk

The following video (9:53) explains how to access and use the Kubernetes view.

https://youtu.be/yyAIi8_DMkM

Requirements

Conventions

The Kubernetes view provides summarized, high-level capacity KPIs designed for capacity management. The following naming and metric aggregation conventions apply to all data presented in the view:

Standard table column names: Metric name [unit of measurement] (e.g., "Memory [GB]")
Unit of measurement is omitted when implicitly evident
The metric value is the aggregation of the last 30 days
The metric value is computed as follows: for each day, the daily peak is taken (at hourly resolution). Then, the mean of the daily peaks over the last 30 days is shown.

This aggregation applies only to the summary metrics presented in tables and on the Overview page. The charts presented in the details pages follow regular time-series semantics: their time frame and time resolution can be set using the filters at the top of the page.
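The daily-peak aggregation used for summary metrics can be sketched in Python as follows. This is a minimal illustration, not the view's actual implementation: it assumes hourly samples are available as (timestamp, value) pairs, and the function name summary_metric is hypothetical.

```python
from datetime import datetime, timedelta
from statistics import mean


def summary_metric(samples, now, window_days=30):
    """Mean of the daily peaks over the last `window_days` days (hypothetical helper).

    `samples` is an iterable of (datetime, float) pairs at hourly resolution.
    Returns None when no sample falls inside the window.
    """
    start = now - timedelta(days=window_days)
    daily_peak = {}
    for ts, value in samples:
        if start <= ts < now:
            day = ts.date()
            # Keep the highest hourly value seen for each calendar day.
            daily_peak[day] = max(value, daily_peak.get(day, value))
    if not daily_peak:
        return None
    # Average the daily peaks over the days that have data.
    return mean(daily_peak.values())


# Example: two days of hourly data with daily peaks of 10 and 30.
base = datetime(2024, 1, 1)
samples = [(base + timedelta(hours=h), 10.0 if h == 5 else 1.0) for h in range(24)]
samples += [(base + timedelta(days=1, hours=h), 30.0 if h == 7 else 2.0) for h in range(24)]
print(summary_metric(samples, now=base + timedelta(days=2)))  # 20.0
```

Using the peak of each day rather than the daily average keeps short but recurring spikes visible in the summary value, which is the behavior that matters for capacity planning.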

View Structure

The Kubernetes view is composed of the following first-level pages:

Overview: it presents a summary of Kubernetes components states
Clusters: it shows capacity metrics for Kubernetes clusters
Nodes: it shows capacity metrics for Kubernetes nodes
Namespaces: it shows capacity metrics for Kubernetes namespaces
Controllers: it shows capacity metrics for Kubernetes controllers
Pods: it shows capacity metrics for Kubernetes pod workload

For each page except Overview, a set of second-level tabs covers some or all of the following areas:

Capacity: summary of the most important capacity indicators
CPU: the most relevant CPU configuration and performance metrics
Memory: the most relevant memory configuration and performance metrics
Storage: the most relevant storage configuration and performance metrics (this tab is available only on the Clusters page)

From any second-level tab, you can drill down to an entity detail page, which presents the most relevant performance metrics as time charts and the most important configuration properties as tables.

Overview

The goal of the overview page is to provide at-a-glance aggregated capacity visibility over all of the main Kubernetes components.

For each component, three doughnut charts show the number of entities of that component in each capacity status:

Ok: the component is healthy from a capacity management perspective
Warning: the component has breached (or will be breaching in the near future) a warning utilization threshold for one or more metrics
Alert: the component has breached (or will be breaching in the near future) a critical utilization threshold for one or more metrics

Refer to the Thresholds, Status & Bottleneck section below for details on how the capacity status is calculated for each entity.

As shown in the picture below, each doughnut:

is scaled according to the number of components taken into account;
has a color that depends on the component capacity status;
can be moved to the "Favorites" tab by clicking the associated star.

Entity Pages

The first-level pages (Clusters, Nodes, Namespaces, Controllers and Pods) are designed to provide capacity management insights for the corresponding Kubernetes entity. Each page contains the second-level tabs Capacity, CPU and Memory (the Clusters page also includes a Storage tab), each of which presents capacity and efficiency KPIs related to the resources relevant for the analyzed entity.

Capacity

The first tab "Capacity" is designed to provide a summary of the most important KPIs for managing capacity of Kubernetes environments.

The table below summarizes the information provided by the tab. Depending on the entity being analyzed (e.g., Cluster or Pod), only the relevant set of columns is shown. For example, the "Controller" column is shown only for Pods, while quota-related columns such as "Mem Request vs Quota" are shown only for Namespaces.

Column name	Meaning
Cluster	Name of the Cluster
Node	Name of the Node
Namespace	Name of the Namespace
Controller	Name of the controller
Pod	Name of the pod
Pod #	Number of pods (KPOD_NUM)
Status	Indicator of the resource's capacity status (see Thresholds, Status & Bottleneck)
CPU Used vs Cap [%]	Percentage of CPUs actually used with respect to CPU capacity (number of CPUs), calculated as CPU_USED_NUM/CPU_NUM
CPU Request vs Cap [%]	Percentage of CPUs requested with respect to CPU capacity (number of CPUs), calculated as CPU_REQUEST/CPU_NUM
CPU Request vs Quota [%]	Percentage of CPU request with respect to the namespace CPU request quota, calculated as CPU_REQUEST/CPU_REQUEST_MAX
CPU Limit vs Quota [%]	Percentage of CPU limit with respect to the namespace CPU limit quota, calculated as CPU_LIMIT/CPU_LIMIT_MAX
CPU Used vs Limit [%]	Percentage of CPU used with respect to the CPU limit, calculated as CPU_USED_NUM/CPU_LIMIT
Mem Used vs Cap [%]	Percentage of memory actually used with respect to memory capacity (total memory), calculated as MEM_USED/TOTAL_REAL_MEM
Mem Real Used vs Cap [%]	Percentage of real memory used (excluding file system cache and buffers) with respect to memory capacity (total memory), calculated as MEM_REAL_USED/TOTAL_REAL_MEM
Mem Request vs Cap [%]	Percentage of memory requested with respect to memory capacity (total memory), calculated as MEM_REQUEST/TOTAL_REAL_MEM
Mem Request vs Quota [%]	Percentage of memory request with respect to the namespace memory request quota, calculated as MEM_REQUEST/MEM_REQUEST_MAX
Mem Limit vs Quota [%]	Percentage of memory limit with respect to the namespace memory limit quota, calculated as MEM_LIMIT/MEM_LIMIT_MAX
Mem Used vs Limit [%]	Percentage of memory used with respect to the memory limit, calculated as MEM_USED/MEM_LIMIT
Pod # vs Pod Max [%]	Percentage of pods already created with respect to the maximum number of pods allowed
Spare Pods	Estimated residual capacity, expressed as the number of additional pods that can be scheduled, based on the average size of the existing pods in the cluster
Bottleneck Resource	First resource to saturate
CPU #	Number of CPUs (CPU_NUM)
CPU USED	Number of cores used (CPU_USED_NUM)
CPU Request	Amount of CPU (cores) requested, that is, guaranteed to the workload (CPU_REQUEST)
CPU Limit	Maximum amount of CPU (cores) that the workload is allowed to use (CPU_LIMIT)
Memory	The total amount of memory installed on the system (TOTAL_REAL_MEM)
MEM USED	The amount of memory used in bytes (MEM_USED)
MEM REAL USED	Total memory in use during the interval, excluding file system cache and buffers on UNIX systems (MEM_REAL_USED)
MEM Request	Amount of memory that will be guaranteed (MEM_REQUEST)
MEM Limit	The maximum amount of physical memory that can be allocated (MEM_LIMIT)
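The ratio columns and the Spare Pods estimate can be sketched as follows. The pct helper mirrors the "calculated as" formulas in the table. The exact Spare Pods formula is not documented here, so this sketch assumes that the average pod size is the average CPU and memory request per pod, that spare capacity is capacity minus requests, and that the result is also bounded by the maximum number of pods. The field pod_max and all function and class names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class ClusterSample:
    """Hypothetical container for one cluster's summary metrics."""
    cpu_num: float         # CPU_NUM, cores
    cpu_request: float     # CPU_REQUEST, cores
    total_real_mem: float  # TOTAL_REAL_MEM, GB
    mem_request: float     # MEM_REQUEST, GB
    pod_num: int           # KPOD_NUM
    pod_max: int           # maximum pods allowed (hypothetical field)


def pct(numerator, denominator):
    """Ratio as a percentage; None when the denominator is missing or zero."""
    if not denominator:
        return None
    return 100.0 * numerator / denominator


def spare_pods(c: ClusterSample):
    """Assumed Spare Pods estimate: free requestable capacity / average pod request."""
    if c.pod_num == 0:
        return None  # no existing pods, so no average pod size
    avg_cpu = c.cpu_request / c.pod_num
    avg_mem = c.mem_request / c.pod_num
    candidates = [c.pod_max - c.pod_num]  # pod-count headroom
    if avg_cpu > 0:
        candidates.append(math.floor((c.cpu_num - c.cpu_request) / avg_cpu))
    if avg_mem > 0:
        candidates.append(math.floor((c.total_real_mem - c.mem_request) / avg_mem))
    # The most constraining resource determines how many more pods fit.
    return max(0, min(candidates))


c = ClusterSample(cpu_num=16, cpu_request=8, total_real_mem=64, mem_request=48, pod_num=16, pod_max=110)
print(pct(c.cpu_request, c.cpu_num), spare_pods(c))  # 50.0 5
```

In this example memory is the most constraining resource, which is also what the Bottleneck Resource column would be expected to highlight.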

The picture below shows an example of the Capacity tab.

CPU

The CPU tab presents the most important CPU metrics. The columns are defined in the table below.

Column name	Meaning
Cluster	Name of the cluster
Node	Name of the node
Namespace	Name of the namespace
Controller	Name of the controller
Pod	Name of the pod
Status	Resource capacity status
Days To Saturation	Days before the resource is saturated
CPU #	Number of CPUs (CPU_NUM)
CPU USED	Number of cores used (CPU_USED_NUM)
CPU Request	Amount of CPU (cores) requested, that is, guaranteed to the workload (CPU_REQUEST)
CPU Limit	Maximum amount of CPU (cores) that the workload is allowed to use (CPU_LIMIT)
Quota Request	Quota set for the CPU request resource (CPU_REQUEST_MAX)
Quota Limit	Quota set for the CPU limit resource (CPU_LIMIT_MAX)
CPU USED (mCores)	CPU used, expressed in millicores (CPU_USED_NUM)
CPU Request (mCores)	CPU requested, expressed in millicores (CPU_REQUEST)
CPU Limit (mCores)	CPU limit, expressed in millicores (CPU_LIMIT)
Quota Request (mCores)	Quota set for the CPU request resource, expressed in millicores (CPU_REQUEST_MAX)
Quota Limit (mCores)	Quota set for the CPU limit resource, expressed in millicores (CPU_LIMIT_MAX)
CPU Used vs Request [%]	Percentage of CPU used with respect to the CPU request, calculated as CPU_USED_NUM/CPU_REQUEST
CPU Used vs Cap [%]	Percentage of CPUs actually used with respect to CPU capacity (number of CPUs), calculated as CPU_USED_NUM/CPU_NUM
CPU Request vs Cap [%]	Percentage of CPUs requested with respect to CPU capacity (number of CPUs), calculated as CPU_REQUEST/CPU_NUM
CPU Request vs Quota [%]	Percentage of CPU request with respect to the namespace CPU request quota, calculated as CPU_REQUEST/CPU_REQUEST_MAX
CPU Limit vs Quota [%]	Percentage of CPU limit with respect to the namespace CPU limit quota, calculated as CPU_LIMIT/CPU_LIMIT_MAX
CPU Used vs Limit [%]	Percentage of CPU used with respect to the CPU limit, calculated as CPU_USED_NUM/CPU_LIMIT
CPU Overcommitment [%]	Percentage of CPU limit with respect to the CPU capacity (number of CPUs), calculated as CPU_LIMIT/CPU_NUM
Bottleneck	First resource to saturate
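Kubernetes expresses CPU quantities either in cores (for example 1.5) or in millicores (for example 250m, where 1 core = 1000m), which is why the CPU tab shows both units. The sketch below, with hypothetical helper names, converts between the two and computes CPU Overcommitment [%] as CPU_LIMIT/CPU_NUM. It handles only plain decimal and m-suffixed quantities.

```python
def parse_cpu_quantity(quantity: str) -> float:
    """Parse a Kubernetes CPU quantity such as "250m" or "1.5" into cores.

    Only plain decimal values and the "m" (millicore) suffix are handled.
    """
    quantity = quantity.strip()
    if quantity.endswith("m"):
        return float(quantity[:-1]) / 1000.0
    return float(quantity)


def to_millicores(cores: float) -> int:
    """Convert cores to millicores (1 core = 1000m)."""
    return round(cores * 1000)


def cpu_overcommitment(cpu_limit: float, cpu_num: float):
    """CPU Overcommitment [%] = CPU_LIMIT / CPU_NUM * 100; None if CPU_NUM is 0."""
    if not cpu_num:
        return None
    return 100.0 * cpu_limit / cpu_num


# Example: pod limits summing to 24 cores on a 16-CPU node.
limits = ["8", "8", "4", "4000m"]
total_limit = sum(parse_cpu_quantity(q) for q in limits)
print(to_millicores(total_limit), cpu_overcommitment(total_limit, 16))  # 24000 150.0
```

A value above 100% means that the workloads could together attempt to use more CPU than the entity provides.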

Memory

The Memory tab presents the most important memory metrics. The columns are defined in the table below.

Column name	Meaning
Cluster	Name of the Cluster
Node	Name of the Node
Namespace	Name of the Namespace
Controller	Name of the controller
Pod	Name of the pod
Status	Indicator of the resource's status
Days To Saturation	Days before the saturation of the physical resources
Memory	The total amount of memory installed on the system (TOTAL_REAL_MEM)
MEM USED	The amount of memory used in bytes (MEM_USED)
MEM REAL USED	Total memory in use during the interval, excluding file system cache and buffers on UNIX systems (MEM_REAL_USED)
MEM Request	Amount of memory that will be guaranteed (MEM_REQUEST)
MEM Limit	The maximum amount of physical memory that can be allocated (MEM_LIMIT)
Quota Request	Amount of memory quota request (MEM_REQUEST_MAX)
Quota Limit	Amount of memory quota limit (MEM_LIMIT_MAX)
Mem Used vs Request [%]	Percentage of memory used with respect to memory request, calculated as MEM_USED/MEM_REQUEST
Mem Used vs Cap [%]	Percentage of memory actually used with respect to memory capacity (total memory), calculated as MEM_USED/TOTAL_REAL_MEM
Mem Request vs Cap [%]	Percentage of memory requested with respect to memory capacity (total memory), calculated as MEM_REQUEST/TOTAL_REAL_MEM
Mem Request vs Quota [%]	Percentage of memory request with respect to the namespace memory request quota, calculated as MEM_REQUEST/MEM_REQUEST_MAX
Mem Limit vs Quota [%]	Percentage of memory limit with respect to the namespace memory limit quota, calculated as MEM_LIMIT/MEM_LIMIT_MAX
Mem Used vs Limit [%]	Percentage of memory used with respect to the memory limit, calculated as MEM_USED/MEM_LIMIT
Mem Overcommitment [%]	Percentage of memory limit with respect to memory capacity (total memory), calculated as MEM_LIMIT/TOTAL_REAL_MEM
Mem Real Used vs Cap [%]	Percentage of real memory used (excluding file system cache and buffers) with respect to memory capacity (total memory), calculated as MEM_REAL_USED/TOTAL_REAL_MEM
Mem Real Used vs Request [%]	Percentage of real memory used (excluding file system cache and buffers) with respect to memory request, calculated as MEM_REAL_USED/MEM_REQUEST
Bottleneck	First resource to saturate
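The Days To Saturation column, which appears in both the CPU and Memory tabs, is not defined in detail here. As a purely illustrative sketch, assuming saturation means reaching 100% utilization and that the trend is a least-squares straight line fitted to daily values, the estimate could be computed as below. The function names are hypothetical, and the forecasting model actually used by the view may differ.

```python
def _linear_fit(values):
    """Least-squares slope and intercept of values against day index 0..n-1."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def days_to_saturation(daily_utilization_pct, saturation_pct=100.0):
    """Days until a linear trend of daily utilization reaches saturation_pct.

    Returns 0 if the trend is already at or above saturation,
    and None if there is too little data or the trend is not growing.
    """
    n = len(daily_utilization_pct)
    if n < 2:
        return None
    slope, intercept = _linear_fit(daily_utilization_pct)
    current = intercept + slope * (n - 1)  # trend value on the latest day
    if current >= saturation_pct:
        return 0
    if slope <= 0:
        return None
    return (saturation_pct - current) / slope


print(days_to_saturation([50, 52, 54, 56, 58]))  # 21.0
```

Returning None for a flat or decreasing trend reflects the idea that such a resource has no foreseeable saturation date under this model.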

Storage

Because Kubernetes persistent volumes are cluster-level resources, the Storage tab is available only on the Clusters page. Since persistent volume management is still at an early stage of development in Kubernetes, only the following information is currently available.

Column name	Meaning
Cluster	Name of the Cluster
Number of PV	Number of persistent volumes
PV Capacity [GB]	Storage capacity aggregated across all of the configured persistent volumes
PV Allocated [GB]	Storage allocated space aggregated across all of the configured persistent volumes
PV Free [GB]	Storage free space aggregated across all of the configured persistent volumes
PV Allocated vs Capacity [%]	Percentage of storage allocated with respect to capacity
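The cluster-level storage figures are sums across all configured persistent volumes. The sketch below shows that aggregation under the assumption that PV Free is capacity minus allocated space. The PersistentVolume class and the storage_summary function are hypothetical and are not part of any Kubernetes client library.

```python
from dataclasses import dataclass


@dataclass
class PersistentVolume:
    """Minimal hypothetical PV record; sizes in GB."""
    capacity_gb: float
    allocated_gb: float


def storage_summary(volumes):
    """Aggregate PV figures as shown in the Storage tab (illustrative only)."""
    capacity = sum(v.capacity_gb for v in volumes)
    allocated = sum(v.allocated_gb for v in volumes)
    return {
        "Number of PV": len(volumes),
        "PV Capacity [GB]": capacity,
        "PV Allocated [GB]": allocated,
        # Assumption: free space is whatever capacity is not allocated.
        "PV Free [GB]": capacity - allocated,
        "PV Allocated vs Capacity [%]": 100.0 * allocated / capacity if capacity else None,
    }


print(storage_summary([PersistentVolume(100, 25), PersistentVolume(50, 50)]))
```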

Details page

Clicking the name of a resource opens its Details page. As shown in the picture below, this dashboard contains time charts of the most important metrics for the selected entity and a table of its configuration metrics.

For more details on how the BMC Helix Continuous Optimization metrics map to the Kubernetes ones and how the derived metrics are computed, please refer to the Kubernetes integrator documentation.

Thresholds, Status & Bottleneck

The table below reports the thresholds used in the view.
Each entry in the Resource column has the form "resource: status", where resource is one of:

CPU: all the metrics related to the CPU (e.g., CPU utilization or CPU Request over CPU Number)
MEM: all the metrics related to the memory (e.g., memory utilization or memory Request over memory capacity)
SAT: days before saturation of the resource
POD: percentage of pods required over the pod creation limit (this limit depends on the installed Kubernetes version)

Resource	Cluster	Node	Namespace	Controller	Pod (Pod Workload)
CPU: OK	< 80%	< 80%	< 80%	< 80%	< 80%
CPU: ALERT	> 90%	> 90%	> 90%	> 90%	> 90%
MEM: OK	< 75%	< 75%	< 75%	< 75%	< 75%
MEM: ALERT	> 85%	> 85%	> 85%	> 85%	> 85%
SAT: OK	> 90 days	> 90 days	> 90 days	> 90 days	> 90 days
SAT: ALERT	< 30 days	< 30 days	< 30 days	< 30 days	< 30 days
POD: OK	< 70%	< 70%	< 70%	< 70%	< 70%
POD: ALERT	> 80%	> 80%	> 80%	> 80%	> 80%
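Because the thresholds are identical for every entity type, the table can be read as a single classifier per resource. The sketch below assumes that values falling between the OK and ALERT thresholds (including the boundary values themselves, since the table uses strict inequalities) produce WARNING, and that for SAT fewer days is worse. The THRESHOLDS dictionary and the classify function are illustrative names only.

```python
# resource: (OK threshold, ALERT threshold, higher_is_worse)
THRESHOLDS = {
    "CPU": (80.0, 90.0, True),   # CPU ratios, in %
    "MEM": (75.0, 85.0, True),   # memory ratios, in %
    "POD": (70.0, 80.0, True),   # % of the pod creation limit
    "SAT": (90.0, 30.0, False),  # days to saturation: fewer days is worse
}


def classify(resource, value):
    """Return "OK", "WARNING" or "ALERT" for one resource value."""
    ok, alert, higher_is_worse = THRESHOLDS[resource]
    if higher_is_worse:
        if value < ok:
            return "OK"
        if value > alert:
            return "ALERT"
    else:
        if value > ok:
            return "OK"
        if value < alert:
            return "ALERT"
    # Anything between the two thresholds (boundaries included) is a warning.
    return "WARNING"


print(classify("CPU", 85), classify("SAT", 10))  # WARNING ALERT
```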

The status of an entity is the worst status across the following resources, and the bottleneck is the resource associated with that worst status.

Resource	Bottleneck
CPU/Memory Utilization	CPU/MEM:USED
CPU/Memory Request on Capacity	CPU/MEM:REQUEST
CPU/Memory Limit on Quota	CPU/MEM:QUOTA
CPU/Memory Request on Quota	CPU/MEM:QUOTA
CPU/Memory Used on Limit	CPU/MEM:QUOTA
Pod number	POD:NUM
Days to saturation	CPU/MEM:USED

Note that the CPU/Memory Used on Request metrics are not considered in the status and bottleneck evaluation: since usage is allowed to exceed the request, these percentages can legitimately go above 100%.
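The worst-status rule and the bottleneck selection can be sketched as follows. Each evaluation is assumed to be a (metric, bottleneck label, status) tuple, and the Used on Request metrics are skipped because they can exceed 100%. How ties between equally severe resources are broken is not documented, so this sketch simply keeps the first one in input order. The metric names and the function name are hypothetical.

```python
SEVERITY = {"OK": 0, "WARNING": 1, "ALERT": 2}

# Used on Request ratios may legitimately exceed 100%, so they are ignored.
EXCLUDED_METRICS = {"CPU Used on Request", "Memory Used on Request"}


def status_and_bottleneck(evaluations):
    """evaluations: list of (metric, bottleneck_label, status) tuples.

    Returns (worst status, bottleneck label of the first worst resource).
    """
    considered = [e for e in evaluations if e[0] not in EXCLUDED_METRICS]
    if not considered:
        return "OK", None
    # max() returns the first item with the highest severity.
    _, label, status = max(considered, key=lambda e: SEVERITY[e[2]])
    return status, label


evaluations = [
    ("CPU Utilization", "CPU:USED", "WARNING"),
    ("Memory Request on Capacity", "MEM:REQUEST", "ALERT"),
    ("CPU Used on Request", "CPU:USED", "ALERT"),  # excluded from evaluation
    ("Pod number", "POD:NUM", "OK"),
]
print(status_and_bottleneck(evaluations))  # ('ALERT', 'MEM:REQUEST')
```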

Materializer Task

The Kubernetes view takes advantage of BMC Helix Continuous Optimization materialized data marts to enable faster page loading times during user browsing by pre-computing the underlying data. To achieve this, a DataMartMaterializer task is deployed as part of the view installation process. By default, the task is scheduled to run every day at midnight. Modify the scheduling parameters to fit the needs of your environment, for example:

Data loading schedule and warehouse latency
Multiple refreshes in the same day
Kubernetes view user needs

The Kubernetes View materializer is shown in the next figure and can be found under System Tasks.