Kubernetes view
The Kubernetes view enables you to manage the capacity and efficiency of containerized environments managed by the Kubernetes platform, and presents key capacity metrics and charts for Kubernetes clusters, nodes, namespaces, controllers and pods.
You can use the Kubernetes view to complete tasks such as:
- Understand resource bottlenecks and aggregate residual capacity of Kubernetes clusters as well as individual nodes
- Detect current or imminent resource saturation conditions and days before the resource is saturated for every major Kubernetes resource (e.g. cluster, nodes, controllers, pods)
- Assess the level of infrastructure efficiency by comparing allocated vs. actually used resources, and identify the most wasteful controllers or pods
- Identify application resource usage patterns and detect resource shortage conditions
- Characterize the footprint of infrastructure resources on a per-container-image basis
- Understand resource utilization of namespaces, including the level of usage of resource quotas
Videos
The following video (5:35) provides a brief introduction of the Kubernetes views.
The following video (9:53) provides information about how to access and use the Kubernetes views.
Requirements
Conventions
The Kubernetes view provides summarized, high-level capacity KPIs designed for capacity management. The following common conventions around naming and metrics aggregation are valid for all data presented in the view:
- Standard table column names: Metric name [unit of measurement] (ex. "Memory [GB]")
- Unit of measurement is omitted when implicitly evident
- The metric value is the aggregation of the last 30 days
- The metric value is computed as follows: for each day, the daily peak is considered (at hourly resolution). Then, the mean value of the daily peak over the last 30 days is shown.
This is only valid for summary metrics presented in tables and in the overall page. The charts presented in the details pages follow regular over-time metric semantics; their time frame and time resolution are available as filters at the top of the page.
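The aggregation rule for summary metrics (daily peak at hourly resolution, then the mean of the daily peaks over the last 30 days) can be sketched as follows. This is a minimal illustration, not the product's implementation: the function name `summary_value`, the input shape (a list of timestamp/value pairs) and the choice to anchor the 30-day window on the most recent sample are all assumptions.

```python
from datetime import datetime, timedelta
from statistics import mean


def summary_value(hourly_samples, days=30):
    """Mean of the daily peaks over the last `days` days.

    hourly_samples: iterable of (datetime, float) pairs at hourly resolution.
    The window ends on the day of the most recent sample (an assumption).
    """
    daily_peak = {}
    for ts, value in hourly_samples:
        day = ts.date()
        # Keep the highest hourly value observed on each day.
        if day not in daily_peak or value > daily_peak[day]:
            daily_peak[day] = value
    if not daily_peak:
        return None
    cutoff = max(daily_peak) - timedelta(days=days - 1)
    return mean(v for d, v in daily_peak.items() if d >= cutoff)


# Two days of hypothetical samples: peaks are 50.0 and 70.0.
start = datetime(2024, 1, 1)
samples = [
    (start + timedelta(hours=h), v)
    for h, v in [(0, 10.0), (5, 50.0), (12, 30.0), (24, 20.0), (30, 70.0)]
]
print(summary_value(samples))  # 60.0
```

Because the value is built from peaks rather than averages, a short daily spike weighs as much as a sustained load, which suits capacity sizing more than performance analysis.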
View Structure
The Kubernetes view is composed of the following first-level pages:
- Overview: it presents a summary of Kubernetes components states
- Clusters: it shows capacity metrics for Kubernetes clusters
- Nodes: it shows capacity metrics for Kubernetes nodes
- Namespaces: it shows capacity metrics for Kubernetes namespaces
- Controllers: it shows capacity metrics for Kubernetes controllers
- Pods: it shows capacity metrics for Kubernetes pod workload
For each page except Overview, a set of second-level tabs cover information about some or all of the following parts:
- Capacity: summary of the most important capacity indicators
- CPU: the most relevant CPU configuration and performance metrics
- Memory: the most relevant memory configuration and performance metrics
- Storage: the most relevant storage configuration and performance metrics (this tab is defined only in the Clusters page)
From all of the second-level tabs it is possible to drill down to an entity detail page, which presents the most relevant performance metrics as time charts and the most important configuration properties as tables.
Overview
The goal of the overview page is to provide at-a-glance aggregated capacity visibility over all of the main Kubernetes components.
For each component three doughnut graphs are represented, showing the number of entities of the specific component based on the corresponding capacity status:
- Ok: the component is healthy from a capacity management perspective
- Warning: the component has breached (or will be breaching in the near future) a warning utilization threshold for one or more metrics
- Alert: the component has breached (or will be breaching in the near future) a critical utilization threshold for one or more metrics
Please refer to the section Thresholds, Status & Bottleneck below for more details on how the capacity status is calculated for each entity.
As shown in the picture below, each doughnut:
- is scaled with the number of components taken into account;
- has a color that depends on the component capacity status;
- can be added to the "Favorites" tab by clicking the associated star.
Entity Pages
These first-level pages (Clusters, Nodes, Namespaces, Controllers and Pods) are designed to provide capacity management insights for the corresponding Kubernetes entity. Each page contains up to four second-level tabs (Capacity, CPU, Memory and Storage; the Storage tab is available only in the Clusters page), each of which presents capacity and efficiency KPIs related to the resources relevant for the analyzed entity.
Capacity
The first tab "Capacity" is designed to provide a summary of the most important KPIs for managing capacity of Kubernetes environments.
The table below summarizes the information that is provided by the tab. Depending on the particular entity that is being analyzed (e.g. Cluster or Pod), only the relevant set of columns is shown. For example, the "Controller" column is only shown for Pods, while columns related to quotas like "Mem Request vs Quota" are only shown for Namespaces.
Column name | Meaning |
---|---|
Cluster | Name of the Cluster |
Node | Name of the Node |
Namespace | Name of the Namespace |
Controller | Name of the controller |
Pod | Name of the pod |
Pod # | Number of pods (KPOD_NUM) |
Status | Indicator of the resource's status (*) |
CPU Used vs Cap [%] | Percentage of CPUs actually used with respect to CPU capacity (number of CPUs), calculated as CPU_USED_NUM/CPU_NUM |
CPU Request vs Cap [%] | Percentage of CPUs requested with respect to CPU capacity (number of CPUs), calculated as CPU_REQUEST/CPU_NUM |
CPU Request vs Quota [%] | Percentage of CPU request with respect to the namespace CPU request quota, calculated as CPU_REQUEST/CPU_REQUEST_MAX |
CPU Limit vs Quota [%] | Percentage of CPU limit with respect to the CPU limit quota, calculated as CPU_LIMIT/CPU_LIMIT_MAX |
CPU Used vs Limit [%] | Percentage of CPU used with respect to the CPU limit, calculated as CPU_USED_NUM/CPU_LIMIT |
Mem Used vs Cap [%] | Percentage of memory actually used with respect to memory capacity (total memory), calculated as MEM_USED/TOTAL_REAL_MEM |
Mem Real Used Vs Cap [%] | Percentage of memory actually used, excluding file system cache and buffers, with respect to memory capacity (total memory), calculated as MEM_REAL_USED/TOTAL_REAL_MEM |
Mem Request vs Cap [%] | Percentage of memory requested with respect to memory capacity (total memory), calculated as MEM_REQUEST/TOTAL_REAL_MEM |
Mem Request vs Quota [%] | Percentage of memory request with respect to the namespace memory request quota, calculated as MEM_REQUEST/MEM_REQUEST_MAX |
Mem Limit vs Quota [%] | Percentage of memory limit with respect to the namespace memory limit quota, calculated as MEM_LIMIT/MEM_LIMIT_MAX |
Mem Used vs Limit [%] | Percentage of memory used with respect to the memory limit, calculated as MEM_USED/MEM_LIMIT |
Pod # vs Pod Max [%] | Percentage of pods already created with respect to the maximum number of pods |
Spare Pods | Estimated residual capacity, expressed as the number of additional pods that can be scheduled. This considers the average size of existing pods in the cluster |
Bottleneck Resource | First resource to saturate |
CPU # | Number of CPUs (CPU_NUM) |
CPU USED | Number of cores in use (CPU_USED_NUM) |
CPU Request | Amount of CPU (cores) that is guaranteed (CPU_REQUEST) |
CPU Limit | Maximum amount of CPU (cores) that can be used (CPU_LIMIT) |
Memory | The total amount of memory installed on the system (TOTAL_REAL_MEM) |
MEM USED | The amount of memory used in bytes (MEM_USED) |
MEM REAL USED | The total memory excluding file system cache and buffers in use on UNIX systems during the interval (MEM_REAL_USED) |
MEM Request | Amount of memory that will be guaranteed (MEM_REQUEST) |
MEM Limit | The maximum amount of physical memory that can be allocated (MEM_LIMIT) |
The picture below is an example of the Capacity tab.
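The percentage columns of the Capacity tab all follow one pattern: a numerator metric divided by a capacity, quota or limit metric. The sketch below illustrates that pattern, assuming the metrics are available as plain numbers keyed by the metric names listed in the table. The `pct` helper and the sample values are hypothetical and not part of the product.

```python
def pct(numerator, denominator):
    """Return numerator/denominator as a percentage, or None when undefined."""
    if numerator is None or not denominator:
        return None
    return 100.0 * numerator / denominator


# Hypothetical node metrics, keyed by the metric names used in the table.
node = {
    "CPU_NUM": 8,
    "CPU_USED_NUM": 2.0,
    "CPU_REQUEST": 6.0,
    "TOTAL_REAL_MEM": 32 * 1024**3,
    "MEM_USED": 8 * 1024**3,
}

capacity_row = {
    "CPU Used vs Cap [%]": pct(node["CPU_USED_NUM"], node["CPU_NUM"]),
    "CPU Request vs Cap [%]": pct(node["CPU_REQUEST"], node["CPU_NUM"]),
    "Mem Used vs Cap [%]": pct(node["MEM_USED"], node["TOTAL_REAL_MEM"]),
    # No quota is defined for a node, so quota-based columns stay empty.
    "CPU Request vs Quota [%]": pct(node["CPU_REQUEST"], node.get("CPU_REQUEST_MAX")),
}
print(capacity_row)
```

Returning `None` for a missing denominator mirrors the fact that only the columns relevant to the analyzed entity are shown.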
CPU
The CPU tab presents the most important metrics related to the CPU. The columns are defined in the table below.
Column name | Meaning |
---|---|
Cluster | Name of the cluster |
Node | Name of the node |
Namespace | Name of the namespace |
Controller | Name of the controller |
Pod | Name of the pod |
Status | Resource capacity status |
Days To Saturation | Days before the resource is saturated |
CPU # | Number of CPUs (CPU_NUM) |
CPU USED | Number of cores in use (CPU_USED_NUM) |
CPU Request | Amount of CPU (cores) that is guaranteed (CPU_REQUEST) |
CPU Limit | Maximum amount of CPU (cores) that can be used (CPU_LIMIT) |
Quota Request | Quota set for the CPU request resource (CPU_REQUEST_MAX) |
Quota Limit | Quota set for the CPU limit resource (CPU_LIMIT_MAX) |
CPU USED (mCores) | Number of millicores in use (CPU_USED_NUM) |
CPU Request (mCores) | Amount of CPU (millicores) that is guaranteed (CPU_REQUEST) |
CPU Limit (mCores) | Maximum amount of CPU (millicores) that can be used (CPU_LIMIT) |
Quota Request (mCores) | Quota set for the CPU request resource (CPU_REQUEST_MAX) |
Quota Limit (mCores) | Quota set for the CPU limit resource (CPU_LIMIT_MAX) |
CPU Used vs Request [%] | Percentage of CPU used with respect to the CPU request, calculated as CPU_USED_NUM/CPU_REQUEST |
CPU Used vs Cap [%] | Percentage of CPUs actually used with respect to CPU capacity (number of CPUs), calculated as CPU_USED_NUM/CPU_NUM |
CPU Request vs Cap [%] | Percentage of CPUs requested with respect to CPU capacity (number of CPUs), calculated as CPU_REQUEST/CPU_NUM |
CPU Request vs Quota [%] | Percentage of CPU request with respect to the namespace CPU request quota, calculated as CPU_REQUEST/CPU_REQUEST_MAX |
CPU Limit vs Quota [%] | Percentage of CPU limit with respect to the CPU limit quota, calculated as CPU_LIMIT/CPU_LIMIT_MAX |
CPU Used vs Limit [%] | Percentage of CPU used with respect to the CPU limit, calculated as CPU_USED_NUM/CPU_LIMIT |
CPU Overcommitment [%] | Percentage of CPU limit with respect to the CPU capacity (number of CPUs), calculated as CPU_LIMIT/CPU_NUM |
Bottleneck | First resource to saturate |
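Two details of the CPU tab lend themselves to a short illustration: the millicore columns, and CPU Overcommitment, which compares the sum of limits with the physical capacity and can therefore exceed 100%. The sketch below assumes the underlying metrics are stored in cores and converted for display (1 core = 1000 millicores in Kubernetes). The function names are hypothetical.

```python
def to_millicores(cores):
    """Convert cores to millicores (1 core = 1000 millicores)."""
    return round(cores * 1000)


def cpu_overcommitment_pct(cpu_limit, cpu_num):
    """CPU_LIMIT / CPU_NUM as a percentage; None if capacity is unknown."""
    if not cpu_num:
        return None
    return 100.0 * cpu_limit / cpu_num


# Hypothetical node: 4 CPUs, pods whose CPU limits add up to 6 cores.
print(to_millicores(0.25))             # 250
print(cpu_overcommitment_pct(6.0, 4))  # 150.0
```

A value above 100% means the pods could collectively ask for more CPU than the node provides if they all reached their limits at the same time.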
Memory
The Memory tab presents the most important metrics related to the memory. The columns are defined in the table below.
Column name | Meaning |
---|---|
Cluster | Name of the Cluster |
Node | Name of the Node |
Namespace | Name of the Namespace |
Controller | Name of the controller |
Pod | Name of the pod |
Status | Indicator of the resource's status |
Days To Saturation | Days before the saturation of the physical resources |
Memory | The total amount of memory installed on the system (TOTAL_REAL_MEM) |
MEM USED | The amount of memory used in bytes (MEM_USED) |
MEM REAL USED | The total memory excluding file system cache and buffers in use on UNIX systems during the interval (MEM_REAL_USED) |
MEM Request | Amount of memory that will be guaranteed (MEM_REQUEST) |
MEM Limit | The maximum amount of physical memory that can be allocated (MEM_LIMIT) |
Quota Request | Amount of memory quota request (MEM_REQUEST_MAX) |
Quota Limit | Amount of memory quota limit (MEM_LIMIT_MAX) |
Mem Used vs Request [%] | Percentage of memory used with respect to memory request, calculated as MEM_USED/MEM_REQUEST |
Mem Used vs Cap [%] | Percentage of memory actually used with respect to memory capacity (total memory), calculated as MEM_USED/TOTAL_REAL_MEM |
Mem Request vs Cap [%] | Percentage of memory requested with respect to memory capacity (total memory), calculated as MEM_REQUEST/TOTAL_REAL_MEM |
Mem Request vs Quota [%] | Percentage of memory request with respect to the namespace memory request quota, calculated as MEM_REQUEST/MEM_REQUEST_MAX |
Mem Limit vs Quota [%] | Percentage of memory limit with respect to the namespace memory limit quota, calculated as MEM_LIMIT/MEM_LIMIT_MAX |
Mem Used vs Limit [%] | Percentage of memory used with respect to the memory limit, calculated as MEM_USED/MEM_LIMIT |
Mem Overcommitment [%] | Percentage of memory limit with respect to memory capacity (total memory), calculated as MEM_LIMIT/TOTAL_REAL_MEM |
Mem Real Used Vs Cap [%] | Percentage of memory actually used, excluding file system cache and buffers, with respect to memory capacity (total memory), calculated as MEM_REAL_USED/TOTAL_REAL_MEM |
Mem Real Used Vs Request [%] | Percentage of memory actually used, excluding file system cache and buffers, with respect to memory request, calculated as MEM_REAL_USED/MEM_REQUEST |
Bottleneck | First resource to saturate |
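The difference between the "Used" and "Real Used" columns can be seen with a small worked example. The sketch below uses hypothetical values for a single pod and assumes, based on the MEM REAL USED definition, that MEM_USED includes file system cache and buffers while MEM_REAL_USED does not.

```python
GIB = 1024**3

# Hypothetical pod metrics, in bytes.
mem_used = 6 * GIB       # MEM_USED (assumed to include cache and buffers)
mem_real_used = 4 * GIB  # MEM_REAL_USED (cache and buffers excluded)
mem_request = 5 * GIB    # MEM_REQUEST

used_vs_request = 100.0 * mem_used / mem_request
real_used_vs_request = 100.0 * mem_real_used / mem_request

print(used_vs_request)       # 120.0
print(real_used_vs_request)  # 80.0
```

In this example the pod appears to exceed its request when cache is counted, while its real working set is still below the request.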
Storage
Because persistent volumes in Kubernetes can be requested only from the cluster, the Clusters page is the only one with this tab. Persistent volume management in Kubernetes is still at an early stage of development, so only the following information is currently available.
Column name | Meaning |
---|---|
Cluster | Name of the Cluster |
Number of PV | Number of persistent volumes |
PV Capacity [GB] | Storage capacity aggregated across all of the configured persistent volumes |
PV Allocated [GB] | Storage allocated space aggregated across all of the configured persistent volumes |
PV Free [GB] | Storage free space aggregated across all of the configured persistent volumes |
PV Allocated vs Capacity [%] | Percentage of storage allocated with respect to capacity |
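The Storage columns are aggregates across all persistent volumes of a cluster. A minimal sketch of that aggregation is shown below. The per-volume field names (`capacity_gb`, `allocated_gb`) are hypothetical, and the sketch assumes that free space is simply capacity minus allocated space.

```python
def pv_summary(volumes):
    """Aggregate per-volume figures into the cluster-level Storage columns."""
    capacity = sum(v["capacity_gb"] for v in volumes)
    allocated = sum(v["allocated_gb"] for v in volumes)
    return {
        "Number of PV": len(volumes),
        "PV Capacity [GB]": capacity,
        "PV Allocated [GB]": allocated,
        # Assumption: free space is capacity minus allocated space.
        "PV Free [GB]": capacity - allocated,
        "PV Allocated vs Capacity [%]": (
            100.0 * allocated / capacity if capacity else None
        ),
    }


# Two hypothetical persistent volumes.
summary = pv_summary([
    {"capacity_gb": 100, "allocated_gb": 40},
    {"capacity_gb": 50, "allocated_gb": 35},
])
print(summary)
```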
Details page
Clicking the name of a resource opens the Details page. As shown in the picture below, this dashboard presents time charts of the most important metrics for the selected entity, together with a table of its configuration metrics.
For more details on how the BMC Helix Continuous Optimization metrics map to the Kubernetes ones and how the derived metrics are computed, please refer to the Kubernetes integrator documentation.
Thresholds, Status & Bottleneck
The table below reports the thresholds used in the view.
The column Resource is defined as: "Resource taken into account: status"; the defined resources are:
- CPU: all the metrics related to the CPU (ex. CPU utilization or CPU Request over CPU Number)
- MEM: all the metrics related to the memory (ex. memory utilization or memory Request over memory capacity)
- SAT: days before saturation of the resource
- POD: percentage of pods required over the pod creation limit (this limit depends on the Kubernetes version installed)
Resource | Cluster | Node | Namespace | Controller | Pod (Pod Workload) |
---|---|---|---|---|---|
CPU: OK | < 80% | < 80% | < 80% | < 80% | < 80% |
CPU: ALERT | > 90% | > 90% | > 90% | > 90% | > 90% |
MEM: OK | < 75% | < 75% | < 75% | < 75% | < 75% |
MEM: ALERT | > 85% | > 85% | > 85% | > 85% | > 85% |
SAT: OK | > 90 days | > 90 days | > 90 days | > 90 days | > 90 days |
SAT: ALERT | < 30 days | < 30 days | < 30 days | < 30 days | < 30 days |
POD: OK | < 70% | < 70% | < 70% | < 70% | < 70% |
POD: ALERT | > 80% | > 80% | > 80% | > 80% | > 80% |
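The thresholds in the table can be expressed as a small classification routine. The sketch below is an illustration only: the table lists just the OK and ALERT bounds, so treating everything in between as WARNING (including values exactly on a bound) is an assumption, as are the function and constant names.

```python
# (OK below, ALERT above), in percent, taken from the table above.
UTILIZATION_THRESHOLDS = {
    "CPU": (80.0, 90.0),
    "MEM": (75.0, 85.0),
    "POD": (70.0, 80.0),
}
# (OK above, ALERT below), in days.
SATURATION_THRESHOLDS = (90, 30)


def utilization_status(resource, percent):
    """Classify a CPU, MEM or POD percentage."""
    ok_below, alert_above = UTILIZATION_THRESHOLDS[resource]
    if percent < ok_below:
        return "OK"
    if percent > alert_above:
        return "ALERT"
    return "WARNING"  # assumption: the band between the two bounds


def saturation_status(days_to_saturation):
    """Classify the days-before-saturation (SAT) value."""
    ok_above, alert_below = SATURATION_THRESHOLDS
    if days_to_saturation > ok_above:
        return "OK"
    if days_to_saturation < alert_below:
        return "ALERT"
    return "WARNING"  # assumption: the band between the two bounds


print(utilization_status("CPU", 85.0))  # WARNING
print(utilization_status("MEM", 86.0))  # ALERT
print(saturation_status(120))           # OK
```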
The status of an entity is the worst status among the following resources, and the bottleneck is the resource associated with that worst status.
Resource | Bottleneck |
---|---|
CPU/Memory Utilization | CPU/MEM:USED |
CPU/Memory Request on Capacity | CPU/MEM:REQUEST |
CPU/Memory Limit on Quota | CPU/MEM:QUOTA |
CPU/Memory Request on Quota | CPU/MEM:QUOTA |
CPU/Memory Used on Limit | CPU/MEM:QUOTA |
Pod number | POD:NUM |
Days to saturation | CPU/MEM:USED |
Please note CPU/Memory Used on Request metrics are not considered in the status and bottleneck evaluation since used is allowed to be greater than requested, hence they can go above 100%.
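The "worst status wins" rule and the bottleneck selection can be sketched as follows. This is a hedged illustration rather than the product's logic: the severity ordering OK < WARNING < ALERT, the expansion of labels such as CPU/MEM:USED into `CPU:USED` and `MEM:USED`, the tie-breaking (first check wins) and reporting no bottleneck when everything is OK are all assumptions.

```python
SEVERITY = {"OK": 0, "WARNING": 1, "ALERT": 2}


def overall_status(checks):
    """Return (status, bottleneck) for one entity.

    checks: list of (bottleneck_label, status) pairs, one per evaluated
    resource. Used-on-Request metrics are deliberately not included.
    """
    label, status = max(checks, key=lambda check: SEVERITY[check[1]])
    # Assumption: no bottleneck is reported when every check is OK.
    return status, (label if status != "OK" else None)


# Hypothetical namespace: the memory quota check is the worst one.
checks = [
    ("CPU:USED", "OK"),
    ("MEM:REQUEST", "WARNING"),
    ("MEM:QUOTA", "ALERT"),
    ("POD:NUM", "WARNING"),
]
print(overall_status(checks))  # ('ALERT', 'MEM:QUOTA')
```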
Materializer Task
The Kubernetes view takes advantage of BMC Helix Continuous Optimization materialized data marts to enable faster page loading times during user browsing, by pre-computing the underlying data. To achieve this result, a dedicated DataMartMaterializer task is deployed as part of the view installation process. By default, the task is scheduled to run every day at midnight. Please modify the scheduling parameters to fit the needs of the environment, such as:
- Data loading schedule and warehouse latency
- Multiple refreshes in the same day
- Kubernetes view user needs
The Kubernetes View materializer is shown in the next figure and can be found under System Tasks.