Kubernetes view


The Kubernetes view enables you to manage the capacity and efficiency of containerized environments managed by the Kubernetes platform, and presents key capacity metrics and charts for Kubernetes clusters, nodes, namespaces, controllers and pods. 

The latest version of the Kubernetes View integrator is available on EPD. Click the Moviri Integrator for TrueSight Capacity Optimization – Kubernetes View link. In the Patches tab, select the latest version of TrueSight Capacity Optimization. This ETL is compatible with Moviri Integrator for TrueSight Capacity Optimization 11.5 and later.

You can use the Kubernetes view to complete tasks such as:

  • Understand resource bottlenecks and aggregate residual capacity of Kubernetes clusters as well as individual nodes
  • Detect current or imminent resource saturation conditions and estimate the days remaining before saturation for every major Kubernetes resource (e.g. clusters, nodes, controllers, pods)
  • Assess the level of infrastructure efficiency by comparing allocated vs. actually used resources, and identify the most wasteful controllers or pods
  • Identify application resource usage patterns and detect resource shortage conditions
  • Characterize the footprint of infrastructure resources on a per-container-image basis
  • Understand resource utilization of namespaces, including the level of usage of resource quotas

Requirements

Conventions

The Kubernetes view provides summarized, high-level capacity KPIs designed for capacity management. The following common conventions around naming and metrics aggregation are valid for all data presented in the view:

  • Standard table column names: Metric name [unit of measurement] (e.g. "Memory [GB]")
  • Unit of measurement is omitted when implicitly evident
  • The metric value is the aggregation of the last 30 days
  • The metric value is computed as follows: for each day, the daily maximum is considered (at daily resolution). Then, the average value of the daily peak over the last 30 days is shown. 
  • For "days to saturation", if the value is greater than 90 days, the view will show 90 days
  • Kubernetes View supports entities collected by both Moviri Integrator Prometheus and Heapster. If the entities are collected by Moviri Integrator Heapster, some columns will remain empty or "NA".

These conventions apply only to the summary metrics presented in tables and on the Overview page. The charts in the details pages follow regular time-series semantics; their time frame and time resolution are available as filters at the top of the page.
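The aggregation convention above (daily maximum, then the average of those peaks over the last 30 days, with "days to saturation" capped at 90) can be sketched as follows. This is only an illustration and not the view's actual implementation: the function names and the input layout (a dict mapping ISO dates to lists of samples) are assumptions made for the example.

```python
from statistics import mean


def summary_value(samples_by_day, window_days=30):
    """Average of the daily peaks over the most recent `window_days` days.

    `samples_by_day` is a hypothetical layout: {"YYYY-MM-DD": [sample, ...]}.
    ISO dates sort chronologically as plain strings.
    """
    recent_days = sorted(samples_by_day)[-window_days:]
    # Daily resolution: keep only the maximum of each day, skipping empty days.
    daily_peaks = [max(samples_by_day[d]) for d in recent_days if samples_by_day[d]]
    return mean(daily_peaks) if daily_peaks else None


def displayed_days_to_saturation(forecast_days, cap=90):
    """Values above the cap are shown as the cap itself."""
    return min(forecast_days, cap)


samples = {
    "2024-01-01": [10, 40, 20],  # daily peak: 40
    "2024-01-02": [30, 60, 50],  # daily peak: 60
}
print(summary_value(samples))             # average of 40 and 60
print(displayed_days_to_saturation(120))  # capped at 90
```

Because the summary is an average of peaks, a single busy day raises the value less than it would in a pure 30-day maximum.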

View Structure

The Kubernetes view is composed of the following first-level pages:

  • Overview: it presents a summary of Kubernetes components states
  • Clusters: it shows capacity metrics for Kubernetes clusters
  • Nodes: it shows capacity metrics for Kubernetes nodes
  • Namespaces: it shows capacity metrics for Kubernetes namespaces
  • Controllers: it shows capacity metrics for Kubernetes controllers
  • Pods: it shows capacity metrics for Kubernetes pod workload

For each page except Overview, a set of second-level tabs cover information about some or all of the following parts:

  • Capacity: summary of the most important capacity indicators
  • CPU: the most relevant CPU configuration and performance metrics
  • Memory: the most relevant memory configuration and performance metrics
  • Storage: the most relevant storage configuration and performance metrics (this tab is defined only in the Clusters page)

From all of the second-level tabs it is possible to drill-down to an entity detail page, which presents the most relevant performance metrics as time charts and tables for the most important configuration properties.

Overview

The goal of the overview page is to provide at-a-glance aggregated capacity visibility over all of the main Kubernetes components.

For each component three doughnut graphs are represented, showing the number of entities of the specific component based on the corresponding capacity status:

  • Ok: the component is healthy from a capacity management perspective 
  • Warning: the component has breached (or will be breaching in the near future) a warning utilization threshold for one or more metrics
  • Alert: the component has breached (or will be breaching in the near future) a critical utilization threshold for one or more metrics

Please refer to the section Thresholds, Status & Bottleneck below for more details on how the capacity status is calculated for each entity.

As shown in the picture below, each doughnut:

  • is scaled with the number of components taken into account;
  • has a color that depends on the component capacity status;
  • can be added to the "Favorites" tab by clicking the associated star.

worddavebd24aaba9f673faeac0260e33270ed0.png

Entity Pages

These first-level pages (Clusters, Nodes, Namespaces, Controllers and Pods) are designed to provide capacity management insights for the corresponding Kubernetes entity. Each page contains up to four second-level tabs (Capacity, CPU, Memory and Storage, the last available only for Clusters), each of which presents capacity and efficiency KPIs for the resources relevant to the analyzed entity.

Capacity

The first tab "Capacity" is designed to provide a summary of the most important KPIs for managing capacity of Kubernetes environments.

The table below summarizes the information provided by the tab. Depending on the particular entity that is being analyzed (e.g. Cluster or Pod), only the relevant set of columns is shown. For example, the "Controller" column is only shown for Pods, while columns related to quotas like "Mem Request vs Quota" are only shown for Namespaces.

*: Columns marked with an asterisk will be blank or show N/A if the entity is collected from a Heapster data source. The data is available from Prometheus.

| Column Name | Meaning and Calculation |
|---|---|
| Cluster | Name of the cluster |
| Node | Name of the node |
| Namespace | Name of the namespace |
| Controller | Name of the controller |
| Pod | Name of the pod |
| Pod # | Number of pods (KPOD_NUM) |
| Status | Indicator of the resource's status |
| CPU Used vs Cap [%] | Percentage of CPUs actually used with respect to CPU capacity (number of CPUs), calculated as CPU_USED_NUM/CPU_NUM |
| CPU Request vs Cap [%] | Percentage of CPUs requested with respect to CPU capacity (number of CPUs), calculated as CPU_REQUEST/CPU_NUM |
| CPU Request vs Quota [%] | Percentage of CPU request with respect to the namespace CPU request quota, calculated as CPU_REQUEST/CPU_REQUEST_MAX |
| CPU Limit vs Quota [%] | Percentage of CPU limit with respect to the CPU limit quota, calculated as CPU_LIMIT/CPU_LIMIT_MAX |
| CPU Used vs Limit [%] | Percentage of CPU used with respect to the CPU limit, calculated from the metric CPU_UTIL_LIMIT |
| Mem Used vs Cap [%] | Percentage of memory actually used with respect to memory capacity (total memory), calculated as MEM_USED/TOTAL_REAL_MEM |
| Mem Real Used Vs Cap [%] * | Percentage of memory actually used, excluding file system cache and buffers, with respect to memory capacity (total memory), calculated as MEM_REAL_USED/TOTAL_REAL_MEM |
| Mem Request vs Cap [%] | Percentage of memory requested with respect to memory capacity (total memory), calculated as MEM_REQUEST/TOTAL_REAL_MEM |
| Mem Request vs Quota [%] | Percentage of memory request with respect to the namespace memory request quota, calculated as MEM_REQUEST/MEM_REQUEST_MAX |
| Mem Limit vs Quota [%] | Percentage of memory limit with respect to the namespace memory limit quota, calculated as MEM_KLIMIT/MEM_LIMIT_MAX |
| Mem Used vs Limit [%] | Percentage of memory used with respect to the memory limit, calculated from the metric MEM_UTIL_LIMITS |
| CPU Request vs Allocatable [%] * | Percentage of CPU request with respect to allocatable CPU, calculated from the metric CPU_REQUEST_ALLOCATABLE |
| Memory Request vs Allocatable [%] * | Percentage of memory request with respect to allocatable memory, calculated from the metric MEM_REQUEST_ALLOCATABLE |
| Pod # vs Pod Max [%] | Percentage of pods already created (KPOD_NUM/KPOD_NUM_MAX) |
| Spare Pods | Estimated residual capacity, expressed as the number of additional pods that can be scheduled. This considers the average size of existing pods in the cluster (KPOD_NUM_MAX - KPOD_NUM) |
| Bottleneck Resource | First resource to saturate |
| CPU # | Number of CPUs (CPU_NUM) |
| CPU USED | Number of cores in use (CPU_USED_NUM) |
| CPU Request | Amount of CPU (cores) requested (CPU_REQUEST) |
| CPU Limit | Maximum amount of CPU (cores) that can be used (CPU_LIMIT) |
| Memory | The total amount of memory installed on the system (TOTAL_REAL_MEM) |
| MEM USED | The amount of memory used in bytes (MEM_USED) |
| MEM REAL USED * | The total memory excluding file system cache and buffers in use on UNIX systems during the interval (MEM_REAL_USED) |
| MEM Request | Amount of memory that will be guaranteed (MEM_REQUEST) |
| MEM Limit | The maximum amount of physical memory that can be allocated (MEM_KLIMIT) |

The picture below shows an example of the Capacity tab.

worddav49a70e8ea8c2bac8f1d4df113e20b5d2.png
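The ratio columns of the Capacity tab are simple quotients of the underlying metrics. The sketch below shows how a few of them could be derived from raw values. It is an illustration only: the `capacity_kpis` helper, the dict layout and the sample node values are hypothetical, while the metric names and formulas come from the table above. Metrics that are missing (for example, when the entity is collected from Heapster) yield `None`, mirroring the blank or N/A cells.

```python
def pct(numerator, denominator):
    """Return numerator/denominator as a percentage, or None if it cannot be computed."""
    if numerator is None or not denominator:
        return None
    return 100.0 * numerator / denominator


def capacity_kpis(entity):
    """Derive a few Capacity tab columns from a dict of raw metric values."""
    kpis = {
        "CPU Used vs Cap [%]": pct(entity.get("CPU_USED_NUM"), entity.get("CPU_NUM")),
        "CPU Request vs Cap [%]": pct(entity.get("CPU_REQUEST"), entity.get("CPU_NUM")),
        "Mem Used vs Cap [%]": pct(entity.get("MEM_USED"), entity.get("TOTAL_REAL_MEM")),
        "Pod # vs Pod Max [%]": pct(entity.get("KPOD_NUM"), entity.get("KPOD_NUM_MAX")),
    }
    pods, pods_max = entity.get("KPOD_NUM"), entity.get("KPOD_NUM_MAX")
    # Spare Pods = KPOD_NUM_MAX - KPOD_NUM, when both metrics are available.
    kpis["Spare Pods"] = None if pods is None or pods_max is None else pods_max - pods
    return kpis


# Hypothetical node: 8 CPUs, 64 units of memory, at most 110 pods.
node = {
    "CPU_NUM": 8, "CPU_USED_NUM": 2.0, "CPU_REQUEST": 6.0,
    "TOTAL_REAL_MEM": 64, "MEM_USED": 16,
    "KPOD_NUM": 44, "KPOD_NUM_MAX": 110,
}
for column, value in capacity_kpis(node).items():
    print(f"{column}: {value}")
```

In this example the node uses only a quarter of its CPU capacity while three quarters are already requested, which is the kind of allocated-versus-used gap the view is meant to expose.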

CPU

The CPU tab presents the most important metrics related to the CPU. The columns are defined in the table below.

*: Columns marked with an asterisk will be blank or show N/A if the entity is collected from a Heapster data source. The data is available from Prometheus.

| Column Name | Meaning and Calculation |
|---|---|
| Cluster | Name of the cluster |
| Node | Name of the node |
| Namespace | Name of the namespace |
| Controller | Name of the controller |
| Pod | Name of the pod |
| Status | Resource capacity status |
| Days To Saturation | Days before the resource is saturated |
| CPU # | Number of CPUs (CPU_NUM) |
| CPU USED | Number of cores in use (CPU_USED_NUM) |
| CPU Request | Amount of CPU (cores) requested (CPU_REQUEST) |
| CPU Limit | Maximum amount of CPU (cores) that can be used (CPU_LIMIT) |
| Quota Request | Quota set for the CPU request resource (CPU_REQUEST_MAX) |
| Quota Limit | Quota set for the CPU limit resource (CPU_LIMIT_MAX) |
| CPU USED (mCores) | Number of millicores in use (CPU_USED_NUM) |
| CPU Request (mCores) | Amount of CPU (millicores) requested (CPU_REQUEST) |
| CPU Limit (mCores) | Maximum amount of CPU (millicores) that can be used (CPU_LIMIT) |
| Quota Request (mCores) | Quota set for the CPU request resource, in millicores (CPU_REQUEST_MAX) |
| Quota Limit (mCores) | Quota set for the CPU limit resource, in millicores (CPU_LIMIT_MAX) |
| CPU Used vs Request [%] | Percentage of CPU used with respect to the CPU request, calculated from the metric CPU_UTIL_REQUEST |
| CPU Used vs Cap [%] | Percentage of CPUs actually used with respect to CPU capacity (number of CPUs), calculated as CPU_USED_NUM/CPU_NUM |
| CPU Request vs Cap [%] | Percentage of CPUs requested with respect to CPU capacity (number of CPUs), calculated as CPU_REQUEST/CPU_NUM |
| CPU Request vs Quota [%] | Percentage of CPU request with respect to the namespace CPU request quota, calculated as CPU_REQUEST/CPU_REQUEST_MAX |
| CPU Limit vs Quota [%] | Percentage of CPU limit with respect to the CPU limit quota, calculated as CPU_LIMIT/CPU_LIMIT_MAX |
| CPU Used vs Limit [%] | Percentage of CPU used with respect to the CPU limit, calculated from the metric CPU_UTIL_LIMIT |
| CPU Request vs Allocatable [%] * | Percentage of CPU request with respect to allocatable CPU, calculated from the metric CPU_REQUEST_ALLOCATABLE |
| CPU Overcommitment [%] | Percentage of CPU limit with respect to the CPU capacity (number of CPUs), calculated as CPU_LIMIT/CPU_NUM |
| Bottleneck | First resource to saturate |

The picture below shows an example of the CPU tab.

worddav1ab322995cfc9299fcfbc0c9720f1f0b.png
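Two details of the CPU tab lend themselves to a short worked example: the millicore columns and the overcommitment ratio. The sketch below assumes the standard Kubernetes convention that one core equals 1000 millicores; the function names and sample numbers are hypothetical, and the overcommitment formula (CPU_LIMIT/CPU_NUM) is the one given in the table above.

```python
def to_millicores(cores):
    """Convert a CPU amount in cores to millicores (1 core = 1000 millicores)."""
    return round(cores * 1000)


def cpu_overcommitment_pct(cpu_limit, cpu_num):
    """CPU limit as a percentage of CPU capacity; None when capacity is unknown or zero."""
    if cpu_limit is None or not cpu_num:
        return None
    return 100.0 * cpu_limit / cpu_num


# Hypothetical pod using a quarter of a core.
print(to_millicores(0.25))

# Hypothetical node: limits add up to 12 cores on an 8-CPU node.
print(cpu_overcommitment_pct(cpu_limit=12, cpu_num=8))
```

A result above 100% means the sum of the limits exceeds the physical capacity, so not every workload could reach its limit at the same time.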

Memory

The Memory tab presents the most important metrics related to memory. The columns are defined in the table below.

*: Columns marked with an asterisk will be blank or show N/A if the entity is collected from a Heapster data source. The data is available from Prometheus.

| Column Name | Meaning and Calculation |
|---|---|
| Cluster | Name of the cluster |
| Node | Name of the node |
| Namespace | Name of the namespace |
| Controller | Name of the controller |
| Pod | Name of the pod |
| Status | Indicator of the resource's status |
| Days To Saturation | Days before the saturation of the physical resources |
| Memory | The total amount of memory installed on the system (TOTAL_REAL_MEM) |
| MEM USED | The amount of memory used in bytes (MEM_USED) |
| MEM REAL USED * | The total memory excluding file system cache and buffers in use on UNIX systems during the interval (MEM_REAL_USED) |
| MEM Request | Amount of memory that will be guaranteed (MEM_REQUEST) |
| MEM Limit | The maximum amount of physical memory that can be allocated (MEM_KLIMIT) |
| Quota Request | Amount of memory quota request (MEM_REQUEST_MAX) |
| Quota Limit | Amount of memory quota limit (MEM_LIMIT_MAX) |
| Mem Used vs Request [%] | Percentage of memory used with respect to memory request, calculated from the metric mem_util_request (MEM_USED/MEM_REQUEST) |
| Mem Used vs Cap [%] | Percentage of memory actually used with respect to memory capacity (total memory), calculated as MEM_USED/TOTAL_REAL_MEM |
| Mem Request vs Cap [%] | Percentage of memory requested with respect to memory capacity (total memory), calculated as MEM_REQUEST/TOTAL_REAL_MEM |
| Mem Request vs Quota [%] | Percentage of memory request with respect to the namespace memory request quota, calculated as MEM_REQUEST/MEM_REQUEST_MAX |
| Mem Limit vs Quota [%] | Percentage of memory limit with respect to the namespace memory limit quota, calculated as MEM_KLIMIT/MEM_LIMIT_MAX |
| Mem Used vs Limit [%] | Percentage of memory used with respect to the memory limit, calculated from the metric mem_util_limit (MEM_USED/MEM_KLIMIT) |
| Mem Overcommitment [%] | Percentage of memory limit with respect to memory capacity (total memory), calculated as MEM_KLIMIT/TOTAL_REAL_MEM |
| Mem Real Used Vs Cap [%] * | Percentage of memory actually used, excluding file system cache and buffers, with respect to memory capacity (total memory), calculated as MEM_REAL_USED/TOTAL_REAL_MEM |
| Mem Real Used Vs Request [%] * | Percentage of memory used, excluding file system cache and buffers, with respect to memory request, calculated as MEM_REAL_USED/MEM_REQUEST |
| Memory Request vs Allocatable [%] * | Percentage of memory request with respect to allocatable memory, calculated from the metric MEM_REQUEST_ALLOCATABLE |
| Bottleneck | First resource to saturate |

The picture below shows an example of the Memory tab.

worddav780ee4a96733c2aa6a50328a9b559f14.png

Storage

Because in Kubernetes persistent volumes are requested at the cluster level, the Clusters page is the only one with this tab. Persistent volume management is still at an early stage of development in Kubernetes, so the only relevant information currently available is the following.

| Column Name | Meaning |
|---|---|
| Cluster | Name of the cluster |
| Number of PV | Number of persistent volumes |
| PV Capacity [GB] | Storage capacity aggregated across all of the configured persistent volumes, calculated from the metric ST_SIZE |
| PV Allocated [GB] | Storage allocated space aggregated across all of the configured persistent volumes, calculated from the metric ST_ALLOCATED |
| PV Free [GB] | Storage free space aggregated across all of the configured persistent volumes, calculated as ST_SIZE - ST_ALLOCATED |
| PV Allocated vs Capacity [%] | Percentage of storage allocated with respect to capacity, calculated as ST_ALLOCATED/ST_SIZE |

The picture below shows an example of the Storage tab.

worddavf644e24fdc398cd56917f0462da422cd.png
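The Storage columns are aggregates over all persistent volumes of a cluster. A minimal sketch of that aggregation follows, assuming each volume is described by a hypothetical (ST_SIZE, ST_ALLOCATED) pair expressed in GB; the `pv_summary` helper and the sample volumes are invented for illustration, while the formulas are those in the table above.

```python
def pv_summary(volumes):
    """Aggregate (ST_SIZE, ST_ALLOCATED) pairs, in GB, across persistent volumes."""
    capacity = sum(size for size, _ in volumes)
    allocated = sum(alloc for _, alloc in volumes)
    return {
        "Number of PV": len(volumes),
        "PV Capacity [GB]": capacity,
        "PV Allocated [GB]": allocated,
        # PV Free = ST_SIZE - ST_ALLOCATED
        "PV Free [GB]": capacity - allocated,
        # PV Allocated vs Capacity = ST_ALLOCATED / ST_SIZE
        "PV Allocated vs Capacity [%]": 100.0 * allocated / capacity if capacity else None,
    }


# Two hypothetical volumes: 100 GB with 40 GB allocated, 50 GB fully allocated.
for column, value in pv_summary([(100, 40), (50, 50)]).items():
    print(f"{column}: {value}")
```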

Details page

Clicking the name of a resource opens its Details page. As shown in the pictures below, this page presents time charts of the most important metrics for the selected entity and a table with its configuration metrics.
worddavc36048f39c7db7867cd49795dd2257b6.png

worddavd2867b0334e297c248662399cce7570c.png

For more details on how the TrueSight Capacity Optimization metrics map to the Kubernetes ones and how the derived metrics are computed, please refer to the Kubernetes integrator documentation.


Thresholds, Status & Bottleneck

The table below reports the thresholds used in the view.
Each value in the Resource column has the form "resource taken into account: status". The defined resources are:

  • CPU: all the metrics related to the CPU (e.g. CPU utilization or CPU request over number of CPUs)
  • MEM: all the metrics related to the memory (e.g. memory utilization or memory request over memory capacity)
  • SAT: days before saturation of the resource
  • POD: percentage of pods required over the pod creation limit (this limit depends on the Kubernetes version installed)

| Resource | Cluster | Node | Namespace | Controller | Pod (Pod Workload) |
|---|---|---|---|---|---|
| CPU: OK | < 80% | < 80% | < 80% | < 80% | < 80% |
| CPU: ALERT | > 90% | > 90% | > 90% | > 90% | > 90% |
| MEM: OK | < 75% | < 75% | < 75% | < 75% | < 75% |
| MEM: ALERT | > 85% | > 85% | > 85% | > 85% | > 85% |
| SAT: OK | > 90 days | > 90 days | > 90 days | > 90 days | > 90 days |
| SAT: ALERT | < 30 days | < 30 days | < 30 days | < 30 days | < 30 days |
| POD: OK | < 70% | < 70% | < 70% | < 70% | < 70% |
| POD: ALERT | > 80% | > 80% | > 80% | > 80% | > 80% |

The status is evaluated as the worst status among the following resources, and the bottleneck is the resource that will reach saturation first (or that is already saturated).

If the forecasted days to saturation exceed 90 days, the resource is considered "Not Saturated".

| Resource | Bottleneck |
|---|---|
| CPU/Memory Utilization vs Capacity or CPU/Memory Utilization vs Limit | CPU/MEM:USED |
| CPU/Memory Request vs Capacity or CPU/Memory Request vs Quota | CPU/MEM:REQUEST |
| CPU/Memory Limit vs Capacity or CPU/Memory Limit vs Quota | CPU/MEM:LIMIT |
| Pod number vs maximum number of pods | POD:MAX |

Please note that the CPU/Memory Used vs Request metrics are not considered in the status and bottleneck evaluation: usage is allowed to exceed the request, so these metrics can go above 100%.
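The status and bottleneck rules described in this section can be sketched as follows. This is a hedged illustration, not the view's actual logic. The thresholds come from the table above, but several points are assumptions made for the example: values falling between the OK and ALERT thresholds (boundaries included) are treated as Warning, and the function names, the input layout and the sample values are hypothetical.

```python
from enum import IntEnum


class Status(IntEnum):
    OK = 0
    WARNING = 1
    ALERT = 2


# (OK below, ALERT above), in percent, from the thresholds table.
PCT_THRESHOLDS = {"CPU": (80, 90), "MEM": (75, 85), "POD": (70, 80)}


def pct_status(resource, value):
    """Status of a percentage metric; the band between thresholds is assumed to be Warning."""
    ok_below, alert_above = PCT_THRESHOLDS[resource]
    if value < ok_below:
        return Status.OK
    if value > alert_above:
        return Status.ALERT
    return Status.WARNING


def sat_status(days_to_saturation):
    """Status of the SAT resource: OK above 90 days, ALERT below 30 days."""
    if days_to_saturation > 90:
        return Status.OK
    if days_to_saturation < 30:
        return Status.ALERT
    return Status.WARNING


def overall_status(pct_values, days_to_saturation=None):
    """Worst status among all evaluated resources."""
    statuses = [pct_status(res, val) for res, val in pct_values.items()]
    if days_to_saturation is not None:
        statuses.append(sat_status(days_to_saturation))
    return max(statuses, default=Status.OK)


def bottleneck(days_by_label):
    """Label that saturates first, or None when every forecast exceeds 90 days."""
    if not days_by_label:
        return None
    label = min(days_by_label, key=days_by_label.get)
    return label if days_by_label[label] <= 90 else None


print(overall_status({"CPU": 85, "MEM": 60, "POD": 95}, days_to_saturation=120).name)
print(bottleneck({"CPU:USED": 45, "MEM:REQUEST": 12, "POD:MAX": 200}))
```

Using an ordered enumeration makes "worst status" a plain maximum, so adding another resource to the evaluation only means adding one more entry to the list.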

Materializer Task

The Kubernetes view takes advantage of TrueSight Capacity Optimization materialized data marts to enable faster page loading during user browsing by pre-computing the underlying data. To achieve this, a dedicated DataMartMaterializer task is deployed as part of the view installation process. By default, the task is scheduled to run every day at midnight. Modify the scheduling parameters to fit the needs of your environment, taking into account factors such as:

  • Data loading schedule and warehouse latency
  • Multiple refreshes in the same day
  • Kubernetes View user needs

The Kubernetes View materializer is shown in the next figure and can be found under System Tasks.

worddavd96ad9574e9d48b4a523350d8744c970.png


 
