Cloud operations
Cloud Operations is about running and optimizing your hybrid cloud to deliver superior service by understanding the current state of your resources and how they will change going forward to meet future needs. While often perceived to be temporary, most cloud services live for weeks, months, or even years — and will continue to grow as shared resources increasingly support traditional IT workloads. When you add up all the operating systems, middleware, tooling, applications, and now hypervisors in the average IT environment, this management task can overwhelm many organizations with ongoing repair and maintenance.
Once a cloud has been designed and is up and running, it is important to optimize it to deliver quality service to each of its users. To do so, IT should proactively monitor for performance issues and maintain optimal use of capacity resources. Three key components are required:
- Service level enforcement: The way to make sure a cloud provider is addressing a customer's business requirements and risks is through an actively managed service level agreement (SLA).
- Proactive performance management: IT can proactively manage performance across your public and private cloud infrastructures through predictive analytics for performance monitoring and identification of performance issues.
- Continuous resource optimization: To make the best use of cloud resources, IT must actively manage the capacity of the broad cloud infrastructure and also right-size individual cloud services on an ongoing basis.
Cloud operations overview diagram
Service level enforcement
A hybrid cloud environment, like any other service, needs to meet expectations. An SLA (service level agreement) is an explicit agreement regarding the level of service offered by a provider to a customer, measured in uptime, application response time, or other metrics that matter to the quality of the service. Different organizations have different requirements based on specific criteria, such as their size and industry.
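As an illustration of how the uptime and response-time metrics mentioned above might be checked against a target, here is a minimal Python sketch. The `SlaTarget` class, the `check_sla` function, and the choice of a 95th-percentile response time are assumptions made for this example; they are not part of any BMC product or API, and a real SLA may define its metrics differently.

```python
from dataclasses import dataclass


@dataclass
class SlaTarget:
    # Hypothetical SLA thresholds, chosen only for illustration.
    min_availability_pct: float
    max_p95_response_ms: float


def availability_pct(total_minutes, downtime_minutes):
    """Percentage of the measurement period during which the service was up."""
    if total_minutes <= 0:
        raise ValueError("total_minutes must be positive")
    return 100.0 * (total_minutes - downtime_minutes) / total_minutes


def p95(samples):
    """95th percentile using the nearest-rank method."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # integer ceiling of 0.95 * n
    return ordered[rank - 1]


def check_sla(target, total_minutes, downtime_minutes, response_ms):
    """Compare measured availability and response time with the target."""
    avail = availability_pct(total_minutes, downtime_minutes)
    resp = p95(response_ms)
    return {
        "availability_pct": avail,
        "p95_response_ms": resp,
        "availability_ok": avail >= target.min_availability_pct,
        "response_ok": resp <= target.max_p95_response_ms,
    }


if __name__ == "__main__":
    # A 30-day month has 43,200 minutes; assume 30 minutes of downtime.
    target = SlaTarget(min_availability_pct=99.9, max_p95_response_ms=500.0)
    print(check_sla(target, 43200, 30, list(range(1, 101))))
```

With a 99.9% availability target, a 30-day month allows roughly 43 minutes of downtime, so the 30 minutes assumed in the usage example stays within the target.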
In a traditional computing environment, users can track performance on their own dedicated hardware and software stack. In a highly dynamic cloud environment, however, the game changes. Service levels remain critical to delivering business value in the cloud; only the tools for managing them have changed. Traditional levers, such as administrator response time, continue to exist. It is now easier, however, to pull other levers, such as adding capacity, moving a workload from one location to another, and reconfiguring networks, all of which can happen without interacting with the physical systems.
Given a service level, the resources of the cloud should not only align to meet that service level but must also be flexible enough to change over time. To ensure service levels are adequate, performance management tools must be in place. Without them, it is hard to tell whether the performance components of SLAs are met. The performance management of each cloud service should be paired with automated remediation, through capacity management and other mechanisms, to address the major causes of service level failures. Further, as workloads are increasingly sent to public cloud environments, performance management and service level management must be equally proficient at identifying issues in clouds hosted by third parties.
BMC offers solutions for service level enforcement to ensure the business is receiving the agreed-upon service levels. BMC Atrium Service Level Management, which spans physical, virtual, and cloud environments, measures and maintains consistent, quality services to users.
Key activities:
- Actively measure and manage service levels
- Automate remediation
Proactive performance management
Once your hybrid cloud is up and running, IT will want to ensure that the specified performance requirements are met according to the service levels for both private cloud service performance and externally-hosted services. Any potential errors and barriers to performance quality, such as issues with uptime or response time, need to be proactively identified and handled.
The business relies on IT to deliver high quality of service on key applications, optimize the end-user experience, improve application performance and availability, and meet service level commitments. As your IT environment transitions to a hybrid data center, including both virtualized and cloud technologies, your monitoring and event management processes need to be updated to provide proactive and automated detection, isolation, prioritization, diagnosis, and resolution of end-to-end performance and availability issues related to dynamically changing business services.
An integrated performance, availability, event, and impact management solution should be designed specifically to manage high volumes of business service data and events collected across multiple platforms, vendors, and sources, including components that are managed, but not owned, by IT. By updating monitoring in anticipation of expanding into the cloud, operational costs are lowered, operators are not overwhelmed, users see a seamless transition, and business services thrive, all while IT's responsiveness and ability to meet business demands increase.
BMC provides solutions to meet these needs. TrueSight Operations Management and BMC ProactiveNet Performance Management monitor behaviors and policies for exceptions in all attributes of a cloud service, such as elements, transactions, and users. They predict irregularities, enabling you to remediate issues before service, and your business, is impacted.
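To make the idea of monitoring for exceptions more concrete, the following Python sketch flags samples that deviate sharply from a rolling baseline. It is a generic z-score check written for illustration only; it is not the algorithm used by TrueSight Operations Management or BMC ProactiveNet Performance Management, and the function name, window size, and threshold are all assumptions.

```python
from statistics import mean, stdev


def find_anomalies(samples, window=5, threshold=3.0):
    """Return indexes of samples that deviate from the preceding window.

    Each sample is compared with the mean and standard deviation of the
    `window` samples before it. The window and threshold values are
    illustrative assumptions, not recommended settings.
    """
    if window < 2:
        raise ValueError("window must be at least 2")
    anomalies = []
    for i in range(window, len(samples)):
        baseline = samples[i - window:i]
        mu = mean(baseline)
        sigma = stdev(baseline)
        if sigma == 0:
            # A perfectly flat baseline: any change at all is an exception.
            if samples[i] != mu:
                anomalies.append(i)
            continue
        if abs(samples[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


if __name__ == "__main__":
    # Hypothetical response times in milliseconds with one spike.
    response_ms = [100, 102, 98, 101, 99, 100, 250, 101]
    print(find_anomalies(response_ms))
```

A check like this only reacts after a deviation has occurred. Predicting an issue before it affects service would require trend analysis or forecasting on top of such a baseline.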
Key activities:
- Monitor and manage private cloud service performance
- Monitor and manage performance of externally-hosted services
Continuous resource optimization
Organizations in every market segment require IT to continuously anticipate and meet the changing capacity needs of the business, while also ensuring optimal performance and cost. Companies are adopting continuous, business-aware capacity management as a new approach to IT resource optimization.
As organizations begin to move more and more of their infrastructure into a virtualized, shared-services model, two things happen to capacity:
First, the underlying physical hardware, such as compute, network, and storage, is shared across more, and more complex, systems. As utilization rates increase, so do the complexity and dynamic nature of the environment, introducing more interdependencies and ongoing change among the various components. As a result, any given component has the potential to impact many others, which drives the need for active management and modeling of its capacity. In addition, in a cloud environment, the provisioning placement engine needs accurate and up-to-date visibility into current resource capacity in order to properly decide where and how to place services.
Second, and perhaps more important, is the need to provide a way for IT to view its operations from a business perspective. Effective resource optimization balances cost against capacity, and supply against demand, to ensure that sufficient IT resources are available to meet current and future business requirements. New tools providing a high degree of automation and integration are required. These tools should combine flexible visualization, automated exception-based analysis and reporting, and a wide range of planning capabilities to provide a comprehensive solution for ensuring cost-effective, optimal business service performance and alignment.
BMC offers solutions that work hand in hand to provide continuous resource optimization, delivering business-aware capacity planning for modern data centers composed of physical, virtual, and cloud technologies. BMC Capacity Management plans for immediate and growing capacity requirements, while also ensuring existing capacity is optimized. It works with BMC Cloud Lifecycle Management to build the underlying cloud environment, and with BMC Atrium Discovery and Dependency Mapping to track cloud services and dependencies.
Key activities:
- Continuously audit cloud services
- "Right size" resource pools
- Optimize resource allocation for business priorities
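The right-sizing activity listed above can be sketched as a simple calculation: take the observed peak utilization of a service, add headroom, and compare the result with what is currently allocated. The Python example below is a minimal sketch under those assumptions. The function name, the use of a 95th-percentile peak, and the 25% headroom are illustrative choices, not how BMC Capacity Management computes its recommendations.

```python
import math


def p95(samples):
    """95th percentile using the nearest-rank method."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # integer ceiling of 0.95 * n
    return ordered[rank - 1]


def recommend_vcpus(allocated_vcpus, cpu_util_pct_samples, headroom=0.25):
    """Suggest a vCPU count from observed CPU utilization.

    The samples are utilization percentages (0 to 100) of the currently
    allocated vCPUs. The headroom fraction is an assumed safety margin.
    """
    peak_pct = p95(cpu_util_pct_samples)
    needed = allocated_vcpus * peak_pct / 100.0 * (1 + headroom)
    return max(1, math.ceil(needed))


if __name__ == "__main__":
    # Hypothetical service: 8 vCPUs allocated, mostly idle, one brief spike.
    samples = [10] * 10 + [50] * 9 + [90]
    print(recommend_vcpus(8, samples))
```

Using a percentile rather than the absolute maximum keeps a single short spike from driving the recommendation. Whether that trade-off is acceptable depends on the service level agreed for the workload.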